What's in Apache #NiFi 2.0.0? Check out our latest blog post with the highlights of NiFi TNG. https://lnkd.in/gCA6s-bJ #dataengineering
-
Did you see that Apache NiFi released 2.0.0 yesterday? Check out our team's latest blog post to learn more about what these updates entail! https://lnkd.in/gCA6s-bJ #apachenifi #nifi #nifi2
Next Generation Apache NiFi | NiFi 2.0.0 is GA
https://datavolo.io
-
I've been super fortunate to return to Apache #NiFi in my developer advocate 🥑 role at Datavolo, but I realize that MANY of you have never seen, much less considered, this powerful tool. For NiFi newbs, I wrote this quick blog post showcasing that it is not just a typical low-code, visual, drag-and-drop development tool -- it is ALSO a full-featured, high-performance, and insanely scalable RUNTIME environment for executing your #dataengineering pipelines. https://lnkd.in/efPnS9VY
develop, deploy, execute & monitor in one tool (welcome to apache nifi)
http://lestermartin.blog
-
Lately, I've been working on making Apache Flink more observable, because deep visibility into your Flink deployment is crucial to keeping your data-streaming applications running smoothly. I've discussed some of the fundamentals of Apache Flink and how to monitor and observe several of the components inside Flink. Find the write-up below: #apacheflink #datastreaming #realtime #message #monitoring https://lnkd.in/gfpbEWsW
Apache Flink — Observability
sarthak-acoustic.medium.com
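As a small illustration of the kind of visibility the post is about, here is a minimal sketch of registering a custom metric from a PyFlink user-defined function. The UDF and metric names are invented for the example, and the metrics API usage follows the PyFlink docs as I understand them.

```python
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class ParseEvent(ScalarFunction):
    """Scalar UDF that counts parse failures via a custom Flink metric."""

    def open(self, function_context):
        # Register a counter on this operator's metric group; it surfaces
        # alongside Flink's built-in metrics in whatever reporter you configure.
        self.failures = function_context.get_metric_group().counter("parse_failures")

    def eval(self, raw):
        try:
            return raw.strip().upper()
        except Exception:
            self.failures.inc()  # count bad input instead of failing the job
            return None

# Wrap the class as a UDF usable from the Table API or SQL.
parse_event = udf(ParseEvent(), result_type=DataTypes.STRING())
```

Watching a counter like this next to Flink's built-in task and checkpoint metrics is often the quickest way to spot data-quality problems before they become job failures.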
-
When it comes to open source software, I enjoy trying it all and contributing to the community. If you have yet to use Apache Flink, or would like to learn more before trying it, it is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

Use cases:
* Event-driven applications
* Stream & batch analytics
* Data pipelines & ETL

To learn more: https://flink.apache.org/ For Apache Flink release announcements: https://lnkd.in/e7yagNaw I am excited for more to come! A tiny starter sketch follows the link below.
Apache Flink® — Stateful Computations over Data Streams
flink.apache.org
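For a first taste, here is a minimal word-count sketch using the PyFlink DataStream API (the Java and Scala APIs look very similar); it runs locally with no cluster required, and the input strings are invented for the example.

```python
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

lines = env.from_collection([
    "flink handles bounded and unbounded streams",
    "flink runs at in-memory speed",
])

counts = (
    lines
    .flat_map(lambda line: line.split(), output_type=Types.STRING())
    .map(lambda word: (word, 1),
         output_type=Types.TUPLE([Types.STRING(), Types.INT()]))
    .key_by(lambda pair: pair[0])
    # Stateful aggregation: Flink maintains a running sum per key.
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
)

counts.print()
env.execute("word_count")
```

The same program works on a bounded collection (as here) or an unbounded source such as Kafka, which is exactly the unified batch/stream story the post describes.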
-
How does Apache NiFi support clustering on #Kubernetes? NiFi 2.0.0 brings a number of new features and improvements, including native clustering on Kubernetes without the need for Apache ZooKeeper. The following post describes the framework libraries and implementation decisions that power NiFi cluster leader election and shared state management for Kubernetes deployments. https://lnkd.in/ghB_MjmB
Bringing Kubernetes Clustering to Apache NiFi
exceptionfactory.com
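For a concrete sense of what this looks like in practice, here is a hedged sketch of the relevant nifi.properties settings, based on my reading of the NiFi 2.0.0 documentation; verify the exact property names and values against the linked post and the official admin guide before relying on them.

```properties
# Run this instance as a member of a NiFi cluster.
nifi.cluster.is.node=true

# Select the Kubernetes-native leader election manager instead of the
# ZooKeeper-backed (Curator) implementation, removing the ZooKeeper dependency.
nifi.cluster.leader.election.implementation=KubernetesLeaderElectionManager
```

With this in place, cluster leader election is coordinated through Kubernetes Lease resources, and shared cluster state moves to a Kubernetes-backed state provider configured in state-management.xml.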
-
Sometimes useful features collide. My latest post covers a couple of little "gotchas" you might run into when using Apache NiFi and Progress MarkLogic together. https://lnkd.in/eMjwhxkT #nifi #marklogic #progress #bigdata
String interpolation in Apache NiFi
https://www.4vservices.com
-
Hi everybody. I will be presenting "Understanding Apache Flink" online for the Los Angeles Java User's Group on April 3, 2024 at 6:00 p.m. Pacific Time. Details and registration can be found at https://lnkd.in/g8_xzcJp Note that this event will be held online since we currently do not have a meeting location; if anybody has a location where we can actually meet, please contact me, because that would be preferable. Afterwards, if anybody is up to meet for a drink, let me know in advance so that we can plan.

This presentation is the first part of a series of three Apache Flink presentations that I (Marco Villalobos) am writing:
1. Understanding Apache Flink.
2. Apache Flink Patterns.
3. Understanding Apache Flink Stateful Functions.

This first presentation introduces Apache Flink and its core concepts. The target audience is software engineers who need an introduction to Apache Flink. Additionally, the presentation offers an opportunity to learn and integrate many different technologies. Here is the agenda:
1. A complete Apache Flink job that uses the DataStream API and SQL API, writes all incoming data into S3 in Parquet format, and writes aggregate time-series data into InfluxDB.
2. A data generator deployed into a Kubernetes cluster.
3. An Apache Kafka cluster deployed into a Kubernetes cluster with the Strimzi Kafka Operator.
4. An Apache Flink job deployed into a Kubernetes cluster with the Apache Flink Kubernetes Operator.
5. LocalStack deployed into a Kubernetes cluster to simulate the Amazon S3 web service.
6. The InfluxDB time-series database and the Telegraf ingestion component, deployed into a Kubernetes cluster.

A sketch of the Parquet-on-S3 sink from item 1 follows the link below.
Understanding Apache Flink, Wed, Apr 3, 2024, 6:00 PM | Meetup
meetup.com
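To give a flavor of agenda item 1, here is a minimal sketch of a Flink SQL filesystem sink that writes Parquet to S3, expressed via PyFlink. The table name, columns, and bucket are invented for the example, and note that a streaming filesystem sink only finalizes files when checkpointing is enabled.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Files are committed on checkpoints, so enable checkpointing for streaming sinks.
t_env.get_config().set("execution.checkpointing.interval", "30s")

t_env.execute_sql("""
    CREATE TABLE readings_sink (
        sensor_id STRING,
        reading   DOUBLE,
        ts        TIMESTAMP(3)
    ) WITH (
        'connector' = 'filesystem',
        'path'      = 's3://my-bucket/readings/',
        'format'    = 'parquet'
    )
""")
# A source table would then feed it, e.g.:
# t_env.execute_sql("INSERT INTO readings_sink SELECT * FROM readings_source")
```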
-
🎉 We're thrilled to announce the release of Apache Flink 1.19.0! 🎉 This release packs a punch with numerous improvements and new features. Overall, 162 people contributed to this release, including our very own Release Manager, Jing Ge (Head of Engineering). In this release, a total of 33 FLIPs were completed, along with 600+ issues. Thank you! 🙌 Let’s delve into it!

1️⃣ Flink SQL Improvements: Enhancements in custom parallelism for Table/SQL sources, configurable SQL gateway Java options, and more flexibility in configuring state time-to-live using SQL hints.

2️⃣ Named Parameters for Functions and Procedures: Say goodbye to strict parameter positions! Now you can call functions and stored procedures using named parameters, making your queries more intuitive and flexible.

3️⃣ Window TVF Aggregation Features: Enjoy support for SESSION Window TVF in streaming mode and utilize changelog inputs for window TVF aggregation, enhancing your streaming analytics capabilities, and much more!

For full details of this release and to dive deeper into each feature, check out our blog (a small SQL sketch also follows the link below): https://bit.ly/3wYYPdl #ApacheFlink #DataProcessing #BigData #OpenSource
Announcing the Release of Apache Flink 1.19
ververica.com
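To make items 1️⃣ and 2️⃣ concrete, here is a hedged sketch of what the new SQL surface looks like. The table names and the UDF are invented, and the syntax reflects my reading of the 1.19 release notes, so double-check it against the blog post.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# State TTL via SQL hint: retain join state for each input for a different
# duration, instead of one pipeline-wide TTL setting.
t_env.execute_sql("""
    SELECT /*+ STATE_TTL('orders' = '1d', 'customers' = '20d') */
           o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
""")

# Named parameters: arguments matched by name rather than position
# (my_parse is a hypothetical UDF that declares its argument names).
t_env.execute_sql("""
    SELECT my_parse(input => raw_value, trim_spaces => TRUE) FROM events
""")
```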
-
Been following the DataFusion project for a bit, along with Andy Grove’s slow but steady contribution to it (https://lnkd.in/ekBTqhZK my post from a year ago). When I used to use PySpark for my main workflow, the whole “ok, you are telling me there’s this huge overhead of task distribution AND a JVM conversion of each job from Python to Java (py4j)” felt bonkers to me, and I’m not even a computer science person. Like everyone else in the data world, we probably all dabbled in finding something in between pandas and Spark circa 2015-2020, trying stuff like cuDF, pandarallel, Dask, and Modin. Although DuckDB and Polars have been enjoying success in making “small data” (gigabyte-scale) analytics possible, fundamentally Spark remains king of terabyte+ scale analytics. There is a challenger on that front as well, powered by Rust and by the guy who’s been everywhere when it comes to GPU-accelerated query processing (I believe he contributed to the GPU-accelerated Spark effort and RAPIDS cuDF). Oh yeah, he recently left NVIDIA to join Apple, which also gives you a decent idea of Apple’s interests. The takeaway point is that the efforts that win out tend to be the ones that focus on the middle layer of data representation in memory (read: products using Apache Arrow and DataFusion). Once there’s a solid, well-planned-out core data representation engine, writing query engines becomes simpler.
Congratulations to the Apache DataFusion community on graduating from the Apache Arrow project and becoming a new top-level project. This is a significant milestone. I would also like to thank the Apache Arrow community for allowing the project to incubate there for the past five years and helping the community learn The Apache Way. https://lnkd.in/gKauteh7
GitHub - apache/datafusion: Apache DataFusion SQL Query Engine
github.com
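If you want to kick the tires, the DataFusion Python bindings make for a quick start. Here is a minimal sketch; the Parquet file, table name, and columns are hypothetical.

```python
from datafusion import SessionContext

# DataFusion plans and executes SQL over Arrow record batches, in-process.
ctx = SessionContext()
ctx.register_parquet("trips", "trips.parquet")  # hypothetical local file

df = ctx.sql("""
    SELECT vendor_id, COUNT(*) AS rides, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY vendor_id
    ORDER BY rides DESC
""")
df.show()
```

Because the engine is Arrow-native end to end, there is no serialization boundary between the query engine and the in-memory data, which is exactly the "middle layer" advantage described above.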
-
🫵 Here are 5 tips to optimize your Apache Beam streaming pipelines:

1) Choose the Right Runner: Use a runner like GCP Dataflow, Apache Flink, or Apache Spark that matches your needs for speed and scalability.
2) Use Windowing and Triggers Wisely: Configure how events are grouped and when results are emitted to manage latency effectively (see the sketch below the link).
3) Optimize I/O Operations: Batch reads and writes and use efficient file formats such as Avro or Parquet to reduce I/O overhead.
4) Efficient Data Partitioning: Distribute data evenly across workers to avoid overloading some and underutilizing others.
5) Combine Transformations: Reduce processing steps by combining multiple operations, which cuts down on overhead and improves efficiency.

Check out my GitHub repo for an example implementation: https://lnkd.in/dh_UaJzs Feel free to share additional tips or questions in the comments. Apache Beam #data #bigdata #gcp #googlecloudplatform #python #learning #linkedin #dataengineering #apache #github
GitHub - AbhijitMahajan14/Dataflow-Beam: This repository serves as my personal log and resource center while I learn Apache Beam. Each day, I push new code, experiments, and notes to document what I've learned. My goal is to build a comprehensive collection of Apache Beam examples and best practices.
github.com
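As a small illustration of tip 2, here is a minimal sketch of windowing with an early-firing trigger in the Beam Python SDK; the topic name, window sizes, and key extraction are invented for the example.

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterProcessingTime, AfterWatermark,
)

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-proj/topics/events")
        | "KeyByUser" >> beam.Map(lambda e: (e.decode("utf-8").split(",")[0], 1))
        # Group events into 60s fixed windows; emit a speculative early result
        # every 30s of processing time, then the final result at the watermark.
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),
            trigger=AfterWatermark(early=AfterProcessingTime(30)),
            accumulation_mode=AccumulationMode.DISCARDING,
        )
        | "CountPerUser" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )
```

The early trigger trades completeness for latency: downstream consumers see provisional counts quickly, with DISCARDING mode ensuring each pane only reports new elements.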
Excellent information about the new features of NiFi 2.0.0! These updates will undoubtedly improve data engineering workflows.