Did you see that Apache NiFi released 2.0.0 yesterday? Check out our team's latest blog post to learn more about what these updates entail! https://lnkd.in/gCA6s-bJ #apachenifi #nifi #nifi2
Datavolo’s Post
More Relevant Posts
-
What's in Apache #NiFi 2.0.0? Check out our latest blog post with the highlights of NiFi TNG. https://lnkd.in/gCA6s-bJ #dataengineering
Next Generation Apache NiFi | NiFi 2.0.0 is GA
https://datavolo.io
-
I've been super fortunate to return to Apache #NiFi in my developer advocate 🥑 role at Datavolo, but I realize that MANY of you have never seen, much less considered, this powerful tool. For NiFi newcomers, I wrote a quick blog post showing that it is not just a typical low-code visual drag-and-drop development tool -- it is ALSO a full-featured, high-performance, and insanely scalable RUNTIME environment for executing your #dataengineering pipelines. https://lnkd.in/efPnS9VY
develop, deploy, execute & monitor in one tool (welcome to apache nifi)
http://lestermartin.blog
-
Great article about NiFi and k8s!!
How does Apache NiFi support clustering on #Kubernetes? NiFi 2.0.0 brings a number of new features and improvements, including native clustering on Kubernetes without the need for Apache ZooKeeper. The following post describes the framework libraries and implementation decisions that power NiFi cluster leader election and shared state management for Kubernetes deployments. https://lnkd.in/ghB_MjmB
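For a concrete sense of what the linked post describes, here is a minimal sketch of the relevant nifi.properties settings for ZooKeeper-less clustering. The property and provider names reflect my reading of the NiFi 2.0 documentation, so treat them as assumptions and verify them against your version.

```properties
# nifi.properties: use Kubernetes Leases for leader election instead of ZooKeeper
nifi.cluster.leader.election.implementation=KubernetesLeaderElectionManager

# Point cluster-wide state at a Kubernetes provider defined in state-management.xml,
# which stores shared state in ConfigMaps
nifi.state.management.provider.cluster=kubernetes-provider
```

With settings along these lines (plus RBAC allowing the NiFi pods to manage Leases and ConfigMaps), a NiFi 2.0 cluster should be able to elect a Cluster Coordinator and share processor state without a ZooKeeper ensemble.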
Bringing Kubernetes Clustering to Apache NiFi
exceptionfactory.com
-
-
When it comes to open source software, I enjoy trying it all and contributing to the community. If you have yet to use Apache Flink, or want to learn more before trying it: Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
Use cases:
* Event-driven applications
* Stream & batch analytics
* Data pipelines & ETL
To learn more: https://flink.apache.org/
Apache Flink release announcements: https://lnkd.in/e7yagNaw
I am excited for more to come!
Apache Flink® — Stateful Computations over Data Streams
flink.apache.org
-
How simply and pragmatically you can create a reactive Redis application with Quarkus #java #quarkus #redis #cache #nosql #development #code #programming
Using the Redis Client
quarkus.io
-
I've been following the DataFusion project for a bit, along with Andy Grove's slow but steady contributions to it (https://lnkd.in/ekBTqhZK is my post from a year ago). Back when I used pySpark for my main workflow, the whole "OK, you're telling me there's this huge overhead of task distribution AND a JVM conversion of each job from Python to Java (py4j)" thing felt bonkers, even to a non computer science person like me.
Like everyone else in the data world, we all dabbled circa 2015-2020 trying to find something in between pandas and Spark: cuDF, pandarallel, Dask, Modin. Although DuckDB and Polars have been enjoying success in making "small data" (gigabyte-scale) analytics possible, Spark remains king for terabyte+ scale analytics.
There is a challenger on that front as well, powered by Rust and by the guy who's been everywhere when it comes to GPU-accelerated query processing (I believe he contributed to the GPU-accelerated Spark effort and RAPIDS cuDF). Oh yeah, he recently left NVIDIA to join Apple, which also gives you a decent idea of Apple's interests.
The takeaway: the efforts that win out tend to be the ones that focus on the middle layer of in-memory data representation (read: products using Apache Arrow and DataFusion). Once there's a solid, well-planned-out core data representation engine, writing query engines becomes simpler.
Congratulations to the Apache DataFusion community on graduating from the Apache Arrow project and becoming a new top-level project. This is a significant milestone. I would also like to thank the Apache Arrow community for allowing the project to incubate there for the past five years and helping the community learn The Apache Way. https://lnkd.in/gKauteh7
GitHub - apache/datafusion: Apache DataFusion SQL Query Engine
github.com
-
🫵 Here are 5 tips to optimize your Apache Beam streaming pipelines:
1) Choose the right runner: use a runner like GCP Dataflow, Apache Flink, or Apache Spark that matches your needs for speed and scalability.
2) Use windowing and triggers wisely: configure how events are grouped and when results are emitted to manage latency effectively.
3) Optimize I/O operations: batch reads and writes, and use efficient file formats such as Avro or Parquet to reduce I/O overhead.
4) Partition data efficiently: distribute data evenly across workers to avoid overloading some and underutilizing others.
5) Combine transformations: reduce processing steps by fusing multiple operations, which cuts down on overhead and improves efficiency.
Check out my GitHub repo for an example implementation: https://lnkd.in/dh_UaJzs
Feel free to share your additional tips or questions in the comments.
Apache Beam #data #bigdata #gcp #googlecloudplatform #python #learning #linkedin #dataengineering #apache #github
GitHub - AbhijitMahajan14/Dataflow-Beam: This repository serves as my personal log and resource center while I learn Apache Beam. Each day, I push new code, experiments, and notes to document what I've learned. My goal is to build a comprehensive collection of Apache Beam examples and best practices.
github.com
-
This session by Jan Lukavský will introduce a platform created to bridge existing gaps in data management while removing some of the complexity of the existing Big Data ecosystem. The model is applied consistently across three abstract types of storage: streaming (e.g. Apache Kafka, Google Cloud Pub/Sub), batch (e.g. Hadoop HDFS, S3, Google Cloud Storage), and random-access (e.g. Apache Cassandra, Apache HBase, Google Cloud Bigtable). The platform also provides sources and sinks for data processing engines like Apache Beam and Apache Flink, so sophisticated data transformations can be easily integrated into the platform as well. Wanna hear more about it? Join us at: https://lnkd.in/eJRNCHMz #bigdata #accesspatterns #datastorage #datastreaming The Apache Software Foundation #apachekafka #hadoop #apachecassandra #apachebeam #apacheflink
Community Over Code EU
eu.communityovercode.org
-
I have published my first public Helm chart for HAProxy in Kubernetes, for a Postgres Patroni cluster. Check it out 😎 #kubernetes #k8s #helm #helmchart #patroni #haproxy #postgres #devops #sre
GitHub - ivanov-danil/haproxy: Helm Chart of Haproxy in K8S (for Postgres Patroni cluster)
github.com
Tech Lead | Solution Architect | API | Microservices
Tim Spann 🥑 good news!!! Do you know where I can find the NiFi 2.0 Helm chart? Thanks!