Did you see that Apache NiFi released 2.0.0 yesterday? Check out our team's latest blog post to learn more about what these updates entail! https://lnkd.in/gCA6s-bJ #apachenifi #nifi #nifi2
Datavolo’s Post
More Relevant Posts
-
What's in Apache #NiFi 2.0.0? Check out our latest blog post with the highlights of NiFi TNG. https://lnkd.in/gCA6s-bJ #dataengineering
Next Generation Apache NiFi | NiFi 2.0.0 is GA
https://datavolo.io
-
I've been super fortunate to return to Apache #NiFi in my developer advocate 🥑 role at Datavolo, but I realize that MANY of you have never seen, much less considered, this powerful tool. For NiFi newcomers, I wrote a quick blog post showing that it is not just a typical low-code visual drag-and-drop development tool -- it is ALSO a full-featured, high-performance, and insanely scalable RUNTIME environment for executing your #dataengineering pipelines. https://lnkd.in/efPnS9VY
develop, deploy, execute & monitor in one tool (welcome to apache nifi)
http://lestermartin.blog
-
Great article about NiFi and k8s!!
How does Apache NiFi support clustering on #Kubernetes? NiFi 2.0.0 brings a number of new features and improvements, including native clustering on Kubernetes without the need for Apache ZooKeeper. The following post describes the framework libraries and implementation decisions that power NiFi cluster leader election and shared state management for Kubernetes deployments. https://lnkd.in/ghB_MjmB
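For a concrete sense of what the linked post describes, here is a minimal sketch of the relevant nifi.properties settings for ZooKeeper-less clustering. The property and provider names reflect my reading of the NiFi 2.0 documentation, so treat them as assumptions and verify them against your version.

```properties
# nifi.properties: use Kubernetes Leases for leader election instead of ZooKeeper
nifi.cluster.leader.election.implementation=KubernetesLeaderElectionManager

# Point cluster-wide state at a Kubernetes provider defined in state-management.xml,
# which stores shared state in ConfigMaps
nifi.state.management.provider.cluster=kubernetes-provider
```

With settings along these lines (plus RBAC allowing the NiFi pods to manage Leases and ConfigMaps), a NiFi 2.0 cluster should be able to elect a Cluster Coordinator and share processor state without a ZooKeeper ensemble.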
Bringing Kubernetes Clustering to Apache NiFi
exceptionfactory.com
-
-
When it comes to open source software, I enjoy trying it all and contributing to the community. If you have yet to use Apache Flink, or want to learn more before trying it: Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
Use cases:
* Event-driven applications
* Stream & batch analytics
* Data pipelines & ETL
To learn more: https://flink.apache.org/
Apache Flink release announcements: https://lnkd.in/e7yagNaw
I am excited for more to come!
Apache Flink® — Stateful Computations over Data Streams
flink.apache.org
-
How simply and pragmatically you can create a reactive Redis application with Quarkus #java #quarkus #redis #cache #nosql #development #code #programming
Using the Redis Client
quarkus.io
-
I've been following the DataFusion project for a bit, along with Andy Grove's slow but steady contributions to it (https://lnkd.in/ekBTqhZK is my post from a year ago). Back when I used pySpark for my main workflow, the whole "OK, you're telling me there's this huge overhead of task distribution AND a JVM conversion of each job from Python to Java (py4j)" thing felt bonkers, even to a non computer science person like me.
Like everyone else in the data world, we all dabbled circa 2015-2020 trying to find something in between pandas and Spark: cuDF, pandarallel, Dask, Modin. Although DuckDB and Polars have been enjoying success in making "small data" (gigabyte-scale) analytics possible, Spark remains king for terabyte+ scale analytics.
There is a challenger on that front as well, powered by Rust and by the guy who's been everywhere when it comes to GPU-accelerated query processing (I believe he contributed to the GPU-accelerated Spark effort and RAPIDS cuDF). Oh yeah, he recently left NVIDIA to join Apple, which also gives you a decent idea of Apple's interests.
The takeaway: the efforts that win out tend to be the ones that focus on the middle layer of in-memory data representation (read: products using Apache Arrow and DataFusion). Once there's a solid, well-planned-out core data representation engine, writing query engines becomes simpler.
Congratulations to the Apache DataFusion community on graduating from the Apache Arrow project and becoming a new top-level project. This is a significant milestone. I would also like to thank the Apache Arrow community for allowing the project to incubate there for the past five years and helping the community learn The Apache Way. https://lnkd.in/gKauteh7
GitHub - apache/datafusion: Apache DataFusion SQL Query Engine
github.com
-
🫵 Here are 5 tips to optimize your Apache Beam streaming pipelines:
1) Choose the right runner: use a runner like GCP Dataflow, Apache Flink, or Apache Spark that matches your needs for speed and scalability.
2) Use windowing and triggers wisely: configure how events are grouped and when results are emitted to manage latency effectively.
3) Optimize I/O operations: batch reads and writes, and use efficient file formats such as Avro or Parquet to reduce I/O overhead.
4) Partition data efficiently: distribute data evenly across workers to avoid overloading some and underutilizing others.
5) Combine transformations: reduce processing steps by fusing multiple operations, which cuts down on overhead and improves efficiency.
Check out my GitHub repo for an example implementation: https://lnkd.in/dh_UaJzs
Feel free to share your additional tips or questions in the comments.
Apache Beam #data #bigdata #gcp #googlecloudplatform #python #learning #linkedin #dataengineering #apache #github
GitHub - AbhijitMahajan14/Dataflow-Beam: This repository serves as my personal log and resource center while I learn Apache Beam. Each day, I push new code, experiments, and notes to document what I've learned. My goal is to build a comprehensive collection of Apache Beam examples and best practices.
github.com
-
This session by Jan Lukavský will introduce a platform created to bridge existing gaps in data management while removing some of the complexity of the existing Big Data ecosystem. The model is applied consistently across three abstract types of storage: streaming (e.g. Apache Kafka, Google Cloud Pub/Sub), batch (e.g. Hadoop HDFS, S3, Google Cloud Storage), and random-access (e.g. Apache Cassandra, Apache HBase, Google Cloud Bigtable). The platform also provides sources and sinks for data processing engines like Apache Beam and Apache Flink, so sophisticated data transformations can be easily integrated into the platform as well. Wanna hear more about it? Join us at: https://lnkd.in/eJRNCHMz #bigdata #accesspatterns #datastorage #datastreaming The Apache Software Foundation #apachekafka #hadoop #apachecassandra #apachebeam #apacheflink
Community Over Code EU
eu.communityovercode.org
-
I have published my first public Helm chart for HAProxy in Kubernetes, for a Postgres Patroni cluster. Check it out 😎 #kubernetes #k8s #helm #helmchart #patroni #haproxy #postgres #devops #sre
GitHub - ivanov-danil/haproxy: Helm Chart of Haproxy in K8S (for Postgres Patroni cluster)
github.com
Tech Lead | Solution Architect | API | Microservices
Tim Spann 🥑 good news!!! Do you know where I can find the NiFi 2.0 Helm chart? Thanks!