What's in Apache #NiFi 2.0.0? Check out our latest blog post with the highlights of NiFi TNG. https://lnkd.in/gCA6s-bJ #dataengineering
-
Did you see that Apache NiFi released 2.0.0 yesterday? Check out our team's latest blog post to learn more about what these updates entail! https://lnkd.in/gCA6s-bJ #apachenifi #nifi #nifi2
Next Generation Apache NiFi | NiFi 2.0.0 is GA
https://datavolo.io
-
I've been super fortunate to return to Apache #NiFi in my developer advocate 🥑 role at Datavolo, but I realize that MANY of you have never seen, much less considered, this powerful tool. For NiFi newbs, I wrote this quick blog post showcasing that it is not just a typical low-code, visual, drag-and-drop development tool -- it is ALSO a full-featured, high-performance, and insanely scalable RUNTIME environment for executing your #dataengineering pipelines. https://lnkd.in/efPnS9VY
develop, deploy, execute & monitor in one tool (welcome to apache nifi)
http://lestermartin.blog
-
Lately, I've been working on making Apache Flink more observable, because deep visibility into your Flink deployment is crucial to keeping your data-streaming applications running smoothly. I've discussed some of the fundamentals of Apache Flink and how to monitor and observe several of the components inside Flink. Find the write-up below: #apacheflink #datastreaming #realtime #message #monitoring https://lnkd.in/gfpbEWsW
Apache Flink — Observability
sarthak-acoustic.medium.com
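As a small illustration of the kind of visibility the post is about, here is a minimal sketch of registering a custom metric from a PyFlink user-defined function. The UDF and metric names are invented for the example, and the metrics API usage follows the PyFlink docs as I understand them.

```python
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class ParseEvent(ScalarFunction):
    """Scalar UDF that counts parse failures via a custom Flink metric."""

    def open(self, function_context):
        # Register a counter on this operator's metric group; it surfaces
        # alongside Flink's built-in metrics in whatever reporter you configure.
        self.failures = function_context.get_metric_group().counter("parse_failures")

    def eval(self, raw):
        try:
            return raw.strip().upper()
        except Exception:
            self.failures.inc()  # count bad input instead of failing the job
            return None

# Wrap the class as a UDF usable from the Table API or SQL.
parse_event = udf(ParseEvent(), result_type=DataTypes.STRING())
```

Watching a counter like this next to Flink's built-in task and checkpoint metrics is often the quickest way to spot data-quality problems before they become job failures.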
-
When it comes to open source software, I enjoy trying it all and contributing to the community. If you have yet to use Apache Flink, or would like to learn more before trying it, it is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.

Use cases:
* Event-driven applications
* Stream & batch analytics
* Data pipelines & ETL

To learn more: https://flink.apache.org/ For Apache Flink release announcements: https://lnkd.in/e7yagNaw I am excited for more to come! A tiny starter sketch follows the link below.
Apache Flink® — Stateful Computations over Data Streams
flink.apache.org
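For a first taste, here is a minimal word-count sketch using the PyFlink DataStream API (the Java and Scala APIs look very similar); it runs locally with no cluster required, and the input strings are invented for the example.

```python
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

lines = env.from_collection([
    "flink handles bounded and unbounded streams",
    "flink runs at in-memory speed",
])

counts = (
    lines
    .flat_map(lambda line: line.split(), output_type=Types.STRING())
    .map(lambda word: (word, 1),
         output_type=Types.TUPLE([Types.STRING(), Types.INT()]))
    .key_by(lambda pair: pair[0])
    # Stateful aggregation: Flink maintains a running sum per key.
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
)

counts.print()
env.execute("word_count")
```

The same program works on a bounded collection (as here) or an unbounded source such as Kafka, which is exactly the unified batch/stream story the post describes.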
-
How does Apache NiFi support clustering on #Kubernetes? NiFi 2.0.0 brings a number of new features and improvements, including native clustering on Kubernetes without the need for Apache ZooKeeper. The following post describes the framework libraries and implementation decisions that power NiFi cluster leader election and shared state management for Kubernetes deployments. https://lnkd.in/ghB_MjmB
Bringing Kubernetes Clustering to Apache NiFi
exceptionfactory.com
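For a concrete sense of what this looks like in practice, here is a hedged sketch of the relevant nifi.properties settings, based on my reading of the NiFi 2.0.0 documentation; verify the exact property names and values against the linked post and the official admin guide before relying on them.

```properties
# Run this instance as a member of a NiFi cluster.
nifi.cluster.is.node=true

# Select the Kubernetes-native leader election manager instead of the
# ZooKeeper-backed (Curator) implementation, removing the ZooKeeper dependency.
nifi.cluster.leader.election.implementation=KubernetesLeaderElectionManager
```

With this in place, cluster leader election is coordinated through Kubernetes Lease resources, and shared cluster state moves to a Kubernetes-backed state provider configured in state-management.xml.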
-
Sometimes useful features collide. My latest post covers a couple of little "gotchas" you might run into when using Apache NiFi and Progress MarkLogic together. https://lnkd.in/eMjwhxkT #nifi #marklogic #progress #bigdata
String interpolation in Apache NiFi
https://www.4vservices.com
-
Hi everybody. I will be presenting "Understanding Apache Flink" online for the Los Angeles Java User's Group on April 3, 2024 at 6:00 p.m. Pacific Time. Details and registration can be found at https://lnkd.in/g8_xzcJp Note that this event will be held online since we currently do not have a meeting location; if anybody has a location where we can actually meet, please contact me, because that would be preferable. Afterwards, if anybody is up to meet for a drink, let me know in advance so that we can plan.

This presentation is the first part of a series of three Apache Flink presentations that I (Marco Villalobos) am writing:
1. Understanding Apache Flink.
2. Apache Flink Patterns.
3. Understanding Apache Flink Stateful Functions.

This first presentation introduces Apache Flink and its core concepts. The target audience is software engineers who need an introduction to Apache Flink. Additionally, the presentation offers an opportunity to learn and integrate many different technologies. Here is the agenda:
1. A complete Apache Flink job that uses the DataStream API and SQL API, writes all incoming data into S3 in Parquet format, and writes aggregate time-series data into InfluxDB.
2. A data generator deployed into a Kubernetes cluster.
3. An Apache Kafka cluster deployed into a Kubernetes cluster with the Strimzi Kafka Operator.
4. An Apache Flink job deployed into a Kubernetes cluster with the Apache Flink Kubernetes Operator.
5. LocalStack deployed into a Kubernetes cluster to simulate the Amazon S3 web service.
6. The InfluxDB time-series database and the Telegraf ingestion component, deployed into a Kubernetes cluster.

A sketch of the Parquet-on-S3 sink from item 1 follows the link below.
Understanding Apache Flink, Wed, Apr 3, 2024, 6:00 PM | Meetup
meetup.com
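To give a flavor of agenda item 1, here is a minimal sketch of a Flink SQL filesystem sink that writes Parquet to S3, expressed via PyFlink. The table name, columns, and bucket are invented for the example, and note that a streaming filesystem sink only finalizes files when checkpointing is enabled.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Files are committed on checkpoints, so enable checkpointing for streaming sinks.
t_env.get_config().set("execution.checkpointing.interval", "30s")

t_env.execute_sql("""
    CREATE TABLE readings_sink (
        sensor_id STRING,
        reading   DOUBLE,
        ts        TIMESTAMP(3)
    ) WITH (
        'connector' = 'filesystem',
        'path'      = 's3://my-bucket/readings/',
        'format'    = 'parquet'
    )
""")
# A source table would then feed it, e.g.:
# t_env.execute_sql("INSERT INTO readings_sink SELECT * FROM readings_source")
```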
-
🎉 We're thrilled to announce the release of Apache Flink 1.19.0! 🎉 This release packs a punch with numerous improvements and new features. Overall, 162 people contributed to this release, including our very own Release Manager, Jing Ge (Head of Engineering). In this release, a total of 33 FLIPs were completed, along with 600+ issues. Thank you! 🙌 Let’s delve into it!

1️⃣ Flink SQL Improvements: Enhancements in custom parallelism for Table/SQL sources, configurable SQL gateway Java options, and more flexibility in configuring state time-to-live using SQL hints.

2️⃣ Named Parameters for Functions and Procedures: Say goodbye to strict parameter positions! Now you can call functions and stored procedures using named parameters, making your queries more intuitive and flexible.

3️⃣ Window TVF Aggregation Features: Enjoy support for SESSION Window TVF in streaming mode and utilize changelog inputs for window TVF aggregation, enhancing your streaming analytics capabilities, and much more!

For full details of this release and to dive deeper into each feature, check out our blog (a small SQL sketch also follows the link below): https://bit.ly/3wYYPdl #ApacheFlink #DataProcessing #BigData #OpenSource
Announcing the Release of Apache Flink 1.19
ververica.com
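To make items 1️⃣ and 2️⃣ concrete, here is a hedged sketch of what the new SQL surface looks like. The table names and the UDF are invented, and the syntax reflects my reading of the 1.19 release notes, so double-check it against the blog post.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# State TTL via SQL hint: retain join state for each input for a different
# duration, instead of one pipeline-wide TTL setting.
t_env.execute_sql("""
    SELECT /*+ STATE_TTL('orders' = '1d', 'customers' = '20d') */
           o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
""")

# Named parameters: arguments matched by name rather than position
# (my_parse is a hypothetical UDF that declares its argument names).
t_env.execute_sql("""
    SELECT my_parse(input => raw_value, trim_spaces => TRUE) FROM events
""")
```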
-
Been following the DataFusion project for a bit, along with Andy Grove’s slow but steady contribution to it (https://lnkd.in/ekBTqhZK my post from a year ago). When I used to use PySpark for my main workflow, the whole “ok, you are telling me there’s this huge overhead of task distribution AND a JVM conversion of each job from Python to Java (py4j)” felt bonkers to me, and I’m not even a computer science person. Like everyone else in the data world, we probably all dabbled in finding something in between pandas and Spark circa 2015-2020, trying stuff like cuDF, pandarallel, Dask, and Modin. Although DuckDB and Polars have been enjoying success in making “small data” (gigabyte-scale) analytics possible, fundamentally Spark remains king of terabyte+ scale analytics. There is a challenger on that front as well, powered by Rust and by the guy who’s been everywhere when it comes to GPU-accelerated query processing (I believe he contributed to the GPU-accelerated Spark effort and RAPIDS cuDF). Oh yeah, he recently left NVIDIA to join Apple, which also gives you a decent idea of Apple’s interests. The takeaway point is that the efforts that win out tend to be the ones that focus on the middle layer of data representation in memory (read: products using Apache Arrow and DataFusion). Once there’s a solid, well-planned-out core data representation engine, writing query engines becomes simpler.
Congratulations to the Apache DataFusion community on graduating from the Apache Arrow project and becoming a new top-level project. This is a significant milestone. I would also like to thank the Apache Arrow community for allowing the project to incubate there for the past five years and helping the community learn The Apache Way. https://lnkd.in/gKauteh7
GitHub - apache/datafusion: Apache DataFusion SQL Query Engine
github.com
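If you want to kick the tires, the DataFusion Python bindings make for a quick start. Here is a minimal sketch; the Parquet file, table name, and columns are hypothetical.

```python
from datafusion import SessionContext

# DataFusion plans and executes SQL over Arrow record batches, in-process.
ctx = SessionContext()
ctx.register_parquet("trips", "trips.parquet")  # hypothetical local file

df = ctx.sql("""
    SELECT vendor_id, COUNT(*) AS rides, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY vendor_id
    ORDER BY rides DESC
""")
df.show()
```

Because the engine is Arrow-native end to end, there is no serialization boundary between the query engine and the in-memory data, which is exactly the "middle layer" advantage described above.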
-
🫵 Here are 5 tips to optimize your Apache Beam streaming pipelines:

1) Choose the Right Runner: Use a runner like GCP Dataflow, Apache Flink, or Apache Spark that matches your needs for speed and scalability.
2) Use Windowing and Triggers Wisely: Configure how events are grouped and when results are emitted to manage latency effectively (see the sketch below the link).
3) Optimize I/O Operations: Batch reads and writes and use efficient file formats such as Avro or Parquet to reduce I/O overhead.
4) Efficient Data Partitioning: Distribute data evenly across workers to avoid overloading some and underutilizing others.
5) Combine Transformations: Reduce processing steps by combining multiple operations, which cuts down on overhead and improves efficiency.

Check out my GitHub repo for an example implementation: https://lnkd.in/dh_UaJzs Feel free to share additional tips or questions in the comments. Apache Beam #data #bigdata #gcp #googlecloudplatform #python #learning #linkedin #dataengineering #apache #github
GitHub - AbhijitMahajan14/Dataflow-Beam: This repository serves as my personal log and resource center while I learn Apache Beam. Each day, I push new code, experiments, and notes to document what I've learned. My goal is to build a comprehensive collection of Apache Beam examples and best practices.
github.com
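As a small illustration of tip 2, here is a minimal sketch of windowing with an early-firing trigger in the Beam Python SDK; the topic name, window sizes, and key extraction are invented for the example.

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterProcessingTime, AfterWatermark,
)

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-proj/topics/events")
        | "KeyByUser" >> beam.Map(lambda e: (e.decode("utf-8").split(",")[0], 1))
        # Group events into 60s fixed windows; emit a speculative early result
        # every 30s of processing time, then the final result at the watermark.
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),
            trigger=AfterWatermark(early=AfterProcessingTime(30)),
            accumulation_mode=AccumulationMode.DISCARDING,
        )
        | "CountPerUser" >> beam.CombinePerKey(sum)
        | "Log" >> beam.Map(print)
    )
```

The early trigger trades completeness for latency: downstream consumers see provisional counts quickly, with DISCARDING mode ensuring each pane only reports new elements.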
Excellent information about the new features of NiFi 2.0.0! These updates will undoubtedly improve data engineering workflows.