Curious to learn about the origins of Apache Hudi and all the 🎁 new features (storage optimizations, a new approach to concurrency control, multiple indexes, and more) in Hudi 1.0? Join Ananth P. of Data Engineering Weekly and Hudi creator and PMC Chair Vinoth Chandar for a special 🔥 fireside chat and 💡 Q&A next week. https://lnkd.in/gKbNyH9E
Onehouse’s Post
More Relevant Posts
-
Favorite talks from Open Source Data Summit 2024: Hudi 1.0. The vision is getting closer to building a transactional database for the lake. Beyond read-heavy workloads and mixed read/write/streaming workloads, they are adding concurrency control for multiple writers (scalable writes for data freshness), functional indexes (recasting partitioning as just another layer of the indexing system), and secondary indexes (just like in OLTP databases). Watch the replay at https://lnkd.in/gGqSRwas. Ethan Guo Balaji Varadarajan #datalakehouse #dataengineering
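To give a feel for how these features surface to users, here is a minimal sketch of creating a secondary index and a functional (expression) index through Spark SQL, assuming Hudi 1.0's index DDL as described in its docs; the table name, columns, and option values are hypothetical, and the exact syntax may differ from the release documentation.

```python
# Minimal sketch (assumptions: Hudi 1.0 Spark SQL index DDL; table/column names are hypothetical).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-1.0-index-sketch")
    # Hudi's Spark bundle and SQL extensions must be on the classpath for this to work.
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)

# Secondary index on a non-key column, similar to a secondary index in an OLTP database.
spark.sql("CREATE INDEX idx_city ON hudi_trips (city)")

# Functional (expression) index: index an expression over a column instead of
# repartitioning the table, e.g. a date string derived from an epoch timestamp.
spark.sql(
    "CREATE INDEX idx_event_date ON hudi_trips "
    "USING column_stats(ts) OPTIONS (expr='from_unixtime', format='yyyy-MM-dd')"
)
```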
-
#TiDB's new TiGraph integration brings graph workloads into the same relational database, achieving up to an 8,700x speedup in benchmarks. By merging graph traversal with SQL syntax, it supports complex use cases like financial fraud detection, social network analysis, and knowledge graphs with breakthrough performance and flexibility. #mydbops #databases #scalability #datamanagement #migrations #distributedsql https://lnkd.in/dPw-_GeF
Introducing TiGraph: Combining Graphs + the RDBMS Syntax
https://www.pingcap.com
-
At #OSACon 2023, Nadine Farah gave a fire talk 🔥 on how to transform raw data to gold with Apache Hudi and #DBT. Check out the recorded talk to learn: ➡️ The current challenges of building a medallion architecture at low latency ➡️ How the new Hudi CDC feature unlocks incremental processing on the lake ➡️ How you can leverage DBT to transform changed records (inserts, updates, and deletes) into a low-latency medallion architecture https://lnkd.in/gV_jhTcz #hudi #dbt #techtalk
Data Alchemy: Transforming Raw Data to Gold with Apache Hudi and DBT
https://www.youtube.com/
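For a concrete sense of the "incremental processing on the lake" piece outside of the dbt wiring covered in the talk, here is a minimal PySpark sketch of a Hudi incremental query; the table path and begin commit time are placeholders.

```python
# Minimal sketch of a Hudi incremental query in PySpark (path and begin
# instant are placeholders; the talk itself wires this through dbt).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

incremental_read_options = {
    # Ask Hudi for only the records that changed after a given commit.
    "hoodie.datasource.query.type": "incremental",
    "hoodie.datasource.read.begin.instanttime": "20240101000000",  # placeholder commit time
}

changed_records = (
    spark.read.format("hudi")
    .options(**incremental_read_options)
    .load("s3://my-bucket/bronze/orders")  # hypothetical bronze-layer table path
)

# Downstream, these inserts/updates/deletes would be merged into the silver layer.
changed_records.show()
```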
-
New to Apache Hudi? George Yates has been putting some great introductory video content out there 🔥 Check it out. 👉 Introduction to Apache Hudi: https://lnkd.in/d5_MuiWe 👉 End-to-end incremental data lake with Hudi, Trino & Spark: https://lnkd.in/d546GD5f #dataengineering #softwareengineering
Introduction to Apache Hudi for Data Lake Management! Apache Hudi for Beginners!
https://www.youtube.com/
-
With the latest version, Nussknacker has become a powerful tool for those working with Apache Iceberg-based Data Lakehouses. It can handle both 𝗯𝗮𝘁𝗰𝗵 𝗮𝗻𝗱 𝗻𝗲𝗮𝗿 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 workloads. By integrating Nussknacker with Apache Iceberg, you can do the following: 🟢 Ingest data: Load data into your Data Lakehouse. 🟠 Transform data: Clean, filter, and restructure your data. 🔴 Aggregate data: Summarize and group data. 🔵 Enrich data: Use ML inference, joins, etc., to add context to your data. 🟣 Apply 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗹𝗼𝗴𝗶𝗰 to your data. In this blog post, Arkadiusz Burdach will show you how to use Nussknacker to build a data pipeline in this setup 👉 https://lnkd.in/dQg48qSi #Iceberg #DataLakehouse
-
Recently, I wrote a blog post about Nussknacker's integration with Flink catalogs. I prepared a step-by-step tutorial on configuring a setup that combines Nussknacker and Apache Iceberg, and on using them to implement an example business use case. My first impression of Apache Iceberg is that it has a clean design and rethinks many things compared to old-school data lakes. Let me know what you think about this idea.
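For a rough idea of what registering an Iceberg catalog in Flink looks like, here is a small PyFlink sketch; the catalog name, catalog type, and warehouse path are assumptions, and the blog post itself does this through Nussknacker's configuration rather than code.

```python
# Minimal PyFlink sketch of an Iceberg catalog (names/paths are assumptions;
# the blog configures this through Nussknacker rather than code).
# Requires the iceberg-flink-runtime jar on the Flink classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Register an Iceberg catalog backed by a Hadoop warehouse directory.
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 'file:///tmp/iceberg-warehouse'
    )
""")

# Tables created under this catalog are plain Iceberg tables on disk.
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.demo")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.demo.events (
        id BIGINT,
        payload STRING
    )
""")
```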
-
My colleague Renu Rajagopal hath crafted a most excellent blog on the art of synchronizing Iceberg tables, governed by external hands, with the Catalog's integrative might, within the realm of watsonx.data. Iceberg enables faster data lake analytics, time travel, partition evolution, ACID transactions, and more. Apache Iceberg is a key piece of an open lakehouse architecture, so you can reduce the cost of data warehouses and avoid vendor lock-in. Please read this 5-minute blog: https://lnkd.in/grq_5Wek. Please note that these are just the author's thoughts and not official documentation or the opinion of IBM.
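The blog covers the watsonx.data flow itself; as a generic analogue of attaching an externally managed Iceberg table to a catalog, here is a hedged Spark/Iceberg sketch using Iceberg's register_table procedure. This is not the watsonx.data mechanism, and all names and paths are placeholders.

```python
# Generic Spark/Iceberg sketch (NOT the watsonx.data procedure from the blog):
# attach an externally managed Iceberg table to a catalog via its metadata file.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("register-external-iceberg-table")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://my-bucket/warehouse")  # placeholder
    .getOrCreate()
)

# Iceberg's register_table procedure points the catalog at existing metadata,
# so the table becomes queryable without rewriting any data files.
spark.sql("""
    CALL lake.system.register_table(
        table => 'sales.orders',
        metadata_file => 's3://my-bucket/external/orders/metadata/v12.metadata.json'
    )
""")
```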
-
In Week 10 of the Ultimate Big Data Master's Program by Sumit Mittal Sir at TrendyTech, I learned about Apache Spark optimizations. Topics covered in the tenth module: 1. Internals of groupBy 2. Normal join vs. broadcast join 3. Different types of joins 4. Partition skew 5. Adaptive Query Execution 6. Join strategies 7. Optimizing a join of two large tables (bucketing) Saniya Farheen | Madhusmita Chelleng
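Two of these topics, broadcast joins and AQE skew handling, are easy to show in a small PySpark sketch; the DataFrames and configuration values below are just placeholders for illustration.

```python
# Small PySpark sketch of two of the topics above: broadcast join and AQE
# skew-join handling (DataFrames and settings are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("join-optimizations-sketch")
    # Adaptive Query Execution can re-plan joins and split skewed partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

orders = spark.range(10_000_000).withColumnRenamed("id", "customer_id")  # large table
customers = spark.createDataFrame(
    [(i, f"customer_{i}") for i in range(100)], ["customer_id", "name"]  # small table
)

# Normal (shuffle) join: both sides are shuffled by the join key.
shuffle_join = orders.join(customers, "customer_id")

# Broadcast join: the small side is shipped to every executor, avoiding a
# shuffle of the large side entirely.
broadcast_join = orders.join(broadcast(customers), "customer_id")

broadcast_join.explain()  # the plan should show a BroadcastHashJoin
```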
-
Check out Gowthami Bhogireddy presenting at #EuroPython2024, in their words: "This talk explores how Bloomberg leverages Apache Iceberg and Parquet for high-performance, scalable data management. Learn how Iceberg's open-source table format offers ACID transactions, versioning, and schema evolution for efficient data handling" https://lnkd.in/d-PJzGDn 🐍
Taming One Quadrillion Data Points with Apache Iceberg and Parquet
ep2024.europython.eu
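The talk's headline features, versioning and schema evolution, can be illustrated with a couple of Spark SQL statements against an Iceberg table; the catalog and table names below are hypothetical and unrelated to Bloomberg's setup, and the snapshot id is a placeholder.

```python
# Hypothetical Spark SQL sketch of Iceberg schema evolution and time travel
# (catalog/table names are made up; this is not Bloomberg's setup).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-features-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS demo.db.ticks (symbol STRING, price DOUBLE) USING iceberg")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE demo.db.ticks ADD COLUMN source STRING")

# Versioning / time travel: query the table as of an earlier snapshot id (placeholder).
spark.sql("SELECT COUNT(*) FROM demo.db.ticks VERSION AS OF 1234567890123456789").show()
```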
-
Soumil joins us for Episode 3 of 'Lakehouse Chronicles with Hudi' 🎉 If you know Soumil, he always strives to make implementing things simple and fun. In this episode, he will walk through a demo tackling a real-world problem (CDC): bringing data from operational sources into a lakehouse using Hudi Streamer. Specifically, the demo will cover: - capturing changes from Postgres using the Debezium Postgres connector - publishing them to Kafka topics - using Hudi Streamer in continuous mode to read from #Kafka and ingest into the data lakehouse (Hudi) - syncing to HMS & querying with Trino Join here: https://lnkd.in/d5PGzZqC #dataengineering #softwareengineering
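The demo itself drives this pipeline with Hudi Streamer's continuous mode; purely to give a feel for the Kafka-to-Hudi leg of the data flow, here is a rough PySpark structured-streaming analogue. The broker address, topic, Debezium payload fields, and paths are all placeholders.

```python
# Rough PySpark structured-streaming analogue of the Kafka -> Hudi leg of the demo
# (the episode uses Hudi Streamer in continuous mode; brokers/topic/paths are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, get_json_object

spark = SparkSession.builder.appName("cdc-to-hudi-sketch").getOrCreate()

# Read Debezium change events published to a Kafka topic.
kafka_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "pg.public.orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Pull a few fields out of the Debezium JSON payload (the field layout is hypothetical).
orders = kafka_events.select(
    get_json_object(col("value").cast("string"), "$.after.order_id").alias("order_id"),
    get_json_object(col("value").cast("string"), "$.after.amount").alias("amount"),
    get_json_object(col("value").cast("string"), "$.ts_ms").alias("ts_ms"),
)

hudi_options = {
    "hoodie.table.name": "orders_cdc",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "ts_ms",
    "hoodie.datasource.write.operation": "upsert",
}

# Continuously upsert the change stream into a Hudi table on the lakehouse.
(
    orders.writeStream.format("hudi")
    .options(**hudi_options)
    .option("checkpointLocation", "/tmp/checkpoints/orders_cdc")
    .outputMode("append")
    .start("s3://my-bucket/lakehouse/orders_cdc")
)
```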