Curious to learn about the origins of Apache Hudi and all the 🎁 new features (storage optimizations, a new approach to concurrency control, multiple indexes, and more) in Hudi 1.0? Join Ananth P. of Data Engineering Weekly and Hudi creator and PMC Chair Vinoth Chandar for a special 🔥 fireside chat and 💡 Q&A next week. https://lnkd.in/gKbNyH9E
Onehouse’s Post
More Relevant Posts
-
Favorite talks from Open Source Data Summit 2024: Hudi 1.0. The vision is getting closer to building a transactional database for the lake. Beyond read-heavy workloads and mixed read/write/streaming workloads, they are adding concurrency control for multiple writers (scalable writes for data freshness), functional indexes (recasting partitioning as just another layer of the indexing system), and secondary indexes (just like in OLTP databases). Watch the replay at https://lnkd.in/gGqSRwas. Ethan Guo Balaji Varadarajan #datalakehouse #dataengineering
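To give a feel for how these features surface to users, here is a minimal sketch of creating a secondary index and a functional (expression) index through Spark SQL, assuming Hudi 1.0's index DDL as described in its docs; the table name, columns, and option values are hypothetical, and the exact syntax may differ from the release documentation.

```python
# Minimal sketch (assumptions: Hudi 1.0 Spark SQL index DDL; table/column names are hypothetical).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-1.0-index-sketch")
    # Hudi's Spark bundle and SQL extensions must be on the classpath for this to work.
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .getOrCreate()
)

# Secondary index on a non-key column, similar to a secondary index in an OLTP database.
spark.sql("CREATE INDEX idx_city ON hudi_trips (city)")

# Functional (expression) index: index an expression over a column instead of
# repartitioning the table, e.g. a date string derived from an epoch timestamp.
spark.sql(
    "CREATE INDEX idx_event_date ON hudi_trips "
    "USING column_stats(ts) OPTIONS (expr='from_unixtime', format='yyyy-MM-dd')"
)
```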
-
#TiDB's new TiGraph integration brings graph workloads into the same relational database, achieving up to an 8,700x speedup in benchmarks. By merging graph traversal with SQL syntax, it supports complex use cases like financial fraud detection, social network analysis, and knowledge graphs with breakthrough performance and flexibility. #mydbops #databases #scalability #datamanagement #migrations #distributedsql https://lnkd.in/dPw-_GeF
Introducing TiGraph: Combining Graphs + the RDBMS Syntax
https://www.pingcap.com
-
At #OSACon 2023, Nadine Farah gave a fire talk 🔥 on how to transform raw data to gold with Apache Hudi and #DBT. Check out the recorded talk to learn: ➡️ The current challenges of building a medallion architecture at low latency ➡️ How the new Hudi CDC feature unlocks incremental processing on the lake ➡️ How you can leverage DBT to transform changed records (inserts, updates, and deletes) into a low-latency medallion architecture https://lnkd.in/gV_jhTcz #hudi #dbt #techtalk
Data Alchemy: Transforming Raw Data to Gold with Apache Hudi and DBT
https://www.youtube.com/
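For a concrete sense of the "incremental processing on the lake" piece outside of the dbt wiring covered in the talk, here is a minimal PySpark sketch of a Hudi incremental query; the table path and begin commit time are placeholders.

```python
# Minimal sketch of a Hudi incremental query in PySpark (path and begin
# instant are placeholders; the talk itself wires this through dbt).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

incremental_read_options = {
    # Ask Hudi for only the records that changed after a given commit.
    "hoodie.datasource.query.type": "incremental",
    "hoodie.datasource.read.begin.instanttime": "20240101000000",  # placeholder commit time
}

changed_records = (
    spark.read.format("hudi")
    .options(**incremental_read_options)
    .load("s3://my-bucket/bronze/orders")  # hypothetical bronze-layer table path
)

# Downstream, these inserts/updates/deletes would be merged into the silver layer.
changed_records.show()
```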
-
New to Apache Hudi? George Yates has been putting some great introductory video content out there 🔥 Check it out. 👉 Introduction to Apache Hudi: https://lnkd.in/d5_MuiWe 👉 End-to-end incremental data lake with Hudi, Trino & Spark: https://lnkd.in/d546GD5f #dataengineering #softwareengineering
Introduction to Apache Hudi for Data Lake Management! Apache Hudi for Beginners!
https://www.youtube.com/
-
With the latest version, Nussknacker has become a powerful tool for those working with Apache Iceberg-based Data Lakehouses. It can handle both 𝗯𝗮𝘁𝗰𝗵 𝗮𝗻𝗱 𝗻𝗲𝗮𝗿 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 workloads. By integrating Nussknacker with Apache Iceberg, you can do the following: 🟢 Ingest data: Load data into your Data Lakehouse. 🟠 Transform data: Clean, filter, and restructure your data. 🔴 Aggregate data: Summarize and group data. 🔵 Enrich data: Use ML inference, joins, etc., to add context to your data. 🟣 Apply 𝗯𝘂𝘀𝗶𝗻𝗲𝘀𝘀 𝗹𝗼𝗴𝗶𝗰 to your data. In this blog post, Arkadiusz Burdach will show you how to use Nussknacker to build a data pipeline in this setup 👉 https://lnkd.in/dQg48qSi #Iceberg #DataLakehouse
-
Recently, I wrote a blog post about Nussknacker's integration with Flink catalogs. I prepared a step-by-step tutorial on configuring a setup that combines Nussknacker and Apache Iceberg, and on using them to implement an example business use case. My first impression of Apache Iceberg is that it has a clean design and rethinks many things compared to old-school data lakes. Let me know what you think about this idea.
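For a rough idea of what registering an Iceberg catalog in Flink looks like, here is a small PyFlink sketch; the catalog name, catalog type, and warehouse path are assumptions, and the blog post itself does this through Nussknacker's configuration rather than code.

```python
# Minimal PyFlink sketch of an Iceberg catalog (names/paths are assumptions;
# the blog configures this through Nussknacker rather than code).
# Requires the iceberg-flink-runtime jar on the Flink classpath.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Register an Iceberg catalog backed by a Hadoop warehouse directory.
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 'file:///tmp/iceberg-warehouse'
    )
""")

# Tables created under this catalog are plain Iceberg tables on disk.
t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.demo")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.demo.events (
        id BIGINT,
        payload STRING
    )
""")
```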
-
My colleague Renu Rajagopal hath crafted a most excellent blog on the art of synchronizing Iceberg tables, governed by external hands, with the Catalog's integrative might, within the realm of watsonx.data. Iceberg enables faster data lake analytics, time travel, partition evolution, ACID transactions, and more. Apache Iceberg is a key piece of an open lakehouse architecture, so you can reduce the cost of data warehouses and avoid vendor lock-in. Please read this 5-minute blog: https://lnkd.in/grq_5Wek. Please note that these are just the author's thoughts and not official documentation or the opinion of IBM.
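The blog covers the watsonx.data flow itself; as a generic analogue of attaching an externally managed Iceberg table to a catalog, here is a hedged Spark/Iceberg sketch using Iceberg's register_table procedure. This is not the watsonx.data mechanism, and all names and paths are placeholders.

```python
# Generic Spark/Iceberg sketch (NOT the watsonx.data procedure from the blog):
# attach an externally managed Iceberg table to a catalog via its metadata file.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("register-external-iceberg-table")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://my-bucket/warehouse")  # placeholder
    .getOrCreate()
)

# Iceberg's register_table procedure points the catalog at existing metadata,
# so the table becomes queryable without rewriting any data files.
spark.sql("""
    CALL lake.system.register_table(
        table => 'sales.orders',
        metadata_file => 's3://my-bucket/external/orders/metadata/v12.metadata.json'
    )
""")
```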
-
In Week 10 of the Ultimate Big Data Master's Program by Sumit Mittal Sir at TrendyTech, I learned about Apache Spark optimizations. Topics covered in the tenth module: 1. Internals of groupBy 2. Normal join vs. broadcast join 3. Different types of joins 4. Partition skew 5. Adaptive Query Execution 6. Join strategies 7. Optimizing a join of two large tables (bucketing) Saniya Farheen | Madhusmita Chelleng
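Two of these topics, broadcast joins and AQE skew handling, are easy to show in a small PySpark sketch; the DataFrames and configuration values below are just placeholders for illustration.

```python
# Small PySpark sketch of two of the topics above: broadcast join and AQE
# skew-join handling (DataFrames and settings are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("join-optimizations-sketch")
    # Adaptive Query Execution can re-plan joins and split skewed partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

orders = spark.range(10_000_000).withColumnRenamed("id", "customer_id")  # large table
customers = spark.createDataFrame(
    [(i, f"customer_{i}") for i in range(100)], ["customer_id", "name"]  # small table
)

# Normal (shuffle) join: both sides are shuffled by the join key.
shuffle_join = orders.join(customers, "customer_id")

# Broadcast join: the small side is shipped to every executor, avoiding a
# shuffle of the large side entirely.
broadcast_join = orders.join(broadcast(customers), "customer_id")

broadcast_join.explain()  # the plan should show a BroadcastHashJoin
```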
-
Check out Gowthami Bhogireddy presenting at #EuroPython2024, in their words: "This talk explores how Bloomberg leverages Apache Iceberg and Parquet for high-performance, scalable data management. Learn how Iceberg's open-source table format offers ACID transactions, versioning, and schema evolution for efficient data handling" https://lnkd.in/d-PJzGDn 🐍
Taming One Quadrillion Data Points with Apache Iceberg and Parquet
ep2024.europython.eu
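The talk's headline features, versioning and schema evolution, can be illustrated with a couple of Spark SQL statements against an Iceberg table; the catalog and table names below are hypothetical and unrelated to Bloomberg's setup, and the snapshot id is a placeholder.

```python
# Hypothetical Spark SQL sketch of Iceberg schema evolution and time travel
# (catalog/table names are made up; this is not Bloomberg's setup).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-features-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE TABLE IF NOT EXISTS demo.db.ticks (symbol STRING, price DOUBLE) USING iceberg")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE demo.db.ticks ADD COLUMN source STRING")

# Versioning / time travel: query the table as of an earlier snapshot id (placeholder).
spark.sql("SELECT COUNT(*) FROM demo.db.ticks VERSION AS OF 1234567890123456789").show()
```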
-
Soumil joins us for Episode 3 of 'Lakehouse Chronicles with Hudi' 🎉 If you know Soumil, he always strives to make implementing things simple and fun. In this episode, he will walk through a demo tackling a real-world problem (CDC): bringing data from operational sources into a lakehouse using Hudi Streamer. Specifically, the demo will cover: - capturing changes from Postgres using the Debezium Postgres connector - publishing them to Kafka topics - using Hudi Streamer in continuous mode to read from #Kafka and ingest into the data lakehouse (Hudi) - syncing to HMS & querying with Trino Join here: https://lnkd.in/d5PGzZqC #dataengineering #softwareengineering
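The demo itself drives this pipeline with Hudi Streamer's continuous mode; purely to give a feel for the Kafka-to-Hudi leg of the data flow, here is a rough PySpark structured-streaming analogue. The broker address, topic, Debezium payload fields, and paths are all placeholders.

```python
# Rough PySpark structured-streaming analogue of the Kafka -> Hudi leg of the demo
# (the episode uses Hudi Streamer in continuous mode; brokers/topic/paths are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, get_json_object

spark = SparkSession.builder.appName("cdc-to-hudi-sketch").getOrCreate()

# Read Debezium change events published to a Kafka topic.
kafka_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "pg.public.orders")
    .option("startingOffsets", "earliest")
    .load()
)

# Pull a few fields out of the Debezium JSON payload (the field layout is hypothetical).
orders = kafka_events.select(
    get_json_object(col("value").cast("string"), "$.after.order_id").alias("order_id"),
    get_json_object(col("value").cast("string"), "$.after.amount").alias("amount"),
    get_json_object(col("value").cast("string"), "$.ts_ms").alias("ts_ms"),
)

hudi_options = {
    "hoodie.table.name": "orders_cdc",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "ts_ms",
    "hoodie.datasource.write.operation": "upsert",
}

# Continuously upsert the change stream into a Hudi table on the lakehouse.
(
    orders.writeStream.format("hudi")
    .options(**hudi_options)
    .option("checkpointLocation", "/tmp/checkpoints/orders_cdc")
    .outputMode("append")
    .start("s3://my-bucket/lakehouse/orders_cdc")
)
```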