We’ve all heard that compaction is key to mitigating the small-file problem in Apache Iceberg tables. But what if you could go further: run compaction yourself and understand, in depth, how the table’s metadata evolves along the way?

Karthic Rao and Shreyas Mishra’s latest blog post is a step-by-step guide to compacting Iceberg tables with Apache Amoro and Apache Spark, closely examining the metadata before and after compaction. Thanks to a Docker Compose setup, this hands-on tutorial is simple to follow and quick to reproduce!

🔗 https://lnkd.in/gKp4bPDi

What’s inside?
📂 Small file problem explained: why too many small files hurt performance.
🔍 Metadata evolution unveiled: how snapshots and manifests change after compaction.
⚙️ Hands-on with Apache Amoro: using Spark and Amoro to optimize Iceberg tables.
🧰 Real-world example: learn compaction with NYC Taxi data and see the impact firsthand.

#apacheiceberg #datalakehouse #compaction #metadata #dataengineering #smallfileproblem #dataoptimization
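To give a rough feel for what compaction does to small files, here is a minimal Python sketch of greedy first-fit-decreasing bin packing: it groups small data files so that each group can be rewritten as one file of at most a target size. This is an illustration under simplified assumptions, not Amoro’s or Spark’s actual implementation; the function name `plan_compaction`, the 8 MB file sizes, and the 128 MB target are hypothetical.

```python
from typing import List

MB = 1024 * 1024


def plan_compaction(file_sizes: List[int], target_size: int) -> List[List[int]]:
    """Group file sizes (bytes) into bins of at most target_size bytes.

    Each bin stands for one output file after compaction. A single input
    file larger than target_size ends up alone in its own bin.
    """
    groups: List[List[int]] = []
    group_totals: List[int] = []
    # First-fit decreasing: place the largest files first.
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(group_totals):
            if total + size <= target_size:
                groups[i].append(size)
                group_totals[i] += size
                break
        else:
            # No existing group has room: start a new output file.
            groups.append([size])
            group_totals.append(size)
    return groups


if __name__ == "__main__":
    # Hypothetical: 40 small 8 MB files, e.g. left behind by frequent commits.
    small_files = [8 * MB] * 40
    plan = plan_compaction(small_files, target_size=128 * MB)
    print(f"{len(small_files)} input files -> {len(plan)} output files")
```

In Spark, Iceberg exposes a comparable rewrite through the `rewrite_data_files` procedure, whose default strategy is bin packing, e.g. `CALL my_catalog.system.rewrite_data_files(table => 'db.taxi_trips')` (catalog and table names here are placeholders).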
e6data
Software Development
San Francisco, California 4,914 followers
The next-gen analytics engine for heavy workloads.
About us
e6data is the next-generation analytics engine, built for mission-critical, non-discretionary “heavy” workloads such as customer-facing analytics, internal ad-hoc reporting, and AI/ML applications. NASDAQ-listed enterprises and unicorns have achieved 10-100x faster queries at high concurrency and 60-80% savings on TCO with us. Our product offers 360-degree interoperability with existing components and can be production-ready in 10 days or less, with no app or data migration. It is available on your chosen cloud and on the AWS, Azure, and Google Cloud Marketplaces.
- Website
- www.e6data.com
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2020
- Specialties
- Cloud Analytics, Big Data, Data Warehousing, Data Platform, Data Lakehouse, Query Engine, Distributed Computing, High Concurrency, Low Latency Analytics, Delta Lake, Iceberg, Hudi, Hive, CloudPrem, Lower TCO, Higher ROI, SQL Engine, and Columnar Processing
Locations
- Primary: San Francisco, California, US
- Bangalore, Karnataka, IN
Employees at e6data
- Raj P.: Seasoned executive with a track record of transforming technology into high-value business and timely exits. Known for exceptional vision…
- Sreedhar D
- Deepa Prakash, PMP® CSM® CSPO® AWS Cloud Practitioner®: TPM | Ex Vice President | QA Engineering | Program Management | Digital & Agile Transformation Leader | Product QA | Ex Thomson Reuters | Ex VeriSign
- Sudarshan Lakshminarasimhan: Founding Engineer, Performance and Research Engineering Lead at e6data
Updates
-
e6data reposted this
[repost Karthic Rao] This page isn't affiliated with the Apache Iceberg project and doesn’t represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
Delve into a hands-on tutorial on how sorting within partitions in Apache Iceberg improves query performance, in our latest blog post by Karthic Rao and Shreyas Mishra.

We use a practical example to demonstrate better data skipping and reduced table scans for engines like Apache Spark and e6data when sorting is enabled on the relevant columns.

Read it here: https://lnkd.in/gTf9fvBx

Comment below with your thoughts or questions.

#DataLakehouse #DataAnalytics #ApacheIceberg #Partitioning #Sorting #BigData #QueryOptimization
Enhancing Query Performance in the Apache Iceberg: A Hands-On Guide to Sorting Within Partitions
e6data.com
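Why does sorting help data skipping? Iceberg manifests record per-file lower and upper bounds for columns, and an engine can skip any file whose range cannot satisfy a filter. The simplified Python model below (synthetic integer data for one hypothetical column, fixed-size files, hypothetical helper names; not Spark’s or e6data’s actual planner) compares how many files a range filter must read when rows are written unsorted versus sorted.

```python
import random
from typing import List, Tuple


def file_stats(values: List[int], rows_per_file: int) -> List[Tuple[int, int]]:
    """Split rows into fixed-size files and record each file's (min, max),
    a stand-in for the column bounds Iceberg keeps in manifest entries."""
    chunks = (values[i:i + rows_per_file] for i in range(0, len(values), rows_per_file))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def files_to_scan(stats: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count files whose [min, max] range may contain rows with lo <= value <= hi;
    every other file can be skipped without being opened."""
    return sum(1 for fmin, fmax in stats if fmax >= lo and fmin <= hi)


if __name__ == "__main__":
    rng = random.Random(42)
    values = [rng.randint(0, 9999) for _ in range(100_000)]
    unsorted_stats = file_stats(values, rows_per_file=10_000)
    sorted_stats = file_stats(sorted(values), rows_per_file=10_000)
    # Hypothetical filter: WHERE value BETWEEN 5000 AND 5100
    print("unsorted:", files_to_scan(unsorted_stats, 5000, 5100), "of", len(unsorted_stats))
    print("sorted:  ", files_to_scan(sorted_stats, 5000, 5100), "of", len(sorted_stats))
```

With random order, nearly every file spans the whole value range, so none can be skipped; after sorting, each file covers a narrow slice and most files fall outside the filter. With Iceberg’s Spark SQL extensions, one way to declare a table sort order is `ALTER TABLE my_catalog.db.trips WRITE ORDERED BY pickup_time` (placeholder names).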
-
Business catalogs often grab attention for their governance, tagging, discovery, and lineage capabilities. But behind the scenes, lakehouse query engines like e6data, Spark, Trino, and others rely on the low-level metadata catalog for critical optimizations such as query planning, data pruning, and more.

Why does this catalog matter?
🗂 It manages snapshot and manifest metadata for time travel and schema evolution.
📊 It enables query pruning with fine-grained statistics and file-level metadata.
⚙️ It handles Parquet metadata collection at scale, from millions of files to frequent updates.

In this latest blog post by Vishnu Vasanth and Karthic Rao, learn how this often-sidelined catalog is becoming a critical capability inside lakehouse query engines: https://lnkd.in/g9qnK-j8

#datalakehouse #iceberg #metadatacatalogs #dataengineering #e6data #apacheiceberg #datamanagement
From Hidden to Hero: Low-Level Technical Metadata Catalogs’ Relevance for the Future of Lakehouses
e6data.com
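The time-travel point in the post rests on a simple idea: table metadata keeps a log of snapshots with their commit timestamps, and a query “as of” a timestamp reads the latest snapshot committed at or before that moment. Here is a minimal Python sketch of that lookup; the class and function names and the snapshot IDs are hypothetical, and real engines resolve this from Iceberg table metadata rather than an in-memory list.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotLogEntry:
    timestamp_ms: int  # commit time in epoch milliseconds
    snapshot_id: int


def snapshot_as_of(log: List[SnapshotLogEntry], timestamp_ms: int) -> Optional[int]:
    """Return the id of the latest snapshot committed at or before timestamp_ms,
    or None if the table had no snapshot yet. Assumes `log` is sorted by time."""
    times = [entry.timestamp_ms for entry in log]
    idx = bisect.bisect_right(times, timestamp_ms)
    return log[idx - 1].snapshot_id if idx > 0 else None


if __name__ == "__main__":
    log = [SnapshotLogEntry(1000, 11), SnapshotLogEntry(2000, 22), SnapshotLogEntry(3000, 33)]
    print(snapshot_as_of(log, 2500))
```

In Spark SQL, the user-facing form of this lookup is a query such as `SELECT * FROM my_catalog.db.trips TIMESTAMP AS OF '2024-12-01 00:00:00'` (placeholder names).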
-
e6data reposted this
Lakehouse Days @ e6data

I had the privilege of attending the in-person meetup 'Lakehouse Days' at e6data, where we dove deep into the world of Apache Iceberg and its potential to revolutionize data management. A huge shoutout to Ankur Ranjan and team e6data for hosting such a well-structured and insightful event. The sessions were incredibly enriching and left me with a better understanding of modern data architectures.

Kudos to the speakers for their expertise and engaging discussions. Your knowledge and passion truly made the event stand out! Looking forward to attending more such events and continuing to learn and grow in this dynamic field of data engineering.

#Thankyou #LakehouseDays #ApacheIceberg #e6data #DataEngineering #LearningAndGrowing
Today, I attended an amazing in-person event hosted by e6data: “𝐀𝐩𝐚𝐜𝐡𝐞 𝐈𝐜𝐞𝐛𝐞𝐫𝐠: 𝐔𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐭𝐡𝐞 𝐈𝐧𝐭𝐞𝐫𝐧𝐚𝐥𝐬, 𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞, 𝐚𝐧𝐝 𝐅𝐮𝐭𝐮𝐫𝐞.” The sessions were packed with insights, real-world use cases, and discussions on how Apache Iceberg is shaping the future of data engineering.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐄𝐯𝐞𝐧𝐭:
- Sachin Tripathi shared an insightful session on Apache Iceberg, covering features like time travel, schema evolution, hidden partitioning, and catalogs for optimizing data architecture.
- Soumil S. discussed AWS’s recent announcement of S3 Tables and how they are optimized for analytics workloads.
- Vipul Bharat Marlecha and Ankur Ranjan led an open discussion about streaming ingestion with Apache Iceberg and explored how Netflix uses Iceberg at scale.
- Fenil Jain and Shreyas Mishra delivered an incredible talk on streaming ingestion using Rust-based solutions, highlighting Rust’s benefits for building efficient, low-latency ingestion pipelines.

A huge thanks to Ankur Ranjan and team e6data for organizing this event and to all the speakers for sharing their knowledge. It was an excellent opportunity to learn, network, and explore the advancements in open table formats and streaming ingestion. Looking forward to attending more such insightful events in the future!

#ApacheIceberg #DataEngineering #StreamingIngestion #Rust #AWS #LakehouseDays #Startpracticing #Networking #arfin
-
e6data reposted this
Had a great time! Big thanks to Ankur Ranjan and the e6data team for a wonderful event focused on Apache Iceberg!
-
e6data reposted this
Kudos to Ankur Ranjan and e6data team for organizing such an amazing meetup. The innovation happening in the lakehouse space is a game changer!!
We had a great discussion about streaming ingestion and how Apache Iceberg simplifies the process. Big thanks to Vipul Bharat Marlecha for patiently answering all our questions and sharing valuable insights throughout the meetup.
-
e6data reposted this
Optimizing Iceberg tables can feel overwhelming, especially if it's your first time managing them in production. We’ve been there too! To make the journey smoother, we’ve compiled a GitHub repository of resources and links on optimizing Iceberg tables. It gathers some of the top resources in one place, and you’re welcome to help extend the curation for the rest of the community.

🔗https://lnkd.in/gXYJQU-r

#iceberg #apacheiceberg #icebergtables #github #githubrepo #icebergcommunity Vishnu Vasanth Vignesh Ganesan Adishesh Kishore Srinath Prabhu Sudarshan Lakshminarasimhan Faiz Kothari Sweta Singh Vishal Masali Karthic Rao
-
🗓️ Mark your calendars for December 21st! Join Fenil Jain and Shreyas Mishra, engineers at e6data, for an exciting session on streaming ingestion to Apache Iceberg using a Rust-based solution.

📌 Session Details:
- Topic: Streaming ingestion to Apache Iceberg using a Rust-based solution
- Time: 12:00 - 12:45 PM IST
- Key focus: Discover how Rust’s unique features enable efficient ingestion pipelines for Apache Iceberg. Learn how to achieve low-latency ingestion, handle schema evolution, and enable real-time analytics, all without relying on tools like Flink or Spark.

👉 Register here: https://lu.ma/klljxqlg

#iceberg #lakehousedays #rust #spark #flink #dataengineering #e6data