Apache XTable (Incubating)’s Post

Apache XTable (Incubating) reposted this

Dipankar Mazumdar, M.Sc 🥑

Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"

Apache XTable in Production with Microsoft Azure OneLake! 🎉

Since Apache XTable (Incubating) is still in its early stages, I often get asked how I see it being adopted and how this open-source solution can be applied to different use cases with lakehouse table formats.

XTable started with the core idea of "interoperability": you should be able to write data in any format of your choice, whether that's Apache Iceberg, Apache Hudi, or Delta Lake, and then bring any compute engine that works well with that particular format (performance- and integration-wise) to run analytics on top. Each of these formats shines in specific use cases depending on its unique features, so based on your use case and technical fit within your data architecture, you should be free to use any of them without being married to just one.

On the query-engine side (warehouse, lake compute), more and more vendors are now looking at integrating with these open formats. In reality, it is tough to have robust support for every single format. By robust I mean full write support, schema evolution, and compaction. And even for engines that do work with multiple formats, it is practically hard to build optimization capabilities for each of them.

So, to summarize, I see XTable having 2 major applications:
✅ On the compute side, with vendors using XTable as the interoperability layer
✅ Customers using multiple formats adding XTable to their existing data pipelines (say, an Apache Airflow operator or a Lambda function; a sketch follows below)

Yesterday's announcement on Fabric OneLake-Snowflake interoperability is a critical example that solidifies the first point. With this feature, users can use OneLake shortcuts to point to an Iceberg table written using Snowflake (or another engine), and OneLake will present that table as a Delta Lake table, which works well within the Fabric ecosystem. This is powered by XTable 🚀

This abstraction at the user level will allow disparate data sources to work together as "one single copy", irrespective of the table formats. Having an open table format is a start towards an open architecture, but you also need "interoperability" standards, because your tool stack/ecosystem can evolve over time.

I elaborate on these aspects in the blog post linked in the comments!

#dataengineering #softwareengineering
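To make the second application concrete, here is a minimal sketch of what an XTable sync step in an Airflow pipeline could look like, exposing an Iceberg table as Delta Lake after each upstream write. The DAG structure, jar location, schedule, and table path are illustrative assumptions rather than a reference setup; the YAML keys (sourceFormat, targetFormats, datasets) follow XTable's dataset-config format.

```python
# Hypothetical sketch: running an XTable metadata sync from Airflow.
# Paths, jar name, and the table location are assumptions, not a
# production configuration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# XTable reads a YAML dataset config: source format, target format(s),
# and the table's base path. Here, an Iceberg table is exposed as Delta.
XTABLE_CONFIG = """
sourceFormat: ICEBERG
targetFormats:
  - DELTA
datasets:
  - tableBasePath: abfss://lake@account.dfs.core.windows.net/db/orders  # assumed path
    tableName: orders
"""

with DAG(
    dag_id="xtable_iceberg_to_delta_sync",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # sync metadata after each upstream write cycle
    catchup=False,
) as dag:
    # Materialize the dataset config on the worker.
    write_config = BashOperator(
        task_id="write_config",
        bash_command=f"cat > /tmp/xtable_config.yaml <<'EOF'\n{XTABLE_CONFIG}\nEOF",
    )

    # The bundled-jar path/name is an assumption; use whatever your
    # XTable build produces. The sync translates metadata only, so both
    # formats share the same underlying data files ("one single copy").
    run_sync = BashOperator(
        task_id="run_xtable_sync",
        bash_command=(
            "java -jar /opt/xtable/xtable-utilities-bundled.jar "
            "--datasetConfig /tmp/xtable_config.yaml"
        ),
    )

    write_config >> run_sync
```

The same two steps (write a dataset config, invoke the sync) translate directly to a Lambda handler or any other scheduler; only the wrapper changes.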

Lakshmi Shiva Ganesh Sontenam

Data Engineering - Retail Anly. | Visual Illustrator | Medium✍️

Very informative Dipankar Mazumdar, M.Sc 🥑, love this!

• Table formats like Iceberg, Delta Lake, and Hudi enable schema evolution, time travel, and partition pruning, powering in-place analytics (sketched below).
• But with multiple formats, and more likely to emerge, interoperability is becoming a challenge.
• A unified framework like Apache XTable (Incubating) could simplify metadata management and enable seamless compatibility across tools.

📌 Also check out my post on the emergence of table formats to in-place analytics here: https://www.linkedin.com/posts/shivaga9esh_inplaceabranalytics-activity-7266820428608745472-OhIK?utm_source=share&utm_medium=member_ios
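A minimal PySpark illustration of the three features listed above, run against an Iceberg table. The catalog name ("lake"), table name, partition column, and snapshot id are assumptions for illustration only; the SQL forms are standard Iceberg Spark DDL/queries.

```python
# Hypothetical sketch of schema evolution, time travel, and partition
# pruning on an Iceberg table. Names and ids below are assumptions.
from pyspark.sql import SparkSession

# Assumes the session is already configured with an Iceberg catalog
# registered as "lake" (catalog wiring omitted for brevity).
spark = SparkSession.builder.appName("format-features-demo").getOrCreate()

# Schema evolution: adding a column is a metadata-only change;
# existing data files are not rewritten.
spark.sql("ALTER TABLE lake.db.orders ADD COLUMNS (discount DOUBLE)")

# Time travel: read the table as of an earlier snapshot.
spark.sql(
    "SELECT * FROM lake.db.orders VERSION AS OF 4210293921"  # assumed snapshot id
).show()

# Partition pruning: a filter on the partition column lets the engine
# skip data files belonging to all other partitions.
spark.sql(
    "SELECT count(*) FROM lake.db.orders WHERE order_date = DATE '2024-05-01'"
).show()
```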

Avinash Mirtipati

Senior Consultant @ StatusNeo | Expert Data Engineer | AWS Certified Solutions Architect – Professional

Very helpful
