Apache Iceberg Workshops

Data Infrastructure and Analytics

This page shares Apache Iceberg educational content

About us

Apache Iceberg is an open source data lakehouse table format. This page exists to help people learn how to use Apache Iceberg and understand how it works. The two best places to get started are the following articles:
- Apache Iceberg 101: https://www.dremio.com/blog/apache-iceberg-101-your-guide-to-learning-apache-iceberg-concepts-and-practices/
- Apache Iceberg FAQ: https://www.dremio.com/blog/apache-iceberg-faq/

Industry
Data Infrastructure and Analytics
Company size
1 employee
Type
Educational

Updates

  • Apache Iceberg Workshops reposted this

    Danica Fine

    Lead Developer Advocate, Open Source ❄️

    Have you settled on your New Year's resolutions for 2025 yet? What about "learn #apacheIceberg"? 🤔 Great idea! 💡 And I just so happen to have the perfect resource for you: "Apache Iceberg: The Definitive Guide." 📖

    From the motivation behind the #dataLakehouse paradigm to the nitty-gritty implementation of Iceberg tables, this book has everything you need to dive into Iceberg. And it's not just great for beginners: seasoned users will find actionable tips for optimizing Iceberg tables as well as overviews of production use cases.

    The great news is that you don't have to wait for Santa 🎅 to bring you a physical copy this holiday season. The book is available right NOW for download from Dremio. Check it out! https://lnkd.in/gEZjXycr But I still recommend getting a physical copy... if for no other reason than to show it off on your bookshelf. 😉

    So...
    ⬇️ Download the eBook now
    ✍️ Add the physical book to your wishlist
    🤿 Get ready to dive into Iceberg in 2025

    You won't regret it! We're nearing the end of our Iceberg #adventCalendar, but there's still more next week. So like, share, and follow for more! #dataLake #dataEngineering

  • Apache Iceberg Workshops reposted this

    Danica Fine

    Lead Developer Advocate, Open Source ❄️

    Our #apacheIceberg #adventCalendar continues today with an addition to YOUR calendar. 📆 Yes, you read that correctly. If you're in the #nyc area, we've got an event for you: an #openSource data deep dive!

    Join your fellow Iceberg and #apachePolaris (incubating) enthusiasts on Tuesday, January 21 for an evening of networking, use cases, and more. https://lu.ma/ep4vzkzm

    Our speaker list currently includes Alex Merced of Dremio, who will be presenting on catalog interoperability. 📚 But keep an eye out as we announce more speakers over the coming weeks... 👀 Who else would you want to see speaking at this event? 🤔

    While you're waiting, RSVP to secure your spot and get ready to help us kick off another year of incredible Iceberg events in 2025. 🥳 Hope to see you there! And if for some reason you can't make it, be sure to like, reshare with your networks, and follow me for more Iceberg content for the rest of the month so that we can find an event for you! #dataLake #dataLakehouse #dataEngineering

    Open Source Data Deep Dive : New York · Luma


  • Apache Iceberg Workshops reposted this

    Alex Merced

    Co-Author of “Apache Iceberg: The Definitive Guide” | Senior Tech Evangelist at Dremio | LinkedIn Learning Instructor | Tech Content Creator

    This is a big deal: it blows open the compute options for your AWS Glue Iceberg tables.

    Roy Hasson

    Product @ Upsolver | Data engineer | Advocate for better data

    HOLD UP!! AWS released an Iceberg REST capability in Glue Data Catalog and didn't say anything? Let me show you 🫵

    Key capabilities:
    🟢 Fully Iceberg REST-spec compliant
    🟢 Configures and works without a custom library
    🟢 Works with Glue's IAM permissions and Lake Formation fine-grained controls
    🟢 Credential vending, managed via Lake Formation

    My initial testing shows it works well. All databases and tables are available in Glue Data Catalog and Lake Formation, so there is no need for additional catalog federation or anything wonky like you do with S3 Tables.

    For me, this is actually more impactful than S3 Tables. Now I can use standard Iceberg tables stored in S3, optimized with my choice of optimizer (of course that is Upsolver for me), and still use them with Iceberg-compatible engines without any additional integrations.

    Links with more detail in the comments. Good luck!
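
    Editor's note: the post doesn't include code, so here is a minimal sketch of how one might point Spark at the Glue Data Catalog's Iceberg REST endpoint using the standard Iceberg REST catalog properties with SigV4 signing. The catalog name, region, account ID, and library versions below are placeholder assumptions; check the AWS and Iceberg documentation for your setup.

        # Sketch: Spark session configured against the Glue Iceberg REST endpoint.
        # Region, account ID, and package versions are placeholders.
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("glue-iceberg-rest")
            # Iceberg Spark runtime + AWS bundle (needed for SigV4 signing)
            .config("spark.jars.packages",
                    "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,"
                    "org.apache.iceberg:iceberg-aws-bundle:1.7.1")
            .config("spark.sql.extensions",
                    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            # Register a catalog named "glue_rest" backed by the REST endpoint
            .config("spark.sql.catalog.glue_rest", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.glue_rest.type", "rest")
            .config("spark.sql.catalog.glue_rest.uri", "https://glue.us-east-1.amazonaws.com/iceberg")
            .config("spark.sql.catalog.glue_rest.warehouse", "<your-aws-account-id>")
            # SigV4 request signing so Glue IAM / Lake Formation permissions apply
            .config("spark.sql.catalog.glue_rest.rest.sigv4-enabled", "true")
            .config("spark.sql.catalog.glue_rest.rest.signing-name", "glue")
            .config("spark.sql.catalog.glue_rest.rest.signing-region", "us-east-1")
            .getOrCreate()
        )

        # Databases registered in Glue Data Catalog show up as namespaces
        spark.sql("SHOW NAMESPACES IN glue_rest").show()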

  • Apache Iceberg Workshops reposted this

    Yonatan Dolan

    Analytics Specialist, Apache Iceberg evangelist

    re:Invent is over, and now it's time to start summarizing and trying out all the new stuff that was launched. Here are my top 5 announcements from re:Invent season (not only last week); let me know yours...

    #5 - Redshift multi-data warehouse writes through data sharing - Data sharing has been around for over 3 years and now allows customers to share compute not only for reads but also for writes, providing much greater control and flexibility for ETLs.

    #4 - ZeroETL to Iceberg - The most significant announcement is DynamoDB to Iceberg, but this also includes eight other SaaS sources (e.g., Salesforce, SAP OData, ServiceNow). I can't wait to see additional ZeroETL integrations with Iceberg coming in 2025.

    #3 - S3 Tables - Certainly the one that got the most attention. Is this a game-changer for analytics? I think that remains to be seen, but it's definitely simplifying the use of Iceberg for AWS customers.

    #2 - SageMaker Unified Studio - An integrated data and AI development environment that enables collaboration and helps teams build data products faster. It's still in preview, but I'm eager to start getting feedback from customers about it.

    #1 - Firehose CDC - Postgres and MySQL CDC directly to Apache Iceberg. This was launched the week before re:Invent, but for me it's still the most exciting one. CDC used to be hard, brittle, cumbersome, and error-prone; customers love Firehose because it's exactly the opposite - it's simple, robust, and reliable, making CDC to your lakehouse significantly easier.

    There are additional ones that didn't make my list but are still exciting; specifically I would call out MSK Express brokers, Amazon Q integration with QuickSight, and S3 Metadata.

  • Apache Iceberg Workshops reposted this

    Roy Hasson

    Product @ Upsolver | Data engineer | Advocate for better data

    Quick demo of AWS's new S3 Tables for Apache Iceberg using Spark. I show you how to:
    1. Load the catalog client library in Spark
    2. Create a table
    3. Insert data into the table
    4. Query the table

    Stay tuned for more videos as this product matures and I get to explore more capabilities. Thank you Jack Ye for fixing the connectivity issue so quickly 🔥
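
    Editor's note: this is not the demo's exact code, just a minimal sketch of the four steps above with Spark SQL. The Maven coordinates, versions, namespace, table name, and table bucket ARN are placeholder assumptions; substitute your own.

        # Sketch: create, insert into, and query an S3 Tables-backed Iceberg table.
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("s3-tables-demo")
            # 1. Load the Iceberg runtime and the S3 Tables catalog client library
            .config("spark.jars.packages",
                    "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,"
                    "software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.3")
            .config("spark.sql.extensions",
                    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
            .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.s3tables.catalog-impl",
                    "software.amazon.s3tables.iceberg.S3TablesCatalog")
            # Warehouse points at the table bucket ARN (placeholder)
            .config("spark.sql.catalog.s3tables.warehouse",
                    "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket")
            .getOrCreate()
        )

        # 2. Create a table in a namespace of the table bucket
        spark.sql("CREATE NAMESPACE IF NOT EXISTS s3tables.demo")
        spark.sql("""
            CREATE TABLE IF NOT EXISTS s3tables.demo.events (
                id BIGINT,
                event_type STRING,
                ts TIMESTAMP
            ) USING iceberg
        """)

        # 3. Insert data into the table
        spark.sql("""
            INSERT INTO s3tables.demo.events
            VALUES (1, 'click', current_timestamp()), (2, 'view', current_timestamp())
        """)

        # 4. Query the table
        spark.sql("""
            SELECT event_type, count(*) AS n
            FROM s3tables.demo.events
            GROUP BY event_type
        """).show()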

  • Apache Iceberg Workshops reposted this

    Ian Whitestone

    Co-founder and CEO at SELECT - The Snowflake optimization and cost management platform (We're hiring!)

    AWS dropped a huge announcement yesterday that will have big ripple effects in the data industry. And in my opinion, it may have marked the death of Databricks' Delta Lake.

    So, what did they announce? A new service called Amazon S3 Tables. Under the hood, this is a brand new type of S3 bucket (called a "table bucket"), specifically optimized for storing data in Parquet and querying via Iceberg. You can think of the table bucket as your "database", and all the files stored in it will be "tables" -> hence "Amazon S3 Tables". The S3 Tables service will provide many services required to operationalize a data lake: table-level permissions, metadata management, automatic file compaction/cleanup, and more.

    Why is this a big deal? Open data formats and data lakes have been all the rage over the past year. Many companies want to keep their data in their cloud storage provider and make it accessible to multiple services/query engines. AWS coming out and adding first-class support for Parquet/Iceberg will lay down the foundations for this trend to accelerate. S3 Tables will become a new building block that many services (including Snowflake/Databricks) can and should build on top of.

    Now, back to Delta Lake... Delta Lake is the open source table format built & maintained by Databricks. It's an Iceberg alternative. Earlier this year, there were ongoing debates about the best open source format for your data lake. Iceberg and Delta Lake were the top two contenders.

    With AWS, the largest cloud provider, building such a critical first-class service centered entirely around Iceberg, they've stated very clearly: Iceberg is the winner. When a cloud giant this big throws all their weight behind Iceberg, people take notice.

    With this in mind, when given the choice between the two, who would bet on Delta Lake as the long-term data lake format that their whole company will build around? I certainly wouldn't. Exciting times.

  • Apache Iceberg Workshops reposted this

    Yonatan Dolan

    Analytics Specialist, Apache Iceberg evangelist

    In the last 2 years, most of my Apache Iceberg discussions with customers were around "What is Iceberg?", and the immediate follow-up was usually "OK, I understand Iceberg, but why now?", so I decided to create this infographic, which shows some figures that explain it. From the surge in Google Trends interest and the 81% YoY growth in companies using Iceberg to the number of GitHub stars... for me the answer is very clear: NOW is the time...

