WTD Analytics’ Post

WTD Analytics reposted this

Vishal Waghmode

Founder @ WTD Analytics | Databricks MVP & Partner | Data Engineering Consulting

How can we manage schema inference and evolution with Databricks Auto Loader?

Schema inference and evolution in Auto Loader make it easier to manage data schemas over time, especially for datasets whose structure keeps changing. Here's how Auto Loader handles schema detection, schema evolution, and unexpected data, all while keeping your data streams running smoothly.

Schema Inference:
- Automatically detects the schema when loading data.
- Supports JSON, CSV, XML, Parquet, and Avro.
- Stores the inferred schema and its history in the schema location (cloudFiles.schemaLocation).
- Infers all columns as strings for formats that don't encode data types, such as JSON and CSV (set cloudFiles.inferColumnTypes to true to infer types instead).

Schema Evolution:
- Detects new columns as they appear and handles them according to the configured mode (cloudFiles.schemaEvolutionMode).
- Modes let you fail on, rescue, or ignore new columns (failOnNewColumns, rescue, or none).
- By default (addNewColumns, when no schema is provided), the stream stops when it encounters new columns, adds them to the schema, and picks them up once it restarts.

Real-Life Example:
A retail company ingests JSON data for online orders. Initially, the schema includes order_id, customer_id, and order_date. Over time, new columns such as coupon_code and delivery_time appear.

With Auto Loader:
- Schema inference detects the new columns automatically.
- Schema evolution adds coupon_code and delivery_time to the schema without manual schema edits; in the default mode the stream stops once and resumes with the new columns after a restart (for example, an automatic job retry).
- Unexpected data, such as values that don't match the schema, is captured in the _rescued_data column for further analysis.

#WhatsTheData #DataEngineering #Databricks
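A minimal PySpark sketch of how the retail orders example might be wired up, assuming a Databricks runtime (the cloudFiles source is not part of open-source Spark). The option keys cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.inferColumnTypes, and cloudFiles.schemaEvolutionMode are documented Auto Loader options; the helper names, paths, and table name are hypothetical placeholders. Because no schema is supplied, Auto Loader infers one and adds the _rescued_data column automatically.

```python
# Sketch: Auto Loader (cloudFiles) ingestion for the retail orders example.
# Assumes a Databricks runtime; cloudFiles is not available in open-source Spark.
# Function names, paths, and the table name are hypothetical.

VALID_EVOLUTION_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def build_autoloader_options(schema_location, evolution_mode="addNewColumns", infer_types=False):
    """Return Auto Loader reader options for JSON schema inference and evolution."""
    if evolution_mode not in VALID_EVOLUTION_MODES:
        raise ValueError(f"unknown schemaEvolutionMode: {evolution_mode}")
    return {
        "cloudFiles.format": "json",
        # Inferred schema versions are persisted here and reused across restarts.
        "cloudFiles.schemaLocation": schema_location,
        # JSON carries no types, so columns are inferred as strings unless "true".
        "cloudFiles.inferColumnTypes": str(infer_types).lower(),
        # Behavior when new columns (e.g. coupon_code) show up in incoming files.
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


def start_orders_stream(spark, source_path, schema_location, checkpoint_location,
                        target_table, evolution_mode="addNewColumns"):
    """Start an incremental load into a Delta table; `spark` is the active SparkSession."""
    options = build_autoloader_options(schema_location, evolution_mode)
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_location)
        # Allow the Delta target to accept columns added by schema evolution.
        .option("mergeSchema", "true")
        # Process all files available now, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

For example, calling build_autoloader_options("/Volumes/retail/raw/_schemas/orders", evolution_mode="rescue") would keep the schema fixed and record new fields such as coupon_code in _rescued_data instead of stopping the stream. With the default addNewColumns mode, running the stream as a Databricks job with retries lets it recover from the one-time stop on its own.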

Ragavan Ramasamy

Senior Data Engineer @General Motors - Data Analytics | AI | ML

2w

Very helpful!
