WTD Analytics reposted this
How can we manage schema inference and evolution with Databricks Auto Loader?

Schema inference and evolution in Auto Loader simplify managing data schemas over time, especially when working with dynamically changing datasets. Here's how Auto Loader handles schema detection, evolution, and unexpected data, all while keeping your data streams running smoothly.

Schema Inference:
- Automatically detects schemas when loading data.
- Handles JSON, CSV, XML, Parquet, and Avro formats.
- Saves schema history in the schema location.
- Infers all columns as strings by default for untyped formats like JSON and CSV.

Schema Evolution:
- Detects and manages new columns as they appear.
- Options to add, rescue, fail on, or ignore new columns during schema evolution.
- Default behavior is to stop the stream on encountering new columns and add them to the schema, so the stream picks them up when it restarts.

Real-Life Example
A retail company ingests JSON data for online orders. Initially, the schema includes order_id, customer_id, and order_date. Over time, new columns like coupon_code and delivery_time are added. With Auto Loader:
- Schema inference detects new columns automatically.
- Schema evolution adds coupon_code and delivery_time to the schema without manual schema changes; in the default mode the stream stops once and continues with the new columns after a restart.
- Unexpected data, such as malformed records, is rescued for further analysis.

#WhatsTheData #DataEngineering #Databricks
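For the retail orders example, the setup could look roughly like the sketch below. It is a minimal illustration, not a production pipeline: the helper names (auto_loader_options, read_orders) and all paths are hypothetical, and it assumes you run it on Databricks with an active `spark` session. The `cloudFiles.*` option names are the ones Auto Loader uses for the source format, the schema location, and the schema evolution mode.

```python
# Minimal sketch: configure Auto Loader for JSON orders with schema evolution.
# Helper names and paths are hypothetical; only the cloudFiles.* option keys
# and mode names come from Auto Loader itself.

SCHEMA_EVOLUTION_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def auto_loader_options(schema_location, evolution_mode="addNewColumns"):
    """Build the reader options for a JSON Auto Loader stream."""
    if evolution_mode not in SCHEMA_EVOLUTION_MODES:
        raise ValueError(f"Unknown schema evolution mode: {evolution_mode}")
    return {
        "cloudFiles.format": "json",
        # Auto Loader stores the inferred schema and its history here.
        "cloudFiles.schemaLocation": schema_location,
        # addNewColumns: stop the stream on new columns and add them to the schema.
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


def read_orders(spark, source_path, schema_location):
    """Return a streaming DataFrame of orders (needs a Databricks Spark session)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
```

On Databricks you would call something like read_orders(spark, "/mnt/raw/orders", "/mnt/schemas/orders") with your own paths. When the schema is inferred, data that does not fit it lands in the `_rescued_data` column, which is where the "rescued for further analysis" part of the example comes from.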
Very helpful!
Check out our latest blog posts: https://wtdanalytics.com/blog