🎁 Early holiday gift from Onehouse: the top 5 tips for scaling Apache Spark!

Tired of shuffle failures and out-of-memory errors? Spark jobs running too slow? We demystified the most important considerations for scaling your Spark workloads, including:

- On-heap vs. off-heap memory
- Spilling to disk
- Optimizing your data structures
- Choosing the right serialization technique
- Adaptive query execution
- Dynamic allocation
- … and more!

These tips come from the Onehouse team's experience operating and taming complex, petabyte-scale Spark workloads for the largest data lakes on the planet.

Read the blog post at https://lnkd.in/dVeXhn2E.
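
For a taste of what several of these knobs look like in practice, here's a minimal SparkSession sketch wiring up off-heap memory, Kryo serialization, adaptive query execution, and dynamic allocation. The specific values (4g off-heap, 2–50 executors) are illustrative assumptions, not tuning advice from the post; always size them to your cluster and workload.

import org.apache.spark.sql.SparkSession

// Illustrative settings only; tune the sizes and executor counts to your workload.
val spark = SparkSession.builder()
  .appName("tuned-spark-job")
  // Off-heap memory moves execution/storage buffers outside the JVM heap,
  // reducing garbage-collection pressure on large shuffles.
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "4g")
  // Kryo is faster and more compact than default Java serialization.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Adaptive query execution re-optimizes plans at runtime,
  // e.g. coalescing shuffle partitions based on actual data sizes.
  .config("spark.sql.adaptive.enabled", "true")
  // Dynamic allocation scales the executor count up and down with demand.
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .getOrCreate()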