Using job clusters with dbt and Databricks? Two of our talented Data Engineers, Giovanni Corsetti Silva and Shaurya Sood, explain the process, challenges, and intuitive workarounds in our latest blog 🚀 This cost-effective method was vital in our journey to optimized data pipelines. Although it was a temporary fix, we thought we’d share our methodologies with you, our code-curious readers! Dive in for a step-by-step guide on how we leveraged Docker images and dealt with deployment while saving money and resources. Read it here https://lnkd.in/eKSPvuRp 👈 #TechBlog #DBT #DataBricks #Coding #CodingGuide #Engineering #Innovation
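For readers who want a concrete starting point before reading the blog: below is a rough, hypothetical sketch of a Databricks Jobs API payload that runs dbt on an ephemeral job cluster built from a custom Docker image. The workspace URL, token, Docker image, Git repo, and cluster settings are illustrative assumptions and are not taken from the blog post, so check them against the official Jobs API docs before use.

```python
import requests

# Hypothetical values: workspace URL, token, image, and repo are placeholders.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_payload = {
    "name": "dbt-on-job-cluster",
    "tasks": [
        {
            "task_key": "dbt_run",
            "dbt_task": {
                "commands": ["dbt deps", "dbt run"],
                # Assumes the dbt project sits at the root of the Git repo below.
                "project_directory": "",
            },
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
                # Databricks Container Services: custom image with dbt pre-installed.
                "docker_image": {"url": "myregistry.example.com/dbt-runner:latest"},
            },
        }
    ],
    "git_source": {
        "git_url": "https://github.com/example-org/dbt-project",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
}

# Create the job; the response contains the new job_id.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_payload,
)
resp.raise_for_status()
print(resp.json())
```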
More Relevant Posts
-
The ‘𝗙𝗼𝗿 𝗘𝗮𝗰𝗵’ task is now available in 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀! 🎉 This allows for a variety of input formats like key-value pairs, JSON, strings, and numbers.
Hi Databricks Data Engineers! This is the feature I've been waiting for! The '𝗙𝗼𝗿 𝗘𝗮𝗰𝗵' task is now available in 𝗪𝗼𝗿𝗸𝗳𝗹𝗼𝘄𝘀! It allows inputs in various formats such as key-value pairs, JSON, strings, or numbers. Additionally, you can reference task values passed from preceding tasks. While there are some considerations regarding downstream tasks versus nested tasks, this improvement is truly fantastic! Happy coding! #DataEngineering #Databricks #DatabricksWorkflows #qabirdbricklab QAbird
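For anyone curious what this looks like in a job definition, here is a rough sketch of a Jobs API task fragment using for_each_task. The notebook path, parameter names, and the {{input}} reference for the current iteration value are assumptions for illustration, so verify them against the current Databricks documentation.

```python
# Hypothetical fragment of a Databricks Jobs API (2.1) job definition showing a
# 'For Each' task that fans out over a JSON array of inputs.
for_each_fragment = {
    "task_key": "process_each_country",
    "for_each_task": {
        # Inputs can be a JSON array (strings, numbers, or key-value objects),
        # or a reference to a task value set by an upstream task.
        "inputs": '["DE", "FR", "ES"]',
        "concurrency": 2,
        "task": {
            "task_key": "process_one_country",
            "notebook_task": {
                "notebook_path": "/Workspace/pipelines/process_country",
                # The current iteration value is referenced via {{input}} (assumed syntax).
                "base_parameters": {"country": "{{input}}"},
            },
        },
    },
}
```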
-
" 5 Reasons Why Databricks is Essential for Data Engineers " 🌟 Databricks is more than just a data platform; it’s an innovation hub for data engineers. Here’s why: 1️⃣ Unified Platform: Simplifies data integration by combining ETL, analytics, and machine learning in one ecosystem. 2️⃣ Delta Lake: Offers ACID compliance, time travel, and schema enforcement for reliable and scalable data lakes. 3️⃣ Collaborative Workspace: Seamlessly integrates notebooks for Python, SQL, and Scala, enabling teams to collaborate effectively. 4️⃣ Optimized Data Processing: Accelerates big data processing with Spark and Delta Engine, ensuring real-time insights. 5️⃣ Scalable Infrastructure: Automatically scales clusters based on workload, optimizing cost and performance. 🎥 Check out this https://lnkd.in/g9_-34qY for a deeper dive into how Databricks empowers modern data engineering! 🏷️Tagging Shashank Singh 🇮🇳 Priyanka Banerjee Manali Kulkarni R GANESH Korrapati Jaswanth Venkata Naga Sai Kumar Bysani for better reach. #databricks #dataengineering #muvidatalab #datalake #deltalake #bigdata #dataanalytics #spark #etl #dataplatform #collaboration
Introduction to Databricks for Data Engineers | Part 1
https://www.youtube.com/
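To make point 2️⃣ above concrete, here is a tiny, hypothetical PySpark snippet showing Delta Lake time travel and schema enforcement inside a Databricks notebook; the table path and columns are placeholders.

```python
# 'spark' is the SparkSession provided in a Databricks notebook; the path is a placeholder.
events_path = "/mnt/lake/events"

# Time travel: read the Delta table as it looked at an earlier version.
events_v0 = spark.read.format("delta").option("versionAsOf", 0).load(events_path)
print(events_v0.count())

# Schema enforcement: Delta rejects appends whose schema does not match the table,
# so a mismatched DataFrame raises an error instead of silently corrupting data.
bad_rows = spark.createDataFrame([(1, "oops")], ["unexpected_id", "unexpected_col"])
try:
    bad_rows.write.format("delta").mode("append").save(events_path)
except Exception as err:
    print(f"Write rejected by schema enforcement: {err}")
```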
-
Data Engineering 101: Day 12 - Delta Today we have an incredible resource for all data enthusiasts, engineers, and architects out there - "Data Engineering 101 - Delta". This comprehensive guide dives deep into Delta Lake, covering everything from the basics to advanced concepts with practical PySpark examples. Happy Learning! Follow Shwetank Singh for more learnings... #dataengineering #upskill #gritsetgrow #interviewprep #data #interview #basics #Spark #Databricks #delta
-
Mastering the art of data engineering starts with strong fundamentals and a vision for the future. 🚀 On Day 3 of our Data Engineering series, we’re diving into the Essential Skills every data engineer needs—from SQL and Python to cloud platforms and orchestration tools. Let’s build the foundation for success! 💡 #DataEngineering #TechSkills #BigData
-
Error Handling in Data Ingestion: Saving Your Pipelines from Chaos
Data ingestion pipelines are like road trips—one unexpected bump, and the whole journey can go off track. Effective error handling ensures your pipeline stays on course, even when things go wrong.
Logging: Use structured logging (e.g., log4j or Python’s logging module in Databricks) to capture detailed error messages and pipeline events. These logs are your GPS for troubleshooting.
Retry Mechanisms: Set up retries for transient errors like network issues or temporary file locks. In Databricks, you can configure retry policies in workflows to automatically handle these hiccups.
Dead Letter Queues: For unprocessable records, route them to a dead letter queue (DLQ) for review later. This avoids halting the pipeline for a single bad record.
Pro Tip: Always validate data at the source to catch common errors early—because no one wants to debug a million-row ingestion job to find one pesky NULL.
#DataEngineering #ErrorHandling #DataPipelines #AzureDatabricks
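A minimal, hypothetical sketch of these ideas in a Databricks notebook: Python logging, source validation that routes bad records to a DLQ path, and re-raising failures so the workflow's retry policy can handle transient errors. The paths and the order_id validation rule are assumptions.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingestion")

# Hypothetical locations: raw source, curated Delta target, and a dead letter queue.
RAW_PATH = "/mnt/raw/orders/"
TARGET_PATH = "/mnt/curated/orders"
DLQ_PATH = "/mnt/dlq/orders"

def ingest() -> None:
    # 'spark' is the SparkSession provided by the Databricks notebook.
    df = spark.read.json(RAW_PATH)

    # Validate at the source: records missing the key go to the DLQ
    # instead of failing the whole run.
    bad = df.filter("order_id IS NULL")
    good = df.filter("order_id IS NOT NULL")

    bad_count = bad.count()
    if bad_count > 0:
        log.warning("Routing %d unprocessable records to the DLQ", bad_count)
        bad.write.format("delta").mode("append").save(DLQ_PATH)

    good.write.format("delta").mode("append").save(TARGET_PATH)
    log.info("Ingestion finished successfully")

try:
    ingest()
except Exception:
    # Log and re-raise so the Databricks workflow's retry policy can handle
    # transient failures (network blips, temporary file locks, etc.).
    log.exception("Ingestion failed")
    raise
```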
-
Still a couple of days to take advantage of this 26% OFF offer and get yourself a book with a comprehensive discussion of modern Data Engineering and how to apply it with dbt, including a sample project with dbt Cloud and Snowflake. dbt and Snowflake have been shaping the modern data engineering landscape: they bring the best practices of software engineering to data, and they make the most powerful data platform, the same one powering innovative workloads in many Fortune 500 companies, available to anybody (including individuals and small companies) with no upfront investment. The book covers the high-level topics and then delves into the more technical details, down to code you can take and bring to your own projects.
😱 The data engineering race will play out, but one thing is certain: data tools that incorporate software engineering best practices into the data engineering field will continue to outgrow those that fail to evolve, and dbt is one of them.
That is why Roberto Zagni has authored "Data Engineering with dbt", a hands-on guide for all data professionals who want to master dbt skills!
The book was also reviewed by Kent Graziano; check out his review in the graphic below.
😱 What are you waiting for? Go buy it using the links below 🛒
👉 Amazon links:
🔸 India: https://amzn.in/d/5iQOFDY
🔸 United States: https://a.co/d/0Wvhdpg
🔸 United Kingdom: https://amzn.eu/d/dfd90Fr
🏆 Kudos to the Packt team: Reshma Raman Govindan Kurumangattu Joseph Sunil Kirti Pisat Farheen Fathima Aparna Bhagat Apeksha Shetty Chayan Majumdar Vandita Grover Pratik Parikh Nivedita Singh Abdur Rahman
🏅 Huge thanks to the technical reviewers for their time and valuable suggestions: Daniel Joshua Jayaraj Suresh Rathnakumar Hari Krishnan Umapathy Naresh Kumar
#data #dataengineers #dataengineerjobs #dataengineering #datanalytics #dataloading #datatransformation #dbtlabs #datavault #datamastery #learning #ebooks #books #publishing
-
🚀 Master the Essentials of PySpark! 🔥 I've been diving deep into PySpark Fundamentals, and I'm excited to share some core takeaways that are essential for data professionals. From understanding RDDs and DataFrames to mastering transformations and actions, PySpark simplifies big data processing with unparalleled efficiency. This framework is perfect for tackling both batch and streaming data seamlessly. 🌟 💡 If you're looking to enhance your data engineering skills or transition into the big data world, PySpark is a must-learn! #PySpark #BigData #DataEngineering #LearningJourney #TechSkills
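As a quick illustration of the transformation-versus-action distinction mentioned above, here is a small self-contained PySpark example; the sample data is made up.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pyspark-fundamentals").getOrCreate()

# A tiny DataFrame standing in for a real dataset (made-up values).
df = spark.createDataFrame(
    [("electronics", 120.0), ("books", 35.5), ("electronics", 80.0)],
    ["category", "amount"],
)

# Transformations are lazy: filter/groupBy/agg only build up an execution plan.
totals = (
    df.filter(F.col("amount") > 50)
      .groupBy("category")
      .agg(F.sum("amount").alias("total_amount"))
)

# Actions trigger execution and return or display results.
totals.show()
print(totals.count())
```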
-
If you are wondering what data engineers do in day-to-day work with Databricks and PySpark, here is a list of the top things you should know.
1. Different types of compute and clusters
2. Different ways to run a notebook from another notebook, and the differences between them
3. Creating and using widgets
4. Creating workflows
5. Reading from different file formats: CSV, JSON, Parquet, Delta, and databases
6. Most used transformations: adding columns, aggregates, joins, filters
7. Writing DataFrames to external locations like a data lake or a database
8. Creating and managing Delta tables
9. Using different PySpark SQL functions
10. Using Unity Catalog
Follow me, Yusuf Didighar, as I bring you unique and in-depth content on data analytics and Azure data engineering. Glad to contribute to help the community; let's learn and grow!
#databricks #pyspark #dataengineering
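A compact, hypothetical notebook sketch touching a few of these points (widgets, reading common formats, a transformation chain, and a Delta write); all paths and column names are placeholders rather than a recommended layout.

```python
from pyspark.sql import functions as F

# 3. Widgets: parameterize the notebook ('dbutils' and 'spark' are provided by Databricks).
dbutils.widgets.text("run_date", "2024-01-01")
run_date = dbutils.widgets.get("run_date")

# 5. Reading from different file formats (placeholder paths).
orders_csv = spark.read.option("header", True).csv("/mnt/raw/orders.csv")
events_json = spark.read.json("/mnt/raw/events/")
sales = spark.read.format("delta").load("/mnt/curated/sales")

# 6. Common transformations: add a column, filter, join, aggregate.
daily = (
    sales.withColumn("run_date", F.lit(run_date))
         .filter(F.col("amount") > 0)
         .join(orders_csv.select("order_id", "region"), on="order_id", how="left")
         .groupBy("run_date", "region")
         .agg(F.sum("amount").alias("total_amount"))
)

# 7. / 8. Writing the result to an external Delta location.
daily.write.format("delta").mode("overwrite").save("/mnt/gold/daily_sales")
```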
-
Embarking on the path to becoming a certified Spark developer with Databricks. Stay tuned for regular updates on this thrilling learning journey! Instead of traditional note-taking, I've decided to share my learning journey through daily Medium articles! Expect a fresh post every day (I’ll try), documenting insights, challenges, and triumphs. Let's embark on this learning adventure together! P.S.: If you have any useful resources or insights to share on data engineering, please drop them in the comments or DM! #DataEngineering #TechJourney #ContinuousLearning #MediumArticle #LinkedInConnect
-
I’m thrilled to announce that I’ll soon start sharing my real-time experiences and learnings around Data Engineering while working with dbt (Data Build Tool). As I dive deeper, I’m eager to contribute back to the community and help others navigate the world of analytics engineering. Stay tuned for insights on:
• Best practices
• Common challenges (and solutions!)
• How dbt is shaping modern data workflows
I also plan to share about GCP, Airflow, BigQuery, and more. Looking forward to learning and growing together! #AnalyticsEngineering #DataEngineering #DataTransformation #LearningJourney
Comment from a Data Engineer @GetYourGuide (3w): Shaurya Sood good job 🤙