From the course: Fundamentals of Data Transformation for Data Engineering

What we'll cover and what you should know - SQL Tutorial

- [Instructor] So before we jump in, it's important to talk about where data transformation is today. And the first point I'd like to make is that data is ever-evolving, and that means that new libraries, tools, frameworks, and even data backends are always emerging. And that's pretty evident when we look at things like new databases, trends in LLMs and AI, which are largely changing how we do most of our jobs today, and even methods for data transformation and analysis. But there are some things that stay the same, and really those things are the patterns and intuition for taking data from source to insight. So even though our workflows might change, even though the sources and destinations we use might shift, the patterns that we're going to use for extracting data, transforming it, loading it to a destination, and then performing additional analysis, or even building machine learning models or LLMs, those patterns largely stay the same over time.
And so it's with that context that I want to talk about the course a little bit. When we look at the tools that we'll be learning, we're going to frame those in concepts, patterns, and methods for data transformation. Because who knows, right, (chuckles) you might not be using pandas or SQL forever to transform data, even though those are two very dominant methods today. And the fact of the matter is that tools change often, but outcomes change less. And that's what I'm trying to get at in this course welcome. And so our goal is to give you a basis for transforming data regardless of what tools you're using. It's really more about critical thinking and pattern matching than anything else. And so that's going to happen with SQL and pandas here because they are extremely popular and widely adopted. But what you learn can be applied to any language, whether that's Rust, whether that's Polars in Python, or whatever new data transformation framework emerges in the next few years.
And so I want to cover some expected knowledge, what you should be bringing to this course, maybe some background that you've had before. So this is an intermediate-level course, and that means we expect you to understand a bit about the concepts but not to be an expert. So ideally you'll have some familiarity with SQL: the basics of query structure, for example, what a SELECT statement is, how a typical query is formatted, as well as an understanding of left and inner joins. Now, that should also include some knowledge of Python, for example, what variables and functions are, how a typical Python file works. This entire course is actually going to be in Jupyter Notebooks that are hosted in GitHub Codespaces. So you'll need an account on GitHub. Ideally, you'll understand, you know, how to interact with a Jupyter Notebook, and you should also be familiar with pandas. So maybe know what a DataFrame is, how to select some data. Very basic stuff. Just understanding what some of the fundamentals are so when we jump right into transforming data, you don't feel lost.
Now, if that's not the case, don't worry. Just brushing up on these concepts a little bit should be enough to help you get started. But this course is going to move pretty quickly. And so with that in mind, here are some resources if any of those concepts seem unfamiliar. The Data School has a really great Learn SQL page, there's LearnPython, and you can check out some pandas basics there. And lastly, throughout the course, the pandas docs and the DuckDB docs are really great resources too. And that's something I'd love to emphasize as well. If you ever get lost: first, check the documentation; second, perform a Google search, as basic as that sounds, or check Stack Overflow. These are really good ways to learn. And finally, ask LLMs, because that's kind of the future.
So let's talk a little bit about the course structure if all of that sounds good. First, we're going to start with an introduction to the course.
Next, we're going to jump into transforming data with SQL. Specifically, we'll be using DuckDB, which is an in-memory analytical database that's column-oriented and optimized for the operations we typically use in data transformation. Once we're done with SQL, there will be a challenge at the end of that as well. We'll move on to pandas, and we'll go into the basics of data transformation with Python and pandas that will very closely mirror our work in SQL. So my goal for this course is to show you a bunch of ways to transform data with SQL, and then do pretty similar transformations with pandas so you get an idea of what's good in terms of SQL transformations, what you prefer when you're operating on data with SQL, and maybe the equivalent in Python and pandas. And I think that comparison will show you when SQL shines, when pandas shines, and where you can use the most effective method for transforming data. Because often, as a practitioner, you find yourself switching between SQL and pandas. Sometimes data lives in a database, and you need to write SQL to get it out. Other times, the opposite's true, right? Data's in the cloud somewhere, and you need to use Python to extract it, and then you can manipulate it with whatever language you choose. Finally, we'll wrap up with a conclusion. We'll talk about some next steps and what you can do to continue your data transformation journey.
Now, for each lesson, that is, two and three on the preceding slide, so for SQL and pandas, there'll be a familiar structure. Each lesson's going to have between eight and 10 videos on data transformation, on different methods of transforming data. Some will be more introductory, others will be more advanced, but we're going to walk through things in a fair amount of detail. The videos are exploratory in nature. That means we're going to walk through analyzing a data set to achieve an objective or answer a question.
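As a rough illustration of the SQL-then-pandas mirroring described above, here is a minimal sketch of one transformation written both ways: a left join followed by a grouped sum. It rests on several assumptions that are not from the course. The `customers` and `orders` tables, their columns, and their values are invented for this example. Python's built-in `sqlite3` module also stands in for DuckDB so the sketch runs with only pandas installed. The query uses plain SQL that should read much the same in DuckDB, but it has not been checked against the course notebooks.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for illustration (not from the course).
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ada", "Grace", "Linus"]}
)
orders = pd.DataFrame(
    {
        "order_id": [10, 11, 12],
        "customer_id": [1, 1, 2],
        "amount": [20.0, 30.0, 15.0],
    }
)

# --- SQL version (sqlite3 used here as a stand-in for DuckDB) ---
conn = sqlite3.connect(":memory:")
customers.to_sql("customers", conn, index=False)
orders.to_sql("orders", conn, index=False)

sql_result = pd.read_sql_query(
    """
    SELECT c.name,
           COALESCE(SUM(o.amount), 0.0) AS total_amount
    FROM customers AS c
    LEFT JOIN orders AS o
           ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """,
    conn,
)
conn.close()

# --- pandas version of the same transformation ---
pandas_result = (
    customers.merge(orders, on="customer_id", how="left")  # LEFT JOIN
    .groupby("name", as_index=False)["amount"]             # GROUP BY name
    .sum()                                                  # SUM, skipping missing values
    .rename(columns={"amount": "total_amount"})
)

print(sql_result)
print(pandas_result)
```

Both versions keep every customer, including the one with no orders, because the join is a left join. In the SQL version the missing total has to be handled explicitly with `COALESCE`, while the pandas grouped `sum()` skips missing values by default, which is the kind of small difference a side-by-side comparison tends to surface.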
And at the end of each lesson, so once we're done with all of SQL, and then again once we're done with pandas, there'll be a challenge or an exercise for you to push yourself with. And the materials follow directly from the lesson, so you shouldn't worry about anything not being in there. And all of the code in the videos is also available in the course repo. So you can check that out at any time. There'll also be guided solutions to these exercises available.
And finally, I want to talk a little bit about how to be successful in this course. So first, there's a focus on critical thinking and not taking notes or memorizing what we're discussing. So when you watch the videos, ideally, really engage with what we're talking about and try to follow along with how we're approaching the problem. Don't worry about memorizing any code or anything, 'cause the code will always be there. And second, feel free to play around with that code as you go along. So tinker with the examples, ask your own questions. The course is set up in such a way that you can write arbitrary code in between the cells in these Jupyter Notebooks. So if you're following along in a notebook and something's interesting to you, pause the video, play around, write a query, visualize some data, you know, have fun with it. That's the whole point. And as I mentioned earlier, the internet, Stack Overflow, and ChatGPT or Claude, they're your friends, right? Don't limit the information you have access to, because when you're working, when you're doing your own projects, you're going to be using the internet. I use the internet all the time to help me out with syntax that I forget. I use ChatGPT all the time because it makes the whole process easier and it helps you learn more.
And lastly, there isn't much handholding, but that's sort of the point. I have a lot of belief specifically in people who take the initiative to take courses like this. So congratulations. But I know that you can do it.
And even if you get to a point where you feel stuck, even if you get to a point where you feel like things are difficult, I would really encourage you to give it your best and try to press through because that's where personal growth happens. So with all of that, we'll move on and get started with the course.
