From the course: Data Wrangling in Excel with Power Query

Data wrangling steps

Hi everyone, Jordan Goldmeier here. We are going to start at transaction data start here so you can go ahead and find that in your project files. What I'm going to do is give you a tour of the data that we were sent in. So you remembered from the last module the workflow we went through. So let me just show you this data set that's been sent in. So first thing I see here is we see first name and last name. So we have these headers and these seem to be inconsistent with their underscores with the rest of the data set. So we have email here, city, I'm looking at street number, then I see credit card here. It's shown to me in scientific notation, but I probably want to see the credit card. I see here now we see street name, street suffix. So probably we want to make columns E, G and H all one column. I see over here job title. That's good. We have a job category that's not filled in, so we probably want to get rid of that. And then if we want to add fields over here, we'd have to add them manually, right? So at this point, to get our data into a Power Query queryable format, and this is specific to this problem case. So we've opened it up in Excel, right? We've opened that data in Excel. You'll see in the next one we can actually connect to files that we don't open in Excel. But once we open in Excel, we need to tell power query a way to get access to it. And the way to do that is to use an Excel table. Table with a capital T If you've not used Excel tables before, I have a whole chapter on it or a module as it were, in Excel Dashboard Pro. So make sure you go ahead and watch that module. It'll tell you everything you need to know. But if you're comfortable moving forward here, let's go ahead and do it. So what I'm going to do is I'm going to hit Control T is going to insert a table here. I can also go to insert and hit table. So I've gone to the insert tab. You see table right there. I could do that. I could also do Control T, which is my favorite thing to do. So Control T, my table has headers. I'm going to go ahead and hit okay. We've turned it into a table. Now, I would love if I were to be lazy and to be lazy here to call this table one. But that's tacky and I swore an oath to never do that, so I have to call it something. So let's call it data set. I think that's the easiest thing to call it. Why not? So we're going to call it data set and we've turned into a table. And so now that it's as an Excel table, we are ready. We can go to this data tab here and we can click from table slash range like that. And now we're in. So if you've never seen Power Query before, we're going to just take it one step at a time. And what I'm going to do is I'm going to show you how to solve some of the problems that I'm solving with it. And then at the end of this module, I'm going to kind of take you on a tour of some of the features. But really, what I want you to do here is please do follow along if you'd like, but also you can watch and take notes. But really, I just want you to understand how we go from our first state, which is this kind of junky data set to doing and cleaning and wrangling all the things that we want in it. So I want to show you how easy it is for Power Query to do this. If you've been around the Microsoft Excel landscape or I should say the Microsoft landscape for a while, what you'll see is that these actions, as I showed you in the last one, were done by other tools. You could do them formulaically. Many times people used Microsoft Access for it. So this does replace part of what Microsoft Access does and a lot of people used SQL for it. Nowadays they might use R. All of it is just doing the same thing of data wrangling. So with that, let's just take a look at how easy some of these things are. So step one, I want to change the headings, right? So I'm looking at first name as lowercase, last name is kind of lowercase. Look at all the rest of these emails, city, street number. These are all fine. So I would like to change first name and last name to be consistent. So what can I do here? I'm going to double click first name. I just double clicked it in there. First name. I'll go ahead and hit enter on that. So now you see over here it says renamed columns and you might not have the formula bar showing on yours. We'll talk about that at the end of this module. So I'm going to double click over here and type in last name. I'll go ahead and hit enter. So we got first name, last name, now it's consistent. And over here I have a step. It says renamed columns. Well, boy, isn't that interesting? So we're going to talk about the source and the change step at the end of this module, but I just want you to ignore those for now. That's something that Excel did for you because it was trying to help you out. And we'll see at the end of this lesson, or I should say at this end of this entire course, it's not always helpful. So it's kind of something to watch out for. Okay. So we changed first name, we changed last name. What is our next step? The next step is we want to remove nulls from the last name field. So again, in Excel we might apply an advanced filter and then filter out the nulls. Here we can sort of think logically the same way. So we have first name, last name. Look at all these little dropdowns here. Right? So what I could do is maybe I could click on this dropdown. Oh, doesn't that look familiar? It's a filter, right? So right now you see it says select all, there's a null. There's a true. I don't even know what that true data type is, so I'm going to get rid of that because it looks junky and then I'm going to get rid of the nulls because we're not going to do work on the nulls. Something did not go correctly with that. Okay. So the next thing I see here is that street number and street name and street suffix are all in different columns and the credit card sort of butts into all of them. And maybe I don't want credit card there. So let's think. In this next step, I want to move credit cards. So I'm going to take this and I'm going to click and hold it so now I can move it to wherever I want. I'm going to move it right to after email, but before city. But there's no right answer necessarily on that. You can move it to wherever you want. I just want you to think if I were running the show, where would I move it? So I'm looking at credit card right now. It's in scientific notation. I perhaps want it to be text. So how can I convert it to text really easily? Look across the headers of all of these fields. You see that there's like an A,B,C here, a one, two, three here. So I'm in credit card. I'm going to put my mouse over the one, two, three. And you see I have this list of different types of fields. So in this case, I'm going to turn it into not anything that's a number here. I'm just going to turn it into a text. So you see text right there, I'll click that. You see it said changed type. And you know what? I add that type one, don't worry about that. So we change the type. Right. Now it's proper credit card field and I'm looking over here, we have street address. These are all three different columns. So perhaps I want to transform it all into one column. So I'm going to go ahead and do that. I'm going to hold shift and I'm going to select columns, street number, street name through street Suffix. So I'm just thinking one of the great things about Power Query is it really is drag and drop. So just think about the workflow that you would have to do in an ideal world if you could just do this thing. If you just take three columns and just merge them together easily. Well, that's what Power Query allows you to do. So I've told Power Query what I'm most interested in. I'm going to go and click transform here and the next thing I would like to do is merge these all together. So I'm looking in the table group don't see anything that's really interesting here. I see there's all this interesting stuff in the column group, but it's not really what I'm looking for. What am I trying to do? Well, I said it. Right. In the text column you see this thing that says merge. So whenever if you're trying to learn Power BI or excuse me. So if you're trying to learn Power Query, Microsoft has named things as sort of key phrases by the actions that they do. So if you just think what do I want to do, chances are there's a button that allows you to do it. So I'm going to go to transform. I'm going to click merge columns like that. And now it's asking us, Well, do you want a separation when we merge everything together? And of course we do, right? We want a space. So we're going to click on this dropdown and click space and then it's giving us the chance to rename this next column. So it's merged is what it's called here, but perhaps I want to call it something else. I want to call it address like that. So I'll type that in, go ahead and click okay. And you see now it's changed it to address, right? So you can see here that it's also actually added these steps. So the next thing I want to do is actually remove this thing that says job category because I don't really need that, right? So I'm going to right click that. Wouldn't it be great if you could just right click a column and hit delete? Well, you can in Power Query. So we're going to right click that. You see something here. It says remove. You can click remove on that. And now we're done. We're done with these steps, right? And since we're done, we can go to file and we're going to click close and load to like that. So what that does is it tells us it allows us to load it somewhere within Excel. So we're not going to get too complicated here. We're just going to put it into a table like that and you can go ahead and ignore this. That's just kind of an old query. But notice in here, notice in here that this is now our updated data set. So someone gave us a whole bunch of work and then we applied Power Query and it fixed it up for us, right? So we have our transaction data. That was our first data. This is the data we put in. So I'm going to take that and drag it to the left to just kind of think in terms of orders here. So I'll type in one transaction data here. And then here I'm going to type in two final data. So basically Power Query is going to get us to our final state. So what does that mean in terms of saving us work time? Well, let's think about that for a second. Right? So we saw that there were a lot of last name fields that have nulls in them, right? So let's say we get a new set of data and I'm okay with repeating data here. So I'm going to go to the bottom of the list here and I'm going to highlight 991 through 1001. You can just sort of watch this one. You don't need to follow along, but feel free if you want to. I'm going to take that. And if I paste it right down below here in this part here, you'll see that the table grew to consumer, which is why we love Excel tables. And let's say though all of these new people came in without last names. Okay. Except for one, they came in all without last names. And just to make this record a little bit different, I'm going to change it to Roxy. Okay. So we have this new data set. I added it to the bottom. Do I need to open Power Query and run it again? The answer is no, right? This is where it gets really cool. So if I go back to final data here and I'm in here, I've updated my original transactional data and I'm just going to go to the last record here and I go, I'm going to click in my table, go to table design, and I click refresh. You'll see it just added Roxy Rosalina. And remember, it didn't add the rest of them because there were no last names. It filtered those out. So basically you could just have this grow add infinitum, as it were, just to keep growing and growing. Assuming you have data that's not going to take over Excel, and then you could just literally run that query every single time once you got it right. Now, let's say you needed to edit it. Well, what could you do? You could just double click in here and actually do your edits. So that will take us into the next module.

Contents