From the course: Learning Amazon SageMaker (2019)

Data summary tools

- [Instructor] In the previous example, we walked through a number of simple ways to create visualizations in a Jupyter Notebook, but you can also create a number of data summary outputs, which are raw summaries of the pandas DataFrame. There are two main tools for this, and the first is the describe function. Again using the churn DataFrame that was imported earlier, you can run the describe function straight from it, and that will take all the numerical columns and generate a count plus some basic statistics: the mean, standard deviation, min, max, and quartiles. You can review these to see how the data is shaped. Depending on the number of columns, I sometimes find it quite difficult to see what's happening from this point of view, so after the describe function, if you add .T (capital T), that will transpose the output so you can start scrolling down. The first check that I like to look at is, are the counts all the same? Usually if the counts are…
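As a rough illustration of the describe and transpose steps, here is a minimal sketch. The `churn` DataFrame below is a small hypothetical stand-in, since the course's actual dataset is not shown in this excerpt; its column names and values are assumptions made only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's churn DataFrame (assumed columns and values).
churn = pd.DataFrame({
    "account_length": [128, 107, 137, 84, 75],
    "day_mins": [265.1, 161.6, np.nan, 299.4, 166.7],  # one missing value on purpose
    "cust_serv_calls": [1, 1, 0, 2, 3],
    "state": ["KS", "OH", "NJ", "OH", "OK"],  # non-numeric, skipped by describe() by default
})

# Summary statistics for the numeric columns: count, mean, std, min, quartiles, max.
summary = churn.describe()

# Transpose so each original column becomes a row, which is easier to scan
# when the DataFrame has many columns.
summary_t = summary.T
print(summary_t)

# First check: are the counts all the same? describe() counts only non-null values.
counts = summary_t["count"]
counts_match = counts.nunique() == 1
columns_with_gaps = counts[counts < len(churn)].index.tolist()
print("All counts equal:", counts_match)
print("Columns with fewer non-null values than rows:", columns_with_gaps)
```

In this sketch the count for `day_mins` comes out lower than the number of rows, because `describe` counts only non-null values in each column.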