From the course: Machine Learning Foundations: Statistics
The quantiles and box plots - Python Tutorial
- [Instructor] We have explored percentiles using SAT and IQ examples. For example, if your result is in the 80th percentile on a test, it means you scored better than 80% of the students who took the test. Quartiles are connected with percentiles. We can break every set of numerical data into quartiles, which is just a fancy name for four equal-sized segments that each contain a quarter, or 25%, of the data. We can represent their connection visually with the following table. Q1, or the end of the first quartile, is the 25th percentile. So at Q1, we have 25% of the data points below that point. Q2, or the end of the second quartile, is the 50th percentile, or median. 50% of the data points are below that point and 50% of the data points are above that point. Q3, or the end of the third quartile, is the 75th percentile. So at Q3, we have 75% of the data points below that point.

In order to represent a data set split into quartiles easily, we use box plots. A box plot, or box-and-whisker plot, is a chart that visualizes how the given data is distributed using quartiles. It shows the minimum, maximum, median, first quartile, and third quartile of the data set. We use box plots to understand the distribution of the data, to identify outliers, and to determine whether our data is skewed. Outliers, or anomalous data points, are data points that differ significantly from most of the other points in the data set.

Now let's see an example of a box plot. With the help of numpy and matplotlib, we can easily create box plots. Let's open our Jupyter Notebook 03_01 and take a look at our box plot. Just press Shift plus Enter to run the code. In the box plot, a box is drawn from the first quartile to the third quartile. A line also goes through the box at the median. In matplotlib's default vertical orientation, that line is horizontal, the x-axis identifies the data sets being plotted, and the y-axis shows the data values. The boxplot function offers several ways to customize the box plot. We are using just a few of them, such as notch equals True, which gives the boxes a notched shape, and patch_artist equals True, which fills the boxes with color, so that we can give different boxes different colors. You can find out more about box plots on the matplotlib documentation page.
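The notebook's code isn't reproduced in this transcript, so here is a minimal sketch of what a box plot like the one described could look like. The sample data, the colors, and the variable names are assumptions made for illustration. Only the numpy and matplotlib libraries and the notch=True and patch_artist=True options come from the walkthrough. The sketch also computes the quartiles directly, so you can compare the numbers with the box that gets drawn.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data (not from the course notebook): three groups of
# normally distributed values with different centers and spreads.
rng = np.random.default_rng(seed=42)
groups = [
    rng.normal(loc=100, scale=10, size=200),
    rng.normal(loc=90, scale=20, size=200),
    rng.normal(loc=80, scale=30, size=200),
]

# Q1, Q2 (the median), and Q3 are the 25th, 50th, and 75th percentiles.
q1, q2, q3 = np.percentile(groups[0], [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# With matplotlib's default whisker setting (whis=1.5), points more than
# 1.5 * IQR beyond the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = groups[0][(groups[0] < lower_fence) | (groups[0] > upper_fence)]
print(f"Q1={q1:.2f}, median={q2:.2f}, Q3={q3:.2f}, outliers={outliers.size}")

fig, ax = plt.subplots(figsize=(8, 5))

# notch=True draws notched boxes; patch_artist=True makes each box a
# fillable patch so it can be given its own color.
box = ax.boxplot(groups, notch=True, patch_artist=True)

colors = ["lightblue", "lightgreen", "salmon"]  # arbitrary choice
for patch, color in zip(box["boxes"], colors):
    patch.set_facecolor(color)

ax.set_xlabel("Group")
ax.set_ylabel("Value")
ax.set_title("Box plot with notches and filled boxes")
plt.show()
```

In this sketch, each box runs from Q1 to Q3 on the y-axis, the line inside the box marks the median, and any points drawn beyond the whiskers are the outliers.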