From the course: Data Wrangling in R

Identifying and removing outliers

- [Instructor] Now I want to check the water quality dataset for any outliers that might represent errors requiring correction. I'm going to begin by doing a quick and dirty scatter plot of all of the data. I'm going to use the ggplot function applied to the filtered water dataset, and set my variable mapping so that we have sample time on the x-axis and the result on the y-axis. And then I just ask for a quick and dirty scatter plot with geom_point. Now there's one point here that's clearly completely off the mark. There's no way that a temperature or pH could be over 1 million. So let's dig into that and take another look. I'm going to filter this dataset so that it only shows cases where the result is over 1 million. And when I look at that, I see that it is a water temperature measurement, where the temperature is recorded as over a million degrees Celsius. And unlike some of the earlier data issues I've faced, I don't know…
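
The two steps described in the transcript, plotting everything and then filtering for the implausible value, might look roughly like the sketch below. The data frame name `water` and the column names `sample_time`, `parameter`, and `result` are assumptions chosen for illustration, and the four rows are made-up stand-in values, not the course data; only `ggplot`, `aes`, `geom_point`, and `filter` come from the narration.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the filtered water quality dataset.
# The real dataset's column names and values may differ.
water <- data.frame(
  sample_time = as.POSIXct(
    c("2016-01-05 09:00:00", "2016-02-09 09:30:00",
      "2016-03-08 10:00:00", "2016-04-12 09:15:00"),
    tz = "UTC"
  ),
  parameter = c("Water Temperature", "pH", "Water Temperature", "pH"),
  result = c(18.5, 7.4, 1500000, 7.1),  # third value is a planted outlier
  stringsAsFactors = FALSE
)

# Quick scatter plot: sample time on the x-axis, result on the y-axis.
# A single extreme value will flatten every other point against the axis.
outlier_plot <- ggplot(water, aes(x = sample_time, y = result)) +
  geom_point()
print(outlier_plot)

# Keep only the rows whose result is over 1 million to inspect them.
suspect_rows <- water %>%
  filter(result > 1000000)
print(suspect_rows)
```

Looking at the filtered rows before changing anything keeps the decision about how to handle the value separate from the step that finds it.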
