From the course: Data Wrangling in R
Identifying and removing outliers
- [Instructor] Now I want to check the water quality dataset for any outliers that might represent errors requiring correction. I'm going to begin with a quick and dirty scatter plot of all of the data. I'm going to use the ggplot function applied to the filtered water dataset and set my variable mapping so that we have sample time on the x-axis and the result on the y-axis. Then I just ask for a quick and dirty scatter plot with geom_point. Now, there's one point here that's clearly completely off the mark. There's no way that a temperature or pH could be over 1 million. So let's dig into that and take another look. I'm going to filter this dataset so that I only show cases where the result is over 1 million. When I look at that, I see that it is a water temperature measurement, where the temperature is recorded as over a million degrees Celsius. And unlike some of the earlier data issues I've faced, I don't know…
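A minimal sketch of the plot-then-filter check described above, assuming the dplyr and ggplot2 packages. The data frame name `filtered_water` and the columns `sample_time`, `parameter`, and `result` are hypothetical stand-ins for the course's filtered water dataset, and the four rows are made-up toy values, not course data. The final removal step is one plausible approach (dropping rows above the 1 million cutoff), not necessarily the one the course uses.

```r
# Assumes the dplyr and ggplot2 packages are installed.
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the filtered water dataset;
# column names and values are illustrative assumptions, not course data.
filtered_water <- data.frame(
  sample_time = as.Date(c("2020-01-01", "2020-01-02",
                          "2020-01-03", "2020-01-04")),
  parameter = c("WaterTemp", "PH", "WaterTemp", "PH"),
  result = c(12.5, 7.8, 1500000, 7.6)
)

# Quick and dirty scatter plot: sample time on the x-axis, result on the y-axis.
p <- ggplot(filtered_water, aes(x = sample_time, y = result)) +
  geom_point()
print(p)

# Show only the cases where the result is over 1 million.
suspect <- filtered_water %>%
  filter(result > 1000000)
print(suspect)

# One possible way to remove the outlier (an assumption, not necessarily
# the course's approach): keep only rows at or below the 1 million cutoff.
cleaned_water <- filtered_water %>%
  filter(result <= 1000000)
```

Note that `filter()` drops any row where its condition evaluates to `NA`, so this removal step would also discard rows with a missing `result`; adding `| is.na(result)` to the condition keeps them.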
Contents
- Understanding the water quality dataset (1m 32s)
- Reading in the water quality dataset (1m 35s)
- Filtering the water quality dataset (5m 17s)
- Water quality data types (3m 2s)
- Correcting data entry errors (2m 43s)
- Identifying and removing outliers (3m 42s)
- Converting temperature from Fahrenheit to Celsius (2m 20s)
- Widening the water quality dataset (4m 33s)