From the course: Certified Analytics Professional (CAP) Cert Prep

Cleaning, transforming, and validating data

- [Instructor] Cleaning, transforming, and validating data is necessary, mainly because you cannot always dictate how data is collected and stored. Sometimes a dataset already exists in the form of a preexisting database. That data might have been collected for other purposes. It's also possible that you don't have full control over the inputs coming into a data-capturing mechanism. Take open-ended survey questions. Survey takers can provide whatever responses they want, regardless of what the survey's creators intended. There are a couple of important concepts to understand in data cleaning, transformation, and validation. Technically correct data is the first one. It means that each data value is correctly stored under its intended variable. My last name, Ru, sometimes ends up in the first name column of a mailing list, and I get an email that addresses me by the wrong name. That is an example of technically incorrect data. The second concept is…
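
As a rough illustration of the "technically correct data" idea described above, the following Python sketch checks whether each value looks like it belongs under its variable. Everything in it is an assumption made for illustration: the sample rows, the field names, the `RULES` patterns, and the `misplaced_fields` helper are hypothetical and do not come from the course.

```python
import re

# Hypothetical mailing-list rows; field names and values are illustrative only.
records = [
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"},
    {"first_name": "grace@example.com", "last_name": "Hopper", "email": "Grace"},
]

# Assumed rules: one simple pattern per variable describing what a value
# stored under that variable should look like.
RULES = {
    "first_name": re.compile(r"[A-Za-z][A-Za-z' -]*"),
    "last_name": re.compile(r"[A-Za-z][A-Za-z' -]*"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def misplaced_fields(record):
    """Return the names of fields whose value does not fit that field's rule."""
    return [
        name
        for name, rule in RULES.items()
        if not rule.fullmatch(record.get(name, ""))
    ]


# Report every row that has at least one suspicious field.
for i, rec in enumerate(records):
    problems = misplaced_fields(rec)
    if problems:
        print(f"row {i}: check {problems}")
```

Format checks like these only catch values that cannot belong to a variable, such as an email address in a name column. A last name stored in the first name column, as in the instructor's example, still looks like a valid name, so catching it would need a comparison against a trusted reference source or a manual review.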
