In-class Activities: Week 4
Topic 1: Intro to Reproducibility (10 min)
- Why should we practice ‘reproducible research’?
Breakout (15 min)
- Program my lunch
Returning results (30 min)
Break (10 min)
Coding Corrections: Intro (20 min)
The code below is written in R. You may use any scripting language you wish, but I may not be able to provide feedback on code written in other languages.
2. Download the data we will be using in class
- Open the messy data file demo_data.csv by following this link.
- Save the file to the data_raw folder.
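Keeping the raw file in its own folder, separate from your code, means anyone can rerun the cleanup from the untouched original. A minimal base-R sketch of that layout follows; the week4_project folder name is hypothetical, and it is created under the session's temporary directory only so the sketch runs anywhere. If you have the download URL, download.file(url, raw_path) saves the file straight into data_raw.

```r
# Hypothetical project root, placed in the session's temp directory so this
# sketch runs anywhere. In class, use your own project folder instead.
project_dir <- file.path(tempdir(), "week4_project")

# Create the data_raw and code folders (no warning if they already exist)
for (folder in c("data_raw", "code")) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Path where demo_data.csv should be saved after downloading it.
# The download link itself is not reproduced here.
raw_path <- file.path(project_dir, "data_raw", "demo_data.csv")
print(raw_path)
```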
3. Data Cleaning: Overview
- Characteristics of a clean data set include:
  - Free of duplicate rows/values
  - Error-free (misspellings corrected, special characters eliminated)
  - Correct data type for analysis
  - Outliers identified and dealt with appropriately
  - “Tidy” data structure
- Take your data from messy to clean in 5 steps:
  - Familiarize yourself with the data set
  - Check for structural errors
  - Check for data irregularities
  - Decide how to deal with missing values
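Before changing anything, it helps to look at the data and count the problems. A minimal base-R sketch of one way to work through the steps listed above, using a small hypothetical data frame in place of demo_data.csv; which check belongs to which step is an interpretation, not the only one. With tidyverse loaded, dplyr::glimpse() gives an overview similar to str().

```r
# Hypothetical toy data standing in for demo_data.csv
demo <- data.frame(
  site   = c("North", "North", " south", "East", "East"),
  count  = c("12", "12", "7", NA, "0"),
  height = c(1.2, 1.2, NA, 1.1, 1.4),
  stringsAsFactors = FALSE
)

# Familiarize yourself with the data set
dim(demo)      # number of rows and columns
str(demo)      # column names and data types
head(demo, 3)  # first few rows

# Structural errors, e.g. character columns whose values all parse as numbers
is_chr <- vapply(demo, is.character, logical(1))
looks_numeric <- vapply(demo[is_chr], function(x) {
  x <- x[!is.na(x)]
  all(!is.na(suppressWarnings(as.numeric(x))))
}, logical(1))
numeric_as_text <- names(looks_numeric)[looks_numeric]

# Irregularities, e.g. exact duplicate rows and stray leading/trailing spaces
n_duplicates <- sum(duplicated(demo))
has_space <- vapply(demo[is_chr], function(x) any(grepl("^\\s|\\s$", x)), logical(1))
whitespace_cols <- names(has_space)[has_space]

# Missing values: count NAs per column before deciding how to handle them
na_counts <- colSums(is.na(demo))

print(list(numeric_as_text = numeric_as_text, n_duplicates = n_duplicates,
           whitespace_cols = whitespace_cols, na_counts = na_counts))
```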
4. Data Cleaning: Practice
1. Review the Data Set
- Review the .csv file. What things do you see that need to be corrected?
- Make a list of what you think needs to be corrected and the steps necessary to identify and implement each correction. Some of the things to look out for include:
  - Numeric values stored as character data types
  - Factors stored as characters
  - Duplicate rows
  - Spelling mistakes
  - Inconsistent formatting (e.g., codes, capitalization)
  - Extra white space
  - Missing data
  - Zeros entered instead of null (missing) values
  - Special characters (e.g., commas used instead of decimal points in numeric values)
  - Column headings with spaces between words or that start with numerals
- It is often useful to make an outline of the different steps. Note that there may be different ways to do the same thing, so an outline will help you figure out which is best. For instance:
Option 1
1. Import Table 1
2. Correct column headings in Table 1
3. Import Table 2
4. Correct column headings in Table 2
5. Bind Table 1 and Table 2 together
is less efficient than
Option 2
1. Import Table 1
2. Import Table 2
3. Bind Table 1 and Table 2 together
4. Correct the column headings in the combined table
because Option 2 corrects the column headings once instead of twice.
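The checklist and the Option 2 outline above can be combined into one pass. A minimal base-R sketch, using two small hypothetical tables with the same messy headings in place of demo_data.csv. The naming rule (lower case, underscores, an x prefix for headings that start with a digit) and the assumption that 0 means "not measured" are illustrative choices; spelling mistakes are not handled here, since they usually need a lookup of correct values. rbind() matches columns by name, so Option 2 works this simply only when both tables share the same headings. In the tidyverse, dplyr::bind_rows(), dplyr::distinct(), and stringr::str_trim() play the same roles as rbind(), duplicated(), and trimws().

```r
# Two hypothetical tables that share the same messy headings
table1 <- data.frame(
  `Site Name` = c(" North", "north"),
  `2019 Mass` = c("1,5", "2,0"),
  Treatment   = c("ctrl", "Trt"),
  check.names = FALSE, stringsAsFactors = FALSE
)
table2 <- data.frame(
  `Site Name` = c("South ", "south", "East"),
  `2019 Mass` = c("3,25", "3,25", "0"),
  Treatment   = c("trt", "TRT", "CTRL"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Option 2, steps 1-3: both tables imported, now bind them together
clean <- rbind(table1, table2)

# Step 4: correct the headings once, on the combined table
# (lower case, spaces -> underscores, "x" prefix for a leading digit)
names(clean) <- tolower(gsub(" ", "_", names(clean)))
names(clean) <- sub("^([0-9])", "x\\1", names(clean))

# Extra white space and inconsistent capitalization in text columns
clean$site_name <- tolower(trimws(clean$site_name))
clean$treatment <- tolower(trimws(clean$treatment))

# Numbers stored as text with decimal commas: "1,5" -> 1.5
clean$x2019_mass <- as.numeric(sub(",", ".", clean$x2019_mass, fixed = TRUE))

# Zeros entered instead of missing values (assumes 0 means "not measured")
clean$x2019_mass[which(clean$x2019_mass == 0)] <- NA

# Store the categorical variable as a factor with explicit levels
clean$treatment <- factor(clean$treatment, levels = c("ctrl", "trt"))

# Drop duplicate rows last, once spacing and capitalization agree
clean <- clean[!duplicated(clean), ]
rownames(clean) <- NULL

str(clean)
```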
2. Import & Edit Data
- Create a new .R file and save it as cleanup_code in your code folder.
- Annotate your code with key information: sessionInfo(), your name, what the script is for, etc. Add the “Steps” to the .R file as sections with Shift-Cmd-R (Mac) or Shift-Alt-R (PC).
- Load the tidyverse library, import the data, and do some manipulations.
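A possible skeleton for cleanup_code.R that follows the steps above: a header with your name and purpose, sessionInfo() recorded, and the steps laid out as RStudio sections (comment lines ending in four or more dashes). It is a sketch in base R so it runs without extra packages, and a tiny hypothetical CSV in a temporary file stands in for demo_data.csv. With tidyverse loaded, readr::read_csv() and dplyr's group_by() plus summarise() would do the same import and summary.

```r
# cleanup_code.R
# Author:  <your name>   (placeholder)
# Purpose: clean demo_data.csv for the Week 4 in-class activity

# Step 1: Set up ----
# Comment lines ending in four or more dashes become RStudio sections.
# library(tidyverse)  # load here if installed; this sketch uses only base R

# Record the R version and attached packages so others can reproduce the run
session <- sessionInfo()
print(session)

# Step 2: Import data ----
# Hypothetical stand-in for data_raw/demo_data.csv, written to a temp file so
# the sketch runs anywhere. Point raw_path at your saved file instead.
raw_path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "North,12", "South,7", "North,5"), raw_path)
raw <- read.csv(raw_path, stringsAsFactors = FALSE)

# Step 3: Manipulate data ----
# Example manipulation: total count per site
totals <- aggregate(count ~ site, data = raw, FUN = sum)
print(totals)
```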
3. Code
- The R code we used in class can be downloaded here: Demo R Code. You can copy and paste it into your own R script.
4. Assignment
- There is no assignment to submit this week, but I expect you to run the code and be able to answer the questions at the end.