Readings: Week 3
Objectives and Competencies for this session:
- Be able to identify different categories of data
- Learn best practices for data entry
- Recognize and avoid common problems with data entry and formatting in spreadsheets
- Learn and be able to implement ‘Tidy’ format for data tables in spreadsheets
- Identify problems with dates in spreadsheets and approaches for handling them properly
- Learn how to export data from spreadsheets in an open format
Pre-Class Preparation:
Readings
- Tesi, W. 2020. An Outdated Version of Excel Led the U.K. to Undercount COVID-19 Cases. Slate. [read online] or [download pdf]
- Stolberg et al. 2020. CDC Test Counting Error Leaves Epidemiologists ‘Really Baffled’. NY Times. [read online] or [download pdf]
- Broman, K. W., and Woo, K. H. (2018). Data organization in spreadsheets. The American Statistician, 72(1), 2-10. [read online] or [download pdf] This paper is especially important; it may well be one of the more helpful papers you read while a student. Really.
- Johnson, B. D., Dunlap, E., and Benoit, E. (2010). Organizing “mountains of words” for data analysis, both qualitative and quantitative. Substance Use & Misuse, 45(5), 648-70. [read online] or [download pdf]
Computer Resources
- Please bring your computer. If you need to borrow a laptop or get access to a computer, let me know.
- You will need access to a spreadsheet program (e.g., MS Excel, LibreOffice, Google Sheets). You can use the program of your choice, but the exercises are optimized for MS Excel. UF students can download the Microsoft Office suite free of charge here; if you don’t want to install Excel, you can use the online version for free (see this tutorial video on how to do so). LibreOffice is a free and open-source package similar to (and compatible with) MS Office. It can be downloaded here. If you are using an online version (e.g., Excel from Office 365, Google Sheets), you will need to know how to save the files to your desktop, at least for the first few weeks of class.
- It is time to install R and RStudio. We won’t use them until Week 4, but it is worth installing them now to make sure they are working smoothly:
(i) R, the programming language. You can download the version of R for your computer’s operating system here; it’s free.
- If you need help, watch this video tutorial or contact me. Note that the tutorial requires that you be a UF affiliate and either on campus or logged in to the UF network via VPN.
(ii) RStudio, the interface program we use to work with R. There are other ‘environments’ for R programming, but RStudio is by far the most widely used and useful.
- We use the ‘Free Open-Source Desktop Version’, which you can download here. Choose the version for your computer’s operating system and install it as you would any other software.
- If you need help, watch this video tutorial or contact me. Note that the tutorial requires that you be a UF affiliate and either on campus or logged in to the UF network via VPN.
(iii) Verify that the installations worked by opening RStudio to see if it opens properly. If you are really motivated, you can also install the Tidyverse library by starting RStudio and typing install.packages("tidyverse") at the console.
- If you are interested in getting a jump-start on learning how to use R, or would like some refresher training, I recommend the following:
(i) The LinkedIn Learning course by Barton Poulson is an excellent introduction that assumes no prior programming experience. You can find it here.
(ii) If you prefer a “written” course, the R for Reproducible Analysis course from The Carpentries is also very good. You can find it here.
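As a complement to step (iii) of the installation instructions above, here is a minimal sketch of a check you could paste into the RStudio console once R is installed. It is not part of the course materials: the variable names `packages` and `is_installed`, and the choice to include the built-in `stats` package as a sanity check, are assumptions made for illustration.

```r
# Packages to look for. "tidyverse" is the optional library mentioned in
# step (iii); "stats" ships with every R installation, so it should always
# be found and serves as a sanity check (an illustrative choice).
packages <- c("stats", "tidyverse")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; quietly = TRUE suppresses the usual startup messages.
is_installed <- vapply(
  packages,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

print(R.version.string)  # which version of R is running
print(is_installed)      # named TRUE/FALSE vector, one entry per package
```

If the `tidyverse` entry shows FALSE, running install.packages("tidyverse") and then repeating the check should change it to TRUE.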