Fundamentals of data reporting
Columbia University, March 2015
Schedule:
| Time | Session |
| --- | --- |
| 10:00 | Introductions and expectations for the day |
| 10:45 | Data journalism in the wild |
| 11:15 | Excel review: best practices and tidy data |
| 11:45 | First exercise: an internal affairs report |
| 12:30 | Lunch and review of morning. Optional discussion of acquiring data using open records laws. |
| 1:30 | More in Excel: Pivot tables, filtering and how to extend to other programs |
| 2:30 | Tableau Public: Sneaking into SQL with visualization |
| 3:30 | Getting data: The basics of web scraping and APIs |
Stories in the wild
- Elliot Jaspin’s story of finding racial cleansing in Census data.
- Money as a Weapon (started with database), The Washington Post, Dana Hedgpeth and Sarah Cohen
- Jo Becker and Ron Nixon’s Iran story (good example of “what’s supposed to happen?”), The New York Times
- Medicare, The Wall Street Journal (very, very large fish)
- 60 data-driven ideas in 60 minutes from Mary Jo Webster and Jodi Upton, NICAR 14.
- A Death in St. Augustine, Frontline, as an example of weaving in data when you are working on a narrative. We’ll look at a piece of this story later on.
- The presentation from our class
- Mike Berens’ piece on the heart of data journalism, written when Bill Clinton was president.
Excel: Review and best practices
Handouts and reference
- Cheat sheet on keyboard shortcuts for Mac and Windows Excel.
- Video series on best practices (for refresher later on)
In class:
- We’ll look through this reconstruction of the career of Dr. Jack Kevorkian to see how to build a chronology for what we need.
- Create the outline of a spreadsheet on Bubba Harris based on the internal affairs report (not linked). You’ll decide which columns you need and how you’ll want to use it in a story. Try filling in about 5-10 rows.
- After class, you can see the spreadsheet I created from it, and the paragraphs in a story that were based on it (toward the end). If you’re interested in other data work for that story, the companion piece is [“Departments Slow To Police Their Own Abusers”](http://www.nytimes.com/projects/2013/police-domestic-abuse/).
Excel Part 2: Filtering and pivot tables
Handouts and reference
- Videos on filtering and pivot tables
- Handout, courtesy of IRE.
In class:
- Working with Major League Baseball salaries
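For anyone who wants to see how filtering and pivot tables carry over to other programs, here is a minimal Python sketch of the same two ideas. The rows, column names, and the helper functions `filter_rows` and `pivot_sum` are all invented for illustration; they are not the class dataset or any particular library's API.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration; NOT real MLB salary data.
rows = [
    {"player": "Player 1", "team": "Team A", "position": "Pitcher", "salary": 500_000},
    {"player": "Player 2", "team": "Team A", "position": "Catcher", "salary": 1_500_000},
    {"player": "Player 3", "team": "Team B", "position": "Pitcher", "salary": 2_000_000},
    {"player": "Player 4", "team": "Team B", "position": "Pitcher", "salary": 4_000_000},
]


def filter_rows(rows, column, value):
    """Keep only rows where `column` equals `value`, like an Excel filter."""
    return [row for row in rows if row[column] == value]


def pivot_sum(rows, group_by, value_column):
    """Total `value_column` per distinct `group_by` value, like a pivot table set to Sum."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row[value_column]
    return dict(totals)


pitchers = filter_rows(rows, "position", "Pitcher")
payroll_by_team = pivot_sum(rows, "team", "salary")

print(len(pitchers), "pitchers")
for team, total in payroll_by_team.items():
    print(team, total)
```

The point is the shape of the question, not the tool: a filter answers "which rows?", and a pivot table answers "what is the total (or count, or average) for each group?"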
If there is time: OpenRefine to clean data
A start-to-finish example of using OpenRefine to create a tidy dataset, which we’ll use in the next step. The data covers long-term managed care populations from the NYS Department of Health. The original spreadsheets were downloaded from this state site and combined into a single spreadsheet with all reports on one page. We’ll start from here, with this guide for OpenRefine.
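To make "tidy" concrete, here is a rough Python sketch of the kind of reshaping involved: turning a wide sheet with one column per reporting period into one row per observation. The layout, the column names (`plan`, `month`, `enrollment`), the numbers, and the helper `melt` are assumptions made up for this example; the actual Department of Health spreadsheets may be organized differently.

```python
def melt(wide_rows, id_column, variable_name, value_name):
    """Turn one-column-per-period rows into one row per (id, period) pair."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


# Hypothetical wide layout, invented for illustration; not the real state data.
wide_rows = [
    {"plan": "Plan A", "2014-01": 100, "2014-02": 110},
    {"plan": "Plan B", "2014-01": 50, "2014-02": 55},
]

tidy_rows = melt(wide_rows, "plan", "month", "enrollment")
for row in tidy_rows:
    print(row)
```

Once every row is a single observation, the data can go straight into a pivot table or a visualization tool, because each column holds exactly one kind of value.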
Database concepts and viz: Tableau Public
- Training materials for IRE conferences. (Note: If you join IRE for $70, you can request a copy of desktop Tableau for free - normally about $2,000 / year.)
- Example data
Going further
- Public records resources from previous class