
Fundamentals of data reporting

Columbia University, March 2015

Schedule:

10:00 Introductions and expectations for the day
10:45 Data journalism in the wild
11:15 Excel review: best practices and tidy data
11:45 First exercise: an internal affairs report
12:30 Lunch and review of morning. Optional discussion of acquiring data using open records laws.
1:30 More in Excel: Pivot tables, filtering and how to extend to other programs
2:30 Tableau Public: Sneaking into SQL with visualization
3:30 Getting data: The basics of web scraping and APIs

Stories in the wild

  • Elliot Jaspin’s story of finding racial cleansing in Census data.
  • Money as a Weapon (started with database), The Washington Post, Dana Hedgpeth and Sarah Cohen
  • Jo Becker and Ron Nixon’s Iran story (good example of “what’s supposed to happen?”), The New York Times
  • Medicare, The Wall Street Journal (very, very large fish)
  • 60 data-driven ideas in 60 minutes from Mary Jo Webster and Jodi Upton, NICAR 14.
  • A Death in St. Augustine, Frontline, as an example of weaving in data when you are working on a narrative. We’ll look at a piece of this story later on.
  • The presentation from our class
  • Mike Berens’ piece on the heart of data journalism, written when Bill Clinton was president.

Excel: Review and best practices

Handouts and reference

  • Cheat sheet on keyboard shortcuts for Mac and Windows Excel.
  • Video series on best practices (for refresher later on)

In class:

  • We’ll look through this reconstruction of the career of Dr. Jack Kevorkian to see how to build a chronology for what we need.
  • Create the outline of a spreadsheet on Bubba Harris based on the internal affairs report (not linked). You’ll decide which columns you need and how you’ll want to use it in a story. Try filling in about 5-10 rows.
  • After class, you can see the spreadsheet I created from it, and the paragraphs in a story that were based on it (toward the end). If you’re interested in other data work for that story, the companion piece is [“Departments Slow To Police Their Own Abusers”](http://www.nytimes.com/projects/2013/police-domestic-abuse/).

Excel Part 2: Filtering and pivot tables

Handouts and reference

In class

  • Working with Major League Baseball salaries
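For anyone who wants to see how filtering and pivot tables extend beyond Excel, here is a minimal Python sketch of the same two operations. The rows, column names (`team`, `player`, `salary`) and function names are made up for illustration; they are not the class dataset and not real salary figures.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; not real salary data.
rows = [
    {"team": "Team A", "player": "Player 1", "salary": 500_000},
    {"team": "Team A", "player": "Player 2", "salary": 1_500_000},
    {"team": "Team B", "player": "Player 3", "salary": 750_000},
    {"team": "Team B", "player": "Player 4", "salary": 2_250_000},
    {"team": "Team B", "player": "Player 5", "salary": 600_000},
]


def filter_rows(rows, min_salary):
    """Keep rows at or above a salary threshold, like an Excel filter."""
    return [r for r in rows if r["salary"] >= min_salary]


def pivot_by_team(rows):
    """Count players and sum salaries per team, like a simple pivot table."""
    totals = defaultdict(lambda: {"players": 0, "total_salary": 0})
    for r in rows:
        totals[r["team"]]["players"] += 1
        totals[r["team"]]["total_salary"] += r["salary"]
    return dict(totals)


if __name__ == "__main__":
    # One summary line per team: name, player count, total, average.
    for team, summary in sorted(pivot_by_team(rows).items()):
        average = summary["total_salary"] / summary["players"]
        print(team, summary["players"], summary["total_salary"], round(average))
```

The grouping column plays the role of the pivot table's row field, and the count and sum are its value fields. The logic is the same in any program; only the interface changes.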

If there is time: OpenRefine to clean data

This is a start-to-finish example of using OpenRefine to create a tidy dataset from long-term managed care populations published by the NYS Department of Health. We’ll use the result in the next step. The original spreadsheets were downloaded from this state site and combined into a single spreadsheet with all reports on one page. We’ll start from here, with this guide for OpenRefine.
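As a rough illustration of what "tidy" means here, the sketch below reshapes a wide table (one column per reporting month) into one row per observation. The layout, plan names, column names and numbers are all hypothetical; the actual Department of Health spreadsheets may be organized differently, and OpenRefine does this kind of reshaping through its own interface rather than code.

```python
# Hypothetical wide layout: one row per plan, one column per month.
# The values are invented for illustration only.
wide_rows = [
    {"plan": "Plan X", "2015-01": 100, "2015-02": 110},
    {"plan": "Plan Y", "2015-01": 40, "2015-02": 45},
]


def to_tidy(wide_rows, id_column="plan"):
    """Turn each month column into its own row: one observation per row."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append(
                {id_column: row[id_column], "month": column, "enrollment": value}
            )
    return tidy


if __name__ == "__main__":
    for record in to_tidy(wide_rows):
        print(record)
```

Once every row is a single plan-and-month observation, the data can be filtered, pivoted or loaded into a visualization tool without further restructuring.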

Database concepts and viz: Tableau Public

  • Training materials for IRE conferences. (Note: If you join IRE for $70, you can request a copy of desktop Tableau for free - normally about $2,000 / year.)
  • Example data

Going further