3.1 - Tidy data and learning to read like a writer

Tuesday, January 28

Tidy datasets are all alike but every messy dataset is messy in its own way.

Hadley Wickham (again), with apologies to Leo Tolstoy

Due this week

  • Sunday, Jan. 26: Filter and pivot table homework on Canvas

In class

Tuesday

  • Understanding (and internalizing) tidy data principles
  • Reading (and viewing) like a writer – deconstructing long-form journalism to get from “Do I like it?” to “How could I do it?” We’ll go through one of the assigned stories together.

Thursday

  • Catch-up day and review for Excel exam. Bring your questions, comments and complaints to class.

Practice data we can use in class, and you can use in preparation:

Preparation

(all for Tuesday)

Tidy data

  • Spend some time on the tidy data chapter of our web site. Try to think about it in terms of the datasets you’ve already seen. Is the NEA data “tidy”? Why?

The idea of “tidy data” is among the most important concepts we’ll cover this semester. Understanding the principles well will help you imagine how a dataset SHOULD look, and what you can do with it if it does. Bring questions and comments about it to class – we’ll be coming back to this regularly.

Reading the data-driven story

From now on, we’ll be reading and viewing or listening to actual stories, usually three or four a week. Try to look at them like a writer, not a casual audience member.

Before examining this week’s pieces, review the tips on the course resources site on how to look at data for news and how to look at stories as if you were going to do them yourself.

This week’s stories:

  • APM Reports’ podcast series “In the Dark”, Season 2, Episode 7, on the many trials of Curtis Flowers, June 15, 2018. The episode was driven by data reporting by Will Craft. (Scroll all the way past the latest developments to get a list of the episodes.)

  • “Long Island Divided”, by Ann Choi, Bill Dedman, Keith Herbert and Olivia Winslow, Newsday. It’s worth noting that Dedman was the lead author on the groundbreaking “Color of Money” series of 1988, which for the first time documented home mortgage redlining in Atlanta. There is also a video in the presentation on how they did the testing.

  • “L.A. is slammed with record costs for legal payouts”, by Emily Alpert Reyes and Ben Welsh. You might not understand the Python programs in this document, but it will help show you what an exemplary data diary might look like. It self-documents the project by providing sources and listing the questions the reporters asked.