Resources
These readings / videos / podcasts cover techniques and examples of data reporting that aren’t tied to specific skills or programs.
- Textbooks / handbooks
- About data reporting
- Numbers & visualization
- Selected stories
- Projects from me or my colleagues
- Open source investigations
Textbooks / handbooks
-
Data for Journalists, by Brant Houston. This is the fifth edition of the classic “Computer Assisted Reporting” book first published more than 20 years ago. An adaptation of some of the material is in
-
“Data Literacy: A User’s Guide”, by David Herzog, 2015
-
“The Data Journalism Handbook”, an international guidebook to all forms of data journalism.
-
“The Art of Access: Strategies for Acquiring Public Records”, by David Cuillier and Charles N. Davis, 2019.
-
“Numbers in the Newsroom”, Sarah Cohen, 2nd edition. (But hey – arithmetic doesn’t change. It doesn’t matter which edition you get.) I don’t think they print it anymore. A very old shorter handout is called “Danger! Numbers in the Newsroom”
-
Samantha Sunne’s “Diving into Data Journalism” from the American Press Institute.
-
Data Journalism Heist, by Paul Bradshaw. This book is geared toward journalists in the U.K., but the way Paul thinks about getting in and out of data quickly is useful.
-
There are so many good visualization textbooks and handbooks that it’s hard to choose just a few. Alberto Cairo’s “The Functional Art” is a good starting point for understanding the basics of visual encoding and the use of visualization for exploration as well as communication. The first three chapters are available for free on his website.
About data reporting
-
Mike Berens on the heart of data journalism, written when Bill Clinton was president.
-
60 data-driven ideas in 60 minutes, from Mary Jo Webster and Jodi Upton, NICAR 14.
-
Serious Fun with Numbers, Dan Gilbert, Columbia Journalism Review. The story that won the Pulitzer is archived on the prize site: “The Money Prison”.
-
“The Serial-Killer Detector”, The New Yorker, Nov. 27, 2017, about Tom Hargrove’s career as a data reporter and how it led to the Murder Accountability Project.
-
“A Data State of Mind”, Mary Jo Webster, Data-Driven Journalism, Sept. 2016
-
“Demystifying Data Journalism”, with Susan McGregor and Sarah Cohen at a Mashable conference. I think it’s 2012.
-
Interview with data editor Janet Roberts on the Reuters approach to data reporting, 2018, from the Data Journalism Awards
-
“How to Plan, Pitch and Do a Data-Driven Investigation”, Miguel Paz and Ryan McNeill, presentation from a CUNY course in 2016 (?). Many links are broken or private, but the ideas are still useful.
-
Bonus: Anna Lytical’s introduction to coding YouTube series (currently only 2 episodes). Anna Lytical is a drag queen persona of Google software engineer Billy Jacobson
Numbers & visualization
-
Addressing Journalistic Innumeracy, John Wihbey, Journalist’s Resource website.
-
This site’s Numbers in the newsroom page: overcoming your fear of math. (Video of similar lecture). It has some exercises that you can do at the bottom of the page.
-
Chapter from the original Data Journalism Handbook on finding insights through visualization, from Gregor Aisch, who is a longtime genius graphics reporter at the New York Times.
-
Financial Times’ “visual vocabulary”, as a PDF document or online in its GitHub repo.
-
Slides from Peter Aldhous’s site on visualizing data for science investigations.
Selected stories
One way to keep up with stories is to subscribe to the Local Matters weekly newsletter, which curates stories done by more than 100 local newspapers, selected by reviewing their front pages on the Newseum site. Although it’s only newspapers, it gives you a good feel for the range of stories being done around the U.S.
This is a fairly random set of stories that we often use in class to discuss how empirical journalism works. Sometimes we just read about a story – not the whole thing itself. There are probably thousands of stories that could be on this list but these are very well known or have some aspect that makes them good for class reading.
-
The Bell, Calif., small-town corruption: “Is a City manager Worth $800,000?” and “Bell’s Money Flowed Uphill”
-
“Cops among Florida’s worst Speeders”, by Sally Kestin and John Maines in the Ft. Lauderdale Sun-Sentinel, 2012. Dan Nguyen deconstructed the story for Stanford University and compiled a smallish dataset for practice.
-
“Medicare Unmasked” from the Wall Street Journal (2014), in which reporters mined doctor billing records to find those who were fleecing the system. This link is to the Pulitzer Prize site because the Journal’s own site has kind of buried the big stories. Whether you use SQL or not, Dan Nguyen’s tutorial using the data is a useful way to reconstruct a story like this.
-
Elliot Jaspin’s story of finding racial cleansing in Census data. (very old!)
-
Inside the Hidden World of Thefts, Joe Stephens and Mary Pat Flaherty, Washington Post, 2013. They were among the first reporters to take advantage of new questions on the IRS Form 990, which is now available in electronic form.
-
Boston Globe 2018: “For some State Police, it’s a posting in Paradise,” by Kay Lazar and Todd Wallack. Be sure to read Todd’s tweetstorm on what it took to get this data, and how he got past bureaucratic and public records battles.
-
FiveThirtyEight: Russian troll analysis on Twitter. Note the references to the original data, how it was collected, and the name of the project at Clemson. Even if you don’t care about Russian trolls, it’s worth noting these kinds of resources for the future.
-
ProPublica Illinois: “How Chicago Ticket Debt Sends Black Motorists Into Bankruptcy”, using several databases to show how Chicago’s aggressive ticketing is driving people into bankruptcy, by Melissa Sanchez and Sandhya Kambhampati, February 2018. The writing is a little rough, but the data work is solid and, most importantly, the stories were well reported and identified.
-
“L.A. is slammed with record costs for legal payouts”, Emily Alpert Reyes and Ben Welsh. Note the link at the bottom to the GitHub repo showing the data analysis.
-
Todd Wallack from the Boston Globe on using a little programming to get to a story on liquor licensing. Be sure to read the story to understand how the data fits in with the story. (2018)
Just wanted to say a few words about how data journalism can help reporters flesh out a story.
— Todd Wallack (@TWallack) November 16, 2017
Backgrounding / public records
-
Seattle Times’ award-winning coverage using backgrounding tools on deadline (2013?)
-
WFAA’s coverage of a fertilizer plant explosion in 2013. You have to use your IRE login to get this, since WFAA doesn’t have archives.
-
“I’ve sent out 1,018 Open Records Requests, and This is What I’ve Learned”, ProPublica Illinois, Sandhya Kambhampati, 2018
Projects from me or my colleagues
I’ll add projects that my colleagues or I have worked on, not because they’re great, but because I understand the work that went into them. Sadly, The Washington Post’s site didn’t do a good job archiving some of them, so you have a pdf of the newspaper here.
-
A Death in St. Augustine from Frontline / NYT as an example of weaving data into a narrative. On the Times’ site, it was published as Two Gunshots on a Summer Night, with a secondary story that is a little more data-wonky. Note the ending comparison – those came from our various datasets. (2013)
-
“Police Chiefs, Looking to Diversify Forces, Face Structural Hurdles”, Matt Apuzzo and Sarah Cohen, The New York Times, Nov. 7, 2015. This is a routine story, not a project, which makes it a little easier to understand. The cleaned up data for this project is a good practice dataset, and you can see how I found Chief Riley. (2015)
Open source investigations
At some point, I’ll probably create a section for open source investigations. OSI is a term borrowed from intelligence agencies, which refers to investigations that are based on publicly available data, whether free or purchased. Many of these stories piece together visual clues to find answers. Some, especially in the human rights arena, mix the archival data, images and social media with interviews of participants. This kind of work is much more common internationally than in the US.
An example of an open source investigation:
This is a Twitter thread on how BBC Africa uncovered the place, time, and people involved in a horrifying murder in Cameroon. BBC Africa Eye published “Anatomy of a Killing”, based on the research.
THREAD
In July 2018, a horrifying video began to circulate on social media.
2 women & 2 young children are led away by a group of soldiers. They are blindfolded, forced to the ground, and shot 22 times. #BBCAfricaEye investigated this atrocity. This is what we found... pic.twitter.com/oFEYnTLT6z
— BBC News Africa (@BBCAfrica) September 24, 2018
OSI resources
-
“How to conduct an open-source investigation according to the founder of Bellingcat”, by Ned Beauman, New Yorker, August 30, 2018
-
Finding people of interest on Facebook, by the anonymous “technisette” on We are OSINTCurio.us. You can also use Henk Van Ess’s form to create a link.
-
“Tell, Explain, Describe … Asking the right questions to get the right answers”, from @nixintel
-
Follow @quiztime to see OSI challenges for geolocation puzzles each week. Here’s an example solution, again from Steve aka @nixintel.