

28 GB Of Raw Data Went Into California Watch’s Award-Winning “Decoding Prime” Series

Of all the winners announced this week for the 63rd annual George Polk Award, California Watch’s “Decoding Prime” series is the one that catches my eye.

California Watch, a project of the Center for Investigative Reporting, is only in its third year of existence after launching in 2009. The organization is joined by long-established names on the winners list like The New York Times, The Wall Street Journal, The Boston Globe, ABC 20/20, Bloomberg and The Associated Press.

So how does one brand-new organization compete with years of legacy? To start, try 51 million patient records, about 28 gigabytes of raw data. That's how much information was analyzed for the yearlong series of investigative stories, which revealed a pattern at a California hospital chain of billing Medicare for numerous rare medical conditions in order to collect lucrative bonus payments.

International Data Journalism Awards Debut

There’s no dearth of ways for journalists to congratulate and recognize themselves with awards. Whether you’re a small local newspaper or the most-watched national news show, there exists a seemingly endless list of contests and prizes to celebrate everything from the best public service journalism (Pulitzer, anyone?) down to the most narrowly specialized reporting (Media Orthopaedic Reporting Excellence Awards?). But within that sphere of contest categories, there has never really been a contest focused solely on data journalism.

Now there is: The Data Journalism Awards, which purports to be “the first international contest recognizing outstanding work in the field of data journalism worldwide.”


6 Data Journalism Blogs To Bookmark, Part 2

Last week, I started a list of six data journalism blogs you should take note of. The post stemmed from a project some journalists are leading to develop a data-driven journalism handbook that covers all aspects of the field. This weekend, thanks to a massive effort by attendees at the Mozilla Festival in London, the project morphed from the bare bones of an idea into something very tangible.

In just two days, 55 contributors, from organizations such as the New York Times, the Guardian and the Medill School of Journalism, were able to draft 60 pages, 20,000 words, and six chapters of the handbook. The goal is to have a comprehensive draft completed by the end of the year, said Liliana Bounegru of the European Journalism Centre, which is co-sponsoring production of the handbook. If you’re interested in contributing, you can email Bounegru directly, and the group’s work so far is available to view online.

Since the handbook is still being tweaked, why not check out these data journalism blogs?

6 Data Journalism Blogs To Bookmark, Part 1

Today is the start of Mozilla Festival, a weekend-long celebration of sorts that brings together web developers, journalists, media educators and students to work on open web projects and learn from one another. #MozFest’s program includes design challenges, learning labs, presentations and more. There will also be plenty of time for people to simply chat with one another and possibly brainstorm the next idea that will transform the web.

One event that stood out to me calls for a group to kickstart the writing of a data-driven journalism handbook. Led by the Open Knowledge Foundation and the European Journalism Centre, the project’s goal is to create a handbook that will “get aspiring data journalists started with everything from finding and requesting data they need, using off the shelf tools for data analysis and visualisation, how to hunt for stories in big databases, how to use data to augment stories, and plenty more.”

Data journalism has quickly become a popular field, yet many reporters are still in the dark about it. How do you go about getting the data? What do you do once you have the data? A perfect resource would be the data journalism handbook, but since it hasn’t been written yet, I came up with a list of six blogs that should definitely be added to your bookmarks tab, whether you’re looking for inspiration, basic skills, or advanced knowledge.

The first three are below, and the second half will be published on Monday.

Tool of the Day: Google Refine


When it comes to working with and presenting data, Google reigns supreme. We’ve covered Google’s Chart Wizard, Google’s Public Data Explorer, and even ways to run a news website using Google Docs (with WordPress). Another of Google’s powerful data tools, Google Refine, lets users work with “messy” data sets and transform them into something amazing. Check out Part 1 of the Google Refine screencast.

Unlike Google’s general web-based data services, Google Refine is a standalone desktop application. Formerly known as Freebase Gridworks, the tool has been used in newsrooms, most famously by ProPublica for their “Dollars for Docs” investigative series from October 2010. Once you download and install Google Refine, you interact with it through your web browser. You can create a new project from scratch, or you can import data sets from files stored on your computer. Once your data is imported, the real power of the tool comes through.

Google Refine screenshot

You can use facets and filters to create subsets of data, as well as reformat strings of data that match your search patterns. For example, if you see the terms “as soon as possible” and “ASAP” in the same data set, you can reformat both data strings to match each other. For more complicated jobs, you can use the Google Refine Expression Language (GREL) to write expressions, including regular expressions, that isolate substrings of data into separate columns.
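Outside of Refine’s interface, the same kind of cleanup can be sketched in a few lines of Python. This is a hand-rolled illustration of the two operations described above, not Refine’s actual API, and the sample rows and field names are invented:

```python
import re

# Invented sample rows standing in for an imported data set.
rows = [
    {"note": "reply as soon as possible", "id": "case-2011-042"},
    {"note": "reply ASAP", "id": "case-2010-007"},
]

# 1. Normalize variant phrasings to one canonical string,
#    like reformatting matching cells in Refine.
for row in rows:
    row["note"] = row["note"].replace("as soon as possible", "ASAP")

# 2. Isolate a substring into its own "column" with a regular
#    expression, similar to what a GREL match expression does.
for row in rows:
    m = re.search(r"case-(\d{4})-\d+", row["id"])
    row["year"] = m.group(1) if m else None

print(rows)
```

In Refine itself you would do both steps through the browser interface, cell by cell or column by column, rather than in a script.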

Once you’re done formatting your data, Google Refine lets you export your work in a number of different formats, including an Excel spreadsheet, an HTML table, or JSON data, which you can reshape to match custom formats such as wiki markup. Google Refine also lets you hook into open web services, such as Google’s Language Detection Service or the open map service Nominatim.
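One payoff of the JSON export is that the cleaned data drops straight into other tools. As a rough illustration (the column names and the exact shape of the export here are invented; what Refine actually writes depends on your project and export settings), a downstream script could load the rows and summarize them:

```python
import json

# Hypothetical JSON export: an array of row objects,
# one key per column. Field names are made up for illustration.
exported = """
{"rows": [
  {"hospital": "Example Medical Center", "claims": 1204},
  {"hospital": "Sample Community Hospital", "claims": 87}
]}
"""

data = json.loads(exported)
total_claims = sum(row["claims"] for row in data["rows"])
print(total_claims)  # 1291
```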

Google Refine is a free download and is available for Windows, Mac, and Linux.