Of all the winners announced this week for the 63rd annual George Polk Award, California Watch’s “Decoding Prime” series is the one that catches my eye.
California Watch, a project of the Center for Investigative Reporting, launched in 2009 and is only in its third year of existence. The organization is joined on the winners list by long-established names like The New York Times, The Wall Street Journal, The Boston Globe, ABC 20/20, Bloomberg and The Associated Press.
So how does one brand-new organization compete with years of legacy? To start, try 51 million patient records — about 28 gigabytes of raw data. That’s how much information was analyzed for the yearlong series of investigative stories that revealed a pattern at a California-based hospital chain of billing Medicare for numerous rare medical conditions that carry high-paying bonuses.
There’s no dearth of ways for journalists to congratulate and recognize themselves with awards. Whether you’re a small local newspaper or the most-watched national news show, there exists a seemingly endless list of contests and prizes to celebrate everything from the best public service journalism (Pulitzer, anyone?) down to the most narrowly specialized reporting (the Media Orthopaedic Reporting Excellence Awards?). But within that sphere of contest categories, there has never been a contest focused solely on data journalism.
Now there is: The Data Journalism Awards, which purports to be “the first international contest recognizing outstanding work in the field of data journalism worldwide.”
Last week, I started a list of six data journalism blogs you should take note of. The post stemmed from a project some journalists are leading to develop a data-driven journalism handbook that covers all aspects of the field. This weekend, thanks to a massive effort by attendees at the Mozilla Festival in London, the project morphed from the bare bones of an idea into something very tangible.
In just two days, 55 contributors, from organizations such as The New York Times, The Guardian and the Medill School of Journalism, were able to draft 60 pages, 20,000 words, and six chapters of the handbook. The goal is to have a comprehensive draft completed by the end of the year, said Liliana Bounegru of the European Journalism Centre, which is co-sponsoring production of the handbook. If you’re interested in contributing, email Bounegru at email@example.com. You can see what the group has so far at bit.ly/ddjbook.
Since the handbook is still being tweaked, why not check out these data journalism blogs?
Today is the start of Mozilla Festival, a weekend-long celebration of sorts that brings together web developers, journalists, media educators and students to work on open web projects and learn from one another. #MozFest’s program includes design challenges, learning labs, presentations and more. There will also be plenty of time for people to simply chat with one another and possibly brainstorm the next idea that will transform the web.
One event that stood out to me calls for a group to kickstart the writing of a data-driven journalism handbook. Led by the Open Knowledge Foundation and the European Journalism Centre, the project’s goal is to create a handbook that will “get aspiring data journalists started with everything from finding and requesting data they need, using off the shelf tools for data analysis and visualisation, how to hunt for stories in big databases, how to use data to augment stories, and plenty more.”
Data journalism has quickly become a popular field, yet many reporters are still in the dark about it. How do you go about getting the data? What do you do once you have the data? A perfect resource would be the data journalism handbook, but since it hasn’t been written yet, I came up with a list of six blogs that should definitely be added to your bookmarks, whether you’re looking for inspiration, basic skills, or advanced knowledge.
The first three are below and the last three will be published on Monday.
Unlike Google’s general web-based data services, Google Refine is a standalone desktop application. Formerly known as Freebase Gridworks, the Google Refine tool has been used by the Chicago Tribune, data.gov.uk, and most famously by ProPublica for its “Dollars for Docs” investigation series from October 2010. Once you download and install the Google Refine tool, you interact with it through your web browser. You can create a new project from scratch, or you can import data sets from files stored on your computer. Once your data is imported, the real power of the tool comes through.
You can use facets and filters to create subsets of data, as well as reformat strings of data that match your search patterns. For example, if the terms “as soon as possible” and “ASAP” appear in the same data set, you can reformat both strings to match each other. For more complicated queries, you can use the Google Refine Expression Language (GREL) to write regular expressions and isolate substrings of data into separate columns.
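To make that concrete, here is a plain-Python sketch of the same kind of cleanup — normalizing variant spellings and splitting a substring into its own column. This is an illustration of the idea, not Refine’s own code or GREL syntax, and the sample records are invented:

```python
import re

# Invented messy records, standing in for an imported data set.
rows = [
    {"note": "Please reply as soon as possible", "contact": "Smith, Jane"},
    {"note": "Reply ASAP",                       "contact": "Doe, John"},
]

# Normalize variant spellings to one canonical string, the way you
# would reformat matching values in Refine.
for row in rows:
    row["note"] = re.sub(r"as soon as possible", "ASAP",
                         row["note"], flags=re.IGNORECASE)

# Isolate substrings into separate columns, as a GREL expression can:
# split "Last, First" into two new fields.
for row in rows:
    last, first = [part.strip() for part in row["contact"].split(",")]
    row["last_name"], row["first_name"] = last, first

print(rows[0]["note"])        # "Please reply ASAP"
print(rows[1]["first_name"])  # "John"
```

In Refine itself, these steps would be a text transform and a “split column” operation applied through the browser interface rather than a script.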
Once you’re done with formatting your data, Google Refine lets you export your work in a number of different formats, including as an Excel spreadsheet, an HTML table, or as JSON data, which you can change to match a wiki-style format. Google Refine also lets you hook into open web services, such as Google’s Language Detection Service or the open map service Nominatim.
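For a sense of what those export formats look like, here is a minimal Python sketch that serializes the same kind of cleaned-up rows as JSON and as a spreadsheet-friendly CSV. The sample data is invented, and Refine performs these exports through its own interface, not a script:

```python
import csv
import io
import json

# Invented cleaned-up rows, standing in for a finished Refine project.
rows = [
    {"name": "Jane Smith", "dept": "Health"},
    {"name": "John Doe",   "dept": "Finance"},
]

# JSON export: each row becomes an object in an array.
json_text = json.dumps(rows, indent=2)

# CSV export: a tabular format that spreadsheet software can open.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "dept"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The appeal of exporting as JSON is that other web tools and services can consume the cleaned data directly, which is where hooks into open web services come in.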
Google Refine is a free download and is available for Windows, Mac, and Linux.