
Orphan Works & Optical Character Recognition Software

While researching an essay about New York City poets and the Great Depression last year, this GalleyCat editor read through hundreds of pages from 1930s novels, periodicals, and self-published materials that couldn’t leave the New York Public Library.

Optical Character Recognition (OCR) software can help authors and researchers digging through a stack of orphan works. These specialized tools convert scanned, photographed, or written text into digital text. For this article we test-drove ABBYY FineReader Express, the software voted “Best Text Recognition Tool” by Lifehacker readers.

The company behind it has been around for 20 years, and the program now recognizes 171 different languages. The screenshots below show the text capture process, as a page from a 75-year-old self-published poetry journal enters the digital age.


In a telephone interview, senior product marketing manager Wendy Wang shared tips for writers hoping to use this technology while doing library research. She explained: “We have developed digital camera OCR–you can even use your cellphone camera. You can capture a certain book page image and go back to your office to develop the image.”

She had these tips for taking better photos of text: “Make sure have 5-megapixel camera. Lighting is another issue, libraries are kind of dark. For a better OCR result good lighting. Focus is also important. When you shake your camera, it will decrease the image resolution.”
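For readers who batch their library photos before running OCR, here is a minimal Python sketch of how the first two tips could be checked automatically. It is our own illustration, not part of ABBYY's software: the function names and the brightness threshold are assumptions, and only the 5-megapixel figure comes from the interview. It does not attempt to measure focus.

```python
from statistics import mean

MIN_MEGAPIXELS = 5.0       # the figure quoted in the interview
MIN_MEAN_BRIGHTNESS = 80   # assumed cutoff on a 0-255 gray scale, not from ABBYY


def megapixels(width, height):
    """Return the image resolution in millions of pixels."""
    return width * height / 1_000_000


def check_photo(width, height, gray_pixels):
    """Return a list of warnings for a photo about to be sent to OCR.

    gray_pixels is a flat sequence of grayscale values from 0 to 255.
    """
    warnings = []
    if megapixels(width, height) < MIN_MEGAPIXELS:
        warnings.append("resolution below 5 megapixels")
    if mean(gray_pixels) < MIN_MEAN_BRIGHTNESS:
        warnings.append("image looks dark; add light")
    return warnings
```

Calling `check_photo(2592, 1944, pixels)` on a well-lit shot should return an empty list, while a small, dim cellphone photo would come back with both warnings.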

Here is a photocopied page from the self-published Raven Poetry Circle Anthology, published in January 1934.

ravenpoetryanthologyoriginal.jpg

Here is a screenshot from inside the ABBYY FineReader Express program for the Mac. The text recognition software has recognized the portions outlined in green.

ravenabbyy23.jpg

Here is the Microsoft Word document version of the scanned text: a searchable and shareable digital copy of a 75-year-old orphaned text.

ravenfinal.jpg
