Google has acquired a Carnegie Mellon University spin-off that seeks to cut down on spam and fraud at Web sites while digitizing books.
ReCAPTCHA offers simple word puzzles that users must solve when registering at a Web site or completing an online purchase. Computers can’t decipher the twisted letters and numbers, ensuring that real people and not automated programs are at the keyboard.
Unlike other word puzzles, however, ReCAPTCHA’s text comes from actual books, letting the system create a digitized version in the process.
Terms of the deal, announced Wednesday, were not disclosed. Google said the ReCAPTCHA tool will continue to be available for use on any Web site.
Google Inc. is already behind a major project to digitize books and put them online, mostly by scanning pages and using optical character recognition, or OCR, to make the texts searchable. OCR doesn’t always work on text that is older, faded or distorted. In such cases, the only way to digitize the works is often to type them in manually.
ReCAPTCHA provides an alternative. Snippets that the computer doesn’t recognize are split up into single words that can be used as human tests at sites all over the Internet. The ReCAPTCHA system reassembles the text of the book from those responses.
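The reassembly step described above can be illustrated with a minimal Python sketch. It is not the company's actual implementation: the function name `reassemble`, the `(word_id, typed_text)` response format, the `min_votes` agreement threshold and the `[?]` placeholder are all assumptions made for illustration. The sketch simply accepts a word once enough users have typed the same answer for it, then joins the accepted words in their original order.

```python
from collections import Counter, defaultdict


def reassemble(word_ids, responses, min_votes=2):
    """Rebuild a snippet from human answers (illustrative sketch only).

    word_ids:  ordered ids of the words the OCR could not read
    responses: iterable of (word_id, typed_text) pairs from users
    min_votes: hypothetical number of matching answers needed to accept a word
    """
    # Tally every answer that users typed for each word.
    votes = defaultdict(Counter)
    for word_id, typed in responses:
        votes[word_id][typed.strip()] += 1

    words = []
    for word_id in word_ids:
        if votes[word_id]:
            answer, count = votes[word_id].most_common(1)[0]
        else:
            answer, count = None, 0
        # Keep the most common answer only if enough users agreed on it.
        words.append(answer if count >= min_votes else "[?]")
    return " ".join(words)


if __name__ == "__main__":
    sample = [(0, "the"), (0, "the"), (1, "quick"), (1, "quiet"), (1, "quick"), (2, "fox")]
    print(reassemble([0, 1, 2], sample))
```

In this sketch a word with too few matching answers stays marked as unresolved, so it could be shown to more users before it is accepted.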
“Google is the best fit for reCAPTCHA,” said Carnegie Mellon computer science professor Luis von Ahn, who developed the tool and launched the reCAPTCHA company in 2008. “From the very start, people often assumed the project was connected to Google, so it only makes sense that reCAPTCHA Inc. ultimately would find a home within Google.”
Google, which opened an office on the university’s campus in 2006, is also involved in a project led by von Ahn that enlists Web users to play Internet-based games that help computers get smarter.
One of those games, ESP, has been licensed by Google as Google Image Labeler. In the online game, two players are shown the same picture, and each tries to guess the words the other will use to describe it. The game helps improve image searches on the Internet by creating descriptions of uncaptioned images.
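The matching idea behind the game can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google's or von Ahn's code: the function name `agreed_labels` and the rule of comparing guesses case-insensitively are hypothetical. The sketch treats any word that both players typed for a picture as a candidate description of that picture.

```python
def agreed_labels(guesses_a, guesses_b):
    """Return the words both players typed, in the order player A typed them."""
    # Normalize player B's guesses so "Dog" and "dog" count as the same word.
    seen_b = {g.strip().lower() for g in guesses_b}
    agreed = []
    for guess in guesses_a:
        word = guess.strip().lower()
        # A word becomes a label only when both players came up with it.
        if word in seen_b and word not in agreed:
            agreed.append(word)
    return agreed


if __name__ == "__main__":
    print(agreed_labels(["Dog", "park", "grass"], ["puppy", "grass", "dog"]))
```

Because neither player sees the other's guesses, a word that both of them type is likely to describe the picture itself.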
Von Ahn will remain with Carnegie Mellon while working at Google.