You May Have Helped Translate Books Without Knowing It
September 29, 2011 at 2:00 am Chad Upton 10 comments
By Chad Upton | Editor
If you’ve created a website account with Facebook, TicketMaster, Twitter, CNN, Craigslist or thousands of other sites, then you have helped translate text from old books and newspapers.
Websites that offer free accounts try to ensure that every account is created by a real human being rather than by a “bot”, a computer program written to automatically create accounts and then spam those websites with ads. One way to protect against bots is to have people do something that is easy for a human but difficult for a computer: read distorted text.
That’s why you are frequently asked to solve these simple word puzzles. Although it’s a bit annoying, it helps make the website better by reducing spam and other abuse. There are many variants of these puzzles, called CAPTCHAs, and if the website uses the reCAPTCHA system, your work can also benefit society.
You see, the unclear words in a reCAPTCHA are scanned from old books and newspapers for the Google Books project. After the books are scanned, the page images are run through a computer program that converts them into text, but computer programs aren’t very good at reading blurry words. That is exactly what makes these words a good test of whether someone is human: a computer has already tried to read them and failed.
The solution you enter is compared to the solutions that other people have entered for the same word. When the program gets enough consistent answers from multiple people, that word is corrected in the digital copy of the book or newspaper.
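To make the idea of "consistent answers from multiple people" concrete, here is a minimal sketch in Python. It is not reCAPTCHA's actual code: the function names and both thresholds (`MIN_VOTES`, `MIN_AGREEMENT`) are assumptions made up for illustration, and the real system's rules are not described in the sources for this post.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_VOTES = 3          # answers needed before a word can be settled
MIN_AGREEMENT = 0.75   # share of answers that must match the top answer


def normalize(answer: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return answer.strip().lower()


def consensus(answers):
    """Return the agreed reading of a word, or None if there is no consensus yet."""
    # Count each distinct (normalized) answer, skipping blank entries.
    votes = Counter(normalize(a) for a in answers if a.strip())
    total = sum(votes.values())
    if total < MIN_VOTES:
        return None  # too few people have seen this word so far
    word, count = votes.most_common(1)[0]
    # Accept the most popular answer only if enough people agree on it.
    return word if count / total >= MIN_AGREEMENT else None
```

With these assumed thresholds, three people typing "morning" would settle the word, while a split between "morning" and "mourning" would leave it unresolved until more answers arrive.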
This effort might sound futile, but approximately 200 million of these little puzzles are solved each day. Although each one only takes you a few seconds, together they add up to more than 150,000 hours of work each day. This is called “crowd-sourcing”: small contributions from a lot of people that collectively make a significant contribution to something grand.
Sources: Wikipedia (reCAPTCHA), recaptcha.com
Entry filed under: Computers and Internet. Tags: books, captcha, google, newspaper, ny times, recaptcha, translate, volunteer.
1.
Richard | September 29, 2011 at 2:20 am
Hi Chad Upton,
This crowd-sourcing is really a good idea! But I want to know: how does the ‘Captcha’ application reject a wrong input when the ‘unclear word’ is being shown for the first time?
2.
Elizabeth | September 29, 2011 at 7:56 am
Haven’t you ever typed the word in and been sure you typed it correctly but had it rejected anyway?
3.
Ash | January 21, 2012 at 12:47 pm
Hell yeah, I hate that.
4.
Chad Upton | September 30, 2011 at 7:39 am
Good question. reCAPTCHA uses two words, and one of them is always known. If you get the known word wrong, it will reject your entry.
5.
Alef | September 29, 2011 at 12:21 pm
But how will the very first entry ever get accepted?
6.
ibneaters | September 29, 2011 at 2:28 pm
In my experience, if you get it close, it usually goes through.
7.
Jim | September 30, 2011 at 5:51 pm
Great article. I’ve heard that they were used to translate old text, but it never made sense to me how the computer would know whether or not you were right if it couldn’t tell what it said in the first place.
It never occurred to me that it bases it on everyone’s entries. So clever!
8.
MangoChutney | November 15, 2011 at 6:53 pm
Very interesting! Thanks for sharing! (^__^)v
9.
Fred Jones | February 22, 2012 at 11:57 am
YES, MASTER.