You May Have Helped Translate Books Without Knowing It

September 29, 2011 at 2:00 am

By Chad Upton | Editor

If you’ve created a website account with Facebook, TicketMaster, Twitter, CNN, Craigslist or thousands of other sites, then you have helped translate text from old books and newspapers.

Websites that offer free accounts try to ensure that every account is created by a real human being instead of a “bot,” a computer program written to automatically create accounts and then spam the website with ads. One way to protect against bots is to have people do something that is easy for a human but difficult for a computer: read distorted text.

That’s why you are frequently asked to solve these simple word puzzles. Although it’s a bit annoying, it helps make the website better by reducing spam and other abuse. There are many variants of these puzzles, called captchas, and if the website uses the reCAPTCHA system, your work can also benefit society.

You see, the unclear words in a reCAPTCHA are scanned from old books and newspapers for the Google Books project. After the pages are scanned, optical character recognition (OCR) software converts the images into text, but such programs aren’t very good at reading blurry words. That’s exactly why these words make a perfect test of whether someone is human: a computer has already tried to read them and failed.

The solution you enter is compared to the solutions that other people have entered for the same word. When the program gets enough consistent answers from multiple people, that word is corrected in the digital copy of the book or newspaper.
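
Here is a rough sketch of how that comparison might work, written in Python. It is not reCAPTCHA's actual code: the class and function names, the threshold of three matching answers, and the lowercase normalization are all assumptions made for illustration. It also folds in a detail the author mentions in the comments, that each puzzle pairs the unknown word with a known word, and only the known word decides whether you get through.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many identical answers count as "consistent".
# The article does not say what number the real system uses.
AGREEMENT_THRESHOLD = 3


class WordConsensus:
    """Collects human answers for words the scanning software could not read."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # word_id -> tally of typed answers
        self.resolved = {}                 # word_id -> accepted text

    def submit(self, word_id, answer):
        """Record one answer; return the accepted text once enough people agree."""
        if word_id in self.resolved:
            return self.resolved[word_id]
        # Assumption: ignore case and surrounding spaces when comparing answers.
        normalized = answer.strip().lower()
        self.votes[word_id][normalized] += 1
        if self.votes[word_id][normalized] >= self.threshold:
            self.resolved[word_id] = normalized
        return self.resolved.get(word_id)


def check_captcha(consensus, known_answer, typed_known, unknown_id, typed_unknown):
    """Pass or fail on the known word only; count the other word as a vote."""
    if typed_known.strip().lower() != known_answer.lower():
        return False  # failed the known word, so the other answer is discarded
    consensus.submit(unknown_id, typed_unknown)
    return True


if __name__ == "__main__":
    consensus = WordConsensus()
    # Four visitors all type the known word "upon" correctly,
    # then give their best reading of the blurry word "scan-17".
    for guess in ["morning", "Morning ", "mourning", "morning"]:
        check_captcha(consensus, "upon", "upon", "scan-17", guess)
    print(consensus.resolved)  # {'scan-17': 'morning'}
```

The one-off misreading ("mourning") never reaches the threshold, so it is simply outvoted. A real system would probably need to treat case, punctuation, and near-misses more carefully than this sketch does.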

This effort might sound futile, but approximately 200 million of these little puzzles are solved each day. Although each one takes you only a few seconds, together they add up to around 150,000 hours of work per day. This is called “crowd-sourcing”: small contributions from a lot of people that collectively add up to a significant contribution to something grand.
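
If you want to check the arithmetic yourself, here is a quick back-of-the-envelope calculation in Python. The 200 million figure comes from the article; the seconds-per-puzzle values are guesses, since "a few seconds" is all we are told, and the function names are made up for this example.

```python
PUZZLES_PER_DAY = 200_000_000  # figure quoted in the article
SECONDS_PER_HOUR = 3600


def hours_per_day(seconds_per_puzzle):
    """Total human hours spent per day at a given solving time per puzzle."""
    return PUZZLES_PER_DAY * seconds_per_puzzle / SECONDS_PER_HOUR


def seconds_per_puzzle(total_hours_per_day):
    """Solving time per puzzle implied by a daily total of human hours."""
    return total_hours_per_day * SECONDS_PER_HOUR / PUZZLES_PER_DAY


if __name__ == "__main__":
    # Assumed solving times; the article only says "a few seconds".
    for seconds in (3, 5, 10):
        print(f"{seconds:>2} s per puzzle -> {hours_per_day(seconds):,.0f} hours per day")
    print(f"150,000 hours per day implies {seconds_per_puzzle(150_000):.1f} s per puzzle")
```

At about three seconds per puzzle, 200 million puzzles come to roughly 167,000 hours in a single day, and a daily total of 150,000 hours works out to 2.7 seconds each.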

Sources: Wikipedia (reCAPTCHA), recaptcha.com

Entry filed under: Computers and Internet.

10 Comments

  • 1. Richard  |  September 29, 2011 at 2:20 am

    Hi Chad Upton,
    This crowd-sourcing is really a good idea! But I want to know: how does the ‘Captcha’ application reject a wrong input when the ‘unclear word’ is posted for the first time?

    • 2. Elizabeth  |  September 29, 2011 at 7:56 am

      Haven’t you ever typed the word in and been sure you typed it correctly but had it rejected anyway?

      • 3. Ash  |  January 21, 2012 at 12:47 pm

        Hell yeah, I hate that.

    • 4. Chad Upton  |  September 30, 2011 at 7:39 am

      Good question. reCAPTCHA uses two words, and one of them is always known. If you get the known word wrong, it will reject your entry.

  • 5. Alef  |  September 29, 2011 at 12:21 pm

    But how will the very first entry ever get accepted?

  • 6. ibneaters  |  September 29, 2011 at 2:28 pm

    In my experience, if you get it close, it usually goes through.

  • 7. Jim  |  September 30, 2011 at 5:51 pm

    Great article. I’ve heard that they were used to translate old text, but it never made sense to me how the computer would know whether or not you were right if it couldn’t tell what it said in the first place.
    It never occurred to me that it bases it on everyone’s entries. So clever!

  • 8. MangoChutney  |  November 15, 2011 at 6:53 pm

    Very interesting! Thanks for sharing! (^__^)v

  • 9. Fred Jones  |  February 22, 2012 at 11:57 am

    YES, MASTER.

  • 10. Makandriaco  |  July 10, 2013 at 10:36 am

    As Chad wrote before, every reCAPTCHA has two words. One of them is known; it does not necessarily come from the same repository and has been created for that purpose. That word is the only one used for validation, so you need to type it correctly. You could type anything for the other word and it will still let you through. The system takes what you wrote and compares it to what others write; when enough exact matches are made, the word is considered translated.
