Slog

Wednesday, September 16, 2009

Now You Too Can Do Archiving Work For Google

Posted by on Wed, Sep 16, 2009 at 3:14 PM

[Image: 2269029028_acb1c52622.jpg]
Google has bought reCAPTCHA, a company that puts CAPTCHA technology to productive use.

[T]he words in many of the CAPTCHAs provided by reCAPTCHA come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users.

I love it when an annoying but necessary ritual can be transformed into something useful like this.

(Image from this blog post about the Top 10 Worst CAPTCHAs.)

 

Comments (11)
julie russell 1
I HATE those Effing boxes...I have a neurological disorder that makes me show up as a robot on like EVERY site...Seriously, my husband has to type the code for me.
Posted by julie russell http:// on September 16, 2009 at 3:43 PM
2
but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

So they're training machines to get past the mechanism designed to prevent machines from being used to abuse Internet services? Be afraid.
Posted by uh oh. on September 16, 2009 at 3:45 PM
w7ngman 3
#2 it's just a stupid sentence. Crowds aren't "teaching computers to read the scanned text", they are telling the computer what it says.

What I don't get is how a reCAPTCHA is verified when it's first put into the system.
Posted by w7ngman http://userscripts.org/users/89370 on September 16, 2009 at 3:51 PM
Grrr 4
I'd like to see a CAPTCHA for death metal band logos.
Posted by Grrr on September 16, 2009 at 4:01 PM
5
@3 -"Crowds aren't "teaching computers to read the scanned text", they are telling the computer what it says."

And they're using it to make OCR systems much, much more sophisticated - and if given enough training examples, they will then be able to read CAPTCHAs. That's how machine learning works.

And as for your question on how they're verified? There's an easy way: get multiple users to enter it, and use majority rule.
Posted by uh oh again. on September 16, 2009 at 4:10 PM
6
@5 This is begging for a Colbert Nation gaming of the system.
Posted by pragmatic on September 16, 2009 at 4:12 PM
w7ngman 7
#5, "get multiple users to enter it, and use majority rule."

I know that's how it works, once multiple users enter it. But how are their entries verified? It's still a captcha.
Posted by w7ngman http://userscripts.org/users/89370 on September 16, 2009 at 4:19 PM
w7ngman 8
I'm guessing some human enters every word, then they just use the captcha part to check their work.
Posted by w7ngman http://userscripts.org/users/89370 on September 16, 2009 at 4:37 PM
josh 9
usually half of the captcha is known, the other half is being translated. so I'm *guessing* you just have to get the known part right to be proved nonrobotic.
Posted by josh http://www.sciencevsromance.net on September 16, 2009 at 4:42 PM
Julie in Eugene 10
@9 is right. This was on Nova a while back (an interesting profile of the guy who invented CAPTCHA). They pull two words, one a known word, one an unknown word. The known word is for proving yourself to be a human, and the system just assumes if you got the known word right, you probably also got the unknown word right.
Posted by Julie in Eugene on September 16, 2009 at 10:23 PM
11
TRANSLATING work, not ARCHIVING work, you fucking douchebag. can't you even afford a free dictionary, paul?
Posted by mmbb_c on September 16, 2009 at 10:55 PM
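
For readers following the thread, the scheme described in comments 5, 9 and 10 (one known word to prove you are human, one unknown word whose reading is settled by majority rule) can be sketched roughly as below. This is only an illustration of the idea as the commenters describe it, not reCAPTCHA's actual code: the function names, the case-insensitive matching and the `min_votes` threshold are all assumptions made for the example.

```python
from collections import Counter


def is_human(control_answer: str, control_truth: str) -> bool:
    """Pass the user if they typed the known (control) word correctly.

    Matching ignores case and surrounding whitespace (an assumption).
    """
    return control_answer.strip().lower() == control_truth.strip().lower()


def record_guess(votes: Counter, control_answer: str, control_truth: str,
                 unknown_answer: str) -> bool:
    """Count the user's reading of the unknown word only if they passed
    the control word. Returns True when the user is treated as human."""
    if not is_human(control_answer, control_truth):
        return False
    votes[unknown_answer.strip().lower()] += 1
    return True


def accepted_reading(votes: Counter, min_votes: int = 3):
    """Return the majority reading once enough trusted users agree,
    otherwise None. min_votes is a hypothetical threshold."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    total = sum(votes.values())
    if count >= min_votes and count > total / 2:
        return word
    return None


# Usage: three users pass the control word "morning" and agree on the
# unknown word; one fails the control word, so the guess is discarded.
votes = Counter()
record_guess(votes, "morning", "morning", "Gazette")
record_guess(votes, "Morning", "morning", "gazette")
record_guess(votes, "mourning", "morning", "spam")
record_guess(votes, "morning", "morning", "gazette")
print(accepted_reading(votes))
```

Note that the unknown word is never checked directly. As comment 10 puts it, the system just assumes that someone who got the known word right probably got the other one right too, and the vote count covers for the occasional mistake.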


All contents © Index Newspapers, LLC
1535 11th Ave (Third Floor), Seattle, WA 98122