Posted on Feb 7 2012

Editor’s Note: Guest Author Ajeet Pratap Maurya is a Software Engineer specializing in .NET technologies and iOS development.

How many of you have encountered a CAPTCHA while filling in an online form? And how many times have you been annoyed at having to type one in?

Luis von Ahn, one of the creators of CAPTCHA, said in his TED talk that people waste about 10 seconds every time they fill in a CAPTCHA. According to him, more than 200 million CAPTCHAs are entered per day on the Internet. At 10 seconds each, 200 million CAPTCHAs add up to about 2 billion seconds, or more than 500,000 hours, every day. Nowadays many websites use reCAPTCHA, the initiative Luis von Ahn started to put those 500,000 hours to good use.
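The arithmetic behind the daily total can be checked with a few lines of Python. This is only a sketch: the two inputs are the figures quoted above from the talk, and the constant and function names are made up for illustration.

```python
# Figures quoted in the article from Luis von Ahn's TED talk.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def hours_spent_per_day(captchas=CAPTCHAS_PER_DAY, seconds_each=SECONDS_PER_CAPTCHA):
    """Total human time spent typing CAPTCHAs per day, in hours."""
    total_seconds = captchas * seconds_each
    return total_seconds / 3600  # 3600 seconds in one hour


print(f"{hours_spent_per_day():,.0f} hours per day")  # 555,556 hours per day
```

So the two quoted figures together imply a little over half a million hours a day.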

The question is how entering a reCAPTCHA helps to digitize human knowledge. Scanning a textbook gives us an image of every page with the text on it. In the next step, the computer has to decipher all of the words in that image using a technology called OCR (Optical Character Recognition).

The problem is that OCR is not perfect: the computer cannot recognize many of the words in old textbooks (around 30-40 years old), where the ink has faded and the pages have turned yellow. So the words you see in a reCAPTCHA include words from scanned books that the computer could not recognize.

Of the two words shown in a reCAPTCHA, one comes from a digitized textbook and is a word the computer could not recognize; the other is a word the computer already knows. The user types in both words. If the user gets the known word right, their answer for the unknown word is treated as a reasonable guess. This process is repeated for, say, 10 different users, and if all of them enter the same text for the unknown word, one more word has been correctly digitized. So every time you book a ticket, create an account, poke one of your friends on Facebook, create a Twitter account, etc., you are actually helping to digitize a book.
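The two-word idea described above can be sketched in Python. This is not reCAPTCHA's actual implementation, only an illustration of the rule as stated in this post: the class name, method names, and the unanimous-agreement rule are all assumptions, and the threshold of 10 is the "say 10" figure from the text.

```python
from collections import Counter

REQUIRED_VOTES = 10  # "say 10 different users"; an illustrative value, not a known setting


class UnknownWord:
    """Collects guesses for one scanned word that OCR could not read (hypothetical class)."""

    def __init__(self, required_votes=REQUIRED_VOTES):
        self.required_votes = required_votes
        self.votes = Counter()

    def submit(self, control_answer, control_truth, unknown_answer):
        """Record one user's answers; return True if the user passes the CAPTCHA."""
        # The control word has a known answer, so it alone decides pass/fail.
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False
        # The user got the control word right, so their guess is counted as a vote.
        self.votes[unknown_answer.strip().lower()] += 1
        return True

    def resolved(self):
        """Return the agreed text once enough users agree unanimously, else None."""
        total = sum(self.votes.values())
        if total >= self.required_votes and len(self.votes) == 1:
            return next(iter(self.votes))
        return None


# Usage with a lower threshold so the example stays short.
word = UnknownWord(required_votes=3)
for _ in range(3):
    word.submit("morning", "morning", "overlooks")
print(word.resolved())  # overlooks
```

In this sketch a single disagreeing vote keeps the word unresolved forever, which is a deliberate simplification of "if all of them entered that word correct"; a production system would need some way to handle disagreement.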

Because so many websites use reCAPTCHA, the number of words digitized per day is very large: about 100 million words per day, which is equivalent to about 2.5 million books a year, and all of this is done just by people typing CAPTCHAs on the Internet.
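As a rough sanity check, the two figures above can be related to each other in Python. The sketch assumes a 365-day year and that both numbers describe the same period; the resulting words-per-book value is only what those figures imply, not a number given in the talk.

```python
# Figures quoted in the article.
WORDS_PER_DAY = 100_000_000
BOOKS_PER_YEAR = 2_500_000
DAYS_PER_YEAR = 365  # assumption: digitization runs every day of the year


def implied_words_per_book(words_per_day=WORDS_PER_DAY,
                           books_per_year=BOOKS_PER_YEAR,
                           days=DAYS_PER_YEAR):
    """Average book length implied by the words-per-day and books-per-year figures."""
    return words_per_day * days / books_per_year


print(f"{implied_words_per_book():,.0f} words per book")  # 14,600 words per book
```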

