Using OCR (Optical Character Recognition) to Read Very Simple Captcha Text


Nowadays most blog and forum software integrates some form of captcha to keep automated OCR bots from flooding blog comments or forum posts with spam. These automated bots (which I will not discuss here) learn by studying different captchas over time. Here is an example of an OCR bot trying to learn a captcha through brute force; the forum software logs the “access denied” attempts:

[Screenshot: forum log of “access denied” attempts]

On Linux there is an OCR program called gocr, which can be taught to recognize captcha text (or any text in image format). Here is an example:

[Screenshot: gocr example]

gocr has many options, including filtering noise out of the image and either using its default database or adding your own entries for it to learn from.
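To make those options a bit more concrete, here is a rough Python sketch that assembles a gocr command line and runs it. The function names, the file name captcha.pnm and the ./db/ directory are my own placeholders, not anything gocr prescribes. The flags are the ones I understand from the man page (-i input, -d dust size for noise removal, -p database path, -m 2 to use the database, -m 130 to extend it, -C to restrict the character set), so double-check them against your installed version.

```python
import shutil
import subprocess


def build_gocr_command(image_path, dust_size=None, db_path=None,
                       use_db=False, learn=False, charset=None):
    """Build an argument list for gocr (hypothetical helper, not part of gocr)."""
    cmd = ["gocr"]
    if learn:
        # Mode 130 extends the database; gocr then prompts for unknown characters.
        cmd += ["-m", "130"]
    elif use_db:
        # Mode 2 consults the database for characters the engine cannot identify.
        cmd += ["-m", "2"]
    if db_path is not None:
        # Database directory; the man page default is ./db/ (note the trailing slash).
        cmd += ["-p", db_path]
    if dust_size is not None:
        # Pixel clusters smaller than this are treated as noise and dropped.
        cmd += ["-d", str(dust_size)]
    if charset:
        # Only recognise characters from this string.
        cmd += ["-C", charset]
    cmd += ["-i", image_path]
    return cmd


def read_captcha(image_path, **options):
    """Run gocr on an image and return whatever text it prints."""
    if shutil.which("gocr") is None:
        raise RuntimeError("gocr was not found on PATH")
    result = subprocess.run(build_gocr_command(image_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For instance, read_captcha("captcha.pnm", dust_size=5, use_db=True, db_path="./db/") would try to read the image while ignoring small specks and falling back on a trained database. Learning mode (learn=True) is interactive, so it is better to run that command directly in a terminal rather than through read_captcha.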

Check out the gocr man page for the full list of options: http://www.penguin-soft.com/penguin/man/1/gocr.html


