Linux.com

How to scan and OCR like a pro with open source tools

Posted by: Anonymous [ip: 71.234.246.22] on June 25, 2008 04:01 AM
Tesseract is the OCR package to use. I've had good success using it for a health-information application. It is very accurate on numbers (like dates and record numbers), even with a variety of fonts.

You may find it helpful to put an alarm (a timeout) around your tesseract runs, so that a stuck run gets killed instead of blocking the whole batch. I've found some particularly nasty pages on which it hangs.
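One way to do the alarm, as a rough sketch rather than anything definitive: run tesseract as a child process from Python and let `subprocess.run` kill it if it exceeds a time limit. The helper names `run_with_timeout` and `ocr_page`, the 60-second default, and the file names are my own assumptions for illustration; the only tesseract usage assumed is the basic `tesseract <image> <outputbase>` form.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd; return True if it exits with status 0 within the time limit.

    If the limit is exceeded, subprocess.run kills the child process and
    raises TimeoutExpired, which is reported here as a failure (False).
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


def ocr_page(image_path, output_base, seconds=60):
    """OCR one page with the tesseract CLI, giving up after `seconds`.

    The 60-second default is an arbitrary guess; tune it to your pages.
    Tesseract writes its text to output_base + ".txt".
    """
    return run_with_timeout(["tesseract", image_path, output_base], seconds)
```

Something like `ocr_page("page-017.tif", "page-017")` (hypothetical file names) then returns False for a page that hangs, so you can log it and move on to the next one.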
