Tips for Improving OCR Results
Tesseract is a library for performing optical character recognition, but it's important to know that Tesseract performs OCR best when it is given a preprocessed image that is ideally crystal clear black text on a pure white background.
The Tesseract library has a Wiki page on some processes that will improve the recognition results here: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
The key points for optimal recognition are:
- Use a decently scale image (> 300dpi)
- Use a black and white image
- Remove as much noise as possible
- Use a straight image (not skewed)
- Remove any borders