Training Data

You will need to provide a trained data file for a language. Trained data files can be found in the Tesseract Tessdata Repository.

The extension currently supports v4 of the language data, which is available on GitHub in the following release:

https://github.com/tesseract-ocr/tessdata/releases/tag/4.1.0

You need to provide a file for each language you intend to scan for. The extension will search for the LANG.traineddata language file in the tessdata directory.

Make sure that this tessdata directory is located at the root (top level) of your AIR application.

For example, specifying eng+ita will search for eng.traineddata and ita.traineddata.

.
|___ tessdata
     |___ eng.traineddata
     |___ ita.traineddata
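
A missing language file can be hard to diagnose, so it may help to confirm the files are packaged before starting recognition. The following is a minimal sketch, not part of the extension: findMissingTrainedData is a hypothetical helper, and it assumes the default layout described above (a tessdata directory at the root of the application) and that the packaged files are visible through File.applicationDirectory on your target platform.

```actionscript
import flash.filesystem.File;

// Hypothetical helper (not part of the extension API): returns the names of
// any LANG.traineddata files that are missing from the tessdata directory.
function findMissingTrainedData( language:String ):Array
{
	var missing:Array = [];

	// Assumption: tessdata sits at the root of the AIR application
	var tessdata:File = File.applicationDirectory.resolvePath( "tessdata" );

	// A language string such as "eng+ita" names one file per language
	for each (var lang:String in language.split( "+" ))
	{
		var fileName:String = lang + ".traineddata";
		if (!tessdata.resolvePath( fileName ).exists)
		{
			missing.push( fileName );
		}
	}
	return missing;
}

var missing:Array = findMissingTrainedData( "eng+ita" );
if (missing.length > 0)
{
	trace( "Missing trained data: " + missing.join( ", " ) );
}
```

Running a check like this once at startup, with the same language string you intend to use for recognition, keeps the two from drifting apart.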

You specify the languages to use in the OCROptions instance:

var options:OCROptions = new OCROptions();
options.language = "eng+ita";

OCR.service.recognise( bitmapData, options );

By default, language is set to eng.

Android

On Android, you can specify a different location for the tessdata directory by setting the OCROptions.dataPath property to an alternative location. However, we suggest using the same location as on iOS to keep your builds consistent across platforms.