Custom OCR Language Packs

How do I create custom language packs for use in IronOCR?

Creating a custom language pack requires training a new Tesseract 4 LSTM language file (dictionary) from a font.

Many online tutorials explain the steps required to do this. The process is not simple, but it is well documented.

As a good place to start, we suggest this YouTube tutorial:

Once complete, the output will be a .traineddata file.

The .traineddata file can then be referenced in IronOCR as follows:


using IronOcr;

var Ocr = new IronTesseract();
// Load the custom language file produced by training.
Ocr.UseCustomTesseractLanguageFile("mydir/custom.traineddata");  // <--- your new font
// Multiple fonts can be used.

// The OcrInput is disposed when the using block ends.
using (var Input = new OcrInput(@"images\image.png"))
{
    var Result = Ocr.Read(Input);
}
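
A wrong path to the .traineddata file is an easy mistake to make, so it can help to check the path before handing it to IronOCR. The following is a minimal sketch of such a check, using only System.IO. The class name TrainedDataPath and its Resolve method are hypothetical helpers invented for illustration; they are not part of IronOCR.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the IronOCR API.
public static class TrainedDataPath
{
    // Returns the absolute path of a .traineddata file,
    // or throws if the extension is wrong or the file is missing.
    public static string Resolve(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentException("A file path is required.", nameof(path));

        // Training is expected to produce a file with the .traineddata extension.
        string extension = Path.GetExtension(path);
        if (!string.Equals(extension, ".traineddata", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Expected a .traineddata file: " + path, nameof(path));

        // Resolve relative paths against the current working directory.
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
            throw new FileNotFoundException("Language file not found.", fullPath);

        return fullPath;
    }
}
```

With this helper in the project, the resolved path can be passed straight through, for example Ocr.UseCustomTesseractLanguageFile(TrainedDataPath.Resolve("mydir/custom.traineddata")). A missing or misnamed file then fails early with a clear message.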