Using Custom OCR Language Packs with IronOCR
How to create custom language packs for use in IronOCR?
Creating a custom language pack requires training a new Tesseract 4 LSTM language file (dictionary) from a font.
Many online tutorials explain the required steps. The process is not simple, but it is well documented.
As a good place to start, we suggest this YouTube tutorial from Gabriel Garcia (no affiliation) and their linked GitHub repository.
Once training is complete, the output is a .traineddata file.
Documentation: IronOCR Custom Languages
The .traineddata file can then be referenced in IronOCR as follows:
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Initialize the IronTesseract OCR engine
        var ocr = new IronTesseract();

        // Load your custom Tesseract language file (trained .traineddata file)
        ocr.UseCustomTesseractLanguageFile("mydir/custom.traineddata"); //<--- your new font

        // Multiple fonts can be used by calling the method multiple times with different files

        // Load an image into the OCR input for processing
        using (var input = new OcrInput(@"images\image.png"))
        {
            // Perform OCR on the input image
            var result = ocr.Read(input);

            // Output the recognized text to the console
            Console.WriteLine(result.Text);
        }
    }
}
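The comment above notes that several fonts can be loaded by calling the method once per file. The following is a minimal sketch of that idea, assuming all trained files live in one directory. The `CustomLanguageLoader` class and its `FindTrainedData` helper are hypothetical names introduced for illustration and are not part of IronOCR; only `IronTesseract`, `UseCustomTesseractLanguageFile`, `OcrInput`, and `Read` come from the example above.

```csharp
using System;
using System.IO;
using System.Linq;
using IronOcr;

// Hypothetical helper (not part of IronOCR): locates trained language files on disk.
static class CustomLanguageLoader
{
    // Returns the full paths of all .traineddata files in a directory, sorted by name.
    // Returns an empty array if the directory does not exist.
    public static string[] FindTrainedData(string directory)
    {
        if (!Directory.Exists(directory))
        {
            return Array.Empty<string>();
        }

        return Directory.GetFiles(directory, "*.traineddata")
                        .OrderBy(path => path, StringComparer.Ordinal)
                        .ToArray();
    }
}

class Program
{
    static void Main()
    {
        // "mydir" is the directory used in the example above; adjust it to your layout.
        var files = CustomLanguageLoader.FindTrainedData("mydir");
        if (files.Length == 0)
        {
            Console.Error.WriteLine("No .traineddata files found in mydir.");
            return;
        }

        var ocr = new IronTesseract();

        // Call the method once per file, as the comment in the original example describes.
        foreach (var file in files)
        {
            ocr.UseCustomTesseractLanguageFile(file);
        }

        using (var input = new OcrInput(@"images\image.png"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Checking for the files before constructing the engine means a mistyped directory is reported up front rather than surfacing later as an error from the OCR engine.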





