How to Use a Custom Language with Tesseract

When it comes to optical character recognition (OCR), you sometimes need to deal with custom languages, specialized scripts, or ciphers. To read an input image containing a custom language, the Tesseract engine must be provided with training data for that specific language. This data is stored in a special .traineddata file.

Although creating (training) this file is a complex process handled by Tesseract's own tools, IronOCR fully supports using the resulting custom language files. This lets you apply your trained model to decipher and read text from your input images and PDFs. In this how-to guide, we'll show how to load and use a custom .traineddata file with IronOCR.

Custom Language with Tesseract

To use a custom language with Tesseract, we must first load our .traineddata file by calling the UseCustomTesseractLanguageFile method. This is an essential step, as this file contains all the training data that allows Tesseract to recognize the custom language's unique characters.

Afterward, we load our input document just as we would for a regular OCR operation. In this instance, we are loading a PDF containing custom language paragraphs using LoadPdf.

Finally, we use the Read method to extract the text from the input. The result can be printed to the console or, as the example shows, also written to a text file for later reference.
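Because the whole process depends on the `.traineddata` file loaded in the first step, it can help to confirm that the file is actually present before handing it to the engine. The sketch below is a minimal, hedged example: the file name `AMGDT.traineddata` comes from this page's example, while the helper `IsUsableLanguageFile` and the error message are hypothetical additions of ours, not part of IronOCR's API.

```csharp
using IronOcr;
using System;
using System.IO;

// Hypothetical helper: true only for an existing file with a .traineddata extension
static bool IsUsableLanguageFile(string path) =>
    path.EndsWith(".traineddata", StringComparison.OrdinalIgnoreCase) && File.Exists(path);

// Custom language file name taken from the example on this page
string languageFile = "AMGDT.traineddata";

if (IsUsableLanguageFile(languageFile))
{
    var ocrTesseract = new IronTesseract();

    // Load the training data for the custom language
    ocrTesseract.UseCustomTesseractLanguageFile(languageFile);
    Console.WriteLine($"Loaded custom language file: {languageFile}");
}
else
{
    // Report the resolved path so a wrong working directory is easy to spot
    Console.Error.WriteLine($"Custom language file not found: {Path.GetFullPath(languageFile)}");
}
```

Checking the extension as well as the path also catches an easy-to-make `.traindata` typo in the file name.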

Input

We'll use this sample PDF, which contains text in our custom language, as the input.

We'll be using this custom language .traineddata file for our example.

Code Example

using IronOcr;
using System;
using System.IO;

var ocrTesseract = new IronTesseract();

// Load the traineddata file for the custom language
ocrTesseract.UseCustomTesseractLanguageFile("AMGDT.traineddata");

using var ocrInput = new OcrInput();
// Load the PDF containing text in the custom language
ocrInput.LoadPdf("custom.pdf");

var ocrResult = ocrTesseract.Read(ocrInput);

// Print text to the console
Console.WriteLine("--- OCR Result ---");
Console.WriteLine(ocrResult.Text);
Console.WriteLine("------------------");

// Save the text to a .txt file
string outputFilePath = "ocr_output.txt";
File.WriteAllText(outputFilePath, ocrResult.Text);

Console.WriteLine($"\nSuccessfully saved text to {outputFilePath}");

Output

OCR Output text

This output shows the result from our custom language model. With the correct training data loaded, IronOCR deciphers the text and returns it in plain English. The same text is also saved to ocr_output.txt, the text file generated by the code.

Frequently Asked Questions

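What is the purpose of using a custom Tesseract language in IronOCR?

Using a custom Tesseract language in IronOCR lets you recognize and extract text from images or PDF documents that contain specialized scripts or languages not supported by default. This is done by loading a custom `.traineddata` file containing the training data required for that language.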

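How do I load a custom language training data file in IronOCR?

You can load a custom language training data file in IronOCR with the `UseCustomTesseractLanguageFile` method. This step is essential because it gives the Tesseract engine the training data it needs to recognize the custom language's unique characters.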

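What are the steps to perform OCR on an image containing a custom language with IronOCR?

To perform OCR on an image in a custom language with IronOCR, first download the C# library, then initialize the OCR engine, load the custom language training data with `UseCustomTesseractLanguageFile`, load the input image with `LoadImage`, and finally extract the text with the `Read` method.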

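Can IronOCR process PDF files containing text in a custom language?

Yes, IronOCR can process PDF files containing text in a custom language. Load the PDF with the `LoadPdf` method, then use the `Read` method to extract the text based on the supplied custom language training data.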

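In the context of Tesseract and IronOCR, what is a `.traineddata` file?

A `.traineddata` file is a data file used by Tesseract OCR that contains the training data for a specific language. It enables the OCR engine to recognize and process that language's characters, and it can be used in IronOCR to handle custom languages.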

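Do I need to create my own `.traineddata` file for each custom language in IronOCR?

No, you do not need to create your own `.traineddata` file for every custom language. You can use an existing `.traineddata` file directly if one is available. However, if a language is not supported, you may need to create one using Tesseract's tools.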

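Which output formats does IronOCR support when using custom languages?

IronOCR supports several output formats when using custom languages, such as plain-text output that can be printed to the console or saved to a text file. The extracted text can be processed further as needed.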
