Class TesseractConfiguration
A configuration object that fine-tunes Tesseract behavior at an Instance level. Gives access to every option available to tesseract command line or C++ API users.
Inheritance
Implements
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class TesseractConfiguration : Object
TesseractConfiguration is the settings object that fine-tunes how IronOCR's Tesseract engine recognizes text, exposing the same options a command-line or C++ Tesseract user would reach for. A developer uses it to switch engines, restrict which characters are accepted, turn on barcode or table reading, and emit searchable PDF or hOCR output, all without leaving managed code. It is the knob set most often when out-of-the-box accuracy needs tuning for a specific kind of document.
Every IronTesseract instance owns one through its Configuration property, so the usual pattern is to read that property and set values on it before calling Read. The same object can also be passed to the IronTesseract(TesseractConfiguration) constructor when a prepared configuration is reused across engines. Because the class implements ICloneable, Clone produces an independent copy, which is handy for keeping a baseline configuration and varying one setting per job.
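The baseline-and-variant pattern described above might look like the following minimal sketch. It uses only members named on this page; the specific setting values and the variable names (`baseline`, `digitsOnly`) are illustrative assumptions, not recommendations.

```csharp
using IronOcr;

// Baseline shared by every job (values chosen for illustration only).
var baseline = new TesseractConfiguration
{
    EngineMode = TesseractEngineMode.TesseractAndLstm,
    PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
};

// Clone() is declared as returning object, so cast back to the concrete type.
var digitsOnly = (TesseractConfiguration)baseline.Clone();
digitsOnly.WhiteListCharacters = "0123456789";

// Each engine receives its own prepared configuration through the constructor.
var generalOcr = new IronTesseract(baseline);
var digitsOcr = new IronTesseract(digitsOnly);
```

Keeping the baseline untouched and varying only the clone makes it easier to see which single setting changed a result.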
The settings fall into a few groups. Engine selection covers EngineMode, TesseractVersion, and PageSegmentationMode, which together choose the recognition algorithm, the binary, and the layout-analysis strategy. The output toggles (RenderSearchablePdf, RenderHocr, ReadBarCodes, and ReadDataTables) decide what the read produces beyond plain text. Character control is handled by WhiteListCharacters and BlackListCharacters, which constrain or forbid specific glyphs. For options without a dedicated property, TesseractVariables is a Dictionary<string, object> that passes raw Tesseract variables through, and TrySaveAllTesseractVariablesToFile writes the active set out for inspection. Start from the defaults and change only what a document demands, since over-constraining segmentation or the character set is a common cause of dropped text.
using IronOcr;
var ocr = new IronTesseract();
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;
ocr.Configuration.RenderSearchablePdf = true;
The Tesseract config how-to walks through the options, the fast configuration how-to tunes for speed, and the Tesseract OCR tutorial puts a full read together.
Constructors
TesseractConfiguration()
Declaration
public TesseractConfiguration()
Fields
EngineMode
Allows the developer to choose the algorithm Tesseract will use for OCR. TesseractAndLstm is the recommended behavior for IronOCR.
Declaration
public TesseractEngineMode EngineMode
Field Value
| Type | Description |
|---|---|
| TesseractEngineMode | |
PageSegmentationMode
Determines how a page is scanned to find potential blocks of text. The individual modes are documented most fully on the Tesseract developer websites.
AutoOsd is a safe default.
Declaration
public TesseractPageSegmentationMode PageSegmentationMode
Field Value
| Type | Description |
|---|---|
| TesseractPageSegmentationMode | |
ReadBarCodes
Optionally turns on Barcode reading alongside OCR.
Declaration
public bool ReadBarCodes
Field Value
| Type | Description |
|---|---|
| System.Boolean | |
ReadDataTables
Optionally turns on Table Detection and Parsing. To see tables in the OcrResult, access the Tables property.
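var result = Ocr.Read(ocrInput); // Ocr is an IronTesseract whose Configuration.ReadDataTables is true
var tables = result.Tables;      // tables detected during the read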
Declaration
public bool ReadDataTables
Field Value
| Type | Description |
|---|---|
| System.Boolean | |
RenderHocr
Pre-renders hOCR output during Tesseract read operations. Must be true to use the SaveAsHocrFile(String) method.
Declaration
public bool RenderHocr
Field Value
| Type | Description |
|---|---|
| System.Boolean | |
RenderSearchablePdf
Enables the creation of a Searchable PDF of the OcrInput in memory during Tesseract read operations.
Must be true to save the result as a searchable PDF.
Declaration
public bool RenderSearchablePdf
Field Value
| Type | Description |
|---|---|
| System.Boolean | |
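As a rough illustration of how the two render flags pair with their save calls, the sketch below enables both before a read and then writes both outputs. `SaveAsHocrFile(String)` is named on this page; `SaveAsSearchablePdf` is assumed here to be the matching `OcrResult` method and should be checked against the installed version. The `OcrExport` class, the `SaveOutputs` helper, and its parameters are hypothetical.

```csharp
using IronOcr;

public static class OcrExport
{
    // Hypothetical helper: the caller supplies an already-loaded OcrInput
    // and the two destination paths.
    public static void SaveOutputs(OcrInput input, string pdfPath, string hocrPath)
    {
        var ocr = new IronTesseract();

        // Both flags must be set before Read so the outputs are rendered during the read.
        ocr.Configuration.RenderSearchablePdf = true;
        ocr.Configuration.RenderHocr = true;

        var result = ocr.Read(input);

        // Assumed OcrResult method name; verify it exists in your IronOcr version.
        result.SaveAsSearchablePdf(pdfPath);
        result.SaveAsHocrFile(hocrPath);
    }
}
```

Leaving either flag off when its output is not needed avoids rendering work that the read would otherwise do.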
TesseractVariables
Add Tesseract configuration variables of type bool, int, double or string. Gives access to all Tesseract command-line and config file options.
Tesseract's own documentation of these variables is very sparse. Use TrySaveAllTesseractVariablesToFile(String) to output all available Tesseract variables for your installation of Tesseract.
To learn more about how to use TesseractVariables see our guide at: https://ironsoftware.com/csharp/ocr/docs/questions/csharp-tesseract-config-configuration-variables/
Declaration
public Dictionary<string, object> TesseractVariables
Field Value
| Type | Description |
|---|---|
| System.Collections.Generic.Dictionary<System.String, System.Object> | |
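A minimal sketch of setting raw variables through the dictionary follows. The two variable names are taken from upstream Tesseract and are assumptions as far as any particular installation goes; confirm them against the file written by TrySaveAllTesseractVariablesToFile(String) before relying on them. The sketch also assumes the dictionary is already initialized on a fresh configuration.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Values must be bool, int, double or string, matching the variable's Tesseract type.
// Variable names below come from upstream Tesseract; verify them for your installation.
ocr.Configuration.TesseractVariables["preserve_interword_spaces"] = true; // bool
ocr.Configuration.TesseractVariables["user_defined_dpi"] = 300;           // int
```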
TesseractVersion
IronOcr supports Tesseract 5.1
Declaration
public TesseractVersion TesseractVersion
Field Value
| Type | Description |
|---|---|
| TesseractVersion | |
Properties
BlackListCharacters
If set, any characters in this string will not be recognized by IronTesseract OCR. An example use-case is to remove characters with accents.
BlackListCharacters and WhiteListCharacters can positively impact speed and accuracy if set thoughtfully.
Declaration
public string BlackListCharacters { get; set; }
Property Value
| Type | Description |
|---|---|
| System.String | |
See Also
WhiteListCharacters
If set, only characters in this string will be read by IronTesseract. Remember to include punctuation marks and space characters.
If the expected characters are known in advance, WhiteListCharacters can dramatically increase performance and accuracy.
It is also very useful when only numbers or only letters are expected.
Declaration
public string WhiteListCharacters { get; set; }
Property Value
| Type | Description |
|---|---|
| System.String | |
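The two character lists might be set as in the sketch below. The character sets are illustrative assumptions for two hypothetical jobs (a numeric amounts field and general text without a few stray symbols); they are not recommended values.

```csharp
using IronOcr;

// Job 1: a field expected to contain only amounts.
// The space and punctuation are listed explicitly, as the whitelist accepts nothing else.
var amountsOcr = new IronTesseract();
amountsOcr.Configuration.WhiteListCharacters = "0123456789.,-$ ";

// Job 2: general text where a few symbols are known never to occur.
var textOcr = new IronTesseract();
textOcr.Configuration.BlackListCharacters = "`~|";
```

Using a separate engine per job keeps a restrictive whitelist from silently dropping text in documents it was never meant for.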
Methods
Clone()
Creates a copy of this TesseractConfiguration.
Declaration
public object Clone()
Returns
| Type | Description |
|---|---|
| System.Object | A copy of this TesseractConfiguration as an object. |
TrySaveAllTesseractVariablesToFile(String)
Saves all Tesseract internal settings for this Configuration to a plain text file.
Declaration
public bool TrySaveAllTesseractVariablesToFile(string Path)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | Path | A valid file path. Recommended file extension is .txt |
Returns
| Type | Description |
|---|---|
| System.Boolean | True if the file was saved successfully. |
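A short sketch of dumping the variables and checking the returned flag is shown below; the file name is a hypothetical choice.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Hypothetical output path; .txt is the recommended extension.
const string path = "tesseract-variables.txt";

// The method reports failure through its return value, so check it.
bool saved = ocr.Configuration.TrySaveAllTesseractVariablesToFile(path);

Console.WriteLine(saved
    ? $"Tesseract variables written to {path}"
    : $"Could not write Tesseract variables to {path}");
```

The resulting file is a convenient place to look up exact variable names before adding entries to TesseractVariables.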