Class TesseractConfiguration
A configuration object that fine-tunes Tesseract behavior for an individual instance. It gives access to every option available to users of the Tesseract command line or C++ API.
Inheritance
Implements
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class TesseractConfiguration : Object, ICloneable
Constructors
TesseractConfiguration()
Declaration
public TesseractConfiguration()
Fields
EngineMode
Allows the developer to choose the algorithm Tesseract will use for OCR. TesseractAndLstm is the recommended behavior for IronOCR.
Declaration
public TesseractEngineMode EngineMode
Field Value
Type | Description |
---|---|
TesseractEngineMode |
PageSegmentationMode
Determines how a page is scanned to find potential blocks of text. The available modes are best documented on the Tesseract developer websites.
AutoOsd is a safe default.
Declaration
public TesseractPageSegmentationMode PageSegmentationMode
Field Value
Type | Description |
---|---|
TesseractPageSegmentationMode |
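As a minimal sketch of the two fields above, the snippet below builds a configuration with the recommended engine mode and the safe default segmentation mode. It uses only the constructor, field names, and enum members named on this page. How the configuration is then attached to an OCR engine is not covered by this reference and is left out.

```csharp
using IronOcr;

// Set both fields through an object initializer.
// TesseractAndLstm and AutoOsd are the values this page recommends.
var config = new TesseractConfiguration
{
    EngineMode = TesseractEngineMode.TesseractAndLstm,
    PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
};

// Print the chosen values to confirm what was set.
System.Console.WriteLine($"{config.EngineMode}, {config.PageSegmentationMode}");
```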
ReadBarCodes
Optionally turns on Barcode reading alongside OCR.
Declaration
public bool ReadBarCodes
Field Value
Type | Description |
---|---|
System.Boolean |
ReadDataTables
Optionally turns on Table Detection and Parsing. To see tables in the OcrResult, access the Tables property.
var result = Ocr.Read(ocrInput);
var tables = result.Tables; // Tables detected during the read
Declaration
public bool ReadDataTables
Field Value
Type | Description |
---|---|
System.Boolean |
RenderHocr
Prerenders HOCR files during Tesseract read operations. Must be true to use the SaveAsHocrFile(String) method.
Declaration
public bool RenderHocr
Field Value
Type | Description |
---|---|
System.Boolean |
RenderSearchablePdf
Enables the creation of a searchable PDF of the OcrInput in memory during Tesseract read operations. Must be true to save a searchable PDF with the SaveAsSearchablePdf(String) methods.
Declaration
public bool RenderSearchablePdf
Field Value
Type | Description |
---|---|
System.Boolean |
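The render flags have to be set before the read so that the output exists when it is saved. The sketch below shows that ordering. It assumes an IronTesseract engine that exposes this object through a Configuration property, which this page does not document. The class name, method name, and path parameters are hypothetical, and the OcrInput is expected to be prepared by the caller.

```csharp
using IronOcr;

static class RenderOutputExample
{
    // Hypothetical helper: 'input' is loaded by the caller,
    // 'pdfPath' and 'hocrPath' are output locations chosen by the caller.
    public static void ReadAndSave(OcrInput input, string pdfPath, string hocrPath)
    {
        // Assumption: IronTesseract.Configuration is a TesseractConfiguration.
        var ocr = new IronTesseract();

        // Both flags must be true before Read, or the save calls have nothing to write.
        ocr.Configuration.RenderSearchablePdf = true;
        ocr.Configuration.RenderHocr = true;

        OcrResult result = ocr.Read(input);

        result.SaveAsSearchablePdf(pdfPath);
        result.SaveAsHocrFile(hocrPath);
    }
}
```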
TesseractVariables
Add Tesseract configuration variables of type bool, int, double or string. Gives access to all Tesseract command-line and config file options.
Tesseract's own documentation of these variables is very sparse. Use TrySaveAllTesseractVariablesToFile(String) to output all available Tesseract variables for your installation of Tesseract.
To learn more about how to use TesseractVariables see our guide at: https://ironsoftware.com/csharp/ocr/docs/questions/csharp-tesseract-config-configuration-variables/
Declaration
public Dictionary<string, object> TesseractVariables
Field Value
Type | Description |
---|---|
System.Collections.Generic.Dictionary<System.String, System.Object> |
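A short sketch of setting variables through the dictionary follows. The keys load_system_dawg and user_defined_dpi are standard Tesseract parameters chosen only for illustration. Whether they suit a given document is a separate question. The sketch also assumes the constructor initializes the dictionary, which this page does not state.

```csharp
using IronOcr;

var config = new TesseractConfiguration();

// Assumption: TesseractVariables is non-null after construction.
// Values are boxed as object, so the C# type should match the Tesseract variable type.
config.TesseractVariables["load_system_dawg"] = false; // bool variable
config.TesseractVariables["user_defined_dpi"] = 300;   // int variable

// List what has been set so far.
foreach (var pair in config.TesseractVariables)
{
    System.Console.WriteLine($"{pair.Key} = {pair.Value}");
}
```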
TesseractVersion
IronOcr supports Tesseract 5.1
Declaration
public TesseractVersion TesseractVersion
Field Value
Type | Description |
---|---|
TesseractVersion |
Properties
AutoRotateDetectionForRenderSearchablePdf
Enables auto-rotate detection to improve the accuracy of text size and coordinates when creating a searchable PDF of the OcrInput in memory, without having to manually determine the rotation angle of the OcrInput and call Rotate(Double).
Declaration
public bool AutoRotateDetectionForRenderSearchablePdf { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
Remarks
This setting is automatically set to false when RenderSearchablePdf is set to false. Even when this setting is true, auto-rotate detection is not available on macOS.
BlackListCharacters
If set, any characters in this string will not be recognized by IronTesseract OCR. An example use-case is to remove characters with accents.
BlackListCharacters and WhiteListCharacters can positively impact speed and accuracy if set thoughtfully.
Declaration
public string BlackListCharacters { get; set; }
Property Value
Type | Description |
---|---|
System.String |
See Also
WhiteListCharacters
If set, only characters in this string will be read by IronTesseract. Remember to include punctuation marks and space characters.
If the expected character set is known, WhiteListCharacters can dramatically increase performance and accuracy.
It is also very useful when only numbers or only letters are expected.
Declaration
public string WhiteListCharacters { get; set; }
Property Value
Type | Description |
---|---|
System.String |
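As an illustration of the numbers-only case, the sketch below restricts recognition to digits plus the punctuation and space characters that typically appear in numeric fields. The exact character set is an assumption and should be adjusted to the documents being read.

```csharp
using IronOcr;

// Digits, decimal and thousands separators, minus sign, and the space character.
// Anything outside this string will not be read.
var numericConfig = new TesseractConfiguration
{
    WhiteListCharacters = "0123456789.,- "
};

System.Console.WriteLine($"Whitelist: \"{numericConfig.WhiteListCharacters}\"");
```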
Methods
Clone()
Clones this TesseractConfiguration.
Declaration
public object Clone()
Returns
Type | Description |
---|---|
System.Object | A copy of this TesseractConfiguration as an object. |
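Because Clone() returns System.Object, the result needs a cast before it can be used as a configuration. The sketch below derives a specialized configuration from a shared base. It assumes the copy is an independent object. This page does not say whether reference-type members such as TesseractVariables are copied deeply, so the sketch only reassigns a string property on the copy.

```csharp
using IronOcr;

var baseConfig = new TesseractConfiguration { ReadBarCodes = true };

// Cast back to the concrete type, since Clone() is declared as returning object.
var digitsConfig = (TesseractConfiguration)baseConfig.Clone();

// Reassigning a property on the copy leaves the original's property untouched.
digitsConfig.WhiteListCharacters = "0123456789";
```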
TrySaveAllTesseractVariablesToFile(String)
Saves all Tesseract internal settings for this Configuration to a plain text file.
Declaration
public bool TrySaveAllTesseractVariablesToFile(string Path)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path | A valid file path. Recommended file extension is .txt |
Returns
Type | Description |
---|---|
System.Boolean | True if the file was saved successfully |
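Since the method reports failure through its return value, callers should check the result. A minimal sketch, where the output file name is a hypothetical choice:

```csharp
using IronOcr;

var config = new TesseractConfiguration();

// Hypothetical output location; .txt is the extension this page recommends.
string path = "tesseract-variables.txt";

if (config.TrySaveAllTesseractVariablesToFile(path))
{
    System.Console.WriteLine($"Tesseract variables written to {path}");
}
else
{
    System.Console.WriteLine($"Could not write Tesseract variables to {path}");
}
```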