
    Class TesseractConfiguration

    A configuration object that fine-tunes Tesseract behavior at the instance level. It gives access to every option available to users of the Tesseract command line or C++ API.

    Inheritance
    System.Object
    TesseractConfiguration
    Implements
    System.ICloneable
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public class TesseractConfiguration : Object, ICloneable
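    A minimal sketch of typical use. It assumes that the `IronTesseract` engine exposes its settings through a `Configuration` property of this type; that property is not documented on this page. Settings are changed on the engine's configuration before calling `Read`.

```csharp
using IronOcr;

// Each IronTesseract engine carries its own TesseractConfiguration instance.
var ocr = new IronTesseract();

// Fine-tune this engine instance only (configuration is per instance).
ocr.Configuration.ReadBarCodes = false;
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
ocr.Configuration.WhiteListCharacters = "0123456789";
```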

    Constructors

    TesseractConfiguration()

    Declaration
    public TesseractConfiguration()
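    The parameterless constructor pairs naturally with an object initializer. This sketch assumes that `IronTesseract.Configuration` is settable. If your IronOcr version exposes it as read-only, set the individual members on the existing instance instead. The blacklist value is only an illustration.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Build a complete configuration up front and hand it to the engine
// (assumes IronTesseract.Configuration has a public setter).
ocr.Configuration = new TesseractConfiguration
{
    EngineMode = TesseractEngineMode.TesseractAndLstm,
    PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
    ReadBarCodes = false,
    RenderSearchablePdf = true,
    BlackListCharacters = "`|^", // illustrative characters to suppress
};
```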

    Fields

    EngineMode

    Selects the recognition algorithm that Tesseract uses for OCR. TesseractAndLstm is the recommended setting for IronOCR.

    Declaration
    public TesseractEngineMode EngineMode
    Field Value
    Type Description
    TesseractEngineMode
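    The following sketch sets the recommended engine mode explicitly. `TesseractAndLstm` is the only member named above. Other members of `TesseractEngineMode` are not listed on this page, so check the enum in your IronOcr version before using them.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Combined Tesseract + LSTM recognition, the mode recommended for IronOCR.
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
```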

    PageSegmentationMode

    Determines how a page is scanned to find potential blocks of text. The individual modes are best documented on the Tesseract developer websites.

    AutoOsd is a safe default.

    Declaration
    public TesseractPageSegmentationMode PageSegmentationMode
    Field Value
    Type Description
    TesseractPageSegmentationMode
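    A one-line sketch that applies the safe default named above. Judging by its name, `AutoOsd` likely corresponds to Tesseract's "automatic page segmentation with orientation and script detection" mode. That mapping is an assumption rather than something this page states.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// AutoOsd: automatic page segmentation, presumably with orientation/script detection.
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
```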

    ReadBarCodes

    Optionally enables barcode reading alongside OCR.

    Declaration
    public bool ReadBarCodes
    Field Value
    Type Description
    System.Boolean
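    A sketch that reads barcodes and text in one pass. It rests on three assumptions. First, `OcrResult` exposes detected barcodes through a `Barcodes` collection whose items have a `Value` property. Second, images are loaded with `OcrInput.AddImage`, which newer IronOcr releases may call `LoadImage`. Third, `invoice.png` is a hypothetical file.

```csharp
using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true; // detect barcodes during the same Read call

const string imagePath = "invoice.png"; // hypothetical input image
if (File.Exists(imagePath))
{
    using var input = new OcrInput();
    input.AddImage(imagePath); // may be LoadImage in newer releases

    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);

    // Assumed API: result.Barcodes with a Value property per barcode.
    foreach (var barcode in result.Barcodes)
    {
        Console.WriteLine($"Barcode: {barcode.Value}");
    }
}
```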

    ReadDataTables

    Optionally enables table detection and parsing. Detected tables are available through the Tables property of the OcrResult:

    Ocr.Configuration.ReadDataTables = true;
    var result = Ocr.Read(ocrInput);
    var tables = result.Tables; // Tables detected in ocrInput
    Declaration
    public bool ReadDataTables
    Field Value
    Type Description
    System.Boolean

    RenderHocr

    Pre-renders hOCR output during Tesseract read operations. Must be True to use the SaveAsHocrFile(String) method.

    Declaration
    public bool RenderHocr
    Field Value
    Type Description
    System.Boolean
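    A sketch of producing an hOCR file. `RenderHocr` is enabled before `Read`, and `SaveAsHocrFile(String)` is called on the result. The input and output file names are hypothetical. Image loading uses `OcrInput.AddImage`, which newer releases may name `LoadImage`.

```csharp
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.RenderHocr = true; // required before SaveAsHocrFile

const string imagePath = "page.png"; // hypothetical input image
if (File.Exists(imagePath))
{
    using var input = new OcrInput();
    input.AddImage(imagePath); // may be LoadImage in newer releases

    OcrResult result = ocr.Read(input);
    result.SaveAsHocrFile("page.html"); // hOCR is an HTML-based format
}
```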

    RenderSearchablePdf

    Enables the in-memory creation of a searchable PDF of the OcrInput during Tesseract read operations. Must be True to save a searchable PDF with the SaveAsSearchablePdf(String) methods.

    Declaration
    public bool RenderSearchablePdf
    Field Value
    Type Description
    System.Boolean
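    The same pattern applies to searchable PDFs. The flag is set before reading, and the rendered document is saved afterwards. File names are hypothetical, and `AddImage` may be `LoadImage` in newer IronOcr releases.

```csharp
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.RenderSearchablePdf = true; // must be true before SaveAsSearchablePdf

const string imagePath = "scan.png"; // hypothetical input image
if (File.Exists(imagePath))
{
    using var input = new OcrInput();
    input.AddImage(imagePath); // may be LoadImage in newer releases

    OcrResult result = ocr.Read(input);
    result.SaveAsSearchablePdf("scan-searchable.pdf");
}
```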

    TesseractVariables

    Adds Tesseract configuration variables of type bool, int, double, or string, giving access to all Tesseract command-line and config file options.

    Tesseract's own documentation of these variables is sparse. To discover them, use TrySaveAllTesseractVariablesToFile(String) to output all variables available in your installation of Tesseract.

    To learn more about how to use TesseractVariables, see our guide at: https://ironsoftware.com/csharp/ocr/docs/questions/csharp-tesseract-config-configuration-variables/

    Declaration
    public Dictionary<string, object> TesseractVariables
    Field Value
    Type Description
    System.Collections.Generic.Dictionary<System.String, System.Object>
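    A sketch of passing raw Tesseract variables, one bool and one int. The names `preserve_interword_spaces` and `user_defined_dpi` come from Tesseract itself, not from this page. Confirm them against the file written by TrySaveAllTesseractVariablesToFile(String) for your installation. The null check is defensive, because this page does not say whether the dictionary is pre-populated.

```csharp
using System.Collections.Generic;
using IronOcr;

var ocr = new IronTesseract();

// Ensure the dictionary exists before adding entries (default value not documented here).
ocr.Configuration.TesseractVariables ??= new Dictionary<string, object>();

// Keys are Tesseract variable names; values may be bool, int, double or string.
ocr.Configuration.TesseractVariables["preserve_interword_spaces"] = true; // bool variable
ocr.Configuration.TesseractVariables["user_defined_dpi"] = 300;           // int variable
```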

    TesseractVersion

    The Tesseract version used by this configuration. IronOcr supports Tesseract 5.1.

    Declaration
    public TesseractVersion TesseractVersion
    Field Value
    Type Description
    TesseractVersion
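    A sketch that pins the engine version explicitly. The member name `TesseractVersion.Tesseract5` is an assumption about the enum, which this page does not list. Use your IDE's completion to confirm it.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Select the Tesseract 5 engine (assumed enum member name).
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
```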

    Properties

    AutoRotateDetectionForRenderSearchablePdf

    Enables automatic rotation detection when a searchable PDF of the OcrInput is created in memory. This improves the accuracy of text sizes and coordinates, and removes the need to determine the rotation angle manually and call Rotate(Double).

    Declaration
    public bool AutoRotateDetectionForRenderSearchablePdf { get; set; }
    Property Value
    Type Description
    System.Boolean
    Remarks

    This setting is automatically treated as false when RenderSearchablePdf is false. Even when it is set to true, auto-rotate detection is not available on macOS.
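    Because this property depends on RenderSearchablePdf, the sketch enables both. The remarks above still apply: the flag counts as false while RenderSearchablePdf is false, and it has no effect on macOS.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Auto-rotate detection only applies while a searchable PDF is being rendered.
ocr.Configuration.RenderSearchablePdf = true;
ocr.Configuration.AutoRotateDetectionForRenderSearchablePdf = true; // ignored on macOS
```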

    BlackListCharacters

    If set, none of the characters in this string will be recognized by IronTesseract OCR. An example use case is excluding accented characters.

    Set thoughtfully, BlackListCharacters and WhiteListCharacters can improve both speed and accuracy.

    Declaration
    public string BlackListCharacters { get; set; }
    Property Value
    Type Description
    System.String
    See Also
    WhiteListCharacters
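    A sketch of the accented-character use case mentioned above. The specific characters are only an illustration; choose the set that matches your documents.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// These accented vowels will not be recognized; illustrative selection only.
ocr.Configuration.BlackListCharacters = "àáâäèéêëìíîïòóôöùúûü";
```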

    WhiteListCharacters

    If set, only characters in this string will be read by IronTesseract. Remember to include punctuation marks and space characters.

    If the expected characters are known in advance, WhiteListCharacters can dramatically increase performance and accuracy.

    It is also very useful when only numbers or only letters are expected.

    Declaration
    public string WhiteListCharacters { get; set; }
    Property Value
    Type Description
    System.String
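    A sketch for numeric fields such as amounts. The punctuation marks and the space character are listed explicitly, because only the listed characters will be read. The character set is an illustrative assumption about the input.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Digits plus the separators that appear in values such as "1 234.50" or "-12,5".
ocr.Configuration.WhiteListCharacters = "0123456789.,- ";
```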

    Methods

    Clone()

    Creates a copy of this TesseractConfiguration.

    Declaration
    public object Clone()
    Returns
    Type Description
    System.Object

    A copy of this TesseractConfiguration as an object.
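    Because `Clone()` returns `System.Object`, the copy must be cast back. This sketch derives a digits-only variant from a base configuration. This page does not say whether the copy is deep or shallow. If it is shallow, the `TesseractVariables` dictionary would be shared between the copies. Assigning the clone to an engine also assumes that `IronTesseract.Configuration` is settable.

```csharp
using IronOcr;

var baseConfig = new TesseractConfiguration
{
    PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd,
};

// Clone() returns object, so cast back to TesseractConfiguration.
var digitsConfig = (TesseractConfiguration)baseConfig.Clone();

// Reassigning a string member on the copy leaves baseConfig untouched.
digitsConfig.WhiteListCharacters = "0123456789";

var numberReader = new IronTesseract();
numberReader.Configuration = digitsConfig; // assumes Configuration is settable
```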

    TrySaveAllTesseractVariablesToFile(String)

    Saves all Tesseract internal settings for this Configuration to a plain text file.

    Declaration
    public bool TrySaveAllTesseractVariablesToFile(string Path)
    Parameters
    Type Name Description
    System.String Path

    A valid file path. The recommended file extension is .txt.

    Returns
    Type Description
    System.Boolean

    True if file successfully saved
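    A sketch of dumping the available variables so that you can discover names to use in TesseractVariables. The output file name is hypothetical. Checking the Boolean return value is the only error signal this page documents.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Write all Tesseract internal settings for this configuration to a plain text file.
bool saved = ocr.Configuration.TrySaveAllTesseractVariablesToFile("tesseract-variables.txt");
Console.WriteLine(saved
    ? "Variables written to tesseract-variables.txt"
    : "Could not write the variables file");
```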

    Implements

    System.ICloneable