    Class IronTesseract

    IronTesseract is a comprehensive managed class for performing Tesseract OCR in .NET applications.

    IronTesseract natively supports the Tesseract 3, 4 and 5 engines, and automatically installs all required binaries and language pack (tessdata) files.

    Inheritance
    System.Object
    IronTesseract
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public class IronTesseract : Object

    Constructors

    IronTesseract()

    Public constructor. Creates a default instance of IronTesseract.

    Declaration
    public IronTesseract()

    IronTesseract(TesseractConfiguration)

    Public constructor. Creates an instance of IronTesseract with a customized TesseractConfiguration.

    This allows advanced developers to fine-tune Tesseract behavior.

    Declaration
    public IronTesseract(TesseractConfiguration Configuration)
    Parameters
    Type Name Description
    TesseractConfiguration Configuration

    Fields

    Configuration

    An instance of TesseractConfiguration which allows fine-grained control of the underlying Tesseract OCR Engine.

    Options include the language file detail level, the Page Segmentation Mode, and access to the entire API of Tesseract settings variables.

    Declaration
    public TesseractConfiguration Configuration
    Field Value
    Type Description
    TesseractConfiguration
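
    As a minimal sketch of adjusting the Configuration field, the snippet below sets the Page Segmentation Mode on a default instance. The PageSegmentationMode property and the TesseractPageSegmentationMode enum are not described on this page and are assumptions about the installed IronOcr version; check them against the TesseractConfiguration reference.

```csharp
using IronOcr;

// Create a default engine; Configuration is a public field on the instance.
var ocr = new IronTesseract();

// Assumption: TesseractConfiguration exposes PageSegmentationMode and the
// TesseractPageSegmentationMode enum has an Auto member in this version.
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;
```

    The same TesseractConfiguration object could instead be built first and passed to the IronTesseract(TesseractConfiguration) constructor.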

    Properties

    Language

    The natural language of the documents that IronTesseract will read.

    The default is English. Additional languages can be installed using NuGet https://www.nuget.org/packages?q=IronOcr.Languages or downloaded from https://ironsoftware.com/csharp/ocr/languages/

    Multiple language packs can be used simultaneously with the AddSecondaryLanguage(OcrLanguage) method.

    Custom Tesseract .traineddata language files can be used with the UseCustomTesseractLanguageFile(String) method.

    Declaration
    public OcrLanguage Language { get; set; }
    Property Value
    Type Description
    OcrLanguage
    See Also
    OcrLanguage

    MultiThreaded

    Reads multiple PDF pages and images simultaneously on different threads.

    Declaration
    public bool MultiThreaded { get; set; }
    Property Value
    Type Description
    System.Boolean

    Methods

    AddSecondaryLanguage(OcrLanguage)

    IronTesseract will use multiple Tesseract language files simultaneously, enabling multilingual OCR.

    Any number of secondary languages may be added. Speed and performance may be affected.

    Declaration
    public void AddSecondaryLanguage(OcrLanguage SecondaryLanguage)
    Parameters
    Type Name Description
    OcrLanguage SecondaryLanguage

    An additional OcrLanguage

    AddSecondaryLanguage(String)

    IronTesseract will use multiple Tesseract language files simultaneously, enabling multilingual OCR with a custom .traineddata Tesseract 3, 4 or 5 language file.

    Any number of secondary languages may be added. Speed and performance may be affected.

    Declaration
    public void AddSecondaryLanguage(string CustomLanguagePath)
    Parameters
    Type Name Description
    System.String CustomLanguagePath

    File path to a .traineddata tesseract language pack.

    ClearSecondaryLanguages()

    Removes all languages added by AddSecondaryLanguage(OcrLanguage) or AddSecondaryLanguage(String).

    Declaration
    public void ClearSecondaryLanguages()
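
    The Language property and the secondary-language methods above might be combined roughly as follows. This is only a sketch: OcrLanguage.French is assumed to be available (typically this requires the matching IronOcr.Languages package), and the .traineddata path is hypothetical.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Primary language; English is the documented default.
ocr.Language = OcrLanguage.English;

// Assumption: the French language pack is installed so this member resolves.
ocr.AddSecondaryLanguage(OcrLanguage.French);

// Hypothetical path to a custom .traineddata file.
ocr.AddSecondaryLanguage(@"C:\tessdata\custom.traineddata");

// Return to single-language OCR; only the secondary languages are removed.
ocr.ClearSecondaryLanguages();
```

    Since each extra language may affect speed, it is probably worth adding only the languages a document is expected to contain.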

    Read(OcrInput)

    Reads text from an OcrInput object and returns an OcrResult object.

    OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

    There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

    Declaration
    public OcrResult Read(OcrInput Input)
    Parameters
    Type Name Description
    OcrInput Input

    An OcrInput document which can contain one or more images and PDFs

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.

    Read(AnyBitmap)

    Reads text from an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

    Declaration
    public OcrResult Read(AnyBitmap Image)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.

    Read(AnyBitmap, CropRectangle)

    Reads text from a region of an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

    Declaration
    public OcrResult Read(AnyBitmap Image, CropRectangle ContentArea)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image.

    IronSoftware.Drawing.CropRectangle ContentArea

    Specifies the region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.

    Read(String)

    Reads text from an Image file and returns an OcrResult object.

    Declaration
    public OcrResult Read(string ImagePath)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.
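
    For the simplest case, reading one image from disk, a call to Read(String) could look like the sketch below. The file name is hypothetical, and the Text property on OcrResult is assumed here because OcrResult members are not listed on this page.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Hypothetical image path; any image file readable by IronOcr should work.
OcrResult result = ocr.Read("invoice.png");

// Assumption: OcrResult exposes the recognized text through a Text property.
Console.WriteLine(result.Text);
```

    For multi-page documents, or images that need enhancement before OCR, the Read(OcrInput) overload is the documented preference.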

    Read(String, CropRectangle)

    Reads text from a region of an Image file and returns an OcrResult object.

    Declaration
    public OcrResult Read(string ImagePath, CropRectangle ContentArea)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    IronSoftware.Drawing.CropRectangle ContentArea

    Specifies the region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.
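
    Restricting OCR to a region might be sketched as follows. The constructor argument order (x, y, width, height) is an assumption based on the parameter description above, and the path and pixel values are purely illustrative.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: CropRectangle has a constructor taking (x, y, width, height) in pixels.
// The fully qualified name avoids any ambiguity between namespaces.
var contentArea = new IronSoftware.Drawing.CropRectangle(50, 100, 400, 120);

// Only the region inside contentArea is passed to the OCR engine.
OcrResult result = ocr.Read("invoice.png", contentArea);

// Assumption: OcrResult exposes the recognized text through a Text property.
Console.WriteLine(result.Text);
```

    Reading a small, known region (such as a header or a totals box) is the case where the documented speed benefit is most likely to show.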

    ReadAsync(OcrInput, Int32)

    Asynchronously reads text from an OcrInput object and returns a task whose result is an OcrResult object.

    OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

    There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

    Declaration
    public OcrReadTask ReadAsync(OcrInput Input, int TimeoutMs = -1)
    Parameters
    Type Name Description
    OcrInput Input

    An OcrInput document which can contain one or more images and PDFs

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read will be cancelled. Not supported in .NET 4.0.

    Returns
    Type Description
    OcrReadTask

    A task that represents the asynchronous read operation. The task's Result property (System.Threading.Tasks.Task<TResult>.Result) contains an OcrResult object with the text and detailed, structured information about the extracted text content.

    ReadAsync(AnyBitmap, CropRectangle, Int32)

    Asynchronously reads text from an IronSoftware.Drawing.AnyBitmap object and returns a task whose result is an OcrResult object.

    Declaration
    public Task<OcrResult> ReadAsync(AnyBitmap Image, CropRectangle ContentArea = null, int TimeoutMs = -1)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap object

    IronSoftware.Drawing.CropRectangle ContentArea

    Specifies the region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read will be cancelled. Not supported in .NET 4.0.

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrResult>

    A task that represents the asynchronous read operation. The task's Result property (System.Threading.Tasks.Task<TResult>.Result) contains an OcrResult object with the text and detailed, structured information about the extracted text content.
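
    An awaited call to this overload might look like the sketch below. AnyBitmap.FromFile, the file name, and the OcrResult.Text property are assumptions not covered on this page; the named argument TimeoutMs follows the declaration above. How a timeout surfaces to the caller is not documented here, so the sketch does not attempt to handle it.

```csharp
using System;
using System.Threading.Tasks;
using IronOcr;
using IronSoftware.Drawing;

var ocr = new IronTesseract();

// Assumption: AnyBitmap.FromFile loads an image from a (hypothetical) path.
AnyBitmap image = AnyBitmap.FromFile("page.png");

// ContentArea keeps its default (null, whole image); cancel after 30 seconds.
OcrResult result = await ocr.ReadAsync(image, TimeoutMs: 30000);

// Assumption: OcrResult exposes the recognized text through a Text property.
Console.WriteLine(result.Text);
```

    Awaiting the task keeps the calling thread free while OCR runs, which is likely the main reason to prefer ReadAsync over Read in UI or server code.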

    ReadAsync(String, CropRectangle, Int32)

    Asynchronously reads text from an image file and returns a task whose result is an OcrResult object.

    Declaration
    public OcrReadTask ReadAsync(string ImagePath, CropRectangle ContentArea = null, int TimeoutMs = -1)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    IronSoftware.Drawing.CropRectangle ContentArea

    Specifies the region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read will be cancelled. Not supported in .NET 4.0.

    Returns
    Type Description
    OcrReadTask

    A task that represents the asynchronous read operation. The task's Result property (System.Threading.Tasks.Task<TResult>.Result) contains an OcrResult object with the text and detailed, structured information about the extracted text content.

    UseCustomTesseractLanguageFile(String)

    IronTesseract will use a Tesseract .traineddata language file as its only OCR language.

    https://github.com/tesseract-ocr/tessdata

    Declaration
    public void UseCustomTesseractLanguageFile(string TrainedDataPath)
    Parameters
    Type Name Description
    System.String TrainedDataPath

    File path to a .traineddata file. These can be downloaded from https://github.com/tesseract-ocr/tessdata or generated using the Tesseract command line.

    Events

    OcrProgress

    An event that can be used to track OCR progress and to inform users of OCR performance and progress.

    Progress is reported via the OcrProgressEventsArgs class.

    Declaration
    public event EventHandler<OcrProgressEventsArgs> OcrProgress
    Event Type
    Type Description
    System.EventHandler<OcrProgressEventsArgs>
    Examples
    // Log the percentage complete and elapsed seconds each time progress is reported.
    myIronTesseract.OcrProgress += (object o, IronOcr.Events.OcrProgressEventsArgs e) =>
    {
        Console.WriteLine(e.ProgressPercent + "%   " + e.Duration.TotalSeconds + "s");
    };
    See Also
    OcrProgressEventsArgs