
    Class IronTesseract

    IronTesseract is a comprehensive managed class for performing Tesseract OCR in .NET applications.

    IronTesseract natively supports the Tesseract 3, 4, and 5 engines, and automatically installs all required binaries and language pack (tessdata) files.

    Inheritance
    System.Object
    IronTesseract
    Implements
    IronSoftware.Abstractions.Ocr.IOcrEngine
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public class IronTesseract : Object

    IronTesseract runs Tesseract OCR on images and PDFs in .NET, turning scanned pages, photos, and documents into text and searchable PDFs. It is the main entry point for Tesseract OCR in C#: construct one, point it at an OcrInput, and call a Read method. It wraps Iron Software's tuned Tesseract 5 build, so the same object handles a clean scan, a noisy photo, or a multi-page PDF.

    Create one with new IronTesseract(), or with new IronTesseract(TesseractConfiguration) to start from a prepared configuration. Set Language to the document's natural language, add more languages with AddSecondaryLanguage for multilingual pages, and set MultiThreaded to true to read pages and images on parallel threads. The Configuration field exposes a TesseractConfiguration for fine-grained engine control, and EnableTesseractConsoleMessages surfaces the engine's own diagnostics. Subscribe to the OcrProgress event to report progress on long reads.

    The read surface groups into functional buckets. **Standard reads** are the Read overloads, which accept an OcrInputBase, an array of inputs, an AnyBitmap, or an image path (with an optional Rectangle to limit OCR to a region) and return an OcrResult; the IDocumentId overloads read an existing PDF. **Asynchronous reads** are the ReadAsync overloads, which return an awaitable result with an optional timeout for keeping the call off a request thread. **Specialized machine-learning reads** target specific content: ReadDocumentAdvanced, ReadHandwriting, ReadPassport, ReadLicensePlate, ReadPhoto, and ReadScreenShot, each returning a result type tuned to that scenario, with matching async forms. **Searchable-PDF conversion** is handled by ConvertToSearchablePdf and ConvertToSearchablePdfBytes, which OCR a PDF's images and overlay the recognized text. Custom language data loads through AddSecondaryLanguage and UseCustomTesseractLanguageFile, and ClearSecondaryLanguages resets the set.

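    using System;
    using IronOcr;
    
    var ocr = new IronTesseract();
    ocr.Language = OcrLanguage.English;
    // OcrInput is disposable, so release it when the read is done.
    using var input = new OcrInput("scan.png");
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);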

    The Iron Tesseract how-to covers configuring and running a read, the read results how-to traverses the returned text and words, and the simple OCR example shows a minimal read.

    Constructors

    IronTesseract()

    Public constructor. Creates a default instance of IronTesseract.

    Declaration
    public IronTesseract()

    IronTesseract(TesseractConfiguration)

    Public constructor. Creates an instance of IronTesseract with a customized TesseractConfiguration.

    This allows advanced developers to fine-tune Tesseract behavior.

    Declaration
    public IronTesseract(TesseractConfiguration Configuration)
    Parameters
    Type Name Description
    TesseractConfiguration Configuration
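    A minimal sketch of both constructors, for illustration only. It assumes that TesseractConfiguration has a public parameterless constructor and a PageSegmentationMode property typed as the TesseractPageSegmentationMode enum, as in recent IronOCR releases. Verify these names against your installed version.

    ```csharp
    using IronOcr;

    // Default engine: English with default Tesseract settings.
    var defaultOcr = new IronTesseract();

    // Assumed API: TesseractConfiguration.PageSegmentationMode and the
    // TesseractPageSegmentationMode enum, as in recent IronOCR releases.
    var config = new TesseractConfiguration
    {
        // Treat each image as a single uniform block of text.
        PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock
    };
    var tunedOcr = new IronTesseract(config);

    // Echo Tesseract's own warnings to the console while tuning.
    tunedOcr.EnableTesseractConsoleMessages = true;
    ```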

    Fields

    Configuration

    An instance of TesseractConfiguration that allows fine-grained control of the underlying Tesseract OCR engine.

    Options include the language file detail level, the page segmentation mode, and access to the full set of Tesseract settings variables.

    Declaration
    public TesseractConfiguration Configuration
    Field Value
    Type Description
    TesseractConfiguration

    Properties

    EnableTesseractConsoleMessages

    Gets or sets a value indicating whether Tesseract developer messages and warnings will be sent to console output.

    Declaration
    public bool EnableTesseractConsoleMessages { get; set; }
    Property Value
    Type Description
    System.Boolean
    Remarks

    Setting this property to true enables console output for Tesseract messages and warnings. Conversely, setting it to false disables this output.

    Language

    The natural language of the documents that IronTesseract will read.

    Default is English. Additional languages can be installed using NuGet https://www.nuget.org/packages?q=IronOcr.Languages or downloaded from https://ironsoftware.com/csharp/ocr/languages/

    Multiple language packs can be used simultaneously with the AddSecondaryLanguage(OcrLanguage) method.

    Custom Tesseract .traineddata language packs can be used with the UseCustomTesseractLanguageFile(String) method.

    Declaration
    public OcrLanguage Language { get; set; }
    Property Value
    Type Description
    OcrLanguage
    See Also
    OcrLanguage

    MultiThreaded

    Gets or sets whether IronTesseract reads multiple PDF pages and images simultaneously on different threads.

    Declaration
    public bool MultiThreaded { get; set; }
    Property Value
    Type Description
    System.Boolean
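    A sketch of enabling MultiThreaded before reading a multi-page PDF. The file name "report.pdf" is hypothetical. OcrInput.LoadPdf is assumed from recent IronOCR releases; this page does not document it.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();
    // Let IronTesseract read pages and images on parallel threads.
    ocr.MultiThreaded = true;

    // "report.pdf" is a hypothetical multi-page input file.
    using var input = new OcrInput();
    input.LoadPdf("report.pdf");

    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
    ```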

    Methods

    AddSecondaryLanguage(OcrLanguage)

    Adds a secondary language, so that IronTesseract uses multiple Tesseract language files simultaneously (multilingual OCR).

    Any number of secondary languages may be added. Speed and performance may be affected.

    Declaration
    public void AddSecondaryLanguage(OcrLanguage SecondaryLanguage)
    Parameters
    Type Name Description
    OcrLanguage SecondaryLanguage

    An additional OcrLanguage
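    A sketch of multilingual OCR with one secondary language. It makes three assumptions:

    * The IronOcr.Languages.French NuGet package is installed, so that OcrLanguage.French is usable.
    * OcrInput.LoadImage is available, as in recent IronOCR releases.
    * "bilingual-letter.png" is a hypothetical input file.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();
    ocr.Language = OcrLanguage.English;            // primary language
    ocr.AddSecondaryLanguage(OcrLanguage.French);  // assumes IronOcr.Languages.French is installed

    // "bilingual-letter.png" is a hypothetical input file.
    using var input = new OcrInput();
    input.LoadImage("bilingual-letter.png");
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);

    // Drop the secondary language before reading English-only documents.
    ocr.ClearSecondaryLanguages();
    ```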

    AddSecondaryLanguage(String)

    Adds a secondary language from a custom Tesseract 3, 4, or 5 .traineddata language file, so that IronTesseract uses multiple language files simultaneously (multilingual OCR).

    Any number of secondary languages may be added. Speed and performance may be affected.

    Declaration
    public void AddSecondaryLanguage(string CustomLanguagePath)
    Parameters
    Type Name Description
    System.String CustomLanguagePath

    File path to a .traineddata tesseract language pack.

    ClearSecondaryLanguages()

    Removes all languages added by AddSecondaryLanguage(OcrLanguage) or AddSecondaryLanguage(String).

    Declaration
    public void ClearSecondaryLanguages()

    ConvertToSearchablePdf(Byte[], String, String)

    Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file

    Declaration
    public void ConvertToSearchablePdf(byte[] PdfData, string SavePath, string Password = null)
    Parameters
    Type Name Description
    System.Byte[] PdfData

    PDF file data

    System.String SavePath

    Save path of the searchable PDF

    System.String Password

    PDF password

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

    ConvertToSearchablePdf(String, String, String)

    Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file

    Declaration
    public void ConvertToSearchablePdf(string PdfPath, string SavePath, string Password = null)
    Parameters
    Type Name Description
    System.String PdfPath

    PDF file path

    System.String SavePath

    Save path of the searchable PDF

    System.String Password

    PDF password

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
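    A sketch of converting a scanned PDF into a searchable copy on disk. Both file names are hypothetical. The commented-out call shows where the optional Password argument goes for a protected source.

    ```csharp
    using IronOcr;

    var ocr = new IronTesseract();

    // Hypothetical paths: a scanned PDF in, a searchable copy out.
    ocr.ConvertToSearchablePdf("scanned.pdf", "scanned-searchable.pdf");

    // For a password-protected source, pass the password as the third argument:
    // ocr.ConvertToSearchablePdf("locked.pdf", "locked-searchable.pdf", "secret");
    ```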

    ConvertToSearchablePdfBytes(Byte[], String)

    Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF

    Declaration
    public byte[] ConvertToSearchablePdfBytes(byte[] PdfData, string Password = null)
    Parameters
    Type Name Description
    System.Byte[] PdfData

    PDF file data

    System.String Password

    PDF password

    Returns
    Type Description
    System.Byte[]

    Byte array of the generated Searchable PDF

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

    ConvertToSearchablePdfBytes(String, String)

    Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF

    Declaration
    public byte[] ConvertToSearchablePdfBytes(string PdfPath, string Password = null)
    Parameters
    Type Name Description
    System.String PdfPath

    PDF file path

    System.String Password

    PDF password

    Returns
    Type Description
    System.Byte[]

    Byte array of the generated Searchable PDF

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
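    A sketch of the in-memory variant. It is useful when the PDF arrives as bytes, for example from an upload or a database blob, rather than as a file. The file names are hypothetical and only stand in for your own byte source and destination.

    ```csharp
    using System.IO;
    using IronOcr;

    var ocr = new IronTesseract();

    // Hypothetical source: read the scanned PDF into memory.
    byte[] source = File.ReadAllBytes("scanned.pdf");

    // OCR in memory; the return value is the searchable PDF as bytes.
    byte[] searchable = ocr.ConvertToSearchablePdfBytes(source);

    // Persist it however your application needs; a file is used here.
    File.WriteAllBytes("scanned-searchable.pdf", searchable);
    ```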

    Read(OcrInputBase)

    Reads text from an OcrInput object and returns an OcrResult object.

    OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

    There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

    Declaration
    public OcrResult Read(OcrInputBase Input)
    Parameters
    Type Name Description
    OcrInputBase Input

    An OcrInput document which can contain one or more images and PDFs

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.
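    A minimal sketch of the preferred OcrInput path: it loads several pages into one input, applies two clean-up filters, and then makes one Read call. It assumes the OcrInput.LoadImage, Deskew, and DeNoise methods and the OcrResult.Confidence property from recent IronOCR releases. The image names are hypothetical.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    // One OcrInput can hold several pages; the file names are hypothetical.
    using var input = new OcrInput();
    input.LoadImage("page1.png");
    input.LoadImage("page2.png");

    // Optional clean-up filters applied before OCR.
    input.Deskew();
    input.DeNoise();

    OcrResult result = ocr.Read(input);
    Console.WriteLine($"Confidence: {result.Confidence}");
    Console.WriteLine(result.Text);
    ```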

    Read(OcrInputBase[])

    Reads text from an array of OcrInput objects and returns an array of OcrResult objects.

    OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

    There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

    Declaration
    public OcrResult[] Read(OcrInputBase[] Inputs)
    Parameters
    Type Name Description
    OcrInputBase[] Inputs

    An array of OcrInput documents which can contain one or more images and PDFs each

    Returns
    Type Description
    OcrResult[]

    An array of OcrResult objects containing the text and detailed, structured information about the extracted text content.
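    A sketch of batching independent documents through the array overload. The image names are hypothetical, and LoadImage is assumed from recent IronOCR releases. The comment on result ordering is an assumption; this page does not state it.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    // Hypothetical, independent documents: one OcrInput per document.
    using var invoice = new OcrInput();
    invoice.LoadImage("invoice.png");
    using var receipt = new OcrInput();
    receipt.LoadImage("receipt.png");

    // Assumption: results[i] corresponds to the i-th input.
    OcrResult[] results = ocr.Read(new OcrInputBase[] { invoice, receipt });
    foreach (OcrResult result in results)
    {
        Console.WriteLine(result.Text);
    }
    ```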

    Read(IDocumentId)

    Reads an existing PDF document and returns OCR results.

    Declaration
    public IOcrResult Read(IDocumentId Document)
    Parameters
    Type Name Description
    IronSoftware.Abstractions.Pdf.IDocumentId Document

    PDF document to read

    Returns
    Type Description
    IronSoftware.Abstractions.Ocr.IOcrResult

    OCR results

    Read(IDocumentId, PdfContents)

    Reads an existing PDF document and returns OCR results.

    Declaration
    public IOcrResult Read(IDocumentId Document, PdfContents Contents)
    Parameters
    Type Name Description
    IronSoftware.Abstractions.Pdf.IDocumentId Document

    PDF document to read

    PdfContents Contents

    Contents to OCR

    Returns
    Type Description
    IronSoftware.Abstractions.Ocr.IOcrResult

    OCR results

    Read(AnyBitmap)

    Reads text from an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

    Declaration
    public OcrResult Read(AnyBitmap Image)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.
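    A sketch of reading an in-memory image. It assumes AnyBitmap.FromFile from IronSoftware.Drawing is used to load a hypothetical "receipt.jpg". An AnyBitmap obtained from any of the supported image types listed above should be passed the same way.

    ```csharp
    using System;
    using IronOcr;
    using IronSoftware.Drawing;

    var ocr = new IronTesseract();

    // Load a hypothetical image file into an AnyBitmap.
    AnyBitmap bitmap = AnyBitmap.FromFile("receipt.jpg");

    OcrResult result = ocr.Read(bitmap);
    Console.WriteLine(result.Text);
    ```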

    Read(AnyBitmap, Rectangle)

    Reads text from a region of an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

    Declaration
    public OcrResult Read(AnyBitmap Image, Rectangle ContentArea)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image

    IronSoftware.Drawing.Rectangle ContentArea

    Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.

    Read(String)

    Reads text from an Image file and returns an OcrResult object.

    Declaration
    public OcrResult Read(string ImagePath)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.

    Read(String, Rectangle)

    Reads text from a region of an Image file and returns an OcrResult object.

    Declaration
    public OcrResult Read(string ImagePath, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    IronSoftware.Drawing.Rectangle ContentArea

    Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

    Returns
    Type Description
    OcrResult

    An OcrResult object containing the text and detailed, structured information about the extracted text content.
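    A sketch of limiting OCR to one region of an image. The file name and rectangle coordinates are hypothetical. The named x/y/width/height constructor arguments of IronSoftware.Drawing.Rectangle are assumed from recent IronSoftware.Drawing releases.

    ```csharp
    using System;
    using IronOcr;
    using IronSoftware.Drawing;

    var ocr = new IronTesseract();

    // Hypothetical region: a 600 x 120 px strip near the top-left of the scan,
    // for example where a form prints its reference number.
    var contentArea = new Rectangle(x: 40, y: 30, width: 600, height: 120);

    OcrResult result = ocr.Read("form.png", contentArea);
    Console.WriteLine(result.Text);
    ```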

    ReadAsync(OcrInputBase, Int32)

    Asynchronously reads text from an OcrInput object; the returned task produces an OcrResult object.

    OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

    There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

    Declaration
    public OcrReadTask ReadAsync(OcrInputBase Input, int TimeoutMs = -1)
    Parameters
    Type Name Description
    OcrInputBase Input

    An OcrInput document which can contain one or more images and PDFs

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read is cancelled.
    The timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts, it must complete before the timeout can take effect.
    Example: if an OCR iteration takes 3 seconds, the application completes that iteration before a 1000 ms timeout can take effect.
    Not supported in .NET 4.0.

    Returns
    Type Description
    OcrReadTask

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrResult object with the text and detailed, structured information about the extracted text content.
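    A sketch of an awaited read with a timeout. It assumes that OcrReadTask can be awaited like the Task<OcrResult> it represents, and that OcrInput.LoadPdf is available. The file name is hypothetical. This page does not state how a timed-out read surfaces to the caller, so handle that case according to your version's behavior.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadPdf("contract.pdf"); // hypothetical file

    // Roughly a 30 s budget; the timeout is only checked between OCR iterations.
    OcrResult result = await ocr.ReadAsync(input, TimeoutMs: 30_000);
    Console.WriteLine(result.Text);
    ```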

    ReadAsync(AnyBitmap, Rectangle, Int32)

    Asynchronously reads text from an IronSoftware.Drawing.AnyBitmap object; the returned task produces an OcrResult object.

    Declaration
    public Task<OcrResult> ReadAsync(AnyBitmap Image, Rectangle ContentArea = null, int TimeoutMs = -1)
    Parameters
    Type Name Description
    IronSoftware.Drawing.AnyBitmap Image

    An IronSoftware.Drawing.AnyBitmap object

    IronSoftware.Drawing.Rectangle ContentArea

    Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read is cancelled.
    The timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts, it must complete before the timeout can take effect.
    Example: if an OCR iteration takes 3 seconds, the application completes that iteration before a 1000 ms timeout can take effect.
    Not supported in .NET 4.0.

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrResult object with the text and detailed, structured information about the extracted text content.

    ReadAsync(String, Rectangle, Int32)

    Asynchronously reads text from an image file; the returned task produces an OcrResult object.

    Declaration
    public OcrReadTask ReadAsync(string ImagePath, Rectangle ContentArea = null, int TimeoutMs = -1)
    Parameters
    Type Name Description
    System.String ImagePath

    Path to an image file.

    IronSoftware.Drawing.Rectangle ContentArea

    Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Int32 TimeoutMs

    Optional timeout in milliseconds, after which the OCR read is cancelled.
    The timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts, it must complete before the timeout can take effect.
    Example: if an OCR iteration takes 3 seconds, the application completes that iteration before a 1000 ms timeout can take effect.
    Not supported in .NET 4.0.

    Returns
    Type Description
    OcrReadTask

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrResult object with the text and detailed, structured information about the extracted text content.

    ReadDocument(OcrInputBase)

    A robust IronTesseract document-reading method. It specializes in scanned documents, or photos of paper documents, that contain a lot of text.
    Make sure you have used the OcrInput filters to improve your inputs; see the OcrInput image filters tutorial for more information.

    For images, use ReadPhoto(OcrInputBase, ModelType).
    For passports, use ReadPassport(OcrInputBase).
    For license plates, use ReadLicensePlate(OcrInputBase).
    For scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).

    Declaration
    public OcrResult ReadDocument(OcrInputBase input)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    Returns
    Type Description
    OcrResult

    OcrResult
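    A sketch of ReadDocument on a phone photo of a page, straightening the page first. The file name is hypothetical. LoadImage and Deskew are assumed from recent IronOCR releases.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadImage("photo-of-letter.jpg"); // hypothetical phone photo of a page
    input.Deskew();                          // straighten before the document read

    OcrResult result = ocr.ReadDocument(input);
    Console.WriteLine(result.Text);
    ```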

    ReadDocumentAdvanced(OcrInputBase, ModelType)

    An optimized read that combines machine-learning models with computer-vision methods, for documents that contain tables with clear outlines.
    Make sure you have used the OcrInput filters to improve your inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/.

    For scanned documents, use ReadDocument(OcrInputBase).
    For images, use ReadPhoto(OcrInputBase, ModelType).
    For passports, use ReadPassport(OcrInputBase).
    For license plates, use ReadLicensePlate(OcrInputBase).

    Declaration
    public OcrDocAdvancedResult ReadDocumentAdvanced(OcrInputBase input, ModelType modelType)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    ModelType modelType

    The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

    Returns
    Type Description
    OcrDocAdvancedResult

    OCR document advanced result

    Remarks

    Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
    This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
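    A sketch of reading a scan that contains an outlined table. It makes four assumptions:

    * The IronOcr.Extensions.AdvancedScan package is installed.
    * OcrInput.LoadImage is available.
    * OcrDocAdvancedResult exposes its recognized text through a Text property. Check its table and confidence members in the OcrDocAdvancedResult reference.
    * The file name is hypothetical.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadImage("table-scan.png"); // hypothetical scan with an outlined table

    // Enhanced trades speed for accuracy on challenging images.
    OcrDocAdvancedResult result = ocr.ReadDocumentAdvanced(input, ModelType.Enhanced);
    Console.WriteLine(result.Text); // assumed Text property
    ```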

    ReadDocumentAdvancedAsync(OcrInputBase, Int32, ModelType)

    Asynchronously performs the optimized machine-learning and computer-vision read, with an optional timeout. Reads text from an OcrInput object and returns an OcrDocAdvancedResult object.

    Declaration
    public Task<OcrDocAdvancedResult> ReadDocumentAdvancedAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType = ModelType.Normal)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    System.Int32 timeoutMs

    Optional timeout in milliseconds

    ModelType modelType

    The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrDocAdvancedResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrDocAdvancedResult object with text, tables, and confidence information.

    ReadHandwriting(OcrInputBase)

    An optimized read suited to handwritten text recognition.

    ------------------------------------------------

    Usage:

    // Load OCR input
    var input = new IronOcr.OcrInput();
    input.Load("input.png");
    // Instantiate OCR engine
    var ocr = new IronOcr.IronTesseract();
    // Read input with handwritten texts
    var result = ocr.ReadHandwriting(input);

    ------------------------------------------------

    Declaration
    public OcrHandwritingResult ReadHandwriting(OcrInputBase input)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    Returns
    Type Description
    OcrHandwritingResult

    OcrHandwritingResult

    Remarks

    Important Considerations:

    ⚠️Language Availability: This method currently supports only English.

    ⚠️Writing Style: This method may produce low-accuracy results when recognizing cursive handwriting.

    Exceptions
    Type Condition
    ExtensionAdvancedScanException

    ReadHandwritingAsync(OcrInputBase, Int32)

    Asynchronously performs the optimized read for handwritten text recognition, with an optional timeout.

    ------------------------------------------------

    Usage:

    // Load OCR input
    var input = new IronOcr.OcrInput();
    input.Load("input.png");
    // Instantiate OCR engine
    var ocr = new IronOcr.IronTesseract();
    // Optional timeout in milliseconds
    int timeOut = 20000;
    // Read input with handwritten texts
    var result = await ocr.ReadHandwritingAsync(input, timeOut);

    ------------------------------------------------

    Declaration
    public Task<OcrHandwritingResult> ReadHandwritingAsync(OcrInputBase input, int timeoutMs = -1)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    System.Int32 timeoutMs

    Optional timeout in milliseconds

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrHandwritingResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrHandwritingResult.

    Remarks

    Important Considerations:

    ⚠️Language Availability: This method currently supports only English.

    ⚠️Writing Style: This method may produce low-accuracy results when recognizing cursive handwriting.

    ReadImagesFromPdf(Byte[], String, IEnumerable<Int32>)

    Extract all images from a PDF, perform OCR on the images, and return the results

    Declaration
    public OcrResult ReadImagesFromPdf(byte[] PdfData, string Password = null, IEnumerable<int> PageIndices = null)
    Parameters
    Type Name Description
    System.Byte[] PdfData

    PDF file data

    System.String Password

    PDF password

    System.Collections.Generic.IEnumerable<System.Int32> PageIndices

    Pages to extract images from

    Returns
    Type Description
    OcrResult

    OCR results

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

    ReadImagesFromPdf(String, String, IEnumerable<Int32>)

    Extract all images from a PDF, perform OCR on the images, and return the results

    Declaration
    public OcrResult ReadImagesFromPdf(string PdfPath, string Password = null, IEnumerable<int> PageIndices = null)
    Parameters
    Type Name Description
    System.String PdfPath

    PDF file path

    System.String Password

    PDF password

    System.Collections.Generic.IEnumerable<System.Int32> PageIndices

    Pages to extract images from

    Returns
    Type Description
    OcrResult

    OCR results

    Remarks

    Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
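    A sketch of OCR restricted to the images embedded on selected pages. The file name is hypothetical. It also assumes that PageIndices are zero-based; this page does not state that.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    // OCR only the images embedded on the first two pages of a hypothetical PDF.
    // Assumption: PageIndices are zero-based.
    OcrResult result = ocr.ReadImagesFromPdf("brochure.pdf", PageIndices: new[] { 0, 1 });
    Console.WriteLine(result.Text);
    ```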

    ReadLicensePlate(OcrInputBase)

    An optimized read that extracts a license plate from photos.
    Make sure you have used the OcrInput filters to improve your inputs; see the OcrInput image filters tutorial for more information.

    For scanned documents, use ReadDocument(OcrInputBase).
    For images, use ReadPhoto(OcrInputBase, ModelType).
    For passports, use ReadPassport(OcrInputBase).
    For scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).

    Declaration
    public OcrLicensePlateResult ReadLicensePlate(OcrInputBase input)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    Returns
    Type Description
    OcrLicensePlateResult

    OcrLicensePlateResult

    Remarks

    Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
    This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
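    A sketch of the license-plate read. It assumes the IronOcr.Extensions.AdvancedScan package is installed, that OcrInput.LoadImage is available, and that OcrLicensePlateResult exposes a Text property. "car.jpg" is a hypothetical photo.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadImage("car.jpg"); // hypothetical photo containing a plate

    OcrLicensePlateResult result = ocr.ReadLicensePlate(input);
    Console.WriteLine(result.Text); // assumed Text property
    ```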

    ReadLicensePlateAsync(OcrInputBase, Int32)

    Asynchronously extracts a license plate from photos, with an optional timeout. Reads text from an OcrInput object and returns an OcrLicensePlateResult object.

    Declaration
    public Task<OcrLicensePlateResult> ReadLicensePlateAsync(OcrInputBase input, int timeoutMs = -1)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    System.Int32 timeoutMs

    Optional timeout in milliseconds

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrLicensePlateResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrLicensePlateResult object with text, tables, and confidence information.

    ReadPassport(OcrInputBase)

    An optimized read that extracts passport information from passport photos by scanning the MRZ (machine-readable zone) contents.
    Make sure you have used the OcrInput filters to improve your inputs; see the OcrInput image filters tutorial for more information.

    For scanned documents, use ReadDocument(OcrInputBase).
    For images, use ReadPhoto(OcrInputBase, ModelType).
    For license plates, use ReadLicensePlate(OcrInputBase).
    For scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).

    Declaration
    public OcrPassportResult ReadPassport(OcrInputBase input)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    Returns
    Type Description
    OcrPassportResult

    OcrPassportResult

    Remarks

    This method only supports English.
    This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
    With IronOcr.Extensions.AdvancedScan.MacOs, inputs cannot be auto-rotated to find the MRZ contents.
    Therefore, make sure that the MRZ contents are always at the bottom of the input before running OCR.
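    A sketch of the passport read. It assumes the IronOcr.Extensions.AdvancedScan package is installed and that OcrInput.LoadImage is available. It also assumes that OcrPassportResult exposes its recognized text through a Text property. Structured MRZ fields, if your version provides them, are documented on OcrPassportResult. The file name is hypothetical.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    // Hypothetical photo; on macOS keep the MRZ at the bottom of the image.
    input.LoadImage("passport.jpg");

    OcrPassportResult result = ocr.ReadPassport(input);
    Console.WriteLine(result.Text); // assumed Text property
    ```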

    ReadPassportAsync(OcrInputBase, Int32)

    Asynchronously extracts passport information from photos, with an optional timeout. Reads text from an OcrInput object and returns an OcrPassportResult object.

    Declaration
    public Task<OcrPassportResult> ReadPassportAsync(OcrInputBase input, int timeoutMs = -1)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    System.Int32 timeoutMs

    Optional timeout in milliseconds

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrPassportResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrPassportResult object with text, tables, and confidence information.

    ReadPhoto(OcrInputBase, ModelType)

    An optimized read for images that contain hard-to-read text.
    Make sure you have used the OcrInput filters to improve your inputs; see the OcrInput image filters tutorial for more information.

    For scanned documents, use ReadDocument(OcrInputBase).
    For passports, use ReadPassport(OcrInputBase).
    For license plates, use ReadLicensePlate(OcrInputBase).
    For scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).

    Declaration
    public OcrPhotoResult ReadPhoto(OcrInputBase input, ModelType modelType)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    ModelType modelType

    The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

    Returns
    Type Description
    OcrPhotoResult

    OcrPhotoResult

    Remarks

    Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
    This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
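    A sketch of ReadPhoto with the faster Normal model. It assumes the IronOcr.Extensions.AdvancedScan package is installed, that OcrInput.LoadImage is available, and that OcrPhotoResult exposes a Text property. The file name is hypothetical.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadImage("street-sign.jpg"); // hypothetical hard-to-read photo

    // Normal is the faster model; Enhanced trades speed for accuracy.
    OcrPhotoResult result = ocr.ReadPhoto(input, ModelType.Normal);
    Console.WriteLine(result.Text); // assumed Text property
    ```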

    ReadPhotoAsync(OcrInputBase, Int32, ModelType)

    Asynchronously extracts hard-to-read text from photos, with an optional timeout.
    Reads text from an OcrInput object and returns an OcrPhotoResult object.

    Declaration
    public Task<OcrPhotoResult> ReadPhotoAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType = ModelType.Normal)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR Input

    System.Int32 timeoutMs

    Optional timeout in milliseconds

    ModelType modelType

    The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

    Returns
    Type Description
    System.Threading.Tasks.Task<OcrPhotoResult>

    A task that represents the asynchronous read operation. The Result property of the returned System.Threading.Tasks.Task<TResult> contains an OcrPhotoResult object with text, tables, and confidence information.

    ReadScreenShot(OcrInputBase, ModelType)

    An optimized read for screenshots that contain hard-to-read text.
    Make sure you have used the OcrInput filters to improve your inputs; see the OcrInput image filters tutorial for more information.

    For scanned documents, use ReadDocument(OcrInputBase).
    For passports, use ReadPassport(OcrInputBase).
    For license plates, use ReadLicensePlate(OcrInputBase).
    For scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).

    Declaration
    public OcrPhotoResult ReadScreenShot(OcrInputBase input, ModelType modelType)
    Parameters
    Type Name Description
    OcrInputBase input

    OCR input

    ModelType modelType

    The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

    Returns
    Type Description
    OcrPhotoResult

    OcrPhotoResult

    Remarks

    Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
    This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
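    A sketch of ReadScreenShot with the Enhanced model. It assumes the IronOcr.Extensions.AdvancedScan package is installed, that OcrInput.LoadImage is available, and that OcrPhotoResult exposes a Text property. The file name is hypothetical.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    using var input = new OcrInput();
    input.LoadImage("dialog-screenshot.png"); // hypothetical screenshot

    // ReadScreenShot returns the same result type as ReadPhoto.
    OcrPhotoResult result = ocr.ReadScreenShot(input, ModelType.Enhanced);
    Console.WriteLine(result.Text); // assumed Text property
    ```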

    UseCustomTesseractLanguageFile(String)

    IronTesseract will use a tesseract .traineddata language file as its only OCR language.

    https://github.com/tesseract-ocr/tessdata

    Declaration
    public void UseCustomTesseractLanguageFile(string TrainedDataPath)
    Parameters
    Type Name Description
    System.String TrainedDataPath

    File path to a .traineddata file. These can be downloaded from https://github.com/tesseract-ocr/tessdata or generated using Tesseract command line.
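    A sketch of switching IronTesseract to a single custom language file. The .traineddata path and the input image are hypothetical, for example a file downloaded from the tesseract-ocr/tessdata repository or trained for a custom font.

    ```csharp
    using System;
    using IronOcr;

    var ocr = new IronTesseract();

    // Hypothetical path; after this call the file is the only OCR language.
    ocr.UseCustomTesseractLanguageFile("tessdata/custom_font.traineddata");

    OcrResult result = ocr.Read("sample.png"); // hypothetical image
    Console.WriteLine(result.Text);
    ```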

    Events

    OcrProgress

    An Event which can be used to track OCR progress and inform users of OCR performance and progress.

    Progress is reported via the OcrProgressEventsArgs class

    Declaration
    public event EventHandler<OcrProgressEventsArgs> OcrProgress
    Event Type
    Type Description
    System.EventHandler<OcrProgressEventsArgs>
    Examples
    myIronTesseract.OcrProgress += (object o, IronOcr.Events.OcrProgressEventsArgs e) =>
    {
        Console.WriteLine(e.ProgressPercent + "%   " + e.Duration.TotalSeconds + "s");
    };
    See Also
    OcrProgressEventsArgs

    Implements

    IronSoftware.Abstractions.Ocr.IOcrEngine