Where does IronTesseract live in the IronOCR API?

IronTesseract is a class in the IronOcr namespace, shipped in IronOcr.dll. It derives from Object and implements IOcrEngine. Construct it with new IronTesseract() and call Read with an OcrInput.

How do you read text from an image in C# with IronTesseract?

Create an IronTesseract, set Language, and call Read with an OcrInput wrapping the image, then read OcrResult.Text. Use ReadAsync for the non-blocking form, and pass a Rectangle to a Read overload to OCR only a region.

Class IronTesseract

What specialized reads does IronTesseract provide?

Beyond the standard Read, IronTesseract offers machine-learning reads for specific content: ReadDocumentAdvanced, ReadHandwriting, ReadPassport, ReadLicensePlate, ReadPhoto, and ReadScreenShot, each returning a result type tuned to that scenario, with async counterparts.

IronTesseract is a comprehensive managed class for performing Tesseract OCR in .NET applications.

IronTesseract natively supports the Tesseract 3, 4 and 5 engines, and will automatically install all required binaries and language pack (tessdata) files.

Inheritance

System.Object

IronTesseract

Implements

IronSoftware.Abstractions.Ocr.IOcrEngine

Namespace: IronOcr

Assembly: IronOcr.dll

Syntax

public class IronTesseract : Object

IronTesseract runs Tesseract OCR on images and PDFs in .NET, turning scanned pages, photos, and documents into text and searchable PDFs. It is the engine a developer reaches for behind a search like "C# Tesseract OCR": construct one, point it at an OcrInput, and call a read. It wraps Iron Software's tuned Tesseract 5 build, so the same object handles a clean scan, a noisy photo, or a multi-page PDF.

Create one with new IronTesseract(), or new IronTesseract(TesseractConfiguration) to start from a prepared configuration. Set Language to the document's natural language, add more with AddSecondaryLanguage for multilingual pages, and flip MultiThreaded to read pages and images on parallel threads. The Configuration field exposes a TesseractConfiguration for fine-grained engine control, and EnableTesseractConsoleMessages surfaces the engine's own diagnostics. Subscribe to the OcrProgress event to report progress on long reads.
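The configuration steps above might be combined as in the following sketch. It assumes a hypothetical input file named invoice.png and that the French language pack is installed so that OcrLanguage.French resolves; the progress handler reuses the ProgressPercent and Duration members shown in the OcrProgress example on this page.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Primary language of the document, plus one secondary language.
ocr.Language = OcrLanguage.English;
ocr.AddSecondaryLanguage(OcrLanguage.French); // assumption: the French language pack is installed

// Read pages and images on parallel threads.
ocr.MultiThreaded = true;

// Send Tesseract's own messages and warnings to the console while tuning.
ocr.EnableTesseractConsoleMessages = true;

// Report progress during long reads.
ocr.OcrProgress += (sender, e) =>
{
    Console.WriteLine(e.ProgressPercent + "%   " + e.Duration.TotalSeconds + "s");
};

// "invoice.png" is a placeholder file name.
OcrResult result = ocr.Read(new OcrInput("invoice.png"));
Console.WriteLine(result.Text);
```

Secondary languages may slow the read, so it is probably worth adding only the ones the documents actually contain.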

The read surface groups into functional buckets. **Standard reads** are the Read overloads, which accept an OcrInputBase, an array of inputs, an AnyBitmap, or an image path (with an optional Rectangle to limit OCR to a region) and return an OcrResult; the IDocumentId overloads read an existing PDF. **Asynchronous reads** are the ReadAsync overloads, which return an awaitable result with an optional timeout for keeping the call off a request thread. **Specialized machine-learning reads** target specific content: ReadDocumentAdvanced, ReadHandwriting, ReadPassport, ReadLicensePlate, ReadPhoto, and ReadScreenShot, each returning a result type tuned to that scenario, with matching async forms. **Searchable-PDF conversion** is handled by ConvertToSearchablePdf and ConvertToSearchablePdfBytes, which OCR a PDF's images and overlay the recognized text. Custom language data loads through AddSecondaryLanguage and UseCustomTesseractLanguageFile, and ClearSecondaryLanguages resets the set.

using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;
OcrResult result = ocr.Read(new OcrInput("scan.png"));
Console.WriteLine(result.Text);

The Iron Tesseract how-to covers configuring and running a read, the read results how-to traverses the returned text and words, and the simple OCR example shows a minimal read.

Constructors

IronTesseract()

Public constructor. Creates a default instance of IronTesseract

Declaration

public IronTesseract()

IronTesseract(TesseractConfiguration)

Public constructor. Creates an instance of IronTesseract with a customized TesseractConfiguration.

This allows advanced developers to fine tune Tesseract behavior.

Declaration

public IronTesseract(TesseractConfiguration Configuration)

Parameters

Type	Name	Description
TesseractConfiguration	Configuration

Fields

Configuration

An instance of TesseractConfiguration which allows fine-grained control of the underlying Tesseract OCR Engine.

Options include the language file detail level, the Page Segmentation Mode, and access to the full set of Tesseract settings variables.

Declaration

public TesseractConfiguration Configuration

Field Value

Type	Description
TesseractConfiguration

Properties

EnableTesseractConsoleMessages

Gets or sets a value indicating whether Tesseract developer messages and warnings will be sent to console output.

Declaration

public bool EnableTesseractConsoleMessages { get; set; }

Property Value

Type	Description
System.Boolean

Remarks

Setting this property to true enables console output for Tesseract messages and warnings. Conversely, setting it to false disables this output.

Language

The natural language of the documents which IronTesseract will read.

Default is English. Additional languages can be installed easily using NuGet https://www.nuget.org/packages?q=IronOcr.Languages or downloaded from https://ironsoftware.com/csharp/ocr/languages/

Multiple language packs can be used simultaneously with the AddSecondaryLanguage method.

Custom Tesseract .traineddata language packs can be used with the UseCustomTesseractLanguageFile(String) method.

Declaration

public OcrLanguage Language { get; set; }

Property Value

Type	Description
OcrLanguage

MultiThreaded

Read multiple PDF pages and images simultaneously on different threads

Declaration

public bool MultiThreaded { get; set; }

Property Value

Type	Description
System.Boolean

Methods

AddSecondaryLanguage(OcrLanguage)

Adds a secondary language so that IronTesseract uses multiple Tesseract language files simultaneously (multilingual OCR).

Any number of secondary languages may be added. Speed and performance may be affected.

Declaration

public void AddSecondaryLanguage(OcrLanguage SecondaryLanguage)

Parameters

Type	Name	Description
OcrLanguage	SecondaryLanguage	An additional OcrLanguage

AddSecondaryLanguage(String)

Adds a secondary language from a custom .traineddata Tesseract 3, 4 or 5 language file, so that IronTesseract uses multiple Tesseract language files simultaneously (multilingual OCR).

Any number of secondary languages may be added. Speed and performance may be affected.

Declaration

public void AddSecondaryLanguage(string CustomLanguagePath)

Parameters

Type	Name	Description
System.String	CustomLanguagePath	File path to a .traineddata tesseract language pack.

ClearSecondaryLanguages()

Removes all languages added by AddSecondaryLanguage(OcrLanguage) or AddSecondaryLanguage(String).

Declaration

public void ClearSecondaryLanguages()

ConvertToSearchablePdf(Byte[], String, String)

Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file

Declaration

public void ConvertToSearchablePdf(byte[] PdfData, string SavePath, string Password = null)

Parameters

Type	Name	Description
System.Byte[]	PdfData	PDF file data
System.String	SavePath	Save path of the searchable PDF
System.String	Password	PDF password

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

ConvertToSearchablePdf(String, String, String)

Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file

Declaration

public void ConvertToSearchablePdf(string PdfPath, string SavePath, string Password = null)

Parameters

Type	Name	Description
System.String	PdfPath	PDF file path
System.String	SavePath	Save path of the searchable PDF
System.String	Password	PDF password

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

ConvertToSearchablePdfBytes(Byte[], String)

Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF

Declaration

public byte[] ConvertToSearchablePdfBytes(byte[] PdfData, string Password = null)

Parameters

Type	Name	Description
System.Byte[]	PdfData	PDF file data
System.String	Password	PDF password

Returns

Type	Description
System.Byte[]	Byte array of the generated Searchable PDF

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

ConvertToSearchablePdfBytes(String, String)

Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF

Declaration

public byte[] ConvertToSearchablePdfBytes(string PdfPath, string Password = null)

Parameters

Type	Name	Description
System.String	PdfPath	PDF file path
System.String	Password	PDF password

Returns

Type	Description
System.Byte[]	Byte array of the generated Searchable PDF

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
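As a rough sketch of the four conversion overloads above, the path-based and byte-array forms could be used as follows. The file names are placeholders, and the optional Password argument is omitted on the assumption that the PDF is not password-protected.

```csharp
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;

// Path in, path out: OCR the images inside the PDF and save a copy with a text overlay.
ocr.ConvertToSearchablePdf("scanned.pdf", "scanned-searchable.pdf");

// Bytes in, bytes out: useful when the PDF comes from a stream or database rather than disk.
byte[] source = File.ReadAllBytes("scanned.pdf");
byte[] searchable = ocr.ConvertToSearchablePdfBytes(source);
File.WriteAllBytes("scanned-searchable-from-bytes.pdf", searchable);
```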

Read(OcrInputBase)

Reads text from an OcrInput object and returns an OcrResult object.

OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

Declaration

public OcrResult Read(OcrInputBase Input)

Parameters

Type	Name	Description
OcrInputBase	Input	An OcrInput document which can contain one or more images and PDFs

Returns

Type	Description
OcrResult	An OcrResult object containing text, and detailed, structured information about the extracted text content.

Read(OcrInputBase[])

Reads text from an array of OcrInput objects and returns an array of OcrResult objects.

OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

Declaration

public OcrResult[] Read(OcrInputBase[] Inputs)

Parameters

Type	Name	Description
OcrInputBase[]	Inputs	An array of OcrInput documents which can contain one or more images and PDFs each

Returns

Type	Description
OcrResult[]	An array of OcrResult objects containing text, and detailed, structured information about the extracted text content.
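A minimal sketch of the array overload, assuming two placeholder image files and assuming that OcrInput derives from OcrInputBase so that an OcrInput[] can be passed where OcrInputBase[] is expected:

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Each OcrInput is an independent document; the file names are placeholders.
var inputs = new OcrInput[]
{
    new OcrInput("page-1.png"),
    new OcrInput("page-2.png"),
};

// One OcrResult is returned per input.
OcrResult[] results = ocr.Read(inputs);

foreach (OcrResult result in results)
{
    Console.WriteLine(result.Text);
}
```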

Read(IDocumentId)

Reads an existing PDF document and returns the OCR results.

Declaration

public IOcrResult Read(IDocumentId Document)

Parameters

Type	Name	Description
IronSoftware.Abstractions.Pdf.IDocumentId	Document	Pdf document to read

Returns

Type	Description
IronSoftware.Abstractions.Ocr.IOcrResult	OCR results

Read(IDocumentId, PdfContents)

Reads the selected contents of an existing PDF document and returns the OCR results.

Declaration

public IOcrResult Read(IDocumentId Document, PdfContents Contents)

Parameters

Type	Name	Description
IronSoftware.Abstractions.Pdf.IDocumentId	Document	Pdf document to read
PdfContents	Contents	Contents to OCR

Returns

Type	Description
IronSoftware.Abstractions.Ocr.IOcrResult	OCR results

Read(AnyBitmap)

Reads text from an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

Declaration

public OcrResult Read(AnyBitmap Image)

Parameters

Type	Name	Description
IronSoftware.Drawing.AnyBitmap	Image	An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image

Returns

Type	Description
OcrResult	An OcrResult object containing text, and detailed, structured information about the extracted text content.

Read(AnyBitmap, Rectangle)

Reads text from a region of an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.

Declaration

public OcrResult Read(AnyBitmap Image, Rectangle ContentArea)

Parameters

Type	Name	Description
IronSoftware.Drawing.AnyBitmap	Image	An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image
IronSoftware.Drawing.Rectangle	ContentArea	Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

Returns

Type	Description
OcrResult	An OcrResult object containing text, and detailed, structured information about the extracted text content.

Read(String)

Reads text from an Image file and returns an OcrResult object.

Declaration

public OcrResult Read(string ImagePath)

Parameters

Type	Name	Description
System.String	ImagePath	Path to an image file.

Returns

Type	Description
OcrResult	An OcrResult object containing text, and detailed, structured information about the extracted text content.

Read(String, Rectangle)

Reads text from a region of an Image file and returns an OcrResult object.

Declaration

public OcrResult Read(string ImagePath, Rectangle ContentArea)

Parameters

Type	Name	Description
System.String	ImagePath	Path to an image file.
IronSoftware.Drawing.Rectangle	ContentArea	Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.

Returns

Type	Description
OcrResult	An OcrResult object containing text, and detailed, structured information about the extracted text content.
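A region read might look like the sketch below. The file name and pixel values are placeholders, and the four-integer Rectangle constructor (x, y, width, height) is an assumption about IronSoftware.Drawing rather than something this page documents.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// X, Y, Width, and Height in pixels; placeholder values for the area that holds the text.
// Assumption: IronSoftware.Drawing.Rectangle exposes an (x, y, width, height) constructor.
var contentArea = new IronSoftware.Drawing.Rectangle(50, 100, 600, 200);

// Only the content area is passed through OCR, which can improve speed.
OcrResult result = ocr.Read("receipt.png", contentArea);
Console.WriteLine(result.Text);
```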

ReadAsync(OcrInputBase, Int32)

Reads text from an OcrInput object and returns an OcrResult object.

OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.

There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.

Declaration

public OcrReadTask ReadAsync(OcrInputBase Input, int TimeoutMs = -1)

Parameters

Type	Name	Description
OcrInputBase	Input	An OcrInput document which can contain one or more images and PDFs
System.Int32	TimeoutMs	Optional timeout in milliseconds, after which the OCR read will be cancelled. Note that the timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts running, it has to complete before the timeout can take effect. For example, if OCR processing takes 3 seconds, the operation completes before a timeout of 1000 ms can take effect. Not supported in .NET 4.0.

Returns

Type	Description
OcrReadTask	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing text, and detailed, structured information about the extracted text content.

ReadAsync(AnyBitmap, Rectangle, Int32)

Reads text from an IronSoftware.Drawing.AnyBitmap object and returns an OcrResult object.

Declaration

public Task<OcrResult> ReadAsync(AnyBitmap Image, Rectangle ContentArea = null, int TimeoutMs = -1)

Parameters

Type	Name	Description
IronSoftware.Drawing.AnyBitmap	Image	An IronSoftware.Drawing.AnyBitmap object
IronSoftware.Drawing.Rectangle	ContentArea	Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.
System.Int32	TimeoutMs	Optional timeout in milliseconds, after which the OCR read will be cancelled. Note that the timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts running, it has to complete before the timeout can take effect. For example, if OCR processing takes 3 seconds, the operation completes before a timeout of 1000 ms can take effect. Not supported in .NET 4.0.

Returns

Type	Description
System.Threading.Tasks.Task<OcrResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing text, and detailed, structured information about the extracted text content.

ReadAsync(String, Rectangle, Int32)

Reads text from an Image file and returns an OcrResult object.

Declaration

public OcrReadTask ReadAsync(string ImagePath, Rectangle ContentArea = null, int TimeoutMs = -1)

Parameters

Type	Name	Description
System.String	ImagePath	Path to an image file.
IronSoftware.Drawing.Rectangle	ContentArea	Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed.
System.Int32	TimeoutMs	Optional timeout in milliseconds, after which the OCR read will be cancelled. Note that the timeout is only checked between iterations of the OCR operation, not during them: once an iteration starts running, it has to complete before the timeout can take effect. For example, if OCR processing takes 3 seconds, the operation completes before a timeout of 1000 ms can take effect. Not supported in .NET 4.0.

Returns

Type	Description
OcrReadTask	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing text, and detailed, structured information about the extracted text content.
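The asynchronous overloads could be awaited as in this sketch. It assumes that awaiting an OcrReadTask yields an OcrResult, as the Returns description suggests, and uses a placeholder file name and an arbitrary 30-second timeout.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
var input = new OcrInput("scan.png"); // placeholder file name

// The timeout is in milliseconds and, per the parameter notes,
// only takes effect between OCR iterations, not during one.
// Assumption: awaiting the returned OcrReadTask produces an OcrResult.
OcrResult result = await ocr.ReadAsync(input, 30000);
Console.WriteLine(result.Text);
```

Leaving TimeoutMs at its default of -1 presumably means no timeout is applied.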

ReadDocument(OcrInputBase)

A strong IronTesseract Document reading method that specializes in scanned documents or photos of paper documents which contain a lot of text.
Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Images use ReadPhoto(OcrInputBase, ModelType).
For reading of Passports use ReadPassport(OcrInputBase).
For reading of License Plates use ReadLicensePlate(OcrInputBase).
For reading of Scanned Documents that contain tables with clear outlines use ReadDocumentAdvanced(OcrInputBase, ModelType).

Declaration

public OcrResult ReadDocument(OcrInputBase input)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input

Returns

Type	Description
OcrResult	OcrResult

ReadDocumentAdvanced(OcrInputBase, ModelType)

An optimized read that uses machine learning models together with computer vision methods for documents that contain tables with clear outlines.
Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Scanned Documents use ReadDocument(OcrInputBase).
For reading of Images use ReadPhoto(OcrInputBase, ModelType).
For reading of Passports use ReadPassport(OcrInputBase).
For reading of License Plates use ReadLicensePlate(OcrInputBase).

Declaration

public OcrDocAdvancedResult ReadDocumentAdvanced(OcrInputBase input, ModelType modelType)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input
ModelType	modelType	The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

Returns

Type	Description
OcrDocAdvancedResult	OCR document advanced result

Remarks

Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
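A hedged sketch of an advanced document read, following the same load-then-read pattern as the ReadHandwriting usage on this page. It assumes the IronOcr.Extensions.AdvancedScan package is installed, that ModelType resolves from the IronOcr namespace, and that the result exposes a Text property; the file name is a placeholder.

```csharp
using System;
using IronOcr;

// Load OCR input
var input = new OcrInput();
input.Load("table-scan.png"); // placeholder file name

// Instantiate OCR engine
var ocr = new IronTesseract();

// Normal favors speed; Enhanced favors accuracy on challenging images.
// Assumption: ModelType is reachable through the IronOcr namespace.
var result = ocr.ReadDocumentAdvanced(input, ModelType.Normal);

// Assumption: OcrDocAdvancedResult exposes the recognized text as Text.
Console.WriteLine(result.Text);
```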

ReadDocumentAdvancedAsync(OcrInputBase, Int32, ModelType)

An async operation for an optimized read that uses machine learning models together with computer vision, with an optional timeout. Reads text from an OcrInputBase input and returns an OcrDocAdvancedResult object.

Declaration

public Task<OcrDocAdvancedResult> ReadDocumentAdvancedAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input
System.Int32	timeoutMs	Optional timeout in milliseconds
ModelType	modelType	The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

Returns

Type	Description
System.Threading.Tasks.Task<OcrDocAdvancedResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrDocAdvancedResult object containing text, tables, confidence

ReadHandwriting(OcrInputBase)

An optimized read that is suitable for handwritten text recognition.

------------------------------------------------

Usage:

// Load OCR input
var input = new IronOcr.OcrInput();
input.Load("input.png");
// Instantiate OCR engine
var ocr = new IronOcr.IronTesseract();
// Read input with handwritten texts
var result = ocr.ReadHandwriting(input);

------------------------------------------------

Declaration

public OcrHandwritingResult ReadHandwriting(OcrInputBase input)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input

Returns

Type	Description
OcrHandwritingResult	OcrHandwritingResult

Remarks

Important Considerations:

⚠️Language Availability: This method currently supports only English.

⚠️Writing Style: This method might provide low-accuracy OCR results when trying to recognize cursive handwritten text.

Exceptions

Type	Condition
ExtensionAdvancedScanException

ReadHandwritingAsync(OcrInputBase, Int32)

An async operation for an optimized read that is suitable for handwritten text recognition, with an optional timeout.

------------------------------------------------

Usage:

// Load OCR input
var input = new IronOcr.OcrInput();
input.Load("input.png");
// Instantiate OCR engine
var ocr = new IronOcr.IronTesseract();
// Optional timeout in milliseconds
int timeOut = 20000;
// Read input with handwritten texts
var result = await ocr.ReadHandwritingAsync(input, timeOut);

------------------------------------------------

Declaration

public Task<OcrHandwritingResult> ReadHandwritingAsync(OcrInputBase input, int timeoutMs = -1)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input
System.Int32	timeoutMs	Optional timeout in milliseconds

Returns

Type	Description
System.Threading.Tasks.Task<OcrHandwritingResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrHandwritingResult

Remarks

Important Considerations:

⚠️Language Availability: This method currently supports only English.

⚠️Writing Style: This method might provide low-accuracy OCR results when trying to recognize cursive handwritten text.

ReadImagesFromPdf(Byte[], String, IEnumerable<Int32>)

Extract all images from a PDF, perform OCR on the images, and return the results

Declaration

public OcrResult ReadImagesFromPdf(byte[] PdfData, string Password = null, IEnumerable<int> PageIndices = null)

Parameters

Type	Name	Description
System.Byte[]	PdfData	PDF file data
System.String	Password	PDF password
System.Collections.Generic.IEnumerable<System.Int32>	PageIndices	Pages to extract images from

Returns

Type	Description
OcrResult	OCR results

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.

ReadImagesFromPdf(String, String, IEnumerable<Int32>)

Extract all images from a PDF, perform OCR on the images, and return the results

Declaration

public OcrResult ReadImagesFromPdf(string PdfPath, string Password = null, IEnumerable<int> PageIndices = null)

Parameters

Type	Name	Description
System.String	PdfPath	PDF file path
System.String	Password	PDF password
System.Collections.Generic.IEnumerable<System.Int32>	PageIndices	Pages to extract images from

Returns

Type	Description
OcrResult	OCR results

Remarks

Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
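Reading only the images on selected pages might be sketched as below. The file name is a placeholder, and treating PageIndices as zero-based is an assumption that should be checked against a known document.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: page indices are zero-based, so this selects the first two pages.
int[] pages = { 0, 1 };

// Password is left at its default; PageIndices is passed by name to skip it.
OcrResult result = ocr.ReadImagesFromPdf("report.pdf", PageIndices: pages);
Console.WriteLine(result.Text);
```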

ReadLicensePlate(OcrInputBase)

An optimized read that extracts a License Plate from photos.
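Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Scanned Documents use ReadDocument(OcrInputBase).
For reading of Images use ReadPhoto(OcrInputBase, ModelType) or ReadScreenShot(OcrInputBase, ModelType).
For reading of Passports use ReadPassport(OcrInputBase).
For reading of Scanned Documents that contain tables with clear outlines use ReadDocumentAdvanced(OcrInputBase, ModelType).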

Declaration

public OcrLicensePlateResult ReadLicensePlate(OcrInputBase input)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input

Returns

Type	Description
OcrLicensePlateResult	OcrLicensePlateResult

Remarks

Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.

ReadLicensePlateAsync(OcrInputBase, Int32)

An async operation that extracts a license plate from photos, with an optional timeout. Reads text from an OcrInputBase input and returns an OcrLicensePlateResult object.

Declaration

public Task<OcrLicensePlateResult> ReadLicensePlateAsync(OcrInputBase input, int timeoutMs = -1)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input
System.Int32	timeoutMs	Optional timeout in milliseconds

Returns

Type	Description
System.Threading.Tasks.Task<OcrLicensePlateResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrLicensePlateResult object containing text, tables, confidence

ReadPassport(OcrInputBase)

An optimized read that extracts Passport information from Passport photos by scanning the MRZ contents.
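Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Scanned Documents use ReadDocument(OcrInputBase).
For reading of Images use ReadPhoto(OcrInputBase, ModelType) or ReadScreenShot(OcrInputBase, ModelType).
For reading of License Plates use ReadLicensePlate(OcrInputBase).
For reading of Scanned Documents that contain tables with clear outlines use ReadDocumentAdvanced(OcrInputBase, ModelType).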

Declaration

public OcrPassportResult ReadPassport(OcrInputBase input)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input

Returns

Type	Description
OcrPassportResult	OcrPassportResult

Remarks

This method only supports the English language.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
With IronOcr.Extensions.AdvancedScan.MacOs, the inputs cannot be autorotated to find the MRZ contents.
Therefore, users need to make sure that the MRZ contents are always at the bottom of the input before processing the OCR.

ReadPassportAsync(OcrInputBase, Int32)

An async operation that extracts passport information from photos, with an optional timeout. Reads text from an OcrInputBase input and returns an OcrPassportResult object.

Declaration

public Task<OcrPassportResult> ReadPassportAsync(OcrInputBase input, int timeoutMs = -1)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input
System.Int32	timeoutMs	Optional timeout in milliseconds

Returns

Type	Description
System.Threading.Tasks.Task<OcrPassportResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrPassportResult object containing text, tables, confidence

ReadPhoto(OcrInputBase, ModelType)

An optimized read for images that contain hard-to-read text.
Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Scanned Documents use ReadDocument(OcrInputBase).
For reading of Passports use ReadPassport(OcrInputBase).
For reading of License Plates use ReadLicensePlate(OcrInputBase).
For reading of Scanned Documents that contain tables with clear outlines use ReadDocumentAdvanced(OcrInputBase, ModelType).

Declaration

public OcrPhotoResult ReadPhoto(OcrInputBase input, ModelType modelType)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input
ModelType	modelType	The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

Returns

Type	Description
OcrPhotoResult	OcrPhotoResult

ReadPhotoAsync(OcrInputBase, Int32, ModelType)

An async operation that extracts hard-to-read text from photos, with an optional timeout.
Reads text from an OcrInputBase input and returns an OcrPhotoResult object.

Declaration

public Task<OcrPhotoResult> ReadPhotoAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType)

Parameters

Type	Name	Description
OcrInputBase	input	OCR Input
System.Int32	timeoutMs	Optional timeout in milliseconds
ModelType	modelType	The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

Returns

Type	Description
System.Threading.Tasks.Task<OcrPhotoResult>	A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrPhotoResult object containing text, tables, confidence

ReadScreenShot(OcrInputBase, ModelType)

An optimized read for screenshots that contain hard-to-read text.
Ensure you have used the OcrInput filters to improve inputs; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.

For reading of Scanned Documents use ReadDocument(OcrInputBase).
For reading of Passports use ReadPassport(OcrInputBase).
For reading of License Plates use ReadLicensePlate(OcrInputBase).
For reading of Scanned Documents that contain tables with clear outlines use ReadDocumentAdvanced(OcrInputBase, ModelType).

Declaration

public OcrPhotoResult ReadScreenShot(OcrInputBase input, ModelType modelType)

Parameters

Type	Name	Description
OcrInputBase	input	OCR input
ModelType	modelType	The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images.

Returns

Type	Description
OcrPhotoResult	OcrPhotoResult

UseCustomTesseractLanguageFile(String)

IronTesseract will use a tesseract .traineddata language file as its only OCR language.

https://github.com/tesseract-ocr/tessdata

Declaration

public void UseCustomTesseractLanguageFile(string TrainedDataPath)

Parameters

Type	Name	Description
System.String	TrainedDataPath	File path to a .traineddata file. These can be downloaded from https://github.com/tesseract-ocr/tessdata or generated using Tesseract command line.
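Switching the engine to a custom language file could look like this sketch. Both paths are hypothetical; the .traineddata file is assumed to have been downloaded or trained beforehand.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Hypothetical path to a .traineddata file; it becomes the only OCR language.
ocr.UseCustomTesseractLanguageFile("tessdata/custom.traineddata");

OcrResult result = ocr.Read(new OcrInput("scan.png")); // placeholder file name
Console.WriteLine(result.Text);
```

To combine a custom file with a built-in language instead of replacing it, AddSecondaryLanguage(String) appears to be the intended route.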

Events

OcrProgress

An Event which can be used to track OCR progress and inform users of OCR performance and progress.

Progress is reported via the OcrProgressEventsArgs class

Declaration

public event EventHandler<OcrProgressEventsArgs> OcrProgress

Event Type

Type	Description
System.EventHandler<OcrProgressEventsArgs>

Examples

myIronTesseract.OcrProgress += (object o, IronOcr.Events.OcrProgressEventsArgs e) =>
{
    Console.WriteLine(e.ProgressPercent + "%   " + e.Duration.TotalSeconds + "s");
};
