Class IronTesseract
IronTesseract is a comprehensive managed class for performing Tesseract OCR in .NET applications.
IronTesseract natively supports Tesseract 3, 4, and 5 engines, and automatically installs all required binaries and language pack (tessdata) files.
Inheritance: System.Object → IronTesseract
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class IronTesseract : Object
IronTesseract runs Tesseract OCR on images and PDFs in .NET, turning scanned pages, photos, and documents into text and searchable PDFs. It is the engine a developer reaches for behind a search like "C# Tesseract OCR": construct one, point it at an OcrInput, and call a read. It wraps Iron Software's tuned Tesseract 5 build, so the same object handles a clean scan, a noisy photo, or a multi-page PDF.
Create one with new IronTesseract(), or new IronTesseract(TesseractConfiguration) to start from a prepared configuration. Set Language to the document's natural language, add more with AddSecondaryLanguage for multilingual pages, and flip MultiThreaded to read pages and images on parallel threads. The Configuration field exposes a TesseractConfiguration for fine-grained engine control, and EnableTesseractConsoleMessages surfaces the engine's own diagnostics. Subscribe to the OcrProgress event to report progress on long reads.
The read surface groups into functional buckets. **Standard reads** are the Read overloads, which accept an OcrInputBase, an array of inputs, an AnyBitmap, or an image path (with an optional Rectangle to limit OCR to a region) and return an OcrResult; the IDocumentId overloads read an existing PDF. **Asynchronous reads** are the ReadAsync overloads, which return an awaitable result with an optional timeout for keeping the call off a request thread. **Specialized machine-learning reads** target specific content: ReadDocumentAdvanced, ReadHandwriting, ReadPassport, ReadLicensePlate, ReadPhoto, and ReadScreenShot, each returning a result type tuned to that scenario, with matching async forms. **Searchable-PDF conversion** is handled by ConvertToSearchablePdf and ConvertToSearchablePdfBytes, which OCR a PDF's images and overlay the recognized text. Custom language data loads through AddSecondaryLanguage and UseCustomTesseractLanguageFile, and ClearSecondaryLanguages resets the set.
```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;
// "scan.png" is a placeholder path; OcrInput is disposable
using var input = new OcrInput("scan.png");
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```
The Iron Tesseract how-to covers configuring and running a read, the read results how-to traverses the returned text and words, and the simple OCR example shows a minimal read.
Constructors
IronTesseract()
Public constructor. Creates a default instance of IronTesseract.
Declaration
public IronTesseract()
IronTesseract(TesseractConfiguration)
Public constructor. Creates an instance of IronTesseract with a customized TesseractConfiguration.
This allows advanced developers to fine-tune Tesseract behavior.
Declaration
public IronTesseract(TesseractConfiguration Configuration)
Parameters
| Type | Name | Description |
|---|---|---|
| TesseractConfiguration | Configuration | The engine configuration to start from. |
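A minimal sketch of starting from a prepared configuration. It assumes TesseractConfiguration has a public parameterless constructor and settable WhiteListCharacters and PageSegmentationMode members (with a TesseractPageSegmentationMode enum); those names come from IronOCR's wider documentation rather than this page, so confirm them in the TesseractConfiguration reference. The image path is a placeholder.

```csharp
using System;
using IronOcr;

// Assumed members: WhiteListCharacters and PageSegmentationMode
var config = new TesseractConfiguration
{
    // Restrict recognition to digits, e.g. for meter readings or invoice numbers
    WhiteListCharacters = "0123456789",
    // Treat the image as a single uniform block of text
    PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock
};

var ocr = new IronTesseract(config);

using var input = new OcrInput("digits.png"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```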
Fields
Configuration
An instance of TesseractConfiguration which allows fine-grained control of the underlying Tesseract OCR Engine.
Options include the language file detail level, the page segmentation mode, and access to the full set of Tesseract settings variables.
Declaration
public TesseractConfiguration Configuration
Field Value
| Type | Description |
|---|---|
| TesseractConfiguration |
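The same field can be adjusted after construction. This sketch assumes the PageSegmentationMode and BlackListCharacters members and the TesseractPageSegmentationMode.SparseText value exist as named; the image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// SparseText: find as much text as possible in no particular order (assumed enum value)
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SparseText;
// Ignore characters that only appear as scanning noise (assumed member)
ocr.Configuration.BlackListCharacters = "~`|";

using var input = new OcrInput("receipt.png"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```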
Properties
EnableTesseractConsoleMessages
Gets or sets a value indicating whether Tesseract developer messages and warnings will be sent to console output.
Declaration
public bool EnableTesseractConsoleMessages { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
Remarks
Setting this property to true enables console output for Tesseract messages and warnings. Conversely, setting it to false disables this output.
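A short sketch of switching Tesseract diagnostics on only while investigating a poor read. The image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// Surface Tesseract's own messages and warnings on the console
ocr.EnableTesseractConsoleMessages = true;

using var input = new OcrInput("blurry-scan.png"); // placeholder path
OcrResult result = ocr.Read(input);

// Turn diagnostics back off once the problem is understood
ocr.EnableTesseractConsoleMessages = false;
Console.WriteLine(result.Text);
```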
Language
The natural language of the documents that IronTesseract will read.
Default is English. Additional languages can be installed using NuGet (https://www.nuget.org/packages?q=IronOcr.Languages) or downloaded from https://ironsoftware.com/csharp/ocr/languages/
Multiple language packs can be used simultaneously with the AddSecondaryLanguage method.
Custom Tesseract .traineddata language packs can be used with the UseCustomTesseractLanguageFile(String) method.
Declaration
public OcrLanguage Language { get; set; }
Property Value
| Type | Description |
|---|---|
| OcrLanguage |
MultiThreaded
Gets or sets whether multiple PDF pages and images are read simultaneously on different threads.
Declaration
public bool MultiThreaded { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
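A minimal sketch enabling parallel reads for an input with several pages. The file name is a placeholder, and whether a particular file format loads as multiple pages is not covered on this page.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// Read pages and images of the input on parallel threads
ocr.MultiThreaded = true;

using var input = new OcrInput("pages.tiff"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```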
Methods
AddSecondaryLanguage(OcrLanguage)
Adds a secondary language so that IronTesseract uses multiple Tesseract language files simultaneously (multilingual OCR).
Any number of secondary languages may be added, but each additional language may reduce speed.
Declaration
public void AddSecondaryLanguage(OcrLanguage SecondaryLanguage)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrLanguage | SecondaryLanguage | An additional OcrLanguage |
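A sketch of a bilingual read. It assumes the Spanish language pack is installed from NuGet (for example an IronOcr.Languages.Spanish package) so that OcrLanguage.Spanish can be used; the image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// Primary language for most of the page
ocr.Language = OcrLanguage.English;
// Secondary language for mixed content; assumes the Spanish pack is installed
ocr.AddSecondaryLanguage(OcrLanguage.Spanish);

using var input = new OcrInput("bilingual-menu.png"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```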
AddSecondaryLanguage(String)
Adds a custom .traineddata language file (Tesseract 3, 4, or 5) as a secondary language so that IronTesseract uses multiple Tesseract language files simultaneously (multilingual OCR).
Any number of secondary languages may be added, but each additional language may reduce speed.
Declaration
public void AddSecondaryLanguage(string CustomLanguagePath)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | CustomLanguagePath | File path to a .traineddata tesseract language pack. |
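A sketch of adding a custom-trained language pack alongside the primary language. The tessdata/custom_font.traineddata path is hypothetical, as is the image path.

```csharp
using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
// Hypothetical path to a custom-trained Tesseract language pack
string customPack = Path.Combine("tessdata", "custom_font.traineddata");
ocr.AddSecondaryLanguage(customPack);

using var input = new OcrInput("label.png"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```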
ClearSecondaryLanguages()
Removes all languages added by AddSecondaryLanguage(OcrLanguage) or AddSecondaryLanguage(String).
Declaration
public void ClearSecondaryLanguages()
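A sketch of reusing one engine for two jobs: the first reads mixed English and German text, the second resets to the primary language only. It assumes the German language pack is installed; the image paths are placeholders.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.AddSecondaryLanguage(OcrLanguage.German);

using (var mixed = new OcrInput("mixed.png")) // placeholder path
{
    Console.WriteLine(ocr.Read(mixed).Text);
}

// Drop the German pack before the next, English-only job
ocr.ClearSecondaryLanguages();
using var englishOnly = new OcrInput("english.png"); // placeholder path
OcrResult result = ocr.Read(englishOnly);
Console.WriteLine(result.Text);
```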
ConvertToSearchablePdf(Byte[], String, String)
Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file
Declaration
public void ConvertToSearchablePdf(byte[] PdfData, string SavePath, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfData | PDF file data |
| System.String | SavePath | Save path of the searchable PDF |
| System.String | Password | PDF password |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
ConvertToSearchablePdf(String, String, String)
Perform OCR on images within the PDF, overlay the text onto the original PDF, and save the new PDF to file
Declaration
public void ConvertToSearchablePdf(string PdfPath, string SavePath, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfPath | PDF file path |
| System.String | SavePath | Save path of the searchable PDF |
| System.String | Password | PDF password |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
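A minimal sketch of converting a scanned PDF on disk into a searchable copy. The file names are placeholders; the optional Password argument is omitted because the source is assumed to be unprotected.

```csharp
using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
string source = "scanned-contract.pdf";             // placeholder path
string target = "scanned-contract.searchable.pdf";  // placeholder path

// OCR the page images and overlay the text onto a copy of the original PDF
ocr.ConvertToSearchablePdf(source, target);
Console.WriteLine(File.Exists(target) ? "Saved " + target : "No output written");
```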
ConvertToSearchablePdfBytes(Byte[], String)
Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF
Declaration
public byte[] ConvertToSearchablePdfBytes(byte[] PdfData, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfData | PDF file data |
| System.String | Password | PDF password |
Returns
| Type | Description |
|---|---|
| System.Byte[] | Byte array of the generated Searchable PDF |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
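A sketch of the in-memory variant, useful when the PDF arrives as an upload or blob rather than a file. Reading and writing files here only stands in for that data source; the file names are placeholders.

```csharp
using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
// Stand-in for bytes received from an upload or storage service
byte[] original = File.ReadAllBytes("scanned-invoice.pdf"); // placeholder path

// Password keeps its default of null for an unprotected PDF
byte[] searchable = ocr.ConvertToSearchablePdfBytes(original);

File.WriteAllBytes("scanned-invoice.searchable.pdf", searchable); // placeholder path
Console.WriteLine($"{original.Length} bytes in, {searchable.Length} bytes out");
```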
ConvertToSearchablePdfBytes(String, String)
Perform OCR on images within the PDF, overlay the text onto the original PDF, and return a byte array of the new PDF
Declaration
public byte[] ConvertToSearchablePdfBytes(string PdfPath, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfPath | PDF file path |
| System.String | Password | PDF password |
Returns
| Type | Description |
|---|---|
| System.Byte[] | Byte array of the generated Searchable PDF |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
Read(OcrInputBase)
Reads text from an OcrInput object and returns an OcrResult object.
OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.
There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.
Declaration
public OcrResult Read(OcrInputBase Input)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | Input | An OcrInput document which can contain one or more images and PDFs |
Returns
| Type | Description |
|---|---|
| OcrResult | An OcrResult object containing the text and detailed, structured information about the extracted text content. |
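A sketch of enhancing an input before reading it. Deskew and DeNoise are OcrInput filters described in IronOCR's image filter tutorial, and OcrResult.Confidence is assumed to be available; the image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("crooked-scan.png"); // placeholder path

// Clean up the image before OCR
input.Deskew();  // straighten rotated text
input.DeNoise(); // remove speckle from the scan

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
Console.WriteLine($"Confidence: {result.Confidence}"); // assumed member
```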
Read(OcrInputBase[])
Reads text from an array of OcrInput objects and returns an array of OcrResult objects.
OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.
There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.
Declaration
public OcrResult[] Read(OcrInputBase[] Inputs)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase[] | Inputs | An array of OcrInput documents which can contain one or more images and PDFs each |
Returns
| Type | Description |
|---|---|
| OcrResult[] | An array of OcrResult objects containing the text and detailed, structured information about the extracted text content. |
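A sketch of batch reading several inputs in one call. It relies on C# array covariance to pass an OcrInput[] where OcrInputBase[] is expected, and it assumes results are returned in the same order as the inputs. The file names are placeholders.

```csharp
using System;
using System.Linq;
using IronOcr;

var ocr = new IronTesseract();
string[] paths = { "invoice-1.png", "invoice-2.png", "invoice-3.png" }; // placeholders
OcrInput[] inputs = paths.Select(p => new OcrInput(p)).ToArray();

// One OcrResult per OcrInput (order assumed to match the inputs)
OcrResult[] results = ocr.Read(inputs);
for (int i = 0; i < results.Length; i++)
{
    Console.WriteLine($"{paths[i]}: {results[i].Text}");
}

// OcrInput holds image data, so release each one when finished
foreach (var input in inputs) input.Dispose();
```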
Read(IDocumentId)
Reads an existing PDF document and returns OCR results.
Declaration
public IOcrResult Read(IDocumentId Document)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Abstractions.Pdf.IDocumentId | Document | Pdf document to read |
Returns
| Type | Description |
|---|---|
| IronSoftware.Abstractions.Ocr.IOcrResult | OCR results |
Read(IDocumentId, PdfContents)
Reads an existing PDF document and returns OCR results.
Declaration
public IOcrResult Read(IDocumentId Document, PdfContents Contents)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Abstractions.Pdf.IDocumentId | Document | Pdf document to read |
| PdfContents | Contents | Contents to OCR |
Returns
| Type | Description |
|---|---|
| IronSoftware.Abstractions.Ocr.IOcrResult | OCR results |
Read(AnyBitmap)
Reads text from an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.
Declaration
public OcrResult Read(AnyBitmap Image)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | Image | An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image |
Returns
| Type | Description |
|---|---|
| OcrResult | An OcrResult object containing the text and detailed, structured information about the extracted text content. |
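A minimal sketch that loads an image as an AnyBitmap before reading it, for example when the image is already held in memory elsewhere in the application. AnyBitmap.FromFile is part of IronSoftware.Drawing; the file name is a placeholder.

```csharp
using System;
using IronOcr;
using IronSoftware.Drawing;

var ocr = new IronTesseract();
// Load the image as an AnyBitmap (disposable, so release it when done)
using AnyBitmap bitmap = AnyBitmap.FromFile("screenshot.png"); // placeholder path
OcrResult result = ocr.Read(bitmap);
Console.WriteLine(result.Text);
```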
Read(AnyBitmap, Rectangle)
Reads text from a region of an IronSoftware.Drawing.AnyBitmap image and returns an OcrResult object.
Declaration
public OcrResult Read(AnyBitmap Image, Rectangle ContentArea)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | Image | An IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, SkiaSharp.SKBitmap, SkiaSharp.SKImage, Microsoft.Maui.Graphics.Platform.PlatformImage, System.Drawing.Bitmap, or System.Drawing.Image |
| IronSoftware.Drawing.Rectangle | ContentArea | Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed. |
Returns
| Type | Description |
|---|---|
| OcrResult | An OcrResult object containing the text and detailed, structured information about the extracted text content. |
Read(String)
Reads text from an Image file and returns an OcrResult object.
Declaration
public OcrResult Read(string ImagePath)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | ImagePath | Path to an image file. |
Returns
| Type | Description |
|---|---|
| OcrResult | An OcrResult object containing the text and detailed, structured information about the extracted text content. |
Read(String, Rectangle)
Reads text from a region of an Image file and returns an OcrResult object.
Declaration
public OcrResult Read(string ImagePath, Rectangle ContentArea)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | ImagePath | Path to an image file. |
| IronSoftware.Drawing.Rectangle | ContentArea | Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed. |
Returns
| Type | Description |
|---|---|
| OcrResult | An OcrResult object containing the text and detailed, structured information about the extracted text content. |
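A sketch of limiting OCR to one region of a form, such as the totals box of an invoice. The coordinates are illustrative only, and it assumes the IronSoftware.Drawing.Rectangle constructor takes X, Y, Width, and Height in pixels, as the ContentArea description states. The file name is a placeholder.

```csharp
using System;
using IronOcr;
using IronSoftware.Drawing;

var ocr = new IronTesseract();
// X, Y, Width, Height in pixels; measure these on your own template
var totalBox = new Rectangle(1200, 80, 600, 150);

// Only the pixels inside totalBox are passed to OCR
OcrResult result = ocr.Read("invoice.png", totalBox); // placeholder path
Console.WriteLine(result.Text.Trim());
```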
ReadAsync(OcrInputBase, Int32)
Asynchronously reads text from an OcrInput object; the returned task produces an OcrResult object.
OcrInput is the preferred input type because it allows for OCR of multi-paged documents, and allows images to be enhanced before OCR to obtain faster, more accurate results.
There are also other overloads of this method that allow for Images and PDFs to be read directly as File paths and Bitmaps.
Declaration
public OcrReadTask ReadAsync(OcrInputBase Input, int TimeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | Input | An OcrInput document which can contain one or more images and PDFs |
| System.Int32 | TimeoutMs | Optional timeout in milliseconds, after which the OCR read will be cancelled. |
Returns
| Type | Description |
|---|---|
| OcrReadTask | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing the text and detailed, structured information about the extracted text content. |
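A sketch of an awaited read with a timeout, which keeps the calling thread free while OCR runs. It assumes OcrReadTask can be awaited to produce an OcrResult, as the Returns entry implies; how a timeout surfaces to the caller is not documented here, so no specific exception is caught. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("large-scan.png"); // placeholder path

// Cancel the read if it takes longer than 30 seconds
OcrResult result = await ocr.ReadAsync(input, TimeoutMs: 30_000);
Console.WriteLine(result.Text);
```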
ReadAsync(AnyBitmap, Rectangle, Int32)
Asynchronously reads text from an IronSoftware.Drawing.AnyBitmap object; the returned task produces an OcrResult object.
Declaration
public Task<OcrResult> ReadAsync(AnyBitmap Image, Rectangle ContentArea = null, int TimeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | Image | An IronSoftware.Drawing.AnyBitmap object |
| IronSoftware.Drawing.Rectangle | ContentArea | Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed. |
| System.Int32 | TimeoutMs | Optional timeout in milliseconds, after which the OCR read will be cancelled. |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing the text and detailed, structured information about the extracted text content. |
ReadAsync(String, Rectangle, Int32)
Asynchronously reads text from an image file; the returned task produces an OcrResult object.
Declaration
public OcrReadTask ReadAsync(string ImagePath, Rectangle ContentArea = null, int TimeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | ImagePath | Path to an image file. |
| IronSoftware.Drawing.Rectangle | ContentArea | Specifies a region within the image to extract text from, as an IronSoftware.Drawing.Rectangle with X, Y, Width, and Height in pixels. Setting a ContentArea can improve OCR speed. |
| System.Int32 | TimeoutMs | Optional timeout in milliseconds, after which the OCR read will be cancelled. |
Returns
| Type | Description |
|---|---|
| OcrReadTask | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrResult object containing the text and detailed, structured information about the extracted text content. |
ReadDocument(OcrInputBase)
A document-reading method specialized for scanned documents, or photos of paper documents, that contain a lot of text.
Apply the OcrInput filters to improve inputs first; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.
To read images, use ReadPhoto(OcrInputBase, ModelType).
To read passports, use ReadPassport(OcrInputBase).
To read license plates, use ReadLicensePlate(OcrInputBase).
To read scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).
Declaration
public OcrResult ReadDocument(OcrInputBase input)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
Returns
| Type | Description |
|---|---|
| OcrResult |
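A minimal sketch of reading a photographed letter, straightening it with the Deskew filter first. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("letter-photo.jpg"); // placeholder path
input.Deskew(); // correct a slightly rotated photo before reading

OcrResult result = ocr.ReadDocument(input);
Console.WriteLine(result.Text);
```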
ReadDocumentAdvanced(OcrInputBase, ModelType)
An optimized read that combines machine learning models with computer vision methods for documents that contain tables with clear outlines. Apply the OcrInput filters to improve inputs first; see: https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/. To read scanned documents, use ReadDocument(OcrInputBase). To read images, use ReadPhoto(OcrInputBase, ModelType). To read passports, use ReadPassport(OcrInputBase). To read license plates, use ReadLicensePlate(OcrInputBase).
Declaration
public OcrDocAdvancedResult ReadDocumentAdvanced(OcrInputBase input, ModelType modelType)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
| ModelType | modelType | The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images. |
Returns
| Type | Description |
|---|---|
| OcrDocAdvancedResult | OCR document advanced result |
Remarks
Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
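A sketch of reading a scanned table. It requires the IronOcr.Extensions.AdvancedScan package, assumes ModelType is reachable from the IronOcr namespace, and assumes OcrDocAdvancedResult exposes a Text property; check the OcrDocAdvancedResult reference for its table members. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("table-scan.png"); // placeholder path

// Normal favors speed; ModelType.Enhanced targets higher accuracy on difficult scans
OcrDocAdvancedResult result = ocr.ReadDocumentAdvanced(input, ModelType.Normal);
Console.WriteLine(result.Text); // assumed member
```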
ReadDocumentAdvancedAsync(OcrInputBase, Int32, ModelType)
An asynchronous version of ReadDocumentAdvanced(OcrInputBase, ModelType), with an optional timeout, that combines machine learning models with computer vision. Reads text from an OcrInput object and returns an OcrDocAdvancedResult object.
Declaration
public Task<OcrDocAdvancedResult> ReadDocumentAdvancedAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
| System.Int32 | timeoutMs | Optional timeout in milliseconds |
| ModelType | modelType | The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images. |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrDocAdvancedResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrDocAdvancedResult object containing text, tables, and confidence. |
ReadHandwriting(OcrInputBase)
An optimized read suited to handwritten text recognition.
Usage:
```csharp
// Load OCR input
var input = new IronOcr.OcrInput();
input.Load("input.png");
// Instantiate OCR engine
var ocr = new IronOcr.IronTesseract();
// Read input with handwritten texts
var result = ocr.ReadHandwriting(input);
```
Declaration
public OcrHandwritingResult ReadHandwriting(OcrInputBase input)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
Returns
| Type | Description |
|---|---|
| OcrHandwritingResult |
Remarks
Important Considerations:
⚠️Language Availability: This method currently supports only English.
⚠️Writing Style: This method may produce low-accuracy results when recognizing cursive handwriting.
Exceptions
| Type | Condition |
|---|---|
| ExtensionAdvancedScanException |
ReadHandwritingAsync(OcrInputBase, Int32)
An asynchronous version of ReadHandwriting(OcrInputBase), with an optional timeout, for handwritten text recognition.
Usage:
```csharp
// Load OCR input
var input = new IronOcr.OcrInput();
input.Load("input.png");
// Instantiate OCR engine
var ocr = new IronOcr.IronTesseract();
// Optional timeout in milliseconds
int timeOut = 20000;
// Read input with handwritten texts
var result = await ocr.ReadHandwritingAsync(input, timeOut);
```
Declaration
public Task<OcrHandwritingResult> ReadHandwritingAsync(OcrInputBase input, int timeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
| System.Int32 | timeoutMs | Optional timeout in milliseconds |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrHandwritingResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrHandwritingResult |
Remarks
Important Considerations:
⚠️Language Availability: This method currently supports only English.
⚠️Writing Style: This method may produce low-accuracy results when recognizing cursive handwriting.
ReadImagesFromPdf(Byte[], String, IEnumerable<Int32>)
Extract all images from a PDF, perform OCR on the images, and return the results
Declaration
public OcrResult ReadImagesFromPdf(byte[] PdfData, string Password = null, IEnumerable<int> PageIndices = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfData | PDF file data |
| System.String | Password | PDF password |
| System.Collections.Generic.IEnumerable<System.Int32> | PageIndices | Pages to extract images from |
Returns
| Type | Description |
|---|---|
| OcrResult | OCR results |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
ReadImagesFromPdf(String, String, IEnumerable<Int32>)
Extract all images from a PDF, perform OCR on the images, and return the results
Declaration
public OcrResult ReadImagesFromPdf(string PdfPath, string Password = null, IEnumerable<int> PageIndices = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfPath | PDF file path |
| System.String | Password | PDF password |
| System.Collections.Generic.IEnumerable<System.Int32> | PageIndices | Pages to extract images from |
Returns
| Type | Description |
|---|---|
| OcrResult | OCR results |
Remarks
Useful for generating a searchable PDF which retains bookmarks, annotations, etc.
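A sketch of reading only the images embedded in selected pages of a PDF. It assumes PageIndices are zero-based, which this page does not state; the file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// OCR the embedded images on the first two pages only (zero-based indices assumed)
OcrResult result = ocr.ReadImagesFromPdf("brochure.pdf", PageIndices: new[] { 0, 1 }); // placeholder path
Console.WriteLine(result.Text);
```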
ReadLicensePlate(OcrInputBase)
An optimized read that extracts a license plate from photos.
Apply the OcrInput filters to improve inputs first; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.
To read scanned documents, use ReadDocument(OcrInputBase).
To read images, use ReadPhoto(OcrInputBase, ModelType).
To read passports, use ReadPassport(OcrInputBase).
To read scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).
Declaration
public OcrLicensePlateResult ReadLicensePlate(OcrInputBase input)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
Returns
| Type | Description |
|---|---|
| OcrLicensePlateResult |
Remarks
Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
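A sketch of reading a plate from a car photo. It requires IronOcr.Extensions.AdvancedScan and assumes OcrLicensePlateResult exposes a Text property; the file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("car.jpg"); // placeholder path

OcrLicensePlateResult result = ocr.ReadLicensePlate(input);
Console.WriteLine(result.Text); // assumed member
```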
ReadLicensePlateAsync(OcrInputBase, Int32)
An asynchronous version of ReadLicensePlate(OcrInputBase), with an optional timeout. Reads a license plate from an OcrInput object and returns an OcrLicensePlateResult object.
Declaration
public Task<OcrLicensePlateResult> ReadLicensePlateAsync(OcrInputBase input, int timeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
| System.Int32 | timeoutMs | Optional timeout in milliseconds |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrLicensePlateResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrLicensePlateResult object containing text, tables, confidence |
ReadPassport(OcrInputBase)
An optimized read that extracts passport information from passport photos by scanning the MRZ contents.
Apply the OcrInput filters to improve inputs first; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.
To read scanned documents, use ReadDocument(OcrInputBase).
To read images, use ReadPhoto(OcrInputBase, ModelType).
To read license plates, use ReadLicensePlate(OcrInputBase).
To read scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).
Declaration
public OcrPassportResult ReadPassport(OcrInputBase input)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
Returns
| Type | Description |
|---|---|
| OcrPassportResult |
Remarks
This method supports only English.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
With IronOcr.Extensions.AdvancedScan.MacOs, inputs cannot be auto-rotated to find the MRZ contents.
Therefore, make sure the MRZ contents are always at the bottom of the input before running OCR.
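A sketch of extracting passport fields from the MRZ. It requires IronOcr.Extensions.AdvancedScan; the PassportInfo member and its Surname and PassportNumber fields are assumed names to verify against the OcrPassportResult reference. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// On macOS, keep the MRZ at the bottom of the image (no auto-rotation)
using var input = new OcrInput("passport.jpg"); // placeholder path

OcrPassportResult result = ocr.ReadPassport(input);
// Assumed member names on OcrPassportResult
Console.WriteLine(result.PassportInfo.Surname);
Console.WriteLine(result.PassportInfo.PassportNumber);
```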
ReadPassportAsync(OcrInputBase, Int32)
An asynchronous version of ReadPassport(OcrInputBase), with an optional timeout. Reads passport information from an OcrInput object and returns an OcrPassportResult object.
Declaration
public Task<OcrPassportResult> ReadPassportAsync(OcrInputBase input, int timeoutMs = -1)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
| System.Int32 | timeoutMs | Optional timeout in milliseconds |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrPassportResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrPassportResult object containing text, tables, confidence |
ReadPhoto(OcrInputBase, ModelType)
An optimized read for images that contain hard-to-read text.
Apply the OcrInput filters to improve inputs first; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.
To read scanned documents, use ReadDocument(OcrInputBase).
To read passports, use ReadPassport(OcrInputBase).
To read license plates, use ReadLicensePlate(OcrInputBase).
To read scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).
Declaration
public OcrPhotoResult ReadPhoto(OcrInputBase input, ModelType modelType)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
| ModelType | modelType | The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images. |
Returns
| Type | Description |
|---|---|
| OcrPhotoResult |
Remarks
Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
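A sketch of reading hard-to-read text from a photo with the higher-accuracy model. It requires IronOcr.Extensions.AdvancedScan and assumes OcrPhotoResult exposes a Text property; the file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("street-sign.jpg"); // placeholder path

// Enhanced trades speed for accuracy on challenging images
OcrPhotoResult result = ocr.ReadPhoto(input, ModelType.Enhanced);
Console.WriteLine(result.Text); // assumed member
```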
ReadPhotoAsync(OcrInputBase, Int32, ModelType)
An asynchronous version of ReadPhoto(OcrInputBase, ModelType), with an optional timeout, for extracting hard-to-read text from photos.
Reads text from an OcrInput object and returns an OcrPhotoResult object.
Declaration
public Task<OcrPhotoResult> ReadPhotoAsync(OcrInputBase input, int timeoutMs = -1, ModelType modelType)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR Input |
| System.Int32 | timeoutMs | Optional timeout in milliseconds |
| ModelType | modelType | The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images. |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<OcrPhotoResult> | A task that represents the asynchronous read operation. The value of the System.Threading.Tasks.Task`1.Result property contains an OcrPhotoResult object containing text, tables, confidence |
ReadScreenShot(OcrInputBase, ModelType)
An optimized read for screenshots that contain hard-to-read text.
Apply the OcrInput filters to improve inputs first; see https://ironsoftware.com/csharp/ocr/tutorials/c-sharp-ocr-image-filters/ for more information.
To read scanned documents, use ReadDocument(OcrInputBase).
To read passports, use ReadPassport(OcrInputBase).
To read license plates, use ReadLicensePlate(OcrInputBase).
To read scanned documents that contain tables with clear outlines, use ReadDocumentAdvanced(OcrInputBase, ModelType).
Declaration
public OcrPhotoResult ReadScreenShot(OcrInputBase input, ModelType modelType)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputBase | input | OCR input |
| ModelType | modelType | The type of ML model to use. Default is Normal for faster processing. Use Enhanced for higher accuracy on challenging images. |
Returns
| Type | Description |
|---|---|
| OcrPhotoResult |
Remarks
Currently supported languages are English, Chinese, Japanese, Korean, and Latin.
This is an extension method to the base IronOCR package and requires that you also install the IronOcr.Extensions.AdvancedScan package.
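A sketch of reading text from an application screenshot with the faster Normal model. It requires IronOcr.Extensions.AdvancedScan and assumes OcrPhotoResult exposes a Text property; the file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput("app-window.png"); // placeholder path

// Normal favors speed; switch to ModelType.Enhanced for small or low-contrast UI text
OcrPhotoResult result = ocr.ReadScreenShot(input, ModelType.Normal);
Console.WriteLine(result.Text); // assumed member
```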
UseCustomTesseractLanguageFile(String)
IronTesseract will use a Tesseract .traineddata language file as its only OCR language.
Language files are available at https://github.com/tesseract-ocr/tessdata
Declaration
public void UseCustomTesseractLanguageFile(string TrainedDataPath)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | TrainedDataPath | File path to a .traineddata file. These can be downloaded from https://github.com/tesseract-ocr/tessdata or generated using Tesseract command line. |
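A sketch of replacing the built-in language with a single custom pack, such as one downloaded from the tessdata repository. The tessdata/custom.traineddata path and the image path are hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
// This pack becomes the only OCR language (hypothetical path)
ocr.UseCustomTesseractLanguageFile("tessdata/custom.traineddata");

using var input = new OcrInput("scan.png"); // placeholder path
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```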
Events
OcrProgress
An event that can be used to track OCR progress and inform users of OCR performance and progress.
Progress is reported via the OcrProgressEventsArgs class.
Declaration
public event EventHandler<OcrProgressEventsArgs> OcrProgress
Event Type
| Type | Description |
|---|---|
| System.EventHandler<OcrProgressEventsArgs> |
Examples
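```csharp
using System;
using IronOcr;
var myIronTesseract = new IronTesseract();
myIronTesseract.OcrProgress += (object o, IronOcr.Events.OcrProgressEventsArgs e) =>
{
    // Report percent complete and elapsed time for each progress update
    Console.WriteLine(e.ProgressPercent + "% " + e.Duration.TotalSeconds + "s");
};
```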