Class OcrInput
Stores OCR input data and allows OCR of PDF documents or any image format.
Also provides various image filter methods which can improve OCR accuracy.
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class OcrInput : OcrInputBase, IDisposable
Remarks
Also see OcrPdfInput and OcrImageInput
Constructors
OcrInput()
Creates an OcrInput object which holds pages of rasterized input media (images, PDFs, TIFFs, GIFs).
You may load one or more media files and apply image correction filters, such as Deskew or Binarize, to the loaded pages.
OcrInput implements IDisposable, so please use the `using` keyword:
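using IronOcr;

// OcrInput is IDisposable; `using var` disposes it at the end of the scope.
using var input = new OcrInput();
input.LoadImage("input.png");
input.Deskew(); // image correction filter applied to the loaded page
var result = new IronTesseract().Read(input);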
Declaration
public OcrInput()
Methods
Add(IEnumerable<OcrInputPage>, Rectangle)
Please migrate to LoadPages(OcrInputPage[], Rectangle) instead.
Declaration
public void Add(IEnumerable<OcrInputPage> imagesAsOcrInputPages, Rectangle ContentArea)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Collections.Generic.IEnumerable<OcrInputPage> | imagesAsOcrInputPages | |
| IronSoftware.Drawing.Rectangle | ContentArea | |
Load(Object, Rectangle)
Attempts to load an input from an arbitrary object. The explicit Load methods are recommended instead, as they offer more customization and stability. See LoadPdf and LoadImage.
Declaration
public void Load(object inputObject, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Object | inputObject | Object to be loaded |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. Ignored if onlyPdfImages is set to true. |
LoadImage(AnyBitmap, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(AnyBitmap imageBitmap, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | imageBitmap | Image as an AnyBitmap |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(Byte[], Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(byte[] imageBytes, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | imageBytes | Image as a byte array |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(Stream, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(Stream imageStream, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | imageStream | Image as a Stream |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(String, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(string imageFilePath, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | imageFilePath | File Path of the Image |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
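The optional contentArea parameter can restrict OCR to one region of an image. The following is a minimal sketch, assuming a hypothetical file `invoice.png` and illustrative crop coordinates; it also assumes that `IronSoftware.Drawing.Rectangle` can be constructed from x, y, width and height values and that the result object exposes a `Text` property, neither of which is shown on this page.

```csharp
using System;
using IronOcr;
using IronSoftware.Drawing;

// Hypothetical crop region (x, y, width, height); adjust to your image.
var contentArea = new Rectangle(0, 0, 800, 200);

// Dispose the input once reading is finished.
using var input = new OcrInput();

// Hypothetical file name. Only the cropped area is added as a page.
input.LoadImage("invoice.png", contentArea);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```

Cropping to the region of interest typically means less image data for the OCR engine to process than reading the full page.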
LoadImageFrame(AnyBitmap, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from an AnyBitmap into this OcrInput object.
Declaration
public void LoadImageFrame(AnyBitmap multiFrameBitmap, int frame, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | multiFrameBitmap | TIFF or GIF as a Bitmap to be loaded. |
| System.Int32 | frame | Specify which frame to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(Byte[], Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a byte array into this OcrInput object.
Declaration
public void LoadImageFrame(byte[] tiffBytes, int frame, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | tiffBytes | Byte array of the TIFF or GIF to be loaded. |
| System.Int32 | frame | Specify which frame to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(Stream, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a Stream into this OcrInput object.
Declaration
public void LoadImageFrame(Stream tiffStream, int frame, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | tiffStream | Stream of the TIFF or GIF to be loaded. |
| System.Int32 | frame | Specify which frame to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(String, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a file path into this OcrInput object.
Declaration
public void LoadImageFrame(string tiffFilePath, int frame, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | tiffFilePath | File path of the TIFF or GIF to be loaded. |
| System.Int32 | frame | Specify which frame to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(AnyBitmap, Int32[], Rectangle)
Loads frames of a TIFF or GIF from an AnyBitmap to this OcrInput object.
Declaration
public void LoadImageFrames(AnyBitmap multiFrameBitmap, int[] frames, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.Drawing.AnyBitmap | multiFrameBitmap | TIFF or GIF as a Bitmap to be loaded. |
| System.Int32[] | frames | Specify which frames to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(Byte[], Int32[], Rectangle)
Loads frames of a TIFF or GIF from a byte array into this OcrInput object.
Declaration
public void LoadImageFrames(byte[] tiffBytes, int[] frames, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | tiffBytes | Byte array of the TIFF or GIF to be loaded. |
| System.Int32[] | frames | Frames to load. Set to null to load all frames. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(Stream, Int32[], Rectangle)
Loads frames of a TIFF or GIF from a Stream into this OcrInput object.
Declaration
public void LoadImageFrames(Stream tiffStream, int[] frames, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | tiffStream | Stream of the TIFF or GIF to be loaded. |
| System.Int32[] | frames | Specify which frames to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(String, Int32[], Rectangle)
Loads frames of a TIFF or GIF from a file path into this OcrInput object.
Declaration
public void LoadImageFrames(string tiffFilePath, int[] frames, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | tiffFilePath | File path of the TIFF or GIF to be loaded. |
| System.Int32[] | frames | Specify which frames to load. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
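To load only some frames of a multi-frame image, pass their numbers in the frames array. This is a sketch under assumptions: `scan.tiff` is a hypothetical file with at least three frames, and the frame numbers are illustrative. This page does not state whether frame numbering starts at 0 or 1, so verify that before relying on specific values.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// Illustrative frame numbers; the numbering base is not documented here.
int[] frames = { 1, 2 };

// Hypothetical file name. Each selected frame becomes its own page.
input.LoadImageFrames("scan.tiff", frames);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```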
LoadOcrInputPages(OcrInput, Rectangle)
Loads the pages of an existing OcrInput into this OcrInput object.
Declaration
public void LoadOcrInputPages(OcrInput ocrInput, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInput | ocrInput | OcrInput to be loaded. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
This only loads the Page objects and no other OcrInput properties.
LoadPage(OcrInputPage, Rectangle)
Loads an existing OcrInput Page to this OcrInput object.
Declaration
public void LoadPage(OcrInputPage page, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputPage | page | Page to be loaded. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadPages(OcrInputPage[], Rectangle)
Loads existing OcrInput Pages to this OcrInput object.
Declaration
public void LoadPages(OcrInputPage[] pages, Rectangle contentArea = null)
Parameters
| Type | Name | Description |
|---|---|---|
| OcrInputPage[] | pages | Pages to be loaded. |
| IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadPdf(IDocumentId, Int32[], Int32, Boolean, Rectangle, String)
Load the PDF document into the OcrInput object.
Declaration
public void LoadPdf(IDocumentId Document, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string PdfPassword = null)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.IDocumentId | Document | Document to load |
| System.Int32[] | PageIndices | Page indices to OCR |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | PdfPassword | Password for the PDF if there is one (Default: null). This parameter is required if the PDF document has password protection and you want to save a searchable PDF. |
LoadPdf(in Byte[], Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(in byte[] PdfBytes, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdf(Stream, Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(Stream PdfStream, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdf(String, Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(string PdfFilePath, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfFilePath | File Path of the PDF being loaded. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
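The optional parameters of LoadPdf can be supplied by name, which avoids passing every preceding default. The sketch below assumes a hypothetical password-protected file `protected.pdf` and an illustrative password; the `Text` property on the result is also an assumption, as it is not documented on this page.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// Hypothetical file name and password, for illustration only.
// A DPI above the default of 200 trades speed and memory for detail.
input.LoadPdf("protected.pdf", DPI: 300, Password: "example-password");

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```

Note that the named arguments use the parameter names exactly as declared (`DPI`, `Password`), including their capitalization.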
LoadPdfPage(in Byte[], Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(in byte[] PdfBytes, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
| System.Int32 | PageIndex | Index of page to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPage(Stream, Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(Stream PdfStream, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
| System.Int32 | PageIndex | Index of page to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPage(String, Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(string PdfFilePath, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfFilePath | File Path of the PDF being loaded. |
| System.Int32 | PageIndex | Index of page to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(in Byte[], Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(in byte[] PdfBytes, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
| System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(Stream, Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(Stream PdfStream, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
| System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(String, Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(string PdfFilePath, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfFilePath | File Path of the PDF being loaded. |
| System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
| IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages = true. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
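Because PageIndices is zero-based ("First page is 0"), page numbers as a reader would count them need to be shifted by one. A minimal sketch, assuming a hypothetical file `report.pdf` with at least three pages; the `Text` property on the result is an assumption not documented on this page.

```csharp
using System;
using System.Linq;
using IronOcr;

// Page numbers as a reader would count them (1-based), hypothetical values.
int[] pageNumbers = { 1, 3 };

// Convert to the zero-based indices that LoadPdfPages expects.
int[] pageIndices = pageNumbers.Select(n => n - 1).ToArray();

using var input = new OcrInput();

// Hypothetical file name. Only the first and third pages are loaded.
input.LoadPdfPages("report.pdf", pageIndices);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```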
LoadScannedPdf(IDocumentId, Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(IDocumentId Document, int[] PageIndices = null, int DPI = 200, string PdfPassword = null)
Parameters
| Type | Name | Description |
|---|---|---|
| IronSoftware.IDocumentId | Document | Scanned document to load |
| System.Int32[] | PageIndices | Page indices to OCR |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.String | PdfPassword | Password for the PDF if there is one (Default: null). This parameter is required if the PDF document has password protection and you want to save a searchable PDF. |
LoadScannedPdf(in Byte[], Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(in byte[] PdfBytes, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
| System.Int32[] | PageIndices | Page indices to OCR |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadScannedPdf(Stream, Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(Stream PdfStream, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
| System.Int32[] | PageIndices | Page indices to OCR |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadScannedPdf(String, Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(string PdfFilePath, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | PdfFilePath | File Path of the PDF being loaded. |
| System.Int32[] | PageIndices | Page indices to OCR |
| System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
| System.String | Password | Password for the PDF if there is one. (Default: null) |