Class OcrInput
Stores OCR input data and allows OCR of PDF documents or any image format.
Also provides various image filter methods which can improve OCR accuracy.
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class OcrInput : OcrInputBase, IDisposable
Remarks
Also see OcrPdfInput and OcrImageInput
Constructors
OcrInput()
Creates an OcrInput object which holds pages of rasterized input media (images, PDFs, TIFFs, GIFs).
You may load one or more media items and apply image correction filters, such as Deskew or Binarize, to the loaded pages.
Please use the `using` keyword:
using var input = new OcrInput();
input.LoadImage("input.png");
input.Deskew();
var result = new IronTesseract().Read(input);
Declaration
public OcrInput()
Methods
Add(IEnumerable<OcrInputPage>, Rectangle)
Please migrate to using: LoadPages(OcrInputPage[], Rectangle)
Declaration
public void Add(IEnumerable<OcrInputPage> imagesAsOcrInputPages, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<OcrInputPage> | imagesAsOcrInputPages | |
IronSoftware.Drawing.Rectangle | ContentArea | |
AddPdf(String, String, Rectangle, Nullable<Int32>, Boolean)
Please migrate to using: LoadPdf(String, Int32, Boolean, Rectangle, String)
Declaration
public void AddPdf(string PdfPath, string Password = null, Rectangle ContentArea = null, Nullable<int> DPI, bool OnlyImages = false)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | |
System.String | Password | |
IronSoftware.Drawing.Rectangle | ContentArea | |
System.Nullable<System.Int32> | DPI | |
System.Boolean | OnlyImages | |
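As a rough migration sketch, a path-based AddPdf call might be rewritten against the LoadPdf(String, Int32, Boolean, Rectangle, String) overload documented on this page. The file name and password are placeholders, and treating the string overload as the closest replacement is an assumption. The parameter order differs between the two methods, so the password is passed by name.

```csharp
using IronOcr;

using var input = new OcrInput();

// Legacy call (placeholder values):
// input.AddPdf("document.pdf", "password");

// Possible replacement using the LoadPdf(string, ...) overload.
// Password is the last optional parameter there, so it is passed by name.
input.LoadPdf("document.pdf", Password: "password");

var result = new IronTesseract().Read(input);
```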
Load(Object, Rectangle)
Attempts to load input from an arbitrary object. For more customization and stability, it is recommended to use the explicit load methods instead. See LoadPdf and LoadImage.
Declaration
public void Load(object inputObject, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Object | inputObject | Object to be loaded |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. Ignored if onlyPdfImages is set to true. |
LoadImage(AnyBitmap, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(AnyBitmap imageBitmap, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | imageBitmap | Image as an AnyBitmap |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(Byte[], Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(byte[] imageBytes, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | imageBytes | Image as a byte array |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(Stream, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(Stream imageStream, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | imageStream | Image as a Stream |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
LoadImage(String, Rectangle)
Loads an image into this OcrInput object.
Accepts: PNG, JPG, BMP, TIFF, GIF, WEBP, and other common image formats.
Declaration
public void LoadImage(string imageFilePath, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | imageFilePath | File Path of the Image |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
If the image contains multiple pages, such as TIFF/GIF frames, each one is added as its own OcrInputPage.
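The optional contentArea parameter can be illustrated with a minimal sketch that restricts OCR to one region of an image. The file name and the region coordinates are hypothetical, and the Rectangle constructor is assumed to take x, y, width, and height in pixels; check the IronSoftware.Drawing reference before relying on that.

```csharp
using System;
using IronOcr;

// Assumed argument order: x, y, width, height (pixels). Values are placeholders.
var headerArea = new IronSoftware.Drawing.Rectangle(0, 0, 800, 200);

using var input = new OcrInput();

// Only the cropped region of the image is added as a page.
input.LoadImage("invoice.png", headerArea);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```

Cropping to the region of interest generally means less image data for the OCR engine to process, which is why a content area is worth passing when only part of the page matters.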
LoadImageFrame(AnyBitmap, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from an AnyBitmap into this OcrInput object.
Declaration
public void LoadImageFrame(AnyBitmap multiFrameBitmap, int frame, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | multiFrameBitmap | TIFF or GIF as a Bitmap to be loaded. |
System.Int32 | frame | Specify which frame to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(Byte[], Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a byte array into this OcrInput object.
Declaration
public void LoadImageFrame(byte[] tiffBytes, int frame, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | tiffBytes | Byte array of the TIFF or GIF to be loaded. |
System.Int32 | frame | Specify which frame to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(Stream, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a Stream into this OcrInput object.
Declaration
public void LoadImageFrame(Stream tiffStream, int frame, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | tiffStream | Stream of the TIFF or GIF to be loaded. |
System.Int32 | frame | Specify which frame to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrame(String, Int32, Rectangle)
Loads a single frame of a TIFF or GIF from a file path into this OcrInput object.
Declaration
public void LoadImageFrame(string tiffFilePath, int frame, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | tiffFilePath | File path of the TIFF or GIF to be loaded. |
System.Int32 | frame | Specify which frame to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(AnyBitmap, Int32[], Rectangle)
Loads frames of a TIFF or GIF from an AnyBitmap into this OcrInput object.
Declaration
public void LoadImageFrames(AnyBitmap multiFrameBitmap, int[] frames, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | multiFrameBitmap | TIFF or GIF as a Bitmap to be loaded. |
System.Int32[] | frames | Specify which frames to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(Byte[], Int32[], Rectangle)
Loads frames of a TIFF or GIF from a byte array into this OcrInput object.
Declaration
public void LoadImageFrames(byte[] tiffBytes, int[] frames, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | tiffBytes | Byte array of the TIFF or GIF to be loaded. |
System.Int32[] | frames | Specify which frames to load. Set to null to load all frames. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(Stream, Int32[], Rectangle)
Loads frames of a TIFF or GIF from a Stream into this OcrInput object.
Declaration
public void LoadImageFrames(Stream tiffStream, int[] frames, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | tiffStream | Stream of the TIFF or GIF to be loaded. |
System.Int32[] | frames | Specify which frames to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadImageFrames(String, Int32[], Rectangle)
Loads frames of a TIFF or GIF from a file path into this OcrInput object.
Declaration
public void LoadImageFrames(string tiffFilePath, int[] frames, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | tiffFilePath | File path of the TIFF or GIF to be loaded. |
System.Int32[] | frames | Specify which frames to load. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
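A short sketch of selecting frames from a multi-frame file, using the LoadImageFrames(String, Int32[], Rectangle) signature above. The file name is a placeholder, and the example assumes frame indices start at 0, which this page does not state.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// Load two frames of a multi-frame TIFF (indices assumed to be zero-based).
int[] frames = { 0, 1 };
input.LoadImageFrames("multi-frame.tiff", frames);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```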
LoadOcrInputPages(OcrInput, Rectangle)
Loads the pages of an existing OcrInput into this OcrInput object.
Declaration
public void LoadOcrInputPages(OcrInput ocrInput, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
OcrInput | ocrInput | OcrInput to be loaded. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
Remarks
This only loads the Page objects and no other OcrInput properties.
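One way this could be used is to merge pages prepared in a separate OcrInput into a combined input before reading. The sketch below relies only on methods listed on this page; the file names are placeholders.

```csharp
using System;
using IronOcr;

// A first input holding a scanned image that needs deskewing.
using var scans = new OcrInput();
scans.LoadImage("page1.png");
scans.Deskew();

// A second input that combines a PDF with the pages loaded above.
using var combined = new OcrInput();
combined.LoadPdf("report.pdf");
combined.LoadOcrInputPages(scans);

var result = new IronTesseract().Read(combined);
Console.WriteLine(result.Text);
```

Per the remark above, only the page objects are carried over, so any other settings on the source input would need to be applied to the combined input separately.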
LoadPage(OcrInputPage, Rectangle)
Loads an existing OcrInputPage into this OcrInput object.
Declaration
public void LoadPage(OcrInputPage page, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
OcrInputPage | page | Page to be loaded. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadPages(OcrInputPage[], Rectangle)
Loads existing OcrInputPage objects into this OcrInput object.
Declaration
public void LoadPages(OcrInputPage[] pages, Rectangle contentArea = null)
Parameters
Type | Name | Description |
---|---|---|
OcrInputPage[] | pages | Pages to be loaded. |
IronSoftware.Drawing.Rectangle | contentArea | Optional cropped area of the page to be added. |
LoadPdf(IDocumentId, Int32[], Int32, Boolean, Rectangle)
Loads the specified pages of a PDF document into the OcrInput object.
Declaration
public void LoadPdf(IDocumentId Document, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.IDocumentId | Document | Document to load |
System.Int32[] | PageIndices | Page indices to OCR |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
LoadPdf(in Byte[], Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(in byte[] PdfBytes, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdf(Stream, Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(Stream PdfStream, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdf(String, Int32, Boolean, Rectangle, String)
Loads all pages of a PDF into the OcrInput object.
Declaration
public void LoadPdf(string PdfFilePath, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfFilePath | File Path of the PDF being loaded. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
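The optional parameters of LoadPdf(String, Int32, Boolean, Rectangle, String) can be set by name, as in this minimal sketch. The file name, DPI value, and password are placeholders chosen for illustration.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// Rasterize every page at 300 DPI instead of the default 200,
// and supply the password for a protected document.
input.LoadPdf("protected.pdf", DPI: 300, Password: "secret");

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```

Named arguments avoid having to pass OnlyEmbeddedImages and ContentArea positionally just to reach Password.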
LoadPdfPage(in Byte[], Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(in byte[] PdfBytes, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
System.Int32 | PageIndex | Index of page to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPage(Stream, Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(Stream PdfStream, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
System.Int32 | PageIndex | Index of page to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPage(String, Int32, Int32, Boolean, Rectangle, String)
Loads a specific page of a PDF into the OcrInput object.
Declaration
public void LoadPdfPage(string PdfFilePath, int PageIndex, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfFilePath | File Path of the PDF being loaded. |
System.Int32 | PageIndex | Index of page to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(in Byte[], Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(in byte[] PdfBytes, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(Stream, Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(Stream PdfStream, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadPdfPages(String, Int32[], Int32, Boolean, Rectangle, String)
Loads specific pages of a PDF into the OcrInput object.
Declaration
public void LoadPdfPages(string PdfFilePath, int[] PageIndices, int DPI = 200, bool OnlyEmbeddedImages = false, Rectangle ContentArea = null, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfFilePath | File Path of the PDF being loaded. |
System.Int32[] | PageIndices | Indices of pages to load. First page is 0. |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.Boolean | OnlyEmbeddedImages | Set this flag to true to only scan the embedded images in a PDF and leave existing text untouched. (Default: false) |
IronSoftware.Drawing.Rectangle | ContentArea | Optional cropped area of the page to be added. Cannot be used if OnlyEmbeddedImages is true. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
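To illustrate the zero-based PageIndices parameter, the following sketch loads only the first and third pages of a PDF. The file name is a placeholder.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// Index 0 is the first page, so { 0, 2 } selects pages one and three.
int[] pageIndices = { 0, 2 };
input.LoadPdfPages("document.pdf", pageIndices);

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```

Loading only the pages that are needed avoids rasterizing the rest of a long document.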
LoadScannedPdf(IDocumentId, Int32[], Int32)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(IDocumentId Document, int[] PageIndices = null, int DPI = 200)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.IDocumentId | Document | Scanned document to load |
System.Int32[] | PageIndices | Page indices to OCR |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
LoadScannedPdf(in Byte[], Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(in byte[] PdfBytes, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Byte array of the PDF being loaded. |
System.Int32[] | PageIndices | Page indices to OCR |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadScannedPdf(Stream, Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(Stream PdfStream, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | Stream of the PDF being loaded. |
System.Int32[] | PageIndices | Page indices to OCR |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
LoadScannedPdf(String, Int32[], Int32, String)
Load a scanned PDF document into the OcrInput object.
Declaration
public void LoadScannedPdf(string PdfFilePath, int[] PageIndices = null, int DPI = 200, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfFilePath | File Path of the PDF being loaded. |
System.Int32[] | PageIndices | Page indices to OCR |
System.Int32 | DPI | DPI (dots per inch) to apply to PDF page rasterization (Default: 200). Higher values give better quality but are more resource intensive. |
System.String | Password | Password for the PDF if there is one. (Default: null) |
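A minimal sketch of loading a scanned PDF through the LoadScannedPdf(String, Int32[], Int32, String) overload follows. The file name and DPI value are placeholders. PageIndices is left at its default of null here; what null selects is not described on this page, so verify that behavior before depending on it.

```csharp
using System;
using IronOcr;

using var input = new OcrInput();

// PageIndices is omitted (defaults to null); DPI is raised from the default 200.
input.LoadScannedPdf("scanned.pdf", DPI: 300);

// Image filters such as Deskew can be applied to the loaded pages.
input.Deskew();

var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);
```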