Class OcrInput
OcrInput provides a robust class for preparing one or more Image Files, PDFs, System.Drawing Objects, Streams and Binary Image data for OCR. Instances of OcrInput can be read by the IronTesseract class.
We recognise that much of the quality of OCR results depends on preparing images to be read. This class allows developers to enhance their scanned documents to provide faster, more accurate OCR results using filters such as: EnhanceResolution(Int32), DeNoise(), ToGrayScale(), Deskew(Int32, Boolean), Rotate(Double) and Sharpen().
Supports multi-page OCR input.
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class OcrInput : IDisposable
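As a minimal sketch of the typical lifecycle, the example below creates an OcrInput inside a 'using' statement and passes it to IronTesseract. The file name "scan.png" is hypothetical, and the `Read` method and `Text` property are assumptions about IronTesseract that are not documented on this page.

```csharp
using System;
using IronOcr;

class BasicOcrInputExample
{
    static void Main()
    {
        // "scan.png" is a hypothetical path used only for illustration.
        using (var input = new OcrInput("scan.png"))
        {
            // Title becomes metadata when exporting searchable PDFs or HOCR.
            input.Title = "Scanned page";

            // Assumption: IronTesseract exposes Read(OcrInput) and the
            // result exposes a Text property.
            var ocr = new IronTesseract();
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
        // Leaving the using block disposes the OcrInput.
    }
}
```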
Constructors
OcrInput(Byte[])
Create a new OcrInput object populated with an Image file as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(byte[] Bytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | Bytes | Bytes of an Image or PDF file. |
OcrInput(Byte[], Rectangle)
Create a new OcrInput object populated with an Image file as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(byte[] Bytes, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | Bytes | Bytes of an Image or PDF file. |
System.Drawing.Rectangle | ContentArea | Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Byte[]>)
Create a new OcrInput object populated with the binary data of multiple Images.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<byte[]> Bytes)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Byte[]> | Bytes | An IEnumerable of byte arrays containing Image or PDF files. |
OcrInput(IEnumerable<Byte[]>, Rectangle)
Create a new OcrInput object populated with the binary data of multiple Images with a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<byte[]> Bytes, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Byte[]> | Bytes | An IEnumerable of byte arrays containing Image or PDF files. |
System.Drawing.Rectangle | ContentArea | Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Bitmap>)
Create a new OcrInput object populated with multiple System.Drawing.Bitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Bitmap> Bitmaps)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Drawing.Bitmap> | Bitmaps | An IEnumerable of System.Drawing.Bitmap. |
OcrInput(IEnumerable<Bitmap>, Rectangle)
Create a new OcrInput object populated with multiple System.Drawing.Bitmaps sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Bitmap> Bitmaps, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Drawing.Bitmap> | Bitmaps | An IEnumerable of System.Drawing.Bitmap. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Image>)
Create a new OcrInput object populated with any number of System.Drawing.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Image> Images)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Drawing.Image> | Images | Any Number of System.Drawing.Image |
OcrInput(IEnumerable<Image>, Rectangle)
Create a new OcrInput object populated with any number of System.Drawing.Image sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Image> Images, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Drawing.Image> | Images | Any Number of System.Drawing.Image |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Stream>)
Create a new OcrInput object populated with multiple images as Streams.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Stream> Streams)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.IO.Stream> | Streams | An IEnumerable of Streams, each containing an Image or PDF file. |
OcrInput(IEnumerable<Stream>, Rectangle)
Create a new OcrInput object populated with multiple images as Streams sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Stream> Streams, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.IO.Stream> | Streams | An IEnumerable of Streams, each containing an Image or PDF file. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<String>)
Create a new OcrInput object populated with multiple Image files.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<string> FilePaths)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | FilePaths | An IEnumerable of paths to Image or PDF files. |
OcrInput(IEnumerable<String>, Rectangle)
Create a new OcrInput object populated with multiple Image files with a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<string> FilePaths, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | FilePaths | An IEnumerable of paths to Image or PDF files. |
System.Drawing.Rectangle | ContentArea | Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Bitmap)
Create a new OcrInput object populated with a System.Drawing.Bitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Bitmap Bitmap)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Bitmap | Bitmap | A System.Drawing.Bitmap. |
OcrInput(Bitmap, Rectangle)
Create a new OcrInput object populated with a System.Drawing.Bitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Bitmap Bitmap, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Bitmap | Bitmap | A System.Drawing.Bitmap. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Image)
Create a new OcrInput object populated with a System.Drawing.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Image Image)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Image | Image | System.Drawing.Image |
OcrInput(Image, Rectangle)
Create a new OcrInput object populated with a specified region of a System.Drawing.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Image Image, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Image | Image | System.Drawing.Image |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Rectangle, Object[])
Create a new OcrInput object populated with one or more images sharing a common crop area.
This class is IDisposable and is best initiated with a 'using' statement.
This constructor accepts any number of images as File Paths, Streams, Byte Arrays, System.Drawing.Image and System.Drawing.Bitmap. Each will become an OcrInput.Page.
Declaration
public OcrInput(Rectangle ContentArea, params object[] Inputs)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Object[] | Inputs | Any number of images as File Paths, Streams, Byte Arrays, System.Drawing.Image and System.Drawing.Bitmap. |
OcrInput(Stream)
Create a new OcrInput object populated with image data as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Stream Stream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | Stream | Stream containing an Image or PDF file. |
OcrInput(Stream, Rectangle)
Create a new OcrInput object populated with image data as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Stream Stream, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | Stream | Stream containing an Image or PDF file. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Object[])
Create a new OcrInput object to which images and PDF pages may be added.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(params object[] Inputs)
Parameters
Type | Name | Description |
---|---|---|
System.Object[] | Inputs | Any number of images as File Paths, Streams, Byte Arrays, System.Drawing.Image and System.Drawing.Bitmap. |
OcrInput(String)
Create a new OcrInput object populated with an Image file or PDF document.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(string FilePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | FilePath | Path to an Image or PDF file. |
OcrInput(String, Rectangle)
Create a new OcrInput object populated with an Image file.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(string FilePath, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.String | FilePath | Path to an Image or PDF file. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
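The ContentArea overloads above all take a System.Drawing.Rectangle in pixels. The sketch below shows one way to build such a rectangle and pass it to the OcrInput(String, Rectangle) constructor; the file name and the pixel values are hypothetical.

```csharp
using System;
using System.Drawing;
using System.Linq;
using IronOcr;

class ContentAreaExample
{
    static void Main()
    {
        // Rectangle(x, y, width, height), all in pixels (illustrative values).
        var contentArea = new Rectangle(0, 0, 800, 200);

        // "invoice.png" is a hypothetical path.
        using (var input = new OcrInput("invoice.png", contentArea))
        {
            // Text will be extracted only from the region given by contentArea.
            Console.WriteLine(input.Pages.Count());
        }
    }
}
```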
Fields
MinimumDPI
OcrInput will automatically detect images with a resolution below MinimumDPI (typically 150 DPI) and upscale them to TargetDPI to avoid poor OCR speed and results.
To disable this automatic image enhancement functionality, set MinimumDPI = null.
Declaration
public int? MinimumDPI
Field Value
Type | Description |
---|---|
System.Nullable<System.Int32> |
TargetDPI
The resolution that low resolution images will be enhanced to. See MinimumDPI.
TargetDPI also determines the resolution at which PDF documents will be sampled.
Declaration
public int TargetDPI
Field Value
Type | Description |
---|---|
System.Int32 |
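As a hedged illustration of the two fields above, the sketch below disables automatic upscaling and sets the sampling resolution before a PDF is added. The value 300 and the file name are hypothetical, and setting the fields before adding content is an assumption about ordering rather than a documented requirement.

```csharp
using IronOcr;

class DpiFieldsExample
{
    static void Main()
    {
        using (var input = new OcrInput())
        {
            // null disables the automatic low-resolution enhancement.
            input.MinimumDPI = null;

            // Illustrative value; TargetDPI also sets the PDF sampling resolution.
            input.TargetDPI = 300;

            // DPI is passed as null so that TargetDPI is used.
            // "report.pdf" is a hypothetical path.
            input.AddPdf("report.pdf", null, null, null);
        }
    }
}
```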
Properties
Pages
Access to every OcrInput.Page within this OcrInput
Declaration
public IEnumerable<OcrInput.Page> Pages { get; }
Property Value
Type | Description |
---|---|
System.Collections.Generic.IEnumerable<OcrInput.Page> |
Title
A title for the OcrInput document. This is relevant as it becomes metadata when exporting searchable PDFs and HOCR files from IronTesseract results.
Declaration
public string Title { get; set; }
Property Value
Type | Description |
---|---|
System.String |
Methods
AddImage(Byte[])
Adds a byte array containing the binary data of an image to this OcrInput.
Declaration
public void AddImage(byte[] ImageBytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | ImageBytes | A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(Byte[], Rectangle)
Adds a byte array containing the binary data of an image to this OcrInput.
Declaration
public void AddImage(byte[] ImageBytes, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | ImageBytes | A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(Bitmap)
Adds a System.Drawing.Bitmap to this OcrInput.
Declaration
public void AddImage(Bitmap Bitmap)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Bitmap | Bitmap | A managed Bitmap object. |
AddImage(Bitmap, Rectangle)
Adds a System.Drawing.Bitmap to this OcrInput.
Declaration
public void AddImage(Bitmap Bitmap, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Bitmap | Bitmap | A managed Bitmap object. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the bitmap to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(Image)
Adds a System.Drawing.Image to this OcrInput.
Declaration
public void AddImage(Image Image)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Image | Image | A managed Image object. |
AddImage(Image, Rectangle)
Adds a System.Drawing.Image to this OcrInput.
Declaration
public void AddImage(Image Image, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Drawing.Image | Image | A managed Image object. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(Stream)
Adds a System.IO.Stream containing the raw data of an image to this OcrInput.
Declaration
public void AddImage(Stream ImageStream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | ImageStream | A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(Stream, Rectangle)
Adds a System.IO.Stream containing the raw data of an image to this OcrInput.
Declaration
public void AddImage(Stream ImageStream, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | ImageStream | A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(String)
Adds an image file to this OcrInput.
Declaration
public void AddImage(string ImagePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(String, Rectangle)
Adds an image file to this OcrInput.
Declaration
public void AddImage(string ImagePath, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(Byte[])
Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(byte[] TiffBytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | TiffBytes | A byte[] containing a TIFF file. |
AddMultiFrameTiff(Byte[], Rectangle)
Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(byte[] TiffBytes, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | TiffBytes | A byte[] containing a TIFF file. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(Stream)
Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(Stream TiffSteam)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | TiffSteam | A System.IO.Stream containing a TIFF file. |
AddMultiFrameTiff(Stream, Rectangle)
Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(Stream TiffSteam, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | TiffSteam | A System.IO.Stream containing a TIFF file. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(String)
Adds a Multi-frame TIFF file to the OcrInput document.
Each frame will become a page of this OcrInput.
Declaration
public void AddMultiFrameTiff(string ImagePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | A file path to a TIFF image. |
AddMultiFrameTiff(String, Rectangle)
Adds a Multi-frame TIFF file to the OcrInput document.
Each frame will become a page of this OcrInput.
Declaration
public void AddMultiFrameTiff(string ImagePath, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | A file path to a TIFF image. |
System.Drawing.Rectangle | ContentArea | Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
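Since each frame of a multi-frame TIFF becomes a page, the page count can be checked after loading. This is a minimal sketch with a hypothetical file name; it counts Pages with LINQ because Pages is exposed as an IEnumerable.

```csharp
using System;
using System.Linq;
using IronOcr;

class MultiFrameTiffExample
{
    static void Main()
    {
        using (var input = new OcrInput())
        {
            // "fax.tiff" is a hypothetical multi-frame TIFF.
            input.AddMultiFrameTiff("fax.tiff");

            // One OcrInput.Page per TIFF frame.
            Console.WriteLine(input.Pages.Count());
        }
    }
}
```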
AddPdf(Byte[], String, Nullable<Rectangle>, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(byte[] PdfBytes, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdf(Stream, String, Nullable<Rectangle>, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(Stream PdfStream, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdf(String, String, Nullable<Rectangle>, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(string PdfPath, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPage(Byte[], Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(byte[] PdfBytes, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPage(Stream, Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(Stream PdfStream, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPage(String, Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(string PdfPath, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPages(Byte[], IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Adds selected pages of a PDF document to this OcrInput.
Declaration
public void AddPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPages(Stream, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Adds selected pages of a PDF document to this OcrInput.
Declaration
public void AddPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
AddPdfPages(String, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Adds selected pages from a PDF document into this OcrInput.
Declaration
public void AddPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle? ), int? DPI)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero will use TargetDPI |
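Page numbers in AddPdfPage and AddPdfPages are zero based. The sketch below, using hypothetical file names, adds the first and third pages of one PDF and the first page of another; every argument is passed explicitly so the example does not rely on default values.

```csharp
using System.Collections.Generic;
using IronOcr;

class PdfPagesExample
{
    static void Main()
    {
        using (var input = new OcrInput())
        {
            // Zero based: 0 is the first page, 2 is the third page.
            var pages = new List<int> { 0, 2 };

            // Password, ContentArea and DPI are null: no password, no crop,
            // and TargetDPI is used as the sampling resolution.
            input.AddPdfPages("contract.pdf", pages, null, null, null);

            // Adds only the first page of a second (hypothetical) document.
            input.AddPdfPage("appendix.pdf", 0, null, null, null);
        }
    }
}
```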
AddRange(OcrInput)
Combines two instances of OcrInput, appending the pages of the given instance to the end of this OcrInput document.
Declaration
public void AddRange(OcrInput Range)
Parameters
Type | Name | Description |
---|---|---|
OcrInput | Range | An OcrInput to be appended to this instance. |
AddRange(IEnumerable<Object>)
Adds an IEnumerable of Images to this OcrInput.
Supports FilePaths, System.Drawing.Image, Bitmaps, Streams & ByteArray.
Declaration
public void AddRange(IEnumerable<object> Images)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Object> | Images | An IEnumerable of objects representing images. Supports FilePaths, System.Drawing.Image, Bitmaps, Streams & byte[]. |
AddRange(IEnumerable<Object>, Rectangle)
Adds an IEnumerable of Images to this OcrInput.
Supports FilePaths, System.Drawing.Image, Bitmaps, Streams & ByteArray.
Declaration
public void AddRange(IEnumerable<object> Images, Rectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Object> | Images | An IEnumerable of objects representing images. Supports FilePaths, System.Drawing.Image, Bitmaps, Streams & byte[]. |
System.Drawing.Rectangle | ContentArea | Specifies a region of the images to extract text from as a System.Drawing.Rectangle with X, Y Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
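The AddRange overloads can be sketched as follows: one call takes a mixed collection of image sources, the other appends the pages of a second OcrInput. All file names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using IronOcr;

class AddRangeExample
{
    static void Main()
    {
        using (var input = new OcrInput())
        using (var extra = new OcrInput("cover.png"))
        {
            // A mixed list: one file path and one byte array.
            var sources = new List<object>
            {
                "page1.png",
                File.ReadAllBytes("page2.png")
            };
            input.AddRange(sources);

            // Appends the pages of 'extra' to the end of 'input'.
            input.AddRange(extra);
        }
    }
}
```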
Binarize()
This image filter turns every pixel black or white with no middle ground. May improve OCR performance in cases of very low contrast between text and background.
Declaration
public OcrInput Binarize()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Contrast()
Increases contrast automatically. This filter often improves OCR speed and accuracy in low contrast scans. Flattens Alpha channels to white.
Declaration
public OcrInput Contrast()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
DeepCleanBackgroundNoise()
Heavy background noise removal. Only use this filter when extreme background noise is known to be present, because it risks reducing OCR accuracy on clean documents and is very CPU expensive.
Declaration
public OcrInput DeepCleanBackgroundNoise()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
DeNoise()
Removes digital noise. This filter should only be used where noise is expected. Flattens Alpha channels to white.
Declaration
public OcrInput DeNoise()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Deskew(Int32, Boolean)
Rotates an image so it is the right way up and orthogonal. This is very useful for OCR because Tesseract's tolerance for skewed scans can be as low as 5 degrees.
Declaration
public OcrInput Deskew(int MaxAngle = 85, bool EveryPageTheSameAmount = false)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | MaxAngle | Maximum expected angle of skew |
System.Boolean | EveryPageTheSameAmount | Set to true if every page of the OcrInput is known to be skewed by the same angle. Enhances speed if true. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
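Because each filter returns the same OcrInput, filter calls can be chained. The following is a sketch only: the file name is hypothetical, the MaxAngle of 10 is an illustrative value, and the choice and order of filters is not a recommendation from this reference.

```csharp
using IronOcr;

class FluentFiltersExample
{
    static void Main()
    {
        // "skewed-scan.png" is a hypothetical path.
        using (var input = new OcrInput("skewed-scan.png"))
        {
            // Deskew with a maximum expected skew of 10 degrees, detecting
            // the angle per page, then remove noise and raise contrast.
            input.Deskew(10, false)
                 .DeNoise()
                 .Contrast();
        }
    }
}
```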
Dilate(Boolean)
Advanced Morphology. Opposite of Erode(Boolean).
Declaration
public OcrInput Dilate(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | If true, uses 3x3 morphology. 2x2 is the default. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Dispose()
OcrInput is IDisposable. For best practice and to avoid memory leaks, remember to dispose, or initialize instances with a "using" statement.
Declaration
public void Dispose()
EnhanceResolution(Int32)
Enhances the resolution of low quality images. This filter is not often needed because MinimumDPI and TargetDPI will automatically catch and resolve low resolution inputs.
May not work for all images if their metadata is corrupted.
Declaration
public OcrInput EnhanceResolution(int TargetDPI = 225)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | TargetDPI | The target DPI to resample to. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
EnhanceResolution(Int32, Int32)
Enhances the resolution of low quality images. This filter is not often needed because MinimumDPI and TargetDPI will automatically catch and resolve low resolution inputs.
May not work for all images if their metadata is corrupted.
Declaration
public OcrInput EnhanceResolution(int TargetDPI, int MinimumDPI)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | TargetDPI | The target DPI to resample to. |
System.Int32 | MinimumDPI | Only resamples images below this DPI threshold. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
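How TargetDPI and MinimumDPI interact can be pictured with a little arithmetic. The sketch below is not IronOcr's implementation: `DpiSketch` and `ResampleFactor` are hypothetical names, and it assumes that an image below MinimumDPI is enlarged by TargetDPI divided by its current DPI, while any other image is left alone.

```csharp
using System;

public static class DpiSketch
{
    // Returns the factor by which pixel dimensions would grow.
    // Assumption: images at or above minimumDpi are left untouched (factor 1.0).
    public static double ResampleFactor(int currentDpi, int targetDpi, int minimumDpi)
    {
        if (currentDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(currentDpi));
        if (currentDpi >= minimumDpi)
            return 1.0;
        return (double)targetDpi / currentDpi;
    }

    public static void Main()
    {
        // A 72 DPI scan with a target of 225 and a minimum of 150 is enlarged.
        Console.WriteLine(ResampleFactor(72, 225, 150));
        // A 300 DPI scan is already above the minimum and is not resampled.
        Console.WriteLine(ResampleFactor(300, 225, 150));
    }
}
```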
Erode(Boolean)
Advanced Morphology. Opposite of Dilate(Boolean).
Declaration
public OcrInput Erode(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | If true, uses a 3x3 morphology; the default is 2x2. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
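For readers unfamiliar with morphology, the relationship between Dilate and Erode can be shown on a tiny binary grid. This is a generic textbook sketch, not IronOcr's code: `MorphologySketch` is a hypothetical class, `true` stands for a foreground pixel, and the kernel placement (3x3 centred, 2x2 covering offsets 0 and 1) is an assumption.

```csharp
using System;

public static class MorphologySketch
{
    // Dilation: an output pixel is set if ANY pixel under the kernel is set.
    public static bool[,] Dilate(bool[,] src, bool use3x3 = false) => Apply(src, use3x3, true);

    // Erosion: an output pixel is set only if ALL pixels under the kernel are set.
    public static bool[,] Erode(bool[,] src, bool use3x3 = false) => Apply(src, use3x3, false);

    private static bool[,] Apply(bool[,] src, bool use3x3, bool anyHit)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        // Assumed kernel placement: 3x3 spans offsets -1..1, 2x2 spans 0..1.
        int lo = use3x3 ? -1 : 0, hi = 1;
        var dst = new bool[h, w];
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                bool any = false, all = true;
                for (int dy = lo; dy <= hi; dy++)
                {
                    for (int dx = lo; dx <= hi; dx++)
                    {
                        int yy = y + dy, xx = x + dx;
                        // Pixels outside the image count as background.
                        bool v = yy >= 0 && yy < h && xx >= 0 && xx < w && src[yy, xx];
                        any |= v;
                        all &= v;
                    }
                }
                dst[y, x] = anyHit ? any : all;
            }
        }
        return dst;
    }

    // Counts foreground pixels.
    public static int Count(bool[,] img)
    {
        int n = 0;
        foreach (bool b in img) if (b) n++;
        return n;
    }

    public static void Main()
    {
        var img = new bool[5, 5];
        img[2, 2] = true; // a single foreground pixel
        var grown = Dilate(img, use3x3: true);
        var shrunk = Erode(grown, use3x3: true);
        Console.WriteLine($"{Count(img)} -> {Count(grown)} -> {Count(shrunk)}");
    }
}
```

In this sketch, dilating a single pixel with the 3x3 kernel grows it into a 3x3 block, and eroding that block with the same kernel shrinks it back to one pixel, which is the sense in which the two filters are opposites.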
Finalize()
OcrInput has a safe finaliser that cleans up undisposed native images in memory.
Declaration
protected void Finalize()
FromPdf(Byte[], String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(byte[] PdfBytes, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdf(Stream, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(Stream PdfStream, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdf(String, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(string PdfPath, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF. |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(Byte[], Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a single page from a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
The page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(byte[] PdfBytes, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.Int32 | Page | The page number within the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(Stream, Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a single page from a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
The page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(Stream PdfStream, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.Int32 | Page | The page number within the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(String, Int32, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with a single page of a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
The page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(string PdfPath, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF. |
System.Int32 | Page | The page number within the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(Byte[], IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with multiple pages from a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(Stream, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with multiple selected pages from a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(String, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)
Create a new OcrInput object populated with multiple pages from a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
Each page read from the PDF becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF. |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero-based (first page is number 0). |
System.String | Password | Optional password to unlock an encrypted or protected PDF. |
System.Nullable<System.Drawing.Rectangle> | ContentArea | Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
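Since the Page and Pages parameters are zero-based, page numbers taken from a PDF viewer (which usually start at 1) need shifting before they are passed in. A small sketch of that conversion follows; `PageIndexSketch` and `ToZeroBased` are hypothetical helpers, not part of IronOcr, and `example.pdf` is an illustrative file name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageIndexSketch
{
    // Converts 1-based page numbers (as shown in a PDF viewer) to the
    // zero-based indexes that the Page and Pages parameters expect.
    public static int[] ToZeroBased(IEnumerable<int> oneBasedPages)
    {
        return oneBasedPages.Select(p =>
        {
            if (p < 1)
                throw new ArgumentOutOfRangeException(nameof(oneBasedPages), "Page numbers start at 1.");
            return p - 1;
        }).ToArray();
    }

    public static void Main()
    {
        // Viewer pages 1, 3 and 4.
        int[] pages = ToZeroBased(new[] { 1, 3, 4 });
        Console.WriteLine(string.Join(", ", pages));
        // The result could then be passed as the Pages argument, for example:
        // OcrInput.FromPdfPages("example.pdf", pages)
    }
}
```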
Invert(Boolean)
Inverts every color, e.g. white becomes black and black becomes white.
Declaration
public OcrInput Invert(bool GrayScale = true)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | GrayScale | Optionally remove all color channels and return a GrayScale image. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
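Per pixel, inversion amounts to subtracting each 8-bit channel from 255. The sketch below illustrates that idea only: `InvertSketch` is a hypothetical class, and the Rec. 601 luma weights used for the grayscale case are an assumption, since the source does not say how IronOcr removes color channels.

```csharp
using System;

public static class InvertSketch
{
    // Inverts one 8-bit RGB pixel. With grayScale set, the channels are first
    // collapsed to a single luma value (Rec. 601 weights, assumed here).
    public static (byte R, byte G, byte B) Invert(byte r, byte g, byte b, bool grayScale = true)
    {
        if (grayScale)
        {
            byte y = (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);
            r = g = b = y;
        }
        return ((byte)(255 - r), (byte)(255 - g), (byte)(255 - b));
    }

    public static void Main()
    {
        // White becomes black, and black becomes white.
        Console.WriteLine(Invert(255, 255, 255));
        Console.WriteLine(Invert(0, 0, 0));
        // With grayScale off, each channel is inverted independently.
        Console.WriteLine(Invert(10, 20, 30, grayScale: false));
    }
}
```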
PageCount()
The number of OcrInput.Pages currently present in this OcrInput
Declaration
public int PageCount()
Returns
Type | Description |
---|---|
System.Int32 |
Rotate(Double)
Rotates images by a number of degrees clockwise. For anti-clockwise rotation, use negative numbers. See also Deskew(Int32, Boolean).
Declaration
public OcrInput Rotate(double Degrees)
Parameters
Type | Name | Description |
---|---|---|
System.Double | Degrees | A number of clockwise degrees. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
Scale(Int32, Boolean)
Scales OcrInput pages proportionally.
Declaration
public OcrInput Scale(int Percentage, bool ScaleCropArea = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | Percentage | The percentage scale. 100 = no effect. |
System.Boolean | ScaleCropArea | Whether associated crop areas are also scaled proportionally (true is recommended). |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
Scale(Int32, Int32, Boolean)
Scales the OcrInput pages up in size.
Declaration
public OcrInput Scale(int MaxWidth, int MaxHeight, bool ScaleCropArea = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | MaxWidth | Maximum width in pixels. |
System.Int32 | MaxHeight | Maximum height in pixels. |
System.Boolean | ScaleCropArea | Whether associated crop areas are also scaled proportionally (true is recommended). |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
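The two Scale overloads can be compared by working out the resulting page size by hand. The following sketch shows the arithmetic only: `ScaleSketch` and its methods are hypothetical, and it assumes that the MaxWidth/MaxHeight overload preserves the aspect ratio by applying the smaller of the two axis ratios, which the source does not state.

```csharp
using System;

public static class ScaleSketch
{
    // Percentage scaling: 100 leaves the size unchanged.
    public static (int Width, int Height) ByPercentage(int width, int height, int percentage)
    {
        return ((int)Math.Round(width * percentage / 100.0),
                (int)Math.Round(height * percentage / 100.0));
    }

    // Fits a page inside a bounding box while preserving the aspect ratio.
    // Assumption: the smaller of the two axis ratios is used as the factor.
    public static (int Width, int Height) ToFit(int width, int height, int maxWidth, int maxHeight)
    {
        double factor = Math.Min((double)maxWidth / width, (double)maxHeight / height);
        return ((int)Math.Round(width * factor), (int)Math.Round(height * factor));
    }

    public static void Main()
    {
        // 150 percent of a 200 x 100 page.
        Console.WriteLine(ByPercentage(200, 100, 150));
        // A 1000 x 500 page fitted into a 2000 x 2000 box.
        Console.WriteLine(ToFit(1000, 500, 2000, 2000));
    }
}
```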
Sharpen()
Sharpens blurred OCR Documents. Flattens Alpha channels to white.
Declaration
public OcrInput Sharpen()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
ToGrayScale()
This image filter turns every pixel into a shade of gray. Unlikely to improve OCR accuracy but may improve speed.
Declaration
public OcrInput ToGrayScale()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |
WithTitle(String)
Adds a title to the OcrInput document. This title will be used when calling SaveAsHocrFile(String) and SaveAsSearchablePdf(String).
Declaration
public OcrInput WithTitle(string Title)
Parameters
Type | Name | Description |
---|---|---|
System.String | Title | The document title as a string. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object, allowing LINQ-style fluent notation. |