Class OcrInput
OcrInput provides a robust class for preparing one or more image files, PDFs, IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, System.Drawing.Bitmap and other common image library objects, Streams and binary image data for OCR. Instances of OcrInput can be read by the IronTesseract class.
We recognise that much of the quality of OCR results depends on preparing images to be read. This class allows developers to enhance their scanned documents to provide faster, more accurate OCR results using filters such as: EnhanceResolution(Int32), DeNoise(Boolean), ToGrayScale(), Deskew(OcrLanguage, Int32, OrientationConfidence), Rotate(Double) and Sharpen().
Supports multi-page OCR input.
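A minimal usage sketch of the workflow described above. It assumes a local image named scan.png (a hypothetical file) and that IronTesseract exposes a Read(OcrInput) method whose result has a Text property, neither of which is documented on this page. The filters are called as separate statements because their return types are not shown here.

```csharp
using System;
using IronOcr;

// "scan.png" is a hypothetical path; replace it with a real scanned image.
using (var input = new OcrInput("scan.png"))
{
    // Optional preprocessing filters named in the class summary.
    input.ToGrayScale();
    input.Sharpen();

    // Assumption: IronTesseract.Read(OcrInput) returns a result exposing Text.
    var ocr = new IronTesseract();
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

The using block disposes the OcrInput once reading has finished, as recommended for every constructor below.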
Inheritance
System.Object
OcrInput
Implements
System.IDisposable
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class OcrInput : Object, IDisposable
Constructors
OcrInput(AnyBitmap)
Create a new OcrInput object populated with an IronSoftware.Drawing.AnyBitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(AnyBitmap Bitmap)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | Bitmap | An IronSoftware.Drawing.AnyBitmap. |
OcrInput(AnyBitmap, CropRectangle)
Create a new OcrInput object populated with a specified region of an IronSoftware.Drawing.AnyBitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(AnyBitmap Bitmap, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | Bitmap | An IronSoftware.Drawing.AnyBitmap. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(CropRectangle, Object[])
Create a new OcrInput object populated with one or more images sharing a common crop area.
This class is IDisposable and is best initiated with a 'using' statement.
This constructor accepts any number of images as File Paths, Streams, Byte Arrays, IronSoftware.Drawing.AnyBitmap, SixLabors.ImageSharp.Image, System.Drawing.Bitmap, or System.Drawing.Image. Each will become an OcrInput.Page.
Declaration
public OcrInput(CropRectangle ContentArea, params object[] Inputs)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Object[] | Inputs | Any number of images as File Paths, Streams, Byte Arrays, SixLabors.ImageSharp.Image, System.Drawing.Bitmap, or System.Drawing.Image. |
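As an illustration of mixing input types under one crop area, the sketch below passes a file path and a byte array to this constructor. The file names are hypothetical, and the four-argument CropRectangle constructor (x, y, width, height) is an assumption about IronSoftware.Drawing that this page does not document.

```csharp
using System;
using System.IO;
using IronOcr;
using IronSoftware.Drawing;

// Assumption: CropRectangle offers an (x, y, width, height) constructor in pixels.
var headerArea = new CropRectangle(0, 0, 800, 200);

// Hypothetical files: one passed as a path, one as raw bytes.
byte[] secondPage = File.ReadAllBytes("page2.jpg");

using (var input = new OcrInput(headerArea, "page1.png", secondPage))
{
    // Each supplied input is loaded as a page sharing the same ContentArea.
    Console.WriteLine($"Pages loaded: {input.Pages.Count}");
}
```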
OcrInput(Image)
Create a new OcrInput object populated with a SixLabors.ImageSharp.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Image Image)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | Image | SixLabors.ImageSharp.Image |
OcrInput(Image, CropRectangle)
Create a new OcrInput object populated with a specified region of a SixLabors.ImageSharp.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Image Image, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | Image | SixLabors.ImageSharp.Image |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Byte[])
Create a new OcrInput object populated with an Image file as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(byte[] Bytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | Bytes | Bytes of an Image or PDF file. |
OcrInput(Byte[], CropRectangle)
Create a new OcrInput object populated with an Image file as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(byte[] Bytes, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | Bytes | Bytes of an Image or PDF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of each image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<AnyBitmap>)
Create a new OcrInput object populated with multiple IronSoftware.Drawing.AnyBitmap.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<AnyBitmap> Bitmaps)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<IronSoftware.Drawing.AnyBitmap> | Bitmaps | An IEnumerable of IronSoftware.Drawing.AnyBitmap. |
OcrInput(IEnumerable<AnyBitmap>, CropRectangle)
Create a new OcrInput object populated with multiple IronSoftware.Drawing.AnyBitmap sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<AnyBitmap> Bitmaps, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<IronSoftware.Drawing.AnyBitmap> | Bitmaps | An IEnumerable of IronSoftware.Drawing.AnyBitmap. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Image>)
Create a new OcrInput object populated with any number of SixLabors.ImageSharp.Image.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Image> Images)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<SixLabors.ImageSharp.Image> | Images | Any Number of SixLabors.ImageSharp.Image |
OcrInput(IEnumerable<Image>, CropRectangle)
Create a new OcrInput object populated with any number of SixLabors.ImageSharp.Image sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Image> Images, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<SixLabors.ImageSharp.Image> | Images | Any Number of SixLabors.ImageSharp.Image |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Byte[]>)
Create a new OcrInput object populated with the binary data of multiple Images.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<byte[]> Bytes)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Byte[]> | Bytes | An IEnumerable of byte arrays containing Image or PDF files. |
OcrInput(IEnumerable<Byte[]>, CropRectangle)
Create a new OcrInput object populated with the binary data of multiple Images with a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<byte[]> Bytes, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Byte[]> | Bytes | An IEnumerable of byte arrays containing Image or PDF files. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of each image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<Stream>)
Create a new OcrInput object populated with multiple images as Streams.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Stream> Streams)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.IO.Stream> | Streams | An IEnumerable of Streams, each containing an Image or PDF file. |
OcrInput(IEnumerable<Stream>, CropRectangle)
Create a new OcrInput object populated with multiple images as Streams sharing a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<Stream> Streams, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.IO.Stream> | Streams | An IEnumerable of Streams, each containing an Image or PDF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(IEnumerable<String>)
Create a new OcrInput object populated with multiple Image files.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<string> FilePaths)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | FilePaths | An IEnumerable of paths to Image or PDF files. |
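A short sketch of loading every PNG in a folder through this overload. The folder name scans is hypothetical. The paths are deliberately typed as IEnumerable<string> so that the call binds to this constructor rather than to the params object[] one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using IronOcr;

// "scans" is a hypothetical folder containing PNG images.
IEnumerable<string> paths = Directory.EnumerateFiles("scans", "*.png");

using (var input = new OcrInput(paths))
{
    // Pages exposes every OcrInput.Page created from the supplied files.
    Console.WriteLine($"Pages loaded: {input.Pages.Count}");
}
```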
OcrInput(IEnumerable<String>, CropRectangle)
Create a new OcrInput object populated with multiple Image files with a common ContentArea.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(IEnumerable<string> FilePaths, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | FilePaths | An IEnumerable of paths to Image or PDF files. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of each image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Stream)
Create a new OcrInput object populated with image data as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Stream Stream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | Stream | Stream containing an Image or PDF file. |
OcrInput(Stream, CropRectangle)
Create a new OcrInput object populated with image data as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(Stream Stream, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | Stream | Stream containing an Image or PDF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
OcrInput(Object[])
Create a new OcrInput object to which images and PDF pages may be added.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(params object[] Inputs)
Parameters
Type | Name | Description |
---|---|---|
System.Object[] | Inputs | Any number of images as File Paths, Streams, Byte Arrays, IronSoftware.Drawing.AnyBitmap and SixLabors.ImageSharp.Image. |
OcrInput(String)
Create a new OcrInput object populated with an Image file or PDF document.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(string FilePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | FilePath | Path to an Image or PDF file. |
OcrInput(String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with an Image file.
This class is IDisposable and is best initiated with a 'using' statement.
Declaration
public OcrInput(string FilePath, CropRectangle ContentArea, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | FilePath | Path to an Image or PDF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Optional target DPI for the input content |
Fields
OriginaPdfPageDimensions
Declaration
public List<PdfPage> OriginaPdfPageDimensions
Field Value
Type | Description |
---|---|
System.Collections.Generic.List<IronSoftware.Pdfium.PdfPage> |
TargetDPI
The resolution that low resolution images will be enhanced to. To disable upscaling, set this to 0 (will affect read quality).
TargetDPI also determines the resolution at which PDF documents will be sampled.
Declaration
public int TargetDPI
Field Value
Type | Description |
---|---|
System.Int32 |
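A brief sketch of adjusting TargetDPI on an empty OcrInput before content is added. The file name and the value 300 are illustrative only, and setting the field before adding content is an assumed ordering, since this page does not state when the value is read.

```csharp
using IronOcr;

using (var input = new OcrInput())
{
    // Illustrative value; set before adding content (assumed to be the safe order).
    input.TargetDPI = 300;

    // "low-res-fax.png" is a hypothetical low-resolution image.
    input.AddImage("low-res-fax.png");
}
```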
Properties
Pages
Access to every OcrInput.Page within this OcrInput
Declaration
public List<OcrInput.Page> Pages { get; }
Property Value
Type | Description |
---|---|
System.Collections.Generic.List<OcrInput.Page> |
Title
A title for the OcrInput document. This is relevant as it becomes metadata when exporting searchable PDFs and HOCR files from IronTesseract results.
Declaration
public string Title { get; set; }
Property Value
Type | Description |
---|---|
System.String |
Methods
AdaptiveThreshold(Nullable<Single>)
Applies Bradley Adaptive Threshold to the image.
Adaptive thresholding calculates the threshold value for smaller regions of the image, so different regions may have different threshold values.
Declaration
public OcrInput AdaptiveThreshold(Nullable<float> thresholdLimit = null)
Parameters
Type | Name | Description |
---|---|---|
System.Nullable<System.Single> | thresholdLimit | Threshold limit (0.0-1.0) to consider for binarization. At 0.0 the threshold is completely white; at 1.0 the threshold is completely black. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
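Because the method returns the same OcrInput, the call can be placed inline, as in this sketch. The file name and the 0.5f limit are illustrative, and IronTesseract.Read(OcrInput) with a Text property on its result is an assumption not documented on this page.

```csharp
using System;
using IronOcr;

// "receipt.jpg" is a hypothetical image with uneven lighting.
using (var input = new OcrInput("receipt.jpg"))
{
    // 0.5f is an arbitrary value inside the documented 0.0-1.0 range.
    // AdaptiveThreshold returns this OcrInput, so it can be passed straight on.
    var result = new IronTesseract().Read(input.AdaptiveThreshold(0.5f));
    Console.WriteLine(result.Text);
}
```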
Add(OcrInput, CropRectangle)
Adds all pages of an OcrInput to this OcrInput.
Declaration
public void Add(OcrInput imageAsOcrInput, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
OcrInput | imageAsOcrInput | OcrInput object to be added to this OcrInput. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area to use of each page of the OcrInput object. |
Add(OcrInput.Page, CropRectangle)
Adds an OcrInput.Page to this OcrInput.
Declaration
public void Add(OcrInput.Page imageAsOcrInputPage, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
OcrInput.Page | imageAsOcrInputPage | Page to be added. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the page to be added. |
Add(AnyBitmap, CropRectangle)
Adds an IronSoftware.Drawing.AnyBitmap to this OcrInput.
Declaration
public void Add(AnyBitmap imageAsBitmap, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | imageAsBitmap | A managed IronSoftware.Drawing.AnyBitmap object. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the image to use with IronOCR. |
Add(Image, CropRectangle)
Adds a SixLabors.ImageSharp.Image to this OcrInput.
Declaration
public void Add(Image image, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | image | A managed Image object. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the image to use with IronOCR. |
Add(Byte[], CropRectangle)
Adds a byte array containing the binary data of an image to this OcrInput.
Declaration
public void Add(byte[] imageAsByteArray, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | imageAsByteArray | A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the image to use with IronOCR. |
Add(IEnumerable<OcrInput.Page>, CropRectangle)
Adds an IEnumerable of OcrInput.Page objects to this OcrInput.
Declaration
public void Add(IEnumerable<OcrInput.Page> imagesAsOcrInputPages, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<OcrInput.Page> | imagesAsOcrInputPages | Pages to be added. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every page to be added. |
Add(IEnumerable<AnyBitmap>, CropRectangle)
Adds an IEnumerable of IronSoftware.Drawing.AnyBitmap to this OcrInput.
Declaration
public void Add(IEnumerable<AnyBitmap> imageAsBitmaps, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<IronSoftware.Drawing.AnyBitmap> | imageAsBitmaps | An IEnumerable of managed IronSoftware.Drawing.AnyBitmap objects. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every image to use with IronOCR. |
Add(IEnumerable<Image>, CropRectangle)
Adds an IEnumerable of SixLabors.ImageSharp.Images to this OcrInput.
Declaration
public void Add(IEnumerable<Image> images, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<SixLabors.ImageSharp.Image> | images | IEnumerable of managed Image objects. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every image to use with IronOCR. |
Add(IEnumerable<Byte[]>, CropRectangle)
Adds an IEnumerable of byte arrays containing the binary data of images to this OcrInput.
Declaration
public void Add(IEnumerable<byte[]> imagesAsByteArrays, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Byte[]> | imagesAsByteArrays | An IEnumerable of byte[] containing image data. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every image to use with IronOCR. |
Add(IEnumerable<Stream>, CropRectangle)
Adds an IEnumerable of System.IO.Stream of image raw data to this OcrInput.
Declaration
public void Add(IEnumerable<Stream> sourceStreams, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.IO.Stream> | sourceStreams | An IEnumerable of Streams containing raw data of images. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every image to use with IronOCR. |
Add(IEnumerable<String>, CropRectangle)
Adds images to this OcrInput.
Declaration
public void Add(IEnumerable<string> imageFilePaths, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.String> | imageFilePaths | IEnumerable of image file paths. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of every image to use with IronOCR. |
Add(Stream, CropRectangle)
Adds a System.IO.Stream containing the raw data of an image to this OcrInput.
Declaration
public void Add(Stream sourceStream, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | sourceStream | A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the image to use with IronOCR. |
Add(String, CropRectangle)
Adds an image to this OcrInput.
Declaration
public void Add(string imageFilePath, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | imageFilePath | File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Area of the image to use with IronOCR. |
AddFrameFromTiff(Byte[], Int32, CropRectangle)
Adds a single frame (a page) from a Multi-frame TIFF file to the OcrInput document. The Tiff may be input as a file, byte array or stream.
The selected frame will become a page of this OcrInput.
Declaration
public void AddFrameFromTiff(byte[] TiffBytes, int FrameIndex, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | TiffBytes | A byte[] containing a TIFF file. |
System.Int32 | FrameIndex | Zero based frame number. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddFrameFromTiff(Stream, Int32, CropRectangle)
Adds a single frame (a page) from a Multi-frame TIFF file to the OcrInput document. The Tiff may be input as a file, byte array or stream.
The selected frame will become a page of this OcrInput.
Declaration
public void AddFrameFromTiff(Stream TiffStream, int FrameIndex, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | TiffStream | A Stream containing a TIFF file. |
System.Int32 | FrameIndex | Zero based frame number. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddFrameFromTiff(String, Int32, CropRectangle)
Adds a single frame (a page) from a Multi-frame TIFF file to the OcrInput document. The Tiff may be input as a file, byte array or stream.
The selected frame will become a page of this OcrInput.
Declaration
public void AddFrameFromTiff(string TiffPath, int FrameIndex, CropRectangle ContentArea = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | TiffPath | A file path to a TIFF image. |
System.Int32 | FrameIndex | Zero based frame number. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
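To illustrate the zero-based FrameIndex, this sketch adds only the first and third frames of a TIFF using the file path overload. The file name is hypothetical and is assumed to contain at least three frames.

```csharp
using System;
using IronOcr;

using (var input = new OcrInput())
{
    // "multi-page.tiff" is a hypothetical multi-frame TIFF.
    input.AddFrameFromTiff("multi-page.tiff", 0); // first frame
    input.AddFrameFromTiff("multi-page.tiff", 2); // third frame

    Console.WriteLine($"Pages loaded: {input.Pages.Count}");
}
```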
AddImage(AnyBitmap)
Adds an IronSoftware.Drawing.AnyBitmap to this OcrInput.
Declaration
public void AddImage(AnyBitmap Bitmap)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | Bitmap | A managed IronSoftware.Drawing.AnyBitmap object. |
AddImage(AnyBitmap, CropRectangle)
Adds an IronSoftware.Drawing.AnyBitmap to this OcrInput.
Declaration
public void AddImage(AnyBitmap Bitmap, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | Bitmap | A managed IronSoftware.Drawing.AnyBitmap object. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the bitmap to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(AnyBitmap, CropRectangle[])
Adds an IronSoftware.Drawing.AnyBitmap to this OcrInput with many content area regions. If an empty array is supplied, the whole image is used instead.
Note: When multiple crop rectangles are used, the PDF output by SaveAsSearchablePdf will contain one page per rectangle.
Declaration
public void AddImage(AnyBitmap Bitmap, CropRectangle[] Rectangles)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.AnyBitmap | Bitmap | A managed IronSoftware.Drawing.AnyBitmap object. |
IronSoftware.Drawing.CropRectangle[] | Rectangles | Array of crop rectangles of various content regions. |
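A sketch of reading two separate regions of one bitmap. The file name and coordinates are hypothetical. AnyBitmap.FromFile and the (x, y, width, height) CropRectangle constructor are assumptions about IronSoftware.Drawing that this page does not document.

```csharp
using IronOcr;
using IronSoftware.Drawing;

// Assumption: AnyBitmap.FromFile loads a bitmap from a path.
using (AnyBitmap form = AnyBitmap.FromFile("form.png"))
using (var input = new OcrInput())
{
    // Assumption: CropRectangle takes x, y, width, height in pixels.
    var regions = new[]
    {
        new CropRectangle(0, 0, 600, 100),   // header strip
        new CropRectangle(0, 700, 600, 100), // footer strip
    };

    // Passing an array selects the CropRectangle[] overload.
    input.AddImage(form, regions);
}
```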
AddImage(Image)
Adds a SixLabors.ImageSharp.Image to this OcrInput.
Declaration
public void AddImage(Image Image)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | Image | A managed Image object. |
AddImage(Image, CropRectangle)
Adds a SixLabors.ImageSharp.Image to this OcrInput.
Declaration
public void AddImage(Image Image, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | Image | A managed Image object. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(Image, CropRectangle[])
Adds a SixLabors.ImageSharp.Image to this OcrInput with many content area regions. If an empty array is supplied, the whole image is used instead.
Note: When multiple crop rectangles are used, the PDF output by SaveAsSearchablePdf will contain one page per rectangle.
Declaration
public void AddImage(Image Image, CropRectangle[] Rectangles)
Parameters
Type | Name | Description |
---|---|---|
SixLabors.ImageSharp.Image | Image | A managed Image object. |
IronSoftware.Drawing.CropRectangle[] | Rectangles | Array of crop rectangles of various content regions. |
AddImage(Byte[])
Adds a byte array containing the binary data of an image to this OcrInput.
Declaration
public void AddImage(byte[] ImageBytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | ImageBytes | A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(Byte[], CropRectangle)
Adds a byte array containing the binary data of an image to this OcrInput.
Declaration
public void AddImage(byte[] ImageBytes, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | ImageBytes | A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(Stream)
Adds a System.IO.Stream containing the raw data of an image to this OcrInput.
Declaration
public void AddImage(Stream ImageStream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | ImageStream | A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(Stream, CropRectangle)
Adds a System.IO.Stream containing the raw data of an image to this OcrInput.
Declaration
public void AddImage(Stream ImageStream, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | ImageStream | A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddImage(String)
Adds an image file to this OcrInput.
Declaration
public void AddImage(string ImagePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
AddImage(String, CropRectangle)
Adds an image file to this OcrInput.
Declaration
public void AddImage(string ImagePath, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(Byte[])
Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(byte[] TiffBytes)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | TiffBytes | A byte[] containing a TIFF file. |
AddMultiFrameTiff(Byte[], CropRectangle)
Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(byte[] TiffBytes, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | TiffBytes | A byte[] containing a TIFF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(Stream)
Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(Stream TiffStream)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | TiffStream | A System.IO.Stream containing a TIFF file. |
AddMultiFrameTiff(Stream, CropRectangle)
Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.
Declaration
public void AddMultiFrameTiff(Stream TiffStream, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | TiffStream | A System.IO.Stream containing a TIFF file. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddMultiFrameTiff(String)
Adds a Multi-frame TIFF file to the OcrInput document.
Each Frame will become a page of this OcrInput
Declaration
public void AddMultiFrameTiff(string ImagePath)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | A file path to a TIFF image. |
AddMultiFrameTiff(String, CropRectangle)
Adds a Multi-frame TIFF file to the OcrInput document.
Each Frame will become a page of this OcrInput
Declaration
public void AddMultiFrameTiff(string ImagePath, CropRectangle ContentArea)
Parameters
Type | Name | Description |
---|---|---|
System.String | ImagePath | A file path to a TIFF image. |
IronSoftware.Drawing.CropRectangle | ContentArea | Optionally specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
AddPdf(Byte[], String, CropRectangle, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(byte[] PdfBytes, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdf(Stream, String, CropRectangle, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(Stream PdfStream, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdf(String, Int32, String)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(string PdfPath, int DPI, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.Int32 | DPI | Resolution at which to sample the PDF. If zero, TargetDPI is used. |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
AddPdf(String, String, CropRectangle, Nullable<Int32>)
Adds all pages of a PDF document to this OcrInput.
Declaration
public void AddPdf(string PdfPath, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPage(Byte[], Int32, String, CropRectangle, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(byte[] PdfBytes, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPage(Stream, Int32, String, CropRectangle, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(Stream PdfStream, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPage(String, Int32, String, CropRectangle, Nullable<Int32>)
Adds one page of a PDF document to this OcrInput.
Declaration
public void AddPdfPage(string PdfPath, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPages(Byte[], IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Adds selected pages of a PDF document to this OcrInput.
Declaration
public void AddPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | Binary data of a PDF file |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPages(Stream, IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Adds selected pages of a PDF document to this OcrInput.
Declaration
public void AddPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | System.IO.Stream containing a PDF |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
AddPdfPages(String, IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Adds selected pages from a PDF document into this OcrInput.
Declaration
public void AddPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | String file path to the PDF |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The page numbers within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
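Because page numbers in the AddPdfPage and AddPdfPages overloads are zero based, page numbers taken from a PDF viewer (which usually start at 1) need shifting first. The sketch below shows one way to do that; `ToZeroBasedPages` and its `pageCount` argument are hypothetical names introduced for this illustration, not IronOcr members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns 1-based page numbers (as shown in a PDF viewer)
// into the zero-based indexes described for the Pages parameter.
static int[] ToZeroBasedPages(IEnumerable<int> viewerPageNumbers, int pageCount)
{
    return viewerPageNumbers
        .Select(n => n - 1)
        .Where(i => i >= 0 && i < pageCount) // drop anything outside the document
        .Distinct()
        .OrderBy(i => i)
        .ToArray();
}

int[] zeroBased = ToZeroBasedPages(new[] { 1, 3, 3, 12 }, pageCount: 10);
Console.WriteLine(string.Join(", ", zeroBased)); // 0, 2
```

The resulting array could then be passed as the Pages argument of an AddPdfPages overload.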
AddRange(OcrInput)
Combines 2 instances of OcrInput, appending pages to the end of this OcrInput document.
Declaration
public void AddRange(OcrInput Range)
Parameters
Type | Name | Description |
---|---|---|
OcrInput | Range | An OcrInput to be appended to this instance. |
ApplyMultipleFilters(OcrFilters, Double, Int32, Int32, Int32, Boolean, Nullable<Int32>)
Applies multiple imaging filters using the specified parameters.
Declaration
public void ApplyMultipleFilters(OcrFilters filters, double Rotation = 0, int MaxDeskewAngle = 45, int MaxWidth = 0, int MaxHeight = 0, bool Use3x3 = false, Nullable<int> thresholdLimit = null)
Parameters
Type | Name | Description |
---|---|---|
OcrFilters | filters | Filters to apply |
System.Double | Rotation | Rotation amount. Required when using the Rotate filter. |
System.Int32 | MaxDeskewAngle | Optional maximum deskew angle when using the Deskew filter. Defaults to 45 degrees. |
System.Int32 | MaxWidth | Maximum width. Required when using the Scale filter. |
System.Int32 | MaxHeight | Maximum height. Required when using the Scale filter. |
System.Boolean | Use3x3 | Optional morphology when using the Despeckle (DeNoise), Dilate, or Erode filter. |
System.Nullable<System.Int32> | thresholdLimit | Optional threshold limit (0.0-1.0) to consider for binarization when using the Bradley Adaptive Threshold. |
Remarks
This method serves as an alternative way to apply multiple filters. Filters are applied in what is typically the optimal order.
Exceptions
Type | Condition |
---|---|
System.ArgumentOutOfRangeException |
Binarize()
This image filter turns every pixel black or white with no middle ground. It may improve OCR performance in cases of very low contrast between text and background.
Declaration
public OcrInput Binarize()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
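To illustrate what "black or white with no middle ground" means, here is a minimal sketch of fixed-threshold binarization on a grayscale grid. It is written in plain C# for illustration only; `BinarizeSketch` and the threshold of 128 are assumptions, and the thresholding method IronOcr uses internally is not documented here.

```csharp
using System;

// Maps each grayscale value (0-255) to pure black (0) or pure white (255).
// The fixed threshold is an assumption made for this sketch.
static byte[,] BinarizeSketch(byte[,] gray, byte threshold = 128)
{
    int rows = gray.GetLength(0), cols = gray.GetLength(1);
    var result = new byte[rows, cols];
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            result[y, x] = gray[y, x] >= threshold ? (byte)255 : (byte)0;
    return result;
}

// Faint grey "text" (middle column) on a light background.
byte[,] faintText = { { 200, 120, 210 }, { 190, 110, 205 } };
byte[,] blackAndWhite = BinarizeSketch(faintText);
Console.WriteLine($"{blackAndWhite[0, 0]} {blackAndWhite[0, 1]} {blackAndWhite[0, 2]}"); // 255 0 255
```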
Close(Boolean)
Advanced Morphology.
Closing is the reverse of Opening: Dilation followed by Erosion. It is useful for closing small holes inside foreground objects.
Declaration
public OcrInput Close(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Contrast(Single)
Increases contrast automatically. This filter often improves OCR speed and accuracy in low contrast scans. Flattens Alpha channels to white.
Declaration
public OcrInput Contrast(float amount = 1.1F)
Parameters
Type | Name | Description |
---|---|---|
System.Single | amount | Amount used to adjust contrast. A value of 0 creates an image that is completely gray. A value of 1 leaves the input unchanged. Values greater than 1 increase contrast, making light areas lighter and dark areas darker. Values less than 1 decrease contrast, reducing the difference between light and dark areas. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
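The behaviour described for `amount` (0 gives flat gray, 1 leaves the image unchanged) matches a common linear contrast formula that scales each value's distance from mid-gray. The sketch below applies that formula to single grayscale values. `AdjustContrast` is a hypothetical helper, and whether IronOcr uses exactly this formula is an assumption.

```csharp
using System;

// Scales the distance of a grayscale value from mid-gray (128) by `amount`,
// then clamps the result to the valid 0-255 range.
static byte AdjustContrast(byte value, float amount)
{
    float scaled = 128f + (value - 128f) * amount;
    return (byte)Math.Clamp((int)MathF.Round(scaled), 0, 255);
}

Console.WriteLine(AdjustContrast(200, 0f));   // 128 (completely gray)
Console.WriteLine(AdjustContrast(200, 1f));   // 200 (unchanged)
Console.WriteLine(AdjustContrast(200, 1.5f)); // 236 (light gets lighter)
Console.WriteLine(AdjustContrast(60, 1.5f));  // 26  (dark gets darker)
```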
DeNoise(Boolean)
Removes digital noise. This filter should only be used where noise is expected. Flattens Alpha channels to white.
Declaration
public OcrInput DeNoise(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Deskew(Int32)
Rotates an image so it is the right way up and orthogonal. This is very useful for OCR because Tesseract tolerance for skewed scans can be as low as 5 degrees.
This also helps when producing searchable PDF documents from IronTesseract because the pages will likely all be the right way up.
This version uses only a Hough transform to make minor corrections. Example: pages that were put in a scanner at a slight angle.
Declaration
public bool Deskew(int MaxDeskewAngle = 45)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | MaxDeskewAngle | Maximum angle of skew to correct for. Higher values can lead to more opportunity for correction, but may be slower and more prone to error including upside down pages. |
Returns
Type | Description |
---|---|
System.Boolean | Returns a boolean result of whether or not IronOCR was able to detect image orientation. True = Deskew was applied. False = Failed to detect image orientation and image remains unchanged. |
Despeckle(Boolean)
Despeckle is an alias of DeNoise(Boolean), provided to make this method easy to find in IntelliSense.
Declaration
public OcrInput Despeckle(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Dilate(Boolean)
Advanced Morphology. Dilation is the opposite of Erosion: instead of shrinking the foreground object, it expands it.
Opposite of Erode(Boolean).
Declaration
public OcrInput Dilate(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Dispose()
OcrInput is IDisposable. For best practice and to avoid memory leaks, remember to dispose, or initialize instances with a "using" statement.
Declaration
public void Dispose()
Dispose(Boolean)
OcrInput is IDisposable. For best practice and to avoid memory leaks, remember to dispose, or initialize instances with a "using" statement.
Declaration
public void Dispose(bool disposing = true)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | disposing |
EnhanceResolution(Int32)
Enhances the resolution of low quality images. This filter is not often needed because TargetDPI will automatically catch and resolve low resolution inputs.
May not work for all images if their metadata is corrupted.
Declaration
public OcrInput EnhanceResolution(int TargetDPI = 225)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | TargetDPI | The target DPI to resample to. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Erode(Boolean)
Advanced Morphology. Erosion is the morphological operation used to diminish the size of the foreground object.
Opposite of Dilate(Boolean).
Declaration
public OcrInput Erode(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
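The morphology filters are easier to picture on a tiny binary grid. The sketch below is a plain C# illustration of erosion and dilation with a 3x3 neighbourhood (comparable in spirit to `use3x3 = true`), and of how Open(Boolean) and Close(Boolean) compose them. All function names here are hypothetical and the code is not IronOcr's implementation.

```csharp
using System;

// true = foreground (ink). A pixel survives erosion only if its whole 3x3
// neighbourhood is foreground; dilation sets a pixel if any neighbour is set.
static bool[,] Morph(bool[,] img, bool erode)
{
    int rows = img.GetLength(0), cols = img.GetLength(1);
    var result = new bool[rows, cols];
    for (int y = 0; y < rows; y++)
    for (int x = 0; x < cols; x++)
    {
        bool all = true, any = false;
        for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
        {
            int ny = y + dy, nx = x + dx;
            // Pixels outside the image count as background.
            bool v = ny >= 0 && ny < rows && nx >= 0 && nx < cols && img[ny, nx];
            all &= v;
            any |= v;
        }
        result[y, x] = erode ? all : any;
    }
    return result;
}

static bool[,] ErodeSketch(bool[,] img) => Morph(img, erode: true);
static bool[,] DilateSketch(bool[,] img) => Morph(img, erode: false);
static bool[,] OpenSketch(bool[,] img) => DilateSketch(ErodeSketch(img));  // removes small specks
static bool[,] CloseSketch(bool[,] img) => ErodeSketch(DilateSketch(img)); // fills small holes
static int CountInk(bool[,] img) { int n = 0; foreach (bool b in img) if (b) n++; return n; }

var page = new bool[6, 6];
for (int r = 1; r <= 3; r++)
    for (int c = 1; c <= 3; c++)
        page[r, c] = true;  // a solid 3x3 "glyph"
page[5, 5] = true;          // a single-pixel speck of noise

Console.WriteLine(CountInk(page));              // 10
Console.WriteLine(CountInk(ErodeSketch(page))); // 1  (only the glyph centre survives)
Console.WriteLine(CountInk(OpenSketch(page)));  // 9  (glyph restored, speck gone)
```

Opening removes the isolated speck while restoring the glyph to its original size, which is why it is described as useful for removing noise.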
Finalize()
OcrInput has a safe finaliser that cleans up undisposed native images in memory.
Declaration
protected override void Finalize()
FindMultipleTextRegions(Double, Int32, Boolean, Boolean)
Use computer vision to detect areas which contain text elements and divide the input into separate images based on text regions.
Declaration
public void FindMultipleTextRegions(double Scale = 0, int DilationAmount = -1, bool Binarize = true, bool Invert = false)
Parameters
Type | Name | Description |
---|---|---|
System.Double | Scale | (Only used during text region detection) Resolution scale factor. Image width and height will be multiplied by this value. |
System.Int32 | DilationAmount | (Only used during text region detection) Dilation amount, in pixels. Text areas width and height will be increased by this value. |
System.Boolean | Binarize | (Only used during text region detection) True to convert the image to black and white, False otherwise |
System.Boolean | Invert | (Only used during text region detection) True to invert image colors during binarization, False otherwise |
Remarks
Useful for generating several OCR results from a single image/page
FindTextRegion(Double, Int32, Boolean, Boolean)
Use computer vision to detect regions which contain text elements on each page
Declaration
public void FindTextRegion(double Scale = 0, int DilationAmount = -1, bool Binarize = true, bool Invert = false)
Parameters
Type | Name | Description |
---|---|---|
System.Double | Scale | (Only used during text region detection) Resolution scale factor. Image width and height will be multiplied by this value. |
System.Int32 | DilationAmount | (Only used during text region detection) Dilation amount, in pixels. Text areas width and height will be increased by this value. |
System.Boolean | Binarize | (Only used during text region detection) True to convert the image to black and white, False otherwise |
System.Boolean | Invert | (Only used during text region detection) True to invert image colors when binarizing, False otherwise |
FromPdf(Byte[], String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(byte[] PdfBytes, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdf(Stream, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(Stream PdfStream, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdf(String, Int32, String)
Create a new OcrInput object populated with a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(string PdfPath, int DPI, string Password = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF |
System.Int32 | DPI | Resolution at which to sample the PDF. If zero, TargetDPI is used. |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
Returns
Type | Description |
---|---|
OcrInput |
FromPdf(String, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdf(string PdfPath, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(Byte[], Int32, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a single page from a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(byte[] PdfBytes, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(Stream, Int32, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a single page from a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(Stream PdfStream, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.Int32 | Page | The page number within the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPage(String, Int32, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with a single page of a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPage(string PdfPath, int Page, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF |
System.Int32 | Page | Which page of the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(Byte[], IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with multiple pages from a PDF as binary data.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.Byte[] | PdfBytes | The PDF document as binary data in memory. |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(Stream, IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with multiple selected pages from a PDF as a Stream.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | PdfStream | The PDF document as a System.IO.Stream. |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
FromPdfPages(String, IEnumerable<Int32>, String, CropRectangle, Nullable<Int32>)
Create a new OcrInput object populated with multiple pages from a PDF.
This class is IDisposable and is best initiated with a 'using' statement.
The PDF can be supplied as a file path, Stream, or byte array via the matching overload. Each page read becomes an OcrInput.Page.
Declaration
public static OcrInput FromPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, CropRectangle ContentArea = null, Nullable<int> DPI = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | PdfPath | File path to the PDF |
System.Collections.Generic.IEnumerable<System.Int32> | Pages | The pages of the PDF to read. Zero based (first page is number 0) |
System.String | Password | Optional Password to unlock an encrypted or protected PDF |
IronSoftware.Drawing.CropRectangle | ContentArea | Specifies a region of the image to extract text from, as an IronSoftware.Drawing.CropRectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed. |
System.Nullable<System.Int32> | DPI | Resolution at which to sample the PDF. If null or zero, TargetDPI is used. |
Returns
Type | Description |
---|---|
OcrInput |
HighlightTextAndSaveAsImages(IronTesseract, String, ResultHighlightType)
Based on the ResultHighlightType, will draw red boxes around characters/words/lines/paragraphs detected, and save to a PNG image.
For best results, perform all filters before calling.
Declaration
public void HighlightTextAndSaveAsImages(IronTesseract tesseract, string filename, ResultHighlightType type)
Parameters
Type | Name | Description |
---|---|---|
IronTesseract | tesseract | IronTesseract instance used to scan the OcrInput. |
System.String | filename | File will be saved as : 'filename_page_0.png'. You may use an absolute or relative path. |
ResultHighlightType | type | Choose whether each box represents a character, word, line, paragraph. |
HoughTransformStraighten(Int32)
Uses a Hough Transform to rotate an image to the nearest 90 degrees of straightness. This is very useful for OCR because Tesseract tolerance for skewed scans can be as low as 5 degrees.
A synonym of Deskew(Int32).
Declaration
public OcrInput HoughTransformStraighten(int MaxDeskewAngle = 45)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | MaxDeskewAngle | Maximum angle of skew to correct for. Higher values can lead to more opportunity for correction, but may be slower and more prone to error including upside down pages. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Invert(Boolean)
Inverts every color, e.g. white becomes black and black becomes white.
Declaration
public OcrInput Invert(bool GrayScale = true)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | GrayScale | Optionally remove all color channels and return a GrayScale image. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Open(Boolean)
Advanced Morphology.
Opening is another name for erosion followed by dilation. It is useful for removing noise.
Declaration
public OcrInput Open(bool use3x3 = false)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | use3x3 | 2x2 is default morphology |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
OrientPagesWithOSD(IronTesseract, OrientationConfidence)
Declaration
public OcrInput OrientPagesWithOSD(IronTesseract TesseractInstance, OrientationConfidence Confidence = null)
Parameters
Type | Name | Description |
---|---|---|
IronTesseract | TesseractInstance | Reads OcrLanguage settings from your IronTesseract instance to help detect letters and numbers to straighten pages. If you wish to use multiple languages please use the IronOcr.OcrInput.Deskew(IronOcr.IronTesseract,System.Int32,IronOcr.OrientationConfidence) overload. |
OrientationConfidence | Confidence | Optional OrientationConfidence class used to control and measure OSD by way of confidence thresholds. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
OrientPagesWithOSD(OcrLanguage, OrientationConfidence)
Uses Tesseract "Orientation & Script Detection" to turn OcrInput pages the right way up in multiples of 90 degrees.
Declaration
public OcrInput OrientPagesWithOSD(OcrLanguage CharacterDetectionLanguage, OrientationConfidence Confidence = null)
Parameters
Type | Name | Description |
---|---|---|
OcrLanguage | CharacterDetectionLanguage | An OcrLanguage used to help detect letters and numbers to straighten pages. If you wish to use multiple languages please use the IronOcr.OcrInput.Deskew(IronOcr.IronTesseract,System.Int32,IronOcr.OrientationConfidence) overload. |
OrientationConfidence | Confidence | Optional OrientationConfidence class used to control and measure OSD by way of confidence thresholds. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
PageCount()
The number of OcrInput.Pages currently present in this OcrInput
Declaration
public int PageCount()
Returns
Type | Description |
---|---|
System.Int32 |
ReplaceColor(Color, Color, Int32)
Replaces the current color with a new color in the image.
Declaration
public OcrInput ReplaceColor(Color currentColor, Color newColor, int tolerance = 10)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.Color | currentColor | Current IronSoftware.Drawing.Color |
IronSoftware.Drawing.Color | newColor | New IronSoftware.Drawing.Color |
System.Int32 | tolerance | Tolerance Value |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
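The `tolerance` parameter lets near matches be replaced as well as exact ones. The sketch below shows one plausible matching rule on a single RGB pixel. Both the per-channel comparison and the `ReplaceColorSketch` helper are assumptions made for illustration; how IronOcr actually measures tolerance is not stated here.

```csharp
using System;

// Hypothetical matching rule: a pixel matches when every RGB channel is
// within `tolerance` of the colour being replaced.
static (int R, int G, int B) ReplaceColorSketch(
    (int R, int G, int B) pixel,
    (int R, int G, int B) current,
    (int R, int G, int B) replacement,
    int tolerance = 10)
{
    bool matches =
        Math.Abs(pixel.R - current.R) <= tolerance &&
        Math.Abs(pixel.G - current.G) <= tolerance &&
        Math.Abs(pixel.B - current.B) <= tolerance;
    return matches ? replacement : pixel;
}

var red = (255, 0, 0);
var white = (255, 255, 255);
Console.WriteLine(ReplaceColorSketch((250, 5, 3), red, white));   // (255, 255, 255)
Console.WriteLine(ReplaceColorSketch((200, 40, 40), red, white)); // (200, 40, 40)
```

A slightly off-red pixel is replaced, while a clearly different shade is left alone.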
Rotate(Double)
Rotates images by a number of degrees clockwise. For anti-clockwise rotation, use negative numbers. See also Deskew(Int32).
Declaration
public OcrInput Rotate(double Degrees)
Parameters
Type | Name | Description |
---|---|---|
System.Double | Degrees | A number of clockwise degrees. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
SaveAsImages(String, OcrInput.ImageType)
Exports an OcrInput object as Images
Declaration
public string[] SaveAsImages(string Prefix = "export_of_page", OcrInput.ImageType Extension)
Parameters
Type | Name | Description |
---|---|---|
System.String | Prefix | Will save images at {Prefix}_(page_number).{Extension}. May include a fully qualified file path. |
OcrInput.ImageType | Extension | Output file extension in lower-case. |
Returns
Type | Description |
---|---|
System.String[] | Array of saved image file names. Contains multiple entries if the OcrInput has multiple pages. |
Exceptions
Type | Condition |
---|---|
System.Exception | Throws an exception if there are no pages. See OcrInput.Page |
Scale(Int32, Boolean)
Scales OcrInput pages proportionally.
Declaration
public OcrInput Scale(int Percentage, bool ScaleCropArea = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | Percentage | The percentage scale. 100 = no effect. |
System.Boolean | ScaleCropArea | Whether associated crop areas should also be scaled proportionally (true is recommended). |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
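As a rough illustration of what proportional scaling means for page dimensions, the following standalone sketch computes the size of a page after a percentage scale. The ScaleSketch class and its rounding behavior are assumptions made for illustration; they are not part of IronOcr, and the library's actual rounding may differ.

```csharp
using System;

// Hypothetical helper, not part of IronOcr: shows how a percentage
// scale maps to pixel dimensions when both axes use the same factor.
public static class ScaleSketch
{
    public static (int Width, int Height) ScaleByPercentage(int width, int height, int percentage)
    {
        if (percentage <= 0)
            throw new ArgumentOutOfRangeException(nameof(percentage), "Percentage must be positive.");

        // 100 leaves the size unchanged; 200 doubles both dimensions.
        double factor = percentage / 100.0;
        int newWidth = (int)Math.Round(width * factor);
        int newHeight = (int)Math.Round(height * factor);
        return (newWidth, newHeight);
    }
}
```

Under these assumptions, an 800 x 600 page scaled with a Percentage of 150 would come out at 1200 x 900, and a Percentage of 100 would leave it untouched.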
Scale(Int32, Int32, Boolean)
Scales the OcrInput pages up in size.
Declaration
public OcrInput Scale(int MaxWidth, int MaxHeight, bool ScaleCropArea = true)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | MaxWidth | Maximum width in pixels. |
System.Int32 | MaxHeight | Maximum height in pixels. |
System.Boolean | ScaleCropArea | Whether associated crop areas should also be scaled proportionally (true is recommended). |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
SelectTextColor(Color, Int32)
Binarize an image to read pixels of a color (with threshold) as text and ignore other colors as background.
This is useful if your image has many colors and normal binarization will not work. It turns all text of the specified color black and the rest of the image white.
Declaration
public OcrInput SelectTextColor(Color selectColor, int tolerance = 10)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.Color | selectColor | IronSoftware.Drawing.Color of text to isolate from background. |
System.Int32 | tolerance | A value in the range [0, 255]: the acceptable difference between a pixel's color and selectColor for each of the R, G, and B values. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
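To make the tolerance parameter concrete, here is a minimal standalone sketch of the per-channel comparison described above. The TextColorSketch class and its tuple-based pixel representation are hypothetical and are not IronOcr's implementation; the sketch only assumes that a pixel matches when each of its R, G, and B values lies within tolerance of the selected color.

```csharp
using System;

// Hypothetical sketch, not IronOcr's implementation: a pixel counts as
// text when each of its R, G and B values is within `tolerance` of the
// selected color. Matching pixels become black, all others white.
public static class TextColorSketch
{
    public static bool IsWithinTolerance(
        (byte R, byte G, byte B) pixel,
        (byte R, byte G, byte B) selectColor,
        int tolerance)
    {
        return Math.Abs(pixel.R - selectColor.R) <= tolerance
            && Math.Abs(pixel.G - selectColor.G) <= tolerance
            && Math.Abs(pixel.B - selectColor.B) <= tolerance;
    }

    public static (byte R, byte G, byte B) Binarize(
        (byte R, byte G, byte B) pixel,
        (byte R, byte G, byte B) selectColor,
        int tolerance = 10)
    {
        return IsWithinTolerance(pixel, selectColor, tolerance)
            ? ((byte)0, (byte)0, (byte)0)        // treated as text
            : ((byte)255, (byte)255, (byte)255); // treated as background
    }
}
```

With the default tolerance of 10, a pixel of (200, 10, 10) would match a selected color of (205, 5, 12) in this sketch, while (200, 30, 10) would not, because its green value differs by more than 10.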
SelectTextColors(IEnumerable<Color>, Int32)
Binarize an image to read pixels of only the selected colors (with thresholds) as text and ignore other colors as background.
This is useful if your image has many colors and normal binarization will not work. It turns all text of the specified colors black and the rest of the image white.
Declaration
public OcrInput SelectTextColors(IEnumerable<Color> selectColors, int tolerance = 10)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<IronSoftware.Drawing.Color> | selectColors | The IronSoftware.Drawing.Color values of the text to isolate from the background. |
System.Int32 | tolerance | A value in the range [0, 255]: the acceptable difference between a pixel's color and a color in selectColors for each of the R, G, and B values. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
Sharpen()
Sharpens blurred OCR documents. Applies a Gaussian sharpening filter to the image.
Declaration
public OcrInput Sharpen()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
StampCropRectangleAndSaveAs(CropRectangle, Color, String, OcrInput.ImageType)
Saves a copy of the image with a rectangle drawn on it, to help visualize and debug where an IronSoftware.Drawing.CropRectangle will be applied to your image.
Declaration
public string[] StampCropRectangleAndSaveAs(CropRectangle cropRectangle, Color rectangleColor, string Prefix = "rectangle_on_page", OcrInput.ImageType extension)
Parameters
Type | Name | Description |
---|---|---|
IronSoftware.Drawing.CropRectangle | cropRectangle | Use a IronSoftware.Drawing.CropRectangle to debug the area that will be scanned on an image. |
IronSoftware.Drawing.Color | rectangleColor | Color of rectangle drawn. Red is recommended for easy contrast. |
System.String | Prefix | Will save images at {Prefix}_(page_number).{Extension}. May include a fully qualified file path. |
OcrInput.ImageType | extension | Output file extension in lower-case. |
Returns
Type | Description |
---|---|
System.String[] | Array of saved image file names. Contains multiple entries if the OcrInput has multiple pages. |
Exceptions
Type | Condition |
---|---|
System.Exception | Throws an exception if there are no pages. See OcrInput.Page |
ToGrayScale()
This image filter turns every pixel into a shade of gray. Unlikely to improve OCR accuracy but may improve speed.
Declaration
public OcrInput ToGrayScale()
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
WithTitle(String)
Adds a title to the OcrInput document. This title will be used when calling SaveAsHocrFile(String) and SaveAsSearchablePdf(String).
Declaration
public OcrInput WithTitle(string Title)
Parameters
Type | Name | Description |
---|---|---|
System.String | Title | The document title as a string. |
Returns
Type | Description |
---|---|
OcrInput | This OcrInput object allowing for LINQ style fluent notation. |
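Because the filter methods documented above each return the same OcrInput, calls can be chained. The sketch below is one possible usage, not a recommended pipeline: it assumes the IronOcr NuGet package is installed, that AnyBitmap.FromFile is available from IronSoftware.Drawing, and that "scan.png" (a hypothetical file name) exists in the working directory. The title and rotation angle are arbitrary illustrative values.

```csharp
using System;
using IronOcr;
using IronSoftware.Drawing;

// Assumption: "scan.png" is a placeholder path to a scanned image.
using (var input = new OcrInput(AnyBitmap.FromFile("scan.png")))
{
    // Each call returns the same OcrInput, so the filters chain fluently.
    input.WithTitle("Scanned invoice") // hypothetical title
         .Rotate(-2.5)                 // negative degrees rotate anti-clockwise
         .Sharpen()
         .ToGrayScale();

    Console.WriteLine($"Pages ready for OCR: {input.PageCount()}");
}
```

Wrapping the object in a using statement follows the class's IDisposable guidance, so image resources are released once processing is finished.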