
    Class OcrInput

    OcrInput is a robust class for preparing one or more image files, PDFs, System.Drawing objects, streams and binary image data for OCR. Instances of OcrInput can be read by the IronTesseract class.

    We recognise that much of the quality of OCR results depends on how well images are prepared before they are read. This class allows developers to enhance their scanned documents to provide faster, more accurate OCR results using filters such as EnhanceResolution(Int32), DeNoise(), ToGrayScale(), Deskew(Int32, Boolean), Rotate(Double) and Sharpen().

    Supports multi-page OCR input.

    Inheritance
    System.Object
    OcrInput
    Implements
    System.IDisposable
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ToString()
    System.Object.ReferenceEquals(System.Object, System.Object)
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public class OcrInput : IDisposable

    Constructors

    OcrInput(Byte[])

    Create a new OcrInput object populated with an image or PDF file supplied as binary data.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(byte[] Bytes)
    Parameters
    Type Name Description
    System.Byte[] Bytes

    Bytes of an Image or PDF file.

    OcrInput(Byte[], Rectangle)

    Create a new OcrInput object populated with an image or PDF file supplied as binary data, reading only the specified ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(byte[] Bytes, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Byte[] Bytes

    Bytes of an Image or PDF file.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.
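    ContentArea is always expressed in pixels of the source image. If the region of interest is easier to describe as a fraction of the page, it can be converted first. The sketch below is a minimal example under that assumption: FractionToContentArea is a hypothetical helper, not part of IronOcr, and it relies only on System.Drawing.Rectangle. The commented-out line shows where the result would be passed to the OcrInput(Byte[], Rectangle) overload documented above; the file name there is illustrative.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not an IronOcr API): converts a region given as fractions
// of the page (0.0 to 1.0) into a pixel Rectangle usable as a ContentArea.
static Rectangle FractionToContentArea(int imageWidth, int imageHeight,
    double left, double top, double width, double height)
{
    int x = (int)Math.Round(left * imageWidth);
    int y = (int)Math.Round(top * imageHeight);
    int w = (int)Math.Round(width * imageWidth);
    int h = (int)Math.Round(height * imageHeight);

    // Clip to the image bounds so the region never extends past the page edge.
    return Rectangle.Intersect(new Rectangle(x, y, w, h),
                               new Rectangle(0, 0, imageWidth, imageHeight));
}

// Top quarter of an A4 page scanned at 300 DPI (2480 x 3508 pixels).
Rectangle header = FractionToContentArea(2480, 3508, 0.0, 0.0, 1.0, 0.25);
Console.WriteLine(header); // {X=0,Y=0,Width=2480,Height=877}

// With IronOcr referenced, the rectangle would go to the documented overload:
// using var input = new OcrInput(File.ReadAllBytes("scan.png"), header);
```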

    OcrInput(IEnumerable<Byte[]>)

    Create a new OcrInput object populated with the binary data of multiple images.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<byte[]> Bytes)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Byte[]> Bytes

    An IEnumerable of byte arrays containing Image or PDF files.

    OcrInput(IEnumerable<Byte[]>, Rectangle)

    Create a new OcrInput object populated with the binary data of multiple images sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<byte[]> Bytes, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Byte[]> Bytes

    An IEnumerable of byte arrays containing Image or PDF files.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(IEnumerable<Bitmap>)

    Create a new OcrInput object populated with multiple System.Drawing.Bitmap objects.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Bitmap> Bitmaps)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Drawing.Bitmap> Bitmaps

    An IEnumerable of System.Drawing.Bitmap.

    OcrInput(IEnumerable<Bitmap>, Rectangle)

    Create a new OcrInput object populated with multiple System.Drawing.Bitmaps sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Bitmap> Bitmaps, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Drawing.Bitmap> Bitmaps

    An IEnumerable of System.Drawing.Bitmap.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(IEnumerable<Image>)

    Create a new OcrInput object populated with any number of System.Drawing.Image objects.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Image> Images)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Drawing.Image> Images

    Any number of System.Drawing.Image objects.

    OcrInput(IEnumerable<Image>, Rectangle)

    Create a new OcrInput object populated with any number of System.Drawing.Image objects sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Image> Images, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Drawing.Image> Images

    Any number of System.Drawing.Image objects.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(IEnumerable<Stream>)

    Create a new OcrInput object populated with multiple images or PDF files supplied as Streams.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Stream> Streams)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.IO.Stream> Streams

    Streams, each containing an image or PDF file.

    OcrInput(IEnumerable<Stream>, Rectangle)

    Create a new OcrInput object populated with multiple images or PDF files supplied as Streams sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<Stream> Streams, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.IO.Stream> Streams

    Streams, each containing an image or PDF file.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(IEnumerable<String>)

    Create a new OcrInput object populated with multiple image or PDF files.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<string> FilePaths)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.String> FilePaths

    An IEnumerable of paths to Image or PDF files.

    OcrInput(IEnumerable<String>, Rectangle)

    Create a new OcrInput object populated with multiple image or PDF files sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(IEnumerable<string> FilePaths, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.String> FilePaths

    An IEnumerable of paths to Image or PDF files.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(Bitmap)

    Create a new OcrInput object populated with a System.Drawing.Bitmap.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Bitmap Bitmap)
    Parameters
    Type Name Description
    System.Drawing.Bitmap Bitmap

    A System.Drawing.Bitmap.

    OcrInput(Bitmap, Rectangle)

    Create a new OcrInput object populated with a specified region of a System.Drawing.Bitmap.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Bitmap Bitmap, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Drawing.Bitmap Bitmap

    A System.Drawing.Bitmap.

    System.Drawing.Rectangle ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(Image)

    Create a new OcrInput object populated with a System.Drawing.Image.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Image Image)
    Parameters
    Type Name Description
    System.Drawing.Image Image

    A System.Drawing.Image.

    OcrInput(Image, Rectangle)

    Create a new OcrInput object populated with a specified region of a System.Drawing.Image.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Image Image, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Drawing.Image Image

    A System.Drawing.Image.

    System.Drawing.Rectangle ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(Rectangle, Object[])

    Create a new OcrInput object populated with one or more images sharing a common ContentArea.

    This class is IDisposable and is best instantiated with a 'using' statement.

    This constructor accepts any number of images as file paths, Streams, byte arrays, System.Drawing.Image and System.Drawing.Bitmap objects. Each will become an OcrInput.Page.

    Declaration
    public OcrInput(Rectangle ContentArea, params object[] Inputs)
    Parameters
    Type Name Description
    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Object[] Inputs

    Any number of images as File Paths, Streams, Byte Arrays, System.Drawing.Image and System.Drawing.Bitmap.

    OcrInput(Stream)

    Create a new OcrInput object populated with an image or PDF file supplied as a Stream.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Stream Stream)
    Parameters
    Type Name Description
    System.IO.Stream Stream

    Stream containing an image or PDF file.

    OcrInput(Stream, Rectangle)

    Create a new OcrInput object populated with a specified region of an image or PDF file supplied as a Stream.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(Stream Stream, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.IO.Stream Stream

    Stream containing an image or PDF file.

    System.Drawing.Rectangle ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    OcrInput(Object[])

    Create a new OcrInput object to which images and PDF pages may be added.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(params object[] Inputs)
    Parameters
    Type Name Description
    System.Object[] Inputs

    Any number of images as File Paths, Streams, Byte Arrays, System.Drawing.Image and System.Drawing.Bitmap.

    OcrInput(String)

    Create a new OcrInput object populated with an Image file or PDF document.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(string FilePath)
    Parameters
    Type Name Description
    System.String FilePath

    Path to an Image or PDF file.

    OcrInput(String, Rectangle)

    Create a new OcrInput object populated with a specified region of an image or PDF file.

    This class is IDisposable and is best instantiated with a 'using' statement.

    Declaration
    public OcrInput(string FilePath, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.String FilePath

    Path to an Image or PDF file.

    System.Drawing.Rectangle ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    Fields

    MinimumDPI

    OcrInput automatically detects images with a resolution below MinimumDPI (typically 150 DPI) and upscales them to TargetDPI to avoid slow OCR and poor results.

    To disable this automatic image enhancement functionality, set MinimumDPI = null.

    Declaration
    public int? MinimumDPI
    Field Value
    Type Description
    System.Nullable<System.Int32>
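    The automatic enhancement described above can be modelled as a simple rule: an image whose resolution is below MinimumDPI is scaled by TargetDPI / imageDPI, and setting MinimumDPI to null turns the rule off. The sketch below is a hypothetical model of that documented behaviour for estimating output sizes; it is not IronOcr's resampling code, and the 150 and 225 DPI values are example settings rather than confirmed defaults.

```csharp
using System;

// Hypothetical model of the documented MinimumDPI/TargetDPI rule, not IronOcr's
// own code: returns the pixel size an image would be resampled to.
static (int Width, int Height) ResampledSize(int width, int height, int imageDpi,
                                             int? minimumDpi, int targetDpi)
{
    // A null MinimumDPI disables automatic enhancement; images at or above
    // the threshold are left as they are.
    if (minimumDpi is null || imageDpi >= minimumDpi.Value)
        return (width, height);

    double scale = (double)targetDpi / imageDpi;
    return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
}

// An 8 x 10 inch page scanned at 100 DPI (800 x 1000 pixels).
var upscaled = ResampledSize(800, 1000, 100, minimumDpi: 150, targetDpi: 225);
var untouched = ResampledSize(800, 1000, 100, minimumDpi: null, targetDpi: 225);
Console.WriteLine(upscaled);   // (1800, 2250)
Console.WriteLine(untouched);  // (800, 1000)
```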

    TargetDPI

    The resolution that low resolution images will be enhanced to. See MinimumDPI.

    TargetDPI also determines the resolution at which PDF documents will be sampled.

    Declaration
    public int TargetDPI
    Field Value
    Type Description
    System.Int32
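    Because TargetDPI also sets the resolution at which PDF pages are sampled, it determines the pixel size of every rendered page. PDF page sizes are measured in points (1/72 inch), so a page of W x H points sampled at D DPI becomes about W/72*D by H/72*D pixels. In the sketch below, EffectiveDpi is a hypothetical helper that mirrors the fallback documented for the AddPdf methods (a null or zero DPI argument means TargetDPI is used), and 300 is only an example TargetDPI.

```csharp
using System;

// Hypothetical helper mirroring the documented fallback of AddPdf, AddPdfPage
// and AddPdfPages: a null or zero DPI argument means "use TargetDPI".
static int EffectiveDpi(int? dpi, int targetDpi) =>
    dpi is null or 0 ? targetDpi : dpi.Value;

// PDF page sizes are in points (1/72 inch); sampling at a given DPI yields
// points / 72 * DPI pixels along each axis.
static (int Width, int Height) SampledPixels(double widthPt, double heightPt, int dpi) =>
    ((int)Math.Round(widthPt / 72 * dpi), (int)Math.Round(heightPt / 72 * dpi));

int renderDpi = EffectiveDpi(null, targetDpi: 300);   // 300 is an example TargetDPI
var letter = SampledPixels(612, 792, renderDpi);       // US Letter is 612 x 792 points
Console.WriteLine($"{renderDpi} DPI -> {letter}");      // 300 DPI -> (2550, 3300)
```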

    Properties

    Pages

    Provides access to every OcrInput.Page within this OcrInput.

    Declaration
    public IEnumerable<OcrInput.Page> Pages { get; }
    Property Value
    Type Description
    System.Collections.Generic.IEnumerable<OcrInput.Page>

    Title

    A title for the OcrInput document. This is relevant because it becomes metadata when exporting searchable PDFs and hOCR files from IronTesseract results.

    See SaveAsSearchablePdf(String) and SaveAsHocrFile(String).

    Declaration
    public string Title { get; set; }
    Property Value
    Type Description
    System.String

    Methods

    AddImage(Byte[])

    Adds a byte array containing the binary data of an image to this OcrInput.

    Declaration
    public void AddImage(byte[] ImageBytes)
    Parameters
    Type Name Description
    System.Byte[] ImageBytes

    A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.
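    When byte arrays arrive from an upload or a database, it can help to confirm that they hold one of the listed formats before calling AddImage. The sketch below checks the standard file signatures (magic numbers) of JPEG, PNG, GIF, TIFF, PDF and BMP. SniffFormat is a hypothetical helper, not part of IronOcr, and it does not replace whatever validation IronOcr performs when it loads the data.

```csharp
using System;
using System.Linq;

// Hypothetical pre-check (not an IronOcr API): names the AddImage format that a
// byte array starts with, based on well-known file signatures, or returns null.
static string? SniffFormat(byte[] data)
{
    bool StartsWith(params byte[] signature) =>
        data.Length >= signature.Length && data.Take(signature.Length).SequenceEqual(signature);

    if (StartsWith(0xFF, 0xD8, 0xFF)) return "JPEG";
    if (StartsWith(0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG";
    if (StartsWith(0x47, 0x49, 0x46, 0x38)) return "GIF";   // "GIF8" (GIF87a or GIF89a)
    if (StartsWith(0x49, 0x49, 0x2A, 0x00) ||               // "II*\0" little-endian TIFF
        StartsWith(0x4D, 0x4D, 0x00, 0x2A)) return "TIFF";  // "MM\0*" big-endian TIFF
    if (StartsWith(0x25, 0x50, 0x44, 0x46)) return "PDF";   // "%PDF"
    if (StartsWith(0x42, 0x4D)) return "BMP";               // "BM"
    return null;
}

Console.WriteLine(SniffFormat(System.Text.Encoding.ASCII.GetBytes("%PDF-1.7"))); // PDF
```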

    AddImage(Byte[], Rectangle)

    Adds a byte array containing the binary data of an image to this OcrInput.

    Declaration
    public void AddImage(byte[] ImageBytes, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Byte[] ImageBytes

    A byte[] containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddImage(Bitmap)

    Adds a System.Drawing.Bitmap to this OcrInput.

    Declaration
    public void AddImage(Bitmap Bitmap)
    Parameters
    Type Name Description
    System.Drawing.Bitmap Bitmap

    A managed Bitmap object.

    AddImage(Bitmap, Rectangle)

    Adds a System.Drawing.Bitmap to this OcrInput.

    Declaration
    public void AddImage(Bitmap Bitmap, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Drawing.Bitmap Bitmap

    A managed Bitmap object.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of the bitmap to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddImage(Image)

    Adds a System.Drawing.Image to this OcrInput.

    Declaration
    public void AddImage(Image Image)
    Parameters
    Type Name Description
    System.Drawing.Image Image

    A managed Image object.

    AddImage(Image, Rectangle)

    Adds a System.Drawing.Image to this OcrInput.

    Declaration
    public void AddImage(Image Image, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Drawing.Image Image

    A managed Image object.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddImage(Stream)

    Adds a System.IO.Stream containing the raw data of an image to this OcrInput.

    Declaration
    public void AddImage(Stream ImageStream)
    Parameters
    Type Name Description
    System.IO.Stream ImageStream

    A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.

    AddImage(Stream, Rectangle)

    Adds a System.IO.Stream containing the raw data of an image to this OcrInput.

    Declaration
    public void AddImage(Stream ImageStream, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.IO.Stream ImageStream

    A Stream containing an image. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddImage(String)

    Adds an image file to this OcrInput.

    Declaration
    public void AddImage(string ImagePath)
    Parameters
    Type Name Description
    System.String ImagePath

    File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.

    AddImage(String, Rectangle)

    Adds an image file to this OcrInput.

    Declaration
    public void AddImage(string ImagePath, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.String ImagePath

    File path to an image file. Supported formats include JPEG, TIFF, GIF, PNG, PDF, BMP.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddMultiFrameTiff(Byte[])

    Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.

    Declaration
    public void AddMultiFrameTiff(byte[] TiffBytes)
    Parameters
    Type Name Description
    System.Byte[] TiffBytes

    A byte[] containing a TIFF file.

    AddMultiFrameTiff(Byte[], Rectangle)

    Adds a byte[] containing the binary data of a TIFF image with multiple pages to this OcrInput.

    Declaration
    public void AddMultiFrameTiff(byte[] TiffBytes, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Byte[] TiffBytes

    A byte[] containing a TIFF file.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of each frame to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddMultiFrameTiff(Stream)

    Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.

    Declaration
    public void AddMultiFrameTiff(Stream TiffSteam)
    Parameters
    Type Name Description
    System.IO.Stream TiffSteam

    A System.IO.Stream containing a TIFF file.

    AddMultiFrameTiff(Stream, Rectangle)

    Adds a Stream containing the binary data of a TIFF image with multiple pages to this OcrInput.

    Declaration
    public void AddMultiFrameTiff(Stream TiffSteam, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.IO.Stream TiffSteam

    A System.IO.Stream containing a TIFF file.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of each frame to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddMultiFrameTiff(String)

    Adds a multi-frame TIFF file to the OcrInput document.

    Each frame becomes a page of this OcrInput.

    Declaration
    public void AddMultiFrameTiff(string ImagePath)
    Parameters
    Type Name Description
    System.String ImagePath

    A file path to a TIFF image.

    AddMultiFrameTiff(String, Rectangle)

    Adds a multi-frame TIFF file to the OcrInput document.

    Each frame becomes a page of this OcrInput.

    Declaration
    public void AddMultiFrameTiff(string ImagePath, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.String ImagePath

    A file path to a TIFF image.

    System.Drawing.Rectangle ContentArea

    Optionally specifies a region of each frame to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    AddPdf(Byte[], String, Nullable<Rectangle>, Nullable<Int32>)

    Adds all pages of a PDF document to this OcrInput.

    Declaration
    public void AddPdf(byte[] PdfBytes, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    Binary data of a PDF file

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdf(Stream, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds all pages of a PDF document to this OcrInput.

    Declaration
    public void AddPdf(Stream PdfStream, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    System.IO.Stream containing a PDF

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdf(String, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds all pages of a PDF document to this OcrInput.

    Declaration
    public void AddPdf(string PdfPath, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.String PdfPath

    String file path to the PDF

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdfPage(Byte[], Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds one page of a PDF document to this OcrInput.

    Declaration
    public void AddPdfPage(byte[] PdfBytes, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    Binary data of a PDF file

    System.Int32 Page

    The page number within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdfPage(Stream, Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds one page of a PDF document to this OcrInput.

    Declaration
    public void AddPdfPage(Stream PdfStream, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    System.IO.Stream containing a PDF

    System.Int32 Page

    The page number within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdfPage(String, Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds one page of a PDF document to this OcrInput.

    Declaration
    public void AddPdfPage(string PdfPath, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.String PdfPath

    String file path to the PDF

    System.Int32 Page

    The page number within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdfPages(Byte[], IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds selected pages of a PDF document to this OcrInput.

    Declaration
    public void AddPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    Binary data of a PDF file

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The page numbers within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.
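    Page numbers passed to AddPdfPage and AddPdfPages are zero-based, while people usually quote printed, one-based page numbers. The hypothetical helper below (ToZeroBasedPages is not an IronOcr API) converts a range string such as "1-3, 5" into the zero-based list that the Pages parameter expects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an IronOcr API): converts printed, one-based page
// ranges such as "1-3, 5" into the zero-based page numbers AddPdfPages expects.
static List<int> ToZeroBasedPages(string ranges)
{
    var result = new List<int>();
    var parts = ranges.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    foreach (string part in parts)
    {
        string[] bounds = part.Split('-');
        int first = int.Parse(bounds[0]);
        int last = bounds.Length > 1 ? int.Parse(bounds[1]) : first;
        if (first < 1 || last < first)
            throw new ArgumentException($"Invalid page range '{part}'.");
        for (int page = first; page <= last; page++)
            result.Add(page - 1);   // printed page 1 is page number 0
    }
    return result;
}

List<int> zeroBased = ToZeroBasedPages("1-3, 5");
Console.WriteLine(string.Join(", ", zeroBased)); // 0, 1, 2, 4
// zeroBased can then be supplied as the Pages argument of AddPdfPages.
```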

    AddPdfPages(Stream, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds selected pages of a PDF document to this OcrInput.

    Declaration
    public void AddPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    System.IO.Stream containing a PDF

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The page numbers within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddPdfPages(String, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Adds selected pages from a PDF document into this OcrInput.

    Declaration
    public void AddPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI)
    Parameters
    Type Name Description
    System.String PdfPath

    String file path to the PDF

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The page numbers within the PDF to read. Zero-based (the first page is number 0).

    System.String Password

    Optional Password to unlock an encrypted or protected PDF

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of each page to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution at which to sample the PDF. If null or zero, TargetDPI is used.

    AddRange(OcrInput)

    Combines two instances of OcrInput, appending the pages of Range to the end of this OcrInput document.

    Declaration
    public void AddRange(OcrInput Range)
    Parameters
    Type Name Description
    OcrInput Range

    An OcrInput to be appended to this instance.

    AddRange(IEnumerable<Object>)

    Adds an IEnumerable of images to this OcrInput.

    Supports file paths, System.Drawing.Image, Bitmap, Stream and byte[] items.

    Declaration
    public void AddRange(IEnumerable<object> Images)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Object> Images

    An IEnumerable of objects representing images. Supports file paths, System.Drawing.Image, Bitmap, Stream and byte[] items.

    AddRange(IEnumerable<Object>, Rectangle)

    Adds an IEnumerable of images to this OcrInput.

    Supports file paths, System.Drawing.Image, Bitmap, Stream and byte[] items.

    Declaration
    public void AddRange(IEnumerable<object> Images, Rectangle ContentArea)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Object> Images

    An IEnumerable of objects representing images. Supports file paths, System.Drawing.Image, Bitmap, Stream and byte[] items.

    System.Drawing.Rectangle ContentArea

    Specifies a region of each image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    Binarize()

    This image filter turns every pixel black or white, with no middle ground. It may improve OCR performance in cases of very low contrast between text and background.

    Declaration
    public OcrInput Binarize()
    Returns
    Type Description
    OcrInput

    This OcrInput object, allowing LINQ-style fluent notation.
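    To make the idea concrete, the sketch below binarizes a row of grayscale values using the mean intensity as a single global threshold. This is an assumed, simplified technique chosen for illustration; the reference does not say which thresholding method Binarize() uses internally.

```csharp
using System;
using System.Linq;

// Illustration only: every grayscale pixel becomes pure black (0) or pure
// white (255), using the mean intensity as a global threshold.
static byte[] Binarize(byte[] gray)
{
    double threshold = gray.Average(p => (double)p);
    return gray.Select(p => p > threshold ? (byte)255 : (byte)0).ToArray();
}

// Faint grey text (values near 110) on a light grey background (near 140).
byte[] lowContrast = { 140, 138, 110, 112, 141, 109, 139 };
Console.WriteLine(string.Join(" ", Binarize(lowContrast))); // 255 255 0 0 255 0 255
```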

    Contrast()

    Increases contrast automatically. This filter often improves OCR speed and accuracy on low-contrast scans. Flattens alpha channels to white.

    Declaration
    public OcrInput Contrast()
    Returns
    Type Description
    OcrInput

    This OcrInput object, allowing LINQ-style fluent notation.
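    One common way to increase contrast automatically is a min-max stretch, which maps the darkest pixel to 0, the lightest to 255, and spreads the values in between linearly. The sketch below shows that technique on raw grayscale bytes; it is an illustration only, since the reference does not document the algorithm behind Contrast().

```csharp
using System;
using System.Linq;

// Min-max contrast stretch over grayscale bytes (illustrative technique,
// not necessarily what Contrast() does internally).
static byte[] StretchContrast(byte[] gray)
{
    byte min = gray.Min(), max = gray.Max();
    if (max == min) return (byte[])gray.Clone();   // flat image: nothing to stretch
    return gray.Select(p => (byte)Math.Round((p - min) * 255.0 / (max - min))).ToArray();
}

byte[] dull = { 100, 110, 120, 150 };
Console.WriteLine(string.Join(" ", StretchContrast(dull))); // 0 51 102 255
```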

    DeepCleanBackgroundNoise()

    Heavy background noise removal. Only use this filter when extreme background noise is known to be present: it risks reducing OCR accuracy on clean documents and is very CPU-intensive.

    Declaration
    public OcrInput DeepCleanBackgroundNoise()
    Returns
    Type Description
    OcrInput

    This OcrInput object, allowing LINQ-style fluent notation.

    DeNoise()

    Removes digital noise. This filter should only be used where noise is expected. Flattens alpha channels to white.

    Declaration
    public OcrInput DeNoise()
    Returns
    Type Description
    OcrInput

    This OcrInput object, allowing LINQ-style fluent notation.
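    A median filter is a classic way to remove isolated noise pixels while keeping strokes that span several pixels. The sketch below applies a three-pixel median to one row of grayscale values; it demonstrates the general technique only and is not a description of how DeNoise() is implemented.

```csharp
using System;

// Three-pixel median filter over one row of grayscale values. Each interior
// pixel is replaced by the median of itself and its two neighbours; the two
// edge pixels are copied unchanged.
static byte[] Median3(byte[] row)
{
    var result = (byte[])row.Clone();
    for (int i = 1; i < row.Length - 1; i++)
    {
        byte a = row[i - 1], b = row[i], c = row[i + 1];
        // max(min(a, b), min(max(a, b), c)) is the median of three values.
        result[i] = Math.Max(Math.Min(a, b), Math.Min(Math.Max(a, b), c));
    }
    return result;
}

// White paper (255) with one isolated black speck and a 3-pixel dark stroke (20).
byte[] noisy = { 255, 0, 255, 255, 20, 20, 20, 255 };
Console.WriteLine(string.Join(" ", Median3(noisy))); // 255 255 255 255 20 20 20 255
```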

    Deskew(Int32, Boolean)

    Rotates an image so that it is the right way up and orthogonal. This is very useful for OCR because Tesseract's tolerance for skewed scans can be as low as 5 degrees.

    Declaration
    public OcrInput Deskew(int MaxAngle = 85, bool EveryPageTheSameAmount = false)
    Parameters
    Type Name Description
    System.Int32 MaxAngle

    Maximum expected skew angle, in degrees.

    System.Boolean EveryPageTheSameAmount

    Set to true when every page of the OcrInput is known to be skewed by the same angle. This improves speed.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
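To make the skew angle concrete, the sketch estimates it from two points on a text baseline and derives the counter-rotation. It assumes image coordinates with y growing downwards, so a positive angle is a clockwise skew and the correction is its negative (the same sign convention Rotate(Double) describes). Treating angles beyond MaxAngle as unreliable is an assumption made for illustration; `DeskewSketch` is hypothetical, and practical deskewing usually relies on projection profiles or a Hough transform rather than two points.

```csharp
using System;

// Illustrative sketch: estimating skew from two points on a text baseline.
// IronOcr's deskew method is not documented here.
public static class DeskewSketch
{
    // Image coordinates: x grows rightwards, y grows downwards.
    // A positive result means the baseline slopes down to the right (clockwise skew).
    public static double SkewDegrees(double x1, double y1, double x2, double y2)
        => Math.Atan2(y2 - y1, x2 - x1) * 180.0 / Math.PI;

    // Angle for a clockwise-positive rotation that undoes the skew,
    // or 0 when the estimate exceeds maxAngle (treated here as unreliable).
    public static double CorrectionDegrees(double skew, int maxAngle = 85)
        => Math.Abs(skew) <= maxAngle ? -skew : 0.0;
}
```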

    Dilate(Boolean)

    Advanced Morphology. Opposite of Erode(Boolean).

    Declaration
    public OcrInput Dilate(bool use3x3 = false)
    Parameters
    Type Name Description
    System.Boolean use3x3

    If true, uses 3x3 morphology; otherwise the default 2x2 morphology is used.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
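A minimal sketch of binary dilation, assuming `true` marks a foreground pixel and `size` selects a 2x2 (default) or 3x3 structuring element, mirroring the `use3x3` flag. Which pixels IronOcr treats as foreground, and where it anchors the 2x2 element, is not documented here; the `DilateSketch` class and the top-left anchoring are assumptions.

```csharp
using System;

// Illustrative sketch of binary dilation; not IronOcr's implementation.
// true = foreground pixel. size is 2 (a 2x2 element) or 3 (a 3x3 element).
public static class DilateSketch
{
    public static bool[,] Dilate(bool[,] img, int size = 2)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        int off = (size - 1) / 2; // 3x3 is centred; 2x2 is anchored at its top-left cell
        var result = new bool[h, w];
        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            bool any = false;
            for (int dy = 0; dy < size && !any; dy++)
            for (int dx = 0; dx < size && !any; dx++)
            {
                int yy = y - off + dy, xx = x - off + dx;
                // A pixel turns on if any in-bounds pixel under the element is on.
                if (yy >= 0 && yy < h && xx >= 0 && xx < w && img[yy, xx]) any = true;
            }
            result[y, x] = any;
        }
        return result;
    }
}
```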

    Dispose()

    OcrInput implements IDisposable. To avoid memory leaks, dispose of each instance when you are finished with it, or create it in a "using" statement.

    Declaration
    public void Dispose()

    EnhanceResolution(Int32)

    Enhances the resolution of low-quality images. This filter is rarely needed because the MinimumDPI and TargetDPI settings automatically detect and correct low-resolution inputs.

    May not work on images whose metadata is corrupted.

    Declaration
    public OcrInput EnhanceResolution(int TargetDPI = 225)
    Parameters
    Type Name Description
    System.Int32 TargetDPI

    The target DPI to resample to.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.

    EnhanceResolution(Int32, Int32)

    Enhances the resolution of low-quality images. This filter is rarely needed because the MinimumDPI and TargetDPI settings automatically detect and correct low-resolution inputs.

    May not work on images whose metadata is corrupted.

    Declaration
    public OcrInput EnhanceResolution(int TargetDPI, int MinimumDPI)
    Parameters
    Type Name Description
    System.Int32 TargetDPI

    The target DPI to resample to.

    System.Int32 MinimumDPI

    Only resamples images below this DPI threshold.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
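The TargetDPI / MinimumDPI rule can be expressed as simple arithmetic: resample by TargetDPI / currentDPI only when the current DPI is below MinimumDPI. This is a sketch of that rule, assuming resampling scales both axes uniformly; `ResolutionSketch` is hypothetical and not IronOcr source code.

```csharp
using System;

// Illustrative sketch of the TargetDPI / MinimumDPI rule (an assumption about the
// arithmetic, not IronOcr source code).
public static class ResolutionSketch
{
    // Returns the resampling factor: 1.0 when the image already meets minimumDpi.
    public static double ResampleFactor(int currentDpi, int targetDpi, int minimumDpi)
    {
        if (currentDpi <= 0) throw new ArgumentOutOfRangeException(nameof(currentDpi));
        return currentDpi < minimumDpi ? (double)targetDpi / currentDpi : 1.0;
    }

    // New pixel dimensions after resampling both axes by the same factor.
    public static (int Width, int Height) Resampled(int width, int height, double factor)
        => ((int)Math.Round(width * factor), (int)Math.Round(height * factor));
}
```

For example, a 150 DPI scan with MinimumDPI = 200 and TargetDPI = 300 is resampled by a factor of 2, while a 300 DPI scan is left unchanged.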

    Erode(Boolean)

    Advanced Morphology. Opposite of Dilate(Boolean).

    Declaration
    public OcrInput Erode(bool use3x3 = false)
    Parameters
    Type Name Description
    System.Boolean use3x3

    If true, uses 3x3 morphology; otherwise the default 2x2 morphology is used.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
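Erosion is the dual of dilation: a pixel stays on only if every in-bounds pixel under the structuring element is on. This sketch assumes `true` marks foreground, the 2x2 element is anchored at its top-left cell and the 3x3 element is centred; `ErodeSketch` is a hypothetical name and the details of IronOcr's Erode(Boolean) are not documented here.

```csharp
using System;

// Illustrative sketch of binary erosion; not IronOcr's implementation.
// true = foreground pixel. size is 2 (a 2x2 element) or 3 (a 3x3 element).
public static class ErodeSketch
{
    public static bool[,] Erode(bool[,] img, int size = 2)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        int off = (size - 1) / 2; // 3x3 is centred; 2x2 is anchored at its top-left cell
        var result = new bool[h, w];
        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            bool all = true;
            for (int dy = 0; dy < size && all; dy++)
            for (int dx = 0; dx < size && all; dx++)
            {
                int yy = y - off + dy, xx = x - off + dx;
                // A single in-bounds background pixel under the element switches this pixel off.
                if (yy >= 0 && yy < h && xx >= 0 && xx < w && !img[yy, xx]) all = false;
            }
            result[y, x] = all;
        }
        return result;
    }
}
```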

    Finalize()

    OcrInput has a safe finaliser that cleans up undisposed native images in memory.

    Declaration
    protected void Finalize()
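Dispose() and Finalize() together follow the standard .NET dispose pattern: Dispose() releases native memory deterministically, and the finalizer (written `~ClassName()` in C#) is a safety net for instances that were never disposed. The sketch shows that pattern on a hypothetical `NativeBufferHolder` class; it is not OcrInput's actual implementation.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative sketch of the dispose pattern with a finalizer safety net.
// NativeBufferHolder is hypothetical and not part of IronOcr.
public sealed class NativeBufferHolder : IDisposable
{
    private IntPtr _buffer;
    public bool IsDisposed { get; private set; }

    public NativeBufferHolder(int bytes) => _buffer = Marshal.AllocHGlobal(bytes);

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // already cleaned up, so the finalizer need not run
    }

    // Finalizer: runs only if Dispose() was never called.
    ~NativeBufferHolder() => Release();

    private void Release()
    {
        if (IsDisposed) return; // safe to call more than once
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
        IsDisposed = true;
    }
}
```

For OcrInput itself the equivalent is `using (var input = new OcrInput(File.ReadAllBytes(path))) { ... }`, where `path` is a placeholder.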

    FromPdf(Byte[], String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a PDF as binary data.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each page of the PDF becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdf(byte[] PdfBytes, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    The PDF document as binary data in memory.

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdf(Stream, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a PDF as a Stream.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each page of the PDF becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdf(Stream PdfStream, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    The PDF document as a System.IO.Stream.

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdf(String, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a PDF.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each page of the PDF becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdf(string PdfPath, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.String PdfPath

    File path to the PDF

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput
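Because ContentArea is given in pixels while PDF pages are measured in points (1/72 inch), the DPI determines which pixel rectangle corresponds to a region of the page. The sketch assumes that ContentArea refers to the page as rasterized at the effective DPI and that coordinates are measured from the page's top-left corner (PDF user space itself starts at the bottom-left). `PdfDpiSketch` is hypothetical; the null-or-zero fallback to TargetDPI follows the parameter description above, and treating negative values the same way is an extra assumption.

```csharp
using System;
using System.Drawing;

// Illustrative sketch; PdfDpiSketch is hypothetical and not part of IronOcr.
public static class PdfDpiSketch
{
    // A null or non-positive DPI falls back to the configured target DPI.
    public static int EffectiveDpi(int? dpi, int targetDpi)
        => dpi.HasValue && dpi.Value > 0 ? dpi.Value : targetDpi;

    // PDF sizes are in points: 72 points = 1 inch.
    public static int PointsToPixels(double points, int dpi)
        => (int)Math.Round(points * dpi / 72.0);

    // Converts a region in points, measured from the top-left corner (assumed),
    // into a pixel Rectangle at the given DPI.
    public static Rectangle RegionToPixels(double x, double y, double width, double height, int dpi)
        => new Rectangle(PointsToPixels(x, dpi), PointsToPixels(y, dpi),
                         PointsToPixels(width, dpi), PointsToPixels(height, dpi));
}
```

For example, a US Letter page (612 x 792 points) sampled at 300 DPI is 2550 x 3300 pixels, and a 3 x 2 inch region starting one inch from the top-left corner becomes Rectangle(300, 300, 900, 600).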

    FromPdfPage(Byte[], Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a single page from a PDF as binary data.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    The selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPage(byte[] PdfBytes, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    The PDF document as binary data in memory.

    System.Int32 Page

    The page number within the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdfPage(Stream, Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a single page from a PDF as a Stream.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    The selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPage(Stream PdfStream, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    The PDF document as a System.IO.Stream.

    System.Int32 Page

    The page number within the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdfPage(String, Int32, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with a single page of a PDF.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    The selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPage(string PdfPath, int Page, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.String PdfPath

    File path to the PDF

    System.Int32 Page

    Which page of the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdfPages(Byte[], IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with multiple pages from a PDF as binary data.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPages(byte[] PdfBytes, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.Byte[] PdfBytes

    The PDF document as binary data in memory.

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The pages of the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdfPages(Stream, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with multiple selected pages from a PDF as a Stream.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPages(Stream PdfStream, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.IO.Stream PdfStream

    The PDF document as a System.IO.Stream.

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The pages of the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput

    FromPdfPages(String, IEnumerable<Int32>, String, Nullable<Rectangle>, Nullable<Int32>)

    Create a new OcrInput object populated with multiple pages from a PDF.

    The returned OcrInput is IDisposable and is best created within a 'using' statement.

    Each selected PDF page becomes an OcrInput.Page.

    Declaration
    public static OcrInput FromPdfPages(string PdfPath, IEnumerable<int> Pages, string Password = null, Rectangle? ContentArea = default(Rectangle?), int? DPI = default(int?))
    Parameters
    Type Name Description
    System.String PdfPath

    File path to the PDF

    System.Collections.Generic.IEnumerable<System.Int32> Pages

    The pages of the PDF to read. Zero-based (the first page is 0).

    System.String Password

    Optional password to unlock an encrypted or protected PDF.

    System.Nullable<System.Drawing.Rectangle> ContentArea

    Specifies a region of the image to extract text from as a System.Drawing.Rectangle with X, Y, Width and Height in pixels. Setting a ContentArea can improve OCR speed.

    System.Nullable<System.Int32> DPI

    Resolution, in DPI, at which to sample the PDF. If null or zero, TargetDPI is used.

    Returns
    Type Description
    OcrInput
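The Pages parameters above are zero-based, while people usually quote page numbers starting at 1. This hypothetical helper (not part of IronOcr) converts a 1-based range string such as "1-3,5" into sorted, de-duplicated zero-based indices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of IronOcr: converts 1-based page ranges
// such as "1-3,5" into the zero-based indices that the Pages parameter expects.
public static class PageRangeSketch
{
    public static IEnumerable<int> ToZeroBased(string ranges)
    {
        var pages = new SortedSet<int>(); // removes duplicates, keeps ascending order
        foreach (var part in ranges.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var bounds = part.Split('-');
            if (bounds.Length > 2) throw new FormatException($"Invalid page range: '{part}'");
            int first = int.Parse(bounds[0].Trim());
            int last = bounds.Length == 2 ? int.Parse(bounds[1].Trim()) : first;
            if (first < 1 || last < first) throw new FormatException($"Invalid page range: '{part}'");
            for (int p = first; p <= last; p++) pages.Add(p - 1); // 1-based -> 0-based
        }
        return pages;
    }
}
```

The result could then be passed as Pages, for example `OcrInput.FromPdfPages("report.pdf", PageRangeSketch.ToZeroBased("1-3,5"))`, where `report.pdf` is a placeholder path.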

    Invert(Boolean)

    Inverts every color; for example, white becomes black and black becomes white.

    Declaration
    public OcrInput Invert(bool GrayScale = true)
    Parameters
    Type Name Description
    System.Boolean GrayScale

    If true (the default), removes all color channels and returns a grayscale image.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
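A per-pixel sketch of inversion with the GrayScale option. It assumes 8-bit RGB channels and uses ITU-R BT.601 luma weights for the grayscale conversion; the weights and the `InvertSketch` class are assumptions, not IronOcr's documented behaviour.

```csharp
using System;

// Illustrative sketch; the BT.601 luma weights are an assumption,
// not IronOcr's documented formula.
public static class InvertSketch
{
    public static (byte R, byte G, byte B) Invert(byte r, byte g, byte b, bool grayScale = true)
    {
        if (grayScale)
        {
            // Collapse to a single gray level, then invert it.
            byte luma = (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);
            byte inv = (byte)(255 - luma);
            return (inv, inv, inv);
        }
        // Invert each channel independently: 0 <-> 255.
        return ((byte)(255 - r), (byte)(255 - g), (byte)(255 - b));
    }
}
```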

    PageCount()

    The number of OcrInput.Pages currently present in this OcrInput

    Declaration
    public int PageCount()
    Returns
    Type Description
    System.Int32

    Rotate(Double)

    Rotates images clockwise by the given number of degrees. Use negative values for anticlockwise rotation. See also Deskew(Int32, Boolean).

    Declaration
    public OcrInput Rotate(double Degrees)
    Parameters
    Type Name Description
    System.Double Degrees

    A number of clockwise degrees.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
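The geometry behind rotation, as a sketch: turning a point about the origin, and the size of the axis-aligned box that would hold a rotated page. It assumes image coordinates (y grows downwards), in which the standard rotation formulas turn points clockwise on screen for positive angles. Whether Rotate(Double) enlarges the canvas is not documented here; `RotateSketch` is hypothetical.

```csharp
using System;

// Illustrative geometry sketch, assuming image coordinates (y grows downwards),
// where positive angles turn points clockwise on screen.
public static class RotateSketch
{
    public static (double X, double Y) RotatePoint(double x, double y, double degrees)
    {
        double t = degrees * Math.PI / 180.0;
        return (x * Math.Cos(t) - y * Math.Sin(t), x * Math.Sin(t) + y * Math.Cos(t));
    }

    // Size of the axis-aligned box needed to hold a w x h image after rotation.
    public static (int Width, int Height) RotatedBounds(int w, int h, double degrees)
    {
        double t = degrees * Math.PI / 180.0;
        double c = Math.Abs(Math.Cos(t)), s = Math.Abs(Math.Sin(t));
        // Rounding to 6 decimals first stops floating-point noise (e.g. cos 90 degrees) adding a pixel.
        return ((int)Math.Ceiling(Math.Round(w * c + h * s, 6)),
                (int)Math.Ceiling(Math.Round(w * s + h * c, 6)));
    }
}
```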

    Scale(Int32, Boolean)

    Scales OcrInput pages proportionally.

    Declaration
    public OcrInput Scale(int Percentage, bool ScaleCropArea = true)
    Parameters
    Type Name Description
    System.Int32 Percentage

    The percentage scale. 100 = no effect.

    System.Boolean ScaleCropArea

    Whether associated crop areas should also be scaled proportionally (recommended: true).

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.

    Scale(Int32, Int32, Boolean)

    Scales the OcrInput pages up in size.

    Declaration
    public OcrInput Scale(int MaxWidth, int MaxHeight, bool ScaleCropArea = true)
    Parameters
    Type Name Description
    System.Int32 MaxWidth

    Maximum width in pixels.

    System.Int32 MaxHeight

    Maximum height in pixels.

    System.Boolean ScaleCropArea

    Whether associated crop areas should also be scaled proportionally (recommended: true).

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
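A sketch of the arithmetic behind both Scale overloads and the ScaleCropArea flag. It assumes Scale(MaxWidth, MaxHeight) preserves aspect ratio and fits each page within both limits, which this reference does not state explicitly; `ScaleSketch` is a hypothetical class.

```csharp
using System;
using System.Drawing;

// Illustrative sketch. Assumption: the MaxWidth/MaxHeight overload keeps the aspect
// ratio and fits the page within both limits.
public static class ScaleSketch
{
    // Percentage scaling: 100 leaves the size unchanged.
    public static Size ByPercentage(Size page, int percentage)
        => new Size((int)Math.Round(page.Width * percentage / 100.0),
                    (int)Math.Round(page.Height * percentage / 100.0));

    // Largest uniform factor that keeps the page within maxWidth x maxHeight.
    public static double FitFactor(Size page, int maxWidth, int maxHeight)
        => Math.Min((double)maxWidth / page.Width, (double)maxHeight / page.Height);

    // With ScaleCropArea = true, a crop rectangle is scaled by the same factor as its page.
    public static Rectangle ScaleArea(Rectangle area, double factor)
        => new Rectangle((int)Math.Round(area.X * factor), (int)Math.Round(area.Y * factor),
                         (int)Math.Round(area.Width * factor), (int)Math.Round(area.Height * factor));
}
```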

    Sharpen()

    Sharpens blurred documents. Flattens alpha channels to white.

    Declaration
    public OcrInput Sharpen()
    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
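Sharpening is commonly done by convolving with a kernel that boosts each pixel relative to its neighbours. The sketch uses the classic 3x3 kernel (centre 5, edge neighbours -1) on an 8-bit grayscale `byte[,]`, clamping results to 0-255; the kernel and the `SharpenSketch` class are assumptions, since the kernel Sharpen() uses is not documented.

```csharp
using System;

// Illustrative sketch: convolution with a common 3x3 sharpening kernel
//  0 -1  0
// -1  5 -1
//  0 -1  0
// IronOcr's Sharpen() kernel is not documented; this one is an assumption.
public static class SharpenSketch
{
    public static byte[,] Sharpen(byte[,] img)
    {
        int h = img.GetLength(0), w = img.GetLength(1);
        var result = new byte[h, w];

        // Out-of-range coordinates reuse the nearest edge pixel (clamp-to-edge).
        int Px(int yy, int xx) => img[Math.Clamp(yy, 0, h - 1), Math.Clamp(xx, 0, w - 1)];

        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            int v = 5 * Px(y, x) - Px(y - 1, x) - Px(y + 1, x) - Px(y, x - 1) - Px(y, x + 1);
            result[y, x] = (byte)Math.Clamp(v, 0, 255);
        }
        return result;
    }
}
```

On a uniform region the kernel weights sum to 1, so flat areas are unchanged and only edges are emphasised.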

    ToGrayScale()

    This image filter turns every pixel into a shade of gray. Unlikely to improve OCR accuracy, but may improve speed.

    Declaration
    public OcrInput ToGrayScale()
    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.
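A sketch of the conversion, assuming packed 8-bit RGB input and ITU-R BT.601 luma weights (IronOcr's exact weights are not documented). It also hints at why speed may improve: the output holds one byte per pixel instead of three. `GrayScaleSketch` is a hypothetical name.

```csharp
using System;

// Illustrative sketch using BT.601 luma weights (an assumption).
public static class GrayScaleSketch
{
    // Input: packed R, G, B bytes per pixel. Output: one gray byte per pixel.
    public static byte[] ToGrayScale(byte[] rgb)
    {
        if (rgb.Length % 3 != 0) throw new ArgumentException("Expected packed RGB triples.", nameof(rgb));
        var gray = new byte[rgb.Length / 3];
        for (int i = 0; i < gray.Length; i++)
        {
            int r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
            gray[i] = (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);
        }
        return gray;
    }
}
```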

    WithTitle(String)

    Adds a title to the OcrInput document. This title is used when calling SaveAsHocrFile(String) and SaveAsSearchablePdf(String).

    Declaration
    public OcrInput WithTitle(string Title)
    Parameters
    Type Name Description
    System.String Title

    The document title as a string.

    Returns
    Type Description
    OcrInput

    This OcrInput object allowing for LINQ style fluent notation.

    Implements

    System.IDisposable