    Class OcrResult

    A full document object model (DOM) for results when IronTesseract reads an image or OcrInput.

    Gives granular access to Text, Pages, Paragraphs, Lines, Words, Characters, Images, Barcodes, coordinates, and font information.

    Inheritance
    System.Object
    OcrResult
    Implements
    IronSoftware.IOcrResult
    IronSoftware.Abstractions.IDocumentPageContainer<OcrResultPagesCollection>
    IronSoftware.Abstractions.IDocumentWithExtractableText
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public class OcrResult : Object, IOcrResult, IDocumentPageContainer<OcrResultPagesCollection>, IDocumentWithExtractableText

    Properties

    Barcodes

    Represents every barcode discovered in this OCR document. ReadBarCodes must be set to true for this feature to be active.

    Declaration
    public OcrResult.Barcode[] Barcodes { get; }
    Property Value
    Type Description
    OcrResult.Barcode[]

    Blocks

    Represents every block of text discovered in this OCR document in order of appearance. A Block is a collection of 1 or more paragraphs located closely together.

    Declaration
    public OcrResult.Block[] Blocks { get; }
    Property Value
    Type Description
    OcrResult.Block[]

    Cancelled

    Indicates that the OCR read was cancelled, either by the user or after a timeout.

    Declaration
    public bool Cancelled { get; }
    Property Value
    Type Description
    System.Boolean

    Characters

    Represents every symbol (char) discovered in this OCR document in order of appearance.

    Declaration
    public OcrResult.Character[] Characters { get; }
    Property Value
    Type Description
    OcrResult.Character[]

    Confidence

    Statistical confidence in the accuracy of the OCR, averaged over every character.

    The value ranges from 0 (0%) to 1 (100%).

    Declaration
    public double Confidence { get; }
    Property Value
    Type Description
    System.Double
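
    The Cancelled and Confidence properties can be combined into a simple acceptance check. The following is a minimal sketch, not part of the library: the helper name IsReliable and the 0.8 default threshold are assumptions chosen for illustration, and only the two documented properties are used.

```csharp
using System;
using IronOcr;

static class OcrResultChecks
{
    // Hypothetical helper: returns true when the read was not cancelled and
    // the average character confidence (0..1) meets the caller's threshold.
    public static bool IsReliable(OcrResult result, double minConfidence = 0.8)
    {
        if (result == null) throw new ArgumentNullException(nameof(result));

        // A cancelled read may hold partial output, so reject it outright.
        if (result.Cancelled) return false;

        // Confidence is documented as 1 = 100%, 0 = 0%.
        return result.Confidence >= minConfidence;
    }
}
```

    A suitable threshold depends on the input quality and on how costly a misread is, so treat 0.8 as a placeholder.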

    EngineModeUsed

    The TesseractEngineMode used to generate this OcrResult.

    Declaration
    public TesseractEngineMode EngineModeUsed { get; }
    Property Value
    Type Description
    TesseractEngineMode
    See Also
    TesseractEngineMode

    Lines

    Represents every line of text discovered in this OCR document in order of appearance.

    Declaration
    public OcrResult.Line[] Lines { get; }
    Property Value
    Type Description
    OcrResult.Line[]

    PageCount

    Declaration
    public int PageCount { get; }
    Property Value
    Type Description
    System.Int32

    Pages

    Represents every page within this OcrResult object.

    Declaration
    public OcrResultPagesCollection Pages { get; }
    Property Value
    Type Description
    OcrResultPagesCollection

    Paragraphs

    Represents every paragraph of text discovered in this OCR document in order of appearance.

    Declaration
    public OcrResult.Paragraph[] Paragraphs { get; }
    Property Value
    Type Description
    OcrResult.Paragraph[]

    Tables

    Represents every table that can be clearly identified in this OCR document. Table reading is off unless IronTesseract's Configuration.ReadDataTables is set to true:

     var Ocr = new IronTesseract();
     Ocr.Configuration.ReadDataTables = true;

    Declaration
    public OcrResult.Table[] Tables { get; }
    Property Value
    Type Description
    OcrResult.Table[]

    TesseractVersion

    The TesseractVersion used to generate this OcrResult.

    Declaration
    public string TesseractVersion { get; }
    Property Value
    Type Description
    System.String
    See Also
    TesseractVersion

    Text

    Returns the entire text content of this OCR document. Pages are separated by four System.Environment.NewLine sequences. The text is truncated when the product is unlicensed.

    Declaration
    public string Text { get; set; }
    Property Value
    Type Description
    System.String
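
    Because pages are separated by four Environment.NewLine sequences, the combined text can be split back into per-page strings with plain .NET string handling. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes no page itself contains four consecutive line breaks. When the OcrResult object is available, the Pages property is the more direct route.

```csharp
using System;
using System.Linq;

static class OcrTextSplitter
{
    // Hypothetical helper: splits the combined Text value into one string
    // per page, using four consecutive Environment.NewLine sequences as
    // the page separator described above.
    public static string[] SplitPages(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        string separator = string.Concat(Enumerable.Repeat(Environment.NewLine, 4));
        return text.Split(new[] { separator }, StringSplitOptions.None);
    }
}
```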

    Words

    Represents every word discovered in this OCR document in order of appearance.

    Declaration
    public OcrResult.Word[] Words { get; }
    Property Value
    Type Description
    OcrResult.Word[]

    Methods

    ExtractTextFromPage(Int32)

    Declaration
    public string ExtractTextFromPage(int PageIndex)
    Parameters
    Type Name Description
    System.Int32 PageIndex
    Returns
    Type Description
    System.String

    ExtractTextFromPages(IEnumerable<Int32>)

    Declaration
    public string ExtractTextFromPages(IEnumerable<int> PageIndices)
    Parameters
    Type Name Description
    System.Collections.Generic.IEnumerable<System.Int32> PageIndices
    Returns
    Type Description
    System.String

    FromJson(String)

    Deserializes the JSON to the OcrResult object.

    Declaration
    public static OcrResult FromJson(string json)
    Parameters
    Type Name Description
    System.String json

    A JSON string representation of the OcrResult.

    Returns
    Type Description
    OcrResult

    The deserialized OcrResult object from the JSON string.

    Exceptions
    Type Condition
    System.ArgumentNullException

    json is null

    FromJsonFile(String)

    Deserializes the JSON file to the OcrResult object.

    Declaration
    public OcrResult FromJsonFile(string Path)
    Parameters
    Type Name Description
    System.String Path
    Returns
    Type Description
    OcrResult
    Remarks

    Use the method FromJson(String) instead if you want to deserialize from a JSON string.

    SaveAsHocrFile(String)

    Exports an hOCR version of the Tesseract results object document. This is an XHTML file which can be read as XML or HTML.

    https://en.wikipedia.org/wiki/HOCR

    Declaration
    public void SaveAsHocrFile(string Path)
    Parameters
    Type Name Description
    System.String Path

    The file path the xhtml file will be saved to.

    SaveAsHocrString()

    Exports an hOCR version of the Tesseract results object document as a string. This is an XHTML file which can be read as XML or HTML.

    https://en.wikipedia.org/wiki/HOCR

    Declaration
    public string SaveAsHocrString()
    Returns
    Type Description
    System.String

    SaveAsHtmlDocument(String, String, Int32, Boolean)

    Converts the OcrResult into an HTML document and saves it to a file, e.g. example.html.

    Declaration
    public void SaveAsHtmlDocument(string path, string title, int pdfPageMargin = 10, bool fullContentWidth = false)
    Parameters
    Type Name Description
    System.String path

    File path to save to

    System.String title

    Title for the HTML document

    System.Int32 pdfPageMargin

    Margin to use for PDF page

    System.Boolean fullContentWidth

    Optionally use full content width in the HTML

    Remarks

    IronTesseract's Configuration.RenderSearchablePdf flag must be set to true.

    SaveAsHtmlString(String, Int32, Boolean)

    Converts the OcrResult into an HTML string

    Declaration
    public string SaveAsHtmlString(string title, int pdfPageMargin, bool fullContentWidth)
    Parameters
    Type Name Description
    System.String title

    Title for the HTML document

    System.Int32 pdfPageMargin

    Margin to use for PDF page

    System.Boolean fullContentWidth

    Optionally use full content width in the HTML

    Returns
    Type Description
    System.String
    Remarks

    IronTesseract's Configuration.RenderSearchablePdf flag must be set to true.

    SaveAsSearchablePdf(String, Boolean)

    Exports a searchable PDF version of the OCR input document. Works for all input formats including PDFs & Images.

    Declaration
    public byte[] SaveAsSearchablePdf(string Path = null, bool ApplyFilters = false)
    Parameters
    Type Name Description
    System.String Path

    The file path the PDF will be saved to.

    System.Boolean ApplyFilters

    Determines whether OcrFilters are applied to the output searchable PDF; the default is false.

    Returns
    Type Description
    System.Byte[]

    SaveAsSearchablePdfBytes(Boolean)

    Exports a searchable PDF version of the OCR input document as a byte array. Works for all input formats including PDFs & Images.

    Declaration
    public byte[] SaveAsSearchablePdfBytes(bool ApplyFilters = false)
    Parameters
    Type Name Description
    System.Boolean ApplyFilters

    Determines whether OcrFilters are applied to the output searchable PDF; the default is false.

    Returns
    Type Description
    System.Byte[]

    SaveAsSearchablePdfStream(Boolean)

    Exports a searchable PDF version of the OCR input document as a Stream. Works for all input formats including PDFs & Images.

    Declaration
    public Stream SaveAsSearchablePdfStream(bool ApplyFilters = false)
    Parameters
    Type Name Description
    System.Boolean ApplyFilters

    Determines whether OcrFilters are applied to the output searchable PDF; the default is false.

    Returns
    Type Description
    System.IO.Stream
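
    The searchable PDF exports differ mainly in where the bytes end up. As a rough sketch using only the signatures declared above (the class name, method names, and path parameter are hypothetical), a caller might write the PDF either directly to a path or via the byte array:

```csharp
using System.IO;
using IronOcr;

static class SearchablePdfExport
{
    // Saves directly to disk. Per the declaration, this overload also
    // returns the PDF as a byte array, which is ignored here.
    public static void SaveToPath(OcrResult result, string outputPath)
    {
        result.SaveAsSearchablePdf(outputPath);
    }

    // Gets the PDF in memory first, e.g. to upload it or store it elsewhere,
    // then writes it out with standard .NET file I/O.
    public static void SaveViaBytes(OcrResult result, string outputPath, bool applyFilters = false)
    {
        byte[] pdf = result.SaveAsSearchablePdfBytes(applyFilters);
        File.WriteAllBytes(outputPath, pdf);
    }
}
```

    SaveAsSearchablePdfStream follows the same pattern when a Stream is more convenient than a byte array.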

    SaveAsTextFile(String)

    Exports a .txt version of the Tesseract results document. This is a plain text file.

    Pages are separated by four Environment.NewLine sequences, and paragraphs by two.

    Declaration
    public void SaveAsTextFile(string Path)
    Parameters
    Type Name Description
    System.String Path

    The file path the text file will be saved to.

    SaveJsonAs(String)

    Serializes the OcrResult object to a JSON file.

    Declaration
    public void SaveJsonAs(string Path)
    Parameters
    Type Name Description
    System.String Path
    Remarks

    Use the method ToJson() instead if you want to get the string instead of saving to disk.

    ToJson()

    Serializes the OcrResult object to a JSON string.

    Declaration
    public string ToJson()
    Returns
    Type Description
    System.String

    The JSON string representation of the OcrResult.

    Remarks

    Use the method FromJson(String) for deserializing the JSON to the OcrResult object.
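
    ToJson() and FromJson(String) are meant to be used as a pair. The sketch below shows a round trip through a JSON string and a save to disk, using only the signatures declared on this page; the class and method names are hypothetical, and an existing OcrResult is assumed to be passed in.

```csharp
using IronOcr;

static class OcrResultJson
{
    // Serializes the result to a JSON string and rebuilds an OcrResult from it.
    // FromJson is declared static and throws ArgumentNullException for null input.
    public static OcrResult RoundTrip(OcrResult result)
    {
        string json = result.ToJson();
        return OcrResult.FromJson(json);
    }

    // Writes the JSON straight to a file instead of returning a string.
    public static void SaveToDisk(OcrResult result, string path)
    {
        result.SaveJsonAs(path);
    }
}
```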
