Class OcrResult
A full document object model (DOM) for results when IronTesseract reads an image or OcrInput.
Gives granular access to Text, Pages, Paragraphs, Lines, Words, Characters, Images, Barcodes, Coordinates, and Font information.
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public class OcrResult : Object, IOcrResult, IDocumentPageContainer<OcrResultPagesCollection>, IDocumentWithExtractableText
Properties
Barcodes
Represents every barcode discovered in this OCR document. To enable barcode reading, set IronTesseract's Configuration.ReadBarCodes to true.
Declaration
public OcrResult.Barcode[] Barcodes { get; }
Property Value
Type | Description |
---|---|
OcrResult.Barcode[] |
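A minimal sketch of reading barcodes alongside text. It assumes the IronOcr package is installed, that `IronTesseract.Read` accepts an image path, and that each `OcrResult.Barcode` exposes a `Value` property. The file name `invoice.png` is hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Barcode reading must be switched on before the read (see the note above).
ocr.Configuration.ReadBarCodes = true;

// "invoice.png" is a hypothetical input image.
OcrResult result = ocr.Read("invoice.png");

foreach (OcrResult.Barcode barcode in result.Barcodes)
{
    // Value is assumed to hold the decoded barcode contents.
    Console.WriteLine(barcode.Value);
}
```

If the flag is left unset, expect `Barcodes` to contain no entries even when the image holds a barcode.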
Blocks
Represents every block of text discovered in this OCR document in order of appearance. A Block is a collection of 1 or more paragraphs located closely together.
Declaration
public OcrResult.Block[] Blocks { get; }
Property Value
Type | Description |
---|---|
OcrResult.Block[] |
Cancelled
Indicates whether the OCR read was cancelled by the user or stopped after a timeout.
Declaration
public bool Cancelled { get; }
Property Value
Type | Description |
---|---|
System.Boolean |
Characters
Represents every symbol (char) discovered in this OCR document in order of appearance.
Declaration
public OcrResult.Character[] Characters { get; }
Property Value
Type | Description |
---|---|
OcrResult.Character[] |
Confidence
The statistical confidence of the OCR read, averaged over every character.
The value ranges from 0 (0%) to 1 (100%).
Declaration
public double Confidence { get; }
Property Value
Type | Description |
---|---|
System.Double |
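One common use of this value is flagging low-quality reads for manual review. The sketch below assumes the 0-to-1 scale described above and that `IronTesseract.Read` accepts an image path. The threshold and the file name `scan.png` are hypothetical choices, not recommendations from the library.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// "scan.png" is a hypothetical input image.
OcrResult result = ocr.Read("scan.png");

// Hypothetical cut-off; tune it against your own documents.
const double minimumConfidence = 0.80;

// The "P1" format specifier multiplies by 100 and appends a percent sign.
Console.WriteLine($"Confidence: {result.Confidence:P1}");

if (result.Confidence < minimumConfidence)
{
    Console.WriteLine("Low confidence: consider manual review.");
}
```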
EngineModeUsed
The TesseractEngineMode used to generate this OcrResult.
Declaration
public TesseractEngineMode EngineModeUsed { get; }
Property Value
Type | Description |
---|---|
TesseractEngineMode |
Lines
Represents every line of text discovered in this OCR document in order of appearance.
Declaration
public OcrResult.Line[] Lines { get; }
Property Value
Type | Description |
---|---|
OcrResult.Line[] |
PageCount
The number of pages in this OcrResult.
Declaration
public int PageCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Pages
Represents every page within this OcrResult object.
Declaration
public OcrResultPagesCollection Pages { get; }
Property Value
Type | Description |
---|---|
OcrResultPagesCollection |
Paragraphs
Represents every paragraph of text discovered in this OCR document in order of appearance.
Declaration
public OcrResult.Paragraph[] Paragraphs { get; }
Property Value
Type | Description |
---|---|
OcrResult.Paragraph[] |
Tables
Represents every table that can be clearly rationalized in this OCR document. To enable table reading, set IronTesseract's Configuration.ReadDataTables to true.
var Ocr = new IronTesseract();
Ocr.Configuration.ReadDataTables = true;
Declaration
public OcrResult.Table[] Tables { get; }
Property Value
Type | Description |
---|---|
OcrResult.Table[] |
TesseractVersion
The TesseractVersion used to generate this OcrResult.
Declaration
public string TesseractVersion { get; }
Property Value
Type | Description |
---|---|
System.String |
Text
Returns the entire text content of this OCR document. Pages are separated by four System.Environment.NewLine sequences. The text is truncated when the product is unlicensed.
Declaration
public string Text { get; set; }
Property Value
Type | Description |
---|---|
System.String |
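The page separator described above can be used to split the combined text back into per-page strings. The helper below is a hypothetical illustration and not part of IronOcr. It assumes pages are separated by exactly four `Environment.NewLine` sequences and that no page contains such a run itself.

```csharp
using System;

public static class OcrTextPages
{
    // Splits OcrResult.Text-style content into one string per page,
    // assuming a separator of four consecutive Environment.NewLine sequences.
    public static string[] SplitPages(string text)
    {
        string separator = string.Concat(
            Environment.NewLine, Environment.NewLine,
            Environment.NewLine, Environment.NewLine);

        // StringSplitOptions.None keeps empty pages so indices stay aligned.
        return text.Split(new[] { separator }, StringSplitOptions.None);
    }
}
```

Usage would look like `string[] pages = OcrTextPages.SplitPages(result.Text);`. When structured access matters, the Pages property is likely the more robust route, since it does not depend on a text separator.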
Words
Represents every word discovered in this OCR document in order of appearance.
Declaration
public OcrResult.Word[] Words { get; }
Property Value
Type | Description |
---|---|
OcrResult.Word[] |
Methods
ExtractTextFromPage(Int32)
Declaration
public string ExtractTextFromPage(int PageIndex)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | PageIndex |
Returns
Type | Description |
---|---|
System.String |
ExtractTextFromPages(IEnumerable<Int32>)
Declaration
public string ExtractTextFromPages(IEnumerable<int> PageIndices)
Parameters
Type | Name | Description |
---|---|---|
System.Collections.Generic.IEnumerable<System.Int32> | PageIndices |
Returns
Type | Description |
---|---|
System.String |
FromJson(String)
Deserializes a JSON string into an OcrResult object.
Declaration
public static OcrResult FromJson(string json)
Parameters
Type | Name | Description |
---|---|---|
System.String | json | A JSON string representation of the OcrResult. |
Returns
Type | Description |
---|---|
OcrResult | The deserialized OcrResult object from the JSON string. |
Exceptions
Type | Condition |
---|---|
System.ArgumentNullException |
FromJsonFile(String)
Deserializes the JSON file to the OcrResult object.
Declaration
public OcrResult FromJsonFile(string Path)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path |
Returns
Type | Description |
---|---|
OcrResult |
Remarks
Use the method FromJson(String) instead if you want to deserialize from a JSON string.
SaveAsHocrFile(String)
Exports this OCR result as an hOCR document. The output is an XHTML file that can be read as XML or HTML.
https://en.wikipedia.org/wiki/HOCR
Declaration
public void SaveAsHocrFile(string Path)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path | The file path the xhtml file will be saved to. |
SaveAsHocrString()
Exports this OCR result as an hOCR string. The output is XHTML that can be read as XML or HTML.
https://en.wikipedia.org/wiki/HOCR
Declaration
public string SaveAsHocrString()
Returns
Type | Description |
---|---|
System.String |
SaveAsHtmlDocument(String, String, Int32, Boolean)
Converts the OcrResult into an HTML document and saves it to a file, e.g. example.html.
Declaration
public void SaveAsHtmlDocument(string path, string title, int pdfPageMargin = 10, bool fullContentWidth = false)
Parameters
Type | Name | Description |
---|---|---|
System.String | path | File path to save to |
System.String | title | Title for the HTML document |
System.Int32 | pdfPageMargin | Margin to use for PDF page |
System.Boolean | fullContentWidth | Optionally use full content width in the HTML |
Remarks
IronTesseract's Configuration.RenderSearchablePdf flag must be set to true.
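A minimal sketch of the call sequence implied by the remark above: set the flag, read, then export. It assumes that `IronTesseract.Read` accepts an image path. The file names and document title are hypothetical.

```csharp
using IronOcr;

var ocr = new IronTesseract();

// Required before reading, per the remark above.
ocr.Configuration.RenderSearchablePdf = true;

// "scan.png" is a hypothetical input image.
OcrResult result = ocr.Read("scan.png");

// pdfPageMargin and fullContentWidth are left at their declared defaults.
result.SaveAsHtmlDocument("example.html", "Scanned document");
```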
SaveAsHtmlString(String, Int32, Boolean)
Converts the OcrResult into an HTML string
Declaration
public string SaveAsHtmlString(string title, int pdfPageMargin, bool fullContentWidth)
Parameters
Type | Name | Description |
---|---|---|
System.String | title | Title for the HTML document |
System.Int32 | pdfPageMargin | Margin to use for PDF page |
System.Boolean | fullContentWidth | Optionally use full content width in the HTML |
Returns
Type | Description |
---|---|
System.String |
Remarks
IronTesseract's Configuration.RenderSearchablePdf flag must be set to true.
SaveAsSearchablePdf(String)
Exports a searchable PDF version of the OCR input document. Works for all input formats including PDFs & Images.
Declaration
public byte[] SaveAsSearchablePdf(string Path = null)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path | The file path the PDF will be saved to. |
Returns
Type | Description |
---|---|
System.Byte[] |
SaveAsSearchablePdfBytes()
Exports a searchable PDF version of the OCR input document as a byte array. Works for all input formats including PDFs & Images.
Declaration
public byte[] SaveAsSearchablePdfBytes()
Returns
Type | Description |
---|---|
System.Byte[] |
SaveAsSearchablePdfStream()
Exports a searchable PDF version of the OCR input document as a Stream. Works for all input formats including PDFs & Images.
Declaration
public Stream SaveAsSearchablePdfStream()
Returns
Type | Description |
---|---|
System.IO.Stream |
SaveAsTextFile(String)
Exports this OCR result as a plain-text (.txt) file.
Pages are separated by four Environment.NewLine sequences, and paragraphs by two.
Declaration
public void SaveAsTextFile(string Path)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path | The file path the text file will be saved to. |
SaveJsonAs(String)
Serializes the OcrResult object to a JSON file.
Declaration
public void SaveJsonAs(string Path)
Parameters
Type | Name | Description |
---|---|---|
System.String | Path |
Remarks
Use the method ToJson() instead if you want the JSON as a string rather than saved to disk.
ToJson()
Serializes the OcrResult object to a JSON string.
Declaration
public string ToJson()
Returns
Type | Description |
---|---|
System.String | The JSON string representation of the OcrResult. |
Remarks
Use the method FromJson(String) for deserializing the JSON to the OcrResult object.
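ToJson() and FromJson(String) can be paired to store a result and restore it later without repeating the OCR read. The sketch below assumes that `IronTesseract.Read` accepts an image path. The file name `scan.png` is hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// "scan.png" is a hypothetical input image.
OcrResult original = ocr.Read("scan.png");

// Serialize to a JSON string, e.g. for caching or a database column.
string json = original.ToJson();

// FromJson is static, so it is called on the type rather than an instance.
OcrResult restored = OcrResult.FromJson(json);

// Compare the text as a quick sanity check of the round trip.
Console.WriteLine(restored.Text == original.Text);
```

Per the Exceptions table for FromJson(String), the call can throw System.ArgumentNullException, so guard against a null string when the JSON comes from external storage.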