Class AdvancedOcrResultBase
Base class for advanced OCR results that support searchable PDF generation
Implements
IOcrResult
Namespace: IronOcr
Assembly: IronOcr.dll
Syntax
public abstract class AdvancedOcrResultBase : Object, IOcrResult
Constructors
AdvancedOcrResultBase()
Declaration
protected AdvancedOcrResultBase()
Properties
Characters
Represents every character discovered by the advanced OCR engine, including bounding box coordinates.
Characters are extracted from the internal character rectangle data computed during OCR processing. Each character includes its approximate text, position (X, Y), dimensions (Width, Height), and page number.
------------------------------------------------
Usage:
var ocr = new IronTesseract();
var input = new OcrInput();
input.LoadPdf("scanned.pdf");
var result = ocr.ReadDocumentAdvanced(input);
// Access all characters with coordinates
foreach (var ch in result.Characters)
{
Console.WriteLine($"'{ch.Text}' at ({ch.X},{ch.Y}) size ({ch.Width}x{ch.Height}) page {ch.PageNumber}");
}
------------------------------------------------
Declaration
public AdvancedCharacter[] Characters { get; }
Property Value
| Type | Description |
|---|---|
| AdvancedCharacter[] | Every character discovered by the advanced OCR engine, with its bounding box coordinates. |
Remarks
Important Considerations:
Character Mapping: Characters are mapped from the region text to character rectangles. When the character count matches the rectangle count, characters are mapped 1:1. Otherwise, rectangles are returned with the full region text as context.
Performance: The character collection is computed once on first access and cached for subsequent calls.
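The mapping rule described above can be sketched as follows. This is a minimal illustration, not IronOcr's internal code: MapCharacters and the tuple shapes are hypothetical, and it assumes "character count" means the length of the region text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of IronOcr): pairs region text with character rectangles.
List<(string Text, int X, int Y, int Width, int Height)> MapCharacters(
    string regionText,
    IReadOnlyList<(int X, int Y, int Width, int Height)> rects)
{
    // Assumption: the counts are compared using the raw length of the region text.
    bool oneToOne = regionText.Length == rects.Count;

    var mapped = new List<(string Text, int X, int Y, int Width, int Height)>();
    for (int i = 0; i < rects.Count; i++)
    {
        var r = rects[i];
        // 1:1 mapping when counts match; otherwise keep the full region text as context.
        string text = oneToOne ? regionText[i].ToString() : regionText;
        mapped.Add((text, r.X, r.Y, r.Width, r.Height));
    }
    return mapped;
}

var rects = new List<(int X, int Y, int Width, int Height)>
{
    (10, 20, 8, 12), (19, 20, 8, 12), (28, 20, 8, 12)
};

var matched = MapCharacters("OCR", rects);   // three characters, three rectangles
var fallback = MapCharacters("OCRs", rects); // four characters, three rectangles

foreach (var c in matched)
    Console.WriteLine($"'{c.Text}' at ({c.X},{c.Y})");
foreach (var c in fallback)
    Console.WriteLine($"'{c.Text}' at ({c.X},{c.Y})");
```

In the second call every rectangle carries the whole region text, so consumers should not assume that Text is always a single character.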
Confidence
Statistical OCR accuracy confidence, averaged over every character.
1 = 100%, 0 = 0%.
Declaration
public abstract double Confidence { get; }
Property Value
| Type | Description |
|---|---|
| System.Double | Average character confidence, from 0 (0%) to 1 (100%). |
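As a small illustration of the 0 to 1 scale, the sketch below averages some made-up per-character confidences, converts the result to a percentage, and compares it against a review threshold. The input values and the 0.80 threshold are assumptions for demonstration only; per-character confidences are not exposed by the members documented here.

```csharp
using System;
using System.Linq;

// Hypothetical per-character confidences in the range 0..1 (illustrative values only).
double[] characterConfidences = { 0.98, 0.91, 0.75, 0.88 };

// Average over every character, mirroring how the Confidence property is described.
double confidence = characterConfidences.Average();

// Convert the 0..1 value to a percentage for display.
double percent = Math.Round(confidence * 100, 1);

// Illustrative threshold; pick one that suits your documents.
double minimumConfidence = 0.80;
bool needsReview = confidence < minimumConfidence;

Console.WriteLine($"Confidence: {percent}% (needs review: {needsReview})");
```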
Text
All text recognized by OCR from the OcrInput.
Declaration
public abstract string Text { get; }
Property Value
| Type | Description |
|---|---|
| System.String | All text recognized from the OcrInput. |
Words
Represents every word discovered by the advanced OCR engine, including bounding box coordinates.
Words are extracted from the internal region and word rectangle data computed during OCR processing. Each word includes its text, position (X, Y), dimensions (Width, Height), and page number.
This property provides API parity with OcrResult.Words for the advanced document reading pipeline.
------------------------------------------------
Usage:
var ocr = new IronTesseract();
var input = new OcrInput();
input.LoadPdf("scanned.pdf");
var result = ocr.ReadDocumentAdvanced(input);
// Access all words with coordinates
foreach (var word in result.Words)
{
Console.WriteLine($"'{word.Text}' at ({word.X},{word.Y}) size ({word.Width}x{word.Height}) page {word.PageNumber}");
}
// Reconstruct reading order for words on page 1
var orderedWords = result.Words
.Where(w => w.PageNumber == 1)
.OrderBy(w => w.Y)
.ThenBy(w => w.X);
string reconstructedText = string.Join(" ", orderedWords.Select(w => w.Text));
------------------------------------------------
Declaration
public AdvancedWord[] Words { get; }
Property Value
| Type | Description |
|---|---|
| AdvancedWord[] | Every word discovered by the advanced OCR engine, with its bounding box coordinates. |
Remarks
Important Considerations:
Word-to-Rectangle Mapping: When the number of detected word rectangles matches the number of space-separated tokens in a text region, words are mapped 1:1 to their bounding boxes. Otherwise, when character rectangles are available, word bounding boxes are computed by grouping character rectangles at word boundaries.
Approximate Coordinates: As a last resort, when neither word nor character rectangles are available, words are distributed proportionally within the region rectangle based on character count. This assumes roughly uniform character widths and may be less accurate for proportional fonts. Coordinates from this fallback path should be treated as approximate.
Performance: The word collection is computed once on first access and cached for subsequent calls.
Reading Order: Words are returned in the order they were detected by the OCR engine (top-to-bottom, left-to-right within each region). Use LINQ ordering on Y and then X, as in the usage example above, if a different order is needed.
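The proportional fallback described under Approximate Coordinates can be pictured with the sketch below. It is an illustration under stated assumptions, not the library's implementation: DistributeWords is a hypothetical helper, it works on the horizontal axis only, and it assumes each character and each separating space occupies one equal-width unit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not IronOcr code): spreads words across a region by character count.
List<(string Text, double X, double Width)> DistributeWords(
    string regionText, double regionX, double regionWidth)
{
    var words = regionText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    var result = new List<(string Text, double X, double Width)>();
    if (words.Length == 0) return result;

    // One unit per character plus one unit per separating space.
    int totalUnits = -1;
    foreach (var w in words) totalUnits += w.Length + 1;
    double unitWidth = regionWidth / totalUnits;

    double cursor = regionX;
    foreach (var w in words)
    {
        double width = w.Length * unitWidth;
        result.Add((w, cursor, width));
        cursor += width + unitWidth; // advance past the word and its trailing space
    }
    return result;
}

var wordBoxes = DistributeWords("Total amount due", regionX: 100, regionWidth: 160);
foreach (var b in wordBoxes)
    Console.WriteLine($"'{b.Text}' x={b.X} width={b.Width}");
```

Here the 160-unit region is split into 16 equal units of width 10, so the three words receive widths of 50, 60, and 30. With a proportional font the true glyph widths differ, which is why coordinates from this path should be treated as approximate.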
Methods
SaveAsSearchablePdf(String, Boolean, String, String)
Exports the OCR result as a searchable PDF document and optionally saves it to a file.
A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.
------------------------------------------------
Usage:
// Load OCR input
var input = new IronOcr.OcrInput();
input.LoadPdf("scanned.pdf");
// Instantiate OCR engine and enable searchable PDF rendering
var ocr = new IronOcr.IronTesseract();
ocr.Configuration.RenderSearchablePdf = true;
// Read the document
var result = ocr.ReadPhoto(input);
// Export and save as searchable PDF
byte[] pdfBytes = result.SaveAsSearchablePdf("output.pdf");
// Or with custom font for UTF-8 support (e.g., for Polish text)
byte[] pdfBytesWithFont = result.SaveAsSearchablePdf(
Path: "output.pdf",
ApplyFilters: false,
CustomFontFile: @"C:\Fonts\DejaVuSans.ttf",
CustomFontName: "DejaVu Sans");
// The file is saved and bytes are also returned for further processing if needed
------------------------------------------------
Declaration
public byte[] SaveAsSearchablePdf(string Path = null, bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | Path | Optional. The file path where the PDF will be saved. If null, the PDF is not saved to disk but the bytes are still returned. |
| System.Boolean | ApplyFilters | Optional. If true, image enhancement filters are applied to improve PDF quality. Defaults to false. |
| System.String | CustomFontFile | Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used. |
| System.String | CustomFontName | Optional. Name of the custom font. If not provided, the font name is extracted automatically from the font file metadata. This parameter is only used when CustomFontFile is specified. |
Returns
| Type | Description |
|---|---|
| System.Byte[] | A byte array containing the searchable PDF document. |
Remarks
Important Considerations:
Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an exception.
Performance: The PDF bytes are cached after first generation, so subsequent calls return immediately without regenerating.
Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.
File Path Optional: If Path is null, the PDF is not saved to disk but the bytes are still returned.
Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages like Polish, Arabic, Chinese, etc. The default TimesNewRoman has limited character support.
How-To Guide: Learn more about creating searchable PDFs
API Reference: See related searchable PDF methods
Exceptions
| Type | Condition |
|---|---|
| IronOcrProductException | Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before the document was read. |
| IronOcrInputException | Thrown when the custom font file is invalid, inaccessible, or contains malicious content. |
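Because an unusable font file surfaces as an exception, it can help to run a cheap check before passing a path as CustomFontFile. The sketch below is a hypothetical pre-check and does not replace the library's own validation: LooksLikeUsableFont is not an IronOcr API, and it only tests the extension (.ttf, .otf, .ttc, as listed in the parameter table) and that the file exists.

```csharp
using System;
using System.IO;
using System.Linq;

// Extensions taken from the CustomFontFile parameter description.
string[] supportedExtensions = { ".ttf", ".otf", ".ttc" };

// Hypothetical pre-check: extension and existence only; it does not inspect font contents.
bool LooksLikeUsableFont(string fontPath)
{
    if (string.IsNullOrWhiteSpace(fontPath)) return false;

    string extension = Path.GetExtension(fontPath);
    return supportedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase)
        && File.Exists(fontPath);
}

// A file with the wrong extension is rejected without touching the disk.
Console.WriteLine(LooksLikeUsableFont("notes.txt"));

// A well-named font path is accepted only if the file is actually present.
Console.WriteLine(LooksLikeUsableFont(@"C:\Fonts\DejaVuSans.ttf"));
```

A path that passes this check can still be rejected by the library, so the call to SaveAsSearchablePdf should still be prepared to handle IronOcrInputException.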
SaveAsSearchablePdfBytes(Boolean, String, String)
Exports the OCR result as a searchable PDF document and returns it as a byte array.
A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.
------------------------------------------------
Usage:
// Load OCR input
var input = new IronOcr.OcrInput();
input.LoadPdf("scanned.pdf");
// Instantiate OCR engine and enable searchable PDF rendering
var ocr = new IronOcr.IronTesseract();
ocr.Configuration.RenderSearchablePdf = true;
// Read the document
var result = ocr.ReadPhoto(input);
// Export as searchable PDF bytes
byte[] pdfBytes = result.SaveAsSearchablePdfBytes();
// Or with custom font for UTF-8 support
byte[] pdfBytesWithFont = result.SaveAsSearchablePdfBytes(
ApplyFilters: false,
CustomFontFile: @"C:\Fonts\DejaVuSans.ttf");
System.IO.File.WriteAllBytes("output.pdf", pdfBytes);
------------------------------------------------
Declaration
public byte[] SaveAsSearchablePdfBytes(bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | ApplyFilters | Optional. If true, image enhancement filters are applied to improve PDF quality. Defaults to false. |
| System.String | CustomFontFile | Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used. |
| System.String | CustomFontName | Optional. Name of the custom font. If not provided, the font name is extracted automatically from the font file metadata. This parameter is only used when CustomFontFile is specified. |
Returns
| Type | Description |
|---|---|
| System.Byte[] | A byte array containing the searchable PDF document. |
Remarks
Important Considerations:
Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an exception.
Performance: The PDF bytes are cached after first generation, so subsequent calls return immediately without regenerating.
Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.
Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages like Polish, Arabic, Chinese, etc. The default TimesNewRoman has limited character support.
How-To Guide: Learn more about creating searchable PDFs
API Reference: See related searchable PDF methods
Exceptions
| Type | Condition |
|---|---|
| IronOcrProductException | Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before the document was read. |
| IronOcrInputException | Thrown when the custom font file is invalid, inaccessible, or contains malicious content. |
SaveAsSearchablePdfStream(Boolean, String, String)
Exports the OCR result as a searchable PDF document and returns it as a Stream.
A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.
------------------------------------------------
Usage:
// Load OCR input
var input = new IronOcr.OcrInput();
input.LoadPdf("scanned.pdf");
// Instantiate OCR engine and enable searchable PDF rendering
var ocr = new IronOcr.IronTesseract();
ocr.Configuration.RenderSearchablePdf = true;
// Read the document
var result = ocr.ReadPhoto(input);
// Export as searchable PDF stream
Stream pdfStream = result.SaveAsSearchablePdfStream();
// Or with custom font for UTF-8 support
Stream pdfStreamWithFont = result.SaveAsSearchablePdfStream(
ApplyFilters: false,
CustomFontFile: @"C:\Fonts\DejaVuSans.ttf");
// Use the stream (e.g., send via HTTP response, save to database, etc.)
------------------------------------------------
Declaration
public Stream SaveAsSearchablePdfStream(bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Boolean | ApplyFilters | Optional. If true, image enhancement filters are applied to improve PDF quality. Defaults to false. |
| System.String | CustomFontFile | Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used. |
| System.String | CustomFontName | Optional. Name of the custom font. If not provided, the font name is extracted automatically from the font file metadata. This parameter is only used when CustomFontFile is specified. |
Returns
| Type | Description |
|---|---|
| System.IO.Stream | A Stream containing the searchable PDF document. |
Remarks
Important Considerations:
Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an exception.
Performance: This method returns a new MemoryStream containing the cached PDF bytes, so subsequent calls create new streams but don't regenerate the PDF.
Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.
Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages like Polish, Arabic, Chinese, etc. The default TimesNewRoman has limited character support.
How-To Guide: Learn more about creating searchable PDFs
API Reference: See related searchable PDF methods
Exceptions
| Type | Condition |
|---|---|
| IronOcrProductException | Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before the document was read. |
| IronOcrInputException | Thrown when the custom font file is invalid, inaccessible, or contains malicious content. |
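The caching behaviour noted in the Remarks (one PDF generation, a new MemoryStream per call) follows a common pattern that can be sketched with Lazy<T>. This is an illustration of the pattern only, not IronOcr's implementation: RenderPdf, OpenPdfStream, and the placeholder bytes are hypothetical.

```csharp
using System;
using System.IO;

int generationCount = 0;

// Hypothetical stand-in for the expensive PDF rendering step.
byte[] RenderPdf()
{
    generationCount++;
    return new byte[] { 0x25, 0x50, 0x44, 0x46 }; // placeholder content ("%PDF")
}

// Lazy<T> runs the factory on first access and caches the resulting bytes.
var cachedPdf = new Lazy<byte[]>(RenderPdf);

// Each call wraps the same cached bytes in a fresh, read-only stream.
Stream OpenPdfStream() => new MemoryStream(cachedPdf.Value, writable: false);

using (Stream first = OpenPdfStream())
using (Stream second = OpenPdfStream())
{
    // The two streams are separate objects with independent positions.
    Console.WriteLine($"Same stream instance: {ReferenceEquals(first, second)}");
    // The rendering step ran only once despite two calls.
    Console.WriteLine($"Times rendered: {generationCount}");
}
```

Since every call yields an independent stream, each caller is responsible for disposing the stream it receives.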