

    Class AdvancedOcrResultBase

    Base class for advanced OCR results that support searchable PDF generation.

    Inheritance
    System.Object
    AdvancedOcrResultBase
    Derived classes: OcrDocAdvancedResult, OcrPhotoResult
    Implements
    IronSoftware.Abstractions.Ocr.IOcrResult
    Namespace: IronOcr
    Assembly: IronOcr.dll
    Syntax
    public abstract class AdvancedOcrResultBase : Object, IOcrResult

    Constructors

    AdvancedOcrResultBase()

    Declaration
    protected AdvancedOcrResultBase()

    Properties

    Characters

    Represents every character discovered by the advanced OCR engine, including bounding box coordinates.

    Characters are extracted from the internal character rectangle data computed during OCR processing. Each character includes its approximate text, position (X, Y), dimensions (Width, Height), and page number.

    ------------------------------------------------

    Usage:

    using System;
    using IronOcr;

    var ocr = new IronTesseract();
    var input = new OcrInput();
    input.LoadPdf("scanned.pdf");
    var result = ocr.ReadDocumentAdvanced(input);

    // Access all characters with coordinates
    foreach (var ch in result.Characters)
    {
        Console.WriteLine($"'{ch.Text}' at ({ch.X},{ch.Y}) size ({ch.Width}x{ch.Height}) page {ch.PageNumber}");
    }

    ------------------------------------------------

    Declaration
    public AdvancedCharacter[] Characters { get; }
    Property Value
    Type Description
    AdvancedCharacter[]
    Remarks

    Important Considerations:

    Character Mapping: Characters are mapped from the region text to character rectangles. When the character count matches the rectangle count, characters are mapped 1:1. Otherwise, rectangles are returned with the full region text as context.

    Performance: The character collection is computed once on first access and cached for subsequent calls.
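    The character-mapping rule described under Character Mapping can be illustrated with a small, self-contained C# sketch. The types CharRect and MappedChar and the method CharacterMappingSketch.Map are hypothetical and are not part of IronOcr. The sketch assumes the region text and the rectangle list are in the same order, and it compares UTF-16 lengths, which is a simplification.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only; not IronOcr's AdvancedCharacter.
public readonly record struct CharRect(int X, int Y, int Width, int Height);
public readonly record struct MappedChar(string Text, CharRect Box);

public static class CharacterMappingSketch
{
    // Pairs region text with character rectangles:
    // - counts match: one character per rectangle (1:1 mapping)
    // - counts differ: every rectangle carries the full region text as context
    // Note: string.Length counts UTF-16 code units, a simplification for this sketch.
    public static List<MappedChar> Map(string regionText, IReadOnlyList<CharRect> rects)
    {
        var mapped = new List<MappedChar>(rects.Count);
        bool oneToOne = regionText.Length == rects.Count;
        for (int i = 0; i < rects.Count; i++)
        {
            string text = oneToOne ? regionText[i].ToString() : regionText;
            mapped.Add(new MappedChar(text, rects[i]));
        }
        return mapped;
    }
}
```

    In the fallback case every rectangle still has a position, but its Text is the whole region string, which is why the Text of a single character should not be relied on when the counts differ.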

    Confidence

    Statistical OCR accuracy confidence, averaged over every recognized character.

    Ranges from 0 to 1, where 1 = 100% and 0 = 0%.

    Declaration
    public abstract double Confidence { get; }
    Property Value
    Type Description
    System.Double

    Text

    All text recognized from the OcrInput.

    Declaration
    public abstract string Text { get; }
    Property Value
    Type Description
    System.String

    Words

    Represents every word discovered by the advanced OCR engine, including bounding box coordinates.

    Words are extracted from the internal region and word rectangle data computed during OCR processing. Each word includes its text, position (X, Y), dimensions (Width, Height), and page number.

    This property provides API parity with the Words property of the standard (non-advanced) OCR result, for use with the advanced document reading pipeline.

    ------------------------------------------------

    Usage:

    using System;
    using System.Linq;
    using IronOcr;
    var ocr = new IronTesseract();
    var input = new OcrInput();
    input.LoadPdf("scanned.pdf");
    var result = ocr.ReadDocumentAdvanced(input);

    // Access all words with coordinates
    foreach (var word in result.Words)
        Console.WriteLine($"'{word.Text}' at ({word.X},{word.Y}) size ({word.Width}x{word.Height}) page {word.PageNumber}");

    // Reconstruct reading order for words on page 1
    var orderedWords = result.Words
        .Where(w => w.PageNumber == 1)
        .OrderBy(w => w.Y)
        .ThenBy(w => w.X);
    string reconstructedText = string.Join(" ", orderedWords.Select(w => w.Text));

    ------------------------------------------------

    Declaration
    public AdvancedWord[] Words { get; }
    Property Value
    Type Description
    AdvancedWord[]
    Remarks

    Important Considerations:

    Word-to-Rectangle Mapping: When the number of detected word rectangles matches the number of space-separated tokens in a text region, words are mapped 1:1 to their bounding boxes. When character rectangles are available, word bounding boxes are computed by grouping the character rectangles at word boundaries.
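    Grouping character rectangles into a word bounding box amounts to taking the union of the rectangles that belong to each word. The sketch below assumes one rectangle per non-space character, in text order, with a top-left origin and Y increasing downward. WordBox and WordBoxSketch.GroupCharacters are hypothetical names for illustration; IronOcr's internal implementation may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical axis-aligned box (top-left origin, Y grows downward); not an IronOcr type.
public readonly record struct WordBox(int X, int Y, int Width, int Height)
{
    public int Right => X + Width;
    public int Bottom => Y + Height;
}

public static class WordBoxSketch
{
    // Splits region text into space-separated words and unions the character boxes
    // of each word. Assumes exactly one box per non-space character, in text order.
    public static List<(string Word, WordBox Bounds)> GroupCharacters(string regionText, IReadOnlyList<WordBox> charBoxes)
    {
        var words = new List<(string Word, WordBox Bounds)>();
        int boxIndex = 0;
        foreach (string word in regionText.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            int left = int.MaxValue, top = int.MaxValue, right = int.MinValue, bottom = int.MinValue;
            for (int i = 0; i < word.Length; i++, boxIndex++)
            {
                WordBox b = charBoxes[boxIndex];
                left = Math.Min(left, b.X);
                top = Math.Min(top, b.Y);
                right = Math.Max(right, b.Right);
                bottom = Math.Max(bottom, b.Bottom);
            }
            words.Add((word, new WordBox(left, top, right - left, bottom - top)));
        }
        return words;
    }
}
```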

    Approximate Coordinates: As a last resort, when neither word nor character rectangles are available, words are distributed proportionally within the region rectangle based on character count. This assumes roughly uniform character widths and may be less accurate for proportional fonts. Coordinates from this fallback path should be treated as approximate.
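    The proportional fallback can be sketched as follows. This sketch assumes that spaces occupy one character width like any other character and that the region holds a single horizontal line of text. RegionBox and ProportionalLayoutSketch.Distribute are hypothetical, and IronOcr may account for spacing differently.

```csharp
using System.Collections.Generic;

// Hypothetical region rectangle; not an IronOcr type.
public readonly record struct RegionBox(double X, double Y, double Width, double Height);

public static class ProportionalLayoutSketch
{
    // Gives each space-separated word a horizontal slice of the region whose width is
    // proportional to its character count (uniform character widths assumed).
    public static List<(string Word, RegionBox Box)> Distribute(string regionText, RegionBox region)
    {
        var result = new List<(string Word, RegionBox Box)>();
        if (regionText.Length == 0) return result;
        double charWidth = region.Width / regionText.Length;
        int offset = 0; // character offset of the current token within the region text
        foreach (string token in regionText.Split(' '))
        {
            if (token.Length > 0)
            {
                result.Add((token, new RegionBox(region.X + offset * charWidth, region.Y,
                                                 token.Length * charWidth, region.Height)));
            }
            offset += token.Length + 1; // +1 skips the separating space
        }
        return result;
    }
}
```

    For "ab cd" in a 100-pixel-wide region, each character gets 20 pixels, so "cd" starts 60 pixels in. With a proportional font, "iii" and "WWW" would receive the same width, which is why these coordinates are only approximate.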

    Performance: The word collection is computed once on first access and cached for subsequent calls.

    Reading Order: Words are returned in the order they were detected by the OCR engine (top-to-bottom, left-to-right within each region). Use LINQ ordering on Y and X to reconstruct a deterministic reading order.
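    Sorting strictly by Y and then X, as in the usage example, can misorder words on the same visual line when their Y values differ by a few pixels. One common workaround, sketched here under the assumption that a fixed pixel tolerance suits the document, is to cluster words into lines first and then sort each line by X. PlacedWord is a hypothetical stand-in for the AdvancedWord fields used here, and the tolerance is a value you would tune.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the AdvancedWord fields used here (Text, X, Y, PageNumber).
public readonly record struct PlacedWord(string Text, int X, int Y, int PageNumber);

public static class ReadingOrderSketch
{
    // Words whose Y is within lineTolerance pixels of the current line's first word
    // join that line; each line is then ordered left to right.
    public static string Reconstruct(IEnumerable<PlacedWord> words, int page, int lineTolerance)
    {
        var lines = new List<List<PlacedWord>>();
        foreach (var word in words.Where(w => w.PageNumber == page).OrderBy(w => w.Y))
        {
            if (lines.Count > 0 && Math.Abs(word.Y - lines[^1][0].Y) <= lineTolerance)
                lines[^1].Add(word);
            else
                lines.Add(new List<PlacedWord> { word });
        }
        return string.Join("\n", lines.Select(line =>
            string.Join(" ", line.OrderBy(w => w.X).Select(w => w.Text))));
    }
}
```

    To use it with real results, you could project each AdvancedWord into a PlacedWord from its Text, X, Y, and PageNumber values; if those coordinates are not integers, adjust the record's field types accordingly.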

    Methods

    SaveAsSearchablePdf(String, Boolean, String, String)

    Exports the OCR result as a searchable PDF document and optionally saves it to a file.

    A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.

    ------------------------------------------------

    Usage:

    // Load OCR input
    var input = new IronOcr.OcrInput();
    input.LoadPdf("scanned.pdf");
    // Instantiate OCR engine and enable searchable PDF rendering
    var ocr = new IronOcr.IronTesseract();
    ocr.Configuration.RenderSearchablePdf = true;
    // Read the document
    var result = ocr.ReadPhoto(input);
    // Export and save as searchable PDF
    byte[] pdfBytes = result.SaveAsSearchablePdf("output.pdf");
    // Or with custom font for UTF-8 support (e.g., for Polish text)
    byte[] pdfBytesCustomFont = result.SaveAsSearchablePdf(
        Path: "output.pdf",
        ApplyFilters: false,
        CustomFontFile: @"C:\Fonts\DejaVuSans.ttf",
        CustomFontName: "DejaVu Sans");
    // The file is saved and bytes are also returned for further processing if needed

    ------------------------------------------------

    Declaration
    public byte[] SaveAsSearchablePdf(string Path = null, bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
    Parameters
    Type Name Description
    System.String Path

    Optional. The file path where the PDF will be saved. If null, the PDF is not saved to disk.

    System.Boolean ApplyFilters

    Optional. If true, applies image filters to enhance the output PDF quality. Default is false.

    System.String CustomFontFile

    Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used.

    System.String CustomFontName

    Optional. Name of the custom font. If not provided, the font name will be automatically extracted from the font file metadata. This parameter is only used when CustomFontFile is specified.

    Returns
    Type Description
    System.Byte[]

    A byte array containing the searchable PDF document.

    Remarks

    Important Considerations:

    Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an IronOcrProductException.

    Performance: The PDF bytes are cached after first generation, so subsequent calls return immediately without regenerating.

    Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.

    File Path Optional: If Path is null, the PDF is not saved to disk but the bytes are still returned.

    Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages such as Polish, Arabic, and Chinese. The default TimesNewRoman font has limited character support.

    Related Documentation:

    How-To Guide: Learn more about creating searchable PDFs

    API Reference: See related searchable PDF methods

    Exceptions
    Type Condition
    IronOcrProductException

    Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before reading the document.

    IronOcrInputException

    Thrown when the custom font file is invalid, inaccessible, or contains malicious content.

    SaveAsSearchablePdfBytes(Boolean, String, String)

    Exports the OCR result as a searchable PDF document and returns it as a byte array.

    A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.

    ------------------------------------------------

    Usage:

    // Load OCR input
    var input = new IronOcr.OcrInput();
    input.LoadPdf("scanned.pdf");
    // Instantiate OCR engine and enable searchable PDF rendering
    var ocr = new IronOcr.IronTesseract();
    ocr.Configuration.RenderSearchablePdf = true;
    // Read the document
    var result = ocr.ReadPhoto(input);
    // Export as searchable PDF bytes
    byte[] pdfBytes = result.SaveAsSearchablePdfBytes();
    // Or with custom font for UTF-8 support
    byte[] pdfBytesCustomFont = result.SaveAsSearchablePdfBytes(
        ApplyFilters: false,
        CustomFontFile: @"C:\Fonts\DejaVuSans.ttf");
    System.IO.File.WriteAllBytes("output.pdf", pdfBytes);

    ------------------------------------------------

    Declaration
    public byte[] SaveAsSearchablePdfBytes(bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
    Parameters
    Type Name Description
    System.Boolean ApplyFilters

    Optional. If true, applies image filters to enhance the output PDF quality. Default is false.

    System.String CustomFontFile

    Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used.

    System.String CustomFontName

    Optional. Name of the custom font. If not provided, the font name will be automatically extracted from the font file metadata. This parameter is only used when CustomFontFile is specified.

    Returns
    Type Description
    System.Byte[]

    A byte array containing the searchable PDF document.

    Remarks

    Important Considerations:

    Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an IronOcrProductException.

    Performance: The PDF bytes are cached after first generation, so subsequent calls return immediately without regenerating.

    Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.

    Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages such as Polish, Arabic, and Chinese. The default TimesNewRoman font has limited character support.

    Related Documentation:

    How-To Guide: Learn more about creating searchable PDFs

    API Reference: See related searchable PDF methods

    Exceptions
    Type Condition
    IronOcrProductException

    Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before reading the document.

    IronOcrInputException

    Thrown when the custom font file is invalid, inaccessible, or contains malicious content.

    SaveAsSearchablePdfStream(Boolean, String, String)

    Exports the OCR result as a searchable PDF document and returns it as a Stream.

    A searchable PDF overlays invisible OCR text on the original image, allowing text selection and search while preserving the original appearance.

    ------------------------------------------------

    Usage:

    // Load OCR input
    var input = new IronOcr.OcrInput();
    input.LoadPdf("scanned.pdf");
    // Instantiate OCR engine and enable searchable PDF rendering
    var ocr = new IronOcr.IronTesseract();
    ocr.Configuration.RenderSearchablePdf = true;
    // Read the document
    var result = ocr.ReadPhoto(input);
    // Export as searchable PDF stream
    System.IO.Stream pdfStream = result.SaveAsSearchablePdfStream();
    // Or with a custom font for UTF-8 support
    System.IO.Stream pdfStreamCustomFont = result.SaveAsSearchablePdfStream(
        ApplyFilters: false,
        CustomFontFile: @"C:\Fonts\DejaVuSans.ttf");
    // Use the stream (e.g., send via HTTP response, save to database, etc.)

    ------------------------------------------------

    Declaration
    public Stream SaveAsSearchablePdfStream(bool ApplyFilters = false, string CustomFontFile = null, string CustomFontName = null)
    Parameters
    Type Name Description
    System.Boolean ApplyFilters

    Optional. If true, applies image filters to enhance the output PDF quality. Default is false.

    System.String CustomFontFile

    Optional. Path to a custom font file (.ttf, .otf, .ttc) to use for rendering text in the PDF. Useful for languages requiring full UTF-8 support (e.g., Polish, Arabic, Chinese). If not specified, the default TimesNewRoman font is used.

    System.String CustomFontName

    Optional. Name of the custom font. If not provided, the font name will be automatically extracted from the font file metadata. This parameter is only used when CustomFontFile is specified.

    Returns
    Type Description
    System.IO.Stream

    A Stream containing the searchable PDF document.

    Remarks

    Important Considerations:

    Configuration Required: You must set IronTesseract.Configuration.RenderSearchablePdf = true before reading the document; otherwise, this method throws an IronOcrProductException.

    Performance: This method returns a new MemoryStream containing the cached PDF bytes, so subsequent calls create new streams but don't regenerate the PDF.

    Filters: Setting ApplyFilters to true applies image enhancement filters to improve PDF quality.

    Custom Fonts: Use CustomFontFile to provide a font with full UTF-8 support for languages such as Polish, Arabic, and Chinese. The default TimesNewRoman font has limited character support.
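    The caching behavior described in the Performance remark (render the PDF once, then hand out a new MemoryStream over the cached bytes on each call) can be sketched with Lazy<T>. CachedPdfExporterSketch and its members are hypothetical; the sketch only illustrates the pattern and is not IronOcr's implementation.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical exporter illustrating render-once caching; not IronOcr's code.
public sealed class CachedPdfExporterSketch
{
    private readonly Lazy<byte[]> _pdfBytes;

    // Counts how many times the expensive render step actually ran.
    public int RenderCount { get; private set; }

    public CachedPdfExporterSketch(Func<byte[]> render)
    {
        // ExecutionAndPublication guarantees the factory runs at most once, even across threads.
        _pdfBytes = new Lazy<byte[]>(() => { RenderCount++; return render(); },
                                     LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Every call returns the same cached array.
    public byte[] GetBytes() => _pdfBytes.Value;

    // Every call returns a fresh, read-only MemoryStream over the cached bytes.
    public Stream GetStream() => new MemoryStream(_pdfBytes.Value, writable: false);
}
```

    Because each stream wraps the shared buffer read-only, disposing one caller's stream does not affect later calls. Returning the cached array from GetBytes avoids a copy but lets callers mutate the cache; a defensive copy is the safer choice if that matters.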

    Related Documentation:

    How-To Guide: Learn more about creating searchable PDFs

    API Reference: See related searchable PDF methods

    Exceptions
    Type Condition
    IronOcrProductException

    Thrown when IronTesseract.Configuration.RenderSearchablePdf was not set to true before reading the document.

    IronOcrInputException

    Thrown when the custom font file is invalid, inaccessible, or contains malicious content.

    Implements

    IronSoftware.Abstractions.Ocr.IOcrResult