How to Save Results as a Searchable PDF in C#

Save Searchable PDFs in C# with IronOCR

IronOCR enables C# developers to convert scanned documents and images into searchable PDFs using OCR technology, supporting output as files, bytes, or streams with just a few lines of code.

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, recognizing the text in the images, and converting it into selectable and searchable text.

SaveAsSearchablePdf is also available on results from ReadPhoto, ReadScreenShot, and ReadDocumentAdvanced, enabling searchable PDF creation from photo and advanced document OCR workflows. This capability is particularly useful when digitizing paper archives or making legacy PDFs searchable for better document management.

Quickstart: Export Searchable PDF in One Line

Set RenderSearchablePdf = true, run Read(...) on your input, and invoke SaveAsSearchablePdf(...). That's all it takes to generate a fully searchable PDF with IronOCR.

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr
  2. Copy and run this code snippet.

    new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } }.Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf");
  3. Deploy to test on your live environment



How Do I Export OCR Results as a Searchable PDF?

To export the result as a searchable PDF using IronOCR, set the Configuration.RenderSearchablePdf property to true, obtain the OCR result object from the Read method, and call SaveAsSearchablePdf with the output file path.

Input

A single page from a Harry Potter novel, scanned as a TIFF file and loaded via OcrImageInput. The page contains dense printed text, a realistic input for testing the searchable PDF text layer.

Page from Harry Potter book showing Chapter Eight 'The Deathday Party' with text about Harry meeting Nearly Headless Nick

potter.tiff: Scanned novel page used as OCR input to produce a searchable PDF with an invisible text layer.

using IronOcr;

// Create the OCR engine: defaults to English with balanced speed and accuracy
IronTesseract ocrTesseract = new IronTesseract();

// Required: without this flag the text overlay layer is not built, and SaveAsSearchablePdf produces a plain image PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;

// Wrap the TIFF in OcrImageInput: handles DPI detection and page layout automatically
using var imageInput = new OcrImageInput("Potter.tiff");
// Run OCR; returns a result containing the recognized text and spatial layout data
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Write the output: the original scanned image is preserved with an invisible text layer on top
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");
Imports IronOcr

' Create the OCR engine: defaults to English with balanced speed and accuracy
Dim ocrTesseract As New IronTesseract()

' Required: without this flag the text overlay layer is not built, and SaveAsSearchablePdf produces a plain image PDF
ocrTesseract.Configuration.RenderSearchablePdf = True

' Wrap the TIFF in OcrImageInput: handles DPI detection and page layout automatically
Using imageInput As New OcrImageInput("Potter.tiff")
    ' Run OCR; returns a result containing the recognized text and spatial layout data
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

    ' Write the output: the original scanned image is preserved with an invisible text layer on top
    ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
End Using

Output

searchablePdf.pdf: Searchable PDF output. Select or search any word to verify the OCR text layer.

The resulting PDF embeds the original scanned page image with an invisible text layer positioned over each recognized word. Select or search any word in the viewer to confirm the text layer is present.

IronOCR uses a particular font for the overlay, which may cause slight discrepancies in rendered text size compared to the original.

When working with multi-page TIFF files or complex documents, IronOCR automatically processes all pages and includes them in the output. The library handles page ordering and text overlay positioning automatically, ensuring accurate text-to-image mapping.

How Do I Create Searchable PDFs from Photos or Advanced Document Scans?

Searchable PDF export is also available when using ReadPhoto, ReadScreenShot, or ReadDocumentAdvanced. Each of these methods returns a result type that supports SaveAsSearchablePdf.

You can optionally pass a ModelType when calling these methods. The default is Normal, while Enhanced provides better accuracy at the cost of speed.

Input

A photo of a wall mural with painted text, loaded via LoadImage. The scene contains multiple words embedded in a real-world environment, making it a practical test for ReadPhoto with the Enhanced model.

Photo containing text used as input for ReadPhoto OCR

photo.png: Wall mural photo loaded via ReadPhoto with the Enhanced model to produce a searchable PDF.

using IronOcr;
using System.IO;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("photo.png");

// ReadPhoto with Enhanced model
OcrPhotoResult photoResult = ocr.ReadPhoto(input, ModelType.Enhanced);
Console.WriteLine(photoResult.Text);

// Save as searchable PDF
byte[] pdfBytes = photoResult.SaveAsSearchablePdf();
File.WriteAllBytes("searchable-photo.pdf", pdfBytes);
Imports IronOcr
Imports System.IO

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("photo.png")

    ' ReadPhoto with Enhanced model
    Dim photoResult As OcrPhotoResult = ocr.ReadPhoto(input, ModelType.Enhanced)
    Console.WriteLine(photoResult.Text)

    ' Save as searchable PDF
    Dim pdfBytes As Byte() = photoResult.SaveAsSearchablePdf()
    File.WriteAllBytes("searchable-photo.pdf", pdfBytes)
End Using

Output

searchable-photo.pdf: Searchable PDF output from ReadPhoto. The text layer supports full-text search in any PDF viewer.

The resulting searchable PDF contains an invisible text layer over the recognized words. Searching "Milk" in the PDF viewer returns 3 matches, extracted directly from the painted text in the original photo.

The same approach works with ReadDocumentAdvanced, which returns an OcrDocAdvancedResult:

Input

A scanned invoice loaded via LoadImage. It contains structured fields (vendor name, line items, and totals) that ReadDocumentAdvanced with the Enhanced model recognizes and embeds as a searchable text layer.

Invoice document used as input for ReadDocumentAdvanced OCR

invoice.png: Scanned invoice loaded into OcrInput and passed to ReadDocumentAdvanced with the Enhanced model.

using IronOcr;
using System.IO;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");

// ReadDocumentAdvanced with Enhanced model
OcrDocAdvancedResult docResult = ocr.ReadDocumentAdvanced(input, ModelType.Enhanced);
byte[] docPdfBytes = docResult.SaveAsSearchablePdf();
File.WriteAllBytes("searchable-doc.pdf", docPdfBytes);
Imports IronOcr
Imports System.IO

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("invoice.png")

    ' ReadDocumentAdvanced with Enhanced model
    Dim docResult As OcrDocAdvancedResult = ocr.ReadDocumentAdvanced(input, ModelType.Enhanced)
    Dim docPdfBytes As Byte() = docResult.SaveAsSearchablePdf()
    File.WriteAllBytes("searchable-doc.pdf", docPdfBytes)
End Using

Output

searchable-doc.pdf: Searchable PDF output from ReadDocumentAdvanced. Invoice fields are selectable and searchable.

Warning: SaveAsSearchablePdf is not supported for ReadPassport or ReadLicensePlate results and will throw an ExtensionAdvancedScanException.

Working with Multi-Page Documents

When dealing with PDF OCR operations on multi-page documents, IronOCR processes each page sequentially and maintains the original document structure.

Input

An 11-page annual report from Hartwell Capital Management loaded via OcrPdfInput. Pages 1–10 (indices 0–9) are selected using the PageIndices range and processed in a single Read call.

multi-page-scan.pdf: 11-page Hartwell Capital Management annual report used as input for multi-page searchable PDF conversion.

using IronOcr;
using System.Linq;

// Create the OCR engine. RenderSearchablePdf is false by default; no need to set it when using OcrPdfInput directly
var ocrTesseract = new IronTesseract();

// Load pages 1–10 (indices 0–9) only; PageIndices avoids loading and OCR-ing the full document unnecessarily
using var pdfInput = new OcrPdfInput("multi-page-scan.pdf", PageIndices: Enumerable.Range(0, 10));

// Run OCR across all selected pages in order
OcrResult result = ocrTesseract.Read(pdfInput);

// Write the searchable PDF; true = apply the input's image filters to the embedded page images in the output
result.SaveAsSearchablePdf("searchable-multi-page.pdf", true);
Imports IronOcr
Imports System.Linq

' Create the OCR engine. RenderSearchablePdf is false by default; no need to set it when using OcrPdfInput directly
Dim ocrTesseract As New IronTesseract()

' Load pages 1–10 (indices 0–9) only; PageIndices avoids loading and OCR-ing the full document unnecessarily
Using pdfInput As New OcrPdfInput("multi-page-scan.pdf", PageIndices:=Enumerable.Range(0, 10))
    ' Run OCR across all selected pages in order
    Dim result As OcrResult = ocrTesseract.Read(pdfInput)

    ' Write the searchable PDF; true = apply the input's image filters to the embedded page images in the output
    result.SaveAsSearchablePdf("searchable-multi-page.pdf", True)
End Using

Output

searchable-multi-page.pdf: 10-page searchable PDF output. Each page has an invisible text layer for full-text search.

The resulting PDF contains 10 pages (pages 1–10 from the original report), each with an invisible text layer that makes the extracted content selectable and searchable in any PDF viewer.
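
The mapping from human-readable page numbers to zero-based indices is easy to get wrong, because Enumerable.Range takes a start and a count rather than a start and an end. The sketch below shows one way to keep that arithmetic in a single place. PageSelection and ToZeroBasedIndices are hypothetical names introduced here for illustration; they are not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSelection
{
    // Hypothetical helper: converts an inclusive 1-based page range
    // (as a reader would describe it) into 0-based page indices.
    public static IEnumerable<int> ToZeroBasedIndices(int firstPage, int lastPage)
    {
        if (firstPage < 1 || lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Expected 1 <= firstPage <= lastPage.");

        // Enumerable.Range takes (start, count), not (start, end).
        return Enumerable.Range(firstPage - 1, lastPage - firstPage + 1);
    }
}

// Usage sketch, reusing the OcrPdfInput call from the example above:
// using var pdfInput = new OcrPdfInput("multi-page-scan.pdf", PageIndices: PageSelection.ToZeroBasedIndices(1, 10));
```

Calling ToZeroBasedIndices(1, 10) yields the indices 0 through 9, which matches the Enumerable.Range(0, 10) call used in the example.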

How Can I Apply Filters When Creating Searchable PDFs?

The SaveAsSearchablePdf second parameter accepts a boolean that controls whether image filters are applied to the embedded output. Using image optimization filters can significantly improve OCR accuracy, especially when dealing with low-quality scans.

The example below applies the grayscale filter and passes true as the second argument to embed the filtered image in the searchable PDF output.

using IronOcr;

// Create OCR engine: filters are applied at the OcrInput level, so no configuration changes are needed here
var ocr = new IronTesseract();
// Dispose the input when done, as in the other examples
using var ocrInput = new OcrInput();

// Load the scanned PDF as the OCR source
ocrInput.LoadPdf("invoice.pdf");

// Convert to grayscale: removes color noise that can reduce OCR accuracy on color-printed documents
ocrInput.ToGrayScale();
// Run OCR on the preprocessed input
OcrResult result = ocr.Read(ocrInput);

// Write the searchable PDF; true = embed the grayscale-filtered image rather than the original color scan
result.SaveAsSearchablePdf("outputGrayscale.pdf", true);
Imports IronOcr

' Create OCR engine: filters are applied at the OcrInput level, so no configuration changes are needed here
Dim ocr As New IronTesseract()
' Dispose the input when done, as in the other examples
Using ocrInput As New OcrInput()
    ' Load the scanned PDF as the OCR source
    ocrInput.LoadPdf("invoice.pdf")

    ' Convert to grayscale: removes color noise that can reduce OCR accuracy on color-printed documents
    ocrInput.ToGrayScale()
    ' Run OCR on the preprocessed input
    Dim result As OcrResult = ocr.Read(ocrInput)

    ' Write the searchable PDF; True = embed the grayscale-filtered image rather than the original color scan
    result.SaveAsSearchablePdf("outputGrayscale.pdf", True)
End Using

For optimal results, consider using the Filter Wizard to automatically determine the best combination of filters for your specific document type. This tool analyzes your input and suggests appropriate preprocessing steps.

How Do I Fix Incorrect Characters in Searchable PDFs?

If text appears correct in the PDF visually but shows corrupted characters when you search or copy it, the issue is caused by the default font used in the searchable text layer. By default, SaveAsSearchablePdf uses Times New Roman, which does not fully support all Unicode characters. This affects languages with accented or non-ASCII characters.

To fix this, provide a Unicode-compatible font file as the third parameter:

result.SaveAsSearchablePdf("output.pdf", false, "Fonts/LiberationSerif-Regular.ttf");
result.SaveAsSearchablePdf("output.pdf", False, "Fonts/LiberationSerif-Regular.ttf")

You can also specify a custom font name as a fourth parameter:

result.SaveAsSearchablePdf("output.pdf", false, "Fonts/LiberationSerif-Regular.ttf", "MyFont");
result.SaveAsSearchablePdf("output.pdf", False, "Fonts/LiberationSerif-Regular.ttf", "MyFont")

This applies to all result types including OcrResult, OcrPhotoResult, and OcrDocAdvancedResult, so the fix works regardless of which read method produced the result.

Please note: For documents originally typeset in Times New Roman, Liberation Serif is recommended as it is metrically compatible, preserving the original spacing and layout. For general-purpose multilingual use, Noto Sans or DejaVu Sans are good alternatives.
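
Because the font is passed as a file path, it can help to confirm the file exists before exporting. The sketch below picks the first available font from a list of candidates using only standard .NET file checks. FontLocator and FirstExisting are hypothetical names, and the Noto Sans path in the usage comment is an assumed location rather than something IronOCR ships.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FontLocator
{
    // Hypothetical helper: returns the first candidate path that exists on disk,
    // or null when none of the candidates are present.
    public static string FirstExisting(params string[] candidatePaths)
    {
        return candidatePaths.FirstOrDefault(File.Exists);
    }
}

// Usage sketch (font paths are assumptions; adjust to where your fonts are deployed):
// string fontPath = FontLocator.FirstExisting("Fonts/LiberationSerif-Regular.ttf", "Fonts/NotoSans-Regular.ttf");
// if (fontPath == null) throw new FileNotFoundException("No Unicode-compatible font file was found.");
// result.SaveAsSearchablePdf("output.pdf", false, fontPath);
```

Failing early with a clear message is usually easier to diagnose than an error raised during PDF export.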

For scenarios where writing to a file path is not possible, IronOCR also supports returning the searchable PDF as a byte array or stream.


How Do I Export Searchable PDFs as Bytes or Streams?

The output of the searchable PDF can also be handled as bytes or streams using SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.

// Return as a byte array: suited for storing in a database or sending in an HTTP response body
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Return as a stream: suited for uploading to cloud storage or piping to another I/O operation without buffering the full file
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();
' Return as a byte array: suited for storing in a database or sending in an HTTP response body
Dim pdfByte As Byte() = ocrResult.SaveAsSearchablePdfBytes()

' Return as a stream: suited for uploading to cloud storage or piping to another I/O operation without buffering the full file
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()

These output options are particularly useful when integrating with cloud storage services, databases, or web applications where file system access may be limited. The example below demonstrates practical applications:

using IronOcr;
using System.IO;
using System.Threading.Tasks;

public class SearchablePdfExporter
{
    public async Task ProcessAndUploadPdf(string inputPath)
    {
        var ocr = new IronTesseract
        {
            Configuration = { RenderSearchablePdf = true }
        };

        // Process the input
        using var input = new OcrImageInput(inputPath);
        var result = ocr.Read(input);

        // Option 1: Save to database as byte array
        byte[] pdfBytes = result.SaveAsSearchablePdfBytes();
        // Store pdfBytes in database BLOB field

        // Option 2: Upload to cloud storage using stream
        using (Stream pdfStream = result.SaveAsSearchablePdfStream())
        {
            // Upload stream to Azure Blob Storage, AWS S3, etc.
            await UploadToCloudStorage(pdfStream, "searchable-output.pdf");
        }

        // Option 3: Return as web response
        // return File(pdfBytes, "application/pdf", "searchable.pdf");
    }

    private async Task UploadToCloudStorage(Stream stream, string fileName)
    {
        // Cloud upload implementation
    }
}
Imports IronOcr
Imports System.IO
Imports System.Threading.Tasks

Public Class SearchablePdfExporter
    Public Async Function ProcessAndUploadPdf(inputPath As String) As Task
        Dim ocr As New IronTesseract With {
            .Configuration = New TesseractConfiguration With {
                .RenderSearchablePdf = True
            }
        }

        ' Process the input
        Using input As New OcrImageInput(inputPath)
            Dim result = ocr.Read(input)

            ' Option 1: Save to database as byte array
            Dim pdfBytes As Byte() = result.SaveAsSearchablePdfBytes()
            ' Store pdfBytes in database BLOB field

            ' Option 2: Upload to cloud storage using stream
            Using pdfStream As Stream = result.SaveAsSearchablePdfStream()
                ' Upload stream to Azure Blob Storage, AWS S3, etc.
                Await UploadToCloudStorage(pdfStream, "searchable-output.pdf")
            End Using

            ' Option 3: Return as web response
            ' Return File(pdfBytes, "application/pdf", "searchable.pdf")
        End Using
    End Function

    Private Async Function UploadToCloudStorage(stream As Stream, fileName As String) As Task
        ' Cloud upload implementation
    End Function
End Class
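
When the PDF never touches disk, a cheap sanity check on the returned data can catch an empty or truncated result before it is stored or uploaded. The sketch below uses only standard .NET types. PdfOutputChecks and its methods are hypothetical helpers, and the check relies on the general fact that PDF files begin with the ASCII marker %PDF-. It is a quick guard, not a full validation of the document.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfOutputChecks
{
    // PDF files start with the ASCII marker "%PDF-" followed by a version number.
    private static readonly byte[] PdfMarker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the buffer is long enough and begins with the PDF marker.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null || data.Length < PdfMarker.Length) return false;
        for (int i = 0; i < PdfMarker.Length; i++)
        {
            if (data[i] != PdfMarker[i]) return false;
        }
        return true;
    }

    // Copies a stream to disk, rewinding first when the stream supports seeking.
    public static void WriteStreamToFile(Stream source, string path)
    {
        if (source.CanSeek) source.Position = 0;
        using var file = File.Create(path);
        source.CopyTo(file);
    }
}

// Usage sketch with the byte and stream results shown above:
// if (!PdfOutputChecks.LooksLikePdf(pdfBytes)) throw new InvalidDataException("OCR output is not a PDF.");
// PdfOutputChecks.WriteStreamToFile(pdfStream, "searchable-output.pdf");
```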

Performance Considerations

When processing large volumes of documents, consider implementing multithreaded OCR operations to improve throughput. IronOCR supports concurrent processing, allowing you to handle multiple documents simultaneously:

using IronOcr;
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class BatchPdfProcessor
{
    private readonly IronTesseract _ocr;

    public BatchPdfProcessor()
    {
        _ocr = new IronTesseract
        {
            // Language is set on the engine itself, not on Configuration
            Language = OcrLanguage.English,
            Configuration =
            {
                RenderSearchablePdf = true
            }
        };
    }

    public async Task ProcessBatchAsync(string[] filePaths)
    {
        var results = new ConcurrentBag<(string source, string output)>();

        await Parallel.ForEachAsync(filePaths, async (filePath, ct) =>
        {
            using var input = new OcrImageInput(filePath);
            var result = _ocr.Read(input);

            string outputPath = Path.ChangeExtension(filePath, ".searchable.pdf");
            result.SaveAsSearchablePdf(outputPath);

            results.Add((filePath, outputPath));
        });

        Console.WriteLine($"Processed {results.Count} files");
    }
}
Imports IronOcr
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading.Tasks

Public Class BatchPdfProcessor
    Private ReadOnly _ocr As IronTesseract

    Public Sub New()
        ' Language is set on the engine itself, not on Configuration
        _ocr = New IronTesseract With {
            .Language = OcrLanguage.English,
            .Configuration = New TesseractConfiguration With {
                .RenderSearchablePdf = True
            }
        }
    End Sub

    Public Async Function ProcessBatchAsync(filePaths As String()) As Task
        Dim results As New ConcurrentBag(Of (source As String, output As String))()

        Await Task.Run(Sub()
                           Parallel.ForEach(filePaths, Sub(filePath)
                                                           Using input As New OcrImageInput(filePath)
                                                               Dim result = _ocr.Read(input)

                                                               Dim outputPath As String = Path.ChangeExtension(filePath, ".searchable.pdf")
                                                               result.SaveAsSearchablePdf(outputPath)

                                                               results.Add((filePath, outputPath))
                                                           End Using
                                                       End Sub)
                       End Sub)

        Console.WriteLine($"Processed {results.Count} files")
    End Function
End Class
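
The batch class above runs with the runtime's default parallelism, and an exception from a single file would fault the whole call. A possible refinement is to cap the number of concurrent OCR jobs and record failures per file. The sketch below shows that pattern using only standard .NET types. BoundedBatch is a hypothetical helper, and the convert delegate is a stand-in for the Read and SaveAsSearchablePdf step, assumed to return the output path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedBatch
{
    // Runs `convert` over every path with a capped degree of parallelism.
    // `convert` is a placeholder for the OCR step and returns the output path.
    public static (IReadOnlyCollection<string> Outputs, IReadOnlyCollection<(string Path, Exception Error)> Failures)
        Run(IEnumerable<string> filePaths, Func<string, string> convert, int maxParallelism)
    {
        var outputs = new ConcurrentBag<string>();
        var failures = new ConcurrentBag<(string Path, Exception Error)>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(filePaths, options, path =>
        {
            try
            {
                outputs.Add(convert(path));
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the remaining files.
                failures.Add((path, ex));
            }
        });

        return (outputs, failures);
    }
}

// Usage sketch, assuming ConvertOne wraps Read + SaveAsSearchablePdf for a single file:
// var (outputs, failures) = BoundedBatch.Run(filePaths, ConvertOne, maxParallelism: 4);
```

OCR is CPU-heavy, so a cap near the number of physical cores is a reasonable starting point; the best value depends on document size and available memory.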

Advanced Configuration Options

For more advanced scenarios, you can leverage detailed Tesseract configuration to fine-tune the OCR engine for specific document types or languages:

using IronOcr;
using System.Collections.Generic;

var advancedOcr = new IronTesseract
{
    Configuration = 
    {
        RenderSearchablePdf = true,
        TesseractVariables = new Dictionary<string, object>
        {
            { "preserve_interword_spaces", 1 },
            { "tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" }
        },
        PageSegmentationMode = TesseractPageSegmentationMode.SingleColumn
    },
    Language = OcrLanguage.EnglishBest
};
Imports IronOcr
Imports System.Collections.Generic

Dim advancedOcr As New IronTesseract With {
    .Configuration = New TesseractConfiguration With {
        .RenderSearchablePdf = True,
        .TesseractVariables = New Dictionary(Of String, Object) From {
            {"preserve_interword_spaces", 1},
            {"tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
        },
        .PageSegmentationMode = TesseractPageSegmentationMode.SingleColumn
    },
    .Language = OcrLanguage.EnglishBest
}

These configuration options apply equally to all three output methods: SaveAsSearchablePdf, SaveAsSearchablePdfBytes, and SaveAsSearchablePdfStream. The Summary below recaps the available read methods and output formats.

Summary

Creating searchable PDFs with IronOCR is straightforward and flexible. Whether you need to process single images, multi-page documents, photos via ReadPhoto, or advanced document scans via ReadDocumentAdvanced, the library provides robust methods for generating searchable PDFs in various formats. Use the ModelType parameter to choose between the standard and enhanced ML models for accuracy. The ability to export as files, bytes, or streams makes it adaptable to any application architecture, from desktop applications to cloud-based services.

For more advanced OCR scenarios, explore the comprehensive code examples or refer to the API documentation for detailed method signatures and options.

Frequently Asked Questions

How do I create a searchable PDF from scanned images in C#?

IronOCR makes it simple to create searchable PDFs from scanned images. Just set RenderSearchablePdf to true in the configuration, use the Read() method on your input image, and call SaveAsSearchablePdf() with your desired output path. IronOCR will perform OCR on the image and generate a PDF with selectable, searchable text overlaid on the original image.

What file formats can be converted to searchable PDFs?

IronOCR can convert various image formats including JPG, PNG, TIFF, and existing PDF documents into searchable PDFs. The library supports both single-page images and multi-page documents like TIFF files, automatically processing all pages and maintaining proper page ordering in the output searchable PDF.

Can I export searchable PDFs as byte arrays or streams instead of files?

Yes, IronOCR supports exporting searchable PDFs in multiple formats. Besides saving directly to a file using SaveAsSearchablePdf(), you can also export the OCR results as byte arrays or streams, making it easy to integrate with web applications, cloud storage, or database systems without creating temporary files.

What is the minimum code required to create a searchable PDF?

Creating a searchable PDF with IronOCR can be done in just one line of code: new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } }.Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf"). This demonstrates IronOCR's streamlined API design.

How does the invisible text layer work in searchable PDFs?

IronOCR automatically handles the positioning of recognized text as an invisible layer on top of the original image in the PDF. This ensures accurate text-to-image mapping, allowing users to select and search text while maintaining the visual appearance of the original document. The library uses specialized fonts and positioning algorithms to achieve this.

Can I create searchable PDFs from photos or screenshots?

Yes, SaveAsSearchablePdf is supported on results from ReadPhoto, ReadScreenShot, and ReadDocumentAdvanced. Each method returns a result type that supports searchable PDF export, making it easy to convert real-world photos, screenshots, or complex document scans into searchable PDFs.

What does the ModelType parameter do?

The ModelType parameter controls which pre-trained ML model is used for OCR. Normal is the default and processes images resized to 960 pixels for fast results. Enhanced supports images up to 2560 pixels, retaining finer detail and improving accuracy for high-resolution inputs.

Why do copied or searched characters appear corrupted in my searchable PDF?

This happens because the default font (Times New Roman) used in the searchable text layer does not fully support all Unicode characters. To fix this, pass a Unicode-compatible font file as the third parameter of SaveAsSearchablePdf. If your documents were originally typeset in Times New Roman and you notice spacing inconsistencies with other fonts, try Liberation Serif as it shares the same glyph metrics and preserves the original layout.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Reviewed by
Jeff Fritz
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops and presentations and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.