How to Save Results as a Searchable PDF in C#

Save Searchable PDFs in C# with IronOCR

IronOCR enables C# developers to convert scanned documents and images into searchable PDFs using OCR technology, supporting output as files, bytes, or streams with just a few lines of code.

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, recognizing the text in the images, and converting it into selectable and searchable text.

IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes, and streams. This capability is particularly useful when working with scanned documents, digitizing paper archives, or making legacy PDFs searchable for better document management.

Quickstart: Export Searchable PDF in One Line

Set RenderSearchablePdf = true, run Read(...) on your input, and invoke SaveAsSearchablePdf(...) — that's all it takes to generate a fully searchable PDF with IronOCR.

Get started making PDFs with NuGet now:

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr

  2. Copy and run this code snippet.

    new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } }.Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf");

  3. Deploy to test on your live environment



How Do I Export OCR Results as a Searchable PDF?

Here's how you can export the result as a searchable PDF using IronOCR. You must first set the Configuration.RenderSearchablePdf property to true. After obtaining the OCR result object from the Read method, use the SaveAsSearchablePdf method by specifying the output file path. The code below demonstrates using a sample TIFF file.

using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;

// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");

When working with multi-page TIFF files or complex documents, IronOCR automatically processes all pages and includes them in the searchable PDF output. The library handles page ordering and text overlay positioning automatically, ensuring accurate text-to-image mapping.

Below is a screenshot of the sample TIFF and an embedded searchable PDF. Attempt to select the text in the PDF to confirm its searchability. The ability to select also means the text can be searched in a PDF viewer.

IronOCR uses a particular font to overlay text on the image file, which might result in some discrepancies in text size.

Page from Harry Potter book showing Chapter Eight 'The Deathday Party' with text about Harry meeting Nearly Headless Nick

Working with Multi-Page Documents

When dealing with PDF OCR operations on multi-page documents, IronOCR processes each page sequentially and maintains the original document structure. Here's an example of converting a multi-page scanned PDF into a searchable PDF:

using IronOcr;

// Initialize IronTesseract with configuration
var ocrTesseract = new IronTesseract
{
    Configuration = 
    {
        RenderSearchablePdf = true,
        PageSegmentationMode = TesseractPageSegmentationMode.Auto
    }
};

// Load a multi-page PDF
using var pdfInput = new OcrPdfInput("multi-page-scan.pdf");

// Optionally specify page range (e.g., pages 1-10)
pdfInput.SelectPages(1, 10);

// Perform OCR on the selected pages
OcrResult result = ocrTesseract.Read(pdfInput);

// Save as searchable PDF
result.SaveAsSearchablePdf("searchable-multi-page.pdf");

// Display total pages processed
Console.WriteLine($"Processed {result.Pages.Length} pages");

How Can I Apply Filters When Creating Searchable PDFs?

SaveAsSearchablePdf also accepts a boolean as its second parameter, which controls whether filters are applied to the searchable PDF. Image optimization filters can significantly improve OCR accuracy, especially on low-quality scans.

The example below applies the grayscale filter and then passes true as the second argument of SaveAsSearchablePdf so that the saved PDF has the filter applied.

using IronOcr;

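var ocr = new IronTesseract();

// Enable searchable PDF rendering before reading, as described above
ocr.Configuration.RenderSearchablePdf = true;

var ocrInput = new OcrInput();

// Load a PDF file
ocrInput.LoadPdf("invoice.pdf");

// Apply the grayscale filter
ocrInput.ToGrayScale();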
OcrResult result = ocr.Read(ocrInput);

// Save the result as a searchable PDF with filters applied
result.SaveAsSearchablePdf("outputGrayscale.pdf", true);
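
To see what the second argument changes, one option is to save the same OCR result twice, once with each value, and compare the two files in a PDF viewer. The sketch below reuses only calls already shown in this article. The output file names are placeholders, and it assumes that the same result object can be saved more than once.

```csharp
using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.RenderSearchablePdf = true;

using var ocrInput = new OcrInput();
ocrInput.LoadPdf("invoice.pdf");   // same sample input as in the example above
ocrInput.ToGrayScale();

OcrResult result = ocr.Read(ocrInput);

// Placeholder output names, chosen for this comparison only.
// Assumption: calling SaveAsSearchablePdf twice on one result is allowed.
result.SaveAsSearchablePdf("invoice-filtered.pdf", true);    // filters applied
result.SaveAsSearchablePdf("invoice-unfiltered.pdf", false); // filters not applied

// Report the file sizes as a quick check that two distinct outputs were written
foreach (string path in new[] { "invoice-filtered.pdf", "invoice-unfiltered.pdf" })
{
    Console.WriteLine($"{path}: {new FileInfo(path).Length} bytes");
}
```

Opening both files side by side should make it clear whether the page images in the saved PDF reflect the grayscale filter or the original scan.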

For optimal results, consider using the Filter Wizard to automatically determine the best combination of filters for your specific document type. This tool analyzes your input and suggests appropriate preprocessing steps.


How Do I Export Searchable PDFs as Bytes or Streams?

The searchable PDF can also be returned as a byte array or a stream using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.

// ocrResult is the OcrResult returned by Read() in the first example

// Export the searchable PDF as a byte array
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Export the searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();
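
When the PDF travels as bytes or a stream rather than a file path, it can help to sanity-check and persist it with plain .NET I/O. The sketch below is not part of IronOCR: PdfOutputHelper, LooksLikePdf, and WriteStreamToFileAsync are hypothetical names. It relies only on the fact that a PDF file begins with the ASCII marker %PDF-.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper (not an IronOCR type) for handling PDF output
// obtained from SaveAsSearchablePdfBytes() or SaveAsSearchablePdfStream().
public static class PdfOutputHelper
{
    // A PDF file starts with the ASCII marker "%PDF-".
    private static readonly byte[] PdfMarker = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the buffer begins with the PDF header marker.
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null || data.Length < PdfMarker.Length)
        {
            return false;
        }

        for (int i = 0; i < PdfMarker.Length; i++)
        {
            if (data[i] != PdfMarker[i])
            {
                return false;
            }
        }

        return true;
    }

    // Copies a PDF stream to disk without loading the whole file into memory.
    public static async Task WriteStreamToFileAsync(Stream pdfStream, string outputPath)
    {
        // Rewind if possible, in case the producer left the position at the end.
        if (pdfStream.CanSeek)
        {
            pdfStream.Position = 0;
        }

        await using FileStream file = File.Create(outputPath);
        await pdfStream.CopyToAsync(file);
    }
}
```

With the variables from the snippet above, a call such as PdfOutputHelper.LooksLikePdf(pdfByte) gives a cheap check before the bytes are stored or returned, and WriteStreamToFileAsync(pdfStream, "copy.pdf") writes the stream to a file of your choosing.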

These output options are particularly useful when integrating with cloud storage services, databases, or web applications where file system access may be limited. Here's an extended example showing practical applications:

using IronOcr;
using System.IO;
using System.Threading.Tasks;

public class SearchablePdfExporter
{
    public async Task ProcessAndUploadPdf(string inputPath)
    {
        var ocr = new IronTesseract
        {
            Configuration = { RenderSearchablePdf = true }
        };

        // Process the input
        using var input = new OcrImageInput(inputPath);
        var result = ocr.Read(input);

        // Option 1: Save to database as byte array
        byte[] pdfBytes = result.SaveAsSearchablePdfBytes();
        // Store pdfBytes in database BLOB field

        // Option 2: Upload to cloud storage using stream
        using (Stream pdfStream = result.SaveAsSearchablePdfStream())
        {
            // Upload stream to Azure Blob Storage, AWS S3, etc.
            await UploadToCloudStorage(pdfStream, "searchable-output.pdf");
        }

        // Option 3: Return as web response
        // return File(pdfBytes, "application/pdf", "searchable.pdf");
    }

    private Task UploadToCloudStorage(Stream stream, string fileName)
    {
        // Placeholder: replace with your cloud upload implementation
        return Task.CompletedTask;
    }
}

Performance Considerations

When processing large volumes of documents, consider implementing multithreaded OCR operations to improve throughput. IronOCR supports concurrent processing, allowing you to handle multiple documents simultaneously:

using IronOcr;
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class BatchPdfProcessor
{
    private readonly IronTesseract _ocr;

    public BatchPdfProcessor()
    {
        _ocr = new IronTesseract
        {
            Configuration = 
            {
                RenderSearchablePdf = true
            },
            // Language is set on IronTesseract itself, as in the advanced example
            Language = OcrLanguage.English
        };
    }

    public async Task ProcessBatchAsync(string[] filePaths)
    {
        var results = new ConcurrentBag<(string source, string output)>();

        await Parallel.ForEachAsync(filePaths, (filePath, ct) =>
        {
            using var input = new OcrImageInput(filePath);
            var result = _ocr.Read(input);

            // "scan.png" becomes "scan.searchable.pdf"
            string outputPath = Path.ChangeExtension(filePath, ".searchable.pdf");
            result.SaveAsSearchablePdf(outputPath);

            results.Add((filePath, outputPath));

            // The work above is synchronous, so return a completed ValueTask
            return ValueTask.CompletedTask;
        });

        Console.WriteLine($"Processed {results.Count} files");
    }
}
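
OCR is CPU-intensive, so letting every file run at once may not improve throughput. A minimal sketch of one way to cap concurrency and keep a single failed scan from aborting the batch, using the standard ParallelOptions.MaxDegreeOfParallelism setting. BoundedBatchRunner, RunAsync, and the processFile delegate are hypothetical names, not IronOCR types. In practice the delegate would wrap the Read and SaveAsSearchablePdf calls shown above and return the output path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical batch runner (not an IronOCR type). The OCR work is passed
// in as a delegate so the concurrency logic stays independent of the library.
public static class BoundedBatchRunner
{
    // Runs processFile over every path with at most maxParallel items in flight.
    // maxParallel must be a positive number, or -1 for no explicit limit.
    // Successes map input path -> output path; failures map input path -> exception.
    public static async Task<(IReadOnlyDictionary<string, string> Succeeded,
                              IReadOnlyDictionary<string, Exception> Failed)>
        RunAsync(IEnumerable<string> filePaths,
                 Func<string, string> processFile,
                 int maxParallel,
                 CancellationToken cancellationToken = default)
    {
        var succeeded = new ConcurrentDictionary<string, string>();
        var failed = new ConcurrentDictionary<string, Exception>();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            CancellationToken = cancellationToken
        };

        await Parallel.ForEachAsync(filePaths, options, (path, ct) =>
        {
            try
            {
                succeeded[path] = processFile(path);
            }
            catch (Exception ex)
            {
                // Record the failure and let the remaining files continue
                failed[path] = ex;
            }

            return ValueTask.CompletedTask;
        });

        return (succeeded, failed);
    }
}
```

A reasonable starting value for maxParallel is Environment.ProcessorCount, adjusted after measuring on representative documents.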

Advanced Configuration Options

For more advanced scenarios, you can leverage detailed Tesseract configuration to fine-tune the OCR engine for specific document types or languages:

using System.Collections.Generic;
using IronOcr;

var advancedOcr = new IronTesseract
{
    Configuration = 
    {
        RenderSearchablePdf = true,
        TesseractVariables = new Dictionary<string, object>
        {
            { "preserve_interword_spaces", 1 },
            { "tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" }
        },
        PageSegmentationMode = TesseractPageSegmentationMode.SingleColumn
    },
    Language = OcrLanguage.EnglishBest
};

Summary

Creating searchable PDFs with IronOCR is straightforward and flexible. Whether you need to process single images, multi-page documents, or batch operations, the library provides robust methods for generating searchable PDFs in various formats. The ability to export as files, bytes, or streams makes it adaptable to any application architecture, from desktop applications to cloud-based services.

For more advanced OCR scenarios, explore the comprehensive code examples or refer to the API documentation for detailed method signatures and options.

Frequently Asked Questions

How do I create a searchable PDF from scanned images in C#?

IronOCR makes it simple to create searchable PDFs from scanned images. Just set RenderSearchablePdf to true in the configuration, use the Read() method on your input image, and call SaveAsSearchablePdf() with your desired output path. IronOCR will perform OCR on the image and generate a PDF with selectable, searchable text overlaid on the original image.

What file formats can be converted to searchable PDFs?

IronOCR can convert various image formats including JPG, PNG, TIFF, and existing PDF documents into searchable PDFs. The library supports both single-page images and multi-page documents like TIFF files, automatically processing all pages and maintaining proper page ordering in the output searchable PDF.

Can I export searchable PDFs as byte arrays or streams instead of files?

Yes, IronOCR supports exporting searchable PDFs in multiple formats. Besides saving directly to a file using SaveAsSearchablePdf(), you can also export the OCR results as byte arrays or streams, making it easy to integrate with web applications, cloud storage, or database systems without creating temporary files.

What is the minimum code required to create a searchable PDF?

Creating a searchable PDF with IronOCR can be done in just one line of code: new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } }.Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf"). This demonstrates IronOCR's streamlined API design.

How does the text overlay work in searchable PDFs?

IronOCR automatically handles the positioning of recognized text as an invisible overlay on top of the original image in the PDF. This ensures accurate text-to-image mapping, allowing users to select and search text while maintaining the visual appearance of the original document. The library uses specialized fonts and positioning algorithms to achieve this.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.
Reviewed by
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly where he talks tech and writes code together with viewers. Jeff writes workshops, presentations, and plans content for the largest Microsoft developer events including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.