How to Save Results as a Searchable PDF

A searchable PDF, often called an OCR (Optical Character Recognition) PDF, contains both a scanned image and machine-readable text. It is produced by running OCR on a scanned paper document or image: the text recognized in the image is added to the PDF alongside it, so the text can be selected and searched.

IronOCR performs optical character recognition on documents and can export the results as searchable PDFs, either saved to a file or returned as a byte array or a stream.


Export as Searchable PDF Example

To export an OCR result as a searchable PDF, first set the Configuration.RenderSearchablePdf property to true. Then pass your input to the Read method and call SaveAsSearchablePdf on the returned OcrResult, specifying the output file path. The code below uses a sample TIFF file.

using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;

// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");
The same example in VB.NET:

Imports IronOcr

Module Program
    Sub Main()
        ' Instantiate IronTesseract
        Dim ocrTesseract As New IronTesseract()

        ' Enable render as searchable PDF
        ocrTesseract.Configuration.RenderSearchablePdf = True

        ' Add image; Using disposes the input afterwards, like "using var" in C#
        Using imageInput As New OcrImageInput("Potter.tiff")
            ' Perform OCR
            Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

            ' Export as searchable PDF
            ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
        End Using
    End Sub
End Module

The screenshot below compares the sample TIFF with the exported searchable PDF. Try selecting text in the PDF to confirm that it is searchable: text that can be selected can also be found with the PDF viewer's search function.

Please note
IronOCR overlays the recognized text on the image using a particular font, so the size of the overlaid text may differ slightly from the text in the image.

[Screenshot: the sample TIFF file and the exported searchable PDF]

Searchable PDF as Byte and Stream

Instead of writing a file, you can obtain the searchable PDF as a byte array with SaveAsSearchablePdfBytes or as a Stream with SaveAsSearchablePdfStream. Both methods are called on the OcrResult returned by Read, as in the previous example. The code below shows how to use them.

using System.IO; // for Stream

// ocrResult is the OcrResult returned by ocrTesseract.Read in the previous example

// Export searchable PDF as a byte array
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Export searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();
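The same in VB.NET:

Imports System.IO ' for Stream

' ocrResult is the OcrResult returned by ocrTesseract.Read in the previous example

' Export searchable PDF as a byte array
Dim pdfByte() As Byte = ocrResult.SaveAsSearchablePdfBytes()

' Export searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()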
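A common next step is to write the byte array or stream to disk. The sketch below shows one way to do that using only standard .NET APIs (File.WriteAllBytes, Stream.CopyTo). WritePdfBytes, WritePdfStream, and LooksLikePdf are hypothetical helpers, not IronOCR methods, and sampleBytes is placeholder data so the sketch runs without IronOCR; in your project, pass pdfByte and pdfStream from the example above instead. Because this guide does not state where the returned stream is positioned, the stream helper rewinds it before copying whenever the stream supports seeking.

```csharp
using System;
using System.IO;

// Hypothetical helpers (not part of IronOCR) for consuming the outputs of
// SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream.

// Write the PDF byte array to disk.
static void WritePdfBytes(byte[] pdfBytes, string path) =>
    File.WriteAllBytes(path, pdfBytes);

// Copy the PDF stream to disk. Rewinding is a precaution: this sketch
// does not assume where the returned stream is positioned.
static void WritePdfStream(Stream pdfStream, string path)
{
    if (pdfStream.CanSeek)
    {
        pdfStream.Position = 0;
    }
    using FileStream file = File.Create(path);
    pdfStream.CopyTo(file);
}

// Sanity check: a PDF file starts with the ASCII header "%PDF-".
static bool LooksLikePdf(byte[] data)
{
    byte[] header = { (byte)'%', (byte)'P', (byte)'D', (byte)'F', (byte)'-' };
    if (data.Length < header.Length) return false;
    for (int i = 0; i < header.Length; i++)
    {
        if (data[i] != header[i]) return false;
    }
    return true;
}

// Placeholder bytes (not a real PDF) so the sketch runs without IronOCR.
byte[] sampleBytes = System.Text.Encoding.ASCII.GetBytes("%PDF-1.7\n% placeholder\n");
string bytesPath = Path.Combine(Path.GetTempPath(), "searchable-from-bytes.pdf");
string streamPath = Path.Combine(Path.GetTempPath(), "searchable-from-stream.pdf");

WritePdfBytes(sampleBytes, bytesPath);
using (var sampleStream = new MemoryStream(sampleBytes))
{
    sampleStream.Position = sampleBytes.Length; // simulate a stream left at its end
    WritePdfStream(sampleStream, streamPath);
}

Console.WriteLine(LooksLikePdf(File.ReadAllBytes(bytesPath)));  // True
Console.WriteLine(LooksLikePdf(File.ReadAllBytes(streamPath))); // True
```

The %PDF- check only confirms the file header; it does not prove that the PDF contains a text layer. To confirm searchability, open the file in a PDF viewer and try selecting text, as described earlier.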

Frequently Asked Questions

What is a searchable PDF?

A searchable PDF is a type of PDF document that contains both scanned images and machine-readable text, created by performing OCR on scanned documents or images.

How can I create a searchable PDF?

To create a searchable PDF using IronOCR, set the Configuration.RenderSearchablePdf property to true, perform OCR using the Read method, and then use SaveAsSearchablePdf to specify the output file path.

What file formats can I use to create searchable PDFs?

The example in this guide uses a TIFF file, but IronOCR also accepts other common image formats, such as PNG and JPEG, as input for creating searchable PDFs.

Can the software export searchable PDFs as bytes or streams?

Yes, IronOCR can export searchable PDFs as bytes using SaveAsSearchablePdfBytes and as streams using SaveAsSearchablePdfStream.

Is it possible to select and search text in a searchable PDF created by the software?

Yes, a searchable PDF created by IronOCR allows you to select and search text in a PDF viewer.

Where can I download the software?

IronOCR is available as the IronOcr package on NuGet.

What programming language is used for creating searchable PDFs?

The examples in this guide are written in C# and VB.NET. IronOCR is a .NET library, so either language can be used to create searchable PDFs.

Does the software use a specific font for overlay text in searchable PDFs?

Yes, IronOCR uses a specific font to overlay text on the image file, which might result in some discrepancies in text size.

Chaknith
Software Engineer
Chaknith is the Sherlock Holmes of developers. It first occurred to him that he might have a future in software engineering when he was doing code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith leverages his knowledge from talking directly with customers to help further improve the products themselves. His anecdotal feedback goes beyond Jira tickets and supports product development, documentation, and marketing, improving customers’ overall experience. When he isn’t in the office, he can be found learning about machine learning, coding, and hiking.
Reviewed by
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops, presentations, and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.