How to Save Results as a Searchable PDF

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, recognizing the text in the images, and converting it into selectable and searchable text.

IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes, and streams.

Export as Searchable PDF Example

To export an OCR result as a searchable PDF with IronOCR, first set the Configuration.RenderSearchablePdf property to true. Then call the Read method to obtain the OCR result object, and call SaveAsSearchablePdf on it with the output file path. The code below demonstrates this with a sample TIFF file.

C#:
using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;

// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");

VB.NET:
Imports IronOcr

' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()

' Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = True

' Add image
Dim imageInput = New OcrImageInput("Potter.tiff")
' Perform OCR
Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

' Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")

Below is a screenshot of the sample TIFF and an embedded searchable PDF. Try selecting the text in the PDF to confirm that it is searchable: text that can be selected can also be searched in a PDF viewer.

Please note: IronOCR uses a particular font to overlay text on the image file, which might result in some discrepancies in text size.

TIFF file

Searchable PDF as Byte and Stream

The searchable PDF can also be returned as a byte array or a stream, using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.

C#:
// Continues from the previous example (requires: using System.IO;)

// Export the searchable PDF as a byte array
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Export the searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

VB.NET:
' Continues from the previous example (requires: Imports System.IO)

' Export the searchable PDF as a byte array
Dim pdfByte() As Byte = ocrResult.SaveAsSearchablePdfBytes()

' Export the searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()
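Once you have the PDF as bytes or as a stream, a common next step is to persist it or hand it to other code. The sketch below is one possible way to do that using only System.IO. The class SearchablePdfOutput and its methods SaveBytes, SaveStream, and LooksLikePdf are hypothetical helpers written for this illustration and are not part of IronOCR. The source does not state whether the returned stream is positioned at its start, so the sketch rewinds it when the stream is seekable.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration only; not part of IronOCR.
public static class SearchablePdfOutput
{
    // Writes an in-memory PDF (for example, the array returned by
    // SaveAsSearchablePdfBytes) to disk in a single call.
    public static void SaveBytes(byte[] pdfBytes, string path)
    {
        File.WriteAllBytes(path, pdfBytes);
    }

    // Copies a PDF stream (for example, the one returned by
    // SaveAsSearchablePdfStream) to a file on disk.
    public static void SaveStream(Stream pdfStream, string path)
    {
        // Assumption: the stream position may not be at the start,
        // so rewind it when the stream supports seeking.
        if (pdfStream.CanSeek)
        {
            pdfStream.Position = 0;
        }

        using FileStream file = File.Create(path);
        pdfStream.CopyTo(file);
    }

    // Cheap sanity check: PDF files normally begin with the ASCII marker "%PDF-".
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] marker = Encoding.ASCII.GetBytes("%PDF-");
        if (data.Length < marker.Length)
        {
            return false;
        }

        for (int i = 0; i < marker.Length; i++)
        {
            if (data[i] != marker[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

With the variables from the example above, this could be called as SearchablePdfOutput.SaveBytes(pdfByte, "fromBytes.pdf") and SearchablePdfOutput.SaveStream(pdfStream, "fromStream.pdf"). The byte array is convenient when the whole document must be in memory anyway (for example, to store it in a database column), while the stream form avoids an extra copy when the data is passed straight to another writer.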

Frequently Asked Questions

What is a searchable PDF?

A searchable PDF is a document that combines scanned images with machine-readable text, created by performing OCR on scanned documents or images, allowing users to select and search text within the document.

How can I convert scanned documents into searchable PDFs in C#?

To convert scanned documents into searchable PDFs in C#, use IronOCR by setting the Configuration.RenderSearchablePdf property to true, executing OCR with the Read method, and then saving the output using SaveAsSearchablePdf.

Can I export searchable PDFs as bytes or streams?

Yes, IronOCR allows exporting searchable PDFs as bytes using SaveAsSearchablePdfBytes and as streams using SaveAsSearchablePdfStream.

How do I handle different file formats for OCR processing?

IronOCR supports various file formats such as TIFF for OCR processing, enabling the creation of searchable PDFs from these formats.

Is it possible to select and search text in a searchable PDF created using OCR technology?

Yes, searchable PDFs created using IronOCR technology allow text selection and searchability within a PDF viewer.

What steps are involved in creating a searchable PDF using OCR?

The steps are: download the IronOCR C# library, prepare the documents for OCR, set the RenderSearchablePdf property to true, run OCR with the Read method, and save the result with the SaveAsSearchablePdf method.

How can I ensure that the text in my PDF is searchable?

Set the RenderSearchablePdf property to true before performing OCR on your images with IronOCR, then save the result with SaveAsSearchablePdf. You can confirm the result by selecting text in a PDF viewer.

Does IronOCR use a specific font for overlay text in searchable PDFs?

Yes, IronOCR uses a specific font for overlay text on image files, which may result in some discrepancies in text size.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.
Reviewed by
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops and presentations and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.