How to Save Results as a Searchable PDF

by Chaknith Bin

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by running OCR on scanned paper documents or images: OCR recognizes the text in the images and converts it into a selectable, searchable text layer.

IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes and streams.

How to Save Results as a Searchable PDF

Download a C# library to save results as a searchable PDF
Prepare the image or PDF document for OCR
Set the RenderSearchablePdfsAndHocr property to true
Use the SaveAsSearchablePdf method to output a searchable PDF file
Export the searchable PDF as bytes and a stream


Export as Searchable PDF Example

To export the result as a searchable PDF, first set the Configuration.RenderSearchablePdfsAndHocr property to true. Then, after obtaining the OcrResult object from the Read method, call the SaveAsSearchablePdf method with the output file path. The code below demonstrates this using the following sample TIFF file.


using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = true;

// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");

Imports IronOcr

Module Program
    Sub Main()
        ' Instantiate IronTesseract
        Dim ocrTesseract As New IronTesseract()

        ' Enable render as searchable PDF
        ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = True

        ' Add image (disposed automatically at End Using)
        Using imageInput As New OcrImageInput("Potter.tiff")
            ' Perform OCR
            Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

            ' Export as searchable PDF
            ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
        End Using
    End Sub
End Module


Below is a screenshot of the sample TIFF and the resulting embedded searchable PDF. Try selecting text in the searchable PDF to verify that it is selectable; selectable text is also what makes the document searchable in PDF viewer software.

Please note

IronOCR overlays the recognized text on the image using a particular font. As a result, in some cases the size of the selectable text may not exactly match the size of the text in the image.

TIFF file

Searchable PDF
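The steps above also list PDF documents as a possible input. The following is a minimal sketch of the same workflow with a scanned PDF as the source instead of a TIFF. It assumes the OcrPdfInput class found in recent IronOCR versions (it is not shown elsewhere in this article) and uses a hypothetical input file named scanned.pdf; the property name follows this article and should be checked against your IronOCR version.

```csharp
using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable render as searchable PDF (property name as given in this article)
ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = true;

// Load a scanned PDF instead of an image.
// Assumption: OcrPdfInput is available; "scanned.pdf" is a hypothetical file name.
using var pdfInput = new OcrPdfInput("scanned.pdf");

// Perform OCR on the PDF input
OcrResult ocrResult = ocrTesseract.Read(pdfInput);

// Export the OCR result as a new searchable PDF
ocrResult.SaveAsSearchablePdf("searchableScanned.pdf");
```

Only the input class changes; the configuration and the SaveAsSearchablePdf call are the same as in the TIFF example.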

Searchable PDF as Byte and Stream

The searchable PDF can also be exported as a byte array or a stream using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.
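A minimal sketch of the byte and stream exports, reusing the Potter.tiff sample from the first example. It assumes parameterless overloads of SaveAsSearchablePdfBytes (returning byte[]) and SaveAsSearchablePdfStream (returning a Stream); the output file names are hypothetical and only serve to check the results.

```csharp
using System.IO;
using IronOcr;

// Instantiate IronTesseract and enable searchable PDF rendering
// (property name as given in this article)
IronTesseract ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = true;

// Perform OCR on the sample TIFF
using var imageInput = new OcrImageInput("Potter.tiff");
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export the searchable PDF as a byte array
// (assumed signature: no parameters, returns byte[])
byte[] pdfBytes = ocrResult.SaveAsSearchablePdfBytes();

// Export the searchable PDF as a stream
// (assumed signature: no parameters, returns a Stream)
using Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

// Usage: write both outputs to disk (hypothetical file names)
File.WriteAllBytes("searchablePdfFromBytes.pdf", pdfBytes);

// Rewind if possible, in case the stream is not positioned at its start (assumption)
if (pdfStream.CanSeek)
{
    pdfStream.Position = 0;
}

using (FileStream fileStream = File.Create("searchablePdfFromStream.pdf"))
{
    pdfStream.CopyTo(fileStream);
}
```

The byte array suits cases such as storing the PDF in a database or returning it from a web API, while the stream can be passed directly to code that consumes a Stream without writing an intermediate file.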
