How to Save Results as a Searchable PDF

by Chaknith Bin

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, which recognizes the text in the images and converts it into selectable and searchable text.

IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes and streams.


Install IronOCR with NuGet:

Install-Package IronOcr

Alternatively, download the DLL and install it manually into your project.

Export as Searchable PDF Example

To export the result as a searchable PDF, first set the Configuration.RenderSearchablePdfsAndHocr property to true. After obtaining the OCR result object from the Read method, call its SaveAsSearchablePdf method with the output file path. The code below demonstrates this with a sample TIFF file, Potter.tiff.

C#:

using IronOcr;

// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();

// Enable rendering of searchable PDF output (set this before calling Read)
ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = true;

// Load the sample image; the input is disposed at the end of the enclosing scope
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");

VB:

Imports IronOcr

' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()

' Enable rendering of searchable PDF output (set this before calling Read)
ocrTesseract.Configuration.RenderSearchablePdfsAndHocr = True

' Load the sample image
Dim imageInput = New OcrImageInput("Potter.tiff")
' Perform OCR
Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

' Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")

Below is a screenshot of the sample TIFF and an embedded searchable PDF. Try selecting text in the PDF to confirm that it is selectable. Selectable text is also what allows PDF viewer software to search the document.

Please note
IronOCR overlays the recognized text on the image using a particular font. As a result, the size of the selected text may not always match the size of the text in the image.

TIFF file

Searchable PDF as Byte and Stream

The searchable PDF can also be returned as a byte array or as a stream by using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.

C#:

using System.IO;

// ocrResult is the OcrResult obtained in the previous example

// Export searchable PDF as a byte array
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Export searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

VB:

Imports System.IO

' ocrResult is the OcrResult obtained in the previous example

' Export searchable PDF as a byte array
Dim pdfByte() As Byte = ocrResult.SaveAsSearchablePdfBytes()

' Export searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()
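Once the PDF is available as bytes or as a stream, it can be handled like any other binary data. The following is a minimal sketch of one way to persist either form to disk using only System.IO. The class and method names (SearchablePdfOutput, SaveBytes, SaveStream, LooksLikePdf) are hypothetical helpers introduced for illustration and are not part of IronOCR. The sketch also assumes the caller owns the stream and disposes it afterwards.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration; not part of IronOCR.
public static class SearchablePdfOutput
{
    // Writes a PDF held in memory to the given path, overwriting any existing file.
    public static void SaveBytes(byte[] pdfBytes, string path) =>
        File.WriteAllBytes(path, pdfBytes);

    // Copies a PDF stream to the given path.
    // Rewinds the stream first when it supports seeking, so the whole
    // document is copied even if the stream was read earlier.
    public static void SaveStream(Stream pdfStream, string path)
    {
        if (pdfStream.CanSeek)
        {
            pdfStream.Position = 0;
        }

        using FileStream file = File.Create(path);
        pdfStream.CopyTo(file);
    }

    // Cheap sanity check: looks for the "%PDF-" signature at the start of the data.
    // This does not validate the document or confirm that it contains a text layer.
    public static bool LooksLikePdf(byte[] data) =>
        data.Length >= 5 && Encoding.ASCII.GetString(data, 0, 5) == "%PDF-";
}
```

With the variables from the example above, a call such as SearchablePdfOutput.SaveBytes(pdfByte, "fromBytes.pdf") or SearchablePdfOutput.SaveStream(pdfStream, "fromStream.pdf") would write the document to disk. The byte array is convenient for database storage or HTTP responses, while the stream form avoids an extra copy when the destination already accepts a Stream.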
