How to Save Results as a Searchable PDF
A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, which recognizes the text in the images and converts it into selectable and searchable text.
IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes and streams.
Check out IronOCR on NuGet for quick installation and deployment. With over 8 million downloads, it's transforming OCR with C#.
Install-Package IronOcr
Alternatively, download the IronOCR DLL and install it manually in your project or the GAC: IronOcr.zip
Export as Searchable PDF Example
To export the result as a searchable PDF, first set the Configuration.RenderSearchablePdf property to true. After obtaining the OCR result object from the Read method, call the SaveAsSearchablePdf method with the output file path. The code below demonstrates this using a sample TIFF file.
using IronOcr;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;
// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");
Imports IronOcr
' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()
' Enable render as searchable PDF
ocrTesseract.Configuration.RenderSearchablePdf = True
' Add image
Dim imageInput = New OcrImageInput("Potter.tiff")
' Perform OCR
Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
' Export as searchable PDF
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
Below is a screenshot of the sample TIFF and an embedded searchable PDF. Try selecting text in the searchable PDF to verify that it is selectable. Selectable text is also what makes searching possible in PDF viewer software.
TIFF file
Searchable PDF
Searchable PDF as Byte and Stream
The searchable PDF can also be output as a byte array or as a stream using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.
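using System.IO;
// ocrResult is the OcrResult obtained from the Read method in the previous example
// Export searchable PDF as a byte array
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();
// Export searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();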
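Imports System.IO
' ocrResult is the OcrResult obtained from the Read method in the previous example
' Export searchable PDF as a byte array
Dim pdfByte() As Byte = ocrResult.SaveAsSearchablePdfBytes()
' Export searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()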
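Once the PDF is available as bytes or as a stream, it usually needs to go somewhere: a file, an HTTP response, or a storage service. The sketch below shows one possible way to persist either form to disk using only System.IO. The SearchablePdfWriter class and its method names are hypothetical helpers written for this illustration and are not part of IronOCR. The stream rewind is a defensive assumption, since the position of the returned stream is not stated here.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronOCR): persists searchable PDF output
// that was obtained as a byte array or as a stream.
public static class SearchablePdfWriter
{
    // Write a PDF held in memory as a byte array to the given path.
    public static void WriteBytes(byte[] pdfBytes, string outputPath)
    {
        if (pdfBytes == null) throw new ArgumentNullException(nameof(pdfBytes));
        File.WriteAllBytes(outputPath, pdfBytes);
    }

    // Copy a PDF stream to the given path.
    public static void WriteStream(Stream pdfStream, string outputPath)
    {
        if (pdfStream == null) throw new ArgumentNullException(nameof(pdfStream));

        // Assumption: the stream may not be positioned at its start,
        // so rewind it when the stream supports seeking.
        if (pdfStream.CanSeek)
        {
            pdfStream.Position = 0;
        }

        using (FileStream file = File.Create(outputPath))
        {
            pdfStream.CopyTo(file);
        }
    }
}
```

With the variables from the example above, this could be called as SearchablePdfWriter.WriteBytes(pdfByte, "fromBytes.pdf") and SearchablePdfWriter.WriteStream(pdfStream, "fromStream.pdf"). The output file names are placeholders.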