How to Save Results as a Searchable PDF
A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, recognizing the text in the images, and converting it into selectable and searchable text.
IronOCR provides a solution for performing optical character recognition on documents and exporting the results as searchable PDFs. It supports exporting searchable PDFs as files, bytes, and streams.
Export as Searchable PDF Example
Here's how you can export the result as a searchable PDF using IronOCR. You must first set the Configuration.RenderSearchablePdf property to true. After obtaining the OCR result object from the Read method, call the SaveAsSearchablePdf method with the output file path. The code below demonstrates this using a sample TIFF file.
using IronOcr;

// This code demonstrates how to use IronOcr's IronTesseract class
// to perform Optical Character Recognition (OCR) on an image and export
// the result as a searchable PDF file.
class Program
{
    static void Main()
    {
        // Instantiate an instance of the IronTesseract OCR engine
        IronTesseract ocrTesseract = new IronTesseract();

        // Enable the option to render OCR results into a searchable PDF
        ocrTesseract.Configuration.RenderSearchablePdf = true;

        // Load an image input file for OCR processing.
        // The 'using' statement ensures the OcrInput resource is disposed of properly.
        using var imageInput = new OcrInput("Potter.tiff");

        // Perform the OCR operation on the input image and obtain the result.
        OcrResult ocrResult = ocrTesseract.Read(imageInput);

        // Save the OCR results as a searchable PDF.
        // The output PDF file is named 'searchablePdf.pdf'.
        ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");
    }
}
Imports IronOcr

' This code demonstrates how to use IronOcr's IronTesseract class
' to perform Optical Character Recognition (OCR) on an image and export
' the result as a searchable PDF file.
Friend Class Program
    Shared Sub Main()
        ' Instantiate an instance of the IronTesseract OCR engine
        Dim ocrTesseract As New IronTesseract()

        ' Enable the option to render OCR results into a searchable PDF
        ocrTesseract.Configuration.RenderSearchablePdf = True

        ' Load an image input file for OCR processing.
        ' The 'Using' block ensures the OcrInput resource is disposed of properly.
        Using imageInput As New OcrInput("Potter.tiff")
            ' Perform the OCR operation on the input image and obtain the result.
            Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

            ' Save the OCR results as a searchable PDF.
            ' The output PDF file is named 'searchablePdf.pdf'.
            ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
        End Using
    End Sub
End Class
Below is a screenshot of the sample TIFF and an embedded searchable PDF. Attempt to select the text in the PDF to confirm its searchability. The ability to select also means the text can be searched in a PDF viewer.
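Please note: IronOCR uses a specific font to overlay the recognized text on the image, which might result in some discrepancies in text size.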
TIFF file
Searchable PDF
Searchable PDF as Byte and Stream
The output of the searchable PDF can also be handled as bytes or streams using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.
using System;
using System.IO;

// This snippet continues from the previous example:
// 'ocrResult' is the OcrResult returned by the Read method.

// Export searchable PDF as byte array
byte[] pdfBytes = ocrResult.SaveAsSearchablePdfBytes();

// Check if the byte array is not null and has data
if (pdfBytes != null && pdfBytes.Length > 0)
{
    // Perform operations with the byte array
    Console.WriteLine("PDF bytes have been successfully generated.");
}
else
{
    // Handle the case when byte array is null or empty
    Console.WriteLine("Failed to generate PDF byte array, or it is empty.");
}

// Export searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

// Check if the stream is not null
if (pdfStream != null)
{
    // Ensure the stream's position is set to the start for reading, if the stream supports seeking
    if (pdfStream.CanSeek)
    {
        pdfStream.Position = 0;
    }

    // Perform operations with the stream using a StreamReader.
    // Disposing the reader also disposes the underlying stream.
    using (StreamReader reader = new StreamReader(pdfStream))
    {
        // Buffer to hold read characters from the stream
        char[] buffer = new char[100];

        // Read characters into the buffer, with a maximum of buffer's length
        int readCount = reader.Read(buffer, 0, buffer.Length);

        // Print a confirmation message with the read character count
        Console.WriteLine($"PDF stream has been successfully generated. Read {readCount} characters.");
    }
}
else
{
    // Handle the case when the stream is null
    Console.WriteLine("Failed to generate the PDF stream.");
}
Imports System
Imports System.IO

' This snippet continues from the previous example:
' 'ocrResult' is the OcrResult returned by the Read method.

' Export searchable PDF as byte array
Dim pdfBytes() As Byte = ocrResult.SaveAsSearchablePdfBytes()

' Check if the byte array is not null and has data
If pdfBytes IsNot Nothing AndAlso pdfBytes.Length > 0 Then
    ' Perform operations with the byte array
    Console.WriteLine("PDF bytes have been successfully generated.")
Else
    ' Handle the case when byte array is null or empty
    Console.WriteLine("Failed to generate PDF byte array, or it is empty.")
End If

' Export searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()

' Check if the stream is not null
If pdfStream IsNot Nothing Then
    ' Ensure the stream's position is set to the start for reading, if the stream supports seeking
    If pdfStream.CanSeek Then
        pdfStream.Position = 0
    End If

    ' Perform operations with the stream using a StreamReader.
    ' Disposing the reader also disposes the underlying stream.
    Using reader As New StreamReader(pdfStream)
        ' Buffer to hold read characters from the stream
        Dim buffer(99) As Char

        ' Read characters into the buffer, with a maximum of buffer's length
        Dim readCount As Integer = reader.Read(buffer, 0, buffer.Length)

        ' Print a confirmation message with the read character count
        Console.WriteLine($"PDF stream has been successfully generated. Read {readCount} characters.")
    End Using
Else
    ' Handle the case when the stream is null
    Console.WriteLine("Failed to generate the PDF stream.")
End If
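Once the bytes or stream are in hand, a common next step is to check the output and persist it. The sketch below is one possible way to do that using only standard .NET APIs. The PdfOutputHelper class, its LooksLikePdf and SaveStreamToFile methods, and the output file names are hypothetical helpers introduced for illustration and are not part of IronOCR. The byte array in Main is stand-in data so the sketch runs without an OCR result.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class (not part of IronOCR) for handling searchable PDF output.
static class PdfOutputHelper
{
    // Returns true when the data starts with the "%PDF-" signature that opens a PDF file.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Copies a stream to a file, rewinding it first when the stream supports seeking.
    public static void SaveStreamToFile(Stream source, string path)
    {
        if (source.CanSeek)
        {
            source.Position = 0;
        }
        using (FileStream target = File.Create(path))
        {
            source.CopyTo(target);
        }
    }
}

class Program
{
    static void Main()
    {
        // Stand-in data for illustration only; in practice this would be
        // the array returned by ocrResult.SaveAsSearchablePdfBytes().
        byte[] pdfBytes = Encoding.ASCII.GetBytes("%PDF-1.7 placeholder");

        // Sanity-check the signature before writing anything to disk.
        Console.WriteLine(PdfOutputHelper.LooksLikePdf(pdfBytes));

        // Persist the byte array directly.
        File.WriteAllBytes("fromBytes.pdf", pdfBytes);

        // Stand-in for the stream returned by ocrResult.SaveAsSearchablePdfStream().
        using (var pdfStream = new MemoryStream(pdfBytes))
        {
            PdfOutputHelper.SaveStreamToFile(pdfStream, "fromStream.pdf");
        }
    }
}
```

Checking the leading bytes is a cheaper sanity test than decoding the stream as characters, since a PDF is binary data. With a real OCR result, the stand-in array and MemoryStream would be replaced by the return values of SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream.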
Frequently Asked Questions
What is a searchable PDF?
A searchable PDF is a type of PDF document that contains both scanned images and machine-readable text, created by performing OCR on scanned documents or images.
How can I create a searchable PDF using IronOCR?
To create a searchable PDF using IronOCR, set the Configuration.RenderSearchablePdf property to true, perform OCR using the Read method, and then use SaveAsSearchablePdf to specify the output file path.
What file formats can I use with IronOCR to create searchable PDFs?
IronOCR can process various input formats, including TIFF images as shown in this article, and export the results as searchable PDFs.
Can IronOCR export searchable PDFs as bytes or streams?
Yes, IronOCR can export searchable PDFs as bytes using SaveAsSearchablePdfBytes and as streams using SaveAsSearchablePdfStream.
Is it possible to select and search text in a searchable PDF created by IronOCR?
Yes, a searchable PDF created by IronOCR allows you to select and search text in a PDF viewer.
Where can I download IronOCR?
You can download IronOCR from the official source on NuGet.
What programming language is used for creating searchable PDFs with IronOCR?
IronOCR is a .NET library. The examples in this article use C# and VB.NET.
Does IronOCR use a specific font for overlay text in searchable PDFs?
Yes, IronOCR uses a specific font to overlay text on the image file, which might result in some discrepancies in text size.