How to Save Results as a Searchable PDF

A searchable PDF, often referred to as an OCR (Optical Character Recognition) PDF, is a type of PDF document that contains both scanned images and machine-readable text. These PDFs are created by performing OCR on scanned paper documents or images, recognizing the text in the images, and converting it into selectable and searchable text.

IronOCR performs optical character recognition on documents and exports the results as searchable PDFs. The output can be written to a file or returned as a byte array or a stream.

Export as Searchable PDF Example

Here's how you can export the result as a searchable PDF using IronOCR. You must first set the Configuration.RenderSearchablePdf property to true. After obtaining the OCR result object from the Read method, use the SaveAsSearchablePdf method by specifying the output file path. The code below demonstrates using a sample TIFF file.

:path=/static-assets/ocr/content-code-examples/how-to/searchable-pdf-searchable-pdf.cs
using IronOcr;

// This code demonstrates how to use IronOcr's IronTesseract class 
// to perform Optical Character Recognition (OCR) on an image and export 
// the result as a searchable PDF file.

class Program
{
    static void Main()
    {
        // Create an instance of the IronTesseract OCR engine
        IronTesseract ocrTesseract = new IronTesseract();

        // Enable the option to render OCR results into a searchable PDF
        ocrTesseract.Configuration.RenderSearchablePdf = true;

        // Load an image input file for OCR processing.
        // The 'using' statement ensures the OcrInput resource is disposed of properly.
        using var imageInput = new OcrInput("Potter.tiff");

        // Perform the OCR operation on the input image and obtain the result.
        OcrResult ocrResult = ocrTesseract.Read(imageInput);

        // Save the OCR results as a searchable PDF. 
        // The output PDF file is named 'searchablePdf.pdf'.
        ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");
    }
}
Imports IronOcr

' This code demonstrates how to use IronOcr's IronTesseract class 
' to perform Optical Character Recognition (OCR) on an image and export 
' the result as a searchable PDF file.

Friend Class Program
	Shared Sub Main()
		' Create an instance of the IronTesseract OCR engine
		Dim ocrTesseract As New IronTesseract()

		' Enable the option to render OCR results into a searchable PDF
		ocrTesseract.Configuration.RenderSearchablePdf = True

		' Load an image input file for OCR processing.
		' The 'Using' block ensures the OcrInput resource is disposed of properly.
		Using imageInput As New OcrInput("Potter.tiff")
			' Perform the OCR operation on the input image and obtain the result.
			Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

			' Save the OCR results as a searchable PDF.
			' The output PDF file is named 'searchablePdf.pdf'.
			ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
		End Using
	End Sub
End Class

Below is a screenshot of the sample TIFF and an embedded searchable PDF. Try selecting the text in the PDF to confirm that it is searchable: text that can be selected can also be searched in a PDF viewer.

Please note
IronOCR uses a particular font to overlay text on the image file, which might result in some discrepancies in text size.

TIFF file

Searchable PDF as Byte and Stream

The searchable PDF can also be returned as a byte array or a stream by using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods, respectively. The code example below shows how to use these methods.

:path=/static-assets/ocr/content-code-examples/how-to/searchable-pdf-searchable-pdf-byte-stream.cs
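using System;
using System.IO;

// Assumes 'ocrResult' is the OcrResult returned by the Read method in the previous example.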

// Export searchable PDF as byte array
byte[] pdfBytes = ocrResult.SaveAsSearchablePdfBytes();

// Check if the byte array is not null and has data
if (pdfBytes != null && pdfBytes.Length > 0)
{
    // Perform operations with the byte array
    Console.WriteLine("PDF bytes have been successfully generated.");
}
else
{
    // Handle the case when byte array is null or empty
    Console.WriteLine("Failed to generate PDF byte array, or it is empty.");
}

// Export searchable PDF as a stream
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

// Check if the stream is not null
if (pdfStream != null)
{
    // Ensure the stream's position is set to the start for reading, if the stream supports seeking
    if (pdfStream.CanSeek)
    {
        pdfStream.Position = 0;
    }

    // Perform operations with the stream using a StreamReader
    using (StreamReader reader = new StreamReader(pdfStream))
    {
        // Buffer to hold read characters from the stream
        char[] buffer = new char[100];
        
        // Read characters into the buffer, with a maximum of buffer's length
        int readCount = reader.Read(buffer, 0, buffer.Length);

        // Print a confirmation message with the read character count
        Console.WriteLine($"PDF stream has been successfully generated. Read {readCount} characters.");
    }
}
else
{
    // Handle the case when the stream is null
    Console.WriteLine("Failed to generate the PDF stream.");
}
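' Requires Imports System and Imports System.IO.
' Assumes 'ocrResult' is the OcrResult returned by the Read method in the previous example.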

' Export searchable PDF as byte array
Dim pdfBytes() As Byte = ocrResult.SaveAsSearchablePdfBytes()

' Check if the byte array is not null and has data
If pdfBytes IsNot Nothing AndAlso pdfBytes.Length > 0 Then
	' Perform operations with the byte array
	Console.WriteLine("PDF bytes have been successfully generated.")
Else
	' Handle the case when byte array is null or empty
	Console.WriteLine("Failed to generate PDF byte array, or it is empty.")
End If

' Export searchable PDF as a stream
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()

' Check if the stream is not null
If pdfStream IsNot Nothing Then
	' Ensure the stream's position is set to the start for reading, if the stream supports seeking
	If pdfStream.CanSeek Then
		pdfStream.Position = 0
	End If

	' Perform operations with the stream using a StreamReader
	Using reader As New StreamReader(pdfStream)
		' Buffer to hold read characters from the stream
		Dim buffer(99) As Char

		' Read characters into the buffer, with a maximum of buffer's length
		Dim readCount As Integer = reader.Read(buffer, 0, buffer.Length)

		' Print a confirmation message with the read character count
		Console.WriteLine($"PDF stream has been successfully generated. Read {readCount} characters.")
	End Using
Else
	' Handle the case when the stream is null
	Console.WriteLine("Failed to generate the PDF stream.")
End If
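Once the bytes or the stream are available, a common next step is to persist them or hand them to another API. The sketch below is one possible way to do that using only System.IO; it is not part of IronOCR. The SearchablePdfOutput class, its method names, and the output file names are hypothetical, and the placeholder byte array merely stands in for the values that SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream would return.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of IronOCR: persists searchable-PDF output
// that has already been produced as a byte array or as a stream.
static class SearchablePdfOutput
{
    // Returns true when the data starts with the "%PDF-" file signature.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Writes a non-empty byte array to the given path.
    public static void WriteBytes(byte[] pdfBytes, string path)
    {
        if (pdfBytes == null || pdfBytes.Length == 0)
        {
            throw new ArgumentException("PDF byte array is null or empty.", nameof(pdfBytes));
        }
        File.WriteAllBytes(path, pdfBytes);
    }

    // Copies the stream to the given path and returns the number of bytes written.
    public static long WriteStream(Stream pdfStream, string path)
    {
        if (pdfStream == null)
        {
            throw new ArgumentNullException(nameof(pdfStream));
        }
        // Rewind first when the stream supports seeking.
        if (pdfStream.CanSeek)
        {
            pdfStream.Position = 0;
        }
        using (FileStream file = File.Create(path))
        {
            pdfStream.CopyTo(file);
            return file.Length;
        }
    }
}

class Program
{
    static void Main()
    {
        // Placeholder data (assumption). In practice this would be
        // ocrResult.SaveAsSearchablePdfBytes() from the example above.
        byte[] pdfBytes = Encoding.ASCII.GetBytes("%PDF-1.7 placeholder");

        if (SearchablePdfOutput.LooksLikePdf(pdfBytes))
        {
            SearchablePdfOutput.WriteBytes(pdfBytes, "fromBytes.pdf");
        }

        // Placeholder stream (assumption). In practice this would be
        // ocrResult.SaveAsSearchablePdfStream().
        using (Stream pdfStream = new MemoryStream(pdfBytes))
        {
            long written = SearchablePdfOutput.WriteStream(pdfStream, "fromStream.pdf");
            Console.WriteLine($"Wrote {written} bytes from the stream.");
        }
    }
}
```

Copying with Stream.CopyTo keeps the PDF as binary data, whereas a StreamReader would decode it as text. The signature check is only a cheap sanity test and does not validate the PDF structure.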

Frequently Asked Questions

What is a searchable PDF?

A searchable PDF is a type of PDF document that contains both scanned images and machine-readable text, created by performing OCR on scanned documents or images.

How can I create a searchable PDF using IronOCR?

To create a searchable PDF using IronOCR, set the Configuration.RenderSearchablePdf property to true, perform OCR using the Read method, and then use SaveAsSearchablePdf to specify the output file path.

What file formats can I use with IronOCR to create searchable PDFs?

The example in this article uses a TIFF file as input. IronOCR can also process other file formats and export the results as searchable PDFs.

Can IronOCR export searchable PDFs as bytes or streams?

Yes, IronOCR can export searchable PDFs as bytes using SaveAsSearchablePdfBytes and as streams using SaveAsSearchablePdfStream.

Is it possible to select and search text in a searchable PDF created by IronOCR?

Yes, a searchable PDF created by IronOCR allows you to select and search text in a PDF viewer.

Where can I download IronOCR?

You can download IronOCR from the official source on NuGet.

What programming language is used for creating searchable PDFs with IronOCR?

IronOCR is a .NET library. The examples in this article are shown in both C# and VB.NET.

Does IronOCR use a specific font for overlay text in searchable PDFs?

Yes, IronOCR uses a specific font to overlay text on the image file, which might result in some discrepancies in text size.

Chaknith
Software Engineer
Chaknith is the Sherlock Holmes of developers. It first occurred to him that he might have a future in software engineering when he was doing code challenges for fun. His focus is on IronXL and IronBarcode, but he takes pride in helping customers with every product. Chaknith draws on what he learns from talking directly with customers to help improve the products themselves. His anecdotal feedback goes beyond Jira tickets and supports product development, documentation, and marketing to improve customers' overall experience. When he isn't in the office, he can be found learning about machine learning, coding, and hiking.