Create Searchable PDFs by OCR


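using IronOcr;

var ocrTesseract = new IronTesseract();

// OcrInput is disposable; "using var" releases it when the program ends
using var ocrInput = new OcrInput();

// Load single images of different formats
ocrInput.LoadImage(@"images\page1.png");
ocrInput.LoadImage(@"images\page2.bmp");

// Load only the listed frames of a multi-frame TIFF
var page = new int[]{ 2, 3 };
ocrInput.LoadImageFrames(@"images\page3.tiff", page);

// Straighten rotated or skewed pages before OCR
ocrInput.Deskew();

var ocrResult = ocrTesseract.Read(ocrInput);

// Write all loaded pages to one PDF with an embedded text layer
ocrResult.SaveAsSearchablePdf("searchable.pdf");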

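Imports IronOcr

Dim ocrTesseract = New IronTesseract()

' Using disposes the OcrInput when the block ends
Using ocrInput = New OcrInput()
	' Load single images of different formats
	ocrInput.LoadImage("images\page1.png")
	ocrInput.LoadImage("images\page2.bmp")

	' Load only the listed frames of a multi-frame TIFF
	Dim page = New Integer(){ 2, 3 }
	ocrInput.LoadImageFrames("images\page3.tiff", page)

	' Straighten rotated or skewed pages before OCR
	ocrInput.Deskew()

	Dim ocrResult = ocrTesseract.Read(ocrInput)

	' Write all loaded pages to one PDF with an embedded text layer
	ocrResult.SaveAsSearchablePdf("searchable.pdf")
End Using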

Install-Package IronOcr

We can use IronOCR's advanced Tesseract engine, IronTesseract, to convert images to searchable PDFs. It can also make existing scanned PDFs searchable.

Searchable PDFs improve SEO performance and let documents be found by internal search indexing in intranets and databases.
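Making an existing scanned PDF searchable follows the same pattern as images: load the PDF into an OcrInput, read it, and save the result. The sketch below assumes a recent IronOCR release that provides OcrInput.LoadPdf (older releases named it AddPdf); the file names scanned.pdf and scanned-searchable.pdf are placeholders.

```csharp
using IronOcr;

// Placeholder paths: replace with your own files
const string sourcePdf = "scanned.pdf";
const string outputPdf = "scanned-searchable.pdf";

var ocr = new IronTesseract();

// Assumes OcrInput.LoadPdf is available (AddPdf in older IronOCR releases)
using var input = new OcrInput();
input.LoadPdf(sourcePdf);

// OCR every page of the PDF, then write a copy with an embedded text layer
OcrResult result = ocr.Read(input);
result.SaveAsSearchablePdf(outputPdf);
```

The original scanned PDF is left untouched; the searchable copy is written to a separate file.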

How to Create Searchable PDFs with IronOCR Tesseract

Install the OCR library to create searchable PDFs.
Create an OcrInput object and use AddImage to register the image path.
Apply any image-correction filters the input needs, such as Deskew.
Pass the OcrInput object to the Read method of IronTesseract to get an OcrResult.
Call SaveAsSearchablePdf on the OcrResult to save the images as a single searchable PDF.
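The filter step is where preprocessing belongs. The following sketch chains two of OcrInput's filters before reading. It assumes a recent IronOCR release that loads images with LoadImage, as in the first example (older releases use AddImage), and which filters actually help depends on the scan, so treat the chain as a starting point. The file names scan.png and filtered.pdf are placeholders.

```csharp
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("scan.png"); // placeholder path

// Preprocessing filters modify the loaded images before OCR runs
input.DeNoise(); // remove digital noise such as speckles
input.Deskew();  // straighten rotated or skewed pages

OcrResult result = ocr.Read(input);
result.SaveAsSearchablePdf("filtered.pdf");
```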

To implement the above steps in code, you can follow this C# example:

using IronOcr;  // Import the IronOcr namespace

class Program
{
    static void Main()
    {
        // Step 1: Create an instance of the IronTesseract class
        var ocr = new IronTesseract();

        // Step 2: Initialize OcrInput; "using" releases its resources when Main exits
        using var input = new OcrInput();

        // Step 3: Add the image path(s) that you want to convert to a searchable PDF
        input.AddImage("path/to/your/image.jpg");

        // Step 4: Perform OCR processing using the Read method
        // This method returns an OcrResult, which includes the text read from the image
        OcrResult result = ocr.Read(input);

        // Step 5: Save the output to a searchable PDF
        // The SaveAsSearchablePdf method creates a PDF with the OCR text
        result.SaveAsSearchablePdf("path/to/output.pdf");
    }
}

Imports IronOcr ' Import the IronOcr namespace

Friend Class Program
	Shared Sub Main()
		' Step 1: Create an instance of the IronTesseract class
		Dim ocr = New IronTesseract()

		' Step 2: Initialize OcrInput; Using releases it when the block ends
		Using input = New OcrInput()
			' Step 3: Add the image path(s) that you want to convert to a searchable PDF
			input.AddImage("path/to/your/image.jpg")

			' Step 4: Perform OCR processing using the Read method
			' This method returns an OcrResult, which includes the text read from the image
			Dim result As OcrResult = ocr.Read(input)

			' Step 5: Save the output to a searchable PDF
			' The SaveAsSearchablePdf method creates a PDF with the OCR text
			result.SaveAsSearchablePdf("path/to/output.pdf")
		End Using
	End Sub
End Class

Explanation

IronTesseract Class: This is the main class used to perform OCR. It allows configuration of the OCR settings.
OcrInput Object: This object holds the images you want to process. You add images to this object using AddImage.
Read Method: IronTesseract's Read method takes the OcrInput object, processes its images, and returns an OcrResult containing the extracted text.
SaveAsSearchablePdf Method: This saves the OCR result as a searchable PDF. The recognized text is embedded as an invisible layer beneath the page images, so the PDF text can be searched while the original image layout is preserved.

Make sure to replace "path/to/your/image.jpg" and "path/to/output.pdf" with the actual file paths you intend to use.
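IronTesseract settings are configured before calling Read, and the same OcrResult exposes the recognized text as well as the searchable PDF. A minimal sketch, assuming the IronTesseract.Language property with the OcrLanguage enum and the OcrResult.Text property, and loading the image with LoadImage as in the first example (AddImage in older releases). English is bundled with IronOCR, so setting it here only makes the choice explicit; page.png and output.pdf are placeholders.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Configure the engine before reading
ocr.Language = OcrLanguage.English;

using var input = new OcrInput();
input.LoadImage("page.png"); // placeholder path

OcrResult result = ocr.Read(input);

// One OcrResult provides both the plain text and the searchable PDF
Console.WriteLine(result.Text);
result.SaveAsSearchablePdf("output.pdf");
```

Printing result.Text is a quick way to check recognition quality before relying on the PDF's text layer for search.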