
OCR From PDF (Free Online Tools)


By Kannapat Udonpant

January 15, 2023

Updated June 22, 2025

Optical Character Recognition (OCR) is a technology that recognizes text in images. It was developed to read printed text from scanned paper or image files and turn it into text that computers can work with. This matters because so much of today's information, such as e-mails and e-books, is digital, while many documents still exist only on paper. Modern OCR has become far more sophisticated: specialized algorithms can recognize text in many different fonts, even when the image is degraded by noise or by common artifacts such as JPEG compression. Some OCR engines can also read handwriting, although accuracy varies widely with legibility and scan quality.

Text recognized by OCR can then be edited, indexed, searched, printed, and archived. OCR software is widely used in the healthcare, pharmaceutical, insurance, and legal industries, where it converts paper documents into digital ones that are easier to reuse and share.

Let's look at how to OCR PDF files with several different tools.

Adobe Acrobat Pro

Adobe, the company that originally developed the PDF format, offers a fast and efficient OCR engine in Adobe Acrobat Pro DC that can make scanned PDF documents editable. It is one of the most powerful OCR engines on the market, and if you have many PDFs to edit, Acrobat Pro DC is worth purchasing. It can also convert a wide range of text-based documents into PDF with high accuracy, and it preserves the look of the original document by generating custom fonts that closely match the scanned text.

Here is how to OCR a PDF with Adobe Acrobat:

Open the file in Adobe Acrobat Pro DC.
Click the "Edit PDF" option in the right-hand pane.
Acrobat runs OCR on the scanned pages and converts them into editable text.
You can now edit the text and replace images in the document.
Save the result with "File > Save As" and give the new PDF document a suitable name.

Acrobat Pro can also run OCR on several scanned PDF documents at once.

Sejda

Sejda is OCR-enabled PDF editing software available as a web application or as a desktop application for macOS, Windows, and Linux. It lets users compress, edit, digitally sign, merge, and fill out PDF files. It can also turn files in other formats, such as JPEG images and Excel spreadsheets, into PDFs, and convert PDFs into formats such as Word and PowerPoint documents. Here is how to OCR a PDF document with Sejda OCR:

Open the Sejda OCR website.
Click the "Upload PDF file" button, or drag and drop files from your computer onto the page.
After the upload, the file name is shown. Select the language of the document.
Choose the output format, either "PDF" or "Text", then click the "Recognize text on all pages" button to start extracting text.
When processing finishes, download the result.

SodaPDF

SodaPDF OCR is a free online tool that extracts text from images. It converts scanned documents, faxes, and other printouts into editable text, PDFs, and searchable PDFs; its most common use is turning scanned documents or faxes into editable files. Uploaded documents are automatically deleted from the server after a set period. SodaPDF also offers other features, such as converting PDFs to Word documents that can then be opened in Microsoft Word.

Let's see how we can perform OCR on a PDF using SodaPDF:

Open the SodaPDF website.
Click the "Choose File" button and select the desired PDF documents to upload.
After the upload, SodaPDF opens an editor where you can modify the PDF's text and images. Click the Download button to save the file.

IronOCR: .NET OCR Library

IronOCR is a robust OCR library for .NET. It provides a powerful API for working with text and images, with features such as real-time recognition, field detection, and OCR of scanned PDF files. Iron Software's companion library, IronPDF, can be used to edit PDF documents.

IronOCR brings text recognition into developers' own applications. It can be used for tasks such as converting scanned documents into digital formats or reading captions in images. The library offers an easy-to-use API on top of its Tesseract-based OCR engine, and it includes an image processing pipeline that automatically handles low-DPI images and extracts text from PDF documents.
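The sketch below illustrates that preprocessing step: it reads a single image and applies a few of OcrInput's optional filters (DeNoise, Deskew, EnhanceResolution) before recognition. It assumes the same IronOCR API generation as the PDF examples in this article; low-dpi-scan.png is a placeholder file name, and newer IronOCR releases may use different names for the input-loading methods, so check the API reference for the version you install.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
string extractedText;

using (var input = new OcrInput())
{
    // Placeholder path: a noisy, low-resolution scan (assumption)
    input.AddImage("low-dpi-scan.png");

    // Optional preprocessing filters; which ones help depends on the scan
    input.DeNoise();               // reduce digital noise such as speckles
    input.Deskew();                // straighten a rotated page
    input.EnhanceResolution(300);  // upscale low-DPI input toward 300 DPI

    var result = ocr.Read(input);
    extractedText = result.Text;
}

Console.WriteLine(extractedText);
```

Each filter adds processing time, and on a clean, high-resolution scan it may not improve accuracy, so it is worth testing filter combinations against your own documents.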

Let's see how to OCR a PDF file with IronOCR:

OCR of a Complete PDF File

The following code runs OCR on an entire PDF document.

using System;
using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Add the entire PDF document for OCR processing.
    // The second argument is the PDF password; omit it for unencrypted files.
    input.AddPdf("example.pdf", "password");

    var result = ocr.Read(input);
    // Print the extracted text to the console
    Console.WriteLine(result.Text);
}

The same example in VB.NET:

Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        Using input As New OcrInput()
            ' Add the entire PDF document for OCR processing
            input.AddPdf("example.pdf", "password")

            Dim result = ocr.Read(input)
            ' Print the extracted text to the console
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module

OCR of Selected Pages of a PDF

To OCR only selected pages of a PDF, use the AddPdfPages method and pass the page numbers you want.

using System;
using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Add specific pages of the PDF document for OCR processing
    // (the last argument is the PDF password, if any)
    input.AddPdfPages("example.pdf", new[] { 1, 2, 3 }, "password");

    var result = ocr.Read(input);
    // Print the extracted text to the console
    Console.WriteLine(result.Text);
}

The same example in VB.NET:

Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        Using input As New OcrInput()
            ' Add specific pages of the PDF document for OCR processing
            input.AddPdfPages("example.pdf", {1, 2, 3}, "password")

            Dim result = ocr.Read(input)
            ' Print the extracted text to the console
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
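When several pages are read together, result.Text returns their text as one string. If you need each page's text separately, a sketch like the following can iterate over the result's Pages collection. It assumes the same example.pdf and password as above, and that each page object exposes PageNumber and Text as described in IronOCR's documentation on result objects; whether PageNumber refers to the original PDF page or to the position within the OCR input is worth verifying for your version.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

var ocr = new IronTesseract();
// Page number -> recognized text for that page
var textByPage = new Dictionary<int, string>();

using (var input = new OcrInput())
{
    input.AddPdfPages("example.pdf", new[] { 1, 2, 3 }, "password");
    var result = ocr.Read(input);

    // Each page in the result carries its own page number and text
    foreach (var page in result.Pages)
    {
        textByPage[page.PageNumber] = page.Text;
    }
}

foreach (var entry in textByPage)
{
    Console.WriteLine($"--- Page {entry.Key} ---");
    Console.WriteLine(entry.Value);
}
```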

Convert PDF to Searchable PDF

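To convert a scanned PDF into a searchable PDF, read it with IronOCR and call the SaveAsSearchablePdf method on the result. The example also calls Deskew to straighten rotated or skewed pages before recognition.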

using System;
using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Add the PDF for processing and specify the password, if any
    input.AddPdf("scan.pdf", "password");

    // Straighten rotated or skewed pages
    input.Deskew();

    var result = ocr.Read(input);
    // Save the processed result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf");
}

The same example in VB.NET:

Imports System
Imports IronOcr

Module Program
    Sub Main()
        Dim ocr As New IronTesseract()

        Using input As New OcrInput()
            ' Add the PDF for processing and specify the password, if any
            input.AddPdf("scan.pdf", "password")

            ' Straighten rotated or skewed pages
            input.Deskew()

            Dim result = ocr.Read(input)
            ' Save the processed result as a searchable PDF
            result.SaveAsSearchablePdf("searchable.pdf")
        End Using
    End Sub
End Module
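To convert many scanned PDFs, the same calls can be repeated in a loop. This is a minimal sketch under stated assumptions: scans and searchable are hypothetical folder names, the input files are assumed not to be password-protected, and it relies on AddPdf accepting a path without a password. It reuses one IronTesseract instance for every file and creates a fresh OcrInput per document so that pages from different files are not mixed into one result.

```csharp
using System;
using System.IO;
using IronOcr;

// Hypothetical folder names; adjust to your environment
const string inputFolder = "scans";
const string outputFolder = "searchable";
Directory.CreateDirectory(outputFolder);

var ocr = new IronTesseract();
var converted = 0;

foreach (var pdfPath in Directory.GetFiles(inputFolder, "*.pdf"))
{
    // A new OcrInput per file keeps each document's pages separate
    using (var input = new OcrInput())
    {
        input.AddPdf(pdfPath); // assumes the PDF is not encrypted
        input.Deskew();

        var result = ocr.Read(input);
        var outputPath = Path.Combine(outputFolder, Path.GetFileName(pdfPath));
        result.SaveAsSearchablePdf(outputPath);
        converted++;
    }

    Console.WriteLine($"Converted {pdfPath}");
}

Console.WriteLine($"{converted} file(s) converted.");
```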

Conclusion

We have explored several tools for performing optical character recognition on PDFs. Adobe Acrobat, Sejda, and SodaPDF let you recognize text interactively, while IronOCR does so programmatically; all of them can produce searchable or editable PDFs.

If you are developing in .NET, IronOCR is our recommendation. It makes OCR straightforward in .NET applications, and its preprocessing filters, such as Deskew, help it cope with documents that have been damaged or distorted, for example by water damage.

Another use case is digitizing old paper forms filled out by hand, such as invoices and sales receipts. Once digitized, these documents can be processed automatically by accounting software, improving accuracy and efficiency.
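As a small illustration of that kind of downstream processing, the hypothetical helper below looks for a "Total" line in OCR output and parses the amount. It uses only the .NET standard library; the label, the optional dollar sign, and the number format it accepts are assumptions about one invoice layout, and real forms usually need more robust field extraction and validation.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Sample text standing in for OCR output (hypothetical invoice)
var sampleOcrText = "ACME Corp\nSubtotal: $1,000.00\nTax: $80.00\nTotal: $1,080.00";
var total = ExtractInvoiceTotal(sampleOcrText);
Console.WriteLine(total.HasValue ? $"Invoice total: {total.Value}" : "No total found");

// Hypothetical helper: finds "Total", an optional colon and dollar sign,
// then a number such as 1,080.00; returns null if no such field is found.
static decimal? ExtractInvoiceTotal(string ocrText)
{
    // \b prevents matching the "total" inside "Subtotal"
    var match = Regex.Match(
        ocrText,
        @"\bTotal\s*:?\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)",
        RegexOptions.IgnoreCase);

    if (!match.Success)
        return null;

    // Drop thousands separators, then parse culture-independently
    var digits = match.Groups[1].Value.Replace(",", "");
    return decimal.Parse(digits, CultureInfo.InvariantCulture);
}
```

Because OCR can misread characters (for example, the letter O for the digit 0), a production pipeline would typically flag fields that fail to match rather than silently accept them.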

Kannapat Udonpant


Software Engineer

Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, he was also a member of the Vehicle Robotics Laboratory, part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, he enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.
