OCR, or Optical Character Recognition, is the process of converting images of text, such as scanned pages or photographs, into machine-readable text. PDF OCR is a popular application of this technology that can improve business processes. One of its benefits is better accessibility of information. A scanned document that exists only as page images cannot be searched, copied, or read aloud by assistive software. PDF OCR can produce a copy of the document with real, selectable text, in a format that everyone can use.
Another use of PDF OCR is in the tracking of documents. When a document is filed, scanned, or transcribed, it can be difficult to track which version of the document is associated with which file. With PDF OCR, it is possible to track the changes made to a document and determine which versions are associated with which file. This can be useful for managing document archives and preventing the loss of important information.
In this article, you'll learn how to run OCR on a PDF file using Adobe Acrobat Pro. The article also introduces IronOCR, a .NET OCR library that is among the more efficient and feature-rich libraries available. Let's begin with Adobe Acrobat Pro.
Adobe Acrobat Pro DC is the Pro version of Adobe Acrobat Reader DC. It is one of the most popular and powerful tools for PDF manipulation. With this software, you can create, edit, sign, and review PDF documents. It can also convert PDFs to PowerPoint presentations, Word documents, or Excel files, and it can edit scanned documents.
The new version of Acrobat DC is also a document scanner that can quickly turn scanned documents into digital files using OCR technology. It features Optical Character Recognition as well as intelligent business card scanning that automatically detects and saves contact information from cards in seconds.
Along with being able to extract text from PDF files, Acrobat Pro DC has many features that make it a valuable tool for PDF transcription.
Let's see how to run OCR on a scanned document using Adobe Acrobat Pro.
Select "Edit PDF" from the right pane of the document.
This converts the scanned PDF into a fully editable PDF document. You'll be able to edit the text and images directly in the PDF file.
After making any changes, save the file and you'll see these changes reflected in the document.
IronOCR is a .NET OCR library that reads text from documents and images and converts it into a machine-readable format.
This Optical Character Recognition library was developed with the following considerations in mind:
IronOCR makes it easier for developers to create software that supports scanning documents, extracting text and metadata, indexing scanned image files, converting images to searchable PDFs, and converting scanned documents into readable text. IronOCR offers a lot of options when it comes to encoding, image format conversion, and text recognition and extraction. IronOCR supports 125 languages.
IronOCR provides an intuitive, robust, and accurate OCR process to recognize text from scanned documents, photographs, and screenshots while reducing time-consuming tasks like page segmentation and layout analysis. The library is developed in C# and its API design is straightforward with good readability.
Let's explore some code examples using IronOCR:
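using System;
using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // OCR the entire document (the second argument is the PDF password)
    Input.AddPdf("example.pdf", "password");
    // Alternatively, OCR only selected pages (use this call instead of AddPdf,
    // otherwise the same pages are added to the input twice)
    // Input.AddPdfPages("example.pdf", new [] { 1, 2, 3 }, "password");
    var Result = Ocr.Read(Input);
    // Print the recognized text
    Console.WriteLine(Result.Text);
}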
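Imports System
Imports IronOcr
Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    ' OCR the entire document (the second argument is the PDF password)
    Input.AddPdf("example.pdf", "password")
    ' Alternatively, OCR only selected pages (use this call instead of AddPdf,
    ' otherwise the same pages are added to the input twice)
    ' Input.AddPdfPages("example.pdf", { 1, 2, 3 }, "password")
    Dim Result = Ocr.Read(Input)
    ' Print the recognized text
    Console.WriteLine(Result.Text)
End Using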
IronOCR lets you run OCR on a whole PDF document or only on selected pages of a PDF file.
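When the pages to process come from user input, it can help to turn a string such as "1-3,5" into the integer array that AddPdfPages receives in the example above. The following is a minimal sketch in plain C#. ParsePageRange is a hypothetical helper written for this article, not part of IronOCR, and it assumes the input contains only numbers, commas, and hyphens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an IronOCR API): expands "1-3,5" into 1, 2, 3, 5.
// Duplicates are dropped and the result is sorted in ascending order.
static int[] ParsePageRange(string spec)
{
    var pages = new SortedSet<int>();
    foreach (var part in spec.Split(',', StringSplitOptions.RemoveEmptyEntries))
    {
        // "2-4" yields two bounds, a single number such as "5" yields one
        var bounds = part.Split('-');
        int first = int.Parse(bounds[0].Trim());
        int last = bounds.Length > 1 ? int.Parse(bounds[1].Trim()) : first;
        if (first > last)
            throw new ArgumentException($"Invalid range: {part}");
        for (int p = first; p <= last; p++)
            pages.Add(p);
    }
    return pages.ToArray();
}

int[] selected = ParsePageRange("1-3,5");
Console.WriteLine(string.Join(", ", selected)); // prints "1, 2, 3, 5"

// The array can then be passed to the call used in the article:
// Input.AddPdfPages("example.pdf", selected, "password");
```

Malformed input such as "a-b" makes int.Parse throw a FormatException, so a real application would probably validate the string before calling the helper.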
You can also use IronOCR to convert a scanned PDF into a searchable PDF with selectable text. The conversion is simple and straightforward, as the following code snippet shows:
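using IronOcr;
var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // Load the scanned PDF (the second argument is the PDF password)
    Input.AddPdf("scan.pdf", "password");
    // Straighten skewed (rotated) pages before recognition
    Input.Deskew();
    var Result = Ocr.Read(Input);
    // Write a copy of the PDF with a searchable text layer
    Result.SaveAsSearchablePdf("searchable.pdf");
}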
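Imports IronOcr
Dim Ocr = New IronTesseract()
Using Input = New OcrInput()
    ' Load the scanned PDF (the second argument is the PDF password)
    Input.AddPdf("scan.pdf", "password")
    ' Straighten skewed (rotated) pages before recognition
    Input.Deskew()
    Dim Result = Ocr.Read(Input)
    ' Write a copy of the PDF with a searchable text layer
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using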
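The same conversion can be applied to a whole folder of scans. The sketch below reuses only the IronOCR calls shown in this article (AddPdf, Deskew, Read, and SaveAsSearchablePdf) and combines them with standard System.IO methods. The folder names "scans" and "searchable" are hypothetical, and the sketch assumes every file opens with the same password as in the snippet above.

```csharp
using System;
using System.IO;
using IronOcr;

// Hypothetical folder names; adjust them to your environment.
string inputDir = "scans";
string outputDir = "searchable";

if (!Directory.Exists(inputDir))
{
    Console.WriteLine($"Input folder not found: {inputDir}");
    return;
}
Directory.CreateDirectory(outputDir);

// One engine instance is reused for every file
var Ocr = new IronTesseract();

foreach (string scanPath in Directory.EnumerateFiles(inputDir, "*.pdf"))
{
    // A fresh OcrInput per file, disposed as soon as the file is done
    using (var Input = new OcrInput())
    {
        // Assumption: all files share the password used in the article's example
        Input.AddPdf(scanPath, "password");
        // Straighten skewed pages before recognition
        Input.Deskew();
        var Result = Ocr.Read(Input);

        // Keep the original file name, but write into the output folder
        string target = Path.Combine(outputDir, Path.GetFileName(scanPath));
        Result.SaveAsSearchablePdf(target);
        Console.WriteLine($"{scanPath} -> {target}");
    }
}
```

Writing the results to a separate folder leaves the original scans untouched, so the conversion can be rerun if the recognition settings change.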
IronOCR offers many other tools and features. You can explore IronOCR features by visiting the following link.
The IronOCR library has several advantages over other libraries available on the market. You can modify and extend its functionality by adding your own modules with just a few lines of code. IronOCR can currently read text in over 125 languages. It has been developed to produce higher-quality, more reliable results while using less time and memory than other libraries.
IronOCR is free for development. IronOCR also offers a free trial for testing in production. For more details about pricing and a free trial of IronOCR, follow the link.