How to Read Scanned Documents Using IronOCR

By Curtis Chau

February 16, 2025

Updated June 22, 2025

Many PDFs contain non-searchable, image-based text. IronOCR can convert this into searchable content, making it easier to locate specific information and enhancing document accessibility, especially for individuals with visual impairments.

Automated extraction replaces manually copying or recreating text and images, which saves time and reduces transcription errors. This is particularly useful for research, legal documents, and content creation, where reusing specific portions of PDFs is common.

Businesses can extract critical data from PDFs for analysis or system integration, streamlining workflows. Designers and marketers can also extract images for enhancement and reuse in various projects.

In this tutorial, we'll explore the ReadDocument method, covering its requirements and limitations, to show how IronOCR simplifies text extraction from scanned images and PDFs.

How to Read Scanned Documents Using IronOCR

Download the C# library for reading scanned documents
Import the scanned document for processing
Use the LoadImage method for images or LoadPdf for scanned PDFs
Extract text using the ReadDocument method
Save or export the extracted text as needed for further use

First Step:

To use this function, you must also install the IronOcr.Extensions.AdvancedScan package.

Read Scanned Documents Example

To extract text from all images within a document, use the ReadDocument method. This method processes the document and returns an object containing the extracted text, which can be accessed through the Text property. The example below demonstrates how to use this method with a sample TIFF file.

Please note

The method currently only works for English, Chinese, Japanese, Korean, and LatinAlphabet.
Using advanced scan on .NET Framework requires the project to run on x64 architecture.

Input

(Image: the sample input file, potter.tiff)

Code

using System;
using IronOcr;

// Instantiate the OCR engine
var ocr = new IronTesseract();

// Load the scanned image into an OCR input
using var input = new OcrInput();
input.LoadImage("potter.tiff");

// Perform OCR (requires the IronOcr.Extensions.AdvancedScan package)
OcrResult result = ocr.ReadDocument(input);

// Print the extracted text
Console.WriteLine(result.Text);

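Imports System
Imports IronOcr

' Instantiate the OCR engine
Dim ocr As New IronTesseract()

' Load the scanned image into an OCR input
Using input As New OcrInput()
    input.LoadImage("potter.tiff")

    ' Perform OCR (requires the IronOcr.Extensions.AdvancedScan package)
    Dim result As OcrResult = ocr.ReadDocument(input)

    ' Print the extracted text
    Console.WriteLine(result.Text)
End Using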

Output

(Image: the OCR output)

If you need to perform OCR on a PDF file instead, simply replace the LoadImage method with LoadPdf. This allows IronOCR to process and extract text from scanned PDFs in the same way.
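
As a rough illustration of the PDF case and the final "save or export" step, the sketch below swaps LoadImage for LoadPdf and writes the extracted text to disk using the standard .NET File API. The file names "scanned.pdf" and "scanned.txt" are hypothetical placeholders, not files from this tutorial, and the snippet assumes the IronOcr.Extensions.AdvancedScan package is installed.

```csharp
using System;
using System.IO;
using IronOcr;

// Instantiate the OCR engine
var ocr = new IronTesseract();

// Load a scanned PDF instead of an image.
// "scanned.pdf" is a hypothetical file name used for illustration.
using var input = new OcrInput();
input.LoadPdf("scanned.pdf");

// Perform OCR on the loaded pages
OcrResult result = ocr.ReadDocument(input);

// Export the extracted text with the standard .NET file API.
// "scanned.txt" is likewise a hypothetical output path.
File.WriteAllText("scanned.txt", result.Text);

Console.WriteLine($"Saved {result.Text.Length} characters to scanned.txt");
```

Writing result.Text through File.WriteAllText keeps the export step independent of any IronOCR-specific save helpers, so the same pattern applies whether the input came from LoadImage or LoadPdf.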

Frequently Asked Questions

What is this library for reading scanned documents?

IronOCR is a C# library that enables text extraction from scanned documents, converting image-based text into searchable and accessible content.

How does this document processing tool improve accessibility?

IronOCR converts image-based text in PDFs into searchable content, making it easier to locate specific information and enhancing accessibility for individuals with visual impairments.

What are the primary uses of this text extraction tool?

IronOCR is useful for automating text extraction in research, legal work, content creation, data analysis, and system integration, enhancing efficiency and accuracy.

What file formats does this OCR tool support?

IronOCR supports image files and scanned PDFs. It can extract text from both formats using the `LoadImage` and `LoadPdf` methods.

Which languages are supported by this OCR library for text extraction?

The `ReadDocument` method currently works only for English, Chinese, Japanese, Korean, and LatinAlphabet.

How can I extract text from a scanned document using this tool?

To extract text, import the scanned document, use the `LoadImage` or `LoadPdf` methods, and then apply the `ReadDocument` method to extract the text.

What is required to use this OCR library in a .NET project?

Install the IronOCR C# library from NuGet. To use the `ReadDocument` method, you must also install the IronOcr.Extensions.AdvancedScan package.

Can this library process PDF files?

Yes, IronOCR can process scanned PDF files by using the `LoadPdf` method to extract text similarly to how it processes images.

What are the steps to read scanned documents using this library?

The steps include downloading the library, importing the document, using the appropriate method to load the document, extracting text with `ReadDocument`, and saving or exporting the text as needed.

Is there any architecture requirement for using advanced scan features in this tool?

Yes, when using advanced scan features on the .NET Framework, the project must run on x64 architecture.

Curtis Chau

Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT), exploring innovative ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love for technology with creativity.