How to Read Scanned Documents Using IronOCR
Many PDFs contain non-searchable, image-based text. IronOCR can convert this into searchable content, making it easier to locate specific information and enhancing document accessibility, especially for individuals with visual impairments.
Instead of manually copying or recreating text and images, automated extraction ensures accuracy and efficiency. This is particularly useful for research, legal documents, and content creation, where reusing specific portions of PDFs is common.
Businesses can extract critical data from PDFs for analysis or system integration, streamlining workflows. Designers and marketers can also extract images for enhancement and reuse in various projects.
In this tutorial, we'll explore the ReadDocument method, covering its requirements and usage to show how IronOCR simplifies text extraction from scanned images and PDFs.
How to Read Scanned Documents Using IronOCR
- Download the C# library for reading scanned documents
- Import the scanned document for processing
- Use the LoadImage method for images or LoadPdf for scanned PDFs
- Extract text using the ReadDocument method
- Save or export the extracted text as needed for further use
To use the ReadDocument method, you must also install the IronOcr.Extensions.AdvancedScan package.
Read Scanned Documents Example
To extract text from all images within a document, use the ReadDocument method. This method processes the document and returns an object containing the extracted text, which can be accessed through the Text property. The example below demonstrates how to use this method with a sample TIFF file.
Please note
- The method currently supports only English, Chinese, Japanese, Korean, and LatinAlphabet.
- Using advanced scan on .NET Framework requires the project to run on x64 architecture.
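The architecture requirement in the note above can be checked at startup so that a misconfigured build fails with a clear message. The following is a minimal sketch, not part of IronOCR: the ArchitectureGuard class and its EnsureX64 method are hypothetical names, and Environment.Is64BitProcess is used as a simple proxy for "running as x64" (it reports any 64-bit process).

```csharp
using System;

// Hypothetical helper for illustration; IronOCR does not ship this class.
public static class ArchitectureGuard
{
    // Throws if the current process is not 64-bit.
    // Assumption: a 64-bit process is treated as satisfying the x64 requirement.
    public static void EnsureX64()
    {
        if (!Environment.Is64BitProcess)
        {
            throw new PlatformNotSupportedException(
                "Advanced scan on .NET Framework requires the project to run as x64.");
        }
    }
}
```

Calling ArchitectureGuard.EnsureX64() before ReadDocument surfaces the problem early. In the project file, the usual way to satisfy the requirement is to set the PlatformTarget property to x64.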
Input
Code
using IronOcr;
using System;
// Instantiate OCR engine
var ocr = new IronTesseract();
// Configure OCR engine
using var input = new OcrInput();
input.LoadImage("potter.tiff");
// Perform OCR
OcrResult result = ocr.ReadDocument(input);
Console.WriteLine(result.Text);
' VB.NET equivalent
Imports IronOcr
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr = New IronTesseract()

        ' Configure OCR engine; Using disposes the input when finished
        Using input = New OcrInput()
            input.LoadImage("potter.tiff")

            ' Perform OCR
            Dim result As OcrResult = ocr.ReadDocument(input)
            Console.WriteLine(result.Text)
        End Using
    End Sub
End Module
Output
If you need to perform OCR on a PDF file instead, simply replace the LoadImage method with LoadPdf. This allows IronOCR to process and extract text from scanned PDFs in the same way.
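As a rough sketch of that substitution, the example below mirrors the TIFF sample but loads a PDF and writes the extracted text to a file, covering the "save or export" step. The file name scanned.pdf is a placeholder, and the output path is an assumption derived from it with the standard Path.ChangeExtension method. The AdvancedScan extension package is assumed to be installed.

```csharp
using IronOcr;
using System;
using System.IO;

// Placeholder input path; substitute your own scanned PDF
string pdfPath = "scanned.pdf";

// Instantiate OCR engine
var ocr = new IronTesseract();

// Load the scanned PDF instead of an image
using var input = new OcrInput();
input.LoadPdf(pdfPath);

// Perform OCR
OcrResult result = ocr.ReadDocument(input);

// Save the extracted text next to the input, e.g. "scanned.txt"
string outputPath = Path.ChangeExtension(pdfPath, ".txt");
File.WriteAllText(outputPath, result.Text);
Console.WriteLine($"Extracted text written to {outputPath}");
```

Writing to a plain text file is only one option; the same result.Text string can be passed to whatever storage or indexing step follows.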