How to Read PDFs with IronOCR

In this hands-on tutorial, you’ll learn how to extract text from PDF files in C# using IronOCR, a powerful .NET OCR library. The walkthrough begins with setting up IronOCR and initializing the OCR engine using your license key. You’ll see how to extract text from an entire PDF document, then refine the process to read only specific pages using indexed page ranges.

For more precision, the tutorial demonstrates region-based text extraction using Rectangle objects, which is useful for pulling content from forms, tables, or designated areas on each page. IronOCR can parse scanned or image-based PDFs, which makes it well suited to automating document processing, data extraction, and PDF analysis in C#. With clear code examples and console output, this video helps developers get started quickly with practical OCR implementations. Try it yourself by downloading the IronOCR trial and integrating PDF OCR into your own C# applications.
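
The walkthrough mentions initializing the engine with a license key, a step the samples on this page do not show. A minimal sketch of it, assuming the static `IronOcr.License.LicenseKey` property and a placeholder key string (replace it with your own trial or purchased key):

```csharp
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Placeholder value (hypothetical): substitute your own IronOCR key.
        // Assumption: the key is applied once, before any OCR call is made.
        IronOcr.License.LicenseKey = "YOUR-IRONOCR-LICENSE-KEY";

        // Create the engine after the key has been applied.
        var ocr = new IronTesseract();

        Console.WriteLine("OCR engine created.");
    }
}
```

Setting the key once at application startup keeps it out of the individual OCR routines.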

The following C# snippet reads the text of an entire PDF:

using IronOcr;
using System;

class Program
{
    static void Main()
    {
        // Create the OCR engine (apply your IronOCR license key before reading)
        var ocr = new IronTesseract();

        // Load and extract text from a PDF file
        using (var input = new OcrInput(@"path\to\sample.pdf"))
        {
            // Perform OCR on the entire PDF
            OcrResult result = ocr.Read(input);

            // Display and process the extracted text
            Console.WriteLine(result.Text);
        }
    }
}

The same example in VB.NET:

Imports IronOcr
Imports System

Friend Class Program
	Shared Sub Main()
		' Create the OCR engine (apply your IronOCR license key before reading)
		Dim ocr = New IronTesseract()

		' Load and extract text from a PDF file
		Using input = New OcrInput("path\to\sample.pdf")
			' Perform OCR on the entire PDF
			Dim result As OcrResult = ocr.Read(input)

			' Display and process the extracted text
			Console.WriteLine(result.Text)
		End Using
	End Sub
End Class

Process Specific Pages:

using IronOcr;
using System;

class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();

        // Define a specific page range for OCR
        using (var input = new OcrInput())
        {
            // Add only specific pages to the OcrInput
            input.AddPdfPages(@"path\to\sample.pdf", new int[] { 1, 2 }); // Read only pages 1 and 2

            OcrResult result = ocr.Read(input);

            // Output the text of the specified pages
            Console.WriteLine(result.Text);
        }
    }
}

The same example in VB.NET:

Imports IronOcr
Imports System

Friend Class Program
	Shared Sub Main(ByVal args() As String)
		Dim ocr = New IronTesseract()

		' Define a specific page range for OCR
		Using input = New OcrInput()
			' Add only specific pages to the OcrInput
			input.AddPdfPages("path\to\sample.pdf", New Integer() { 1, 2 }) ' Read only pages 1 and 2

			Dim result As OcrResult = ocr.Read(input)

			' Output the text of the specified pages
			Console.WriteLine(result.Text)
		End Using
	End Sub
End Class
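
When only some pages are read, it can help to print each page's text separately rather than as one combined string. The sketch below reuses the `AddPdfPages` call from the example above and assumes that `OcrResult` exposes a `Pages` collection whose items carry `PageNumber` and `Text`; check these members against the IronOCR version you have installed. The file path is a placeholder.

```csharp
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        var ocr = new IronTesseract();

        using (var input = new OcrInput())
        {
            // Same call as the page-selection example; the path is a placeholder.
            input.AddPdfPages(@"path\to\sample.pdf", new int[] { 1, 2 });

            OcrResult result = ocr.Read(input);

            // Assumption: each entry in result.Pages has PageNumber and Text.
            foreach (var page in result.Pages)
            {
                Console.WriteLine($"--- Page {page.PageNumber} ---");
                Console.WriteLine(page.Text);
            }
        }
    }
}
```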

Region-based Text Extraction:

using IronOcr;
using System;
using System.Drawing;

class Program
{
    static void Main(string[] args)
    {
        var ocr = new IronTesseract();

        using (var input = new OcrInput(@"path\to\sample.pdf"))
        {
            // Define a region of interest: Rectangle(x, y, width, height)
            var region = new Rectangle(100, 50, 200, 100);
            // Only pass the defined region for OCR
            input.SelectRegions(region);

            OcrResult result = ocr.Read(input);

            // Display and process the extracted text from the specific region
            Console.WriteLine(result.Text);
        }
    }
}

The same example in VB.NET:

Imports IronOcr
Imports System
Imports System.Drawing

Friend Class Program
	Shared Sub Main(ByVal args() As String)
		Dim ocr = New IronTesseract()

		Using input = New OcrInput("path\to\sample.pdf")
			' Define a region of interest: Rectangle(x, y, width, height)
			Dim region = New Rectangle(100, 50, 200, 100)
			' Only pass the defined region for OCR
			input.SelectRegions(region)

			Dim result As OcrResult = ocr.Read(input)

			' Display and process the extracted text from the specific region
			Console.WriteLine(result.Text)
		End Using
	End Sub
End Class

Further Reading:

How to Read PDFs

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.