Handling Large PDF Files in IronOCR

Updated: June 23, 2026

Running OCR on large multi-page PDFs can spike memory and crash the process. The usual culprit is loading every page into memory at once.

System.OutOfMemoryException

OcrInput.LoadPdf("large.pdf") reads all pages in one go. IronOCR's imaging system then renders every page simultaneously, and because the DPI defaults to 200, the memory footprint climbs fast. On a big document that means an OutOfMemoryException or resource deadlock.
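To see why DPI dominates the footprint, a rough estimate helps. The figures below are a back-of-the-envelope sketch, not measurements of IronOCR: they assume an A4 page (8.27 in by 11.69 in) rendered to an uncompressed bitmap at 4 bytes per pixel, and the library's actual internal pixel format may differ. The symbols w, h, d, and b are introduced here for illustration.

```latex
% Rendered size of one page: width w and height h in inches,
% d = render DPI, b = bytes per pixel (assumed b = 4)
M(d) = w \cdot h \cdot d^{2} \cdot b

% Assumed A4 page, w = 8.27, h = 11.69, b = 4
M(200) = 8.27 \times 11.69 \times 200^{2} \times 4 \approx 15.5\ \text{MB}
M(80)  = 8.27 \times 11.69 \times 80^{2}  \times 4 \approx 2.5\ \text{MB}

% The ratio depends only on the DPI values, not on page size or b
\frac{M(200)}{M(80)} = \left(\frac{200}{80}\right)^{2} = 6.25
```

Under these assumptions a 500-page document rendered all at once at 200 DPI needs several gigabytes for page bitmaps alone, while a single page at 80 DPI needs a few megabytes.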

Solution

The fix is to process the PDF one page at a time and keep the render DPI as low as the text quality allows.

1. Get the Page Count

Open the document with IronPdf.PdfDocument (or any PDF library) to read how many pages it has.

using var pdf = IronPdf.PdfDocument.FromFile(pdfPath);
var pageCount = pdf.PageCount;

Imports IronPdf

Using pdf = IronPdf.PdfDocument.FromFile(pdfPath)
    Dim pageCount = pdf.PageCount
End Using

2. Load One Page at a Time

Loop through the pages and load each individually with OcrInput.LoadPdfPage("file.pdf", pageIndex, dpi). Where the visual quality holds up, drop the DPI as low as 80 to conserve memory.

3. Read and Concatenate

Pass each single-page input to IronTesseract.Read() and append the result to a StringBuilder.

using System;
using System.Text;
using IronOcr;

var ocr = new IronTesseract();
var pdfPath = "large.pdf";
// IronPdf is used here only to read the page count
using var pdf = IronPdf.PdfDocument.FromFile(pdfPath);
var pageCount = pdf.PageCount;
var textBuilder = new StringBuilder();
for (int i = 0; i < pageCount; i++)
{
    // A fresh OcrInput per page, disposed at the end of each iteration
    using var input = new OcrInput();
    input.LoadPdfPage(pdfPath, i, 80); // page index i, rendered at 80 DPI
    var result = ocr.Read(input);
    textBuilder.Append(result.Text);
    textBuilder.Append(' '); // Add space between pages
}
Console.WriteLine(textBuilder.ToString().Trim());

Imports IronOcr
Imports IronPdf
Imports System.Text

Dim ocr As New IronTesseract()
Dim pdfPath As String = "large.pdf"
Using pdf = PdfDocument.FromFile(pdfPath)
    Dim pageCount As Integer = pdf.PageCount
    Dim textBuilder As New StringBuilder()
    For i As Integer = 0 To pageCount - 1
        Using input As New OcrInput()
            input.LoadPdfPage(pdfPath, i, 80)
            Dim result = ocr.Read(input)
            textBuilder.Append(result.Text)
            textBuilder.Append(" ") ' Add space between pages
        End Using
    Next
    Console.WriteLine(textBuilder.ToString().Trim())
End Using

The using on each OcrInput releases the page's image data before the next iteration, so memory stays flat across the loop instead of growing with page count.

Option: Compress the PDF First

For very complex or image-heavy files, the page-by-page loop may still struggle. Compress the PDF with IronPDF's Compress API before OCR to cut down the image data IronOCR has to handle. This pays off most on scanned or image-heavy documents.

Compress straight to a stream, then load that stream into OcrInput:

using IronOcr;
using IronPdf;

using var pdf = PdfDocument.FromFile(pdfPath);
// Compress straight to a stream
var stream = pdf.CompressPdfToStream(CompressStructTree: true);
var ocrTesseract = new IronTesseract();
using var ocrInput = new OcrInput();
ocrInput.LoadPdfPage(stream, 1); // second argument is the page index

Imports IronOcr
Imports IronPdf

Dim ocrTesseract As New IronTesseract()
Using pdf = PdfDocument.FromFile(pdfPath)
    ' Compress straight to a stream
    Dim stream = pdf.CompressPdfToStream(CompressStructTree:=True)
    Using ocrInput As New OcrInput()
        ocrInput.LoadPdfPage(stream, 1) ' second argument is the page index
    End Using
End Using

When a PDF carries embedded images, lower JpegQuality during compression to shrink the data further:

var pdf = PdfDocument.FromFile(@"D:\hugePdf.pdf");
var stream = pdf.CompressPdfToStream(JpegQuality: 75, CompressStructTree: true);

Imports IronPdf

Dim pdf = PdfDocument.FromFile("D:\hugePdf.pdf")
Dim stream = pdf.CompressPdfToStream(JpegQuality:=75, CompressStructTree:=True)

Warning: Reusing the same MemoryStream across loop iterations? Reset its position to 0 before each read. A stream is consumed once read, so the next read fails if the position isn't reset.
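The stream behavior behind this warning can be reproduced with plain .NET types, without IronOCR. The sketch below is illustrative only: StreamRewindDemo and ReadAll are hypothetical names, and a StreamReader stands in for whatever consumer reads the stream. Here the second read returns nothing instead of throwing, but the cause is the same: the position is already at the end.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRewindDemo
{
    // Hypothetical helper: reads from the stream's current position to the end.
    // leaveOpen keeps the underlying stream usable after the reader is disposed.
    public static string ReadAll(Stream stream)
    {
        using var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true);
        return reader.ReadToEnd();
    }

    public static void Main()
    {
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes("page data"));

        string first = ReadAll(stream);   // reads everything; position is now at the end
        string second = ReadAll(stream);  // nothing left to read from this position

        stream.Position = 0;              // rewind before reusing the stream
        string third = ReadAll(stream);   // reads the full content again

        Console.WriteLine($"first='{first}' second='{second}' third='{third}'");
        // prints: first='page data' second='' third='page data'
    }
}
```

In a page loop that reuses one compressed stream, the equivalent step is setting stream.Position = 0 at the top of each iteration, before the stream is handed to the next OcrInput.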

Debug Tips

Lower the DPI: values in the 80 to 100 range cut memory use sharply when the text is still legible.
Avoid LoadPdf() on large files: read the whole document at once only when it is genuinely small.
Dispose early: wrap OcrInput in using statements so memory is freed between pages.
Parallelize with care: run pages concurrently only when the machine has spare memory and CPU; it backfires on large PDFs.
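For the last tip, one way to keep concurrency in check is to cap how many pages are in flight at once. The following is a minimal sketch using the standard Parallel.For API. BoundedPageOcr, ReadPages, and the ocrPage callback are hypothetical names introduced for illustration. The callback is expected to load and read a single page, as in the loop from step 3. This article does not say whether one IronTesseract instance can be shared across threads, so treat that as something to confirm before sharing one.

```csharp
using System;
using System.Threading.Tasks;

public static class BoundedPageOcr
{
    // ocrPage is a hypothetical callback that OCRs one page and returns its text.
    // maxParallel caps how many pages are held in memory at the same time.
    public static string[] ReadPages(int pageCount, int maxParallel, Func<int, string> ocrPage)
    {
        var texts = new string[pageCount];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Each iteration writes only to its own slot, so no locking is needed
        // and the results stay in page order regardless of completion order.
        Parallel.For(0, pageCount, options, i =>
        {
            texts[i] = ocrPage(i);
        });

        return texts;
    }
}
```

A call such as BoundedPageOcr.ReadPages(pageCount, 2, i => ReadOnePage(i)), where ReadOnePage is your own single-page routine, keeps at most two pages rendered at a time. Peak memory scales with the cap rather than the page count, so start low and raise it only if memory headroom allows.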

Curtis Chau

Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.
