Handling Large PDF Files in IronOCR
Running OCR on large multi-page PDFs can spike memory and crash the process. The usual culprit is loading every page into memory at once.
System.OutOfMemoryException
OcrInput.LoadPdf("large.pdf") reads all pages in one go. IronOCR's imaging system then renders every page simultaneously, and because the DPI defaults to 200, the memory footprint climbs fast. On a big document that means an OutOfMemoryException or resource deadlock.
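To see why the render DPI matters so much, here is a rough back-of-the-envelope sketch. It assumes an A4 page (8.27 x 11.69 inches) rendered to an uncompressed bitmap at 4 bytes per pixel. Both figures are illustrative assumptions rather than documented IronOCR internals, and `PageMemoryEstimate` and `BytesPerPage` are hypothetical helper names, not part of any Iron Software API.

```csharp
using System;

// Hypothetical helper, not part of IronOCR or IronPDF.
public static class PageMemoryEstimate
{
    // Estimates the bytes needed to hold one rendered page in memory,
    // assuming an uncompressed bitmap (default: 4 bytes per pixel).
    public static long BytesPerPage(double widthInches, double heightInches,
                                    int dpi, int bytesPerPixel = 4)
    {
        // Pixel dimensions grow linearly with DPI...
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);

        // ...so the pixel count (and memory) grows with the square of the DPI.
        return widthPx * heightPx * bytesPerPixel;
    }
}
```

Under these assumptions, `PageMemoryEstimate.BytesPerPage(8.27, 11.69, 200)` comes to roughly 15 MB per page, while the same call at 80 DPI comes to about 2.5 MB. Because memory scales with the square of the DPI, dropping from 200 to 80 cuts the per-page footprint by a factor of about six, and rendering hundreds of pages at once multiplies the difference.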
Solution
The fix is to process the PDF one page at a time and keep the render DPI as low as the text quality allows.
1. Get the Page Count
Open the document with IronPdf.PdfDocument (or any PDF library) to read how many pages it has.
using var pdf = IronPdf.PdfDocument.FromFile(pdfPath);
var pageCount = pdf.PageCount;
Imports IronPdf
Using pdf = IronPdf.PdfDocument.FromFile(pdfPath)
    Dim pageCount = pdf.PageCount
End Using
2. Load One Page at a Time
Loop through the pages and load each individually with OcrInput.LoadPdfPage("file.pdf", pageIndex, dpi). Where the visual quality holds up, drop the DPI as low as 80 to conserve memory.
3. Read and Concatenate
Pass each single-page input to IronTesseract.Read() and append the result to a StringBuilder.
using System;
using System.Text;
using IronOcr;

var ocr = new IronTesseract();
var pdfPath = "large.pdf";
using var pdf = IronPdf.PdfDocument.FromFile(pdfPath);
var pageCount = pdf.PageCount;
var textBuilder = new StringBuilder();
for (int i = 0; i < pageCount; i++)
{
    // A fresh OcrInput per page; disposed at the end of each iteration
    using var input = new OcrInput();
    input.LoadPdfPage(pdfPath, i, 80); // page index, render DPI
    var result = ocr.Read(input);
    textBuilder.Append(result.Text);
    textBuilder.Append(' '); // Add space between pages
}
Console.WriteLine(textBuilder.ToString().Trim());
Imports IronOcr
Imports IronPdf
Imports System.Text
Dim ocr As New IronTesseract()
Dim pdfPath As String = "large.pdf"
Using pdf = PdfDocument.FromFile(pdfPath)
    Dim pageCount As Integer = pdf.PageCount
    Dim textBuilder As New StringBuilder()
    For i As Integer = 0 To pageCount - 1
        Using input As New OcrInput()
            input.LoadPdfPage(pdfPath, i, 80)
            Dim result = ocr.Read(input)
            textBuilder.Append(result.Text)
            textBuilder.Append(" ") ' Add space between pages
        End Using
    Next
    Console.WriteLine(textBuilder.ToString().Trim())
End Using
The using on each OcrInput releases the page's image data before the next iteration, so memory stays flat across the loop instead of growing with page count.
Option: Compress the PDF First
For very complex or image-heavy files, the page-by-page loop may still struggle. Compress the PDF with IronPDF's Compress API before OCR to cut down the image data IronOCR has to handle. This pays off most on scanned or image-heavy documents.
Compress straight to a stream, then load that stream into OcrInput:
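using IronOcr;
using IronPdf;
using var pdf = PdfDocument.FromFile(pdfPath);
// Compress in memory; no intermediate file is written
using var stream = pdf.CompressPdfToStream(CompressStructTree: true);
var ocrTesseract = new IronTesseract();
using var ocrInput = new OcrInput();
ocrInput.LoadPdfPage(stream, 1); // load a single page by index from the compressed stream
var result = ocrTesseract.Read(ocrInput);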
Imports IronOcr
Imports IronPdf
Dim pdf = PdfDocument.FromFile(pdfPath)
Dim stream = pdf.CompressPdfToStream(CompressStructTree:=True)
Dim ocrTesseract = New IronTesseract()
Using ocrInput As New OcrInput()
    ocrInput.LoadPdfPage(stream, 1)
    Dim result = ocrTesseract.Read(ocrInput)
End Using
When a PDF carries embedded images, lower JpegQuality during compression to shrink the data further:
var pdf = PdfDocument.FromFile(@"D:\hugePdf.pdf");
var stream = pdf.CompressPdfToStream(JpegQuality: 75, CompressStructTree: true);
Imports IronPdf
Dim pdf = PdfDocument.FromFile("D:\hugePdf.pdf")
Dim stream = pdf.CompressPdfToStream(JpegQuality:=75, CompressStructTree:=True)
Debug Tips
- Lower the DPI: values in the 80 to 100 range cut memory use sharply when the text is still legible.
- Avoid LoadPdf() on large files: read the whole document at once only when it is genuinely small.
- Dispose early: wrap OcrInput in using statements so memory is freed between pages.
- Parallelize with care: run pages concurrently only when the machine has spare memory and CPU; it backfires on large PDFs.

