High Peak Memory During Bulk OCR

Running OCR over many PDF segments at once multiplies memory use. Each task renders full-page bitmaps through an OcrInput, and a fresh IronTesseract engine per segment reloads the language model files every time. At full processor concurrency this pushes peak memory into the multi-gigabyte range, with spikes that can cause failures in memory-limited environments.

OCR is memory-heavy by nature. Every OcrInput renders full-page bitmaps, and every IronTesseract engine loads language model files into memory. Creating a new engine per segment reloads those models repeatedly, and running one OCR task per CPU core (Environment.ProcessorCount) lets many bitmap-heavy jobs run side by side. With nothing limiting how many tasks are active, peak memory scales directly with concurrency.
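As a rough back-of-the-envelope illustration of that scaling, the sketch below estimates how much memory page bitmaps alone could occupy at different concurrency levels. Every number in it is an assumption chosen for illustration, not a measurement of IronOCR: A4 pages rendered at 300 DPI (2480 × 3508 pixels), 4 bytes per pixel, 20 pages per segment, and all of a segment's pages held in memory at once.

```csharp
using System;

// Illustrative assumptions only (not measured IronOCR figures).
const long widthPx = 2480;        // A4 width at 300 DPI
const long heightPx = 3508;       // A4 height at 300 DPI
const long bytesPerPixel = 4;     // uncompressed 32-bit bitmap
const long pagesPerSegment = 20;  // hypothetical segment size

// Estimated bitmap bytes held when `slots` segments are processed at once.
long EstimatePeakBytes(int slots) =>
    widthPx * heightPx * bytesPerPixel * pagesPerSegment * slots;

foreach (int n in new[] { 1, 4, 16 })
{
    double gib = EstimatePeakBytes(n) / (1024.0 * 1024.0 * 1024.0);
    Console.WriteLine($"{n,2} concurrent segments: ~{gib:F2} GiB of page bitmaps");
}
```

Under these assumptions a single segment holds about 0.65 GiB of bitmaps, so 16 unbounded tasks would approach 10 GiB while a cap of 4 stays near 2.6 GiB. Real figures depend on page size, render resolution, and how the library buffers pages.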

The fix is to bound the number of in-flight jobs: cap concurrency, reuse engines from a pool, gate work with a semaphore, and dispose each OcrInput as soon as its segment is done.

Solution

1. Cap OCR concurrency

Clamp the number of simultaneous OCR tasks to a small ceiling. Fewer concurrent tasks mean fewer full-page bitmaps in memory at once, which directly lowers the peak. Tune the ceiling to the machine's capability.

// Clamp concurrency to avoid memory saturation and CPU over-subscription.
int concurrency = Math.Clamp(Environment.ProcessorCount / 2, 1, 4);
Imports System

' Clamp concurrency to avoid memory saturation and CPU over-subscription.
Dim concurrency As Integer = Math.Clamp(Environment.ProcessorCount \ 2, 1, 4)
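To see how this ceiling behaves across machines, the sketch below applies the same clamp to a few sample core counts. The helper name CapFor and the sample core counts are hypothetical and exist only for this illustration.

```csharp
using System;

// Same formula as the snippet above, parameterized so core counts can be compared.
int CapFor(int processorCount) => Math.Clamp(processorCount / 2, 1, 4);

foreach (int cores in new[] { 1, 2, 4, 8, 16, 64 })
{
    Console.WriteLine($"{cores} cores -> {CapFor(cores)} concurrent OCR tasks");
}
```

A single-core machine still gets one slot because the lower bound is 1, and anything with eight or more cores is held at four. If the host has plenty of memory, the upper bound of 4 is the value to raise.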

2. Pool the engines

Create exactly one IronTesseract engine per concurrent slot at startup and reuse them across every segment, rather than constructing a new engine and reloading the language model each time.

// Pre-create one engine per concurrent slot and reuse them across segments.
var enginePool = new ConcurrentBag<IronTesseract>(
    Enumerable.Range(0, concurrency).Select(_ => new IronTesseract())
);
Imports System.Collections.Concurrent
Imports System.Linq
Imports IronOcr

' Pre-create one engine per concurrent slot and reuse them across segments.
Dim enginePool As New ConcurrentBag(Of IronTesseract)(
    Enumerable.Range(0, concurrency).Select(Function(i) New IronTesseract())
)

Building the pool once amortizes the language-model load cost across the whole run instead of paying it per segment.
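The rent-and-return cycle can be tried without IronOCR installed. The sketch below is a minimal illustration that swaps in a hypothetical StandInEngine class for IronTesseract; it only counts how many engines are constructed, so the amortization is visible. The segment count of ten and the pool size of two are arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

const int concurrency = 2;

// Build the pool once, mirroring the ConcurrentBag<IronTesseract> pattern.
var pool = new ConcurrentBag<StandInEngine>(
    Enumerable.Range(0, concurrency).Select(_ => new StandInEngine()));

// Process ten "segments" one after another, renting and returning an engine each time.
for (int segment = 0; segment < 10; segment++)
{
    if (!pool.TryTake(out var engine))
        throw new InvalidOperationException("Pool unexpectedly empty.");
    try
    {
        engine.Use(); // placeholder for the real OCR call
    }
    finally
    {
        pool.Add(engine); // always return the engine, even if the work throws
    }
}

Console.WriteLine($"Engines constructed: {StandInEngine.Constructed}");
Console.WriteLine($"Engines in pool:     {pool.Count}");

// Hypothetical stand-in for IronTesseract; it only counts constructions and uses.
sealed class StandInEngine
{
    private static int _constructed;
    public static int Constructed => Volatile.Read(ref _constructed);
    public int Uses { get; private set; }

    public StandInEngine() => Interlocked.Increment(ref _constructed);
    public void Use() => Uses++;
}
```

Ten segments are processed, yet only two engines are ever constructed, and both are back in the pool at the end.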

3. Gate work with a semaphore

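Create a single SemaphoreSlim initialized to the concurrency limit, share it across all segment tasks, and wrap it in using so it is disposed when the run completes. Each task awaits WaitAsync() before it starts and calls Release() in a finally, so no more than the allowed number of segments are ever in flight at once.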

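// Create the semaphore once, outside the per-segment work, and share it across all tasks.
using var semaphore = new SemaphoreSlim(concurrency);

// Inside each segment's task: wait for a free slot before doing any OCR work.
await semaphore.WaitAsync();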
try
{
    // Rent a pre-loaded engine from the pool.
    if (!enginePool.TryTake(out var ocr))
        ocr = new IronTesseract(); // Defensive fallback; should never be reached.
    try
    {
        using var input = new OcrInput();
        input.LoadPdf(segmentStream); // page-range segment produced upstream
        var ocrResult = await ocr.ReadAsync(input);
        ocrResult.SaveAsSearchablePdf(outputPath);
    }
    finally
    {
        enginePool.Add(ocr); // Return engine to pool for the next waiting segment.
    }
}
finally
{
    semaphore.Release();
}
Imports System.Threading
Imports IronOcr

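' Create the semaphore once, outside the per-segment work, and share it across all tasks.
Dim semaphore As New SemaphoreSlim(concurrency)

' Inside each segment's task: wait for a free slot before doing any OCR work.
Await semaphore.WaitAsync()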
Try
    ' Rent a pre-loaded engine from the pool.
    Dim ocr As IronTesseract = Nothing
    If Not enginePool.TryTake(ocr) Then
        ocr = New IronTesseract() ' Defensive fallback; should never be reached.
    End If
    Try
        Using input As New OcrInput()
            input.LoadPdf(segmentStream) ' page-range segment produced upstream
            Dim ocrResult = Await ocr.ReadAsync(input)
            ocrResult.SaveAsSearchablePdf(outputPath)
        End Using
    Finally
        enginePool.Add(ocr) ' Return engine to pool for the next waiting segment.
    End Try
Finally
    semaphore.Release()
End Try

Awaiting WaitAsync() suspends the task, without blocking a thread, until a slot frees up, and returning the engine in the inner finally hands a pre-loaded engine straight to the next waiting segment.
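The gate only works if every segment task waits on the same semaphore instance. The sketch below shows one way that might look across a whole run, with Task.Delay standing in for the real OCR call so it runs without IronOCR. The names ProcessSegmentAsync, inFlight, and maxInFlight, the 20-segment count, and the 20 ms delay are all hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int concurrency = Math.Clamp(Environment.ProcessorCount / 2, 1, 4);
int inFlight = 0;      // segments currently inside the gated region
int maxInFlight = 0;   // highest value inFlight ever reached
var counterLock = new object();

// One semaphore for the whole run, shared by every segment task.
using var semaphore = new SemaphoreSlim(concurrency);

async Task ProcessSegmentAsync(int segmentIndex)
{
    await semaphore.WaitAsync(); // asynchronously wait for a free slot
    try
    {
        lock (counterLock)
        {
            inFlight++;
            maxInFlight = Math.Max(maxInFlight, inFlight);
        }

        await Task.Delay(20); // placeholder for the real OCR work on this segment
    }
    finally
    {
        lock (counterLock) { inFlight--; }
        semaphore.Release();
    }
}

// Start all 20 segment tasks up front; the semaphore decides how many actually run.
await Task.WhenAll(Enumerable.Range(0, 20).Select(i => ProcessSegmentAsync(i)));

Console.WriteLine($"Limit: {concurrency}, peak segments in flight: {maxInFlight}");
```

All 20 tasks are created immediately, but the recorded peak never exceeds the limit, which is the property that keeps bitmap memory bounded.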

4. Dispose OcrInput per segment

Wrap each OcrInput in using so its rendered page bitmaps are released as soon as the segment has been processed, before the engine is returned to the pool and the next task claims the slot.

Tip: The using on OcrInput is what keeps bitmap memory from accumulating across segments; without it, freed slots still hold their page bitmaps.
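The ordering matters here: a using declaration inside the inner try disposes its object when that block exits, which happens before either finally runs. The sketch below illustrates this with a hypothetical StandInInput class in place of OcrInput; it records events in a list rather than touching real bitmaps.

```csharp
using System;
using System.Collections.Generic;

var events = new List<string>();

try
{
    try
    {
        using var input = new StandInInput(events); // disposed when this block exits
        events.Add("segment read");
    }
    finally
    {
        events.Add("engine returned to pool");
    }
}
finally
{
    events.Add("slot released");
}

Console.WriteLine(string.Join(" -> ", events));
// prints: segment read -> input disposed -> engine returned to pool -> slot released

// Hypothetical stand-in for OcrInput that records when it is disposed.
sealed class StandInInput : IDisposable
{
    private readonly List<string> _events;
    public StandInInput(List<string> events) => _events = events;
    public void Dispose() => _events.Add("input disposed");
}
```

Because disposal comes first, the next task to acquire the slot does not start while the previous segment's input is still alive.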

Curtis Chau
Technical Writer
