How to Debug OCR in C#

Debugging OCR pipelines means catching failures before they reach production, logging enough detail to reproduce issues, and validating that extracted text meets accuracy thresholds. IronOCR provides built-in tools for each of these: file-based diagnostic logging, a typed exception hierarchy, confidence scoring at every granularity level, and a real-time progress event for long-running jobs.

Quickstart: Enable OCR Diagnostic Logging

Configure Installation.LogFilePath and Installation.LoggingMode before any OCR call to capture detailed engine output covering Tesseract initialization, language pack loading, and image preprocessing steps.

Get started with IronOCR using NuGet:

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr

  2. Copy and run this code snippet.

    using IronOcr;
    
    // Enable comprehensive logging before any OCR call
    Installation.LogFilePath = "ironocr_debug.log";
    Installation.LoggingMode = Installation.LoggingModes.All;

How to Enable Diagnostic Logging?

IronOCR's Installation class exposes three logging controls that we configure before calling any Read method.

using IronOcr;

// Write logs to a specific file
Installation.LogFilePath = "logs/ocr_diagnostics.log";

// Enable all logging channels: file + debug output
Installation.LoggingMode = Installation.LoggingModes.All;

// Or pipe logs into your existing ILogger pipeline
// (myLoggerInstance is any Microsoft.Extensions.Logging.ILogger created elsewhere in your app)
Installation.CustomLogger = myLoggerInstance;

LoggingMode accepts flag values from the Installation.LoggingModes enum:

Table 1 — LoggingModes Options
Mode | Output Target | Use Case
None | Disabled | Production with external monitoring
Debug | IDE debug output window | Local development
File | LogFilePath | Server-side log collection
All | Debug + File | Full diagnostic capture

The CustomLogger property accepts any Microsoft.Extensions.Logging.ILogger implementation, so we can route OCR diagnostics into Serilog, NLog, or any structured logging sink already running in the pipeline. Call Installation.ClearLogFiles() to purge accumulated log data between runs.
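
As a minimal sketch of the CustomLogger route, the snippet below builds a console logger with Microsoft.Extensions.Logging and hands it to IronOCR. It assumes the Microsoft.Extensions.Logging.Console package is referenced. The category name "IronOcr" and the log path are illustrative choices, not library requirements.

```csharp
using IronOcr;
using Microsoft.Extensions.Logging;

// Assumption: the Microsoft.Extensions.Logging.Console NuGet package is installed.
using ILoggerFactory loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddConsole();
    builder.SetMinimumLevel(LogLevel.Debug);
});

// "IronOcr" is an arbitrary category name chosen for this example.
ILogger logger = loggerFactory.CreateLogger("IronOcr");

// Configure logging before the first Read call.
Installation.LogFilePath = "logs/ocr_diagnostics.log";
Installation.ClearLogFiles();   // start this run without log data from earlier runs
Installation.LoggingMode = Installation.LoggingModes.All;
Installation.CustomLogger = logger;
```

Swapping AddConsole for a Serilog or NLog provider should leave the last four lines unchanged, since IronOCR only sees the ILogger interface.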


What Exceptions Can OCR Operations Throw?

IronOCR defines typed exceptions under the IronOcr.Exceptions namespace. Catching these specifically — rather than a blanket catch (Exception) — lets us route each failure type to the correct remediation path.

Table 2 — IronOCR Exception Reference
Exception | Common Cause | Fix
IronOcrInputException | Corrupt or unsupported image/PDF | Validate file before loading into OcrInput
IronOcrProductException | Internal engine error during OCR execution | Enable logging, check log output, update to latest NuGet version
IronOcrDictionaryException | Missing or corrupt .traineddata language file | Reinstall the language pack NuGet or set Installation.LanguagePackDirectory
IronOcrNativeException | Native C++ interop failure | Install Visual C++ Redistributable; check AVX support
IronOcrLicensingException | Missing or expired license key | Set Installation.LicenseKey before calling Read
LanguagePackException | Language pack not found at expected path | Verify LanguagePackDirectory or reinstall the NuGet language package
IronOcrAssemblyVersionMismatchException | Mismatched assembly versions after partial update | Clear NuGet cache, restore packages, ensure all IronOCR packages match

The following try-catch block demonstrates targeted exception handling with exception filters for conditional logging:

using IronOcr;
using IronOcr.Exceptions;

var ocr = new IronTesseract();

try
{
    using var input = new OcrInput();
    input.LoadPdf("invoice_scan.pdf");

    OcrResult result = ocr.Read(input);
    Console.WriteLine($"Text: {result.Text}");
    Console.WriteLine($"Confidence: {result.Confidence:P1}");
}
catch (IronOcrInputException ex)
{
    // File could not be loaded — corrupt, locked, or unsupported format
    Console.Error.WriteLine($"Input error: {ex.Message}");
}
catch (IronOcrDictionaryException ex)
{
    // Language pack missing — common in containerized deployments
    Console.Error.WriteLine($"Language pack error: {ex.Message}");
}
catch (IronOcrNativeException ex) when (ex.Message.Contains("AVX"))
{
    // CPU does not support AVX instructions
    Console.Error.WriteLine($"Hardware incompatibility: {ex.Message}");
}
catch (IronOcrLicensingException)
{
    Console.Error.WriteLine("License key is missing or invalid.");
}
catch (IronOcrProductException ex)
{
    // Catch-all for other IronOCR engine errors
    Console.Error.WriteLine($"OCR engine error: {ex.Message}");
    Console.Error.WriteLine($"Stack trace: {ex.StackTrace}");
}

We order catch blocks from most specific to most general. The when clause on IronOcrNativeException filters for AVX-related failures without catching unrelated native errors. Most handlers log the exception message, and the final IronOcrProductException handler also logs the stack trace for post-mortem analysis.
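
To route each failure to its remediation path, the fixes from Table 2 can be kept in one lookup and attached to every logged error. The sketch below is one way to do that; OcrRemediation and HintFor are hypothetical names. It matches on the exception type name so that it compiles without the IronOcr package. With the package referenced, type patterns such as `case IronOcrInputException` are the more idiomatic choice.

```csharp
using System;

// Hypothetical helper: maps an exception type name to the fix listed in Table 2.
public static class OcrRemediation
{
    public static string HintFor(Exception ex) => HintFor(ex.GetType().Name);

    public static string HintFor(string exceptionTypeName) => exceptionTypeName switch
    {
        "IronOcrInputException" => "Validate the file before loading it into OcrInput.",
        "IronOcrDictionaryException" => "Reinstall the language pack NuGet or set Installation.LanguagePackDirectory.",
        "LanguagePackException" => "Verify LanguagePackDirectory or reinstall the NuGet language package.",
        "IronOcrNativeException" => "Install the Visual C++ Redistributable and check AVX support.",
        "IronOcrLicensingException" => "Set Installation.LicenseKey before calling Read.",
        "IronOcrAssemblyVersionMismatchException" => "Clear the NuGet cache, restore packages, and align all IronOCR package versions.",
        "IronOcrProductException" => "Enable logging, check the log output, and update to the latest NuGet version.",
        // Anything not listed in Table 2 falls through to a generic hint.
        _ => "Unrecognized failure: enable logging and inspect the log output."
    };
}
```

Inside a catch block, `Console.Error.WriteLine(OcrRemediation.HintFor(ex));` prints the matching fix next to the error message.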


How to Validate OCR Output with Confidence Scores?

Every OcrResult carries a Confidence property — a value between 0 and 1 representing the engine's statistical certainty averaged across all recognized characters. We access this at every level of the result hierarchy: document, page, paragraph, word, and character.

A threshold-gated validation pattern prevents low-quality OCR output from propagating downstream:

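using System;
using System.Linq;
using IronOcr;

// ProcessResult, QueueForReview, and LogRejection are placeholders for your own pipeline methods.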

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("receipt.png");

OcrResult result = ocr.Read(input);
double confidence = result.Confidence;

Console.WriteLine($"Overall confidence: {confidence:P1}");

// Threshold-gated decision
if (confidence >= 0.90)
{
    Console.WriteLine("ACCEPT — high confidence, processing result.");
    ProcessResult(result.Text);
}
else if (confidence >= 0.70)
{
    Console.WriteLine("FLAG — moderate confidence, queuing for review.");
    QueueForReview(result.Text, confidence);
}
else
{
    Console.WriteLine("REJECT — low confidence, logging for investigation.");
    LogRejection("receipt.png", confidence);
}

// Drill into per-page and per-word confidence for diagnostics
foreach (var page in result.Pages)
{
    Console.WriteLine($"  Page {page.PageNumber}: {page.Confidence:P1}");

    var lowConfidenceWords = page.Words
        .Where(w => w.Confidence < 0.70)
        .ToList();

    foreach (var word in lowConfidenceWords)
    {
        Console.WriteLine($"    Low-confidence word: \"{word.Text}\" ({word.Confidence:P1})");
    }
}

This pattern is essential in pipelines where OCR feeds into data entry, invoice processing, or compliance workflows. The per-word drill-down identifies exactly which regions of the source image caused degradation — we can then apply image quality filters or orientation corrections and re-process. For a deeper look at confidence scoring, see the confidence levels how-to.
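
Pulling the threshold logic into a pure function makes the gate easy to unit test without running OCR. The sketch below is one possible shape; OcrDecision, ConfidenceGate, and Triage are hypothetical names, and the defaults mirror the 0.90 and 0.70 thresholds used above. It assumes confidence is on the 0 to 1 scale described in this section and rejects values outside it, which also catches a score reported on a different scale.

```csharp
using System;

public enum OcrDecision { Accept, Flag, Reject }

// Hypothetical helper: classifies a confidence score against two thresholds.
public static class ConfidenceGate
{
    public static OcrDecision Triage(double confidence, double acceptAt = 0.90, double flagAt = 0.70)
    {
        // Guard against NaN and against scores that are not on the 0..1 scale.
        if (double.IsNaN(confidence) || confidence < 0.0 || confidence > 1.0)
            throw new ArgumentOutOfRangeException(nameof(confidence), "Expected a value between 0 and 1.");
        if (flagAt > acceptAt)
            throw new ArgumentException("flagAt must not exceed acceptAt.", nameof(flagAt));

        if (confidence >= acceptAt) return OcrDecision.Accept;
        if (confidence >= flagAt) return OcrDecision.Flag;
        return OcrDecision.Reject;
    }
}
```

A caller would switch on `ConfidenceGate.Triage(result.Confidence)` instead of repeating the if/else chain in each pipeline stage.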


How to Monitor OCR Progress in Real Time?

For multi-page documents, the OcrProgress event on IronTesseract reports status after each page completes. The OcrProgressEventArgs object exposes ProgressPercent, Duration, TotalPages, PagesComplete, StartTimeUTC, and EndTimeUTC.

using IronOcr;

var ocr = new IronTesseract();

ocr.OcrProgress += (sender, e) =>
{
    Console.WriteLine(
        $"[OCR] {e.ProgressPercent}% complete | " +
        $"Page {e.PagesComplete}/{e.TotalPages} | " +
        $"Elapsed: {e.Duration.TotalSeconds:F1}s"
    );
};

using var input = new OcrInput();
input.LoadPdf("quarterly_report.pdf");

OcrResult result = ocr.Read(input);
Console.WriteLine($"Finished {result.Pages.Count()} pages, confidence: {result.Confidence:P1}");

We wire this event into our logging infrastructure to track OCR job duration and detect stalls. If Duration exceeds a threshold without ProgressPercent advancing, the pipeline can flag the job for investigation. This is particularly useful for batch PDF processing where a single malformed page can stall the entire job.
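
Because the event only fires when a page completes, a stalled page produces no event at all, so stall detection has to run outside the handler. The sketch below shows one way to do it: the handler records each advance, and a separate timer asks whether too much time has passed since the last one. OcrStallWatchdog is a hypothetical class, and it assumes ProgressPercent is an integer from 0 to 100. Time is passed in explicitly so the logic can be tested without waiting.

```csharp
using System;

// Hypothetical helper: tracks the last time progress advanced and reports stalls.
public sealed class OcrStallWatchdog
{
    private readonly TimeSpan _maxSilence;
    private readonly object _gate = new object();
    private DateTime _lastProgressUtc;
    private int _lastPercent = -1;

    public OcrStallWatchdog(TimeSpan maxSilence, DateTime startUtc)
    {
        _maxSilence = maxSilence;
        _lastProgressUtc = startUtc;
    }

    // Call from the OcrProgress handler. Only a real advance resets the clock.
    public void Report(int progressPercent, DateTime nowUtc)
    {
        lock (_gate)
        {
            if (progressPercent > _lastPercent)
            {
                _lastPercent = progressPercent;
                _lastProgressUtc = nowUtc;
            }
        }
    }

    // Call periodically from a timer. A finished job (100%) is never stalled.
    public bool IsStalled(DateTime nowUtc)
    {
        lock (_gate)
        {
            return _lastPercent < 100 && nowUtc - _lastProgressUtc > _maxSilence;
        }
    }
}
```

Wiring it up would look like `ocr.OcrProgress += (s, e) => watchdog.Report(e.ProgressPercent, DateTime.UtcNow);`, with a System.Threading.Timer calling `watchdog.IsStalled(DateTime.UtcNow)` every few seconds.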


How to Handle Errors in Batch OCR Pipelines?

Production OCR systems process hundreds or thousands of files. A single failure should not halt the pipeline. We isolate errors per file, log failures with context, and produce a summary report at the end.

using IronOcr;
using IronOcr.Exceptions;

var ocr = new IronTesseract();
Installation.LogFilePath = "batch_debug.log";
Installation.LoggingMode = Installation.LoggingModes.File;

string[] files = Directory.GetFiles("scans/", "*.pdf");
int succeeded = 0, failed = 0;
double totalConfidence = 0;
var failures = new List<(string File, string Error)>();

foreach (string file in files)
{
    try
    {
        using var input = new OcrInput();
        input.LoadPdf(file);

        OcrResult result = ocr.Read(input);
        totalConfidence += result.Confidence;
        succeeded++;

        Console.WriteLine($"OK: {Path.GetFileName(file)} — {result.Confidence:P1}");
    }
    catch (IronOcrInputException ex)
    {
        failed++;
        failures.Add((file, $"Input error: {ex.Message}"));
        Console.Error.WriteLine($"FAIL: {Path.GetFileName(file)} — {ex.Message}");
    }
    catch (IronOcrProductException ex)
    {
        failed++;
        failures.Add((file, $"Engine error: {ex.Message}"));
        Console.Error.WriteLine($"FAIL: {Path.GetFileName(file)} — {ex.Message}");
    }
    catch (Exception ex)
    {
        failed++;
        failures.Add((file, $"Unexpected: {ex.Message}"));
        Console.Error.WriteLine($"FAIL: {Path.GetFileName(file)} — {ex.GetType().Name}: {ex.Message}");
    }
}

// Summary report
Console.WriteLine($"\n--- Batch Summary ---");
Console.WriteLine($"Total: {files.Length} | Passed: {succeeded} | Failed: {failed}");
if (succeeded > 0)
    Console.WriteLine($"Average confidence: {totalConfidence / succeeded:P1}");

foreach (var (f, err) in failures)
    Console.WriteLine($"  {Path.GetFileName(f)}: {err}");

The final catch (Exception) block handles unforeseen errors such as network timeouts on shared storage, permission issues, or out-of-memory conditions on large scans. Each failure records the file path and error message for the summary, while the loop continues processing the remaining files. The log file at batch_debug.log captures engine-level detail for any file that triggers internal diagnostics.

For non-blocking execution in services or web applications, IronOCR supports ReadAsync — wrap the await ocr.ReadAsync(input) call in the same try-catch structure.
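
A minimal sketch of the async variant follows. AsyncOcr and TryReadPdfAsync are hypothetical names, and returning null on failure is an illustrative choice rather than an IronOCR convention. The only IronOCR calls used are the ones already shown in this guide plus ReadAsync.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using IronOcr;
using IronOcr.Exceptions;

public static class AsyncOcr
{
    // Returns the recognized text, or null when the file could not be read.
    public static async Task<string?> TryReadPdfAsync(IronTesseract ocr, string path)
    {
        try
        {
            using var input = new OcrInput();
            input.LoadPdf(path);

            // Await the read so the calling thread is free while OCR runs.
            OcrResult result = await ocr.ReadAsync(input);
            return result.Text;
        }
        catch (IronOcrInputException ex)
        {
            Console.Error.WriteLine($"Input error in {path}: {ex.Message}");
            return null;
        }
        catch (IronOcrProductException ex)
        {
            Console.Error.WriteLine($"Engine error in {path}: {ex.Message}");
            return null;
        }
    }
}
```

From an async controller action or background service, `string? text = await AsyncOcr.TryReadPdfAsync(ocr, file);` keeps the per-file isolation of the batch loop above.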


What About Debugging OCR Accuracy?

When confidence scores are consistently low, the issue is typically the input image rather than the OCR engine. IronOCR provides preprocessing tools to address this:

  • Image quality filters — sharpen, denoise, dilate, and erode filters improve text clarity before recognition.
  • Orientation correction — automatic deskew and rotation correction for scanned documents.
  • DPI upscaling — low-resolution images benefit from DPI adjustment before processing.
  • Computer vision — OpenCV-based text region detection isolates text areas in complex layouts.
  • IronOCR Utility — a desktop tool for visually testing filter combinations and exporting the optimal C# configuration.
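
The filters above pair naturally with confidence gating: read once, and re-read with preprocessing only when the score is low. The sketch below assumes the deskew and denoise filters are exposed as Deskew() and DeNoise() on OcrInput; check the method names against your installed IronOCR version. The file name and the 0.70 threshold are illustrative.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
const string path = "receipt.png";   // hypothetical input file
const double retryBelow = 0.70;      // illustrative threshold

// First pass: no preprocessing.
using var raw = new OcrInput();
raw.LoadImage(path);
OcrResult first = ocr.Read(raw);
Console.WriteLine($"First pass: {first.Confidence:P1}");

if (first.Confidence < retryBelow)
{
    // Second pass on a fresh input with filters applied.
    // Assumption: Deskew() and DeNoise() are the filter method names in your version.
    using var cleaned = new OcrInput();
    cleaned.LoadImage(path);
    cleaned.Deskew();
    cleaned.DeNoise();

    OcrResult second = ocr.Read(cleaned);
    Console.WriteLine($"After filters: {second.Confidence:P1}");
}
```

Comparing the two scores per file shows whether a given filter combination actually helps before it is applied to the whole batch.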

For deployment-specific issues (missing runtimes, native library errors), IronOCR maintains dedicated troubleshooting guides for Azure Functions, Docker and Linux, and general environment setup.


Where to Go from Here?

This guide covered runtime debugging: logging, exceptions, confidence validation, progress monitoring, and batch error handling.

Purchase an IronOCR license to deploy in production, or start a free 30-day trial to test these debugging patterns in your environment.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.
