Skip to footer content
COMPARE TO OTHER COMPONENTS

PaddleOCR vs IronOCR: .NET OCR Library Comparison

PaddleOCR delivers state-of-the-art Chinese character recognition — but getting a single image processed in a .NET application requires five NuGet packages, an async model download (~100MB) that connects to Baidu servers in China, an OpenCvSharp dependency for every image load, and a cold start time of 3-5 seconds on the first run. That is the tax on using a Python-first deep learning system from a .NET project. For CJK-heavy workloads where accuracy on Chinese script is the hard requirement and GPU infrastructure already exists, that tax is worth paying. For everything else, it is overhead the project never needed.

Understanding PaddleOCR

PaddleOCR is an open-source OCR system built by Baidu on top of PaddlePaddle, Baidu's own deep learning framework. The project was designed from the ground up for Python developers and the .NET community accesses it through a community wrapper, Sdcb.PaddleOCR, maintained by Zhou Jie (sdcb) on GitHub.

The recognition pipeline chains three neural networks in sequence: a text detection model using differentiable binarization (DB) that identifies text region bounding boxes, a direction classifier that determines text orientation, and a recognition model using a convolutional recurrent neural network (CRNN) that converts the detected region pixels to a string. Each of those networks is a separate model file that must be present on disk before any inference can begin.

Key architectural characteristics:

  • Multi-package installation: Core package (Sdcb.PaddleOCR), platform runtime (Sdcb.PaddleInference.runtime.win64.mkl or the CUDA equivalent), model download helper (Sdcb.PaddleOCR.Models.Online), and OpenCvSharp for image loading — minimum four packages, five if you separate the OpenCvSharp runtime
  • Mandatory model download: Models are not bundled. First use calls OnlineFullModels.ChineseV4.DownloadAsync(), which downloads from Baidu-controlled servers (bj.bcebos.com) and adds ~100MB to deployment artifacts
  • OpenCV dependency for every image: There is no path from a file path directly to inference. Images must be loaded as an OpenCvSharp Mat object before being passed to ocr.Run(mat)
  • GPU acceleration requires CUDA/cuDNN setup: Achieving the advertised performance figures requires NVIDIA drivers, CUDA Toolkit 11.8, cuDNN 8.6+, and a separate GPU runtime NuGet package — 2-8 hours of setup per environment
  • Python-first release cadence: New PaddleOCR model versions appear in Python first; .NET wrapper updates follow weeks to months later
  • 14 supported languages: Chinese Simplified, Chinese Traditional, English, French, German, Korean, Japanese, Italian, Spanish, Portuguese, Russian, Arabic, Hindi, and Tamil — a hard ceiling

The NuGet package has approximately 200,000 downloads. The smaller footprint means fewer Stack Overflow answers, fewer battle-tested production reports, and a thinner pool of developers who can answer questions from experience.

The Three-Model Pipeline in Practice

Every PaddleOCR inference call runs three neural networks back-to-back. The code below shows the complete setup path that a developer encounters before any text is returned:

// Five packages required before this compiles:
// dotnet add package Sdcb.PaddleOCR
// dotnet add package Sdcb.PaddleOCR.Models.Online
// dotnet add package Sdcb.PaddleInference.runtime.win64.mkl
// dotnet add package OpenCvSharp4
// dotnet add package OpenCvSharp4.runtime.win

using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using OpenCvSharp;

// Async download from Baidu servers — blocks first run for 10-30 seconds
FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

// PaddleOcrAll bundles detection, classification, and recognition
using PaddleOcrAll ocr = new PaddleOcrAll(models)
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

// OpenCV required — no file path shortcut
using Mat mat = Cv2.ImRead("document.png");

if (mat.Empty())
{
    throw new FileNotFoundException("Could not load image");
}

// Three neural networks fire in sequence
PaddleOcrResult result = ocr.Run(mat);

Console.WriteLine(result.Text);
// Five packages required before this compiles:
// dotnet add package Sdcb.PaddleOCR
// dotnet add package Sdcb.PaddleOCR.Models.Online
// dotnet add package Sdcb.PaddleInference.runtime.win64.mkl
// dotnet add package OpenCvSharp4
// dotnet add package OpenCvSharp4.runtime.win

using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using OpenCvSharp;

// Async download from Baidu servers — blocks first run for 10-30 seconds
FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

// PaddleOcrAll bundles detection, classification, and recognition
using PaddleOcrAll ocr = new PaddleOcrAll(models)
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

// OpenCV required — no file path shortcut
using Mat mat = Cv2.ImRead("document.png");

if (mat.Empty())
{
    throw new FileNotFoundException("Could not load image");
}

// Three neural networks fire in sequence
PaddleOcrResult result = ocr.Run(mat);

Console.WriteLine(result.Text);
Imports Sdcb.PaddleOCR
Imports Sdcb.PaddleOCR.Models.Online
Imports OpenCvSharp

' Five packages required before this compiles:
' dotnet add package Sdcb.PaddleOCR
' dotnet add package Sdcb.PaddleOCR.Models.Online
' dotnet add package Sdcb.PaddleInference.runtime.win64.mkl
' dotnet add package OpenCvSharp4
' dotnet add package OpenCvSharp4.runtime.win

' Async download from Baidu servers — blocks first run for 10-30 seconds
Dim models As FullOcrModel = Await OnlineFullModels.ChineseV4.DownloadAsync()

' PaddleOcrAll bundles detection, classification, and recognition
Using ocr As New PaddleOcrAll(models) With {
    .AllowRotateDetection = True,
    .Enable180Classification = True
}

    ' OpenCV required — no file path shortcut
    Using mat As Mat = Cv2.ImRead("document.png")

        If mat.Empty() Then
            Throw New FileNotFoundException("Could not load image")
        End If

        ' Three neural networks fire in sequence
        Dim result As PaddleOcrResult = ocr.Run(mat)

        Console.WriteLine(result.Text)
    End Using
End Using
$vbLabelText   $csharpLabel

The result.Regions array contains individual text region objects, each with a Text string, a Score confidence value, and a Rect bounding box. To filter by confidence: result.Regions.Where(r => r.Score > 0.8). Region ordering is spatial, not reading-order guaranteed, so sorting by region.Rect.Center.Y then region.Rect.Center.X is a common pattern.

Understanding IronOCR

IronOCR is a commercial OCR library for .NET built around an optimized Tesseract 5 LSTM engine, designed to eliminate the configuration and preprocessing work that raw Tesseract requires. One NuGet package installs the complete runtime, language models, native binaries, and preprocessing pipeline.

Key characteristics:

  • Single package deployment: dotnet add package IronOcr installs everything. No additional runtime packages, no model downloads, no native library configuration
  • Automatic preprocessing: Deskew, DeNoise, Contrast, Binarize, and EnhanceResolution apply automatically for poor-quality inputs; explicit control is available when needed
  • Native PDF support: Scanned PDFs load directly via input.LoadPdf() with no third-party conversion step
  • 125+ languages via NuGet: Each language pack is a standard NuGet dependency — dotnet add package IronOcr.Languages.ChineseSimplified — no model directory management
  • Thread-safe, CPU-optimized: Built-in parallelization without CUDA or GPU hardware requirements
  • Commercial support with SLA: US-based Iron Software provides commercial support; the product is not a community side project
  • Perpetual licensing: $999 Lite / $1,499 Plus / $2,999 Professional — pay once, run indefinitely with no per-page charges

Feature Comparison

Feature PaddleOCR (Sdcb) IronOCR
NuGet packages required 4-5 1
Model download required Yes (~100MB, Baidu servers) No (bundled)
Setup time 30-60 minutes 5 minutes
CJK accuracy Excellent (95%+) Good (85-90%)
English/Latin accuracy Good (92-95%) Excellent (97-99%)
GPU support Yes (complex CUDA setup) CPU-optimized
Native PDF support No Yes
Languages supported 14 125+
License Apache 2.0 (free) Commercial ($999+)
Company origin China (Baidu) USA (Iron Software)

Detailed Feature Comparison

Category / Feature PaddleOCR (Sdcb) IronOCR
Setup
NuGet packages 4-5 1
Model download on install Required Included
First-run cold start 3-5 seconds Under 1 second
Installation time 30-60 minutes Under 5 minutes
OpenCV dependency Required None
Language support
Supported languages 14 125+
Language install method Model download NuGet package
Multi-language simultaneous Sequential model switch Yes (AddSecondaryLanguage)
CJK accuracy Excellent Good
Latin/English accuracy Good Excellent
Input handling
Image file input Via OpenCvSharp Mat Direct file path
Native PDF input No Yes
Password-protected PDF No Yes
Stream input Indirect Yes
Byte array input Indirect Yes
Preprocessing
Automatic preprocessing No Yes
Deskew Manual Built-in
DeNoise Manual Built-in
Contrast enhancement Manual Built-in
Resolution scaling Manual EnhanceResolution(300)
Output
Full text string Yes Yes
Per-region/word results result.Regions result.Words, result.Lines
Confidence scores Per region (region.Score) Per word (word.Confidence)
Bounding boxes region.Rect word.X, word.Y, word.Width, word.Height
Searchable PDF output No Yes
hOCR export No Yes
Performance
CPU per image 300-500ms 150-300ms
GPU per image 50-100ms (requires CUDA) N/A (CPU-optimized)
Memory usage 500MB-1GB 100-200MB
Deployment size 300-500MB ~80MB
Deployment
Docker base image size ~1.5GB (with CUDA base) ~400MB
Cross-platform Windows/Linux (partial) Windows, Linux, macOS
Air-gapped deployment Yes (pre-download models) Yes (no downloads needed)
Support and licensing
License type Apache 2.0 (free) Perpetual commercial
Commercial support Community/GitHub Yes, with SLA
Company / maintainer Baidu / community wrapper Iron Software (US)
NuGet downloads ~200K ~5.3M

CJK Accuracy vs. Setup Cost

CJK accuracy is PaddleOCR's genuine strength. For Chinese Simplified documents, the deep learning CRNN architecture returns accuracy figures of 95%+ — consistently ahead of Tesseract-based engines on dense Chinese character grids, mixed Chinese/English layouts, and vertical text blocks. If a project's primary workload is Chinese invoice processing, Mandarin contracts, or Japanese product catalogs, that accuracy advantage is real.

PaddleOCR Approach

The Chinese model pipeline performs well because the detection and recognition networks were trained on large-scale Chinese document datasets by Baidu, whose core business involves processing Chinese web content at scale.

// Chinese model: highest accuracy for CJK documents
FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

using PaddleOcrAll ocr = new PaddleOcrAll(models)
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

using Mat mat = Cv2.ImRead("chinese-invoice.png");
PaddleOcrResult result = ocr.Run(mat);

// Filter low-confidence regions for cleaner output
var highConfidence = result.Regions
    .Where(r => r.Score >= 0.8)
    .OrderBy(r => r.Rect.Center.Y)
    .ThenBy(r => r.Rect.Center.X);

foreach (var region in highConfidence)
{
    Console.WriteLine($"{region.Text} ({region.Score:P0})");
}
// Chinese model: highest accuracy for CJK documents
FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

using PaddleOcrAll ocr = new PaddleOcrAll(models)
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

using Mat mat = Cv2.ImRead("chinese-invoice.png");
PaddleOcrResult result = ocr.Run(mat);

// Filter low-confidence regions for cleaner output
var highConfidence = result.Regions
    .Where(r => r.Score >= 0.8)
    .OrderBy(r => r.Rect.Center.Y)
    .ThenBy(r => r.Rect.Center.X);

foreach (var region in highConfidence)
{
    Console.WriteLine($"{region.Text} ({region.Score:P0})");
}
Imports System
Imports System.Linq
Imports OpenCvSharp

' Chinese model: highest accuracy for CJK documents
Dim models As FullOcrModel = Await OnlineFullModels.ChineseV4.DownloadAsync()

Using ocr As New PaddleOcrAll(models) With {
    .AllowRotateDetection = True,
    .Enable180Classification = True
}

    Using mat As Mat = Cv2.ImRead("chinese-invoice.png")
        Dim result As PaddleOcrResult = ocr.Run(mat)

        ' Filter low-confidence regions for cleaner output
        Dim highConfidence = result.Regions _
            .Where(Function(r) r.Score >= 0.8) _
            .OrderBy(Function(r) r.Rect.Center.Y) _
            .ThenBy(Function(r) r.Rect.Center.X)

        For Each region In highConfidence
            Console.WriteLine($"{region.Text} ({region.Score:P0})")
        Next
    End Using
End Using
$vbLabelText   $csharpLabel

The confidence filter pattern (r.Score >= 0.8) is important because the deep learning detection step will occasionally flag non-text regions as text. Without filtering, noise regions appear in the output. Sorting by Rect.Center.Y then Rect.Center.X approximates reading order but does not guarantee it for complex multi-column layouts.

IronOCR Approach

IronOCR's Chinese Simplified support installs as a NuGet package and requires no model management. Accuracy on Chinese documents runs at 85-90% — lower than PaddleOCR's 95%+ on the same material, but sufficient for many production use cases, particularly when the documents have consistent formatting.

// dotnet add package IronOcr.Languages.ChineseSimplified

using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;

using var input = new OcrInput();
input.LoadImage("chinese-invoice.png");

// Preprocessing handles rotation, noise, and contrast automatically
input.Deskew();
input.DeNoise();

var result = ocr.Read(input);

// Word-level results with coordinates
foreach (var word in result.Words)
{
    Console.WriteLine($"{word.Text} at ({word.X},{word.Y}) — {word.Confidence}%");
}
// dotnet add package IronOcr.Languages.ChineseSimplified

using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;

using var input = new OcrInput();
input.LoadImage("chinese-invoice.png");

// Preprocessing handles rotation, noise, and contrast automatically
input.Deskew();
input.DeNoise();

var result = ocr.Read(input);

// Word-level results with coordinates
foreach (var word in result.Words)
{
    Console.WriteLine($"{word.Text} at ({word.X},{word.Y}) — {word.Confidence}%");
}
Imports IronOcr

' dotnet add package IronOcr.Languages.ChineseSimplified

Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.ChineseSimplified

Using input As New OcrInput()
    input.LoadImage("chinese-invoice.png")

    ' Preprocessing handles rotation, noise, and contrast automatically
    input.Deskew()
    input.DeNoise()

    Dim result = ocr.Read(input)

    ' Word-level results with coordinates
    For Each word In result.Words
        Console.WriteLine($"{word.Text} at ({word.X},{word.Y}) — {word.Confidence}%")
    Next
End Using
$vbLabelText   $csharpLabel

For teams that need multiple languages simultaneously — such as mixed Chinese/English documents — IronOCR handles the combination with AddSecondaryLanguage. PaddleOCR requires switching model sets between languages, which means separate download calls and separate engine instances.

The trade-off is specific: PaddleOCR wins on pure CJK accuracy. IronOCR wins on English/Latin accuracy (97-99% vs. 92-95%), preprocessing for poor-quality scans, and the operational simplicity of not managing model files. Teams processing documents in a mix of languages, or primarily processing European-language documents with occasional CJK content, will find IronOCR's accuracy profile more useful in aggregate. See the IronOCR tutorials hub for multi-language and preprocessing examples.

Model Download and Management

The model management difference between PaddleOCR and IronOCR represents the most significant day-to-day operational distinction. It affects first-time setup, CI/CD pipelines, Docker images, and air-gapped deployments.

PaddleOCR Approach

PaddleOCR ships models separately from the NuGet package. Three model directories are required for a full-pipeline OCR run, and each approach to sourcing them has a different trade-off:

// Option A: Online download — simplest, but requires Baidu server access
// This runs on application startup; first run blocks for 10-30 seconds
FullOcrModel onlineModels = await OnlineFullModels.ChineseV4.DownloadAsync();

// Option B: Pre-downloaded local models — for air-gapped or compliance environments
// Developer must manually download and maintain three directories:
//   models/ch_PP-OCRv4_det_infer/
//   models/ch_ppocr_mobile_v2.0_cls_infer/
//   models/ch_PP-OCRv4_rec_infer/
FullOcrModel localModels = new FullOcrModel(
    LocalDetectionModel.FromDirectory("models/ch_PP-OCRv4_det_infer"),
    LocalClassificationModel.FromDirectory("models/ch_ppocr_mobile_v2.0_cls_infer"),
    LocalRecognitionModel.FromDirectory("models/ch_PP-OCRv4_rec_infer")
);

// Option C: Embedded model package — adds 100MB to NuGet restore, simplest deployment
// dotnet add package Sdcb.PaddleOCR.Models.LocalV4

// Japanese and Korean require separate model downloads:
FullOcrModel japaneseModels = await OnlineFullModels.JapanV4.DownloadAsync();
FullOcrModel koreanModels = await OnlineFullModels.KoreanV4.DownloadAsync();
// Option A: Online download — simplest, but requires Baidu server access
// This runs on application startup; first run blocks for 10-30 seconds
FullOcrModel onlineModels = await OnlineFullModels.ChineseV4.DownloadAsync();

// Option B: Pre-downloaded local models — for air-gapped or compliance environments
// Developer must manually download and maintain three directories:
//   models/ch_PP-OCRv4_det_infer/
//   models/ch_ppocr_mobile_v2.0_cls_infer/
//   models/ch_PP-OCRv4_rec_infer/
FullOcrModel localModels = new FullOcrModel(
    LocalDetectionModel.FromDirectory("models/ch_PP-OCRv4_det_infer"),
    LocalClassificationModel.FromDirectory("models/ch_ppocr_mobile_v2.0_cls_infer"),
    LocalRecognitionModel.FromDirectory("models/ch_PP-OCRv4_rec_infer")
);

// Option C: Embedded model package — adds 100MB to NuGet restore, simplest deployment
// dotnet add package Sdcb.PaddleOCR.Models.LocalV4

// Japanese and Korean require separate model downloads:
FullOcrModel japaneseModels = await OnlineFullModels.JapanV4.DownloadAsync();
FullOcrModel koreanModels = await OnlineFullModels.KoreanV4.DownloadAsync();
$vbLabelText   $csharpLabel

The model version must match the wrapper version. When Sdcb.PaddleOCR updates, pre-downloaded models from the previous version may require re-download. CI/CD pipelines that cache the NuGet restore step must also separately manage model file caching or trigger model downloads during the container build phase. The Docker image comparison shows the consequence: a PaddleOCR container lands at ~1.5GB; an IronOCR container at ~400MB.

IronOCR Approach

IronOCR language models are standard NuGet packages. No download calls, no model directory management, no version synchronization. The NuGet package for IronOCR includes the English model; additional language packs install exactly like any other dependency.

// dotnet add package IronOcr
// dotnet add package IronOcr.Languages.ChineseSimplified
// dotnet add package IronOcr.Languages.Japanese

using IronOcr;

// No async initialization, no model path, no version matching
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;
ocr.AddSecondaryLanguage(OcrLanguage.Japanese);

using var input = new OcrInput();
input.LoadImage("mixed-cjk-document.png");

var result = ocr.Read(input);
Console.WriteLine(result.Text);
// dotnet add package IronOcr
// dotnet add package IronOcr.Languages.ChineseSimplified
// dotnet add package IronOcr.Languages.Japanese

using IronOcr;

// No async initialization, no model path, no version matching
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.ChineseSimplified;
ocr.AddSecondaryLanguage(OcrLanguage.Japanese);

using var input = new OcrInput();
input.LoadImage("mixed-cjk-document.png");

var result = ocr.Read(input);
Console.WriteLine(result.Text);
Imports IronOcr

' No async initialization, no model path, no version matching
Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.ChineseSimplified
ocr.AddSecondaryLanguage(OcrLanguage.Japanese)

Using input As New OcrInput()
    input.LoadImage("mixed-cjk-document.png")

    Dim result = ocr.Read(input)
    Console.WriteLine(result.Text)
End Using
$vbLabelText   $csharpLabel

Language packs restore as part of dotnet restore. The Docker build step that copies models/ directories, runs apt-get install libopencv-dev, and adds the CUDA base image simply does not exist in an IronOCR container. For teams using Docker deployment or Azure deployment, the difference in container build complexity is substantial.

GPU and CUDA Dependency

PaddleOCR's GPU performance numbers — 50-100ms per image versus 300-500ms on CPU — are genuine. The deep learning models benefit significantly from GPU parallelism in a way that Tesseract-based engines do not. For a workload processing tens of thousands of images per day, that 5-10x throughput improvement changes the infrastructure conversation.

PaddleOCR Approach

GPU inference requires an entirely separate software stack outside of NuGet. The environment prerequisites are precise and version-locked:

// Prerequisites (must be installed before this compiles correctly):
// 1. NVIDIA GPU with CUDA capability 3.5+
// 2. NVIDIA Driver 452.39+ (Windows) / 450.80.02+ (Linux)
// 3. CUDA Toolkit 11.8 — exact version, not 12.x
// 4. cuDNN 8.6.0+ for CUDA 11.x
// 5. dotnet add package Sdcb.PaddleInference.runtime.win64.cuda118

using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using Sdcb.PaddleInference;
using OpenCvSharp;

FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

// GPU device 0, 1000MB GPU memory pool
using PaddleOcrAll ocr = new PaddleOcrAll(models, PaddleDevice.Gpu(deviceId: 0))
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

using Mat mat = Cv2.ImRead(imagePath);
PaddleOcrResult result = ocr.Run(mat);
// Prerequisites (must be installed before this compiles correctly):
// 1. NVIDIA GPU with CUDA capability 3.5+
// 2. NVIDIA Driver 452.39+ (Windows) / 450.80.02+ (Linux)
// 3. CUDA Toolkit 11.8 — exact version, not 12.x
// 4. cuDNN 8.6.0+ for CUDA 11.x
// 5. dotnet add package Sdcb.PaddleInference.runtime.win64.cuda118

using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using Sdcb.PaddleInference;
using OpenCvSharp;

FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();

// GPU device 0, 1000MB GPU memory pool
using PaddleOcrAll ocr = new PaddleOcrAll(models, PaddleDevice.Gpu(deviceId: 0))
{
    AllowRotateDetection = true,
    Enable180Classification = true
};

using Mat mat = Cv2.ImRead(imagePath);
PaddleOcrResult result = ocr.Run(mat);
Imports Sdcb.PaddleOCR
Imports Sdcb.PaddleOCR.Models.Online
Imports Sdcb.PaddleInference
Imports OpenCvSharp

' Prerequisites (must be installed before this compiles correctly):
' 1. NVIDIA GPU with CUDA capability 3.5+
' 2. NVIDIA Driver 452.39+ (Windows) / 450.80.02+ (Linux)
' 3. CUDA Toolkit 11.8 — exact version, not 12.x
' 4. cuDNN 8.6.0+ for CUDA 11.x
' 5. dotnet add package Sdcb.PaddleInference.runtime.win64.cuda118

Dim models As FullOcrModel = Await OnlineFullModels.ChineseV4.DownloadAsync()

' GPU device 0, 1000MB GPU memory pool
Using ocr As New PaddleOcrAll(models, PaddleDevice.Gpu(deviceId:=0)) With {
    .AllowRotateDetection = True,
    .Enable180Classification = True
}

    Using mat As Mat = Cv2.ImRead(imagePath)
        Dim result As PaddleOcrResult = ocr.Run(mat)
    End Using

End Using
$vbLabelText   $csharpLabel

If CUDA_PATH is not set, the matching cuDNN DLL is absent from the expected path, or the runtime package version mismatches the installed CUDA toolkit, the call throws a native library load exception. Diagnosing that failure requires checking driver versions, PATH entries, and DLL presence across several directories. In Docker, GPU support requires the nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 base image plus nvidia-container-toolkit on the host, adding complexity to both the image build and the container runtime configuration.

For CPU-only deployments, PaddleOCR defaults to the MKL-accelerated runtime: ~300-500ms per image with a 3-5 second model-load cold start.

IronOCR Approach

IronOCR is engineered for CPU inference. It delivers 150-300ms per image on standard hardware with no GPU dependency, no CUDA stack, and no driver version concerns. The engine also starts under one second because models are bundled rather than loaded from disk on initialization.

using IronOcr;

// No GPU setup, no CUDA, no driver checks
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

var ocr = new IronTesseract();

// Preprocessing built-in — improves accuracy on poor scans without extra code
using var input = new OcrInput();
input.LoadImage("document.png");
input.Deskew();
input.DeNoise();
input.EnhanceResolution(300);

var result = ocr.Read(input);
Console.WriteLine($"Confidence: {result.Confidence}%");
using IronOcr;

// No GPU setup, no CUDA, no driver checks
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

var ocr = new IronTesseract();

// Preprocessing built-in — improves accuracy on poor scans without extra code
using var input = new OcrInput();
input.LoadImage("document.png");
input.Deskew();
input.DeNoise();
input.EnhanceResolution(300);

var result = ocr.Read(input);
Console.WriteLine($"Confidence: {result.Confidence}%");
Imports IronOcr

' No GPU setup, no CUDA, no driver checks
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"

Dim ocr As New IronTesseract()

' Preprocessing built-in — improves accuracy on poor scans without extra code
Using input As New OcrInput()
    input.LoadImage("document.png")
    input.Deskew()
    input.DeNoise()
    input.EnhanceResolution(300)

    Dim result = ocr.Read(input)
    Console.WriteLine($"Confidence: {result.Confidence}%")
End Using
$vbLabelText   $csharpLabel

For throughput at scale without GPU hardware, IronOCR's thread-safe design allows Parallel.ForEach across a batch of images with separate IronTesseract instances — each processes independently without locking. The multithreading example shows the pattern. At 150-300ms per image on a 4-core CPU, a batch of 100 images completes in 20-40 seconds using parallelism — CPU throughput that makes GPU investment unnecessary for most web API and document pipeline scenarios.

The image quality correction guide covers preprocessing configuration in detail for cases where automatic preprocessing needs tuning.

PDF Processing

PaddleOCR has no native PDF support. Every PDF processed through PaddleOCR requires an external library to convert pages to images first, temp file management between conversion and inference, and cleanup logic.

PaddleOCR Approach

// Requires additional package: PdfiumViewer or similar
using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using OpenCvSharp;
using PdfiumViewer;
using System.Drawing.Imaging;
using System.Text;

FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();
using PaddleOcrAll ocr = new PaddleOcrAll(models);

var results = new StringBuilder();

// Manual PDF-to-image conversion, page by page
using PdfDocument pdf = PdfDocument.Load("document.pdf");

for (int i = 0; i < pdf.PageCount; i++)
{
    // Render page to bitmap — 200 DPI
    using var pageImage = pdf.Render(i, 200, 200, PdfRenderFlags.CorrectFromDpi);

    // Save to temp file — OpenCvSharp cannot read from memory easily
    string tempPath = Path.GetTempFileName() + ".png";
    pageImage.Save(tempPath, ImageFormat.Png);

    try
    {
        using Mat mat = Cv2.ImRead(tempPath);
        var result = ocr.Run(mat);
        results.AppendLine($"--- Page {i + 1} ---");
        results.AppendLine(result.Text);
    }
    finally
    {
        File.Delete(tempPath);  // Must clean up temp files
    }
}

return results.ToString();
// Requires additional package: PdfiumViewer or similar
using Sdcb.PaddleOCR;
using Sdcb.PaddleOCR.Models.Online;
using OpenCvSharp;
using PdfiumViewer;
using System.Drawing.Imaging;
using System.Text;

FullOcrModel models = await OnlineFullModels.ChineseV4.DownloadAsync();
using PaddleOcrAll ocr = new PaddleOcrAll(models);

var results = new StringBuilder();

// Manual PDF-to-image conversion, page by page
using PdfDocument pdf = PdfDocument.Load("document.pdf");

for (int i = 0; i < pdf.PageCount; i++)
{
    // Render page to bitmap — 200 DPI
    using var pageImage = pdf.Render(i, 200, 200, PdfRenderFlags.CorrectFromDpi);

    // Save to temp file — OpenCvSharp cannot read from memory easily
    string tempPath = Path.GetTempFileName() + ".png";
    pageImage.Save(tempPath, ImageFormat.Png);

    try
    {
        using Mat mat = Cv2.ImRead(tempPath);
        var result = ocr.Run(mat);
        results.AppendLine($"--- Page {i + 1} ---");
        results.AppendLine(result.Text);
    }
    finally
    {
        File.Delete(tempPath);  // Must clean up temp files
    }
}

return results.ToString();
Imports Sdcb.PaddleOCR
Imports Sdcb.PaddleOCR.Models.Online
Imports OpenCvSharp
Imports PdfiumViewer
Imports System.Drawing.Imaging
Imports System.Text

Dim models As FullOcrModel = Await OnlineFullModels.ChineseV4.DownloadAsync()
Using ocr As New PaddleOcrAll(models)

    Dim results As New StringBuilder()

    ' Manual PDF-to-image conversion, page by page
    Using pdf As PdfDocument = PdfDocument.Load("document.pdf")

        For i As Integer = 0 To pdf.PageCount - 1
            ' Render page to bitmap — 200 DPI
            Using pageImage = pdf.Render(i, 200, 200, PdfRenderFlags.CorrectFromDpi)

                ' Save to temp file — OpenCvSharp cannot read from memory easily
                Dim tempPath As String = Path.GetTempFileName() & ".png"
                pageImage.Save(tempPath, ImageFormat.Png)

                Try
                    Using mat As Mat = Cv2.ImRead(tempPath)
                        Dim result = ocr.Run(mat)
                        results.AppendLine($"--- Page {i + 1} ---")
                        results.AppendLine(result.Text)
                    End Using
                Finally
                    File.Delete(tempPath)  ' Must clean up temp files
                End Try
            End Using
        Next
    End Using

    Return results.ToString()
End Using
$vbLabelText   $csharpLabel

That is 30+ lines for a single PDF. Adding another NuGet dependency (PdfiumViewer), manual page rendering at a chosen DPI, temp file creation and deletion, and iterative accumulation of results — all before PaddleOCR itself fires. A DPI choice too low degrades accuracy; too high increases processing time. The developer owns that decision.

IronOCR Approach

IronOCR reads PDFs natively. No conversion library, no temp files, no per-page loop at the call site:

using IronOcr;

var ocr = new IronTesseract();

// Native PDF input — no external library needed
using var input = new OcrInput();
input.LoadPdf("document.pdf");

var result = ocr.Read(input);

// Per-page access available if needed
foreach (var page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber}: {page.Text.Length} chars");
}

// Produce a searchable PDF in one additional line
result.SaveAsSearchablePdf("searchable-output.pdf");
using IronOcr;

var ocr = new IronTesseract();

// Native PDF input — no external library needed
using var input = new OcrInput();
input.LoadPdf("document.pdf");

var result = ocr.Read(input);

// Per-page access available if needed
foreach (var page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber}: {page.Text.Length} chars");
}

// Produce a searchable PDF in one additional line
result.SaveAsSearchablePdf("searchable-output.pdf");
Imports IronOcr

Dim ocr As New IronTesseract()

' Native PDF input — no external library needed
Using input As New OcrInput()
    input.LoadPdf("document.pdf")

    Dim result = ocr.Read(input)

    ' Per-page access available if needed
    For Each page In result.Pages
        Console.WriteLine($"Page {page.PageNumber}: {page.Text.Length} chars")
    Next

    ' Produce a searchable PDF in one additional line
    result.SaveAsSearchablePdf("searchable-output.pdf")
End Using
$vbLabelText   $csharpLabel

Password-protected PDFs use the same LoadPdf path with a Password parameter. The PDF input guide and searchable PDF guide cover multi-page and output options in detail. For organizations digitizing scanned document archives — where every file arrives as a PDF — this difference eliminates a significant layer of integration code.

API Mapping Reference

PaddleOCR (Sdcb) IronOCR Equivalent Notes
Sdcb.PaddleOCR IronOcr Namespace
PaddleOcrAll IronTesseract Main OCR orchestration class
FullOcrModel N/A No equivalent — models bundled
OnlineFullModels.ChineseV4.DownloadAsync() dotnet add package IronOcr.Languages.ChineseSimplified Model acquisition
LocalDetectionModel.FromDirectory(path) N/A No model path management
LocalClassificationModel.FromDirectory(path) N/A No model path management
LocalRecognitionModel.FromDirectory(path) N/A No model path management
new PaddleOcrAll(models) new IronTesseract() Engine instantiation
new PaddleOcrAll(models, PaddleDevice.Gpu(0)) N/A (CPU-optimized) GPU device selection
Cv2.ImRead(imagePath) (OpenCvSharp) new OcrInput(imagePath) or input.LoadImage(path) Image loading
ocr.Run(mat) ocr.Read(input) Execute OCR
result.Text result.Text Full document text
result.Regions result.Words / result.Lines Structured text regions
region.Text word.Text Text from a region
region.Score word.Confidence Confidence value (0-1 vs 0-100)
region.Rect.Center.X / .Center.Y word.X / word.Y Bounding box position
region.Rect.Size.Width / .Height word.Width / word.Height Bounding box dimensions
AllowRotateDetection = true input.Deskew() Rotation handling
Enable180Classification = true Automatic Upside-down text detection
N/A input.DeNoise() Noise removal
N/A input.Contrast() Contrast enhancement
N/A input.EnhanceResolution(300) DPI scaling
N/A input.LoadPdf(path) Native PDF input
N/A result.SaveAsSearchablePdf(path) Searchable PDF output
N/A ocr.AddSecondaryLanguage(lang) Multi-language simultaneously

When Teams Consider Moving from PaddleOCR to IronOCR

When Language Requirements Expand Beyond 14

A project begins processing Chinese invoices — PaddleOCR handles it well. Then the customer base expands and documents arrive in Polish, Dutch, Greek, Vietnamese, and Turkish. PaddleOCR's 14 supported languages do not cover any of these. The migration becomes necessary because the tool cannot grow with the use case. Adding Polish support in IronOCR is a single NuGet package addition and a one-line config change. Teams that scope a project as "initially Chinese-heavy but eventually multi-regional" need to decide upfront whether to start with a tool that can scale language coverage without a rewrite.

When the Setup Cost Exceeds the Problem Budget

The PaddleOCR documentation estimates first-time setup at 30-60 minutes for CPU and 2-8 hours for GPU. Developer time at $100/hour means the first GPU configuration attempt costs $200-800 before the first production image is processed. IronOCR setup is a single install command and a four-line code sample — under five minutes. For internal tools, prototypes, or projects where OCR is a secondary feature rather than the core product, that time difference determines whether the OCR feature ships in the current sprint or gets deprioritized. Getting started with IronOCR takes less time than configuring a CUDA environment.

When PDF Workflows Dominate

Organizations digitizing scanned document archives, processing fax-to-PDF pipelines, or extracting data from PDF-format invoices and contracts face a structural problem with PaddleOCR: every PDF requires an additional library, manual page rendering, temp file management, and DPI selection logic before PaddleOCR can process it. That is not incidental complexity — it is permanent maintenance. Teams that discover their workload is 80%+ PDF-based after initial PaddleOCR integration typically migrate when the PDF conversion layer becomes a source of bugs rather than when PaddleOCR itself fails.

When Compliance Reviews Flag the Baidu Origin

Enterprise security teams in government, defense, healthcare, and financial services organizations increasingly apply review criteria to components with origins in Chinese-state-connected companies. PaddleOCR is a Baidu project; model downloads connect to Baidu Cloud Storage by default; the PaddlePaddle framework is a Baidu product. The code runs locally and documents are not transmitted, but the model artifacts originate from Baidu infrastructure. For FedRAMP, ITAR, or CMMC environments, that provenance triggers a compliance review that can block adoption. IronOCR is a product of Iron Software, a US-based company, processes all content locally, makes no external connections, and requires no model downloads from any third-party server.

When GPU Infrastructure Is Not in Place

PaddleOCR on CPU (300-500ms/image, 3-5 second cold start, 500MB-1GB memory) is slower and heavier than IronOCR on CPU (150-300ms/image, under 1 second cold start, 100-200MB memory). The GPU advantage of PaddleOCR only materializes after CUDA/cuDNN setup, and that advantage only justifies the operational complexity at high volume. Teams running standard cloud infrastructure without GPU instances — EC2 without GPU, Azure App Service, AWS Lambda — are paying the PaddleOCR setup cost without receiving the performance benefit that justifies it.

Common Migration Considerations

Package and Namespace Replacement

The package swap removes five dependencies and adds one. Remove Sdcb.PaddleOCR, Sdcb.PaddleOCR.Models.Online, the Sdcb.PaddleInference.runtime.* package, OpenCvSharp4, and OpenCvSharp4.runtime.win from the .csproj. Add IronOcr. Add language packs as NuGet packages for any non-English language that was previously using a downloaded model set.

The using Sdcb.PaddleOCR and using OpenCvSharp directives replace with using IronOcr. The Mat mat = Cv2.ImRead(path) pattern replaces with new OcrInput(path) or input.LoadImage(path). The ocr.Run(mat) call becomes ocr.Read(input). For projects using the image input API, the OcrInput object accepts file paths, byte arrays, streams, and System.Drawing.Bitmap — the OpenCvSharp intermediary disappears entirely.

Result Structure Remapping

PaddleOCR returns result.Regions — an array of PaddleOcrResultRegion objects where region.Score is a float from 0 to 1. IronOCR returns result.Words — where word.Confidence is a percentage from 0 to 100. The confidence filter r.Score >= 0.8 maps to w.Confidence >= 80. Bounding box access changes from region.Rect.Center.X / region.Rect.Size.Width to word.X / word.Width. The structured results guide covers pages, paragraphs, lines, and words.

Spatial sorting (order by Y then X to approximate reading order) is less commonly needed with IronOCR because the Tesseract layout engine performs reading-order analysis as part of its page segmentation step. For documents where PaddleOCR required explicit sort logic, test whether IronOCR's default output order already meets the requirement before porting the sort code.

Preprocessing Transition

PaddleOCR's deep learning detection model tolerates moderate image quality degradation — the neural network handles noise and skew more reliably than Tesseract's classical segmentation pipeline. When migrating to IronOCR, add explicit preprocessing for documents that were previously processed successfully without it:

using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("low-quality-scan.png");

// These filters recover the accuracy that PaddleOCR provided automatically
// on noisy or skewed inputs
input.Deskew();
input.DeNoise();
input.Contrast();
input.Binarize();
input.EnhanceResolution(300);

var result = ocr.Read(input);
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("low-quality-scan.png");

// These filters recover the accuracy that PaddleOCR provided automatically
// on noisy or skewed inputs
input.Deskew();
input.DeNoise();
input.Contrast();
input.Binarize();
input.EnhanceResolution(300);

var result = ocr.Read(input);
Imports IronOcr

Dim ocr As New IronTesseract()

Using input As New OcrInput()
    input.LoadImage("low-quality-scan.png")

    ' These filters recover the accuracy that PaddleOCR provided automatically
    ' on noisy or skewed inputs
    input.Deskew()
    input.DeNoise()
    input.Contrast()
    input.Binarize()
    input.EnhanceResolution(300)

    Dim result = ocr.Read(input)
End Using
$vbLabelText   $csharpLabel

For scanned documents at non-standard angles or with background noise, these four filters recover most of the accuracy gap. The image orientation correction guide and image color correction guide document each filter's effect on common scan artifacts. Run representative documents through both pipelines before finalizing the migration to verify accuracy meets requirements on the actual document set.

Deployment Cleanup

After migration, the deployment artifact shrinks significantly. Delete the models/ directory tree (three subdirectories, ~21MB of model files), paddle_inference.dll (~200MB), paddle2onnx.dll (~5MB), and the opencv_world*.dll files (~50MB combined). Remove the apt-get install libopencv-dev line from the Dockerfile and the model-copy step. Remove any CI/CD steps that pre-download models or cache model directories. The Docker image drops from ~1.5GB to ~400MB.

Additional IronOCR Capabilities

Beyond the comparison points covered above, IronOCR provides features that extend OCR beyond text extraction:

  • Barcode reading during OCR: Set ocr.Configuration.ReadBarCodes = true and barcodes on the same document are decoded alongside text in a single pass — no separate barcode library needed
  • Region-based OCR: Use CropRectangle to target specific document zones (invoice number field, header row, signature block) without processing the full page
  • Async OCR: Full async/await support for non-blocking integration in ASP.NET controllers and background services
  • Confidence scoring: Document-level and word-level confidence enables quality gating — reject or flag low-confidence results for human review
  • hOCR export: Produce hOCR output for downstream document management systems that consume positional HTML
  • Handwriting recognition: Extends beyond typed document OCR to handwritten forms and notes
  • Table extraction: Structured extraction of tabular data from invoices, statements, and forms
  • Progress tracking: Report OCR progress for long multi-page batch operations in UI contexts

.NET Compatibility and Future Readiness

IronOCR targets .NET 6, 7, 8, and 9 across Windows x64, Windows x86, Linux x64, and macOS. It runs on Azure App Service, AWS Lambda, Docker containers, and air-gapped on-premise environments — all from the same NuGet package. The library receives regular updates from Iron Software aligned with each .NET release cycle, including planned compatibility with .NET 10 when released. PaddleOCR's .NET wrapper is a community-maintained project; compatibility with new .NET versions depends on a single maintainer's capacity. The wrapper's release timeline and the 3-month lag on new PaddleOCR Python model versions reaching .NET illustrates the dependency on community availability rather than commercial release commitments.

Conclusion

PaddleOCR's CJK accuracy advantage is genuine and specific. For a project whose primary workload is Chinese Simplified documents and whose team has deep learning expertise and existing GPU infrastructure, the multi-package setup cost and CUDA configuration complexity are justified by accuracy figures that Tesseract-based engines do not currently match on dense Chinese character recognition.

The problem is that this specific scenario describes a minority of .NET OCR use cases. Most commercial applications process English and European-language documents where IronOCR's 97-99% accuracy exceeds PaddleOCR's 92-95%. Most development teams do not have CUDA infrastructure pre-configured. Most document pipelines receive PDFs, which PaddleOCR cannot process without an additional library. And most developers working on a .NET project expect a single package install to be the complete installation step — not the first of five packages followed by an async model download and an OpenCV dependency.

For teams that need GPU throughput at scale, PaddleOCR CPU performance (300-500ms) already trails IronOCR CPU performance (150-300ms) — meaning the only scenario where PaddleOCR's performance argument holds is one where CUDA infrastructure is already operational. IronOCR's perpetual license cost is typically recovered within the first week when weighed against GPU configuration time alone.

The opening question is whether Chinese character accuracy at 95%+ justifies the full stack that delivers it. For Chinese-primary workloads in environments with GPU infrastructure and compliance clearance for Baidu-origin software, the answer is yes. For everything else, IronOCR starts working in under five minutes and covers the language, preprocessing, and PDF requirements that most .NET OCR projects actually encounter.

Please notePDFium, PaddleOCR, and Tesseract are registered trademarks of their respective owners. This site is not affiliated with, endorsed by, or sponsored by Baidu, Chromium Project, Google, or PaddlePaddle. All product names, logos, and brands are property of their respective owners. Comparisons are for informational purposes only and reflect publicly available information at the time of writing.

Frequently Asked Questions

What is PaddleOCR?

PaddleOCR is an OCR solution used by developers and enterprises to extract text from images and documents. It is one of several OCR options evaluated alongside IronOCR for .NET application development.

How does IronOCR compare to PaddleOCR for .NET developers?

IronOCR is a NuGet-native .NET OCR library using IronTesseract as its core engine. Compared to PaddleOCR, it offers simpler deployment (no SDK installers), flat-rate pricing, and a clean C# API without COM interop or cloud dependencies.

Is IronOCR easier to set up than PaddleOCR?

IronOCR installs via a single NuGet package. There are no SDK installers, license files to copy, COM components to register, or separate runtime binaries to manage. The entire OCR engine is bundled in the package.

What accuracy differences exist between PaddleOCR and IronOCR?

IronOCR achieves high recognition accuracy for standard business documents, invoices, receipts, and scanned forms. For highly degraded documents or uncommon scripts, accuracy varies by source quality. IronOCR includes image preprocessing filters to improve recognition on low-quality inputs.

Does IronOCR support PDF text extraction?

Yes. IronOCR extracts text from both native PDFs and scanned PDF images in a single call. It also supports multi-page TIFF files, images, and streams. For scanned PDFs, OCR is applied page-by-page with per-page result objects.

How does PaddleOCR licensing compare to IronOCR?

IronOCR uses a flat-rate perpetual license with no per-page or per-scan charges. Organizations processing high document volumes pay the same license cost regardless of volume. Details and volume pricing are on the IronOCR licensing page.

What languages does IronOCR support?

IronOCR supports 127 languages via separate NuGet language packs. Adding a language requires a single 'dotnet add package IronOcr.Languages.{Language}' command. No manual file placement or path configuration is needed.

How do I install IronOCR in a .NET project?

Install via NuGet: 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Additional language packs are installed the same way. No native SDK installer is required.

Is IronOCR suitable for Docker and containerized deployments, unlike PaddleOCR?

Yes. IronOCR works in Docker containers via its NuGet package. The license key is set via an environment variable. No license files, SDK paths, or volume mounts are required for the OCR engine itself.

Can I try IronOCR before purchasing, compared to PaddleOCR?

Yes. IronOCR trial mode processes documents and returns OCR results with a watermark overlay on output. You can verify accuracy on your own documents before purchasing a license.

Does IronOCR support barcode reading alongside text extraction?

IronOCR focuses on text extraction and OCR. For barcode reading, Iron Software provides IronBarcode as a companion library. Both are available individually or as part of the Iron Suite bundle.

Is it easy to migrate from PaddleOCR to IronOCR?

Migration from PaddleOCR to IronOCR typically involves replacing initialization sequences with IronTesseract instantiation, removing COM lifecycle management, and updating API calls. Most migrations reduce code complexity significantly.

Kannaopat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed a Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...
Read More

Iron Support Team

We're online 24 hours, 5 days a week.
Chat
Email
Call Me