
OCR.space API vs IronOCR: .NET OCR Library

OCR.space has no NuGet package. That single fact defines every integration decision a .NET developer makes when choosing it: you write a custom HttpClient, base64-encode your documents, POST to https://api.ocr.space/parse/image, parse the JSON response manually, catch HTTP 429s, implement exponential backoff, build rate-limit bookkeeping, and define your own exception types — all before a single character of text reaches your application. This article examines what that DIY overhead actually costs in code, in time, and in production reliability, and compares it directly to IronOCR, a native .NET library delivered as a single NuGet package.

Understanding OCR.space

OCR.space is a freemium REST API that processes images and PDFs on remote servers operated by OCR.space. The service positions itself around its free tier: 25,000 requests per month with no credit card required, which appeals to developers experimenting with OCR or building personal projects. There is no NuGet package, no official .NET SDK, and no strongly-typed client library in any language. Every .NET integration is built by hand on top of HttpClient.

The architectural reality for .NET developers:

  • No NuGet package: Zero first-class .NET support. No IntelliSense, no typed models, no integrated error handling.
  • Cloud-only processing: Every document leaves your infrastructure. There is no on-premise option and no self-hosted deployment path.
  • Manual REST integration: Developers encode files to base64, construct FormUrlEncodedContent or MultipartFormDataContent, and parse raw JSON responses.
  • DIY rate limiting: The free tier enforces 60 requests per minute and a hard 500 requests-per-day per IP. Production applications must build this logic themselves.
  • File size constraints: The free tier rejects files over 5 MB. Multi-page PDFs often exceed this limit.
  • Watermarked PDF output: Searchable PDF generation on the free tier embeds OCR.space watermarks, making the output unusable for client delivery or document archival.
  • No SLA: The free tier carries no uptime commitment. Paid plans offer higher limits but the same architectural constraints.
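Two of the constraints above interact: base64 emits four characters for every three input bytes, so a file just under the 5 MB cap becomes roughly 6.7 MB of form data. The sketch below is a minimal pre-flight helper built on stated assumptions. It assumes the 5 MB figure applies to the raw file, and it assumes the `data:{mime};base64,` prefix matches the format used by the client code later in this article. `BuildDataUri` and `Base64Length` are hypothetical names, not part of any OCR.space library.

```csharp
using System;

// Stand-in for File.ReadAllBytes(path); 3,000 bytes encode to exactly 4,000 base64 characters.
byte[] sample = new byte[3_000];
string dataUri = BuildDataUri(sample, ".png");
Console.WriteLine($"{sample.Length} raw bytes -> {dataUri.Length} characters of form data");
Console.WriteLine($"A 5 MB file becomes {Base64Length(5L * 1024 * 1024)} base64 characters");

// Hypothetical pre-flight helper: reject oversized files, then build the data: URI
// for the base64Image form field (format copied from this article's client code).
static string BuildDataUri(byte[] fileBytes, string extension)
{
    // Assumption: the free-tier 5 MB limit is measured on the raw file, not the encoded payload.
    const long maxFreeTierBytes = 5L * 1024 * 1024;
    if (fileBytes.LongLength > maxFreeTierBytes)
        throw new InvalidOperationException(
            $"File is {fileBytes.LongLength} bytes; the free tier rejects files over {maxFreeTierBytes} bytes.");

    string mimeType = extension.ToLowerInvariant() switch
    {
        ".png" => "image/png",
        ".jpg" or ".jpeg" => "image/jpeg",
        ".pdf" => "application/pdf",
        _ => "application/octet-stream"
    };
    return $"data:{mimeType};base64,{Convert.ToBase64String(fileBytes)}";
}

// Base64 writes 4 characters per 3 input bytes, rounding up: roughly 33% growth.
static long Base64Length(long rawBytes) => 4 * ((rawBytes + 2) / 3);
```

Checking the raw size before encoding means an oversized file is rejected locally, without spending a request or a rate-limit slot.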

The REST Integration Developers Must Write

OCR.space's documentation points developers at the REST endpoint and leaves them to build everything else. The minimum viable .NET client — before any consideration of production hardening — looks like this:

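using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// OCR.space: what you build before the first line of your actual application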
public class OcrSpaceApiClient : IDisposable
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;
    private readonly SemaphoreSlim _rateLimiter;
    private const string ApiEndpoint = "https://api.ocr.space/parse/image";
    private const int MaxFileSizeFree = 5 * 1024 * 1024; // 5MB — hard limit on free tier
    private const int RateLimitPerMinute = 60;

    public OcrSpaceApiClient(string apiKey)
    {
        _apiKey = apiKey ?? throw new ArgumentNullException(nameof(apiKey));
        _httpClient = new HttpClient();
        _httpClient.Timeout = TimeSpan.FromSeconds(120);
        _rateLimiter = new SemaphoreSlim(RateLimitPerMinute, RateLimitPerMinute);
    }

    public async Task<OcrResult> ExtractTextFromFileAsync(
        string filePath,
        string language = "eng",
        CancellationToken cancellationToken = default)
    {
        if (!File.Exists(filePath))
            throw new FileNotFoundException("Image file not found", filePath);

        var fileInfo = new FileInfo(filePath);
        if (fileInfo.Length > MaxFileSizeFree)
        {
            throw new InvalidOperationException(
                $"File size {fileInfo.Length / 1024 / 1024}MB exceeds free tier limit of 5MB.");
        }

        await _rateLimiter.WaitAsync(cancellationToken);

        try
        {
            // Document leaves your infrastructure here
            byte[] imageBytes = await File.ReadAllBytesAsync(filePath, cancellationToken);
            string base64Image = Convert.ToBase64String(imageBytes);
            string mimeType = GetMimeType(filePath);

            var formContent = new FormUrlEncodedContent(new[]
            {
                new KeyValuePair<string, string>("apikey", _apiKey),
                new KeyValuePair<string, string>("base64Image", $"data:{mimeType};base64,{base64Image}"),
                new KeyValuePair<string, string>("language", language),
                new KeyValuePair<string, string>("isOverlayRequired", "false"),
                new KeyValuePair<string, string>("filetype", Path.GetExtension(filePath).TrimStart('.')),
                new KeyValuePair<string, string>("detectOrientation", "true"),
                new KeyValuePair<string, string>("scale", "true"),
                new KeyValuePair<string, string>("OCREngine", "2")
            });

            var response = await _httpClient.PostAsync(ApiEndpoint, formContent, cancellationToken);

            if (!response.IsSuccessStatusCode)
            {
                string errorBody = await response.Content.ReadAsStringAsync(cancellationToken);
                throw new OcrSpaceException(
                    $"OCR.space API returned {response.StatusCode}: {errorBody}",
                    (int)response.StatusCode);
            }

            string jsonResponse = await response.Content.ReadAsStringAsync(cancellationToken);
            return ParseOcrResponse(jsonResponse);
        }
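        finally
        {
            // Hold each permit for a full minute after the call finishes. A permit is never
            // released sooner than 60 seconds after it was acquired, so at most
            // RateLimitPerMinute requests can start in any 60-second window.
            // (Releasing after 60000 / RateLimitPerMinute ms would not enforce the limit.)
            _ = Task.Run(async () =>
            {
                await Task.Delay(TimeSpan.FromMinutes(1));
                _rateLimiter.Release();
            });
        }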
    }

    private OcrResult ParseOcrResponse(string jsonResponse)
    {
        using var doc = JsonDocument.Parse(jsonResponse);
        var root = doc.RootElement;

        if (root.TryGetProperty("IsErroredOnProcessing", out var isErrored) && isErrored.GetBoolean())
        {
            string errorMessage = "OCR processing failed";
            if (root.TryGetProperty("ErrorMessage", out var errorMessages))
            {
                // ErrorMessage may arrive as a single string or as an array of strings; handle both
                if (errorMessages.ValueKind == JsonValueKind.String)
                {
                    errorMessage = errorMessages.GetString() ?? errorMessage;
                }
                else if (errorMessages.ValueKind == JsonValueKind.Array)
                {
                    var messages = new List<string>();
                    foreach (var msg in errorMessages.EnumerateArray())
                        messages.Add(msg.GetString() ?? "Unknown error");
                    errorMessage = string.Join("; ", messages);
                }
            }
            // The HTTP call itself succeeded; 500 is a placeholder for "failed during processing"
            throw new OcrSpaceException(errorMessage, 500);
        }

        var result = new OcrResult();

        if (root.TryGetProperty("ParsedResults", out var parsedResults))
        {
            foreach (var parsedResult in parsedResults.EnumerateArray())
            {
                if (parsedResult.TryGetProperty("ParsedText", out var parsedText))
                    result.ParsedText += parsedText.GetString();

                if (parsedResult.TryGetProperty("FileParseExitCode", out var exitCode))
                    result.ExitCodes.Add(exitCode.GetInt32());
            }
        }

        return result;
    }

    private string GetMimeType(string filePath) =>
        Path.GetExtension(filePath).ToLowerInvariant() switch
        {
            ".png"          => "image/png",
            ".jpg" or ".jpeg" => "image/jpeg",
            ".bmp"          => "image/bmp",
            ".tiff" or ".tif" => "image/tiff",
            ".pdf"          => "application/pdf",
            _               => "application/octet-stream"
        };

    public void Dispose()
    {
        _httpClient?.Dispose();
        _rateLimiter?.Dispose();
    }
}

// Custom models — no SDK means you define these yourself
public class OcrResult
{
    public string ParsedText { get; set; } = string.Empty;
    public List<string> Warnings { get; } = new();
    public List<int> ExitCodes { get; } = new();
}

public class OcrSpaceException : Exception
{
    public int StatusCode { get; }
    public OcrSpaceException(string message, int statusCode) : base(message)
        => StatusCode = statusCode;
}

That is roughly 150 lines of infrastructure code that must exist before the first business-logic method. Every OCR.space integration ships this boilerplate (or something equivalent), and every team that writes it solves a problem OCR.space chose not to solve for them.
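The JSON-handling part of that boilerplate can at least be tested without a network call. The sketch below is an illustration under assumptions. It reads only the field names used by the client above (`ParsedResults`, `ParsedText`, `FileParseExitCode`, `IsErroredOnProcessing`, `ErrorMessage`). It treats `ErrorMessage` as either a string or an array of strings. It also runs against a hand-written sample payload, not a captured OCR.space response. `ParseOcrSpaceResponse` and `ReadErrorMessage` are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hand-written sample shaped like the fields this article's client reads (not a captured response).
const string sampleJson = @"{
  ""ParsedResults"": [
    { ""ParsedText"": ""Invoice 1042\r\n"", ""FileParseExitCode"": 1 },
    { ""ParsedText"": ""Total: 99.00\r\n"", ""FileParseExitCode"": 1 }
  ],
  ""IsErroredOnProcessing"": false
}";

var (extracted, pageExitCodes) = ParseOcrSpaceResponse(sampleJson);
Console.Write(extracted);
Console.WriteLine($"Exit codes: {string.Join(", ", pageExitCodes)}");

// Concatenates ParsedText across ParsedResults and collects FileParseExitCode values;
// throws if IsErroredOnProcessing is true.
static (string Text, List<int> ExitCodes) ParseOcrSpaceResponse(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    if (root.TryGetProperty("IsErroredOnProcessing", out JsonElement errored) &&
        errored.ValueKind == JsonValueKind.True)
        throw new InvalidOperationException(ReadErrorMessage(root));

    var builder = new StringBuilder();
    var codes = new List<int>();
    if (root.TryGetProperty("ParsedResults", out JsonElement results) &&
        results.ValueKind == JsonValueKind.Array)
    {
        foreach (JsonElement page in results.EnumerateArray())
        {
            if (page.TryGetProperty("ParsedText", out JsonElement parsed))
                builder.Append(parsed.GetString());
            if (page.TryGetProperty("FileParseExitCode", out JsonElement code) &&
                code.ValueKind == JsonValueKind.Number)
                codes.Add(code.GetInt32());
        }
    }
    return (builder.ToString(), codes);
}

// Assumption: ErrorMessage may be a single string or an array of strings.
static string ReadErrorMessage(JsonElement root)
{
    const string fallback = "OCR processing failed";
    if (!root.TryGetProperty("ErrorMessage", out JsonElement error))
        return fallback;
    return error.ValueKind switch
    {
        JsonValueKind.String => error.GetString() ?? fallback,
        JsonValueKind.Array => string.Join("; ", error.EnumerateArray().Select(e => e.ToString())),
        _ => fallback
    };
}
```

Keeping the parser as a pure function over a string separates it from `HttpClient`, so the error paths can be unit-tested with fixed payloads.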

Understanding IronOCR

IronOCR is a commercial OCR library for .NET, installed via the IronOcr NuGet package. It wraps an optimized Tesseract 5 engine with automatic preprocessing, native PDF input, searchable PDF output, 125+ language packs, and structured result extraction — all without cloud dependency, API keys, or per-request charges. Documents are processed locally on whatever infrastructure runs the .NET application.

Key characteristics:

  • Single NuGet package: dotnet add package IronOcr is the complete installation. No native binaries to manage, no tessdata directories, no environment variables.
  • Strongly-typed API: IronTesseract, OcrInput, OcrResult, and CropRectangle are first-class .NET types with full IntelliSense support.
  • Automatic preprocessing: Deskew, DeNoise, Contrast, Binarize, and EnhanceResolution filters apply automatically or on demand without external libraries.
  • Native PDF support: Both reading scanned PDFs and generating searchable PDFs are built-in. No size limits beyond available memory.
  • Local execution: No network call occurs during OCR. The library runs entirely in-process. Air-gapped environments work without configuration changes.
  • Thread-safe: Multiple IronTesseract instances run concurrently without locking or contention management.
  • Perpetual licensing: $999 Lite / $1,499 Plus / $2,999 Professional. No per-request charges at any volume.

Feature Comparison

| Feature | OCR.space | IronOCR |
|---|---|---|
| NuGet package | None (REST API only) | IronOcr (full .NET support) |
| Processing location | OCR.space cloud servers | Local (your infrastructure) |
| Free tier | 25,000 requests/month (limited) | Trial available |
| Paid pricing | Contact OCR.space for current pricing | $999–$2,999 one-time perpetual |
| Rate limiting | 60 requests/min, 500/day (free tier) | None |
| Offline capability | No | Yes (fully air-gapped) |
| Data privacy | Documents sent to a third party | Documents stay on your servers |
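If the free-tier figures in the table are taken at face value, the per-IP daily cap binds before the monthly quota does. A single IP address running every day of a 30-day month tops out at 500 × 30 = 15,000 requests, well short of 25,000. At the full 60 requests per minute, the daily allowance is gone in a little over 8 minutes. The arithmetic is sketched below. `UsableMonthlyRequests` is a hypothetical helper, and the limits are the values quoted in this article, not values read from OCR.space.

```csharp
using System;

// Free-tier limits as quoted in this article; verify against OCR.space's current terms.
const int RequestsPerMinute = 60;
const int RequestsPerDayPerIp = 500;
const int RequestsPerMonth = 25_000;

int perIpMonthly = UsableMonthlyRequests(RequestsPerDayPerIp, RequestsPerMonth, daysInMonth: 30);
double minutesToHitDailyCap = (double)RequestsPerDayPerIp / RequestsPerMinute;

Console.WriteLine($"Usable requests per IP in a 30-day month: {perIpMonthly}");
Console.WriteLine($"At full speed the daily cap is reached in {minutesToHitDailyCap:F1} minutes");

// The binding constraint is whichever cap is hit first over the month.
static int UsableMonthlyRequests(int dailyCap, int monthlyCap, int daysInMonth) =>
    Math.Min(dailyCap * daysInMonth, monthlyCap);
```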

Detailed Feature Comparison

| Feature | OCR.space | IronOCR |
|---|---|---|
| Integration | | |
| NuGet package | None | Yes (IronOcr) |
| Strongly-typed models | None (manual JSON parsing) | Yes (OcrResult, OcrInput) |
| IntelliSense support | None | Full |
| Custom exception types | Must define yourself | Built-in |
| Retry logic | Must build yourself | Built-in |
| Processing | | |
| Processing location | Cloud servers | Local, in-process |
| Internet required | Yes, on every call | No |
| Air-gapped support | No | Yes |
| Rate limits | Yes, on all plans | None |
| File size limits | 5 MB (free tier) | Available memory only |
| OCR Capability | | |
| Image OCR | Yes | Yes |
| PDF OCR | Yes (with limits) | Yes (native, no size limit) |
| Searchable PDF output | Watermarked on free tier | Clean output, no watermarks |
| Automatic preprocessing | Server-side only (scale and detectOrientation request flags) | Deskew, DeNoise, Contrast, Binarize, EnhanceResolution |
| Manual preprocessing control | None | Full filter API |
| Language support | ~25 languages | 125+ via NuGet language packs |
| Multi-language per document | Limited | Yes (primary + secondary languages) |
| Structured output (words/lines/pages) | Limited (word and line coordinates only when isOverlayRequired=true) | Yes (pages, paragraphs, lines, words with coordinates) |
| Confidence scores | No | Yes (per result and per word) |
| Region-based OCR | No | Yes (CropRectangle) |
| Barcode reading during OCR | No | Yes (ReadBarCodes = true) |
| Deployment | | |
| Docker | Requires outbound internet | Works out of the box |
| Linux | Requires outbound internet | Fully supported |
| HIPAA-compatible | Risk: documents leave your control | Yes (no external transmission) |
| GDPR-compatible | Risk: EU data transfer unclear | Yes (no third-party data handling) |
| Pricing | | |
| Pricing model | Monthly subscription | One-time perpetual license |
| Entry price | Contact OCR.space for current pricing | $999 one-time |
| Per-document cost at scale | Yes | None |
| SLA | None (free), varies (paid) | Enterprise SLA available |

SDK Depth vs. Raw HTTP: The Integration Cost

The gap between OCR.space and a native SDK is not a style preference. It is measured in hours of development time, surface area for bugs, and maintenance burden across every deployment.

OCR.space Approach

Every .NET developer using OCR.space writes their own HTTP client. The simplest possible implementation — basic text extraction with no retry logic, no batch support, and no preprocessing — is still 50+ lines:

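using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// OCR.space: minimum viable extraction — 50+ lines before business logic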
public class OcrSpaceBasicExtraction
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;

    public OcrSpaceBasicExtraction(string apiKey)
    {
        _apiKey = apiKey;
        _httpClient = new HttpClient();
    }

    public async Task<string> ExtractText(string imagePath)
    {
        // Read file and encode — document leaves your infrastructure
        byte[] imageBytes = File.ReadAllBytes(imagePath);
        string base64 = Convert.ToBase64String(imageBytes);

        var content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("apikey", _apiKey),
            // Hard-codes the PNG MIME type; other formats need their own data: prefix
            new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
            new KeyValuePair<string, string>("language", "eng")
        });

        // Send to cloud
        var response = await _httpClient.PostAsync(
            "https://api.ocr.space/parse/image", content);

        if (!response.IsSuccessStatusCode)
            throw new Exception($"API error: {response.StatusCode}");

        // Parse JSON manually — no typed models
        string json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);

        return doc.RootElement
            .GetProperty("ParsedResults")[0]
            .GetProperty("ParsedText")
            .GetString() ?? string.Empty;
    }
}

Add batch processing and the line count climbs to 80+, because every batch request needs rate-limit throttling, per-request retry logic with exponential backoff, and explicit handling for HTTP 429 responses. Add PDF processing and you layer on a 5 MB file size check before every call, because the free tier will reject larger files with an error rather than a warning.
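The retry schedule mentioned here is usually exponential. The client waits `base × 2^attempt` and caps the wait. It can also randomize the wait ("full jitter") so that many clients hitting a 429 at the same moment do not retry in lockstep. The sketch below is a hypothetical helper, not something OCR.space documents, and the 1-second base and 10-second cap in the usage lines are arbitrary. If a 429 response carries a `Retry-After` header (exposed in .NET as `HttpResponseMessage.Headers.RetryAfter`), honoring that value is generally preferable to a computed delay.

```csharp
using System;

// Print the schedule for five retries with a 1 s base and a 10 s cap (values chosen for illustration).
for (int retry = 0; retry < 5; retry++)
{
    TimeSpan delay = BackoffDelay(retry, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10));
    Console.WriteLine($"Retry {retry}: wait {delay.TotalSeconds} s");
}

// Hypothetical helper: exponential backoff (base * 2^attempt), capped at maxDelay.
// Passing a Random applies "full jitter": a uniform delay between 0 and the capped value.
static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random? jitter = null)
{
    if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
    double exponentialMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
    double cappedMs = Math.Min(exponentialMs, maxDelay.TotalMilliseconds);
    double chosenMs = jitter is null ? cappedMs : jitter.NextDouble() * cappedMs;
    return TimeSpan.FromMilliseconds(chosenMs);
}
```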

IronOCR Approach

Installing IronOCR from NuGet replaces all of that infrastructure code with method calls that already exist:

// IronOCR: complete implementation
using IronOcr;

var result = new IronTesseract().Read("invoice.png");
Console.WriteLine(result.Text);

The IronTesseract class handles file reading, preprocessing, engine execution, and result construction. No HTTP client. No base64 encoding. No JSON parsing. No API key. No rate limiter. IronOCR's tutorial on reading text from images covers the other input types (file paths, byte arrays, streams, bitmaps), all of which follow the same one-call pattern.

The code complexity numbers from the source files are not hypothetical:

| Scenario | OCR.space (lines) | IronOCR (lines) |
|---|---|---|
| Basic extraction | 50+ | 3 |
| PDF processing | 60+ | 12 |
| Batch processing | 80+ | 15 |
| Error handling | 70+ | 15 |
| Full client infrastructure | 200+ | 0 (NuGet provides it) |

Batch Processing and Rate Limits

The rate-limit problem is where OCR.space's free tier most visibly breaks down under real production conditions.
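One way to enforce the stated 60-requests-per-minute limit exactly is a sliding window. The client records when each recent request started and refuses a new one while 60 starts already fall inside the last minute. The sketch below illustrates this under stated assumptions and is not OCR.space guidance. `Reserve` is a hypothetical helper that takes the current time as a parameter, so it can be tested without real waiting. It is not thread-safe, so concurrent callers would need a `lock` around it. The 500-per-day cap would need a second queue with a one-day window, assuming that cap behaves like a rolling window, which this article does not specify. On .NET 7 and later, `SlidingWindowRateLimiter` from the `System.Threading.RateLimiting` package is a ready-made alternative.

```csharp
using System;
using System.Collections.Generic;

var recentStarts = new Queue<DateTime>();
int perMinute = 60;
TimeSpan oneMinute = TimeSpan.FromMinutes(1);
var t0 = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);

// Simulate 61 requests arriving 100 ms apart.
int granted = 0;
for (int i = 0; i < 61; i++)
{
    if (Reserve(recentStarts, t0.AddMilliseconds(i * 100), perMinute, oneMinute) == TimeSpan.Zero)
        granted++;
}
Console.WriteLine($"Granted {granted} of 61 requests");

// Hypothetical sliding-window limiter. Returns TimeSpan.Zero and records the start time
// if a slot is free; otherwise returns how long to wait before trying again.
// Assumes `now` never moves backwards between calls.
static TimeSpan Reserve(Queue<DateTime> starts, DateTime now, int limit, TimeSpan window)
{
    // Forget request starts that have slid out of the window.
    while (starts.Count > 0 && now - starts.Peek() >= window)
        starts.Dequeue();

    if (starts.Count < limit)
    {
        starts.Enqueue(now);
        return TimeSpan.Zero;
    }

    // The oldest start still in the window is the next one to expire.
    return starts.Peek() + window - now;
}
```

In a real client, a non-zero result would feed `await Task.Delay(wait)` before calling `Reserve` again.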

OCR.space Approach

Batch processing under OCR.space requires building and managing a SemaphoreSlim to stay within the 60-requests-per-minute constraint. It also needs exponential-backoff retry logic for the HTTP 429 responses that still arrive when the throttling is imprecise or when several machines share one IP address:

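using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// OCR.space batch: 80+ lines of rate-limit plumbing before actual work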
public class OcrSpaceBatchProcessing
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;
    private readonly SemaphoreSlim _rateLimiter;

    public OcrSpaceBatchProcessing(string apiKey)
    {
        _apiKey = apiKey;
        _httpClient = new HttpClient();
        _rateLimiter = new SemaphoreSlim(60, 60); // Free tier: 60/minute
    }

    public async Task<Dictionary<string, string>> ProcessBatch(string[] imagePaths)
    {
        var results = new Dictionary<string, string>();

        foreach (var imagePath in imagePaths)
        {
            await _rateLimiter.WaitAsync();

            try
            {
                string text = await ProcessWithRetry(imagePath);
                results[imagePath] = text;
            }
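            finally
            {
                // Hold the permit for a full minute so that no more than 60 requests can start
                // in any 60-second window (a 1-second hold would not enforce the limit).
                _ = Task.Run(async () =>
                {
                    await Task.Delay(TimeSpan.FromMinutes(1));
                    _rateLimiter.Release();
                });
            }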
        }

        return results;
    }

    private async Task<string> ProcessWithRetry(string imagePath, int maxRetries = 3)
    {
        for (int attempt = 0; attempt < maxRetries; attempt++)
        {
            try
            {
                byte[] imageBytes = File.ReadAllBytes(imagePath);
                string base64 = Convert.ToBase64String(imageBytes);

                var content = new FormUrlEncodedContent(new[]
                {
                    new KeyValuePair<string, string>("apikey", _apiKey),
                    new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
                    new KeyValuePair<string, string>("language", "eng")
                });

                var response = await _httpClient.PostAsync(
                    "https://api.ocr.space/parse/image", content);

                if (response.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
                {
                    // Rate limited — exponential backoff
                    await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
                    continue;
                }

                response.EnsureSuccessStatusCode();

                string json = await response.Content.ReadAsStringAsync();
                using var doc = JsonDocument.Parse(json);

                return doc.RootElement
                    .GetProperty("ParsedResults")[0]
                    .GetProperty("ParsedText")
                    .GetString() ?? string.Empty;
            }
            catch (HttpRequestException) when (attempt < maxRetries - 1)
            {
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }

        throw new Exception($"Failed to process {imagePath} after {maxRetries} attempts");
    }
}

The free tier's 500 requests-per-day cap constrains this further. An application that needs to process 600 documents in one day will start failing after its 500th request, and it cannot finish the batch until the quota resets on the next calendar day.
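Nothing in the class above tracks the daily cap, and a semaphore cannot see requests made by other processes. Below is a minimal sketch of a client-side quota tracker that enforces both windows. It assumes the free-tier limits quoted in this article (60 requests per rolling minute, 500 per day) and that the daily counter resets at UTC midnight. OCR.space's actual reset time is an assumption here, and `OcrSpaceQuota` is a hypothetical helper, not part of any OCR.space SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical client-side quota tracker for the OCR.space free tier.
// Assumed limits: 60 requests per rolling minute, 500 per UTC calendar day.
public sealed class OcrSpaceQuota
{
    private readonly int _perMinute;
    private readonly int _perDay;
    private readonly Func<DateTime> _utcNow;          // injectable clock, useful for tests
    private readonly Queue<DateTime> _recent = new(); // request times within the last minute
    private DateTime _day;                            // UTC date the daily count belongs to
    private int _usedToday;

    public OcrSpaceQuota(int perMinute = 60, int perDay = 500, Func<DateTime>? utcNow = null)
    {
        _perMinute = perMinute;
        _perDay = perDay;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
        _day = _utcNow().Date;
    }

    // Records the request and returns true if both windows have room;
    // returns false (recording nothing) if the caller must wait.
    public bool TryAcquire()
    {
        lock (_recent)
        {
            DateTime now = _utcNow();

            // Daily window: reset when the UTC date changes (assumed reset rule).
            if (now.Date != _day)
            {
                _day = now.Date;
                _usedToday = 0;
            }
            if (_usedToday >= _perDay) return false;

            // Rolling minute window: forget requests that are 60 seconds old or older.
            while (_recent.Count > 0 && now - _recent.Peek() >= TimeSpan.FromMinutes(1))
                _recent.Dequeue();
            if (_recent.Count >= _perMinute) return false;

            _recent.Enqueue(now);
            _usedToday++;
            return true;
        }
    }

    public int RemainingToday
    {
        get { lock (_recent) { return _utcNow().Date != _day ? _perDay : _perDay - _usedToday; } }
    }
}
```

A caller checks `TryAcquire()` before each POST and waits, or stops the batch, when it returns false. Because the state lives in a single process, machines sharing one outbound IP still have to treat the server's 429 responses as the final word.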

IronOCR Approach

IronOCR has no rate limits. Batch processing is a loop:

// IronOCR batch: a plain loop, no rate limits, no retries needed
using System.Collections.Generic;
using IronOcr;

public class IronOcrBatchProcessing
{
    public Dictionary<string, string> ProcessBatch(string[] imagePaths)
    {
        var results = new Dictionary<string, string>();
        var ocr = new IronTesseract(); // one engine instance reused for the whole batch

        foreach (var imagePath in imagePaths)
        {
            // No rate limits, no retries, no cloud dependency
            var result = ocr.Read(imagePath);
            results[imagePath] = result.Text;
        }

        return results;
    }
}

Reusing the IronTesseract instance across the batch avoids repeated engine initialization overhead. Processing speed is bounded by local CPU and disk I/O, not by a remote API's throttle policy. The multithreading example shows how to parallelize across cores when throughput matters.
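When throughput matters, the same loop can be spread across cores. The minimal sketch below is written against delegates so the concurrency pattern stands on its own. `ParallelOcr.ReadAll` is a hypothetical helper, and it assumes each worker thread should create its own IronTesseract instance rather than share one. That thread-safety assumption should be checked against IronOCR's multithreading example before production use.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelOcr
{
    // Runs `read` over every path on up to `maxDegreeOfParallelism` workers
    // (-1 means no explicit limit). `createReader` runs once per worker task,
    // so each worker gets its own engine and reuses it for all of its items.
    public static IReadOnlyDictionary<string, string> ReadAll<TEngine>(
        IEnumerable<string> paths,
        Func<TEngine> createReader,
        Func<TEngine, string, string> read,
        int maxDegreeOfParallelism = -1)
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.ForEach(
            paths,
            options,
            localInit: createReader,            // one engine per worker task
            body: (path, state, engine) =>
            {
                results[path] = read(engine, path);
                return engine;                  // hand the same engine to the next item
            },
            localFinally: engine => { });       // nothing to dispose in this sketch

        return results;
    }
}
```

With IronOCR the call would look like `ParallelOcr.ReadAll(paths, () => new IronTesseract(), (ocr, path) => ocr.Read(path).Text)`. The `localInit` overload of `Parallel.ForEach` keeps the per-instance reuse described above while letting several engines run side by side.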

PDF Processing

PDF support illustrates the second major structural gap.

OCR.space Approach

OCR.space accepts PDF uploads on paid plans and, with restrictions, on the free tier. However, the free tier's 5 MB file size ceiling excludes many multi-page documents, and free-tier searchable PDF output carries a watermark. The implementation needs an explicit file size check before every request and per-page result extraction afterwards:

// OCR.space PDF: size check required, watermarks on free tier
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public class OcrSpacePdfClient
{
    private const long FreeTierMaxBytes = 5 * 1024 * 1024; // free tier upload ceiling
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;

    public OcrSpacePdfClient(string apiKey, HttpClient httpClient)
    {
        _apiKey = apiKey;
        _httpClient = httpClient;
    }

    public async Task<List<string>> ExtractTextFromPdf(string pdfPath)
    {
        var pageTexts = new List<string>();

        // Reject oversized files before spending a request from the quota
        var fileInfo = new FileInfo(pdfPath);
        if (fileInfo.Length > FreeTierMaxBytes)
            throw new InvalidOperationException("File exceeds 5MB free tier limit. Upgrade or split PDF.");

        byte[] pdfBytes = await File.ReadAllBytesAsync(pdfPath);
        string base64 = Convert.ToBase64String(pdfBytes);

        using var content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("apikey", _apiKey),
            new KeyValuePair<string, string>("base64Image", $"data:application/pdf;base64,{base64}"),
            new KeyValuePair<string, string>("language", "eng"),
            new KeyValuePair<string, string>("filetype", "PDF"),
            // Searchable PDF output is watermarked on the free tier, so it is disabled here
            new KeyValuePair<string, string>("isCreateSearchablePdf", "false")
        });

        using var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);

        if (!response.IsSuccessStatusCode)
            throw new HttpRequestException($"API error: {response.StatusCode}");

        string json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);

        // One ParsedResults entry per page; the property can be missing or null on errors
        if (doc.RootElement.TryGetProperty("ParsedResults", out var results)
            && results.ValueKind == JsonValueKind.Array)
        {
            foreach (var result in results.EnumerateArray())
            {
                if (result.TryGetProperty("ParsedText", out var text))
                    pageTexts.Add(text.GetString() ?? string.Empty);
            }
        }

        return pageTexts;
    }
}

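The pre-flight check in the PDF method compares the raw file size with the limit, but what actually travels in the request is base64 text. Base64 turns every 3 input bytes into 4 ASCII characters, padding the last group. The small sketch below shows that arithmetic. `Base64Size` is a hypothetical helper. This article does not say whether OCR.space applies its size limit to the decoded file or to the request body, so treat these numbers as a way to size payloads, not as the service's own rule.

```csharp
using System;

public static class Base64Size
{
    // Length of the base64 string for `byteCount` input bytes: 4 * ceil(n / 3).
    public static long EncodedLength(long byteCount) => 4 * ((byteCount + 2) / 3);

    // Length of the full data URI, e.g. "data:application/pdf;base64," + payload.
    public static long DataUriLength(long byteCount, string mimeType) =>
        "data:".Length + mimeType.Length + ";base64,".Length + EncodedLength(byteCount);
}
```

By this formula, a file exactly at the 5 MB ceiling (5,242,880 bytes) becomes 6,990,508 base64 characters. FormUrlEncodedContent then percent-encodes every `+`, `/` and `=` in that text as three characters each, which inflates the body further.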
IronOCR Approach

IronOCR reads PDFs natively. The limit is available memory, not an arbitrary file size threshold. Searchable PDF output produces clean files with no watermarks, on any license tier:

// IronOCR PDF: native support, no size limit, no watermarks
using System.Collections.Generic;
using IronOcr;

public class IronOcrPdfProcessing
{
    public List<string> ExtractTextFromPdf(string pdfPath)
    {
        var pageTexts = new List<string>();

        using var input = new OcrInput();
        input.LoadPdf(pdfPath); // No 5MB ceiling

        var result = new IronTesseract().Read(input);

        foreach (var page in result.Pages)
            pageTexts.Add(page.Text);

        return pageTexts;
    }

    // Generate a searchable PDF with no watermarks
    public void CreateSearchablePdf(string inputPath, string outputPath)
    {
        var result = new IronTesseract().Read(inputPath);
        result.SaveAsSearchablePdf(outputPath);
    }
}

Password-protected PDFs are a single parameter: input.LoadPdf("encrypted.pdf", Password: "secret"). The PDF OCR how-to guide covers page range selection and password handling in detail. For searchable PDF generation patterns, see the searchable PDF how-to.

Error Handling

OCR.space delivers errors through two separate channels: HTTP status codes and a JSON IsErroredOnProcessing flag. Production code must check both, parse the JSON error array when IsErroredOnProcessing is true, and inspect per-page FileParseExitCode values for partial failures. None of this is typed — everything is string parsing from JsonDocument:

// OCR.space: three error layers (HTTP status, IsErroredOnProcessing, per-page exit codes), all string-based
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public class OcrSpaceSafeClient
{
    public async Task<string> SafeExtract(HttpClient client, string apiKey, string imagePath)
    {
        try
        {
            byte[] imageBytes = await File.ReadAllBytesAsync(imagePath);
            string base64 = Convert.ToBase64String(imageBytes);

            using var content = new FormUrlEncodedContent(new[]
            {
                new KeyValuePair<string, string>("apikey", apiKey),
                new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
            });

            using var response = await client.PostAsync("https://api.ocr.space/parse/image", content);

            // First error layer: HTTP status codes
            if (!response.IsSuccessStatusCode)
            {
                switch (response.StatusCode)
                {
                    case HttpStatusCode.Unauthorized:
                        throw new Exception("Invalid API key");
                    case HttpStatusCode.TooManyRequests:
                        throw new Exception("Rate limit exceeded: wait and retry");
                    case HttpStatusCode.PaymentRequired:
                        throw new Exception("Quota exceeded: upgrade plan");
                    default:
                        throw new Exception($"API error: {response.StatusCode}");
                }
            }

            // Second error layer: JSON-level failures
            string json = await response.Content.ReadAsStringAsync();
            using var doc = JsonDocument.Parse(json);

            if (doc.RootElement.TryGetProperty("IsErroredOnProcessing", out var isError)
                && isError.GetBoolean())
            {
                var messages = new List<string>();
                if (doc.RootElement.TryGetProperty("ErrorMessage", out var errors))
                {
                    // Accept either an array of messages or a single string
                    if (errors.ValueKind == JsonValueKind.Array)
                    {
                        foreach (var e in errors.EnumerateArray())
                            messages.Add(e.GetString() ?? "Unknown error");
                    }
                    else if (errors.ValueKind == JsonValueKind.String)
                    {
                        messages.Add(errors.GetString() ?? "Unknown error");
                    }
                }
                throw new Exception(string.Join("; ", messages));
            }

            // Third error layer: per-page exit codes (1 means the page parsed successfully)
            if (doc.RootElement.TryGetProperty("ParsedResults", out var results))
            {
                foreach (var result in results.EnumerateArray())
                {
                    if (result.TryGetProperty("FileParseExitCode", out var exitCode)
                        && exitCode.GetInt32() != 1)
                        throw new Exception($"Page parse failed: exit code {exitCode.GetInt32()}");
                }
            }

            return doc.RootElement
                .GetProperty("ParsedResults")[0]
                .GetProperty("ParsedText")
                .GetString() ?? string.Empty;
        }
        catch (JsonException ex)
        {
            throw new Exception($"Invalid API response: {ex.Message}", ex);
        }
        catch (HttpRequestException ex)
        {
            throw new Exception($"Network error: {ex.Message}", ex);
        }
    }
}

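Much of that string plumbing can be concentrated in one place by deserializing the response into typed classes, which is the user-defined POCO that the mapping table later in this article refers to. The minimal sketch below uses System.Text.Json. It assumes the field names used in the examples above and assumes `ErrorMessage` arrives as an array of strings. The class names are hypothetical, fields this article does not use are omitted, and HTTP status handling still has to happen before the call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical DTOs mirroring the OCR.space fields used in this article's examples.
public sealed class OcrSpaceResponse
{
    public List<OcrSpacePage>? ParsedResults { get; set; }
    public bool IsErroredOnProcessing { get; set; }
    public List<string>? ErrorMessage { get; set; } // assumed to be an array of strings
}

public sealed class OcrSpacePage
{
    public string? ParsedText { get; set; }
    public int? FileParseExitCode { get; set; }     // 1 = page parsed successfully
}

public static class OcrSpaceJson
{
    // Deserializes once, then turns both JSON-level error layers into exceptions.
    public static string ExtractText(string json)
    {
        OcrSpaceResponse response = JsonSerializer.Deserialize<OcrSpaceResponse>(json)
            ?? throw new InvalidOperationException("Empty API response");

        if (response.IsErroredOnProcessing)
        {
            var messages = response.ErrorMessage is { Count: > 0 } m ? m : new List<string> { "Unknown error" };
            throw new InvalidOperationException(string.Join("; ", messages));
        }

        var pages = response.ParsedResults ?? new List<OcrSpacePage>();
        OcrSpacePage? failed = pages.FirstOrDefault(p => p.FileParseExitCode is int code && code != 1);
        if (failed != null)
            throw new InvalidOperationException($"Page parse failed: exit code {failed.FileParseExitCode}");

        // Join all pages in order (the earlier examples read only page 0)
        return string.Join(Environment.NewLine, pages.Select(p => p.ParsedText ?? string.Empty));
    }
}
```

The design trades the repeated `TryGetProperty` calls for one schema definition. The cost is that any mismatch between these classes and the real response surfaces as a `JsonException` at deserialization time.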
IronOCR raises standard .NET exceptions with clear messages. No JSON parsing, no HTTP status code switching, no layered error extraction:

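// IronOCR: standard .NET exceptions
using System;
using System.IO;
using IronOcr;

public class IronOcrSafeExtraction
{
    public string SafeExtract(string imagePath)
    {
        try
        {
            var result = new IronTesseract().Read(imagePath);
            return result.Text;
        }
        catch (FileNotFoundException)
        {
            throw; // File not found: nothing to translate, let it propagate
        }
        catch (Exception ex) when (ex.Message.Contains("corrupt"))
        {
            // Matching on message text is illustrative and fragile; the exact
            // wording IronOCR uses for unreadable images is an assumption here
            throw new InvalidDataException($"Image file is corrupt: {imagePath}", ex);
        }
        // No JSON errors. No HTTP errors. No rate limit errors.
    }
}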

The IronTesseract API reference and OcrResult API reference document the typed result structure, including the Confidence property for assessing extraction quality.

API Mapping Reference

OCR.space concept | IronOCR equivalent
POST https://api.ocr.space/parse/image | new IronTesseract().Read(path)
FormUrlEncodedContent / MultipartFormDataContent | OcrInput
base64Image parameter | input.LoadImage(path) or input.LoadImage(bytes)
language parameter | ocr.Language = OcrLanguage.English
OCREngine parameter | Engine managed internally
isOverlayRequired parameter | result.Words / result.Lines (always available)
isCreateSearchablePdf parameter | result.SaveAsSearchablePdf(outputPath)
filetype=PDF parameter | input.LoadPdf(path)
ParsedResults[0].ParsedText | result.Text
ParsedResults[n] (per page) | result.Pages[n].Text
IsErroredOnProcessing JSON flag | Standard .NET exception
FileParseExitCode per-page flag | Standard .NET exception
ProcessingTimeInMilliseconds | Implicit; no extra parsing needed
HTTP 429 / rate limit | Not applicable; no rate limits
Custom OcrResult POCO (user-defined) | IronOcr.OcrResult (provided by NuGet)
Custom OcrSpaceException (user-defined) | Standard .NET exception types
SemaphoreSlim rate limiter (user-built) | Not needed
API key in every request | IronOcr.License.LicenseKey (set once at startup)

When Teams Consider Moving from OCR.space to IronOCR

Hitting the Free Tier During Load Testing

OCR.space's 500 requests-per-day per-IP ceiling is a hard wall, not a soft advisory. Teams discover this during load testing or staging deployments where multiple developers share an office IP address. A shared CI/CD pipeline that runs integration tests against OCR.space can exhaust the daily quota before the business day ends. At that point the application fails not because the code is wrong, but because a third-party counter reached an arbitrary threshold. Moving to local processing eliminates this category of failure entirely: there is no quota to exhaust because processing runs in-process.

Compliance and Data Classification Reviews

Security reviews that classify documents as PII, PHI, or financially sensitive create a binary problem for OCR.space: the service has no on-premise deployment path. There is no Business Associate Agreement (BAA) equivalent documented for HIPAA scenarios, no Data Processing Agreement structure that clearly governs EU data transfers under GDPR, and no technical mechanism to prevent document transmission even with contractual controls in place. Teams receiving compliance findings against cloud-transmitted document processing need a local solution. IronOCR satisfies this requirement by design — documents never leave the application server. The licensing page documents the perpetual license structure, which also simplifies compliance documentation by eliminating cloud-vendor review from the scope.

Integration Code Maintenance Burden

OCR.space integrations accumulate technical debt in proportion to their feature requirements. The initial HTTP client is manageable. Then the team adds retry logic. Then they add per-IP rate-limit tracking because multiple services share an address. Then they add a file-size pre-validation step. Then they discover the JSON error structure differs between Engine 1 and Engine 2 responses and add a branch. Six months later the OCR.space integration is 400 lines of infrastructure code that every new team member must understand before touching anything OCR-related. When that team evaluates the maintenance cost against a $999 one-time IronOCR license, the arithmetic is straightforward.

Structured Output Requirements

By default, OCR.space returns plain text. Its optional isOverlayRequired parameter adds word bounding boxes grouped into lines. The response still has no paragraph segmentation and no per-word confidence scores. Applications that need to extract specific fields from invoices, such as the vendor name at a known region or the total amount in the bottom-right corner, must reconstruct layout from that flat overlay. They also cannot tell which words were recognized reliably. IronOCR exposes a full document hierarchy with coordinates and confidence values at every granularity level: pages, paragraphs, lines, and individual words. Region-based processing allows targeting specific document zones without post-processing the entire page output. The read results how-to and region-based OCR guide cover these patterns in detail.

Volume Growth Beyond the Free Tier

At 25,000 requests per month, the free tier is functional for low-volume scenarios. Beyond that, paid tiers apply (contact OCR.space for current pricing), and those charges recur every month. A $999 perpetual IronOCR license, by contrast, carries no per-request charges. Teams projecting document volume growth past 25,000 per month therefore have a straightforward cost case for local processing.
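The cost case reduces to simple arithmetic: how many months of a recurring subscription it takes to exceed a one-time license fee. The $999 figure comes from this article. The monthly subscription price is a hypothetical input because OCR.space's paid pricing is not given here. BreakEven and MonthsToRecover are illustrative names.

```csharp
using System;

// Back-of-envelope break-even between a one-time license and a recurring
// subscription. All inputs are supplied by the caller; plug in a real quote.
public static class BreakEven
{
    public static int MonthsToRecover(decimal oneTimeLicense, decimal monthlySubscription)
    {
        if (monthlySubscription <= 0)
            throw new ArgumentOutOfRangeException(nameof(monthlySubscription), "Must be positive.");

        // Round up: a partial month still incurs a full subscription charge.
        return (int)Math.Ceiling(oneTimeLicense / monthlySubscription);
    }
}
```

For example, with a hypothetical $50 monthly plan, MonthsToRecover(999m, 50m) returns 20. The calculation deliberately ignores the engineering time spent maintaining HTTP, retry, and rate-limit code, which the earlier sections argue is often the larger cost.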

Common Migration Considerations

Replacing the HTTP Client with a Method Call

The migration from OCR.space to IronOCR is architecturally simple because both return text from documents. The HTTP client, JSON parser, rate limiter, and custom exception types disappear. What replaces them is a single NuGet package and method calls. The before/after at the service boundary:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using IronOcr;

// Before: OCR.space service (simplified; production versions with retries,
// rate limiting, and error handling run to 200+ lines)
public class OcrSpaceService : IDisposable
{
    private readonly HttpClient _client = new();
    private readonly string _apiKey;

    public OcrSpaceService(string apiKey) => _apiKey = apiKey;

    public async Task<string> ProcessDocument(string path)
    {
        // Read the file and base64-encode it for the form post.
        byte[] bytes = await File.ReadAllBytesAsync(path);
        string base64 = Convert.ToBase64String(bytes);

        // The data URI declares image/png regardless of the real file type.
        var content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("apikey", _apiKey),
            new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
        });

        var response = await _client.PostAsync("https://api.ocr.space/parse/image", content);
        response.EnsureSuccessStatusCode(); // throws on 429 and any other non-2xx status

        // No error checks: if the JSON body reports an error instead of
        // ParsedResults, the lookup below throws a generic exception.
        string json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("ParsedResults")[0].GetProperty("ParsedText").GetString()!;
    }

    public void Dispose() => _client.Dispose();
}

// After: IronOCR service (no HttpClient, no JSON, no API key)
public class IronOcrService
{
    private readonly IronTesseract _ocr = new();

    public string ProcessDocument(string path)
    {
        return _ocr.Read(path).Text;
    }
    // No Dispose: there are no external connections to close
}

The IDisposable implementation disappears because there is no HttpClient to manage.
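The simplified OcrSpaceService above reads ParsedResults[0] unconditionally. Any error response therefore surfaces as a generic exception rather than OCR.space's own error message. The sketch below is a more defensive parser that checks the error flag first. It assumes the response fields IsErroredOnProcessing, ErrorMessage, ParsedResults, and ParsedText as described in OCR.space's API documentation. OcrSpaceException and OcrSpaceResponse are hypothetical names. ErrorMessage is handled as either a string or an array because its shape should not be taken for granted.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical exception type; OCR.space provides no .NET exception types.
public sealed class OcrSpaceException : Exception
{
    public OcrSpaceException(string message) : base(message) { }
}

public static class OcrSpaceResponse
{
    // Returns the recognized text, or throws OcrSpaceException when the
    // response reports an error or contains no results.
    public static string ExtractText(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (root.TryGetProperty("IsErroredOnProcessing", out JsonElement errored)
            && errored.ValueKind == JsonValueKind.True)
        {
            throw new OcrSpaceException(ReadErrorMessage(root));
        }

        if (!root.TryGetProperty("ParsedResults", out JsonElement results)
            || results.ValueKind != JsonValueKind.Array
            || results.GetArrayLength() == 0)
        {
            throw new OcrSpaceException("Response contained no ParsedResults.");
        }

        // ParsedResults is an array (assumed one entry per page); join all entries.
        return string.Join(Environment.NewLine,
            results.EnumerateArray().Select(r =>
                r.TryGetProperty("ParsedText", out JsonElement text) ? text.GetString() ?? "" : ""));
    }

    // Accepts ErrorMessage as a single string or an array of strings.
    private static string ReadErrorMessage(JsonElement root)
    {
        if (!root.TryGetProperty("ErrorMessage", out JsonElement msg))
            return "OCR.space reported an unspecified error.";

        return msg.ValueKind switch
        {
            JsonValueKind.String => msg.GetString() ?? "",
            JsonValueKind.Array => string.Join("; ", msg.EnumerateArray().Select(e => e.ToString())),
            _ => msg.ToString()
        };
    }
}
```

With IronOCR there is no JSON body to inspect, so a class like this is part of the code the migration removes.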

API Key Removal

OCR.space requires an API key in every HTTP request. That key must be stored securely, rotated when exposed, and excluded from source control. IronOCR uses a license key set once at application startup, typically from an environment variable or configuration system:

// At application startup — once, not per request
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE");

After migration, the API key rotation process, secret management configuration, and key-per-request injection logic all become unnecessary.

Preprocessing After Migration

OCR.space applies server-side processing before returning results, but developers have no control over what that processing does or does not include. IronOCR exposes explicit preprocessing methods. If document quality is a concern — skewed scans, low-contrast photocopies, noisy fax output — the preprocessing pipeline is three to five method calls:

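using System;
using IronOcr;

// Apply filters in order, then run OCR on the cleaned-up input.
using var input = new OcrInput();
input.LoadImage("low-quality-scan.jpg");
input.Deskew();                // straighten rotated scans
input.DeNoise();               // remove speckle noise
input.Contrast();              // increase contrast
input.Binarize();              // convert to black and white
input.EnhanceResolution(300);  // upscale toward 300 DPI
var result = new IronTesseract().Read(input);
Console.WriteLine(result.Text);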

The image quality correction guide and image color correction guide document every available filter with examples showing before/after accuracy differences.

Async vs. Synchronous

OCR.space calls are inherently async because they involve HTTP round trips. IronOCR calls are synchronous by default, which simplifies code in non-web contexts. For ASP.NET and server scenarios that require non-blocking execution, IronOCR supports async operation either by wrapping calls in Task.Run or through its direct async API. The async OCR guide covers both patterns. Removing await from a service method that previously awaited a cloud call simplifies the call stack in contexts where async was adopted only because OCR.space required it.
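A minimal sketch of the Task.Run approach follows. It wraps any synchronous text-reading delegate so callers can await it. In practice the delegate would be something like path => ocr.Read(path).Text. AsyncOcrAdapter is a hypothetical name, and IronOCR's own async methods, described in the async OCR guide, may make such a wrapper unnecessary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical adapter that offloads a synchronous OCR call to the thread pool.
public sealed class AsyncOcrAdapter
{
    private readonly Func<string, string> _readText;

    public AsyncOcrAdapter(Func<string, string> readText) => _readText = readText;

    public Task<string> ReadTextAsync(string path, CancellationToken cancellationToken = default)
    {
        // The token cancels the task only if cancellation is requested before
        // the work starts; it cannot interrupt a recognition call in progress.
        return Task.Run(() => _readText(path), cancellationToken);
    }
}
```

Task.Run keeps a UI thread or ASP.NET request thread free while recognition runs. The CPU work still happens on a thread-pool thread, so the wrapper does not make OCR itself faster. Before sharing one IronTesseract instance across concurrent requests, confirm in IronOCR's documentation that this is supported.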

Additional IronOCR Capabilities

Features not covered in the sections above, each representing functionality with no equivalent in OCR.space:

  • hOCR export: result.SaveAsHocrFile("output.hocr") produces standards-compliant hOCR output for downstream document processing pipelines.
  • Progress tracking: Subscribe to progress events for long-running multi-page batch operations, enabling progress bars and ETA calculations in UI applications.
  • Table extraction: Structured table data is accessible via the result model, supporting use cases like invoice line-item extraction that require column alignment.
  • Specialized document reading: Passport MRZ zones, license plates, MICR cheque lines, and handwriting recognition are purpose-built capabilities accessible through the same IronTesseract API.

.NET Compatibility and Future Readiness

IronOCR targets .NET 6, .NET 7, .NET 8, and .NET 9, with active support for each LTS and STS release. The library runs on Windows x64, Windows x86, Linux x64, macOS, Docker, AWS Lambda, and Azure App Service, all from the same NuGet package with no platform-specific configuration. OCR.space requires outbound HTTPS from every deployment environment. That rules out air-gapped networks, restrictive corporate proxies, and infrastructure where policy blocks outbound traffic to third-party APIs. IronOCR has no runtime network requirements; it executes entirely in-process. As newer .NET releases such as .NET 10 arrive, IronOCR's track record of maintaining compatibility across .NET major versions suggests continuity without integration changes.

Conclusion

OCR.space occupies a real and useful niche: developers who need to prototype an OCR feature over a weekend, students exploring text extraction concepts, or low-volume personal tools under 25,000 documents per month with no compliance requirements. For that audience, the free tier delivers genuine value.

The problem is that .NET developers are typically not building weekend prototypes. They are building invoicing systems, medical records processors, document archival pipelines, and compliance-sensitive business applications. For those contexts, OCR.space's absent NuGet package is not an inconvenience — it is a structural incompatibility. Every hour spent building the HTTP client, rate limiter, JSON parser, and retry infrastructure is an hour not spent on the application's actual requirements. That cost is front-loaded, but the maintenance cost continues for the life of the integration.

IronOCR addresses the specific failure mode that defines every OCR.space integration: the absence of a real SDK. One NuGet package, one method call, no HTTP plumbing, no rate limits, no documents transmitted to external servers. The $999 entry price is a one-time cost; recurring OCR.space subscription costs will exceed it over time, especially as volume grows. For teams with compliance requirements, local processing is the only option regardless of cost.

The comparison ultimately reduces to a single question: does the application require OCR as an external REST dependency, or as a library function? For production .NET applications, the answer is almost always the latter.

Please note: OCR.space and Tesseract are registered trademarks of their respective owners. This site is not affiliated with, endorsed by, or sponsored by Google or OCR.space. All product names, logos, and brands are property of their respective owners. Comparisons are for informational purposes only and reflect publicly available information at the time of writing.

Frequently Asked Questions

What is OCR.space API?

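OCR.space API is a cloud-based, freemium REST service that extracts text from images and PDFs on OCR.space's own servers. It has no NuGet package or official .NET SDK, so .NET applications call it through a hand-written HTTP client. It is one of several OCR options evaluated alongside IronOCR for .NET application development.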

How does IronOCR compare to OCR.space API for .NET developers?

IronOCR is a NuGet-native .NET OCR library built around IronTesseract as its core engine. Compared to OCR.space API, it replaces a hand-written HTTP client with typed method calls, processes documents locally instead of on remote servers, and uses flat-rate perpetual licensing instead of request quotas and subscription tiers.

Is IronOCR easier to set up than OCR.space API?

IronOCR installs via a single NuGet package with the OCR engine bundled inside, so there are no separate runtime binaries to manage. OCR.space needs no installation, but because it has no NuGet package, setup means writing and maintaining an HTTP client, JSON parsing, API key handling, and rate-limit logic.

What accuracy differences exist between OCR.space API and IronOCR?

IronOCR achieves high recognition accuracy for standard business documents, invoices, receipts, and scanned forms. For highly degraded documents or uncommon scripts, accuracy varies by source quality. IronOCR includes image preprocessing filters to improve recognition on low-quality inputs.

Does IronOCR support PDF text extraction?

Yes. IronOCR extracts text from both native PDFs and scanned PDF images in a single call. It also supports multi-page TIFF files, images, and streams. For scanned PDFs, OCR is applied page-by-page with per-page result objects.

How does OCR.space API licensing compare to IronOCR?

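OCR.space offers a free tier of 25,000 requests per month; beyond that, paid tiers apply, with pricing available from OCR.space. IronOCR uses a flat-rate perpetual license with no per-page or per-scan charges. Organizations processing high document volumes pay the same license cost regardless of volume. Details and volume pricing are on the IronOCR licensing page.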

What languages does IronOCR support?

IronOCR supports 127 languages via separate NuGet language packs. Adding a language requires a single 'dotnet add package IronOcr.Languages.{Language}' command. No manual file placement or path configuration is needed.

How do I install IronOCR in a .NET project?

Install via NuGet: 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Additional language packs are installed the same way. No native SDK installer is required.

Is IronOCR suitable for Docker and containerized deployments?

Yes. IronOCR works in Docker containers via its NuGet package. The license key is set via an environment variable. No license files, SDK paths, or volume mounts are required for the OCR engine itself.

Can I try IronOCR before purchasing, compared to OCR.space?

Yes. IronOCR trial mode processes documents and returns OCR results with a watermark overlay on output. You can verify accuracy on your own documents before purchasing a license.

Does IronOCR support barcode reading alongside text extraction?

Yes. IronOCR can read barcodes and QR codes during OCR when barcode reading is enabled in its configuration. For dedicated barcode reading and generation, Iron Software also provides IronBarcode as a companion library. Both are available individually or as part of the Iron Suite bundle.

Is it easy to migrate from OCR.space API to IronOCR?

Migration from OCR.space API to IronOCR typically involves removing the custom HttpClient, JSON parsing, retry, and rate-limit code. The POST request becomes an IronTesseract Read call, and per-request API key handling becomes a license key set once at startup. Most migrations reduce code complexity significantly.

Kannaopat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...
