OCR.space API vs IronOCR: .NET OCR Library
OCR.space has no NuGet package. That single fact defines every integration decision a .NET developer makes when choosing it: you write a custom HttpClient, base64-encode your documents, POST to https://api.ocr.space/parse/image, parse the JSON response manually, catch HTTP 429s, implement exponential backoff, build rate-limit bookkeeping, and define your own exception types — all before a single character of text reaches your application. This article examines what that DIY overhead actually costs in code, in time, and in production reliability, and compares it directly to IronOCR, a native .NET library delivered as a single NuGet package.
Understanding OCR.space
OCR.space is a freemium REST API that processes images and PDFs on remote servers operated by OCR.space. The service positions itself around its free tier: 25,000 requests per month with no credit card required, which appeals to developers experimenting with OCR or building personal projects. There is no NuGet package, no official .NET SDK, and no strongly-typed client library in any language. Every .NET integration is built by hand on top of HttpClient.
The architectural reality for .NET developers:
- No NuGet package: Zero first-class .NET support. No IntelliSense, no typed models, no integrated error handling.
- Cloud-only processing: Every document leaves your infrastructure. There is no on-premise option and no self-hosted deployment path.
- Manual REST integration: Developers encode files to base64, construct
FormUrlEncodedContentorMultipartFormDataContent, and parse raw JSON responses. - DIY rate limiting: The free tier enforces 60 requests per minute and a hard 500 requests-per-day per IP. Production applications must build this logic themselves.
- File size constraints: The free tier rejects files over 5 MB. Multi-page PDFs often exceed this limit.
- Watermarked PDF output: Searchable PDF generation on the free tier embeds OCR.space watermarks, making the output unusable for client delivery or document archival.
- No SLA: The free tier carries no uptime commitment. Paid plans offer higher limits but the same architectural constraints.
The REST Integration Developers Must Write
OCR.space's documentation points developers at the REST endpoint and leaves them to build everything else. The minimum viable .NET client — before any consideration of production hardening — looks like this:
// OCR.space: what you build before the first line of your actual application
public class OcrSpaceApiClient : IDisposable
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
private readonly SemaphoreSlim _rateLimiter;
private const string ApiEndpoint = "https://api.ocr.space/parse/image";
private const int MaxFileSizeFree = 5 * 1024 * 1024; // 5MB — hard limit on free tier
private const int RateLimitPerMinute = 60;
public OcrSpaceApiClient(string apiKey)
{
_apiKey = apiKey ?? throw new ArgumentNullException(nameof(apiKey));
_httpClient = new HttpClient();
_httpClient.Timeout = TimeSpan.FromSeconds(120);
_rateLimiter = new SemaphoreSlim(RateLimitPerMinute, RateLimitPerMinute);
}
public async Task<OcrResult> ExtractTextFromFileAsync(
string filePath,
string language = "eng",
CancellationToken cancellationToken = default)
{
if (!File.Exists(filePath))
throw new FileNotFoundException("Image file not found", filePath);
var fileInfo = new FileInfo(filePath);
if (fileInfo.Length > MaxFileSizeFree)
{
throw new InvalidOperationException(
$"File size {fileInfo.Length / 1024 / 1024}MB exceeds free tier limit of 5MB.");
}
await _rateLimiter.WaitAsync(cancellationToken);
try
{
// Document leaves your infrastructure here
byte[] imageBytes = await File.ReadAllBytesAsync(filePath, cancellationToken);
string base64Image = Convert.ToBase64String(imageBytes);
string mimeType = GetMimeType(filePath);
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:{mimeType};base64,{base64Image}"),
new KeyValuePair<string, string>("language", language),
new KeyValuePair<string, string>("isOverlayRequired", "false"),
new KeyValuePair<string, string>("filetype", Path.GetExtension(filePath).TrimStart('.')),
new KeyValuePair<string, string>("detectOrientation", "true"),
new KeyValuePair<string, string>("scale", "true"),
new KeyValuePair<string, string>("OCREngine", "2")
});
var response = await _httpClient.PostAsync(ApiEndpoint, formContent, cancellationToken);
if (!response.IsSuccessStatusCode)
{
string errorBody = await response.Content.ReadAsStringAsync(cancellationToken);
throw new OcrSpaceException(
$"OCR.space API returned {response.StatusCode}: {errorBody}",
(int)response.StatusCode);
}
string jsonResponse = await response.Content.ReadAsStringAsync(cancellationToken);
return ParseOcrResponse(jsonResponse);
}
finally
{
_ = Task.Run(async () =>
{
await Task.Delay(60000 / RateLimitPerMinute);
_rateLimiter.Release();
});
}
}
private OcrResult ParseOcrResponse(string jsonResponse)
{
using var doc = JsonDocument.Parse(jsonResponse);
var root = doc.RootElement;
if (root.TryGetProperty("IsErroredOnProcessing", out var isErrored) && isErrored.GetBoolean())
{
string errorMessage = "OCR processing failed";
if (root.TryGetProperty("ErrorMessage", out var errorMessages))
{
// Parse the array of error strings manually
var messages = new List<string>();
if (errorMessages.ValueKind == JsonValueKind.Array)
{
foreach (var msg in errorMessages.EnumerateArray())
messages.Add(msg.GetString() ?? "Unknown error");
}
errorMessage = string.Join("; ", messages);
}
throw new OcrSpaceException(errorMessage, 500);
}
var result = new OcrResult();
if (root.TryGetProperty("ParsedResults", out var parsedResults))
{
foreach (var parsedResult in parsedResults.EnumerateArray())
{
if (parsedResult.TryGetProperty("ParsedText", out var parsedText))
result.ParsedText += parsedText.GetString();
if (parsedResult.TryGetProperty("FileParseExitCode", out var exitCode))
result.ExitCodes.Add(exitCode.GetInt32());
}
}
return result;
}
private string GetMimeType(string filePath) =>
Path.GetExtension(filePath).ToLowerInvariant() switch
{
".png" => "image/png",
".jpg" or ".jpeg" => "image/jpeg",
".bmp" => "image/bmp",
".tiff" or ".tif" => "image/tiff",
".pdf" => "application/pdf",
_ => "application/octet-stream"
};
public void Dispose()
{
_httpClient?.Dispose();
_rateLimiter?.Dispose();
}
}
// Custom models — no SDK means you define these yourself
public class OcrResult
{
public string ParsedText { get; set; } = string.Empty;
public List<string> Warnings { get; } = new();
public List<int> ExitCodes { get; } = new();
}
public class OcrSpaceException : Exception
{
public int StatusCode { get; }
public OcrSpaceException(string message, int statusCode) : base(message)
=> StatusCode = statusCode;
}
// OCR.space: what you build before the first line of your actual application
public class OcrSpaceApiClient : IDisposable
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
private readonly SemaphoreSlim _rateLimiter;
private const string ApiEndpoint = "https://api.ocr.space/parse/image";
private const int MaxFileSizeFree = 5 * 1024 * 1024; // 5MB — hard limit on free tier
private const int RateLimitPerMinute = 60;
public OcrSpaceApiClient(string apiKey)
{
_apiKey = apiKey ?? throw new ArgumentNullException(nameof(apiKey));
_httpClient = new HttpClient();
_httpClient.Timeout = TimeSpan.FromSeconds(120);
_rateLimiter = new SemaphoreSlim(RateLimitPerMinute, RateLimitPerMinute);
}
public async Task<OcrResult> ExtractTextFromFileAsync(
string filePath,
string language = "eng",
CancellationToken cancellationToken = default)
{
if (!File.Exists(filePath))
throw new FileNotFoundException("Image file not found", filePath);
var fileInfo = new FileInfo(filePath);
if (fileInfo.Length > MaxFileSizeFree)
{
throw new InvalidOperationException(
$"File size {fileInfo.Length / 1024 / 1024}MB exceeds free tier limit of 5MB.");
}
await _rateLimiter.WaitAsync(cancellationToken);
try
{
// Document leaves your infrastructure here
byte[] imageBytes = await File.ReadAllBytesAsync(filePath, cancellationToken);
string base64Image = Convert.ToBase64String(imageBytes);
string mimeType = GetMimeType(filePath);
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:{mimeType};base64,{base64Image}"),
new KeyValuePair<string, string>("language", language),
new KeyValuePair<string, string>("isOverlayRequired", "false"),
new KeyValuePair<string, string>("filetype", Path.GetExtension(filePath).TrimStart('.')),
new KeyValuePair<string, string>("detectOrientation", "true"),
new KeyValuePair<string, string>("scale", "true"),
new KeyValuePair<string, string>("OCREngine", "2")
});
var response = await _httpClient.PostAsync(ApiEndpoint, formContent, cancellationToken);
if (!response.IsSuccessStatusCode)
{
string errorBody = await response.Content.ReadAsStringAsync(cancellationToken);
throw new OcrSpaceException(
$"OCR.space API returned {response.StatusCode}: {errorBody}",
(int)response.StatusCode);
}
string jsonResponse = await response.Content.ReadAsStringAsync(cancellationToken);
return ParseOcrResponse(jsonResponse);
}
finally
{
_ = Task.Run(async () =>
{
await Task.Delay(60000 / RateLimitPerMinute);
_rateLimiter.Release();
});
}
}
private OcrResult ParseOcrResponse(string jsonResponse)
{
using var doc = JsonDocument.Parse(jsonResponse);
var root = doc.RootElement;
if (root.TryGetProperty("IsErroredOnProcessing", out var isErrored) && isErrored.GetBoolean())
{
string errorMessage = "OCR processing failed";
if (root.TryGetProperty("ErrorMessage", out var errorMessages))
{
// Parse the array of error strings manually
var messages = new List<string>();
if (errorMessages.ValueKind == JsonValueKind.Array)
{
foreach (var msg in errorMessages.EnumerateArray())
messages.Add(msg.GetString() ?? "Unknown error");
}
errorMessage = string.Join("; ", messages);
}
throw new OcrSpaceException(errorMessage, 500);
}
var result = new OcrResult();
if (root.TryGetProperty("ParsedResults", out var parsedResults))
{
foreach (var parsedResult in parsedResults.EnumerateArray())
{
if (parsedResult.TryGetProperty("ParsedText", out var parsedText))
result.ParsedText += parsedText.GetString();
if (parsedResult.TryGetProperty("FileParseExitCode", out var exitCode))
result.ExitCodes.Add(exitCode.GetInt32());
}
}
return result;
}
private string GetMimeType(string filePath) =>
Path.GetExtension(filePath).ToLowerInvariant() switch
{
".png" => "image/png",
".jpg" or ".jpeg" => "image/jpeg",
".bmp" => "image/bmp",
".tiff" or ".tif" => "image/tiff",
".pdf" => "application/pdf",
_ => "application/octet-stream"
};
public void Dispose()
{
_httpClient?.Dispose();
_rateLimiter?.Dispose();
}
}
// Custom models — no SDK means you define these yourself
public class OcrResult
{
public string ParsedText { get; set; } = string.Empty;
public List<string> Warnings { get; } = new();
public List<int> ExitCodes { get; } = new();
}
public class OcrSpaceException : Exception
{
public int StatusCode { get; }
public OcrSpaceException(string message, int statusCode) : base(message)
=> StatusCode = statusCode;
}
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading
Imports System.Threading.Tasks
' OCR.space: what you build before the first line of your actual application
Public Class OcrSpaceApiClient
Implements IDisposable
Private ReadOnly _httpClient As HttpClient
Private ReadOnly _apiKey As String
Private ReadOnly _rateLimiter As SemaphoreSlim
Private Const ApiEndpoint As String = "https://api.ocr.space/parse/image"
Private Const MaxFileSizeFree As Integer = 5 * 1024 * 1024 ' 5MB — hard limit on free tier
Private Const RateLimitPerMinute As Integer = 60
Public Sub New(apiKey As String)
_apiKey = If(apiKey, Throw New ArgumentNullException(NameOf(apiKey)))
_httpClient = New HttpClient()
_httpClient.Timeout = TimeSpan.FromSeconds(120)
_rateLimiter = New SemaphoreSlim(RateLimitPerMinute, RateLimitPerMinute)
End Sub
Public Async Function ExtractTextFromFileAsync(filePath As String, Optional language As String = "eng", Optional cancellationToken As CancellationToken = Nothing) As Task(Of OcrResult)
If Not File.Exists(filePath) Then
Throw New FileNotFoundException("Image file not found", filePath)
End If
Dim fileInfo = New FileInfo(filePath)
If fileInfo.Length > MaxFileSizeFree Then
Throw New InvalidOperationException($"File size {fileInfo.Length \ 1024 \ 1024}MB exceeds free tier limit of 5MB.")
End If
Await _rateLimiter.WaitAsync(cancellationToken)
Try
' Document leaves your infrastructure here
Dim imageBytes As Byte() = Await File.ReadAllBytesAsync(filePath, cancellationToken)
Dim base64Image As String = Convert.ToBase64String(imageBytes)
Dim mimeType As String = GetMimeType(filePath)
Dim formContent = New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:{mimeType};base64,{base64Image}"),
New KeyValuePair(Of String, String)("language", language),
New KeyValuePair(Of String, String)("isOverlayRequired", "false"),
New KeyValuePair(Of String, String)("filetype", Path.GetExtension(filePath).TrimStart("."c)),
New KeyValuePair(Of String, String)("detectOrientation", "true"),
New KeyValuePair(Of String, String)("scale", "true"),
New KeyValuePair(Of String, String)("OCREngine", "2")
})
Dim response = Await _httpClient.PostAsync(ApiEndpoint, formContent, cancellationToken)
If Not response.IsSuccessStatusCode Then
Dim errorBody = Await response.Content.ReadAsStringAsync(cancellationToken)
Throw New OcrSpaceException($"OCR.space API returned {response.StatusCode}: {errorBody}", CInt(response.StatusCode))
End If
Dim jsonResponse = Await response.Content.ReadAsStringAsync(cancellationToken)
Return ParseOcrResponse(jsonResponse)
Finally
_ = Task.Run(Async Function()
Await Task.Delay(60000 \ RateLimitPerMinute)
_rateLimiter.Release()
End Function)
End Try
End Function
Private Function ParseOcrResponse(jsonResponse As String) As OcrResult
Using doc = JsonDocument.Parse(jsonResponse)
Dim root = doc.RootElement
If root.TryGetProperty("IsErroredOnProcessing", ByRef isErrored) AndAlso isErrored.GetBoolean() Then
Dim errorMessage As String = "OCR processing failed"
If root.TryGetProperty("ErrorMessage", ByRef errorMessages) Then
' Parse the array of error strings manually
Dim messages = New List(Of String)()
If errorMessages.ValueKind = JsonValueKind.Array Then
For Each msg In errorMessages.EnumerateArray()
messages.Add(msg.GetString() OrElse "Unknown error")
Next
End If
errorMessage = String.Join("; ", messages)
End If
Throw New OcrSpaceException(errorMessage, 500)
End If
Dim result = New OcrResult()
If root.TryGetProperty("ParsedResults", ByRef parsedResults) Then
For Each parsedResult In parsedResults.EnumerateArray()
If parsedResult.TryGetProperty("ParsedText", ByRef parsedText) Then
result.ParsedText &= parsedText.GetString()
End If
If parsedResult.TryGetProperty("FileParseExitCode", ByRef exitCode) Then
result.ExitCodes.Add(exitCode.GetInt32())
End If
Next
End If
Return result
End Using
End Function
Private Function GetMimeType(filePath As String) As String
Return Path.GetExtension(filePath).ToLowerInvariant() Select Case
Case ".png" : Return "image/png"
Case ".jpg", ".jpeg" : Return "image/jpeg"
Case ".bmp" : Return "image/bmp"
Case ".tiff", ".tif" : Return "image/tiff"
Case ".pdf" : Return "application/pdf"
Case Else : Return "application/octet-stream"
End Select
End Function
Public Sub Dispose() Implements IDisposable.Dispose
_httpClient?.Dispose()
_rateLimiter?.Dispose()
End Sub
End Class
' Custom models — no SDK means you define these yourself
Public Class OcrResult
Public Property ParsedText As String = String.Empty
Public ReadOnly Property Warnings As New List(Of String)()
Public ReadOnly Property ExitCodes As New List(Of Integer)()
End Class
Public Class OcrSpaceException
Inherits Exception
Public ReadOnly Property StatusCode As Integer
Public Sub New(message As String, statusCode As Integer)
MyBase.New(message)
Me.StatusCode = statusCode
End Sub
End Class
This is 90+ lines of infrastructure code that exists before the first business-logic method. Every OCR.space integration ships this boilerplate (or something equivalent), and every team that writes it is solving the same problem OCR.space chose not to solve for them.
Understanding IronOCR
IronOCR is a commercial OCR library for .NET, installed via the IronOcr NuGet package. It wraps an optimized Tesseract 5 engine with automatic preprocessing, native PDF input, searchable PDF output, 125+ language packs, and structured result extraction — all without cloud dependency, API keys, or per-request charges. Documents are processed locally on whatever infrastructure runs the .NET application.
Key characteristics:
- Single NuGet package:
dotnet add package IronOcris the complete installation. No native binaries to manage, no tessdata directories, no environment variables. - Strongly-typed API:
IronTesseract,OcrInput,OcrResult, andCropRectangleare first-class .NET types with full IntelliSense support. - Automatic preprocessing: Deskew, DeNoise, Contrast, Binarize, and EnhanceResolution filters apply automatically or on demand without external libraries.
- Native PDF support: Both reading scanned PDFs and generating searchable PDFs are built-in. No size limits beyond available memory.
- Local execution: No network call occurs during OCR. The library runs entirely in-process. Air-gapped environments work without configuration changes.
- Thread-safe: Multiple
IronTesseractinstances run concurrently without locking or contention management. - Perpetual licensing: $999 Lite / $1,499 Plus / $2,999 Professional. No per-request charges at any volume.
Feature Comparison
| Feature | OCR.space | IronOCR |
|---|---|---|
| NuGet package | None — REST API only | IronOcr — full .NET support |
| Processing location | OCR.space cloud servers | Local — your infrastructure |
| Free tier | 25,000 req/month (limited) | Trial available |
| Paid pricing | Contact OCR.space for current pricing | $999–$2,999 one-time perpetual |
| Rate limiting | 60 req/min, 500/day (free) | None |
| Offline capability | No | Yes — fully air-gapped |
| Data privacy | Documents sent to third party | Documents stay on your servers |
Detailed Feature Comparison
| Feature | OCR.space | IronOCR |
|---|---|---|
| Integration | ||
| NuGet package | None | Yes (IronOcr) |
| Strongly-typed models | None — manual JSON parsing | Yes (OcrResult, OcrInput) |
| IntelliSense support | None | Full |
| Custom exception types | Must define yourself | Built-in |
| Retry logic | Must build yourself | Built-in |
| Processing | ||
| Processing location | Cloud servers | Local in-process |
| Internet required | Yes — every call | No |
| Air-gapped support | No | Yes |
| Rate limits | Yes — all plans | None |
| File size limits | 5 MB (free tier) | Memory only |
| OCR Capability | ||
| Image OCR | Yes | Yes |
| PDF OCR | Yes (with limits) | Yes (native, no size limit) |
| Searchable PDF output | Watermarked on free tier | Clean output, no watermarks |
| Automatic preprocessing | None (server-side only) | Deskew, DeNoise, Contrast, Binarize, EnhanceResolution |
| Manual preprocessing control | None | Full filter API |
| Language support | ~25 languages | 125+ via NuGet language packs |
| Multi-language per document | Limited | Yes — primary + secondary languages |
| Structured output (words/lines/pages) | No — plain text only | Yes — pages, paragraphs, lines, words with coordinates |
| Confidence scores | No | Yes — per result and per word |
| Region-based OCR | No | Yes — CropRectangle |
| Barcode reading during OCR | No | Yes — ReadBarCodes = true |
| Deployment | ||
| Docker | Requires outbound internet | Works out of the box |
| Linux | Requires outbound internet | Fully supported |
| HIPAA-compatible | Risk — documents leave your control | Yes — no external transmission |
| GDPR-compatible | Risk — EU data transfer unclear | Yes — no third-party data handling |
| Pricing | ||
| Pricing model | Monthly subscription | One-time perpetual license |
| Entry price | Contact OCR.space for current pricing | $999 one-time |
| Per-document cost at scale | Yes | None |
| SLA | None (free), varies (paid) | Enterprise SLA available |
SDK Depth vs. Raw HTTP: The Integration Cost
The gap between OCR.space and a native SDK is not a style preference. It is measured in hours of development time, surface area for bugs, and maintenance burden across every deployment.
OCR.space Approach
Every .NET developer using OCR.space writes their own HTTP client. The simplest possible implementation — basic text extraction with no retry logic, no batch support, and no preprocessing — is still 50+ lines:
// OCR.space: minimum viable extraction — 50+ lines before business logic
public class OcrSpaceBasicExtraction
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
public OcrSpaceBasicExtraction(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
}
public async Task<string> ExtractText(string imagePath)
{
// Read file and encode — document leaves your infrastructure
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng")
});
// Send to cloud
var response = await _httpClient.PostAsync(
"https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
throw new Exception($"API error: {response.StatusCode}");
// Parse JSON manually — no typed models
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
}
// OCR.space: minimum viable extraction — 50+ lines before business logic
public class OcrSpaceBasicExtraction
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
public OcrSpaceBasicExtraction(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
}
public async Task<string> ExtractText(string imagePath)
{
// Read file and encode — document leaves your infrastructure
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng")
});
// Send to cloud
var response = await _httpClient.PostAsync(
"https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
throw new Exception($"API error: {response.StatusCode}");
// Parse JSON manually — no typed models
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
}
Imports System
Imports System.Net.Http
Imports System.Threading.Tasks
Imports System.IO
Imports System.Text.Json
Public Class OcrSpaceBasicExtraction
Private ReadOnly _httpClient As HttpClient
Private ReadOnly _apiKey As String
Public Sub New(apiKey As String)
_apiKey = apiKey
_httpClient = New HttpClient()
End Sub
Public Async Function ExtractText(imagePath As String) As Task(Of String)
' Read file and encode — document leaves your infrastructure
Dim imageBytes As Byte() = File.ReadAllBytes(imagePath)
Dim base64 As String = Convert.ToBase64String(imageBytes)
Dim content = New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}"),
New KeyValuePair(Of String, String)("language", "eng")
})
' Send to cloud
Dim response = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
If Not response.IsSuccessStatusCode Then
Throw New Exception($"API error: {response.StatusCode}")
End If
' Parse JSON manually — no typed models
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Return doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString() OrElse String.Empty
End Using
End Function
End Class
Add batch processing and the line count climbs to 80+, because every batch request needs rate-limit throttling, per-request retry logic with exponential backoff, and explicit handling for HTTP 429 responses. Add PDF processing and you layer on a 5 MB file size check before every call, because the free tier will reject larger files with an error rather than a warning.
IronOCR Approach
Installing IronOCR from NuGet replaces all of that infrastructure code with method calls that already exist:
// IronOCR: complete implementation
var result = new IronTesseract().Read("invoice.png");
Console.WriteLine(result.Text);
// IronOCR: complete implementation
var result = new IronTesseract().Read("invoice.png");
Console.WriteLine(result.Text);
The IronTesseract class handles file reading, preprocessing, engine execution, and result construction. No HTTP client. No base64 encoding. No JSON parsing. No API key. No rate limiter. The reading text from images tutorial covers input variations — file paths, byte arrays, streams, bitmaps — that all follow the same one-call pattern.
The code complexity numbers from the source files are not hypothetical:
| Scenario | OCR.space Lines | IronOCR Lines |
|---|---|---|
| Basic extraction | 50+ | 3 |
| PDF processing | 60+ | 12 |
| Batch processing | 80+ | 15 |
| Error handling | 70+ | 15 |
| Full client infrastructure | 200+ | 0 (NuGet provides it) |
Batch Processing and Rate Limits
The rate-limit problem is where OCR.space's free tier most visibly breaks down under real production conditions.
OCR.space Approach
Batch processing under OCR.space requires building and managing a SemaphoreSlim to stay within the 60 requests-per-minute constraint, plus exponential-backoff retry logic for HTTP 429 responses that arrive when the semaphore logic is imprecise or when shared infrastructure shares an IP address:
// OCR.space batch: 80+ lines of rate-limit plumbing before actual work
public class OcrSpaceBatchProcessing
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
private readonly SemaphoreSlim _rateLimiter;
public OcrSpaceBatchProcessing(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
_rateLimiter = new SemaphoreSlim(60, 60); // Free tier: 60/minute
}
public async Task<Dictionary<string, string>> ProcessBatch(string[] imagePaths)
{
var results = new Dictionary<string, string>();
foreach (var imagePath in imagePaths)
{
await _rateLimiter.WaitAsync();
try
{
string text = await ProcessWithRetry(imagePath);
results[imagePath] = text;
}
finally
{
_ = Task.Run(async () =>
{
await Task.Delay(1000); // 1 second spacing
_rateLimiter.Release();
});
}
}
return results;
}
private async Task<string> ProcessWithRetry(string imagePath, int maxRetries = 3)
{
for (int attempt = 0; attempt < maxRetries; attempt++)
{
try
{
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng")
});
var response = await _httpClient.PostAsync(
"https://api.ocr.space/parse/image", content);
if (response.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
{
// Rate limited — exponential backoff
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
continue;
}
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
catch (HttpRequestException) when (attempt < maxRetries - 1)
{
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
}
}
throw new Exception($"Failed to process {imagePath} after {maxRetries} attempts");
}
}
// OCR.space batch: 80+ lines of rate-limit plumbing before actual work
public class OcrSpaceBatchProcessing
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
private readonly SemaphoreSlim _rateLimiter;
public OcrSpaceBatchProcessing(string apiKey)
{
_apiKey = apiKey;
_httpClient = new HttpClient();
_rateLimiter = new SemaphoreSlim(60, 60); // Free tier: 60/minute
}
public async Task<Dictionary<string, string>> ProcessBatch(string[] imagePaths)
{
var results = new Dictionary<string, string>();
foreach (var imagePath in imagePaths)
{
await _rateLimiter.WaitAsync();
try
{
string text = await ProcessWithRetry(imagePath);
results[imagePath] = text;
}
finally
{
_ = Task.Run(async () =>
{
await Task.Delay(1000); // 1 second spacing
_rateLimiter.Release();
});
}
}
return results;
}
private async Task<string> ProcessWithRetry(string imagePath, int maxRetries = 3)
{
for (int attempt = 0; attempt < maxRetries; attempt++)
{
try
{
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng")
});
var response = await _httpClient.PostAsync(
"https://api.ocr.space/parse/image", content);
if (response.StatusCode == System.Net.HttpStatusCode.TooManyRequests)
{
// Rate limited — exponential backoff
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
continue;
}
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
catch (HttpRequestException) when (attempt < maxRetries - 1)
{
await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
}
}
throw new Exception($"Failed to process {imagePath} after {maxRetries} attempts");
}
}
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading
Imports System.Threading.Tasks
Public Class OcrSpaceBatchProcessing
Private ReadOnly _httpClient As HttpClient
Private ReadOnly _apiKey As String
Private ReadOnly _rateLimiter As SemaphoreSlim
Public Sub New(apiKey As String)
_apiKey = apiKey
_httpClient = New HttpClient()
_rateLimiter = New SemaphoreSlim(60, 60) ' Free tier: 60/minute
End Sub
Public Async Function ProcessBatch(imagePaths As String()) As Task(Of Dictionary(Of String, String))
Dim results As New Dictionary(Of String, String)()
For Each imagePath In imagePaths
Await _rateLimiter.WaitAsync()
Try
Dim text As String = Await ProcessWithRetry(imagePath)
results(imagePath) = text
Finally
_ = Task.Run(Async Function()
Await Task.Delay(1000) ' 1 second spacing
_rateLimiter.Release()
End Function)
End Try
Next
Return results
End Function
Private Async Function ProcessWithRetry(imagePath As String, Optional maxRetries As Integer = 3) As Task(Of String)
For attempt As Integer = 0 To maxRetries - 1
Try
Dim imageBytes As Byte() = File.ReadAllBytes(imagePath)
Dim base64 As String = Convert.ToBase64String(imageBytes)
Dim content As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}"),
New KeyValuePair(Of String, String)("language", "eng")
})
Dim response As HttpResponseMessage = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
If response.StatusCode = System.Net.HttpStatusCode.TooManyRequests Then
' Rate limited — exponential backoff
Await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)))
Continue For
End If
response.EnsureSuccessStatusCode()
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Return doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString() OrElse String.Empty
End Using
Catch ex As HttpRequestException When attempt < maxRetries - 1
Await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)))
End Try
Next
Throw New Exception($"Failed to process {imagePath} after {maxRetries} attempts")
End Function
End Class
The 500 requests-per-day cap on the free tier further constrains this. An application processing 600 documents in a single day will fail partway through with no automatic rollover until the next calendar day.
IronOCR Approach
IronOCR has no rate limits. Batch processing is a loop:
// IronOCR batch: 8 lines, no rate limits, no retries needed
public Dictionary<string, string> ProcessBatch(string[] imagePaths)
{
var results = new Dictionary<string, string>();
var ocr = new IronTesseract();
foreach (var imagePath in imagePaths)
{
// No rate limits, no retries, no cloud dependency
var result = ocr.Read(imagePath);
results[imagePath] = result.Text;
}
return results;
}
// IronOCR batch: 8 lines, no rate limits, no retries needed
public Dictionary<string, string> ProcessBatch(string[] imagePaths)
{
var results = new Dictionary<string, string>();
var ocr = new IronTesseract();
foreach (var imagePath in imagePaths)
{
// No rate limits, no retries, no cloud dependency
var result = ocr.Read(imagePath);
results[imagePath] = result.Text;
}
return results;
}
Imports System.Collections.Generic
' IronOCR batch: 8 lines, no rate limits, no retries needed
Public Function ProcessBatch(imagePaths As String()) As Dictionary(Of String, String)
Dim results As New Dictionary(Of String, String)()
Dim ocr As New IronTesseract()
For Each imagePath As String In imagePaths
' No rate limits, no retries, no cloud dependency
Dim result = ocr.Read(imagePath)
results(imagePath) = result.Text
Next
Return results
End Function
Reusing the IronTesseract instance across the batch avoids repeated engine initialization overhead. Processing speed is bounded by local CPU and disk I/O, not by a remote API's throttle policy. The multithreading example shows how to parallelize across cores when throughput matters.
PDF Processing
PDF support illustrates the second major structural gap.
OCR.space Approach
OCR.space accepts PDF uploads on paid plans and partially on the free tier, but the 5 MB file size ceiling on free accounts excludes most multi-page documents. Free-tier searchable PDF output embeds a watermark. The implementation requires per-page result extraction and explicit file size checking before every request:
// OCR.space PDF: size check required, watermarks on free tier
public async Task<List<string>> ExtractTextFromPdf(string pdfPath)
{
var pageTexts = new List<string>();
// Reject before wasting a request quota entry
var fileInfo = new FileInfo(pdfPath);
if (fileInfo.Length > 5 * 1024 * 1024)
throw new InvalidOperationException("File exceeds 5MB free tier limit. Upgrade or split PDF.");
byte[] pdfBytes = File.ReadAllBytes(pdfPath);
string base64 = Convert.ToBase64String(pdfBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:application/pdf;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng"),
new KeyValuePair<string, string>("filetype", "PDF"),
new KeyValuePair<string, string>("isCreateSearchablePdf", "false") // Watermarked on free tier
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
throw new Exception($"API error: {response.StatusCode}");
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
if (doc.RootElement.TryGetProperty("ParsedResults", out var results))
{
foreach (var result in results.EnumerateArray())
{
if (result.TryGetProperty("ParsedText", out var text))
pageTexts.Add(text.GetString() ?? string.Empty);
}
}
return pageTexts;
}
// OCR.space PDF: size check required, watermarks on free tier
public async Task<List<string>> ExtractTextFromPdf(string pdfPath)
{
var pageTexts = new List<string>();
// Reject before wasting a request quota entry
var fileInfo = new FileInfo(pdfPath);
if (fileInfo.Length > 5 * 1024 * 1024)
throw new InvalidOperationException("File exceeds 5MB free tier limit. Upgrade or split PDF.");
byte[] pdfBytes = File.ReadAllBytes(pdfPath);
string base64 = Convert.ToBase64String(pdfBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:application/pdf;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng"),
new KeyValuePair<string, string>("filetype", "PDF"),
new KeyValuePair<string, string>("isCreateSearchablePdf", "false") // Watermarked on free tier
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
throw new Exception($"API error: {response.StatusCode}");
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
if (doc.RootElement.TryGetProperty("ParsedResults", out var results))
{
foreach (var result in results.EnumerateArray())
{
if (result.TryGetProperty("ParsedText", out var text))
pageTexts.Add(text.GetString() ?? string.Empty);
}
}
return pageTexts;
}
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class OCRSpaceClient
Private ReadOnly _apiKey As String
Private ReadOnly _httpClient As HttpClient
Public Sub New(apiKey As String, httpClient As HttpClient)
_apiKey = apiKey
_httpClient = httpClient
End Sub
Public Async Function ExtractTextFromPdf(pdfPath As String) As Task(Of List(Of String))
Dim pageTexts As New List(Of String)()
' Reject before wasting a request quota entry
Dim fileInfo As New FileInfo(pdfPath)
If fileInfo.Length > 5 * 1024 * 1024 Then
Throw New InvalidOperationException("File exceeds 5MB free tier limit. Upgrade or split PDF.")
End If
Dim pdfBytes As Byte() = File.ReadAllBytes(pdfPath)
Dim base64 As String = Convert.ToBase64String(pdfBytes)
Dim content As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:application/pdf;base64,{base64}"),
New KeyValuePair(Of String, String)("language", "eng"),
New KeyValuePair(Of String, String)("filetype", "PDF"),
New KeyValuePair(Of String, String)("isCreateSearchablePdf", "false") ' Watermarked on free tier
})
Dim response As HttpResponseMessage = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
If Not response.IsSuccessStatusCode Then
Throw New Exception($"API error: {response.StatusCode}")
End If
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
If doc.RootElement.TryGetProperty("ParsedResults", results) Then
For Each result In results.EnumerateArray()
If result.TryGetProperty("ParsedText", text) Then
pageTexts.Add(If(text.GetString(), String.Empty))
End If
Next
End If
End Using
Return pageTexts
End Function
End Class
IronOCR Approach
IronOCR reads PDFs natively. The limit is available memory, not an arbitrary file size threshold. Searchable PDF output produces clean files with no watermarks, on any license tier:
// IronOCR PDF: native support, no size limit, no watermarks
public List<string> ExtractTextFromPdf(string pdfPath)
{
var pageTexts = new List<string>();
using var input = new OcrInput();
input.LoadPdf(pdfPath); // No 5MB ceiling
var result = new IronTesseract().Read(input);
foreach (var page in result.Pages)
pageTexts.Add(page.Text);
return pageTexts;
}
// Generate searchable PDF — no watermarks
public void CreateSearchablePdf(string inputPath, string outputPath)
{
var result = new IronTesseract().Read(inputPath);
result.SaveAsSearchablePdf(outputPath);
}
// IronOCR PDF: native support, no size limit, no watermarks
public List<string> ExtractTextFromPdf(string pdfPath)
{
var pageTexts = new List<string>();
using var input = new OcrInput();
input.LoadPdf(pdfPath); // No 5MB ceiling
var result = new IronTesseract().Read(input);
foreach (var page in result.Pages)
pageTexts.Add(page.Text);
return pageTexts;
}
// Generate searchable PDF — no watermarks
public void CreateSearchablePdf(string inputPath, string outputPath)
{
var result = new IronTesseract().Read(inputPath);
result.SaveAsSearchablePdf(outputPath);
}
Imports System.Collections.Generic
' IronOCR PDF: native support, no size limit, no watermarks
Public Function ExtractTextFromPdf(pdfPath As String) As List(Of String)
Dim pageTexts As New List(Of String)()
Using input As New OcrInput()
input.LoadPdf(pdfPath) ' No 5MB ceiling
Dim result = New IronTesseract().Read(input)
For Each page In result.Pages
pageTexts.Add(page.Text)
Next
End Using
Return pageTexts
End Function
' Generate searchable PDF — no watermarks
Public Sub CreateSearchablePdf(inputPath As String, outputPath As String)
Dim result = New IronTesseract().Read(inputPath)
result.SaveAsSearchablePdf(outputPath)
End Sub
Password-protected PDFs are a single parameter: input.LoadPdf("encrypted.pdf", Password: "secret"). The PDF OCR how-to guide covers page range selection and password handling in detail. For searchable PDF generation patterns, see the searchable PDF how-to.
Error Handling
OCR.space delivers errors through two separate channels: HTTP status codes and a JSON IsErroredOnProcessing flag. Production code must check both, parse the JSON error array when IsErroredOnProcessing is true, and inspect per-page FileParseExitCode values for partial failures. None of this is typed — everything is string parsing from JsonDocument:
// OCR.space: two error layers, all string-based
public async Task<string> SafeExtract(HttpClient client, string apiKey, string imagePath)
{
try
{
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
});
var response = await client.PostAsync("https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
{
switch (response.StatusCode)
{
case System.Net.HttpStatusCode.Unauthorized:
throw new Exception("Invalid API key");
case System.Net.HttpStatusCode.TooManyRequests:
throw new Exception("Rate limit exceeded — wait and retry");
case System.Net.HttpStatusCode.PaymentRequired:
throw new Exception("Quota exceeded — upgrade plan");
default:
throw new Exception($"API error: {response.StatusCode}");
}
}
// Second error layer: JSON-level failures
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
if (doc.RootElement.TryGetProperty("IsErroredOnProcessing", out var isError)
&& isError.GetBoolean())
{
var messages = new List<string>();
if (doc.RootElement.TryGetProperty("ErrorMessage", out var errors))
{
foreach (var e in errors.EnumerateArray())
messages.Add(e.GetString() ?? "Unknown error");
}
throw new Exception(string.Join("; ", messages));
}
// Third layer: per-page exit codes
if (doc.RootElement.TryGetProperty("ParsedResults", out var results))
{
foreach (var result in results.EnumerateArray())
{
if (result.TryGetProperty("FileParseExitCode", out var exitCode)
&& exitCode.GetInt32() != 1)
throw new Exception($"Page parse failed: exit code {exitCode.GetInt32()}");
}
}
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
catch (JsonException ex)
{
throw new Exception($"Invalid API response: {ex.Message}");
}
catch (HttpRequestException ex)
{
throw new Exception($"Network error: {ex.Message}");
}
}
// OCR.space: two error layers, all string-based
public async Task<string> SafeExtract(HttpClient client, string apiKey, string imagePath)
{
try
{
byte[] imageBytes = File.ReadAllBytes(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
});
var response = await client.PostAsync("https://api.ocr.space/parse/image", content);
if (!response.IsSuccessStatusCode)
{
switch (response.StatusCode)
{
case System.Net.HttpStatusCode.Unauthorized:
throw new Exception("Invalid API key");
case System.Net.HttpStatusCode.TooManyRequests:
throw new Exception("Rate limit exceeded — wait and retry");
case System.Net.HttpStatusCode.PaymentRequired:
throw new Exception("Quota exceeded — upgrade plan");
default:
throw new Exception($"API error: {response.StatusCode}");
}
}
// Second error layer: JSON-level failures
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
if (doc.RootElement.TryGetProperty("IsErroredOnProcessing", out var isError)
&& isError.GetBoolean())
{
var messages = new List<string>();
if (doc.RootElement.TryGetProperty("ErrorMessage", out var errors))
{
foreach (var e in errors.EnumerateArray())
messages.Add(e.GetString() ?? "Unknown error");
}
throw new Exception(string.Join("; ", messages));
}
// Third layer: per-page exit codes
if (doc.RootElement.TryGetProperty("ParsedResults", out var results))
{
foreach (var result in results.EnumerateArray())
{
if (result.TryGetProperty("FileParseExitCode", out var exitCode)
&& exitCode.GetInt32() != 1)
throw new Exception($"Page parse failed: exit code {exitCode.GetInt32()}");
}
}
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
catch (JsonException ex)
{
throw new Exception($"Invalid API response: {ex.Message}");
}
catch (HttpRequestException ex)
{
throw new Exception($"Network error: {ex.Message}");
}
}
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class OCRSpace
Public Async Function SafeExtract(client As HttpClient, apiKey As String, imagePath As String) As Task(Of String)
Try
Dim imageBytes As Byte() = File.ReadAllBytes(imagePath)
Dim base64 As String = Convert.ToBase64String(imageBytes)
Dim content As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}")
})
Dim response As HttpResponseMessage = Await client.PostAsync("https://api.ocr.space/parse/image", content)
If Not response.IsSuccessStatusCode Then
Select Case response.StatusCode
Case System.Net.HttpStatusCode.Unauthorized
Throw New Exception("Invalid API key")
Case System.Net.HttpStatusCode.TooManyRequests
Throw New Exception("Rate limit exceeded — wait and retry")
Case System.Net.HttpStatusCode.PaymentRequired
Throw New Exception("Quota exceeded — upgrade plan")
Case Else
Throw New Exception($"API error: {response.StatusCode}")
End Select
End If
' Second error layer: JSON-level failures
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
If doc.RootElement.TryGetProperty("IsErroredOnProcessing", ByRef isError) AndAlso isError.GetBoolean() Then
Dim messages As New List(Of String)()
If doc.RootElement.TryGetProperty("ErrorMessage", ByRef errors) Then
For Each e In errors.EnumerateArray()
messages.Add(e.GetString() ?? "Unknown error")
Next
End If
Throw New Exception(String.Join("; ", messages))
End If
' Third layer: per-page exit codes
If doc.RootElement.TryGetProperty("ParsedResults", ByRef results) Then
For Each result In results.EnumerateArray()
If result.TryGetProperty("FileParseExitCode", ByRef exitCode) AndAlso exitCode.GetInt32() <> 1 Then
Throw New Exception($"Page parse failed: exit code {exitCode.GetInt32()}")
End If
Next
End If
Return doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString() ?? String.Empty
End Using
Catch ex As JsonException
Throw New Exception($"Invalid API response: {ex.Message}")
Catch ex As HttpRequestException
Throw New Exception($"Network error: {ex.Message}")
End Try
End Function
End Class
IronOCR raises standard .NET exceptions with clear messages. No JSON parsing, no HTTP status code switching, no layered error extraction:
// IronOCR: standard .NET exceptions
public string SafeExtract(string imagePath)
{
try
{
var result = new IronTesseract().Read(imagePath);
return result.Text;
}
catch (FileNotFoundException)
{
throw; // File not found — straightforward
}
catch (Exception ex) when (ex.Message.Contains("corrupt"))
{
throw new Exception($"Image file is corrupt: {imagePath}");
}
// No JSON errors. No HTTP errors. No rate limit errors.
}
// IronOCR: standard .NET exceptions
public string SafeExtract(string imagePath)
{
try
{
var result = new IronTesseract().Read(imagePath);
return result.Text;
}
catch (FileNotFoundException)
{
throw; // File not found — straightforward
}
catch (Exception ex) when (ex.Message.Contains("corrupt"))
{
throw new Exception($"Image file is corrupt: {imagePath}");
}
// No JSON errors. No HTTP errors. No rate limit errors.
}
Imports IronOcr
Public Function SafeExtract(imagePath As String) As String
Try
Dim result = New IronTesseract().Read(imagePath)
Return result.Text
Catch ex As FileNotFoundException
Throw ' File not found — straightforward
Catch ex As Exception When ex.Message.Contains("corrupt")
Throw New Exception($"Image file is corrupt: {imagePath}")
End Try
' No JSON errors. No HTTP errors. No rate limit errors.
End Function
The IronTesseract API reference and OcrResult API reference document the typed result structure, including the Confidence property for assessing extraction quality.
API Mapping Reference
| OCR.space Concept | IronOCR Equivalent |
|---|---|
POST https://api.ocr.space/parse/image |
new IronTesseract().Read(path) |
FormUrlEncodedContent / MultipartFormDataContent |
OcrInput |
base64Image parameter |
input.LoadImage(path) or input.LoadImage(bytes) |
language parameter |
ocr.Language = OcrLanguage.English |
OCREngine parameter |
Engine managed internally |
isOverlayRequired parameter |
result.Words / result.Lines (always available) |
isCreateSearchablePdf parameter |
result.SaveAsSearchablePdf(outputPath) |
filetype=PDF parameter |
input.LoadPdf(path) |
ParsedResults[0].ParsedText |
result.Text |
ParsedResults[n] (per page) |
result.Pages[n].Text |
IsErroredOnProcessing JSON flag |
Standard .NET exception |
FileParseExitCode per-page flag |
Standard .NET exception |
ProcessingTimeInMilliseconds |
Implicit — no extra parsing needed |
| HTTP 429 / rate limit | Not applicable — no rate limits |
Custom OcrResult POCO (user-defined) |
IronOcr.OcrResult (provided by NuGet) |
Custom OcrSpaceException (user-defined) |
Standard .NET exception types |
SemaphoreSlim rate limiter (user-built) |
Not needed |
| API key in every request | IronOcr.License.LicenseKey (once at startup) |
When Teams Consider Moving from OCR.space to IronOCR
Hitting the Free Tier During Load Testing
OCR.space's 500 requests-per-day per-IP ceiling is a hard wall, not a soft advisory. Teams discover this during load testing or staging deployments where multiple developers share an office IP address. A shared CI/CD pipeline that runs integration tests against OCR.space will exhaust the daily quota before the business day ends. At that point the application fails not because the code is wrong, but because a third-party counter reached an arbitrary threshold. Moving to local processing eliminates this category of failure entirely — there is no quota to exhaust because processing runs in-process.
Compliance and Data Classification Reviews
Security reviews that classify documents as PII, PHI, or financially sensitive create a binary problem for OCR.space: the service has no on-premise deployment path. There is no Business Associate Agreement (BAA) equivalent documented for HIPAA scenarios, no Data Processing Agreement structure that clearly governs EU data transfers under GDPR, and no technical mechanism to prevent document transmission even with contractual controls in place. Teams receiving compliance findings against cloud-transmitted document processing need a local solution. IronOCR satisfies this requirement by design — documents never leave the application server. The licensing page documents the perpetual license structure, which also simplifies compliance documentation by eliminating cloud-vendor review from the scope.
Integration Code Maintenance Burden
OCR.space integrations accumulate technical debt in proportion to their feature requirements. The initial HTTP client is manageable. Then the team adds retry logic. Then they add per-IP rate-limit tracking because multiple services share an address. Then they add a file-size pre-validation step. Then they discover the JSON error structure differs between Engine 1 and Engine 2 responses and add a branch. Six months later the OCR.space integration is 400 lines of infrastructure code that every new team member must understand before touching anything OCR-related. When that team evaluates the maintenance cost against a $999 one-time IronOCR license, the arithmetic is straightforward.
Structured Output Requirements
OCR.space returns plain text and nothing else. There are no word coordinates, no line boundaries, no paragraph segmentation, and no per-word confidence scores in the response. Applications that need to extract specific fields from invoices — vendor name at a known region, total amount in the bottom-right corner — have no structured foundation to build on from OCR.space's output. IronOCR exposes a full document hierarchy with coordinates and confidence values at every granularity level: pages, paragraphs, lines, and individual words. Region-based processing allows targeting specific document zones without post-processing the entire page output. The read results how-to and region-based OCR guide cover these patterns in detail.
Volume Growth Beyond the Free Tier
At 25,000 requests per month the free tier is functional for low-volume scenarios. Beyond that, paid tiers apply — contact OCR.space for current pricing. Those costs compound over time until the comparison is made against a $999 perpetual license with no per-request charges ever. Teams projecting document volume growth past 25,000 per month have a straightforward cost case for local processing.
Common Migration Considerations
Replacing the HTTP Client with a Method Call
The migration from OCR.space to IronOCR is architecturally simple because both return text from documents. The HTTP client, JSON parser, rate limiter, and custom exception types disappear. What replaces them is a single NuGet package and method calls. The before/after at the service boundary:
// Before: OCR.space service (simplified — actual implementation is 200+ lines)
public class OcrSpaceService : IDisposable
{
private readonly HttpClient _client = new();
private readonly string _apiKey;
public OcrSpaceService(string apiKey) => _apiKey = apiKey;
public async Task<string> ProcessDocument(string path)
{
byte[] bytes = File.ReadAllBytes(path);
string base64 = Convert.ToBase64String(bytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
});
var response = await _client.PostAsync("https://api.ocr.space/parse/image", content);
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement.GetProperty("ParsedResults")[0].GetProperty("ParsedText").GetString()!;
}
public void Dispose() => _client.Dispose();
}
// After: IronOCR service — no HttpClient, no JSON, no API key
public class IronOcrService
{
private readonly IronTesseract _ocr = new();
public string ProcessDocument(string path)
{
return _ocr.Read(path).Text;
}
// No Dispose — no external connections to close
}
// Before: OCR.space service (simplified — actual implementation is 200+ lines)
public class OcrSpaceService : IDisposable
{
private readonly HttpClient _client = new();
private readonly string _apiKey;
public OcrSpaceService(string apiKey) => _apiKey = apiKey;
public async Task<string> ProcessDocument(string path)
{
byte[] bytes = File.ReadAllBytes(path);
string base64 = Convert.ToBase64String(bytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}")
});
var response = await _client.PostAsync("https://api.ocr.space/parse/image", content);
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement.GetProperty("ParsedResults")[0].GetProperty("ParsedText").GetString()!;
}
public void Dispose() => _client.Dispose();
}
// After: IronOCR service — no HttpClient, no JSON, no API key
public class IronOcrService
{
private readonly IronTesseract _ocr = new();
public string ProcessDocument(string path)
{
return _ocr.Read(path).Text;
}
// No Dispose — no external connections to close
}
Imports System
Imports System.IO
Imports System.Net.Http
Imports System.Threading.Tasks
Imports System.Collections.Generic
Imports System.Text.Json
' Before: OCR.space service (simplified — actual implementation is 200+ lines)
Public Class OcrSpaceService
Implements IDisposable
Private ReadOnly _client As HttpClient = New HttpClient()
Private ReadOnly _apiKey As String
Public Sub New(apiKey As String)
_apiKey = apiKey
End Sub
Public Async Function ProcessDocument(path As String) As Task(Of String)
Dim bytes As Byte() = File.ReadAllBytes(path)
Dim base64 As String = Convert.ToBase64String(bytes)
Dim content As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}")
})
Dim response As HttpResponseMessage = Await _client.PostAsync("https://api.ocr.space/parse/image", content)
response.EnsureSuccessStatusCode()
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Return doc.RootElement.GetProperty("ParsedResults")(0).GetProperty("ParsedText").GetString()
End Using
End Function
Public Sub Dispose() Implements IDisposable.Dispose
_client.Dispose()
End Sub
End Class
' After: IronOCR service — no HttpClient, no JSON, no API key
Public Class IronOcrService
Private ReadOnly _ocr As New IronTesseract()
Public Function ProcessDocument(path As String) As String
Return _ocr.Read(path).Text
End Function
' No Dispose — no external connections to close
End Class
The IDisposable implementation disappears because there is no HttpClient to manage.
API Key Removal
OCR.space requires an API key in every HTTP request. That key must be stored securely, rotated when exposed, and excluded from source control. IronOCR uses a license key set once at application startup, typically from an environment variable or configuration system:
// At application startup — once, not per request
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE");
// At application startup — once, not per request
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE");
' At application startup — once, not per request
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE")
After migration, the API key rotation process, secret management configuration, and key-per-request injection logic all become unnecessary.
Preprocessing After Migration
OCR.space applies server-side processing before returning results, but developers have no control over what that processing does or does not include. IronOCR exposes explicit preprocessing methods. If document quality is a concern — skewed scans, low-contrast photocopies, noisy fax output — the preprocessing pipeline is three to five method calls:
using var input = new OcrInput();
input.LoadImage("low-quality-scan.jpg");
input.Deskew();
input.DeNoise();
input.Contrast();
input.Binarize();
input.EnhanceResolution(300);
var result = new IronTesseract().Read(input);
using var input = new OcrInput();
input.LoadImage("low-quality-scan.jpg");
input.Deskew();
input.DeNoise();
input.Contrast();
input.Binarize();
input.EnhanceResolution(300);
var result = new IronTesseract().Read(input);
Imports IronOcr
Using input As New OcrInput()
input.LoadImage("low-quality-scan.jpg")
input.Deskew()
input.DeNoise()
input.Contrast()
input.Binarize()
input.EnhanceResolution(300)
Dim result = New IronTesseract().Read(input)
End Using
The image quality correction guide and image color correction guide document every available filter with examples showing before/after accuracy differences.
Async vs. Synchronous
OCR.space calls are inherently async because they involve HTTP round trips. IronOCR calls are synchronous by default, which simplifies code in non-web contexts. For ASP.NET and server scenarios requiring non-blocking execution, IronOCR supports async operation via Task.Run wrapping or direct async API. The async OCR guide covers the patterns. Removing await from a service method that previously awaited a cloud call simplifies the call stack in contexts where async was adopted only because OCR.space required it.
Additional IronOCR Capabilities
Features not covered in the sections above, each representing functionality with no equivalent in OCR.space:
- hOCR export:
result.SaveAsHocrFile("output.hocr")produces standards-compliant hOCR output for downstream document processing pipelines. - Progress tracking: Subscribe to progress events for long-running multi-page batch operations, enabling progress bars and ETA calculations in UI applications.
- Table extraction: Structured table data is accessible via the result model, supporting use cases like invoice line-item extraction that require column alignment.
- Specialized document reading: Passport MRZ zones, license plates, MICR cheque lines, and handwriting recognition are purpose-built capabilities accessible through the same
IronTesseractAPI.
.NET Compatibility and Future Readiness
IronOCR targets .NET 6, .NET 7, .NET 8, and .NET 9, with active support for each LTS and STS release. The library runs on Windows x64, Windows x86, Linux x64, macOS, Docker, AWS Lambda, and Azure App Service — all from the same NuGet package with no platform-specific configuration. OCR.space requires outbound HTTPS from every deployment environment, which excludes air-gapped networks, restrictive corporate proxies, and infrastructure where outbound traffic to third-party APIs is blocked by policy. IronOCR has no runtime network requirements; it executes entirely in-process. As .NET 10 moves toward release in late 2026, IronOCR's track record of maintaining compatibility across .NET major versions provides continuity without integration changes.
Conclusion
OCR.space occupies a real and useful niche: developers who need to prototype an OCR feature over a weekend, students exploring text extraction concepts, or low-volume personal tools under 25,000 documents per month with no compliance requirements. For that audience, the free tier delivers genuine value.
The problem is that .NET developers are typically not building weekend prototypes. They are building invoicing systems, medical records processors, document archival pipelines, and compliance-sensitive business applications. For those contexts, OCR.space's absent NuGet package is not an inconvenience — it is a structural incompatibility. Every hour spent building the HTTP client, rate limiter, JSON parser, and retry infrastructure is an hour not spent on the application's actual requirements. That cost is front-loaded, but the maintenance cost continues for the life of the integration.
IronOCR addresses the specific failure mode that defines every OCR.space integration: the absence of a real SDK. One NuGet package, one method call, no HTTP plumbing, no rate limits, no documents transmitted to external servers. The $999 entry price is a one-time cost; recurring OCR.space subscription costs will exceed it over time, especially as volume grows. For teams with compliance requirements, local processing is the only option regardless of cost.
The comparison ultimately reduces to a single question: does the application require OCR as an external REST dependency, or as a library function? For production .NET applications, the answer is almost always the latter.
Frequently Asked Questions
What is OCR.space API?
OCR.space API is an OCR solution used by developers and enterprises to extract text from images and documents. It is one of several OCR options evaluated alongside IronOCR for .NET application development.
How does IronOCR compare to OCR.space API for .NET developers?
IronOCR is a NuGet-native .NET OCR library using IronTesseract as its core engine. Compared to OCR.space API, it offers simpler deployment (no SDK installers), flat-rate pricing, and a clean C# API without COM interop or cloud dependencies.
Is IronOCR easier to set up than OCR.space API?
IronOCR installs via a single NuGet package. There are no SDK installers, license files to copy, COM components to register, or separate runtime binaries to manage. The entire OCR engine is bundled in the package.
What accuracy differences exist between OCR.space API and IronOCR?
IronOCR achieves high recognition accuracy for standard business documents, invoices, receipts, and scanned forms. For highly degraded documents or uncommon scripts, accuracy varies by source quality. IronOCR includes image preprocessing filters to improve recognition on low-quality inputs.
Does IronOCR support PDF text extraction?
Yes. IronOCR extracts text from both native PDFs and scanned PDF images in a single call. It also supports multi-page TIFF files, images, and streams. For scanned PDFs, OCR is applied page-by-page with per-page result objects.
How does OCR.space API licensing compare to IronOCR?
IronOCR uses a flat-rate perpetual license with no per-page or per-scan charges. Organizations processing high document volumes pay the same license cost regardless of volume. Details and volume pricing are on the IronOCR licensing page.
What languages does IronOCR support?
IronOCR supports 127 languages via separate NuGet language packs. Adding a language requires a single 'dotnet add package IronOcr.Languages.{Language}' command. No manual file placement or path configuration is needed.
How do I install IronOCR in a .NET project?
Install via NuGet: 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Additional language packs are installed the same way. No native SDK installer is required.
Is IronOCR suitable for Docker and containerized deployments, unlike OCR.space?
Yes. IronOCR works in Docker containers via its NuGet package. The license key is set via an environment variable. No license files, SDK paths, or volume mounts are required for the OCR engine itself.
Can I try IronOCR before purchasing, compared to OCR.space?
Yes. IronOCR trial mode processes documents and returns OCR results with a watermark overlay on output. You can verify accuracy on your own documents before purchasing a license.
Does IronOCR support barcode reading alongside text extraction?
IronOCR focuses on text extraction and OCR. For barcode reading, Iron Software provides IronBarcode as a companion library. Both are available individually or as part of the Iron Suite bundle.
Is it easy to migrate from OCR.space API to IronOCR?
Migration from OCR.space API to IronOCR typically involves replacing initialization sequences with IronTesseract instantiation, removing COM lifecycle management, and updating API calls. Most migrations reduce code complexity significantly.

