Migrating from OCR.space to IronOCR
This guide walks .NET developers through replacing OCR.space's REST API integration with IronOCR, a native .NET library delivered as a single NuGet package. It covers the package swap, namespace cleanup, and concrete code migration scenarios that are specific to the REST-to-local transition: multipart upload elimination, base64 encoding removal, OCR engine selection replacement, multi-language reads, and structured data extraction. Developers who have read the Phase 1 comparison article will find this guide focused on the mechanical steps of the migration itself rather than the feature comparison.
Why Migrate from OCR.space
OCR.space fills a genuine niche: zero-cost experimentation for developers who want to test OCR in an afternoon without installing anything. The problem is that the free tier is designed for prototyping, not production. Once a .NET application moves toward real document volumes, compliance requirements, or team development, every characteristic of the OCR.space integration works against the application.
No NuGet package means no SDK and no IntelliSense. OCR.space provides a REST endpoint and documentation. The .NET integration — HTTP client construction, request serialization, response deserialization, error handling, and retry logic — is entirely the developer's responsibility. This is not a minor inconvenience. The minimum viable client is 80+ lines of infrastructure code before the first business-logic method is written. That code is undifferentiated across every OCR.space integration in every .NET codebase, and it accumulates bugs and maintenance burden over time.
Rate limits impose artificial ceilings on production applications. The free tier enforces 60 requests per minute and 500 requests per day per IP address. Both limits are hard walls. An application that exceeds 500 requests between midnight and the next midnight receives error responses until the counter resets. Production systems running in shared office networks or shared CI/CD environments can exhaust the daily quota before business hours end.
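To make those two ceilings concrete, here is a minimal sketch of the client-side bookkeeping an OCR.space integration tends to carry. The class name `FreeTierQuota` and its members are hypothetical, the limits are the free-tier figures quoted above, and the daily window is modeled as a rolling 24 hours, which is an assumption rather than documented reset behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tracks request timestamps so the caller can check
// the free-tier ceilings (60/minute, 500/day) before sending a request.
public sealed class FreeTierQuota
{
    private readonly Queue<DateTime> _lastMinute = new Queue<DateTime>();
    private readonly Queue<DateTime> _lastDay = new Queue<DateTime>();
    private readonly int _perMinute;
    private readonly int _perDay;

    public FreeTierQuota(int perMinute = 60, int perDay = 500)
    {
        _perMinute = perMinute;
        _perDay = perDay;
    }

    // Returns true and records the request if both windows have room.
    public bool TryAcquire(DateTime nowUtc)
    {
        Evict(_lastMinute, nowUtc - TimeSpan.FromMinutes(1));
        Evict(_lastDay, nowUtc - TimeSpan.FromDays(1));

        if (_lastMinute.Count >= _perMinute || _lastDay.Count >= _perDay)
            return false;

        _lastMinute.Enqueue(nowUtc);
        _lastDay.Enqueue(nowUtc);
        return true;
    }

    // Drops timestamps that have aged out of the window.
    private static void Evict(Queue<DateTime> window, DateTime cutoff)
    {
        while (window.Count > 0 && window.Peek() <= cutoff)
            window.Dequeue();
    }
}
```

The sketch is not thread-safe and only counts requests made by one process, so several machines behind the same IP address would still share one quota. Code like this becomes unnecessary once OCR runs locally.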
Documents leave your infrastructure on every call. OCR.space has no on-premise deployment option. Every request transmits the document — invoices, medical records, contracts, identity documents — to OCR.space's cloud servers. HIPAA, GDPR, and internal data classification policies that prohibit third-party transmission of sensitive documents make OCR.space architecturally incompatible, regardless of contractual controls.
The free tier produces watermarked searchable PDFs. Applications that generate searchable PDF output as a deliverable — document archival systems, compliance platforms, client-facing document portals — cannot use OCR.space's free tier for this purpose. The watermark is embedded in the output PDF and cannot be removed without a paid plan.
Subscription pricing accumulates year after year. The OCR.space PRO tier at $144 per year passes IronOCR's $999 perpetual entry price during year seven. Teams projecting document volume growth past the free tier threshold face recurring subscription costs against a fixed perpetual license. The $999 Lite license covers one developer and one deployment location with no per-request charges at any volume. See the IronOCR licensing page for tier details.
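The break-even point follows directly from the two prices quoted above. As a quick check, assuming both prices stay flat and ignoring renewals, support add-ons, and taxes:

```csharp
using System;

const decimal proPerYear = 144m;   // OCR.space PRO, per the figure above
const decimal liteLicense = 999m;  // IronOCR Lite, one-time

// First year in which cumulative subscription spend exceeds the license price.
int breakEvenYear = (int)Math.Ceiling(liteLicense / proPerYear);

Console.WriteLine($"After {breakEvenYear - 1} years: ${proPerYear * (breakEvenYear - 1)}"); // After 6 years: $864
Console.WriteLine($"After {breakEvenYear} years: ${proPerYear * breakEvenYear}");           // After 7 years: $1008
```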
The Fundamental Problem
OCR.space requires you to build a complete HTTP client before processing a single document:
// OCR.space: 80+ lines of infrastructure before business logic
public class OcrSpaceApiClient : IDisposable
{
private readonly HttpClient _httpClient;
private readonly string _apiKey;
private readonly SemaphoreSlim _rateLimiter; // You implement this
public OcrSpaceApiClient(string apiKey)
{
_apiKey = apiKey; // Store the key for use in each request
_httpClient = new HttpClient();
_httpClient.Timeout = TimeSpan.FromSeconds(120);
_rateLimiter = new SemaphoreSlim(60, 60); // Free tier: 60/min
}
// ... 70+ more lines of HTTP plumbing follow
}
Imports System
Imports System.Net.Http
Imports System.Threading
' OCR.space: 80+ lines of infrastructure before business logic
Public Class OcrSpaceApiClient
Implements IDisposable
Private ReadOnly _httpClient As HttpClient
Private ReadOnly _apiKey As String
Private ReadOnly _rateLimiter As SemaphoreSlim ' You implement this
Public Sub New(apiKey As String)
_apiKey = apiKey ' Store the key for use in each request
_httpClient = New HttpClient()
_httpClient.Timeout = TimeSpan.FromSeconds(120)
_rateLimiter = New SemaphoreSlim(60, 60) ' Free tier: 60/min
End Sub
' ... 70+ more lines of HTTP plumbing follow
Public Sub Dispose() Implements IDisposable.Dispose
_httpClient.Dispose()
_rateLimiter.Dispose()
End Sub
End Class
IronOCR is a NuGet package. The entire client is already written:
// IronOCR: no client to build, no rate limiter to manage
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
var result = new IronTesseract().Read("document.jpg");
Console.WriteLine(result.Text);
Imports IronOcr
' IronOCR: no client to build, no rate limiter to manage
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"
Dim result = New IronTesseract().Read("document.jpg")
Console.WriteLine(result.Text)
IronOCR vs OCR.space: Feature Comparison
The table below maps OCR.space concepts and constraints directly to IronOCR equivalents.
| Feature | OCR.space | IronOCR |
|---|---|---|
| NuGet package | None — REST API only | IronOcr — native .NET |
| SDK / IntelliSense | None — manual JSON | Full — typed API |
| Custom models required | No | No |
| Processing location | OCR.space cloud servers | Local — in-process |
| Internet dependency | Required for every call | None |
| Air-gapped deployment | Not supported | Fully supported |
| Rate limits | 60/min, 500/day (free) | None |
| File size limit | 5 MB (free tier) | Available memory only |
| PDF input | Yes (limited, 5 MB) | Yes — native, no size limit |
| Searchable PDF output | Watermarked (free), clean (paid) | Clean output, all license tiers |
| Automatic preprocessing | Server-side, no developer control | Deskew, DeNoise, Contrast, Binarize, Sharpen |
| Language support | ~25 languages | 125+ via NuGet language packs |
| Multi-language per document | Not supported | Yes — OcrLanguage.French + OcrLanguage.German |
| Structured output (words, lines) | Plain text by default; word boxes only as untyped JSON (isOverlayRequired=true) | Pages, paragraphs, lines, words with coordinates |
| Word-level confidence scores | Not available | Yes — word.Confidence |
| Region-based OCR | Not supported | Yes — CropRectangle |
| Barcode reading | Not supported | Yes — ReadBarCodes = true |
| HIPAA / GDPR compatibility | Risk — data transmitted externally | Yes — no external data transmission |
| Pricing model | Monthly subscription | One-time perpetual |
| Entry price | $12/month ($144/year) | $999 one-time |
| .NET compatibility | HttpClient, any .NET | .NET 4.6.2+, .NET 5/6/7/8/9 |
| Cross-platform deployment | Requires outbound internet | Windows, Linux, macOS, Docker, Azure, AWS |
Quick Start: OCR.space to IronOCR Migration
Step 1: Replace NuGet Package
OCR.space has no NuGet package to uninstall. Remove all OCR.space-related infrastructure code from the project: the HttpClient wrapper class, the SemaphoreSlim rate limiter, the custom result models, and the custom exception types. All of these are replaced by IronOCR's NuGet package.
Install IronOCR from the IronOCR NuGet page:
dotnet add package IronOcr
Step 2: Update Namespaces
Remove OCR.space HTTP and JSON namespaces. Add the IronOCR namespace:
// Before (OCR.space — manually written infrastructure)
using System.Net.Http;
using System.Text.Json;
using System.Threading;
// After (IronOCR)
using IronOcr;
' Before (OCR.space: manually written infrastructure)
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading
' After (IronOCR)
Imports IronOcr
Step 3: Initialize License
Add license initialization once at application startup — not per request:
// Program.cs or application startup
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
Imports IronOcr
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"
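Hard-coding the key is fine for a demo, but a deployed application would usually read it from configuration. A minimal sketch, assuming an environment variable named `IRONOCR_LICENSE_KEY` (a hypothetical name chosen for this example, not one the library defines):

```csharp
using System;
using IronOcr;

// Hypothetical variable name; use whatever your deployment defines.
string? key = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY");

if (string.IsNullOrWhiteSpace(key))
    throw new InvalidOperationException("IRONOCR_LICENSE_KEY is not set.");

// Set once at startup, before the first IronTesseract instance is used.
IronOcr.License.LicenseKey = key;
```

Failing fast at startup keeps a missing key from surfacing later as a licensing error in the middle of a request.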
Code Migration Examples
Replacing MultipartFormDataContent File Uploads
OCR.space requires building MultipartFormDataContent with the file bytes and API key, then POST-ing to the cloud endpoint. The document leaves your infrastructure on every call.
OCR.space Approach:
// MultipartFormDataContent: file upload to cloud on every request
public async Task<string> UploadAndExtract(string imagePath)
{
using var content = new MultipartFormDataContent();
var imageBytes = File.ReadAllBytes(imagePath);
// Document is transmitted to OCR.space servers here
content.Add(new ByteArrayContent(imageBytes), "file", Path.GetFileName(imagePath));
content.Add(new StringContent(_apiKey), "apikey");
content.Add(new StringContent("eng"), "language");
content.Add(new StringContent("2"), "OCREngine"); // Select Engine 2
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);
response.EnsureSuccessStatusCode();
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
// Navigate JSON tree manually — no typed result
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class YourClassName
Private _apiKey As String
Private _httpClient As HttpClient
Public Async Function UploadAndExtract(imagePath As String) As Task(Of String)
Using content As New MultipartFormDataContent()
Dim imageBytes = File.ReadAllBytes(imagePath)
' Document is transmitted to OCR.space servers here
content.Add(New ByteArrayContent(imageBytes), "file", Path.GetFileName(imagePath))
content.Add(New StringContent(_apiKey), "apikey")
content.Add(New StringContent("eng"), "language")
content.Add(New StringContent("2"), "OCREngine") ' Select Engine 2
Dim response = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
response.EnsureSuccessStatusCode()
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc = JsonDocument.Parse(json)
' Navigate JSON tree manually — no typed result
Return If(doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString(), String.Empty)
End Using
End Using
End Function
End Class
IronOCR Approach:
// OcrInput replaces the entire upload + JSON pipeline
public string ExtractFromFile(string imagePath)
{
using var input = new OcrInput();
input.LoadImage(imagePath); // Stays local — no network call
var ocr = new IronTesseract();
var result = ocr.Read(input);
return result.Text; // Typed property — no JSON navigation
}
Imports IronOcr
Public Function ExtractFromFile(ByVal imagePath As String) As String
Using input As New OcrInput()
input.LoadImage(imagePath) ' Stays local — no network call
Dim ocr As New IronTesseract()
Dim result = ocr.Read(input)
Return result.Text ' Typed property — no JSON navigation
End Using
End Function
OcrInput is the local replacement for MultipartFormDataContent. It accepts file paths, byte arrays, streams, and multi-page TIFFs through a consistent API. The HttpClient, API key injection, and JSON navigation disappear entirely. The image input how-to covers every supported input format.
Eliminating Base64 Encoding
When OCR.space integrations use the base64Image form parameter instead of the file upload parameter, the code reads the file to bytes, encodes to Base64, constructs a data URI string, and embeds it in FormUrlEncodedContent. IronOCR accepts raw bytes directly with no encoding step.
OCR.space Approach:
// base64Image parameter: read → encode → embed in form → POST → parse
public async Task<string> ExtractViaBase64(string imagePath)
{
byte[] imageBytes = await File.ReadAllBytesAsync(imagePath);
string base64Image = Convert.ToBase64String(imageBytes); // Mandatory encoding step
// Embed as data URI — adds 33% overhead to payload size
string mimeType = "image/png";
string dataUri = $"data:{mimeType};base64,{base64Image}";
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", dataUri),
new KeyValuePair<string, string>("language", "eng"),
new KeyValuePair<string, string>("isOverlayRequired", "false")
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", formContent);
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class ImageProcessor
Private _apiKey As String
Private _httpClient As HttpClient
Public Sub New(apiKey As String, httpClient As HttpClient)
_apiKey = apiKey
_httpClient = httpClient
End Sub
' base64Image parameter: read → encode → embed in form → POST → parse
Public Async Function ExtractViaBase64(imagePath As String) As Task(Of String)
Dim imageBytes As Byte() = Await File.ReadAllBytesAsync(imagePath)
Dim base64Image As String = Convert.ToBase64String(imageBytes) ' Mandatory encoding step
' Embed as data URI — adds 33% overhead to payload size
Dim mimeType As String = "image/png"
Dim dataUri As String = $"data:{mimeType};base64,{base64Image}"
Dim formContent As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", dataUri),
New KeyValuePair(Of String, String)("language", "eng"),
New KeyValuePair(Of String, String)("isOverlayRequired", "false")
})
Dim response As HttpResponseMessage = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", formContent)
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Return If(doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString(), String.Empty)
End Using
End Function
End Class
IronOCR Approach:
// LoadImage(bytes): raw bytes accepted directly — no encoding
public string ExtractFromBytes(byte[] imageBytes)
{
using var input = new OcrInput();
input.LoadImage(imageBytes); // No Base64, no data URI, no overhead
var result = new IronTesseract().Read(input);
return result.Text;
}
Imports IronOcr
Public Function ExtractFromBytes(imageBytes As Byte()) As String
Using input As New OcrInput()
input.LoadImage(imageBytes) ' No Base64, no data URI, no overhead
Dim result = New IronTesseract().Read(input)
Return result.Text
End Using
End Function
The Base64 encoding step does not exist in IronOCR because there is no HTTP transport layer. Raw bytes go directly into OcrInput.LoadImage(). The data URI overhead — Base64 encoding inflates payload size by approximately 33% — also disappears. The stream input guide shows the same pattern for Stream inputs, which is useful when the bytes originate from an upload handler or memory buffer rather than a file.
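The 33% figure comes from Base64 emitting four output characters for every three input bytes. A small standalone check, using a zero-filled buffer as a stand-in for real image bytes:

```csharp
using System;

// 3,000 raw bytes stand in for an image file's contents.
byte[] imageBytes = new byte[3000];
string base64 = Convert.ToBase64String(imageBytes);

// Integer percentage growth of the encoded payload over the raw bytes.
int percent = (base64.Length - imageBytes.Length) * 100 / imageBytes.Length;

Console.WriteLine($"{imageBytes.Length} bytes -> {base64.Length} chars (+{percent}%)");
// 3000 bytes -> 4000 chars (+33%)
```

The data URI prefix and form URL-encoding add a little more on top of that, so the actual request body is somewhat larger than this calculation suggests.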
Replacing OCR Engine Selection with Image Preprocessing
OCR.space exposes two OCR engines via the OCREngine form parameter: Engine 1 is faster with lower accuracy on complex layouts; Engine 2 is slower with higher accuracy on most document types. Developers select the engine per call based on document characteristics. IronOCR runs a single optimized Tesseract 5 engine but exposes explicit preprocessing filters that address the root cause — document quality — rather than switching between engine modes.
OCR.space Approach:
// OCREngine parameter: binary choice, no control over why accuracy differs
public async Task<string> ExtractWithEngineSelection(
string imagePath,
bool useHighAccuracyEngine = true)
{
byte[] imageBytes = await File.ReadAllBytesAsync(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var formContent = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", "eng"),
// Engine 1 = faster, Engine 2 = higher accuracy — binary choice only
new KeyValuePair<string, string>("OCREngine", useHighAccuracyEngine ? "2" : "1"),
new KeyValuePair<string, string>("scale", "true"),
new KeyValuePair<string, string>("detectOrientation", "true")
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", formContent);
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class OCRService
Private _apiKey As String
Private _httpClient As HttpClient
Public Sub New(apiKey As String, httpClient As HttpClient)
_apiKey = apiKey
_httpClient = httpClient
End Sub
' OCREngine parameter: binary choice, no control over why accuracy differs
Public Async Function ExtractWithEngineSelection(imagePath As String, Optional useHighAccuracyEngine As Boolean = True) As Task(Of String)
Dim imageBytes As Byte() = Await File.ReadAllBytesAsync(imagePath)
Dim base64 As String = Convert.ToBase64String(imageBytes)
Dim formContent As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}"),
New KeyValuePair(Of String, String)("language", "eng"),
' Engine 1 = faster, Engine 2 = higher accuracy — binary choice only
New KeyValuePair(Of String, String)("OCREngine", If(useHighAccuracyEngine, "2", "1")),
New KeyValuePair(Of String, String)("scale", "true"),
New KeyValuePair(Of String, String)("detectOrientation", "true")
})
Dim response As HttpResponseMessage = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", formContent)
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Return If(doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString(), String.Empty)
End Using
End Function
End Class
IronOCR Approach:
// Preprocessing pipeline: fix the document, not the engine selection
public string ExtractWithPreprocessing(string imagePath)
{
using var input = new OcrInput();
input.LoadImage(imagePath);
// Apply filters that match the document's specific quality issues
input.Deskew(); // Correct rotation — replaces detectOrientation
input.DeNoise(); // Remove noise from fax/photocopier artifacts
input.Contrast(); // Enhance contrast on low-quality scans
input.Scale(200); // Upscale small or low-DPI images
var ocr = new IronTesseract();
var result = ocr.Read(input);
Console.WriteLine($"Confidence: {result.Confidence}%"); // No equivalent in OCR.space
return result.Text;
}
Imports System
Imports IronOcr
Public Class Example
Public Function ExtractWithPreprocessing(imagePath As String) As String
Using input As New OcrInput()
input.LoadImage(imagePath)
' Apply filters that match the document's specific quality issues
input.Deskew() ' Correct rotation — replaces detectOrientation
input.DeNoise() ' Remove noise from fax/photocopier artifacts
input.Contrast() ' Enhance contrast on low-quality scans
input.Scale(200) ' Upscale small or low-DPI images
Dim ocr As New IronTesseract()
Dim result = ocr.Read(input)
Console.WriteLine($"Confidence: {result.Confidence}%") ' No equivalent in OCR.space
Return result.Text
End Using
End Function
End Class
The OCR.space OCREngine parameter is a proxy for document quality — when Engine 1 fails on a document, developers switch to Engine 2 hoping the different algorithm compensates. IronOCR's preprocessing pipeline addresses the quality problem directly: Deskew() corrects skewed scans, DeNoise() handles fax artifacts, and Contrast() recovers text from low-contrast photocopies. The result's Confidence property quantifies extraction quality, which OCREngine switching cannot provide. The image quality correction guide and filter wizard document each filter's effect on different document types.
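Because the result carries a confidence score, the old "try Engine 1, fall back to Engine 2" habit can be replaced by a measured fallback. The following is a sketch only: the class name `AdaptiveOcr`, the method name, and the threshold of 80 are hypothetical, and it uses only the IronOCR calls already shown in this guide.

```csharp
using IronOcr;

public static class AdaptiveOcr
{
    // Hypothetical threshold; tune it against your own documents.
    private const double MinConfidence = 80.0;

    public static string ReadWithFallback(string imagePath)
    {
        var ocr = new IronTesseract();

        // First pass: no filters, which is cheapest for clean scans.
        using (var plain = new OcrInput())
        {
            plain.LoadImage(imagePath);
            var first = ocr.Read(plain);
            if (first.Confidence >= MinConfidence)
                return first.Text;
        }

        // Second pass: apply the filters shown above and read again.
        using var cleaned = new OcrInput();
        cleaned.LoadImage(imagePath);
        cleaned.Deskew();
        cleaned.DeNoise();
        cleaned.Contrast();
        return ocr.Read(cleaned).Text;
    }
}
```

The trade-off is a second read for poor scans in exchange for skipping filter cost on clean ones. If most of your input is low quality, applying the filters unconditionally is simpler.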
Multi-Language OCR Without Per-Call Language Switching
OCR.space accepts one language parameter per API call. Documents containing mixed languages require separate calls for each language, with results merged manually. IronOCR processes multiple languages simultaneously in a single read operation using the + operator on OcrLanguage values.
OCR.space Approach:
// OCR.space: one language per call — multi-language requires multiple requests
public async Task<string> ExtractMultiLanguage(string imagePath)
{
// First pass: English
string englishText = await ExtractWithLanguage(imagePath, "eng");
// Second pass: French (consumes another rate-limit slot, another API call)
string frenchText = await ExtractWithLanguage(imagePath, "fre");
// Manually merge results — no way to know which text belongs to which language
return $"{englishText}\n{frenchText}";
}
private async Task<string> ExtractWithLanguage(string imagePath, string langCode)
{
byte[] imageBytes = await File.ReadAllBytesAsync(imagePath);
string base64 = Convert.ToBase64String(imageBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:image/png;base64,{base64}"),
new KeyValuePair<string, string>("language", langCode) // One language per call
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);
string json = await response.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(json);
return doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("ParsedText")
.GetString() ?? string.Empty;
}
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class OCRSpace
Private _apiKey As String
Private _httpClient As HttpClient
Public Async Function ExtractMultiLanguage(imagePath As String) As Task(Of String)
' First pass: English
Dim englishText As String = Await ExtractWithLanguage(imagePath, "eng")
' Second pass: French (consumes another rate-limit slot, another API call)
Dim frenchText As String = Await ExtractWithLanguage(imagePath, "fre")
' Manually merge results — no way to know which text belongs to which language
Return $"{englishText}{vbLf}{frenchText}"
End Function
Private Async Function ExtractWithLanguage(imagePath As String, langCode As String) As Task(Of String)
Dim imageBytes As Byte() = Await File.ReadAllBytesAsync(imagePath)
Dim base64 As String = Convert.ToBase64String(imageBytes)
Dim content = New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:image/png;base64,{base64}"),
New KeyValuePair(Of String, String)("language", langCode) ' One language per call
})
Dim response = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc = JsonDocument.Parse(json)
Return If(doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("ParsedText") _
.GetString(), String.Empty)
End Using
End Function
End Class
IronOCR Approach:
// IronOCR: multiple languages in a single read — one pass, correct output
public string ExtractMultiLanguage(string imagePath)
{
var ocr = new IronTesseract();
// Combine languages with + operator — processed simultaneously
ocr.Language = OcrLanguage.English + OcrLanguage.French + OcrLanguage.German;
var result = ocr.Read(imagePath);
return result.Text; // Correctly interleaved multilingual output
}
Imports IronOcr
Public Function ExtractMultiLanguage(imagePath As String) As String
Dim ocr As New IronTesseract()
' Combine languages with + operator — processed simultaneously
ocr.Language = OcrLanguage.English + OcrLanguage.French + OcrLanguage.German
Dim result = ocr.Read(imagePath)
Return result.Text ' Correctly interleaved multilingual output
End Function
OCR.space's single-language-per-call constraint forces developers to make N API calls for an N-language document and guess at how to reconcile the results. IronOCR combines language models into a single engine pass, which produces correctly interleaved output without post-processing. Language packs install as NuGet packages — IronOcr.Languages.French, IronOcr.Languages.German, and so on — and work offline. The multiple languages how-to covers pack installation and the + operator syntax for all 125+ supported languages.
Structured Data Extraction with Word Coordinates
By default, OCR.space returns plain text from ParsedResults[0].ParsedText. Word boxes are available only when isOverlayRequired is set to true, and they arrive as untyped JSON under TextOverlay: there are no typed models, no paragraph grouping, and no per-element confidence scores. Applications that need to locate specific fields (a date in the upper-right corner of an invoice, a total in the bottom-right cell of a table) must parse and validate that JSON by hand. IronOCR provides a full document hierarchy: pages, paragraphs, lines, words, and characters, each with pixel coordinates and confidence scores.
OCR.space Approach:
// OCR.space: word boxes are available only as an untyped JSON overlay
public async Task<string> ExtractInvoiceFields(string invoicePath)
{
byte[] invoiceBytes = await File.ReadAllBytesAsync(invoicePath);
string base64 = Convert.ToBase64String(invoiceBytes);
var content = new FormUrlEncodedContent(new[]
{
new KeyValuePair<string, string>("apikey", _apiKey),
new KeyValuePair<string, string>("base64Image", $"data:application/pdf;base64,{base64}"),
new KeyValuePair<string, string>("filetype", "PDF"),
new KeyValuePair<string, string>("language", "eng"),
// isOverlayRequired=true returns word boxes, but only as raw JSON coordinates
new KeyValuePair<string, string>("isOverlayRequired", "true")
});
var response = await _httpClient.PostAsync("https://api.ocr.space/parse/image", content);
string json = await response.Content.ReadAsStringAsync();
// Navigate deeply-nested JSON to find word boxes — no typed models
using var doc = JsonDocument.Parse(json);
var overlay = doc.RootElement
.GetProperty("ParsedResults")[0]
.GetProperty("TextOverlay");
// Parse word coordinate arrays manually — fragile JSON path traversal
var wordData = new List<(string word, int x, int y)>();
foreach (var line in overlay.GetProperty("Lines").EnumerateArray())
{
foreach (var word in line.GetProperty("Words").EnumerateArray())
{
string wordText = word.GetProperty("WordText").GetString() ?? "";
int left = word.GetProperty("Left").GetInt32();
int top = word.GetProperty("Top").GetInt32();
wordData.Add((wordText, left, top));
}
}
// Reconstruct full text from raw JSON — still no typed result
return string.Join(" ", wordData.Select(w => w.word));
}
Imports System.IO
Imports System.Net.Http
Imports System.Text.Json
Imports System.Threading.Tasks
Public Class InvoiceProcessor
Private _apiKey As String
Private _httpClient As HttpClient
Public Sub New(apiKey As String, httpClient As HttpClient)
_apiKey = apiKey
_httpClient = httpClient
End Sub
Public Async Function ExtractInvoiceFields(invoicePath As String) As Task(Of String)
Dim invoiceBytes As Byte() = Await File.ReadAllBytesAsync(invoicePath)
Dim base64 As String = Convert.ToBase64String(invoiceBytes)
Dim content As New FormUrlEncodedContent(New KeyValuePair(Of String, String)() {
New KeyValuePair(Of String, String)("apikey", _apiKey),
New KeyValuePair(Of String, String)("base64Image", $"data:application/pdf;base64,{base64}"),
New KeyValuePair(Of String, String)("filetype", "PDF"),
New KeyValuePair(Of String, String)("language", "eng"),
New KeyValuePair(Of String, String)("isOverlayRequired", "true")
})
Dim response As HttpResponseMessage = Await _httpClient.PostAsync("https://api.ocr.space/parse/image", content)
Dim json As String = Await response.Content.ReadAsStringAsync()
Using doc As JsonDocument = JsonDocument.Parse(json)
Dim overlay = doc.RootElement _
.GetProperty("ParsedResults")(0) _
.GetProperty("TextOverlay")
Dim wordData As New List(Of (word As String, x As Integer, y As Integer))()
For Each line In overlay.GetProperty("Lines").EnumerateArray()
For Each word In line.GetProperty("Words").EnumerateArray()
Dim wordText As String = If(word.GetProperty("WordText").GetString(), "") ' Null-coalesce; OrElse is a Boolean operator
Dim left As Integer = word.GetProperty("Left").GetInt32()
Dim top As Integer = word.GetProperty("Top").GetInt32()
wordData.Add((wordText, left, top))
Next
Next
Return String.Join(" ", wordData.Select(Function(w) w.word))
End Using
End Function
End Class
IronOCR Approach:
// IronOCR: full document hierarchy — typed, no JSON, no coordinates parsing
public void ExtractInvoiceFields(string invoicePath)
{
var ocr = new IronTesseract();
var result = ocr.Read(invoicePath);
// Access the full document hierarchy — all strongly typed
foreach (var page in result.Pages)
{
foreach (var paragraph in page.Paragraphs)
{
Console.WriteLine($"Paragraph at ({paragraph.X}, {paragraph.Y}): {paragraph.Text}");
}
foreach (var word in page.Words)
{
// Word-level confidence — identify low-quality extractions
if (word.Confidence < 70)
Console.WriteLine($"Low confidence word '{word.Text}' at ({word.X}, {word.Y})");
}
}
// Or use region-based OCR to target specific invoice zones directly
var totalRegion = new CropRectangle(400, 700, 200, 50); // Bottom-right total field
using var input = new OcrInput();
input.LoadImage(invoicePath, totalRegion);
string totalText = ocr.Read(input).Text;
Console.WriteLine($"Invoice total: {totalText}");
}
Imports IronOcr
Public Sub ExtractInvoiceFields(invoicePath As String)
Dim ocr As New IronTesseract()
Dim result = ocr.Read(invoicePath)
' Access the full document hierarchy — all strongly typed
For Each page In result.Pages
For Each paragraph In page.Paragraphs
Console.WriteLine($"Paragraph at ({paragraph.X}, {paragraph.Y}): {paragraph.Text}")
Next
For Each word In page.Words
' Word-level confidence — identify low-quality extractions
If word.Confidence < 70 Then
Console.WriteLine($"Low confidence word '{word.Text}' at ({word.X}, {word.Y})")
End If
Next
Next
' Or use region-based OCR to target specific invoice zones directly
Dim totalRegion As New CropRectangle(400, 700, 200, 50) ' Bottom-right total field
Using input As New OcrInput()
input.LoadImage(invoicePath, totalRegion)
Dim totalText As String = ocr.Read(input).Text
Console.WriteLine($"Invoice total: {totalText}")
End Using
End Sub
The OCR.space isOverlayRequired=true flag returns JSON word coordinates, but the response structure requires navigating nested JSON arrays with string-keyed property access — no typed model, no IntelliSense, and fragile path traversal that breaks if the response structure changes. IronOCR's result.Pages, result.Words, and result.Lines are typed .NET objects. The CropRectangle approach targets specific document regions directly rather than extracting the full document and filtering by coordinates afterward. The read results how-to and region-based OCR guide cover both patterns in detail.
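Once word coordinates are typed, locating a field reduces to a geometric filter. The sketch below is a minimal illustration rather than IronOCR's API: `WordBox` and `FieldLocator.WordsInRegion` are hypothetical names introduced here. In a real migration the list would be projected from `page.Words` using the `Text`, `X`, and `Y` members shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for one recognized word. In a real migration these values
// would be copied from the typed word objects (Text, X, Y) shown above.
public record WordBox(string Text, int X, int Y);

public static class FieldLocator
{
    // Returns the text of every word whose top-left corner lies inside the region,
    // ordered by Y and then by X, joined with single spaces.
    // Assumption: words on the same line share the same Y value; real OCR output
    // may need a small tolerance when grouping words into lines.
    public static string WordsInRegion(
        IEnumerable<WordBox> words, int left, int top, int width, int height)
    {
        var hits = words
            .Where(w => w.X >= left && w.X < left + width
                     && w.Y >= top && w.Y < top + height)
            .OrderBy(w => w.Y)
            .ThenBy(w => w.X)
            .Select(w => w.Text);

        return string.Join(" ", hits);
    }
}
```

Calling `FieldLocator.WordsInRegion(words, 400, 700, 200, 50)` uses the same rectangle as the CropRectangle example. Filtering already-extracted words suits cases where the full page has been read anyway, while the CropRectangle approach avoids reading the rest of the page at all.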
OCR.space API to IronOCR Mapping Reference
| OCR.space Concept | IronOCR Equivalent |
|---|---|
| No NuGet package | dotnet add package IronOcr |
| HttpClient construction | Not needed (no HTTP layer) |
| SemaphoreSlim rate limiter | Not needed (no rate limits) |
| FormUrlEncodedContent / MultipartFormDataContent | OcrInput |
| base64Image data URI parameter | input.LoadImage(bytes) |
| file upload parameter | input.LoadImage(path) |
| apikey header / form field | IronOcr.License.LicenseKey (once at startup) |
| language parameter (one per call) | ocr.Language = OcrLanguage.English + OcrLanguage.French |
| OCREngine=1 (fast) | Default engine (optimized Tesseract 5) |
| OCREngine=2 (high accuracy) | input.Deskew(); input.DeNoise(); input.Contrast(); |
| scale=true parameter | input.Scale(200) |
| detectOrientation=true parameter | input.Deskew() |
| isOverlayRequired=true parameter | result.Pages[n].Words (always available, typed) |
| isCreateSearchablePdf=true parameter | result.SaveAsSearchablePdf("output.pdf") |
| filetype=PDF parameter | input.LoadPdf(path) |
| ParsedResults[0].ParsedText | result.Text |
| ParsedResults[n] (per-page text) | result.Pages[n].Text |
| TextOverlay.Lines[n].Words[n].WordText | result.Pages[n].Words[n].Text |
| TextOverlay.Lines[n].Words[n].Left/Top | result.Pages[n].Words[n].X / .Y |
| IsErroredOnProcessing JSON flag | Standard Exception with message |
| FileParseExitCode per-page flag | Standard Exception with message |
| HTTP 429 Too Many Requests | Not applicable (no rate limits) |
| Custom OcrResult POCO (user-defined) | IronOcr.OcrResult (provided by NuGet) |
| Custom OcrSpaceException (user-defined) | Standard .NET exception types |
Common Migration Issues and Solutions
Issue 1: Async Code That Existed Only for HTTP
OCR.space: Every OCR call is async because it involves an HTTP round trip to the cloud. Service methods, controller actions, and background jobs were made async to avoid blocking the thread on the network wait.
Solution: IronOCR's Read() method is synchronous. Remove async/await from methods that were async purely because OCR.space required it. In ASP.NET Core contexts where non-blocking execution matters, wrap the synchronous call in Task.Run() or use the async patterns documented in the async OCR guide. Do not reflexively add await to IronOCR calls — it is not required and adds unnecessary overhead in non-web contexts.
// Before: async because OCR.space required network I/O
public async Task<string> ProcessDocumentAsync(string path)
{
return await _ocrSpaceClient.ExtractTextAsync(path); // Network wait
}
// After: synchronous — no network, no async needed
public string ProcessDocument(string path)
{
return _ocr.Read(path).Text; // Local execution
}
Imports System.Threading.Tasks
' Before: async because OCR.space required network I/O
Public Async Function ProcessDocumentAsync(path As String) As Task(Of String)
Return Await _ocrSpaceClient.ExtractTextAsync(path) ' Network wait
End Function
' After: synchronous — no network, no async needed
Public Function ProcessDocument(path As String) As String
Return _ocr.Read(path).Text ' Local execution
End Function
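Where a caller has to stay asynchronous, for example an ASP.NET Core action, the Task.Run option mentioned in the solution can be sketched as follows. This is an illustration under stated assumptions: `OcrService` is a hypothetical wrapper, and its `Func<string, string>` delegate stands in for a synchronous call such as `path => _ocr.Read(path).Text`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: 'readText' stands in for a synchronous OCR call
// such as path => ocr.Read(path).Text.
public sealed class OcrService
{
    private readonly Func<string, string> _readText;

    public OcrService(Func<string, string> readText)
    {
        _readText = readText ?? throw new ArgumentNullException(nameof(readText));
    }

    // Synchronous path: suitable for console apps and background workers.
    public string ProcessDocument(string path) => _readText(path);

    // Asynchronous path: queues the CPU-bound work on the thread pool so the
    // calling thread is not blocked while recognition runs.
    public Task<string> ProcessDocumentAsync(
        string path, CancellationToken cancellationToken = default)
    {
        // The token only prevents queued work from starting; it does not
        // interrupt a read that is already in progress.
        return Task.Run(() => _readText(path), cancellationToken);
    }
}
```

Keeping both methods lets non-web callers avoid the thread-pool hop entirely. Task.Run frees the caller but still occupies a pool thread for the duration of the read, so it is a convenience for responsiveness rather than a throughput gain.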
Issue 2: API Key Storage and Rotation Infrastructure
OCR.space: The API key must be injected into every request. Teams typically store it in appsettings.json or environment variables, inject it through IOptions<T> or constructor injection, and rotate it when exposed. Key rotation requires updating every deployment environment and restarting the application.
Solution: The IronOCR license key is set once at startup and never referenced again during execution. Remove the per-request key injection pattern. Remove the IOptions<OcrSpaceSettings> configuration class. The key initialization pattern is one line:
// Startup.cs or Program.cs — once only
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY");
Imports System
' Startup.vb or Program.vb — once only
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY")
There is no per-request credential injection, no key rotation procedure, and no risk of accidentally logging the key in request traces.
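Because the key is read exactly once, a missing environment variable is easiest to catch at startup. The following is a minimal sketch, assuming the IRONOCR_LICENSE_KEY variable name used above; `StartupConfig.RequireEnvironmentVariable` is a hypothetical helper and not part of IronOCR.

```csharp
using System;

public static class StartupConfig
{
    // Reads a required environment variable and fails fast with a clear message
    // when it is missing or blank, instead of failing later on the first OCR call.
    public static string RequireEnvironmentVariable(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }

        // Trim stray whitespace that often sneaks in through deployment tooling.
        return value.Trim();
    }
}
```

At startup the assignment would then read `IronOcr.License.LicenseKey = StartupConfig.RequireEnvironmentVariable("IRONOCR_LICENSE_KEY");`.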
Issue 3: File Size Pre-Validation Logic
OCR.space: The free tier rejects files over 5 MB with an error response. Production code adds a file size check before every request to avoid wasting a rate-limit slot on a call that will fail:
// OCR.space: pre-validation to avoid wasting quota on large files
var fileInfo = new FileInfo(filePath);
if (fileInfo.Length > 5 * 1024 * 1024)
throw new InvalidOperationException("File exceeds 5MB free tier limit.");
Dim fileInfo As New FileInfo(filePath)
If fileInfo.Length > 5 * 1024 * 1024 Then
Throw New InvalidOperationException("File exceeds 5MB free tier limit.")
End If
Solution: Delete this check entirely. IronOCR's OcrInput.LoadPdf() and OcrInput.LoadImage() have no size limit beyond available system memory. The 5 MB threshold is a limit of OCR.space's free tier, not a property of the documents themselves. A 50 MB scanned PDF loads the same way as a 500 KB one.
Issue 4: JSON Response Navigation Fragility
OCR.space: Response parsing relies on navigating JsonDocument with string-keyed property access. Code like doc.RootElement.GetProperty("ParsedResults")[0].GetProperty("ParsedText") throws KeyNotFoundException if the response shape changes and IndexOutOfRangeException if ParsedResults is empty. Both require try-catch guards or null checks throughout.
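For comparison, guarding that traversal without relying on exceptions looks roughly like the following. It is a sketch against the response fields named in this guide (ParsedResults, ParsedText); `OcrSpaceResponse.TryGetParsedText` is a hypothetical helper introduced for illustration.

```csharp
using System;
using System.Text.Json;

public static class OcrSpaceResponse
{
    // Returns true and the first result's text only when every expected level of
    // the response exists; returns false instead of throwing when one is missing.
    // Malformed JSON still throws JsonException from JsonDocument.Parse.
    public static bool TryGetParsedText(string json, out string text)
    {
        text = string.Empty;
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // TryGetProperty is only valid on objects, so check the kind first.
        if (root.ValueKind != JsonValueKind.Object) return false;
        if (!root.TryGetProperty("ParsedResults", out JsonElement results)) return false;
        if (results.ValueKind != JsonValueKind.Array || results.GetArrayLength() == 0) return false;

        JsonElement first = results[0];
        if (first.ValueKind != JsonValueKind.Object) return false;
        if (!first.TryGetProperty("ParsedText", out JsonElement parsed)) return false;
        if (parsed.ValueKind != JsonValueKind.String) return false;

        text = parsed.GetString() ?? string.Empty;
        return true;
    }
}
```

Every guard in this helper is code that exists only because the response is untyped, which is the maintenance cost this issue describes.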
Solution: IronOCR returns a typed OcrResult object. The .Text property is always a string — never null, never missing. If OCR produces no output (blank page, unreadable image), result.Text is an empty string. There is no JSON to navigate and no property-path fragility to guard against. For confidence-based filtering, result.Confidence returns a double that you compare directly:
// IronOCR: typed result — no JSON path fragility
var result = new IronTesseract().Read("document.jpg");
if (result.Confidence < 50)
Console.WriteLine("Low confidence — consider preprocessing");
else
Console.WriteLine(result.Text);
Imports IronOcr
Dim result = New IronTesseract().Read("document.jpg")
If result.Confidence < 50 Then
Console.WriteLine("Low confidence — consider preprocessing")
Else
Console.WriteLine(result.Text)
End If
The confidence scores how-to covers per-word and per-document confidence thresholds.
Issue 5: Shared-IP Rate Limit Exhaustion in CI/CD
OCR.space: CI/CD pipelines that run integration tests against OCR.space use the same outbound IP address as the development office network. Free tier accounts share a 500-requests-per-day limit per IP. A pipeline that processes 200 test documents per run can exhaust the daily quota before the first developer runs a manual test. Teams work around this by mocking OCR.space responses in tests, which defeats the purpose of integration testing.
Solution: IronOCR processes locally. The test suite calls new IronTesseract().Read(testImagePath).Text directly — no mocking required, no quota to exhaust, no network dependency. Integration tests run in CI/CD with the same real OCR results as production without any rate-limit management or test isolation patterns.
Issue 6: IDisposable Pattern from HttpClient Management
OCR.space: The HttpClient wrapper class implements IDisposable to release the HTTP connection pool. Every consumer of the OCR service must either inject a singleton, use using blocks, or register it with the DI container's disposal lifecycle. Forgetting to dispose causes socket exhaustion under load.
Solution: IronTesseract does not manage network connections. It does not implement IDisposable. Create one instance per thread (or per request in ASP.NET), call .Read(), and let the GC collect it. The OcrInput class implements IDisposable and should be wrapped in using blocks when preprocessing is applied, but the primary IronTesseract class needs no lifecycle management. Remove the IDisposable implementation from your OCR service wrapper and simplify the DI registration from scoped/transient with disposal to a simple factory or singleton.
OCR.space Migration Checklist
Pre-Migration Tasks
Audit the codebase to identify all OCR.space integration points:
# Find all OCR.space HTTP calls
grep -rn "ocr.space" --include="*.cs" .
# Find all base64 encoding related to OCR
grep -rn "base64Image\|Convert.ToBase64String" --include="*.cs" .
# Find rate limiter and retry logic
grep -rn "SemaphoreSlim\|TooManyRequests\|exponential" --include="*.cs" .
# Find JSON parsing for OCR responses
grep -rn "ParsedResults\|IsErroredOnProcessing\|FileParseExitCode" --include="*.cs" .
# Find custom OCR models and exception types
grep -rn "OcrSpaceException\|OcrResult\b" --include="*.cs" .
# Find API key configuration
grep -rn "apikey\|OcrSpaceApiKey\|ocr_space" --include="*.cs" --include="*.json" .
Document the list of files containing OCR.space code. Note which methods are async solely because of OCR.space's HTTP dependency — these can be made synchronous after migration.
Code Update Tasks
- Install the IronOcr NuGet package: dotnet add package IronOcr
- Add IronOcr.License.LicenseKey = "..." to application startup
- Delete the OcrSpaceApiClient class and all supporting infrastructure
- Delete the custom OcrResult POCO (replaced by IronOcr.OcrResult)
- Delete the custom OcrSpaceException class (replaced by standard .NET exceptions)
- Delete the SemaphoreSlim rate limiter and associated Task.Delay logic
- Remove all Convert.ToBase64String() calls used for OCR image encoding
- Replace FormUrlEncodedContent / MultipartFormDataContent construction with OcrInput
- Replace _httpClient.PostAsync(...) calls with new IronTesseract().Read(input)
- Replace JsonDocument parsing of ParsedResults[0].ParsedText with result.Text
- Replace TextOverlay JSON coordinate parsing with result.Pages[n].Words
- Replace OCREngine parameter switching with appropriate preprocessing filters
- Replace language parameter strings with OcrLanguage enum values
- Remove file size pre-validation checks (5 MB limit no longer applies)
- Convert async Task<string> OCR methods to synchronous string where HTTP was the only async reason
- Remove OCR.space API key from configuration files and environment variable setup
Post-Migration Testing
- Verify text extraction produces equivalent or higher accuracy on the same test documents
- Confirm large files (over 5 MB) process without errors
- Test multi-language documents with OcrLanguage.English + OcrLanguage.French and verify interleaved output
- Run the CI/CD pipeline with real OCR calls and confirm no rate-limit errors at any document volume
- Validate searchable PDF output has no watermarks
- Check that previously async controller actions still respond correctly after synchronous conversion
- Test that air-gapped or network-restricted deployment environments process documents without errors
- Confirm result.Confidence values are acceptable on documents that previously required OCREngine=2
- Verify result.Pages[n].Words coordinates match expected field positions in structured documents
- Test that the application startup license initialization succeeds before the first OCR call
Key Benefits of Migrating to IronOCR
The 80+ line infrastructure tax disappears. Every OCR.space integration ships an HTTP client, a rate limiter, a JSON deserializer, custom exception types, and custom result models. None of that code does anything the application actually needs — it exists to compensate for OCR.space's absent SDK. After migration, that code is deleted. The OCR surface area in the codebase shrinks to new IronTesseract().Read(path).Text at the call site and one license initialization line at startup.
Document processing speed becomes a function of local hardware. OCR.space adds network latency, server queue depth, and geographic round-trip time to every processing operation. IronOCR executes in-process, so throughput depends on local CPU and memory rather than on a remote service, and the 60-requests-per-minute cap that serializes batch processing no longer applies. Parallel processing with Parallel.ForEach across multiple IronTesseract instances scales with CPU cores; see the multithreading example.
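The per-thread pattern can be sketched with the thread-local overload of Parallel.ForEach. The code is deliberately generic and makes no claim about IronOCR's API: `BatchOcr`, `createEngine`, and `read` are hypothetical names, and in a real migration the two delegates would be along the lines of `() => new IronTesseract()` and `(ocr, path) => ocr.Read(path).Text`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchOcr
{
    // Runs 'read' over every path in parallel, creating one engine per worker
    // task rather than one per document. Returns a path-to-text map.
    public static IDictionary<string, string> ReadAll<TEngine>(
        IEnumerable<string> paths,
        Func<TEngine> createEngine,
        Func<TEngine, string, string> read)
    {
        var results = new ConcurrentDictionary<string, string>();

        Parallel.ForEach<string, TEngine>(
            paths,
            createEngine,                    // localInit: runs once per worker task
            (path, loopState, engine) =>
            {
                results[path] = read(engine, path);
                return engine;               // reuse the same engine for the next item
            },
            engine => { });                  // localFinally: nothing to release here

        return results;
    }
}
```

Reusing one engine per worker keeps construction cost off the per-document path, and the concurrent dictionary avoids explicit locking when results are collected.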
Sensitive documents stay inside your infrastructure permanently. After migration, medical records, financial documents, legal contracts, and identity documents never leave the application server. Compliance reviews for HIPAA, GDPR, SOC 2, and internal data classification policies no longer need to include OCR.space's data handling practices in their scope. The audit surface shrinks to your own infrastructure. The Docker deployment guide and Azure deployment guide cover deploying IronOCR in containerized and cloud environments that require data residency compliance.
Structured output enables document intelligence applications. OCR.space's ParsedText string is the end of the road for document analysis. IronOCR's result.Pages, result.Words, and result.Lines with coordinates and per-word confidence scores enable applications to locate specific fields, validate extraction quality, extract table data, and build downstream document intelligence pipelines. Capabilities that required building custom layout analysis on top of OCR.space's plain-text output become direct API calls. The table extraction guide and scanned document processing guide demonstrate what this structured foundation enables.
Cost becomes fixed and predictable at any volume. OCR.space's free tier covers 25,000 requests per month. Above that, subscription costs scale with usage. IronOCR's $999 Lite perpetual license carries no per-document charge at any volume. A team processing 100,000 documents per month pays the same license fee as a team processing 1,000 documents per month. Budget forecasting for document-processing applications becomes a fixed, one-time license cost rather than a variable line item that grows with business success. The IronOCR product page includes a free trial that lets teams validate accuracy on their specific document types before purchasing.
Frequently Asked Questions
Why should I migrate from OCR.space API to IronOCR?
Common drivers include eliminating the hand-written HTTP client and JSON parsing layer, escaping the free tier's rate limits and 5 MB file size cap, keeping sensitive documents inside your own infrastructure, enabling offline and container deployment, and adopting a NuGet-native workflow that integrates with standard .NET tooling.
What are the main code changes when migrating from OCR.space API to IronOCR?
Replace HttpClient requests to the OCR.space endpoint with IronTesseract instantiation and Read() calls, remove base64 encoding, rate limiting, and JSON response parsing, and read results from typed properties such as result.Text. The result is significantly fewer boilerplate lines.
How do I install IronOCR to begin the migration?
Run 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Language packs are separate packages: 'dotnet add package IronOcr.Languages.French' for French, for example.
Does IronOCR match the OCR accuracy of OCR.space API for standard business documents?
IronOCR achieves high accuracy for standard business content including invoices, contracts, receipts, and typed forms. Image preprocessing filters (deskew, noise removal, contrast enhancement) further improve recognition on degraded input.
How does IronOCR handle language data, which OCR.space selects through a request parameter?
Language data in IronOCR is distributed as NuGet packages. 'dotnet add package IronOcr.Languages.German' installs German support. No manual file placement or directory paths are involved.
Does migrating from OCR.space API to IronOCR require changes to deployment infrastructure?
Few changes are needed, and most of them are removals. There is no outbound network dependency on the OCR.space endpoint, no API key to attach to each request, and no rate-limit handling to operate. The NuGet package contains the complete OCR engine, and the license key is a string set in application code.
How do I configure IronOCR licensing after migration?
Assign IronOcr.License.LicenseKey = "YOUR-KEY" in application startup code. In Docker or Kubernetes, store the key as an environment variable and read it in startup. Use License.IsValidLicense to validate before accepting traffic.
Can IronOCR process PDFs the same way OCR.space does?
Yes. IronOCR reads both native and scanned PDFs. Instantiate IronTesseract, call ocr.Read(input) where input is a PDF path or OcrPdfInput, and iterate the OcrResult pages. No separate PDF rendering pipeline is required.
How does IronOCR handle threading in high-volume processing?
IronTesseract is safe to instantiate per-thread. Create one instance per thread in a Parallel.ForEach or Task pool and run OCR concurrently. As noted under Issue 6, IronTesseract itself needs no explicit disposal; only OcrInput objects should be wrapped in using blocks. No global state or locking is required.
What output formats does IronOCR support after text extraction?
IronOCR returns structured results including text, word coordinates, confidence scores, and page structure. Export options include plain text, searchable PDF, and structured result objects for downstream processing.
Is IronOCR pricing more predictable than OCR.space API for scaling workloads?
IronOCR uses flat-rate perpetual licensing with no per-page or volume charges. Whether you process 10,000 or 10 million pages, the license cost remains constant. Volume and team licensing options are on the IronOCR pricing page.
What happens to my existing tests after migrating from OCR.space API to IronOCR?
Tests that assert on extracted text content should continue to pass after migration, provided they tolerate minor recognition differences between engines. Tests that validate HTTP call patterns, mocked OCR.space responses, or rate-limit handling will need updating or removal to reflect IronOCR's simpler initialization and result model.

