Migrating from Azure Computer Vision OCR to IronOCR

This guide walks .NET developers through replacing Azure Computer Vision OCR with IronOCR, an on-premise OCR library that processes documents locally without cloud infrastructure. It covers the mechanical steps of swapping the NuGet package and namespaces, translating the Azure async polling model to synchronous local calls, and handling the specific patterns — dependency injection wiring, Form Recognizer polling loops, multi-page TIFF processing, and batch throughput — that require the most attention during migration.

Why Migrate from Azure Computer Vision OCR

The case for migration is not abstract. Teams typically reach the decision after encountering one or more concrete operational problems with Azure Computer Vision in production.

Endpoint and API key management never ends. Every deployment environment — development, staging, production, disaster recovery — requires a provisioned Azure Cognitive Services resource, an endpoint URL, and at least one API key. Keys must be rotated. Endpoints change when resources move regions. Every environment needs outbound firewall rules to reach cognitiveservices.azure.com. The operational surface area grows with each environment and developer on the team. IronOCR replaces all of that with a single string license key set once at application startup, with no rotation schedule and no outbound network requirement.

Per-page billing punishes multi-page documents. Azure Computer Vision counts every PDF page as a separate transaction, so a 20-page contract is 20 billable calls. At $1.00 per 1,000 transactions, a team processing 50,000 multi-page documents per month at an average of 4 pages each generates 200,000 transactions: $195 per month after the 5,000-transaction free tier, or $2,340 per year. At that rate the spend passes the price of IronOCR Lite ($999) in a little over five months, after which additional pages carry no per-page charge.
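The arithmetic above can be checked with a short sketch. The helper names and default values below are illustrative assumptions taken from the figures in this guide (one transaction per page, 5,000 free transactions, $1.00 per 1,000); substitute the rates from your own Azure pricing tier.

```csharp
using System;

// Hypothetical helper: monthly Azure spend for a given document volume.
decimal MonthlyAzureCost(long documents, int pagesPerDocument,
    long freeTransactions = 5_000, decimal pricePerThousand = 1.00m)
{
    long transactions = documents * pagesPerDocument;            // one transaction per page
    long billable = Math.Max(0, transactions - freeTransactions); // free tier is deducted first
    return billable / 1000m * pricePerThousand;
}

// Hypothetical helper: months until cumulative spend reaches a one-time license price.
decimal BreakEvenMonths(decimal licensePrice, decimal monthlyCost) =>
    monthlyCost <= 0 ? decimal.MaxValue : licensePrice / monthlyCost;

decimal monthly = MonthlyAzureCost(documents: 50_000, pagesPerDocument: 4);
decimal months = BreakEvenMonths(999m, monthly);

Console.WriteLine($"Monthly: {monthly:F2}, yearly: {monthly * 12:F2}, break-even: {months:F1} months");
```

Changing the page count or document volume shows how quickly the break-even point moves for a given workload.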

Async propagation spreads through the entire call stack. Every Azure Computer Vision call is a network round trip, so the practical entry point is AnalyzeAsync; the SDK's synchronous Analyze method simply holds a thread for the duration of that round trip. Awaiting AnalyzeAsync forces every calling method to be async, propagating the pattern from the service layer up through controllers, background workers, and any synchronous code that must be refactored to accommodate it. Form Recognizer's long-running operations compound this: WaitUntil.Completed does not return until the cloud job finishes, and taking control of the wait (progress reporting, custom intervals, timeouts) means managing an UpdateStatusAsync polling loop manually.

Documents leave the network on every call. For teams processing HIPAA-covered health information, ITAR-controlled defense documents, attorney-client privileged communications, or any document category subject to data residency rules, the mandatory cloud transmission is an architectural incompatibility, not a tradeoff. There is no Azure Computer Vision mode that avoids transmitting document content to Microsoft data centers.

Rate limits create throughput ceilings. The S1 tier of Azure Computer Vision caps at 10 transactions per second, which is 36,000 images per hour; a batch job at that volume sits exactly at the ceiling. Exceeding it returns HTTP 429 responses, which callers must handle with retry logic and exponential backoff. IronOCR's throughput ceiling is the hosting hardware: no service-imposed cap and no rate-limit retry infrastructure.
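For reference, this is roughly what retry-with-backoff handling for HTTP 429 looks like. It is a minimal sketch, not Azure SDK code: BackoffDelay and WithRetryAsync are hypothetical helpers, and the throttled call is simulated with a lambda. Azure client libraries also ship a configurable retry policy, so treat this as an illustration of the pattern rather than code every project must write.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: delay before retry number `attempt` (0-based), doubling each time up to a cap.
TimeSpan BackoffDelay(int attempt, double baseSeconds = 1.0, double maxSeconds = 30.0) =>
    TimeSpan.FromSeconds(Math.Min(maxSeconds, baseSeconds * Math.Pow(2, attempt)));

// Hypothetical wrapper: `call` returns an HTTP-style status code plus a payload.
async Task<string> WithRetryAsync(Func<Task<(int Status, string Body)>> call,
    int maxAttempts = 5, double baseSeconds = 1.0)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        var (status, body) = await call();
        if (status != 429) return body;   // not throttled: hand the result back to the caller
        await Task.Delay(BackoffDelay(attempt, baseSeconds));
    }
    throw new InvalidOperationException("Still throttled after all retry attempts.");
}

// Usage: a simulated call that is throttled twice, then succeeds.
int calls = 0;
string text = await WithRetryAsync(
    () => Task.FromResult(++calls < 3 ? (429, "") : (200, "recognized text")),
    baseSeconds: 0.01);
Console.WriteLine($"{text} after {calls} calls");
```

A local OCR engine has no throttling response to handle, so this entire layer has no counterpart after migration.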

Image OCR and PDF OCR require two separate services. Standard image OCR uses ImageAnalysisClient from Azure.AI.Vision.ImageAnalysis. Full PDF processing requires DocumentAnalysisClient from Azure.AI.FormRecognizer.DocumentAnalysis — a different NuGet package, a different Azure resource, a different endpoint, and a different result schema. Every application that processes both images and PDFs carries this doubled configuration overhead. IronOCR handles both with IronTesseract.Read() and a single OcrInput loader.

The Fundamental Problem

// Azure: endpoint URL + API key + async + nested block traversal, all before a single character is read
var client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
using var stream = File.OpenRead(imagePath);
var result = await client.AnalyzeAsync(BinaryData.FromStream(stream), VisualFeatures.Read);
var azureText = string.Join("\n", result.Value.Read.Blocks.SelectMany(b => b.Lines).Select(l => l.Text));

// IronOCR: no endpoint, no key rotation, no async, no traversal
var ironText = new IronTesseract().Read(imagePath).Text;

VB.NET:

Imports System
Imports System.IO
Imports System.Linq
Imports Azure
Imports Azure.AI.Vision.ImageAnalysis
Imports IronOcr

' Azure: endpoint URL + API key + async + nested block traversal, all before a single character is read
Dim client As New ImageAnalysisClient(New Uri(endpoint), New AzureKeyCredential(apiKey))
Dim azureText As String
Using stream As FileStream = File.OpenRead(imagePath)
    Dim result = Await client.AnalyzeAsync(BinaryData.FromStream(stream), VisualFeatures.Read)
    azureText = String.Join(vbLf, result.Value.Read.Blocks.SelectMany(Function(b) b.Lines).Select(Function(l) l.Text))
End Using

' IronOCR: no endpoint, no key rotation, no async, no traversal
Dim ironText As String = New IronTesseract().Read(imagePath).Text

IronOCR vs Azure Computer Vision OCR: Feature Comparison

The table below covers the capabilities most relevant to teams evaluating this migration.

Feature | Azure Computer Vision OCR | IronOCR
Processing location | Microsoft Azure cloud | Local, on-premise
Internet required | Yes, every request | No
Azure subscription required | Yes | No
Pricing model | Per-transaction ($1.00 per 1,000) | Perpetual license (from $999)
Per-page billing on multi-page PDFs | Yes, each page = 1 transaction | No per-page cost
Free tier | 5,000 transactions/month | Trial mode (watermarked)
Image OCR API | AnalyzeAsync or Analyze (network round trip) | Read() (synchronous, local)
PDF OCR | Separate Form Recognizer service | Built-in, same Read() call
Password-protected PDF | Via Form Recognizer | input.LoadPdf(path, Password: "x")
Searchable PDF output | Manual construction | result.SaveAsSearchablePdf()
Multi-page TIFF | Not supported | input.LoadImageFrames()
Automatic image preprocessing | Opaque server-side, not configurable | Deskew, DeNoise, Contrast, Binarize, Sharpen, Scale
Deep noise removal | No | input.DeepCleanBackgroundNoise()
Barcode reading during OCR | Separate Image Analysis feature | ocr.Configuration.ReadBarCodes = true
Region-based OCR | Not directly (manual crop before upload) | CropRectangle on OcrInput
Rate limits | 10 TPS on S1 tier | Hardware-bound only
Retry logic required | Yes (HTTP 429, 5xx) | No
Air-gapped deployment | Impossible | Fully supported
Languages supported | 164+ (server-managed) | 125+ (NuGet language packs)
Multi-language simultaneous | Yes | Yes (OcrLanguage.French + OcrLanguage.German)
Word bounding boxes | Polygon (variable vertex count) | Rectangle (x, y, width, height)
Confidence scoring | Per-word float (0.0–1.0) | Per-word and overall (0–100 scale)
hOCR export | No | result.SaveAsHocrFile()
Structured output hierarchy | Blocks / Lines / Words | Pages / Paragraphs / Lines / Words / Characters
.NET compatibility | .NET Standard 2.0+ | .NET Framework 4.6.2+, .NET Core, .NET 5–9
Cross-platform | Windows, Linux, macOS (via cloud) | Windows, Linux, macOS, Docker, ARM64
Commercial support | Azure support plans | IronOCR support included with license

Quick Start: Azure Computer Vision OCR to IronOCR Migration

Step 1: Replace NuGet Package

Remove the Azure Computer Vision package:

dotnet remove package Azure.AI.Vision.ImageAnalysis

If the project also uses Form Recognizer for PDF processing, remove that package too:

dotnet remove package Azure.AI.FormRecognizer

Install IronOCR from NuGet:

dotnet add package IronOcr

Step 2: Update Namespaces

// Before (Azure Computer Vision)
using Azure;
using Azure.AI.Vision.ImageAnalysis;
// For PDF processing:
// using Azure.AI.FormRecognizer.DocumentAnalysis;

// After (IronOCR)
using IronOcr;

VB.NET:

' Before (Azure Computer Vision)
Imports Azure
Imports Azure.AI.Vision.ImageAnalysis
' For PDF processing:
' Imports Azure.AI.FormRecognizer.DocumentAnalysis

' After (IronOCR)
Imports IronOcr

Step 3: Initialize License

Add the license key once at application startup, before any OCR call:

IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";

VB.NET:

IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"

Store the key in an environment variable in production:

IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE");

VB.NET:

Imports System

IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE")
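
Environment.GetEnvironmentVariable returns null when the variable is missing, so a startup guard can make a misconfigured deployment fail immediately instead of surfacing later. The sketch below is one way to do that; RequireEnvironmentVariable is a hypothetical helper, not part of IronOCR, and the demo value is set in-process only so the example runs on its own.

```csharp
using System;

// Hypothetical helper: read a required setting and fail fast when it is absent or blank.
string RequireEnvironmentVariable(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException($"Environment variable '{name}' is not set.");
    return value;
}

// Demo only: in production the variable comes from the host, not from code.
Environment.SetEnvironmentVariable("IRONOCR_LICENSE", "DEMO-KEY");
string licenseKey = RequireEnvironmentVariable("IRONOCR_LICENSE");
Console.WriteLine($"License key loaded ({licenseKey.Length} characters).");

// At startup, before any OCR call (requires the IronOcr package):
// IronOcr.License.LicenseKey = licenseKey;
```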

Code Migration Examples

Replacing Dependency-Injected Azure Client Configuration

Teams that follow the recommended Azure SDK pattern register ImageAnalysisClient in the DI container using IOptions<AzureComputerVisionOptions> or direct IConfiguration binding. This wiring pulls endpoint URLs and API keys from appsettings.json and requires outbound network configuration in every deployment environment.

Azure Computer Vision Approach:

// appsettings.json binds to this class
public class AzureComputerVisionOptions
{
    public string Endpoint { get; set; }   // "https://your-resource.cognitiveservices.azure.com/"
    public string ApiKey   { get; set; }   // rotated periodically
}

// Program.cs / Startup.cs
services.Configure<AzureComputerVisionOptions>(
    configuration.GetSection("AzureComputerVision"));

services.AddSingleton<ImageAnalysisClient>(sp =>
{
    var opts = sp.GetRequiredService<IOptions<AzureComputerVisionOptions>>().Value;
    return new ImageAnalysisClient(
        new Uri(opts.Endpoint),
        new AzureKeyCredential(opts.ApiKey));
});

services.AddScoped<IOcrService, AzureOcrService>();

VB.NET:

' appsettings.json binds to this class
Public Class AzureComputerVisionOptions
    Public Property Endpoint As String   ' "https://your-resource.cognitiveservices.azure.com/"
    Public Property ApiKey As String     ' rotated periodically
End Class

' Program.vb / Startup.vb
services.Configure(Of AzureComputerVisionOptions)(
    configuration.GetSection("AzureComputerVision"))

services.AddSingleton(Of ImageAnalysisClient)(Function(sp)
    Dim opts = sp.GetRequiredService(Of IOptions(Of AzureComputerVisionOptions))().Value
    Return New ImageAnalysisClient(
        New Uri(opts.Endpoint),
        New AzureKeyCredential(opts.ApiKey))
End Function)

services.AddScoped(Of IOcrService, AzureOcrService)()

C#:

// AzureOcrService.cs
public class AzureOcrService : IOcrService
{
    private readonly ImageAnalysisClient _client;

    public AzureOcrService(ImageAnalysisClient client)
    {
        _client = client;
    }

    public async Task<string> ReadAsync(string imagePath)
    {
        using var stream = File.OpenRead(imagePath);
        var data = BinaryData.FromStream(stream);
        var result = await _client.AnalyzeAsync(data, VisualFeatures.Read);

        return string.Join("\n",
            result.Value.Read.Blocks
                .SelectMany(b => b.Lines)
                .Select(l => l.Text));
    }
}

VB.NET:

Imports System.IO
Imports System.Threading.Tasks

Public Class AzureOcrService
    Implements IOcrService

    Private ReadOnly _client As ImageAnalysisClient

    Public Sub New(client As ImageAnalysisClient)
        _client = client
    End Sub

    Public Async Function ReadAsync(imagePath As String) As Task(Of String) Implements IOcrService.ReadAsync
        Using stream = File.OpenRead(imagePath)
            Dim data = BinaryData.FromStream(stream)
            Dim result = Await _client.AnalyzeAsync(data, VisualFeatures.Read)

            Return String.Join(vbLf, result.Value.Read.Blocks _
                .SelectMany(Function(b) b.Lines) _
                .Select(Function(l) l.Text))
        End Using
    End Function
End Class

IronOCR Approach:

// Program.cs / Startup.cs
// One-time license key — no endpoint, no credential class, no options binding
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE");

// Register IronTesseract as a singleton — it is thread-safe
services.AddSingleton<IronTesseract>();
services.AddScoped<IOcrService, IronOcrService>();

VB.NET:

' Program.vb / Startup.vb
' One-time license key — no endpoint, no credential class, no options binding
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE")

' Register IronTesseract as a singleton — it is thread-safe
services.AddSingleton(Of IronTesseract)()
services.AddScoped(Of IOcrService, IronOcrService)()

C#:

// IronOcrService.cs
public class IronOcrService : IOcrService
{
    private readonly IronTesseract _ocr;

    public IronOcrService(IronTesseract ocr)
    {
        _ocr = ocr;
    }

    public string Read(string imagePath)
    {
        return _ocr.Read(imagePath).Text;
    }
}

VB.NET:

' IronOcrService.vb
Public Class IronOcrService
    Implements IOcrService

    Private ReadOnly _ocr As IronTesseract

    Public Sub New(ocr As IronTesseract)
        _ocr = ocr
    End Sub

    Public Function Read(imagePath As String) As String Implements IOcrService.Read
        Return _ocr.Read(imagePath).Text
    End Function
End Class

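The DI wiring drops from an options class plus a client factory to a single AddSingleton<IronTesseract>() call. The appsettings.json Azure section, key vault references, and outbound firewall rules for cognitiveservices.azure.com are all removed. Note that the service method changes from the asynchronous ReadAsync to the synchronous Read, so the IOcrService interface and its callers must be updated to match; where an async signature has to stay, a Task.Run wrapper like the one in the next section preserves it. See the IronTesseract setup guide for configuration options available on the singleton instance.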

Eliminating the Form Recognizer Polling Loop

Form Recognizer's AnalyzeDocumentAsync returns an AnalyzeDocumentOperation, a handle to a long-running cloud job. With WaitUntil.Completed, the awaited call does not return until that job finishes, typically 2–10 seconds per document. With WaitUntil.Started, the call returns as soon as the job is accepted, and the caller drives an UpdateStatusAsync polling loop with a delay between polls. Once timeouts, cancellation, and error handling are included, that loop can add 30–50 lines of infrastructure code with no OCR logic in it.

Azure Computer Vision Approach:

// DocumentAnalysisClient — separate from ImageAnalysisClient, separate resource
public class FormRecognizerPdfService
{
    private readonly DocumentAnalysisClient _client;

    public FormRecognizerPdfService(string endpoint, string apiKey)
    {
        _client = new DocumentAnalysisClient(
            new Uri(endpoint),
            new AzureKeyCredential(apiKey));
    }

    // Awaited wait: the SDK polls internally and the call resumes once Azure finishes
    public async Task<string> ExtractPdfTextBlocking(string pdfPath)
    {
        using var stream = File.OpenRead(pdfPath);

        var operation = await _client.AnalyzeDocumentAsync(
            WaitUntil.Completed,  // does not return until Azure finishes
            "prebuilt-read",
            stream);

        var docResult = operation.Value;
        var sb = new StringBuilder();
        foreach (var page in docResult.Pages)
        {
            foreach (var line in page.Lines)
            {
                sb.AppendLine(line.Content);  // .Content, not .Text
            }
        }
        return sb.ToString();
    }

    // Manual polling: start the operation, then check its status in a loop
    public async Task<string> ExtractPdfTextNonBlocking(string pdfPath)
    {
        using var stream = File.OpenRead(pdfPath);

        var operation = await _client.AnalyzeDocumentAsync(
            WaitUntil.Started,  // returns immediately, not complete yet
            "prebuilt-read",
            stream);

        // Poll every 500ms until the operation finishes
        while (!operation.HasCompleted)
        {
            await Task.Delay(500);
            await operation.UpdateStatusAsync();
        }

        var docResult = operation.Value;
        var sb = new StringBuilder();
        foreach (var page in docResult.Pages)
        {
            foreach (var line in page.Lines)
            {
                sb.AppendLine(line.Content);
            }
        }
        return sb.ToString();
    }
}

VB.NET:

Imports System
Imports System.IO
Imports System.Text
Imports System.Threading.Tasks
Imports Azure
Imports Azure.AI.FormRecognizer.DocumentAnalysis

' DocumentAnalysisClient — separate from ImageAnalysisClient, separate resource
Public Class FormRecognizerPdfService
    Private ReadOnly _client As DocumentAnalysisClient

    Public Sub New(endpoint As String, apiKey As String)
        _client = New DocumentAnalysisClient(
            New Uri(endpoint),
            New AzureKeyCredential(apiKey))
    End Sub

    ' Awaited wait: the SDK polls internally and the call resumes once Azure finishes
    Public Async Function ExtractPdfTextBlocking(pdfPath As String) As Task(Of String)
        Using stream = File.OpenRead(pdfPath)
            Dim operation = Await _client.AnalyzeDocumentAsync(
                WaitUntil.Completed,  ' does not return until Azure finishes
                "prebuilt-read",
                stream)

            Dim docResult = operation.Value
            Dim sb = New StringBuilder()
            For Each page In docResult.Pages
                For Each line In page.Lines
                    sb.AppendLine(line.Content)  ' .Content, not .Text
                Next
            Next
            Return sb.ToString()
        End Using
    End Function

    ' Manual polling: start the operation, then check its status in a loop
    Public Async Function ExtractPdfTextNonBlocking(pdfPath As String) As Task(Of String)
        Using stream = File.OpenRead(pdfPath)
            Dim operation = Await _client.AnalyzeDocumentAsync(
                WaitUntil.Started,  ' returns immediately, not complete yet
                "prebuilt-read",
                stream)

            ' Poll every 500ms until the operation finishes
            While Not operation.HasCompleted
                Await Task.Delay(500)
                Await operation.UpdateStatusAsync()
            End While

            Dim docResult = operation.Value
            Dim sb = New StringBuilder()
            For Each page In docResult.Pages
                For Each line In page.Lines
                    sb.AppendLine(line.Content)
                Next
            Next
            Return sb.ToString()
        End Using
    End Function
End Class

IronOCR Approach:

// One class handles both images and PDFs — no second client or second resource
public class IronOcrDocumentService
{
    private readonly IronTesseract _ocr;

    public IronOcrDocumentService(IronTesseract ocr)
    {
        _ocr = ocr;
    }

    // Synchronous: returns as soon as local processing completes
    public string ExtractPdfText(string pdfPath)
    {
        using var input = new OcrInput();
        input.LoadPdf(pdfPath);
        return _ocr.Read(input).Text;
    }

    // Specific page range — no per-page billing penalty
    public string ExtractPageRange(string pdfPath, int startPage, int endPage)
    {
        using var input = new OcrInput();
        input.LoadPdfPages(pdfPath, startPage, endPage);
        return _ocr.Read(input).Text;
    }

    // If an async signature is required by an interface or controller
    public Task<string> ExtractPdfTextAsync(string pdfPath)
    {
        return Task.Run(() => ExtractPdfText(pdfPath));
    }
}

VB.NET:

Imports System.Threading.Tasks

' One class handles both images and PDFs — no second client or second resource
Public Class IronOcrDocumentService
    Private ReadOnly _ocr As IronTesseract

    Public Sub New(ocr As IronTesseract)
        _ocr = ocr
    End Sub

    ' Synchronous: returns as soon as local processing completes
    Public Function ExtractPdfText(pdfPath As String) As String
        Using input As New OcrInput()
            input.LoadPdf(pdfPath)
            Return _ocr.Read(input).Text
        End Using
    End Function

    ' Specific page range — no per-page billing penalty
    Public Function ExtractPageRange(pdfPath As String, startPage As Integer, endPage As Integer) As String
        Using input As New OcrInput()
            input.LoadPdfPages(pdfPath, startPage, endPage)
            Return _ocr.Read(input).Text
        End Using
    End Function

    ' If an async signature is required by an interface or controller
    Public Function ExtractPdfTextAsync(pdfPath As String) As Task(Of String)
        Return Task.Run(Function() ExtractPdfText(pdfPath))
    End Function
End Class

The polling loop and its delay logic disappear entirely. LoadPdfPages handles page range selection — no separate per-page call, no transaction counting. The PDF input guide covers stream input, byte array loading, and page range parameters in full detail.

Mapping Azure Word-Level Results to IronOCR Structured Output

Azure Computer Vision returns word bounding boxes as polygons with a variable number of vertices — typically four, but not guaranteed. The result hierarchy is Blocks → Lines → Words, and confidence is a float on a 0.0–1.0 scale. Code that reads word positions must handle the polygon vertex array and normalize the confidence scale for any threshold comparisons.
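
The two normalizations described above (polygon to rectangle, 0.0–1.0 confidence to 0–100) can be isolated from the SDK types. This is a minimal sketch under stated assumptions: plain (X, Y) tuples stand in for the SDK's polygon points, and BoundingRectangle and ToPercent are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: axis-aligned bounding rectangle of a polygon with any vertex count.
(int X, int Y, int Width, int Height) BoundingRectangle(IReadOnlyList<(int X, int Y)> polygon)
{
    if (polygon.Count == 0)
        throw new ArgumentException("Polygon has no vertices.", nameof(polygon));

    int minX = polygon.Min(p => p.X), maxX = polygon.Max(p => p.X);
    int minY = polygon.Min(p => p.Y), maxY = polygon.Max(p => p.Y);
    return (minX, minY, maxX - minX, maxY - minY);
}

// Hypothetical helper: rescale a 0.0–1.0 confidence to the 0–100 scale.
double ToPercent(float confidence) => Math.Round(confidence * 100.0, 2);

// Usage: a slightly skewed four-point word polygon.
var skewedWord = new List<(int X, int Y)> { (12, 40), (96, 36), (98, 60), (14, 64) };
var box = BoundingRectangle(skewedWord);
Console.WriteLine($"{box.X},{box.Y} {box.Width}x{box.Height} at {ToPercent(0.87f)}%");
```

Taking the minimum and maximum over all vertices works for any vertex count, which is why it is safer than assuming exactly four corners in a fixed order.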

Azure Computer Vision Approach:

public async Task<List<WordResult>> ExtractWordPositionsAsync(string imagePath)
{
    using var stream = File.OpenRead(imagePath);
    var imageData = BinaryData.FromStream(stream);

    var response = await _client.AnalyzeAsync(imageData, VisualFeatures.Read);
    var words = new List<WordResult>();

    foreach (var block in response.Value.Read.Blocks)
    {
        foreach (var line in block.Lines)
        {
            foreach (var word in line.Words)
            {
                // BoundingPolygon is a list of ImagePoint — variable vertex count
                var polygon = word.BoundingPolygon;
                int minX = polygon.Min(p => p.X);
                int minY = polygon.Min(p => p.Y);
                int maxX = polygon.Max(p => p.X);
                int maxY = polygon.Max(p => p.Y);

                // WordResult is a positional record, so it is built through its constructor
                words.Add(new WordResult(
                    Text:       word.Text,
                    // Azure confidence: 0.0 to 1.0, multiplied by 100 for comparison
                    Confidence: (double)word.Confidence * 100.0,
                    X:          minX,
                    Y:          minY,
                    Width:      maxX - minX,
                    Height:     maxY - minY));
            }
        }
    }
    return words;
}

public record WordResult(string Text, double Confidence, int X, int Y, int Width, int Height);

VB.NET:

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks
Imports Azure.AI.Vision.ImageAnalysis

Public Class ImageAnalyzer
    Private _client As ImageAnalysisClient ' the same client type used in the C# example

    Public Async Function ExtractWordPositionsAsync(imagePath As String) As Task(Of List(Of WordResult))
        Using stream = File.OpenRead(imagePath)
            Dim imageData = BinaryData.FromStream(stream)

            Dim response = Await _client.AnalyzeAsync(imageData, VisualFeatures.Read)
            Dim words = New List(Of WordResult)()

            For Each block In response.Value.Read.Blocks
                For Each line In block.Lines
                    For Each word In line.Words
                        ' BoundingPolygon is a list of ImagePoint — variable vertex count
                        Dim polygon = word.BoundingPolygon
                        Dim minX = polygon.Min(Function(p) p.X)
                        Dim minY = polygon.Min(Function(p) p.Y)
                        Dim maxX = polygon.Max(Function(p) p.X)
                        Dim maxY = polygon.Max(Function(p) p.Y)

                        words.Add(New WordResult With {
                            .Text = word.Text,
                            ' Azure confidence: 0.0 to 1.0 — multiply by 100 for comparison
                            .Confidence = CDbl(word.Confidence) * 100.0,
                            .X = minX,
                            .Y = minY,
                            .Width = maxX - minX,
                            .Height = maxY - minY
                        })
                    Next
                Next
            Next
            Return words
        End Using
    End Function
End Class

Public Class WordResult
    Public Property Text As String
    Public Property Confidence As Double
    Public Property X As Integer
    Public Property Y As Integer
    Public Property Width As Integer
    Public Property Height As Integer
End Class

IronOCR Approach:

public List<WordResult> ExtractWordPositions(string imagePath)
{
    var result = new IronTesseract().Read(imagePath);
    var words = new List<WordResult>();

    foreach (var page in result.Pages)
    {
        foreach (var line in page.Lines)
        {
            foreach (var word in line.Words)
            {
                // Rectangle-based bounding box — no polygon math required
                // Confidence is already 0–100, matching the converted Azure scale
                // WordResult is a positional record, so it is built through its constructor
                words.Add(new WordResult(
                    Text:       word.Text,
                    Confidence: word.Confidence,   // 0–100, no conversion needed
                    X:          word.X,
                    Y:          word.Y,
                    Width:      word.Width,
                    Height:     word.Height));
            }
        }
    }
    return words;
}

// Filter to only high-confidence words — common post-processing pattern
public IEnumerable<string> ExtractHighConfidenceWords(string imagePath, double threshold = 80.0)
{
    var result = new IronTesseract().Read(imagePath);
    return result.Words
        .Where(w => w.Confidence >= threshold)
        .Select(w => w.Text);
}

public record WordResult(string Text, double Confidence, int X, int Y, int Width, int Height);
Imports System.Collections.Generic
Imports System.Linq

Public Class WordExtractor

    Public Function ExtractWordPositions(imagePath As String) As List(Of WordResult)
        Dim result = New IronTesseract().Read(imagePath)
        Dim words = New List(Of WordResult)()

        For Each page In result.Pages
            For Each line In page.Lines
                For Each word In line.Words
                    ' Rectangle-based bounding box — no polygon math required
                    ' Confidence is already 0–100, matching the converted Azure scale
                    words.Add(New WordResult With {
                        .Text = word.Text,
                        .Confidence = word.Confidence,   ' 0–100, no conversion needed
                        .X = word.X,
                        .Y = word.Y,
                        .Width = word.Width,
                        .Height = word.Height
                    })
                Next
            Next
        Next

        Return words
    End Function

    ' Filter to only high-confidence words — common post-processing pattern
    Public Function ExtractHighConfidenceWords(imagePath As String, Optional threshold As Double = 80.0) As IEnumerable(Of String)
        Dim result = New IronTesseract().Read(imagePath)
        Return result.Words _
            .Where(Function(w) w.Confidence >= threshold) _
            .Select(Function(w) w.Text)
    End Function

End Class

Public Class WordResult
    Public Property Text As String
    Public Property Confidence As Double
    Public Property X As Integer
    Public Property Y As Integer
    Public Property Width As Integer
    Public Property Height As Integer
End Class

The polygon-to-rectangle conversion disappears. IronOCR reports confidence on a 0–100 scale, so thresholds written against Azure's 0.0–1.0 values need one adjustment: multiply them by 100 (a 0.8 cutoff becomes 80). The structured data output guide documents the complete hierarchy and coordinate properties. For the confidence scoring model specifically, see the confidence scores guide.
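If thresholds are stored in configuration on Azure's 0.0–1.0 scale, a small helper can convert them at the boundary rather than editing every call site. The following is a minimal sketch; `ConfidenceScale` and `FromAzure` are hypothetical names introduced here for illustration and are not part of either library.

```csharp
using System;

public static class ConfidenceScale
{
    // Converts a threshold expressed on Azure's 0.0–1.0 scale to a 0–100 scale.
    // Hypothetical helper: not part of IronOCR or the Azure SDK.
    public static double FromAzure(double azureThreshold)
    {
        if (azureThreshold < 0.0 || azureThreshold > 1.0)
            throw new ArgumentOutOfRangeException(
                nameof(azureThreshold), "Expected a value between 0.0 and 1.0.");

        // Rounding removes floating-point noise (0.55 * 100 is not exactly 55)
        // so that later >= comparisons behave as expected.
        return Math.Round(azureThreshold * 100.0, 6);
    }
}
```

A call such as `ExtractHighConfidenceWords(path, ConfidenceScale.FromAzure(0.8))` would then filter at 80 on the 0–100 scale.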

Multi-Page TIFF Processing Without Cloud Upload

Azure Computer Vision's ImageAnalysisClient accepts single images. Multi-frame TIFF files — common in document scanning workflows, fax archives, and medical imaging pipelines — require either splitting the TIFF into individual images before upload (one transaction per frame) or switching to Form Recognizer with its separate configuration. Neither path is clean.

Azure Computer Vision Approach:

// Azure does not support multi-frame TIFF directly via ImageAnalysisClient
// Must split frames manually and upload each as a separate transaction
public async Task<string> ExtractMultiFrameTiffAsync(string tiffPath)
{
    // Load TIFF using System.Drawing or a third-party library
    using var bitmap = new System.Drawing.Bitmap(tiffPath);
    int frameCount = bitmap.GetFrameCount(
        System.Drawing.Imaging.FrameDimension.Page);

    var allText = new StringBuilder();

    for (int i = 0; i < frameCount; i++)
    {
        // Select frame, save to temporary PNG, upload to Azure
        bitmap.SelectActiveFrame(System.Drawing.Imaging.FrameDimension.Page, i);

        using var ms = new MemoryStream();
        bitmap.Save(ms, System.Drawing.Imaging.ImageFormat.Png);
        ms.Position = 0;

        // Each frame = 1 Azure transaction = $0.001
        var imageData = BinaryData.FromStream(ms);
        var result = await _client.AnalyzeAsync(imageData, VisualFeatures.Read);

        foreach (var block in result.Value.Read.Blocks)
            foreach (var line in block.Lines)
                allText.AppendLine(line.Text);
    }

    return allText.ToString();
}
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Text
Imports System.Threading.Tasks

Public Class TiffProcessor
    Private _client As ImageAnalysisClient

    Public Async Function ExtractMultiFrameTiffAsync(tiffPath As String) As Task(Of String)
        ' Load TIFF using System.Drawing or a third-party library
        Using bitmap As New Bitmap(tiffPath)
            Dim frameCount As Integer = bitmap.GetFrameCount(FrameDimension.Page)

            Dim allText As New StringBuilder()

            For i As Integer = 0 To frameCount - 1
                ' Select frame, save to temporary PNG, upload to Azure
                bitmap.SelectActiveFrame(FrameDimension.Page, i)

                Using ms As New MemoryStream()
                    bitmap.Save(ms, Imaging.ImageFormat.Png)
                    ms.Position = 0

                    ' Each frame = 1 Azure transaction = $0.001
                    Dim imageData As BinaryData = BinaryData.FromStream(ms)
                    Dim result = Await _client.AnalyzeAsync(imageData, VisualFeatures.Read)

                    For Each block In result.Value.Read.Blocks
                        For Each line In block.Lines
                            allText.AppendLine(line.Text)
                        Next
                    Next
                End Using
            Next

            Return allText.ToString()
        End Using
    End Function
End Class

IronOCR Approach:

// IronOCR handles multi-frame TIFF natively — single call, no frame splitting
public string ExtractMultiFrameTiff(string tiffPath)
{
    using var input = new OcrInput();
    input.LoadImageFrames(tiffPath);  // all frames loaded automatically
    var result = new IronTesseract().Read(input);
    return result.Text;
}

// Access per-page data for frame-level reporting
public void ExtractTiffWithPageStats(string tiffPath)
{
    using var input = new OcrInput();
    input.LoadImageFrames(tiffPath);

    var ocr = new IronTesseract();
    var result = ocr.Read(input);

    Console.WriteLine($"Total frames processed: {result.Pages.Length}");
    foreach (var page in result.Pages)
    {
        Console.WriteLine($"Frame {page.PageNumber}: " +
            $"{page.Words.Length} words, " +
            $"confidence {page.Confidence:F1}%");
    }
}

// Combine with preprocessing for scanned TIFF archives
public string ExtractLowQualityTiffArchive(string tiffPath)
{
    using var input = new OcrInput();
    input.LoadImageFrames(tiffPath);
    input.Deskew();
    input.DeNoise();
    input.Contrast();

    var result = new IronTesseract().Read(input);
    return result.Text;
}
Imports System

' IronOCR handles multi-frame TIFF natively — single call, no frame splitting
Public Function ExtractMultiFrameTiff(tiffPath As String) As String
    Using input As New OcrInput()
        input.LoadImageFrames(tiffPath)  ' all frames loaded automatically
        Dim result = New IronTesseract().Read(input)
        Return result.Text
    End Using
End Function

' Access per-page data for frame-level reporting
Public Sub ExtractTiffWithPageStats(tiffPath As String)
    Using input As New OcrInput()
        input.LoadImageFrames(tiffPath)

        Dim ocr As New IronTesseract()
        Dim result = ocr.Read(input)

        Console.WriteLine($"Total frames processed: {result.Pages.Length}")
        For Each page In result.Pages
            Console.WriteLine($"Frame {page.PageNumber}: " &
                              $"{page.Words.Length} words, " &
                              $"confidence {page.Confidence:F1}%")
        Next
    End Using
End Sub

' Combine with preprocessing for scanned TIFF archives
Public Function ExtractLowQualityTiffArchive(tiffPath As String) As String
    Using input As New OcrInput()
        input.LoadImageFrames(tiffPath)
        input.Deskew()
        input.DeNoise()
        input.Contrast()

        Dim result = New IronTesseract().Read(input)
        Return result.Text
    End Using
End Function

The per-frame transaction cost drops to zero. The System.Drawing frame extraction loop and its temporary PNG serialization step are eliminated entirely. For fax archive workflows where document quality varies, the TIFF and GIF input guide covers frame selection options, and the image quality correction guide documents the full preprocessing filter set.
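To put a number on the per-frame cost for a given archive, the arithmetic can be sketched as below. This assumes the $1.00 per 1,000 transactions rate quoted in this guide; `AzureOcrCostSketch` and `MonthlyCost` are hypothetical names, and the free allowance is left as a parameter because it depends on the subscription.

```csharp
using System;

public static class AzureOcrCostSketch
{
    // $1.00 per 1,000 transactions, the rate quoted in this guide (an assumption
    // for any other pricing tier).
    public const decimal PricePerTransaction = 0.001m;

    // Estimated monthly charge when every TIFF frame is uploaded as its own transaction.
    public static decimal MonthlyCost(long tiffFiles, int framesPerFile, long freeTransactions = 0)
    {
        if (tiffFiles < 0 || framesPerFile < 0 || freeTransactions < 0)
            throw new ArgumentOutOfRangeException(nameof(tiffFiles), "Inputs must be non-negative.");

        long transactions = tiffFiles * framesPerFile;
        long billable = Math.Max(0L, transactions - freeTransactions);
        return billable * PricePerTransaction;
    }
}
```

For example, `AzureOcrCostSketch.MonthlyCost(50_000, 4, 5_000)` evaluates to 195, which matches the monthly figure given earlier in this guide if a 5,000-transaction allowance is assumed.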

Parallel Batch Processing Without Rate-Limit Queuing

Azure Computer Vision's S1 tier caps throughput at 10 transactions per second. Batch jobs that exceed this rate receive HTTP 429 responses. Production implementations require a rate-limiting wrapper, a semaphore, or a queuing layer to stay within the cap. The IronOCR API is thread-safe by design — create one IronTesseract instance per thread and run with Parallel.ForEach.

Azure Computer Vision Approach:

// Must throttle to 10 TPS to avoid 429 errors on S1 tier
public class ThrottledAzureBatchProcessor
{
    private readonly ImageAnalysisClient _client;
    // Semaphore limits concurrent Azure calls to stay under 10 TPS
    private readonly SemaphoreSlim _throttle = new SemaphoreSlim(10, 10);

    public async Task<Dictionary<string, string>> ProcessBatchAsync(
        IEnumerable<string> imagePaths)
    {
        var results = new ConcurrentDictionary<string, string>();
        var tasks = imagePaths.Select(async path =>
        {
            await _throttle.WaitAsync();
            try
            {
                using var stream = File.OpenRead(path);
                var data = BinaryData.FromStream(stream);
                var response = await _client.AnalyzeAsync(data, VisualFeatures.Read);

                var text = string.Join("\n",
                    response.Value.Read.Blocks
                        .SelectMany(b => b.Lines)
                        .Select(l => l.Text));

                results[path] = text;

                // Respect 1-second window for the 10 TPS ceiling
                await Task.Delay(100);
            }
            catch (RequestFailedException ex) when (ex.Status == 429)
            {
                // Rate limited despite throttling: back off, then record a failure
                await Task.Delay(2000);
                // No retry is attempted in this simplified version; production code would re-queue the path
                results[path] = string.Empty;
            }
            finally
            {
                _throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
        return new Dictionary<string, string>(results);
    }
}
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading
Imports System.Threading.Tasks

' Must throttle to 10 TPS to avoid 429 errors on S1 tier
Public Class ThrottledAzureBatchProcessor
    Private ReadOnly _client As ImageAnalysisClient
    ' Semaphore limits concurrent Azure calls to stay under 10 TPS
    Private ReadOnly _throttle As New SemaphoreSlim(10, 10)

    Public Async Function ProcessBatchAsync(imagePaths As IEnumerable(Of String)) As Task(Of Dictionary(Of String, String))
        Dim results As New ConcurrentDictionary(Of String, String)()
        Dim tasks = imagePaths.Select(Async Function(path)
                                          Await _throttle.WaitAsync()
                                          Try
                                              Using stream = File.OpenRead(path)
                                                  Dim data = BinaryData.FromStream(stream)
                                                  Dim response = Await _client.AnalyzeAsync(data, VisualFeatures.Read)

                                                  Dim text = String.Join(vbLf,
                                                                         response.Value.Read.Blocks _
                                                                         .SelectMany(Function(b) b.Lines) _
                                                                         .Select(Function(l) l.Text))

                                                  results(path) = text

                                                  ' Respect 1-second window for the 10 TPS ceiling
                                                  Await Task.Delay(100)
                                              End Using
                                          Catch ex As RequestFailedException When ex.Status = 429
                                              ' Rate limited despite throttling: back off, then record a failure
                                              Await Task.Delay(2000)
                                              ' No retry is attempted in this simplified version; production code would re-queue the path
                                              results(path) = String.Empty
                                          Finally
                                              _throttle.Release()
                                          End Try
                                      End Function)

        Await Task.WhenAll(tasks)
        Return New Dictionary(Of String, String)(results)
    End Function
End Class

IronOCR Approach:

// No rate limiting needed — throughput is hardware-bound
public Dictionary<string, string> ProcessBatch(IEnumerable<string> imagePaths)
{
    var results = new ConcurrentDictionary<string, string>();

    Parallel.ForEach(imagePaths, imagePath =>
    {
        // Each iteration creates its own IronTesseract instance, so no engine state is shared across threads
        var ocr = new IronTesseract();
        var result = ocr.Read(imagePath);
        results[imagePath] = result.Text;
    });

    return new Dictionary<string, string>(results);
}

// With controlled parallelism for memory-constrained environments
public Dictionary<string, string> ProcessBatchControlled(
    IEnumerable<string> imagePaths, int maxDegreeOfParallelism = 4)
{
    var results = new ConcurrentDictionary<string, string>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

    Parallel.ForEach(imagePaths, options, imagePath =>
    {
        var ocr = new IronTesseract();
        var result = ocr.Read(imagePath);
        results[imagePath] = result.Text;
    });

    return new Dictionary<string, string>(results);
}
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Threading.Tasks

' No rate limiting needed — throughput is hardware-bound
Public Function ProcessBatch(imagePaths As IEnumerable(Of String)) As Dictionary(Of String, String)
    Dim results = New ConcurrentDictionary(Of String, String)()

    Parallel.ForEach(imagePaths, Sub(imagePath)
                                     ' Each iteration creates its own IronTesseract instance, so no engine state is shared across threads
                                     Dim ocr = New IronTesseract()
                                     Dim result = ocr.Read(imagePath)
                                     results(imagePath) = result.Text
                                 End Sub)

    Return New Dictionary(Of String, String)(results)
End Function

' With controlled parallelism for memory-constrained environments
Public Function ProcessBatchControlled(imagePaths As IEnumerable(Of String), Optional maxDegreeOfParallelism As Integer = 4) As Dictionary(Of String, String)
    Dim results = New ConcurrentDictionary(Of String, String)()
    Dim options = New ParallelOptions With {.MaxDegreeOfParallelism = maxDegreeOfParallelism}

    Parallel.ForEach(imagePaths, options, Sub(imagePath)
                                              Dim ocr = New IronTesseract()
                                              Dim result = ocr.Read(imagePath)
                                              results(imagePath) = result.Text
                                          End Sub)

    Return New Dictionary(Of String, String)(results)
End Function

The semaphore, the 100ms delay, and the HTTP 429 catch block are all removed. Parallelism is limited only by CPU cores and available memory, not by a service tier. The multithreading example shows the full pattern with timing comparisons, and the speed optimization guide covers engine configuration tuning for batch workloads.
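The effect of a fixed rate cap on batch duration is simple arithmetic: a job can never finish faster than its request count divided by the cap. A minimal sketch of that lower bound follows; `ThroughputSketch` and `MinimumBatchDuration` are hypothetical names, and the figure ignores network latency and retries, so real runs take longer.

```csharp
using System;

public static class ThroughputSketch
{
    // Lower bound on wall-clock time for a batch when a service caps requests per second.
    // Hypothetical helper for estimation only; it makes no network calls.
    public static TimeSpan MinimumBatchDuration(long requestCount, double maxRequestsPerSecond)
    {
        if (requestCount < 0)
            throw new ArgumentOutOfRangeException(nameof(requestCount));
        if (maxRequestsPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxRequestsPerSecond));

        return TimeSpan.FromSeconds(requestCount / maxRequestsPerSecond);
    }
}
```

With the 10 TPS cap described above, an illustrative batch of 50,000 images gives `MinimumBatchDuration(50_000, 10)`, which is 5,000 seconds, or roughly 83 minutes, before any OCR latency is counted. A local batch has no such floor; its duration depends on the hardware it runs on.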

Preprocessing Low-Quality Scans That Azure Rejects

Azure Computer Vision performs server-side image enhancement, but it is opaque and not configurable. Documents that are too skewed, too noisy, or too low-contrast return low-confidence results or empty text with no way to intervene. IronOCR exposes the preprocessing pipeline directly on OcrInput.

Azure Computer Vision Approach:

// No client-side preprocessing API — must preprocess externally before upload
public async Task<string> ExtractFromLowQualityScanAsync(string imagePath)
{
    // Option 1: Accept whatever Azure returns (may be empty or low-quality)
    using var stream = File.OpenRead(imagePath);
    var imageData = BinaryData.FromStream(stream);

    var result = await _client.AnalyzeAsync(imageData, VisualFeatures.Read);

    // No way to know if the server applied enhancement
    // No confidence on the overall result — only per-word
    var text = string.Join("\n",
        result.Value.Read.Blocks
            .SelectMany(b => b.Lines)
            .Select(l => l.Text));

    if (string.IsNullOrWhiteSpace(text))
    {
        // Option 2: Apply external preprocessing using System.Drawing, SkiaSharp,
        // or ImageMagick, re-serialize to stream, re-upload — second billable transaction
        throw new Exception("Azure returned empty result; manual preprocessing needed");
    }

    return text;
}
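Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks

Public Class Example
    ' Azure client used below; assumed to be supplied elsewhere (for example, by a constructor)
    Private ReadOnly _client As ImageAnalysisClient
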
    Public Async Function ExtractFromLowQualityScanAsync(imagePath As String) As Task(Of String)
        ' Option 1: Accept whatever Azure returns (may be empty or low-quality)
        Using stream = File.OpenRead(imagePath)
            Dim imageData = BinaryData.FromStream(stream)

            Dim result = Await _client.AnalyzeAsync(imageData, VisualFeatures.Read)

            ' No way to know if the server applied enhancement
            ' No confidence on the overall result — only per-word
            Dim text = String.Join(vbLf, result.Value.Read.Blocks _
                .SelectMany(Function(b) b.Lines) _
                .Select(Function(l) l.Text))

            If String.IsNullOrWhiteSpace(text) Then
                ' Option 2: Apply external preprocessing using System.Drawing, SkiaSharp,
                ' or ImageMagick, re-serialize to stream, re-upload — second billable transaction
                Throw New Exception("Azure returned empty result; manual preprocessing needed")
            End If

            Return text
        End Using
    End Function
End Class

IronOCR Approach:

// Preprocessing is part of the same call — no re-upload, no second transaction
public string ExtractFromLowQualityScan(string imagePath)
{
    using var input = new OcrInput();
    input.LoadImage(imagePath);

    // Correct common scanning defects before OCR
    input.Deskew();           // Fix rotated documents
    input.DeNoise();          // Remove scanner noise
    input.Contrast();         // Improve contrast for faded documents
    input.Binarize();         // Convert to black and white

    var ocr = new IronTesseract();
    var result = ocr.Read(input);

    Console.WriteLine($"Confidence after preprocessing: {result.Confidence:F1}%");
    return result.Text;
}

// For severely degraded documents
public string ExtractFromDegradedDocument(string imagePath)
{
    using var input = new OcrInput();
    input.LoadImage(imagePath);
    input.DeepCleanBackgroundNoise();  // Deep learning-based noise removal
    input.Deskew();
    input.Scale(150);                  // Upscale for better character resolution

    var result = new IronTesseract().Read(input);
    return result.Text;
}
Imports System

Public Class OcrProcessor
    ' Preprocessing is part of the same call — no re-upload, no second transaction
    Public Function ExtractFromLowQualityScan(imagePath As String) As String
        Using input As New OcrInput()
            input.LoadImage(imagePath)

            ' Correct common scanning defects before OCR
            input.Deskew()           ' Fix rotated documents
            input.DeNoise()          ' Remove scanner noise
            input.Contrast()         ' Improve contrast for faded documents
            input.Binarize()         ' Convert to black and white

            Dim ocr As New IronTesseract()
            Dim result = ocr.Read(input)

            Console.WriteLine($"Confidence after preprocessing: {result.Confidence:F1}%")
            Return result.Text
        End Using
    End Function

    ' For severely degraded documents
    Public Function ExtractFromDegradedDocument(imagePath As String) As String
        Using input As New OcrInput()
            input.LoadImage(imagePath)
            input.DeepCleanBackgroundNoise()  ' Deep learning-based noise removal
            input.Deskew()
            input.Scale(150)                  ' Upscale for better character resolution

            Dim result = New IronTesseract().Read(input)
            Return result.Text
        End Using
    End Function
End Class

The external preprocessing dependency — System.Drawing, SkiaSharp, or ImageMagick — is removed. The re-upload and second transaction cost disappear. The preprocessing pipeline is part of the OcrInput lifecycle, so it is applied before the OCR engine sees the image. The image filters tutorial walks through each filter with before/after accuracy comparisons.

Azure Computer Vision OCR API to IronOCR Mapping Reference

| Azure Computer Vision | IronOCR Equivalent |
|---|---|
| ImageAnalysisClient | IronTesseract |
| new AzureKeyCredential(apiKey) | IronOcr.License.LicenseKey = key |
| new Uri(endpoint) | Not required |
| client.AnalyzeAsync(data, VisualFeatures.Read) | ocr.Read(imagePath) |
| BinaryData.FromStream(stream) | input.LoadImage(stream) |
| BinaryData.FromBytes(bytes) | input.LoadImage(bytes) |
| result.Value.Read.Blocks | result.Pages[i].Paragraphs |
| block.Lines | result.Pages[i].Lines |
| line.Text | line.Text |
| line.Words | line.Words |
| word.Text | word.Text |
| word.Confidence (0.0–1.0 float) | word.Confidence (0–100 double) |
| word.BoundingPolygon | word.X, word.Y, word.Width, word.Height |
| DocumentAnalysisClient | IronTesseract + OcrInput |
| AnalyzeDocumentAsync(WaitUntil.Completed, "prebuilt-read", stream) | ocr.Read(input) with input.LoadPdf(path) |
| operation.Value.Pages | result.Pages |
| page.Lines[i].Content | result.Lines[i].Text |
| UpdateStatusAsync() polling loop | Not required (synchronous result) |
| RequestFailedException (status 429) | Not applicable (no rate limits) |
| RequestFailedException (status 5xx) | Not applicable (no service errors) |
| VisualFeatures.Read enum flag | Implicit (Read() always extracts text) |
| Form Recognizer prebuilt-read model | Built-in OCR engine (no model selection) |
| Azure endpoint URL in appsettings.json | Not required |
| API key rotation procedures | Not required |
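
The bounding-box row is the one mapping that changes shape: a polygon of points on the Azure side becomes four scalar values on the IronOCR side. When existing code stored or compared polygons, a small adapter can reduce old data to the same rectangle form for comparison. The sketch below is only an illustration; PolygonPoint and BoundingRect are hypothetical stand-in types, not types from the Azure SDK or IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: not types from the Azure SDK or IronOCR.
public readonly record struct PolygonPoint(int X, int Y);
public readonly record struct BoundingRect(int X, int Y, int Width, int Height);

public static class BoundingBoxConverter
{
    // Collapses a polygon (typically four corner points) into the smallest
    // axis-aligned rectangle that contains every point.
    public static BoundingRect ToRect(IReadOnlyList<PolygonPoint> polygon)
    {
        if (polygon is null || polygon.Count == 0)
            throw new ArgumentException("Polygon must contain at least one point.", nameof(polygon));

        int minX = polygon.Min(p => p.X);
        int minY = polygon.Min(p => p.Y);
        int maxX = polygon.Max(p => p.X);
        int maxY = polygon.Max(p => p.Y);

        return new BoundingRect(minX, minY, maxX - minX, maxY - minY);
    }
}
```

A slightly rotated word produces a rectangle a little larger than the original polygon, so any overlap checks against the new values should allow a tolerance of a few pixels.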

Common Migration Issues and Solutions

Issue 1: Async Interface Contracts That Cannot Change

Azure Computer Vision: Service interfaces often declare Task<string> return types because Azure mandates async. Calling code, controllers, and background workers are all written as async methods. Switching to IronOCR removes the need for async in the OCR layer, but changing every interface signature is not always feasible in a large codebase.

Solution: Wrap the synchronous IronOCR call in Task.Run to satisfy the existing interface without cascading refactors:

// Existing interface — do not change it
public interface IOcrService
{
    Task<string> ReadAsync(string imagePath);
}

// New IronOCR implementation — fulfills the contract
public class IronOcrService : IOcrService
{
    private readonly IronTesseract _ocr;

    public IronOcrService(IronTesseract ocr) => _ocr = ocr;

    public Task<string> ReadAsync(string imagePath)
    {
        // Task.Run offloads to thread pool — no await chain needed
        return Task.Run(() => _ocr.Read(imagePath).Text);
    }
}
Imports System.Threading.Tasks

' Existing interface — do not change it
Public Interface IOcrService
    Function ReadAsync(imagePath As String) As Task(Of String)
End Interface

' New IronOCR implementation — fulfills the contract
Public Class IronOcrService
    Implements IOcrService

    Private ReadOnly _ocr As IronTesseract

    Public Sub New(ocr As IronTesseract)
        _ocr = ocr
    End Sub

    Public Function ReadAsync(imagePath As String) As Task(Of String) Implements IOcrService.ReadAsync
        ' Task.Run offloads to thread pool — no await chain needed
        Return Task.Run(Function() _ocr.Read(imagePath).Text)
    End Function
End Class

This is a valid intermediate step. The async OCR guide covers IronOCR's built-in async support for scenarios where full async integration is preferred.
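
One caveat with the Task.Run wrapper is that OCR is CPU-bound, so a burst of requests can queue a large amount of work on the thread pool. A possible refinement, sketched here under assumptions, is to cap concurrency with a SemaphoreSlim. BoundedAsyncReader is a hypothetical helper, and the delegate stands in for something like path => ocr.Read(path).Text.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: wraps any synchronous text-extraction delegate behind
// an async method while capping how many calls run at the same time.
public sealed class BoundedAsyncReader : IDisposable
{
    private readonly Func<string, string> _readSync;
    private readonly SemaphoreSlim _gate;

    public BoundedAsyncReader(Func<string, string> readSync, int maxConcurrency)
    {
        _readSync = readSync ?? throw new ArgumentNullException(nameof(readSync));
        _gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    public async Task<string> ReadAsync(string imagePath, CancellationToken ct = default)
    {
        // Wait for a free slot without blocking the calling thread.
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // The CPU-bound work runs on the thread pool.
            return await Task.Run(() => _readSync(imagePath), ct).ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

A reasonable starting value for maxConcurrency is Environment.ProcessorCount, adjusted after measuring on the target hardware.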

Issue 2: Confidence Threshold Logic Producing Wrong Results

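Azure Computer Vision: Azure returns word confidence as a float between 0.0 and 1.0. Existing filtering code uses thresholds like word.Confidence > 0.85f. After migration, these comparisons evaluate to true for almost every word, because IronOCR reports confidence on a 0–100 scale and even a poorly recognized word scores well above 0.85. The filter compiles and runs but no longer filters anything.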

Solution: Multiply existing Azure thresholds by 100 when updating filtering logic:

// Before: Azure threshold (0.0 - 1.0 scale)
var highConfidenceWords = azureWords
    .Where(w => w.Confidence > 0.85f)
    .Select(w => w.Text);

// After: IronOCR threshold (0 - 100 scale)
var result = new IronTesseract().Read(imagePath);
var highConfidenceWords = result.Words
    .Where(w => w.Confidence > 85.0)
    .Select(w => w.Text);

// Overall document confidence — also on 0-100 scale
if (result.Confidence < 70.0)
{
    // Document may need preprocessing or manual review
}
Imports System.Linq

' Before: Azure threshold (0.0 - 1.0 scale)
Dim highConfidenceWords = azureWords _
    .Where(Function(w) w.Confidence > 0.85F) _
    .Select(Function(w) w.Text)

' After: IronOCR threshold (0 - 100 scale)
Dim result = New IronTesseract().Read(imagePath)
Dim highConfidenceWords = result.Words _
    .Where(Function(w) w.Confidence > 85.0) _
    .Select(Function(w) w.Text)

' Overall document confidence — also on 0-100 scale
If result.Confidence < 70.0 Then
    ' Document may need preprocessing or manual review
End If
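
When thresholds live in configuration rather than in code, converting them in one place can be safer than editing each value by hand. The helper below is a minimal sketch; ConfidenceScale is a hypothetical name, and the range check exists to catch a value that has already been converted.

```csharp
using System;

public static class ConfidenceScale
{
    // Converts a threshold written for a 0.0-1.0 scale to a 0-100 scale.
    // Rejects out-of-range input so an already converted value (for example 85)
    // is not multiplied a second time.
    public static double ToPercentScale(double unitScaleThreshold)
    {
        if (unitScaleThreshold < 0.0 || unitScaleThreshold > 1.0)
            throw new ArgumentOutOfRangeException(
                nameof(unitScaleThreshold), "Expected a value between 0.0 and 1.0.");

        return unitScaleThreshold * 100.0;
    }
}
```

A filter would then read the old setting once, for example `double threshold = ConfidenceScale.ToPercentScale(0.85);`, and compare word confidence against that value.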

Issue 3: Form Recognizer Prebuilt Model Field Extraction Has No Direct IronOCR Equivalent

Azure Computer Vision: Form Recognizer's prebuilt invoice and receipt models extract named fields automatically — InvoiceTotal, VendorName, InvoiceDate — without specifying where those fields appear on the page. The extraction logic is embedded in the Azure model.

Solution: Replace model-based field extraction with region-based OCR using CropRectangle. This requires knowing the document layout, but most real-world deployments already have fixed templates:

var ocr = new IronTesseract();

// Define extraction zones for a known invoice template
var headerZone    = new CropRectangle(50,  40,  400, 60);
var totalZone     = new CropRectangle(350, 600, 250, 50);
var dateZone      = new CropRectangle(400, 100, 200, 40);

string header, total, date;

using (var input = new OcrInput())
{
    input.LoadImage("invoice.jpg", headerZone);
    header = ocr.Read(input).Text.Trim();
}

using (var input = new OcrInput())
{
    input.LoadImage("invoice.jpg", totalZone);
    total = ocr.Read(input).Text.Trim();
}

using (var input = new OcrInput())
{
    input.LoadImage("invoice.jpg", dateZone);
    date = ocr.Read(input).Text.Trim();
}
Imports IronOcr

Dim ocr As New IronTesseract()

' Define extraction zones for a known invoice template
Dim headerZone As New CropRectangle(50, 40, 400, 60)
Dim totalZone As New CropRectangle(350, 600, 250, 50)
Dim dateZone As New CropRectangle(400, 100, 200, 40)

Dim header As String
Dim total As String
' Date is a reserved keyword in VB.NET, so the variable is named invoiceDate
Dim invoiceDate As String

Using input As New OcrInput()
    input.LoadImage("invoice.jpg", headerZone)
    header = ocr.Read(input).Text.Trim()
End Using

Using input As New OcrInput()
    input.LoadImage("invoice.jpg", totalZone)
    total = ocr.Read(input).Text.Trim()
End Using

Using input As New OcrInput()
    input.LoadImage("invoice.jpg", dateZone)
    invoiceDate = ocr.Read(input).Text.Trim()
End Using

The region-based OCR guide covers coordinate system details and multi-region batching.
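
Region-based OCR returns raw strings, whereas the prebuilt models returned typed fields, so a parsing step usually follows the zone reads. The sketch below shows one way to turn zone text into a decimal and a DateTime. ZoneTextParser, the amount pattern, and the accepted date formats are all assumptions for illustration and would need adjusting to the actual template.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ZoneTextParser
{
    // Date layouts accepted by TryParseDate; these formats are assumptions.
    private static readonly string[] DateFormats = { "yyyy-MM-dd", "MM/dd/yyyy", "d MMM yyyy" };

    // Pulls the first amount such as "1,234.56" out of text like "Total: $1,234.56".
    public static bool TryParseAmount(string zoneText, out decimal amount)
    {
        amount = 0m;
        var match = Regex.Match(zoneText ?? string.Empty, @"\d[\d,]*\.\d{2}");
        return match.Success && decimal.TryParse(
            match.Value,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out amount);
    }

    // Parses the trimmed zone text against the known date layouts.
    public static bool TryParseDate(string zoneText, out DateTime date)
    {
        return DateTime.TryParseExact(
            (zoneText ?? string.Empty).Trim(),
            DateFormats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out date);
    }
}
```

A failed parse is a useful signal in its own right: it usually means the zone coordinates drifted from the template or the scan needs preprocessing.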

Issue 4: Missing hOCR and Structured Export

Azure Computer Vision: Azure does not provide hOCR output. Teams that need standardized layout data for downstream document analysis tools extract bounding boxes manually from the Azure response and construct their own output format.

Solution: IronOCR produces standards-compliant hOCR in one call:

var result = new IronTesseract().Read("document.jpg");

// Write hOCR file — recognized by most document analysis tools
result.SaveAsHocrFile("document.hocr");

// Searchable PDF — alternative for archive and search indexing workflows
result.SaveAsSearchablePdf("document-searchable.pdf");
Dim result = (New IronTesseract()).Read("document.jpg")

' Write hOCR file — recognized by most document analysis tools
result.SaveAsHocrFile("document.hocr")

' Searchable PDF — alternative for archive and search indexing workflows
result.SaveAsSearchablePdf("document-searchable.pdf")

Issue 5: Azure SDK Version Conflicts with Other Azure Packages

Azure Computer Vision: Projects that use multiple Azure SDK packages (Azure.Storage.Blobs, Azure.Identity, Azure.Security.KeyVault.Secrets) can encounter version conflicts between Azure.Core transitive dependencies. The Azure SDK's monorepo versioning policy helps but does not eliminate all conflicts, particularly when mixing GA and preview SDK versions.

Solution: Removing Azure.AI.Vision.ImageAnalysis and Azure.AI.FormRecognizer eliminates those SDK packages from the dependency tree. If the project uses Azure only for OCR, the entire Azure.* dependency set is removed. If other Azure services remain, the reduced package count lowers the surface area for conflicts:

# Remove only the OCR-related Azure packages
dotnet remove package Azure.AI.Vision.ImageAnalysis
dotnet remove package Azure.AI.FormRecognizer

# Verify remaining Azure packages have no new conflicts
dotnet restore
dotnet build

Issue 6: Scanned Document Quality Below Azure Acceptance Threshold

Azure Computer Vision: Very low-resolution images (below ~150 DPI) or severely skewed scans return minimal or empty text from Azure's server-side pipeline with no feedback about what enhancement was attempted. The caller has no way to improve the result without external preprocessing.

Solution: Use IronOCR's preprocessing pipeline to prepare the image before OCR:

using var input = new OcrInput();
input.LoadImage("low-quality-scan.jpg");
input.Deskew();
input.DeNoise();
input.Scale(200);       // Upscale to improve character resolution
input.Contrast();
input.Sharpen();

var result = new IronTesseract().Read(input);
Console.WriteLine($"Extraction confidence: {result.Confidence:F1}%");
Imports IronOcr

Using input As New OcrInput()
    input.LoadImage("low-quality-scan.jpg")
    input.Deskew()
    input.DeNoise()
    input.Scale(200) ' Upscale to improve character resolution
    input.Contrast()
    input.Sharpen()

    Dim result = New IronTesseract().Read(input)
    Console.WriteLine($"Extraction confidence: {result.Confidence:F1}%")
End Using

The image orientation correction guide covers deskew and rotation detection specifically.
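
Applying every filter to every scan costs processing time that clean documents do not need. One option is to escalate: read the image plainly first and add filters only when confidence falls short. The sketch below captures that control flow with delegates so it stays independent of any OCR library. OcrAttempt and OcrEscalation are hypothetical names, and each pass is assumed to wrap an OcrInput configured with progressively more filters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result carrier: text plus a confidence on the 0-100 scale.
public readonly record struct OcrAttempt(string Text, double Confidence);

public static class OcrEscalation
{
    // Runs each pass in order and stops at the first result whose confidence
    // reaches minConfidence. If no pass reaches it, returns the best result seen.
    public static OcrAttempt ReadWithEscalation(
        IEnumerable<Func<OcrAttempt>> passes, double minConfidence)
    {
        OcrAttempt? best = null;

        foreach (var pass in passes)
        {
            var attempt = pass();

            if (best is null || attempt.Confidence > best.Value.Confidence)
                best = attempt;

            if (attempt.Confidence >= minConfidence)
                break; // Good enough: skip the more expensive passes.
        }

        return best ?? throw new ArgumentException("At least one pass is required.", nameof(passes));
    }
}
```

In practice the first pass might read the image with no filters, the second might add Deskew and DeNoise, and the third might add Scale, Contrast, and Sharpen, each returning `new OcrAttempt(result.Text, result.Confidence)`.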

Azure Computer Vision OCR Migration Checklist

Pre-Migration

Locate all Azure Computer Vision and Form Recognizer usage in the codebase:

# Find all Azure OCR-related using statements
grep -r "Azure.AI.Vision.ImageAnalysis" --include="*.cs" .
grep -r "Azure.AI.FormRecognizer" --include="*.cs" .
grep -r "ImageAnalysisClient" --include="*.cs" .
grep -r "DocumentAnalysisClient" --include="*.cs" .
grep -r "AnalyzeAsync" --include="*.cs" .
grep -r "AnalyzeDocumentAsync" --include="*.cs" .
grep -r "VisualFeatures.Read" --include="*.cs" .
grep -r "WaitUntil.Completed" --include="*.cs" .
grep -r "UpdateStatusAsync" --include="*.cs" .
grep -r "AzureKeyCredential" --include="*.cs" .

Identify configuration files containing Azure OCR endpoints and keys:

grep -r "cognitiveservices.azure.com" --include="*.json" .
grep -r "AzureComputerVision\|FormRecognizer" --include="*.json" .
grep -r "ComputerVision\|FormRecognizer" --include="appsettings*.json" .

Inventory items before coding begins:

  • Count the classes implementing Azure OCR service patterns
  • Count the async method chains that propagate from OCR calls
  • Identify any word confidence thresholds using the 0.0–1.0 Azure scale
  • Identify Form Recognizer prebuilt model usage (invoice, receipt, identity) requiring region-based replacement
  • Identify multi-frame TIFF inputs currently split for per-frame upload
  • Check Docker and CI/CD configurations for outbound Azure network rules that will no longer be needed

Code Migration

  1. Remove Azure.AI.Vision.ImageAnalysis NuGet package from all projects that use it
  2. Remove Azure.AI.FormRecognizer NuGet package from all projects that use it
  3. Run dotnet add package IronOcr in each affected project
  4. Add IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE") at application startup
  5. Replace using Azure; using Azure.AI.Vision.ImageAnalysis; with using IronOcr;
  6. Register IronTesseract as a singleton in DI (services.AddSingleton<IronTesseract>())
  7. Remove AzureComputerVisionOptions or equivalent configuration classes; remove the appsettings.json Azure OCR sections
  8. Replace ImageAnalysisClient constructor injection with IronTesseract injection in all service classes
  9. Replace AnalyzeAsync(BinaryData.FromStream(stream), VisualFeatures.Read) calls with ocr.Read(imagePath) or ocr.Read(input) with OcrInput as appropriate
  10. Remove all UpdateStatusAsync polling loops and WaitUntil.Completed patterns; replace DocumentAnalysisClient with IronTesseract + OcrInput.LoadPdf()
  11. Update word confidence threshold comparisons: multiply all Azure 0.0–1.0 values by 100
  12. Replace polygon bounding box access (word.BoundingPolygon.Min/Max) with direct word.X, word.Y, word.Width, word.Height properties
  13. Replace multi-frame TIFF per-frame upload loops with input.LoadImageFrames(tiffPath)
  14. Convert Form Recognizer prebuilt model field extraction to CropRectangle-based region OCR for known document templates
  15. Remove RequestFailedException catch blocks for HTTP 429 and 5xx; simplify error handling to file system and input validation exceptions only

Post-Migration

  • Verify plain text extraction output matches or improves on Azure results for a representative sample of 20+ documents
  • Confirm multi-page PDF processing produces text for all pages without per-page billing counters in monitoring
  • Test multi-frame TIFF processing returns the same page count as the source frame count
  • Validate word confidence values fall in the 0–100 range and that threshold comparisons behave correctly
  • Verify word bounding box coordinates align correctly with the source image coordinate system
  • Run the batch processing path and confirm it completes without rate-limit errors or semaphore contention
  • Test in an environment with no outbound internet access to confirm no Azure endpoint calls occur
  • Confirm Docker and container deployments start successfully without outbound firewall rules for cognitiveservices.azure.com
  • Check that DI container construction succeeds and IronTesseract singleton is correctly resolved
  • Verify the IRONOCR_LICENSE environment variable is set in all deployment environments and that OCR produces licensed (non-watermarked) output
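
The first post-migration check, comparing new output against saved Azure results, is easier to repeat when it produces a number. A rough sketch of one such measure is word-level Jaccard similarity, shown below. OcrOutputComparer is a hypothetical helper, and the metric ignores word order and repetition, so it is a screening tool rather than a full accuracy measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrOutputComparer
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // Splits text on whitespace into a case-insensitive set of distinct words.
    private static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(
            (text ?? string.Empty).Split(Separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

    // Jaccard similarity over distinct words: shared words divided by all words,
    // ranging from 0.0 (nothing in common) to 1.0 (same vocabulary).
    public static double WordJaccard(string baseline, string candidate)
    {
        var a = Tokenize(baseline);
        var b = Tokenize(candidate);
        if (a.Count == 0 && b.Count == 0) return 1.0;

        int intersection = a.Count(word => b.Contains(word));
        int union = a.Count + b.Count - intersection;
        return (double)intersection / union;
    }
}
```

Documents that score below a chosen cutoff can then be reviewed by hand, which keeps the manual part of the 20+ document sample small.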

Key Benefits of Migrating to IronOCR

Cost becomes predictable. Azure Computer Vision's per-transaction meter runs continuously. A single month of high document volume can exceed the annual cost of an IronOCR license. After migration, the OCR budget is a fixed line item. Processing volume can grow — due to business expansion, bulk reprocessing, or temporary spikes — without generating a larger invoice. The IronOCR licensing page shows tier options from $999 for a single-developer license to $2,999 for unlimited developers.

The application stack becomes simpler. Removing Azure.AI.Vision.ImageAnalysis and Azure.AI.FormRecognizer eliminates two NuGet packages, two Azure resource configurations, two sets of credentials, and the entire async polling infrastructure. Service classes that previously required an async Task<string> signature because of cloud I/O become synchronous methods. Error handling narrows from network failures, rate limits, and service availability to file system and input validation. Every future developer working in this codebase encounters less code to understand.

Document data stays within the organization's control. After migration, OCR processing is a local operation. Documents do not cross an organizational boundary, do not appear in Azure telemetry, and are not subject to Microsoft's data retention or processing policies. HIPAA-covered entities, ITAR contractors, GDPR-regulated organizations, and any team with data residency requirements can process documents without compliance review of a cloud third party. The Linux deployment guide and Docker deployment guide show how to deploy IronOCR in restricted environments.

Batch throughput scales to hardware. The 10 TPS ceiling from Azure's S1 tier is a hard limit that requires queuing, throttling, or tier upgrades to work around. After migration, concurrent OCR jobs saturate available CPU cores without any service-imposed cap. A four-core server can run four parallel IronTesseract instances simultaneously. The multithreading example demonstrates the pattern and shows throughput scaling with core count.

Preprocessing defects are solvable in code. Azure Computer Vision's server-side image enhancement is a black box. When a scan returns empty or low-confidence output, the only options are to accept it or preprocess externally before re-uploading at additional cost. IronOCR's OcrInput pipeline exposes deskew, denoise, contrast, binarization, scale, sharpen, and deep noise removal as first-class methods. Problematic scan types become tunable parameters. The preprocessing features page lists the complete filter set with guidance on which filters address which scan defects.

Please note: Azure Computer Vision and Tesseract are registered trademarks of their respective owners. This site is not affiliated with, endorsed by, or sponsored by Google or Microsoft. All product names, logos, and brands are property of their respective owners. Comparisons are for informational purposes only and reflect publicly available information at the time of writing.

Frequently Asked Questions

Why should I migrate from Azure Computer Vision OCR to IronOCR?

Common drivers include eliminating endpoint and API key management, avoiding per-page billing, removing async polling code, keeping documents on local infrastructure, and running OCR in Docker or restricted-network environments without outbound access to Azure.

What are the main code changes when migrating from Azure Computer Vision OCR to IronOCR?

Replace ImageAnalysisClient and DocumentAnalysisClient construction with IronTesseract instantiation, remove credential and endpoint setup along with the async polling loops, and update result property names and confidence thresholds. The result is significantly fewer boilerplate lines.

How do I install IronOCR to begin the migration?

Run 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Language packs are separate packages: 'dotnet add package IronOcr.Languages.French' for French, for example.

Does IronOCR match the OCR accuracy of Azure Computer Vision OCR for standard business documents?

IronOCR achieves high accuracy for standard business content including invoices, contracts, receipts, and typed forms. Image preprocessing filters (deskew, noise removal, contrast enhancement) further improve recognition on degraded input.

How does IronOCR handle language data now that recognition runs locally?

Language data in IronOCR is distributed as NuGet packages. 'dotnet add package IronOcr.Languages.German' installs German support. No manual file placement or directory paths are involved.

Does migrating from Azure Computer Vision OCR to IronOCR require changes to deployment infrastructure?

IronOCR requires less infrastructure than Azure Computer Vision OCR. There are no Azure resources to provision, endpoint URLs to configure, or outbound firewall rules to maintain. The NuGet package contains the complete OCR engine, and the license key is a string set in application code.

How do I configure IronOCR licensing after migration?

Assign IronOcr.License.LicenseKey = "YOUR-KEY" in application startup code. In Docker or Kubernetes, store the key as an environment variable and read it in startup. Use License.IsValidLicense to validate before accepting traffic.
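
A container that starts without its license variable is easier to diagnose when it fails at startup instead of at the first OCR call. The sketch below reads the variable and fails fast. LicenseBootstrap is a hypothetical helper, and the variable name IRONOCR_LICENSE follows the migration checklist in this guide.

```csharp
using System;

public static class LicenseBootstrap
{
    // Reads the key from an environment variable and throws when it is absent,
    // so a misconfigured deployment stops during startup.
    public static string RequireLicenseKey(string variableName = "IRONOCR_LICENSE")
    {
        var key = Environment.GetEnvironmentVariable(variableName);

        if (string.IsNullOrWhiteSpace(key))
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");

        return key.Trim();
    }
}
```

Startup code would then assign the returned value, for example `IronOcr.License.LicenseKey = LicenseBootstrap.RequireLicenseKey();`, before any OCR call runs.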

Can IronOCR process PDFs the same way Azure Computer Vision does?

Yes. IronOCR reads both native and scanned PDFs. Instantiate IronTesseract, call ocr.Read(input) where input is a PDF path or OcrPdfInput, and iterate the OcrResult pages. No separate PDF rendering pipeline is required.

How does IronOCR handle threading in high-volume processing?

IronTesseract is safe to instantiate per-thread. Spin up one instance per thread in a Parallel.ForEach or Task pool, run OCR concurrently, and dispose each instance when done. No global state or locking is required.
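
The per-thread pattern described above maps onto the thread-local overload of Parallel.ForEach, which creates one state object per worker and reuses it across items. The sketch below is written against delegates so it compiles without any OCR package. ParallelOcrRunner is a hypothetical helper; with IronOCR the factory would presumably be () => new IronTesseract() and the read delegate (ocr, path) => ocr.Read(path).Text.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelOcrRunner
{
    // createEngine builds one engine per worker; read runs one file through it.
    public static IReadOnlyDictionary<string, string> ReadAll<TEngine>(
        IEnumerable<string> paths,
        Func<TEngine> createEngine,
        Func<TEngine, string, string> read)
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        Parallel.ForEach(
            paths,
            options,
            createEngine,                                   // localInit: one engine per worker
            (path, loopState, engine) =>
            {
                results[path] = read(engine, path);
                return engine;                              // reuse the engine for the next item
            },
            engine => (engine as IDisposable)?.Dispose());  // localFinally: dispose if supported

        return results;
    }
}
```

Capping MaxDegreeOfParallelism at the core count is a starting point; the best value depends on image size and available memory.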

What output formats does IronOCR support after text extraction?

IronOCR returns structured results including text, word coordinates, confidence scores, and page structure. Export options include plain text, searchable PDF, hOCR, and structured result objects for downstream processing.

Is IronOCR pricing more predictable than Azure Computer Vision OCR for scaling workloads?

IronOCR uses flat-rate perpetual licensing with no per-page or volume charges. Whether you process 10,000 or 10 million pages, the license cost remains constant. Volume and team licensing options are on the IronOCR pricing page.

What happens to my existing tests after migrating from Azure Computer Vision OCR to IronOCR?

Tests that assert on extracted text content should continue to pass after migration. Tests that validate Azure API call patterns, such as mocked ImageAnalysisClient calls or polling behavior, will need updating to reflect IronOCR's simpler initialization and result model.

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD from Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering ...
