Migrating from Kofax OmniPage to IronOCR
This guide is for .NET developers who are moving OCR workloads off the Kofax OmniPage Capture SDK (now marketed as Tungsten Automation) and onto IronOCR. It covers the complete migration path: removing the SDK installer dependency, eliminating the engine lifecycle ceremony, replacing zone-based recognition patterns, and modernizing error handling. No prior reading of the comparison article is required.
Why Migrate from Kofax OmniPage (Tungsten)
The OmniPage Capture SDK was designed for enterprise document management infrastructure in an era before NuGet, Docker, and CI/CD pipelines. Each of its architectural choices that made sense in 2005 creates friction in 2026.
The SDK installer has no place in a NuGet world. Every development machine, build agent, and production server that runs OmniPage code requires the Tungsten SDK installer to execute before compilation. There is no .csproj package reference to restore. Upgrading the SDK version means re-running the installer across the entire fleet. A new developer cannot run the project without contacting whoever manages the SDK license and waiting for installer access.
The license file is an operational liability. A .lic file must be present at a specific path on every machine that calls engine.Initialize(). Deploy to a new server and forget the file, and every OCR call fails with a license exception — not an OCR error. Use floating licenses and a network partition between your application server and the license server means every Initialize() call fails until connectivity restores, regardless of whether that machine successfully validated its license yesterday.
Engine lifecycle management leaks resources on error paths. OmniPageEngine.Shutdown() must be called in every possible code path — including exception handlers, early returns, and timeouts — or a floating license seat stays locked until the license server's checkout timeout expires. Writing defensively around this constraint means wrapping every integration point in IDisposable patterns that exist solely to protect against the engine shutdown being skipped.
Hardware fingerprinting conflicts with modern deployment. OmniPage activation ties to hardware. Container restarts, VM migrations, and cloud autoscaling all involve hardware identity changes that trigger reactivation requirements. A deployment model incompatible with autoscaling is a deployment model incompatible with cloud infrastructure.
Per-page pricing is unpredictable at scale. A document processing spike — a new client onboarded, a backlog batch submitted — generates an overage invoice against a quota nobody budgeted for. IronOCR is perpetual and unlimited within the licensed tier: no per-page meter, no quota, no overage invoices.
The procurement process blocks shipping. The OmniPage SDK is sales-gated. Evaluation access, pricing, and contract terms all require a sales engagement that runs 4–12 weeks. Teams building OCR features under a product delivery deadline cannot wait for an enterprise procurement cycle. dotnet add package IronOcr resolves in 30 seconds. See the IronOCR licensing page for published, self-service pricing — $999 Lite through $2,999 Enterprise, all perpetual.
The Fundamental Problem
OmniPage requires a full initialization ceremony before the first byte is read, and a shutdown ceremony after the last — every call site must manage both:
// OmniPage: Ceremony required on EVERY entry point — init, process, shutdown
using var engine = new OmniPageEngine();
engine.SetLicenseFile(@"C:\Program Files\OmniPage\license.lic"); // File must exist here
engine.Initialize(); // Network call to license server — can fail for license reasons, not OCR reasons
var document = engine.CreateDocument();
document.AddPage("invoice.jpg");
document.Recognize(new RecognitionSettings { Language = "English" });
string text = document.GetText();
document.Dispose(); // Must not be skipped
engine.Shutdown(); // Must not be skipped — omitting this locks a floating seat
// OmniPage: Ceremony required on EVERY entry point — init, process, shutdown
using var engine = new OmniPageEngine();
engine.SetLicenseFile(@"C:\Program Files\OmniPage\license.lic"); // File must exist here
engine.Initialize(); // Network call to license server — can fail for license reasons, not OCR reasons
var document = engine.CreateDocument();
document.AddPage("invoice.jpg");
document.Recognize(new RecognitionSettings { Language = "English" });
string text = document.GetText();
document.Dispose(); // Must not be skipped
engine.Shutdown(); // Must not be skipped — omitting this locks a floating seat
Imports OmniPage
' OmniPage: Ceremony required on EVERY entry point — init, process, shutdown
Using engine As New OmniPageEngine()
engine.SetLicenseFile("C:\Program Files\OmniPage\license.lic") ' File must exist here
engine.Initialize() ' Network call to license server — can fail for license reasons, not OCR reasons
Dim document = engine.CreateDocument()
document.AddPage("invoice.jpg")
document.Recognize(New RecognitionSettings With {.Language = "English"})
Dim text As String = document.GetText()
document.Dispose() ' Must not be skipped
engine.Shutdown() ' Must not be skipped — omitting this locks a floating seat
End Using
// IronOCR: One NuGet package, one line at startup, one call to read
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"; // At app startup — once, ever
var text = new IronTesseract().Read("invoice.jpg").Text;
// IronOCR: One NuGet package, one line at startup, one call to read
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"; // At app startup — once, ever
var text = new IronTesseract().Read("invoice.jpg").Text;
Imports IronOcr
' IronOCR: One NuGet package, one line at startup, one call to read
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY" ' At app startup — once, ever
Dim text = New IronTesseract().Read("invoice.jpg").Text
The OmniPage version has eight distinct steps, two mandatory cleanup calls, a network dependency, and a file system dependency. The IronOCR version has two.
IronOCR vs Kofax OmniPage (Tungsten): Feature Comparison
The table below covers the capabilities most relevant to teams evaluating this migration.
| Feature | Kofax OmniPage SDK | IronOCR |
|---|---|---|
| Installation method | Custom SDK installer (no NuGet) | dotnet add package IronOcr |
| Time to first OCR call | 4–12 weeks (procurement) + hours (setup) | 30 seconds |
| License mechanism | .lic file on disk + optional license server |
String key at app startup |
| Hardware fingerprinting | Yes (reactivation on VM migration) | No |
| License server required | Yes (for floating licenses) | No |
| Per-page runtime billing | Available (variable) | No |
| Published pricing | No (contact sales) | Yes ($999 / $1,499 / $2,999 perpetual) |
| Annual maintenance fees | 18–25% of license cost | Optional |
| Engine lifecycle management | Required (Initialize / Shutdown) |
Not required |
| Windows x64 | Yes | Yes |
| Linux | Yes (added January 2026) | Yes (all versions) |
| macOS | No | Yes |
| Docker / Kubernetes | Difficult (installer + license file in image) | Straightforward (NuGet package only) |
| Azure / AWS Lambda | Complex (license server reachability) | Straightforward |
| Image input formats | Yes | JPG, PNG, BMP, TIFF, GIF, and more |
| Native PDF input | Yes (may require module license) | Yes (built-in, no extra license) |
| Multi-page TIFF | Yes | Yes |
| Searchable PDF output | Yes | Yes (result.SaveAsSearchablePdf()) |
| Automatic preprocessing | Settings-object configuration | Built-in + explicit pipeline API |
| Languages supported | 120+ | 125+ (separate NuGet packages) |
| Structured data output | Yes (word coordinates) | Pages, Paragraphs, Lines, Words, Characters with coordinates |
| Confidence scoring | Yes | Yes (result.Confidence, per-word) |
| Barcode reading | Add-on module (separate license) | Built-in (ocr.Configuration.ReadBarCodes = true) |
| Thread safety | Complex (engine sharing constraints) | Full (one IronTesseract per thread) |
| CI/CD pipeline support | Requires installer on every agent | Standard dotnet restore |
Quick Start: Kofax OmniPage to IronOCR Migration
Step 1: Replace the SDK with a NuGet Package
OmniPage has no NuGet package to remove. Remove the manual DLL references from the .csproj file and uninstall the SDK from development machines when the migration is complete. Then install IronOCR:
dotnet add package IronOcr
The IronOCR NuGet package bundles the OCR engine, preprocessing filters, and runtime components. No separate installer, no native DLL management, no component registration.
Step 2: Update Namespaces
// Before (Kofax OmniPage)
using Kofax.OmniPage.CSDK;
using Kofax.OmniPageCSDK;
using CSDK;
// After (IronOCR)
using IronOcr;
// Before (Kofax OmniPage)
using Kofax.OmniPage.CSDK;
using Kofax.OmniPageCSDK;
using CSDK;
// After (IronOCR)
using IronOcr;
Imports IronOcr
Step 3: Initialize License
Add one line at application startup. This replaces the .lic file path, the license server configuration, and the engine.Initialize() call:
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"
Store the key in an environment variable (IRONOCR_LICENSE_KEY) or appsettings.json rather than source code. The key is read offline — no network call occurs at runtime.
Code Migration Examples
Engine Initialization Error Path Replacement
The most dangerous aspect of OmniPage's lifecycle model is not the happy path — it is the error path. Any exception thrown between engine.Initialize() and engine.Shutdown() can leave a floating license seat locked if Shutdown() is not reached. Production code requires try/finally blocks around every document processing call:
Kofax OmniPage Approach:
// OmniPage: try/finally required everywhere to protect license seat release
public string ProcessInvoice(string imagePath)
{
var engine = new OmniPageEngine();
engine.SetLicenseFile(@"C:\Program Files\OmniPage\license.lic");
try
{
engine.Initialize(); // Contacts license server — failure here locks nothing
var document = engine.CreateDocument();
try
{
document.AddPage(imagePath);
document.Recognize(new RecognitionSettings { Language = "English" });
return document.GetText();
}
catch (RecognitionException ex)
{
// Log, rethrow — but Shutdown must still be called
throw new OcrProcessingException("Recognition failed", ex);
}
finally
{
document.Dispose(); // Inner finally: release document
}
}
catch (LicenseValidationException ex)
{
throw new OcrProcessingException("License validation failed", ex);
}
finally
{
engine.Shutdown(); // Outer finally: release license seat — MUST execute
}
}
// OmniPage: try/finally required everywhere to protect license seat release
public string ProcessInvoice(string imagePath)
{
var engine = new OmniPageEngine();
engine.SetLicenseFile(@"C:\Program Files\OmniPage\license.lic");
try
{
engine.Initialize(); // Contacts license server — failure here locks nothing
var document = engine.CreateDocument();
try
{
document.AddPage(imagePath);
document.Recognize(new RecognitionSettings { Language = "English" });
return document.GetText();
}
catch (RecognitionException ex)
{
// Log, rethrow — but Shutdown must still be called
throw new OcrProcessingException("Recognition failed", ex);
}
finally
{
document.Dispose(); // Inner finally: release document
}
}
catch (LicenseValidationException ex)
{
throw new OcrProcessingException("License validation failed", ex);
}
finally
{
engine.Shutdown(); // Outer finally: release license seat — MUST execute
}
}
Imports System
' OmniPage: try/finally required everywhere to protect license seat release
Public Function ProcessInvoice(imagePath As String) As String
Dim engine As New OmniPageEngine()
engine.SetLicenseFile("C:\Program Files\OmniPage\license.lic")
Try
engine.Initialize() ' Contacts license server — failure here locks nothing
Dim document = engine.CreateDocument()
Try
document.AddPage(imagePath)
document.Recognize(New RecognitionSettings With {.Language = "English"})
Return document.GetText()
Catch ex As RecognitionException
' Log, rethrow — but Shutdown must still be called
Throw New OcrProcessingException("Recognition failed", ex)
Finally
document.Dispose() ' Inner finally: release document
End Try
Catch ex As LicenseValidationException
Throw New OcrProcessingException("License validation failed", ex)
Finally
engine.Shutdown() ' Outer finally: release license seat — MUST execute
End Try
End Function
IronOCR Approach:
// IronOCR: No license seat to release, no engine to shut down
public string ProcessInvoice(string imagePath)
{
using var input = new OcrInput();
input.LoadImage(imagePath);
var ocr = new IronTesseract();
var result = ocr.Read(input);
return result.Text;
// OcrInput disposal is automatic — no license implications on any code path
}
// IronOCR: No license seat to release, no engine to shut down
public string ProcessInvoice(string imagePath)
{
using var input = new OcrInput();
input.LoadImage(imagePath);
var ocr = new IronTesseract();
var result = ocr.Read(input);
return result.Text;
// OcrInput disposal is automatic — no license implications on any code path
}
Imports IronOcr
Public Function ProcessInvoice(imagePath As String) As String
Using input As New OcrInput()
input.LoadImage(imagePath)
Dim ocr As New IronTesseract()
Dim result = ocr.Read(input)
Return result.Text
End Using
' OcrInput disposal is automatic — no license implications on any code path
End Function
The using block on OcrInput handles memory cleanup for the loaded image data. No try/finally exists to protect a license seat. An exception anywhere in this method has no side effects outside the method's own scope. See the IronTesseract setup guide for configuration options including thread-local instances.
Zone-Based Recognition Migration
OmniPage's primary structured extraction mechanism is zone definition: document regions are declared with coordinate boundaries and recognition type per zone (OCR, ICR, OMR, barcode). Developers define FormZone objects per document template and pass them to the recognition engine. IronOCR uses CropRectangle to achieve the same targeted extraction without template management or zone type declarations:
Kofax OmniPage Approach:
// OmniPage: Zone-based form extraction with template management
public Dictionary<string, string> ExtractInvoiceFields(string invoicePath)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
// Define zones per document template
var zones = new List<FormZone>
{
new FormZone { Name = "InvoiceNumber", Type = "OCR",
X = 520, Y = 80, Width = 200, Height = 30 },
new FormZone { Name = "InvoiceDate", Type = "OCR",
X = 520, Y = 120, Width = 200, Height = 30 },
new FormZone { Name = "TotalAmount", Type = "OCR",
X = 520, Y = 580, Width = 200, Height = 30 },
new FormZone { Name = "VendorName", Type = "OCR",
X = 50, Y = 80, Width = 300, Height = 40 }
};
var template = new FormTemplate { Name = "StandardInvoice", Zones = zones };
var settings = new RecognitionSettings { Language = "English" };
var document = engine.CreateDocument();
document.AddPage(invoicePath);
document.ApplyTemplate(template);
document.Recognize(settings);
var fields = new Dictionary<string, string>();
foreach (var zone in zones)
fields[zone.Name] = document.GetZoneText(zone.Name);
document.Dispose();
engine.Shutdown();
return fields;
}
// OmniPage: Zone-based form extraction with template management
public Dictionary<string, string> ExtractInvoiceFields(string invoicePath)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
// Define zones per document template
var zones = new List<FormZone>
{
new FormZone { Name = "InvoiceNumber", Type = "OCR",
X = 520, Y = 80, Width = 200, Height = 30 },
new FormZone { Name = "InvoiceDate", Type = "OCR",
X = 520, Y = 120, Width = 200, Height = 30 },
new FormZone { Name = "TotalAmount", Type = "OCR",
X = 520, Y = 580, Width = 200, Height = 30 },
new FormZone { Name = "VendorName", Type = "OCR",
X = 50, Y = 80, Width = 300, Height = 40 }
};
var template = new FormTemplate { Name = "StandardInvoice", Zones = zones };
var settings = new RecognitionSettings { Language = "English" };
var document = engine.CreateDocument();
document.AddPage(invoicePath);
document.ApplyTemplate(template);
document.Recognize(settings);
var fields = new Dictionary<string, string>();
foreach (var zone in zones)
fields[zone.Name] = document.GetZoneText(zone.Name);
document.Dispose();
engine.Shutdown();
return fields;
}
Imports System
Imports System.Collections.Generic
' OmniPage: Zone-based form extraction with template management
Public Function ExtractInvoiceFields(invoicePath As String) As Dictionary(Of String, String)
Using engine As New OmniPageEngine()
engine.SetLicenseFile(licensePath)
engine.Initialize()
' Define zones per document template
Dim zones As New List(Of FormZone) From {
New FormZone With {.Name = "InvoiceNumber", .Type = "OCR", .X = 520, .Y = 80, .Width = 200, .Height = 30},
New FormZone With {.Name = "InvoiceDate", .Type = "OCR", .X = 520, .Y = 120, .Width = 200, .Height = 30},
New FormZone With {.Name = "TotalAmount", .Type = "OCR", .X = 520, .Y = 580, .Width = 200, .Height = 30},
New FormZone With {.Name = "VendorName", .Type = "OCR", .X = 50, .Y = 80, .Width = 300, .Height = 40}
}
Dim template As New FormTemplate With {.Name = "StandardInvoice", .Zones = zones}
Dim settings As New RecognitionSettings With {.Language = "English"}
Dim document = engine.CreateDocument()
document.AddPage(invoicePath)
document.ApplyTemplate(template)
document.Recognize(settings)
Dim fields As New Dictionary(Of String, String)()
For Each zone In zones
fields(zone.Name) = document.GetZoneText(zone.Name)
Next
document.Dispose()
engine.Shutdown()
Return fields
End Using
End Function
IronOCR Approach:
// IronOCR: CropRectangle targets specific regions — no template management
public Dictionary<string, string> ExtractInvoiceFields(string invoicePath)
{
var fields = new Dictionary<string, string>();
var ocr = new IronTesseract();
// Extract each field by reading only the target region
var fieldRegions = new Dictionary<string, CropRectangle>
{
["InvoiceNumber"] = new CropRectangle(520, 80, 200, 30),
["InvoiceDate"] = new CropRectangle(520, 120, 200, 30),
["TotalAmount"] = new CropRectangle(520, 580, 200, 30),
["VendorName"] = new CropRectangle(50, 80, 300, 40)
};
foreach (var (fieldName, region) in fieldRegions)
{
using var input = new OcrInput();
input.LoadImage(invoicePath, region);
fields[fieldName] = ocr.Read(input).Text.Trim();
}
return fields;
}
// IronOCR: CropRectangle targets specific regions — no template management
public Dictionary<string, string> ExtractInvoiceFields(string invoicePath)
{
var fields = new Dictionary<string, string>();
var ocr = new IronTesseract();
// Extract each field by reading only the target region
var fieldRegions = new Dictionary<string, CropRectangle>
{
["InvoiceNumber"] = new CropRectangle(520, 80, 200, 30),
["InvoiceDate"] = new CropRectangle(520, 120, 200, 30),
["TotalAmount"] = new CropRectangle(520, 580, 200, 30),
["VendorName"] = new CropRectangle(50, 80, 300, 40)
};
foreach (var (fieldName, region) in fieldRegions)
{
using var input = new OcrInput();
input.LoadImage(invoicePath, region);
fields[fieldName] = ocr.Read(input).Text.Trim();
}
return fields;
}
Imports System.Collections.Generic
' IronOCR: CropRectangle targets specific regions — no template management
Public Function ExtractInvoiceFields(invoicePath As String) As Dictionary(Of String, String)
Dim fields As New Dictionary(Of String, String)()
Dim ocr As New IronTesseract()
' Extract each field by reading only the target region
Dim fieldRegions As New Dictionary(Of String, CropRectangle) From {
{"InvoiceNumber", New CropRectangle(520, 80, 200, 30)},
{"InvoiceDate", New CropRectangle(520, 120, 200, 30)},
{"TotalAmount", New CropRectangle(520, 580, 200, 30)},
{"VendorName", New CropRectangle(50, 80, 300, 40)}
}
For Each kvp In fieldRegions
Dim fieldName = kvp.Key
Dim region = kvp.Value
Using input As New OcrInput()
input.LoadImage(invoicePath, region)
fields(fieldName) = ocr.Read(input).Text.Trim()
End Using
Next
Return fields
End Function
Each CropRectangle specifies (x, y, width, height) in pixels. The OCR engine processes only the nominated region, not the full page — the same efficiency benefit that zone-based processing provides in OmniPage. The region-based OCR guide covers coordinate selection patterns, including how to extract multiple regions from a single image load using OcrInput's multi-region API.
Structured Data Output Migration
OmniPage exposes document structure through a proprietary export pipeline: format settings are applied to a recognized document, and output is written to a file in a specified format (RTF, XML, CSV, searchable PDF). Accessing word-level coordinates requires iterating a result iterator with explicit position queries. IronOCR exposes the same structured data directly on the result object without a secondary export step:
Kofax OmniPage Approach:
// OmniPage: Structured output requires format configuration and file export
public void ExtractStructuredContent(string imagePath, string outputDir)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var settings = new RecognitionSettings { Language = "English" };
var document = engine.CreateDocument();
document.AddPage(imagePath);
document.Recognize(settings);
// Word-level coordinates through result iterator
var iterator = document.GetResultIterator(ResultIteratorLevel.Word);
while (iterator.MoveNext())
{
string word = iterator.GetText();
var bounds = iterator.GetBoundingBox();
double confidence = iterator.GetConfidence();
Console.WriteLine($"Word: {word} at ({bounds.X},{bounds.Y}) conf:{confidence:F1}%");
}
// Structured export requires separate output format configuration
var outputSettings = new OutputSettings
{
Format = OutputFormat.XML,
IncludeCoordinates = true,
IncludeConfidence = true
};
document.SaveAs(Path.Combine(outputDir, "output.xml"), outputSettings);
document.Dispose();
engine.Shutdown();
}
// OmniPage: Structured output requires format configuration and file export
public void ExtractStructuredContent(string imagePath, string outputDir)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var settings = new RecognitionSettings { Language = "English" };
var document = engine.CreateDocument();
document.AddPage(imagePath);
document.Recognize(settings);
// Word-level coordinates through result iterator
var iterator = document.GetResultIterator(ResultIteratorLevel.Word);
while (iterator.MoveNext())
{
string word = iterator.GetText();
var bounds = iterator.GetBoundingBox();
double confidence = iterator.GetConfidence();
Console.WriteLine($"Word: {word} at ({bounds.X},{bounds.Y}) conf:{confidence:F1}%");
}
// Structured export requires separate output format configuration
var outputSettings = new OutputSettings
{
Format = OutputFormat.XML,
IncludeCoordinates = true,
IncludeConfidence = true
};
document.SaveAs(Path.Combine(outputDir, "output.xml"), outputSettings);
document.Dispose();
engine.Shutdown();
}
Imports System
Imports System.IO
Public Sub ExtractStructuredContent(imagePath As String, outputDir As String)
Using engine As New OmniPageEngine()
engine.SetLicenseFile(licensePath)
engine.Initialize()
Dim settings As New RecognitionSettings With {.Language = "English"}
Dim document = engine.CreateDocument()
document.AddPage(imagePath)
document.Recognize(settings)
' Word-level coordinates through result iterator
Dim iterator = document.GetResultIterator(ResultIteratorLevel.Word)
While iterator.MoveNext()
Dim word As String = iterator.GetText()
Dim bounds = iterator.GetBoundingBox()
Dim confidence As Double = iterator.GetConfidence()
Console.WriteLine($"Word: {word} at ({bounds.X},{bounds.Y}) conf:{confidence:F1}%")
End While
' Structured export requires separate output format configuration
Dim outputSettings As New OutputSettings With {
.Format = OutputFormat.XML,
.IncludeCoordinates = True,
.IncludeConfidence = True
}
document.SaveAs(Path.Combine(outputDir, "output.xml"), outputSettings)
document.Dispose()
engine.Shutdown()
End Using
End Sub
IronOCR Approach:
// IronOCR: Structured data directly on OcrResult — no export step
public void ExtractStructuredContent(string imagePath)
{
var result = new IronTesseract().Read(imagePath);
Console.WriteLine($"Confidence: {result.Confidence:F1}%");
Console.WriteLine($"Pages: {result.Pages.Length}");
foreach (var page in result.Pages)
{
foreach (var paragraph in page.Paragraphs)
{
Console.WriteLine($"[Paragraph at ({paragraph.X},{paragraph.Y})]");
Console.WriteLine(paragraph.Text);
foreach (var word in paragraph.Words)
{
// Per-word confidence and coordinates — no iterator needed
Console.WriteLine(
$" Word: '{word.Text}' " +
$"at ({word.X},{word.Y}) " +
$"conf: {word.Confidence:F1}%");
}
}
}
}
// IronOCR: Structured data directly on OcrResult — no export step
public void ExtractStructuredContent(string imagePath)
{
var result = new IronTesseract().Read(imagePath);
Console.WriteLine($"Confidence: {result.Confidence:F1}%");
Console.WriteLine($"Pages: {result.Pages.Length}");
foreach (var page in result.Pages)
{
foreach (var paragraph in page.Paragraphs)
{
Console.WriteLine($"[Paragraph at ({paragraph.X},{paragraph.Y})]");
Console.WriteLine(paragraph.Text);
foreach (var word in paragraph.Words)
{
// Per-word confidence and coordinates — no iterator needed
Console.WriteLine(
$" Word: '{word.Text}' " +
$"at ({word.X},{word.Y}) " +
$"conf: {word.Confidence:F1}%");
}
}
}
}
Public Sub ExtractStructuredContent(imagePath As String)
Dim result = New IronTesseract().Read(imagePath)
Console.WriteLine($"Confidence: {result.Confidence:F1}%")
Console.WriteLine($"Pages: {result.Pages.Length}")
For Each page In result.Pages
For Each paragraph In page.Paragraphs
Console.WriteLine($"[Paragraph at ({paragraph.X},{paragraph.Y})]")
Console.WriteLine(paragraph.Text)
For Each word In paragraph.Words
' Per-word confidence and coordinates — no iterator needed
Console.WriteLine(
$" Word: '{word.Text}' " &
$"at ({word.X},{word.Y}) " &
$"conf: {word.Confidence:F1}%")
Next
Next
Next
End Sub
The OcrResult object's hierarchy — Pages, Paragraphs, Lines, Words, Characters — provides the same positional data OmniPage's result iterator exposes, but as a navigable object graph rather than a sequential cursor. Confidence scores are available at the result, page, paragraph, and word level without additional configuration. The read results guide covers all structured data properties, and the confidence scores guide covers per-word validation patterns.
Multi-Page TIFF Processing Migration
OmniPage's multi-page TIFF workflow requires page-by-page extraction from the TIFF container, with a separate recognition call and document disposal per page. This creates both boilerplate and a resource management surface area proportional to the page count. IronOCR loads multi-frame TIFF files in a single call:
Kofax OmniPage Approach:
// OmniPage: Page-by-page TIFF extraction with per-page lifecycle
public List<string> ExtractFromMultiPageTiff(string tiffPath)
{
var pageTexts = new List<string>();
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
// Open TIFF and iterate frames
var tiffContainer = engine.OpenTiff(tiffPath);
int pageCount = tiffContainer.GetPageCount();
var settings = new RecognitionSettings { Language = "English" };
for (int i = 0; i < pageCount; i++)
{
var page = tiffContainer.GetPage(i); // Extract frame
var document = engine.CreateDocument();
try
{
document.AddPage(page);
document.Recognize(settings);
pageTexts.Add(document.GetText());
}
finally
{
document.Dispose(); // Dispose per page to manage memory
}
}
tiffContainer.Dispose();
engine.Shutdown();
return pageTexts;
}
// OmniPage: Page-by-page TIFF extraction with per-page lifecycle
public List<string> ExtractFromMultiPageTiff(string tiffPath)
{
var pageTexts = new List<string>();
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
// Open TIFF and iterate frames
var tiffContainer = engine.OpenTiff(tiffPath);
int pageCount = tiffContainer.GetPageCount();
var settings = new RecognitionSettings { Language = "English" };
for (int i = 0; i < pageCount; i++)
{
var page = tiffContainer.GetPage(i); // Extract frame
var document = engine.CreateDocument();
try
{
document.AddPage(page);
document.Recognize(settings);
pageTexts.Add(document.GetText());
}
finally
{
document.Dispose(); // Dispose per page to manage memory
}
}
tiffContainer.Dispose();
engine.Shutdown();
return pageTexts;
}
Imports System.Collections.Generic
' OmniPage: Page-by-page TIFF extraction with per-page lifecycle
Public Function ExtractFromMultiPageTiff(tiffPath As String) As List(Of String)
Dim pageTexts As New List(Of String)()
Using engine As New OmniPageEngine()
engine.SetLicenseFile(licensePath)
engine.Initialize()
' Open TIFF and iterate frames
Dim tiffContainer = engine.OpenTiff(tiffPath)
Dim pageCount As Integer = tiffContainer.GetPageCount()
Dim settings As New RecognitionSettings With {.Language = "English"}
For i As Integer = 0 To pageCount - 1
Dim page = tiffContainer.GetPage(i) ' Extract frame
Dim document = engine.CreateDocument()
Try
document.AddPage(page)
document.Recognize(settings)
pageTexts.Add(document.GetText())
Finally
document.Dispose() ' Dispose per page to manage memory
End Try
Next
tiffContainer.Dispose()
engine.Shutdown()
End Using
Return pageTexts
End Function
IronOCR Approach:
// IronOCR: Multi-frame TIFF loaded in one call — all pages in one result
public List<string> ExtractFromMultiPageTiff(string tiffPath)
{
using var input = new OcrInput();
input.LoadImageFrames(tiffPath); // All frames loaded automatically
var result = new IronTesseract().Read(input);
// Each TIFF frame becomes a page in the result
return result.Pages.Select(page => page.Text).ToList();
}
// IronOCR: Multi-frame TIFF loaded in one call — all pages in one result
public List<string> ExtractFromMultiPageTiff(string tiffPath)
{
using var input = new OcrInput();
input.LoadImageFrames(tiffPath); // All frames loaded automatically
var result = new IronTesseract().Read(input);
// Each TIFF frame becomes a page in the result
return result.Pages.Select(page => page.Text).ToList();
}
Imports System.Collections.Generic
Imports IronOcr
' IronOCR: Multi-frame TIFF loaded in one call — all pages in one result
Public Function ExtractFromMultiPageTiff(tiffPath As String) As List(Of String)
Using input As New OcrInput()
input.LoadImageFrames(tiffPath) ' All frames loaded automatically
Dim result = New IronTesseract().Read(input)
' Each TIFF frame becomes a page in the result
Return result.Pages.Select(Function(page) page.Text).ToList()
End Using
End Function
LoadImageFrames reads every frame from the TIFF container in a single call. The result exposes one OcrResult.Page per frame, preserving the page-by-page structure without requiring a manual iteration loop or per-page cleanup. For production TIFF batch workflows, see the TIFF and GIF input guide.
Thread-Safe Parallel Processing Migration
OmniPage's engine is a shared, stateful resource. Sharing one OmniPageEngine instance across threads requires external synchronization because multiple threads calling CreateDocument() concurrently can produce interleaved state. The safe pattern — one engine instance per thread — conflicts with the engine's heavyweight initialization cost. IronOCR instances carry no shared state: create one IronTesseract per thread with zero coordination overhead:
Kofax OmniPage Approach:
// OmniPage: Shared engine with locking to serialize document operations
public ConcurrentDictionary<string, string> ProcessBatchWithEngine(
string[] imagePaths, string licensePath)
{
var results = new ConcurrentDictionary<string, string>();
var engineLock = new object();
// One engine — document operations must be serialized
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var settings = new RecognitionSettings { Language = "English" };
Parallel.ForEach(imagePaths, imagePath =>
{
lock (engineLock) // Serialize: one recognition at a time
{
var document = engine.CreateDocument();
try
{
document.AddPage(imagePath);
document.Recognize(settings);
results[imagePath] = document.GetText();
}
finally
{
document.Dispose();
}
}
});
engine.Shutdown();
return results;
}
// OmniPage: Shared engine with locking to serialize document operations
public ConcurrentDictionary<string, string> ProcessBatchWithEngine(
string[] imagePaths, string licensePath)
{
var results = new ConcurrentDictionary<string, string>();
var engineLock = new object();
// One engine — document operations must be serialized
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var settings = new RecognitionSettings { Language = "English" };
Parallel.ForEach(imagePaths, imagePath =>
{
lock (engineLock) // Serialize: one recognition at a time
{
var document = engine.CreateDocument();
try
{
document.AddPage(imagePath);
document.Recognize(settings);
results[imagePath] = document.GetText();
}
finally
{
document.Dispose();
}
}
});
engine.Shutdown();
return results;
}
Imports System.Collections.Concurrent
Imports System.Threading.Tasks
' OmniPage: Shared engine with locking to serialize document operations
Public Function ProcessBatchWithEngine(imagePaths As String(), licensePath As String) As ConcurrentDictionary(Of String, String)
Dim results As New ConcurrentDictionary(Of String, String)()
Dim engineLock As New Object()
' One engine — document operations must be serialized
Using engine As New OmniPageEngine()
engine.SetLicenseFile(licensePath)
engine.Initialize()
Dim settings As New RecognitionSettings With {.Language = "English"}
Parallel.ForEach(imagePaths, Sub(imagePath)
SyncLock engineLock ' Serialize: one recognition at a time
Dim document = engine.CreateDocument()
Try
document.AddPage(imagePath)
document.Recognize(settings)
results(imagePath) = document.GetText()
Finally
document.Dispose()
End Try
End SyncLock
End Sub)
engine.Shutdown()
End Using
Return results
End Function
IronOCR Approach:
// IronOCR: One IronTesseract per thread — no locking, genuine parallelism
public ConcurrentDictionary<string, string> ProcessBatchInParallel(string[] imagePaths)
{
var results = new ConcurrentDictionary<string, string>();
Parallel.ForEach(imagePaths, imagePath =>
{
// Each thread creates its own instance — no shared state, no locks
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage(imagePath);
var result = ocr.Read(input);
results[imagePath] = result.Text;
});
return results;
}
// IronOCR: One IronTesseract per thread — no locking, genuine parallelism
public ConcurrentDictionary<string, string> ProcessBatchInParallel(string[] imagePaths)
{
var results = new ConcurrentDictionary<string, string>();
Parallel.ForEach(imagePaths, imagePath =>
{
// Each thread creates its own instance — no shared state, no locks
var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage(imagePath);
var result = ocr.Read(input);
results[imagePath] = result.Text;
});
return results;
}
Imports System.Collections.Concurrent
Imports System.Threading.Tasks
Imports IronOcr
' IronOCR: One IronTesseract per thread — no locking, genuine parallelism
Public Function ProcessBatchInParallel(imagePaths As String()) As ConcurrentDictionary(Of String, String)
Dim results As New ConcurrentDictionary(Of String, String)()
Parallel.ForEach(imagePaths, Sub(imagePath)
' Each thread creates its own instance — no shared state, no locks
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage(imagePath)
Dim result = ocr.Read(input)
results(imagePath) = result.Text
End Using
End Sub)
Return results
End Function
The OmniPage version serializes all recognition through a lock, negating the parallelism. The IronOCR version processes documents concurrently with no coordination. For high-throughput batch workloads, see the multithreading example and the speed optimization guide.
Searchable PDF Generation from Scanned Archive
OmniPage's searchable PDF output requires assembling a recognition pipeline with explicit OutputSettings and OutputFormat configuration before saving. IronOCR produces searchable PDFs from a single method call on the result object:
Kofax OmniPage Approach:
// OmniPage: Searchable PDF requires OutputSettings configuration before save
public void ConvertArchiveToSearchable(string[] scannedPdfPaths, string outputDirectory)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var recognitionSettings = new RecognitionSettings
{
Language = "English",
AccuracyMode = "Maximum",
PreserveLayout = true
};
var outputSettings = new OutputSettings
{
Format = OutputFormat.SearchablePDF,
Compression = PDFCompression.Standard,
ImageQuality = 85,
EmbedFonts = true
};
foreach (var pdfPath in scannedPdfPaths)
{
var document = engine.OpenPDF(pdfPath);
document.RecognizeAll(recognitionSettings);
string outputPath = Path.Combine(
outputDirectory,
Path.GetFileNameWithoutExtension(pdfPath) + "_searchable.pdf");
document.SaveAs(outputPath, outputSettings);
document.Dispose();
}
engine.Shutdown();
}
// OmniPage: Searchable PDF requires OutputSettings configuration before save
public void ConvertArchiveToSearchable(string[] scannedPdfPaths, string outputDirectory)
{
using var engine = new OmniPageEngine();
engine.SetLicenseFile(licensePath);
engine.Initialize();
var recognitionSettings = new RecognitionSettings
{
Language = "English",
AccuracyMode = "Maximum",
PreserveLayout = true
};
var outputSettings = new OutputSettings
{
Format = OutputFormat.SearchablePDF,
Compression = PDFCompression.Standard,
ImageQuality = 85,
EmbedFonts = true
};
foreach (var pdfPath in scannedPdfPaths)
{
var document = engine.OpenPDF(pdfPath);
document.RecognizeAll(recognitionSettings);
string outputPath = Path.Combine(
outputDirectory,
Path.GetFileNameWithoutExtension(pdfPath) + "_searchable.pdf");
document.SaveAs(outputPath, outputSettings);
document.Dispose();
}
engine.Shutdown();
}
Imports System.IO
' OmniPage: Searchable PDF requires OutputSettings configuration before save
Public Sub ConvertArchiveToSearchable(scannedPdfPaths As String(), outputDirectory As String)
Using engine As New OmniPageEngine()
engine.SetLicenseFile(licensePath)
engine.Initialize()
Dim recognitionSettings As New RecognitionSettings With {
.Language = "English",
.AccuracyMode = "Maximum",
.PreserveLayout = True
}
Dim outputSettings As New OutputSettings With {
.Format = OutputFormat.SearchablePDF,
.Compression = PDFCompression.Standard,
.ImageQuality = 85,
.EmbedFonts = True
}
For Each pdfPath As String In scannedPdfPaths
Dim document = engine.OpenPDF(pdfPath)
document.RecognizeAll(recognitionSettings)
Dim outputPath As String = Path.Combine(outputDirectory, Path.GetFileNameWithoutExtension(pdfPath) & "_searchable.pdf")
document.SaveAs(outputPath, outputSettings)
document.Dispose()
Next
End Using
engine.Shutdown()
End Sub
IronOCR Approach:
// IronOCR: Searchable PDF output is one method call on the result
public void ConvertArchiveToSearchable(string[] scannedPdfPaths, string outputDirectory)
{
var ocr = new IronTesseract();
foreach (var pdfPath in scannedPdfPaths)
{
using var input = new OcrInput();
input.LoadPdf(pdfPath); // All pages loaded automatically
var result = ocr.Read(input);
string outputPath = Path.Combine(
outputDirectory,
Path.GetFileNameWithoutExtension(pdfPath) + "_searchable.pdf");
result.SaveAsSearchablePdf(outputPath);
Console.WriteLine($"Processed: {Path.GetFileName(pdfPath)} ({result.Pages.Length} pages)");
}
}
// IronOCR: Searchable PDF output is one method call on the result
public void ConvertArchiveToSearchable(string[] scannedPdfPaths, string outputDirectory)
{
var ocr = new IronTesseract();
foreach (var pdfPath in scannedPdfPaths)
{
using var input = new OcrInput();
input.LoadPdf(pdfPath); // All pages loaded automatically
var result = ocr.Read(input);
string outputPath = Path.Combine(
outputDirectory,
Path.GetFileNameWithoutExtension(pdfPath) + "_searchable.pdf");
result.SaveAsSearchablePdf(outputPath);
Console.WriteLine($"Processed: {Path.GetFileName(pdfPath)} ({result.Pages.Length} pages)");
}
}
Imports System.IO
Imports IronOcr
Public Sub ConvertArchiveToSearchable(scannedPdfPaths As String(), outputDirectory As String)
Dim ocr As New IronTesseract()
For Each pdfPath As String In scannedPdfPaths
Using input As New OcrInput()
input.LoadPdf(pdfPath) ' All pages loaded automatically
Dim result = ocr.Read(input)
Dim outputPath As String = Path.Combine(outputDirectory, Path.GetFileNameWithoutExtension(pdfPath) & "_searchable.pdf")
result.SaveAsSearchablePdf(outputPath)
Console.WriteLine($"Processed: {Path.GetFileName(pdfPath)} ({result.Pages.Length} pages)")
End Using
Next
End Sub
SaveAsSearchablePdf embeds a text layer into the PDF, making it indexable in document management systems without altering the visual appearance of the original scan. The searchable PDF guide covers text layer options and the PDF OCR example demonstrates end-to-end scanned PDF processing.
Kofax OmniPage API to IronOCR Mapping Reference
| Kofax OmniPage | IronOCR Equivalent |
|---|---|
using Kofax.OmniPage.CSDK; |
using IronOcr; |
using Kofax.OmniPageCSDK; |
using IronOcr; |
using CSDK; |
using IronOcr; |
new OmniPageEngine() |
Not required — no engine object |
engine.SetLicenseFile(path) |
IronOcr.License.LicenseKey = "key"; |
engine.Initialize() |
Not required |
engine.Shutdown() |
Not required |
engine.CreateDocument() |
new OcrInput() |
document.AddPage(imagePath) |
input.LoadImage(imagePath) |
engine.OpenPDF(pdfPath) |
input.LoadPdf(pdfPath) |
engine.OpenTiff(tiffPath) |
input.LoadImageFrames(tiffPath) |
document.Recognize(settings) |
new IronTesseract().Read(input) |
document.RecognizeAll(settings) |
new IronTesseract().Read(input) (all pages at once) |
document.GetText() |
result.Text |
document.GetZoneText(zoneName) |
input.LoadImage(path, cropRectangle) + result.Text |
document.Dispose() |
using var input = new OcrInput() (automatic) |
iterator.GetText() |
result.Pages[n].Words[m].Text |
iterator.GetBoundingBox() |
result.Pages[n].Words[m].X / Y / Width / Height |
iterator.GetConfidence() |
result.Pages[n].Words[m].Confidence |
engine.LoadLanguageDictionary("German") |
dotnet add package IronOcr.Languages.German |
settings.PrimaryLanguage = "English" |
ocr.Language = OcrLanguage.English; |
settings.SecondaryLanguages = new[] {"German"} |
ocr.Language = OcrLanguage.English + OcrLanguage.German; |
settings.DeskewImage = true |
input.Deskew(); |
settings.DespeckleLevel = 2 |
input.DeNoise(); |
settings.ContrastEnhancement = true |
input.Contrast(); |
settings.AutoRotate = true |
input.Deskew(); (includes rotation correction) |
document.SaveAs(path, outputSettings) |
result.SaveAsSearchablePdf(path) |
OutputFormat.SearchablePDF |
result.SaveAsSearchablePdf(path) |
OutputFormat.XML (with coordinates) |
result.Pages / .Paragraphs / .Words object graph |
FormZone with coordinate bounds |
new CropRectangle(x, y, width, height) |
| Per-page license counting | Not applicable |
.lic file deployment |
Not applicable |
| License server configuration | Not applicable |
Common Migration Issues and Solutions
Issue 1: License File Not Found Exceptions in Production
Kofax OmniPage: Every deployment requires the .lic file to exist at a specific path accessible to the application process. A deployment pipeline that does not explicitly copy the license file to the target server causes FileNotFoundException or LicenseException at engine initialization. Security teams often flag embedding .lic files in container images or committing them to source control.
Solution: IronOCR reads its license from a string. Store the key as an environment variable and read it at startup — no file to deploy, no path to configure:
// At application startup — reads IRONOCR_LICENSE_KEY from environment
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY")
?? throw new InvalidOperationException("IronOCR license key not configured.");
// At application startup — reads IRONOCR_LICENSE_KEY from environment
IronOcr.License.LicenseKey = Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY")
?? throw new InvalidOperationException("IronOCR license key not configured.");
Imports System
' At application startup — reads IRONOCR_LICENSE_KEY from environment
IronOcr.License.LicenseKey = If(Environment.GetEnvironmentVariable("IRONOCR_LICENSE_KEY"), Throw New InvalidOperationException("IronOCR license key not configured."))
In Docker, pass --env IRONOCR_LICENSE_KEY=YOUR-KEY. In Azure App Service, set it as an Application Setting. In Kubernetes, inject it via a Secret. No file mounts, no path configuration, no security review for embedded license files.
Issue 2: Floating License Seats Locked After Process Crash
Kofax OmniPage: When an application process crashes, is killed by a watchdog, or exits via Environment.FailFast, engine.Shutdown() does not execute. Floating license seats remain checked out until the license server's timeout expires — typically 30–60 minutes. In a containerized environment where pods restart frequently, this behavior depletes the license pool.
Solution: IronOCR has no concept of a checked-out license seat. A process crash releases no resources and locks no external system. The next process starts immediately with no waiting for seat availability:
// IronOCR: Process crash has no license implications whatsoever
// Start processing immediately after any failure
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
var result = new IronTesseract().Read("document.jpg"); // No seat to check out
// IronOCR: Process crash has no license implications whatsoever
// Start processing immediately after any failure
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
var result = new IronTesseract().Read("document.jpg"); // No seat to check out
' IronOCR: Process crash has no license implications whatsoever
' Start processing immediately after any failure
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"
Dim result = New IronTesseract().Read("document.jpg") ' No seat to check out
For resilient ASP.NET Core services, see the async OCR guide for patterns that integrate cleanly with hosted services and graceful shutdown.
Issue 3: Docker Image Build Fails — Installer Cannot Run Non-Interactively
Kofax OmniPage: The OmniPage SDK installer is a GUI or semi-interactive installer that does not run cleanly in a docker build context. Teams attempting to bake the SDK into a container image encounter failures in the installer's license agreement or component selection steps. Workarounds (silent install switches, pre-extracted DLL copies) are undocumented and version-specific.
Solution: IronOCR's Dockerfile is three lines:
FROM mcr.microsoft.com/dotnet/aspnet:8.0
RUN apt-get update && apt-get install -y libgdiplus # Single Linux dependency
COPY --from=build /app/publish /app
WORKDIR /app
ENTRYPOINT ["dotnet", "YourApp.dll"]
libgdiplus is the only system dependency. The IronOCR NuGet package is restored by dotnet restore inside the build stage like any other package reference. The Docker deployment guide covers both Debian and Alpine base images, and the Linux deployment guide covers distribution-specific package names.
Issue 4: Reactivation Required After VM Migration or Cloud Autoscaling
Kofax OmniPage: OmniPage activation is tied to hardware identity. Cloud environments that migrate VMs between physical hosts, autoscale to new instances, or replace containers with fresh images trigger hardware identity changes that require reactivation. Contacting Tungsten support for reactivation mid-incident adds operational risk.
Solution: IronOCR's license key is hardware-independent. The same key works on any machine, any container, any cloud region, any number of autoscaled instances within the licensed tier. No reactivation, no support contact, no downtime:
// Identical startup code on any hardware topology
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
bool valid = IronOcr.License.IsLicensed; // Verify without a network call
// Identical startup code on any hardware topology
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY";
bool valid = IronOcr.License.IsLicensed; // Verify without a network call
' Identical startup code on any hardware topology
IronOcr.License.LicenseKey = "YOUR-LICENSE-KEY"
Dim valid As Boolean = IronOcr.License.IsLicensed ' Verify without a network call
Issue 5: RecognitionSettings Object Has No IronOCR Equivalent
Kofax OmniPage: OmniPage uses a RecognitionSettings object (or PreprocessingSettings depending on SDK version) to configure engine behavior per document. Teams migrating expect a parallel configuration object in IronOCR.
Solution: IronOCR splits configuration across two surfaces: preprocessing operations on OcrInput and engine settings on IronTesseract. There is no settings object to instantiate:
// OmniPage settings object → IronOCR method calls on OcrInput
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English; // RecognitionSettings.Language
// ocr.Configuration.TesseractVersion = ... // Engine version already optimized
using var input = new OcrInput();
input.LoadImage(imagePath);
input.Deskew(); // settings.DeskewImage = true
input.DeNoise(); // settings.DespeckleLevel = 2
input.Contrast(); // settings.ContrastEnhancement = true
var result = ocr.Read(input);
// OmniPage settings object → IronOCR method calls on OcrInput
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.English; // RecognitionSettings.Language
// ocr.Configuration.TesseractVersion = ... // Engine version already optimized
using var input = new OcrInput();
input.LoadImage(imagePath);
input.Deskew(); // settings.DeskewImage = true
input.DeNoise(); // settings.DespeckleLevel = 2
input.Contrast(); // settings.ContrastEnhancement = true
var result = ocr.Read(input);
Imports IronOcr
Dim ocr As New IronTesseract()
ocr.Language = OcrLanguage.English
Using input As New OcrInput()
input.LoadImage(imagePath)
input.Deskew()
input.DeNoise()
input.Contrast()
Dim result = ocr.Read(input)
End Using
The image quality correction guide lists all available preprocessing methods with guidance on which filters apply to which document quality profiles.
Issue 6: Zone Template Files Cannot Be Ported Directly
Kofax OmniPage: OmniPage form templates store zone definitions in proprietary .fdt or .fpf template files that define field positions, recognition types, and validation rules per document class. These files are not importable by IronOCR.
Solution: Extract the coordinate data from the template files (they are XML-based) and convert each zone to a CropRectangle. Field names become dictionary keys; zone coordinates map directly to (x, y, width, height) parameters:
// Convert OmniPage zone definition to IronOCR CropRectangle
// OmniPage zone: Name="TotalDue" X="490" Y="612" Width="180" Height="28"
// IronOCR equivalent:
var totalDueRegion = new CropRectangle(490, 612, 180, 28);
using var input = new OcrInput();
input.LoadImage(invoicePath, totalDueRegion);
string totalDue = new IronTesseract().Read(input).Text.Trim();
// Convert OmniPage zone definition to IronOCR CropRectangle
// OmniPage zone: Name="TotalDue" X="490" Y="612" Width="180" Height="28"
// IronOCR equivalent:
var totalDueRegion = new CropRectangle(490, 612, 180, 28);
using var input = new OcrInput();
input.LoadImage(invoicePath, totalDueRegion);
string totalDue = new IronTesseract().Read(input).Text.Trim();
Imports IronOcr
' Convert OmniPage zone definition to IronOCR CropRectangle
' OmniPage zone: Name="TotalDue" X="490" Y="612" Width="180" Height="28"
' IronOCR equivalent:
Dim totalDueRegion As New CropRectangle(490, 612, 180, 28)
Using input As New OcrInput()
input.LoadImage(invoicePath, totalDueRegion)
Dim totalDue As String = New IronTesseract().Read(input).Text.Trim()
End Using
For documents where zones vary per page, load the same image with different CropRectangle values per region rather than reloading the file. The region crop example demonstrates multiple region reads from a single image source.
Kofax OmniPage Migration Checklist
Pre-Migration
Identify all OmniPage integration points in the codebase:
# Find all OmniPage namespace references
grep -r "Kofax\|OmniPage\|CSDK" --include="*.cs" --include="*.csproj" .
# Find engine lifecycle calls
grep -r "Initialize\|Shutdown\|SetLicenseFile" --include="*.cs" .
# Find document and zone creation patterns
grep -r "CreateDocument\|AddPage\|FormZone\|RecognitionSettings\|PreprocessingSettings" --include="*.cs" .
# Find output format configuration
grep -r "OutputSettings\|OutputFormat\|SaveAs\|GetText" --include="*.cs" .
# Find license file path references
grep -r "\.lic\|license\.lic\|licensePath" --include="*.cs" --include="*.config" --include="*.json" .
# Find all OmniPage namespace references
grep -r "Kofax\|OmniPage\|CSDK" --include="*.cs" --include="*.csproj" .
# Find engine lifecycle calls
grep -r "Initialize\|Shutdown\|SetLicenseFile" --include="*.cs" .
# Find document and zone creation patterns
grep -r "CreateDocument\|AddPage\|FormZone\|RecognitionSettings\|PreprocessingSettings" --include="*.cs" .
# Find output format configuration
grep -r "OutputSettings\|OutputFormat\|SaveAs\|GetText" --include="*.cs" .
# Find license file path references
grep -r "\.lic\|license\.lic\|licensePath" --include="*.cs" --include="*.config" --include="*.json" .
Inventory before starting:
- Count files with OmniPage namespace imports
- List all
FormTemplateandFormZonedefinitions and extract their coordinate data - Identify which language dictionaries are loaded and map to
IronOcr.Languages.*NuGet packages - Note which output formats are used (searchable PDF, plain text, XML) and their IronOCR equivalents
- Locate all
.licfile references in deployment scripts and configuration
Code Migration
- Remove OmniPage DLL references from all
.csprojfiles - Run
dotnet add package IronOcrin each project that performs OCR - Add language packs for each language in use:
dotnet add package IronOcr.Languages.[Language] - Add
IronOcr.License.LicenseKey = ...;at each application entry point (Program.cs, Startup.cs, or service host) - Replace all
using Kofax.OmniPage.CSDK;,using CSDK;, and related namespace imports withusing IronOcr; - Remove all
OmniPageEngineconstruction,SetLicenseFile, andInitializecalls - Remove all
engine.Shutdown()calls and theIDisposableimplementations that exist solely to ensure shutdown - Replace
engine.CreateDocument()+document.AddPage()withnew OcrInput()+input.LoadImage() - Replace
engine.OpenPDF()+ per-page iteration withinput.LoadPdf()(all pages in one call) - Replace
engine.OpenTiff()+ per-frame loops withinput.LoadImageFrames() - Replace
document.Recognize(settings)withnew IronTesseract().Read(input) - Convert
FormZonecoordinate data toCropRectangle(x, y, width, height)instances - Replace
document.GetZoneText()with per-regioninput.LoadImage(path, cropRectangle)+result.Text - Replace
document.SaveAs(path, outputSettings)for searchable PDF withresult.SaveAsSearchablePdf(path) - Replace result iterator patterns with
result.Pages/.Paragraphs/.Wordsproperty traversal - Remove
PreprocessingSettingsobjects and replace boolean flags with explicitinput.Deskew(),input.DeNoise(),input.Contrast()method calls - Remove license file paths from deployment scripts, configuration files, and infrastructure-as-code templates
Post-Migration
- Verify OCR accuracy on a representative sample of 100+ documents from the live corpus — compare against OmniPage output line-by-line for invoices, contracts, and forms
- Confirm that
CropRectangle-based zone extraction returns text matching the OmniPageGetZoneText()output for each mapped field - Test multi-page PDF processing: verify page count, page order, and text continuity
- Test multi-frame TIFF processing: confirm all frames are processed and frame sequence is preserved
- Verify searchable PDF output is indexable in the document management system used (SharePoint, OpenText, etc.)
- Run parallel processing test with 50+ concurrent documents and confirm no thread safety issues
- Test language pack coverage: load each previously used language dictionary via
IronOcr.Languages.*and verify recognition quality - Confirm that process crashes and container restarts do not require any license recovery action
- Run Docker build on CI agent — verify
dotnet restoreresolves IronOCR with no installer steps - Validate confidence scores are available per word and match the data quality expectations of any downstream validation workflow
- Check that the application starts cleanly without the
.licfile present in any path
Key Benefits of Migrating to IronOCR
Deployment complexity drops from weeks to minutes. A project that previously required SDK installer access, .lic file deployment, firewall configuration for the license server, and infrastructure team coordination now deploys via dotnet restore. A new developer clones the repository and runs the project. A new production server is provisioned by the CI/CD pipeline. Container images build without installer steps.
License risk disappears entirely. There are no floating seats to exhaust, no checkout timeouts to wait for, no hardware fingerprints to reactivate, and no .lic files to protect in deployment pipelines. An application crash at 3 AM releases nothing and locks nothing. The on-call engineer restarts the process; OCR resumes immediately. The IronOCR product page covers all licensing tiers.
Cross-platform deployment becomes a standard NuGet target. macOS development machines work without a platform exception. Linux production servers do not require a January 2026 SDK version. Docker containers build from a standard mcr.microsoft.com/dotnet/aspnet base image with one system dependency. The same IronOcr package reference in .csproj produces a working build on Windows, Linux, and macOS. See the Azure deployment guide and AWS deployment guide for cloud-specific configuration.
Parallel throughput scales with hardware. OmniPage's shared engine architecture requires locking that serializes concurrent recognition. IronOCR's stateless IronTesseract instances scale linearly with available CPU cores. Doubling the core count of a batch processing server doubles throughput without configuration changes or additional licensing.
Structured data access is immediate. OcrResult.Pages, Paragraphs, Lines, Words, and Characters expose the complete document structure as a navigable .NET object graph. Per-word confidence and coordinates are properties on each word object. There is no secondary export pipeline, no format configuration, and no file output required to access positional data.
Cost is fixed and predictable. OmniPage's combination of SDK license, annual maintenance, and per-page runtime fees creates a cost that scales with usage and recurs annually. IronOCR at $999–$2,999 perpetual is a one-time purchase. A 500,000-page-per-year workflow that was generating per-page charges recoups the IronOCR license cost within weeks. The IronOCR documentation hub and tutorials are available without a support contract.
Frequently Asked Questions
Why should I migrate from Kofax OmniPage to IronOCR?
Common drivers include eliminating COM interop complexity, replacing file-based license management, avoiding per-page billing, enabling Docker/container deployment, and adopting a NuGet-native workflow that integrates with standard .NET tooling.
What are the main code changes when migrating from Kofax OmniPage to IronOCR?
Replace Kofax OmniPage initialization sequences with IronTesseract instantiation, remove COM lifecycle management (explicit Create/Load/Close patterns), and update result property names. The result is significantly fewer boilerplate lines.
How do I install IronOCR to begin the migration?
Run 'Install-Package IronOcr' in Package Manager Console or 'dotnet add package IronOcr' in the CLI. Language packs are separate packages: 'dotnet add package IronOcr.Languages.French' for French, for example.
Does IronOCR match the OCR accuracy of Kofax OmniPage for standard business documents?
IronOCR achieves high accuracy for standard business content including invoices, contracts, receipts, and typed forms. Image preprocessing filters (deskew, noise removal, contrast enhancement) further improve recognition on degraded input.
How does IronOCR handle the language data that Kofax OmniPage installs separately?
Language data in IronOCR is distributed as NuGet packages. 'dotnet add package IronOcr.Languages.German' installs German support. No manual file placement or directory paths are involved.
Does migrating from Kofax OmniPage to IronOCR require changes to deployment infrastructure?
IronOCR requires fewer infrastructure changes than Kofax OmniPage. There are no SDK binary paths, license file placements, or license server configurations. The NuGet package contains the complete OCR engine, and the license key is a string set in application code.
How do I configure IronOCR licensing after migration?
Assign IronOcr.License.LicenseKey = "YOUR-KEY" in application startup code. In Docker or Kubernetes, store the key as an environment variable and read it in startup. Use License.IsValidLicense to validate before accepting traffic.
Can IronOCR process PDFs the same way Kofax OmniPage does?
Yes. IronOCR reads both native and scanned PDFs. Instantiate IronTesseract, call ocr.Read(input) where input is a PDF path or OcrPdfInput, and iterate the OcrResult pages. No separate PDF rendering pipeline is required.
How does IronOCR handle threading in high-volume processing?
IronTesseract is safe to instantiate per-thread. Spin up one instance per thread in a Parallel.ForEach or Task pool, run OCR concurrently, and dispose each instance when done. No global state or locking is required.
What output formats does IronOCR support after text extraction?
IronOCR returns structured results including text, word coordinates, confidence scores, and page structure. Export options include plain text, searchable PDF, and structured result objects for downstream processing.
Is IronOCR pricing more predictable than Kofax OmniPage for scaling workloads?
IronOCR uses flat-rate perpetual licensing with no per-page or volume charges. Whether you process 10,000 or 10 million pages, the license cost remains constant. Volume and team licensing options are on the IronOCR pricing page.
What happens to my existing tests after migrating from Kofax OmniPage to IronOCR?
Tests that assert on extracted text content should continue to pass after migration. Tests that validate API call patterns or COM object lifecycle will need updating to reflect IronOCR's simpler initialization and result model.

