Read Text from Images with C# OCR
In this tutorial, we will learn how to convert images to text in C# and other .NET languages.
How to Convert Images to Text in C#
- Download the OCR Image to Text IronOCR Library
- Adjust Crop Regions to Read Parts of an Image
- Use up to 125 international languages via Language Packs
- Export OCR Scan Results as Text or Searchable PDF
Reading Text from Images in .NET Applications
We will use the IronOcr.IronTesseract class to recognize text within images and explore how to use Iron Tesseract OCR to maximize accuracy and speed in .NET applications.
To achieve "Image to Text" functionality, we need to install the IronOCR library into a Visual Studio project.
You can download the IronOcr DLL or use NuGet.
Install-Package IronOcr
Why IronOCR?
We use IronOCR to manage Tesseract because:
- It works out of the box in pure .NET.
- It does not require Tesseract to be installed on your machine.
- It runs the latest engines: Tesseract 5 (as well as Tesseract 4 & 3).
- It is compatible with .NET Framework 4.5+, .NET Standard 2+, and .NET Core 2, 3 & 5.
- It improves accuracy and speed over traditional Tesseract.
- It supports Xamarin, Mono, Azure, and Docker.
- It manages the complex Tesseract dictionary system using NuGet packages.
- It supports PDFs, MultiFrame TIFFs, and all major image formats without configuration.
- It can correct low-quality and skewed scans to achieve the best results.
Using Tesseract in C#
This simple example demonstrates using the IronOcr.IronTesseract class to read text from an image and return its value as a string.
// This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
// on an image file and print the extracted text to the console.
// To use IronOcr, ensure you have installed the package via NuGet Package Manager:
// PM> Install-Package IronOcr
using IronOcr;
try
{
// Create an instance of IronTesseract, which is used to perform OCR on images.
var ocrEngine = new IronTesseract();
// Specify the file path to the image you want to process.
// Ensure the path is correct; it's currently set to a relative path.
var filePath = @"img\Screenshot.png";
// Use the Read method to perform OCR on the specified image file.
// This returns an OcrResult which contains the recognized text.
using (var input = new OcrInput(filePath))
{
OcrResult result = ocrEngine.Read(input);
// Output the extracted text to the console.
Console.WriteLine(result.Text);
}
}
catch (OcrException ex)
{
// Handle any OCR-specific exceptions that might occur
Console.WriteLine("An error occurred while processing the image: " + ex.Message);
}
catch (Exception ex)
{
// Handle any general exceptions that might occur
Console.WriteLine("An error occurred: " + ex.Message);
}
' This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
' on an image file and print the extracted text to the console.
' To use IronOcr, ensure you have installed the package via NuGet Package Manager:
' PM> Install-Package IronOcr
Imports IronOcr
Try
' Create an instance of IronTesseract, which is used to perform OCR on images.
Dim ocrEngine = New IronTesseract()
' Specify the file path to the image you want to process.
' Ensure the path is correct; it's currently set to a relative path.
Dim filePath = "img\Screenshot.png"
' Use the Read method to perform OCR on the specified image file.
' This returns an OcrResult which contains the recognized text.
Using input = New OcrInput(filePath)
Dim result As OcrResult = ocrEngine.Read(input)
' Output the extracted text to the console.
Console.WriteLine(result.Text)
End Using
Catch ex As OcrException
' Handle any OCR-specific exceptions that might occur
Console.WriteLine("An error occurred while processing the image: " & ex.Message)
Catch ex As Exception
' Handle any general exceptions that might occur
Console.WriteLine("An error occurred: " & ex.Message)
End Try
This results in 100% accuracy, with the following text:
IronOCR Simple Example
In this simple example we test the accuracy of our C# OCR library to read text from a PNG Image. This is a very basic test, but things will get more complicated as the tutorial continues.
The quick brown fox jumps over the lazy dog
The OCR process involves sophisticated behavior like scanning the image for alignment, quality, and resolution, optimizing the OCR engine, and using artificial intelligence to read text as a human would. Despite its complexity, the OCR process can match a human's speed and achieve a high level of accuracy.
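The simple example above uses the relative path img\Screenshot.png. A relative path is resolved against the process's current working directory, which is not always the folder your executable runs from. The backslash separator also only works on Windows, which matters if you deploy to Linux or Docker. The sketch below shows one way to build the path with Path.Combine relative to the application's base directory. ResolveAssetPath is a hypothetical helper name, and it assumes the img folder is copied to the build output.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronOCR): resolves an asset path relative to
// the application's base directory instead of the current working directory.
// Path.Combine inserts the correct separator on Windows, Linux, and macOS.
static string ResolveAssetPath(params string[] parts)
{
    string[] all = new string[parts.Length + 1];
    all[0] = AppContext.BaseDirectory;
    Array.Copy(parts, 0, all, 1, parts.Length);
    return Path.Combine(all);
}

string filePath = ResolveAssetPath("img", "Screenshot.png");
Console.WriteLine(filePath);
Console.WriteLine(File.Exists(filePath)
    ? "Image found; this path can be passed to new OcrInput(filePath)."
    : "Image not found; check that img/Screenshot.png is copied to the output directory.");
```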

Advanced Use of IronOCR Tesseract for C#
For real-world projects requiring optimal performance, use the OcrInput and IronTesseract classes within the IronOcr namespace.
OcrInput Features:
- Works with various image formats like JPEG, TIFF, GIF, BMP, and PNG.
- Imports whole or parts of PDF documents.
- Enhances contrast, resolution, and size.
- Corrects for rotation, scan noise, digital noise, skew, and negative images.
IronTesseract Features:
- Access hundreds of prepackaged languages and variants.
- Tesseract 5, 4, or 3 OCR engines available out-of-the-box.
- Specify if the document is a screenshot, snippet, or full document.
- Read barcodes.
- Output results to: Searchable PDFs, hOCR HTML, a DOM, and Strings.
Example: Getting Started with OcrInput + IronTesseract
Here is a recommended starting configuration, suitable for most images:
using IronOcr;
// Create an instance of the IronTesseract OCR engine.
IronTesseract ocr = new IronTesseract();
// Using statement to ensure resources are disposed of correctly.
using (OcrInput input = new OcrInput())
{
// Specify the pages to be read from a multi-page TIFF file.
int[] pageIndices = new int[] { 1, 2 };
// Load specified image frames (pages) from the TIFF file into the input object.
// Ensure the file path is correct, adjust as needed if the directory structure is different.
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);
// Perform OCR on the loaded image frames.
// The Read method analyses the image frames and extracts text.
OcrResult result = ocr.Read(input);
// Output the extracted text to the console.
// If no text is recognized, result.Text will be empty.
Console.WriteLine(result.Text);
}
Imports IronOcr
' Create an instance of the IronTesseract OCR engine.
Dim ocr As New IronTesseract()
' Using statement to ensure resources are disposed of correctly.
Using input As New OcrInput()
' Specify the pages to be read from a multi-page TIFF file.
Dim pageIndices() As Integer = { 1, 2 }
' Load specified image frames (pages) from the TIFF file into the input object.
' Ensure the file path is correct, adjust as needed if the directory structure is different.
input.LoadImageFrames("img\Potter.tiff", pageIndices)
' Perform OCR on the loaded image frames.
' The Read method analyses the image frames and extracts text.
Dim result As OcrResult = ocr.Read(input)
' Output the extracted text to the console.
' If no text is recognized, result.Text will be empty.
Console.WriteLine(result.Text)
End Using
This configuration can achieve 100% accuracy on a medium-quality scan.

Reading text and/or barcodes from scanned images such as TIFFs is simplified with IronOCR, achieving a high level of accuracy.
IronOCR is highly effective with real-world documents, including multi-page TIFFs and PDF extractions.
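The accuracy percentages quoted in this tutorial compare OCR output against known reference text. The tutorial does not say how they were computed. A common way to measure accuracy yourself is character accuracy: 1 minus the Levenshtein edit distance divided by the length of the reference. The plain C# sketch below assumes that metric. CharacterAccuracy and EditDistance are hypothetical helper names, not IronOCR APIs.

```csharp
using System;

// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions that turn a into b.
static int EditDistance(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // the finished row becomes the previous row
    }
    return prev[b.Length];
}

// Character accuracy of OCR output against a reference text, clamped to [0, 1].
static double CharacterAccuracy(string reference, string ocrText)
{
    if (reference.Length == 0) return ocrText.Length == 0 ? 1.0 : 0.0;
    double errors = EditDistance(reference, ocrText);
    return Math.Max(0.0, 1.0 - errors / reference.Length);
}

string expected = "The quick brown fox jumps over the lazy dog";
string recognized = "The quick brown fox jumps over the 1azy dog"; // one substitution
Console.WriteLine($"Accuracy: {CharacterAccuracy(expected, recognized):P1}");
```

In practice, you would compare result.Text against a hand-checked transcript of the same page. Normalizing whitespace first avoids penalizing line-break differences.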
Example: A Low Quality Scan

In this case, we work with a low-quality scan with distortion and digital noise.
IronOCR is designed for this kind of real-world scanned image, and this is where it outperforms many other OCR libraries. Clean synthetic test images, by contrast, are easy for almost any engine to read with 100% accuracy.
// Include the necessary namespace for IronOcr
using IronOcr;
using System;
// Create a new instance of the IronTesseract class, which is responsible for OCR operations
var ocr = new IronTesseract();
try
{
// Create a new OCR input object that represents the images to be processed
using (var input = new OcrInput())
{
// Specify the indices of the pages in a multi-page image file you want to process
// Typically, indices are 0-based, but check if the library needs 1-based indices
var pageIndices = new int[] { 0, 1 }; // Adjust indices according to your requirements
// Load specific frames/pages from a TIFF file into the OCR input object
input.LoadImageFrames(@"img\Potter.LowQuality.tiff", pageIndices);
// Apply a deskew transformation to straighten pages that were scanned at a slight angle
input.Deskew(); // Correcting skew improves OCR accuracy, since Tesseract tolerates little rotation
// Perform OCR on the processed input and obtain the result
OcrResult result = ocr.Read(input);
// Output the recognized text to the console
Console.WriteLine("Recognized Text:");
Console.WriteLine(result.Text);
}
}
catch (Exception ex)
{
// Handle potential exceptions such as file not found, incorrect image format, etc.
Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
}
' Include the necessary namespace for IronOcr
Imports IronOcr
Imports System
' Create a new instance of the IronTesseract class, which is responsible for OCR operations
Dim ocr = New IronTesseract()
Try
' Create a new OCR input object that represents the images to be processed
Using input = New OcrInput()
' Specify the indices of the pages in a multi-page image file you want to process
' Typically, indices are 0-based, but check if the library needs 1-based indices
Dim pageIndices = New Integer() { 0, 1 } ' Adjust indices according to your requirements
' Load specific frames/pages from a TIFF file into the OCR input object
input.LoadImageFrames("img\Potter.LowQuality.tiff", pageIndices)
' Apply a deskew transformation to straighten pages that were scanned at a slight angle
input.Deskew() ' Correcting skew improves OCR accuracy, since Tesseract tolerates little rotation
' Perform OCR on the processed input and obtain the result
Dim result As OcrResult = ocr.Read(input)
' Output the recognized text to the console
Console.WriteLine("Recognized Text:")
Console.WriteLine(result.Text)
End Using
Catch ex As Exception
' Handle potential exceptions such as file not found, incorrect image format, etc.
Console.WriteLine("An error occurred during OCR processing: " & ex.Message)
End Try
With input.Deskew(), we reach 99.8% accuracy, nearly matching a high-quality scan.
Image filters add some preprocessing time, but they can reduce the overall OCR time because the engine recognizes clean text faster. Choose the filters that suit your input documents.
If uncertain, input.Deskew() and input.DeNoise() are reliable filters for improving your OCR's performance.
Performance Tuning
The primary factor in OCR job speed is input image quality. Less noise and higher DPI (~200 dpi is ideal) make for the fastest and most accurate results.
IronOCR can correct imperfect documents, but this correction is computationally expensive and slows processing.
Using image formats with minimal digital noise like TIFF or PNG can yield faster results compared to JPEG.
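You can check whether a scan is close to the ~200 DPI mentioned above by dividing its pixel width by the physical width of the page. The plain C# sketch below assumes you know the page's physical size, using a US Letter page (8.5 x 11 inches) as the example. It computes the effective DPI, the scale factor needed to reach a target DPI, and the pixel size at that DPI. EffectiveDpi, ScaleFactor, and PixelSize are hypothetical helper names; IronOCR's own resolution handling may work differently.

```csharp
using System;

// Effective resolution of a scan: pixels across divided by inches across.
static double EffectiveDpi(int pixelWidth, double pageWidthInches) =>
    pixelWidth / pageWidthInches;

// Factor by which to scale an image so it reaches the target DPI.
static double ScaleFactor(double currentDpi, double targetDpi = 200) =>
    targetDpi / currentDpi;

// Pixel dimensions of a page scanned at a given DPI.
static (int Width, int Height) PixelSize(double widthInches, double heightInches, int dpi) =>
    ((int)Math.Round(widthInches * dpi), (int)Math.Round(heightInches * dpi));

// Example: a US Letter page scanned 1275 pixels wide.
double dpi = EffectiveDpi(1275, 8.5);   // 150 DPI
double scale = ScaleFactor(dpi);        // about 1.33x to reach 200 DPI
var target = PixelSize(8.5, 11, 200);   // 1700 x 2200 pixels
Console.WriteLine($"Current: {dpi} DPI; scale by {scale:F2} to reach {target.Width}x{target.Height}");
```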
Image Filters
The following image filters can significantly improve performance:
- OcrInput.Rotate(double degrees): Rotate images by degrees clockwise. Use negative degrees for counterclockwise rotation.
- OcrInput.Binarize(): Converts every pixel to black or white, improving OCR performance in low contrast cases.
- OcrInput.ToGrayScale(): Converts every pixel to grayscale, potentially improving speed.
- OcrInput.Contrast(): Automatically increases contrast, often enhancing OCR speed and accuracy.
- OcrInput.DeNoise(): Removes digital noise, beneficial where noise is expected.
- OcrInput.Invert(): Inverts image colors (white becomes black, black becomes white).
- OcrInput.Dilate(): Adds pixels to object boundaries in images.
- OcrInput.Erode(): Removes pixels on object boundaries.
- OcrInput.Deskew(): Aligns the image correctly. Critical for OCR since Tesseract's skew tolerance can be low.
- OcrInput.DeepCleanBackgroundNoise(): Heavy noise removal.
- OcrInput.EnhanceResolution(): Enhances the resolution of low-quality images.
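Binarize() and Invert() are easiest to understand on raw pixel values. The sketch below is not IronOCR's implementation, since its thresholding algorithm is not documented here. It is a plain C# illustration on an 8-bit grayscale array and assumes a fixed global threshold of 128 for simplicity.

```csharp
using System;

// Binarization: every pixel at or above the threshold becomes white (255),
// and every other pixel becomes black (0).
static byte[,] Binarize(byte[,] gray, byte threshold = 128)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            result[y, x] = gray[y, x] >= threshold ? (byte)255 : (byte)0;
    return result;
}

// Inversion: white text on a dark background becomes dark text on white.
static byte[,] Invert(byte[,] gray)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            result[y, x] = (byte)(255 - gray[y, x]);
    return result;
}

// A tiny 2x3 low-contrast "image": faint text pixels (90-110) on a grey background (150-170).
byte[,] sample = { { 160, 95, 170 }, { 150, 110, 90 } };
byte[,] bw = Binarize(sample);
Console.WriteLine($"{bw[0, 0]} {bw[0, 1]} {bw[1, 2]}"); // prints 255 0 0
```

A single global threshold works well for evenly lit scans. Uneven lighting usually calls for adaptive thresholding, which this sketch does not attempt.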
Performance Tuning for Speed
Consider using the following settings to speed up OCR for high-quality scans:
using IronOcr;
// Initialize a new instance of the IronTesseract class which handles OCR operations
IronTesseract ocr = new IronTesseract();
// Configure for optimal speed by excluding specific characters from OCR consideration
ocr.Configuration.BlackListCharacters = "~`$#^*_{[]}\\";
// Set the page segmentation mode to automatic
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;
// Use the fast version of the English OCR language
ocr.Language = OcrLanguage.EnglishFast;
// Create an instance of OcrInput which will be used to load the document to be processed
using (OcrInput input = new OcrInput())
{
// Specify the image frames to load from a multi-page image file
int[] pageIndices = new int[] { 1, 2 };
// Load specified pages from the image file into the OcrInput instance
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);
// Read the text from the input images and store the OCR result
OcrResult result = ocr.Read(input);
// Output the textual result of the OCR process to the console
Console.WriteLine(result.Text);
}
Imports IronOcr
' Initialize a new instance of the IronTesseract class which handles OCR operations
Dim ocr As New IronTesseract()
' Configure for optimal speed by excluding specific characters from OCR consideration
ocr.Configuration.BlackListCharacters = "~`$#^*_{[]}\"
' Set the page segmentation mode to automatic
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto
' Use the fast version of the English OCR language
ocr.Language = OcrLanguage.EnglishFast
' Create an instance of OcrInput which will be used to load the document to be processed
Using input As New OcrInput()
' Specify the image frames to load from a multi-page image file
Dim pageIndices() As Integer = { 1, 2 }
' Load specified pages from the image file into the OcrInput instance
input.LoadImageFrames("img\Potter.tiff", pageIndices)
' Read the text from the input images and store the OCR result
Dim result As OcrResult = ocr.Read(input)
' Output the textual result of the OCR process to the console
Console.WriteLine(result.Text)
End Using
This setup is 99.8% accurate compared to the baseline 100%, achieving a 35% speed boost.
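You can measure a speed improvement like this on your own documents by timing a baseline run and a tuned run and comparing them. The tutorial does not define how its percentage was computed. The sketch below assumes "X% faster" means X% less elapsed time. It uses System.Diagnostics.Stopwatch, and the Thread.Sleep workloads are placeholders for your own default and speed-tuned ocr.Read(input) calls. AverageMilliseconds and PercentFaster are hypothetical helper names.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs the action several times and returns the average elapsed milliseconds.
static double AverageMilliseconds(Action action, int runs = 3)
{
    var sw = new Stopwatch();
    for (int i = 0; i < runs; i++)
    {
        sw.Start();   // resumes accumulating time across runs
        action();
        sw.Stop();
    }
    return sw.Elapsed.TotalMilliseconds / runs;
}

// Percentage of time saved by the tuned run relative to the baseline.
static double PercentFaster(double baselineMs, double tunedMs) =>
    (baselineMs - tunedMs) / baselineMs * 100.0;

// Placeholders: replace with the default and the speed-tuned OCR reads.
double baseline = AverageMilliseconds(() => Thread.Sleep(100));
double tuned = AverageMilliseconds(() => Thread.Sleep(65));
Console.WriteLine($"Tuned run is {PercentFaster(baseline, tuned):F0}% faster");
```

The first OCR call in a process typically includes one-off initialization, such as loading language data, so run each configuration once before timing to avoid skewing the baseline.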
Reading Cropped Regions of Images
Iron's version of Tesseract OCR can target specific image areas using System.Drawing.Rectangle.
This is particularly useful when handling standardized forms where the text is localized in specific sections.
Example: Scanning an Area of a Page
Using System.Drawing.Rectangle, you can specify pixel-based areas for OCR. This improves speed and prevents reading unnecessary text.
using IronOcr; // Namespace for IronOcr library
using IronSoftware.Drawing; // Namespace reference for drawing-related classes
// Create an instance of the IronTesseract class for OCR processing
var ocr = new IronTesseract();
// Using statement ensures that resources are automatically released after use
using (var input = new OcrInput())
{
// Define a rectangle area to focus OCR processing on a specific portion of the image.
// This can significantly speed up the process by limiting the area to be analyzed.
// Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
var contentArea = new System.Drawing.Rectangle(x: 215, y: 1250, width: 1335, height: 280);
// Load the image with a specific area of interest
input.AddImage("img/ComSci.png", contentArea);
// Perform OCR on the loaded image input
OcrResult result = ocr.Read(input);
// Output the recognized text to the console
Console.WriteLine(result.Text);
}
Imports IronOcr ' Namespace for IronOcr library
Imports IronSoftware.Drawing ' Namespace reference for drawing-related classes
' Create an instance of the IronTesseract class for OCR processing
Dim ocr = New IronTesseract()
' Using statement ensures that resources are automatically released after use
Using input = New OcrInput()
' Define a rectangle area to focus OCR processing on a specific portion of the image.
' This can significantly speed up the process by limiting the area to be analyzed.
' Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
Dim contentArea = New System.Drawing.Rectangle(x:= 215, y:= 1250, width:= 1335, height:= 280)
' Load the image with a specific area of interest
input.AddImage("img/ComSci.png", contentArea)
' Perform OCR on the loaded image input
Dim result As OcrResult = ocr.Read(input)
' Output the recognized text to the console
Console.WriteLine(result.Text)
End Using
In this example, cropping gives a 41% speed improvement and extracts only the text of interest. This makes it ideal for invoices, checks, forms, and similar documents. OCR cropping is also supported for PDFs.
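Cropping speeds things up because the engine analyzes fewer pixels. The plain C# sketch below is an illustration, not IronOCR code. It clamps a requested rectangle to the image bounds, so a region that runs off the edge of a smaller image is trimmed rather than pointing outside it. It also reports what fraction of the image will be processed. ClampRegion and AreaFraction are hypothetical helper names. The 1700 x 2200 page size is an assumption, because the tutorial does not give the size of img/ComSci.png.

```csharp
using System;

// Hypothetical helper: intersects a requested crop region with the image bounds
// so that x, y, width, and height never point outside the image.
static (int X, int Y, int Width, int Height) ClampRegion(
    int x, int y, int width, int height, int imageWidth, int imageHeight)
{
    int left = Math.Clamp(x, 0, imageWidth);
    int top = Math.Clamp(y, 0, imageHeight);
    int right = Math.Clamp(x + width, 0, imageWidth);
    int bottom = Math.Clamp(y + height, 0, imageHeight);
    return (left, top, Math.Max(0, right - left), Math.Max(0, bottom - top));
}

// Fraction of the image's pixels that fall inside the region.
static double AreaFraction((int X, int Y, int Width, int Height) r, int imageWidth, int imageHeight) =>
    (double)r.Width * r.Height / ((double)imageWidth * imageHeight);

// The region from the cropping example, on an assumed 1700 x 2200 pixel page.
var region = ClampRegion(215, 1250, 1335, 280, 1700, 2200);
Console.WriteLine($"Region {region} covers {AreaFraction(region, 1700, 2200):P1} of the page");
```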
International Languages
IronOCR supports 125 international languages via language packs, which are distributed as DLLs or through the NuGet Package Manager for Visual Studio.
You can install them via the NuGet interface (search for "IronOcr.Languages") or from the OCR language packs page.
Example languages include:
- Afrikaans
- Amharic Also known as አማርኛ
- Arabic Also known as العربية
- ArabicAlphabet Also known as العربية
- ArmenianAlphabet Also known as Հայերեն
- Assamese Also known as অসমীয়া
- Azerbaijani Also known as azərbaycan dili
- AzerbaijaniCyrillic Also known as azərbaycan dili
- Belarusian Also known as беларуская мова
- Bengali Also known as Bangla,বাংলা
- BengaliAlphabet Also known as Bangla,বাংলা
- Tibetan Also known as Tibetan Standard, Tibetan, Central ཡིག་
- Bosnian Also known as bosanski jezik
- Breton Also known as brezhoneg
- Bulgarian Also known as български език
- CanadianAboriginalAlphabet Also known as Canadian First Nations, Indigenous Canadians, Native Canadian, Inuit
- Catalan Also known as català, valencià
- Cebuano Also known as Bisaya, Binisaya
- Czech Also known as čeština, český jazyk
- CherokeeAlphabet Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi
- ChineseSimplified Also known as 中文 (Zhōngwén), 汉语, 漢語
- ChineseSimplifiedVertical Also known as 中文 (Zhōngwén), 汉语, 漢語
- ChineseTraditional Also known as 中文 (Zhōngwén), 汉语, 漢語
- ChineseTraditionalVertical Also known as 中文 (Zhōngwén), 汉语, 漢語
- Cherokee Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi
- Corsican Also known as corsu, lingua corsa
- Welsh Also known as Cymraeg
- CyrillicAlphabet Also known as Cyrillic scripts
- Danish Also known as dansk
- DanishFraktur Also known as dansk
- German Also known as Deutsch
- GermanFraktur Also known as Deutsch
- DevanagariAlphabet Also known as Nagari,देवनागरी
- Divehi Also known as ދިވެހި
- Dzongkha Also known as རྫོང་ཁ
- Greek Also known as ελληνικά
- English
- MiddleEnglish Also known as English (1100-1500 AD)
- Esperanto
- Estonian Also known as eesti, eesti keel
- EthiopicAlphabet Also known as Ge'ez,ግዕዝ, Gəʿəz
- Basque Also known as euskara, euskera
- Faroese Also known as føroyskt
- Persian Also known as فارسی
- Filipino Also known as National Language of the Philippines, Standardized Tagalog
- Finnish Also known as suomi, suomen kieli
- Financial Also known as Financial, Numerical and Technical Documents
- French Also known as français, langue française
- FrakturAlphabet Also known as Generic Fraktur, Calligraphic hand of the Latin alphabet
- Frankish Also known as Frenkisk, Old Franconian
- MiddleFrench Also known as Moyen Français,Middle French (ca. 1400-1600 AD)
- WesternFrisian Also known as Frysk
- GeorgianAlphabet Also known as ქართული
- ScottishGaelic Also known as Gàidhlig
- Irish Also known as Gaeilge
- Galician Also known as galego
- AncientGreek Also known as Ἑλληνική
- GreekAlphabet Also known as ελληνικά
- Gujarati Also known as ગુજરાતી
- GujaratiAlphabet Also known as ગુજરાતી
- GurmukhiAlphabet Also known as Gurmukhī, ਗੁਰਮੁਖੀ, Shahmukhi, گُرمُکھی, Sikh Script
- HangulAlphabet Also known as Korean Alphabet,한글,Hangeul,조선글,Chosŏn'gŭl
- HangulVerticalAlphabet Also known as Korean Alphabet,한글,Hangeul,조선글,Chosŏn'gŭl
- HanSimplifiedAlphabet Also known as Samhan ,한어, 韓語
- HanSimplifiedVerticalAlphabet Also known as Samhan ,한어, 韓語
- HanTraditionalAlphabet Also known as Samhan ,한어, 韓語
- HanTraditionalVerticalAlphabet Also known as Samhan ,한어, 韓語
- Haitian Also known as Kreyòl ayisyen
- Hebrew Also known as עברית
- HebrewAlphabet Also known as עברית
- Hindi Also known as हिन्दी, हिंदी
- Croatian Also known as hrvatski jezik
- Hungarian Also known as magyar
- Armenian Also known as Հայերեն
- Inuktitut Also known as ᐃᓄᒃᑎᑐᑦ
- Indonesian Also known as Bahasa Indonesia
- Icelandic Also known as Íslenska
- Italian Also known as italiano
- ItalianOld Also known as italiano
- JapaneseAlphabet Also known as 日本語 (にほんご)
- JapaneseVerticalAlphabet Also known as 日本語 (にほんご)
- Javanese Also known as basa Jawa
- Japanese Also known as 日本語 (にほんご)
- JapaneseVertical Also known as 日本語 (にほんご)
- Kannada Also known as ಕನ್ನಡ
- KannadaAlphabet Also known as ಕನ್ನಡ
- Georgian Also known as ქართული
- GeorgianOld Also known as ქართული
- Kazakh Also known as қазақ тілі
- Khmer Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ
- KhmerAlphabet Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ
- Kyrgyz Also known as Кыргызча, Кыргыз тили
- NorthernKurdish Also known as Kurmanji, کورمانجی ,Kurmancî
- Korean Also known as 한국어 (韓國語), 조선어 (朝鮮語)
- KoreanVertical Also known as 한국어 (韓國語), 조선어 (朝鮮語)
- Lao Also known as ພາສາລາວ
- LaoAlphabet Also known as ພາສາລາວ
- Latin Also known as latine, lingua latina
- LatinAlphabet Also known as latine, lingua latina
- Latvian Also known as latviešu valoda
- Lithuanian Also known as lietuvių kalba
- Luxembourgish Also known as Lëtzebuergesch
- Malayalam Also known as മലയാളം
- MalayalamAlphabet Also known as മലയാളം
- Marathi Also known as मराठी
- MICR Also known as Magnetic Ink Character Recognition, MICR Cheque Encoding
- Macedonian Also known as македонски јазик
- Maltese Also known as Malti
- Mongolian Also known as монгол
- Maori Also known as te reo Māori
- Malay Also known as bahasa Melayu, بهاس ملايو
- Myanmar Also known as Burmese ,ဗမာစာ
- MyanmarAlphabet Also known as Burmese ,ဗမာစာ
- Nepali Also known as नेपाली
- Dutch Also known as Nederlands, Vlaams
- Norwegian Also known as Norsk
- Occitan Also known as occitan, lenga d'òc
- Oriya Also known as ଓଡ଼ିଆ
- OriyaAlphabet Also known as ଓଡ଼ିଆ
- Panjabi Also known as ਪੰਜਾਬੀ, پنجابی
- Polish Also known as język polski, polszczyzna
- Portuguese Also known as português
- Pashto Also known as پښتو
- Quechua Also known as Runa Simi, Kichwa
- Romanian Also known as limba română
- Russian Also known as русский язык
- Sanskrit Also known as संस्कृतम्
- Sinhala Also known as සිංහල
- SinhalaAlphabet Also known as සිංහල
- Slovak Also known as slovenčina, slovenský jazyk
- SlovakFraktur Also known as slovenčina, slovenský jazyk
- Slovene Also known as slovenski jezik, slovenščina
- Sindhi Also known as सिन्धी, سنڌي، سندھی
- Spanish Also known as español, castellano
- SpanishOld Also known as español, castellano
- Albanian Also known as gjuha shqipe
- Serbian Also known as српски језик
- SerbianLatin Also known as српски језик
- Sundanese Also known as Basa Sunda
- Swahili Also known as Kiswahili
- Swedish Also known as Svenska
- Syriac Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ, Leššānā Suryāyā
- SyriacAlphabet Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ, Leššānā Suryāyā
- Tamil Also known as தமிழ்
- TamilAlphabet Also known as தமிழ்
- Tatar Also known as татар теле, tatar tele
- Telugu Also known as తెలుగు
- TeluguAlphabet Also known as తెలుగు
- Tajik Also known as тоҷикӣ, toğikī, تاجیکی
- Tagalog Also known as Wikang Tagalog, ᜏᜒᜃᜅ᜔ ᜆᜄᜎᜓᜄ᜔
- Thai Also known as ไทย
- ThaanaAlphabet Also known as Taana , Tāna , ތާނަ
- ThaiAlphabet Also known as ไทย
- TibetanAlphabet Also known as Tibetan Standard, Tibetan, Central ཡིག་
- Tigrinya Also known as ትግርኛ
- Tonga Also known as faka Tonga
- Turkish Also known as Türkçe
- Uyghur Also known as Uyƣurqə, ئۇيغۇرچە
- Ukrainian Also known as українська мова
- Urdu Also known as اردو
- Uzbek Also known as O‘zbek, Ўзбек, أۇزبېك
- UzbekCyrillic Also known as O‘zbek, Ўзбек, أۇزبېك
- Vietnamese Also known as Tiếng Việt
- VietnameseAlphabet Also known as Tiếng Việt
- Yiddish Also known as ייִדיש
- Yoruba Also known as Yorùbá
Example: OCR in Arabic (+ many more)
The example below demonstrates how to scan documents written in Arabic.
Install-Package IronOcr.Languages.Arabic

// PM> Install-Package IronOcr.Languages.Arabic
using IronOcr;
// Create an instance of the IronTesseract class for OCR operations.
var ocr = new IronTesseract();
// Set the OCR language to Arabic.
ocr.Language = OcrLanguage.Arabic;
// Use a 'using' block for OcrInput to ensure proper disposal of resources.
using (var input = new OcrInput())
{
// Load the first frame of the image located at "img/arabic.gif".
input.AddImage("img/arabic.gif");
// Optional: Add image filters if necessary for better OCR performance.
// In this case, even though the input is very low quality,
// IronTesseract can read what conventional Tesseract cannot.
// Perform OCR to get the result.
var result = ocr.Read(input);
// Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
result.SaveAsTextFile("arabic.txt");
}
' PM> Install-Package IronOcr.Languages.Arabic
Imports IronOcr
' Create an instance of the IronTesseract class for OCR operations.
Dim ocr = New IronTesseract()
' Set the OCR language to Arabic.
ocr.Language = OcrLanguage.Arabic
' Use a 'using' block for OcrInput to ensure proper disposal of resources.
Using input = New OcrInput()
' Load the first frame of the image located at "img/arabic.gif".
input.AddImage("img/arabic.gif")
' Optional: Add image filters if necessary for better OCR performance.
' In this case, even though the input is very low quality,
' IronTesseract can read what conventional Tesseract cannot.
' Perform OCR to get the result.
Dim result = ocr.Read(input)
' Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
result.SaveAsTextFile("arabic.txt")
End Using
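The Arabic example saves its output to a file because the Windows console may not display Arabic correctly. If you also want to print the text, or to control the file's encoding yourself, standard .NET APIs cover both cases. In the sketch below, a hard-coded Arabic string stands in for result.Text. Writing a UTF-8 byte order mark is an assumption aimed at older Windows editors. Setting Console.OutputEncoding does not guarantee correct right-to-left rendering, because that also depends on the terminal and its font.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for result.Text from the Arabic OCR example.
string text = "مرحبا بالعالم";

// Ask the console to emit UTF-8 so non-Latin characters are not replaced with '?'.
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(text);

// Write the text as UTF-8 with a byte order mark, which helps older Windows
// editors detect the encoding.
string path = Path.Combine(Path.GetTempPath(), "arabic.txt");
File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
Console.WriteLine($"Saved to {path}");
```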
Example: OCR in more than one language in the same document
If a document contains multiple languages, such as English and Chinese, you can perform OCR as follows:
Install-Package IronOcr.Languages.ChineseSimplified
// Include the necessary namespace for IronOcr functionality
using IronOcr;
// Create an instance of IronTesseract, which is used for OCR operations
var ocr = new IronTesseract();
// Set the primary language for OCR processing to Chinese Simplified
ocr.Language = OcrLanguage.ChineseSimplified;
// We can add any number of secondary languages for OCR processing.
// Here, English is added as a secondary language.
ocr.AddSecondaryLanguage(OcrLanguage.English);
// Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
// This is useful for languages that are not supported out of the box or for improving existing language support.
// Using a using statement to ensure proper disposal of resources after OCR operation
using (var input = new OcrInput())
{
// Load the image that contains multi-language text for OCR processing
input.AddImage("img/MultiLanguage.jpeg");
// Perform OCR on the input and retrieve the result
var result = ocr.Read(input);
// Save the recognized text as a text file with the specified filename
result.SaveAsTextFile("MultiLanguage.txt");
}
// Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
// Also, make sure that the image path is correct and accessible by the application.
' Include the necessary namespace for IronOcr functionality
Imports IronOcr
' Create an instance of IronTesseract, which is used for OCR operations
Dim ocr = New IronTesseract()
' Set the primary language for OCR processing to Chinese Simplified
ocr.Language = OcrLanguage.ChineseSimplified
' We can add any number of secondary languages for OCR processing.
' Here, English is added as a secondary language.
ocr.AddSecondaryLanguage(OcrLanguage.English)
' Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
' This is useful for languages that are not supported out of the box or for improving existing language support.
' Using a using statement to ensure proper disposal of resources after OCR operation
Using input = New OcrInput()
' Load the image that contains multi-language text for OCR processing
input.AddImage("img/MultiLanguage.jpeg")
' Perform OCR on the input and retrieve the result
Dim result = ocr.Read(input)
' Save the recognized text as a text file with the specified filename
result.SaveAsTextFile("MultiLanguage.txt")
End Using
' Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
' Also, make sure that the image path is correct and accessible by the application.
Multi Page Documents
IronOCR allows combining multiple pages or images into a single OcrResult. This is great for documents created from multiple images, enabling valuable features like creating searchable PDFs and HTML files.
IronOCR can mix and match images, TIFF frames, and PDF pages into a single OCR input.
// Required namespace for OCR operations
using IronOcr;
// Create a new instance of IronTesseract to perform OCR operations
IronTesseract ocr = new IronTesseract();
// Using statement to ensure proper disposal of OcrInput resources
using (OcrInput input = new OcrInput())
{
// Load various images into the input object for OCR processing
input.AddImage("image1.jpeg");
input.AddImage("image2.png");
// Specify which frames to load from a multi-frame image (e.g., GIF)
int[] pageIndices = { 1, 2 };
input.LoadImageFrames("image3.gif", pageIndices);
// Perform OCR on the loaded images and retrieve the result
OcrResult result = ocr.Read(input);
// Output the number of pages processed to the console
Console.WriteLine($"{result.Pages.Count} Pages processed."); // Expected: 4 pages (2 images + 2 GIF frames)
}
' Required namespace for OCR operations
Imports IronOcr
' Create a new instance of IronTesseract to perform OCR operations
Dim ocr As New IronTesseract()
' Using statement to ensure proper disposal of OcrInput resources
Using input As New OcrInput()
' Load various images into the input object for OCR processing
input.AddImage("image1.jpeg")
input.AddImage("image2.png")
' Specify which frames to load from a multi-frame image (e.g., GIF)
Dim pageIndices() As Integer = { 1, 2 }
input.LoadImageFrames("image3.gif", pageIndices)
' Perform OCR on the loaded images and retrieve the result
Dim result As OcrResult = ocr.Read(input)
' Output the number of pages processed to the console
Console.WriteLine($"{result.Pages.Count} Pages processed.") ' Expected: 4 pages (2 images + 2 GIF frames)
End Using
OCR selected pages of a multi-frame TIFF file as follows:
using IronOcr;
// Instantiate the IronTesseract object, which will handle OCR processing.
IronTesseract ocr = new IronTesseract();
// Create an OcrInput object to hold the images to be processed.
// The 'using' statement ensures that resources are freed when the operation is complete.
using (OcrInput input = new OcrInput())
{
// Define the indices of the pages to load from a multi-frame TIFF image.
// This example loads the first and second pages.
int[] pageIndices = new int[] { 0, 1 }; // Indices start from 0
// Load the specified image frames (pages) from the TIFF file.
// The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.
input.LoadImageFrames("MultiFrame.Tiff", pageIndices);
// Perform OCR on the input and store the result.
OcrResult result = ocr.Read(input);
// Output the recognized text to the console.
Console.WriteLine(result.Text);
// Output the number of pages processed, noting that each frame corresponds to a page.
Console.WriteLine($"{result.Pages.Count} Pages");
// Note: One page is returned for each frame (page) in the TIFF input.
}
Imports IronOcr
' Instantiate the IronTesseract object, which will handle OCR processing.
Private ocr As New IronTesseract()
' Create an OcrInput object to hold the images to be processed.
' The 'using' statement ensures that resources are freed when the operation is complete.
Using input As New OcrInput()
' Define the indices of the pages to load from a multi-frame TIFF image.
' This example loads the first and second pages.
Dim pageIndices() As Integer = { 0, 1 } ' Indices start from 0
' Load the specified image frames (pages) from the TIFF file.
' The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.
input.LoadImageFrames("MultiFrame.Tiff", pageIndices)
' Perform OCR on the input and store the result.
Dim result As OcrResult = ocr.Read(input)
' Output the recognized text to the console.
Console.WriteLine(result.Text)
' Output the number of pages processed, noting that each frame corresponds to a page.
Console.WriteLine($"{result.Pages.Count} Pages")
' Note: One page is returned for each frame (page) in the TIFF input.
End Using
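The example above loads two specific frames. To OCR every page of a TIFF instead, the index array has to cover all of its frames. The following is a minimal sketch, not part of IronOCR: it counts the image file directories (IFDs) of a classic TIFF file, one IFD per page, and builds the indices 0 to n-1. It assumes zero-based frame indices, as the example's comments suggest. The helper names CountTiffFrames and AllFrameIndices are hypothetical, and BigTIFF files are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Demo: a tiny synthetic little-endian TIFF with two (empty) IFDs, i.e. two frames.
// In a real project you would pass File.ReadAllBytes("MultiFrame.Tiff") instead.
byte[] sample =
{
    (byte)'I', (byte)'I', 42, 0, 8, 0, 0, 0, // header: byte order, magic 42, first IFD at offset 8
    0, 0, 14, 0, 0, 0,                       // IFD 1: 0 entries, next IFD at offset 14
    0, 0, 0, 0, 0, 0                         // IFD 2: 0 entries, no next IFD
};
int frameCount = CountTiffFrames(sample);
int[] pageIndices = AllFrameIndices(frameCount);
Console.WriteLine($"{frameCount} frames: [{string.Join(", ", pageIndices)}]");
// With IronOCR, the array would then be passed on as in the example above:
// input.LoadImageFrames("MultiFrame.Tiff", pageIndices);

// Hypothetical helper: counts the IFDs (pages) in a classic, non-BigTIFF file.
static int CountTiffFrames(byte[] data)
{
    if (data.Length < 8) throw new InvalidDataException("File too short to be a TIFF.");
    // "II" marks little-endian byte order, "MM" marks big-endian.
    bool little = data[0] == (byte)'I' && data[1] == (byte)'I';
    bool big = data[0] == (byte)'M' && data[1] == (byte)'M';
    if (!little && !big) throw new InvalidDataException("Missing TIFF byte-order mark.");

    int ReadU16(long pos) => little
        ? data[pos] | data[pos + 1] << 8
        : data[pos] << 8 | data[pos + 1];
    long ReadU32(long pos) => little
        ? (uint)(data[pos] | data[pos + 1] << 8 | data[pos + 2] << 16 | data[pos + 3] << 24)
        : (uint)(data[pos] << 24 | data[pos + 1] << 16 | data[pos + 2] << 8 | data[pos + 3]);

    if (ReadU16(2) != 42) throw new InvalidDataException("Not a classic TIFF (BigTIFF uses 43).");

    var visited = new HashSet<long>();
    long offset = ReadU32(4); // offset of the first IFD
    while (offset != 0)
    {
        // Guard against corrupt files whose IFD chain loops back on itself.
        if (!visited.Add(offset)) throw new InvalidDataException("Cyclic IFD chain.");
        int entryCount = ReadU16(offset);
        // Each IFD entry is 12 bytes; the offset of the next IFD follows the entries.
        offset = ReadU32(offset + 2 + 12L * entryCount);
    }
    return visited.Count;
}

// Zero-based frame indices 0..count-1.
static int[] AllFrameIndices(int count) => Enumerable.Range(0, count).ToArray();
```

Reading the frame count from the file keeps the index array in sync with the document, so a TIFF that gains or loses pages does not need a hand-edited array.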
Reading text from a PDF document, including a password-protected one, follows the same pattern:
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-10.cs
using System; // Import system namespace for Console usage
using IronOcr; // Import IronOcr namespace to use OCR functionalities
// Create a new instance of IronTesseract, which is the main class for OCR operations
IronTesseract ocr = new IronTesseract();
// Create an OcrInput object within a using statement to ensure it is disposed of correctly after use
using (OcrInput input = new OcrInput())
{
try
{
// Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.
// If the PDF does not have a password, consider providing null or an empty string for the password parameter.
input.LoadPdf("example.pdf", "password");
// Perform OCR on the loaded input and store the result
OcrResult result = ocr.Read(input);
// Output the recognized text to the console
Console.WriteLine(result.Text);
// Output the number of pages recognized in the PDF
Console.WriteLine($"{result.Pages.Count} Pages");
}
catch (Exception ex)
{
// In case of exceptions, output the error message to the console
Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
}
}
Imports System ' Import system namespace for Console usage
Imports IronOcr ' Import IronOcr namespace to use OCR functionalities
' Create a new instance of IronTesseract, which is the main class for OCR operations
Private ocr As New IronTesseract()
' Create an OcrInput object within a using statement to ensure it is disposed of correctly after use
Using input As New OcrInput()
Try
' Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.
' If the PDF does not have a password, consider providing null or an empty string for the password parameter.
input.LoadPdf("example.pdf", "password")
' Perform OCR on the loaded input and store the result
Dim result As OcrResult = ocr.Read(input)
' Output the recognized text to the console
Console.WriteLine(result.Text)
' Output the number of pages recognized in the PDF
Console.WriteLine($"{result.Pages.Count} Pages")
Catch ex As Exception
' In case of exceptions, output the error message to the console
Console.WriteLine("An error occurred during OCR processing: " & ex.Message)
End Try
End Using
Searchable PDFs
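IronOCR can export results as searchable PDFs. A searchable PDF keeps the original page images and adds the recognized text as an invisible layer, so the document can be searched, indexed, and copied from. This is valuable for applications that feed documents into databases or search engines, and for everyday PDF use.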
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-11.cs
using IronOcr;
// Initializes a new instance of the IronTesseract class
IronTesseract ocr = new IronTesseract();
// Using block to ensure that resources are disposed of properly
using (OcrInput input = new OcrInput())
{
// Sets the title for the OCR input
input.Title = "Quarterly Report";
// Loads individual images into the OCR input
input.AddImage("image1.jpeg");
input.AddImage("image2.png");
// Define specific page indices to load from an image with multiple frames
int[] pageIndices = new int[] { 1, 2 };
// Loads specific frames from a multi-frame image (e.g., GIF)
input.AddImageFrames("image3.gif", pageIndices);
// Performs OCR on the loaded images and stores the result
OcrResult result = ocr.Read(input);
// Saves the OCR result as a searchable PDF
result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
' Initializes a new instance of the IronTesseract class
Private ocr As New IronTesseract()
' Using block to ensure that resources are disposed of properly
Using input As New OcrInput()
' Sets the title for the OCR input
input.Title = "Quarterly Report"
' Loads individual images into the OCR input
input.AddImage("image1.jpeg")
input.AddImage("image2.png")
' Define specific page indices to load from an image with multiple frames
Dim pageIndices() As Integer = { 1, 2 }
' Loads specific frames from a multi-frame image (e.g., GIF)
input.AddImageFrames("image3.gif", pageIndices)
' Performs OCR on the loaded images and stores the result
Dim result As OcrResult = ocr.Read(input)
' Saves the OCR result as a searchable PDF
result.SaveAsSearchablePdf("searchable.pdf")
End Using
Similarly, convert existing PDFs into searchable ones:
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-12.cs
using IronOcr;
// Create an instance of the IronTesseract engine
var ocr = new IronTesseract();
// Use a using statement to ensure input resources are disposed of properly
using (var input = new OcrInput())
{
// Set a title for the input which might be useful for metadata information
input.Title = "Pdf Metadata Name";
// Load the PDF file into the OCR input. If the PDF is password protected, provide the password
input.LoadPdf("example.pdf", "password");
// Process the OCR operation on the loaded input
var result = ocr.Read(input);
// Save the result as a searchable PDF
result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
' Create an instance of the IronTesseract engine
Private ocr = New IronTesseract()
' Use a using statement to ensure input resources are disposed of properly
Using input = New OcrInput()
' Set a title for the input which might be useful for metadata information
input.Title = "Pdf Metadata Name"
' Load the PDF file into the OCR input. If the PDF is password protected, provide the password
input.LoadPdf("example.pdf", "password")
' Process the OCR operation on the loaded input
Dim result = ocr.Read(input)
' Save the result as a searchable PDF
result.SaveAsSearchablePdf("searchable.pdf")
End Using
The same technique converts selected frames of a multi-frame TIFF file into a searchable PDF:
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-13.cs
// Import necessary namespace for OCR functionality using IronOCR library
using IronOcr;
// Instantiate IronTesseract object for OCR operations
var ocr = new IronTesseract();
// Utilize using statement for automatic disposal of OcrInput object after usage
using (var input = new OcrInput())
{
// Set a title for the OCR input, useful for identifying the content in larger projects
input.Title = "Pdf Title";
// Define the page indices of the images that are to be processed
var pageIndices = new int[] { 1, 2 };
// Load images from the specified TIFF file, using the defined page indices
input.LoadImageFrames("example.tiff", pageIndices);
// Perform OCR on the input and store the result
OcrResult result = ocr.Read(input);
// Save the OCR result as a searchable PDF document
result.SaveAsSearchablePdf("searchable.pdf");
}
' Import necessary namespace for OCR functionality using IronOCR library
Imports IronOcr
' Instantiate IronTesseract object for OCR operations
Private ocr = New IronTesseract()
' Utilize using statement for automatic disposal of OcrInput object after usage
Using input = New OcrInput()
' Set a title for the OCR input, useful for identifying the content in larger projects
input.Title = "Pdf Title"
' Define the page indices of the images that are to be processed
Dim pageIndices = New Integer() { 1, 2 }
' Load images from the specified TIFF file, using the defined page indices
input.LoadImageFrames("example.tiff", pageIndices)
' Perform OCR on the input and store the result
Dim result As OcrResult = ocr.Read(input)
' Save the OCR result as a searchable PDF document
result.SaveAsSearchablePdf("searchable.pdf")
End Using
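Exporting hOCR HTML
IronOCR can export OCR results as hOCR HTML, an open standard that represents recognized text together with its layout (such as bounding boxes) in HTML. This enables limited PDF-to-HTML and TIFF-to-HTML conversion.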
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-14.cs
using IronOcr;
// Instantiate the OCR engine
var ocr = new IronTesseract();
// Using block to properly dispose of OcrInput resources
using (var input = new OcrInput())
{
// Set a title for the OCR input
input.Title = "Html Title";
// Load images and PDFs for OCR processing
input.AddImage("image2.jpeg");
// AddPdf takes a file path and can include a password for encrypted PDFs
input.AddPdf("example.pdf", "password");
// Load specific frames from a TIFF image using page indices
var pageIndices = new int[] { 1, 2 };
input.AddTiff("example.tiff", pageIndices);
// Perform OCR on the provided input and obtain the result
OcrResult result = ocr.Read(input);
// Save the result as an hOCR file, an HTML-based format for representing OCR results
result.SaveAsHocrFile("hocr.html");
}
Imports IronOcr
' Instantiate the OCR engine
Private ocr = New IronTesseract()
' Using block to properly dispose of OcrInput resources
Using input = New OcrInput()
' Set a title for the OCR input
input.Title = "Html Title"
' Load images and PDFs for OCR processing
input.AddImage("image2.jpeg")
' AddPdf takes a file path and can include a password for encrypted PDFs
input.AddPdf("example.pdf", "password")
' Load specific frames from a TIFF image using page indices
Dim pageIndices = New Integer() { 1, 2 }
input.AddTiff("example.tiff", pageIndices)
' Perform OCR on the provided input and obtain the result
Dim result As OcrResult = ocr.Read(input)
' Save the result as an hOCR file, an HTML-based format for representing OCR results
result.SaveAsHocrFile("hocr.html")
End Using
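Because hOCR stores recognized words and their bounding boxes in ordinary HTML attributes, the exported file can be post-processed with standard text tools. The following is a minimal sketch that assumes Tesseract-style hOCR, where each word is a span with class ocrx_word and a title attribute such as "bbox 36 92 96 116; x_wconf 90"; inspect your own hocr.html to confirm IronOCR writes the same attributes. ParseHocrWords is a hypothetical helper, and this regex-based approach does not handle nested spans inside a word (for example, per-character boxes).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical hOCR fragment; in practice read it with File.ReadAllText("hocr.html").
string sample =
    "<span class='ocr_line' title='bbox 36 92 200 116'>" +
    "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 90'>Hello</span> " +
    "<span class='ocrx_word' id='word_1_2' title='bbox 104 92 200 116; x_wconf 87'>World&amp;Co</span>" +
    "</span>";

var words = ParseHocrWords(sample);
foreach (var w in words)
    Console.WriteLine($"{w.Text} at ({w.Box.X0},{w.Box.Y0})-({w.Box.X1},{w.Box.Y1}), confidence {w.Confidence?.ToString() ?? "n/a"}");

// Collects the text, bounding box, and (if present) x_wconf confidence of every ocrx_word span.
static List<(string Text, (int X0, int Y0, int X1, int Y1) Box, int? Confidence)> ParseHocrWords(string hocr)
{
    var result = new List<(string Text, (int X0, int Y0, int X1, int Y1) Box, int? Confidence)>();
    // Match opening <span ...> tags only, so an enclosing line span does not swallow its words.
    foreach (Match tag in Regex.Matches(hocr, @"<span\b(?<attrs>[^>]*)>", RegexOptions.IgnoreCase))
    {
        string attrs = tag.Groups["attrs"].Value;
        if (!Regex.IsMatch(attrs, @"class\s*=\s*['""][^'""]*\bocrx_word\b", RegexOptions.IgnoreCase))
            continue;

        Match bbox = Regex.Match(attrs, @"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)");
        if (!bbox.Success) continue;
        int Coord(int index) => int.Parse(bbox.Groups[index].Value, CultureInfo.InvariantCulture);

        Match conf = Regex.Match(attrs, @"x_wconf\s+(\d+)");
        int? confidence = conf.Success
            ? int.Parse(conf.Groups[1].Value, CultureInfo.InvariantCulture)
            : (int?)null;

        // The word text runs from the end of the opening tag to the next closing </span>.
        int start = tag.Index + tag.Length;
        int end = hocr.IndexOf("</span>", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) continue;
        // Drop inline markup such as <strong> or <em>, then decode entities like &amp;.
        string inner = Regex.Replace(hocr.Substring(start, end - start), "<[^>]+>", "");
        string text = WebUtility.HtmlDecode(inner).Trim();

        result.Add((text, (Coord(1), Coord(2), Coord(3), Coord(4)), confidence));
    }
    return result;
}
```

Matching only opening tags is a deliberate choice: a greedy or lazy match over whole spans would let the enclosing ocr_line span consume the first word.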
Reading Barcodes in OCR Documents
IronOCR can also read barcodes and QR codes in the same pass as text recognition. Enable this by setting Configuration.ReadBarCodes to true, then read the detected codes from result.Barcodes:
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-15.cs
// Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.
using IronOcr;
// Create a new instance of IronTesseract for performing optical character and barcode recognition.
var ocr = new IronTesseract();
// Enable barcode reading in the OCR engine's configuration.
ocr.Configuration.ReadBarCodes = true;
// The OcrInput class allows for the loading of images which are to be processed by the OCR engine.
using (var input = new OcrInput())
{
// Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.
// The image should contain a barcode or text for OCR.
input.AddImage("img/Barcode.png");
// Perform OCR and barcode reading on the input image.
// The Read method returns an OcrResult containing recognized text, barcodes, and more.
var result = ocr.Read(input);
// Iterate through the detected barcodes in the OcrResult.
foreach (var barcode in result.Barcodes)
{
// Output the barcode value to the console.
Console.WriteLine($"Barcode Value: {barcode.Value}");
// Additional barcode properties, such as type and location, are also available on each barcode.
}
}
' Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.
Imports IronOcr
' Create a new instance of IronTesseract for performing optical character and barcode recognition.
Private ocr = New IronTesseract()
' Enable barcode reading in the OCR engine's configuration.
ocr.Configuration.ReadBarCodes = True
' The OcrInput class allows for the loading of images which are to be processed by the OCR engine.
Using input = New OcrInput()
' Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.
' The image should contain a barcode or text for OCR.
input.AddImage("img/Barcode.png")
' Perform OCR and barcode reading on the input image.
' The Read method returns an OcrResult containing recognized text, barcodes, and more.
Dim result = ocr.Read(input)
' Iterate through the detected barcodes in the OcrResult.
For Each barcode In result.Barcodes
' Output the barcode value to the console.
Console.WriteLine($"Barcode Value: {barcode.Value}")
' Additional barcode properties, such as type and location, are also available on each barcode.
Next barcode
End Using
A Detailed Look at Image to Text OCR Results
The OcrResult object returned by IronTesseract.Read contains detailed information that advanced developers can use.
An OCR result contains a collection of pages; each page contains barcodes, paragraphs, text lines, words, and characters. Each of these objects exposes its own details, such as location, font, and confidence level, giving developers flexibility in how they handle the data.
Elements of the OCR result, such as a paragraph, word, or barcode, can also be exported as images (bitmaps).
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-16.cs
using System;
using IronOcr;
using IronSoftware.Drawing;
// Instantiate the IronTesseract OCR engine
IronTesseract ocr = new IronTesseract
{
// Enable reading of barcodes in the OCR process
Configuration = { ReadBarCodes = true }
};
// Create an OCR input object to hold the images to be analyzed
using OcrInput input = new OcrInput();
// Specify the frame indices of the TIFF file to be processed (1 and 2)
int[] pageIndices = { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);
// Process the input images and obtain the OCR result
OcrResult result = ocr.Read(input);
// Iterate through each page in the OCR result
foreach (var page in result.Pages)
{
// Fetch information about the current page
int pageNumber = page.PageNumber;
string pageText = page.Text;
int pageWordCount = page.WordCount;
// Obtain barcodes if any are present on the page
OcrResult.Barcode[] barcodes = page.Barcodes;
// Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types
AnyBitmap pageImage = page.ToBitmap();
System.Drawing.Bitmap pageImageLegacy = page.ToBitmap();
double pageWidth = page.Width;
double pageHeight = page.Height;
// Iterate through each paragraph on the current page
foreach (var paragraph in page.Paragraphs)
{
// Extract paragraph details
int paragraphNumber = paragraph.ParagraphNumber;
string paragraphText = paragraph.Text;
System.Drawing.Bitmap paragraphImage = paragraph.ToBitmap();
int paragraphXLocation = paragraph.X;
int paragraphYLocation = paragraph.Y;
int paragraphWidth = paragraph.Width;
int paragraphHeight = paragraph.Height;
double paragraphOcrAccuracy = paragraph.Confidence;
var paragraphTextDirection = paragraph.TextDirection;
// Iterate through each line within the current paragraph
foreach (var line in paragraph.Lines)
{
// Extract line information
int lineNumber = line.LineNumber;
string lineText = line.Text;
AnyBitmap lineImage = line.ToBitmap();
System.Drawing.Bitmap lineImageLegacy = line.ToBitmap();
int lineXLocation = line.X;
int lineYLocation = line.Y;
int lineWidth = line.Width;
int lineHeight = line.Height;
double lineOcrAccuracy = line.Confidence;
double lineSkew = line.BaselineAngle;
double lineOffset = line.BaselineOffset;
// Iterate through each word in the line
foreach (var word in line.Words)
{
int wordNumber = word.WordNumber;
string wordText = word.Text;
AnyBitmap wordImage = word.ToBitmap();
System.Drawing.Image wordImageLegacy = word.ToBitmap();
int wordXLocation = word.X;
int wordYLocation = word.Y;
int wordWidth = word.Width;
int wordHeight = word.Height;
double wordOcrAccuracy = word.Confidence;
// Check for font details, available only when using certain Tesseract engine modes
if (word.Font != null)
{
string fontName = word.Font.FontName;
double fontSize = word.Font.FontSize;
bool isBold = word.Font.IsBold;
bool isFixedWidth = word.Font.IsFixedWidth;
bool isItalic = word.Font.IsItalic;
bool isSerif = word.Font.IsSerif;
bool isUnderlined = word.Font.IsUnderlined;
bool fontIsCaligraphic = word.Font.IsCaligraphic;
}
// Iterate through each character in the word
foreach (var character in word.Characters)
{
int characterNumber = character.CharacterNumber;
string characterText = character.Text;
AnyBitmap characterImage = character.ToBitmap();
System.Drawing.Bitmap characterImageLegacy = character.ToBitmap();
int characterXLocation = character.X;
int characterYLocation = character.Y;
int characterWidth = character.Width;
int characterHeight = character.Height;
double characterOcrAccuracy = character.Confidence;
// Get alternative symbol choices and their probabilities, useful for spell checking
OcrResult.Choice[] characterChoices = character.Choices;
}
}
}
}
}
Imports System
Imports IronOcr
Imports IronSoftware.Drawing
' Instantiate the IronTesseract OCR engine
Private ocr As New IronTesseract()
' Enable reading of barcodes in the OCR process
ocr.Configuration.ReadBarCodes = True
' Create an OCR input object to hold the images to be analyzed (dispose of it when finished)
Dim input As New OcrInput()
' Specify the frame indices of the TIFF file to be processed (1 and 2)
Dim pageIndices() As Integer = { 1, 2 }
input.LoadImageFrames("img\Potter.tiff", pageIndices)
' Process the input images and obtain the OCR result
Dim result As OcrResult = ocr.Read(input)
' Iterate through each page in the OCR result
For Each page In result.Pages
' Fetch information about the current page
Dim pageNumber As Integer = page.PageNumber
Dim pageText As String = page.Text
Dim pageWordCount As Integer = page.WordCount
' Obtain barcodes if any are present on the page
Dim barcodes() As OcrResult.Barcode = page.Barcodes
' Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types
Dim pageImage As AnyBitmap = page.ToBitmap()
Dim pageImageLegacy As System.Drawing.Bitmap = page.ToBitmap()
Dim pageWidth As Double = page.Width
Dim pageHeight As Double = page.Height
' Iterate through each paragraph on the current page
For Each paragraph In page.Paragraphs
' Extract paragraph details
Dim paragraphNumber As Integer = paragraph.ParagraphNumber
Dim paragraphText As String = paragraph.Text
Dim paragraphImage As System.Drawing.Bitmap = paragraph.ToBitmap()
Dim paragraphXLocation As Integer = paragraph.X
Dim paragraphYLocation As Integer = paragraph.Y
Dim paragraphWidth As Integer = paragraph.Width
Dim paragraphHeight As Integer = paragraph.Height
Dim paragraphOcrAccuracy As Double = paragraph.Confidence
Dim paragraphTextDirection = paragraph.TextDirection
' Iterate through each line within the current paragraph
For Each line In paragraph.Lines
' Extract line information
Dim lineNumber As Integer = line.LineNumber
Dim lineText As String = line.Text
Dim lineImage As AnyBitmap = line.ToBitmap()
Dim lineImageLegacy As System.Drawing.Bitmap = line.ToBitmap()
Dim lineXLocation As Integer = line.X
Dim lineYLocation As Integer = line.Y
Dim lineWidth As Integer = line.Width
Dim lineHeight As Integer = line.Height
Dim lineOcrAccuracy As Double = line.Confidence
Dim lineSkew As Double = line.BaselineAngle
Dim lineOffset As Double = line.BaselineOffset
' Iterate through each word in the line
For Each word In line.Words
Dim wordNumber As Integer = word.WordNumber
Dim wordText As String = word.Text
Dim wordImage As AnyBitmap = word.ToBitmap()
Dim wordImageLegacy As System.Drawing.Image = word.ToBitmap()
Dim wordXLocation As Integer = word.X
Dim wordYLocation As Integer = word.Y
Dim wordWidth As Integer = word.Width
Dim wordHeight As Integer = word.Height
Dim wordOcrAccuracy As Double = word.Confidence
' Check for font details, available only when using certain Tesseract engine modes
If word.Font IsNot Nothing Then
Dim fontName As String = word.Font.FontName
Dim fontSize As Double = word.Font.FontSize
Dim isBold As Boolean = word.Font.IsBold
Dim isFixedWidth As Boolean = word.Font.IsFixedWidth
Dim isItalic As Boolean = word.Font.IsItalic
Dim isSerif As Boolean = word.Font.IsSerif
Dim isUnderlined As Boolean = word.Font.IsUnderlined
Dim fontIsCaligraphic As Boolean = word.Font.IsCaligraphic
End If
' Iterate through each character in the word
For Each character In word.Characters
Dim characterNumber As Integer = character.CharacterNumber
Dim characterText As String = character.Text
Dim characterImage As AnyBitmap = character.ToBitmap()
Dim characterImageLegacy As System.Drawing.Bitmap = character.ToBitmap()
Dim characterXLocation As Integer = character.X
Dim characterYLocation As Integer = character.Y
Dim characterWidth As Integer = character.Width
Dim characterHeight As Integer = character.Height
Dim characterOcrAccuracy As Double = character.Confidence
' Get alternative symbol choices and their probabilities, useful for spell checking
Dim characterChoices() As OcrResult.Choice = character.Choices
Next character
Next word
Next line
Next paragraph
Next page
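The per-word Confidence values above are useful for quality control, for example to route doubtful words to manual review. The following is a minimal sketch, not IronOCR's own API: it works on plain (Text, Confidence) tuples, which you could fill from word.Text and word.Confidence inside the loops above. The helper names and the threshold of 60 are hypothetical, and the sample values assume a 0 to 100 confidence scale; check the range your IronOCR version reports before choosing a threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; with IronOCR you might collect it inside the word loop,
// e.g. words.Add((word.Text, word.Confidence)).
var words = new List<(string Text, double Confidence)>
{
    ("Quarterly", 96.5), ("Repcrt", 41.0), ("2023", 88.0), ("tota1", 55.2)
};

var flagged = FindWordsForReview(words, threshold: 60.0);
Console.WriteLine($"Mean confidence: {MeanConfidence(words):F1}");
foreach (var (text, confidence) in flagged)
    Console.WriteLine($"Review '{text}' (confidence {confidence:F1})");

// Returns the words below the threshold, least confident first.
static List<(string Text, double Confidence)> FindWordsForReview(
    IEnumerable<(string Text, double Confidence)> items, double threshold) =>
    items.Where(w => w.Confidence < threshold)
         .OrderBy(w => w.Confidence)
         .ToList();

// Simple unweighted mean; returns 0 for an empty list instead of throwing.
static double MeanConfidence(IReadOnlyCollection<(string Text, double Confidence)> items) =>
    items.Count == 0 ? 0 : items.Average(w => w.Confidence);
```

Sorting the flagged words least-confident first puts the likeliest misreads (here "Repcrt") at the top of a reviewer's list.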
Summary
IronOCR gives C# developers a Tesseract-based OCR API that runs on Windows, Linux, and Mac. With its image filters it can read imperfect scans accurately, and beyond plain text it reads barcodes and exports OCR data as hOCR HTML or searchable PDFs, all from the same OCR result object.
Moving Forward
To further explore IronOCR:
- Check our C# Tesseract OCR Quickstart guide.
- Browse C# & VB code examples.
- Dig into the MSDN-style API Reference.
Frequently Asked Questions
What is IronOCR?
IronOCR is a C# OCR library that allows developers to read text from images and PDFs without installing Tesseract separately, offering improved accuracy and speed over standard Tesseract.
How do I install IronOCR in a .NET project?
You can install IronOCR via the NuGet package manager using the command: `Install-Package IronOcr`.
What languages does IronOCR support?
IronOCR supports 125 international languages, which can be added via downloadable language packs from NuGet.
Can IronOCR handle low-quality scans?
Yes, IronOCR can handle low-quality and skewed scans by using image filters like deskew and denoise to improve accuracy.
How can I read a specific region of an image using IronOCR?
You can specify a region using `System.Drawing.Rectangle` to target specific areas of an image for OCR processing.
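When the region of interest sits in a predictable place on every page (a header band, a footer, an address block), it can help to compute the rectangle from fractions of the page size rather than hard-coding pixels. The following is a minimal sketch with a hypothetical helper, RegionFromFractions, built on System.Drawing.Rectangle; the result is clamped to the image bounds with Rectangle.Intersect. The IronOCR call that consumes the rectangle is not shown, since the crop-region overload depends on your version; check the API reference.

```csharp
using System;
using System.Drawing;

// Example: the top 20% of an A4 page scanned at 300 DPI (2480 x 3508 pixels).
Rectangle header = RegionFromFractions(new Size(2480, 3508), left: 0, top: 0, width: 1, height: 0.2);
Console.WriteLine($"Header region: X={header.X}, Y={header.Y}, Width={header.Width}, Height={header.Height}");

// A region that would spill past the page edge is trimmed to fit.
Rectangle corner = RegionFromFractions(new Size(1000, 1000), left: 0.9, top: 0.9, width: 0.5, height: 0.5);
Console.WriteLine($"Corner region: {corner.Width} x {corner.Height}");

// Hypothetical helper: converts fractional coordinates (0 to 1) into a pixel rectangle.
static Rectangle RegionFromFractions(Size image, double left, double top, double width, double height)
{
    int x = (int)Math.Round(image.Width * left);
    int y = (int)Math.Round(image.Height * top);
    int w = (int)Math.Round(image.Width * width);
    int h = (int)Math.Round(image.Height * height);
    // Intersect with the full image so the crop never extends past its edges.
    return Rectangle.Intersect(new Rectangle(x, y, w, h), new Rectangle(Point.Empty, image));
}
```

Fractions keep the same region definition valid for scans made at different resolutions.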
Does IronOCR support multi-page document processing?
Yes, IronOCR can process multi-page documents, combining images, TIFF frames, and PDF pages into a single OCR input.
What output formats does IronOCR support?
IronOCR can output results as strings, searchable PDFs, hOCR HTML, and more, making it versatile for various applications.
Can IronOCR read barcodes from documents?
Yes, IronOCR can detect and read barcodes and QR codes alongside text recognition from images.
How does IronOCR improve upon traditional Tesseract?
IronOCR enhances the accuracy and speed of the Tesseract engine, manages complex dictionaries, and supports modern .NET frameworks natively.
Is IronOCR compatible with various .NET platforms?
IronOCR is compatible with .NET Framework 4.5+, .NET Standard 2+, .NET Core, Xamarin, Mono, Azure, and Docker, providing wide platform support.