Reading Text from Images in .NET Applications

We will use the IronOcr.IronTesseract class to recognize text within images and explore how to use Iron Tesseract OCR to maximize accuracy and speed in .NET applications.

To achieve "Image to Text" functionality, we need to install the IronOCR library into a Visual Studio project.

You can download the IronOcr DLL or use NuGet.

Install-Package IronOcr

Why IronOCR?

We use IronOCR to manage Tesseract because:

  • It works out of the box in pure .NET.
  • It does not require Tesseract to be installed on your machine.
  • It runs the latest engines: Tesseract 5 (as well as Tesseract 4 & 3).
  • It is compatible with .NET Framework 4.5+, .NET Standard 2+, .NET Core 2 & 3, and .NET 5.
  • It improves accuracy and speed over traditional Tesseract.
  • It supports Xamarin, Mono, Azure, and Docker.
  • It manages the complex Tesseract dictionary system using NuGet packages.
  • It supports PDFs, MultiFrame TIFFs, and all major image formats without configuration.
  • It can correct low-quality and skewed scans to achieve the best results.

Start using IronOCR in your project today with a free trial.


Using Tesseract in C#

This simple example demonstrates using the IronOcr.IronTesseract class to read text from an image and return its value as a string.

// This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
// on an image file and print the extracted text to the console.

// To use IronOcr, ensure you have installed the package via NuGet Package Manager:
// PM> Install-Package IronOcr

using IronOcr;

try
{
    // Create an instance of IronTesseract, which is used to perform OCR on images.
    var ocrEngine = new IronTesseract();

    // Specify the file path to the image you want to process. 
    // Ensure the path is correct; it's currently set to a relative path.
    var filePath = @"img\Screenshot.png";

    // Use the Read method to perform OCR on the specified image file.
    // This returns an OcrResult which contains the recognized text.
    using (var input = new OcrInput(filePath))
    {
        OcrResult result = ocrEngine.Read(input);

        // Output the extracted text to the console.
        Console.WriteLine(result.Text);
    }
}
catch (OcrException ex)
{
    // Handle any OCR-specific exceptions that might occur
    Console.WriteLine("An error occurred while processing the image: " + ex.Message);
}
catch (Exception ex)
{
    // Handle any general exceptions that might occur
    Console.WriteLine("An error occurred: " + ex.Message);
}
' This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
' on an image file and print the extracted text to the console.

' To use IronOcr, ensure you have installed the package via NuGet Package Manager:
' PM> Install-Package IronOcr

Imports IronOcr

Try
	' Create an instance of IronTesseract, which is used to perform OCR on images.
	Dim ocrEngine = New IronTesseract()

	' Specify the file path to the image you want to process.
	' Ensure the path is correct; it's currently set to a relative path.
	Dim filePath = "img\Screenshot.png"

	' Use the Read method to perform OCR on the specified image file.
	' This returns an OcrResult which contains the recognized text.
	Using input = New OcrInput(filePath)
		Dim result As OcrResult = ocrEngine.Read(input)

		' Output the extracted text to the console.
		Console.WriteLine(result.Text)
	End Using
Catch ex As OcrException
	' Handle any OCR-specific exceptions that might occur
	Console.WriteLine("An error occurred while processing the image: " & ex.Message)
Catch ex As Exception
	' Handle any general exceptions that might occur
	Console.WriteLine("An error occurred: " & ex.Message)
End Try

This yields 100% accuracy on the following text:

IronOCR Simple Example

In this simple example we test the accuracy of our C# OCR library to read text from a PNG Image. This is a very basic test, but things will get more complicated as the tutorial continues.

The quick brown fox jumps over the lazy dog

The OCR process involves sophisticated behavior like scanning the image for alignment, quality, and resolution, optimizing the OCR engine, and using artificial intelligence to read text as a human would. Despite its complexity, the OCR process can match a human's speed and achieve a high level of accuracy.

C# OCR application results accuracy

Advanced Use of IronOCR Tesseract for C#

For real-world projects requiring optimal performance, use the OcrInput and IronTesseract classes within the IronOcr namespace.

OcrInput Features:

  • Works with various image formats like JPEG, TIFF, GIF, BMP, and PNG.
  • Imports whole or parts of PDF documents.
  • Enhances contrast, resolution, and size.
  • Corrects for rotation, scan noise, digital noise, skew, and negative images.

IronTesseract Features:

  • Access hundreds of prepackaged languages and variants.
  • Tesseract 5, 4, or 3 OCR engines available out-of-the-box.
  • Specify if the document is a screenshot, snippet, or full document.
  • Read barcodes.
  • Output results to: Searchable PDFs, hOCR HTML, a DOM, and Strings.

Example: Getting Started with OcrInput + IronTesseract

Here is a recommended starting configuration, suitable for most images:

using IronOcr;

// Create an instance of the IronTesseract OCR engine.
IronTesseract ocr = new IronTesseract();

// Using statement to ensure resources are disposed of correctly.
using (OcrInput input = new OcrInput())
{
    // Specify the pages to be read from a multi-page TIFF file.
    int[] pageIndices = new int[] { 1, 2 };

    // Load specified image frames (pages) from the TIFF file into the input object.
    // Ensure the file path is correct, adjust as needed if the directory structure is different.
    input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

    // Perform OCR on the loaded image frames.
    // The Read method analyses the image frames and extracts text.
    OcrResult result = ocr.Read(input);

    // Output the extracted text to the console.
    // If reading pages fails, result.Text will be empty or contain error messages.
    Console.WriteLine(result.Text);
}
Imports IronOcr

' Create an instance of the IronTesseract OCR engine.
Dim ocr As New IronTesseract()

' Using statement to ensure resources are disposed of correctly.
Using input As New OcrInput()
	' Specify the pages to be read from a multi-page TIFF file.
	Dim pageIndices() As Integer = { 1, 2 }

	' Load specified image frames (pages) from the TIFF file into the input object.
	' Ensure the file path is correct, adjust as needed if the directory structure is different.
	input.LoadImageFrames("img\Potter.tiff", pageIndices)

	' Perform OCR on the loaded image frames.
	' The Read method analyses the image frames and extracts text.
	Dim result As OcrResult = ocr.Read(input)

	' Output the extracted text to the console.
	' If reading pages fails, result.Text will be empty or contain error messages.
	Console.WriteLine(result.Text)
End Using

This configuration can achieve 100% accuracy on a medium-quality scan.


C# OCR Scan From Tiff Example

IronOCR simplifies reading text and/or barcodes from scanned images such as TIFFs, with a high level of accuracy.

IronOCR is highly effective with real-world documents, including multi-page TIFFs and PDF extractions.

Example: A Low Quality Scan


C# OCR Low Resolution Scan with Digital Noise

In this case, we work with a low-quality scan with distortion and digital noise.

IronOCR handles this scenario well compared to other OCR libraries. It is designed for real-world scanned images rather than synthetic test cases chosen to guarantee 100% accuracy.

// Include the necessary namespace for IronOcr
using IronOcr;
using System;

// Create a new instance of the IronTesseract class, which is responsible for OCR operations
var ocr = new IronTesseract();

try
{
    // Create a new OCR input object that represents the images to be processed
    using (var input = new OcrInput())
    {
        // Specify the indices of the pages in a multi-page image file you want to process
        // Typically, indices are 0-based, but check if the library needs 1-based indices
        var pageIndices = new int[] { 0, 1 }; // Adjust indices according to your requirements

        // Load specific frames/pages from a TIFF file into the OCR input object
        input.LoadImageFrames(@"img\Potter.LowQuality.tiff", pageIndices);

        // Apply a deskew transformation to the input to correct any rotation or perspective distortion
        input.Deskew(); // This method helps in removing rotation and perspective to improve OCR accuracy

        // Perform OCR on the processed input and obtain the result
        OcrResult result = ocr.Read(input);

        // Output the recognized text to the console
        Console.WriteLine("Recognized Text:");
        Console.WriteLine(result.Text);
    }
}
catch (Exception ex)
{
    // Handle potential exceptions such as file not found, incorrect image format, etc.
    Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
}
' Include the necessary namespace for IronOcr
Imports IronOcr
Imports System

' Create a new instance of the IronTesseract class, which is responsible for OCR operations
Dim ocr = New IronTesseract()

Try
	' Create a new OCR input object that represents the images to be processed
	Using input = New OcrInput()
		' Specify the indices of the pages in a multi-page image file you want to process
		' Typically, indices are 0-based, but check if the library needs 1-based indices
		Dim pageIndices = New Integer() { 0, 1 } ' Adjust indices according to your requirements

		' Load specific frames/pages from a TIFF file into the OCR input object
		input.LoadImageFrames("img\Potter.LowQuality.tiff", pageIndices)

		' Apply a deskew transformation to the input to correct any rotation or perspective distortion
		input.Deskew() ' This method helps in removing rotation and perspective to improve OCR accuracy

		' Perform OCR on the processed input and obtain the result
		Dim result As OcrResult = ocr.Read(input)

		' Output the recognized text to the console
		Console.WriteLine("Recognized Text:")
		Console.WriteLine(result.Text)
	End Using
Catch ex As Exception
	' Handle potential exceptions such as file not found, incorrect image format, etc.
	Console.WriteLine("An error occurred during OCR processing: " & ex.Message)
End Try

With OcrInput.Deskew() applied, we reach 99.8% accuracy, nearly matching the result for a high-quality scan.
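The tutorial quotes accuracy figures such as 99.8% without stating how they were measured. One common approach, sketched below as an assumption rather than as IronOCR's own method, is character-level accuracy: compare the OCR output with a known-correct transcript using Levenshtein edit distance. The names OcrAccuracy, EditDistance, and CharacterAccuracyPercent are hypothetical helpers, not part of the IronOcr API.

```csharp
using System;

// Hypothetical helper (not part of IronOcr): scores OCR output against a reference transcript.
static class OcrAccuracy
{
    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions, and substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two row buffers instead of allocating a full matrix.
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Accuracy as a percentage of the reference length, floored at 0.
    public static double CharacterAccuracyPercent(string expected, string actual)
    {
        if (expected.Length == 0) return actual.Length == 0 ? 100.0 : 0.0;
        int errors = EditDistance(expected, actual);
        double accuracy = 1.0 - (double)errors / expected.Length;
        return Math.Max(0.0, accuracy) * 100.0;
    }
}
```

For example, CharacterAccuracyPercent("lazy dog", "1azy dog") returns 87.5, because one of the eight reference characters was misread. In practice you would pass result.Text as the second argument, possibly after normalizing whitespace.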

Image filters add some preprocessing time, but they often reduce the time spent on OCR itself. Developers must choose the right balance of filters for their input documents.

If uncertain, OcrInput.Deskew() and OcrInput.DeNoise() are reliable filters for improving OCR results.

Performance Tuning

The primary factor in OCR job speed is input image quality. Less noise and higher DPI (~200 dpi is ideal) make for the fastest and most accurate results.

IronOCR can correct imperfect documents, but the correction is computationally expensive and slows the OCR job.

Using image formats with minimal digital noise like TIFF or PNG can yield faster results compared to JPEG.

Image Filters

The following image filters can significantly improve performance:

  • OcrInput.Rotate(double degrees): Rotates images clockwise by the given number of degrees. Use negative degrees for counterclockwise rotation.
  • OcrInput.Binarize(): Converts every pixel to black or white, improving OCR performance in low contrast cases.
  • OcrInput.ToGrayScale(): Converts every pixel to grayscale, potentially improving speed.
  • OcrInput.Contrast(): Automatically increases contrast, often enhancing OCR speed and accuracy.
  • OcrInput.DeNoise(): Removes digital noise, beneficial where noise is expected.
  • OcrInput.Invert(): Inverts image colors (white becomes black, black becomes white).
  • OcrInput.Dilate(): Adds pixels to object boundaries in images.
  • OcrInput.Erode(): Removes pixels on object boundaries.
  • OcrInput.Deskew(): Aligns the image correctly. Critical for OCR since Tesseract's skew tolerance can be low.
  • OcrInput.DeepCleanBackgroundNoise(): Heavy noise removal.
  • OcrInput.EnhanceResolution(): Enhances low-quality image resolution.
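As a rough illustration of how the filters listed above might be combined, the sketch below applies Deskew() and DeNoise() before reading. It uses only calls that appear elsewhere in this tutorial; the file path is a placeholder, and the filter order is an assumption to verify against your own documents.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// "img/LowQualityScan.png" is a placeholder path; substitute your own image.
using (var input = new OcrInput("img/LowQualityScan.png"))
{
    // Straighten the page so the text lines are horizontal.
    input.Deskew();

    // Remove digital noise left over from scanning or compression.
    input.DeNoise();

    // Optional for low-contrast pages (uncomment to try it):
    // input.Binarize();

    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Adding one filter at a time and comparing both accuracy and run time is a reasonable way to find the smallest set that works for a given document type.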

Performance Tuning for Speed

Consider using the following settings to speed up OCR for high-quality scans:

using IronOcr; 

// Initialize a new instance of the IronTesseract class which handles OCR operations
IronTesseract ocr = new IronTesseract();

// Configure for optimal speed by excluding specific characters from OCR consideration
ocr.Configuration.BlackListCharacters = "~`$#^*_{[]}\\";
// Set the page segmentation mode to automatic
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;
// Use the fast version of the English OCR language
ocr.Language = OcrLanguage.EnglishFast;

// Create an instance of OcrInput which will be used to load the document to be processed
using (OcrInput input = new OcrInput())
{
    // Specify the image frames to load from a multi-page image file
    int[] pageIndices = new int[] { 1, 2 };
    
    // Load specified pages from the image file into the OcrInput instance
    input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

    // Read the text from the input images and store the OCR result
    OcrResult result = ocr.Read(input);
    
    // Output the textual result of the OCR process to the console
    Console.WriteLine(result.Text);
}
Imports IronOcr

' Initialize a new instance of the IronTesseract class which handles OCR operations
Dim ocr As New IronTesseract()

' Configure for optimal speed by excluding specific characters from OCR consideration
ocr.Configuration.BlackListCharacters = "~`$#^*_{[]}\"
' Set the page segmentation mode to automatic
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto
' Use the fast version of the English OCR language
ocr.Language = OcrLanguage.EnglishFast

' Create an instance of OcrInput which will be used to load the document to be processed
Using input As New OcrInput()
	' Specify the image frames to load from a multi-page image file
	Dim pageIndices() As Integer = { 1, 2 }

	' Load specified pages from the image file into the OcrInput instance
	input.LoadImageFrames("img\Potter.tiff", pageIndices)

	' Read the text from the input images and store the OCR result
	Dim result As OcrResult = ocr.Read(input)

	' Output the textual result of the OCR process to the console
	Console.WriteLine(result.Text)
End Using

Compared with the baseline configuration (100% accuracy), this setup is 99.8% accurate and runs about 35% faster.
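To check whether a tuned configuration is actually faster on your own documents, you can time both configurations and compare. The sketch below is one way to do that with System.Diagnostics.Stopwatch. OcrTiming, MeasureMilliseconds, and TimeSavedPercent are hypothetical helper names, and treating the speed gain as the percentage of run time saved is an assumption, since the tutorial does not say how its figure was calculated.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of IronOcr) for comparing two OCR configurations.
static class OcrTiming
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMilliseconds(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // Percentage of the baseline run time that the tuned configuration saves.
    public static double TimeSavedPercent(double baselineMs, double tunedMs)
    {
        if (baselineMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(baselineMs), "Baseline time must be positive.");
        return (baselineMs - tunedMs) / baselineMs * 100.0;
    }
}
```

Wrap each ocr.Read(input) call in MeasureMilliseconds, then pass the two timings to TimeSavedPercent. A baseline of 1000 ms and a tuned run of 650 ms gives a saving of 35%. Single runs are noisy, so averaging several runs per configuration gives a more trustworthy comparison.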

Reading Cropped Regions of Images

Iron's version of Tesseract OCR can target specific image areas using System.Drawing.Rectangle.

This is particularly useful when handling standardized forms where the text is localized in specific sections.

Example: Scanning an Area of a Page

Using System.Drawing.Rectangle, you can specify pixel-based areas for OCR. This improves speed and prevents reading unnecessary text.

using IronOcr; // Namespace for IronOcr library
using IronSoftware.Drawing; // Namespace reference for drawing-related classes

// Create an instance of the IronTesseract class for OCR processing
var ocr = new IronTesseract();

// Using statement ensures that resources are automatically released after use
using (var input = new OcrInput())
{
    // Define a rectangle area to focus OCR processing on a specific portion of the image.
    // This can significantly speed up the process by limiting the area to be analyzed.
    // Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
    var contentArea = new System.Drawing.Rectangle(x: 215, y: 1250, width: 1335, height: 280);

    // Load the image with a specific area of interest
    input.AddImage("img/ComSci.png", contentArea);

    // Perform OCR on the loaded image input
    OcrResult result = ocr.Read(input);

    // Output the recognized text to the console
    Console.WriteLine(result.Text);
}
Imports IronOcr ' Namespace for IronOcr library
Imports IronSoftware.Drawing ' Namespace reference for drawing-related classes

' Create an instance of the IronTesseract class for OCR processing
Dim ocr = New IronTesseract()

' Using statement ensures that resources are automatically released after use
Using input = New OcrInput()
	' Define a rectangle area to focus OCR processing on a specific portion of the image.
	' This can significantly speed up the process by limiting the area to be analyzed.
	' Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
	Dim contentArea = New System.Drawing.Rectangle(x:= 215, y:= 1250, width:= 1335, height:= 280)

	' Load the image with a specific area of interest
	input.AddImage("img/ComSci.png", contentArea)

	' Perform OCR on the loaded image input
	Dim result As OcrResult = ocr.Read(input)

	' Output the recognized text to the console
	Console.WriteLine(result.Text)
End Using

This method offers a 41% speed improvement and extracts only the text you need, which makes it ideal for .NET OCR tasks involving invoices, checks, forms, and similar documents. OCR cropping is also supported for PDFs.
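Standardized forms are often scanned at different resolutions, so a fixed pixel rectangle may not line up on every scan. One possible workaround, sketched below as an assumption rather than an IronOCR feature, is to describe the region as fractions of the page and convert it to pixels per image. CropRegions and FromFractions are hypothetical names; Rectangle here is the standard System.Drawing.Rectangle used in the example above.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of IronOcr): builds pixel crop regions from page fractions.
static class CropRegions
{
    // left, top, width, and height are fractions of the page size (0.0 to 1.0).
    public static Rectangle FromFractions(
        int pageWidth, int pageHeight,
        double left, double top, double width, double height)
    {
        var region = new Rectangle(
            (int)Math.Round(left * pageWidth),
            (int)Math.Round(top * pageHeight),
            (int)Math.Round(width * pageWidth),
            (int)Math.Round(height * pageHeight));

        // Clamp the region so it never extends beyond the page.
        return Rectangle.Intersect(region, new Rectangle(0, 0, pageWidth, pageHeight));
    }
}
```

For a 2000 x 3000 pixel page, FromFractions(2000, 3000, 0.1, 0.5, 0.5, 0.1) yields a rectangle at x = 200, y = 1500 with width 1000 and height 300, which can then be passed wherever the tutorial uses contentArea.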

International Languages

IronOCR supports 125 international languages via language packs, downloadable as DLLs from this website or the NuGet Package Manager for Visual Studio.

You can install them via the NuGet interface (search for "IronOcr.Languages") or from the OCR language packs page.

Example languages include:

  • Afrikaans
  • Amharic Also known as አማርኛ
  • Arabic Also known as العربية
  • ArabicAlphabet Also known as العربية
  • ArmenianAlphabet Also known as Հայերեն
  • Assamese Also known as অসমীয়া
  • Azerbaijani Also known as azərbaycan dili
  • AzerbaijaniCyrillic Also known as azərbaycan dili
  • Belarusian Also known as беларуская мова
  • Bengali Also known as Bangla,বাংলা
  • BengaliAlphabet Also known as Bangla,বাংলা
  • Tibetan Also known as Tibetan Standard, Tibetan, Central ཡིག་
  • Bosnian Also known as bosanski jezik
  • Breton Also known as brezhoneg
  • Bulgarian Also known as български език
  • CanadianAboriginalAlphabet Also known as Canadian First Nations, Indigenous Canadians, Native Canadian, Inuit
  • Catalan Also known as català, valencià
  • Cebuano Also known as Bisaya, Binisaya
  • Czech Also known as čeština, český jazyk
  • CherokeeAlphabet Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi
  • ChineseSimplified Also known as 中文 (Zhōngwén), 汉语, 漢語
  • ChineseSimplifiedVertical Also known as 中文 (Zhōngwén), 汉语, 漢語
  • ChineseTraditional Also known as 中文 (Zhōngwén), 汉语, 漢語
  • ChineseTraditionalVertical Also known as 中文 (Zhōngwén), 汉语, 漢語
  • Cherokee Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi
  • Corsican Also known as corsu, lingua corsa
  • Welsh Also known as Cymraeg
  • CyrillicAlphabet Also known as Cyrillic scripts
  • Danish Also known as dansk
  • DanishFraktur Also known as dansk
  • German Also known as Deutsch
  • GermanFraktur Also known as Deutsch
  • DevanagariAlphabet Also known as Nagari, देवनागरी
  • Divehi Also known as ދިވެހި
  • Dzongkha Also known as རྫོང་ཁ
  • Greek Also known as ελληνικά
  • English
  • MiddleEnglish Also known as English (1100-1500 AD)
  • Esperanto
  • Estonian Also known as eesti, eesti keel
  • EthiopicAlphabet Also known as Ge'ez,ግዕዝ, Gəʿəz
  • Basque Also known as euskara, euskera
  • Faroese Also known as føroyskt
  • Persian Also known as فارسی
  • Filipino Also known as National Language of the Philippines, Standardized Tagalog
  • Finnish Also known as suomi, suomen kieli
  • Financial Also known as Financial, Numerical and Technical Documents
  • French Also known as français, langue française
  • FrakturAlphabet Also known as Generic Fraktur, Calligraphic hand of the Latin alphabet
  • Frankish Also known as Frenkisk, Old Franconian
  • MiddleFrench Also known as Moyen Français,Middle French (ca. 1400-1600 AD)
  • WesternFrisian Also known as Frysk
  • GeorgianAlphabet Also known as ქართული
  • ScottishGaelic Also known as Gàidhlig
  • Irish Also known as Gaeilge
  • Galician Also known as galego
  • AncientGreek Also known as Ἑλληνική
  • GreekAlphabet Also known as ελληνικά
  • Gujarati Also known as ગુજરાતી
  • GujaratiAlphabet Also known as ગુજરાતી
  • GurmukhiAlphabet Also known as Gurmukhī, ਗੁਰਮੁਖੀ, Shahmukhi, گُرمُکھی‎, Sikh Script
  • HangulAlphabet Also known as Korean Alphabet, 한글, Hangeul, 조선글, Chosŏn'gŭl
  • HangulVerticalAlphabet Also known as Korean Alphabet, 한글, Hangeul, 조선글, Chosŏn'gŭl
  • HanSimplifiedAlphabet Also known as Samhan ,한어, 韓語
  • HanSimplifiedVerticalAlphabet Also known as Samhan ,한어, 韓語
  • HanTraditionalAlphabet Also known as Samhan ,한어, 韓語
  • HanTraditionalVerticalAlphabet Also known as Samhan ,한어, 韓語
  • Haitian Also known as Kreyòl ayisyen
  • Hebrew Also known as עברית
  • HebrewAlphabet Also known as עברית
  • Hindi Also known as हिन्दी, हिंदी
  • Croatian Also known as hrvatski jezik
  • Hungarian Also known as magyar
  • Armenian Also known as Հայերեն
  • Inuktitut Also known as ᐃᓄᒃᑎᑐᑦ
  • Indonesian Also known as Bahasa Indonesia
  • Icelandic Also known as Íslenska
  • Italian Also known as italiano
  • ItalianOld Also known as italiano
  • JapaneseAlphabet Also known as 日本語 (にほんご)
  • JapaneseVerticalAlphabet Also known as 日本語 (にほんご)
  • Javanese Also known as basa Jawa
  • Japanese Also known as 日本語 (にほんご)
  • JapaneseVertical Also known as 日本語 (にほんご)
  • Kannada Also known as ಕನ್ನಡ
  • KannadaAlphabet Also known as ಕನ್ನಡ
  • Georgian Also known as ქართული
  • GeorgianOld Also known as ქართული
  • Kazakh Also known as қазақ тілі
  • Khmer Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ
  • KhmerAlphabet Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ
  • Kyrgyz Also known as Кыргызча, Кыргыз тили
  • NorthernKurdish Also known as Kurmanji, کورمانجی ,Kurmancî‎
  • Korean Also known as 한국어 (韓國語), 조선어 (朝鮮語)
  • KoreanVertical Also known as 한국어 (韓國語), 조선어 (朝鮮語)
  • Lao Also known as ພາສາລາວ
  • LaoAlphabet Also known as ພາສາລາວ
  • Latin Also known as latine, lingua latina
  • LatinAlphabet Also known as latine, lingua latina
  • Latvian Also known as latviešu valoda
  • Lithuanian Also known as lietuvių kalba
  • Luxembourgish Also known as Lëtzebuergesch
  • Malayalam Also known as മലയാളം
  • MalayalamAlphabet Also known as മലയാളം
  • Marathi Also known as मराठी
  • MICR Also known as Magnetic Ink Character Recognition, MICR Cheque Encoding
  • Macedonian Also known as македонски јазик
  • Maltese Also known as Malti
  • Mongolian Also known as монгол
  • Maori Also known as te reo Māori
  • Malay Also known as bahasa Melayu, بهاس ملايو‎
  • Myanmar Also known as Burmese ,ဗမာစာ
  • MyanmarAlphabet Also known as Burmese ,ဗမာစာ
  • Nepali Also known as नेपाली
  • Dutch Also known as Nederlands, Vlaams
  • Norwegian Also known as Norsk
  • Occitan Also known as occitan, lenga d'òc
  • Oriya Also known as ଓଡ଼ିଆ
  • OriyaAlphabet Also known as ଓଡ଼ିଆ
  • Panjabi Also known as ਪੰਜਾਬੀ, پنجابی‎
  • Polish Also known as język polski, polszczyzna
  • Portuguese Also known as português
  • Pashto Also known as پښتو
  • Quechua Also known as Runa Simi, Kichwa
  • Romanian Also known as limba română
  • Russian Also known as русский язык
  • Sanskrit Also known as संस्कृतम्
  • Sinhala Also known as සිංහල
  • SinhalaAlphabet Also known as සිංහල
  • Slovak Also known as slovenčina, slovenský jazyk
  • SlovakFraktur Also known as slovenčina, slovenský jazyk
  • Slovene Also known as slovenski jezik, slovenščina
  • Sindhi Also known as सिन्धी, سنڌي، سندھی‎
  • Spanish Also known as español, castellano
  • SpanishOld Also known as español, castellano
  • Albanian Also known as gjuha shqipe
  • Serbian Also known as српски језик
  • SerbianLatin Also known as српски језик
  • Sundanese Also known as Basa Sunda
  • Swahili Also known as Kiswahili
  • Swedish Also known as Svenska
  • Syriac Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ‎, Leššānā Suryāyā
  • SyriacAlphabet Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ‎, Leššānā Suryāyā
  • Tamil Also known as தமிழ்
  • TamilAlphabet Also known as தமிழ்
  • Tatar Also known as татар теле, tatar tele
  • Telugu Also known as తెలుగు
  • TeluguAlphabet Also known as తెలుగు
  • Tajik Also known as тоҷикӣ, toğikī, تاجیکی‎
  • Tagalog Also known as Wikang Tagalog, ᜏᜒᜃᜅ᜔ ᜆᜄᜎᜓᜄ᜔
  • Thai Also known as ไทย
  • ThaanaAlphabet Also known as Taana , Tāna , ތާނަ‎
  • ThaiAlphabet Also known as ไทย
  • TibetanAlphabet Also known as Tibetan Standard, Tibetan, Central ཡིག་
  • Tigrinya Also known as ትግርኛ
  • Tonga Also known as faka Tonga
  • Turkish Also known as Türkçe
  • Uyghur Also known as Uyƣurqə, ئۇيغۇرچە‎
  • Ukrainian Also known as українська мова
  • Urdu Also known as اردو
  • Uzbek Also known as O‘zbek, Ўзбек, أۇزبېك‎
  • UzbekCyrillic Also known as O‘zbek, Ўзбек, أۇزبېك‎
  • Vietnamese Also known as Tiếng Việt
  • VietnameseAlphabet Also known as Tiếng Việt
  • Yiddish Also known as ייִדיש
  • Yoruba Also known as Yorùbá

Example: OCR in Arabic (+ many more)

The example below demonstrates how to scan documents written in Arabic.

Install-Package IronOcr.Languages.Arabic
C# OCR in Arabic Language
// PM> Install-Package IronOcr.Languages.Arabic
using IronOcr;

// Create an instance of the IronTesseract class for OCR operations.
var ocr = new IronTesseract();

// Set the OCR language to Arabic.
ocr.Language = OcrLanguage.Arabic;

// Use a 'using' block for OcrInput to ensure proper disposal of resources.
using (var input = new OcrInput())
{
    // Load the first frame of the image located at "img/arabic.gif".
    input.AddImage("img/arabic.gif");
    
    // Optional: Add image filters if necessary for better OCR performance.
    // In this case, even though the input is very low quality,
    // IronTesseract can read what conventional Tesseract cannot.

    // Perform OCR to get the result.
    var result = ocr.Read(input);

    // Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
    result.SaveAsTextFile("arabic.txt");
}
' PM> Install-Package IronOcr.Languages.Arabic
Imports IronOcr

' Create an instance of the IronTesseract class for OCR operations.
Dim ocr = New IronTesseract()

' Set the OCR language to Arabic.
ocr.Language = OcrLanguage.Arabic

' Use a 'using' block for OcrInput to ensure proper disposal of resources.
Using input = New OcrInput()
	' Load the first frame of the image located at "img/arabic.gif".
	input.AddImage("img/arabic.gif")

	' Optional: Add image filters if necessary for better OCR performance.
	' In this case, even though the input is very low quality,
	' IronTesseract can read what conventional Tesseract cannot.

	' Perform OCR to get the result.
	Dim result = ocr.Read(input)

	' Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
	result.SaveAsTextFile("arabic.txt")
End Using

Example: OCR in more than one language in the same document

If a document contains multiple languages, such as English and Chinese, you can perform OCR as follows:

Install-Package IronOcr.Languages.ChineseSimplified
// Include the necessary namespace for IronOcr functionality
using IronOcr;

// Create an instance of IronTesseract, which is used for OCR operations
var ocr = new IronTesseract();

// Set the primary language for OCR processing to Chinese Simplified
ocr.Language = OcrLanguage.ChineseSimplified;

// We can add any number of secondary languages for OCR processing.
// Here, English is added as a secondary language.
ocr.AddSecondaryLanguage(OcrLanguage.English);

// Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
// This is useful for languages that are not supported out of the box or for improving existing language support.

// Using a using statement to ensure proper disposal of resources after OCR operation
using (var input = new OcrInput())
{
    // Load the image that contains multi-language text for OCR processing
    input.AddImage("img/MultiLanguage.jpeg");

    // Perform OCR on the input and retrieve the result
    var result = ocr.Read(input);

    // Save the recognized text as a text file with the specified filename
    result.SaveAsTextFile("MultiLanguage.txt");
}

// Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
// Also, make sure that the image path is correct and accessible by the application.
' Include the necessary namespace for IronOcr functionality
Imports IronOcr

' Create an instance of IronTesseract, which is used for OCR operations
Dim ocr = New IronTesseract()

' Set the primary language for OCR processing to Chinese Simplified
ocr.Language = OcrLanguage.ChineseSimplified

' We can add any number of secondary languages for OCR processing.
' Here, English is added as a secondary language.
ocr.AddSecondaryLanguage(OcrLanguage.English)

' Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
' This is useful for languages that are not supported out of the box or for improving existing language support.

' Using a using statement to ensure proper disposal of resources after OCR operation
Using input = New OcrInput()
	' Load the image that contains multi-language text for OCR processing
	input.AddImage("img/MultiLanguage.jpeg")

	' Perform OCR on the input and retrieve the result
	Dim result = ocr.Read(input)

	' Save the recognized text as a text file with the specified filename
	result.SaveAsTextFile("MultiLanguage.txt")
End Using

' Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
' Also, make sure that the image path is correct and accessible by the application.

Multi Page Documents

IronOCR allows combining multiple pages or images into a single OcrResult. This is great for documents created from multiple images, enabling valuable features like creating searchable PDFs and HTML files.

IronOCR can mix and match images, TIFF frames, and PDF pages into a single OCR input.

// Required namespace for OCR operations
using IronOcr;

// Create a new instance of IronTesseract to perform OCR operations
IronTesseract ocr = new IronTesseract();

// Using statement to ensure proper disposal of OcrInput resources
using (OcrInput input = new OcrInput())
{
    // Load various images into the input object for OCR processing
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");

    // Specify which frames to load from a multi-frame image (e.g., GIF)
    int[] pageIndices = { 1, 2 };
    input.AddImageFrames("image3.gif", pageIndices);

    // Perform OCR on the loaded images and retrieve the result
    OcrResult result = ocr.Read(input);

    // Output the number of pages processed to the console
    Console.WriteLine($"{result.Pages.Count} Pages processed."); // Expected: 4 pages (2 images + 2 GIF frames)
}
' Required namespace for OCR operations

Imports IronOcr



' Create a new instance of IronTesseract to perform OCR operations

Private ocr As New IronTesseract()



' Using statement to ensure proper disposal of OcrInput resources

Using input As New OcrInput()

	' Load various images into the input object for OCR processing

	input.AddImage("image1.jpeg")

	input.AddImage("image2.png")



	' Specify which frames to load from a multi-frame image (e.g., GIF)

	Dim pageIndices() As Integer = { 1, 2 }

	input.AddImageFrames("image3.gif", pageIndices)



	' Perform OCR on the loaded images and retrieve the result

	Dim result As OcrResult = ocr.Read(input)



	' Output the number of pages processed to the console

	Console.WriteLine($"{result.Pages.Count} Pages processed.") ' Expected: 4 pages (2 images + 2 GIF frames)

End Using

OCR selected pages (frames) of a multi-frame TIFF file as follows:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-9.cs
using IronOcr;

// Instantiate the IronTesseract object, which will handle OCR processing.
IronTesseract ocr = new IronTesseract();

// Create an OcrInput object to hold the images to be processed.
// The 'using' statement ensures that resources are freed when the operation is complete.
using (OcrInput input = new OcrInput())
{
    // Define the indices of the pages to load from a multi-frame TIFF image.
    // This example loads the first and second pages.
    int[] pageIndices = new int[] { 0, 1 }; // Frame indices are zero-based

    // Load the specified image frames (pages) from the TIFF file.
    // The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.
    input.LoadImageFrames("MultiFrame.Tiff", pageIndices);

    // Perform OCR on the input and store the result.
    OcrResult result = ocr.Read(input);

    // Output the recognized text to the console.
    Console.WriteLine(result.Text);

    // Output the number of pages processed, noting that each frame corresponds to a page.
    Console.WriteLine($"{result.Pages.Count} Pages");
    // Note: One page is returned for each frame (page) in the TIFF input.
}
Imports IronOcr



' Instantiate the IronTesseract object, which will handle OCR processing.

Private ocr As New IronTesseract()



' Create an OcrInput object to hold the images to be processed.

' The 'using' statement ensures that resources are freed when the operation is complete.

Using input As New OcrInput()

	' Define the indices of the pages to load from a multi-frame TIFF image.

	' This example loads the first and second pages.

	Dim pageIndices() As Integer = { 0, 1 } ' Frame indices are zero-based



	' Load the specified image frames (pages) from the TIFF file.

	' The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.

	input.LoadImageFrames("MultiFrame.Tiff", pageIndices)



	' Perform OCR on the input and store the result.

	Dim result As OcrResult = ocr.Read(input)



	' Output the recognized text to the console.

	Console.WriteLine(result.Text)



	' Output the number of pages processed, noting that each frame corresponds to a page.

	Console.WriteLine($"{result.Pages.Count} Pages")

	' Note: One page is returned for each frame (page) in the TIFF input.

End Using
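
The TIFF example above passes zero-based frame indices, while page selections typed by people are usually 1-based ("1-3,5"). The sketch below shows one way to bridge the two. `PageRanges.ToZeroBasedIndices` is a hypothetical helper written for this illustration and is not part of IronOCR. It assumes the input is a comma-separated list of 1-based page numbers or ranges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Convert a human-friendly, 1-based selection into zero-based frame indices.
int[] pageIndices = PageRanges.ToZeroBasedIndices("1-3,5");
Console.WriteLine(string.Join(", ", pageIndices)); // prints "0, 1, 2, 4"

// The array could then be handed to the frame-loading call shown above, e.g.:
// input.LoadImageFrames("MultiFrame.Tiff", pageIndices);

// Hypothetical helper (an assumption for this sketch, not an IronOCR API).
public static class PageRanges
{
    public static int[] ToZeroBasedIndices(string selection)
    {
        // SortedSet removes duplicates and keeps the indices in ascending order.
        var indices = new SortedSet<int>();

        foreach (string part in selection.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length > 2)
                throw new FormatException($"Invalid page range: '{part}'");

            int first = int.Parse(bounds[0].Trim());
            int last = bounds.Length == 2 ? int.Parse(bounds[1].Trim()) : first;

            if (first < 1 || last < first)
                throw new ArgumentException($"Invalid page range: '{part}'");

            // Shift each 1-based page number down to a zero-based index.
            for (int page = first; page <= last; page++)
                indices.Add(page - 1);
        }

        return indices.ToArray();
    }
}
```

Keeping the conversion in one place avoids off-by-one mistakes when the same selection is reused for several documents.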

Reading text from a PDF, including a password-protected one, follows the same pattern:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-10.cs
using System; // Import system namespace for Console usage
using IronOcr; // Import IronOcr namespace to use OCR functionalities

// Create a new instance of IronTesseract, which is the main class for OCR operations
IronTesseract ocr = new IronTesseract();

// Create an OcrInput object within a using statement to ensure it is disposed of correctly after use
using (OcrInput input = new OcrInput())
{
    try
    {
        // Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.
        // If the PDF does not have a password, consider providing null or an empty string for the password parameter.
        input.LoadPdf("example.pdf", "password");

        // Perform OCR on the loaded input and store the result
        OcrResult result = ocr.Read(input);

        // Output the recognized text to the console
        Console.WriteLine(result.Text);

        // Output the number of pages recognized in the PDF
        Console.WriteLine($"{result.Pages.Count} Pages");
        // Note: One page is returned for each page of the PDF input.
    }
    catch (Exception ex)
    {
        // In case of exceptions, output the error message to the console
        Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
    }
}
Imports System ' Import system namespace for Console usage

Imports IronOcr ' Import IronOcr namespace to use OCR functionalities



' Create a new instance of IronTesseract, which is the main class for OCR operations

Private ocr As New IronTesseract()



' Create an OcrInput object within a using statement to ensure it is disposed of correctly after use

Using input As New OcrInput()

	Try

		' Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.

		' If the PDF does not have a password, consider providing null or an empty string for the password parameter.

		input.LoadPdf("example.pdf", "password")



		' Perform OCR on the loaded input and store the result

		Dim result As OcrResult = ocr.Read(input)



		' Output the recognized text to the console

		Console.WriteLine(result.Text)



		' Output the number of pages recognized in the PDF

		Console.WriteLine($"{result.Pages.Count} Pages")

		' Note: One page is returned for each page of the PDF input.

	Catch ex As Exception

		' In case of exceptions, output the error message to the console

		Console.WriteLine("An error occurred during OCR processing: " & ex.Message)

	End Try

End Using

Searchable PDFs

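IronOCR can export results as searchable PDFs. The recognized text is embedded in the document, so it can be searched, selected, and indexed. This is a sought-after feature for applications that populate databases, need search engine visibility, or want more usable PDFs.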

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-11.cs
using IronOcr;

// Initializes a new instance of the IronTesseract class
IronTesseract ocr = new IronTesseract();

// Using block to ensure that resources are disposed of properly
using (OcrInput input = new OcrInput())
{
    // Sets the title for the OCR input
    input.Title = "Quarterly Report";

    // Loads individual images into the OCR input
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");

    // Define specific page indices to load from an image with multiple frames
    int[] pageIndices = new int[] { 1, 2 };

    // Loads specific frames from a multi-frame image (e.g., GIF)
    input.AddImageFrames("image3.gif", pageIndices);

    // Performs OCR on the loaded images and stores the result
    OcrResult result = ocr.Read(input);

    // Saves the OCR result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr



' Initializes a new instance of the IronTesseract class

Private ocr As New IronTesseract()



' Using block to ensure that resources are disposed of properly

Using input As New OcrInput()

	' Sets the title for the OCR input

	input.Title = "Quarterly Report"



	' Loads individual images into the OCR input

	input.AddImage("image1.jpeg")

	input.AddImage("image2.png")



	' Define specific page indices to load from an image with multiple frames

	Dim pageIndices() As Integer = { 1, 2 }



	' Loads specific frames from a multi-frame image (e.g., GIF)

	input.AddImageFrames("image3.gif", pageIndices)



	' Performs OCR on the loaded images and stores the result

	Dim result As OcrResult = ocr.Read(input)



	' Saves the OCR result as a searchable PDF

	result.SaveAsSearchablePdf("searchable.pdf")

End Using

Similarly, convert existing PDFs into searchable ones:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-12.cs
using IronOcr;

// Create an instance of the IronTesseract engine
var ocr = new IronTesseract();

// Use a using statement to ensure input resources are disposed of properly
using (var input = new OcrInput())
{
    // Set a title for the input which might be useful for metadata information
    input.Title = "Pdf Metadata Name";

    // Load the PDF file into the OCR input. If the PDF is password protected, provide the password
    input.LoadPdf("example.pdf", "password");

    // Process the OCR operation on the loaded input
    var result = ocr.Read(input);

    // Save the result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr



' Create an instance of the IronTesseract engine

Private ocr = New IronTesseract()



' Use a using statement to ensure input resources are disposed of properly

Using input = New OcrInput()

	' Set a title for the input which might be useful for metadata information

	input.Title = "Pdf Metadata Name"



	' Load the PDF file into the OCR input. If the PDF is password protected, provide the password

	input.LoadPdf("example.pdf", "password")



	' Process the OCR operation on the loaded input

	Dim result = ocr.Read(input)



	' Save the result as a searchable PDF

	result.SaveAsSearchablePdf("searchable.pdf")

End Using

The same technique converts selected frames of a TIFF file into a searchable PDF:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-13.cs
// Import necessary namespace for OCR functionality using IronOCR library
using IronOcr;

// Instantiate IronTesseract object for OCR operations
var ocr = new IronTesseract();

// Utilize using statement for automatic disposal of OcrInput object after usage
using (var input = new OcrInput())
{
    // Set a title for the OCR input, useful for identifying the content in larger projects
    input.Title = "Pdf Title";

    // Define the page indices of the images that are to be processed
    var pageIndices = new int[] { 1, 2 };

    // Load images from the specified TIFF file, using the defined page indices
    input.LoadImageFrames("example.tiff", pageIndices);

    // Perform OCR on the input and store the result
    OcrResult result = ocr.Read(input);

    // Save the OCR result as a searchable PDF document
    result.SaveAsSearchablePdf("searchable.pdf");
}
' Import necessary namespace for OCR functionality using IronOCR library

Imports IronOcr



' Instantiate IronTesseract object for OCR operations

Private ocr = New IronTesseract()



' Utilize using statement for automatic disposal of OcrInput object after usage

Using input = New OcrInput()

	' Set a title for the OCR input, useful for identifying the content in larger projects

	input.Title = "Pdf Title"



	' Define the page indices of the images that are to be processed

	Dim pageIndices = New Integer() { 1, 2 }



	' Load images from the specified TIFF file, using the defined page indices

	input.LoadImageFrames("example.tiff", pageIndices)



	' Perform OCR on the input and store the result

	Dim result As OcrResult = ocr.Read(input)



	' Save the OCR result as a searchable PDF document

	result.SaveAsSearchablePdf("searchable.pdf")

End Using

Exporting hOCR HTML

IronOCR can export OCR results as hOCR HTML, which allows limited PDF-to-HTML and TIFF-to-HTML conversion.

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-14.cs
using IronOcr;

// Instantiate the OCR engine
var ocr = new IronTesseract();

// Using block to properly dispose of OcrInput resources
using (var input = new OcrInput())
{
    // Set a title for the OCR input
    input.Title = "Html Title";

    // Load images and PDFs for OCR processing
    input.AddImage("image2.jpeg");

    // AddPdf requires a file path and can include a password for encrypted PDFs
    input.AddPdf("example.pdf", "password");

    // Load specific frames from a TIFF image using page indices
    var pageIndices = new int[] { 1, 2 };
    input.AddTiff("example.tiff", pageIndices);

    // Perform OCR on the provided input and obtain the result
    OcrResult result = ocr.Read(input);

    // Save the result as a HOCR file, which is a format for representing OCR results
    result.SaveAsHocrFile("hocr.html");
}
Imports IronOcr



' Instantiate the OCR engine

Private ocr = New IronTesseract()



' Using block to properly dispose of OcrInput resources

Using input = New OcrInput()

	' Set a title for the OCR input

	input.Title = "Html Title"



	' Load images and PDFs for OCR processing

	input.AddImage("image2.jpeg")



	' AddPdf requires a file path and can include a password for encrypted PDFs

	input.AddPdf("example.pdf", "password")



	' Load specific frames from a TIFF image using page indices

	Dim pageIndices = New Integer() { 1, 2 }

	input.AddTiff("example.tiff", pageIndices)



	' Perform OCR on the provided input and obtain the result

	Dim result As OcrResult = ocr.Read(input)



	' Save the result as a HOCR file, which is a format for representing OCR results

	result.SaveAsHocrFile("hocr.html")

End Using
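
Because hOCR is ordinary HTML with layout data stored in `title` attributes, the exported file can be post-processed with standard tools. The sketch below extracts words and their bounding boxes from an hOCR string. It assumes the common hOCR convention of `ocrx_word` spans whose `title` contains `bbox x0 y0 x1 y1`, with the `class` attribute appearing before `title`. The exact markup written by `SaveAsHocrFile` may differ, so inspect a real file first. `HocrReader` and `HocrWord` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hand-written sample in the common hOCR style (an assumption, not real IronOCR output).
string sampleHocr =
    "<div class='ocr_page' title='bbox 0 0 640 480'>" +
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> " +
    "<span class='ocrx_word' title='bbox 102 92 170 116; x_wconf 91'>World</span>" +
    "</div>";

foreach (HocrWord word in HocrReader.ReadWords(sampleHocr))
{
    // First line printed: Hello at (36,92)-(96,116)
    Console.WriteLine($"{word.Text} at ({word.Left},{word.Top})-({word.Right},{word.Bottom})");
}

// Hypothetical container for one recognized word and its bounding box in pixels.
public sealed class HocrWord
{
    public HocrWord(string text, int left, int top, int right, int bottom)
    {
        Text = text; Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public string Text { get; }
    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }
}

// Hypothetical reader; a regex is enough for a sketch, an HTML parser is more robust.
public static class HocrReader
{
    private static readonly Regex WordSpan = new Regex(
        @"<span[^>]*class=['""]ocrx_word['""][^>]*title=['""][^'""]*?bbox (\d+) (\d+) (\d+) (\d+)[^'""]*['""][^>]*>(.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<HocrWord> ReadWords(string hocr)
    {
        var words = new List<HocrWord>();
        foreach (Match match in WordSpan.Matches(hocr))
        {
            // Drop nested formatting tags (for example <strong>) and decode HTML entities.
            string inner = Regex.Replace(match.Groups[5].Value, "<[^>]+>", string.Empty);
            string text = WebUtility.HtmlDecode(inner).Trim();

            words.Add(new HocrWord(
                text,
                int.Parse(match.Groups[1].Value),
                int.Parse(match.Groups[2].Value),
                int.Parse(match.Groups[3].Value),
                int.Parse(match.Groups[4].Value)));
        }
        return words;
    }
}
```

To try it on real output, read the exported file with `File.ReadAllText("hocr.html")` and pass the contents to `ReadWords`.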

Reading Barcodes in OCR Documents

IronOCR can read barcodes and QR codes alongside text recognition. Barcode reading is enabled through the `Configuration.ReadBarCodes` setting.

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-15.cs
// Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.
using IronOcr;

// Create a new instance of IronTesseract for performing optical character and barcode recognition.
var ocr = new IronTesseract();

// Enable barcode reading in the OCR engine's configuration.
ocr.Configuration.ReadBarCodes = true;

// The OcrInput class allows for the loading of images which are to be processed by the OCR engine.
using (var input = new OcrInput())
{
    // Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.
    // The image should contain a barcode or text for OCR.
    input.AddImage("img/Barcode.png");

    // Perform OCR and barcode reading on the input image.
    // The Read method returns an OcrResult containing recognized text, barcodes, and more.
    var result = ocr.Read(input);

    // Iterate through the detected barcodes in the OcrResult.
    foreach (var barcode in result.Barcodes)
    {
        // Output the barcode value to the console.
        Console.WriteLine($"Barcode Value: {barcode.Value}");

        // Additional barcode properties, such as type and location, can be accessed as needed.
        Console.WriteLine($"Type: {barcode.Type}, Location: {barcode.Location}");
    }
}
' Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.

Imports IronOcr



' Create a new instance of IronTesseract for performing optical character and barcode recognition.

Private ocr = New IronTesseract()



' Enable barcode reading in the OCR engine's configuration.

ocr.Configuration.ReadBarCodes = True



' The OcrInput class allows for the loading of images which are to be processed by the OCR engine.

Using input = New OcrInput()

	' Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.

	' The image should contain a barcode or text for OCR.

	input.AddImage("img/Barcode.png")



	' Perform OCR and barcode reading on the input image.

	' The Read method returns an OcrResult containing recognized text, barcodes, and more.

	Dim result = ocr.Read(input)



	' Iterate through the detected barcodes in the OcrResult.

	For Each barcode In result.Barcodes

		' Output the barcode value to the console.

		Console.WriteLine($"Barcode Value: {barcode.Value}")



		' Additional barcode properties, such as type and location, can be accessed as needed.

		Console.WriteLine($"Type: {barcode.Type}, Location: {barcode.Location}")

	Next barcode

End Using

A Detailed Look at Image to Text OCR Results

The OCR results object in IronOCR contains comprehensive information that advanced developers can leverage.

An OCR result contains a collection of pages, each of which contains barcodes, paragraphs, text lines, words, and characters. Each of these objects exposes its own details, including location, font, and confidence level, giving developers flexibility in how they handle the data.

Elements of the .NET OCR Results, like a paragraph, word, or barcode, can be exported as images or bitmaps.

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-16.cs
using System;
using IronOcr;
using IronSoftware.Drawing;

// Instantiate the IronTesseract OCR engine
IronTesseract ocr = new IronTesseract
{
    // Enable reading of barcodes in the OCR process
    Configuration = { ReadBarCodes = true }
};

// Create an OCR input object to hold the images to be analyzed
using OcrInput input = new OcrInput();

// Specify the frame indices of the TIFF file to be processed
int[] pageIndices = { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

// Process the input images and obtain the OCR result
OcrResult result = ocr.Read(input);

// Iterate through each page in the OCR result
foreach (var page in result.Pages)
{
    // Fetch information about the current page
    int pageNumber = page.PageNumber;
    string pageText = page.Text;
    int pageWordCount = page.WordCount;

    // Obtain barcodes if any are present on the page
    OcrResult.Barcode[] barcodes = page.Barcodes;

    // Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types
    AnyBitmap pageImage = page.ToBitmap();
    System.Drawing.Bitmap pageImageLegacy = page.ToBitmap();
    double pageWidth = page.Width;
    double pageHeight = page.Height;

    // Iterate through each paragraph on the current page
    foreach (var paragraph in page.Paragraphs)
    {
        // Extract paragraph details
        int paragraphNumber = paragraph.ParagraphNumber;
        string paragraphText = paragraph.Text;
        System.Drawing.Bitmap paragraphImage = paragraph.ToBitmap();
        int paragraphXLocation = paragraph.X;
        int paragraphYLocation = paragraph.Y;
        int paragraphWidth = paragraph.Width;
        int paragraphHeight = paragraph.Height;
        double paragraphOcrAccuracy = paragraph.Confidence;
        var paragraphTextDirection = paragraph.TextDirection;

        // Iterate through each line within the current paragraph
        foreach (var line in paragraph.Lines)
        {
            // Extract line information
            int lineNumber = line.LineNumber;
            string lineText = line.Text;
            AnyBitmap lineImage = line.ToBitmap();
            System.Drawing.Bitmap lineImageLegacy = line.ToBitmap();
            int lineXLocation = line.X;
            int lineYLocation = line.Y;
            int lineWidth = line.Width;
            int lineHeight = line.Height;
            double lineOcrAccuracy = line.Confidence;
            double lineSkew = line.BaselineAngle;
            double lineOffset = line.BaselineOffset;

            // Iterate through each word in the line
            foreach (var word in line.Words)
            {
                int wordNumber = word.WordNumber;
                string wordText = word.Text;
                AnyBitmap wordImage = word.ToBitmap();
                System.Drawing.Image wordImageLegacy = word.ToBitmap();
                int wordXLocation = word.X;
                int wordYLocation = word.Y;
                int wordWidth = word.Width;
                int wordHeight = word.Height;
                double wordOcrAccuracy = word.Confidence;

                // Check for font details, available only when using certain Tesseract engine modes
                if (word.Font != null)
                {
                    string fontName = word.Font.FontName;
                    double fontSize = word.Font.FontSize;
                    bool isBold = word.Font.IsBold;
                    bool isFixedWidth = word.Font.IsFixedWidth;
                    bool isItalic = word.Font.IsItalic;
                    bool isSerif = word.Font.IsSerif;
                    bool isUnderlined = word.Font.IsUnderlined;
                    bool fontIsCaligraphic = word.Font.IsCaligraphic;
                }

                // Iterate through each character in the word
                foreach (var character in word.Characters)
                {
                    int characterNumber = character.CharacterNumber;
                    string characterText = character.Text;
                    AnyBitmap characterImage = character.ToBitmap();
                    System.Drawing.Bitmap characterImageLegacy = character.ToBitmap();
                    int characterXLocation = character.X;
                    int characterYLocation = character.Y;
                    int characterWidth = character.Width;
                    int characterHeight = character.Height;
                    double characterOcrAccuracy = character.Confidence;

                    // Get alternative symbol choices and their probabilities, useful for spell checking
                    OcrResult.Choice[] characterChoices = character.Choices;
                }
            }
        }
    }
}
Imports System

Imports IronOcr

Imports IronSoftware.Drawing



' Instantiate the IronTesseract OCR engine

Dim ocr As New IronTesseract()

' Enable reading of barcodes in the OCR process

ocr.Configuration.ReadBarCodes = True



' Create an OCR input object to hold the images to be analyzed.

' VB.NET has no using declaration, so call input.Dispose() when processing is complete.

Dim input As New OcrInput()



' Specify the frame indices of the TIFF file to be processed

Dim pageIndices() As Integer = { 1, 2 }

input.LoadImageFrames("img\Potter.tiff", pageIndices)



' Process the input images and obtain the OCR result

Dim result As OcrResult = ocr.Read(input)



' Iterate through each page in the OCR result

For Each page In result.Pages

	' Fetch information about the current page

	Dim pageNumber As Integer = page.PageNumber

	Dim pageText As String = page.Text

	Dim pageWordCount As Integer = page.WordCount



	' Obtain barcodes if any are present on the page

	Dim barcodes() As OcrResult.Barcode = page.Barcodes



	' Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types

	Dim pageImage As AnyBitmap = page.ToBitmap()

	Dim pageImageLegacy As System.Drawing.Bitmap = page.ToBitmap()

	Dim pageWidth As Double = page.Width

	Dim pageHeight As Double = page.Height



	' Iterate through each paragraph on the current page

	For Each paragraph In page.Paragraphs

		' Extract paragraph details

		Dim paragraphNumber As Integer = paragraph.ParagraphNumber

		Dim paragraphText As String = paragraph.Text

		Dim paragraphImage As System.Drawing.Bitmap = paragraph.ToBitmap()

		Dim paragraphXLocation As Integer = paragraph.X

		Dim paragraphYLocation As Integer = paragraph.Y

		Dim paragraphWidth As Integer = paragraph.Width

		Dim paragraphHeight As Integer = paragraph.Height

		Dim paragraphOcrAccuracy As Double = paragraph.Confidence

		Dim paragraphTextDirection = paragraph.TextDirection



		' Iterate through each line within the current paragraph

		For Each line In paragraph.Lines

			' Extract line information

			Dim lineNumber As Integer = line.LineNumber

			Dim lineText As String = line.Text

			Dim lineImage As AnyBitmap = line.ToBitmap()

			Dim lineImageLegacy As System.Drawing.Bitmap = line.ToBitmap()

			Dim lineXLocation As Integer = line.X

			Dim lineYLocation As Integer = line.Y

			Dim lineWidth As Integer = line.Width

			Dim lineHeight As Integer = line.Height

			Dim lineOcrAccuracy As Double = line.Confidence

			Dim lineSkew As Double = line.BaselineAngle

			Dim lineOffset As Double = line.BaselineOffset



			' Iterate through each word in the line

			For Each word In line.Words

				Dim wordNumber As Integer = word.WordNumber

				Dim wordText As String = word.Text

				Dim wordImage As AnyBitmap = word.ToBitmap()

				Dim wordImageLegacy As System.Drawing.Image = word.ToBitmap()

				Dim wordXLocation As Integer = word.X

				Dim wordYLocation As Integer = word.Y

				Dim wordWidth As Integer = word.Width

				Dim wordHeight As Integer = word.Height

				Dim wordOcrAccuracy As Double = word.Confidence



				' Check for font details, available only when using certain Tesseract engine modes

				If word.Font IsNot Nothing Then

					Dim fontName As String = word.Font.FontName

					Dim fontSize As Double = word.Font.FontSize

					Dim isBold As Boolean = word.Font.IsBold

					Dim isFixedWidth As Boolean = word.Font.IsFixedWidth

					Dim isItalic As Boolean = word.Font.IsItalic

					Dim isSerif As Boolean = word.Font.IsSerif

					Dim isUnderlined As Boolean = word.Font.IsUnderlined

					Dim fontIsCaligraphic As Boolean = word.Font.IsCaligraphic

				End If



				' Iterate through each character in the word

				For Each character In word.Characters

					Dim characterNumber As Integer = character.CharacterNumber

					Dim characterText As String = character.Text

					Dim characterImage As AnyBitmap = character.ToBitmap()

					Dim characterImageLegacy As System.Drawing.Bitmap = character.ToBitmap()

					Dim characterXLocation As Integer = character.X

					Dim characterYLocation As Integer = character.Y

					Dim characterWidth As Integer = character.Width

					Dim characterHeight As Integer = character.Height

					Dim characterOcrAccuracy As Double = character.Confidence



					' Get alternative symbol choices and their probabilities, useful for spell checking

					Dim characterChoices() As OcrResult.Choice = character.Choices

				Next character

			Next word

		Next line

	Next paragraph

Next page
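
One practical use of the per-word confidence values shown above is flagging uncertain words for manual review. The sketch below works on plain `(Text, Confidence)` pairs so that it runs without the library. In a real project the pairs would be copied from `word.Text` and `word.Confidence` inside the `line.Words` loop. `ConfidenceFilter` is a hypothetical helper, and the threshold of 80 assumes confidence is reported on a 0 to 100 scale, which should be checked against actual results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for values copied out of line.Words (word.Text, word.Confidence).
var words = new List<(string Text, double Confidence)>
{
    ("Invoice", 98.2),
    ("N0.", 61.5),
    ("4471", 93.0),
    ("T0tal", 58.9)
};

// Assumption: confidence uses a 0-100 scale; use 0.8 instead if your values are 0-1.
foreach (var word in ConfidenceFilter.BelowThreshold(words, 80.0))
{
    Console.WriteLine($"Review '{word.Text}' (confidence {word.Confidence})");
}

// Hypothetical helper (not part of IronOCR).
public static class ConfidenceFilter
{
    // Returns the words whose confidence is below the threshold, least confident first.
    public static List<(string Text, double Confidence)> BelowThreshold(
        IEnumerable<(string Text, double Confidence)> words, double threshold)
    {
        return words
            .Where(w => w.Confidence < threshold)
            .OrderBy(w => w.Confidence)
            .ToList();
    }
}
```

Sorting the least confident words first lets a reviewer deal with the most likely misreads before anything else.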

Summary

IronOCR provides C# developers with an advanced Tesseract API that runs on Windows, Linux, and Mac. It can read imperfect documents, such as low-quality or skewed scans, with high accuracy. It also supports barcode reading and exporting OCR data as HTML or searchable PDFs, a combination that sets it apart from other OCR solutions and plain Tesseract.

Moving Forward

To further explore IronOCR:

Source Code Download

Explore more .NET OCR tutorials in this section.

Frequently Asked Questions

What is IronOCR?

IronOCR is a C# OCR library that allows developers to read text from images and PDFs without installing Tesseract separately. It manages the Tesseract engine internally and offers improved accuracy and speed.

How do I install IronOCR in a .NET project?

You can install IronOCR via the NuGet package manager using the command: `Install-Package IronOcr`.

What languages does IronOCR support?

IronOCR supports 125 international languages, which can be added via downloadable language packs from NuGet.

Can IronOCR handle low-quality scans?

Yes, IronOCR can handle low-quality and skewed scans by using image filters like deskew and denoise to improve accuracy.

How can I read a specific region of an image using IronOCR?

You can specify a region using `System.Drawing.Rectangle` to target specific areas of an image for OCR processing.

Does IronOCR support multi-page document processing?

Yes, IronOCR can process multi-page documents, combining images, TIFF frames, and PDF pages into a single OCR input.

What output formats does IronOCR support?

IronOCR can output results as strings, text files, searchable PDFs, and hOCR HTML, making it versatile for various applications.

Can IronOCR read barcodes from documents?

Yes, IronOCR can detect and read barcodes and QR codes alongside text recognition from images.

How does IronOCR improve upon traditional Tesseract?

IronOCR enhances the accuracy and speed of the Tesseract engine, manages complex dictionaries, and supports modern .NET frameworks natively.

Is IronOCR compatible with various .NET platforms?

IronOCR is compatible with .NET Framework 4.5+, .NET Standard 2+, .NET Core, Xamarin, Mono, Azure, and Docker, providing wide platform support.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.