Read Text from Images with C# OCR

By Chaknith Bin
August 28, 2018. Updated June 22, 2025.

In this tutorial, we will learn how to convert images to text in C# and other .NET languages.

How to Convert Images to Text in C#

- Download the OCR Image to Text IronOCR Library
- Adjust Crop Regions to Read Parts of an Image
- Use up to 125 international languages via Language Packs
- Export OCR Scan Results as Text or Searchable PDF

Reading Text from Images in .NET Applications

We will use the IronOcr.IronTesseract class to recognize text within images and explore how to use Iron Tesseract OCR to maximize accuracy and speed in .NET applications.

To achieve "Image to Text" functionality, we need to install the IronOCR library into a Visual Studio project. You can download the IronOcr DLL or use NuGet.

    Install-Package IronOcr

Why IronOCR?

We use IronOCR to manage Tesseract because:

- It works out of the box in pure .NET.
- It does not require Tesseract to be installed on your machine.
- It runs the latest engines: Tesseract 5 (as well as Tesseract 4 & 3).
- It is compatible with .NET Framework 4.5+, .NET Standard 2+, and .NET Core 2, 3 & 5.
- It improves accuracy and speed over traditional Tesseract.
- It supports Xamarin, Mono, Azure, and Docker.
- It manages the complex Tesseract dictionary system using NuGet packages.
- It supports PDFs, MultiFrame TIFFs, and all major image formats without configuration.
- It can correct low-quality and skewed scans to achieve the best results.

Using Tesseract in C#

This simple example demonstrates using the IronOcr.IronTesseract class to read text from an image and return its value as a string.
C#:

    // This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
    // on an image file and print the extracted text to the console.
    // To use IronOcr, ensure you have installed the package via NuGet Package Manager:
    // PM> Install-Package IronOcr
    using System;
    using IronOcr;

    try
    {
        // Create an instance of IronTesseract, which is used to perform OCR on images.
        var ocrEngine = new IronTesseract();

        // Specify the file path to the image you want to process.
        // Ensure the path is correct; it's currently set to a relative path.
        var filePath = @"img\Screenshot.png";

        // Use the Read method to perform OCR on the specified image file.
        // This returns an OcrResult which contains the recognized text.
        using (var input = new OcrInput(filePath))
        {
            OcrResult result = ocrEngine.Read(input);

            // Output the extracted text to the console.
            Console.WriteLine(result.Text);
        }
    }
    catch (OcrException ex)
    {
        // Handle any OCR-specific exceptions that might occur
        Console.WriteLine("An error occurred while processing the image: " + ex.Message);
    }
    catch (Exception ex)
    {
        // Handle any general exceptions that might occur
        Console.WriteLine("An error occurred: " + ex.Message);
    }

VB.NET:

    ' This code snippet uses the IronOcr library to perform Optical Character Recognition (OCR)
    ' on an image file and print the extracted text to the console.
    ' To use IronOcr, ensure you have installed the package via NuGet Package Manager:
    ' PM> Install-Package IronOcr
    Imports System
    Imports IronOcr

    Try
        ' Create an instance of IronTesseract, which is used to perform OCR on images.
        Dim ocrEngine = New IronTesseract()

        ' Specify the file path to the image you want to process.
        ' Ensure the path is correct; it's currently set to a relative path.
        Dim filePath = "img\Screenshot.png"

        ' Use the Read method to perform OCR on the specified image file.
        ' This returns an OcrResult which contains the recognized text.
        Using input = New OcrInput(filePath)
            Dim result As OcrResult = ocrEngine.Read(input)

            ' Output the extracted text to the console.
            Console.WriteLine(result.Text)
        End Using
    Catch ex As OcrException
        ' Handle any OCR-specific exceptions that might occur
        Console.WriteLine("An error occurred while processing the image: " & ex.Message)
    Catch ex As Exception
        ' Handle any general exceptions that might occur
        Console.WriteLine("An error occurred: " & ex.Message)
    End Try

This reads the following text with 100% accuracy:

    IronOCR Simple Example

    In this simple example we test the accuracy of our C# OCR library to read text from a PNG Image. This is a very basic test, but things will get more complicated as the tutorial continues.

    The quick brown fox jumps over the lazy dog

The OCR process involves sophisticated behavior like scanning the image for alignment, quality, and resolution, optimizing the OCR engine, and using artificial intelligence to read text as a human would. Despite its complexity, the OCR process can match a human's speed and achieve a high level of accuracy.

Advanced Use of IronOCR Tesseract for C#

For real-world projects requiring optimal performance, use the OcrInput and IronTesseract classes within the IronOcr namespace.

OcrInput Features:

- Works with various image formats like JPEG, TIFF, GIF, BMP, and PNG.
- Imports whole or parts of PDF documents.
- Enhances contrast, resolution, and size.
- Corrects for rotation, scan noise, digital noise, skew, and negative images.

IronTesseract Features:

- Access hundreds of prepackaged languages and variants.
- Tesseract 5, 4, or 3 OCR engines available out-of-the-box.
- Specify if the document is a screenshot, snippet, or full document.
- Read barcodes.
- Output results to: Searchable PDFs, hOCR HTML, a DOM, and Strings.
Example: Getting Started with OcrInput + IronTesseract

Here is a recommended starting configuration, suitable for most images:

C#:

    using System;
    using IronOcr;

    // Create an instance of the IronTesseract OCR engine.
    IronTesseract ocr = new IronTesseract();

    // Using statement to ensure resources are disposed of correctly.
    using (OcrInput input = new OcrInput())
    {
        // Specify the pages to be read from a multi-page TIFF file.
        int[] pageIndices = new int[] { 1, 2 };

        // Load specified image frames (pages) from the TIFF file into the input object.
        // Ensure the file path is correct, adjust as needed if the directory structure is different.
        input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

        // Perform OCR on the loaded image frames.
        // The Read method analyses the image frames and extracts text.
        OcrResult result = ocr.Read(input);

        // Output the extracted text to the console.
        Console.WriteLine(result.Text);
    }

VB.NET:

    Imports System
    Imports IronOcr

    ' Create an instance of the IronTesseract OCR engine.
    Dim ocr As New IronTesseract()

    ' Using statement to ensure resources are disposed of correctly.
    Using input As New OcrInput()
        ' Specify the pages to be read from a multi-page TIFF file.
        Dim pageIndices() As Integer = { 1, 2 }

        ' Load specified image frames (pages) from the TIFF file into the input object.
        ' Ensure the file path is correct, adjust as needed if the directory structure is different.
        input.LoadImageFrames("img\Potter.tiff", pageIndices)

        ' Perform OCR on the loaded image frames.
        ' The Read method analyses the image frames and extracts text.
        Dim result As OcrResult = ocr.Read(input)

        ' Output the extracted text to the console.
        Console.WriteLine(result.Text)
    End Using

This configuration can achieve 100% accuracy on a medium-quality scan.

IronOCR simplifies reading text and barcodes from scanned images such as TIFFs while keeping accuracy high. It is highly effective with real-world documents, including multi-page TIFFs and PDF extractions.

Example: A Low Quality Scan

In this case, we work with a low-quality scan with distortion and digital noise. This is where IronOCR stands out from other OCR libraries: it is built to handle real-world scanned images, not just synthetic test images that guarantee 100% accuracy.

C#:

    // Include the necessary namespaces for IronOcr and console output
    using IronOcr;
    using System;

    // Create a new instance of the IronTesseract class, which is responsible for OCR operations
    var ocr = new IronTesseract();

    try
    {
        // Create a new OCR input object that represents the images to be processed
        using (var input = new OcrInput())
        {
            // Specify the indices of the pages in a multi-page image file you want to process
            // Typically, indices are 0-based, but check if the library needs 1-based indices
            var pageIndices = new int[] { 0, 1 }; // Adjust indices according to your requirements

            // Load specific frames/pages from a TIFF file into the OCR input object
            input.LoadImageFrames(@"img\Potter.LowQuality.tiff", pageIndices);

            // Apply a deskew transformation to correct rotation and perspective distortion,
            // which improves OCR accuracy
            input.Deskew();

            // Perform OCR on the processed input and obtain the result
            OcrResult result = ocr.Read(input);

            // Output the recognized text to the console
            Console.WriteLine("Recognized Text:");
            Console.WriteLine(result.Text);
        }
    }
    catch (Exception ex)
    {
        // Handle potential exceptions such as file not found, incorrect image format, etc.
        Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
    }

VB.NET:

    ' Include the necessary namespaces for IronOcr and console output
    Imports IronOcr
    Imports System

    ' Create a new instance of the IronTesseract class, which is responsible for OCR operations
    Dim ocr = New IronTesseract()

    Try
        ' Create a new OCR input object that represents the images to be processed
        Using input = New OcrInput()
            ' Specify the indices of the pages in a multi-page image file you want to process
            ' Typically, indices are 0-based, but check if the library needs 1-based indices
            Dim pageIndices = New Integer() { 0, 1 } ' Adjust indices according to your requirements

            ' Load specific frames/pages from a TIFF file into the OCR input object
            input.LoadImageFrames("img\Potter.LowQuality.tiff", pageIndices)

            ' Apply a deskew transformation to correct rotation and perspective distortion,
            ' which improves OCR accuracy
            input.Deskew()

            ' Perform OCR on the processed input and obtain the result
            Dim result As OcrResult = ocr.Read(input)

            ' Output the recognized text to the console
            Console.WriteLine("Recognized Text:")
            Console.WriteLine(result.Text)
        End Using
    Catch ex As Exception
        ' Handle potential exceptions such as file not found, incorrect image format, etc.
        Console.WriteLine("An error occurred during OCR processing: " & ex.Message)
    End Try

With input.Deskew(), we reach 99.8% accuracy, nearly matching a high-quality scan.

Image filters add some preprocessing time, but a cleaner image can reduce the time spent on OCR itself. Developers must balance filters against their input documents. If uncertain, input.Deskew() and input.DeNoise() are reliable filters for improving your OCR's performance.

Performance Tuning

The primary factor in OCR job speed is input image quality. Less noise and higher DPI (~200 dpi is ideal) make for the fastest and most accurate results.
IronOCR can correct imperfect documents, but the correction is computationally expensive and slows the job down. Using image formats with minimal digital noise, like TIFF or PNG, can yield faster results than JPEG.

Image Filters

The following image filters can significantly improve performance:

- OcrInput.Rotate(double degrees): Rotates images clockwise by the given number of degrees. Use negative degrees for counterclockwise rotation.
- OcrInput.Binarize(): Converts every pixel to black or white, improving OCR performance in low-contrast cases.
- OcrInput.ToGrayScale(): Converts every pixel to grayscale, potentially improving speed.
- OcrInput.Contrast(): Automatically increases contrast, often enhancing OCR speed and accuracy.
- OcrInput.DeNoise(): Removes digital noise, beneficial where noise is expected.
- OcrInput.Invert(): Inverts image colors (white becomes black, black becomes white).
- OcrInput.Dilate(): Adds pixels to object boundaries in images.
- OcrInput.Erode(): Removes pixels on object boundaries.
- OcrInput.Deskew(): Straightens a skewed image. Critical for OCR since Tesseract's skew tolerance can be low.
- OcrInput.DeepCleanBackgroundNoise(): Heavy noise removal.
- OcrInput.EnhanceResolution(): Enhances low-quality image resolution.
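As a rough illustration of chaining filters, the sketch below applies the two filters this tutorial suggests as safe defaults, Deskew() and DeNoise(), before reading. It uses only calls that appear in this tutorial's other examples (IronTesseract, OcrInput, AddImage, Deskew, DeNoise, Read). The image path is a placeholder, the filter order is an assumption, and method names may differ between IronOCR versions, so treat it as a starting point rather than a recommended configuration.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Placeholder path (assumption): substitute a skewed or noisy scan of your own.
    input.AddImage(@"img\Scan.png");

    // Straighten the page, then remove digital noise.
    // The order shown here is an assumption, not a documented requirement.
    input.Deskew();
    input.DeNoise();

    // Read the filtered input and print the recognized text.
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Each filter adds preprocessing time, so it is worth timing a run with and without a given filter on representative documents before keeping it.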
Performance Tuning for Speed

Consider using the following settings to speed up OCR for high-quality scans:

C#:

    using System;
    using IronOcr;

    // Initialize a new instance of the IronTesseract class which handles OCR operations
    IronTesseract ocr = new IronTesseract();

    // Configure for optimal speed by excluding specific characters from OCR consideration
    ocr.Configuration.BlackListCharacters = "~`$#^*_{[]} \\";

    // Set the page segmentation mode to automatic
    ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;

    // Use the fast version of the English OCR language
    ocr.Language = OcrLanguage.EnglishFast;

    // Create an instance of OcrInput which will be used to load the document to be processed
    using (OcrInput input = new OcrInput())
    {
        // Specify the image frames to load from a multi-page image file
        int[] pageIndices = new int[] { 1, 2 };

        // Load specified pages from the image file into the OcrInput instance
        input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

        // Read the text from the input images and store the OCR result
        OcrResult result = ocr.Read(input);

        // Output the textual result of the OCR process to the console
        Console.WriteLine(result.Text);
    }

VB.NET:

    Imports System
    Imports IronOcr

    ' Initialize a new instance of the IronTesseract class which handles OCR operations
    Dim ocr As New IronTesseract()

    ' Configure for optimal speed by excluding specific characters from OCR consideration
    ocr.Configuration.BlackListCharacters = "~`$#^*_{[]} \"

    ' Set the page segmentation mode to automatic
    ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto

    ' Use the fast version of the English OCR language
    ocr.Language = OcrLanguage.EnglishFast

    ' Create an instance of OcrInput which will be used to load the document to be processed
    Using input As New OcrInput()
        ' Specify the image frames to load from a multi-page image file
        Dim pageIndices() As Integer = { 1, 2 }

        ' Load specified pages from the image file into the OcrInput instance
        input.LoadImageFrames("img\Potter.tiff", pageIndices)

        ' Read the text from the input images and store the OCR result
        Dim result As OcrResult = ocr.Read(input)

        ' Output the textual result of the OCR process to the console
        Console.WriteLine(result.Text)
    End Using

This setup is 99.8% accurate against the 100% baseline, in exchange for a 35% speed boost.

Reading Cropped Regions of Images

Iron's version of Tesseract OCR can target specific image areas using System.Drawing.Rectangle. This is particularly useful when handling standardized forms where the text is localized in specific sections.

Example: Scanning an Area of a Page

Using System.Drawing.Rectangle, you can specify pixel-based areas for OCR. This improves speed and prevents reading unnecessary text.

C#:

    using System; // Namespace for Console
    using IronOcr; // Namespace for IronOcr library
    using IronSoftware.Drawing; // Namespace reference for drawing-related classes

    // Create an instance of the IronTesseract class for OCR processing
    var ocr = new IronTesseract();

    // Using statement ensures that resources are automatically released after use
    using (var input = new OcrInput())
    {
        // Define a rectangle area to focus OCR processing on a specific portion of the image.
        // This can significantly speed up the process by limiting the area to be analyzed.
        // Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
        var contentArea = new System.Drawing.Rectangle(x: 215, y: 1250, width: 1335, height: 280);

        // Load the image with a specific area of interest
        input.AddImage("img/ComSci.png", contentArea);

        // Perform OCR on the loaded image input
        OcrResult result = ocr.Read(input);

        // Output the recognized text to the console
        Console.WriteLine(result.Text);
    }

VB.NET:

    Imports System ' Namespace for Console
    Imports IronOcr ' Namespace for IronOcr library
    Imports IronSoftware.Drawing ' Namespace reference for drawing-related classes

    ' Create an instance of the IronTesseract class for OCR processing
    Dim ocr = New IronTesseract()

    ' Using statement ensures that resources are automatically released after use
    Using input = New OcrInput()
        ' Define a rectangle area to focus OCR processing on a specific portion of the image.
        ' This can significantly speed up the process by limiting the area to be analyzed.
        ' Rectangle arguments: x = 215, y = 1250, width = 1335, height = 280
        Dim contentArea = New System.Drawing.Rectangle(x:= 215, y:= 1250, width:= 1335, height:= 280)

        ' Load the image with a specific area of interest
        input.AddImage("img/ComSci.png", contentArea)

        ' Perform OCR on the loaded image input
        Dim result As OcrResult = ocr.Read(input)

        ' Output the recognized text to the console
        Console.WriteLine(result.Text)
    End Using

This method offers a 41% speed improvement and extracts only the text of interest, which is ideal for .NET OCR of invoices, checks, forms, and similar documents. OCR cropping is also supported for PDFs.

International Languages

IronOCR supports 125 international languages via language packs, downloadable as DLLs from the IronOCR website or through the NuGet Package Manager for Visual Studio.

You can install them via the NuGet interface (search for "IronOcr.Languages") or from the OCR language packs page.
Example languages include: Afrikaans Amharic Also known as አማርኛ Arabic Also known as العربية ArabicAlphabet Also known as العربية ArmenianAlphabet Also known as Հայերեն Assamese Also known as অসমীয়া Azerbaijani Also known as azərbaycan dili AzerbaijaniCyrillic Also known as azərbaycan dili Belarusian Also known as беларуская мова Bengali Also known as Bangla,বাংলা BengaliAlphabet Also known as Bangla,বাংলা Tibetan Also known as Tibetan Standard, Tibetan, Central ཡིག་ Bosnian Also known as bosanski jezik Breton Also known as brezhoneg Bulgarian Also known as български език CanadianAboriginalAlphabet Also known as Canadian First Nations, Indigenous Canadians, Native Canadian, Inuit Catalan Also known as català, valencià Cebuano Also known as Bisaya, Binisaya Czech Also known as čeština, český jazyk CherokeeAlphabet Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi ChineseSimplified Also known as 中文 (Zhōngwén), 汉语, 漢語 ChineseSimplifiedVertical Also known as 中文 (Zhōngwén), 汉语, 漢語 ChineseTraditional Also known as 中文 (Zhōngwén), 汉语, 漢語 ChineseTraditionalVertical Also known as 中文 (Zhōngwén), 汉语, 漢語 Cherokee Also known as ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ, Tsalagi Gawonihisdi Corsican Also known as corsu, lingua corsa Welsh Also known as Cymraeg CyrillicAlphabet Also known as Cyrillic scripts Danish Also known as dansk DanishFraktur Also known as dansk German Also known as Deutsch GermanFraktur Also known as Deutsch DevanagariAlphabet Also known as Nagair,देवनागरी Divehi Also known as ދިވެހި Dzongkha Also known as རྫོང་ཁ Greek Also known as ελληνικά English MiddleEnglish Also known as English (1100-1500 AD) Esperanto Estonian Also known as eesti, eesti keel EthiopicAlphabet Also known as Ge'ez,ግዕዝ, Gəʿəz Basque Also known as euskara, euskera Faroese Also known as føroyskt Persian Also known as فارسی Filipino Also known as National Language of the Philippines, Standardized Tagalog Finnish Also known as suomi, suomen kieli Financial Also known as Financial, Numerical and Technical Documents French 
Also known as français, langue française FrakturAlphabet Also known as Generic Fraktur, Calligraphic hand of the Latin alphabet Frankish Also known as Frenkisk, Old Franconian MiddleFrench Also known as Moyen Français,Middle French (ca. 1400-1600 AD) WesternFrisian Also known as Frysk GeorgianAlphabet Also known as ქართული ScottishGaelic Also known as Gàidhlig Irish Also known as Gaeilge Galician Also known as galego AncientGreek Also known as Ἑλληνική GreekAlphabet Also known as ελληνικά Gujarati Also known as ગુજરાતી GujaratiAlphabet Also known as ગુજરાતી GurmukhiAlphabet Also known as Gurmukhī, ਗੁਰਮੁਖੀ, Shahmukhi, گُرمُکھی, Sihk Script HangulAlphabet Also known as Korean Alphabet,한글,Hangeul,조선글,hosŏn'gŭl HangulVerticalAlphabet Also known as Korean Alphabet,한글,Hangeul,조선글,hosŏn'gŭl HanSimplifiedAlphabet Also known as Samhan ,한어, 韓語 HanSimplifiedVerticalAlphabet Also known as Samhan ,한어, 韓語 HanTraditionalAlphabet Also known as Samhan ,한어, 韓語 HanTraditionalVerticalAlphabet Also known as Samhan ,한어, 韓語 Haitian Also known as Kreyòl ayisyen Hebrew Also known as עברית HebrewAlphabet Also known as עברית Hindi Also known as हिन्दी, हिंदी Croatian Also known as hrvatski jezik Hungarian Also known as magyar Armenian Also known as Հայերեն Inuktitut Also known as ᐃᓄᒃᑎᑐᑦ Indonesian Also known as Bahasa Indonesia Icelandic Also known as Íslenska Italian Also known as italiano ItalianOld Also known as italiano JapaneseAlphabet Also known as 日本語 (にほんご) JapaneseVerticalAlphabet Also known as 日本語 (にほんご) Javanese Also known as basa Jawa Japanese Also known as 日本語 (にほんご) JapaneseVertical Also known as 日本語 (にほんご) Kannada Also known as ಕನ್ನಡ KannadaAlphabet Also known as ಕನ್ನಡ Georgian Also known as ქართული GeorgianOld Also known as ქართული Kazakh Also known as қазақ тілі Khmer Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ KhmerAlphabet Also known as ខ្មែរ, ខេមរភាសា, ភាសាខ្មែរ Kyrgyz Also known as Кыргызча, Кыргыз тили NorthernKurdish Also known as Kurmanji, کورمانجی ,Kurmancî Korean Also 
known as 한국어 (韓國語), 조선어 (朝鮮語) KoreanVertical Also known as 한국어 (韓國語), 조선어 (朝鮮語) Lao Also known as ພາສາລາວ LaoAlphabet Also known as ພາສາລາວ Latin Also known as latine, lingua latina LatinAlphabet Also known as latine, lingua latina Latvian Also known as latviešu valoda Lithuanian Also known as lietuvių kalba Luxembourgish Also known as Lëtzebuergesch Malayalam Also known as മലയാളം MalayalamAlphabet Also known as മലയാളം Marathi Also known as मराठी MICR Also known as Magnetic Ink Character Recognition, MICR Cheque Encoding Macedonian Also known as македонски јазик Maltese Also known as Malti Mongolian Also known as монгол Maori Also known as te reo Māori Malay Also known as bahasa Melayu, بهاس ملايو Myanmar Also known as Burmese ,ဗမာစာ MyanmarAlphabet Also known as Burmese ,ဗမာစာ Nepali Also known as नेपाली Dutch Also known as Nederlands, Vlaams Norwegian Also known as Norsk Occitan Also known as occitan, lenga d'òc Oriya Also known as ଓଡ଼ିଆ OriyaAlphabet Also known as ଓଡ଼ିଆ Panjabi Also known as ਪੰਜਾਬੀ, پنجابی Polish Also known as język polski, polszczyzna Portuguese Also known as português Pashto Also known as پښتو Quechua Also known as Runa Simi, Kichwa Romanian Also known as limba română Russian Also known as русский язык Sanskrit Also known as संस्कृतम् Sinhala Also known as සිංහල SinhalaAlphabet Also known as සිංහල Slovak Also known as slovenčina, slovenský jazyk SlovakFraktur Also known as slovenčina, slovenský jazyk Slovene Also known as slovenski jezik, slovenščina Sindhi Also known as सिन्धी, سنڌي، سندھی Spanish Also known as español, castellano SpanishOld Also known as español, castellano Albanian Also known as gjuha shqipe Serbian Also known as српски језик SerbianLatin Also known as српски језик Sundanese Also known as Basa Sunda Swahili Also known as Kiswahili Swedish Also known as Svenska Syriac Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ, Leššānā Suryāyā SyriacAlphabet Also known as Syrian, Syriac Aramaic,ܠܫܢܐ ܣܘܪܝܝܐ, Leššānā Suryāyā Tamil Also 
known as தமிழ் TamilAlphabet Also known as தமிழ் Tatar Also known as татар теле, tatar tele Telugu Also known as తెలుగు TeluguAlphabet Also known as తెలుగు Tajik Also known as тоҷикӣ, toğikī, تاجیکی Tagalog Also known as Wikang Tagalog, ᜏᜒᜃᜅ᜔ ᜆᜄᜎᜓᜄ᜔ Thai Also known as ไทย ThaanaAlphabet Also known as Taana , Tāna , ތާނަ ThaiAlphabet Also known as ไทย TibetanAlphabet Also known as Tibetan Standard, Tibetan, Central ཡིག་ Tigrinya Also known as ትግርኛ Tonga Also known as faka Tonga Turkish Also known as Türkçe Uyghur Also known as Uyƣurqə, ئۇيغۇرچە Ukrainian Also known as українська мова Urdu Also known as اردو Uzbek Also known as O‘zbek, Ўзбек, أۇزبېك UzbekCyrillic Also known as O‘zbek, Ўзбек, أۇزبېك Vietnamese Also known as Tiếng Việt VietnameseAlphabet Also known as Tiếng Việt Yiddish Also known as ייִדיש Yoruba Also known as Yorùbá

Example: OCR in Arabic (+ many more)

The example below demonstrates how to scan documents written in Arabic.

    Install-Package IronOcr.Languages.Arabic

C#:

    // PM> Install-Package IronOcr.Languages.Arabic
    using IronOcr;

    // Create an instance of the IronTesseract class for OCR operations.
    var ocr = new IronTesseract();

    // Set the OCR language to Arabic.
    ocr.Language = OcrLanguage.Arabic;

    // Use a 'using' block for OcrInput to ensure proper disposal of resources.
    using (var input = new OcrInput())
    {
        // Load the image located at "img/arabic.gif".
        input.AddImage("img/arabic.gif");

        // Optional: Add image filters if necessary for better OCR performance.
        // In this case, even though the input is very low quality,
        // IronTesseract can read what conventional Tesseract cannot.

        // Perform OCR to get the result.
        var result = ocr.Read(input);

        // Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
        result.SaveAsTextFile("arabic.txt");
    }

VB.NET:

    ' PM> Install-Package IronOcr.Languages.Arabic
    Imports IronOcr

    ' Create an instance of the IronTesseract class for OCR operations.
    Dim ocr = New IronTesseract()

    ' Set the OCR language to Arabic.
    ocr.Language = OcrLanguage.Arabic

    ' Use a 'Using' block for OcrInput to ensure proper disposal of resources.
    Using input = New OcrInput()
        ' Load the image located at "img/arabic.gif".
        input.AddImage("img/arabic.gif")

        ' Optional: Add image filters if necessary for better OCR performance.
        ' In this case, even though the input is very low quality,
        ' IronTesseract can read what conventional Tesseract cannot.

        ' Perform OCR to get the result.
        Dim result = ocr.Read(input)

        ' Save the OCR result to a text file because the console might not display Arabic correctly on Windows.
        result.SaveAsTextFile("arabic.txt")
    End Using

Example: OCR in more than one language in the same document

If a document contains multiple languages, such as English and Chinese, you can perform OCR as follows:

    Install-Package IronOcr.Languages.ChineseSimplified

C#:

    // Include the necessary namespace for IronOcr functionality
    using IronOcr;

    // Create an instance of IronTesseract, which is used for OCR operations
    var ocr = new IronTesseract();

    // Set the primary language for OCR processing to Chinese Simplified
    ocr.Language = OcrLanguage.ChineseSimplified;

    // We can add any number of secondary languages for OCR processing.
    // Here, English is added as a secondary language.
    ocr.AddSecondaryLanguage(OcrLanguage.English);

    // Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
    // This is useful for languages that are not supported out of the box or for improving existing language support.

    // Using statement to ensure proper disposal of resources after the OCR operation
    using (var input = new OcrInput())
    {
        // Load the image that contains multi-language text for OCR processing
        input.AddImage("img/MultiLanguage.jpeg");

        // Perform OCR on the input and retrieve the result
        var result = ocr.Read(input);

        // Save the recognized text as a text file with the specified filename
        result.SaveAsTextFile("MultiLanguage.txt");
    }

    // Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
    // Also, make sure that the image path is correct and accessible by the application.

VB.NET:

    ' Include the necessary namespace for IronOcr functionality
    Imports IronOcr

    ' Create an instance of IronTesseract, which is used for OCR operations
    Dim ocr = New IronTesseract()

    ' Set the primary language for OCR processing to Chinese Simplified
    ocr.Language = OcrLanguage.ChineseSimplified

    ' We can add any number of secondary languages for OCR processing.
    ' Here, English is added as a secondary language.
    ocr.AddSecondaryLanguage(OcrLanguage.English)

    ' Optionally, custom Tesseract '.traineddata' files can be added by specifying a file path.
    ' This is useful for languages that are not supported out of the box or for improving existing language support.

    ' Using statement to ensure proper disposal of resources after the OCR operation
    Using input = New OcrInput()
        ' Load the image that contains multi-language text for OCR processing
        input.AddImage("img/MultiLanguage.jpeg")

        ' Perform OCR on the input and retrieve the result
        Dim result = ocr.Read(input)

        ' Save the recognized text as a text file with the specified filename
        result.SaveAsTextFile("MultiLanguage.txt")
    End Using

    ' Note: Ensure that the necessary IronOcr package is installed and referenced in your project.
    ' Also, make sure that the image path is correct and accessible by the application.
Multi Page Documents

IronOCR allows combining multiple pages or images into a single OcrResult. This is great for documents created from multiple images, enabling valuable features like creating searchable PDFs and HTML files.

IronOCR can mix and match images, TIFF frames, and PDF pages into a single OCR input.

C#:

    // Required namespaces for OCR operations and console output
    using System;
    using IronOcr;

    // Create a new instance of IronTesseract to perform OCR operations
    IronTesseract ocr = new IronTesseract();

    // Using statement to ensure proper disposal of OcrInput resources
    using (OcrInput input = new OcrInput())
    {
        // Load various images into the input object for OCR processing
        input.AddImage("image1.jpeg");
        input.AddImage("image2.png");

        // Specify which frames to load from a multi-frame image (e.g., GIF)
        int[] pageIndices = { 1, 2 };
        input.AddImageFrames("image3.gif", pageIndices);

        // Perform OCR on the loaded images and retrieve the result
        OcrResult result = ocr.Read(input);

        // Output the number of pages processed to the console.
        // Each loaded image or frame counts as one page.
        Console.WriteLine($"{result.Pages.Count} Pages processed.");
    }

VB.NET:

    ' Required namespaces for OCR operations and console output
    Imports System
    Imports IronOcr

    ' Create a new instance of IronTesseract to perform OCR operations
    Dim ocr As New IronTesseract()

    ' Using statement to ensure proper disposal of OcrInput resources
    Using input As New OcrInput()
        ' Load various images into the input object for OCR processing
        input.AddImage("image1.jpeg")
        input.AddImage("image2.png")

        ' Specify which frames to load from a multi-frame image (e.g., GIF)
        Dim pageIndices() As Integer = { 1, 2 }
        input.AddImageFrames("image3.gif", pageIndices)

        ' Perform OCR on the loaded images and retrieve the result
        Dim result As OcrResult = ocr.Read(input)

        ' Output the number of pages processed to the console.
        ' Each loaded image or frame counts as one page.
        Console.WriteLine($"{result.Pages.Count} Pages processed.")
    End Using

To OCR pages (frames) of a multi-frame TIFF file, load them by index:

C#:

    using System;
    using IronOcr;

    // Instantiate the IronTesseract object, which will handle OCR processing.
    IronTesseract ocr = new IronTesseract();

    // Create an OcrInput object to hold the images to be processed.
    // The 'using' statement ensures that resources are freed when the operation is complete.
    using (OcrInput input = new OcrInput())
    {
        // Define the indices of the pages to load from a multi-frame TIFF image.
        // This example loads the first and second pages.
        int[] pageIndices = new int[] { 0, 1 };

        // Load the specified image frames (pages) from the TIFF file.
        // The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.
        input.LoadImageFrames("MultiFrame.Tiff", pageIndices);

        // Perform OCR on the input and store the result.
        OcrResult result = ocr.Read(input);

        // Output the recognized text to the console.
        Console.WriteLine(result.Text);

        // Output the number of pages processed.
        // One page is returned for each frame (page) loaded from the TIFF input.
        Console.WriteLine($"{result.Pages.Count} Pages");
    }

VB.NET:

    Imports System
    Imports IronOcr

    ' Instantiate the IronTesseract object, which will handle OCR processing.
    Dim ocr As New IronTesseract()

    ' Create an OcrInput object to hold the images to be processed.
    ' The 'Using' statement ensures that resources are freed when the operation is complete.
    Using input As New OcrInput()
        ' Define the indices of the pages to load from a multi-frame TIFF image.
        ' This example loads the first and second pages.
        Dim pageIndices() As Integer = { 0, 1 }

        ' Load the specified image frames (pages) from the TIFF file.
        ' The LoadImageFrames function reads specific frames of a TIFF file as indicated by the indices.
        input.LoadImageFrames("MultiFrame.Tiff", pageIndices)

        ' Perform OCR on the input and store the result.
        Dim result As OcrResult = ocr.Read(input)

        ' Output the recognized text to the console.
        Console.WriteLine(result.Text)

        ' Output the number of pages processed.
        ' One page is returned for each frame (page) loaded from the TIFF input.
        Console.WriteLine($"{result.Pages.Count} Pages")
    End Using

PDF documents, including password-protected ones, can be read with IronTesseract in the same way:

C#:

    using System; // Import system namespace for Console usage
    using IronOcr; // Import IronOcr namespace to use OCR functionalities

    // Create a new instance of IronTesseract, which is the main class for OCR operations
    IronTesseract ocr = new IronTesseract();

    // Create an OcrInput object within a using statement to ensure it is disposed of correctly after use
    using (OcrInput input = new OcrInput())
    {
        try
        {
            // Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.
            // If the PDF does not have a password, consider providing null or an empty string for the password parameter.
            input.LoadPdf("example.pdf", "password");

            // Perform OCR on the loaded input and store the result
            OcrResult result = ocr.Read(input);

            // Output the recognized text to the console
            Console.WriteLine(result.Text);

            // Output the number of pages recognized in the PDF
            Console.WriteLine($"{result.Pages.Count} Pages");
        }
        catch (Exception ex)
        {
            // In case of exceptions, output the error message to the console
            Console.WriteLine("An error occurred during OCR processing: " + ex.Message);
        }
    }

VB.NET:

    Imports System ' Import system namespace for Console usage
    Imports IronOcr ' Import IronOcr namespace to use OCR functionalities

    ' Create a new instance of IronTesseract, which is the main class for OCR operations
    Dim ocr As New IronTesseract()

    ' Create an OcrInput object within a Using statement to ensure it is disposed of correctly after use
    Using input As New OcrInput()
        Try
            ' Load the PDF file into the OcrInput object, providing a password if the PDF is password-protected.
            ' If the PDF does not have a password, consider providing Nothing or an empty string for the password parameter.
            input.LoadPdf("example.pdf", "password")

            ' Perform OCR on the loaded input and store the result
            Dim result As OcrResult = ocr.Read(input)

            ' Output the recognized text to the console
            Console.WriteLine(result.Text)

            ' Output the number of pages recognized in the PDF
            Console.WriteLine($"{result.Pages.Count} Pages")
        Catch ex As Exception
            ' In case of exceptions, output the error message to the console
            Console.WriteLine("An error occurred during OCR processing: " & ex.Message)
        End Try
    End Using

Searchable PDFs

IronOCR can export results as searchable PDFs, a sought-after feature for applications requiring database updating, SEO, and PDF usability.
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-11.cs
using IronOcr;

// Initializes a new instance of the IronTesseract class
IronTesseract ocr = new IronTesseract();

// Using block to ensure that resources are disposed of properly
using (OcrInput input = new OcrInput())
{
    // Sets the title for the OCR input
    input.Title = "Quarterly Report";

    // Loads individual images into the OCR input
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");

    // Define specific page indices to load from an image with multiple frames
    int[] pageIndices = new int[] { 1, 2 };

    // Loads specific frames from a multi-frame image (e.g., GIF)
    input.AddImageFrames("image3.gif", pageIndices);

    // Performs OCR on the loaded images and stores the result
    OcrResult result = ocr.Read(input);

    // Saves the OCR result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf");
}

Imports IronOcr

' Initializes a new instance of the IronTesseract class
Dim ocr As New IronTesseract()

' Using block to ensure that resources are disposed of properly
Using input As New OcrInput()
    ' Sets the title for the OCR input
    input.Title = "Quarterly Report"

    ' Loads individual images into the OCR input
    input.AddImage("image1.jpeg")
    input.AddImage("image2.png")

    ' Define specific page indices to load from an image with multiple frames
    Dim pageIndices() As Integer = { 1, 2 }

    ' Loads specific frames from a multi-frame image (e.g., GIF)
    input.AddImageFrames("image3.gif", pageIndices)

    ' Performs OCR on the loaded images and stores the result
    Dim result As OcrResult = ocr.Read(input)

    ' Saves the OCR result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf")
End Using

Similarly, convert existing PDFs into searchable ones:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-12.cs
using IronOcr;

// Create an instance of the IronTesseract engine
var ocr = new IronTesseract();

// Use a using statement to ensure input resources are disposed of properly
using (var input = new OcrInput())
{
    // Set a title for the input, which is used as metadata
    input.Title = "Pdf Metadata Name";

    // Load the PDF file into the OCR input. If the PDF is password protected, provide the password
    input.LoadPdf("example.pdf", "password");

    // Run OCR on the loaded input
    var result = ocr.Read(input);

    // Save the result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf");
}

Imports IronOcr

' Create an instance of the IronTesseract engine
Dim ocr = New IronTesseract()

' Use a Using statement to ensure input resources are disposed of properly
Using input = New OcrInput()
    ' Set a title for the input, which is used as metadata
    input.Title = "Pdf Metadata Name"

    ' Load the PDF file into the OCR input. If the PDF is password protected, provide the password
    input.LoadPdf("example.pdf", "password")

    ' Run OCR on the loaded input
    Dim result = ocr.Read(input)

    ' Save the result as a searchable PDF
    result.SaveAsSearchablePdf("searchable.pdf")
End Using

The same technique applies to TIFF documents:

:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-13.cs
// Import the namespace for OCR functionality from the IronOCR library
using IronOcr;

// Instantiate IronTesseract object for OCR operations
var ocr = new IronTesseract();

// Using statement for automatic disposal of the OcrInput object after use
using (var input = new OcrInput())
{
    // Set a title for the OCR input, useful for identifying the content in larger projects
    input.Title = "Pdf Title";

    // Define the page indices of the frames that are to be processed
    var pageIndices = new int[] { 1, 2 };

    // Load frames from the specified TIFF file, using the defined page indices
    input.LoadImageFrames("example.tiff", pageIndices);

    // Perform OCR on the input and store the result
    OcrResult result = ocr.Read(input);

    // Save the OCR result as a searchable PDF document
    result.SaveAsSearchablePdf("searchable.pdf");
}

' Import the namespace for OCR functionality from the IronOCR library
Imports IronOcr

' Instantiate IronTesseract object for OCR operations
Dim ocr = New IronTesseract()

' Using statement for automatic disposal of the OcrInput object after use
Using input = New OcrInput()
    ' Set a title for the OCR input, useful for identifying the content in larger projects
    input.Title = "Pdf Title"

    ' Define the page indices of the frames that are to be processed
    Dim pageIndices = New Integer() { 1, 2 }

    ' Load frames from the specified TIFF file, using the defined page indices
    input.LoadImageFrames("example.tiff", pageIndices)

    ' Perform OCR on the input and store the result
    Dim result As OcrResult = ocr.Read(input)

    ' Save the OCR result as a searchable PDF document
    result.SaveAsSearchablePdf("searchable.pdf")
End Using

Exporting hOCR HTML

IronOCR can export OCR results as hOCR HTML, which allows limited PDF-to-HTML and TIFF-to-HTML conversion.
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-14.cs
using IronOcr;

// Instantiate the OCR engine
var ocr = new IronTesseract();

// Using block to properly dispose of OcrInput resources
using (var input = new OcrInput())
{
    // Set a title for the OCR input
    input.Title = "Html Title";

    // Load images and PDFs for OCR processing
    input.AddImage("image2.jpeg");

    // AddPdf takes a file path and, for encrypted PDFs, a password
    input.AddPdf("example.pdf", "password");

    // Load specific frames from a TIFF image using page indices
    var pageIndices = new int[] { 1, 2 };
    input.AddTiff("example.tiff", pageIndices);

    // Perform OCR on the provided input and obtain the result
    OcrResult result = ocr.Read(input);

    // Save the result as an hOCR file, an HTML-based format for representing OCR results
    result.SaveAsHocrFile("hocr.html");
}

Imports IronOcr

' Instantiate the OCR engine
Dim ocr = New IronTesseract()

' Using block to properly dispose of OcrInput resources
Using input = New OcrInput()
    ' Set a title for the OCR input
    input.Title = "Html Title"

    ' Load images and PDFs for OCR processing
    input.AddImage("image2.jpeg")

    ' AddPdf takes a file path and, for encrypted PDFs, a password
    input.AddPdf("example.pdf", "password")

    ' Load specific frames from a TIFF image using page indices
    Dim pageIndices = New Integer() { 1, 2 }
    input.AddTiff("example.tiff", pageIndices)

    ' Perform OCR on the provided input and obtain the result
    Dim result As OcrResult = ocr.Read(input)

    ' Save the result as an hOCR file, an HTML-based format for representing OCR results
    result.SaveAsHocrFile("hocr.html")
End Using

Reading Barcodes in OCR Documents

IronOCR can also read barcodes and QR codes alongside text recognition.
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-15.cs
// Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.
using IronOcr;

// Create a new instance of IronTesseract for performing optical character and barcode recognition.
var ocr = new IronTesseract();

// Enable barcode reading in the OCR engine's configuration.
ocr.Configuration.ReadBarCodes = true;

// The OcrInput class loads the images to be processed by the OCR engine.
using (var input = new OcrInput())
{
    // Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.
    // The image should contain a barcode or text for OCR.
    input.AddImage("img/Barcode.png");

    // Perform OCR and barcode reading on the input image.
    // The Read method returns an OcrResult containing recognized text, barcodes, and more.
    var result = ocr.Read(input);

    // Iterate through the detected barcodes in the OcrResult.
    foreach (var barcode in result.Barcodes)
    {
        // Output the barcode value to the console.
        Console.WriteLine($"Barcode Value: {barcode.Value}");

        // Additional barcode properties, such as type and location, can be accessed as needed.
        Console.WriteLine($"Type: {barcode.Type}, Location: {barcode.Location}");
    }
}

' Include the IronOcr namespace to use IronTesseract for OCR and barcode reading.
Imports IronOcr

' Create a new instance of IronTesseract for performing optical character and barcode recognition.
Dim ocr = New IronTesseract()

' Enable barcode reading in the OCR engine's configuration.
ocr.Configuration.ReadBarCodes = True

' The OcrInput class loads the images to be processed by the OCR engine.
Using input = New OcrInput()
    ' Load an image file into the OcrInput. Ensure the path is accurate relative to the execution directory.
    ' The image should contain a barcode or text for OCR.
    input.AddImage("img/Barcode.png")

    ' Perform OCR and barcode reading on the input image.
    ' The Read method returns an OcrResult containing recognized text, barcodes, and more.
    Dim result = ocr.Read(input)

    ' Iterate through the detected barcodes in the OcrResult.
    For Each barcode In result.Barcodes
        ' Output the barcode value to the console.
        Console.WriteLine($"Barcode Value: {barcode.Value}")

        ' Additional barcode properties, such as type and location, can be accessed as needed.
        Console.WriteLine($"Type: {barcode.Type}, Location: {barcode.Location}")
    Next barcode
End Using

A Detailed Look at Image to Text OCR Results

The OCR results object in IronOCR contains detailed information that advanced developers can use. An OCR result includes a collection of pages, each containing barcodes, paragraphs, text lines, words, and characters. Each of these objects exposes details such as its location, font, and confidence level, giving developers flexibility in how they handle the data. Elements of the .NET OCR results, such as a paragraph, word, or barcode, can be exported as images or bitmaps.
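One common use of the confidence level mentioned above is flagging low-confidence words for manual review. The following is a minimal sketch of that idea rather than IronOCR code: `RecognizedWord`, `ConfidenceFilter`, the sample data, and the threshold of 80 are hypothetical stand-ins so the example runs without the library. With IronOCR, the equivalent values would be read from `word.Text` and `word.Confidence`, and the scale of `Confidence` should be checked against the API reference before choosing a threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recognized word (not an IronOCR type).
public record RecognizedWord(string Text, double Confidence);

public static class ConfidenceFilter
{
    // Returns the words whose confidence is strictly below the threshold,
    // preserving their original order.
    public static List<RecognizedWord> BelowThreshold(
        IEnumerable<RecognizedWord> words, double threshold)
    {
        return words.Where(w => w.Confidence < threshold).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data invented for illustration only.
        var words = new List<RecognizedWord>
        {
            new RecognizedWord("Invoice", 97.2),
            new RecognizedWord("T0tal", 61.5),
            new RecognizedWord("Due", 88.0),
        };

        // The threshold of 80 is an assumed value, not a library default.
        foreach (var word in ConfidenceFilter.BelowThreshold(words, 80.0))
        {
            Console.WriteLine($"Needs review: {word.Text}");
        }
    }
}
```

Keeping the filter separate from the OCR call means the same check can be applied at page, line, or word level by passing in whichever collection is being walked.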
:path=/static-assets/ocr/content-code-examples/tutorials/how-to-read-text-from-an-image-in-csharp-net-16.cs
using System;
using IronOcr;
using IronSoftware.Drawing;

// Instantiate the IronTesseract OCR engine
IronTesseract ocr = new IronTesseract
{
    // Enable reading of barcodes in the OCR process
    Configuration = { ReadBarCodes = true }
};

// Create an OCR input object to hold the images to be analyzed
using OcrInput input = new OcrInput();

// Specify the page indices of the TIFF file to be processed
int[] pageIndices = { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

// Process the input images and obtain the OCR result
OcrResult result = ocr.Read(input);

// Iterate through each page in the OCR result
foreach (var page in result.Pages)
{
    // Fetch information about the current page
    int pageNumber = page.PageNumber;
    string pageText = page.Text;
    int pageWordCount = page.WordCount;

    // Obtain barcodes if any are present on the page
    OcrResult.Barcode[] barcodes = page.Barcodes;

    // Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types
    AnyBitmap pageImage = page.ToBitmap();
    System.Drawing.Bitmap pageImageLegacy = page.ToBitmap();
    double pageWidth = page.Width;
    double pageHeight = page.Height;

    // Iterate through each paragraph on the current page
    foreach (var paragraph in page.Paragraphs)
    {
        // Extract paragraph details
        int paragraphNumber = paragraph.ParagraphNumber;
        string paragraphText = paragraph.Text;
        System.Drawing.Bitmap paragraphImage = paragraph.ToBitmap();
        int paragraphXLocation = paragraph.X;
        int paragraphYLocation = paragraph.Y;
        int paragraphWidth = paragraph.Width;
        int paragraphHeight = paragraph.Height;
        double paragraphOcrAccuracy = paragraph.Confidence;
        var paragraphTextDirection = paragraph.TextDirection;

        // Iterate through each line within the current paragraph
        foreach (var line in paragraph.Lines)
        {
            // Extract line information
            int lineNumber = line.LineNumber;
            string lineText = line.Text;
            AnyBitmap lineImage = line.ToBitmap();
            System.Drawing.Bitmap lineImageLegacy = line.ToBitmap();
            int lineXLocation = line.X;
            int lineYLocation = line.Y;
            int lineWidth = line.Width;
            int lineHeight = line.Height;
            double lineOcrAccuracy = line.Confidence;
            double lineSkew = line.BaselineAngle;
            double lineOffset = line.BaselineOffset;

            // Iterate through each word in the line
            foreach (var word in line.Words)
            {
                int wordNumber = word.WordNumber;
                string wordText = word.Text;
                AnyBitmap wordImage = word.ToBitmap();
                System.Drawing.Image wordImageLegacy = word.ToBitmap();
                int wordXLocation = word.X;
                int wordYLocation = word.Y;
                int wordWidth = word.Width;
                int wordHeight = word.Height;
                double wordOcrAccuracy = word.Confidence;

                // Check for font details, available only when using certain Tesseract engine modes
                if (word.Font != null)
                {
                    string fontName = word.Font.FontName;
                    double fontSize = word.Font.FontSize;
                    bool isBold = word.Font.IsBold;
                    bool isFixedWidth = word.Font.IsFixedWidth;
                    bool isItalic = word.Font.IsItalic;
                    bool isSerif = word.Font.IsSerif;
                    bool isUnderlined = word.Font.IsUnderlined;
                    bool fontIsCaligraphic = word.Font.IsCaligraphic;
                }

                // Iterate through each character in the word
                foreach (var character in word.Characters)
                {
                    int characterNumber = character.CharacterNumber;
                    string characterText = character.Text;
                    AnyBitmap characterImage = character.ToBitmap();
                    System.Drawing.Bitmap characterImageLegacy = character.ToBitmap();
                    int characterXLocation = character.X;
                    int characterYLocation = character.Y;
                    int characterWidth = character.Width;
                    int characterHeight = character.Height;
                    double characterOcrAccuracy = character.Confidence;

                    // Get alternative symbol choices and their probabilities, useful for spell checking
                    OcrResult.Choice[] characterChoices = character.Choices;
                }
            }
        }
    }
}

Imports System
Imports IronOcr
Imports IronSoftware.Drawing

' Instantiate the IronTesseract OCR engine and enable barcode reading
Dim ocr As New IronTesseract()
ocr.Configuration.ReadBarCodes = True

' Create an OCR input object to hold the images to be analyzed
Using input As New OcrInput()
    ' Specify the page indices of the TIFF file to be processed
    Dim pageIndices() As Integer = { 1, 2 }
    input.LoadImageFrames("img\Potter.tiff", pageIndices)

    ' Process the input images and obtain the OCR result
    Dim result As OcrResult = ocr.Read(input)

    ' Iterate through each page in the OCR result
    For Each page In result.Pages
        ' Fetch information about the current page
        Dim pageNumber As Integer = page.PageNumber
        Dim pageText As String = page.Text
        Dim pageWordCount As Integer = page.WordCount

        ' Obtain barcodes if any are present on the page
        Dim barcodes() As OcrResult.Barcode = page.Barcodes

        ' Convert the page content into images, supporting both AnyBitmap and System.Drawing.Bitmap types
        Dim pageImage As AnyBitmap = page.ToBitmap()
        Dim pageImageLegacy As System.Drawing.Bitmap = page.ToBitmap()
        Dim pageWidth As Double = page.Width
        Dim pageHeight As Double = page.Height

        ' Iterate through each paragraph on the current page
        For Each paragraph In page.Paragraphs
            ' Extract paragraph details
            Dim paragraphNumber As Integer = paragraph.ParagraphNumber
            Dim paragraphText As String = paragraph.Text
            Dim paragraphImage As System.Drawing.Bitmap = paragraph.ToBitmap()
            Dim paragraphXLocation As Integer = paragraph.X
            Dim paragraphYLocation As Integer = paragraph.Y
            Dim paragraphWidth As Integer = paragraph.Width
            Dim paragraphHeight As Integer = paragraph.Height
            Dim paragraphOcrAccuracy As Double = paragraph.Confidence
            Dim paragraphTextDirection = paragraph.TextDirection

            ' Iterate through each line within the current paragraph
            For Each line In paragraph.Lines
                ' Extract line information
                Dim lineNumber As Integer = line.LineNumber
                Dim lineText As String = line.Text
                Dim lineImage As AnyBitmap = line.ToBitmap()
                Dim lineImageLegacy As System.Drawing.Bitmap = line.ToBitmap()
                Dim lineXLocation As Integer = line.X
                Dim lineYLocation As Integer = line.Y
                Dim lineWidth As Integer = line.Width
                Dim lineHeight As Integer = line.Height
                Dim lineOcrAccuracy As Double = line.Confidence
                Dim lineSkew As Double = line.BaselineAngle
                Dim lineOffset As Double = line.BaselineOffset

                ' Iterate through each word in the line
                For Each word In line.Words
                    Dim wordNumber As Integer = word.WordNumber
                    Dim wordText As String = word.Text
                    Dim wordImage As AnyBitmap = word.ToBitmap()
                    Dim wordImageLegacy As System.Drawing.Image = word.ToBitmap()
                    Dim wordXLocation As Integer = word.X
                    Dim wordYLocation As Integer = word.Y
                    Dim wordWidth As Integer = word.Width
                    Dim wordHeight As Integer = word.Height
                    Dim wordOcrAccuracy As Double = word.Confidence

                    ' Check for font details, available only when using certain Tesseract engine modes
                    If word.Font IsNot Nothing Then
                        Dim fontName As String = word.Font.FontName
                        Dim fontSize As Double = word.Font.FontSize
                        Dim isBold As Boolean = word.Font.IsBold
                        Dim isFixedWidth As Boolean = word.Font.IsFixedWidth
                        Dim isItalic As Boolean = word.Font.IsItalic
                        Dim isSerif As Boolean = word.Font.IsSerif
                        Dim isUnderlined As Boolean = word.Font.IsUnderlined
                        Dim fontIsCaligraphic As Boolean = word.Font.IsCaligraphic
                    End If

                    ' Iterate through each character in the word
                    For Each character In word.Characters
                        Dim characterNumber As Integer = character.CharacterNumber
                        Dim characterText As String = character.Text
                        Dim characterImage As AnyBitmap = character.ToBitmap()
                        Dim characterImageLegacy As System.Drawing.Bitmap = character.ToBitmap()
                        Dim characterXLocation As Integer = character.X
                        Dim characterYLocation As Integer = character.Y
                        Dim characterWidth As Integer = character.Width
                        Dim characterHeight As Integer = character.Height
                        Dim characterOcrAccuracy As Double = character.Confidence

                        ' Get alternative symbol choices and their probabilities, useful for spell checking
                        Dim characterChoices() As OcrResult.Choice = character.Choices
                    Next character
                Next word
            Next line
        Next paragraph
    Next page
End Using

Summary

IronOCR provides C# developers with the most advanced
Tesseract API, which runs on Windows, Linux, and Mac. It reads even imperfect documents, such as low-quality or skewed scans, with a high level of accuracy. It also combines text recognition with barcode reading and can export OCR data as HTML or searchable PDFs, a combination that plain Tesseract does not provide on its own.

Moving Forward

To further explore IronOCR:
Check our C# Tesseract OCR Quickstart guide.
Browse C# & VB code examples.
Dig into the MSDN-style API Reference.

Source Code Download
Github Repository
Source Code Zip

Explore more .NET OCR tutorials in this section.

Frequently Asked Questions

What is OCR?
OCR (optical character recognition) converts text in images into machine-readable text. IronOCR is a C# OCR library that allows developers to read text from images and PDFs without installing Tesseract separately, offering improved accuracy and speed.

How do I install an OCR library in a .NET project?
You can install IronOCR via the NuGet package manager using the command: `Install-Package IronOcr`.

What languages does an OCR library support?
IronOCR supports 125 international languages, which can be added via language packs downloaded from NuGet.

Can an OCR library handle low-quality scans?
Yes, IronOCR can handle low-quality and skewed scans by using image filters like deskew and denoise to improve accuracy.

How can I read a specific region of an image using an OCR library?
You can specify a region using `System.Drawing.Rectangle` to target specific areas of an image for OCR processing.

Does an OCR library support multi-page document processing?
Yes, IronOCR can process multi-page documents, combining images, TIFF frames, and PDF pages into a single OCR input.

What output formats does an OCR library support?
IronOCR can output results as strings, searchable PDFs, hOCR HTML, and more.

Can an OCR library read barcodes from documents?
Yes, IronOCR can detect and read barcodes and QR codes alongside text recognition from images.
How does an OCR library improve upon traditional OCR engines?
IronOCR enhances the accuracy and speed of the Tesseract engine, manages complex dictionaries, and supports modern .NET frameworks natively.

Is an OCR library compatible with various .NET platforms?
IronOCR is compatible with .NET Framework 4.5+, .NET Standard 2+, .NET Core, Xamarin, Mono, Azure, and Docker, providing wide platform support.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.