How to extract text from an image file

In this tutorial, we explore the process of extracting text from images using Iron OCR, a powerful library for C#. The session begins with setting up a C# console application in Visual Studio and installing the Iron OCR library via the NuGet Package Manager.

Once the library is imported, an IronTesseract object is initialized, and its configuration options are adjusted to enable barcode reading and set the language to English. These settings aim for accurate text recognition, while multi-threading improves performance. Additional options include rendering searchable PDFs and setting the page segmentation mode to Auto OSD, which segments the page automatically and also detects its orientation and script.

The tutorial further explains how Tesseract configuration variables fine-tune behavior: parallelization is enabled for faster execution, table detection is turned on so that table layouts are recognized, and text inversion is disabled to improve results. The tutorial provides a link for more configuration options.

Next, an image file is loaded into an OcrInput object, and the IronTesseract instance reads it to extract the text. The recognized text is then printed to the console, demonstrating the library's accuracy.

The tutorial concludes by highlighting Iron OCR as a powerful tool for extracting text from images and PDFs, encouraging viewers to try it with a provided trial link.

Further Reading

How to use Iron Tesseract

using IronOcr;
using System;

namespace IronOCRExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a new instance of the IronTesseract class
            var Ocr = new IronTesseract();

            // Configuration settings for better accuracy
            Ocr.Configuration.ReadBarCodes = true; // Enable barcode reading
            Ocr.Configuration.Language = OcrLanguage.English; // Set language to English
            Ocr.Configuration.RenderSearchablePdfsAndHocr = true; // Enable rendering to searchable PDFs
            Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd; // Automatic page segmentation with orientation and script detection (OSD)
            Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm; // Use a combination of Tesseract and LSTM engines

            // Further fine-tuning
            Ocr.Configuration.TesseractVariables["user_defined_dpi"] = "300"; // Set dpi to 300 for better resolution
            Ocr.Configuration.TesseractVariables["tessedit_parallelize"] = "true"; // Enable parallel processing
            Ocr.Configuration.TesseractVariables["textord_tabfind_find_tables"] = "true"; // Table recognition
            Ocr.Configuration.TesseractVariables["invert"] = "false"; // Disable inversion

            // Load the image into the OCR input
            using (var Input = new OcrInput(@"input-image.png"))
            {
                // Perform OCR
                var Result = Ocr.Read(Input);

                // Output the recognized text to the console
                Console.WriteLine(Result.Text);
            }
        }
    }
}
Imports IronOcr
Imports System

Namespace IronOCRExample
	Friend Class Program
		Shared Sub Main(ByVal args() As String)
			' Create a new instance of the IronTesseract class
			Dim Ocr = New IronTesseract()

			' Configuration settings for better accuracy
			Ocr.Configuration.ReadBarCodes = True ' Enable barcode reading
			Ocr.Configuration.Language = OcrLanguage.English ' Set language to English
			Ocr.Configuration.RenderSearchablePdfsAndHocr = True ' Enable rendering to searchable PDFs
			Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd ' Automatic page segmentation with orientation and script detection (OSD)
			Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm ' Use a combination of Tesseract and LSTM engines

			' Further fine-tuning
			Ocr.Configuration.TesseractVariables("user_defined_dpi") = "300" ' Set dpi to 300 for better resolution
			Ocr.Configuration.TesseractVariables("tessedit_parallelize") = "true" ' Enable parallel processing
			Ocr.Configuration.TesseractVariables("textord_tabfind_find_tables") = "true" ' Table recognition
			Ocr.Configuration.TesseractVariables("invert") = "false" ' Disable inversion

			' Load the image into the OCR input
			Using Input = New OcrInput("input-image.png")
				' Perform OCR
				Dim Result = Ocr.Read(Input)

				' Output the recognized text to the console
				Console.WriteLine(Result.Text)
			End Using
		End Sub
	End Class
End Namespace

The above C# code demonstrates how to set up and use Iron OCR in a console application to read text from an image file. The configuration is optimized for accuracy by enabling barcode reading, setting language preferences, and allowing multi-threading for faster performance.
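
The configuration enables barcode reading and searchable PDF rendering, but the sample prints only the recognized text. The following is a minimal sketch of how those extra results might be consumed. It assumes the `Confidence`, `Barcodes`, and `SaveAsSearchablePdf` members of the OCR result behave as described in the IronOCR documentation; the class name `ResultDetails` and the file names are placeholders, and member names may differ between library versions.

```csharp
using IronOcr;
using System;

namespace IronOCRExample
{
    class ResultDetails
    {
        static void Main(string[] args)
        {
            var ocr = new IronTesseract();

            // Same switches as in the main example: read barcodes and
            // prepare the result for searchable PDF export
            ocr.Configuration.ReadBarCodes = true;
            ocr.Configuration.RenderSearchablePdfsAndHocr = true;

            // "input-image.png" is a placeholder path
            using (var input = new OcrInput(@"input-image.png"))
            {
                var result = ocr.Read(input);

                // Recognized text plus the engine's overall confidence value
                Console.WriteLine(result.Text);
                Console.WriteLine($"Confidence: {result.Confidence}");

                // Values of any barcodes found on the page (assumed member names)
                foreach (var barcode in result.Barcodes)
                {
                    Console.WriteLine($"Barcode: {barcode.Value}");
                }

                // Write a PDF with a selectable text layer (placeholder output name)
                result.SaveAsSearchablePdf("searchable-output.pdf");
            }
        }
    }
}
```

If the image contains no barcodes, the loop simply prints nothing, so the sketch can be run against plain text images as well.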

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.