How to Extract Text from Images in C#

This tutorial shows how to use Iron OCR, a free library for optical character recognition (OCR), to extract text from image files. It starts with installing Iron OCR through the NuGet Package Manager and then walks through writing a C# program: adding the IronOcr namespace, initializing the IronTesseract class, and passing the image file path to OcrInput. The tutorial demonstrates Iron OCR reading both simple and complex text images, including large bodies of text and less-than-ideal inputs such as crinkled, rotated, or skewed pages. It also covers the library's handling of various image formats and how to save the output as a text file or PDF. If you need help along the way, you can reach out for support.

Here's how you can implement OCR in a C# application using Iron OCR:

// Import System (for Console) and the Iron OCR namespace
using System;
using IronOcr;

class OcrExample
{
    static void Main(string[] args)
    {
        // Initialize the IronTesseract class
        var Ocr = new IronTesseract();

        // Specify the path to the image file
        string imagePath = "path/to/your/imagefile.jpg";

        // Use a 'using' statement to properly handle resources
        using (var input = new OcrInput(imagePath))
        {
            // Perform OCR on the image input
            var Result = Ocr.Read(input);

            // Output the OCR result to the console
            Console.WriteLine(Result.Text);

            // Save the OCR result to a text file
            Result.SaveAsTextFile("output_text.txt");

            // Optionally Save the OCR result to a PDF file
            Result.SaveAsPdf("output.pdf");
        }
    }
}

The same program in VB.NET:

' Import the Iron OCR namespace
Imports IronOcr

Friend Class OcrExample
	Shared Sub Main(ByVal args() As String)
		' Initialize the IronTesseract class
		Dim Ocr = New IronTesseract()

		' Specify the path to the image file
		Dim imagePath As String = "path/to/your/imagefile.jpg"

		' Use a 'using' statement to properly handle resources
		Using input = New OcrInput(imagePath)
			' Perform OCR on the image input
			Dim Result = Ocr.Read(input)

			' Output the OCR result to the console
			Console.WriteLine(Result.Text)

			' Save the OCR result to a text file
			Result.SaveAsTextFile("output_text.txt")

			' Optionally Save the OCR result to a PDF file
			Result.SaveAsPdf("output.pdf")
		End Using
	End Sub
End Class

Explaining the Code

  1. Namespace Import: The using IronOcr; directive gives access to the OCR classes provided by Iron OCR.

  2. Class Initialization: An instance of IronTesseract is created to provide the OCR capability. This class is the core component of the Iron OCR library and handles image processing and text recognition.

  3. Image Path Specification: The variable imagePath holds the path to the image file from which text is to be extracted. Ensure this path is correctly specified based on the location of your image.

  4. Using OcrInput: The OcrInput class specifies the image for OCR processing. Wrapping it in a using statement ensures that it is disposed, and its resources released, as soon as the block exits.

  5. Executing OCR: The Ocr.Read method processes the input and extracts the text content, storing the outcome in the Result variable.

  6. Output Handling: The recognized text is written to the console with Console.WriteLine(Result.Text). The result can also be saved to a text file and to a PDF file with the SaveAsTextFile and SaveAsPdf methods, respectively.
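
Step 3 asks you to make sure the image path is correct. One way to catch a wrong path before any OCR work starts is a small check like the sketch below. It uses only System.IO; the ImagePathCheck class and its TryResolveImagePath method are hypothetical names introduced for illustration and are not part of Iron OCR.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the Iron OCR API.
static class ImagePathCheck
{
    // Resolves imagePath to an absolute path and reports whether a file
    // exists at that location. Returns false for null, empty, or
    // whitespace-only input, in which case fullPath is an empty string.
    public static bool TryResolveImagePath(string imagePath, out string fullPath)
    {
        fullPath = string.IsNullOrWhiteSpace(imagePath)
            ? string.Empty
            : Path.GetFullPath(imagePath);

        return fullPath.Length > 0 && File.Exists(fullPath);
    }
}
```

In Main, you could call ImagePathCheck.TryResolveImagePath(imagePath, out string fullPath) first, print a message and return if it yields false, and otherwise pass fullPath to OcrInput. Resolving to an absolute path also makes it easier to see which file is being read when the program runs from a different working directory.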

By following this guide, users can effectively leverage Iron OCR to extract text from images and manage the output in various formats.

Further Reading: Read Text from Images with C# OCR

Kannapat Udonpant
Software Engineer
Before becoming a Software Engineer, Kannapat completed an Environmental Resources PhD at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.