Marathi OCR in C# and .NET

IronOCR is a C# software component allowing .NET coders to read text from images and PDF documents in 126 languages, including Marathi.

It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.

Contents of IronOcr.Languages.Marathi

This package contains the following Marathi OCR language variants for .NET:

  • Marathi
  • MarathiBest
  • MarathiFast

Download

Marathi Language Pack [मराठी]

Installation

The first step is to install the Marathi OCR package in your .NET project.

Install-Package IronOcr.Languages.Marathi

Code Example

This C# code example reads Marathi text from an image or PDF document.

// Include the System namespace (for Console) and the IronOcr namespace
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Initialize the OCR engine
        var Ocr = new IronTesseract();

        // Specify the language as Marathi
        Ocr.Language = OcrLanguage.Marathi;

        // Load the image or PDF document to be processed
        using (var Input = new OcrInput(@"images\Marathi.png"))
        {
            // Perform OCR on the input document
            var Result = Ocr.Read(Input);

            // Get the recognized text
            var AllText = Result.Text;

            // Output the recognized text to console
            Console.WriteLine(AllText);
        }
    }
}

The same example in VB.NET:

' Include the IronOcr namespace
Imports IronOcr

Friend Class Program
	Shared Sub Main()
		' Initialize the OCR engine
		Dim Ocr = New IronTesseract()

		' Specify the language as Marathi
		Ocr.Language = OcrLanguage.Marathi

		' Load the image or PDF document to be processed
		Using Input = New OcrInput("images\Marathi.png")
			' Perform OCR on the input document
			Dim Result = Ocr.Read(Input)

			' Get the recognized text
			Dim AllText = Result.Text

			' Output the recognized text to console
			Console.WriteLine(AllText)
		End Using
	End Sub
End Class

Explanation:

  • This code uses the IronTesseract class from the IronOCR library to perform OCR.
  • The Ocr.Language property is set to use the Marathi language pack.
  • An OcrInput is created using the path to the image or PDF containing the Marathi text.
  • The Ocr.Read() method processes the input and extracts the text.
  • The recognized text is stored in the AllText variable and is printed to the console.
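
The package contents also list MarathiBest and MarathiFast. The sketch below shows how one of those variants might be selected instead of the default. It assumes the variants are exposed as OcrLanguage.MarathiBest and OcrLanguage.MarathiFast, matching the names in the list above, and that they follow Tesseract's usual convention, where "best" models favor accuracy and "fast" models favor speed. Check both assumptions against the installed package before relying on them.

```csharp
using System;
using IronOcr;

class VariantExample
{
    static void Main()
    {
        // Initialize the OCR engine, as in the main example
        var ocr = new IronTesseract();

        // Assumption: the language pack exposes this enum member.
        // Swap in OcrLanguage.MarathiFast to try the speed-oriented variant.
        ocr.Language = OcrLanguage.MarathiBest;

        // Same input path as the main example
        using (var input = new OcrInput(@"images\Marathi.png"))
        {
            // Run OCR and print the recognized text
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Running the same image through each variant and comparing the output is a simple way to decide which one suits a given set of documents.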