Bengali OCR in C# and .NET

IronOCR is a C# software component that lets .NET developers read text from images and PDF documents in 126 languages, including Bengali. It is an advanced fork of Tesseract built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.

Contents of IronOcr.Languages.Bengali

This package contains the following Bengali OCR language variants for .NET:

  • Bengali
  • BengaliBest
  • BengaliFast
  • BengaliAlphabet
  • BengaliAlphabetBest
  • BengaliAlphabetFast
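
Once the package is installed, one of the variants listed above can be selected instead of the default. The sketch below assumes each entry is exposed as an OcrLanguage member of the same name (for example OcrLanguage.BengaliBest), mirroring the OcrLanguage.Bengali member used in the main example; check the member names against your installed version. In upstream Tesseract, "best" models favor accuracy and "fast" models favor speed, and the variants here presumably follow the same trade-off.

```csharp
using System;
using IronOcr;

class BengaliVariantExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Assumption: the "BengaliBest" entry of the language pack is exposed as
        // OcrLanguage.BengaliBest, in the same way "Bengali" is OcrLanguage.Bengali.
        // Swap in another variant (e.g. BengaliFast) under the same assumption.
        ocr.Language = OcrLanguage.BengaliBest;

        // Same input path as the main example on this page
        using (var input = new OcrInput(@"images\Bengali.png"))
        {
            var result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Running the same image through more than one variant and comparing the output is a simple way to decide which one suits your documents.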

Download

Bengali Language Pack [Bangla]

Installation

First, install the Bengali OCR package in your .NET project.

Install-Package IronOcr.Languages.Bengali

Code Example

This C# code example reads Bengali text from an image or PDF document.

// Import the IronOcr namespace
using IronOcr;

class BengaliOcrExample
{
    static void Main()
    {
        // Create an instance of IronTesseract
        var Ocr = new IronTesseract();

        // Specify the language for OCR
        Ocr.Language = OcrLanguage.Bengali;

        // Process the image and extract text
        using (var Input = new OcrInput(@"images\Bengali.png"))
        {
            // Perform OCR on the input image
            var Result = Ocr.Read(Input);

            // Get the extracted text
            var AllText = Result.Text;

            // Output the extracted text to the console
            System.Console.WriteLine(AllText);
        }
    }
}

The same example in VB.NET:

' Import the IronOcr namespace
Imports IronOcr

Friend Class BengaliOcrExample
	Shared Sub Main()
		' Create an instance of IronTesseract
		Dim Ocr = New IronTesseract()

		' Specify the language for OCR
		Ocr.Language = OcrLanguage.Bengali

		' Process the image and extract text
		Using Input = New OcrInput("images\Bengali.png")
			' Perform OCR on the input image
			Dim Result = Ocr.Read(Input)

			' Get the extracted text
			Dim AllText = Result.Text

			' Output the extracted text to the console
			System.Console.WriteLine(AllText)
		End Using
	End Sub
End Class

Explanation

  1. Import IronOcr: We start by importing the IronOcr namespace, which contains classes and methods necessary to perform OCR operations.

  2. Create IronTesseract Instance: We create an instance of IronTesseract, which is the main class for performing OCR.

  3. Set Language: We set the OCR language to Bengali using OcrLanguage.Bengali.

  4. OcrInput: We specify the path to the image from which we want to extract text. An OcrInput object is used to load and preprocess the input file.

  5. Read and Extract Text: Using the Read method, we process the image to read the text content. The text is stored in Result.Text.

  6. Output Text: Finally, we print the extracted text to the console to verify the output.
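
When printing Bengali text as in step 6, some consoles show question marks or empty boxes because their default output encoding cannot represent the script. The following is a minimal sketch, using only standard .NET APIs, of setting the console to UTF-8 and saving the text to a UTF-8 file. The class name BengaliOutputHelper, the method SaveText, and the file name are hypothetical, and the hard-coded string stands in for Result.Text. Whether the glyphs actually render also depends on the terminal font.

```csharp
using System;
using System.IO;
using System.Text;

static class BengaliOutputHelper
{
    // Hypothetical helper: writes text to a file as UTF-8 (without a byte order mark)
    // so that Bengali characters are preserved when the file is read back.
    public static void SaveText(string text, string path)
    {
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }

    static void Main()
    {
        // Stand-in for Result.Text from the OCR example
        string allText = "বাংলা";

        // Ask the console to emit UTF-8 before writing non-Latin text
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(allText);

        // Also keep a copy on disk, where rendering does not depend on the console
        string path = Path.Combine(Path.GetTempPath(), "bengali-ocr.txt");
        SaveText(allText, path);
        Console.WriteLine("Saved to " + path);
    }
}
```

Saving to a file is often the more reliable way to verify OCR output, since a text editor with a Bengali-capable font will display the result regardless of console settings.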