Han Simplified Alphabet OCR in C# and .NET

126 More Languages

IronOCR is a C# software component allowing .NET coders to read text from images and PDF documents in 126 languages, including the Han Simplified Alphabet.

It is an advanced fork of Tesseract, built exclusively for .NET developers and regularly outperforms other Tesseract engines for both speed and accuracy.

Contents of IronOcr.Languages.Han

This package contains 400 OCR languages for .NET:

  • HanSimplifiedAlphabet
  • HanSimplifiedAlphabetBest
  • HanSimplifiedAlphabetFast
  • HanSimplifiedVerticalAlphabet
  • HanSimplifiedVerticalAlphabetBest
  • HanSimplifiedVerticalAlphabetFast
  • HanTraditionalAlphabet
  • HanTraditionalAlphabetBest
  • HanTraditionalAlphabetFast
  • HanTraditionalVerticalAlphabet
  • HanTraditionalVerticalAlphabetBest
  • HanTraditionalVerticalAlphabetFast

Download

Han Simplified Alphabet Language Pack [Samhan]

Installation

The first thing we have to do is install our Han Simplified Alphabet OCR package to your .NET project.

Run the following command in the Package Manager Console:

Install-Package IronOCR.Languages.Han

Code Example

This C# code example reads Han Simplified Alphabet text from an image or PDF document.

// Reference the IronOcr library
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an IronTesseract OCR engine
        var Ocr = new IronTesseract();

        // Load the Han language for OCR processing
        Ocr.Language = OcrLanguage.Han;

        // Using a 'using' statement for resource management
        using (var Input = new OcrInput(@"images\Han.png"))
        {
            // Process the image to extract text
            var Result = Ocr.Read(Input);

            // Retrieve and display the extracted text
            string AllText = Result.Text;
            System.Console.WriteLine(AllText);
        }
    }
}
// Reference the IronOcr library
using IronOcr;

class Program
{
    static void Main()
    {
        // Create an IronTesseract OCR engine
        var Ocr = new IronTesseract();

        // Load the Han language for OCR processing
        Ocr.Language = OcrLanguage.Han;

        // Using a 'using' statement for resource management
        using (var Input = new OcrInput(@"images\Han.png"))
        {
            // Process the image to extract text
            var Result = Ocr.Read(Input);

            // Retrieve and display the extracted text
            string AllText = Result.Text;
            System.Console.WriteLine(AllText);
        }
    }
}
' Reference the IronOcr library
Imports IronOcr

Friend Class Program
	Shared Sub Main()
		' Create an IronTesseract OCR engine
		Dim Ocr = New IronTesseract()

		' Load the Han language for OCR processing
		Ocr.Language = OcrLanguage.Han

		' Using a 'using' statement for resource management
		Using Input = New OcrInput("images\Han.png")
			' Process the image to extract text
			Dim Result = Ocr.Read(Input)

			' Retrieve and display the extracted text
			Dim AllText As String = Result.Text
			System.Console.WriteLine(AllText)
		End Using
	End Sub
End Class
$vbLabelText   $csharpLabel

Explanation

  • We start by referencing the IronOcr library to use its OCR capabilities.
  • An instance of IronTesseract is created to process the image/PDF documents.
  • The language for the OCR process is set to Han using Ocr.Language.
  • An image is loaded using OcrInput and processed by calling Ocr.Read().
  • The result of the OCR process is stored in Result.Text, which contains the extracted text from the document.
  • We finally print the text to the console.

Ensure to have the proper using directives and manage resources efficiently with using statements, especially when dealing with unmanaged resources like file streams.