Han Simplified Alphabet OCR in C# and .NET
IronOCR is a C# software component allowing .NET coders to read text from images and PDF documents in 126 languages, including the Han Simplified Alphabet.
It is an advanced fork of Tesseract, built exclusively for .NET developers and regularly outperforms other Tesseract engines for both speed and accuracy.
Contents of IronOcr.Languages.Han
This package contains 400 OCR languages for .NET:
- HanSimplifiedAlphabet
- HanSimplifiedAlphabetBest
- HanSimplifiedAlphabetFast
- HanSimplifiedVerticalAlphabet
- HanSimplifiedVerticalAlphabetBest
- HanSimplifiedVerticalAlphabetFast
- HanTraditionalAlphabet
- HanTraditionalAlphabetBest
- HanTraditionalAlphabetFast
- HanTraditionalVerticalAlphabet
- HanTraditionalVerticalAlphabetBest
- HanTraditionalVerticalAlphabetFast
Download
Han Simplified Alphabet Language Pack [Samhan]
Installation
The first thing we have to do is install our Han Simplified Alphabet OCR package to your .NET project.
Run the following command in the Package Manager Console:
Install-Package IronOCR.Languages.Han
Code Example
This C# code example reads Han Simplified Alphabet text from an image or PDF document.
// Reference the IronOcr library
using IronOcr;
class Program
{
static void Main()
{
// Create an IronTesseract OCR engine
var Ocr = new IronTesseract();
// Load the Han language for OCR processing
Ocr.Language = OcrLanguage.Han;
// Using a 'using' statement for resource management
using (var Input = new OcrInput(@"images\Han.png"))
{
// Process the image to extract text
var Result = Ocr.Read(Input);
// Retrieve and display the extracted text
string AllText = Result.Text;
System.Console.WriteLine(AllText);
}
}
}
// Reference the IronOcr library
using IronOcr;
class Program
{
static void Main()
{
// Create an IronTesseract OCR engine
var Ocr = new IronTesseract();
// Load the Han language for OCR processing
Ocr.Language = OcrLanguage.Han;
// Using a 'using' statement for resource management
using (var Input = new OcrInput(@"images\Han.png"))
{
// Process the image to extract text
var Result = Ocr.Read(Input);
// Retrieve and display the extracted text
string AllText = Result.Text;
System.Console.WriteLine(AllText);
}
}
}
' Reference the IronOcr library
Imports IronOcr
Friend Class Program
Shared Sub Main()
' Create an IronTesseract OCR engine
Dim Ocr = New IronTesseract()
' Load the Han language for OCR processing
Ocr.Language = OcrLanguage.Han
' Using a 'using' statement for resource management
Using Input = New OcrInput("images\Han.png")
' Process the image to extract text
Dim Result = Ocr.Read(Input)
' Retrieve and display the extracted text
Dim AllText As String = Result.Text
System.Console.WriteLine(AllText)
End Using
End Sub
End Class
Explanation
- We start by referencing the IronOcr library to use its OCR capabilities.
- An instance of
IronTesseract
is created to process the image/PDF documents. - The language for the OCR process is set to
Han
usingOcr.Language
. - An image is loaded using
OcrInput
and processed by callingOcr.Read()
. - The result of the OCR process is stored in
Result.Text
, which contains the extracted text from the document. - We finally print the text to the console.
Ensure to have the proper using
directives and manage resources efficiently with using
statements, especially when dealing with unmanaged resources like file streams.