Japanese OCR in C# and .NET
IronOCR is a C# software component allowing .NET coders to read text from images and PDF documents in 126 languages, including Japanese. It is an advanced fork of Tesseract, built exclusively for .NET developers, and regularly outperforms other Tesseract engines for both speed and accuracy.
It has been tested on many hardware platforms and is kept up to date with the latest version of .NET. IronOCR gives application developers an easy-to-use API that can be integrated into applications in various ways, which makes it a good choice for developers who need to perform OCR in their apps or projects.
Contents of IronOcr.Languages.Japanese
The IronOCR Japanese package contains 12 Japanese OCR language variants for .NET:
- JapaneseAlphabet
- JapaneseAlphabetBest
- JapaneseAlphabetFast
- JapaneseVerticalAlphabet
- JapaneseVerticalAlphabetBest
- JapaneseVerticalAlphabetFast
- Japanese
- JapaneseBest
- JapaneseFast
- JapaneseVertical
- JapaneseVerticalBest
- JapaneseVerticalFast
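The names in this list combine a base ("Japanese"), an optional "Vertical" or "Alphabet" part, and an optional "Best" or "Fast" suffix. The sketch below shows how one of them might be selected in code. It assumes that each listed name is available as a member of `OcrLanguage` (the tutorial's own handler uses `OcrLanguage.JapaneseBest`), that the "Vertical" variants target vertically set text, and that "Best" favours accuracy over speed. `JapaneseOcrFactory` and `Create` are hypothetical names, not part of IronOCR.

```csharp
using IronOcr;

// Hypothetical helper, not part of the IronOCR API.
static class JapaneseOcrFactory
{
    // Returns an engine configured for horizontal or vertical Japanese text.
    // Assumption: the "Vertical" packs are meant for vertically set text.
    public static IronTesseract Create(bool verticalText)
    {
        var ocr = new IronTesseract();
        ocr.Language = verticalText
            ? OcrLanguage.JapaneseVerticalBest
            : OcrLanguage.JapaneseBest;
        return ocr;
    }
}
```

Swapping "Best" for "Fast" in either branch would presumably trade some accuracy for speed; the right choice depends on the documents being read.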
Download
You can download the IronOCR Japanese Language Pack [日本語 (にほんご)] from the following links:
We will look at the installation of IronOCR in the following sections.
Using IronOCR for the Japanese Language
Create or Open a C# Project
Let's start by creating a C# project. This tutorial uses Visual Studio 2022, but you can use any version you prefer; the latest version is recommended. Follow the steps below to create a C# project:
- Open Visual Studio 2022.
- Click on the "Create a new project" button.
- Type "Windows" in the search bar, select the "Windows Forms App" template from the search results, and click on the "Next" button.
- Give a name to the project. I have named the project "JapaneseOCR." Once named, click on the "Next" button.
- On the next screen, select the target framework according to the needs of your project. This tutorial uses .NET 5.0.
- After selecting, click on the "Create" button. Visual Studio will create the C# Windows Forms project.
The project has been created! We can now use the IronOCR library in it. The library can also be added to an existing C# project: open the project and begin the installation. The following section explains how to install the IronOCR library in C# projects.
Installation
It is now time to install the IronOCR library in our project. The library can be installed in two different ways: with the NuGet Package Manager or with the Package Manager Console. Let's take a look at both methods.
Using NuGet Package Manager
To install the IronOCR library with NuGet Package Manager, we must open the NuGet Package Manager interface. Follow these steps:
- Click on "Tools" in the main menu, hover over "NuGet Package Manager" in the drop-down menu, and select "Manage NuGet Packages for Solution."
- This will open the NuGet Package Manager interface. Go to the browse tab and write IronOCR Japanese in the search bar. Select the Japanese package from the search results and click on the "Install" button to begin the installation.
- It will start installing the library. After installation, you will be able to use the IronOCR library in your project.
Using the Package Manager Console
The library can also be installed from the Package Manager Console, which is straightforward:
- Open the project and go to the Package Manager Console in Visual Studio. It is usually found at the bottom of Visual Studio.
- Write the following command in the console to install the IronOCR Japanese OCR language:
PM> Install-Package IronOcr.Languages.Japanese
- The installation will begin, and you will see the progress in the console. After installation, the IronOCR package will appear under the "Dependencies" node in Solution Explorer.
After installation, you will be able to use the library without any third-party software. Next, it's time to set up the front-end of our program.
Code Example: Japanese language for OCR
It is now time to write the code for implementing the IronOCR library for the Japanese language. First, we have to develop the frontend for selecting the image file. Let's look at how to do this.
Development of the Frontend
For the front end, we will take advantage of the "Toolbox" in Visual Studio, which provides many ready-made controls. We will use a picture box, a rich text box, a button, and two labels for identification. You can change the size and properties of the controls as you prefer. We make the output text box read-only and set the picture box's SizeMode property to "Zoom," so every image is scaled to fit the picture box.
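The two property settings mentioned above can also be applied in code rather than in the designer. The following is a minimal sketch; `FrontendSetup` and `Configure` are hypothetical names chosen for illustration, while `PictureBoxSizeMode.Zoom` and `ReadOnly` are standard Windows Forms members.

```csharp
using System.Windows.Forms;

// Hypothetical helper; the same values can be set in the designer's Properties window.
static class FrontendSetup
{
    public static void Configure(PictureBox pictureBox, RichTextBox outputBox)
    {
        // Scale any image to fit the box while preserving its aspect ratio.
        pictureBox.SizeMode = PictureBoxSizeMode.Zoom;

        // Let the user read and copy the OCR output, but not edit it.
        outputBox.ReadOnly = true;
    }
}
```

It could be called from the form's constructor after `InitializeComponent()`, for example `FrontendSetup.Configure(img_image, txt_output);`, using the control names from this tutorial's handler.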
Backend code for IronOCR:
Our front end is ready. Now it's time to make it live. But first, we have to import the IronOcr namespace to use IronOCR in our code. Write the following line at the top of the file:
using IronOcr;
In VB.NET, the equivalent is:
Imports IronOcr
The "Select Image" button is used to select an image that contains Japanese text. When an image is chosen, it is loaded into the picture box and displayed. At the same time, IronOCR starts recognizing the Japanese text in the image. When the process is complete, the output is shown in the rich text box. Double-click the button in the designer to add its click handler. Here is example code for the button; it also saves the output text to a "txt" file.
Code Example
private void btn_image_Click(object sender, EventArgs e)
{
    // Dispose the dialog once the handler is done with it.
    using (OpenFileDialog open = new OpenFileDialog())
    {
        if (open.ShowDialog() != DialogResult.OK)
        {
            return; // the user cancelled the dialog
        }

        // Display the selected image in the picture box.
        img_image.Image = new Bitmap(open.FileName);

        // Configure the OCR engine for Japanese.
        var Ocr = new IronTesseract();
        Ocr.Language = OcrLanguage.JapaneseBest;

        using (var Input = new OcrInput(open.FileName))
        {
            // Run OCR, show the recognized text, and save it to a text file.
            var Result = Ocr.Read(Input);
            txt_output.Text = Result.Text;
            Result.SaveAsTextFile("JapaneseText.txt");
        }
    }
}
The same handler in VB.NET:
Private Sub btn_image_Click(ByVal sender As Object, ByVal e As EventArgs)
    Dim open As New OpenFileDialog()
    If open.ShowDialog() = DialogResult.OK Then
        ' Display the selected image in the picture box.
        img_image.Image = New Bitmap(open.FileName)

        ' Configure the OCR engine for Japanese.
        Dim Ocr = New IronTesseract()
        Ocr.Language = OcrLanguage.JapaneseBest

        Using Input = New OcrInput(open.FileName)
            ' Run OCR, show the recognized text, and save it to a text file.
            Dim Result = Ocr.Read(Input)
            txt_output.Text = Result.Text
            Result.SaveAsTextFile("JapaneseText.txt")
        End Using
    End If
End Sub
When the user clicks the button, a file selection dialog appears and prompts the user to select an image that contains Japanese text. Once an image is selected, the Bitmap constructor loads it from the file path into the picture box. The handler then creates an IronTesseract instance and sets its language to JapaneseBest. IronOCR reads the image from the same path and scans it. After scanning, the recognized text is shown in the rich text box. Finally, the text is saved to a file named "JapaneseText.txt."
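Because the handler passes only a bare file name, the text file is written relative to the process's current directory, which can be hard to predict. One option is to save it beside the selected image instead. The helper below is a small sketch of that idea using only `System.IO`; `OcrOutputPath` and `NextToImage` are hypothetical names, not part of IronOCR.

```csharp
using System.IO;

// Hypothetical helper, not part of IronOCR.
static class OcrOutputPath
{
    // Returns a path for the text file in the same folder as the image.
    public static string NextToImage(string imagePath, string fileName = "JapaneseText.txt")
    {
        string fullPath = Path.GetFullPath(imagePath);

        // GetDirectoryName returns null for a root path; fall back to the current directory.
        string folder = Path.GetDirectoryName(fullPath) ?? Directory.GetCurrentDirectory();

        return Path.Combine(folder, fileName);
    }
}
```

In the handler, the save call could then become `Result.SaveAsTextFile(OcrOutputPath.NextToImage(open.FileName));`, assuming `SaveAsTextFile` accepts a full path as well as a bare file name.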
Run the Project
We have designed the code and implemented the backend. Now it's time to run the program to check if the functionality is working well or not.
- Click on the green play button to run the project. The application window will open in the middle of the screen.
- Click on the "Select Image" button and select an image that contains Japanese text.
- The recognized text will appear in the rich text box.
- A text file containing the OCR result will be saved as "JapaneseText.txt."
The OCR accuracy of IronOCR is excellent.
Summary
In this tutorial, we have learned how to use the IronOCR library to read Japanese text. To learn more about IronOCR or Iron Software, see the documentation on the Iron Software website.
If you want to try the IronOCR library, you can activate the free trial without any payment. Iron Software is also currently providing a special offer where you can buy a suite of five software products for the price of just two.