Indonesian OCR in C# and .NET

IronOCR is a C# software component that lets .NET developers read text from images and PDF documents in 126 languages, including Indonesian.

It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.

Contents of IronOcr.Languages.Indonesian

This package contains the following OCR language variants for .NET:

  • Indonesian
  • IndonesianBest
  • IndonesianFast
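As a rough sketch, the variants listed above might be selected as follows. It assumes the pack exposes them as OcrLanguage.Indonesian, OcrLanguage.IndonesianBest, and OcrLanguage.IndonesianFast, and that, as the names suggest, the Best variant favors accuracy while the Fast variant favors speed. The image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: the enum member names mirror the variant names listed above.
// Swap in OcrLanguage.IndonesianFast when throughput matters more than accuracy.
ocr.Language = OcrLanguage.IndonesianBest;

// Placeholder path: point this at an image of your own.
using (var input = new OcrInput(@"images\Indonesian.png"))
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

It is probably worth timing both variants on a sample of your own documents before settling on one.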

Download

Indonesian Language Pack [Bahasa Indonesia]

Installation

The first step is to install the Indonesian OCR package into your .NET project.

Install-Package IronOCR.Languages.Indonesian

Code Example

This C# code example reads Indonesian text from an image or PDF document.

// Ensure the IronOcr library is installed
using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

// Use 'using' keyword to ensure resources are cleaned up
using (var Input = new OcrInput(@"images\Indonesian.png"))
{
    // Perform OCR on the input image
    var Result = Ocr.Read(Input);
    // Store the result text
    var AllText = Result.Text;
}

The same example in VB.NET:

' Ensure the IronOcr library is installed
Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

' Use 'using' keyword to ensure resources are cleaned up
Using Input = New OcrInput("images\Indonesian.png")
	' Perform OCR on the input image
	Dim Result = Ocr.Read(Input)
	' Store the result text
	Dim AllText = Result.Text
End Using

Why Choose IronOCR?

IronOCR is a .NET software library that is easy to install, complete, and well documented.

Choose IronOCR to achieve 99.8%+ OCR accuracy without using external web services, paying ongoing fees, or sending confidential documents over the internet.

Why C# developers choose IronOCR over vanilla Tesseract:

  • Installs as a single DLL or NuGet package
  • Includes the Tesseract 5, 4, and 3 engines out of the box
  • 99.8% accuracy, significantly outperforming regular Tesseract
  • High speed and multithreading
  • Compatible with MVC, web, desktop, console, and server applications
  • No EXEs or C++ code to work with
  • Full PDF OCR support
  • Performs OCR on almost any image file or PDF
  • Full support for .NET Core, .NET Standard, and .NET Framework
  • Deploys on Windows, Mac, Linux, Azure, Docker, Lambda, and AWS
  • Reads barcodes and QR codes
  • Exports OCR results as XHTML
  • Exports OCR results to searchable PDF documents
  • 126 international languages, all managed through NuGet or OcrData files
  • Extracts images, coordinates, statistics, and fonts, not just text
  • Can be used to redistribute Tesseract OCR inside commercial and proprietary applications

IronOCR shines when working with real-world images and imperfect documents, such as photographs or low-resolution scans that may contain digital noise or other imperfections.

Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.

OCR with Tesseract 5 - Start Coding in C#

The code examples below show how easy it is to read text from an image using C# or VB.NET.

One-Liner

// Perform a quick OCR on the screenshot and output the text
string Text = new IronTesseract().Read(@"img\Screenshot.png").Text;

The same example in VB.NET:

' Perform a quick OCR on the screenshot and output the text
Dim Text As String = (New IronTesseract()).Read("img\Screenshot.png").Text

Configurable Hello World

// Ensure the IronOcr library is installed
using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var Input = new OcrInput())
{
    // Add image to OCR input
    Input.AddImage("images/sample.jpeg");

    // Perform OCR on the input
    var Result = Ocr.Read(Input);

    // Output the result text
    Console.WriteLine(Result.Text);
}

The same example in VB.NET:

' Ensure the IronOcr library is installed
Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	' Add image to OCR input
	Input.AddImage("images/sample.jpeg")

	' Perform OCR on the input
	Dim Result = Ocr.Read(Input)

	' Output the result text
	Console.WriteLine(Result.Text)
End Using

C# PDF OCR

The same approach can also be used to extract text from any PDF document.

using System;
using System.Linq; // Provides the Count() extension used below
using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var input = new OcrInput())
{
    input.AddPdf("example.pdf", "password"); // Add a protected PDF to OCR Input
    // Perform OCR
    var Result = Ocr.Read(input);

    // Output the result text
    Console.WriteLine(Result.Text);
    Console.WriteLine($"{Result.Pages.Count()} Pages"); // Display the number of pages in the PDF
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using input = New OcrInput()
	input.AddPdf("example.pdf", "password") ' Add a protected PDF to OCR Input
	' Perform OCR
	Dim Result = Ocr.Read(input)

	' Output the result text
	Console.WriteLine(Result.Text)
	Console.WriteLine($"{Result.Pages.Count()} Pages") ' Display the number of pages in the PDF
End Using

OCR for Multi-Page TIFFs

IronOCR reads the TIFF file format, including multi-page documents. TIFFs can also be converted directly into PDF files with searchable text.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var Input = new OcrInput())
{
    Input.AddMultiFrameTiff("multi-frame.tiff"); // Add a multi-frame TIFF image
    var Result = Ocr.Read(Input);

    // Output the result text
    Console.WriteLine(Result.Text);
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	Input.AddMultiFrameTiff("multi-frame.tiff") ' Add a multi-frame TIFF image
	Dim Result = Ocr.Read(Input)

	' Output the result text
	Console.WriteLine(Result.Text)
End Using

Barcodes and QR Codes

A unique feature of IronOCR is that it can read barcodes and QR codes from documents while scanning for text. Instances of the OcrResult.OcrBarcode class give developers detailed information about each scanned barcode.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Configuration.ReadBarCodes = true; // Enable barcode reading

using (var input = new OcrInput())
{
    input.AddImage("img/Barcode.png"); // Add image containing barcode
    var Result = Ocr.Read(input);

    foreach (var Barcode in Result.Barcodes)
    {
        Console.WriteLine(Barcode.Value); // Output the value of each barcode
        // Barcode type and location properties are also exposed
    }
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Configuration.ReadBarCodes = True ' Enable barcode reading

Using input = New OcrInput()
	input.AddImage("img/Barcode.png") ' Add image containing barcode
	Dim Result = Ocr.Read(input)

	For Each Barcode In Result.Barcodes
		Console.WriteLine(Barcode.Value) ' Output the value of each barcode
		' Barcode type and location properties are also exposed
	Next Barcode
End Using

OCR on Specific Regions of an Image

All of IronOCR's scanning and reading methods let you specify exactly which part of a page, or of several pages, to read text from. This is especially useful for standardized forms, where it can save a great deal of time and improve efficiency.

To use crop regions, add a reference to System.Drawing so that the System.Drawing.Rectangle object is available.

using IronOcr;
using System.Drawing;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var Input = new OcrInput())
{
    var ContentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
    // Dimensions specified in pixels

    Input.Add("document.png", ContentArea); // Add cropped area of the image

    var Result = Ocr.Read(Input); // Perform OCR
    Console.WriteLine(Result.Text); // Output the result text
}

The same example in VB.NET:

Imports IronOcr
Imports System.Drawing

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	Dim ContentArea = New System.Drawing.Rectangle() With {
		.X = 215,
		.Y = 1250,
		.Height = 280,
		.Width = 1335
	}
	' Dimensions specified in pixels

	Input.Add("document.png", ContentArea) ' Add cropped area of the image

	Dim Result = Ocr.Read(Input) ' Perform OCR
	Console.WriteLine(Result.Text) ' Output the result text
End Using

OCR for Low-Quality Scans

IronOCR's OcrInput class can repair scans that ordinary Tesseract cannot read.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
    Input.DeNoise(); // Fix digital noise and poor scanning
    Input.Deskew(); // Correct rotation and perspective
    var Result = Ocr.Read(Input); // Perform OCR
    Console.WriteLine(Result.Text); // Output the result text
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput("img\Potter.LowQuality.tiff")
	Input.DeNoise() ' Fix digital noise and poor scanning
	Input.Deskew() ' Correct rotation and perspective
	Dim Result = Ocr.Read(Input) ' Perform OCR
	Console.WriteLine(Result.Text) ' Output the result text
End Using

Exporting OCR Results as a Searchable PDF

Convert images into a PDF with copyable text strings that can be indexed by search engines and databases.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var input = new OcrInput())
{
    input.Title = "Quarterly Report"; // Set title for the PDF
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");
    input.AddImage("image3.gif");

    var Result = Ocr.Read(input); // Perform OCR
    Result.SaveAsSearchablePdf("searchable.pdf"); // Save result as searchable PDF
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	input.Title = "Quarterly Report" ' Set title for the PDF
	input.AddImage("image1.jpeg")
	input.AddImage("image2.png")
	input.AddImage("image3.gif")

	Dim Result = Ocr.Read(input) ' Perform OCR
	Result.SaveAsSearchablePdf("searchable.pdf") ' Save result as searchable PDF
End Using

TIFF to Searchable PDF Conversion

Convert a TIFF document (or any group of image files) directly into a searchable PDF that can be indexed by intranets, websites, and the Google search engine.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var input = new OcrInput())
{
    input.AddMultiFrameTiff("example.tiff"); // Add multi-frame TIFF
    Ocr.Read(input).SaveAsSearchablePdf("searchable.pdf"); // OCR and save as searchable PDF
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	input.AddMultiFrameTiff("example.tiff") ' Add multi-frame TIFF
	Ocr.Read(input).SaveAsSearchablePdf("searchable.pdf") ' OCR and save as searchable PDF
End Using

Exporting OCR Results as HTML

Convert OCR images to XHTML.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var input = new OcrInput())
{
    input.Title = "Html Title"; // Set title for XHTML file
    input.AddImage("image1.jpeg"); // Add image
    var Result = Ocr.Read(input); // Perform OCR
    Result.SaveAsHocrFile("results.html"); // Save result as XHTML
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput()
	input.Title = "Html Title" ' Set title for XHTML file
	input.AddImage("image1.jpeg") ' Add image
	Dim Result = Ocr.Read(input) ' Perform OCR
	Result.SaveAsHocrFile("results.html") ' Save result as XHTML
End Using

OCR Image Enhancement Filters

IronOCR provides unique filters for OcrInput objects to improve OCR performance.

Image Enhancement Code Example

Creating higher-quality OCR input images produces better and faster OCR results.

using IronOcr;

// Create a new instance of IronTesseract
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian

using (var Input = new OcrInput(@"LowQuality.jpeg"))
{
    Input.DeNoise(); // Fix digital noise and poor scanning
    Input.Deskew(); // Correct rotation and perspective
    var Result = Ocr.Read(Input); // Perform OCR
    Console.WriteLine(Result.Text); // Output the result text
}

The same example in VB.NET:

Imports IronOcr

' Create a new instance of IronTesseract
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian

Using Input = New OcrInput("LowQuality.jpeg")
	Input.DeNoise() ' Fix digital noise and poor scanning
	Input.Deskew() ' Correct rotation and perspective
	Dim Result = Ocr.Read(Input) ' Perform OCR
	Console.WriteLine(Result.Text) ' Output the result text
End Using

List of OCR Image Filters

The input filters built into IronOCR to improve OCR performance include:

  • OcrInput.Rotate(double degrees) - Rotates the image clockwise by the given number of degrees. Use negative numbers for counterclockwise rotation.
  • OcrInput.Binarize() - Converts each pixel to black or white. Useful when contrast between text and background is very low.
  • OcrInput.ToGrayScale() - Converts pixels to grayscale. May not improve accuracy but might speed up processing.
  • OcrInput.Contrast() - Automatically increases contrast, often improving speed and accuracy for low-contrast scans.
  • OcrInput.DeNoise() - Removes digital noise. Use only when noise is suspected.
  • OcrInput.Invert() - Inverts all colors, so that white becomes black and vice versa.
  • OcrInput.Dilate() - Morphological operation that adds pixels to the boundaries of objects.
  • OcrInput.Erode() - Morphological operation that removes pixels from object boundaries.
  • OcrInput.Deskew() - Rotates the image to the correct orientation, compensating for tilted scans.
  • OcrInput.DeepCleanBackgroundNoise() - Heavy noise removal. Use only when extreme background noise is present. High CPU cost.
  • OcrInput.EnhanceResolution - Improves low-resolution images. Activated automatically based on DPI settings.
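Several of the filters listed above can be combined on one input. The following is a minimal sketch, assuming the filters are applied in the order they are called and that the parameterless forms shown in the list are available. The file name faded-scan.png is hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Indonesian;

// "faded-scan.png" is a hypothetical file name used for illustration.
using (var input = new OcrInput("faded-scan.png"))
{
    input.Deskew();      // Straighten a tilted scan first
    input.ToGrayScale(); // Drop color information
    input.Contrast();    // Raise contrast between text and background
    input.Binarize();    // Reduce every pixel to black or white

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Each filter adds processing time, so it is usually better to start with none and add only those that measurably improve results on your own scans.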

  • CleanBackgroundNoise - Automatically cleans digital noise, creased paper, and other imperfections that hamper OCR.
  • EnhanceContrast - Boosts the contrast of text against its background, improving OCR accuracy and performance.
  • EnhanceResolution - Detects low-resolution images and enhances them for better OCR readability, at the cost of increased processing time.
  • Language - IronOCR supports many languages and provides settings for applying more than one language to an OCR operation.
  • Strategy - Chooses between two OCR strategies: a fast, less accurate one and a detailed, AI-driven one with higher text accuracy.
  • ColorSpace - Chooses grayscale or color OCR. Grayscale is preferred, but full color may improve results when text and background have similar shades.
  • DetectWhiteTextOnDarkBackgrounds - Automatically detects negative (white) text on dark backgrounds so that it is read accurately.
  • InputImageType - Tells the OCR library whether it is processing a full document or a snippet such as a screenshot.
  • RotateAndStraighten - Corrects documents with rotation and perspective issues. Especially useful for photographed text.
  • ReadBarcodes - Automatically reads barcodes and QR codes during text processing, with minimal performance impact.
  • ColorDepth - A higher color depth may improve OCR quality but can extend processing time.

126 Language Packs

IronOCR supports 126 international languages through language packs distributed as DLLs, which can be downloaded from this website or from the NuGet Package Manager.

Languages include German, French, English, Chinese, Japanese, and many more. Specialist language packs are available for passport MRZ, MICR checks, financial data, license plates, and more. You can also use Tesseract ".traineddata" files, including ones you create yourself.
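Loading a custom ".traineddata" file might look like the sketch below. It assumes IronTesseract exposes a UseCustomTesseractLanguageFile method that takes a file path, so check the object reference for the exact name and signature in your version. Both file names are placeholders.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: UseCustomTesseractLanguageFile accepts the path to a .traineddata file.
// "custom.traineddata" and "sample.png" are placeholder names.
ocr.UseCustomTesseractLanguageFile("custom.traineddata");

using (var input = new OcrInput("sample.png"))
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```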

Language Example

Using other OCR languages.

using IronOcr;
// PM> Install-Package IronOcr.Languages.Arabic

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Arabic; // Set language to Arabic

using (var input = new OcrInput())
{
    input.AddImage("img/arabic.gif"); // Add Arabic image

    var Result = Ocr.Read(input);

    // Windows console may struggle with displaying Arabic characters.
    // Save to disk instead.
    Result.SaveAsTextFile("arabic.txt");
}

The same example in VB.NET:

Imports IronOcr
' PM> Install-Package IronOcr.Languages.Arabic

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Arabic ' Set language to Arabic

Using input = New OcrInput()
	input.AddImage("img/arabic.gif") ' Add Arabic image

	Dim Result = Ocr.Read(input)

	' Windows console may struggle with displaying Arabic characters.
	' Save to disk instead.
	Result.SaveAsTextFile("arabic.txt")
End Using

Multiple Languages Example

It is also possible to perform OCR using several languages at the same time. This can really help when extracting English-language metadata and URLs from Unicode documents.

using IronOcr;
// PM> Install-Package IronOcr.Languages.ChineseSimplified
// PM> Install-Package IronOcr.Languages.Indonesian

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.ChineseSimplified; // Set primary language
Ocr.AddSecondaryLanguage(OcrLanguage.Indonesian); // Add secondary language

// Multiple languages can be added

using (var input = new OcrInput())
{
    input.Add("multi-language.pdf"); // Add multi-language PDF
    var Result = Ocr.Read(input);
    Result.SaveAsTextFile("results.txt"); // Save results to text file
}

The same example in VB.NET:

Imports IronOcr
' PM> Install-Package IronOcr.Languages.ChineseSimplified
' PM> Install-Package IronOcr.Languages.Indonesian

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.ChineseSimplified ' Set primary language
Ocr.AddSecondaryLanguage(OcrLanguage.Indonesian) ' Add secondary language

' Multiple languages can be added

Using input = New OcrInput()
	input.Add("multi-language.pdf") ' Add multi-language PDF
	Dim Result = Ocr.Read(input)
	Result.SaveAsTextFile("results.txt") ' Save results to text file
End Using

Detailed OCR Result Objects

IronOCR returns an OCR result object for every OCR operation. Developers usually use only the Text property of this object to get the text scanned from an image. However, the OCR result DOM is far more advanced than that.

using IronOcr;
using System.Drawing; // Add assembly reference for System.Drawing

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Indonesian; // Set the language to Indonesian
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
Ocr.Configuration.ReadBarCodes = true; // Enable barcode reading

using (var Input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult Result = Ocr.Read(Input);
    var Pages = Result.Pages;
    var Words = Pages[0].Words;
    var Barcodes = Result.Barcodes;
    // Explore for a detailed and extensive API:
    // - Pages, Blocks, Paragraphs, Lines, Words, Chars
    // - Extract Images, Font Coordinates, Statistical Data
}

The same example in VB.NET:

Imports IronOcr
Imports System.Drawing ' Add assembly reference for System.Drawing

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Indonesian ' Set the language to Indonesian
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Ocr.Configuration.ReadBarCodes = True ' Enable barcode reading

Using Input = New OcrInput("images\sample.tiff")
	Dim Result As OcrResult = Ocr.Read(Input)
	Dim Pages = Result.Pages
	Dim Words = Pages(0).Words
	Dim Barcodes = Result.Barcodes
	' Explore for a detailed and extensive API:
	' - Pages, Blocks, Paragraphs, Lines, Words, Chars
	' - Extract Images, Font Coordinates, Statistical Data
End Using
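To give a feel for walking the result object, here is a hedged sketch that prints every word with its position and confidence. Pages and Words appear in the example above. The PageNumber, Text, X, Y, and Confidence members are assumptions about the result DOM and should be checked against the object reference.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Indonesian;

using (var input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult result = ocr.Read(input);

    foreach (var page in result.Pages)
    {
        // Assumption: each page exposes a PageNumber property.
        Console.WriteLine($"Page {page.PageNumber}");

        foreach (var word in page.Words)
        {
            // Assumption: each word exposes Text, X, Y, and Confidence.
            Console.WriteLine($"{word.Text} at ({word.X}, {word.Y}), confidence {word.Confidence}");
        }
    }
}
```

Per-word confidence of this kind can be used to flag uncertain words for manual review instead of trusting the full text blindly.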

Performance

IronOCR works out of the box, with no need to tune performance or heavily modify input images.

Blazing speed: IronOcr 2020+ is up to 10 times faster and makes over 250% fewer errors than previous versions.

Learn More

To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of how IronOCR can be used and may show the nuances of getting the best out of this library.

A complete object reference for .NET developers is also available.