Sundanese OCR in C# and .NET

IronOCR is a C# software component that lets .NET developers read text from images and PDF documents in 126 languages, including Sundanese.

It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines for both speed and accuracy.

Contents of IronOcr.Languages.Sundanese

This package contains the following OCR language variants for .NET:

  • Sundanese
  • SundaneseBest
  • SundaneseFast

Download

Sundanese Language Pack [Basa Sunda]
* Download as Zip
* Install with NuGet: https://www.nuget.org/packages/IronOcr.Languages.Sundanese/

Installation

The first thing we need to do is install the Sundanese OCR package into your .NET project.

PM> Install-Package IronOCR.Languages.Sundanese

Code Example

This C# code example reads Sundanese text from an image or PDF document.

// Example of using IronTesseract to read text from an image with Sundanese OCR
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;
using (var Input = new OcrInput(@"images\Sundanese.png"))
{
    // Perform OCR on the input image
    var Result = Ocr.Read(Input);
    // Extract the recognized text
    var AllText = Result.Text;
    // Output the recognized text
    Console.WriteLine(AllText);
}

' Example of using IronTesseract to read text from an image with Sundanese OCR
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese
Using Input = New OcrInput("images\Sundanese.png")
    ' Perform OCR on the input image
    Dim Result = Ocr.Read(Input)
    ' Extract the recognized text
    Dim AllText = Result.Text
    ' Output the recognized text
    Console.WriteLine(AllText)
End Using

Why Choose IronOCR?

IronOCR is an easy-to-install, complete, and well-documented .NET software library.

Choose IronOCR to achieve 99.8%+ OCR accuracy without using external web services, ongoing fees, or sending confidential documents over the internet.

Why C# developers choose IronOCR over vanilla Tesseract:

  • Install as a single DLL or NuGet package
  • Includes the Tesseract 5, 4, and 3 engines out of the box
  • 99.8% accuracy, significantly outperforming regular Tesseract
  • Blazing speed with multithreading
  • Compatible with MVC, web, desktop, console, and server applications
  • No EXEs or C++ code to work with
  • Full PDF OCR support
  • Performs OCR on almost any image file or PDF
  • Full .NET Core, Standard, and Framework support
  • Deploy on Windows, Mac, Linux, Azure, Docker, Lambda, AWS
  • Read barcodes and QR codes
  • Export OCR to XHTML
  • Export OCR to searchable PDF documents
  • Multithreading support
  • 126 international languages, all managed via NuGet or OcrData files
  • Extract images, coordinates, statistics, and fonts, not just text
  • Can be used to redistribute Tesseract OCR inside commercial and proprietary applications

IronOCR shines when working with real-world images and imperfect documents such as photographs, or low-resolution scans that may contain digital noise or other imperfections.

Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.

OCR with Tesseract 5 - Start Coding in C#

The code samples below show how easy it is to read text from an image using C# or VB .NET.

One-Liner

// One-liner example to read text from an image using IronTesseract
string Text = new IronTesseract().Read(@"img\Screenshot.png").Text;

' One-liner example to read text from an image using IronTesseract
Dim Text As String = (New IronTesseract()).Read("img\Screenshot.png").Text

Configurable Hello World

// PM> Install-Package IronOCR.Languages.Sundanese
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;
using (var Input = new OcrInput())
{
    // Add image to the OCR Input
    Input.AddImage("images/sample.jpeg");
    // Perform OCR
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}

' PM> Install-Package IronOCR.Languages.Sundanese
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese
Using Input = New OcrInput()
    ' Add image to the OCR Input
    Input.AddImage("images/sample.jpeg")
    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Output the recognized text
    Console.WriteLine(Result.Text)
End Using

C# PDF OCR

The same approach can be used to extract text from any PDF document.

// Example of reading text from a PDF file using IronTesseract
using System.Linq; // provides the Count() extension used below
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;
using (var input = new OcrInput())
{
    // Add a PDF to the OCR Input
    input.AddPdf("example.pdf", "password");
    // Perform OCR
    var Result = Ocr.Read(input);

    Console.WriteLine(Result.Text);
    Console.WriteLine($"{Result.Pages.Count()} Pages");
    // 1 page for every page of the PDF
}

' Example of reading text from a PDF file using IronTesseract
Imports System.Linq
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese
Using input = New OcrInput()
    ' Add a PDF to the OCR Input
    input.AddPdf("example.pdf", "password")
    ' Perform OCR
    Dim Result = Ocr.Read(input)

    Console.WriteLine(Result.Text)
    Console.WriteLine($"{Result.Pages.Count()} Pages")
    ' 1 page for every page of the PDF
End Using

OCR for Multi-Page TIFFs

OCR can read the TIFF file format, including multi-page documents. A TIFF can also be converted directly into a PDF file with searchable text.

// Example of reading text from a multi-page TIFF using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput())
{
    // Add TIFF with multiple pages to the OCR Input
    Input.AddMultiFrameTiff("multi-frame.tiff");
    // Perform OCR
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}

' Example of reading text from a multi-page TIFF using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput()
    ' Add TIFF with multiple pages to the OCR Input
    Input.AddMultiFrameTiff("multi-frame.tiff")
    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Output the recognized text
    Console.WriteLine(Result.Text)
End Using

Barcodes and QR

A unique feature of IronOCR is that it can read barcodes and QR codes from documents while it scans for text. Instances of the OcrResult.OcrBarcode class give the developer detailed information about each scanned barcode.

// Example of reading barcodes and QR codes using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Configuration.ReadBarCodes = true;

using (var input = new OcrInput())
{
    // Add image containing barcode or QR code
    input.AddImage("img/Barcode.png");
    // Perform OCR and barcode reading
    var Result = Ocr.Read(input);
    // Iterate through barcodes and print their values
    foreach (var Barcode in Result.Barcodes)
    {
        Console.WriteLine(Barcode.Value);
        // You can also access the barcode type and location
    }
}

' Example of reading barcodes and QR codes using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Configuration.ReadBarCodes = True

Using input = New OcrInput()
    ' Add image containing barcode or QR code
    input.AddImage("img/Barcode.png")
    ' Perform OCR and barcode reading
    Dim Result = Ocr.Read(input)
    ' Iterate through barcodes and print their values
    For Each Barcode In Result.Barcodes
        Console.WriteLine(Barcode.Value)
        ' You can also access the barcode type and location
    Next Barcode
End Using

OCR on Specific Areas of Images

All of IronOCR's scanning and reading methods let us specify exactly which part of a page or pages we want to read text from. This is very useful when we are looking at standardized forms, and it can save a great deal of time and improve efficiency.

To use crop regions, we need to add a system reference to System.Drawing so that we can use the System.Drawing.Rectangle object.

// Example of reading text from a specific region of an image using IronTesseract
using IronOcr;
using System.Drawing;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput())
{
    // Define the rectangular region of interest
    var ContentArea = new Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };

    // Add image and specify the region to read
    Input.Add("document.png", ContentArea);

    // Perform OCR
    var Result = Ocr.Read(Input);
    // Output the recognized text from the specified region
    Console.WriteLine(Result.Text);
}

' Example of reading text from a specific region of an image using IronTesseract
Imports IronOcr
Imports System.Drawing

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput()
    ' Define the rectangular region of interest
    Dim ContentArea = New Rectangle() With {
        .X = 215,
        .Y = 1250,
        .Height = 280,
        .Width = 1335
    }

    ' Add image and specify the region to read
    Input.Add("document.png", ContentArea)

    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Output the recognized text from the specified region
    Console.WriteLine(Result.Text)
End Using

OCR for Low Quality Scans

The IronOCR OcrInput class can fix scans that normal Tesseract cannot read.

// Example of improving low-quality scan before OCR using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
    // Apply filters to improve image quality
    Input.DeNoise(); // Fix digital noise and poor scanning
    Input.Deskew();  // Correct rotation and perspective
    // Perform OCR
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}

' Example of improving low-quality scan before OCR using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput("img\Potter.LowQuality.tiff")
    ' Apply filters to improve image quality
    Input.DeNoise() ' Fix digital noise and poor scanning
    Input.Deskew() ' Correct rotation and perspective
    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Output the recognized text
    Console.WriteLine(Result.Text)
End Using

Export OCR Results as a Searchable PDF

Convert images to a PDF with copyable text strings. The result can be indexed by search engines and databases.

// Example of exporting OCR results as a searchable PDF using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput())
{
    // Set the title of the PDF
    Input.Title = "Quarterly Report";
    // Add images to the OCR Input
    Input.AddImage("image1.jpeg");
    Input.AddImage("image2.png");
    Input.AddImage("image3.gif");

    // Perform OCR
    var Result = Ocr.Read(Input);
    // Save results as searchable PDF
    Result.SaveAsSearchablePdf("searchable.pdf");
}

' Example of exporting OCR results as a searchable PDF using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput()
    ' Set the title of the PDF
    Input.Title = "Quarterly Report"
    ' Add images to the OCR Input
    Input.AddImage("image1.jpeg")
    Input.AddImage("image2.png")
    Input.AddImage("image3.gif")

    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Save results as searchable PDF
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using

TIFF to Searchable PDF Conversion

Convert a TIFF document (or any group of image files) directly into a searchable PDF that can be indexed by intranets, websites, and the Google search engine.

// Example of converting TIFF documents to searchable PDFs using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput())
{
    // Add TIFF documents to the OCR Input
    Input.AddMultiFrameTiff("example.tiff");
    // Perform OCR and save results as searchable PDF
    var Result = Ocr.Read(Input);
    Result.SaveAsSearchablePdf("searchable.pdf");
}

' Example of converting TIFF documents to searchable PDFs using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput()
    ' Add TIFF documents to the OCR Input
    Input.AddMultiFrameTiff("example.tiff")
    ' Perform OCR and save results as searchable PDF
    Dim Result = Ocr.Read(Input)
    Result.SaveAsSearchablePdf("searchable.pdf")
End Using

Export OCR Results as HTML

OCR image to XHTML conversion.

// Example of exporting OCR results as HTML using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;
using (var Input = new OcrInput())
{
    // Set the HTML title
    Input.Title = "Html Title";
    // Add image to the OCR Input
    Input.AddImage("image1.jpeg");
    // Perform OCR
    var Result = Ocr.Read(Input);
    // Save results as HOCR HTML file
    Result.SaveAsHocrFile("results.html");
}

' Example of exporting OCR results as HTML using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese
Using Input = New OcrInput()
    ' Set the HTML title
    Input.Title = "Html Title"
    ' Add image to the OCR Input
    Input.AddImage("image1.jpeg")
    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Save results as HOCR HTML file
    Result.SaveAsHocrFile("results.html")
End Using

OCR Image Enhancement Filters

IronOCR provides unique filters for OcrInput objects to improve OCR performance.

Image Enhancement Code Example

Improving the quality of OCR input images produces better, faster OCR results.

// Example of pre-processing images to improve OCR results using IronTesseract
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;

using (var Input = new OcrInput(@"LowQuality.jpeg"))
{
    // Apply filters to enhance image quality
    Input.DeNoise(); // Fix digital noise and poor scanning
    Input.Deskew();  // Correct rotation and perspective
    // Perform OCR
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}

' Example of pre-processing images to improve OCR results using IronTesseract
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese

Using Input = New OcrInput("LowQuality.jpeg")
    ' Apply filters to enhance image quality
    Input.DeNoise() ' Fix digital noise and poor scanning
    Input.Deskew() ' Correct rotation and perspective
    ' Perform OCR
    Dim Result = Ocr.Read(Input)
    ' Output the recognized text
    Console.WriteLine(Result.Text)
End Using

List of OCR Image Filters

Input filters built into IronOCR to improve OCR performance include:

  • OcrInput.Rotate(double degrees) - Rotates the image by a number of degrees clockwise. For anti-clockwise rotation, use a negative number.
  • OcrInput.Binarize() - Turns every pixel either black or white, with no middle ground. May improve OCR performance where the contrast between text and background is very low.
  • OcrInput.ToGrayScale() - Turns every pixel into a shade of gray. Unlikely to improve OCR accuracy, but may improve speed.
  • OcrInput.Contrast() - Increases contrast automatically. This filter often improves OCR speed and accuracy on low-contrast scans.
  • OcrInput.DeNoise() - Removes digital noise. This filter should only be used where noise is expected.
  • OcrInput.Invert() - Inverts every color, e.g. white becomes black and black becomes white.
  • OcrInput.Dilate() - Advanced morphology. Dilation adds pixels to the boundaries of objects in an image. Opposite of Erode.
  • OcrInput.Erode() - Advanced morphology. Erosion removes pixels on object boundaries. Opposite of Dilate.
  • OcrInput.Deskew() - Rotates the image so that it is the right way up and orthogonal. This is very useful for OCR because Tesseract's tolerance for skewed scans can be as low as 5 degrees.
  • OcrInput.DeepCleanBackgroundNoise() - Removes heavy background noise. Only use this filter when extreme background noise is known to be present, because it also risks reducing OCR accuracy on clean documents and is very CPU-intensive.
  • OcrInput.EnhanceResolution - Enhances the resolution of low-quality images. This filter is not often needed because OcrInput.MinimumDPI and OcrInput.TargetDPI automatically catch and resolve low-resolution inputs.
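
As a rough illustration of how several of the filters listed above might be combined, the sketch below chains Deskew, Contrast, and Binarize before reading. It uses only method names that appear in this document. The file name faded-scan.png and the order of the filters are assumptions made for the example, not a recommendation from IronOCR.

```csharp
// A minimal sketch, assuming the OcrInput filter methods listed above.
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Sundanese;

// "faded-scan.png" is a hypothetical file name used only for illustration.
using (var input = new OcrInput(@"faded-scan.png"))
{
    input.Deskew();    // straighten the page so the text lines are horizontal
    input.Contrast();  // raise contrast on a low-contrast scan
    input.Binarize();  // force every pixel to black or white

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Because some filters are CPU-intensive or can reduce accuracy on clean documents, as the list notes, it is probably best to add them one at a time and compare the output after each change.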

CleanBackgroundNoise. This is a somewhat time-consuming setting; however, it allows the library to automatically clean digital noise, paper crumples, and other imperfections in a digital image that would otherwise make it unreadable by other OCR libraries.

EnhanceContrast is a setting that causes IronOCR to automatically increase the contrast of text against the background of an image, which increases OCR accuracy and generally improves OCR performance and speed.

EnhanceResolution is a setting that automatically detects low-resolution images (those under 275 dpi), upscales them, and then sharpens all of the text so that it can be read perfectly by an OCR library. Although this operation is itself time-consuming, it generally reduces the overall time for an OCR operation on an image.

Language. IronOCR supports 126 international languages through its language packs, and the language setting can be used to select one or more languages to apply to an OCR operation.

Strategy. IronOCR supports two strategies. We can choose a fast and less accurate scan of a document, or use an advanced strategy that uses artificial intelligence models to automatically improve the accuracy of OCR text by looking at the statistical relationship of words to one another in a sentence.

ColorSpace is a setting that lets us choose to OCR in grayscale or in color. Generally, grayscale is the best option. However, when text and background have a similar hue but very different colors, a full-color color space will sometimes give better results.

DetectWhiteTextOnDarkBackgrounds. Generally, all OCR libraries expect to see black text on a white background. This setting allows IronOCR to automatically detect negatives, or dark pages with white text, and read them.

InputImageType. This setting allows the developer to tell the OCR library whether it is looking at a full document or a snippet, such as a screenshot.

RotateAndStraighten is an advanced setting that gives IronOCR the unique ability to read documents that are not only rotated but may also contain perspective distortion, such as photographs of text documents.

ReadBarcodes is a useful feature that allows IronOCR to automatically read barcodes and QR codes on a page as it reads the text, without adding a large additional time burden.

ColorDepth. This setting determines how many bits per pixel the OCR library uses to determine the depth of a color. A higher color depth may increase OCR quality, but it will also increase the time required for the OCR operation to complete.

126 Language Packs

IronOCR supports 126 international languages via language packs, which are distributed as DLLs and can be downloaded from this website or from the NuGet Package Manager.

Languages include German, French, English, Chinese, Japanese, and many more. Specialist language packs exist for passport MRZ, MICR cheques, financial data, license plates, and more. You can also use any Tesseract ".traineddata" file, including ones you have created yourself.

Language Example

Using other OCR languages.

// Example of using IronTesseract to read Arabic text from an image
using IronOcr;

// PM> Install-Package IronOcr.Languages.Arabic

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Arabic;

using (var input = new OcrInput())
{
    // Add image containing Arabic text
    input.AddImage("img/arabic.gif");
    // Perform OCR
    var Result = Ocr.Read(input);

    // The console cannot easily print Arabic on Windows.
    // Let's save it to disk instead.
    Result.SaveAsTextFile("arabic.txt");
}

' Example of using IronTesseract to read Arabic text from an image
Imports IronOcr

' PM> Install-Package IronOcr.Languages.Arabic

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Arabic

Using input = New OcrInput()
    ' Add image containing Arabic text
    input.AddImage("img/arabic.gif")
    ' Perform OCR
    Dim Result = Ocr.Read(input)

    ' The console cannot easily print Arabic on Windows.
    ' Let's save it to disk instead.
    Result.SaveAsTextFile("arabic.txt")
End Using

Multiple Language Example

You can also run OCR using multiple languages at the same time. This can really help with English-language metadata and URLs in Unicode documents.

// Example of using IronTesseract with multiple languages for OCR
using IronOcr;

// PM> Install-Package IronOcr.Languages.ChineseSimplified

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.ChineseSimplified;
Ocr.AddSecondaryLanguage(OcrLanguage.Sundanese);

// We can add any number of languages

using (var input = new OcrInput())
{
    input.Add("multi-language.pdf");
    // Perform OCR
    var Result = Ocr.Read(input);
    // Save results as text file
    Result.SaveAsTextFile("results.txt");
}

' Example of using IronTesseract with multiple languages for OCR
Imports IronOcr

' PM> Install-Package IronOcr.Languages.ChineseSimplified

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.ChineseSimplified
Ocr.AddSecondaryLanguage(OcrLanguage.Sundanese)

' We can add any number of languages

Using input = New OcrInput()
    input.Add("multi-language.pdf")
    ' Perform OCR
    Dim Result = Ocr.Read(input)
    ' Save results as text file
    Result.SaveAsTextFile("results.txt")
End Using

Detailed OCR Results Objects

IronOCR returns an OCR result object for each OCR operation. Generally, developers only use the Text property of this object to get the text scanned from the image. However, the OCR results DOM is far more advanced than this.

// Example of accessing detailed OCR results using IronTesseract
using IronOcr;
using System.Drawing; // Add Assembly Reference

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Sundanese;
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
Ocr.Configuration.ReadBarCodes = true; //! Important

using (var Input = new OcrInput(@"images\sample.tiff"))
{
    // Perform OCR
    OcrResult Result = Ocr.Read(Input);
    // Access different aspects of OCR results
    var Pages = Result.Pages;
    var Words = Pages[0].Words;
    var Barcodes = Result.Barcodes;
    // Explore here to find a massive, detailed API:
    // - Pages, Blocks, Paragraphs, Lines, Words, Chars
    // - Image Export, Font Coordinates, Statistical Data
}

' Example of accessing detailed OCR results using IronTesseract
Imports IronOcr
Imports System.Drawing ' Add Assembly Reference

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Sundanese
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Ocr.Configuration.ReadBarCodes = True '! Important

Using Input = New OcrInput("images\sample.tiff")
    ' Perform OCR
    Dim Result As OcrResult = Ocr.Read(Input)
    ' Access different aspects of OCR results
    Dim Pages = Result.Pages
    Dim Words = Pages(0).Words
    Dim Barcodes = Result.Barcodes
    ' Explore here to find a massive, detailed API:
    ' - Pages, Blocks, Paragraphs, Lines, Words, Chars
    ' - Image Export, Font Coordinates, Statistical Data
End Using
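
To give a sense of how that result tree might be walked, here is a minimal sketch that prints each recognized word, page by page. Pages and Words come from the example above. The Text and Confidence properties on each word are assumptions about the result API and should be checked against the object reference before use.

```csharp
// A minimal sketch, assuming each word exposes Text and Confidence properties.
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Sundanese;

using (var input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult result = ocr.Read(input);

    // Walk every page, then every word on that page.
    foreach (var page in result.Pages)
    {
        foreach (var word in page.Words)
        {
            // Text and Confidence are assumed property names (see lead-in).
            Console.WriteLine($"{word.Text}\t{word.Confidence}");
        }
    }
}
```

A loop like this could be a starting point for filtering out low-confidence words before further processing.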

Performance

IronOCR works out of the box, with no need to performance-tune or heavily modify input images.

Blazing speed: IronOcr.2020+ is up to 10 times faster and makes over 250% fewer errors than previous builds.

Learn More

To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of how IronOCR can be used and show the nuances of how to get the best out of this library.

A full object reference for .NET developers is also available.