Swahili OCR in C# and .NET

IronOCR is a C# software component that allows .NET code to read text from images and PDF documents in 126 languages, including Swahili.

It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines for both speed and accuracy.

Contents of IronOcr.Languages.Swahili

This package contains 3 OCR language models for .NET:

  • Swahili
  • SwahiliBest
  • SwahiliFast

Download

Swahili Language Pack [Kiswahili]

  • Download as Zip
  • Install with NuGet

Installation

The first step is to install the Swahili OCR language package in your .NET project.

# Install via the NuGet Package Manager
Install-Package IronOcr.Languages.Swahili

Code Example

This C# code example reads Swahili text from an image or PDF document.

// Install the Swahili language package using the Package Manager
// PM> Install-Package IronOcr.Languages.Swahili
using IronOcr;

// Create a new IronTesseract instance for OCR processing
var Ocr = new IronTesseract();
// Set the language for OCR to Swahili
Ocr.Language = OcrLanguage.Swahili;

// Wrap OCR operations in a 'using' statement to ensure resources are disposed
using (var Input = new OcrInput(@"images\Swahili.png"))
{
    // Perform OCR on the input image and read the result
    var Result = Ocr.Read(Input);
    // Extract all recognized text
    var AllText = Result.Text;
    // Output the recognized text
    Console.WriteLine(AllText);
}
' Install the Swahili language package using the Package Manager
' PM> Install-Package IronOcr.Languages.Swahili
Imports IronOcr

' Create a new IronTesseract instance for OCR processing
Dim Ocr = New IronTesseract()
' Set the language for OCR to Swahili
Ocr.Language = OcrLanguage.Swahili

' Wrap OCR operations in a 'using' statement to ensure resources are disposed
Using Input = New OcrInput("images\Swahili.png")
	' Perform OCR on the input image and read the result
	Dim Result = Ocr.Read(Input)
	' Extract all recognized text
	Dim AllText = Result.Text
	' Output the recognized text
	Console.WriteLine(AllText)
End Using

Why Choose IronOCR?

IronOCR is simple to install, complete and well documented.

Choose IronOCR to achieve 99.8%+ OCR accuracy without using any external web services, ongoing fees or sending confidential documents over the internet.

Why C# developers choose IronOCR over vanilla Tesseract:

  • Install as a single DLL or NuGet package
  • Includes Tesseract 5, 4 and 3 engines out of the box
  • 99.8% accuracy outperforms regular Tesseract
  • Blazing speed and multithreading
  • Compatible with MVC, WebApp, Desktop, Console and Server applications
  • No EXEs or C++ code to work with
  • Full PDF OCR support
  • Perform OCR on almost any image file or PDF
  • Full .NET Core, Standard and Framework support
  • Deploy on Windows, Mac, Linux, Azure, Docker, Lambda, AWS
  • Read barcodes and QR codes
  • Export OCR as XHTML
  • Export OCR to searchable PDF documents
  • Multithreading support
  • 126 international languages, all managed via NuGet or OcrData files
  • Extract images, coordinates, statistics and fonts, not just text
  • Can be used to redistribute Tesseract OCR inside commercial and proprietary applications

IronOCR shines when working with real-world images and imperfect documents such as photographs, or low-resolution scans that may have digital noise or imperfections.

Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.

OCR with Tesseract 5 - Start Coding in C#

The code sample below shows how easy it is to read text from an image using C# or VB .NET.

OneLiner

// Create a new IronTesseract and read text from an image in a single line
string Text = new IronTesseract().Read(@"img\Screenshot.png").Text;
' Create a new IronTesseract and read text from an image in a single line
Dim Text As String = (New IronTesseract()).Read("img\Screenshot.png").Text

Configurable Hello World

// PM> Install-Package IronOCR.Languages.Swahili
using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

// Use a bulk image input procedure 
using (var Input = new OcrInput())
{
    // Add a sample image to the input
    Input.AddImage("images/sample.jpeg");
    // Perform OCR on any number of images
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}
' PM> Install-Package IronOCR.Languages.Swahili
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

' Use a bulk image input procedure 
Using Input = New OcrInput()
	' Add a sample image to the input
	Input.AddImage("images/sample.jpeg")
	' Perform OCR on any number of images
	Dim Result = Ocr.Read(Input)
	' Output the recognized text
	Console.WriteLine(Result.Text)
End Using

C# PDF OCR

The same approach can also be used to extract text from any PDF document.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

// Create new OcrInput to process a PDF file
using (var input = new OcrInput())
{
    // Adding a PDF file for OCR
    input.AddPdf("example.pdf", "password");
    // We can also select specific PDF page numbers to OCR

    var Result = Ocr.Read(input);

    Console.WriteLine(Result.Text);
    // Output the number of pages processed
    Console.WriteLine($"{Result.Pages.Count()} Pages");
    // Note: One Result object handles all pages
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

' Create new OcrInput to process a PDF file
Using input = New OcrInput()
	' Adding a PDF file for OCR
	input.AddPdf("example.pdf", "password")
	' We can also select specific PDF page numbers to OCR

	Dim Result = Ocr.Read(input)

	Console.WriteLine(Result.Text)
	' Output the number of pages processed
	Console.WriteLine($"{Result.Pages.Count()} Pages")
	' Note: One Result object handles all pages
End Using

OCR for Multi-Page TIFFs

IronOCR can read the TIFF file format, including multi-page documents. A TIFF can also be converted directly to a PDF file with searchable text.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var Input = new OcrInput())
{
    // Add a multi-frame TIFF image
    Input.AddMultiFrameTiff("multi-frame.tiff");
    var Result = Ocr.Read(Input);
    // Output recognized text
    Console.WriteLine(Result.Text);
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput()
	' Add a multi-frame TIFF image
	input.AddMultiFrameTiff("multi-frame.tiff")
	Dim Result = Ocr.Read(Input)
	' Output recognized text
	Console.WriteLine(Result.Text)
End Using

Barcodes and QR Codes

A unique feature of IronOCR is that it can read barcodes and QR codes from documents while it scans for text. Instances of the OcrResult.OcrBarcode class give the developer detailed information about each scanned barcode.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Configuration.ReadBarCodes = true;

using (var input = new OcrInput())
{
    input.AddImage("img/Barcode.png");
    var Result = Ocr.Read(input);

    // Iterate and print all barcode values found in the image
    foreach (var Barcode in Result.Barcodes)
    {
        Console.WriteLine(Barcode.Value);
        // Type and location properties are also exposed
    }
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Configuration.ReadBarCodes = True

Using input = New OcrInput()
	input.AddImage("img/Barcode.png")
	Dim Result = Ocr.Read(input)

	' Iterate and print all barcode values found in the image
	For Each Barcode In Result.Barcodes
		Console.WriteLine(Barcode.Value)
		' Type and location properties are also exposed
	Next Barcode
End Using

OCR on Specific Areas of Images

All of IronOCR's scanning and reading methods let us specify exactly which part of a page or pages we want to read text from. This is very useful when working with standardized forms, and it can save a lot of time and improve efficiency.

To use crop regions, we need to add a reference to System.Drawing so that we can use the System.Drawing.Rectangle object.

using IronOcr;
using System.Drawing; // Ensure using System.Drawing for Rectangle

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili; 

using (var Input = new OcrInput())
{
    // Define a specific rectangular area for OCR
    var ContentArea = new Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };

    // Pass the specific area, not the whole image
    Input.Add("document.png", ContentArea);

    var Result = Ocr.Read(Input);
    // Output the text from the specific content area
    Console.WriteLine(Result.Text);
}
Imports IronOcr
Imports System.Drawing ' Ensure using System.Drawing for Rectangle

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput()
	' Define a specific rectangular area for OCR
	Dim ContentArea = New Rectangle() With {
		.X = 215,
		.Y = 1250,
		.Height = 280,
		.Width = 1335
	}

	' Pass the specific area, not the whole image
	Input.Add("document.png", ContentArea)

	Dim Result = Ocr.Read(Input)
	' Output the text from the specific content area
	Console.WriteLine(Result.Text)
End Using

OCR for Low Quality Scans

The IronOCR OcrInput class can fix scans that normal Tesseract cannot read.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
    // Apply denoising filter to reduce noise
    Input.DeNoise();
    // Apply deskewing filter to correct orientation
    Input.Deskew();
    var Result = Ocr.Read(Input);
    // Output the recognized text
    Console.WriteLine(Result.Text);
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput("img\Potter.LowQuality.tiff")
	' Apply denoising filter to reduce noise
	Input.DeNoise()
	' Apply deskewing filter to correct orientation
	Input.Deskew()
	Dim Result = Ocr.Read(Input)
	' Output the recognized text
	Console.WriteLine(Result.Text)
End Using

Export OCR Results as a Searchable PDF

Convert images to a PDF with copyable text strings. The result can be indexed by search engines and databases.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var input = new OcrInput())
{
    // Assign a title to the output PDF
    input.Title = "Quarterly Report";
    // Add multiple images for combined PDF output
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");
    input.AddImage("image3.gif");

    var Result = Ocr.Read(input);
    // Save the output as a searchable PDF
    Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput()
	' Assign a title to the output PDF
	input.Title = "Quarterly Report"
	' Add multiple images for combined PDF output
	input.AddImage("image1.jpeg")
	input.AddImage("image2.png")
	input.AddImage("image3.gif")

	Dim Result = Ocr.Read(input)
	' Save the output as a searchable PDF
	Result.SaveAsSearchablePdf("searchable.pdf")
End Using

TIFF to Searchable PDF Conversion

Convert a TIFF document (or any group of image files) directly into a searchable PDF that can be indexed by intranet, website and Google search engines.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var input = new OcrInput())
{
    // Add all frames of a TIFF image
    input.AddMultiFrameTiff("example.tiff");
    // Convert all frames to a searchable PDF
    var Result = Ocr.Read(input);
    Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput()
	' Add all frames of a TIFF image
	input.AddMultiFrameTiff("example.tiff")
	' Convert all frames to a searchable PDF
	Dim Result = Ocr.Read(input)
	Result.SaveAsSearchablePdf("searchable.pdf")
End Using

Export OCR Results as HTML

OCR image to XHTML conversion.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var input = new OcrInput())
{
    // Assign a title for the HTML output
    input.Title = "Html Title";
    // Add an image for OCR conversion
    input.AddImage("image1.jpeg");
    var Result = Ocr.Read(input);
    // Save the output as an HTML file
    Result.SaveAsHocrFile("results.html");
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput()
	' Assign a title for the HTML output
	input.Title = "Html Title"
	' Add an image for OCR conversion
	input.AddImage("image1.jpeg")
	Dim Result = Ocr.Read(input)
	' Save the output as an HTML file
	Result.SaveAsHocrFile("results.html")
End Using

OCR Image Enhancement Filters

IronOCR provides unique filters for OcrInput objects to improve OCR performance.

Image Enhancement Code Example

Improves the quality of OCR input images to produce better, faster OCR results.

using IronOcr;

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

using (var Input = new OcrInput(@"LowQuality.jpeg"))
{
    // Apply image filters to improve OCR
    Input.DeNoise(); // reduces digital noise
    Input.Deskew();  // corrects skew or tilt
    var Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text); // Output the improved OCR text
}
Imports IronOcr

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

Using Input = New OcrInput("LowQuality.jpeg")
	' Apply image filters to improve OCR
	Input.DeNoise() ' reduces digital noise
	Input.Deskew() ' corrects skew or tilt
	Dim Result = Ocr.Read(Input)
	Console.WriteLine(Result.Text) ' Output the improved OCR text
End Using

List of OCR Image Filters

Input filters built into IronOCR to enhance OCR performance include:

  • OcrInput.Rotate(degrees) - Rotates the image by a specified number of degrees clockwise. Use a negative number for counter-clockwise.
  • OcrInput.Binarize() - Converts each pixel to black or white, improving contrasting text on backgrounds.
  • OcrInput.ToGrayScale() - Converts each pixel to grayscale. It can improve speed, not necessarily accuracy.
  • OcrInput.Contrast() - Automatically enhances image contrast. Improves speed and accuracy.
  • OcrInput.DeNoise() - Removes digital noise from image, should be used when noise is expected.
  • OcrInput.Invert() - Inverts colors; white becomes black and black becomes white.
  • OcrInput.Dilate() - Advanced morphology; expands edges, opposite of erode.
  • OcrInput.Erode() - Advanced morphology; contracts edges, opposite of dilate.
  • OcrInput.Deskew() - Corrects image orientation to upright, essential for OCR on skewed scans.
  • OcrInput.DeepCleanBackgroundNoise() - Removes heavy background noise at performance cost.
  • OcrInput.EnhanceResolution() - Increases detail in low-resolution images.
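As a rough sketch, several of the filters listed above might be chained on one OcrInput before reading. The file name `skewed-scan.png` is hypothetical, and the choice and order of filters here is an assumption made for illustration, not a recommendation from IronOCR; which combination helps depends on the image.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Swahili;

// "skewed-scan.png" is a hypothetical file name used only for illustration
using (var input = new OcrInput(@"skewed-scan.png"))
{
    input.Deskew();    // straighten a tilted scan
    input.DeNoise();   // remove digital noise
    input.Binarize();  // convert every pixel to black or white

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Each filter adds processing time, so it is usually worth applying only the ones a given batch of scans actually needs.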

CleanBackgroundNoise - Clears digital interference and paper debris from the image, aiding OCR accuracy.

EnhanceContrast - Improves text contrast against its background for better OCR speed and performance.

EnhanceResolution - Detects and corrects low-resolution images to ensure readable OCR text.

Language - Supports 126 international language packs, each of which can be toggled for the OCR operation.

Strategy - Chooses between fast scanning and optimized accuracy using AI models.

ColourSpace - Determines whether OCR runs in grayscale or color, which affects the speed and accuracy of the results.

DetectWhiteTextOnDarkBackgrounds - Allows OCR to detect inversely colored text against a dark backdrop.

InputImageType - Guides the OCR operation to scan whole documents or specific segments better.

RotateAndStraighten - Handles perspective distortions in images.

ReadBarcode - Integrates barcode and QR code reading alongside text without added processing time.

ColorDepth - Sets the pixel sampling rate; a higher depth can improve OCR quality but also slows the operation.

126 Language Packs

IronOCR supports 126 international languages via language packs distributed as DLLs, which can be downloaded from this website or from the NuGet Package Manager.

Languages include German, French, English, Chinese, Japanese and many more. Specialist language packs exist for passport MRZ, MICR checks, financial data, license plates and many more. You can also use any Tesseract ".traineddata" file, including ones you create yourself.
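A minimal sketch of loading a custom ".traineddata" file follows. It assumes the installed IronOCR version exposes a `UseCustomTesseractLanguageFile` method on `IronTesseract`; check the API reference for your version before relying on it. The paths `custom.traineddata` and `images\sample.png` are hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: UseCustomTesseractLanguageFile is available in the installed version.
// "custom.traineddata" is a hypothetical path to a Tesseract language file.
ocr.UseCustomTesseractLanguageFile("custom.traineddata");

// "images\sample.png" is a hypothetical input image
using (var input = new OcrInput(@"images\sample.png"))
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```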

Language Example

Using other OCR languages.

using IronOcr;

// PM> Install-Package IronOcr.Languages.Arabic

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Arabic;

using (var input = new OcrInput())
{
    // Add an Arabic image for OCR processing
    input.AddImage("img/arabic.gif");
    // Optional: Add image filters
    // For low-quality input scenarios
    // that typical Tesseract may struggle with, IronOcr can process effectively

    var Result = Ocr.Read(input);

    // Since Windows might not display Arabic well, save it to disk instead
    Result.SaveAsTextFile("arabic.txt");
}
Imports IronOcr

' PM> Install-Package IronOcr.Languages.Arabic

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Arabic

Using input = New OcrInput()
	' Add an Arabic image for OCR processing
	input.AddImage("img/arabic.gif")
	' Optional: Add image filters
	' For low-quality input scenarios
	' that typical Tesseract may struggle with, IronOcr can process effectively

	Dim Result = Ocr.Read(input)

	' Since Windows might not display Arabic well, save it to disk instead
	Result.SaveAsTextFile("arabic.txt")
End Using

Multiple Language Example

It is also possible to OCR using multiple languages at the same time. This can help to capture English-language metadata and URLs in Unicode documents.

using IronOcr;

// PM> Install-Package IronOcr.Languages.ChineseSimplified

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.ChineseSimplified;
// Add Swahili as a secondary language for OCR
Ocr.AddSecondaryLanguage(OcrLanguage.Swahili);

// You can add any number of supported languages

using (var input = new OcrInput())
{
    // Add a multi-language PDF
    input.Add("multi-language.pdf");
    var Result = Ocr.Read(input);
    // Save the recognized text to a file
    Result.SaveAsTextFile("results.txt");
}
Imports IronOcr

' PM> Install-Package IronOcr.Languages.ChineseSimplified

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.ChineseSimplified
' Add Swahili as a secondary language for OCR
Ocr.AddSecondaryLanguage(OcrLanguage.Swahili)

' You can add any number of supported languages

Using input = New OcrInput()
	' Add a multi-language PDF
	input.Add("multi-language.pdf")
	Dim Result = Ocr.Read(input)
	' Save the recognized text to a file
	Result.SaveAsTextFile("results.txt")
End Using

Detailed OCR Results Objects

IronOCR returns an OCR result object for each OCR operation. Generally, developers only use the Text property of this object to get the text scanned from the image. However, the OCR results DOM is much more advanced than this.

using IronOcr;
using System.Drawing; // Add System.Drawing reference for extra processing

var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;

// Configure advanced options like engine mode and barcode reading
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
Ocr.Configuration.ReadBarCodes = true; // Essential for barcode reading

using (var Input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult Result = Ocr.Read(Input);

    // Access detailed API objects for insight
    var Pages = Result.Pages; // A collection of all pages
    var Words = Pages[0].Words; // Words from the first page
    var Barcodes = Result.Barcodes; // Detected barcodes in pages

    // Explore an extensive API for:
    // - Pages, Blocks, Paragraphs, Lines, Words, Characters
    // - Image output, Coordinates, Font metadata, Statistical data
}
Imports IronOcr
Imports System.Drawing ' Add System.Drawing reference for extra processing

Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili

' Configure advanced options like engine mode and barcode reading
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Ocr.Configuration.ReadBarCodes = True ' Essential for barcode reading

Using Input = New OcrInput("images\sample.tiff")
	Dim Result As OcrResult = Ocr.Read(Input)

	' Access detailed API objects for insight
	Dim Pages = Result.Pages ' A collection of all pages
	Dim Words = Pages(0).Words ' Words from the first page
	Dim Barcodes = Result.Barcodes ' Detected barcodes in pages

	' Explore an extensive API for:
	' - Pages, Blocks, Paragraphs, Lines, Words, Characters
	' - Image output, Coordinates, Font metadata, Statistical data
End Using
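To make the result tree more concrete, the sketch below walks the pages and words of a result and prints each word. `Pages` and `Words` appear in the example above; the `Text` property on each word is an assumption that should be confirmed against the object reference for your IronOCR version.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Swahili;

using (var input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult result = ocr.Read(input);

    // Walk the result tree: each page exposes its recognized words
    foreach (var page in result.Pages)
    {
        foreach (var word in page.Words)
        {
            // Assumption: each word exposes its recognized string as Text
            Console.WriteLine(word.Text);
        }
    }
}
```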

Performance

IronOCR works out of the box with no need to tune performance or heavily modify input images.

Speed is blazing: IronOcr.2020+ is up to 10 times faster and makes over 250% fewer errors than previous builds.

Learn More

To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of how IronOCR can be used and show the nuances of getting the best out of this library.

A full object reference for .NET developers is also available.