Swahili OCR in C# and .NET
IronOCR is a C# software component that lets .NET code read text from images and PDF documents in 126 languages, including Swahili.
It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.
Contents of IronOcr.Languages.Swahili
This package contains 3 OCR language variants for .NET:
- Swahili
- SwahiliBest
- SwahiliFast
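The variants listed above can be selected in code. The following is a minimal sketch, assuming the language package is already installed in the project and that the Best and Fast variants are exposed on OcrLanguage under the names shown in the list, in the same way as OcrLanguage.Swahili. The image path is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: SwahiliBest and SwahiliFast are members of OcrLanguage,
// named exactly as in the package contents list above.
// In Tesseract's own model sets, "best" models favour accuracy and
// "fast" models favour speed; the same trade-off is assumed here.
ocr.Language = OcrLanguage.SwahiliBest;

using (var input = new OcrInput(@"images\Swahili.png")) // placeholder path
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Swapping in OcrLanguage.SwahiliFast on the same line is the only change needed to compare the two variants on your own scans.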
Download
Swahili Language Pack [Kiswahili]
- Download as Zip
- Install with NuGet
Installation
The first thing we need to do is install the Swahili OCR package into your .NET project.
# Install via the NuGet Package Manager
Install-Package IronOcr.Languages.Swahili
Code Example
This C# example reads Swahili text from an image or PDF document.
// Install the Swahili language package using the Package Manager
// PM> Install-Package IronOcr.Languages.Swahili
using IronOcr;
// Create a new IronTesseract instance for OCR processing
var Ocr = new IronTesseract();
// Set the language for OCR to Swahili
Ocr.Language = OcrLanguage.Swahili;
// Wrap OCR operations in a 'using' statement to ensure resources are disposed
using (var Input = new OcrInput(@"images\Swahili.png"))
{
// Perform OCR on the input image and read the result
var Result = Ocr.Read(Input);
// Extract all recognized text
var AllText = Result.Text;
// Output the recognized text
Console.WriteLine(AllText);
}
' Install the Swahili language package using the Package Manager
' PM> Install-Package IronOcr.Languages.Swahili
Imports IronOcr
' Create a new IronTesseract instance for OCR processing
Dim Ocr = New IronTesseract()
' Set the language for OCR to Swahili
Ocr.Language = OcrLanguage.Swahili
' Wrap OCR operations in a 'using' statement to ensure resources are disposed
Using Input = New OcrInput("images\Swahili.png")
' Perform OCR on the input image and read the result
Dim Result = Ocr.Read(Input)
' Extract all recognized text
Dim AllText = Result.Text
' Output the recognized text
Console.WriteLine(AllText)
End Using
Why Choose IronOCR?
IronOCR is an easy-to-install, complete and well-documented library.
Choose IronOCR to achieve 99.8%+ OCR accuracy without using any external web services, paying ongoing fees or sending confidential documents over the internet.
Why C# developers choose IronOCR over vanilla Tesseract:
- Install as a single DLL or NuGet package
- Includes the Tesseract 5, 4 and 3 engines out of the box.
- 99.8% accuracy, outperforming regular Tesseract.
- Blazing speed and multithreading
- Compatible with MVC, web, desktop, console and server applications
- No EXEs or C++ code to work with
- Full PDF OCR support
- Performs OCR on almost any image file or PDF
- Full .NET Core, Standard and Framework support
- Deploy on Windows, Mac, Linux, Azure, Docker, Lambda, AWS
- Read barcodes and QR codes
- Export OCR as XHTML
- Export OCR to searchable PDF documents
- Multithreading support
- 126 international languages, all managed via NuGet or OcrData files
- Extract images, coordinates, statistics and fonts, not just text.
- Can be used to redistribute Tesseract OCR inside commercial and proprietary applications.
IronOCR shines when working with real-world images and imperfect documents such as photographs, or low-resolution scans that may contain digital noise or imperfections.
Other free OCR libraries for the .NET platform, as well as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.
OCR with Tesseract 5 - Start Coding in C#
The code sample below shows how easy it is to read text from an image using C# or VB .NET.
OneLiner
// Create a new IronTesseract and read text from an image in a single line
string Text = new IronTesseract().Read(@"img\Screenshot.png").Text;
' Create a new IronTesseract and read text from an image in a single line
Dim Text As String = (New IronTesseract()).Read("img\Screenshot.png").Text
Configurable Hello World
// PM> Install-Package IronOCR.Languages.Swahili
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
// Use a bulk image input procedure
using (var Input = new OcrInput())
{
// Add a sample image to the input
Input.AddImage("images/sample.jpeg");
// Perform OCR on any number of images
var Result = Ocr.Read(Input);
// Output the recognized text
Console.WriteLine(Result.Text);
}
' PM> Install-Package IronOCR.Languages.Swahili
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
' Use a bulk image input procedure
Using Input = New OcrInput()
' Add a sample image to the input
Input.AddImage("images/sample.jpeg")
' Perform OCR on any number of images
Dim Result = Ocr.Read(Input)
' Output the recognized text
Console.WriteLine(Result.Text)
End Using
C# PDF OCR
The same approach can also be used to extract text from any PDF document.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
// Create new OcrInput to process a PDF file
using (var input = new OcrInput())
{
// Adding a PDF file for OCR
input.AddPdf("example.pdf", "password");
// We can also select specific PDF page numbers for OCR
var Result = Ocr.Read(input);
Console.WriteLine(Result.Text);
// Output the number of pages processed
Console.WriteLine($"{Result.Pages.Count()} Pages");
// Note: One Result object handles all pages
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
' Create new OcrInput to process a PDF file
Using input = New OcrInput()
' Adding a PDF file for OCR
input.AddPdf("example.pdf", "password")
' We can also select specific PDF page numbers for OCR
Dim Result = Ocr.Read(input)
Console.WriteLine(Result.Text)
' Output the number of pages processed
Console.WriteLine($"{Result.Pages.Count()} Pages")
' Note: One Result object handles all pages
End Using
OCR for Multipage TIFFs
IronOCR can read the TIFF file format, including multi-page documents. TIFF files can also be converted directly into a PDF file with searchable text.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var Input = new OcrInput())
{
// Add a multi-frame TIFF image
Input.AddMultiFrameTiff("multi-frame.tiff");
var Result = Ocr.Read(Input);
// Output recognized text
Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput()
' Add a multi-frame TIFF image
input.AddMultiFrameTiff("multi-frame.tiff")
Dim Result = Ocr.Read(Input)
' Output recognized text
Console.WriteLine(Result.Text)
End Using
Barcodes and QR Codes
A unique feature of IronOCR is that it can read barcodes and QR codes from documents while it scans for text. Instances of the OcrResult.OcrBarcode class give the developer detailed information about each scanned barcode.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Configuration.ReadBarCodes = true;
using (var input = new OcrInput())
{
input.AddImage("img/Barcode.png");
var Result = Ocr.Read(input);
// Iterate and print all barcode values found in the image
foreach (var Barcode in Result.Barcodes)
{
Console.WriteLine(Barcode.Value);
// Type and location properties are also exposed
}
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Configuration.ReadBarCodes = True
Using input = New OcrInput()
input.AddImage("img/Barcode.png")
Dim Result = Ocr.Read(input)
' Iterate and print all barcode values found in the image
For Each Barcode In Result.Barcodes
Console.WriteLine(Barcode.Value)
' Type and location properties are also exposed
Next Barcode
End Using
OCR on Specific Regions of Images
All of IronOCR's scanning and reading methods let us specify exactly which part of a page or pages we want to read text from. This is very useful when working with standardized forms, and it can save a great deal of time and improve efficiency.
To use crop regions, we need to add a reference to System.Drawing so that we can use the System.Drawing.Rectangle object.
using IronOcr;
using System.Drawing; // Ensure using System.Drawing for Rectangle
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var Input = new OcrInput())
{
// Define a specific rectangular area for OCR
var ContentArea = new Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
// Pass the specific area, not the whole image
Input.Add("document.png", ContentArea);
var Result = Ocr.Read(Input);
// Output the text from the specific content area
Console.WriteLine(Result.Text);
}
Imports IronOcr
Imports System.Drawing ' Ensure using System.Drawing for Rectangle
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput()
' Define a specific rectangular area for OCR
Dim ContentArea = New Rectangle() With {
.X = 215,
.Y = 1250,
.Height = 280,
.Width = 1335
}
' Pass the specific area, not the whole image
Input.Add("document.png", ContentArea)
Dim Result = Ocr.Read(Input)
' Output the text from the specific content area
Console.WriteLine(Result.Text)
End Using
OCR for Low Quality Scans
The IronOCR OcrInput class can fix scans that normal Tesseract cannot read.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
// Apply denoising filter to reduce noise
Input.DeNoise();
// Apply deskewing filter to correct orientation
Input.Deskew();
var Result = Ocr.Read(Input);
// Output the recognized text
Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput("img\Potter.LowQuality.tiff")
' Apply denoising filter to reduce noise
Input.DeNoise()
' Apply deskewing filter to correct orientation
Input.Deskew()
Dim Result = Ocr.Read(Input)
' Output the recognized text
Console.WriteLine(Result.Text)
End Using
Export OCR Results as a Searchable PDF
Converts images to a PDF with copyable text strings. The output can be indexed by search engines and databases.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var input = new OcrInput())
{
// Assign a title to the output PDF
input.Title = "Quarterly Report";
// Add multiple images for combined PDF output
input.AddImage("image1.jpeg");
input.AddImage("image2.png");
input.AddImage("image3.gif");
var Result = Ocr.Read(input);
// Save the output as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput()
' Assign a title to the output PDF
input.Title = "Quarterly Report"
' Add multiple images for combined PDF output
input.AddImage("image1.jpeg")
input.AddImage("image2.png")
input.AddImage("image3.gif")
Dim Result = Ocr.Read(input)
' Save the output as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf")
End Using
TIFF to Searchable PDF Conversion
Convert a TIFF document (or any group of image files) directly into a searchable PDF that can be indexed by intranets, websites and Google search engines.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var input = new OcrInput())
{
// Add all frames of a TIFF image
input.AddMultiFrameTiff("example.tiff");
// Convert all frames to a searchable PDF
var Result = Ocr.Read(input);
Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput()
' Add all frames of a TIFF image
input.AddMultiFrameTiff("example.tiff")
' Convert all frames to a searchable PDF
Dim Result = Ocr.Read(input)
Result.SaveAsSearchablePdf("searchable.pdf")
End Using
Export OCR Results as HTML
OCR image to XHTML conversion.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var input = new OcrInput())
{
// Assign a title for the HTML output
input.Title = "Html Title";
// Add an image for OCR conversion
input.AddImage("image1.jpeg");
var Result = Ocr.Read(input);
// Save the output as an HTML file
Result.SaveAsHocrFile("results.html");
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput()
' Assign a title for the HTML output
input.Title = "Html Title"
' Add an image for OCR conversion
input.AddImage("image1.jpeg")
Dim Result = Ocr.Read(input)
' Save the output as an HTML file
Result.SaveAsHocrFile("results.html")
End Using
OCR Image Enhancement Filters
IronOCR provides unique filters for OcrInput objects to improve OCR performance.
Image Enhancement Code Example
Improves the quality of OCR input images to produce better, faster OCR results.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
using (var Input = new OcrInput(@"LowQuality.jpeg"))
{
// Apply image filters to improve OCR
Input.DeNoise(); // reduces digital noise
Input.Deskew(); // corrects skew or tilt
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text); // Output the improved OCR text
}
Imports IronOcr
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
Using Input = New OcrInput("LowQuality.jpeg")
' Apply image filters to improve OCR
Input.DeNoise() ' reduces digital noise
Input.Deskew() ' corrects skew or tilt
Dim Result = Ocr.Read(Input)
Console.WriteLine(Result.Text) ' Output the improved OCR text
End Using
List of OCR Image Filters
Input filters built into IronOCR to enhance OCR performance include:
- OcrInput.Rotate(degrees) - Rotates the image by a specified number of degrees clockwise. Use a negative number for counter-clockwise.
- OcrInput.Binarize() - Converts each pixel to black or white, improving contrasting text on backgrounds.
- OcrInput.ToGrayScale() - Converts each pixel to grayscale. It can improve speed, not necessarily accuracy.
- OcrInput.Contrast() - Automatically enhances image contrast. Improves speed and accuracy.
- OcrInput.DeNoise() - Removes digital noise from image, should be used when noise is expected.
- OcrInput.Invert() - Inverts colors; white becomes black and black becomes white.
- OcrInput.Dilate() - Advanced morphology; expands edges, opposite of erode.
- OcrInput.Erode() - Advanced morphology; contracts edges, opposite of dilate.
- OcrInput.Deskew() - Corrects image orientation to upright, essential for OCR on skewed scans.
- OcrInput.DeepCleanBackgroundNoise() - Removes heavy background noise at performance cost.
- OcrInput.EnhanceResolution() - Increases detail in low-resolution images.
CleanBackgroundNoise. This setting allows clearing digital interference and paper debris from the image, aiding OCR accuracy.
EnhanceContrast improves text contrast against its background for better OCR speed and performance.
EnhanceResolution detects and corrects low-resolution images to ensure readable OCR text.
Language support extends to 126 international language packs, each can be toggled for the OCR operation.
Strategy allows choosing between fast scanning or optimized accuracy using AI models.
ColourSpace determines OCR in grayscale or color, affecting OCR results' speed and accuracy.
DetectWhiteTextOnDarkBackgrounds. Allows OCR to detect inversely colored text against a dark backdrop.
InputImageType. Guides OCR operation to scan documents or specific segments better.
RotateAndStraighten offers unmatched capabilities to handle perspective distortions in images.
ReadBarcode integrates barcode and QR code reading alongside text without added processing time.
ColorDepth. Sets the pixel sampling rate, where higher depth can improve but also delay OCR operation.
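As an illustration of combining the filters listed above, the sketch below prepares an image with light text on a dark background. It uses only filter names that appear in the list (Invert, Contrast, Binarize). The file name is hypothetical, and the order of the filters is a judgment call, not a recommendation from the library.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Swahili;

// Hypothetical input: a scan with white text on a dark background
using (var input = new OcrInput(@"images\dark-background.png"))
{
    input.Invert();    // swap colors so the text becomes dark on light
    input.Contrast();  // automatically enhance contrast
    input.Binarize();  // reduce every pixel to pure black or white

    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```

Each filter adds processing time, so it is worth checking on a few representative scans whether a given filter actually improves the recognized text before applying it everywhere.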
126 Language Packs
IronOCR supports 126 international languages via language packs, which are distributed as DLLs and can be downloaded from this website or from the NuGet Package Manager.
Languages include German, French, English, Chinese, Japanese and many more. Specialist language packs exist for passport MRZ, MICR checks, financial data, license plates and many more. You can also use any Tesseract ".traineddata" file, including ones you create yourself.
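Loading a custom ".traineddata" file might look like the sketch below. It assumes the installed IronOCR version exposes a UseCustomTesseractLanguageFile method on IronTesseract; check the object reference for your version before relying on it. Both file paths are hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Assumption: UseCustomTesseractLanguageFile exists in the installed version.
// The path points to a hypothetical custom-trained Tesseract language file.
ocr.UseCustomTesseractLanguageFile("custom_tesseract_files/custom.traineddata");

using (var input = new OcrInput(@"images\sample.png")) // hypothetical image
{
    var result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
```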
Language Example
Using other OCR languages.
using IronOcr;
// PM> Install-Package IronOcr.Languages.Arabic
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Arabic;
using (var input = new OcrInput())
{
// Add an Arabic image for OCR processing
input.AddImage("img/arabic.gif");
// Optional: Add image filters
// For low-quality input scenarios
// that typical Tesseract may struggle with, IronOcr can process effectively
var Result = Ocr.Read(input);
// Since Windows might not display Arabic well, save it to disk instead
Result.SaveAsTextFile("arabic.txt");
}
Imports IronOcr
' PM> Install-Package IronOcr.Languages.Arabic
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Arabic
Using input = New OcrInput()
' Add an Arabic image for OCR processing
input.AddImage("img/arabic.gif")
' Optional: Add image filters
' For low-quality input scenarios
' that typical Tesseract may struggle with, IronOcr can process effectively
Dim Result = Ocr.Read(input)
' Since Windows might not display Arabic well, save it to disk instead
Result.SaveAsTextFile("arabic.txt")
End Using
Multiple Language Example
It is also possible to OCR using multiple languages at the same time. This can help to capture English-language metadata and URLs in Unicode documents.
using IronOcr;
// PM> Install-Package IronOcr.Languages.ChineseSimplified
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.ChineseSimplified;
// Add Swahili as a secondary language for OCR
Ocr.AddSecondaryLanguage(OcrLanguage.Swahili);
// You can add any number of supported languages
using (var input = new OcrInput())
{
// Add a multi-language PDF
input.Add("multi-language.pdf");
var Result = Ocr.Read(input);
// Save the recognized text to a file
Result.SaveAsTextFile("results.txt");
}
Imports IronOcr
' PM> Install-Package IronOcr.Languages.ChineseSimplified
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.ChineseSimplified
' Add Swahili as a secondary language for OCR
Ocr.AddSecondaryLanguage(OcrLanguage.Swahili)
' You can add any number of supported languages
Using input = New OcrInput()
' Add a multi-language PDF
input.Add("multi-language.pdf")
Dim Result = Ocr.Read(input)
' Save the recognized text to a file
Result.SaveAsTextFile("results.txt")
End Using
Detailed OCR Results Objects
IronOCR returns an OCR result object for each OCR operation. Generally, developers only use the Text property of this object to get the text scanned from the image. However, the OCR results DOM is much more advanced than this.
using IronOcr;
using System.Drawing; // Add System.Drawing reference for extra processing
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Swahili;
// Configure advanced options like engine mode and barcode reading
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
Ocr.Configuration.ReadBarCodes = true; // Essential for barcode reading
using (var Input = new OcrInput(@"images\sample.tiff"))
{
OcrResult Result = Ocr.Read(Input);
// Access detailed API objects for insight
var Pages = Result.Pages; // A collection of all pages
var Words = Pages[0].Words; // Words from the first page
var Barcodes = Result.Barcodes; // Detected barcodes in pages
// Explore an extensive API for:
// - Pages, Blocks, Paragraphs, Lines, Words, Characters
// - Image output, Coordinates, Font metadata, Statistical data
}
Imports IronOcr
Imports System.Drawing ' Add System.Drawing reference for extra processing
Dim Ocr = New IronTesseract()
Ocr.Language = OcrLanguage.Swahili
' Configure advanced options like engine mode and barcode reading
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Ocr.Configuration.ReadBarCodes = True ' Essential for barcode reading
Using Input = New OcrInput("images\sample.tiff")
Dim Result As OcrResult = Ocr.Read(Input)
' Access detailed API objects for insight
Dim Pages = Result.Pages ' A collection of all pages
Dim Words = Pages(0).Words ' Words from the first page
Dim Barcodes = Result.Barcodes ' Detected barcodes in pages
' Explore an extensive API for:
' - Pages, Blocks, Paragraphs, Lines, Words, Characters
' - Image output, Coordinates, Font metadata, Statistical data
End Using
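To show what walking the results DOM can look like, here is a short sketch that prints every paragraph on every page together with its confidence score. The property names PageNumber, Paragraphs, Text and Confidence are assumptions based on the hierarchy named in the comments above (Pages, Paragraphs, Lines, Words); verify them against the object reference for your IronOCR version.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Swahili;

using (var input = new OcrInput(@"images\sample.tiff"))
{
    OcrResult result = ocr.Read(input);

    foreach (var page in result.Pages)
    {
        // Assumed property: PageNumber
        Console.WriteLine($"--- Page {page.PageNumber} ---");

        foreach (var paragraph in page.Paragraphs)
        {
            // Assumed properties: Confidence (numeric score) and Text
            Console.WriteLine($"[{paragraph.Confidence:F1}] {paragraph.Text}");
        }
    }
}
```

The same nesting pattern can be continued down to lines, words and characters when finer-grained positions or confidence values are needed.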
Performance
IronOCR works out of the box with no need to performance-tune or heavily modify input images.
Speed is blazing: IronOcr.2020+ is up to 10 times faster and makes over 250% fewer errors than previous builds.
Learn More
To learn more about OCR in C#, VB, F#, or any other .NET language, please read our community tutorials, which give real-world examples of how IronOCR can be used and may show the nuances of how to get the best out of this library.
A full object reference for .NET developers is also available.