Latin OCR in C# and .NET
IronOCR is a C# software component that allows .NET developers to read text from images and PDF documents in 126 languages, including Latin.
It is an advanced fork of Tesseract, built exclusively for .NET developers, and it regularly outperforms other Tesseract engines in both speed and accuracy.
Contents of IronOcr.Languages.Latin
This package contains the following OCR languages for .NET:
- Latine
- LatinBest
- LatinFast
Download
Latin Language Pack
* Download as zip
* Install with NuGet as https://www.nuget.org/packages/IronOcr.Languages.Latin/
Installation
The first step is to install the Latin OCR package into your .NET project.
PM> Install-Package IronOCR.Languages.Latin
Code Example
This C# code example reads Latin text from an image or PDF document.
// PM> Install-Package IronOcr.Languages.Latin
using IronOcr;
var Ocr = new IronTesseract();
// Set the OCR language to Latin
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput(@"images\Latin.png"))
{
var Result = Ocr.Read(Input);
// Retrieve the recognized text
var AllText = Result.Text;
}
' PM> Install-Package IronOcr.Languages.Latin
Imports IronOcr
Dim Ocr As New IronTesseract()
' Set the OCR language to Latin
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput("images\Latin.png")
Dim Result = Ocr.Read(Input)
' Retrieve the recognized text
Dim AllText = Result.Text
End Using
Why Choose IronOCR?
IronOCR is an easy-to-install, complete and well-documented .NET software library.
Choose IronOCR to achieve 99.8%+ OCR accuracy without using external web services, paying ongoing fees, or sending confidential documents over the internet.
Why C# developers choose IronOCR over vanilla Tesseract:
- Install as a single DLL or NuGet package
- Includes Tesseract 5, 4 and 3 engines out of the box
- 99.8% accuracy, significantly outperforming regular Tesseract
- Blazing speed and multithreading
- Compatible with MVC, WebApp, Desktop, Console and Server applications
- No Exes or C++ code to work with
- Full PDF OCR support
- Perform OCR on almost any image file or PDF
- Full .NET Core, Standard and Framework support
- Deploy on Windows, Mac, Linux, Azure, Docker, Lambda, AWS
- Read barcodes and QR codes
- Export OCR as XHTML
- Export OCR to searchable PDF documents
- Multithreading support
- 126 international languages, all managed via NuGet or OcrData files
- Extract images, coordinates, statistics and fonts, not just text
- Can be used to redistribute Tesseract OCR inside commercial and proprietary applications
IronOCR shines when working with real-world images and imperfect documents, such as photographs or low-resolution scans that may contain digital noise or imperfections.
Other free OCR libraries for the .NET platform, such as other .NET Tesseract APIs and web services, do not perform as well on these real-world use cases.
OCR with Tesseract 5 - Start Coding in C#
The code samples below show how easy it is to read text from an image using C# or VB.NET.
One-Liner
// Read text from an image in one line
string Text = new IronTesseract().Read(@"img\Screenshot.png").Text;
' Read text from an image in one line
Dim Text As String = (New IronTesseract()).Read("img\Screenshot.png").Text
Configurable Hello World
// PM> Install-Package IronOCR.Languages.Latin
using IronOcr;
var Ocr = new IronTesseract();
// Set the language to Latin
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
// Add an image to the OCR input
Input.AddImage("images/sample.jpeg");
// You can add any number of images ...
var Result = Ocr.Read(Input);
// Output the recognized text to the console
Console.WriteLine(Result.Text);
}
' PM> Install-Package IronOCR.Languages.Latin
Imports IronOcr
Dim Ocr As New IronTesseract()
' Set the language to Latin
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
' Add an image to the OCR input
Input.AddImage("images/sample.jpeg")
' You can add any number of images ...
Dim Result = Ocr.Read(Input)
' Output the recognized text to the console
Console.WriteLine(Result.Text)
End Using
Reading Text from a PDF in C#
The same approach can be used to extract text from any PDF document.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var input = new OcrInput())
{
// Add a PDF document as input
input.AddPdf("example.pdf", "password");
// Read the text from the PDF document
var Result = Ocr.Read(input);
// Write the text to the console
Console.WriteLine(Result.Text);
Console.WriteLine($"{Result.Pages.Count} Pages");
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using input = New OcrInput()
' Add a PDF document as input
input.AddPdf("example.pdf", "password")
' Read the text from the PDF document
Dim Result = Ocr.Read(input)
' Write the text to the console
Console.WriteLine(Result.Text)
Console.WriteLine($"{Result.Pages.Count} Pages")
End Using
Handling Multi-Page PDFs and TIFFs
IronOCR reads multi-page file formats, including TIFF files and PDF documents. A TIFF can also be converted directly into a searchable PDF.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
// Process a multi-frame TIFF
Input.AddMultiFrameTiff("multi-frame.tiff");
var Result = Ocr.Read(Input);
// Output recognized text
Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
' Process a multi-frame TIFF
Input.AddMultiFrameTiff("multi-frame.tiff")
Dim Result = Ocr.Read(Input)
' Output recognized text
Console.WriteLine(Result.Text)
End Using
Reading Barcodes and QR Codes
As a distinctive feature, IronOCR can read barcodes and QR codes from documents while it scans for text. The OcrResult.OcrBarcode class gives developers access to the barcode scanning results.
using IronOcr;
var Ocr = new IronTesseract();
// Enable barcode reading
Ocr.Configuration.ReadBarCodes = true;
using (var input = new OcrInput())
{
// Add an image with a barcode
input.AddImage("img/Barcode.png");
var Result = Ocr.Read(input);
// Iterate and print each barcode found
foreach (var Barcode in Result.Barcodes)
{
Console.WriteLine(Barcode.Value);
// Other properties such as location and type are also accessible
}
}
Imports IronOcr
Dim Ocr As New IronTesseract()
' Enable barcode reading
Ocr.Configuration.ReadBarCodes = True
Using input = New OcrInput()
' Add an image with a barcode
input.AddImage("img/Barcode.png")
Dim Result = Ocr.Read(input)
' Iterate and print each barcode found
For Each Barcode In Result.Barcodes
Console.WriteLine(Barcode.Value)
' Other properties such as location and type are also accessible
Next Barcode
End Using
OCR on Specific Regions of an Image
Specifying page areas improves efficiency and saves processing time.
To use crop regions, import System.Drawing so that System.Drawing.Rectangle is available.
using IronOcr;
using System.Drawing;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
// Define the area of interest in the image
var ContentArea = new Rectangle()
{
X = 215,
Y = 1250,
Height = 280,
Width = 1335 // Measurements in pixels
};
Input.Add("document.png", ContentArea);
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text);
}
Imports IronOcr
Imports System.Drawing
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
' Define the area of interest in the image
Dim ContentArea = New Rectangle() With {
.X = 215,
.Y = 1250,
.Height = 280,
.Width = 1335
}
Input.Add("document.png", ContentArea)
Dim Result = Ocr.Read(Input)
Console.WriteLine(Result.Text)
End Using
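The crop rectangle is given in pixels of the source image. A region that runs past the edge of the image therefore usually points to a measurement mistake. The sketch below checks a region before it is handed to OcrInput. It uses only the standard .NET System.Drawing.Rectangle struct and its Rectangle.Intersect method, and does not call IronOCR. The 1700 x 2200 page size is a hypothetical value, and `ClampToImage` is a helper introduced for illustration. Whether IronOCR itself clips out-of-range regions is not covered here.

```csharp
using System;
using System.Drawing;

// Hypothetical helper: clip a requested crop region (in pixels) to the bounds of an image.
// Rectangle.Intersect yields a rectangle with zero width or height when there is no overlap.
static Rectangle ClampToImage(Rectangle region, int imageWidth, int imageHeight)
{
    var imageBounds = new Rectangle(0, 0, imageWidth, imageHeight);
    return Rectangle.Intersect(region, imageBounds);
}

// The region from the crop example: X = 215, Y = 1250, Width = 1335, Height = 280.
var contentArea = new Rectangle(215, 1250, 1335, 280);

// Hypothetical page size, used only for illustration.
const int PageWidth = 1700;
const int PageHeight = 2200;

var clipped = ClampToImage(contentArea, PageWidth, PageHeight);

if (clipped.Width == 0 || clipped.Height == 0)
{
    Console.WriteLine("The crop region lies outside the image.");
}
else if (clipped != contentArea)
{
    Console.WriteLine($"Crop region was clipped to {clipped}.");
}
else
{
    Console.WriteLine("The crop region fits inside the image.");
}
```

Clipping rather than rejecting keeps whatever part of the region does overlap the page. Rejecting outright may be preferable when a partial region would silently drop text.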
OCR for Low-Quality Scans
Use OcrInput methods to enhance the quality of scans that standard Tesseract would struggle with.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput(@"img\Potter.LowQuality.tiff"))
{
// Acts on the image to improve quality
Input.DeNoise(); // Removes noise
Input.Deskew(); // Corrects rotation and alignment
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput("img\Potter.LowQuality.tiff")
' Acts on the image to improve quality
Input.DeNoise() ' Removes noise
Input.Deskew() ' Corrects rotation and alignment
Dim Result = Ocr.Read(Input)
Console.WriteLine(Result.Text)
End Using
Export OCR Results as a Searchable PDF
Convert images into a searchable, indexable PDF document.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
Input.Title = "Quarterly Report";
Input.AddImage("image1.jpeg");
Input.AddImage("image2.png");
Input.AddImage("image3.gif");
var Result = Ocr.Read(Input);
// Save recognized content as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
Input.Title = "Quarterly Report"
Input.AddImage("image1.jpeg")
Input.AddImage("image2.png")
Input.AddImage("image3.gif")
Dim Result = Ocr.Read(Input)
' Save recognized content as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf")
End Using
Conversion of TIFF to a searchable PDF
Convert a TIFF with an entire image set directly into a searchable PDF that can be indexed by search engines or intranet services.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
// Add TIFF images
Input.AddMultiFrameTiff("example.tiff");
var Result = Ocr.Read(Input);
// Save the recognized content as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf");
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
' Add TIFF images
Input.AddMultiFrameTiff("example.tiff")
Dim Result = Ocr.Read(Input)
' Save the recognized content as a searchable PDF
Result.SaveAsSearchablePdf("searchable.pdf")
End Using
Export OCR Results as HTML
Convert OCR results from images into HTML (hOCR) to preserve the recognized text and its layout.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput())
{
Input.Title = "Html Title";
Input.AddImage("image1.jpeg");
var Result = Ocr.Read(Input);
// Save as HOCR HTML file, useful for web pages preserving layout
Result.SaveAsHocrFile("results.html");
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput()
Input.Title = "Html Title"
Input.AddImage("image1.jpeg")
Dim Result = Ocr.Read(Input)
' Save as HOCR HTML file, useful for web pages preserving layout
Result.SaveAsHocrFile("results.html")
End Using
OCR Image Enhancement Filters
IronOCR provides special filters for OcrInput objects that improve OCR performance.
Image Enhancement Code Example
Enhancing input quality allows for higher accuracy and faster OCR processing.
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
using (var Input = new OcrInput(@"LowQuality.jpeg"))
{
// Improve image readability
Input.DeNoise(); // Clean up noisy data
Input.Deskew(); // Align the image correctly
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text);
}
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Using Input = New OcrInput("LowQuality.jpeg")
' Improve image readability
Input.DeNoise() ' Clean up noisy data
Input.Deskew() ' Align the image correctly
Dim Result = Ocr.Read(Input)
Console.WriteLine(Result.Text)
End Using
Summary of available input enhancement filters
- OcrInput.Rotate(double degrees): Rotates images clockwise by the specified degrees. Use negative values for anti-clockwise.
- OcrInput.Binarize(): Converts the image to black and white, useful for high-contrast OCR cases.
- OcrInput.ToGrayScale(): Converts image pixels into grayscale to improve OCR accuracy.
- OcrInput.Contrast(): Increases contrast to improve text readability.
- OcrInput.DeNoise(): Removes digital noise to clean up the image.
- OcrInput.Invert(): Inverts all colors, turning black to white and vice versa.
- OcrInput.Dilate(): Adds pixels to the boundaries of objects, useful for thickening thin text. The opposite of erode.
- OcrInput.Erode(): Reduces the boundary of objects, the opposite of dilate.
- OcrInput.Deskew(): Rotates the image so that it is upright and orthogonal. This matters because Tesseract's tolerance for skewed scans can be as low as 5 degrees.
- OcrInput.DeepCleanBackgroundNoise(): Advanced noise removal for heavily distorted documents.
- OcrInput.EnhanceResolution(): Automatically upscales low-DPI images for better recognition.
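The sketch below makes the grayscale, binarize and invert filters more concrete. It applies the textbook versions of those operations to a few RGB pixels. It is not IronOCR's implementation, whose internals are not described here. The ITU-R BT.601 luma weights and the fixed threshold of 128 are assumptions chosen for illustration.

```csharp
using System;
using System.Linq;

// Textbook illustration only; not IronOCR's internal algorithm.
// Convert an RGB pixel to an 8-bit gray level using ITU-R BT.601 luma weights.
static byte ToGray(byte r, byte g, byte b) =>
    (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

// Global-threshold binarization: every pixel becomes pure black (0) or pure white (255).
static byte Binarize(byte gray, byte threshold = 128) =>
    gray >= threshold ? (byte)255 : (byte)0;

// Inversion: black becomes white and vice versa.
static byte Invert(byte value) => (byte)(255 - value);

// Sample pixels: dark ink, light paper, and a mid-tone smudge.
(byte R, byte G, byte B)[] pixels = { (20, 20, 30), (240, 235, 225), (130, 120, 110) };

byte[] gray = pixels.Select(p => ToGray(p.R, p.G, p.B)).ToArray();
byte[] binary = gray.Select(v => Binarize(v)).ToArray();
byte[] inverted = binary.Select(Invert).ToArray();

Console.WriteLine($"gray:     {string.Join(", ", gray)}");
Console.WriteLine($"binary:   {string.Join(", ", binary)}");
Console.WriteLine($"inverted: {string.Join(", ", inverted)}");
```

A single global threshold is the simplest form of binarization. On unevenly lit scans, adaptive (local) thresholds generally cope better, which is one reason to rely on a library's tuned filters rather than a hand-rolled version.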
IronOCR also offers additional advanced settings for fine-tuning the OCR process to get the best text-scanning results.
126 Supported Languages
IronOCR supports 126 languages through language packs, available as direct downloads or via the NuGet Package Manager.
Languages include major ones such as German, French, English, Chinese and Japanese, plus specialized packs for particular text formats such as passport MRZ, MICR and more.
Example using another language
Using IronOCR with other languages.
// PM> Install-Package IronOcr.Languages.Arabic
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Arabic;
using (var input = new OcrInput())
{
input.AddImage("img/arabic.gif");
// If necessary, add image filters for quality improvement
var Result = Ocr.Read(input);
// Save the result to a text file for Arabic
Result.SaveAsTextFile("arabic.txt");
}
' PM> Install-Package IronOcr.Languages.Arabic
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Arabic
Using input = New OcrInput()
input.AddImage("img/arabic.gif")
' If necessary, add image filters for quality improvement
Dim Result = Ocr.Read(input)
' Save the result to a text file for Arabic
Result.SaveAsTextFile("arabic.txt")
End Using
Example using multiple languages
IronOCR can handle multiple languages simultaneously for comprehensive OCR of multilingual documents.
// PM> Install-Package IronOcr.Languages.ChineseSimplified
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.ChineseSimplified;
Ocr.AddSecondaryLanguage(OcrLanguage.Latin);
// Add any number of languages as needed
using (var input = new OcrInput())
{
input.Add("multi-language.pdf");
var Result = Ocr.Read(input);
// Save the multilingual result to a text file
Result.SaveAsTextFile("results.txt");
}
' PM> Install-Package IronOcr.Languages.ChineseSimplified
Imports IronOcr
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.ChineseSimplified
Ocr.AddSecondaryLanguage(OcrLanguage.Latin)
' Add any number of languages as needed
Using input = New OcrInput()
input.Add("multi-language.pdf")
Dim Result = Ocr.Read(input)
' Save the multilingual result to a text file
Result.SaveAsTextFile("results.txt")
End Using
Detailed OCR Results
Successful OCR operations return detailed results allowing further exploration and analysis beyond just recognized text.
using IronOcr;
using System.Drawing; // Add assembly reference
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Latin;
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm;
Ocr.Configuration.ReadBarCodes = true; // Enable barcode reading
using (var Input = new OcrInput(@"images\sample.tiff"))
{
OcrResult Result = Ocr.Read(Input);
// Explore the API for detailed information
var Pages = Result.Pages;
var Words = Pages[0].Words;
var Barcodes = Result.Barcodes;
// Detailed inspection of pages, paragraphs, lines, words, and coordinates
}
Imports IronOcr
Imports System.Drawing ' Add assembly reference
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.Latin
Ocr.Configuration.EngineMode = TesseractEngineMode.TesseractAndLstm
Ocr.Configuration.ReadBarCodes = True ' Enable barcode reading
Using Input = New OcrInput("images\sample.tiff")
Dim Result As OcrResult = Ocr.Read(Input)
' Explore the API for detailed information
Dim Pages = Result.Pages
Dim Words = Pages(0).Words
Dim Barcodes = Result.Barcodes
' Detailed inspection of pages, paragraphs, lines, words, and coordinates
End Using
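Word-level results make further analysis possible, for example flagging low-confidence words for manual review. The sketch below does not call IronOCR. It uses a hypothetical array of (Text, Confidence) tuples as a stand-in for the words of a page. The 0 to 100 confidence scale and the review threshold of 80 are assumptions to adapt to the actual result objects.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for word-level OCR output (not IronOCR's own word type):
// recognized text plus a confidence score, assumed to be on a 0-100 scale.
(string Text, double Confidence)[] words =
{
    ("Gallia", 96.5),
    ("est", 98.0),
    ("omnis", 71.2),
    ("divisa", 93.4),
    ("in", 55.0),
    ("partes", 90.1),
    ("tres", 88.7),
};

// Hypothetical threshold below which a word is sent for manual review.
const double ReviewThreshold = 80.0;

// Collect the words whose confidence falls below the threshold.
var needsReview = words
    .Where(w => w.Confidence < ReviewThreshold)
    .Select(w => w.Text)
    .ToList();

// Mean confidence across all words on the page.
double averageConfidence = words.Average(w => w.Confidence);

Console.WriteLine($"Average confidence: {averageConfidence:F1}");
Console.WriteLine($"Review: {string.Join(", ", needsReview)}");
```

Reviewing individual low-scoring words, rather than rejecting a whole page on its average, keeps manual effort focused where recognition is actually uncertain.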
Performance
IronOCR is designed to work well out of the box, without extensive input image modification or performance tuning.
IronOCR 2020 is approximately 10 times faster than previous builds and makes significantly fewer errors.
Learn More
To learn more about OCR in C#, VB, F# or other .NET languages, see our community tutorials, which provide real-life examples and show how to get the best out of this library.
A full API reference for .NET developers is also available.