Test in production without watermarks.
Works wherever you need it to.
Get 30 days of fully functional product.
Have it up and running in minutes.
Full access to our support engineering team during your product trial
using IronOcr;
string imageText = new IronTesseract().Read(@"images\image.png").Text;
Imports IronOcr
Private imageText As String = (New IronTesseract()).Read("images\image.png").Text
IronOCR is unique in its ability to automatically detect and read text from imperfectly scanned images and PDF documents. The IronTesseract
class provides the simplest API.
Try other code samples to gain fine-grained control of your C# OCR operations.
IronOCR provides the most advanced build of Tesseract known anywhere, on any platform, with increased speed, accuracy, and a native DLL and API.
Supports Tesseract 3, Tesseract 4, and Tesseract 5 for .NET Framework, Standard, Core, Xamarin, and Mono.
IronTesseract
to use intuitive APIsRead
method to perform OCR in VB.NETText
propertyImports IronOcr
' Instantiate the IronTesseract class to create a new OCR reader
Dim Ocr As New IronTesseract()
' Load an image or PDF document from which to extract the text
Dim Result = Ocr.Read("example_document.png")
' Output the extracted text from the OCR result onto the console
Console.WriteLine(Result.Text)
Imports IronOcr
' Instantiate the IronTesseract class to create a new OCR reader
Dim Ocr As New IronTesseract()
' Load an image or PDF document from which to extract the text
Dim Result = Ocr.Read("example_document.png")
' Output the extracted text from the OCR result onto the console
Console.WriteLine(Result.Text)
Instantiate IronTesseract: This step initializes an IronTesseract object which allows us to perform OCR operations. It provides the methods needed to read and extract text from images or documents.
Load and Read the Document: The Read
method performs OCR on the provided file. In this example, it processes "example_document.png"
. This operation extracts text content from the image.
Read
method contains a property Text
, which holds the extracted text. This text is printed to the console using Console.WriteLine
.By following these steps, you can easily perform OCR in VB.NET using the IronOCR library, allowing the automation of text extraction for various applications.
using IronOcr;
using System;
var ocrTesseract = new IronTesseract();
ocrTesseract.Language = OcrLanguage.Arabic;
using (var ocrInput = new OcrInput())
{
ocrInput.LoadImage(@"images\arabic.gif");
var ocrResult = ocrTesseract.Read(ocrInput);
Console.WriteLine(ocrResult.Text);
}
// Example with a Custom Trained Font Being used:
var ocrTesseractCustomerLang = new IronTesseract();
ocrTesseractCustomerLang.UseCustomTesseractLanguageFile("custom_tesseract_files/custom.traineddata");
ocrTesseractCustomerLang.AddSecondaryLanguage(OcrLanguage.EnglishBest);
using (var ocrInput = new OcrInput())
{
ocrInput.LoadPdf(@"images\mixed-lang.pdf");
var ocrResult = ocrTesseractCustomerLang.Read(ocrInput);
Console.WriteLine(ocrResult.Text);
}
Imports IronOcr
Imports System
Private ocrTesseract = New IronTesseract()
ocrTesseract.Language = OcrLanguage.Arabic
Using ocrInput As New OcrInput()
ocrInput.LoadImage("images\arabic.gif")
Dim ocrResult = ocrTesseract.Read(ocrInput)
Console.WriteLine(ocrResult.Text)
End Using
' Example with a Custom Trained Font Being used:
Dim ocrTesseractCustomerLang = New IronTesseract()
ocrTesseractCustomerLang.UseCustomTesseractLanguageFile("custom_tesseract_files/custom.traineddata")
ocrTesseractCustomerLang.AddSecondaryLanguage(OcrLanguage.EnglishBest)
Using ocrInput As New OcrInput()
ocrInput.LoadPdf("images\mixed-lang.pdf")
Dim ocrResult = ocrTesseractCustomerLang.Read(ocrInput)
Console.WriteLine(ocrResult.Text)
End Using
IronOCR supports 125 international languages. Other than English, which is installed by default, additional language packs can be added to your .NET project via NuGet or downloaded from our Languages Page.
Most languages are available in Fast, Standard (recommended), and Best quality. The Best quality option may offer more accurate results, but will also be slower in processing time.
Below is a sample code snippet demonstrating how to install and use additional language packs in your .NET project using IronOCR. Install the IronOcr.Languages.{LanguageName}
package using NuGet
. Here is how to do it in a C# application.
// First, ensure you have the following NuGet packages:
// - IronOcr
// - IronOcr.Languages.{LanguageName} // Replace {LanguageName} with your language choice, e.g., French.
using IronOcr;
class Program
{
static void Main(string[] args)
{
// Define a file path for the OCR image
var inputFilePath = "path\\to\\your\\sample-image.png";
// Choose the OCR language. The below example demonstrates using French.
var OcrLanguage = IronOcr.Languages.French.OcrLanguagePack.Best();
// Initialize IronTesseract engine with the specified language
var OcrEngine = new IronTesseract
{
Language = OcrLanguage
};
// Perform OCR to convert image text into a string
using (var Input = new OcrInput(inputFilePath))
{
// Recognize text from the image
var result = OcrEngine.Read(Input);
System.Console.WriteLine(result.Text);
}
// Output: The text content extracted from the image is displayed on the console.
}
}
// First, ensure you have the following NuGet packages:
// - IronOcr
// - IronOcr.Languages.{LanguageName} // Replace {LanguageName} with your language choice, e.g., French.
using IronOcr;
class Program
{
static void Main(string[] args)
{
// Define a file path for the OCR image
var inputFilePath = "path\\to\\your\\sample-image.png";
// Choose the OCR language. The below example demonstrates using French.
var OcrLanguage = IronOcr.Languages.French.OcrLanguagePack.Best();
// Initialize IronTesseract engine with the specified language
var OcrEngine = new IronTesseract
{
Language = OcrLanguage
};
// Perform OCR to convert image text into a string
using (var Input = new OcrInput(inputFilePath))
{
// Recognize text from the image
var result = OcrEngine.Read(Input);
System.Console.WriteLine(result.Text);
}
// Output: The text content extracted from the image is displayed on the console.
}
}
' First, ensure you have the following NuGet packages:
' - IronOcr
' - IronOcr.Languages.{LanguageName} // Replace {LanguageName} with your language choice, e.g., French.
Imports IronOcr
Friend Class Program
Shared Sub Main(ByVal args() As String)
' Define a file path for the OCR image
Dim inputFilePath = "path\to\your\sample-image.png"
' Choose the OCR language. The below example demonstrates using French.
Dim OcrLanguage = IronOcr.Languages.French.OcrLanguagePack.Best()
' Initialize IronTesseract engine with the specified language
Dim OcrEngine = New IronTesseract With {.Language = OcrLanguage}
' Perform OCR to convert image text into a string
Using Input = New OcrInput(inputFilePath)
' Recognize text from the image
Dim result = OcrEngine.Read(Input)
System.Console.WriteLine(result.Text)
End Using
' Output: The text content extracted from the image is displayed on the console.
End Sub
End Class
IronOcr
and you need to add the language-specific package like IronOcr.Languages.French
for French.OcrLanguagePack
for your desired language (best, standard, or fast quality).IronTesseract
and set its language to the selected language pack.OcrInput
, process the image file and extract text using Read
, which returns the OCR result as text.This setup allows you to process images and recognize text in the desired language, including non-default languages, with varying levels of quality and speed.
using IronOcr;
using IronSoftware.Drawing;
// We can delve deep into OCR results as an object model of
// Pages, Barcodes, Paragraphs, Lines, Words and Characters
// This allows us to explore, export and draw OCR content using other APIs/
var ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.ReadBarCodes = true;
using var ocrInput = new OcrInput();
var pages = new int[] { 1, 2 };
ocrInput.LoadImageFrames("example.tiff", pages);
OcrResult ocrResult = ocrTesseract.Read(ocrInput);
foreach (var page in ocrResult.Pages)
{
// Page object
int PageNumber = page.PageNumber;
string PageText = page.Text;
int PageWordCount = page.WordCount;
// null if we dont set Ocr.Configuration.ReadBarCodes = true;
OcrResult.Barcode[] Barcodes = page.Barcodes;
AnyBitmap PageImage = page.ToBitmap(ocrInput);
double PageWidth = page.Width;
double PageHeight = page.Height;
double PageRotation = page.Rotation; // angular correction in degrees from OcrInput.Deskew()
foreach (var paragraph in page.Paragraphs)
{
// Pages -> Paragraphs
int ParagraphNumber = paragraph.ParagraphNumber;
string ParagraphText = paragraph.Text;
AnyBitmap ParagraphImage = paragraph.ToBitmap(ocrInput);
int ParagraphX_location = paragraph.X;
int ParagraphY_location = paragraph.Y;
int ParagraphWidth = paragraph.Width;
int ParagraphHeight = paragraph.Height;
double ParagraphOcrAccuracy = paragraph.Confidence;
OcrResult.TextFlow paragrapthText_direction = paragraph.TextDirection;
foreach (var line in paragraph.Lines)
{
// Pages -> Paragraphs -> Lines
int LineNumber = line.LineNumber;
string LineText = line.Text;
AnyBitmap LineImage = line.ToBitmap(ocrInput);
int LineX_location = line.X;
int LineY_location = line.Y;
int LineWidth = line.Width;
int LineHeight = line.Height;
double LineOcrAccuracy = line.Confidence;
double LineSkew = line.BaselineAngle;
double LineOffset = line.BaselineOffset;
foreach (var word in line.Words)
{
// Pages -> Paragraphs -> Lines -> Words
int WordNumber = word.WordNumber;
string WordText = word.Text;
AnyBitmap WordImage = word.ToBitmap(ocrInput);
int WordX_location = word.X;
int WordY_location = word.Y;
int WordWidth = word.Width;
int WordHeight = word.Height;
double WordOcrAccuracy = word.Confidence;
foreach (var character in word.Characters)
{
// Pages -> Paragraphs -> Lines -> Words -> Characters
int CharacterNumber = character.CharacterNumber;
string CharacterText = character.Text;
AnyBitmap CharacterImage = character.ToBitmap(ocrInput);
int CharacterX_location = character.X;
int CharacterY_location = character.Y;
int CharacterWidth = character.Width;
int CharacterHeight = character.Height;
double CharacterOcrAccuracy = character.Confidence;
// Output alternative symbols choices and their probability.
// Very useful for spellchecking
OcrResult.Choice[] Choices = character.Choices;
}
}
}
}
}
Imports IronOcr
Imports IronSoftware.Drawing
' We can delve deep into OCR results as an object model of
' Pages, Barcodes, Paragraphs, Lines, Words and Characters
' This allows us to explore, export and draw OCR content using other APIs/
Private ocrTesseract = New IronTesseract()
ocrTesseract.Configuration.ReadBarCodes = True
Dim ocrInput As New OcrInput()
Dim pages = New Integer() { 1, 2 }
ocrInput.LoadImageFrames("example.tiff", pages)
Dim ocrResult As OcrResult = ocrTesseract.Read(ocrInput)
For Each page In ocrResult.Pages
' Page object
Dim PageNumber As Integer = page.PageNumber
Dim PageText As String = page.Text
Dim PageWordCount As Integer = page.WordCount
' null if we dont set Ocr.Configuration.ReadBarCodes = true;
Dim Barcodes() As OcrResult.Barcode = page.Barcodes
Dim PageImage As AnyBitmap = page.ToBitmap(ocrInput)
Dim PageWidth As Double = page.Width
Dim PageHeight As Double = page.Height
Dim PageRotation As Double = page.Rotation ' angular correction in degrees from OcrInput.Deskew()
For Each paragraph In page.Paragraphs
' Pages -> Paragraphs
Dim ParagraphNumber As Integer = paragraph.ParagraphNumber
Dim ParagraphText As String = paragraph.Text
Dim ParagraphImage As AnyBitmap = paragraph.ToBitmap(ocrInput)
Dim ParagraphX_location As Integer = paragraph.X
Dim ParagraphY_location As Integer = paragraph.Y
Dim ParagraphWidth As Integer = paragraph.Width
Dim ParagraphHeight As Integer = paragraph.Height
Dim ParagraphOcrAccuracy As Double = paragraph.Confidence
Dim paragrapthText_direction As OcrResult.TextFlow = paragraph.TextDirection
For Each line In paragraph.Lines
' Pages -> Paragraphs -> Lines
Dim LineNumber As Integer = line.LineNumber
Dim LineText As String = line.Text
Dim LineImage As AnyBitmap = line.ToBitmap(ocrInput)
Dim LineX_location As Integer = line.X
Dim LineY_location As Integer = line.Y
Dim LineWidth As Integer = line.Width
Dim LineHeight As Integer = line.Height
Dim LineOcrAccuracy As Double = line.Confidence
Dim LineSkew As Double = line.BaselineAngle
Dim LineOffset As Double = line.BaselineOffset
For Each word In line.Words
' Pages -> Paragraphs -> Lines -> Words
Dim WordNumber As Integer = word.WordNumber
Dim WordText As String = word.Text
Dim WordImage As AnyBitmap = word.ToBitmap(ocrInput)
Dim WordX_location As Integer = word.X
Dim WordY_location As Integer = word.Y
Dim WordWidth As Integer = word.Width
Dim WordHeight As Integer = word.Height
Dim WordOcrAccuracy As Double = word.Confidence
For Each character In word.Characters
' Pages -> Paragraphs -> Lines -> Words -> Characters
Dim CharacterNumber As Integer = character.CharacterNumber
Dim CharacterText As String = character.Text
Dim CharacterImage As AnyBitmap = character.ToBitmap(ocrInput)
Dim CharacterX_location As Integer = character.X
Dim CharacterY_location As Integer = character.Y
Dim CharacterWidth As Integer = character.Width
Dim CharacterHeight As Integer = character.Height
Dim CharacterOcrAccuracy As Double = character.Confidence
' Output alternative symbols choices and their probability.
' Very useful for spellchecking
Dim Choices() As OcrResult.Choice = character.Choices
Next character
Next word
Next line
Next paragraph
Next page
IronOCR returns an advanced result object for each page it scans using Tesseract 5. This contains location data, images, text, statistical confidence, alternative symbol choices, font-names, font-sizes decoration, font weights, and position for each:
Here is an example of how you might retrieve and work with these data points using C# with IronOCR:
// Import the IronOCR library
using IronOcr;
class OCRExample
{
static void Main()
{
// Create a new instance of the IronTesseract engine
var OcrEngine = new IronTesseract();
// Specify the file path of the scanned document
var Input = new OcrInput(@"path_to_your_image_file.jpg");
// Perform OCR on the input image
OcrResult result = OcrEngine.Read(Input);
// Check the number of pages detected
Console.WriteLine($"Detected {result.Pages.Count} page(s)");
// Iterate through each page
foreach (var page in result.Pages)
{
// Output the page text
Console.WriteLine($"Page Text: {page.Text}");
// Iterate through each paragraph in the page
foreach (var paragraph in page.Paragraphs)
{
Console.WriteLine($"Paragraph Text: {paragraph.Text}");
// Iterate through each line in the paragraph
foreach (var line in paragraph.Lines)
{
Console.WriteLine($"Line Text: {line.Text}");
// Iterate through each word in the line
foreach (var word in line.Words)
{
Console.WriteLine($"Word Text: {word.Text}");
// Iterate through each character in the word
foreach (var character in word.Characters)
{
Console.WriteLine($"Character Text: {character.Text}");
}
}
}
}
//Detect barcodes within the page and output their values
foreach (var barcode in page.Barcodes)
{
Console.WriteLine($"Barcode Value: {barcode.Value}");
}
}
}
}
// Import the IronOCR library
using IronOcr;
class OCRExample
{
static void Main()
{
// Create a new instance of the IronTesseract engine
var OcrEngine = new IronTesseract();
// Specify the file path of the scanned document
var Input = new OcrInput(@"path_to_your_image_file.jpg");
// Perform OCR on the input image
OcrResult result = OcrEngine.Read(Input);
// Check the number of pages detected
Console.WriteLine($"Detected {result.Pages.Count} page(s)");
// Iterate through each page
foreach (var page in result.Pages)
{
// Output the page text
Console.WriteLine($"Page Text: {page.Text}");
// Iterate through each paragraph in the page
foreach (var paragraph in page.Paragraphs)
{
Console.WriteLine($"Paragraph Text: {paragraph.Text}");
// Iterate through each line in the paragraph
foreach (var line in paragraph.Lines)
{
Console.WriteLine($"Line Text: {line.Text}");
// Iterate through each word in the line
foreach (var word in line.Words)
{
Console.WriteLine($"Word Text: {word.Text}");
// Iterate through each character in the word
foreach (var character in word.Characters)
{
Console.WriteLine($"Character Text: {character.Text}");
}
}
}
}
//Detect barcodes within the page and output their values
foreach (var barcode in page.Barcodes)
{
Console.WriteLine($"Barcode Value: {barcode.Value}");
}
}
}
}
CONVERTER NOT RUNNING
IronTesseract Engine: This is used to initiate the OCR process on the provided image input.
OcrInput: Represents the image file that will be processed. You need to specify the path to your image file.
Read Method: This processes the image and returns an OcrResult
with all extracted data.
Iterating Structure: The example provided utilizes nested loops to dive deep from pages to characters and barcodes, allowing access to every element's text and properties.
This structured approach enables a detailed exploration and utilization of the various data points retrieved by IronOCR.
Whether it's product, integration or licensing queries, the Iron product development team is on hand to support all of your questions. Get in touch and start a dialog with Iron to make the most of our library in your project.
Ask a QuestionIronOCR (Optical Character Recognition) library enables developers quick and efficient results when converting Images to Text. IronOCR works with .NET, VB .NET, and C#. Our top .NET applications for .NET frameworks, which are specifically designed for you — the developer, to support you in achieving optimal performance for your projects.
OCR receives and recognizes text files, barcodes, QR content, and more. However, IronOCR also provides numerous methods that allow you to add OCR reading and text from images into the web, windows desktop, or console .NET projects with support for virtually unlimited image formats and files, such as JPG, PNG, GIF, TIFF, BMP, JPEG or PDF.
Although the recognition results of plain text, characters, lines, and paragraphs from image output may not seem straightforward, you'll find that under the hood of IronOCR results are in fact easier than you may have initially thought. IronOCR scans the image for alignment, employs its noise removal and filters to check quality and resolution. It looks at its properties, optimizes the OCR engine, and uses a trained artificial intelligence network to then recognize text (from images) as well as any human.
OCR is not a simple process even for a computer. However, IronOCR makes the overall process of creating searchable documents quicker and more straightforward, with 100% accuracy and minimal lines of code.
Read the TutorialSoftware is not limited to geographical boundaries — businesses function across borders and rely upon multiple languages to achieve their results. Similarly, an optical character recognition (OCR) tool that only performs document recognition in a single language is a major NO in every respect!
With a multilingual OCR library that provides multiple OCR functionalities, you benefit from creating a searchable PDF document from a scanned PDF or scanned image in multiple languages (from French to Chinese!). Your time and effort are streamlined with a dynamic, word-searchable PDF document that you, your clients, or your organization can use and reuse without limits.
With a strong focus on you, your business, and your OCR needs, whether in-built or on request, the IronOCR library has a wide array of supported languages. Your next .NET project can be free from language compatibility worries!
Whether Arabic, Spanish, French, German, Hebrew, Italian, Japanese, Simplified Chinese, Traditional Chinese (Mandarin), Danish, English, Finnish, Portuguese, Russian, Spanish or Swedish, you simply name the languages and we provide them for you! You can download your preferred language packs or contact our 24/7 support for more languages.
The first step is to use our NuGet package installer for Windows Visual Studio.
Download Language PacksHow does IronOCR differ from its competitors? In addition to letting you easily add OCR functionalities, extract text and scan rotated images, it also has the ability to perform OCR from imperfect scans! In contrast, many of the various ready-to-use products on the market today are often rigid and inaccurate, destined to fail in real-world individual and corporate applications as the majority of them work with machine-printed, high-resolution, and perfectly-adjusted text.
IronOCR extends the capabilities of Google Tesseract with its powerful IronTesseract DLL — a native C# OCR library with improved stability and higher accuracy than the free Tesseract library.
With the best tool in your hands, even if you have a less than perfect scanned image or a stored image in your storage folder — IronOCR's image processing library conversion cleans up noise, rotates, reduces distortion and skewed alignment, and improves resolution and contrast. The advanced Optical Character Recognition (OCR) settings give you — the coders — the tools and the code to generate the best possible searchable results, time after time.
Search for the words you need and never be disappointed with the 99.8-100% accurate results and the unlimited support for PDF Documents, multiFrame TIFF files, JPEG & JPEG2000, GIF, PNG, BMP, WBMP, System.Drawing.Image, System.Drawing.Bitmap, System.IO.Streams of images, binary image data (byte[]), and everything beyond!
An Alternative to TesseractUnlike other .NET applications in .NET framework, you'll find that the advanced Optical Character Recognition, within IronOCR's package manager console and recognized text console, gives your users the ability to read multiple word fonts (from Times New Roman to anything fancy or supposedly difficult to understand), weights, and styles for accurate text reading from a whole image or scanned images. Our ability to select certain areas of an image helps to improve speed and accuracy. Multithreading from a few lines to a few paragraphs speeds up the OCR engine and allows for the reading of multiple documents on multi-core machines.
Our claims to speed and accuracy are not limited to the process of character recognition. Rather, the improvements start right from the point of installation as IronOCR's .NET OCR engine is an easy-to-install, complete, and well-documented .NET software library. There is a single NuGet package manager installation for Visual Studio, and multithreading compatibility with MVC, WebApp, Desktop, Console, and Server Applications.
You can achieve 99.8-100% OCR accuracy without any external web services, ongoing fees, or having to send confidential documents over the internet. Without the cumbersome C++ coding, IronOCR is the clear choice to make when you're in need of full PDF OCR support for multiple characters, words, lines, paragraphs, text, and documents.
We offer the best options for developers seeking to perfect their coding, as IronOCR works out of the box with zero need for performance-tuning or to heavily modify input images. The latest IronOCR version works amazingly fast — up to ten times faster, and makes over 250% fewer errors than previous builds. We upgrade our own builds to support your goals by providing the perfect platform for OCR!
See Full Function ListEven when using mobile devices, our perfect .NET OCR library enables developers to code 'free from worries' as IronOCR supports exporting content as a simple set of straightforward and complex text, machine-encoded text, barcode data, or structured object model data. You can split the content paragraphs, lines, words, characters, and image string results for direct use inside your .NET apps.
From source code to the final result — the resulting data would be useless if you weren't able to export it to your application. IronOCR understands this and allows you to export the OCR result to XHTML in order to be able to work with a sustainable format across a broader range of applications and with integration into complex websites, not to mention quicker loading times!
However, the support does not end there. The ability to export OCR to searchable PDF documents makes it easy for you, your clients and organizations to store and retrieve PDF documents whenever required! This is especially beneficial when you have a 30-page contract that you can search for in your database with a few keywords, and also enables you to present your company as compliance-friendly, given that searchable PDF documents are proven to be beneficial for the visually impaired.
In addition to the above, you can export your results to the OCR format that represents your OCR output, layout information, and style information, and embeds the related information in standard HTML.
Learn MoreFree community development licenses. Commercial licenses from $749.
C# Tesseract OCR
Jim has been a leading figure in development of IronOCR. Jim designs and builds image processing algorithms and reading methods for OCR.
See ComparisonC# OCR ASP.NET
Learn how Gemma's team use IronOCR to read text from images for their archiving software. Gemma shares her own code samples.
Image to Text .NET TutorialIron's team have over 10 years experience in the .NET software component market.