```csharp
using IronOcr;
using IronSoftware.Drawing;

// We can delve deep into OCR results as an object model of
// Pages, Barcodes, Paragraphs, Lines, Words and Characters
// This allows us to explore, export and draw OCR content using other APIs

var ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.ReadBarCodes = true;

using var ocrInput = new OcrInput();
var pages = new int[] { 1, 2 };
ocrInput.LoadImageFrames("example.tiff", pages);

OcrResult ocrResult = ocrTesseract.Read(ocrInput);

foreach (var page in ocrResult.Pages)
{
    // Page object
    int PageNumber = page.PageNumber;
    string PageText = page.Text;
    int PageWordCount = page.WordCount;

    // null if we don't set Ocr.Configuration.ReadBarCodes = true;
    OcrResult.Barcode[] Barcodes = page.Barcodes;

    AnyBitmap PageImage = page.ToBitmap(ocrInput);
    int PageWidth = page.Width;
    int PageHeight = page.Height;
    double PageRotation = page.Rotation; // angular correction in degrees from OcrInput.Deskew()

    foreach (var paragraph in page.Paragraphs)
    {
        // Pages -> Paragraphs
        int ParagraphNumber = paragraph.ParagraphNumber;
        string ParagraphText = paragraph.Text;
        AnyBitmap ParagraphImage = paragraph.ToBitmap(ocrInput);
        int ParagraphX_location = paragraph.X;
        int ParagraphY_location = paragraph.Y;
        int ParagraphWidth = paragraph.Width;
        int ParagraphHeight = paragraph.Height;
        double ParagraphOcrAccuracy = paragraph.Confidence;
        OcrResult.TextFlow paragraphText_direction = paragraph.TextDirection;

        foreach (var line in paragraph.Lines)
        {
            // Pages -> Paragraphs -> Lines
            int LineNumber = line.LineNumber;
            string LineText = line.Text;
            AnyBitmap LineImage = line.ToBitmap(ocrInput);
            int LineX_location = line.X;
            int LineY_location = line.Y;
            int LineWidth = line.Width;
            int LineHeight = line.Height;
            double LineOcrAccuracy = line.Confidence;
            double LineSkew = line.BaselineAngle;
            double LineOffset = line.BaselineOffset;

            foreach (var word in line.Words)
            {
                // Pages -> Paragraphs -> Lines -> Words
                int WordNumber = word.WordNumber;
                string WordText = word.Text;
                AnyBitmap WordImage = word.ToBitmap(ocrInput);
                int WordX_location = word.X;
                int WordY_location = word.Y;
                int WordWidth = word.Width;
                int WordHeight = word.Height;
                double WordOcrAccuracy = word.Confidence;

                foreach (var character in word.Characters)
                {
                    // Pages -> Paragraphs -> Lines -> Words -> Characters
                    int CharacterNumber = character.CharacterNumber;
                    string CharacterText = character.Text;
                    AnyBitmap CharacterImage = character.ToBitmap(ocrInput);
                    int CharacterX_location = character.X;
                    int CharacterY_location = character.Y;
                    int CharacterWidth = character.Width;
                    int CharacterHeight = character.Height;
                    double CharacterOcrAccuracy = character.Confidence;

                    // Output alternative symbol choices and their probability.
                    // Very useful for spellchecking
                    OcrResult.Choice[] Choices = character.Choices;
                }
            }
        }
    }
}
```

How to Build an OCR in Python

Updated November 22, 2023

The world is awash with vast amounts of textual information. From printed documents to handwritten notes, there's a wealth of valuable content that could be immensely useful if it were just a bit more accessible.

This is where Optical Character Recognition (OCR) technology comes into play. Imagine a computer that can "read" text from images much as a human does. OCR is an application of computer vision, the branch of computer science concerned with training computers to recognize and identify the contents of an image.

In this tutorial, we'll guide you through the process of building your own OCR system using Python, a programming language known for its simplicity and versatility. With the help of libraries like Tesseract, IronOCR, and OpenCV, you'll soon be able to unlock the potential of extracting, manipulating, and working with text from document images.

Prerequisites for the OCR Engine

Before we dive into the nitty-gritty of building our OCR system, there are a few things you'll need:

Python: Make sure you have Python installed on your computer. You can download it from the official Python website.
Install Tesseract OCR: Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard and later sponsored by Google. It's a powerful tool that we'll be using in our project. You can download Tesseract from GitHub and read about the Tesseract OCR installation process.
Python Libraries: We'll be using two important Python libraries for this project: pytesseract and OpenCV (the opencv-python package). You can install them with the following command in your command prompt or terminal:
```
 pip install pytesseract opencv-python
```
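Before moving on, it can help to confirm that the prerequisites are actually visible from your Python environment. The following is a minimal sketch using only the standard library. `check_prerequisites` is a hypothetical helper name, and the sketch assumes the Tesseract executable is called `tesseract` and is on your PATH; on Windows it may be installed elsewhere, in which case the last entry can be reported as missing even though Tesseract is installed.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which OCR prerequisites can be found on this machine."""
    return {
        # find_spec returns None when a top-level module cannot be located
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "cv2": importlib.util.find_spec("cv2") is not None,
        # shutil.which returns None when the executable is not on PATH
        # (assumption: the binary is named "tesseract")
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, revisit the matching prerequisite above before continuing.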

How to Build an OCR in Python: Figure 1

Steps to Build the OCR System

You can build an OCR system in Python with the help of Python OCR libraries and a short script.

Step 1: Import Libraries

First things first, you will need to import the necessary libraries:

```python
import cv2
import pytesseract
```

Step 2: Read and Process an Image

Load the image using OpenCV and pre-process it to enhance OCR accuracy:

```python
# Load the image using OpenCV
image = cv2.imread('sample_image.png')
# cv2.imread returns None instead of raising when the file cannot be read
if image is None:
    raise FileNotFoundError("Could not read 'sample_image.png'")
# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply thresholding or other preprocessing techniques if needed
```
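The last comment in the snippet mentions thresholding. To make clear what that step does, here is a pure-Python sketch of Otsu's method applied to a flat list of grayscale values: it picks the cut-off that best separates dark pixels (typically text) from light ones (typically background). The names `otsu_threshold` and `binarize` are illustrative only. In a real pipeline you would more likely call OpenCV's `cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`, which performs the same kind of operation on the whole image.

```python
def otsu_threshold(pixels):
    """Return the grayscale cut-off (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        # Pixels with value <= t form the "dark" class
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet, nothing to compare
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor)
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    sample = [10, 12, 11, 200, 210, 205]  # three dark and three light pixels
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

A binarized image often gives the OCR engine cleaner character shapes than a raw grayscale scan, although the benefit depends on lighting and scan quality.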

Step 3: Use Tesseract for OCR

Now it's time to use the Tesseract OCR engine to perform OCR on the processed image:

```python
# Point pytesseract at the Tesseract executable (adjust the path to match your installation)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
# Use pytesseract to perform OCR on the grayscale image
text = pytesseract.image_to_string(gray_image)
```
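The hard-coded path above only works if Tesseract is installed in exactly that folder. A slightly more portable approach is to look for the executable first, as in the sketch below. `resolve_tesseract` and `DEFAULT_CANDIDATES` are hypothetical names, and the candidate paths are assumptions about common Windows install locations, not guaranteed defaults.

```python
import os
import shutil

# Assumed common Windows install locations; adjust to your own setup.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def resolve_tesseract(candidates=DEFAULT_CANDIDATES, which=shutil.which):
    """Return a path to the Tesseract executable, or None if none is found."""
    # Prefer whatever is on PATH (the usual case on Linux and macOS)
    on_path = which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    cmd = resolve_tesseract()
    if cmd is None:
        print("Tesseract not found; install it or pass its location explicitly.")
    else:
        print("Using Tesseract at:", cmd)
        # With pytesseract imported, you would then set:
        # pytesseract.pytesseract.tesseract_cmd = cmd
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use you can call `resolve_tesseract()` with no arguments.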

Step 4: Display Results

If you want to visualize the original image and the extracted text, you can use OpenCV to display them:

```python
# Print the extracted text
print("Extracted Text:", text)
# Display the original image and wait for a key press before closing the window
cv2.imshow('Original Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```
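`cv2.imshow` needs a desktop session, so on a server or another headless machine you may prefer to write the extracted text to disk instead. The sketch below is one way to do that; `save_text` is a hypothetical helper, and placing the `.txt` file beside the image is just a convention chosen for this example.

```python
from pathlib import Path


def save_text(text, image_path):
    """Write OCR output to a .txt file beside the image and return its path."""
    # sample_image.png -> sample_image.txt
    out_path = Path(image_path).with_suffix(".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

For example, `save_text(text, 'sample_image.png')` would write the result of Step 3 to `sample_image.txt`.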

Original Image

How to Build an OCR in Python: Figure 2

Extracted text

How to Build an OCR in Python: Figure 3

As you can see, the result is poor. Tesseract would need additional training (much as a machine learning model is trained) before it could reliably extract text from images that contain tables.
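One way to see where recognition struggles is to look at per-word confidence. pytesseract's `image_to_data(image, output_type=pytesseract.Output.DICT)` returns parallel lists that include `text` and `conf`. The sketch below filters a dictionary of that shape. `confident_words` and the cut-off of 60 are hypothetical choices, and the sample data is hand-made for illustration, not real OCR output.

```python
def confident_words(data, min_conf=60.0):
    """Keep (word, confidence) pairs whose confidence is at least min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    kept = []
    for word, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as int, float, or str
        # Skip empty entries (layout rows) and low-confidence words
        if word.strip() and conf >= min_conf:
            kept.append((word, conf))
    return kept


if __name__ == "__main__":
    # Hand-made sample in the same shape; not real OCR output.
    sample = {
        "text": ["", "Invoice", "T0tal", "42.00"],
        "conf": ["-1", 96, 31.5, "88"],
    }
    print(confident_words(sample))
```

Words that fall below the cut-off are good candidates for manual review or for retrying with different preprocessing.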

IronOCR

In a world inundated with data, the ability to effortlessly convert printed or handwritten text into machine-readable content is a transformative capability.

Enter IronOCR – a cutting-edge technology that empowers developers to integrate robust Optical Character Recognition (OCR) capabilities into their applications with ease.

Whether you're extracting data from scanned documents, automating data entry, or enhancing accessibility, IronOCR offers a comprehensive solution that transcends the boundaries of traditional text recognition.

In this exploration, we delve into the realm of IronOCR, uncovering its versatile features and highlighting its potential to bridge the gap between the physical and digital worlds.

Installing IronOCR

You can easily install IronOCR using the NuGet Package Manager console, just by running the following command.

```
Install-Package IronOcr
```

IronOCR is also available to download at the official NuGet Website.

Extracting Text from Image using IronOCR

In this section, we will see how you can easily extract text from images using IronOCR. Below is the source code that extracts text from the image.

```csharp
using IronOcr;
using System;

var ocr = new IronTesseract();
using (var input = new OcrInput())
{
    // Load the image and run OCR on it
    input.AddImage("r3.png");
    OcrResult result = ocr.Read(input);

    // Print the recognized text
    string text = result.Text;
    Console.WriteLine(text);
}
```

The equivalent VB.NET code:

```vb
Imports IronOcr
Imports System

Dim ocr = New IronTesseract()
Using input = New OcrInput()
	input.AddImage("r3.png")
	Dim result As OcrResult = ocr.Read(input)
	Dim text As String = result.Text
	Console.WriteLine(text)
End Using
```

Output

How to Build an OCR in Python: Figure 4

Conclusion

In this tutorial, we've explored the process of constructing an Optical Character Recognition (OCR) system in Python, unveiling the capacity to extract text from images with remarkable ease.

By leveraging libraries like Tesseract and OpenCV, we've navigated through essential steps, from loading and pre-processing images to utilizing the Tesseract OCR engine for text extraction.

We've also touched on potential challenges like accuracy limitations, which advanced solutions like IronOCR aim to address.

Whether you choose the DIY route or adopt sophisticated tools, the world of OCR beckons with the promise of transforming images into actionable text, streamlining data entry, and amplifying accessibility. With this newfound knowledge, you're poised to embark on a journey that merges the visual and digital realms seamlessly.

To get started with IronOCR visit the following link. To see the entire tutorial on how to extract text from images visit here.

If you want to try IronOCR for free today, be sure to opt in to the trial offered by IronOCR to explore all its uses and potential in a commercial environment without the watermark. To continue using it once the 15 days are over, simply purchase a license.
