Microsoft OCR Tools (Alternatives in C#)

OCR (Optical Character Recognition) is a crucial technology for businesses of all sizes. It enables efficient scanning, storage, and analysis of data that would otherwise be time-consuming and complex to handle.

Microsoft OCR tools offer robust options to simplify your digital transformation process. These tools allow for faster and more efficient document processing, freeing up time for you to focus on the important task of growing your business. In this article, we'll explore how to utilize the powerful Microsoft OCR tools to streamline your operations.

OneNote: Microsoft Tool

If you need to extract text from an image, Microsoft OneNote is a helpful tool. OneNote is a versatile note-taking application for capturing, storing, and organizing information in various forms such as text, images, audio, and video. It can also copy text from images or file printouts, which saves you from typing the text manually.

Extract Text using OneNote

To extract text from an image using OneNote, follow these steps:

  1. Launch the OneNote application.
  2. Insert the image file using the "Insert" option or simply drag and drop the image file into the OneNote window.

    OneNote Insert Ribbon

  3. Right-click on the image and select "Copy Text from Picture" from the menu.

    Copy Text from Picture in the context menu

  4. Finally, paste the copied text in any desired location to access the extracted text from the scanned image.

    Text sourced from text copied from an image

That's how you can use OneNote to extract text from an image.

Microsoft Vision Studio

Microsoft Cognitive Services provides an "Extract Text from Images" feature that uses AI to detect and transcribe text in images. The service is user-friendly: you only need to upload an image or PDF file, and it returns the text it finds.

The feature recognizes text in various languages, so it is useful for documents from around the world.

Extract Text using Microsoft Vision Studio

To use the "Extract Text from Images" feature, visit Microsoft Azure's Vision Studio website. This service requires an Azure subscription. Once you have one, you can upload scanned documents and view the extracted text. The following is a sample output image for reference.

Image scanned for its text
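The same kind of text extraction can also be called from code. As a rough sketch, assuming the Azure.AI.Vision.ImageAnalysis NuGet package and an Azure AI Vision resource, a C# call could look like the following. The environment variable names VISION_ENDPOINT and VISION_KEY and the file name scan.png are placeholders chosen for illustration, not part of the walkthrough above.

```csharp
using System;
using System.IO;
using Azure;
using Azure.AI.Vision.ImageAnalysis;

// Assumption: endpoint and key are stored in these (hypothetical) environment variables.
string endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT")
    ?? throw new InvalidOperationException("VISION_ENDPOINT is not set.");
string key = Environment.GetEnvironmentVariable("VISION_KEY")
    ?? throw new InvalidOperationException("VISION_KEY is not set.");

var client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));

// Assumption: scan.png is a local image that contains text.
using FileStream stream = File.OpenRead("scan.png");

// Request only the Read (OCR) feature for this image.
ImageAnalysisResult result = client.Analyze(
    BinaryData.FromStream(stream),
    VisualFeatures.Read);

// Print each detected line of text.
foreach (DetectedTextBlock block in result.Read.Blocks)
{
    foreach (DetectedTextLine line in block.Lines)
    {
        Console.WriteLine(line.Text);
    }
}
```

Like Vision Studio itself, this call needs an active Azure subscription, and usage may be billed.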

A9T9 Free OCR Software

A9T9 Free OCR Software is a tool that lets Windows users convert paper documents into digital text. Its drag-and-drop interface recognizes text in multiple languages, including English, German, Chinese, Korean, and Indic languages. The software can extract data from scanned images or PDF documents and convert it into an editable, searchable format.

It supports output formats such as Rich Text, TXT, and CSV, and works with file formats such as BMP, TIF, and PDF. It also has an automatic document deskewing feature and can recognize text in images with transparent backgrounds. Its accuracy, zero cost, and ease of installation make A9T9 a strong choice for Windows users who are looking for a free OCR solution.

Copy Text using A9T9

You can download the A9T9 software from the Microsoft Store. After installation, open the A9T9 software and upload the images or PDF files.

Copy Text using A9T9

Once the image or document is loaded, click the "Start OCR" button. This will extract the text from the scanned document or image and display it in the text area on the right side.

The text is shown on the right hand side

You can select the OCR language, then copy the extracted text or save it as a Word document.

Office Lens

Office Lens is an app for capturing and organizing notes, whiteboards, menus, signs, and other written or visual information. It reduces the need to copy such information by hand and the risk of losing it.

Office Lens allows users to easily capture sketches, handwritten notes, drawings, and equations, and correct images for shadows and skewed angles to improve legibility. It also features OCR (optical character recognition), enabling users to digitize and edit text within images.

Unfortunately, Microsoft has discontinued the Windows version of Office Lens. It is now only available on mobile devices. Additionally, Microsoft Office Document Imaging was removed in Microsoft Office 2010.

IronOCR: C# OCR Library

IronOCR is an OCR library in C# for .NET developers. It provides full OCR capabilities on scanned documents and images, making it easy for developers to automate document-based workflows. With its simple API and minimal configuration, IronOCR is straightforward to integrate into existing systems.

The library supports a wide range of input file formats, including JPEG, TIFF, GIF, BMP, PDF, multi-page TIFFs, and multiple document scans, and can read text from images with different orientations.

The advanced features of IronOCR include noise removal, which helps reduce image distortion and improve the accuracy of text extraction results. With support for over 125 languages, including English, French, German, Spanish, and Japanese, the library is suitable for almost any application that requires high-quality OCR results without manual intervention.
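As a minimal sketch of the language support described above, the snippet below switches the recognition language to French. It assumes the matching language pack (for example, the IronOcr.Languages.French NuGet package) is installed alongside IronOCR, and the image path is a hypothetical placeholder.

```csharp
using IronOcr;
using System;

var ocrTesseract = new IronTesseract();

// Assumption: the French language pack is installed for this project.
ocrTesseract.Language = OcrLanguage.French;

// Hypothetical image path used only for illustration.
using (var ocrInput = new OcrInput(@"images\french-scan.png"))
{
    var ocrResult = ocrTesseract.Read(ocrInput);
    Console.WriteLine(ocrResult.Text);
}
```

If the language is not set, the engine should fall back to its default language, so setting it explicitly matters mainly for non-English documents.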

Extract Text using IronOCR

IronOCR can extract text from PDF files, either from every page of the document or from specific page numbers. The following code shows both options:

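using IronOcr;
using System;

var ocrTesseract = new IronTesseract();

using (var ocrInput = new OcrInput())
{
    // OCR the entire document
    ocrInput.AddPdf("example.pdf");

    // Alternatively, OCR only selected page numbers. Use this line instead of
    // AddPdf above, otherwise those pages are added to the input a second time.
    // The last argument is the password of a protected PDF.
    // ocrInput.AddPdfPages("example.pdf", new[] { 1, 2, 3 }, "password");

    // Run OCR on everything added to the input and print the recognized text
    var ocrResult = ocrTesseract.Read(ocrInput);
    Console.WriteLine(ocrResult.Text);
}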

Here is the output:

The output inside the Visual Studio Debug Console

In addition to extracting text, IronOCR can read barcodes. Setting ReadBarCodes to true in the configuration makes the barcode values available on the OCR result, as the following code shows:

using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
ocrTesseract.Configuration.ReadBarCodes = true;
using (var ocrInput = new OcrInput(@"images\imageWithBarcode.png"))
{
    var ocrResult = ocrTesseract.Read(ocrInput);
    foreach (var barcode in ocrResult.Barcodes)
    {
        Console.WriteLine(barcode.Value);
    }
}

Input/Output of the code

IronOCR can also handle low-DPI and noisy images. Applying filters such as Deskew and DeNoise to the input before reading it helps improve the result:

using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
using (var ocrInput = new OcrInput(@"images\image.png"))
{
    ocrInput.Deskew();
    ocrInput.DeNoise();
    var ocrResult = ocrTesseract.Read(ocrInput);
    Console.WriteLine(ocrResult.Text);
}
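One way to check whether the filters help on a particular image is to compare the confidence value that IronOCR reports for each result. The sketch below reads the same image with and without the filters and relies on the Confidence property of the OCR result. Treat it as an illustration rather than a benchmark, since the numbers depend entirely on the input image.

```csharp
using IronOcr;
using System;

var ocrTesseract = new IronTesseract();
const string imagePath = @"images\image.png";

using (var rawInput = new OcrInput(imagePath))
using (var filteredInput = new OcrInput(imagePath))
{
    // Apply the filters to the second copy of the input only
    filteredInput.Deskew();
    filteredInput.DeNoise();

    var rawResult = ocrTesseract.Read(rawInput);
    var filteredResult = ocrTesseract.Read(filteredInput);

    // Compare the reported confidence for both runs
    Console.WriteLine($"Confidence without filters: {rawResult.Confidence}");
    Console.WriteLine($"Confidence with filters:    {filteredResult.Confidence}");
}
```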

Conclusion

In conclusion, Optical Character Recognition (OCR) helps businesses of all sizes scan, store, and process information that would otherwise be complex and time-consuming to manage manually. OneNote and Microsoft Vision Studio are Microsoft's own OCR options, and A9T9 Free OCR Software is available from the Microsoft Store. Each of them can streamline processes and save time.

IronOCR, a well-featured OCR library, is a standout option among the available OCR tools. It integrates easily with C# and VB.NET applications, recognizes many languages and image formats, and has a free trial period, with license costs starting from $749. Each of these OCR tools offers unique features and serves different needs, making them valuable assets for businesses looking to improve their digital transformation.