How to Convert Picture to Text

Published October 24, 2024

In the current digital era, transforming image-based content into editable, searchable text is increasingly important, particularly in scenarios like archiving paper-based documents, extracting key information from images, or digitizing handwritten or printed materials. Optical Character Recognition (OCR) technology automates this conversion process. One reliable and efficient tool for the job is IronOCR, a robust OCR library for .NET.

This article will explain how to convert a picture to text using IronOCR, and explore how this conversion can save time, reduce errors, and streamline processes like data extraction, archiving, and document processing.

How to Convert Picture to Text

  1. Download a C# library for OCR work
  2. Create a new `IronTesseract` instance
  3. Load your image using `OcrImageInput`
  4. Read the image's content using the `Read` method
  5. Export the OCR results to a Text file

Why Convert a Picture to Text?

There are many reasons why you might want to convert an image into text, including:

  • Data extraction: Extracting text from scanned documents and images for archival or data processing purposes.
  • Editing scanned content: Edit or update text in previously scanned documents, saving the time of manually retyping the content.
  • Improving accessibility: Convert printed material into digital text, making it accessible to screen readers or text-to-speech applications.
  • Automation: Automate data entry and processing by reading text from invoices, receipts, or business cards.

How to Start Converting Images to Text

Before we explore how IronOCR's image-to-text capabilities can be used to extract text from images, let's first look at the general step-by-step process using an online tool, Docsumo. Online OCR tools are a helpful option for casual or one-off OCR tasks because they require no manual setup. If you need to perform OCR tasks regularly, however, a dedicated OCR library such as IronOCR could work better for you.

  1. Navigate to the online OCR tool
  2. Upload your image and begin the extraction process
  3. Download the resulting data as a Text document

Step One: Navigate to the online OCR Tool

To begin utilizing OCR technology to extract text from image files, we first navigate to the online image OCR tool we want to use.

How to Convert Picture to Text: Figure 1 - Docsumo OCR Tool

Step Two: Upload your Image and Begin the Extraction Process

Now, by clicking the "Upload File" button, we can upload the image file from which we want to extract text. The tool will immediately begin to process the image.

How to Convert Picture to Text: Figure 2 - Docsumo - File Processing

Step Three: Download the Resulting Data as a Text Document

Now that the image has finished being processed, we can download the extracted text as a new Text document, for further use or manipulation.

How to Convert Picture to Text: Figure 3 - Docsumo - Image Processing Completed

You can also view the file, highlighting the various sections to view the text contained within it. This could be particularly helpful if you just want to view the text within certain sections. Then, you can still go on to download the text as a Text document, XLS, or JSON.

How to Convert Picture to Text: Figure 4

Getting Started with IronOCR

IronOCR is a versatile .NET library that allows you to perform OCR operations on images. It can process various file formats (such as PNG, JPEG, TIFF, and PDF), perform image correction, scan specialist documents (passports, license plates, etc.), provide advanced information about the scanned files, convert scanned documents, and highlight text.

Install the IronOCR Library

Before you can start reading images using IronOCR, you will need to install it if you do not already have it installed in your project. You can easily install IronOCR using NuGet in Visual Studio. Open the NuGet Package Manager Console and run the following command:

Install-Package IronOcr

Alternatively, you can install IronOCR via the NuGet Package Manager for Solution page by searching for IronOCR.

How to Convert Picture to Text: Figure 5

To use IronOCR in your code, be sure to have the proper import statement at the top of your code:

using IronOcr;

Convert Image to Text: A Basic Example

To start with, let's take a look at a basic image-to-text example using IronOCR. This is a core functionality of any OCR tool, and for this example, we will be using the PNG file we used for the online tool. We first instantiate the IronTesseract class and assign it to the variable 'ocr'. We then use the OcrImageInput class to create a new OcrImageInput object from the image file provided. Finally, the Read method reads the text from the image and returns an [OcrResult](/csharp/ocr/object-reference/api/IronOcr.OcrResult.html) object. We can then access the extracted text and display it in the console using [ocrResult.Text](/csharp/ocr/object-reference/api/IronOcr.OcrResult.html#IronOcr_OcrResult_Text).

using System;
using IronOcr;

// Create the OCR engine
IronTesseract ocr = new IronTesseract();
// Load the image; the using declaration disposes it when it goes out of scope
using OcrImageInput image = new OcrImageInput("example.png");
// Run OCR and print the recognized text to the console
OcrResult ocrResult = ocr.Read(image);
Console.WriteLine(ocrResult.Text);

Output Image

How to Convert Picture to Text: Figure 6

Handling Different Picture Formats

IronOCR supports multiple image formats, including PNG, JPEG, BMP, GIF, and TIFF. The process for reading text is the same for every format; you just need to load the file with the correct extension.

using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();
// Same workflow as before; only the file extension changes
using OcrImageInput image = new OcrImageInput("example.bmp");
OcrResult ocrResult = ocr.Read(image);
Console.WriteLine(ocrResult.Text);
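Because the workflow is identical for every format, the same loop can handle a folder of mixed images. The following is a minimal sketch under stated assumptions, not a definitive implementation: the folder name "scans" and the helper IsSupportedImage are hypothetical, and the extension list simply mirrors the formats named above.

```csharp
using System;
using System.IO;
using System.Linq;
using IronOcr;

// Hypothetical folder containing images in mixed formats
string folder = "scans";

IronTesseract ocr = new IronTesseract();

// Read every supported image in the folder, one at a time
foreach (string path in Directory.EnumerateFiles(folder).Where(IsSupportedImage))
{
    using OcrImageInput image = new OcrImageInput(path);
    OcrResult ocrResult = ocr.Read(image);
    Console.WriteLine($"--- {Path.GetFileName(path)} ---");
    Console.WriteLine(ocrResult.Text);
}

// Hypothetical helper: true when the file extension is one of the
// image formats listed in this section (case-insensitive)
static bool IsSupportedImage(string path)
{
    string[] extensions = { ".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff" };
    return extensions.Contains(Path.GetExtension(path), StringComparer.OrdinalIgnoreCase);
}
```

Filtering by extension keeps non-image files in the folder from being passed to the OCR engine.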

Improving OCR Accuracy

OCR accuracy can be improved by optimizing the image and configuring options such as language, image resolution, and noise level. The following example uses the DeNoise() and Sharpen() methods to clean up a low-quality image before reading it:

using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrImageInput image = new OcrImageInput("example.png");
// Clean up the image before recognition
image.DeNoise();   // remove digital noise
image.Sharpen();   // sharpen blurred text
OcrResult ocrResult = ocr.Read(image);
Console.WriteLine(ocrResult.Text);

Exporting the Extracted Text

Now that we know the basics of the image-to-text process, let's look at how we can export the resulting text for later use. For this example, we will use the same process as before to load the image and scan it. Then, using File.WriteAllText("output.txt", ocrResult.Text), we create a new text file called 'output.txt' and save the extracted text to it.

using System.IO;
using IronOcr;

IronTesseract ocr = new IronTesseract();
using OcrImageInput image = new OcrImageInput("example.png");
OcrResult ocrResult = ocr.Read(image);
// Write the extracted text to output.txt (overwrites the file if it already exists)
File.WriteAllText("output.txt", ocrResult.Text);

How to Convert Picture to Text: Figure 7
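The export example writes to a fixed file name. When exporting text from more than one image, it can help to derive the output name from the input instead. The following is a minimal sketch of that idea; the helper GetOutputPath is hypothetical and is not part of IronOCR.

```csharp
using System.IO;
using IronOcr;

string imagePath = "example.png";

IronTesseract ocr = new IronTesseract();
using OcrImageInput image = new OcrImageInput(imagePath);
OcrResult ocrResult = ocr.Read(image);

// Save the text next to the image, e.g. example.png -> example.txt
string outputPath = GetOutputPath(imagePath);
File.WriteAllText(outputPath, ocrResult.Text);

// Hypothetical helper: swaps the image extension for .txt
static string GetOutputPath(string path) => Path.ChangeExtension(path, ".txt");
```

Deriving the name this way means each image gets its own text file rather than every run overwriting 'output.txt'.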

Key Features of IronOCR

  1. High Accuracy: IronOCR uses advanced Tesseract OCR algorithms and includes built-in tools to handle complex images, ensuring high accuracy.
  2. Multi-Language Support: Supports 125+ languages, including multiple writing scripts such as Latin, Cyrillic, Arabic, and Asian characters. Note, however, that only English is installed alongside IronOCR. To use another language, you will need to install the additional language pack for that language.
  3. PDF OCR: IronOCR can extract text from scanned PDFs, making it a valuable tool for document digitization.
  4. Image Cleanup: It provides pre-processing tools such as de-skewing, noise removal, and inversion to improve image quality for better OCR accuracy.
  5. Easy Integration: The API integrates seamlessly with any .NET project, whether it’s a console app, a web app, or desktop software.

Common Use Cases for Converting Pictures to Text

  • Automating Data Entry: Businesses can use OCR to automatically extract data from forms, receipts, or business cards.
  • Document Archiving: Organizations can digitize physical documents, making them searchable and easier to store.
  • Accessibility: Convert printed materials to text for use in screen readers or other assistive technologies.
  • Research and Analysis: Quickly convert scanned research materials into text for analysis or integration into other software tools.
  • Study: Convert scanned study notes into editable text that you can then save as a Word document for further manipulation in tools such as IronWord, Microsoft Word, or Google Docs.

Conclusion

Extracting text from an image using IronOCR is a fast, accurate, and efficient way to handle document processing tasks. Whether you are working with scanned documents, digital images, or PDF documents, IronOCR simplifies the process, providing high accuracy, multi-language support, and powerful image processing tools. This tool is ideal for businesses looking to streamline their document management workflows, automate data extraction, or enhance accessibility.

Use the free trial to try out IronOCR's powerful features for yourself today. It only takes a few minutes to get it fully working within your workspace, so you can begin processing OCR tasks in no time!
