How to Read Photos Using IronOCR


When dealing with large volumes of documents, particularly scanned images like TIFF files, manually extracting text can be time-consuming and prone to human error. This is where Optical Character Recognition (OCR) comes in, offering an automated method to accurately convert text from images into digital data. OCR technology can handle the complexity of images, such as scanned documents or photographs, and turn them into searchable, editable text. This not only speeds up document processing but also ensures more accurate data extraction compared to manual transcription.

Using OCR on formats like TIFF, which may be hard to read due to their size, color depth, or compression, enables businesses and developers to quickly digitize and manage vast amounts of data. With OCR solutions like IronOCR's ReadPhoto function, developers can extract text from images and even perform advanced operations such as searching for keywords or converting scanned data into searchable PDFs. This technology is especially useful for industries that deal with legal documents, archives, or receipts, where efficient data retrieval is critical.

In this tutorial, we'll walk through a sample input and a short example of how to use ReadPhoto and how to work with the result object. We'll also discuss scenarios where developers might prefer ReadPhoto over IronOCR's standard Read method.

To use this function, you must also install the IronOcr.Extension.AdvancedScan package.

Quickstart: Use ReadPhoto to Extract Text from Complex Images

Get started fast: use IronOCR’s ReadPhoto method on an OcrInput loaded with your image frame to pull all text and regions in one go. It’s optimized for TIFFs, GIFs and similar photo-heavy formats for a smooth OCR experience.

Get started with IronOCR using NuGet now:

  1. Install IronOCR with NuGet Package Manager

    PM > Install-Package IronOcr

  2. Copy and run this code snippet.

    var result = new IronTesseract().ReadPhoto(new OcrInput().LoadImageFrame("photo.tiff", 0));
  3. Deploy to test on your live environment

    Start using IronOCR in your project today with a free trial

Read Photos Example

Reading high-quality photo formats such as TIFF and GIF is relatively simple with IronOCR. First, create an OcrInput and load the image with LoadImageFrame. Then call the ReadPhoto method to obtain the results.

Please note

  • Since a TIFF can contain multiple frames within a single image, the PageNumber parameter is required. The index starts at 0, rather than 1.
  • The method currently only works for English, Chinese, Japanese, Korean, and LatinAlphabet.
  • Using advanced scan on .NET Framework requires the project to run on x64 architecture.

Input

Since most browsers do not natively support the TIFF format, you can download the TIFF input here. To display the TIFF file, I converted it to WEBP.


Code

using IronOcr;
using IronSoftware.Drawing;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPhoto = new OcrInput();
inputPhoto.LoadImageFrame("ocr.tiff", 0);

// Read photo
OcrPhotoResult result = ocr.ReadPhoto(inputPhoto);

// Frame (page) number that the first text region was found on
int number = result.TextRegions[0].PageNumber;

// Extract the text in the first region
string textinregion = result.TextRegions[0].TextInRegion;

// Extract the coordinates of the first text region
Rectangle region = result.TextRegions[0].Region;

var output = $"Text in First Region: {textinregion}\n"
             + $"Text Region:\n"
             + $"Starting X: {region.X}\n"
             + $"Starting Y: {region.Y}\n"
             + $"Region Width: {region.Width}\n"
             + $"Region Height: {region.Height}\n"
             + $"Result Confidence: {result.Confidence}\n\n"
             + $"Full Scanned Photo Text: {result.Text}";

Console.WriteLine(output);

Equivalent VB.NET version:

Imports Microsoft.VisualBasic
Imports IronOcr
Imports IronSoftware.Drawing
Imports System

Module Program
    Sub Main()
        ' Instantiate OCR engine
        Dim ocr = New IronTesseract()

        Using inputPhoto = New OcrInput()
            inputPhoto.LoadImageFrame("ocr.tiff", 0)

            ' Read photo
            Dim result As OcrPhotoResult = ocr.ReadPhoto(inputPhoto)

            ' Frame (page) number that the first text region was found on
            Dim number As Integer = result.TextRegions(0).PageNumber

            ' Extract the text in the first region
            Dim textinregion As String = result.TextRegions(0).TextInRegion

            ' Extract the coordinates of the first text region
            Dim region As Rectangle = result.TextRegions(0).Region

            Dim output = $"Text in First Region: {textinregion}" & vbLf &
                         $"Text Region:" & vbLf &
                         $"Starting X: {region.X}" & vbLf &
                         $"Starting Y: {region.Y}" & vbLf &
                         $"Region Width: {region.Width}" & vbLf &
                         $"Region Height: {region.Height}" & vbLf &
                         $"Result Confidence: {result.Confidence}" & vbLf & vbLf &
                         $"Full Scanned Photo Text: {result.Text}"

            Console.WriteLine(output)
        End Using
    End Sub
End Module

Output


The result object exposes three properties:

  • Text: The text extracted from the OCR input.
  • Confidence: A double indicating the statistical accuracy confidence, averaged over every character, where 1 is the highest and 0 is the lowest.
  • TextRegions: A collection of text regions, each describing a block of recognized text and its location within the input. In the example above, we read the frame number of the first region and printed its text and bounding rectangle.
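
To inspect every detected region rather than only the first, the same properties can be walked in a loop. This is a minimal sketch that reuses only the members shown in the example (TextRegions, PageNumber, TextInRegion, Region). The file name ocr.tiff is a placeholder, and the sketch assumes that TextRegions can be enumerated with foreach.

```csharp
using IronOcr;
using System;

// Instantiate OCR engine
var ocr = new IronTesseract();

using var inputPhoto = new OcrInput();
inputPhoto.LoadImageFrame("ocr.tiff", 0); // placeholder path, first frame

OcrPhotoResult result = ocr.ReadPhoto(inputPhoto);

// Walk every detected region instead of only TextRegions[0]
int index = 0;
foreach (var textRegion in result.TextRegions)
{
    var box = textRegion.Region;
    Console.WriteLine($"Region {index} (frame {textRegion.PageNumber}): "
        + $"X={box.X}, Y={box.Y}, Width={box.Width}, Height={box.Height}");
    Console.WriteLine($"  Text: {textRegion.TextInRegion}");
    index++;
}
```

Printing the bounding box next to the text makes it easier to check whether a region that was misread corresponds to a noisy or skewed part of the image.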


Difference between ReadPhoto and Read

The main differences between the ReadPhoto method and the standard Read method are the result object it returns and the file formats it accepts. LoadImageFrame only accepts TIFF and GIF; it does not support formats such as JPEG, for several reasons.
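
One way to act on this distinction is to choose the method from the file extension before loading anything. The helper below is hypothetical and not part of IronOCR: the class name, method name, and extension list are assumptions that simply encode the TIFF/GIF restriction described above.

```csharp
using System;
using System.IO;

public static class OcrMethodSelector
{
    // Hypothetical helper: returns true when the file extension is one of
    // the formats (TIFF, GIF) that this article says LoadImageFrame accepts.
    public static bool ShouldUseReadPhoto(string path)
    {
        string extension = Path.GetExtension(path).ToLowerInvariant();
        return extension == ".tif" || extension == ".tiff" || extension == ".gif";
    }
}
```

When the helper returns true, the file can be loaded with LoadImageFrame and passed to ReadPhoto; otherwise the application can fall back to the standard Read method.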

Comparison between TIFF and JPEG Images

TIFF is a typically lossless file format that is often used to combine multiple pages or frames into a single file. It is commonly used for high-quality, multi-image storage (for example, legal documents and medical images). It is considerably more complex than the JPEG format and therefore requires a different method to fully extract text from it.

Furthermore, TIFF images use a different compression method, so IronOCR has to use a specialized method to decipher the text.

Here's a further breakdown between TIFF and JPEG for comparison.

| Feature | TIFF (Tagged Image File Format) | JPG/JPEG (Joint Photographic Experts Group) |
|---|---|---|
| Compression | Lossless or uncompressed (preserves quality) | Lossy compression (reduces quality for smaller file size) |
| File Size | Large (due to high quality and optional lack of compression) | Smaller, optimized for web use and fast loading |
| Image Quality | High (ideal for professional use, retains all details) | Lower (due to lossy compression, some quality is sacrificed) |
| Color Depth | Supports high color depth (up to 16-bit or 32-bit per channel) | 24-bit color (16.7 million colors) |
| Use Case | Professional photography, publishing, scanning, archiving | Web images, social media, everyday photos |
| Transparency | Supports transparency and alpha channels | Does not support transparency |
| Editing | Good for multiple edits (no quality loss with resaving) | Quality degrades with repeated edits and saves |
| Compatibility | Widely supported in professional software | Universally supported across all platforms and devices |
| Animation | Does not support animation | Does not support animation |
| Metadata | Stores extensive metadata (EXIF, layers, etc.) | Stores EXIF metadata but is more limited |

Different scenarios

Developers need to weigh each production use case to keep their applications running efficiently. Although ReadPhoto is well suited to complex images such as the TIFF above, processing is comparatively slow. JPEG images may be lower in quality, but the operation is generally faster. In either case, poor image quality, such as noise, lowers the OCR confidence.

The Confidence property on OcrPhotoResult, or on any class that implements the IOcrResult interface, indicates how accurate the results are, allowing developers to test, iterate, and optimize as needed.
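
A simple way to use the confidence value is to gate results against a minimum threshold. The sketch below is illustrative only: the ConfidenceGate class and the 0.85 default are assumptions, not part of IronOCR, and it assumes the 0 to 1 confidence scale described earlier in this article.

```csharp
using System;

public static class ConfidenceGate
{
    // Assumed cut-off; tune it against your own documents.
    public const double DefaultThreshold = 0.85;

    // Returns true when the reported confidence (0 to 1) meets the threshold.
    public static bool IsAcceptable(double confidence, double threshold = DefaultThreshold)
    {
        if (threshold < 0 || threshold > 1)
        {
            throw new ArgumentOutOfRangeException(nameof(threshold), "Threshold must be between 0 and 1.");
        }

        return confidence >= threshold;
    }
}
```

Passing result.Confidence to IsAcceptable lets an application route low-confidence scans to preprocessing or manual review instead of accepting them silently.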

Developers should strike a balance between efficiency and accuracy, ensuring that input images meet a minimum quality threshold for consistent results.

Frequently Asked Questions

What is the ReadPhoto method in C#?

The ReadPhoto method in IronOCR for C# is designed to extract text from complex image formats such as TIFF and GIF, converting them into searchable digital data through Optical Character Recognition (OCR).

Why should I use ReadPhoto instead of the standard Read function?

ReadPhoto is optimized for complex image formats such as TIFF and GIF, which require special processing because of their compression and quality characteristics. This makes it better suited to these image types than the standard Read function.

How can I ensure optimal text extraction using OCR in C#?

To ensure optimal text extraction with OCR in C#, consider the quality and format of the image. Using IronOCR's ReadPhoto method for complex, high-quality formats such as TIFF can improve accuracy and efficiency.

Which image formats does the ReadPhoto method support?

The ReadPhoto method in IronOCR supports complex image formats such as TIFF and GIF, which are ideal for high-quality text extraction tasks.

What are the benefits of converting TIFF files using OCR?

Converting TIFF files using OCR with IronOCR's ReadPhoto method turns high-quality images into searchable, editable digital data, which is beneficial for document management and archiving.

How does OCR technology improve document processing?

OCR technology automates the conversion of text in images into digital data, significantly increasing processing speed and accuracy, especially when managing large volumes of documents.

What factors influence the choice of image processing method in OCR?

The factors include the format and quality of the image, the processing speed, and the specific requirements of the use case. IronOCR's ReadPhoto is ideal for complex, high-quality images, while other methods may be more efficient for simpler formats.

Can IronOCR's ReadPhoto method be used for color images?

Yes, IronOCR's ReadPhoto method can process color images, particularly in formats such as TIFF and GIF, allowing accurate text extraction from full-color documents.

What role does the Confidence property play in OCR results?

The Confidence property in OCR results provides a statistical measure of the accuracy of the text extraction, helping developers assess the reliability of the digitized data.

How can developers use the OCR results from ReadPhoto in their applications?

Developers can use the OCR results from IronOCR's ReadPhoto by accessing the OcrPhotoResult object, which includes the extracted text, confidence scores, and text regions, enabling further data processing and integration into applications.

Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in Computer Science (Carleton University) and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, he enjoys working with modern frameworks and creating manuals that are well ...
