C# + VB.Net: PDF OCR
var Ocr = new IronOcr.AutoOcr();
var Results = Ocr.ReadPdf(@"C:\Users\Me\Desktop\Invoice.pdf");

var Barcodes = Results.Barcodes;
var Text = Results.Text;

' VB.Net equivalent
Dim Ocr = New IronOcr.AutoOcr()
Dim Results = Ocr.ReadPdf("C:\Users\Me\Desktop\Invoice.pdf")

Dim Barcodes = Results.Barcodes
Dim Text = Results.Text

IronOCR can read many image formats, and also PDF documents, using either the AutoOcr or the AdvancedOcr class.

AutoOcr can automatically detect the characteristics of each PDF and apply a best-guess set of OCR settings to that document.

Developers may choose to read an entire PDF, a selection of pages, or a single crop area.

C# + VB.Net: Advanced OCR
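using System;
using System.Linq;
using IronOcr;

// Configure every OCR setting explicitly
var Ocr = new AdvancedOcr()
{
    CleanBackgroundNoise = true,
    EnhanceContrast = true,
    EnhanceResolution = true,
    Language = IronOcr.Languages.English.OcrLanguagePack,
    Strategy = IronOcr.AdvancedOcr.OcrStrategy.Advanced,
    ColorSpace = AdvancedOcr.OcrColorSpace.Color,
    DetectWhiteTextOnDarkBackgrounds = true,
    InputImageType = AdvancedOcr.InputTypes.AutoDetect,
    RotateAndStraighten = true,
    ReadBarCodes = true,
    ColorDepth = 4
};

var testImage = @"C:\path\to\scan.tiff";

var Results = Ocr.Read(testImage);

// Collect the decoded value of each barcode found in the scan
var Barcodes = Results.Barcodes.Select(b => b.Value);

Console.WriteLine("Barcodes:" + String.Join(",", Barcodes));

' VB.Net equivalent
Imports System
Imports System.Linq
Imports IronOcr

' Configure every OCR setting explicitly
Dim Ocr = New AdvancedOcr() With {
    .CleanBackgroundNoise = True,
    .EnhanceContrast = True,
    .EnhanceResolution = True,
    .Language = IronOcr.Languages.English.OcrLanguagePack,
    .Strategy = IronOcr.AdvancedOcr.OcrStrategy.Advanced,
    .ColorSpace = AdvancedOcr.OcrColorSpace.Color,
    .DetectWhiteTextOnDarkBackgrounds = True,
    .InputImageType = AdvancedOcr.InputTypes.AutoDetect,
    .RotateAndStraighten = True,
    .ReadBarCodes = True,
    .ColorDepth = 4
}

Dim testImage = "C:\path\to\scan.tiff"
Dim Results = Ocr.Read(testImage)

' Collect the decoded value of each barcode found in the scan
Dim Barcodes = Results.Barcodes.Select(Function(b) b.Value)
Console.WriteLine("Barcodes:" & String.Join(",", Barcodes))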

The AdvancedOcr class gives C# and .NET developers granular control when adding OCR (image and PDF to text) functionality to their applications, and lets them fine-tune performance for their own specific use case.

By adjusting its settings and testing against real-world examples, developers can find a suitable balance between speed and accuracy. Settings include: CleanBackgroundNoise, EnhanceContrast, EnhanceResolution, Language, Strategy, RotateAndStraighten, ColorSpace, DetectWhiteTextOnDarkBackgrounds, InputImageType.

There is also the option (ReadBarCodes) to automatically read barcodes and QR codes in scanned documents.

C# + VB.Net: AutoOcr
using System;
using IronOcr;
var Ocr = new AutoOcr();
var Result = Ocr.Read(@"C:\path\to\image.png");

' VB.Net equivalent
Imports System
Imports IronOcr

Dim Ocr = New AutoOcr()
Dim Result = Ocr.Read("C:\path\to\image.png")

IronOCR is unique in its ability to automatically detect and read text from imperfectly scanned images and PDF documents. The AutoOcr class provides the simplest (though not always the fastest) way to extract text from images and documents, because it automatically corrects and sharpens low-resolution scans, removes background noise, skew, distortion and perspective, and enhances resolution and contrast.

See also the AdvancedOcr class for more granular developer control.

C# + VB.Net: PDF Advanced
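using IronOcr;

// Most image clean-up is switched off here in favour of the Fast strategy
var Ocr = new AdvancedOcr()
{
    CleanBackgroundNoise = false,
    ColorDepth = 4,
    ColorSpace = AdvancedOcr.OcrColorSpace.Color,
    EnhanceContrast = false,
    DetectWhiteTextOnDarkBackgrounds = false,
    RotateAndStraighten = false,
    Language = IronOcr.Languages.English.OcrLanguagePack,
    EnhanceResolution = false,
    InputImageType = AdvancedOcr.InputTypes.Document,
    ReadBarCodes = true,
    Strategy = AdvancedOcr.OcrStrategy.Fast
};

// Read only the selected pages of the PDF
var PagesToRead = new[] { 1, 2, 3 };
var Results = Ocr.ReadPdf(@"C:\Users\Me\Desktop\Invoice.pdf", PagesToRead);
var Pages = Results.Pages;
var Barcodes = Results.Barcodes;
var FullPdfText = Results.Text;

' VB.Net equivalent
Imports IronOcr

' Most image clean-up is switched off here in favour of the Fast strategy
Dim Ocr = New AdvancedOcr() With {
    .CleanBackgroundNoise = False,
    .ColorDepth = 4,
    .ColorSpace = AdvancedOcr.OcrColorSpace.Color,
    .EnhanceContrast = False,
    .DetectWhiteTextOnDarkBackgrounds = False,
    .RotateAndStraighten = False,
    .Language = IronOcr.Languages.English.OcrLanguagePack,
    .EnhanceResolution = False,
    .InputImageType = AdvancedOcr.InputTypes.Document,
    .ReadBarCodes = True,
    .Strategy = AdvancedOcr.OcrStrategy.Fast
}

' Read only the selected pages of the PDF
Dim PagesToRead = {1, 2, 3}
Dim Results = Ocr.ReadPdf("C:\Users\Me\Desktop\Invoice.pdf", PagesToRead)
Dim Pages = Results.Pages
Dim Barcodes = Results.Barcodes
Dim FullPdfText = Results.Text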

IronOCR can read many image formats, and also PDF documents, using either the AutoOcr or the AdvancedOcr class.

Using the AdvancedOcr class to read a PDF gives granular control over PDF-to-text conversion and allows the developer to strike a balance between accuracy and speed.

Developers may choose to read an entire PDF, a selection of pages, or a single crop area.

C# + VB.Net: Intl. Languages
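using IronOcr;
using System;

// 19 Languages are currently supported by IronOCR
// To use them, please install the language packs (below) as required
var Ocr = new AdvancedOcr()
{
    Language = IronOcr.Languages.Arabic.OcrLanguagePack,
    ColorSpace = AdvancedOcr.OcrColorSpace.GrayScale,
    EnhanceResolution = true,
    EnhanceContrast = true,
    CleanBackgroundNoise = true,
    ColorDepth = 4,
    RotateAndStraighten = false,
    DetectWhiteTextOnDarkBackgrounds = false,
    ReadBarCodes = false,
    Strategy = AdvancedOcr.OcrStrategy.Fast,
    InputImageType = AdvancedOcr.InputTypes.Document
};

var results = Ocr.Read(@"path\to\arabic\document.png");

' VB.Net equivalent
Imports IronOcr
Imports System

Dim Ocr = New AdvancedOcr() With {
    .Language = IronOcr.Languages.Arabic.OcrLanguagePack,
    .ColorSpace = AdvancedOcr.OcrColorSpace.GrayScale,
    .EnhanceResolution = True,
    .EnhanceContrast = True,
    .CleanBackgroundNoise = True,
    .ColorDepth = 4,
    .RotateAndStraighten = False,
    .DetectWhiteTextOnDarkBackgrounds = False,
    .ReadBarCodes = False,
    .Strategy = AdvancedOcr.OcrStrategy.Fast,
    .InputImageType = AdvancedOcr.InputTypes.Document
}

Dim results = Ocr.Read("path\to\arabic\document.png")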

C# + VB.Net: Results Objects
using IronOcr;
using System;
using System.Collections.Generic;
using System.Drawing; //Add Assembly Reference

// We can delve deep into OCR results as an object model of
// Pages, Barcodes, Paragraphs, Lines, Words and Characters
var Ocr = new AdvancedOcr()
{
    Language = IronOcr.Languages.English.OcrLanguagePack,
    ColorSpace = AdvancedOcr.OcrColorSpace.GrayScale,
    EnhanceResolution = true,
    EnhanceContrast = true,
    CleanBackgroundNoise = true,
    ColorDepth = 4,
    RotateAndStraighten = false,
    DetectWhiteTextOnDarkBackgrounds = false,
    ReadBarCodes = true,
    Strategy = AdvancedOcr.OcrStrategy.Fast,
    InputImageType = AdvancedOcr.InputTypes.Document
};

var results = Ocr.Read(@"path\to\document.png");

foreach (var page in results.Pages)
{
    // page object
    int page_number = page.PageNumber;
    string page_text = page.Text;
    int page_wordcount = page.WordCount;
    List<OcrResult.OcrBarcode> barcodes = page.Barcodes;

    System.Drawing.Image page_image = page.Image;

    int page_width_px = page.Width;
    int page_height_px = page.Height;

    foreach (var paragraph in page.Paragraphs)
    {
        // pages -> paragraphs
        int paragraph_number = paragraph.ParagraphNumber;
        string paragraph_text = paragraph.Text;
        System.Drawing.Image paragraph_image = paragraph.Image;
        int paragraph_x_location = paragraph.X;
        int paragraph_y_location = paragraph.Y;
        int paragraph_width = paragraph.Width;
        int paragraph_height = paragraph.Height;
        double paragraph_ocr_accuracy = paragraph.Confidence;
        string paragraph_font_name = paragraph.FontName;
        double paragraph_font_size = paragraph.FontSize;
        OcrResult.TextFlow paragraph_text_direction = paragraph.TextDirection;
        double paragraph_rotation_degrees = paragraph.TextOrientation;

        foreach (var line in paragraph.Lines)
        {
            // pages -> paragraphs -> lines
            int line_number = line.LineNumber;
            string line_text = line.Text;
            System.Drawing.Image line_image = line.Image;
            int line_x_location = line.X;
            int line_y_location = line.Y;
            int line_width = line.Width;
            int line_height = line.Height;
            double line_ocr_accuracy = line.Confidence;
            double line_skew = line.BaselineAngle;
            double line_offset = line.BaselineOffset;

            foreach (var word in line.Words)
            {
                // pages -> paragraphs -> lines -> words
                int word_number = word.WordNumber;
                string word_text = word.Text;
                System.Drawing.Image word_image = word.Image;
                int word_x_location = word.X;
                int word_y_location = word.Y;
                int word_width = word.Width;
                int word_height = word.Height;
                double word_ocr_accuracy = word.Confidence;
                string word_font_name = word.FontName;
                double word_font_size = word.FontSize;
                bool word_is_bold = word.FontIsBold;
                bool word_is_fixed_width_font = word.FontIsFixedWidth;
                bool word_is_italic = word.FontIsItalic;
                bool word_is_serif_font = word.FontIsSerif;
                bool word_is_underlined = word.FontIsUnderlined;

                foreach (var character in word.Characters)
                {
                    // pages -> paragraphs -> lines -> words -> characters
                    int character_number = character.CharacterNumber;
                    string character_text = character.Text;
                    System.Drawing.Image character_image = character.Image;
                    int character_x_location = character.X;
                    int character_y_location = character.Y;
                    int character_width = character.Width;
                    int character_height = character.Height;
                    double character_ocr_accuracy = character.Confidence;
                }
            }
        }
    }
}
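
' VB.Net equivalent
Imports IronOcr
Imports System
Imports System.Collections.Generic
Imports System.Drawing

' We can delve deep into OCR results as an object model of
' Pages, Barcodes, Paragraphs, Lines, Words and Characters
Dim Ocr = New AdvancedOcr() With {
    .Language = IronOcr.Languages.English.OcrLanguagePack,
    .ColorSpace = AdvancedOcr.OcrColorSpace.GrayScale,
    .EnhanceResolution = True,
    .EnhanceContrast = True,
    .CleanBackgroundNoise = True,
    .ColorDepth = 4,
    .RotateAndStraighten = False,
    .DetectWhiteTextOnDarkBackgrounds = False,
    .ReadBarCodes = True,
    .Strategy = AdvancedOcr.OcrStrategy.Fast,
    .InputImageType = AdvancedOcr.InputTypes.Document
}

Dim results = Ocr.Read("path\to\document.png")
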
For Each page In results.Pages
    ' page object
    Dim page_number As Integer = page.PageNumber
    Dim page_text As String = page.Text
    Dim page_wordcount As Integer = page.WordCount
    Dim barcodes As List(Of OcrResult.OcrBarcode) = page.Barcodes
    Dim page_image As System.Drawing.Image = page.Image
    Dim page_width_px As Integer = page.Width
    Dim page_height_px As Integer = page.Height
    For Each paragraph In page.Paragraphs
        ' pages -> paragraphs
        Dim paragraph_number As Integer = paragraph.ParagraphNumber
        Dim paragraph_text As String = paragraph.Text
        Dim paragraph_image As System.Drawing.Image = paragraph.Image
        Dim paragraph_x_location As Integer = paragraph.X
        Dim paragraph_y_location As Integer = paragraph.Y
        Dim paragraph_width As Integer = paragraph.Width
        Dim paragraph_height As Integer = paragraph.Height
        Dim paragraph_ocr_accuracy As Double = paragraph.Confidence
        Dim paragraph_font_name As String = paragraph.FontName
        Dim paragraph_font_size As Double = paragraph.FontSize
        Dim paragraph_text_direction As OcrResult.TextFlow = paragraph.TextDirection
        Dim paragraph_rotation_degrees As Double = paragraph.TextOrientation
        For Each line In paragraph.Lines
            ' pages -> paragraphs -> lines
            Dim line_number As Integer = line.LineNumber
            Dim line_text As String = line.Text
            Dim line_image As System.Drawing.Image = line.Image
            Dim line_x_location As Integer = line.X
            Dim line_y_location As Integer = line.Y
            Dim line_width As Integer = line.Width
            Dim line_height As Integer = line.Height
            Dim line_ocr_accuracy As Double = line.Confidence
            Dim line_skew As Double = line.BaselineAngle
            Dim line_offset As Double = line.BaselineOffset
            For Each word In line.Words
                ' pages -> paragraphs -> lines -> words
                Dim word_number As Integer = word.WordNumber
                Dim word_text As String = word.Text
                Dim word_image As System.Drawing.Image = word.Image
                Dim word_x_location As Integer = word.X
                Dim word_y_location As Integer = word.Y
                Dim word_width As Integer = word.Width
                Dim word_height As Integer = word.Height
                Dim word_ocr_accuracy As Double = word.Confidence
                Dim word_font_name As String = word.FontName
                Dim word_font_size As Double = word.FontSize
                Dim word_is_bold As Boolean = word.FontIsBold
                Dim word_is_fixed_width_font As Boolean = word.FontIsFixedWidth
                Dim word_is_italic As Boolean = word.FontIsItalic
                Dim word_is_serif_font As Boolean = word.FontIsSerif
                Dim word_is_underlined As Boolean = word.FontIsUnderlined
                For Each character In word.Characters
                    ' pages -> paragraphs -> lines -> words -> characters
                    Dim character_number As Integer = character.CharacterNumber
                    Dim character_text As String = character.Text
                    Dim character_image As System.Drawing.Image = character.Image
                    Dim character_x_location As Integer = character.X
                    Dim character_y_location As Integer = character.Y
                    Dim character_width As Integer = character.Width
                    Dim character_height As Integer = character.Height
                    Dim character_ocr_accuracy As Double = character.Confidence
                Next
            Next
        Next
    Next
Next

For each page it scans, IronOCR returns a detailed result object. As applicable, it exposes location, data, text, statistical confidence, font names, font sizes, font decoration and weight, rotation and position for each:

  • Page
  • Paragraph
  • Line of Text
  • Word
  • Individual Character
  • Barcode
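
The hierarchy above can also be flattened when only one level is of interest. The following is a minimal sketch, not a definitive implementation, that lists the words the engine was least confident about so they can be checked by hand. It uses only members that appear in the Results Objects example (Pages, Paragraphs, Lines, Words, Text, Confidence, X, Y). It assumes that these collections can be enumerated with LINQ, that a higher Confidence value means a more certain reading, and that the project targets C# 9 or later (for top-level statements). The count of 10 is an arbitrary value chosen for illustration.

```csharp
using System;
using System.Linq;
using IronOcr;

// Default settings; see the AdvancedOcr examples for tuning options.
var ocr = new AdvancedOcr();
var results = ocr.Read(@"path\to\document.png");

// Flatten pages -> paragraphs -> lines -> words into a single sequence of words.
// Assumption: each collection implements IEnumerable<T>, as the foreach loops in
// the Results Objects example suggest.
var words = results.Pages
    .SelectMany(page => page.Paragraphs)
    .SelectMany(paragraph => paragraph.Lines)
    .SelectMany(line => line.Words);

// Sort ascending by confidence and keep the first ten.
// Assumption: a lower Confidence value means a less certain reading.
var leastConfident = words
    .OrderBy(word => word.Confidence)
    .Take(10);

foreach (var word in leastConfident)
{
    Console.WriteLine($"{word.Text} (confidence {word.Confidence}) at x={word.X}, y={word.Y}");
}
```

Sorting rather than applying a fixed threshold avoids relying on the numeric scale of Confidence, which the examples above do not state.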

