How to Extract Text from Images in C#

C# OCR Image to Text Tutorial: Convert Images to Text Without Tesseract


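Want to convert images to text in C# without complex Tesseract configuration? This comprehensive IronOCR C# tutorial shows you how to add powerful optical character recognition to your .NET applications with just a few lines of code.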

Quick Start: Extract Text from an Image in One Line

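This example shows how easy IronOCR is to pick up: a single line of C# converts an image to text. It initializes the OCR engine and immediately reads and returns the text, with no complex setup.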

Get started with IronOCR using NuGet:

  1. Install IronOCR with the NuGet Package Manager

    PM > Install-Package IronOcr

  2. Copy and run this code.

    string text = new IronTesseract().Read("image.png").Text;
  3. Deploy to your production environment for testing


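How Do I Read Text from Images in a .NET Application?

To implement C# OCR image-to-text in a .NET application, you need a reliable OCR library. IronOCR provides a managed solution through the IronOcr.IronTesseract class, which maximizes accuracy and speed without external dependencies.

First, install IronOCR in your Visual Studio project. You can download the IronOCR DLL directly or use the NuGet Package Manager: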

Install-Package IronOcr

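Why Choose IronOCR for C# OCR Without Tesseract?

When you need to convert images to text in C#, IronOCR offers significant advantages over traditional Tesseract implementations:

  • Runs immediately in pure .NET: no Tesseract installation or configuration required
  • Runs the latest engine: Tesseract 5 (as well as Tesseract 4 and 3)
  • Compatible with .NET Framework 4.5+, .NET Standard 2+, .NET Core 2 and 3, and .NET 5, 6, 7, 8, 9, and 10
  • Improved accuracy and speed compared with vanilla Tesseract
  • Supports Xamarin, Mono, Azure, and Docker deployments
  • Manages complex Tesseract dictionaries through NuGet packages
  • Automatically handles PDFs, multi-frame TIFFs, and all major image formats
  • Corrects low-quality and skewed scans for the best results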


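How Do I Perform Basic OCR with IronOCR in C#?

This Iron Tesseract C# example demonstrates the simplest way to read text from an image with IronOCR. The IronOcr.IronTesseract class extracts the text and returns an OcrResult, whose Text property holds the recognized string.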

// Basic C# OCR image to text conversion using IronOCR
// This example shows how to extract text from images without complex setup

using IronOcr;
using System;

try
{
    // Initialize IronTesseract for OCR operations
    var ocrEngine = new IronTesseract();

    // Path to your image file - supports PNG, JPG, TIFF, BMP, and more
    var imagePath = @"img\Screenshot.png";

    // Create input and perform OCR to convert image to text
    using (var input = new OcrInput(imagePath))
    {
        // Read text from image and get results
        OcrResult result = ocrEngine.Read(input);

        // Display extracted text
        Console.WriteLine(result.Text);
    }
}
catch (OcrException ex)
{
    // Handle OCR-specific errors
    Console.WriteLine($"OCR Error: {ex.Message}");
}
catch (Exception ex)
{
    // Handle general errors
    Console.WriteLine($"Error: {ex.Message}");
}

On a clear image, this code extracts the text with 100% accuracy:

Figure: IronOCR Simple Example. The sample PNG contains this text:

In this simple example we test the accuracy of our C# OCR library to read text from a PNG Image. This is a very basic test, but things will get more complicated as the tutorial continues.

The quick brown fox jumps over the lazy dog

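The IronTesseract class handles the complex OCR work internally. It automatically checks alignment, optimizes resolution, and uses AI to read text from images with human-level accuracy.

Despite the processing that happens behind the scenes (image analysis, engine optimization, and intelligent text recognition), OCR runs at roughly the speed of a human reader while maintaining excellent accuracy.

Figure: IronOCR simple example converting an image to text in C# with 100% accuracy. The screenshot shows IronOCR extracting the text from the PNG image exactly.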

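How Do I Achieve Advanced C# OCR Without Configuring Tesseract?

For production applications that need the best performance when converting images to text in C#, use the OcrInput and IronTesseract classes together. This approach gives you fine-grained control over the OCR process.

OcrInput Class Features

  • Supports many image formats: JPEG, TIFF, GIF, BMP, PNG
  • Imports entire PDF files or specific pages
  • Automatically enhances contrast, resolution, and image quality
  • Corrects rotation, scan noise, skew, and negative images

IronTesseract Class Features

  • 125+ prepackaged languages
  • Includes the Tesseract 5, 4, and 3 engines
  • Document type hints (screenshot, code snippet, or full document)
  • Integrated barcode reading
  • Multiple output formats: searchable PDF, hOCR HTML, DOM objects, and strings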

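How Do I Get Started with OcrInput and IronTesseract?

Here is the recommended configuration for this IronOCR C# tutorial. It works well for most document types:

using System;
using IronOcr;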

// Initialize IronTesseract for advanced OCR operations
IronTesseract ocr = new IronTesseract();

// Create input container for processing multiple images
using (OcrInput input = new OcrInput())
{
    // Process specific pages from multi-page TIFF files
    int[] pageIndices = new int[] { 1, 2 };

    // Load TIFF frames - perfect for scanned documents
    input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

    // Execute OCR to read text from image using IronOCR
    OcrResult result = ocr.Read(input);

    // Output the extracted text
    Console.WriteLine(result.Text);
}

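This configuration consistently achieves near-perfect accuracy on medium-quality scans. The LoadImageFrames method handles multi-page documents efficiently, which makes it well suited to batch processing.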


Figure: Sample multi-page TIFF containing Harry Potter text, used to demonstrate IronOCR's multi-page text extraction.

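IronOCR's ability to read text and barcodes from scanned documents such as TIFFs shows how it simplifies complex OCR tasks. The library performs well on real-world documents and handles both multi-page TIFFs and PDF text extraction.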

How Does IronOCR Handle Low-Quality Scans?


Figure: Low-quality scan with digital noise. IronOCR uses image filters to read noisy, low-resolution documents accurately.

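IronOCR outperforms other C# OCR libraries on imperfect scans that contain distortion and digital noise. It is built for real-world scenarios rather than perfect test images.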

// Advanced Iron Tesseract C# example for low-quality images
using IronOcr;
using System;

var ocr = new IronTesseract();

try
{
    using (var input = new OcrInput())
    {
        // Load specific pages from poor-quality TIFF
        var pageIndices = new int[] { 0, 1 };
        input.LoadImageFrames(@"img\Potter.LowQuality.tiff", pageIndices);

        // Apply the deskew filter to correct rotation (skew) in the scanned pages
        input.Deskew(); // Critical for improving accuracy on skewed scans

        // Perform OCR with enhanced preprocessing
        OcrResult result = ocr.Read(input);

        // Display results
        Console.WriteLine("Recognized Text:");
        Console.WriteLine(result.Text);
    }
}
catch (Exception ex)
{
    Console.WriteLine($"Error during OCR: {ex.Message}");
}
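Deskew() estimates and removes this rotation for you. The following plain C# sketch makes the idea concrete and does not call IronOCR. It estimates the tilt of one text line from two points on its baseline, then derives the correcting rotation. The baseline coordinates are hypothetical. This is only one simple way to measure skew; the tutorial does not describe IronOCR's own deskew algorithm.

```csharp
using System;

// Estimate the tilt of a text line from two points on its baseline.
// Image coordinates: x grows to the right, y grows downward, so a positive
// angle means the line slopes down to the right (it looks rotated clockwise).
double SkewDegrees(double x1, double y1, double x2, double y2) =>
    Math.Atan2(y2 - y1, x2 - x1) * 180.0 / Math.PI;

// Hypothetical baseline endpoints measured on a scanned page.
double skew = SkewDegrees(100, 400, 1100, 417.5);
Console.WriteLine($"Estimated skew: {skew:F2} degrees");

// With the "positive = clockwise" convention described for OcrInput.Rotate,
// leveling the line means rotating back by the opposite angle.
double correction = -skew;
Console.WriteLine($"Rotate by {correction:F2} degrees to level the line");
```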

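With input.Deskew(), accuracy on the low-quality scan rises to 99.8%, nearly matching the high-quality result. This is why IronOCR is a strong choice for C# OCR without the complications of Tesseract.

Image filters add a little processing time, but the cleaner input can significantly shorten the overall OCR time. The right balance depends on the quality of your documents.

In most cases, input.Deskew() and input.DeNoise() reliably improve OCR results. Learn more about image preprocessing techniques.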
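The tutorial does not say how its accuracy percentages (100%, 99.8%) were measured. One common way to check accuracy on your own documents is character accuracy. It is 1 minus the edit distance between the OCR output and a hand-typed ground truth, divided by the length of the ground truth. The plain C# sketch below assumes that metric, which is not necessarily how the tutorial's figures were obtained. The misread sample string is hypothetical.

```csharp
using System;

// Classic Levenshtein edit distance: insertions, deletions, and substitutions each cost 1.
int EditDistance(string a, string b)
{
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // prev now holds row i
    }
    return prev[b.Length];
}

// Character accuracy = 1 - (edit distance / reference length), floored at 0.
double CharacterAccuracy(string reference, string recognized)
{
    if (reference.Length == 0) return recognized.Length == 0 ? 1.0 : 0.0;
    double errorRate = (double)EditDistance(reference, recognized) / reference.Length;
    return Math.Max(0.0, 1.0 - errorRate);
}

string groundTruth = "The quick brown fox jumps over the lazy dog";
string ocrOutput = "The quick brown f0x jumps over the lazy dog"; // one hypothetical misread
Console.WriteLine($"Character accuracy: {CharacterAccuracy(groundTruth, ocrOutput):P1}");
```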

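How Do I Optimize OCR Performance and Speed?

When converting images to text in C#, the most important factor in OCR speed is input quality. Higher resolution (around 200 DPI) and minimal noise produce the fastest, most accurate results.

Although IronOCR excels at correcting imperfect documents, that enhancement requires additional processing time.

Choose image formats with minimal compression artifacts. TIFF and PNG usually OCR faster than JPEG because they contain less digital noise.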
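The ~200 DPI guideline is easier to reason about in pixels. A typographic point is 1/72 inch, so a font's nominal size in pixels is points / 72 × DPI. The plain C# sketch below shows how few pixels small text gets at low resolution, and the upscale factor needed to reach a target DPI. The 10 pt font size and the 96 DPI screenshot are hypothetical examples.

```csharp
using System;

// 1 typographic point = 1/72 inch, so a font's nominal size in pixels scales linearly with DPI.
double PixelsForPoints(double points, double dpi) => points / 72.0 * dpi;

// Upscale factor needed to bring a scan from its current DPI to a target DPI.
double ScaleFactor(double currentDpi, double targetDpi) => targetDpi / currentDpi;

double fontPt = 10; // hypothetical body-text size
foreach (double sampleDpi in new[] { 72.0, 150.0, 200.0, 300.0 })
{
    Console.WriteLine($"{fontPt} pt at {sampleDpi} DPI is about {PixelsForPoints(fontPt, sampleDpi):F1} px");
}

// A hypothetical 96 DPI screenshot needs roughly a 2.08x upscale to reach about 200 DPI.
Console.WriteLine($"Scale factor 96 -> 200 DPI: {ScaleFactor(96, 200):F2}");
```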

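Which Image Filters Improve OCR Speed?

The following filters can significantly improve performance in C# OCR image-to-text workflows:

  • OcrInput.Rotate(double degrees): rotates the image clockwise (negative values rotate counterclockwise)
  • OcrInput.Binarize(): converts to black and white, which improves results in low-contrast scenarios
  • OcrInput.ToGrayScale(): converts to grayscale to improve speed
  • OcrInput.Contrast(): automatically adjusts contrast to improve accuracy
  • OcrInput.DeNoise(): removes digital artifacts; use it when noise is expected
  • OcrInput.Invert(): inverts colors for white text on a black background
  • OcrInput.Dilate(): expands text boundaries
  • OcrInput.Erode(): shrinks text boundaries
  • OcrInput.Deskew(): corrects alignment; essential for skewed documents
  • OcrInput.DeepCleanBackgroundNoise(): aggressive background noise removal
  • OcrInput.EnhanceResolution(): improves the quality of low-resolution images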
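The list above describes what each IronOCR filter does, not how it works. As a conceptual illustration only, the sketch below applies grayscale conversion, fixed-threshold binarization, and inversion to a few raw pixel values. It is plain C# and is not IronOCR's implementation, which may use adaptive techniques. The BT.601 luma weights and the threshold of 128 are assumptions chosen for the example.

```csharp
using System;

// Luma weights from ITU-R BT.601, a common choice for RGB -> grayscale.
byte ToGray(byte r, byte g, byte b) => (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

// Global fixed-threshold binarization: darker than the threshold -> black (0), otherwise white (255).
byte[] Binarize(byte[] pixels, byte threshold)
{
    var result = new byte[pixels.Length];
    for (int i = 0; i < pixels.Length; i++)
        result[i] = pixels[i] < threshold ? (byte)0 : (byte)255;
    return result;
}

// Inversion turns white-on-black text into black-on-white.
byte[] Invert(byte[] pixels)
{
    var result = new byte[pixels.Length];
    for (int i = 0; i < pixels.Length; i++)
        result[i] = (byte)(255 - pixels[i]);
    return result;
}

// A hypothetical strip of RGB pixels: dark text, mid gray, light background, white.
var rgb = new (byte R, byte G, byte B)[] { (20, 20, 20), (120, 120, 120), (200, 200, 200), (255, 255, 255) };
var gray = Array.ConvertAll(rgb, p => ToGray(p.R, p.G, p.B));
var binary = Binarize(gray, 128);
Console.WriteLine($"Binarized: {string.Join(", ", binary)}");
Console.WriteLine($"Inverted:  {string.Join(", ", Invert(binary))}");
```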

How Do I Configure IronOCR for Maximum Speed?

Use the following settings to optimize speed when processing high-quality scans:

using System;
using IronOcr;

// Configure for speed - ideal for clean documents
IronTesseract ocr = new IronTesseract();

// Exclude symbols that never appear in these documents so they cannot be misrecognized
ocr.Configuration.BlackListCharacters = "~`$#^*_{[]}|\\";

// Use automatic page segmentation
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.Auto;

// Select fast English language pack
ocr.Language = OcrLanguage.EnglishFast;

using (OcrInput input = new OcrInput())
{
    // Load specific pages from document
    int[] pageIndices = new int[] { 1, 2 };
    input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

    // Read with optimized settings
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

Compared with the default settings, this optimized configuration maintains 99.8% accuracy while running 35% faster.
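A claim like "35% faster" can mean two different things: 35% less time per document, or 35% more documents per unit of time. If you benchmark configurations yourself, the plain C# sketch below computes both figures and times a delegate with System.Diagnostics.Stopwatch. The timings are hypothetical, and Thread.Sleep is only a stand-in for an ocr.Read(input) call.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Fraction of run time saved relative to a baseline (0.35 means 35% less time).
double TimeReduction(double baselineMs, double optimizedMs) => (baselineMs - optimizedMs) / baselineMs;

// Extra throughput relative to a baseline (0.35 means 35% more documents per second).
double ThroughputGain(double baselineMs, double optimizedMs) => baselineMs / optimizedMs - 1.0;

// Average duration of an action in milliseconds, after one untimed warm-up run.
double AverageMs(Action action, int runs)
{
    action();
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++) action();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / runs;
}

// Hypothetical timings: 1000 ms with default settings, 650 ms with tuned settings.
Console.WriteLine($"Time reduction: {TimeReduction(1000, 650):P0}");
Console.WriteLine($"Throughput gain: {ThroughputGain(1000, 650):P0}");

// Stand-in workload; in practice the delegate would call ocr.Read(input).
double averageMs = AverageMs(() => Thread.Sleep(1), runs: 3);
Console.WriteLine($"Average: {averageMs:F1} ms per run");
```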

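How Do I Read a Specific Region of an Image with C# OCR?

The following Iron Tesseract C# example shows how to target a specific region with System.Drawing.Rectangle. This technique is especially valuable for standardized forms in which text appears in predictable locations.

Can IronOCR Process Cropped Regions for Faster Results?

By specifying pixel coordinates, you can restrict OCR to a specific area, which significantly improves speed and prevents unwanted text from being extracted:

using System;
using IronOcr;
using IronSoftware.Drawing;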

// Initialize OCR engine for targeted region processing
var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Define exact region for OCR - coordinates in pixels
    var contentArea = new System.Drawing.Rectangle(
        x: 215, 
        y: 1250, 
        width: 1335, 
        height: 280
    );

    // Load image with specific area - perfect for forms and invoices
    input.AddImage("img/ComSci.png", contentArea);

    // Process only the defined region
    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}

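This targeted approach is 41% faster and extracts only the relevant text. It is ideal for structured documents such as invoices, checks, and forms. The same cropping technique also works for PDF OCR operations.

Figure: Computer science document demonstrating precise, region-based text extraction with IronOCR's rectangle selection.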
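Part of the speed gain is intuitive: the engine only has to examine the pixels inside the rectangle. The plain C# sketch below uses System.Drawing.Rectangle, the same type as the region example above. It clamps a requested region to the image bounds and reports what fraction of the page remains. The 1700 × 2200 page size (a US Letter page at 200 DPI) is a hypothetical assumption, because the tutorial does not give the dimensions of ComSci.png.

```csharp
using System;
using System.Drawing;

// Clamp a requested OCR region to the image bounds; returns Rectangle.Empty when they are fully disjoint.
Rectangle ClampToImage(Rectangle region, int imageWidth, int imageHeight) =>
    Rectangle.Intersect(region, new Rectangle(0, 0, imageWidth, imageHeight));

// Fraction of the page's pixels that remain inside the region.
double PixelFraction(Rectangle region, int imageWidth, int imageHeight) =>
    (double)region.Width * region.Height / ((double)imageWidth * imageHeight);

// Hypothetical page size (8.5 x 11 inches at 200 DPI); the region matches the example above.
int pageWidth = 1700, pageHeight = 2200;
Rectangle contentArea = ClampToImage(new Rectangle(215, 1250, 1335, 280), pageWidth, pageHeight);
Console.WriteLine($"Region {contentArea} covers {PixelFraction(contentArea, pageWidth, pageHeight):P1} of the page");

// A region that runs past the right edge is trimmed rather than left out of bounds.
Rectangle trimmed = ClampToImage(new Rectangle(1600, 100, 300, 50), pageWidth, pageHeight);
Console.WriteLine($"Trimmed region: {trimmed}");
```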

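How Many Languages Does IronOCR Support?

IronOCR supports 125 international languages through convenient language packs. You can download them as DLLs from our website or install them through the NuGet Package Manager.

Install a language pack through the NuGet interface (search for "IronOcr.Languages"), or browse the full list of language packs.

Supported languages include Arabic, Chinese (Simplified and Traditional), Japanese, Korean, Hindi, Russian, German, French, Spanish, and 115 others, each optimized for accurate text recognition.

How Do I Implement Multilingual OCR?

This IronOCR C# tutorial example demonstrates Arabic text recognition:

Install-Package IronOcr.Languages.Arabic
Figure: IronOCR accurately extracting Arabic text from a GIF image, demonstrating its multilingual OCR support.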

// Install-Package IronOcr.Languages.Arabic
using IronOcr;

// Configure for Arabic language OCR
var ocr = new IronTesseract();
ocr.Language = OcrLanguage.Arabic;

using (var input = new OcrInput())
{
    // Load Arabic text image
    input.AddImage("img/arabic.gif");

    // IronOCR handles low-quality Arabic text that standard Tesseract cannot
    var result = ocr.Read(input);

    // Save to file (console may not display Arabic correctly)
    result.SaveAsTextFile("arabic.txt");
}

Can IronOCR Handle Multilingual Documents?

When a document contains more than one language, configure IronOCR with multiple languages:

Install-Package IronOcr.Languages.ChineseSimplified
// Multi-language OCR configuration
using IronOcr;

var ocr = new IronTesseract();

// Set primary language
ocr.Language = OcrLanguage.ChineseSimplified;

// Add secondary languages as needed
ocr.AddSecondaryLanguage(OcrLanguage.English);

// Custom .traineddata files can be added for specialized recognition
// ocr.AddSecondaryLanguage("path/to/custom.traineddata");

using (var input = new OcrInput())
{
    // Process multi-language document
    input.AddImage("img/MultiLanguage.jpeg");

    var result = ocr.Read(input);
    result.SaveAsTextFile("MultiLanguage.txt");
}

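How Do I Process Multi-Page Documents with C# OCR?

IronOCR seamlessly combines multiple pages or images into a single OcrResult. This enables powerful features such as creating searchable PDFs and extracting text from entire document sets.

Mix and match sources (images, TIFF frames, and PDF pages) in a single OCR operation:

// Multi-source document processing
using System;
using IronOcr;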

IronTesseract ocr = new IronTesseract();

using (OcrInput input = new OcrInput())
{
    // Add various image formats
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");

    // Process specific frames from multi-frame images
    int[] frameNumbers = { 1, 2 };
    input.AddImageFrames("image3.gif", frameNumbers);

    // Process all sources together
    OcrResult result = ocr.Read(input);

    // Verify page count
    Console.WriteLine($"{result.Pages.Count} Pages processed.");
}

Process the pages of a multi-frame TIFF file efficiently:

using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();

using (OcrInput input = new OcrInput())
{
    // Define pages to process (0-based indexing)
    int[] pageIndices = new int[] { 0, 1 };

    // Load specific TIFF frames
    input.LoadImageFrames("MultiFrame.Tiff", pageIndices);

    // Extract text from all frames
    OcrResult result = ocr.Read(input);

    Console.WriteLine(result.Text);
    Console.WriteLine($"{result.Pages.Count} Pages processed");
}

Extract text from a PDF file, including password-protected PDFs:

using System;
using IronOcr;

IronTesseract ocr = new IronTesseract();

using (OcrInput input = new OcrInput())
{
    try
    {
        // Load password-protected PDF if needed
        input.LoadPdf("example.pdf", "password");

        // Process entire document
        OcrResult result = ocr.Read(input);

        Console.WriteLine(result.Text);
        Console.WriteLine($"{result.Pages.Count} Pages recognized");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing PDF: {ex.Message}");
    }
}

How Do I Create Searchable PDFs from Images?

IronOCR excels at creating searchable PDFs, a key capability for database systems, SEO, and document accessibility.

using IronOcr;

IronTesseract ocr = new IronTesseract();

using (OcrInput input = new OcrInput())
{
    // Set document metadata
    input.Title = "Quarterly Report";

    // Combine multiple sources
    input.AddImage("image1.jpeg");
    input.AddImage("image2.png");

    // Add specific frames from animated images
    int[] gifFrames = new int[] { 1, 2 };
    input.AddImageFrames("image3.gif", gifFrames);

    // Create searchable PDF
    OcrResult result = ocr.Read(input);
    result.SaveAsSearchablePdf("searchable.pdf");
}

Convert an existing PDF into a searchable version:

using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Set PDF metadata
    input.Title = "Annual Report 2024";

    // Process existing PDF
    input.LoadPdf("example.pdf", "password");

    // Generate searchable version
    var result = ocr.Read(input);
    result.SaveAsSearchablePdf("searchable.pdf");
}

Apply the same technique to convert TIFF files:

using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Configure document properties
    input.Title = "Scanned Archive Document";

    // Select pages to process
    var pageIndices = new int[] { 1, 2 };
    input.LoadImageFrames("example.tiff", pageIndices);

    // Create searchable PDF from TIFF
    OcrResult result = ocr.Read(input);
    result.SaveAsSearchablePdf("searchable.pdf");
}

How Do I Export OCR Results as hOCR HTML?

IronOCR supports hOCR HTML export, which enables structured PDF-to-HTML and TIFF-to-HTML conversion while preserving layout information:

using IronOcr;

var ocr = new IronTesseract();

using (var input = new OcrInput())
{
    // Set HTML title
    input.Title = "Document Archive";

    // Process multiple document types
    input.AddImage("image2.jpeg");
    input.AddPdf("example.pdf", "password");

    // Add TIFF pages
    var pageIndices = new int[] { 1, 2 };
    input.AddTiff("example.tiff", pageIndices);

    // Export as HOCR with position data
    OcrResult result = ocr.Read(input);
    result.SaveAsHocrFile("hocr.html");
}
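hOCR stores position data in the title attribute of each element. It uses properties such as bbox x0 y0 x1 y1 (pixel coordinates of the top-left and bottom-right corners) and, for words, x_wconf (recognition confidence). The plain C# sketch below reads those two properties back from a title string. It assumes the exported hocr.html uses these standard hOCR properties, so inspect your own output to confirm. The sample title is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Parse "bbox x0 y0 x1 y1" and the optional "x_wconf N" from an hOCR title attribute.
// Returns null when the title has no bbox property.
(int X0, int Y0, int X1, int Y1, int? Confidence)? ParseHocrTitle(string title)
{
    Match bbox = Regex.Match(title, @"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)");
    if (!bbox.Success) return null;

    int Coord(int group) => int.Parse(bbox.Groups[group].Value, CultureInfo.InvariantCulture);

    Match wconf = Regex.Match(title, @"\bx_wconf\s+(\d+)");
    int? confidence = wconf.Success
        ? int.Parse(wconf.Groups[1].Value, CultureInfo.InvariantCulture)
        : (int?)null;

    return (Coord(1), Coord(2), Coord(3), Coord(4), confidence);
}

// Hypothetical title attribute of a word element (class="ocrx_word").
string sampleTitle = "bbox 215 1250 402 1288; x_wconf 93";
var word = ParseHocrTitle(sampleTitle);
if (word.HasValue)
{
    var w = word.Value;
    Console.WriteLine($"Box ({w.X0},{w.Y0})-({w.X1},{w.Y1}), width {w.X1 - w.X0} px, confidence {w.Confidence}");
}
```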

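Can IronOCR Read Barcodes and Text at the Same Time?

IronOCR uniquely combines text recognition with barcode reading, so no separate library is needed:

// Enable combined text and barcode recognition
using System;
using IronOcr;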

var ocr = new IronTesseract();

// Enable barcode detection
ocr.Configuration.ReadBarCodes = true;

using (var input = new OcrInput())
{
    // Load image containing both text and barcodes
    input.AddImage("img/Barcode.png");

    // Process both text and barcodes
    var result = ocr.Read(input);

    // Extract barcode data
    foreach (var barcode in result.Barcodes)
    {
        Console.WriteLine($"Barcode Value: {barcode.Value}");
        Console.WriteLine($"Type: {barcode.Type}, Location: {barcode.Location}");
    }
}

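How Do I Get Detailed OCR Results and Metadata?

The IronOCR result object provides comprehensive data that advanced developers can use to build sophisticated applications.

Each OcrResult contains hierarchical collections: pages, paragraphs, lines, words, and characters. Every element carries detailed metadata such as position, font information, and confidence scores.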
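Every word carries a confidence score, so a common next step is to route low-confidence words to manual review. The sketch below is plain C#. The (Text, Confidence) tuples are hypothetical stand-ins for the word.Text and word.Confidence values you read while walking the result hierarchy. The sketch assumes confidence is reported on a 0 to 100 scale; verify this against the values your IronOCR version returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word-level results as (Text, Confidence) pairs, standing in for the
// word.Text and word.Confidence values read while walking result.Pages.
var words = new List<(string Text, double Confidence)>
{
    ("Harry", 96.2), ("Potter", 94.8), ("w0ke", 41.5), ("up", 88.0), ("~", 12.3)
};

// Split words into trusted output and candidates for manual review.
// Assumes confidence is on a 0-100 scale.
(List<string> Keep, List<string> Review) SplitByConfidence(
    IEnumerable<(string Text, double Confidence)> items, double threshold)
{
    var keep = new List<string>();
    var review = new List<string>();
    foreach (var (text, confidence) in items)
        (confidence >= threshold ? keep : review).Add(text);
    return (keep, review);
}

var (trusted, flagged) = SplitByConfidence(words, threshold: 80);
Console.WriteLine($"Trusted: {string.Join(" ", trusted)}");
Console.WriteLine($"Needs review: {string.Join(", ", flagged)}");
Console.WriteLine($"Mean confidence: {words.Average(w => w.Confidence):F1}");
```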

Individual elements (paragraphs, words, barcodes) can be exported as images or bitmaps for further processing:

using System;
using IronOcr;
using IronSoftware.Drawing;

// Configure with barcode support
IronTesseract ocr = new IronTesseract
{
    Configuration = { ReadBarCodes = true }
};

using OcrInput input = new OcrInput();

// Process multi-page document
int[] pageIndices = { 1, 2 };
input.LoadImageFrames(@"img\Potter.tiff", pageIndices);

OcrResult result = ocr.Read(input);

// Navigate the complete results hierarchy
foreach (var page in result.Pages)
{
    // Page-level data
    int pageNumber = page.PageNumber;
    string pageText = page.Text;
    int pageWordCount = page.WordCount;

    // Extract page elements
    OcrResult.Barcode[] barcodes = page.Barcodes;
    AnyBitmap pageImage = page.ToBitmap();
    double pageWidth = page.Width;
    double pageHeight = page.Height;

    foreach (var paragraph in page.Paragraphs)
    {
        // Paragraph properties
        int paragraphNumber = paragraph.ParagraphNumber;
        string paragraphText = paragraph.Text;
        double paragraphConfidence = paragraph.Confidence;
        var textDirection = paragraph.TextDirection;

        foreach (var line in paragraph.Lines)
        {
            // Line details including baseline information
            string lineText = line.Text;
            double lineConfidence = line.Confidence;
            double baselineAngle = line.BaselineAngle;
            double baselineOffset = line.BaselineOffset;

            foreach (var word in line.Words)
            {
                // Word-level data
                string wordText = word.Text;
                double wordConfidence = word.Confidence;

                // Font information (when available)
                if (word.Font != null)
                {
                    string fontName = word.Font.FontName;
                    double fontSize = word.Font.FontSize;
                    bool isBold = word.Font.IsBold;
                    bool isItalic = word.Font.IsItalic;
                }

                foreach (var character in word.Characters)
                {
                    // Character-level analysis
                    string charText = character.Text;
                    double charConfidence = character.Confidence;

                    // Alternative character choices for spell-checking
                    OcrResult.Choice[] alternatives = character.Choices;
                }
            }
        }
    }
}
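Every word carries its own confidence score, so the hierarchy above is also a convenient place to flag text for human review. The following is a minimal sketch, not an official IronOCR recipe. It assumes confidence values are percentages on a 0-100 scale. The 80-point threshold and the img\Screenshot.png path are illustrative choices, not recommendations.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage(@"img\Screenshot.png"); // replace with your own image

OcrResult result = ocr.Read(input);

// Illustrative threshold; assumes Confidence is reported as a percentage (0-100)
const double reviewThreshold = 80;

// Walk page -> paragraph -> line -> word and collect words below the threshold
var wordsToReview = new List<string>();
foreach (var page in result.Pages)
{
    foreach (var paragraph in page.Paragraphs)
    {
        foreach (var line in paragraph.Lines)
        {
            foreach (var word in line.Words)
            {
                if (word.Confidence < reviewThreshold)
                {
                    wordsToReview.Add($"page {page.PageNumber}: \"{word.Text}\" ({word.Confidence:F1})");
                }
            }
        }
    }
}

Console.WriteLine($"Overall confidence: {result.Confidence:F1}");
foreach (var entry in wordsToReview)
{
    Console.WriteLine($"Review {entry}");
}
```

You can then send the flagged words to a person, or check them against the character-level Choices collection from the metadata example.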

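Summary

IronOCR gives C# developers an advanced implementation of the Tesseract API that runs on Windows, Linux, and macOS. It reads text accurately from images, even imperfect documents, and that sets it apart from basic OCR solutions.

The library's distinguishing features include integrated barcode reading, which Tesseract itself does not provide. It can also export results as searchable PDF or hOCR HTML directly from .NET code.

Moving Forward

To continue mastering IronOCR:

Source Code Download

Ready to add C# OCR image-to-text conversion to your application? Download IronOCR and start your free trial today.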

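Frequently Asked Questions

How do I convert images to text in C# without Tesseract?

You can use IronOCR to convert images to text in C# without installing or configuring Tesseract yourself. IronOCR simplifies the process with built-in methods that handle image-to-text conversion directly.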

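How can I improve OCR accuracy on low-quality images?

IronOCR provides image filters such as Input.Deskew() and Input.DeNoise(). These correct skew and reduce noise, which can significantly improve OCR accuracy on low-quality images.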
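Here is a minimal sketch of applying both filters before recognition. The FAQ answer names Input.Deskew() and Input.DeNoise(). This sketch assumes those correspond to the Deskew() and DeNoise() methods on an OcrInput instance, called with their default settings. The scan path is hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage(@"img\SkewedScan.png"); // hypothetical low-quality scan

// Rotate the page back to horizontal, then remove speckle noise
input.Deskew();
input.DeNoise();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```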

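What are the steps for extracting text from multi-page documents?

IronOCR lets you load a PDF with the LoadPdf() method and process each of its pages, or process a multi-page TIFF file. In both cases, every page is converted to text.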
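For a PDF, the per-page results are available through result.Pages, the same collection used in the metadata example earlier. This is a minimal sketch that assumes a hypothetical multi-page file at docs\Report.pdf.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadPdf(@"docs\Report.pdf"); // hypothetical multi-page PDF

OcrResult result = ocr.Read(input);

// Print each page's text separately, labeled with its page number
foreach (var page in result.Pages)
{
    Console.WriteLine($"--- Page {page.PageNumber} ---");
    Console.WriteLine(page.Text);
}
```

For TIFF files, the LoadImageFrames() call in the metadata example loads selected frames the same way.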

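Can I read barcodes and text from an image at the same time?

Yes. IronOCR can read both text and barcodes from a single image. Set ocr.Configuration.ReadBarCodes = true to extract both text and barcode data.

How do I configure OCR for multilingual documents?

IronOCR supports more than 125 languages. Set the primary language with ocr.Language and add further languages with ocr.AddSecondaryLanguage() to process multilingual documents.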
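Here is a short sketch of a two-language setup. It assumes the French language pack, published as a separate NuGet package (IronOcr.Languages.French), is installed alongside IronOcr; English ships with the main package. The image path is hypothetical.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Primary language used for recognition
ocr.Language = OcrLanguage.English;

// Additional language; assumes the IronOcr.Languages.French package is installed
ocr.AddSecondaryLanguage(OcrLanguage.French);

using var input = new OcrInput();
input.LoadImage(@"img\BilingualNotice.png"); // hypothetical English/French image

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```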

What methods are available for exporting OCR results to different formats?

IronOCR provides several export methods: SaveAsSearchablePdf() for PDF, SaveAsTextFile() for plain text, and SaveAsHocrFile() for hOCR HTML.
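This minimal sketch writes one OCR result to all three formats. The output file names are arbitrary. If your IronOCR version needs extra configuration before it can render searchable PDF or hOCR output, check its API reference.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage(@"img\Screenshot.png");

OcrResult result = ocr.Read(input);

// One recognition result, three output formats
result.SaveAsSearchablePdf("Screenshot.pdf"); // the image plus a selectable text layer
result.SaveAsTextFile("Screenshot.txt");      // plain text
result.SaveAsHocrFile("Screenshot.html");     // hOCR HTML with word positions

Console.WriteLine("Exported searchable PDF, text, and hOCR files.");
```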

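How can I speed up OCR processing of large image files?

Use IronOCR's OcrLanguage.EnglishFast language model for faster recognition. You can also restrict OCR to a specific area defined with a System.Drawing.Rectangle to cut processing time.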
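This sketch combines both techniques. It assumes a recent IronOCR release in which LoadImage() accepts a CropRectangle from IronSoftware.Drawing, the namespace the metadata example already imports. Older releases took a System.Drawing.Rectangle, as the answer above describes. The file name and rectangle coordinates are hypothetical.

```csharp
using System;
using IronOcr;
using IronSoftware.Drawing;

var ocr = new IronTesseract();

// Lighter English model: faster, possibly at some cost in accuracy
ocr.Language = OcrLanguage.EnglishFast;

// Hypothetical region of interest in pixels; only this area is recognized
var region = new CropRectangle(x: 0, y: 0, width: 1200, height: 300);

using var input = new OcrInput();
input.LoadImage(@"img\LargeScan.png", region); // hypothetical large image

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```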

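How do I OCR password-protected PDF files?

Use the LoadPdf() method with the correct password. IronOCR handles image-based PDFs by automatically converting their pages to images for OCR.

What should I do if OCR results are inaccurate?

Try IronOCR's image-enhancement features, such as Input.Deskew() and Input.DeNoise(), and make sure the correct language pack is installed.

Can I customize the OCR process to exclude certain characters?

Yes. IronOCR lets you exclude specific characters with the BlackListCharacters property. This keeps recognition focused on relevant text, which can improve accuracy and speed.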
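Here is a minimal sketch of setting the blacklist before reading. It assumes the property lives on ocr.Configuration, alongside ReadBarCodes. The character set and image path are only examples.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Characters the engine should never output; this set is only an example
ocr.Configuration.BlackListCharacters = "~`#^*_{}[]|\\";

using var input = new OcrInput();
input.LoadImage(@"img\Invoice.png"); // hypothetical scanned document

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```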

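Jacob Mellor, Chief Technology Officer, Team Iron

Jacob Mellor is Chief Technology Officer at Iron Software and a pioneering engineer in C# PDF technology. He is the original author of Iron Software's core codebase and has shaped the company's product architecture since its inception. Together with CEO Cameron Rimington, he grew it into a company of more than 50 employees serving NASA, Tesla, and government agencies worldwide.

Jacob holds a first-class honours Bachelor of Engineering (BEng) in Civil Engineering from the University of Manchester (1998-2001). He founded his first software company in London in 1999 and built his first .NET component in 2005, focusing on solving complex problems within the Microsoft ecosystem.

His flagship IronPDF & Iron Suite .NET libraries have been installed more than 30 million times via NuGet worldwide, and his foundational code continues to power developer tools around the world. With 25 years of commercial experience and 41 years of coding expertise, Jacob remains focused on driving innovation in enterprise-grade C#, Java, and Python PDF technology while mentoring the next generation of technical leaders.

Reviewed by
Jeffrey T. Fritz
Principal Program Manager - .NET Community Team
Jeff is also a Principal Program Manager on the .NET and Visual Studio teams. He is executive producer of the .NET Conf virtual conference series and hosts "Fritz and Friends", a twice-weekly live stream where he discusses technology and writes code with viewers. Jeff writes workshops, gives presentations, and plans content for major Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.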