How to Highlight Text as Images in C# with IronOCR

Updated: April 21, 2026

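IronOCR's HighlightTextAndSaveAsImages method visualizes OCR results by drawing bounding boxes around detected text (characters, words, lines, or paragraphs) and saving them as diagnostic images, so developers can verify OCR accuracy and debug recognition problems.

Visualizing OCR results means rendering bounding boxes around the specific text elements the engine detected in an image. The process overlays clear highlights at the exact positions of individual characters, words, lines, or paragraphs, producing a map of what was recognized.

This visual feedback is essential for debugging and validating OCR output: it shows what the software recognized and where it went wrong. When you work with complex documents or troubleshoot recognition problems, visual highlighting becomes an indispensable diagnostic tool.

This article demonstrates IronOCR's diagnostic capabilities through the HighlightTextAndSaveAsImages method. The method highlights specific parts of the text and saves them as images for verification. Whether you are building a document processing system, implementing quality control measures, or validating your OCR implementation, it gives immediate visual feedback on what the OCR engine detected.

Quickstart: Highlight Words in a PDF Instantly

This snippet shows IronOCR in use: it loads a PDF, highlights every word in the document, and saves the result as images. A single line gives you visual feedback on the OCR results.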

Install with the NuGet Package Manager: https://www.nuget.org/packages/IronOcr
PM > Install-Package IronOcr

Copy and run this code.

new IronOcr.OcrInput().LoadPdf("document.pdf").HighlightTextAndSaveAsImages(new IronOcr.IronTesseract(), "highlight_page_", IronOcr.ResultHighlightType.Word);

Minimal workflow (4 steps)

Download the IronOCR C# library
Instantiate the OCR engine
Load the PDF document with LoadPdf
Use HighlightTextAndSaveAsImages to highlight sections of text and save them as images

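How do I highlight text and save it as images?

With IronOCR, highlighting text and saving it as images is straightforward. Load an existing PDF with LoadPdf, then call the HighlightTextAndSaveAsImages method to highlight sections of text and save them as images. This technique helps you verify OCR accuracy and debug text recognition problems in a document.

The HighlightTextAndSaveAsImages method accepts three parameters: the IronTesseract OCR engine, a prefix for the output file names, and a value from the ResultHighlightType enum that specifies which kind of text element to highlight. This example uses ResultHighlightType.Paragraph to mark blocks of text as paragraphs.

Note that this method takes the output prefix string and appends a page identifier to it for each page's image file name. For example, the prefix "highlight_page_" yields files such as "highlight_page_0" and "highlight_page_1".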
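
Because the output names are built from your prefix, one way to collect the generated images afterwards is to search for files that start with it. The following is a minimal sketch using only the .NET base class library. `HighlightOutput` and `Find` are hypothetical names introduced for illustration and are not part of IronOCR, and the sketch assumes the images were written to the current working directory.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical usage: list images written with the "highlight_page_" prefix
// into the current working directory.
foreach (string file in HighlightOutput.Find(Directory.GetCurrentDirectory(), "highlight_page_"))
{
    Console.WriteLine(file);
}

// Hypothetical helper, not part of IronOCR.
public static class HighlightOutput
{
    // Returns the full paths of files in 'directory' whose names start with 'prefix',
    // sorted by ordinal string comparison (so "..._10" sorts before "..._2").
    public static string[] Find(string directory, string prefix)
    {
        return Directory.GetFiles(directory, prefix + "*")
                        .OrderBy(path => path, StringComparer.Ordinal)
                        .ToArray();
    }
}
```

The wildcard pattern deliberately leaves the file extension open, since the note above only describes the prefix and the page identifier.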

The PDF used in this example contains three paragraphs.

What does the input PDF look like?

How do I implement the highlighting code?

The sample code below demonstrates a basic implementation using the OcrInput class.

using IronOcr;

IronTesseract ocrTesseract = new IronTesseract();

using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_page_", ResultHighlightType.Paragraph);

Imports IronOcr

Dim ocrTesseract As New IronTesseract()

Using ocrInput As New OcrInput()
    ocrInput.LoadPdf("document.pdf")
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_page_", ResultHighlightType.Paragraph)
End Using

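What does the output image show?

In the output image, all three paragraphs are highlighted with light red boxes. This visual representation helps developers quickly see how the OCR engine segmented the document into readable blocks.

What are the different ResultHighlightType options?

The example above uses ResultHighlightType.Paragraph to highlight blocks of text. IronOCR provides additional highlighting options through this enum. The full list of available types follows, and each serves a different diagnostic purpose.

Character: Draws a bounding box around every character the OCR engine detected. This is useful for debugging character recognition or specialized fonts, especially when you use custom language files.

Word: Highlights every complete word the engine recognized. This is ideal for verifying word boundaries and correct word recognition, especially when implementing barcode and QR reading alongside text recognition.

Line: Highlights every detected line of text. This is useful for documents with complex layouts that need line recognition verified, such as scanned documents.

Paragraph: Highlights entire blocks of text grouped as paragraphs. This is well suited to understanding document layout and verifying text block segmentation, and it is especially useful with table extraction.

How do I compare the different highlight types?

This more complete example generates every highlight type for the same document so that you can compare the results: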

using IronOcr;
using System;

// Initialize the OCR engine with custom configuration
IronTesseract ocrTesseract = new IronTesseract();

// Configure for better accuracy if needed
ocrTesseract.Configuration.ReadBarCodes = false; // Disable if not needed for performance
ocrTesseract.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;

// Load the PDF document
using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");

// Generate highlights for each type
Console.WriteLine("Generating character-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_character_", ResultHighlightType.Character);

Console.WriteLine("Generating word-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_word_", ResultHighlightType.Word);

Console.WriteLine("Generating line-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_line_", ResultHighlightType.Line);

Console.WriteLine("Generating paragraph-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_paragraph_", ResultHighlightType.Paragraph);

Console.WriteLine("All highlight images have been generated successfully!");

Imports IronOcr
Imports System

' Initialize the OCR engine with custom configuration
Dim ocrTesseract As New IronTesseract()

' Configure for better accuracy if needed
ocrTesseract.Configuration.ReadBarCodes = False ' Disable if not needed for performance
ocrTesseract.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd

' Load the PDF document
Using ocrInput As New OcrInput()
    ocrInput.LoadPdf("document.pdf")

    ' Generate highlights for each type
    Console.WriteLine("Generating character-level highlights...")
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_character_", ResultHighlightType.Character)

    Console.WriteLine("Generating word-level highlights...")
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_word_", ResultHighlightType.Word)

    Console.WriteLine("Generating line-level highlights...")
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_line_", ResultHighlightType.Line)

    Console.WriteLine("Generating paragraph-level highlights...")
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_paragraph_", ResultHighlightType.Paragraph)
End Using

Console.WriteLine("All highlight images have been generated successfully!")
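
The four explicit calls can also be written as a loop over the enum values. This is a sketch under stated assumptions rather than an official IronOCR sample: it assumes ResultHighlightType is a regular .NET enum whose members are the types listed earlier, it requires .NET 5 or later for `Enum.GetValues<T>()`, and the lowercase prefix scheme is chosen here only to mirror the prefixes used in the example.

```csharp
using System;
using IronOcr;

var ocrTesseract = new IronTesseract();

using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");

// Enum.GetValues<T>() requires .NET 5 or later.
foreach (ResultHighlightType type in Enum.GetValues<ResultHighlightType>())
{
    // Builds a prefix such as "highlight_word_" from the enum member name.
    string prefix = $"highlight_{type.ToString().ToLowerInvariant()}_";

    Console.WriteLine($"Generating {type} highlights...");
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, prefix, type);
}
```

If the enum ever gains members beyond the four described here, the loop would generate images for those as well, which may or may not be what you want.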

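How do I handle multi-page documents?

When processing multi-page PDFs or multi-frame TIFF files, the highlight feature automatically handles each page separately. This is especially useful when implementing PDF OCR text extraction workflows: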

using IronOcr;
using System;
using System.IO;

IronTesseract ocrTesseract = new IronTesseract();

// Load a multi-page document
using var ocrInput = new OcrInput();
ocrInput.LoadPdf("multi-page-document.pdf");

// Create output directory if it doesn't exist
string outputDir = "highlighted_pages";
Directory.CreateDirectory(outputDir);

// Generate highlights for each page
// Files will be named: highlighted_pages/page_0.png, page_1.png, etc.
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, 
    Path.Combine(outputDir, "page_"), 
    ResultHighlightType.Word);

// Count generated files for verification
int pageCount = Directory.GetFiles(outputDir, "page_*.png").Length;
Console.WriteLine($"Generated {pageCount} highlighted page images");

Imports IronOcr
Imports System.IO

Dim ocrTesseract As New IronTesseract()

' Load a multi-page document
Using ocrInput As New OcrInput()
    ocrInput.LoadPdf("multi-page-document.pdf")

    ' Create output directory if it doesn't exist
    Dim outputDir As String = "highlighted_pages"
    Directory.CreateDirectory(outputDir)

    ' Generate highlights for each page
    ' Files will be named: highlighted_pages/page_0.png, page_1.png, etc.
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, 
                                          Path.Combine(outputDir, "page_"), 
                                          ResultHighlightType.Word)

    ' Count generated files for verification
    Dim pageCount As Integer = Directory.GetFiles(outputDir, "page_*.png").Length
    Console.WriteLine($"Generated {pageCount} highlighted page images")
End Using

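What are the performance best practices?

Keep the following best practices in mind when using the highlight feature:

1. File size: Highlighted images can be large, especially for high-resolution documents. When processing large batches, account for the free space available in the output directory. For optimization tips, see our fast OCR configuration guide.

2. Performance: Generating highlights adds processing overhead. For production systems that only occasionally need highlighting, implement it as a separate diagnostic process rather than as part of the main workflow. Consider multithreaded OCR for batch processing.

3. Error handling: Always implement proper error handling around file operations: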

using IronOcr;
using System;

IronTesseract ocrTesseract = new IronTesseract();

try
{
    using var ocrInput = new OcrInput();
    ocrInput.LoadPdf("document.pdf");

    // Apply image filters if needed for better recognition
    ocrInput.Deskew(); // Correct slight rotations
    ocrInput.DeNoise(); // Remove background noise

    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_", ResultHighlightType.Word);
}
catch (Exception ex)
{
    Console.WriteLine($"Error during highlighting: {ex.Message}");
    // Log error details for debugging
}

Imports System

Try
    Using ocrInput As New OcrInput()
        ocrInput.LoadPdf("document.pdf")

        ' Apply image filters if needed for better recognition
        ocrInput.Deskew() ' Correct slight rotations
        ocrInput.DeNoise() ' Remove background noise

        ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_", ResultHighlightType.Word)
    End Using
Catch ex As Exception
    Console.WriteLine($"Error during highlighting: {ex.Message}")
    ' Log error details for debugging
End Try

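How does highlighting work with OCR results?

The highlight feature works alongside IronOCR's result objects, so you can correlate the visual highlights with the extracted text data. This is especially useful when you need to track OCR progress or verify specific parts of the recognized text. The OcrResult class provides detailed information about each detected element, and that information corresponds directly to the highlights this method draws.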
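
As a rough illustration of that correlation, the sketch below writes word-level highlight images and then prints each recognized word with its position and confidence, so the console output can be checked against the boxes in the images. The property names used here (`Pages`, `PageNumber`, `Words`, `Text`, `X`, `Y`, `Width`, `Height`, `Confidence`) are assumptions about the OcrResult object model rather than something this article confirms, so check them against the API reference for your IronOCR version.

```csharp
using System;
using IronOcr;

var ocrTesseract = new IronTesseract();

using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");

// Write the word-level highlight images first.
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_word_", ResultHighlightType.Word);

// Then read the same input to get the structured result.
OcrResult result = ocrTesseract.Read(ocrInput);

// Assumed object model: pages contain words, and each word exposes
// its text, bounding box, and confidence.
foreach (var page in result.Pages)
{
    foreach (var word in page.Words)
    {
        Console.WriteLine(
            $"Page {page.PageNumber}: \"{word.Text}\" at ({word.X}, {word.Y}), " +
            $"{word.Width}x{word.Height}, confidence {word.Confidence:F1}");
    }
}
```

Words reported with low confidence are a natural place to start when comparing the text output against the highlighted images.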

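What if I run into problems?

If you run into problems with the highlight feature, consult the general troubleshooting guide for common solutions. For issues specific to highlighting:

Blank output images: Make sure the input document contains readable text and that the OCR engine is configured correctly for your document type. You may need to apply image optimization filters or fix the image orientation to improve recognition.
Missing highlights: Some document types need specific preprocessing. Try applying image filters or fixing the image orientation to improve recognition.
Performance issues: For large documents, consider multithreading to speed up processing. If your input is of poor quality, also see our guide to fixing low-quality scans.

How do I use this for production debugging?

The highlight feature works well as a production debugging tool. Combined with abort tokens and timeouts for long-running operations, it lets you build a robust diagnostic system. Consider implementing a debug mode in your application: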

using IronOcr;
using System.IO;

public class OcrDebugger
{
    private readonly IronTesseract _tesseract;
    private readonly bool _debugMode;

    public OcrDebugger(bool enableDebugMode = false)
    {
        _tesseract = new IronTesseract();
        _debugMode = enableDebugMode;
    }

    public OcrResult ProcessDocument(string filePath)
    {
        using var input = new OcrInput();
        input.LoadPdf(filePath);

        // Apply preprocessing
        input.Deskew();
        input.DeNoise();

        // Generate debug highlights if in debug mode
        if (_debugMode)
        {
            string debugPath = $"debug_{Path.GetFileNameWithoutExtension(filePath)}_";
            input.HighlightTextAndSaveAsImages(_tesseract, debugPath, ResultHighlightType.Word);
        }

        // Perform actual OCR
        return _tesseract.Read(input);
    }
}

Imports IronOcr
Imports System.IO

Public Class OcrDebugger
    Private ReadOnly _tesseract As IronTesseract
    Private ReadOnly _debugMode As Boolean

    Public Sub New(Optional enableDebugMode As Boolean = False)
        _tesseract = New IronTesseract()
        _debugMode = enableDebugMode
    End Sub

    Public Function ProcessDocument(filePath As String) As OcrResult
        Using input As New OcrInput()
            input.LoadPdf(filePath)

            ' Apply preprocessing
            input.Deskew()
            input.DeNoise()

            ' Generate debug highlights if in debug mode
            If _debugMode Then
                Dim debugPath As String = $"debug_{Path.GetFileNameWithoutExtension(filePath)}_"
                input.HighlightTextAndSaveAsImages(_tesseract, debugPath, ResultHighlightType.Word)
            End If

            ' Perform actual OCR
            Return _tesseract.Read(input)
        End Using
    End Function
End Class

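Where should I go next?

Now that you know how to use the highlight feature, continue exploring:

Creating searchable PDFs from OCR results
Reading specific document types such as passports or licenses
Setting up IronOCR in your development environment with our getting started guide
Supporting 125 international languages for global applications
Optimizing image processing with the filter wizard

For production use, be sure to obtain a license to remove watermarks and unlock the full feature set.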

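Frequently Asked Questions

How do I visualize OCR results in my C# application?

IronOCR provides the HighlightTextAndSaveAsImages method, which visualizes OCR results by drawing bounding boxes around detected text elements (characters, words, lines, or paragraphs) and saving them as diagnostic images. This helps developers verify OCR accuracy and debug recognition problems.

What is the simplest way to highlight words in a PDF document?

With IronOCR, you can highlight the words in a PDF with a single line of code: new IronOcr.OcrInput().LoadPdf("document.pdf").HighlightTextAndSaveAsImages(new IronOcr.IronTesseract(), "highlight_page_", IronOcr.ResultHighlightType.Word). This loads the PDF and creates images with the words highlighted.

What parameters does the HighlightTextAndSaveAsImages method require?

The HighlightTextAndSaveAsImages method in IronOCR requires three parameters: an IronTesseract OCR engine instance, a prefix string for the output file names, and a ResultHighlightType enum value that specifies which text elements to highlight (characters, words, lines, or paragraphs).

How are output images named when using text highlighting?

IronOCR names the output images automatically by combining the prefix you specify with a page identifier. For example, with "highlight_page_" as the prefix, the method generates files named "highlight_page_0", "highlight_page_1", and so on, one for each page in the document.

Why is visual highlighting important for OCR development?

Visual highlighting in IronOCR provides essential diagnostic feedback by showing exactly which text the OCR engine detected and where potential errors occurred. This visual map helps developers debug recognition problems, verify OCR accuracy, and troubleshoot complex documents.

Can I highlight text elements other than words?

Yes. IronOCR's ResultHighlightType enum lets you highlight several kinds of text element: individual characters, words, lines, or entire paragraphs. Specify the type you want when calling HighlightTextAndSaveAsImages to see text detection at that level.

Can IronOCR be integrated into existing applications?

IronOCR is designed to integrate easily into existing C# applications, so developers can add OCR functionality to their software with minimal effort.

What are the benefits of using IronOCR for document management?

Using IronOCR for document management streamlines workflows by converting scanned documents into searchable, editable text. This reduces the need for manual data entry and improves document accessibility.

How does IronOCR improve data accuracy?

IronOCR improves data accuracy through its recognition algorithms and image correction features, which make text extraction more reliable and precise.

Is there a free trial of IronOCR?

Yes. Iron Software offers a free trial of IronOCR so that users can test its features and capabilities before deciding to purchase.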

Curtis Chau

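Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University. He specializes in front-end development and is proficient in Node.js, TypeScript, JavaScript, and React. He is passionate about building intuitive, visually appealing user interfaces, and he enjoys working with modern frameworks and creating well-structured, visually engaging manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores new ways of integrating hardware and software. In his spare time he enjoys gaming and building Discord bots, combining his love of technology with creativity.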
