Highlight Texts as Images in C# with IronOCR
IronOCR's HighlightTextAndSaveAsImages method visualizes OCR results by drawing bounding boxes around detected text (characters, words, lines, or paragraphs) and saves them as diagnostic images, enabling developers to validate OCR accuracy and debug recognition issues.
Visualizing OCR results involves rendering bounding boxes around specific text elements that the engine has detected within an image. This process overlays distinct highlights on the exact locations of individual characters, words, lines, or paragraphs, providing a clear map of recognized content.
This visual feedback is crucial for debugging and validating OCR output accuracy, showing what the software has identified and where it has made errors. When working with complex documents or troubleshooting recognition issues, visual highlighting becomes an essential diagnostic tool.
This article demonstrates IronOCR's diagnostic capabilities with its HighlightTextAndSaveAsImages method. This function highlights specific sections of text and saves them as images for verification. Whether building a document processing system, implementing quality control measures, or validating your OCR implementation, this feature provides immediate visual feedback on what the OCR engine detects.
Quickstart: Highlight Words in Your PDF Instantly
This snippet demonstrates IronOCR usage: load a PDF and highlight each word in the document, saving the result as images. Just one line to get visual feedback on your OCR results.
Install IronOCR with NuGet Package Manager, then copy and run this code snippet:
new IronOcr.OcrInput().LoadPdf("document.pdf").HighlightTextAndSaveAsImages(new IronOcr.IronTesseract(), "highlight_page_", IronOcr.ResultHighlightType.Word);
Minimal Workflow
- Download a C# library to highlight OCR text as images
- Instantiate the OCR engine
- Load the PDF document with LoadPdf
- Use HighlightTextAndSaveAsImages to highlight sections of text and save them as images
How Do I Highlight Text and Save As Images?
Highlighting text and saving it as images is straightforward with IronOCR. Load an existing PDF with LoadPdf, then call the HighlightTextAndSaveAsImages method to highlight sections of text and save them as images. This technique verifies OCR accuracy and debugs text recognition issues in your documents.
The method takes three parameters: the IronTesseract OCR engine, a prefix for the output filename, and an enum from ResultHighlightType that dictates the type of text to highlight. This example uses ResultHighlightType.Paragraph to highlight text blocks as paragraphs.
This example uses a PDF with three paragraphs.
What Does the Input PDF Look Like?
How Do I Implement the Highlighting Code?
The example code below demonstrates the basic implementation using the OcrInput class.
using IronOcr;
IronTesseract ocrTesseract = new IronTesseract();
using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_page_", ResultHighlightType.Paragraph);

Imports IronOcr
Private ocrTesseract As New IronTesseract()
Private ocrInput = New OcrInput()
ocrInput.LoadPdf("document.pdf")
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_page_", ResultHighlightType.Paragraph)

What Do the Output Images Show?

As shown in the output image above, all three paragraphs have been highlighted with a light red box. This visual representation helps developers quickly identify how the OCR engine segments the document into readable blocks.
What Are the Different ResultHighlightType Options?
The example above used ResultHighlightType.Paragraph to highlight text blocks. IronOCR provides additional highlighting options through this enum. Below is a complete list of available types, each serving different diagnostic purposes.
Character: Draws a bounding box around every single character detected by the OCR engine. Useful for debugging character recognition or specialized fonts, particularly when working with custom language files.
Word: Highlights each complete word identified by the engine. Ideal for validating word boundaries and proper word identification, especially when implementing barcode and QR reading alongside text recognition.
Line: Highlights every detected text line. Useful for documents with complex layouts requiring line identification verification, such as when processing scanned documents.
Paragraph: Highlights entire text blocks grouped as paragraphs. Perfect for understanding document layout and verifying text block segmentation, particularly useful when working with table extraction.
How Do I Compare Different Highlight Types?
This comprehensive example demonstrates generating highlights for all different types on the same document, allowing you to compare the results:
using IronOcr;
using System;
// Initialize the OCR engine with custom configuration
IronTesseract ocrTesseract = new IronTesseract();
// Configure for better accuracy if needed
ocrTesseract.Configuration.ReadBarCodes = false; // Disable if not needed for performance
ocrTesseract.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
// Load the PDF document
using var ocrInput = new OcrInput();
ocrInput.LoadPdf("document.pdf");
// Generate highlights for each type
Console.WriteLine("Generating character-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_character_", ResultHighlightType.Character);
Console.WriteLine("Generating word-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_word_", ResultHighlightType.Word);
Console.WriteLine("Generating line-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_line_", ResultHighlightType.Line);
Console.WriteLine("Generating paragraph-level highlights...");
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_paragraph_", ResultHighlightType.Paragraph);
Console.WriteLine("All highlight images have been generated successfully!");
How Do I Handle Multi-Page Documents?
When processing multi-page PDFs or multi-frame TIFF files, the highlighting feature automatically handles each page individually. This is especially useful when implementing PDF OCR text extraction workflows:
using IronOcr;
using System;
using System.IO;
IronTesseract ocrTesseract = new IronTesseract();
// Load a multi-page document
using var ocrInput = new OcrInput();
ocrInput.LoadPdf("multi-page-document.pdf");
// Create output directory if it doesn't exist
string outputDir = "highlighted_pages";
Directory.CreateDirectory(outputDir);
// Generate highlights for each page
// Files will be named: highlighted_pages/page_0.png, page_1.png, etc.
ocrInput.HighlightTextAndSaveAsImages(ocrTesseract,
Path.Combine(outputDir, "page_"),
ResultHighlightType.Word);
// Count generated files for verification
int pageCount = Directory.GetFiles(outputDir, "page_*.png").Length;
Console.WriteLine($"Generated {pageCount} highlighted page images");
What Are the Performance Best Practices?
When using the highlighting feature, consider these best practices:
- File Size: Highlighted images can be large, especially for high-resolution documents. Consider the output directory's available space when processing large batches. For optimization tips, see our fast OCR configuration guide.
- Performance: Generating highlights adds processing overhead. For production systems where highlights are only needed occasionally, implement them as a separate diagnostic process rather than part of the main workflow. Consider using multithreaded OCR for batch processing.
- Error Handling: Always implement proper error handling when working with file operations:
using System;
using IronOcr;

// The engine is created outside the try block so it can be reused
IronTesseract ocrTesseract = new IronTesseract();
try
{
    using var ocrInput = new OcrInput();
    ocrInput.LoadPdf("document.pdf");
    // Apply image filters if needed for better recognition
    ocrInput.Deskew(); // Correct slight rotations
    ocrInput.DeNoise(); // Remove background noise
    ocrInput.HighlightTextAndSaveAsImages(ocrTesseract, "highlight_", ResultHighlightType.Word);
}
catch (Exception ex)
{
    Console.WriteLine($"Error during highlighting: {ex.Message}");
    // Log error details for debugging
}
How Does Highlighting Integrate with OCR Results?
The highlighting feature works seamlessly with IronOCR's result objects, allowing you to correlate visual highlights with extracted text data. This is particularly useful when you need to track OCR progress or validate specific sections of recognized text. The OcrResult class provides detailed information about each detected element, which corresponds directly to the visual highlights generated by this method.
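As a rough illustration of that correlation, the sketch below saves word-level highlights and then prints the bounding box of each recognized word from the same input. It assumes that OcrResult exposes a Words collection whose items carry Text, X, Y, Width, Height, and Confidence properties; verify these names against the API reference for your IronOCR version. The file name and prefix are placeholders.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadPdf("document.pdf"); // placeholder file name

// Save word-level highlight images for visual inspection
input.HighlightTextAndSaveAsImages(ocr, "highlight_word_", ResultHighlightType.Word);

// Run OCR on the same input and list each word with its bounding box.
// Property names below are assumptions about the OcrResult word type.
OcrResult result = ocr.Read(input);
foreach (var word in result.Words)
{
    Console.WriteLine(
        $"\"{word.Text}\" at ({word.X}, {word.Y}), " +
        $"{word.Width}x{word.Height}, confidence {word.Confidence:F1}");
}
```

Comparing the printed coordinates with the boxes in the saved images is one way to confirm that a suspicious word in the text output maps to the region you expect.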
What If I Encounter Issues?
If experiencing issues with the highlighting feature, consult the general troubleshooting guide for common solutions. For specific highlighting-related problems:
- Blank output images: Ensure the input document contains readable text and that the OCR engine is properly configured for your document type. You may need to apply image optimization filters.
- Missing highlights: Some document types may require specific preprocessing. Try applying image filters or fixing image orientation to improve recognition.
- Performance issues: For large documents, consider implementing multithreading to improve processing speed. Additionally, review our guide on fixing low quality scans if working with poor quality inputs.
How Can I Use This for Production Debugging?
The highlighting feature serves as an excellent production debugging tool. When integrated with abort tokens for long-running operations and timeouts, you can create a robust diagnostic system. Consider implementing a debug mode in your application:
using System.IO;
using IronOcr;

public class OcrDebugger
{
    private readonly IronTesseract _tesseract;
    private readonly bool _debugMode;

    public OcrDebugger(bool enableDebugMode = false)
    {
        _tesseract = new IronTesseract();
        _debugMode = enableDebugMode;
    }

    public OcrResult ProcessDocument(string filePath)
    {
        using var input = new OcrInput();
        input.LoadPdf(filePath);
        // Apply preprocessing
        input.Deskew();
        input.DeNoise();
        // Generate debug highlights if in debug mode
        if (_debugMode)
        {
            string debugPath = $"debug_{Path.GetFileNameWithoutExtension(filePath)}_";
            input.HighlightTextAndSaveAsImages(_tesseract, debugPath, ResultHighlightType.Word);
        }
        // Perform actual OCR
        return _tesseract.Read(input);
    }
}
Where Should I Go Next?
Now that you understand how to use the highlighting feature, explore:
- Creating searchable PDFs from your OCR results
- Reading specific document types like passports or licenses
- Setting up IronOCR in your development environment with our getting started guides
- Adding support for 125 international languages in global applications
- Using the Filter Wizard to optimize image processing
For production use, remember to obtain a license to remove watermarks and access full functionality.
Frequently Asked Questions
How can I visualize OCR results in my C# application?
IronOCR provides the HighlightTextAndSaveAsImages method that visualizes OCR results by drawing bounding boxes around detected text elements (characters, words, lines, or paragraphs) and saves them as diagnostic images. This feature helps developers validate OCR accuracy and debug recognition issues.
What is the simplest way to highlight words in a PDF document?
With IronOCR, you can highlight words in a PDF with just one line of code: new IronOcr.OcrInput().LoadPdf("document.pdf").HighlightTextAndSaveAsImages(new IronOcr.IronTesseract(), "highlight_page_", IronOcr.ResultHighlightType.Word). This loads the PDF and creates images with highlighted words.
What parameters does the HighlightTextAndSaveAsImages method require?
The HighlightTextAndSaveAsImages method in IronOCR requires three parameters: the IronTesseract OCR engine instance, a prefix string for the output filename, and a ResultHighlightType enum value that specifies what text elements to highlight (Character, Word, Line, or Paragraph).
How are the output images named when using text highlighting?
IronOCR automatically names the output images by combining your specified prefix with a page identifier. For example, if you use "highlight_page_" as the prefix, the method generates files named "highlight_page_0", "highlight_page_1", etc., for each page in your document.
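To check which files a run produced, you can list everything in the output location that starts with your prefix. The sketch below uses only System.IO and makes two assumptions: a relative prefix such as "highlight_page_" causes files to be written to the current working directory, and the wildcard is left extension-agnostic because the exact file extension is not stated here.

```csharp
using System;
using System.IO;
using System.Linq;

string prefix = "highlight_page_"; // same prefix passed to HighlightTextAndSaveAsImages

// Collect every file in the working directory whose name starts with the prefix
string[] files = Directory
    .GetFiles(Directory.GetCurrentDirectory(), prefix + "*")
    .OrderBy(f => f, StringComparer.Ordinal)
    .ToArray();

foreach (string file in files)
{
    Console.WriteLine(Path.GetFileName(file));
}
Console.WriteLine($"Found {files.Length} highlight image(s)");
```

Note that ordinal sorting places "highlight_page_10" before "highlight_page_2", so parse the numeric suffix if page order matters.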
Why is visual highlighting important for OCR development?
Visual highlighting in IronOCR provides crucial diagnostic feedback by showing exactly what text the OCR engine has detected and where potential errors occur. This visual map helps developers debug recognition issues, validate OCR accuracy, and troubleshoot problems in complex documents.
Can I highlight different types of text elements besides words?
Yes, IronOCR's ResultHighlightType enum allows you to highlight various text elements including individual Characters, Words, Lines, or entire Paragraphs. Simply specify the desired type when calling the HighlightTextAndSaveAsImages method to visualize different levels of text detection.






