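How to use IronOCR to assess OCR and image-to-text confidence in C#
IronOCR's read confidence indicates how certain the OCR system is that the text it extracted from an image is accurate. It ranges from 0 to 100, with higher scores indicating higher reliability, and is available through the Confidence property of any OcrResult object.
OCR (optical character recognition) read confidence is the level of certainty or reliability that an OCR system assigns to the accuracy of the text it recognizes in an image or document. It measures how confident the system is that the recognized text is correct. The metric becomes especially important when processing scanned documents, photos, or any image where text quality may vary.
A higher confidence score means the recognition result is more likely to be accurate; a lower score means the result may be less reliable. Understanding these confidence levels helps developers implement appropriate validation logic and error handling in their applications.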
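Quickstart: get OCR read confidence in one line of code
Call the Read method of IronTesseract with an image file path, then access the Confidence property of the returned OcrResult to see how certain IronOCR is about its text recognition. It is a simple, reliable way to start assessing the accuracy of OCR output.
Minimal workflow (5 steps)
- Download the C# library to access read confidence
- Prepare the images and PDF documents to process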
- Access the **`Confidence`** property of the OCR result
- Retrieve the confidence of pages, paragraphs, lines, words, and characters
- Check the **`Choices`** property for alternative word choices
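How do I get read confidence in C#?
After OCR runs on the input image, the confidence of the text is stored in the Confidence property. Use a "using" statement to dispose of objects automatically after use. Add documents such as images and PDFs with the OcrImageInput and OcrPdfInput classes, respectively. The Read method returns an OcrResult object, which exposes the Confidence property.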
:path=/static-assets/ocr/content-code-examples/how-to/tesseract-result-confidence-get-confidence.cs
using IronOcr;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Add image
using var imageInput = new OcrImageInput("sample.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Get confidence level
double confidence = ocrResult.Confidence;
Imports IronOcr
' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()
' Add image; the Using block disposes the input after use
Using imageInput As New OcrImageInput("sample.tiff")
    ' Perform OCR
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
    ' Get confidence level
    Dim confidence As Double = ocrResult.Confidence
End Using
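The returned confidence value ranges from 0 to 100, where:
- 90-100: Excellent confidence - the text is highly reliable
- 80-89: Good confidence - the text is mostly accurate, with minor uncertainty
- 70-79: Moderate confidence - the text may contain some errors
- Below 70: Low confidence - the text should be reviewed or reprocessed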
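The bands above can be turned into a small helper. The following is a minimal sketch rather than part of IronOCR: ClassifyConfidence and its labels are hypothetical names, and the sketch assumes that fractional scores belong to the band whose lower bound they reach (so 89.5 counts as Good).

```csharp
using System;

// Hypothetical helper (not an IronOCR API): maps a 0-100 confidence
// score to the bands listed above. Fractional scores are assigned to
// the band whose lower bound they reach, e.g. 89.5 is "Good".
static string ClassifyConfidence(double confidence)
{
    if (confidence < 0 || confidence > 100)
        throw new ArgumentOutOfRangeException(nameof(confidence), "Expected a value from 0 to 100.");

    if (confidence >= 90) return "Excellent";
    if (confidence >= 80) return "Good";
    if (confidence >= 70) return "Moderate";
    return "Low";
}

// Sample value; in a real application it would come from ocrResult.Confidence.
double sample = 84.2;
Console.WriteLine($"{sample:F1} -> {ClassifyConfidence(sample)}");
```

Keeping the thresholds in one function makes it easier to adjust them later for a particular document type.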
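How do I get confidence at different levels?
Beyond the confidence of the whole document, you can access the confidence of each page, paragraph, line, word, and character. You can also get the confidence of a block, which is a group of one or more closely adjacent paragraphs.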
:path=/static-assets/ocr/content-code-examples/how-to/tesseract-result-confidence-confidence-level.cs
// Get page confidence level
double pageConfidence = ocrResult.Pages[0].Confidence;
// Get paragraph confidence level
double paragraphConfidence = ocrResult.Paragraphs[0].Confidence;
// Get line confidence level
double lineConfidence = ocrResult.Lines[0].Confidence;
// Get word confidence level
double wordConfidence = ocrResult.Words[0].Confidence;
// Get character confidence level
double characterConfidence = ocrResult.Characters[0].Confidence;
// Get block confidence level
double blockConfidence = ocrResult.Blocks[0].Confidence;
' Get page confidence level
Dim pageConfidence As Double = ocrResult.Pages(0).Confidence
' Get paragraph confidence level
Dim paragraphConfidence As Double = ocrResult.Paragraphs(0).Confidence
' Get line confidence level
Dim lineConfidence As Double = ocrResult.Lines(0).Confidence
' Get word confidence level
Dim wordConfidence As Double = ocrResult.Words(0).Confidence
' Get character confidence level
Dim characterConfidence As Double = ocrResult.Characters(0).Confidence
' Get block confidence level
Dim blockConfidence As Double = ocrResult.Blocks(0).Confidence
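Practical example: filtering by confidence
When processing documents of varying quality, such as low-quality scans, you can use confidence scores to filter the results: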
using IronOcr;
using System;
using System.Linq;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Configure for better accuracy
ocrTesseract.Configuration.ReadBarCodes = false;
ocrTesseract.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
// Add image
using var imageInput = new OcrImageInput("invoice.png");
// Apply filters to improve quality
imageInput.Deskew();
imageInput.DeNoise();
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Keep words with confidence of 85 or higher
var highConfidenceWords = ocrResult.Words
    .Where(word => word.Confidence >= 85)
    .Select(word => word.Text)
    .ToList();
// Process only high-confidence text
string reliableText = string.Join(" ", highConfidenceWords);
Console.WriteLine($"High confidence text: {reliableText}");
// Flag words below 85 for manual review
var lowConfidenceWords = ocrResult.Words
    .Where(word => word.Confidence < 85)
    .Select(word => new { word.Text, word.Confidence })
    .ToList();
foreach (var word in lowConfidenceWords)
{
    Console.WriteLine($"Review needed: '{word.Text}' (Confidence: {word.Confidence:F2}%)");
}
Imports IronOcr
Imports System.Linq
' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()
' Configure for better accuracy
ocrTesseract.Configuration.ReadBarCodes = False
ocrTesseract.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
' Add image
Using imageInput As New OcrImageInput("invoice.png")
    ' Apply filters to improve quality
    imageInput.Deskew()
    imageInput.DeNoise()
    ' Perform OCR
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
    ' Keep words with confidence of 85 or higher
    Dim highConfidenceWords = ocrResult.Words _
        .Where(Function(word) word.Confidence >= 85) _
        .Select(Function(word) word.Text) _
        .ToList()
    ' Process only high-confidence text
    Dim reliableText As String = String.Join(" ", highConfidenceWords)
    Console.WriteLine($"High confidence text: {reliableText}")
    ' Flag words below 85 for manual review
    Dim lowConfidenceWords = ocrResult.Words _
        .Where(Function(word) word.Confidence < 85) _
        .Select(Function(word) New With {Key .Text = word.Text, Key .Confidence = word.Confidence}) _
        .ToList()
    For Each word In lowConfidenceWords
        Console.WriteLine($"Review needed: '{word.Text}' (Confidence: {word.Confidence:F2}%)")
    Next
End Using
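What are character choices in OCR?
Besides the confidence level, there is another useful property called Choices. It holds a list of alternative choices together with their statistical relevance, which gives you access to the other characters the engine considered. This is especially useful when working with multiple languages or specialized fonts.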
:path=/static-assets/ocr/content-code-examples/how-to/tesseract-result-confidence-get-choices.cs
using IronOcr;
using static IronOcr.OcrResult;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Get choices
Choice[] choices = ocrResult.Characters[0].Choices;
Imports IronOcr
Imports IronOcr.OcrResult
' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()
' Add image; the Using block disposes the input after use
Using imageInput As New OcrImageInput("Potter.tiff")
    ' Perform OCR
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
    ' Get choices
    Dim choices() As Choice = ocrResult.Characters(0).Choices
End Using
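How do alternative character choices help?
Alternative character choices offer several benefits:
1. Ambiguity resolution: when characters such as "O" and "0" or "l" and "1" are confused
2. Font variations: different interpretations of stylized or decorative fonts
3. Quality issues: multiple possibilities when handling degraded text
4. Language context: alternative interpretations based on language rules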
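As an illustration of ambiguity resolution, the sketch below picks the most confident alternative that fits the kind of character a field expects, for example a digit in a numeric field. It is a standalone sketch under assumptions: the (Text, Confidence) tuples stand in for the entries of a Choices array, the scores are made-up sample values, and PickBestMatching is a hypothetical helper rather than an IronOCR API.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronOCR API): returns the text of the most
// confident alternative accepted by isAllowed, or the originally
// recognized text when no alternative qualifies.
static string PickBestMatching(
    string recognized,
    (string Text, double Confidence)[] alternatives,
    Func<string, bool> isAllowed)
{
    return alternatives
        .Where(c => isAllowed(c.Text))
        .OrderByDescending(c => c.Confidence)
        .Select(c => c.Text)
        .DefaultIfEmpty(recognized)
        .First();
}

// Made-up sample data: the engine read the letter "O" in a numeric field.
var sampleChoices = new[]
{
    (Text: "O", Confidence: 61.0),
    (Text: "0", Confidence: 58.5),
    (Text: "Q", Confidence: 12.3),
};

// Accept only single-digit alternatives for this field.
string resolved = PickBestMatching(
    "O", sampleChoices, s => s.Length == 1 && char.IsDigit(s[0]));
Console.WriteLine($"Resolved character: {resolved}");
```

Here the digit alternative is selected even though the letter scored slightly higher, because the field is known to be numeric. Whether such a rule is safe depends on how reliable that field-level knowledge is.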
Working with character choices
Here is a fuller example that shows how to inspect character choices when accuracy matters:
using IronOcr;
using System;
using System.Linq;
using static IronOcr.OcrResult;
// Configure IronTesseract for detailed results
IronTesseract ocrTesseract = new IronTesseract();
// Process image with potential ambiguities
using var imageInput = new OcrImageInput("ambiguous_text.png");
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Analyze character choices for each word
foreach (var word in ocrResult.Words)
{
    Console.WriteLine($"\nWord: '{word.Text}' (Confidence: {word.Confidence:F2}%)");
    // Check each character in the word
    foreach (var character in word.Characters)
    {
        if (character.Choices != null && character.Choices.Length > 1)
        {
            Console.WriteLine($" Character '{character.Text}' has alternatives:");
            // Display all choices sorted by confidence
            foreach (var choice in character.Choices.OrderByDescending(c => c.Confidence))
            {
                Console.WriteLine($" - '{choice.Text}': {choice.Confidence:F2}%");
            }
        }
    }
}
Imports IronOcr
Imports System
Imports System.Linq
Imports IronOcr.OcrResult
' Configure IronTesseract for detailed results
Dim ocrTesseract As New IronTesseract()
' Process image with potential ambiguities
Using imageInput As New OcrImageInput("ambiguous_text.png")
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
    ' Analyze character choices for each word
    For Each word In ocrResult.Words
        Console.WriteLine(vbCrLf & $"Word: '{word.Text}' (Confidence: {word.Confidence:F2}%)")
        ' Check each character in the word
        For Each character In word.Characters
            If character.Choices IsNot Nothing AndAlso character.Choices.Length > 1 Then
                Console.WriteLine($" Character '{character.Text}' has alternatives:")
                ' Display all choices sorted by confidence
                For Each choice In character.Choices.OrderByDescending(Function(c) c.Confidence)
                    Console.WriteLine($" - '{choice.Text}': {choice.Confidence:F2}%")
                Next
            End If
        Next
    Next
End Using
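Advanced confidence strategies
When processing specialized documents such as passports, license plates, or MICR cheques, confidence is critical for validation: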
using IronOcr;
using System;
using System.Linq;
public class DocumentValidator
{
    private readonly IronTesseract ocr = new IronTesseract();
    public bool ValidatePassportNumber(string imagePath, double minConfidence = 95.0)
    {
        using var input = new OcrImageInput(imagePath);
        // Configure for passport reading
        ocr.Configuration.ReadBarCodes = true;
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleLine;
        // Apply preprocessing
        input.Deskew();
        input.Scale(200); // Upscale for better accuracy
        var result = ocr.Read(input);
        // Find the first line that looks like a passport line or number
        var passportLine = result.Lines
            .Where(line => line.Text.Contains("P<") || IsPassportNumberFormat(line.Text))
            .FirstOrDefault();
        if (passportLine != null)
        {
            Console.WriteLine($"Passport line found: {passportLine.Text}");
            Console.WriteLine($"Confidence: {passportLine.Confidence:F2}%");
            // Only accept if confidence meets threshold
            return passportLine.Confidence >= minConfidence;
        }
        return false;
    }
    private bool IsPassportNumberFormat(string text)
    {
        // Simple check: one uppercase letter followed by 7 to 9 digits
        return System.Text.RegularExpressions.Regex.IsMatch(text, @"^[A-Z]\d{7,9}$");
    }
}
Imports IronOcr
Imports System
Imports System.Linq
Public Class DocumentValidator
    Private ReadOnly ocr As New IronTesseract()
    Public Function ValidatePassportNumber(imagePath As String, Optional minConfidence As Double = 95.0) As Boolean
        Using input As New OcrImageInput(imagePath)
            ' Configure for passport reading
            ocr.Configuration.ReadBarCodes = True
            ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleLine
            ' Apply preprocessing
            input.Deskew()
            input.Scale(200) ' Upscale for better accuracy
            Dim result = ocr.Read(input)
            ' Find the first line that looks like a passport line or number
            Dim passportLine = result.Lines _
                .Where(Function(line) line.Text.Contains("P<") OrElse IsPassportNumberFormat(line.Text)) _
                .FirstOrDefault()
            If passportLine IsNot Nothing Then
                Console.WriteLine($"Passport line found: {passportLine.Text}")
                Console.WriteLine($"Confidence: {passportLine.Confidence:F2}%")
                ' Only accept if confidence meets threshold
                Return passportLine.Confidence >= minConfidence
            End If
            Return False
        End Using
    End Function
    Private Function IsPassportNumberFormat(text As String) As Boolean
        ' Simple check: one uppercase letter followed by 7 to 9 digits
        Return System.Text.RegularExpressions.Regex.IsMatch(text, "^[A-Z]\d{7,9}$")
    End Function
End Class
Optimizing for higher confidence
To get higher confidence scores, consider using image filters and preprocessing techniques:
using IronOcr;
using System;
using System.Linq;
// Create an optimized OCR workflow
IronTesseract ocr = new IronTesseract();
using var input = new OcrImageInput("low_quality_scan.jpg");
// Apply multiple filters to improve confidence
input.Deskew(); // Correct rotation
input.DeNoise(); // Remove noise
input.Sharpen(); // Enhance edges
input.Dilate(); // Thicken text
input.Scale(150); // Upscale for clarity
// Configure for accuracy over speed
ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5;
ocr.Configuration.EngineMode = TesseractEngineMode.TesseractOnly;
var result = ocr.Read(input);
Console.WriteLine($"Document confidence: {result.Confidence:F2}%");
// Generate a per-page confidence report
var confidenceReport = result.Pages
    .Select((page, index) => new
    {
        PageNumber = index + 1,
        Confidence = page.Confidence,
        WordCount = page.Words.Length,
        LowConfidenceWords = page.Words.Count(w => w.Confidence < 80)
    });
foreach (var page in confidenceReport)
{
    Console.WriteLine($"Page {page.PageNumber}: {page.Confidence:F2}% confidence");
    Console.WriteLine($" Total words: {page.WordCount}");
    Console.WriteLine($" Low confidence words: {page.LowConfidenceWords}");
}
Imports IronOcr
Imports System
Imports System.Linq
' Create an optimized OCR workflow
Dim ocr As New IronTesseract()
Using input As New OcrImageInput("low_quality_scan.jpg")
    ' Apply multiple filters to improve confidence
    input.Deskew() ' Correct rotation
    input.DeNoise() ' Remove noise
    input.Sharpen() ' Enhance edges
    input.Dilate() ' Thicken text
    input.Scale(150) ' Upscale for clarity
    ' Configure for accuracy over speed
    ocr.Configuration.TesseractVersion = TesseractVersion.Tesseract5
    ocr.Configuration.EngineMode = TesseractEngineMode.TesseractOnly
    Dim result = ocr.Read(input)
    Console.WriteLine($"Document confidence: {result.Confidence:F2}%")
    ' Generate a per-page confidence report
    Dim confidenceReport = result.Pages _
        .Select(Function(page, index) New With {
            .PageNumber = index + 1,
            .Confidence = page.Confidence,
            .WordCount = page.Words.Length,
            .LowConfidenceWords = page.Words.Count(Function(w) w.Confidence < 80)
        })
    For Each page In confidenceReport
        Console.WriteLine($"Page {page.PageNumber}: {page.Confidence:F2}% confidence")
        Console.WriteLine($" Total words: {page.WordCount}")
        Console.WriteLine($" Low confidence words: {page.LowConfidenceWords}")
    Next
End Using
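Summary
Understanding and using OCR confidence scores is essential for building robust document-processing applications. With IronOCR's Confidence property and character choices, developers can implement intelligent validation, error handling, and quality assurance in their OCR workflows. Whether you are processing screenshots, tables, or specialized documents, confidence scores provide the metrics needed to check the accuracy of extracted text.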
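Frequently asked questions
What is OCR confidence and why does it matter?
OCR confidence is a measure from 0 to 100 that indicates how certain the OCR system is about the accuracy of its text recognition. IronOCR exposes it through the Confidence property of any OcrResult object, which helps developers assess the reliability of recognized text, especially when processing scanned documents, photos, or images of uneven text quality.
How can I quickly check OCR confidence in C#?
With IronOCR, you can get the OCR confidence in a single line of code: double confidence = new IronOcr.IronTesseract().Read("input.png").Confidence; This returns a confidence score between 0 and 100 that indicates how certain IronOCR is about its text recognition.
What do the different confidence ranges mean?
IronOCR confidence scores indicate: 90-100 (excellent), the text is highly reliable; 80-89 (good), the text is mostly accurate with minor uncertainty; 70-79 (moderate), the text may contain some errors; below 70 (low), the text should be reviewed or reprocessed.
How do I get the confidence of different text elements?
IronOCR lets you retrieve confidence at several levels of granularity: pages, paragraphs, lines, words, and individual characters. After performing OCR, you can access the Confidence property at each level through the OcrResult object structure.
Can I get alternative word suggestions with confidence scores?
Yes. IronOCR provides a Choices property that offers alternative choices along with their confidence scores. It helps when the OCR engine identifies several possible interpretations of the same text, and lets you implement intelligent validation logic.
How do I implement confidence-based validation in my application?
After calling IronOCR's Read method, check the Confidence property of the OcrResult. Implement conditional logic based on confidence thresholds: for example, automatically accept results above 90, flag results between 70 and 90 for review, and reprocess or manually verify results below 70.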
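The threshold logic in the last answer could be sketched as follows. This is a minimal sketch under assumptions: Triage is a hypothetical helper, not an IronOCR API, and because the answer leaves the exact boundaries open, the code treats 90 and above as accepted and 70 up to (but not including) 90 as needing review.

```csharp
using System;

// Hypothetical helper (not an IronOCR API) implementing the thresholds
// described above. Boundary handling is an assumption: >= acceptAt is
// accepted, >= reviewAt is reviewed, anything lower is reprocessed.
static string Triage(double confidence, double acceptAt = 90, double reviewAt = 70)
{
    if (confidence >= acceptAt) return "Accept";
    if (confidence >= reviewAt) return "Review";
    return "Reprocess";
}

// Sample scores; in practice these would come from OcrResult.Confidence.
double[] scores = { 96.4, 82.0, 55.7 };
foreach (double score in scores)
{
    Console.WriteLine($"{score:F1} -> {Triage(score)}");
}
```

The two thresholds are parameters so that stricter document types, such as identity documents, can raise the acceptance bar without changing the routing code.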

