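How to Use Iron Tesseract in C# for OCR and Image-to-Text Conversion
To use Iron Tesseract in C#, create an IronTesseract instance, configure the language and OCR settings, then call the Read() method on an OcrInput object that contains images or PDFs. The optimized Tesseract 5 engine performs the image-to-text conversion and can also turn images into searchable PDFs.
IronOCR provides an intuitive API for a customized and optimized build of Tesseract 5, known as Iron Tesseract. With IronOCR and IronTesseract, you can convert images of text and scanned documents into text and searchable PDFs. The library supports 125 international languages and includes advanced features such as barcode reading and computer vision.
Quickstart: Configure IronTesseract in C#
This example shows how to configure IronTesseract with specific settings and run OCR in a single statement.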
Install IronOCR with the NuGet Package Manager: https://www.nuget.org/packages/IronOcr
PM > Install-Package IronOcr
Copy and run this code:
var result = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true, WhiteListCharacters = "ABCabc123" } }.Read(new IronOcr.OcrInput("image.png"));
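Basic OCR Workflow
- Install the OCR library with NuGet to read images
- Use the custom `Tesseract 5` engine to perform OCR
- Load the documents to process, such as images or PDF files
- Output the extracted text to the console or a file
- Save the result as a searchable PDF file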
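The steps above can be sketched end to end as follows. This is a minimal illustration, not the library's prescribed pattern: the class and method names (`OcrWorkflowSketch`, `SearchablePdfPathFor`, `Run`) are hypothetical, and the call to `SaveAsSearchablePdf` on the result is an assumption about the installed IronOCR version that should be checked against its API reference.

```csharp
using System;
using System.IO;
using IronOcr;

public static class OcrWorkflowSketch
{
    // Hypothetical helper: derives the output path from the input path,
    // for example "scan.png" becomes "scan.searchable.pdf".
    public static string SearchablePdfPathFor(string imagePath) =>
        Path.ChangeExtension(imagePath, ".searchable.pdf");

    public static string Run(string imagePath)
    {
        // Configure the engine; RenderSearchablePdf mirrors the quickstart settings.
        var ocr = new IronTesseract
        {
            Language = OcrLanguage.English,
            Configuration = new TesseractConfiguration { RenderSearchablePdf = true }
        };

        // Load the document to process.
        using var input = new OcrInput();
        input.LoadImage(imagePath);

        // Perform OCR and print the extracted text to the console.
        OcrResult result = ocr.Read(input);
        Console.WriteLine(result.Text);

        // Assumption: SaveAsSearchablePdf is available on OcrResult in your version.
        string pdfPath = SearchablePdfPathFor(imagePath);
        result.SaveAsSearchablePdf(pdfPath);
        return pdfPath;
    }
}
```

Calling `OcrWorkflowSketch.Run("image.png")` would, under these assumptions, print the recognized text and write the PDF next to the input file.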
How Do I Create an IronTesseract Instance?
Initialize a Tesseract object with the following code:
using IronOcr;
IronTesseract ocr = new IronTesseract();
Imports IronOcr
Dim ocr As New IronTesseract()
You can customize IronTesseract's behavior by selecting a different language, enabling barcode reading, and whitelisting or blacklisting characters. IronOCR offers a comprehensive set of configuration options for fine-tuning your OCR process:
using IronOcr;

IronTesseract ocr = new IronTesseract
{
    Configuration = new TesseractConfiguration
    {
        ReadBarCodes = false,
        RenderHocr = true,
        TesseractVariables = null,
        WhiteListCharacters = null,
        BlackListCharacters = "`ë|^",
    },
    MultiThreaded = false,
    Language = OcrLanguage.English,
    EnableTesseractConsoleMessages = true, // False as default
};
Imports IronOcr

Dim ocr As New IronTesseract With {
    .Configuration = New TesseractConfiguration With {
        .ReadBarCodes = False,
        .RenderHocr = True,
        .TesseractVariables = Nothing,
        .WhiteListCharacters = Nothing,
        .BlackListCharacters = "`ë|^"
    },
    .MultiThreaded = False,
    .Language = OcrLanguage.English,
    .EnableTesseractConsoleMessages = True
}
Once configured, you can use IronTesseract to read OcrInput objects. The OcrInput class provides flexible methods for loading a variety of input formats:
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.LoadImage("attachment.png");
OcrResult result = ocr.Read(input);
string text = result.Text;
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("attachment.png")
Dim result As OcrResult = ocr.Read(input)
Dim text As String = result.Text
End Using
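For more complex scenarios, you can use multithreading to process multiple documents concurrently, which can significantly improve the performance of batch operations.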
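As a rough sketch of batch processing, the example below enables the `MultiThreaded` property shown in the configuration above and loads several images into one `OcrInput`. The class name `BatchOcrSketch` and the `OnlyImages` helper are hypothetical, and it is an assumption that repeated `LoadImage` calls add one page each and that the engine distributes those pages across threads; verify both against your IronOCR version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using IronOcr;

public static class BatchOcrSketch
{
    private static readonly string[] ImageExtensions = { ".png", ".jpg", ".jpeg" };

    // Hypothetical helper: keeps only paths whose extension looks like an image.
    public static List<string> OnlyImages(IEnumerable<string> paths) =>
        paths.Where(p => ImageExtensions.Contains(Path.GetExtension(p).ToLowerInvariant()))
             .ToList();

    public static string ReadAll(IEnumerable<string> paths)
    {
        var ocr = new IronTesseract
        {
            Language = OcrLanguage.English,
            MultiThreaded = true // ask the engine to use multiple threads
        };

        using var input = new OcrInput();
        foreach (string path in OnlyImages(paths))
        {
            // Assumption: each call appends another page to the same input.
            input.LoadImage(path);
        }

        // One Read call covers every loaded page.
        OcrResult result = ocr.Read(input);
        return result.Text;
    }
}
```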
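What Are the Advanced Tesseract Configuration Variables?
The IronOCR Tesseract interface gives you full control over Tesseract configuration variables through the TesseractConfiguration class. These advanced settings let you optimize OCR performance for specific use cases, such as repairing low-quality scans or reading particular document types.
How Do I Use Tesseract Configuration in Code?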
using IronOcr;
using System;
IronTesseract Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.English;
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
// Configure Tesseract Engine
Ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;
using var input = new OcrInput();
input.LoadImage("/path/file.png");
OcrResult Result = Ocr.Read(input);
Console.WriteLine(Result.Text);
Imports IronOcr
Imports System
Dim Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.English
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
' Configure Tesseract Engine
Ocr.Configuration.TesseractVariables("tessedit_parallelize") = False
Dim input = New OcrInput()
input.LoadImage("/path/file.png")
Dim Result As OcrResult = Ocr.Read(input)
Console.WriteLine(Result.Text)
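IronOCR also provides specialized configurations for different document types. For example, when reading passports or processing MICR cheques, you can apply specific preprocessing filters and region detection to improve accuracy.
Example configuration for financial documents: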
using System.Collections.Generic;
using IronOcr;

// Example: Configure for financial documents
IronTesseract ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration
    {
        PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        TesseractVariables = new Dictionary<string, object>
        {
            ["tessedit_char_whitelist"] = "0123456789.$,",
            ["textord_heavy_nr"] = false,
            ["edges_max_children_per_outline"] = 10
        }
    }
};

// Apply preprocessing filters for better accuracy
using OcrInput input = new OcrInput();
input.LoadPdf("financial-document.pdf");
input.Deskew();
input.EnhanceResolution(300);

OcrResult result = ocr.Read(input);
Imports System.Collections.Generic
Imports IronOcr

' Example: Configure for financial documents
Dim ocr As New IronTesseract With {
    .Language = OcrLanguage.English,
    .Configuration = New TesseractConfiguration With {
        .PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        .TesseractVariables = New Dictionary(Of String, Object) From {
            {"tessedit_char_whitelist", "0123456789.$,"},
            {"textord_heavy_nr", False},
            {"edges_max_children_per_outline", 10}
        }
    }
}

' Apply preprocessing filters for better accuracy
Using input As New OcrInput()
    input.LoadPdf("financial-document.pdf")
    input.Deskew()
    input.EnhanceResolution(300)

    Dim result As OcrResult = ocr.Read(input)
End Using
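What Is the Full List of Tesseract Configuration Variables?
These can be set via IronTesseract.Configuration.TesseractVariables["key"] = value;. Configuration variables let you fine-tune OCR behavior to get the best results for specific documents. For detailed guidance on optimizing OCR performance, see our fast OCR configuration guide.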
| Tesseract Configuration Variable | Default | Meaning |
|---|---|---|
| classify_num_cp_levels | 3 | Number of Class Pruner Levels |
| textord_debug_tabfind | 0 | Debug tab finding |
| textord_debug_bugs | 0 | Turn on output related to bugs in tab finding |
| textord_testregion_left | -1 | Left edge of debug reporting rectangle |
| textord_testregion_top | -1 | Top edge of debug reporting rectangle |
| textord_testregion_right | 2147483647 | Right edge of debug rectangle |
| textord_testregion_bottom | 2147483647 | Bottom edge of debug rectangle |
| textord_tabfind_show_partitions | 0 | Show partition bounds, waiting if >1 |
| devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process. |
| edges_max_children_per_outline | 10 | Max number of children inside a character outline |
| edges_max_children_layers | 5 | Max layers of nested children inside a character outline |
| edges_children_per_grandchild | 10 | Importance ratio for chucking outlines |
| edges_children_count_limit | 45 | Max holes allowed in blob |
| edges_min_nonhole | 12 | Min pixels for potential char in box |
| edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline |
| textord_fp_chop_error | 2 | Max allowed bending of chop cells |
| textord_tabfind_show_images | 0 | Show image blobs |
| textord_skewsmooth_offset | 4 | For smooth factor |
| textord_skewsmooth_offset2 | 1 | For smooth factor |
| textord_test_x | -2147483647 | coord of test pt |
| textord_test_y | -2147483647 | coord of test pt |
| textord_min_blobs_in_row | 4 | Min blobs before gradient counted |
| textord_spline_minblobs | 8 | Min blobs in each spline segment |
| textord_spline_medianwin | 6 | Size of window for spline segmentation |
| textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap |
| textord_min_xheight | 10 | Min credible pixel xheight |
| textord_lms_line_trials | 12 | Number of linew fits to do |
| oldbl_holed_losscount | 10 | Max lost before fallback line used |
| pitsync_linear_version | 6 | Use new fast algorithm |
| pitsync_fake_depth | 1 | Max advance fake generation |
| textord_tabfind_show_strokewidths | 0 | Show stroke widths |
| textord_dotmatrix_gap | 3 | Max pixel gap for broken pixed pitch |
| textord_debug_block | 0 | Block to do debug on |
| textord_pitch_range | 2 | Max range test on pitch |
| textord_words_veto_power | 5 | Rows required to outvote a veto |
| equationdetect_save_bi_image | 0 | Save input bi image |
| equationdetect_save_spt_image | 0 | Save special character image |
| equationdetect_save_seed_image | 0 | Save the seed image |
| equationdetect_save_merged_image | 0 | Save the merged image |
| poly_debug | 0 | Debug old poly |
| poly_wide_objects_better | 1 | More accurate approx on wide things |
| wordrec_display_splits | 0 | Display splits |
| textord_debug_printable | 0 | Make debug windows printable |
| textord_space_size_is_variable | 0 | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
| textord_tabfind_show_initial_partitions | 0 | Show partition bounds |
| textord_tabfind_show_reject_blobs | 0 | Show blobs rejected as noise |
| textord_tabfind_show_columns | 0 | Show column bounds |
| textord_tabfind_show_blocks | 0 | Show final block bounds |
| textord_tabfind_find_tables | 1 | run table detection |
| devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process. |
| textord_show_fixed_cuts | 0 | Draw fixed pitch cell boundaries |
| edges_use_new_outline_complexity | 0 | Use the new outline complexity module |
| edges_debug | 0 | turn on debugging for this module |
| edges_children_fix | 0 | Remove boxy parents of char-like children |
| gapmap_debug | 0 | Say which blocks have tables |
| gapmap_use_ends | 0 | Use large space at start and end of rows |
| gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2quanta wide |
| textord_heavy_nr | 0 | Vigorously remove noise |
| textord_show_initial_rows | 0 | Display row accumulation |
| textord_show_parallel_rows | 0 | Display page correlated rows |
| textord_show_expanded_rows | 0 | Display rows after expanding |
| textord_show_final_rows | 0 | Display rows after final fitting |
| textord_show_final_blobs | 0 | Display blob bounds after pre-ass |
| textord_test_landscape | 0 | Tests refer to land/port |
| textord_parallel_baselines | 1 | Force parallel baselines |
| textord_straight_baselines | 0 | Force straight baselines |
| textord_old_baselines | 1 | |
| textord_old_xheight | 0 | Use old xheight algorithm |
| textord_fix_xheight_bug | 1 | Use spline baseline |
| textord_fix_makerow_bug | 1 | Prevent multiple baselines |
| textord_debug_xheights | 0 | Test xheight algorithms |
| textord_biased_skewcalc | 1 | Bias skew estimates with line length |
| textord_interpolating_skew | 1 | Interpolate across gaps |
| textord_new_initial_xheight | 1 | Use test xheight mechanism |
| textord_debug_blob | 0 | Print test blob information |
| textord_really_old_xheight | 0 | Use original wiseowl xheight |
| textord_oldbl_debug | 0 | Debug old baseline generation |
| textord_debug_baselines | 0 | Debug baseline generation |
| textord_oldbl_paradef | 1 | Use para default mechanism |
| textord_oldbl_split_splines | 1 | Split stepped splines |
| textord_oldbl_merge_parts | 1 | Merge suspect partitions |
| oldbl_corrfix | 1 | Improve correlation of heights |
| oldbl_xhfix | 0 | Fix bug in modes threshold for xheights |
| textord_ocropus_mode | 0 | Make baselines for ocropus |
| textord_tabfind_only_strokewidths | 0 | Only run stroke widths |
| textord_tabfind_show_initialtabs | 0 | Show tab candidates |
| textord_tabfind_show_finaltabs | 0 | Show tab vectors |
| textord_show_tables | 0 | Show table regions |
| textord_tablefind_show_mark | 0 | Debug table marking steps in detail |
| textord_tablefind_show_stats | 0 | Show page stats used in table finding |
| textord_tablefind_recognize_tables | 0 | Enables the table recognizer for table layout and filtering. |
| textord_all_prop | ||
| textord_debug_pitch_test | ||
| textord_disable_pitch_test | ||
| textord_fast_pitch_test | ||
| textord_debug_pitch_metric | ||
| textord_show_row_cuts | ||
| textord_show_page_cuts | ||
| textord_pitch_cheat | ||
| textord_blockndoc_fixed | ||
| textord_show_initial_words | ||
| textord_show_new_words | ||
| textord_show_fixed_words | ||
| textord_blocksall_fixed | ||
| textord_blocksall_prop | ||
| textord_blocksall_testing | ||
| textord_test_mode | ||
| textord_pitch_rowsimilarity | ||
| words_initial_lower | ||
| words_initial_upper | ||
| words_default_prop_nonspace | ||
| words_default_fixed_space | ||
| words_default_fixed_limit | ||
| textord_words_definite_spread | ||
| textord_spacesize_ratiofp | ||
| textord_spacesize_ratioprop | ||
| textord_fpiqr_ratio | ||
| textord_max_pitch_iqr | ||
| textord_fp_min_width | ||
| textord_underline_offset | ||
| ambigs_debug_level | ||
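| classify_debug_level | ||
| classify_norm_method | ||
| matcher_debug_level | ||
| matcher_debug_flags | ||
| classify_learning_debug_level | ||
| matcher_permanent_classes_min | ||
| matcher_min_examples_for_prototyping | ||
| matcher_sufficient_examples_for_prototyping | ||
| classify_adapt_proto_threshold | ||
| classify_adapt_feature_threshold | ||
| classify_class_pruner_threshold | ||
| classify_class_pruner_multiplier | ||
| classify_cp_cutoff_strength | ||
| classify_integer_matcher_multiplier | ||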
| dawg_debug_level | ||
| hyphen_debug_level | ||
| stopper_smallword_size | ||
| stopper_debug_level | ||
| tessedit_truncate_wordchoice_log | ||
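| max_permuter_attempts | ||
| repair_unchopped_blobs | ||
| chop_debug | ||
| chop_split_length | ||
| chop_same_distance | ||
| chop_min_outline_points | ||
| chop_seam_pile_size | ||
| chop_inside_angle | ||
| chop_min_outline_area | ||
| chop_centered_maxwidth | ||
| chop_x_y_weight | ||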
| wordrec_debug_level | ||
| wordrec_max_join_chunks | ||
| segsearch_debug_level | ||
| segsearch_max_pain_points | ||
| segsearch_max_futile_classifications | ||
| language_model_debug_level | ||
| language_model_ngram_order | ||
| language_model_viterbi_list_max_num_prunable | ||
| language_model_viterbi_list_max_size | ||
| language_model_min_compound_length | ||
| wordrec_display_segmentations | ||
| tessedit_pageseg_mode | ||
| tessedit_ocr_engine_mode | ||
| pageseg_devanagari_split_strategy | ||
| ocr_devanagari_split_strategy | ||
| bidi_debug | ||
| applybox_debug | ||
| applybox_page | ||
| tessedit_bigram_debug | ||
| debug_noise_removal | ||
| noise_maxperblob | ||
| noise_maxperword | ||
| debug_x_ht_level | ||
| quality_min_initial_alphas_reqd | ||
| tessedit_tess_adaption_mode | ||
| multilang_debug_level | ||
| paragraph_debug_level | ||
| tessedit_preserve_min_wd_len | ||
| crunch_rating_max | ||
| crunch_pot_indicators | ||
| crunch_leave_lc_strings | ||
| crunch_leave_uc_strings | ||
| crunch_long_repetitions | ||
| crunch_debug | ||
| fixsp_non_noise_limit | ||
| fixsp_done_mode | ||
| debug_fix_space_level | ||
| x_ht_acceptance_tolerance | ||
| x_ht_min_change | ||
| superscript_debug | ||
| jpg_quality | ||
| user_defined_dpi | ||
| min_characters_to_try | ||
| suspect_level | ||
| suspect_short_words | ||
| tessedit_reject_mode | ||
| tessedit_image_border | ||
| min_sane_x_ht_pixels | ||
| tessedit_page_number | ||
| tessedit_parallelize | ||
| lstm_choice_mode | ||
| lstm_choice_iterations | ||
| tosp_debug_level | ||
| tosp_enough_space_samples_for_median | ||
| tosp_redo_kern_limit | ||
| tosp_few_samples | ||
| tosp_short_row | ||
| tosp_sanity_method | ||
| textord_max_noise_size | ||
| textord_baseline_debug | ||
| textord_noise_sizefraction | ||
| textord_noise_translimit | ||
| textord_noise_sncount | ||
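| use_ambigs_for_adaption | ||
| prioritize_division | ||
| classify_enable_learning | ||
| tess_cn_matching | ||
| tess_bn_matching | ||
| classify_enable_adaptive_matcher | ||
| classify_use_pre_adapted_templates | ||
| classify_save_adapted_templates | ||
| classify_enable_adaptive_debugger | ||
| classify_nonlinear_norm | ||
| disable_character_fragments | ||
| classify_debug_character_fragments | ||
| matcher_debug_separate_windows | ||
| classify_bln_numeric_mode | ||
| load_system_dawg | ||
| load_freq_dawg | ||
| load_unambig_dawg | ||
| load_punc_dawg | ||
| load_number_dawg | ||
| load_bigram_dawg | ||
| use_only_first_uft8_step | ||
| stopper_no_acceptable_choices | ||
| segment_nonalphabetic_script | ||
| save_doc_words | ||
| merge_fragments_in_matrix | ||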
| wordrec_enable_assoc | ||
| force_word_assoc | ||
| chop_enable | ||
| chop_vertical_creep | ||
| chop_new_seam_pile | ||
| assume_fixed_pitch_char_segment | ||
| wordrec_skip_no_truth_words | ||
| wordrec_debug_blamer | ||
| wordrec_run_blamer | ||
| save_alt_choices | ||
| language_model_ngram_on | ||
| language_model_ngram_use_only_first_uft8_step | ||
| language_model_ngram_space_delimited_language | ||
| language_model_use_sigmoidal_certainty | ||
| tessedit_resegment_from_boxes | ||
| tessedit_resegment_from_line_boxes | ||
| tessedit_train_from_boxes | ||
| tessedit_make_boxes_from_boxes | ||
| tessedit_train_line_recognizer | ||
| tessedit_dump_pageseg_images | ||
| tessedit_do_invert | ||
| tessedit_ambigs_training | ||
| tessedit_adaption_debug | ||
| applybox_learn_chars_and_char_frags_mode | ||
| applybox_learn_ngrams_mode | ||
| tessedit_display_outwords | ||
| tessedit_dump_choices | ||
| tessedit_timing_debug | ||
| tessedit_fix_fuzzy_spaces | ||
| tessedit_unrej_any_wd | ||
| tessedit_fix_hyphens | ||
| tessedit_enable_doc_dict | ||
| tessedit_debug_fonts | ||
| tessedit_debug_block_rejection | ||
| tessedit_enable_bigram_correction | ||
| tessedit_enable_dict_correction | ||
| enable_noise_removal | ||
| tessedit_minimal_rej_pass1 | ||
| tessedit_test_adaption | ||
| test_pt | ||
| paragraph_text_based | ||
| lstm_use_matrix | ||
| tessedit_good_quality_unrej | ||
| tessedit_use_reject_spaces | ||
| tessedit_preserve_blk_rej_perfect_wds | ||
| tessedit_preserve_row_rej_perfect_wds | ||
| tessedit_dont_blkrej_good_wds | ||
| tessedit_dont_rowrej_good_wds | ||
| tessedit_row_rej_good_docs | ||
| tessedit_reject_bad_qual_wds | ||
| tessedit_debug_doc_rejection | ||
| tessedit_debug_quality_metrics | ||
| bland_unrej | ||
| unlv_tilde_crunching | ||
| hocr_font_info | ||
| hocr_char_boxes | ||
| crunch_early_merge_tess_fails | ||
| crunch_early_convert_bad_unlv_chs | ||
| crunch_terrible_garbage | ||
| crunch_leave_ok_strings | ||
| crunch_accept_ok | ||
| crunch_leave_accept_strings | ||
| crunch_include_numerals | ||
| tessedit_prefer_joined_punct | ||
| tessedit_write_block_separators | ||
| tessedit_write_rep_codes | ||
| tessedit_write_unlv | ||
| tessedit_create_txt | ||
| tessedit_create_hocr | ||
| tessedit_create_alto | ||
| tessedit_create_lstmbox | ||
| tessedit_create_tsv | ||
| tessedit_create_wordstrbox | ||
| tessedit_create_pdf | ||
| textonly_pdf | ||
| suspect_constrain_1Il | ||
| tessedit_minimal_rejection | ||
| tessedit_zero_rejection | ||
| tessedit_word_for_word | ||
| tessedit_zero_kelvin_rejection | ||
| tessedit_rejection_debug | ||
| tessedit_flip_0O | ||
| rej_trust_doc_dawg | ||
| rej_1Il_use_dict_word | ||
| rej_1Il_trust_permuter_type | ||
| rej_use_tess_accepted | ||
| rej_use_tess_blanks | ||
| rej_use_good_perm | ||
| rej_use_sensible_wd | ||
| rej_alphas_in_number_perm | ||
| tessedit_create_boxfile | ||
| tessedit_write_images | ||
| interactive_display_mode | ||
| tessedit_override_permuter | ||
| tessedit_use_primary_params_model | ||
| textord_tabfind_show_vlines | ||
| textord_use_cjk_fp_model | ||
| poly_allow_detailed_fx | ||
| tessedit_init_config_only | ||
| textord_equation_detect | ||
| textord_tabfind_vertical_text | ||
| textord_tabfind_force_vertical_text | ||
| preserve_interword_spaces | ||
| pageseg_apply_music_mask | ||
| textord_single_height_mode | ||
| tosp_old_to_method | ||
| tosp_old_to_constrain_sp_kn | ||
| tosp_only_use_prop_rows | ||
| tosp_force_wordbreak_on_punct | ||
| tosp_use_pre_chopping | ||
| tosp_old_to_bug_fix | ||
| tosp_block_use_cert_spaces | ||
| tosp_row_use_cert_spaces | ||
| tosp_narrow_blobs_not_cert | ||
| tosp_row_use_cert_spaces1 | ||
| tosp_recovery_isolated_row_stats | ||
| tosp_only_small_gaps_for_kern | ||
| tosp_all_flips_fuzzy | ||
| tosp_fuzzy_limit_all | ||
| textord_no_rejects | ||
| textord_show_blobs | ||
| textord_show_boxes | ||
| textord_noise_rejwords | ||
| textord_noise_rejrows | ||
| textord_noise_debug | ||
| classify_learn_debug_str | ||
| user_words_file | ||
| user_words_suffix | ||
| user_patterns_file | ||
| user_patterns_suffix | ||
| output_ambig_words_file | ||
| word_to_debug | ||
| tessedit_char_blacklist | ||
| tessedit_char_whitelist | ||
| tessedit_char_unblacklist | ||
| tessedit_write_params_to_file | ||
| applybox_exposure_pattern | ||
| chs_leading_punct | ('`" | |
| chs_trailing_punct1 | ||
| chs_trailing_punct2 | )'`" | |
| outlines_odd | %\| | Non standard number of outlines |
| outlines_2 | ij!?%":; | Non standard number of outlines |
| numeric_punctuation | ., | Punct. chs expected WITHIN numbers |
| unrecognised_char | \| | Output char for unidentified blobs |
| ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej |
| conflict_set_I_l_1 | Il1 [] | Il1 conflict set |
| file_type | .tif | Filename extension |
| tessedit_load_sublangs | ||
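| page_separator | ||
| classify_char_norm_range | ||
| classify_max_rating_ratio | ||
| classify_max_certainty_margin | ||
| matcher_good_threshold | ||
| matcher_reliable_adaptive_result | ||
| matcher_perfect_threshold | ||
| matcher_bad_match_pad | ||
| matcher_rating_margin | ||
| matcher_avg_noise_size | ||
| matcher_clustering_max_angle_delta | ||
| classify_misfit_junk_penalty | ||
| rating_scale | ||
| certainty_scale | ||
| tessedit_class_miss_scale | ||
| classify_adapted_pruning_factor | ||
| classify_adapted_pruning_threshold | ||
| classify_character_fragments_garbage_certainty_threshold | ||
| speckle_large_max_size | ||
| speckle_rating_penalty | ||
| xheight_penalty_subscripts | ||
| xheight_penalty_inconsistent | ||
| segment_penalty_dict_frequent_word | ||
| segment_penalty_dict_case_ok | ||
| segment_penalty_dict_case_bad | ||
| segment_penalty_dict_nonword | ||
| certainty_scale | ||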
| stopper_nondict_certainty_base | ||
| stopper_phase2_certainty_rejection_offset | ||
| stopper_certainty_per_char | ||
| stopper_allowable_character_badness | ||
| doc_dict_pending_threshold | ||
| doc_dict_certainty_threshold | ||
| tessedit_certainty_threshold | ||
| chop_split_dist_knob | ||
| chop_overlap_knob | ||
| chop_center_knob | ||
| chop_sharpness_knob | ||
| chop_width_change_knob | ||
| chop_ok_split | ||
| chop_good_split | ||
| segsearch_max_char_wh_ratio |
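To illustrate how variables from the table can be applied by key, here is a small sketch. The preset itself is hypothetical (the class name, the method names, and the chosen values are illustrative assumptions, not recommended settings), and the value types follow the bool and int style used in the examples earlier in this article.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

public static class TesseractVariableSketch
{
    // Hypothetical preset: three variable names taken from the table above,
    // with illustrative values only.
    public static Dictionary<string, object> SpacingPreset() =>
        new Dictionary<string, object>
        {
            ["preserve_interword_spaces"] = true,
            ["textord_heavy_nr"] = false,
            ["user_defined_dpi"] = 300
        };

    public static IronTesseract CreateEngine()
    {
        var ocr = new IronTesseract();

        // Copy each key/value pair into the engine's variable dictionary.
        foreach (KeyValuePair<string, object> pair in SpacingPreset())
        {
            ocr.Configuration.TesseractVariables[pair.Key] = pair.Value;
        }
        return ocr;
    }
}
```

Keeping the preset in a plain dictionary makes it easy to inspect or swap per document type before any OCR runs.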
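For best results, we recommend applying IronOCR's image preprocessing filters before running OCR. These filters can significantly improve accuracy, especially with low-quality scans or complex documents such as tables.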
Frequently Asked Questions
How do I configure IronTesseract for OCR in C#?
To configure IronTesseract, create an IronTesseract instance and set properties such as Language and Configuration. You can specify the OCR language (chosen from 125 supported languages), enable barcode reading, configure searchable PDF output, and set a character whitelist. For example: var tesseract = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true } };
Which input formats does IronTesseract support?
IronTesseract accepts a variety of input formats through the OcrInput class. You can process images (PNG, JPG, and others), PDF files, and scanned documents. The OcrInput class provides flexible methods for loading these formats, so you can run OCR on almost any document that contains text.
Can I read barcodes while reading text with IronTesseract?
Yes, IronTesseract includes advanced barcode reading. Set ReadBarCodes = true in TesseractConfiguration to enable barcode detection. This lets you extract both text and barcode data from the same document in a single OCR operation.
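A minimal sketch of reading text and barcodes in one pass might look like this. The class and method names are hypothetical, and the `Barcodes` collection on the result and the `Value` property on each entry are assumptions about the installed IronOCR version; confirm them in the API reference before relying on this.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

public static class BarcodeOcrSketch
{
    public static List<string> ReadTextAndBarcodes(string imagePath, out string text)
    {
        var ocr = new IronTesseract();
        ocr.Configuration.ReadBarCodes = true; // enable barcode detection

        using var input = new OcrInput();
        input.LoadImage(imagePath);

        // A single Read call returns both the text and any detected barcodes.
        OcrResult result = ocr.Read(input);
        text = result.Text;

        var values = new List<string>();
        // Assumption: OcrResult exposes Barcodes, each with a Value string.
        foreach (var barcode in result.Barcodes)
        {
            values.Add(barcode.Value);
        }
        return values;
    }
}
```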
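How do I create a searchable PDF from a scanned document?
By setting RenderSearchablePdf = true in TesseractConfiguration, IronTesseract can convert scanned documents and images into searchable PDFs. The resulting PDF has selectable, searchable text while keeping the appearance of the original document.
Which languages does IronTesseract support for OCR?
IronTesseract supports text recognition in 125 international languages. You specify the language by setting the Language property of the IronTesseract instance, for example IronOcr.OcrLanguage.English, or the corresponding values for Spanish, Chinese, Arabic, and other languages.
Can I restrict which characters OCR recognizes?
Yes, IronTesseract lets you whitelist and blacklist characters through the WhiteListCharacters and BlackListCharacters properties in TesseractConfiguration. This helps improve accuracy when you know the expected character set, for example when only alphanumeric characters should be recognized.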
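As a brief illustration of character filtering, the sketch below restricts recognition to digits and excludes a few symbols. The class name, the constants, and the specific character sets are hypothetical choices for the example; only the two property names come from the configuration shown earlier in this article.

```csharp
using System;
using IronOcr;

public static class CharacterFilterSketch
{
    // Illustrative character sets, not recommended defaults.
    public const string Digits = "0123456789";
    public const string NoisySymbols = "`|^";

    public static IronTesseract DigitsOnlyEngine()
    {
        var ocr = new IronTesseract();
        ocr.Configuration.WhiteListCharacters = Digits;       // only these may appear
        ocr.Configuration.BlackListCharacters = NoisySymbols; // these never appear
        return ocr;
    }
}
```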
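How do I run OCR on multiple documents at the same time?
IronTesseract supports multithreading for batch processing. You can use parallel processing to run OCR on several documents at once, which substantially improves performance when handling large numbers of images or PDF files.
Which version of Tesseract does IronOCR use?
IronOCR uses a customized and optimized build of Tesseract 5, known as Iron Tesseract. Compared with the standard Tesseract implementation, this enhanced engine offers higher accuracy and performance while remaining compatible with .NET applications.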

