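How to Use Iron Tesseract in C#
To use Iron Tesseract in C#, create an IronTesseract instance, configure its language and OCR settings, then call Read() on an OcrInput object that contains your images or PDFs. The tool uses an optimized Tesseract 5 engine to convert images of text into searchable PDFs.
IronOCR provides an intuitive API for using a customized and optimized Tesseract 5, known as Iron Tesseract. With IronOCR and IronTesseract, you can convert images of text and scanned documents into plain text and searchable PDFs. The library supports 125 international languages and includes advanced features such as barcode reading and computer vision.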
Quickstart: Configure IronTesseract in C#
This example shows how to configure IronTesseract with specific settings and run OCR in a single line of code.
- Install IronOCR with NuGet Package Manager: https://www.nuget.org/packages/IronOcr
PM > Install-Package IronOcr
- Copy and run this code snippet.
var result = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true, WhiteListCharacters = "ABCabc123" } }.Read(new IronOcr.OcrInput("image.png"));
- Deploy to test on your live environment.
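Basic OCR Workflow
- Install the OCR library with NuGet to read images
- Use the customized `Tesseract 5` to perform OCR
- Load the document to process, such as an image or a PDF file
- Output the extracted text to the console or a file
- Save the result as a searchable PDF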
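The workflow above can be sketched end to end as follows. This is a minimal illustration rather than the official sample: the file names are placeholders, and it assumes that `OcrResult.SaveAsSearchablePdf` is available in your installed IronOCR version.

```csharp
using System;
using System.IO;
using IronOcr;

// Configure the engine; RenderSearchablePdf prepares the result for PDF export.
var ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration { RenderSearchablePdf = true }
};

// "scan.png" is a placeholder path for the image to read.
using var input = new OcrInput();
input.LoadImage("scan.png");

OcrResult result = ocr.Read(input);

// Output the extracted text to the console and to a text file.
Console.WriteLine(result.Text);
File.WriteAllText("scan.txt", result.Text);

// Save a searchable PDF (assumed method name; check your version's API).
result.SaveAsSearchablePdf("scan-searchable.pdf");
```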
How Do I Create an IronTesseract Instance?
Initialize a Tesseract object with the following code:
using IronOcr;
IronTesseract ocr = new IronTesseract();
Imports IronOcr
Dim ocr As New IronTesseract()
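You can customize how IronTesseract behaves by selecting a different language, enabling barcode reading, and whitelisting or blacklisting characters. IronOCR provides comprehensive configuration options for fine-tuning the OCR process: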
IronTesseract ocr = new IronTesseract
{
Configuration = new TesseractConfiguration
{
ReadBarCodes = false,
RenderHocr = true,
TesseractVariables = null,
WhiteListCharacters = null,
BlackListCharacters = "`ë|^",
},
MultiThreaded = false,
Language = OcrLanguage.English,
EnableTesseractConsoleMessages = true, // False as default
};
Dim ocr As New IronTesseract With {
.Configuration = New TesseractConfiguration With {
.ReadBarCodes = False,
.RenderHocr = True,
.TesseractVariables = Nothing,
.WhiteListCharacters = Nothing,
.BlackListCharacters = "`ë|^"
},
.MultiThreaded = False,
.Language = OcrLanguage.English,
.EnableTesseractConsoleMessages = True
}
Once configured, you can use Tesseract to read OcrInput objects. The OcrInput class provides flexible methods for loading a variety of input formats:
IronTesseract ocr = new IronTesseract();
using OcrInput input = new OcrInput();
input.LoadImage("attachment.png");
OcrResult result = ocr.Read(input);
string text = result.Text;
Dim ocr As New IronTesseract()
Using input As New OcrInput()
input.LoadImage("attachment.png")
Dim result As OcrResult = ocr.Read(input)
Dim text As String = result.Text
End Using
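For more complex scenarios, you can use multithreading to process several documents at once, which can substantially improve the throughput of batch jobs.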
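As a rough sketch of batch processing, the example below reads every PNG in a folder in parallel. It relies on the standard .NET `Parallel.ForEach` rather than any IronOCR-specific batch API, and it creates one engine and one input per file so that no state is shared between threads; that is a conservative assumption, not a documented requirement. The `scans` folder name is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
using IronOcr;

// "scans" is a placeholder folder containing the images to read.
string[] files = Directory.GetFiles("scans", "*.png");
var texts = new ConcurrentDictionary<string, string>();

Parallel.ForEach(files, file =>
{
    // One engine and one input per file, so threads do not share state.
    var ocr = new IronTesseract { Language = OcrLanguage.English };
    using var input = new OcrInput();
    input.LoadImage(file);
    texts[file] = ocr.Read(input).Text;
});

// Report how much text was extracted from each file.
foreach (var pair in texts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} characters");
}
```

Whether this outperforms a single engine reading one multi-page input depends on the workload, so it is worth measuring both on your own documents.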
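What Are the Advanced Tesseract Configuration Variables?
IronOcr's Tesseract interface gives you full control over Tesseract configuration variables through the IronOcr.TesseractConfiguration class. These advanced settings let you tune OCR for specific use cases, such as correcting poorly scanned documents or reading particular document types.
How Do I Use Tesseract Configuration in Code?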
using IronOcr;
using System;
IronTesseract Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.English;
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
// Configure Tesseract Engine
Ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;
using var input = new OcrInput();
input.LoadImage("/path/file.png");
OcrResult Result = Ocr.Read(input);
Console.WriteLine(Result.Text);
Imports IronOcr
Imports System
Private Ocr As New IronTesseract()
Ocr.Language = OcrLanguage.English
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd
' Configure Tesseract Engine
Ocr.Configuration.TesseractVariables("tessedit_parallelize") = False
Dim input = New OcrInput()
input.LoadImage("/path/file.png")
Dim Result As OcrResult = Ocr.Read(input)
Console.WriteLine(Result.Text)
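IronOCR also provides specialized configurations for different document types. For example, when reading passports or processing MICR cheques, you can apply specific preprocessing filters and region detection to improve accuracy.
Example configuration for financial documents: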
using IronOcr;
using System.Collections.Generic;
// Example: Configure for financial documents
IronTesseract ocr = new IronTesseract
{
Language = OcrLanguage.English,
Configuration = new TesseractConfiguration
{
PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
TesseractVariables = new Dictionary<string, object>
{
["tessedit_char_whitelist"] = "0123456789.$,",
["textord_heavy_nr"] = false,
["edges_max_children_per_outline"] = 10
}
}
};
// Apply preprocessing filters for better accuracy
using OcrInput input = new OcrInput();
input.LoadPdf("financial-document.pdf");
input.Deskew();
input.EnhanceResolution(300);
OcrResult result = ocr.Read(input);
Imports IronOcr
' Example: Configure for financial documents
Dim ocr As New IronTesseract With {
.Language = OcrLanguage.English,
.Configuration = New TesseractConfiguration With {
.PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
.TesseractVariables = New Dictionary(Of String, Object) From {
{"tessedit_char_whitelist", "0123456789.$,"},
{"textord_heavy_nr", False},
{"edges_max_children_per_outline", 10}
}
}
}
' Apply preprocessing filters for better accuracy
Using input As New OcrInput()
input.LoadPdf("financial-document.pdf")
input.Deskew()
input.EnhanceResolution(300)
Dim result As OcrResult = ocr.Read(input)
End Using
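What Is the Complete List of Tesseract Configuration Variables?
These can be set with IronTesseract.Configuration.TesseractVariables["key"] = value;. Configuration variables let you fine-tune OCR behavior to get the best results for your specific documents. For detailed guidance on optimizing OCR performance, see our Fast OCR Configuration guide.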
| Tesseract Configuration Variable | Default | Meaning |
|---|---|---|
| classify_num_cp_levels | 3 | Number of Class Pruner Levels |
| textord_debug_tabfind | 0 | Debug tab finding |
| textord_debug_bugs | 0 | Turn on output related to bugs in tab finding |
| textord_testregion_left | -1 | Left edge of debug reporting rectangle |
| textord_testregion_top | -1 | Top edge of debug reporting rectangle |
| textord_testregion_right | 2147483647 | Right edge of debug rectangle |
| textord_testregion_bottom | 2147483647 | Bottom edge of debug rectangle |
| textord_tabfind_show_partitions | 0 | Show partition bounds, waiting if >1 |
| devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process. |
| edges_max_children_per_outline | 10 | Max number of children inside a character outline |
| edges_max_children_layers | 5 | Max layers of nested children inside a character outline |
| edges_children_per_grandchild | 10 | Importance ratio for chucking outlines |
| edges_children_count_limit | 45 | Max holes allowed in blob |
| edges_min_nonhole | 12 | Min pixels for potential char in box |
| edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline |
| textord_fp_chop_error | 2 | Max allowed bending of chop cells |
| textord_tabfind_show_images | 0 | Show image blobs |
| textord_skewsmooth_offset | 4 | For smooth factor |
| textord_skewsmooth_offset2 | 1 | For smooth factor |
| textord_test_x | -2147483647 | coord of test pt |
| textord_test_y | -2147483647 | coord of test pt |
| textord_min_blobs_in_row | 4 | Min blobs before gradient counted |
| textord_spline_minblobs | 8 | Min blobs in each spline segment |
| textord_spline_medianwin | 6 | Size of window for spline segmentation |
| textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap |
| textord_min_xheight | 10 | Min credible pixel xheight |
| textord_lms_line_trials | 12 | Number of linew fits to do |
| oldbl_holed_losscount | 10 | Max lost before fallback line used |
| pitsync_linear_version | 6 | Use new fast algorithm |
| pitsync_fake_depth | 1 | Max advance fake generation |
| textord_tabfind_show_strokewidths | 0 | Show stroke widths |
| textord_dotmatrix_gap | 3 | Max pixel gap for broken pixed pitch |
| textord_debug_block | 0 | Block to do debug on |
| textord_pitch_range | 2 | Max range test on pitch |
| textord_words_veto_power | 5 | Rows required to outvote a veto |
| equationdetect_save_bi_image | 0 | Save input bi image |
| equationdetect_save_spt_image | 0 | Save special character image |
| equationdetect_save_seed_image | 0 | Save the seed image |
| equationdetect_save_merged_image | 0 | Save the merged image |
| poly_debug | 0 | Debug old poly |
| poly_wide_objects_better | 1 | More accurate approx on wide things |
| wordrec_display_splits | 0 | Display splits |
| textord_debug_printable | 0 | Make debug windows printable |
| textord_space_size_is_variable | 0 | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
| textord_tabfind_show_initial_partitions | 0 | Show partition bounds |
| textord_tabfind_show_reject_blobs | 0 | Show blobs rejected as noise |
| textord_tabfind_show_columns | 0 | Show column bounds |
| textord_tabfind_show_blocks | 0 | Show final block bounds |
| textord_tabfind_find_tables | 1 | run table detection |
| devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process. |
| textord_show_fixed_cuts | 0 | Draw fixed pitch cell boundaries |
| edges_use_new_outline_complexity | 0 | Use the new outline complexity module |
| edges_debug | 0 | turn on debugging for this module |
| edges_children_fix | 0 | Remove boxy parents of char-like children |
| gapmap_debug | 0 | Say which blocks have tables |
| gapmap_use_ends | 0 | Use large space at start and end of rows |
| gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2quanta wide |
| textord_heavy_nr | 0 | Vigorously remove noise |
| textord_show_initial_rows | 0 | Display row accumulation |
| textord_show_parallel_rows | 0 | Display page correlated rows |
| textord_show_expanded_rows | 0 | Display rows after expanding |
| textord_show_final_rows | 0 | Display rows after final fitting |
| textord_show_final_blobs | | Display blob bounds after pre-ass |
| textord_test_landscape | 0 | Tests refer to land/port |
| textord_parallel_baselines | 1 | Force parallel baselines |
| textord_straight_baselines | 0 | Force straight baselines |
| textord_old_baselines | 1 | |
| textord_old_xheight | 0 | Use old xheight algorithm |
| textord_fix_xheight_bug | 1 | Use spline baseline |
| textord_fix_makerow_bug | 1 | Prevent multiple baselines |
| textord_debug_xheights | 0 | Test xheight algorithms |
| textord_biased_skewcalc | 1 | Bias skew estimates with line length |
| textord_interpolating_skew | 1 | Interpolate across gaps |
| textord_new_initial_xheight | 1 | Use test xheight mechanism |
| textord_debug_blob | 0 | Print test blob information |
| textord_really_old_xheight | 0 | Use original wiseowl xheight |
| textord_oldbl_debug | 0 | Debug old baseline generation |
| textord_debug_baselines | 0 | Debug baseline generation |
| textord_oldbl_paradef | 1 | Use para default mechanism |
| textord_oldbl_split_splines | 1 | Split stepped splines |
| textord_oldbl_merge_parts | 1 | Merge suspect partitions |
| oldbl_corrfix | 1 | Improve correlation of heights |
| oldbl_xhfix | 0 | Fix bug in modes threshold for xheights |
| textord_ocropus_mode | 0 | Make baselines for ocropus |
| textord_tabfind_only_strokewidths | 0 | Only run stroke widths |
| textord_tabfind_show_initialtabs | 0 | Show tab candidates |
| textord_tabfind_show_finaltabs | 0 | Show tab vectors |
| textord_show_tables | 0 | Show table regions |
| textord_tablefind_show_mark | 0 | Debug table marking steps in detail |
| textord_tablefind_show_stats | 0 | Show page stats used in table finding |
| textord_tablefind_recognize_tables | 0 | Enables the table recognizer for table layout and filtering. |
| textord_all_prop | ||
| textord_debug_pitch_test | ||
| textord_disable_pitch_test | ||
| textord_fast_pitch_test | ||
| textord_debug_pitch_metric | ||
| textord_show_row_cuts | ||
| textord_show_page_cuts | ||
| textord_pitch_cheat | ||
| textord_blockndoc_fixed | ||
| textord_show_initial_words | ||
| textord_show_new_words | ||
| textord_show_fixed_words | ||
| textord_blocksall_fixed | ||
| textord_blocksall_prop | ||
| textord_blocksall_testing | ||
| textord_test_mode | ||
| textord_pitch_rowsimilarity | ||
| words_initial_lower | ||
| words_initial_upper | ||
| words_default_prop_nonspace | ||
| words_default_fixed_space | ||
| words_default_fixed_limit | ||
| textord_words_definite_spread | ||
| textord_spacesize_ratiofp | ||
| textord_spacesize_ratioprop | ||
| textord_fpiqr_ratio | ||
| textord_max_pitch_iqr | ||
| textord_fp_min_width | ||
| textord_underline_offset | ||
| ambigs_debug_level | ||
| classify_debug_level | ||
| classify_norm_method | ||
| matcher_debug_level | ||
| matcher_debug_flags | ||
| classify_learning_debug_level | ||
| matcher_permanent_classes_min | ||
| matcher_min_examples_for_prototyping | ||
| matcher_sufficient_examples_for_prototyping | ||
| classify_adapt_proto_threshold | ||
| classify_adapt_feature_threshold | ||
| classify_class_pruner_threshold | ||
| classify_class_pruner_multiplier | ||
| classify_cp_cutoff_strength | ||
| classify_integer_matcher_multiplier | ||
| dawg_debug_level | ||
| hyphen_debug_level | ||
| stopper_smallword_size | ||
| stopper_debug_level | ||
| tessedit_truncate_wordchoice_log | ||
| max_permuter_attempts | ||
| repair_unchopped_blobs | ||
| chop_debug | ||
| chop_split_length | ||
| chop_same_distance | ||
| chop_min_outline_points | ||
| chop_seam_pile_size | ||
| chop_inside_angle | ||
| chop_min_outline_area | ||
| chop_centered_maxwidth | ||
| chop_x_y_weight | ||
| wordrec_debug_level | ||
| wordrec_max_join_chunks | ||
| segsearch_debug_level | ||
| segsearch_max_pain_points | ||
| segsearch_max_futile_classifications | ||
| language_model_debug_level | ||
| language_model_ngram_order | ||
| language_model_viterbi_list_max_num_prunable | ||
| language_model_viterbi_list_max_size | ||
| language_model_min_compound_length | ||
| wordrec_display_segmentations | ||
| tessedit_pageseg_mode | ||
| tessedit_ocr_engine_mode | ||
| pageseg_devanagari_split_strategy | ||
| ocr_devanagari_split_strategy | ||
| bidi_debug | ||
| applybox_debug | ||
| applybox_page | ||
| tessedit_bigram_debug | ||
| debug_noise_removal | ||
| noise_maxperblob | ||
| noise_maxperword | ||
| debug_x_ht_level | ||
| quality_min_initial_alphas_reqd | ||
| tessedit_tess_adaption_mode | ||
| multilang_debug_level | ||
| paragraph_debug_level | ||
| tessedit_preserve_min_wd_len | ||
| crunch_rating_max | ||
| crunch_pot_indicators | ||
| crunch_leave_lc_strings | ||
| crunch_leave_uc_strings | ||
| crunch_long_repetitions | ||
| crunch_debug | ||
| fixsp_non_noise_limit | ||
| fixsp_done_mode | ||
| debug_fix_space_level | ||
| x_ht_acceptance_tolerance | ||
| x_ht_min_change | ||
| superscript_debug | ||
| jpg_quality | ||
| user_defined_dpi | ||
| min_characters_to_try | ||
| suspect_level | ||
| suspect_short_words | ||
| tessedit_reject_mode | ||
| tessedit_image_border | ||
| min_sane_x_ht_pixels | ||
| tessedit_page_number | ||
| tessedit_parallelize | ||
| lstm_choice_mode | ||
| lstm_choice_iterations | ||
| tosp_debug_level | ||
| tosp_enough_space_samples_for_median | ||
| tosp_redo_kern_limit | ||
| tosp_few_samples | ||
| tosp_short_row | ||
| tosp_sanity_method | ||
| textord_max_noise_size | ||
| textord_baseline_debug | ||
| textord_noise_sizefraction | ||
| textord_noise_translimit | ||
| textord_noise_sncount | ||
| use_ambigs_for_adaption | ||
| prioritize_division | ||
| classify_enable_learning | ||
| tess_cn_matching | ||
| tess_bn_matching | ||
| classify_enable_adaptive_matcher | ||
| classify_use_pre_adapted_templates | ||
| classify_save_adapted_templates | ||
| classify_enable_adaptive_debugger | ||
| classify_nonlinear_norm | ||
| disable_character_fragments | ||
| classify_debug_character_fragments | ||
| matcher_debug_separate_windows | ||
| classify_bln_numeric_mode | ||
| load_system_dawg | ||
| load_freq_dawg | ||
| load_unambig_dawg | ||
| load_punc_dawg | ||
| load_number_dawg | ||
| load_bigram_dawg | ||
| use_only_first_uft8_step | ||
| stopper_no_acceptable_choices | ||
| segment_nonalphabetic_script | ||
| save_doc_words | ||
| merge_fragments_in_matrix | ||
| wordrec_enable_assoc | ||
| force_word_assoc | ||
| chop_enable | ||
| chop_vertical_creep | ||
| chop_new_seam_pile | ||
| assume_fixed_pitch_char_segment | ||
| wordrec_skip_no_truth_words | ||
| wordrec_debug_blamer | ||
| wordrec_run_blamer | ||
| save_alt_choices | ||
| language_model_ngram_on | ||
| language_model_ngram_use_only_first_uft8_step | ||
| language_model_ngram_space_delimited_language | ||
| language_model_use_sigmoidal_certainty | ||
| tessedit_resegment_from_boxes | ||
| tessedit_resegment_from_line_boxes | ||
| tessedit_train_from_boxes | ||
| tessedit_make_boxes_from_boxes | ||
| tessedit_train_line_recognizer | ||
| tessedit_dump_pageseg_images | ||
| tessedit_do_invert | ||
| tessedit_ambigs_training | ||
| tessedit_adaption_debug | ||
| applybox_learn_chars_and_char_frags_mode | ||
| applybox_learn_ngrams_mode | ||
| tessedit_display_outwords | ||
| tessedit_dump_choices | ||
| tessedit_timing_debug | ||
| tessedit_fix_fuzzy_spaces | ||
| tessedit_unrej_any_wd | ||
| tessedit_fix_hyphens | ||
| tessedit_enable_doc_dict | ||
| tessedit_debug_fonts | ||
| tessedit_debug_block_rejection | ||
| tessedit_enable_bigram_correction | ||
| tessedit_enable_dict_correction | ||
| enable_noise_removal | ||
| tessedit_minimal_rej_pass1 | ||
| tessedit_test_adaption | ||
| test_pt | ||
| paragraph_text_based | ||
| lstm_use_matrix | ||
| tessedit_good_quality_unrej | ||
| tessedit_use_reject_spaces | ||
| tessedit_preserve_blk_rej_perfect_wds | ||
| tessedit_preserve_row_rej_perfect_wds | ||
| tessedit_dont_blkrej_good_wds | ||
| tessedit_dont_rowrej_good_wds | ||
| tessedit_row_rej_good_docs | ||
| tessedit_reject_bad_qual_wds | ||
| tessedit_debug_doc_rejection | ||
| tessedit_debug_quality_metrics | ||
| bland_unrej | ||
| unlv_tilde_crunching | ||
| hocr_font_info | ||
| hocr_char_boxes | ||
| crunch_early_merge_tess_fails | ||
| crunch_early_convert_bad_unlv_chs | ||
| crunch_terrible_garbage | ||
| crunch_leave_ok_strings | ||
| crunch_accept_ok | ||
| crunch_leave_accept_strings | ||
| crunch_include_numerals | ||
| tessedit_prefer_joined_punct | ||
| tessedit_write_block_separators | ||
| tessedit_write_rep_codes | ||
| tessedit_write_unlv | ||
| tessedit_create_txt | ||
| tessedit_create_hocr | ||
| tessedit_create_alto | ||
| tessedit_create_lstmbox | ||
| tessedit_create_tsv | ||
| tessedit_create_wordstrbox | ||
| tessedit_create_pdf | ||
| textonly_pdf | ||
| suspect_constrain_1Il | ||
| tessedit_minimal_rejection | ||
| tessedit_zero_rejection | ||
| tessedit_word_for_word | ||
| tessedit_zero_kelvin_rejection | ||
| tessedit_rejection_debug | ||
| tessedit_flip_0O | ||
| rej_trust_doc_dawg | ||
| rej_1Il_use_dict_word | ||
| rej_1Il_trust_permuter_type | ||
| rej_use_tess_accepted | ||
| rej_use_tess_blanks | ||
| rej_use_good_perm | ||
| rej_use_sensible_wd | ||
| rej_alphas_in_number_perm | ||
| tessedit_create_boxfile | ||
| tessedit_write_images | ||
| interactive_display_mode | ||
| tessedit_override_permuter | ||
| tessedit_use_primary_params_model | ||
| textord_tabfind_show_vlines | ||
| textord_use_cjk_fp_model | ||
| poly_allow_detailed_fx | ||
| tessedit_init_config_only | ||
| textord_equation_detect | ||
| textord_tabfind_vertical_text | ||
| textord_tabfind_force_vertical_text | ||
| preserve_interword_spaces | ||
| pageseg_apply_music_mask | ||
| textord_single_height_mode | ||
| tosp_old_to_method | ||
| tosp_old_to_constrain_sp_kn | ||
| tosp_only_use_prop_rows | ||
| tosp_force_wordbreak_on_punct | ||
| tosp_use_pre_chopping | ||
| tosp_old_to_bug_fix | ||
| tosp_block_use_cert_spaces | ||
| tosp_row_use_cert_spaces | ||
| tosp_narrow_blobs_not_cert | ||
| tosp_row_use_cert_spaces1 | ||
| tosp_recovery_isolated_row_stats | ||
| tosp_only_small_gaps_for_kern | ||
| tosp_all_flips_fuzzy | ||
| tosp_fuzzy_limit_all | ||
| textord_no_rejects | ||
| textord_show_blobs | ||
| textord_show_boxes | ||
| textord_noise_rejwords | ||
| textord_noise_rejrows | ||
| textord_noise_debug | ||
| classify_learn_debug_str | ||
| user_words_file | ||
| user_words_suffix | ||
| user_patterns_file | ||
| user_patterns_suffix | ||
| output_ambig_words_file | ||
| word_to_debug | ||
| tessedit_char_blacklist | ||
| tessedit_char_whitelist | ||
| tessedit_char_unblacklist | ||
| tessedit_write_params_to_file | ||
| applybox_exposure_pattern | ||
| chs_leading_punct | ('`" | |
| chs_trailing_punct1 | ||
| chs_trailing_punct2 | )'`" | |
| outlines_odd | %\| | Non-standard number of outlines |
| outlines_2 | ij!?%":; | Non-standard number of outlines |
| numeric_punctuation | ., | Punct. chs expected WITHIN numbers |
| unrecognised_char | \| | Output char for unidentified blobs |
| ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej |
| conflict_set_I_l_1 | Il1 [] | Il1 conflict set |
| file_type | .tif | Filename extension |
| tessedit_load_sublangs | ||
| page_separator | ||
| classify_char_norm_range | ||
| classify_max_rating_ratio | ||
| classify_max_certainty_margin | ||
| matcher_good_threshold | ||
| matcher_reliable_adaptive_result | ||
| matcher_perfect_threshold | ||
| matcher_bad_match_pad | ||
| matcher_rating_margin | ||
| matcher_avg_noise_size | ||
| matcher_clustering_max_angle_delta | ||
| classify_misfit_junk_penalty | ||
| rating_scale | ||
| certainty_scale | ||
| tessedit_class_miss_scale | ||
| classify_adapted_pruning_factor | ||
| classify_adapted_pruning_threshold | ||
| classify_character_fragments_garbage_certainty_threshold | ||
| speckle_large_max_size | ||
| speckle_rating_penalty | ||
| xheight_penalty_subscripts | ||
| xheight_penalty_inconsistent | ||
| segment_penalty_dict_frequent_word | ||
| segment_penalty_dict_case_ok | ||
| segment_penalty_dict_case_bad | ||
| segment_penalty_dict_nonword | ||
| certainty_scale | ||
| stopper_nondict_certainty_base | ||
| stopper_phase2_certainty_rejection_offset | ||
| stopper_certainty_per_char | ||
| stopper_allowable_character_badness | ||
| doc_dict_pending_threshold | ||
| doc_dict_certainty_threshold | ||
| tessedit_certainty_threshold | ||
| chop_split_dist_knob | ||
| chop_overlap_knob | ||
| chop_center_knob | ||
| chop_sharpness_knob | ||
| chop_width_change_knob | ||
| chop_ok_split | ||
| chop_good_split | ||
| segsearch_max_char_wh_ratio | ||
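Several variables from the table can be applied in one pass through the TesseractVariables indexer. The sketch below is illustrative only: the three keys appear in the table, but the values are assumptions chosen for the example, not recommended settings.

```csharp
using System.Collections.Generic;
using IronOcr;

var ocr = new IronTesseract();

// Illustrative overrides; the values are assumptions, so check each
// variable's meaning before relying on it.
var overrides = new Dictionary<string, object>
{
    ["preserve_interword_spaces"] = true,
    ["user_defined_dpi"] = 300,
    ["tessedit_char_blacklist"] = "|~"
};

// Apply each override through the TesseractVariables indexer.
foreach (var pair in overrides)
{
    ocr.Configuration.TesseractVariables[pair.Key] = pair.Value;
}
```

Keeping the overrides in one dictionary makes it easier to compare configurations when testing against the same set of documents.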
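For best results, apply IronOCR's image preprocessing filters before running OCR. These filters can significantly improve accuracy, especially for low-quality scans or complex documents such as tables.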
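Frequently Asked Questions
How do I configure IronTesseract for OCR in C#?
To configure IronTesseract, create an IronTesseract instance and set properties such as Language and Configuration. You can specify the OCR language (chosen from 125 supported languages), enable barcode reading, configure searchable PDF output, and set a character whitelist. For example: var tesseract = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true } };
What input formats does IronTesseract support?
IronTesseract supports multiple input formats through the OcrInput class. You can process images (PNG, JPG, and others), PDF files, and scanned documents. The OcrInput class provides flexible methods for loading these formats, so you can run OCR on almost any document that contains text.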
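As a brief sketch, the example below loads an image and a PDF into the same OcrInput and reads them in one call. The file names are placeholders, and it assumes that one OcrInput can hold pages from more than one source.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Placeholder file names; both sources are added to the same input.
using var input = new OcrInput();
input.LoadImage("page1.png");
input.LoadPdf("pages2-5.pdf");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```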
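Can I read barcodes and text at the same time with IronTesseract?
Yes. IronTesseract includes advanced barcode reading. Enable barcode detection by setting ReadBarCodes to true in TesseractConfiguration. This lets you extract both text and barcode data from the same document in a single OCR operation.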
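A minimal sketch of reading text and barcodes together might look like the following. ReadBarCodes comes from the configuration shown earlier; the `Barcodes` collection on the result and the `Value` property of each entry are assumptions about the result API, so verify them against your installed version. The file name is a placeholder.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true;

// "label.png" is a placeholder for an image containing text and a barcode.
using var input = new OcrInput();
input.LoadImage("label.png");

OcrResult result = ocr.Read(input);

// Recognized text.
Console.WriteLine(result.Text);

// Decoded barcode values (assumed property names).
foreach (var barcode in result.Barcodes)
{
    Console.WriteLine(barcode.Value);
}
```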
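How do I convert scanned documents into searchable PDFs?
IronTesseract can convert scanned documents and images into searchable PDFs when you set RenderSearchablePdf = true in TesseractConfiguration. This produces PDFs whose text can be selected and searched while preserving the appearance of the original document.
Which languages does IronTesseract support for OCR?
IronTesseract supports text recognition in 125 international languages. You specify the language by setting the Language property of the IronTesseract instance, for example IronOcr.OcrLanguage.English, Spanish, Chinese, Arabic, and many others.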
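For illustration, switching the recognition language is a one-property change. This sketch assumes the Spanish language data is installed alongside IronOCR; languages other than English may need an additional language pack. The file name is a placeholder.

```csharp
using System;
using IronOcr;

// Assumes the Spanish language data is available to IronOCR.
var ocr = new IronTesseract { Language = OcrLanguage.Spanish };

// "documento.png" is a placeholder path.
using var input = new OcrInput();
input.LoadImage("documento.png");

Console.WriteLine(ocr.Read(input).Text);
```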
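Can I restrict which characters are recognized during OCR?
Yes. IronTesseract lets you whitelist and blacklist characters through the WhiteListCharacters and BlackListCharacters properties in TesseractConfiguration. This helps improve recognition accuracy when you already know the expected character set, for example when only alphanumeric characters should be recognized.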
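As a small sketch, the configuration below restricts recognition to digits, which suits fields such as invoice numbers. The character set and file name are illustrative assumptions.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// Only these characters may appear in the output.
ocr.Configuration.WhiteListCharacters = "0123456789";

// Alternatively, exclude specific characters instead:
// ocr.Configuration.BlackListCharacters = "|~";

// "invoice-number.png" is a placeholder path.
using var input = new OcrInput();
input.LoadImage("invoice-number.png");

Console.WriteLine(ocr.Read(input).Text);
```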
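How do I run OCR on multiple documents at the same time?
IronTesseract supports multithreading for batch processing. You can use parallel processing to run OCR on several documents at once, which can significantly improve performance when handling large numbers of images or PDF files.
Which version of Tesseract does IronOCR use?
IronOCR uses a customized and optimized build of Tesseract 5, called Iron Tesseract. Compared with the standard Tesseract implementation, this enhanced engine offers higher accuracy and performance while remaining compatible with .NET applications.
How does IronOCR improve data accuracy?
IronOCR improves data accuracy through its advanced recognition algorithms and image correction features, which make text extraction more reliable and precise.
Is there a free trial of IronOCR?
Yes. Iron Software offers a free trial of IronOCR so that users can test its features and performance before deciding to purchase.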

