How to Use Iron Tesseract in C#

By Curtis Chau, updated January 10, 2026

To use Iron Tesseract in C#, create an `IronTesseract` instance, configure it with a language and OCR settings, and then call its `Read()` method on an `OcrInput` object that contains your images or PDFs. Read returns the recognized text, and with `RenderSearchablePdf` enabled the result can also be saved as a searchable PDF, all using IronOCR's optimized Tesseract 5 engine.

IronOCR provides an intuitive API for a customized and optimized build of Tesseract 5 called Iron Tesseract. With IronOCR and `IronTesseract`, you can convert images of text and scanned documents into plain text and searchable PDFs. The library supports 125 international languages and includes advanced features such as barcode reading and computer vision.

Quickstart: Configure IronTesseract in C#

This example shows how to configure `IronTesseract` with specific settings and run OCR in a single statement.

Install IronOCR with the NuGet Package Manager: `PM > Install-Package IronOcr`

Then copy and run this code: `var result = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true, WhiteListCharacters = "ABCabc123" } }.Read(new IronOcr.OcrInput("image.png"));`

Basic OCR workflow

1. Install the OCR library from NuGet to read images.
2. Use the customized Tesseract 5 engine to perform OCR.
3. Load the documents to process, such as images or PDF files.
4. Output the extracted text to the console or a file.
5. Save the result as a searchable PDF.

How do I create an IronTesseract instance?
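
As a minimal sketch of the whole round trip, assuming the IronOcr NuGet package is installed, the class below creates an English engine, reads one image, prints the recognized text and overall confidence, and saves a searchable PDF. The class name, method names, and file paths are hypothetical placeholders, not part of IronOCR; the library calls used are `OcrInput.LoadImage`, `IronTesseract.Read`, and `OcrResult.SaveAsSearchablePdf`.

```csharp
using System;
using IronOcr;

// Hypothetical helper class for illustration; not part of IronOCR.
public static class SearchablePdfExample
{
    // Creates an English engine that also renders a searchable PDF layer.
    // Building the engine does not read any files.
    public static IronTesseract CreateEngine()
    {
        var ocr = new IronTesseract { Language = OcrLanguage.English };
        ocr.Configuration.RenderSearchablePdf = true;
        return ocr;
    }

    // Reads one image, prints its text, and writes a searchable PDF copy.
    public static void Run(string imagePath, string pdfPath)
    {
        IronTesseract ocr = CreateEngine();

        // OcrInput is disposable, so release the loaded pages when done.
        using var input = new OcrInput();
        input.LoadImage(imagePath);

        OcrResult result = ocr.Read(input);
        Console.WriteLine(result.Text);
        Console.WriteLine($"Overall confidence: {result.Confidence}");

        result.SaveAsSearchablePdf(pdfPath);
    }
}
```

Usage, with placeholder paths: `SearchablePdfExample.Run("scan.png", "scan-searchable.pdf");`. Keeping engine construction in its own method makes it easy to reuse one configured engine for many files.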
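<!-- Description: Screenshot showing code execution output or results -->

Use this code to initialize a Tesseract object:

```csharp :path=/static-assets/ocr/content-code-examples/how-to/irontesseract-initialize-irontesseract.cs
```

You can customize the behavior of `IronTesseract` by choosing different languages, enabling barcode reading, and whitelisting or blacklisting characters. IronOCR provides comprehensive [configuration options](https://ironsoftware.com/csharp/ocr/examples/csharp-configure-setup-tesseract/) to fine-tune your OCR process:

```csharp :path=/static-assets/ocr/content-code-examples/how-to/irontesseract-configure-irontesseract.cs
```

Once configured, you can use the Tesseract functionality to read an `OcrInput` object. The [OcrInput class](https://ironsoftware.com/csharp/ocr/examples/csharp-ocr-input-for-iron-tesseract/) provides flexible methods for loading various input formats:

```csharp :path=/static-assets/ocr/content-code-examples/how-to/irontesseract-read.cs
```

For complex scenarios, you can use [multithreading](https://ironsoftware.com/csharp/ocr/examples/csharp-tesseract-multithreading-for-speed/) to process multiple documents concurrently, which greatly improves performance for batch jobs.

What are the advanced Tesseract configuration variables?

<!-- Description: Diagram or screenshot illustrating the code concept -->

The IronOcr Tesseract interface gives you full control over Tesseract configuration variables through the [IronOcr.TesseractConfiguration class](/csharp/ocr/object-reference/api/IronOcr.TesseractConfiguration.html). These advanced settings let you optimize OCR for specific use cases, such as [fixing low-quality scans](https://ironsoftware.com/csharp/ocr/examples/ocr-low-quality-scans-tesseract/) or [reading specific document types](https://ironsoftware.com/csharp/ocr/tutorials/read-specific-document/).

How do I use Tesseract configuration in code?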
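
A minimal sketch, assuming the IronOcr NuGet package, that mixes typed `TesseractConfiguration` properties with raw Tesseract variables set by name through `TesseractVariables` (the variable names come from the Tesseract variable table on this page). `NumericFieldOcr` and `CreateEngine` are hypothetical names, and the chosen values are illustrative only; test them against your own documents.

```csharp
using IronOcr;

// Hypothetical helper class for illustration; not part of IronOCR.
public static class NumericFieldOcr
{
    // Builds an engine tuned for single-line numeric fields.
    public static IronTesseract CreateEngine()
    {
        var ocr = new IronTesseract { Language = OcrLanguage.English };

        // Typed options exposed by TesseractConfiguration.
        ocr.Configuration.ReadBarCodes = false;
        ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.SingleLine;
        ocr.Configuration.WhiteListCharacters = "0123456789.,-";

        // Raw Tesseract variables, set by name through the indexer.
        // preserve_interword_spaces: keep runs of spaces between columns.
        ocr.Configuration.TesseractVariables["preserve_interword_spaces"] = true;
        // textord_heavy_nr: remove noise more aggressively on dirty scans.
        ocr.Configuration.TesseractVariables["textord_heavy_nr"] = true;

        return ocr;
    }
}
```

Using typed properties where IronOCR exposes them (such as `WhiteListCharacters`) and reserving `TesseractVariables` for everything else keeps the intent readable. Whether a given raw variable has a visible effect can depend on the engine mode Tesseract runs in.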
```csharp :path=/static-assets/ocr/content-code-examples/how-to/irontesseract-tesseract-configuration.cs
```

IronOCR also supports specialized configurations for different document types. For example, when [reading passports](https://ironsoftware.com/csharp/ocr/examples/read-passport/) or [processing MICR cheques](https://ironsoftware.com/csharp/ocr/examples/read-micr-cheque/), you can apply specific preprocessing filters and region detection to improve accuracy.

Example configuration for financial documents:

```csharp
using System;
using System.Collections.Generic;
using IronOcr;

// Example: Configure for financial documents
IronTesseract ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration
    {
        // Treat the page as a single uniform block of text
        PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        // Raw Tesseract variables, passed through by name
        TesseractVariables = new Dictionary<string, object>
        {
            ["tessedit_char_whitelist"] = "0123456789.$,",
            ["textord_heavy_nr"] = false,
            ["edges_max_children_per_outline"] = 10
        }
    }
};

// Apply preprocessing filters for better accuracy
using OcrInput input = new OcrInput();
input.LoadPdf("financial-document.pdf");
input.Deskew();
input.EnhanceResolution(300);

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```

What is the complete list of Tesseract configuration variables?

<!-- Description: Diagram or screenshot illustrating the code concept -->

Each of these can be set on an `IronTesseract` instance through `Configuration.TesseractVariables["key"] = value;`. Configuration variables let you fine-tune OCR behavior for specific documents to get the best results. For detailed guidance on optimizing OCR performance, see our [Fast OCR Configuration guide](https://ironsoftware.com/csharp/ocr/examples/tune-tesseract-for-speed-in-dotnet/).

| Tesseract Config Variable | Default | Meaning |
| --- | --- | --- |
| classify_num_cp_levels | 3 | Number of Class Pruner Levels |
| textord_debug_tabfind | 0 | Debug tab finding |
| textord_debug_bugs | 0 | Turn on output related to bugs in tab finding |
| textord_testregion_left | -1 | Left edge of debug reporting rectangle |
| textord_testregion_top | -1 | Top edge of debug reporting rectangle |
| textord_testregion_right | 2147483647 | Right edge of debug rectangle |
| textord_testregion_bottom | 2147483647 | Bottom edge of debug rectangle |
| textord_tabfind_show_partitions | 0 | Show partition bounds, waiting if >1 |
| devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process. |
| edges_max_children_per_outline | 10 | Max number of children inside a character outline |
| edges_max_children_layers | 5 | Max layers of nested children inside a character outline |
| edges_children_per_grandchild | 10 | Importance ratio for chucking outlines |
| edges_children_count_limit | 45 | Max holes allowed in blob |
| edges_min_nonhole | 12 | Min pixels for potential char in box |
| edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline |
| textord_fp_chop_error | 2 | Max allowed bending of chop cells |
| textord_tabfind_show_images | 0 | Show image blobs |
| textord_skewsmooth_offset | 4 | For smooth factor |
| textord_skewsmooth_offset2 | 1 | For smooth factor |
| textord_test_x | -2147483647 | Coord of test pt |
| textord_test_y | -2147483647 | Coord of test pt |
| textord_min_blobs_in_row | 4 | Min blobs before gradient counted |
| textord_spline_minblobs | 8 | Min blobs in each spline segment |
| textord_spline_medianwin | 6 | Size of window for spline segmentation |
| textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap |
| textord_min_xheight | 10 | Min credible pixel xheight |
| textord_lms_line_trials | 12 | Number of line fits to do |
| oldbl_holed_losscount | 10 | Max lost before fallback line used |
| pitsync_linear_version | 6 | Use new fast algorithm |
| pitsync_fake_depth | 1 | Max advance fake generation |
| textord_tabfind_show_strokewidths | 0 | Show stroke widths |
| textord_dotmatrix_gap | 3 | Max pixel gap for broken fixed pitch |
| textord_debug_block | 0 | Block to do debug on |
| textord_pitch_range | 2 | Max range test on pitch |
| textord_words_veto_power | 5 | Rows required to outvote a veto |
| equationdetect_save_bi_image | 0 | Save input bi image |
| equationdetect_save_spt_image | 0 | Save special character image |
| equationdetect_save_seed_image | 0 | Save the seed image |
| equationdetect_save_merged_image | 0 | Save the merged image |
| poly_debug | 0 | Debug old poly |
| poly_wide_objects_better | 1 | More accurate approx on wide things |
| wordrec_display_splits | 0 | Display splits |
| textord_debug_printable | 0 | Make debug windows printable |
| textord_space_size_is_variable | 0 | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
| textord_tabfind_show_initial_partitions | 0 | Show partition bounds |
| textord_tabfind_show_reject_blobs | 0 | Show blobs rejected as noise |
| textord_tabfind_show_columns | 0 | Show column bounds |
| textord_tabfind_show_blocks | 0 | Show final block bounds |
| textord_tabfind_find_tables | 1 | Run table detection |
| devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process. |
| textord_show_fixed_cuts | 0 | Draw fixed pitch cell boundaries |
| edges_use_new_outline_complexity | 0 | Use the new outline complexity module |
| edges_debug | 0 | Turn on debugging for this module |
| edges_children_fix | 0 | Remove boxy parents of char-like children |
| gapmap_debug | 0 | Say which blocks have tables |
| gapmap_use_ends | 0 | Use large space at start and end of rows |
| gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2 quanta wide |
| textord_heavy_nr | 0 | Vigorously remove noise |
| textord_show_initial_rows | 0 | Display row accumulation |
| textord_show_parallel_rows | 0 | Display page correlated rows |
| textord_show_expanded_rows | 0 | Display rows after expanding |
| textord_show_final_rows | 0 | Display rows after final fitting |
| textord_show_final_blobs | 0 | Display blob bounds after pre-ass |
| textord_test_landscape | 0 | Tests refer to land/port |
| textord_parallel_baselines | 1 | Force parallel baselines |
| textord_straight_baselines | 0 | Force straight baselines |
| textord_old_baselines | 1 | Use old baseline algorithm |
| textord_old_xheight | 0 | Use old xheight algorithm |
| textord_fix_xheight_bug | 1 | Use spline baseline |
| textord_fix_makerow_bug | 1 | Prevent multiple baselines |
| textord_debug_xheights | 0 | Test xheight algorithms |
| textord_biased_skewcalc | 1 | Bias skew estimates with line length |
| textord_interpolating_skew | 1 | Interpolate across gaps |
| textord_new_initial_xheight | 1 | Use test xheight mechanism |
| textord_debug_blob | 0 | Print test blob information |
| textord_really_old_xheight | 0 | Use original wiseowl xheight |
| textord_oldbl_debug | 0 | Debug old baseline generation |
| textord_debug_baselines | 0 | Debug baseline generation |
| textord_oldbl_paradef | 1 | Use para default mechanism |
| textord_oldbl_split_splines | 1 | Split stepped splines |
| textord_oldbl_merge_parts | 1 | Merge suspect partitions |
| oldbl_corrfix | 1 | Improve correlation of heights |
| oldbl_xhfix | 0 | Fix bug in modes threshold for xheights |
| textord_ocropus_mode | 0 | Make baselines for ocropus |
| textord_tabfind_only_strokewidths | 0 | Only run stroke widths |
| textord_tabfind_show_initialtabs | 0 | Show tab candidates |
| textord_tabfind_show_finaltabs | 0 | Show tab vectors |
| textord_show_tables | 0 | Show table regions |
| textord_tablefind_show_mark | 0 | Debug table marking steps in detail |
| textord_tablefind_show_stats | 0 | Show page stats used in table finding |
| textord_tablefind_recognize_tables | 0 | Enables the table recognizer for table layout and filtering. |
| textord_all_prop | 0 | All doc is proportional text |
| textord_debug_pitch_test | 0 | Debug on fixed pitch test |
| textord_disable_pitch_test | 0 | Turn off dp fixed pitch algorithm |
| textord_fast_pitch_test | 0 | Do even faster pitch algorithm |
| textord_debug_pitch_metric | 0 | Write full metric stuff |
| textord_show_row_cuts | 0 | Draw row-level cuts |
| textord_show_page_cuts | 0 | Draw page-level cuts |
| textord_pitch_cheat | 0 | Use correct answer for fixed/prop |
| textord_blockndoc_fixed | 0 | Attempt whole doc/block fixed pitch |
| textord_show_initial_words | 0 | Display separate words |
| textord_show_new_words | 0 | Display separate words |
| textord_show_fixed_words | 0 | Display forced fixed pitch words |
| textord_blocksall_fixed | 0 | Moan about prop blocks |
| textord_blocksall_prop | 0 | Moan about fixed pitch blocks |
| textord_blocksall_testing | 0 | Dump stats when moaning |
| textord_test_mode | 0 | Do current test |
| textord_pitch_rowsimilarity | 0.08 | Fraction of xheight for sameness |
| words_initial_lower | 0.5 | Max initial cluster size |
| words_initial_upper | 0.15 | Min initial cluster spacing |
| words_default_prop_nonspace | 0.25 | Fraction of xheight |
| words_default_fixed_space | 0.75 | Fraction of xheight |
| words_default_fixed_limit | 0.6 | Allowed size variance |
| textord_words_definite_spread | 0.3 | Non-fuzzy spacing region |
| textord_spacesize_ratiofp | 2.8 | Min ratio space/nonspace |
| textord_spacesize_ratioprop | 2 | Min ratio space/nonspace |
| textord_fpiqr_ratio | 1.5 | Pitch IQR/Gap IQR threshold |
| textord_max_pitch_iqr | 0.2 | Xh fraction noise in pitch |
| textord_fp_min_width | 0.5 | Min width of decent blobs |
| textord_underline_offset | 0.1 | Fraction of x to ignore |
| ambigs_debug_level | 0 | Debug level for unichar ambiguities |
| classify_debug_level | 0 | Classify debug level |
| classify_norm_method | 1 | Normalization Method ... |
| matcher_debug_level | 0 | Matcher Debug Level |
| matcher_debug_flags | 0 | Matcher Debug Flags |
| classify_learning_debug_level | 0 | Learning Debug Level |
| matcher_permanent_classes_min | 1 | Min # of permanent classes |
| matcher_min_examples_for_prototyping | 3 | Reliable Config Threshold |
| matcher_sufficient_examples_for_prototyping | 5 | Enable adaption even if the ambiguities have not been seen |
| classify_adapt_proto_threshold | 230 | Threshold for good protos during adaptive 0-255 |
| classify_adapt_feature_threshold | 230 | Threshold for good features during adaptive 0-255 |
| classify_class_pruner_threshold | 229 | Class Pruner Threshold 0-255 |
| classify_class_pruner_multiplier | 15 | Class Pruner Multiplier 0-255 |
| classify_cp_cutoff_strength | 7 | Class Pruner CutoffStrength |
| classify_integer_matcher_multiplier | 10 | Integer Matcher Multiplier 0-255 |
| dawg_debug_level | 0 | Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages |
| hyphen_debug_level | 0 | Debug level for hyphenated words. |
| stopper_smallword_size | 2 | Size of dict word to be treated as non-dict word |
| stopper_debug_level | 0 | Stopper debug level |
| tessedit_truncate_wordchoice_log | 10 | Max words to keep in list |
| max_permuter_attempts | 10000 | Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options. |
| repair_unchopped_blobs | 1 | Fix blobs that aren't chopped |
| chop_debug | 0 | Chop debug |
| chop_split_length | 10000 | Split Length |
| chop_same_distance | 2 | Same distance |
| chop_min_outline_points | 6 | Min Number of Points on Outline |
| chop_seam_pile_size | 150 | Max number of seams in seam_pile |
| chop_inside_angle | -50 | Min Inside Angle Bend |
| chop_min_outline_area | 2000 | Min Outline Area |
| chop_centered_maxwidth | 90 | Width of (smaller) chopped blobs above which we don't care that a chop is not near the center. |
| chop_x_y_weight | 3 | X / Y length weight |
| wordrec_debug_level | 0 | Debug level for wordrec |
| wordrec_max_join_chunks | 4 | Max number of broken pieces to associate |
| segsearch_debug_level | 0 | SegSearch debug level |
| segsearch_max_pain_points | 2000 | Maximum number of pain points stored in the queue |
| segsearch_max_futile_classifications | 20 | Maximum number of pain point classifications per chunk that did not result in finding a better word choice. |
| language_model_debug_level | 0 | Language model debug level |
| language_model_ngram_order | 8 | Maximum order of the character ngram model |
| language_model_viterbi_list_max_num_prunable | 10 | Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs |
| language_model_viterbi_list_max_size | 500 | Maximum size of viterbi lists recorded in BLOB_CHOICEs |
| language_model_min_compound_length | 3 | Minimum length of compound words |
| wordrec_display_segmentations | 0 | Display Segmentations |
| tessedit_pageseg_mode | 6 | Page seg mode: 0=osd only, 1=auto+osd, 2=auto_only, 3=auto, 4=column, 5=block_vert, 6=block, 7=line, 8=word, 9=word_circle, 10=char, 11=sparse_text, 12=sparse_text+osd, 13=raw_line (Values from PageSegMode enum in tesseract/publictypes.h) |
| tessedit_ocr_engine_mode | 2 | Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available. |
| pageseg_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation. |
| ocr_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing OCR. |
| bidi_debug | 0 | Debug level for BiDi |
| applybox_debug | 1 | Debug level |
| applybox_page | 0 | Page number to apply boxes from |
| tessedit_bigram_debug | 0 | Amount of debug output for bigram correction. |
| debug_noise_removal | 0 | Debug reassignment of small outlines |
| noise_maxperblob | 8 | Max diacritics to apply to a blob |
| noise_maxperword | 16 | Max diacritics to apply to a word |
| debug_x_ht_level | 0 | Reestimate debug |
| quality_min_initial_alphas_reqd | 2 | Alphas in a good word |
| tessedit_tess_adaption_mode | 39 | Adaptation decision algorithm for tess |
| multilang_debug_level | 0 | Print multilang debug info. |
| paragraph_debug_level | 0 | Print paragraph debug info. |
| tessedit_preserve_min_wd_len | 2 | Only preserve wds longer than this |
| crunch_rating_max | 10 | For adj length in rating per ch |
| crunch_pot_indicators | 1 | How many potential indicators needed |
| crunch_leave_lc_strings | 4 | Don't crunch words with long lower case strings |
| crunch_leave_uc_strings | 4 | Don't crunch words with long upper case strings |
| crunch_long_repetitions | 3 | Crunch words with long repetitions |
| crunch_debug | 0 | As it says |
| fixsp_non_noise_limit | 1 | How many non-noise blbs either side? |
| fixsp_done_mode | 1 | What constitutes done for spacing |
| debug_fix_space_level | 0 | Contextual fixspace debug |
| x_ht_acceptance_tolerance | 8 | Max allowed deviation of blob top outside of font data |
| x_ht_min_change | 8 | Min change in xht before actually trying it |
| superscript_debug | 0 | Debug level for sub & superscript fixer |
| jpg_quality | 85 | Set JPEG quality level |
| user_defined_dpi | 0 | Specify DPI for input image |
| min_characters_to_try | 50 | Specify minimum characters to try during OSD |
| suspect_level | 99 | Suspect marker level |
| suspect_short_words | 2 | Don't suspect dict wds longer than this |
| tessedit_reject_mode | 0 | Rejection algorithm |
| tessedit_image_border | 2 | Rej blbs near image edge limit |
| min_sane_x_ht_pixels | 8 | Reject any x-ht lt or eq than this |
| tessedit_page_number | -1 | -1 -> All pages, else specific page to process |
| tessedit_parallelize | 1 | Run in parallel where possible |
| lstm_choice_mode | 2 | Allows including alternative symbol choices in the hOCR output. Valid input values are 0, 1 and 2. 0 is the default value. With 1 the alternative symbol choices per timestep are included. With 2 alternative symbol choices are extracted from the CTC process instead of the lattice. The choices are mapped per character. |
| lstm_choice_iterations | 5 | Sets the number of cascading iterations for the Beamsearch in lstm_choice_mode. Note that lstm_choice_mode must be set to a value greater than 0 to produce results. |
| tosp_debug_level | 0 | Debug data |
| tosp_enough_space_samples_for_median | 3 | Or should we use mean |
| tosp_redo_kern_limit | 10 | No. samples reqd to reestimate for row |
| tosp_few_samples | 40 | No. gaps reqd with 1 large gap to treat as a table |
| tosp_short_row | 20 | No. gaps reqd with few cert spaces to use certs |
| tosp_sanity_method | 1 | How to avoid being silly |
| textord_max_noise_size | 7 | Pixel size of noise |
| textord_baseline_debug | 0 | Baseline debug level |
| textord_noise_sizefraction | 10 | Fraction of size for maxima |
| textord_noise_translimit | 16 | Transitions for normal blob |
| textord_noise_sncount | 1 | Super norm blobs to save row |
| use_ambigs_for_adaption | 0 | Use ambigs for deciding whether to adapt to a character |
| prioritize_division | 0 | Prioritize blob division over chopping |
| classify_enable_learning | 1 | Enable adaptive classifier |
| tess_cn_matching | 0 | Character Normalized Matching |
| tess_bn_matching | 0 | Baseline Normalized Matching |
| classify_enable_adaptive_matcher | 1 | Enable adaptive classifier |
| classify_use_pre_adapted_templates | 0 | Use pre-adapted classifier templates |
| classify_save_adapted_templates | 0 | Save adapted templates to a file |
| classify_enable_adaptive_debugger | 0 | Enable match debugger |
| classify_nonlinear_norm | 0 | Non-linear stroke-density normalization |
| disable_character_fragments | 1 | Do not include character fragments in the results of the classifier |
| classify_debug_character_fragments | 0 | Bring up graphical debugging windows for fragments training |
| matcher_debug_separate_windows | 0 | Use two different windows for debugging the matching: One for the protos and one for the features. |
| classify_bln_numeric_mode | 0 | Assume the input is numbers [0-9]. |
| load_system_dawg | 1 | Load system word dawg. |
| load_freq_dawg | 1 | Load frequent word dawg. |
| load_unambig_dawg | 1 | Load unambiguous word dawg. |
| load_punc_dawg | 1 | Load dawg with punctuation patterns. |
| load_number_dawg | 1 | Load dawg with number patterns. |
| load_bigram_dawg | 1 | Load dawg with special word bigrams. |
| use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities. |
| stopper_no_acceptable_choices | 0 | Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations |
| segment_nonalphabetic_script | 0 | Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch |
| save_doc_words | 0 | Save Document Words |
| merge_fragments_in_matrix | 1 | Merge the fragments in the ratings matrix and delete them after merging |
| wordrec_enable_assoc | 1 | Associator Enable |
| force_word_assoc | 0 | Force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary. |
| chop_enable | 1 | Chop enable |
| chop_vertical_creep | 0 | Vertical creep |
| chop_new_seam_pile | 1 | Use new seam_pile |
| assume_fixed_pitch_char_segment | 0 | Include fixed-pitch heuristics in char segmentation |
| wordrec_skip_no_truth_words | 0 | Only run OCR for words that had truth recorded in BlamerBundle |
| wordrec_debug_blamer | 0 | Print blamer debug messages |
| wordrec_run_blamer | 0 | Try to set the blame for errors |
| save_alt_choices | 1 | Save alternative paths found during chopping and segmentation search |
| language_model_ngram_on | 0 | Turn on/off the use of character ngram model |
| language_model_ngram_use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities. |
| language_model_ngram_space_delimited_language | 1 | Words are delimited by space |
| language_model_use_sigmoidal_certainty | 0 | Use sigmoidal score for certainty |
| tessedit_resegment_from_boxes | 0 | Take segmentation and labeling from box file |
| tessedit_resegment_from_line_boxes | 0 | Conversion of word/line box file to char box file |
| tessedit_train_from_boxes | 0 | Generate training data from boxed chars |
| tessedit_make_boxes_from_boxes | 0 | Generate more boxes from boxed chars |
| tessedit_train_line_recognizer | 0 | Break input into lines and remap boxes if present |
| tessedit_dump_pageseg_images | 0 | Dump intermediate images made during page segmentation |
| tessedit_do_invert | 1 | Try inverting the image in LSTMRecognizeWord |
| tessedit_ambigs_training | 0 | Perform training for ambiguities |
| tessedit_adaption_debug | 0 | Generate and print debug information for adaption |
| applybox_learn_chars_and_char_frags_mode | 0 | Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters. |
| applybox_learn_ngrams_mode | 0 | Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally. |
| tessedit_display_outwords | 0 | Draw output words |
| tessedit_dump_choices | 0 | Dump char choices |
| tessedit_timing_debug | 0 | Print timing stats |
| tessedit_fix_fuzzy_spaces | 1 | Try to improve fuzzy spaces |
| tessedit_unrej_any_wd | 0 | Don't bother with word plausibility |
| tessedit_fix_hyphens | 1 | Crunch double hyphens? |
| tessedit_enable_doc_dict | 1 | Add words to the document dictionary |
| tessedit_debug_fonts | 0 | Output font info per char |
| tessedit_debug_block_rejection | 0 | Block and Row stats |
| tessedit_enable_bigram_correction | 1 | Enable correction based on the word bigram dictionary. |
| tessedit_enable_dict_correction | 0 | Enable single word correction based on the dictionary. |
| enable_noise_removal | 1 | Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise |
| tessedit_minimal_rej_pass1 | 0 | Do minimal rejection on pass 1 output |
| tessedit_test_adaption | 0 | Test adaption criteria |
| test_pt | 0 | Test for point |
| paragraph_text_based | 1 | Run paragraph detection on the post-text-recognition (more accurate) |
| lstm_use_matrix | 1 | Use ratings matrix/beam search with lstm |
| tessedit_good_quality_unrej | 1 | Reduce rejection on good docs |
| tessedit_use_reject_spaces | 1 | Reject spaces? |
| tessedit_preserve_blk_rej_perfect_wds | 1 | Only rej partially rejected words in block rejection |
| tessedit_preserve_row_rej_perfect_wds | 1 | Only rej partially rejected words in row rejection |
| tessedit_dont_blkrej_good_wds | 0 | Use word segmentation quality metric |
| tessedit_dont_rowrej_good_wds | 0 | Use word segmentation quality metric |
| tessedit_row_rej_good_docs | 1 | Apply row rejection to good docs |
| tessedit_reject_bad_qual_wds | 1 | Reject all bad quality wds |
| tessedit_debug_doc_rejection | 0 | Page stats |
| tessedit_debug_quality_metrics | 0 | Output data to debug file |
| bland_unrej | 0 | Unrej potential with no checks |
| unlv_tilde_crunching | 0 | Mark v.bad words for tilde crunch |
| hocr_font_info | 0 | Add font info to hocr output |
| hocr_char_boxes | 0 | Add coordinates for each character to hocr output |
| crunch_early_merge_tess_fails | 1 | Before word crunch? |
| crunch_early_convert_bad_unlv_chs | 0 | Take out ~^ early? |
| crunch_terrible_garbage | 1 | As it says |
| crunch_leave_ok_strings | 1 | Don't touch sensible strings |
| crunch_accept_ok | 1 | Use acceptability in okstring |
| crunch_leave_accept_strings | 0 | Don't pot crunch sensible strings |
| crunch_include_numerals | 0 | Fiddle alpha figures |
| tessedit_prefer_joined_punct | 0 | Reward punctuation joins |
| tessedit_write_block_separators | 0 | Write block separators in output |
| tessedit_write_rep_codes | 0 | Write repetition char code |
| tessedit_write_unlv | 0 | Write .unlv output file |
| tessedit_create_txt | 0 | Write .txt output file |
| tessedit_create_hocr | 0 | Write .html hOCR output file |
| tessedit_create_alto | 0 | Write .xml ALTO file |
| tessedit_create_lstmbox | 0 | Write .box file for LSTM training |
| tessedit_create_tsv | 0 | Write .tsv output file |
| tessedit_create_wordstrbox | 0 | Write WordStr format .box output file |
| tessedit_create_pdf | 0 | Write .pdf output file |
| textonly_pdf | 0 | Create PDF with only one invisible text layer |
| suspect_constrain_1Il | 0 | UNLV keep 1Il chars rejected |
| tessedit_minimal_rejection | 0 | Only reject tess failures |
| tessedit_zero_rejection | 0 | Don't reject ANYTHING |
| tessedit_word_for_word | 0 | Make output have exactly one word per WERD |
| tessedit_zero_kelvin_rejection | 0 | Don't reject ANYTHING AT ALL |
| tessedit_rejection_debug | 0 | Adaption debug |
| tessedit_flip_0O | 1 | Contextual 0O O0 flips |
| rej_trust_doc_dawg | 0 | Use DOC dawg in 11l conf. detector |
| rej_1Il_use_dict_word | 0 | Use dictword test |
| rej_1Il_trust_permuter_type | 1 | Don't double check |
| rej_use_tess_accepted | 1 | Individual rejection control |
| rej_use_tess_blanks | 1 | Individual rejection control |
| rej_use_good_perm | 1 | Individual rejection control |
| rej_use_sensible_wd | 0 | Extend permuter check |
| rej_alphas_in_number_perm | 0 | Extend permuter check |
| tessedit_create_boxfile | 0 | Output text with boxes |
| tessedit_write_images | 0 | Capture the image from the IPE |
| interactive_display_mode | 0 | Run interactively? |
| tessedit_override_permuter | 1 | According to dict_word |
| tessedit_use_primary_params_model | 0 | In multilingual mode use params model of the primary language |
| textord_tabfind_show_vlines | 0 | Debug line finding |
| textord_use_cjk_fp_model | 0 | Use CJK fixed pitch model |
| poly_allow_detailed_fx | 0 | Allow feature extractors to see the original outline |
| tessedit_init_config_only | 0 | Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis. |
| textord_equation_detect | 0 | Turn on equation detector |
| textord_tabfind_vertical_text | 1 | Enable vertical detection |
| textord_tabfind_force_vertical_text | 0 | Force using vertical text page mode |
| preserve_interword_spaces | 0 | Preserve multiple interword spaces |
| pageseg_apply_music_mask | 1 | Detect music staff and remove intersecting components |
| textord_single_height_mode | 0 | Script has no xheight, so use a single mode |
| tosp_old_to_method | 0 | Space stats use prechopping? |
| tosp_old_to_constrain_sp_kn | 0 | Constrain relative values of inter and intra-word gaps for old_to_method. |
| tosp_only_use_prop_rows | 1 | Block stats to use fixed pitch rows? |
| tosp_force_wordbreak_on_punct | 0 | Force word breaks on punct to break long lines in non-space delimited langs |
| tosp_use_pre_chopping | 0 | Space stats use prechopping? |
| tosp_old_to_bug_fix | 0 | Fix suspected bug in old code |
| tosp_block_use_cert_spaces | 1 | Only stat OBVIOUS spaces |
| tosp_row_use_cert_spaces | 1 | Only stat OBVIOUS spaces |
| tosp_narrow_blobs_not_cert | 1 | Only stat OBVIOUS spaces |
| tosp_row_use_cert_spaces1 | 1 | Only stat OBVIOUS spaces |
| tosp_recovery_isolated_row_stats | 1 | Use row alone when inadequate cert spaces |
| tosp_only_small_gaps_for_kern | 0 | Better guess |
| tosp_all_flips_fuzzy | 0 | Pass ANY flip to context? |
| tosp_fuzzy_limit_all | 1 | Don't restrict kn->sp fuzzy limit to tables |
| textord_no_rejects | 0 | Don't remove noise blobs |
| textord_show_blobs | 0 | Display unsorted blobs |
| textord_show_boxes | 0 | Display unsorted blobs |
| textord_noise_rejwords | 1 | Reject noise-like words |
| textord_noise_rejrows | 1 | Reject noise-like rows |
| textord_noise_debug | 0 | Debug row garbage detector |
| classify_learn_debug_str | | Class str to debug learning |
| user_words_file | | A filename of user-provided words. |
| user_words_suffix | | A suffix of user-provided words located in tessdata. |
| user_patterns_file | | A filename of user-provided patterns. |
| user_patterns_suffix | | A suffix of user-provided patterns located in tessdata. |
| output_ambig_words_file | | Output file for ambiguities found in the dictionary |
| word_to_debug | | Word for which stopper debug information should be printed to stdout |
| tessedit_char_blacklist | | Blacklist of chars not to recognize |
| tessedit_char_whitelist | | Whitelist of chars to recognize |
| tessedit_char_unblacklist | | List of chars to override tessedit_char_blacklist |
| tessedit_write_params_to_file | | Write all parameters to the given file. |
| applybox_exposure_pattern | .exp | Exposure value follows this pattern in the image filename. The names of the image files are expected to be in the form [lang].[fontname].exp[num].tif |
| chs_leading_punct | ('`" | Leading punctuation |
| chs_trailing_punct1 | ).,;:?! | 1st Trailing punctuation |
| chs_trailing_punct2 | )'`" | 2nd Trailing punctuation |
| outlines_odd | %\| | Non standard number of outlines |
| outlines_2 | ij!?%":; | Non standard number of outlines |
| numeric_punctuation | ., | Punct. chs expected WITHIN numbers |
| unrecognised_char | \| | Output char for unidentified blobs |
| ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej |
| conflict_set_I_l_1 | Il1[] | Il1 conflict set |
| file_type | .tif | Filename extension |
| tessedit_load_sublangs | | List of languages to load with this one |
| page_separator | | Page separator (default is form feed control character) |
| classify_char_norm_range | 0.2 | Character Normalization Range ... |
| classify_max_rating_ratio | 1.5 | Veto ratio between classifier ratings |
| classify_max_certainty_margin | 5.5 | Veto difference between classifier certainties |
| matcher_good_threshold | 0.125 | Good Match (0-1) |
| matcher_reliable_adaptive_result | 0 | Great Match (0-1) |
| matcher_perfect_threshold | 0.02 | Perfect Match (0-1) |
| matcher_bad_match_pad | 0.15 | Bad Match Pad (0-1) |
| matcher_rating_margin | 0.1 | New template margin (0-1) |
| matcher_avg_noise_size | 12 | Avg. noise blob length |
| matcher_clustering_max_angle_delta | 0.015 | Maximum angle delta for prototype clustering |
| classify_misfit_junk_penalty | 0 | Penalty to apply when a non-alnum is vertically out of its expected textline position |
| rating_scale | 1.5 | Rating scaling factor |
| certainty_scale | 20 | Certainty scaling factor |
| tessedit_class_miss_scale | 0.00390625 | Scale factor for features not used |
| classify_adapted_pruning_factor | 2.5 | Prune poor adapted results this much worse than best result |
| classify_adapted_pruning_threshold | -1 | Threshold at which classify_adapted_pruning_factor starts |
| classify_character_fragments_garbage_certainty_threshold | -3 | Exclude fragments that do not look like whole characters from training and adaption |
| speckle_large_max_size | 0.3 | Max large speckle size |
| speckle_rating_penalty | 10 | Penalty to add to worst rating for noise |
| xheight_penalty_subscripts | 0.125 | Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK. |
| xheight_penalty_inconsistent | 0.25 | Score penalty (0.1 = 10%) added if an xheight is inconsistent. |
| segment_penalty_dict_frequent_word | 1 | Score multiplier for word matches which have good case and are frequent in the given language (lower is better). |
| segment_penalty_dict_case_ok | 1.1 | Score multiplier for word matches that have good case (lower is better). |
| segment_penalty_dict_case_bad | 1.3125 | Default score multiplier for word matches, which may have case issues (lower is better). |
| segment_penalty_dict_nonword | 1.25 | Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better). |
| stopper_nondict_certainty_base | -2.5 | Certainty threshold for non-dict words |
| stopper_phase2_certainty_rejection_offset | 1 | Reject certainty offset |
| stopper_certainty_per_char | -0.5 | Certainty to add for each dict char above small word size. |
| stopper_allowable_character_badness | 3 | Max certainty variation allowed in a word (in sigma) |
| doc_dict_pending_threshold | 0 | Worst certainty for using pending dictionary |
| doc_dict_certainty_threshold | -2.25 | Worst certainty for words that can be inserted into the document dictionary |
| tessedit_certainty_threshold | -2.25 | Good blob limit |
| chop_split_dist_knob | 0.5 | Split length adjustment |
| chop_overlap_knob | 0.9 | Split overlap adjustment |
| chop_center_knob | 0.15 | Split center adjustment |
| chop_sharpness_knob | 0.06 | Split sharpness adjustment |
| chop_width_change_knob | 5 | Width change adjustment |
| chop_ok_split | 100 | OK split limit |
| chop_good_split | 50 | Good split limit |
| segsearch_max_char_wh_ratio | 2 | Maximum character width-to-height ratio |

For best results, we recommend applying IronOCR's [image preprocessing filters](https://ironsoftware.com/csharp/ocr/examples/ocr-image-filters-for-net-tesseract/) before running OCR. These filters can significantly improve accuracy, especially when handling [low-quality scans](https://ironsoftware.com/csharp/ocr/examples/ocr-low-quality-scans-tesseract/) or complex documents such as [tables](https://ironsoftware.com/csharp/ocr/examples/read-table-in-document/).

Frequently Asked Questions

How do I configure IronTesseract for OCR in C#?

To configure IronTesseract, create an `IronTesseract` instance and set properties such as `Language` and `Configuration`. You can specify the OCR language (from the 125 supported languages), enable barcode reading, turn on searchable PDF output, and set a character whitelist. For example: `var tesseract = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true } };`

What input formats does IronTesseract support?

IronTesseract accepts a variety of input formats through the `OcrInput` class. You can process images (PNG, JPG, and others), PDF files, and scanned documents. The `OcrInput` class provides flexible methods for loading these formats, so you can run OCR on almost any document that contains text.

Can I use IronTesseract to read barcodes at the same time as text?

Yes, IronTesseract includes advanced barcode reading. Set `ReadBarCodes = true` in `TesseractConfiguration` to enable barcode detection. This lets you extract both text and barcode data from the same document in a single OCR operation.

How do I create a searchable PDF from scanned documents?
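
IronTesseract can convert scanned documents and images into searchable PDFs when you set `RenderSearchablePdf = true` in `TesseractConfiguration`. This produces PDF files whose text can be selected and searched while preserving the appearance of the original document.

Which languages does IronTesseract support for OCR?

IronTesseract supports text recognition in 125 international languages. You choose the language by setting the `Language` property of the `IronTesseract` instance, for example `IronOcr.OcrLanguage.English`, or another supported language such as Spanish, Chinese, or Arabic.

Can I restrict which characters OCR recognizes?

Yes. IronTesseract supports character whitelisting and blacklisting through the `WhiteListCharacters` and `BlackListCharacters` properties of `TesseractConfiguration`. This helps improve accuracy when you know the expected character set, for example when only alphanumeric characters should be recognized.

How do I run OCR on multiple documents at once?

IronTesseract supports multithreading for batch processing. You can use parallel processing to OCR multiple documents simultaneously, which greatly improves performance when handling large numbers of images or PDF documents.

Which version of Tesseract does IronOCR use?

IronOCR uses a customized and optimized version of Tesseract 5 called Iron Tesseract. Compared with standard Tesseract implementations, this enhanced engine offers higher accuracy and performance while remaining compatible with .NET applications.

Curtis Chau, Technical Writer

Curtis Chau holds a Bachelor's degree in Computer Science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals. Beyond development, Curtis has a strong interest in the Internet of Things (IoT), exploring innovative ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.

Reviewed by Jeffrey T. Fritz, Principal Program Manager, .NET Community Team

Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts "Fritz and Friends", a twice-weekly live stream for developers where he talks tech and writes code together with viewers. Jeff writes workshops, gives presentations, and plans content for large Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.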