How to use Iron Tesseract in C#
To use Iron Tesseract in C#, create an IronTesseract instance, configure its language and OCR settings, then call the Read() method on an OcrInput object containing your images or PDFs. The result holds the recognized text and can be turned into a searchable PDF, all using IronOCR's optimized Tesseract 5 engine.
IronOCR provides an intuitive API for utilizing the customized and optimized Tesseract 5, known as Iron Tesseract. By using IronOCR and IronTesseract, you will be able to convert images of text and scanned documents into text and searchable PDFs. The library supports 125 international languages and includes advanced features like barcode reading and computer vision.
Quickstart: Set Up IronTesseract Configuration in C#
This example demonstrates how to configure IronTesseract with specific settings and perform OCR in a single line of code.
Get started making PDFs with NuGet now:
Install IronOCR with NuGet Package Manager
Copy and run this code snippet.
var result = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true, WhiteListCharacters = "ABCabc123" } }.Read(new IronOcr.OcrInput("image.png"));
Deploy to test on your live environment
Basic OCR Workflow
- Install the OCR Library with NuGet to read images
- Utilize Custom `Tesseract 5` to perform OCR
- Load the desired documents, such as images or PDF files, for processing
- Output the extracted text to the console or a file
- Save the result as a searchable PDF
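The workflow above can be sketched end to end as follows. This is a minimal sketch rather than the library's canonical sample: the class and method names are hypothetical, the file paths are placeholders supplied by the caller, and `SaveAsSearchablePdf` is assumed to be available on `OcrResult` in the installed IronOCR version.

```csharp
using System;
using IronOcr;

public static class BasicWorkflowSketch
{
    // Builds an engine that keeps the data needed for a searchable PDF.
    public static IronTesseract CreateEngine()
    {
        return new IronTesseract
        {
            Language = OcrLanguage.English,
            Configuration = new TesseractConfiguration { RenderSearchablePdf = true }
        };
    }

    // Reads one image, prints the text, and saves a searchable PDF.
    // Both paths are hypothetical placeholders.
    public static string ReadAndExport(string imagePath, string pdfPath)
    {
        IronTesseract ocr = CreateEngine();
        using OcrInput input = new OcrInput();
        input.LoadImage(imagePath);

        OcrResult result = ocr.Read(input);
        Console.WriteLine(result.Text);

        // Assumed API: OcrResult.SaveAsSearchablePdf(string path).
        result.SaveAsSearchablePdf(pdfPath);
        return result.Text;
    }
}
```

A call such as `BasicWorkflowSketch.ReadAndExport("scan.png", "scan-searchable.pdf")` would then cover the load, read, output, and save steps in one place.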
How Do I Create an IronTesseract Instance?
Initialize a Tesseract object with this code:
using IronOcr;
IronTesseract ocr = new IronTesseract();
You can customize the behavior of IronTesseract by selecting different languages, enabling barcode reading, and whitelisting or blacklisting characters. IronOCR provides comprehensive configuration options to fine-tune your OCR process:
using IronOcr;
IronTesseract ocr = new IronTesseract
{
    Configuration = new TesseractConfiguration
    {
        ReadBarCodes = false,
        RenderHocr = true,
        TesseractVariables = null,
        WhiteListCharacters = null,
        BlackListCharacters = "`ë|^",
    },
    MultiThreaded = false,
    Language = OcrLanguage.English,
    EnableTesseractConsoleMessages = true, // Defaults to false
};
Once configured, you can use Tesseract functionality to read OcrInput objects. The OcrInput class provides flexible methods for loading various input formats:
using IronOcr;
IronTesseract ocr = new IronTesseract();
// OcrInput is disposable, so release it when the read is finished
using OcrInput input = new OcrInput();
input.LoadImage("attachment.png");
OcrResult result = ocr.Read(input);
string text = result.Text;
For complex scenarios, you can use multithreading to process multiple documents simultaneously, which can significantly improve throughput for batch operations.
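The batch scenario mentioned above could be sketched with the standard .NET `Parallel.ForEach`. This is one possible approach under stated assumptions, not IronOCR's documented multithreading pattern: the `BatchOcrSketch` class and `ReadAll` method are hypothetical names, and creating a separate engine per document is a conservative choice made here to avoid sharing state between threads.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using IronOcr;

public static class BatchOcrSketch
{
    // Runs OCR on each image path in parallel and returns path -> recognized text.
    public static IDictionary<string, string> ReadAll(IEnumerable<string> imagePaths)
    {
        var results = new ConcurrentDictionary<string, string>();

        Parallel.ForEach(imagePaths, path =>
        {
            // One engine and one input per document, so no state is shared.
            var ocr = new IronTesseract();
            using var input = new OcrInput();
            input.LoadImage(path);
            results[path] = ocr.Read(input).Text;
        });

        return results;
    }
}
```

Whether this beats a single engine reading one multi-page input depends on the workload, so it is worth measuring both on representative documents.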
What Are the Advanced Tesseract Configuration Variables?
The IronOcr Tesseract interface allows full control of Tesseract configuration variables through the IronOcr.TesseractConfiguration Class. These advanced settings enable you to optimize OCR performance for specific use cases, such as fixing low quality scans or reading specific document types.
How Do I Use Tesseract Configuration in Code?
using IronOcr;
using System;
IronTesseract ocr = new IronTesseract();
ocr.Language = OcrLanguage.English;
// Automatic page segmentation with orientation and script detection (OSD)
ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;
// Configure the Tesseract engine through its raw variables
ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;
using var input = new OcrInput();
input.LoadImage("/path/file.png");
OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
IronOCR also provides specialized configuration for different document types. For instance, when reading passports or processing MICR cheques, you can apply specific preprocessing filters and region detection to improve accuracy.
Example configuration for financial documents:
// Example: Configure for financial documents
using System.Collections.Generic;
using IronOcr;
IronTesseract ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration
    {
        // Treat the page as a single uniform block of text
        PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        TesseractVariables = new Dictionary<string, object>
        {
            // Restrict recognition to digits and common currency punctuation
            ["tessedit_char_whitelist"] = "0123456789.$,",
            ["textord_heavy_nr"] = false,
            ["edges_max_children_per_outline"] = 10
        }
    }
};
// Apply preprocessing filters for better accuracy
using OcrInput input = new OcrInput();
input.LoadPdf("financial-document.pdf");
input.Deskew();
input.EnhanceResolution(300);
OcrResult result = ocr.Read(input);
What Is the Complete List of All Tesseract Configuration Variables?
These can be set using IronTesseract.Configuration.TesseractVariables["key"] = value;. The configuration variables allow you to fine-tune OCR behavior for optimal results with your specific documents. For detailed guidance on optimizing OCR performance, refer to our fast OCR configuration guide.
| Tesseract Config Variable | Default | Meaning |
|---|---|---|
| classify_num_cp_levels | 3 | Number of Class Pruner Levels |
| textord_debug_tabfind | 0 | Debug tab finding |
| textord_debug_bugs | 0 | Turn on output related to bugs in tab finding |
| textord_testregion_left | -1 | Left edge of debug reporting rectangle |
| textord_testregion_top | -1 | Top edge of debug reporting rectangle |
| textord_testregion_right | 2147483647 | Right edge of debug rectangle |
| textord_testregion_bottom | 2147483647 | Bottom edge of debug rectangle |
| textord_tabfind_show_partitions | 0 | Show partition bounds, waiting if >1 |
| devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process. |
| edges_max_children_per_outline | 10 | Max number of children inside a character outline |
| edges_max_children_layers | 5 | Max layers of nested children inside a character outline |
| edges_children_per_grandchild | 10 | Importance ratio for chucking outlines |
| edges_children_count_limit | 45 | Max holes allowed in blob |
| edges_min_nonhole | 12 | Min pixels for potential char in box |
| edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline |
| textord_fp_chop_error | 2 | Max allowed bending of chop cells |
| textord_tabfind_show_images | 0 | Show image blobs |
| textord_skewsmooth_offset | 4 | For smooth factor |
| textord_skewsmooth_offset2 | 1 | For smooth factor |
| textord_test_x | -2147483647 | coord of test pt |
| textord_test_y | -2147483647 | coord of test pt |
| textord_min_blobs_in_row | 4 | Min blobs before gradient counted |
| textord_spline_minblobs | 8 | Min blobs in each spline segment |
| textord_spline_medianwin | 6 | Size of window for spline segmentation |
| textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap |
| textord_min_xheight | 10 | Min credible pixel xheight |
| textord_lms_line_trials | 12 | Number of line fits to do |
| oldbl_holed_losscount | 10 | Max lost before fallback line used |
| pitsync_linear_version | 6 | Use new fast algorithm |
| pitsync_fake_depth | 1 | Max advance fake generation |
| textord_tabfind_show_strokewidths | 0 | Show stroke widths |
| textord_dotmatrix_gap | 3 | Max pixel gap for broken fixed pitch |
| textord_debug_block | 0 | Block to do debug on |
| textord_pitch_range | 2 | Max range test on pitch |
| textord_words_veto_power | 5 | Rows required to outvote a veto |
| equationdetect_save_bi_image | 0 | Save input bi image |
| equationdetect_save_spt_image | 0 | Save special character image |
| equationdetect_save_seed_image | 0 | Save the seed image |
| equationdetect_save_merged_image | 0 | Save the merged image |
| poly_debug | 0 | Debug old poly |
| poly_wide_objects_better | 1 | More accurate approx on wide things |
| wordrec_display_splits | 0 | Display splits |
| textord_debug_printable | 0 | Make debug windows printable |
| textord_space_size_is_variable | 0 | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
| textord_tabfind_show_initial_partitions | 0 | Show partition bounds |
| textord_tabfind_show_reject_blobs | 0 | Show blobs rejected as noise |
| textord_tabfind_show_columns | 0 | Show column bounds |
| textord_tabfind_show_blocks | 0 | Show final block bounds |
| textord_tabfind_find_tables | 1 | run table detection |
| devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process. |
| textord_show_fixed_cuts | 0 | Draw fixed pitch cell boundaries |
| edges_use_new_outline_complexity | 0 | Use the new outline complexity module |
| edges_debug | 0 | turn on debugging for this module |
| edges_children_fix | 0 | Remove boxy parents of char-like children |
| gapmap_debug | 0 | Say which blocks have tables |
| gapmap_use_ends | 0 | Use large space at start and end of rows |
| gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2quanta wide |
| textord_heavy_nr | 0 | Vigorously remove noise |
| textord_show_initial_rows | 0 | Display row accumulation |
| textord_show_parallel_rows | 0 | Display page correlated rows |
| textord_show_expanded_rows | 0 | Display rows after expanding |
| textord_show_final_rows | 0 | Display rows after final fitting |
| textord_show_final_blobs | 0 | Display blob bounds after pre-ass |
| textord_test_landscape | 0 | Tests refer to land/port |
| textord_parallel_baselines | 1 | Force parallel baselines |
| textord_straight_baselines | 0 | Force straight baselines |
| textord_old_baselines | 1 | |
| textord_old_xheight | 0 | Use old xheight algorithm |
| textord_fix_xheight_bug | 1 | Use spline baseline |
| textord_fix_makerow_bug | 1 | Prevent multiple baselines |
| textord_debug_xheights | 0 | Test xheight algorithms |
| textord_biased_skewcalc | 1 | Bias skew estimates with line length |
| textord_interpolating_skew | 1 | Interpolate across gaps |
| textord_new_initial_xheight | 1 | Use test xheight mechanism |
| textord_debug_blob | 0 | Print test blob information |
| textord_really_old_xheight | 0 | Use original wiseowl xheight |
| textord_oldbl_debug | 0 | Debug old baseline generation |
| textord_debug_baselines | 0 | Debug baseline generation |
| textord_oldbl_paradef | 1 | Use para default mechanism |
| textord_oldbl_split_splines | 1 | Split stepped splines |
| textord_oldbl_merge_parts | 1 | Merge suspect partitions |
| oldbl_corrfix | 1 | Improve correlation of heights |
| oldbl_xhfix | 0 | Fix bug in modes threshold for xheights |
| textord_ocropus_mode | 0 | Make baselines for ocropus |
| textord_tabfind_only_strokewidths | 0 | Only run stroke widths |
| textord_tabfind_show_initialtabs | 0 | Show tab candidates |
| textord_tabfind_show_finaltabs | 0 | Show tab vectors |
| textord_show_tables | 0 | Show table regions |
| textord_tablefind_show_mark | 0 | Debug table marking steps in detail |
| textord_tablefind_show_stats | 0 | Show page stats used in table finding |
| textord_tablefind_recognize_tables | 0 | Enables the table recognizer for table layout and filtering. |
| textord_all_prop | ||
| textord_debug_pitch_test | ||
| textord_disable_pitch_test | ||
| textord_fast_pitch_test | ||
| textord_debug_pitch_metric | ||
| textord_show_row_cuts | ||
| textord_show_page_cuts | ||
| textord_pitch_cheat | ||
| textord_blockndoc_fixed | ||
| textord_show_initial_words | ||
| textord_show_new_words | ||
| textord_show_fixed_words | ||
| textord_blocksall_fixed | ||
| textord_blocksall_prop | ||
| textord_blocksall_testing | ||
| textord_test_mode | ||
| textord_pitch_rowsimilarity | ||
| words_initial_lower | ||
| words_initial_upper | ||
| words_default_prop_nonspace | ||
| words_default_fixed_space | ||
| words_default_fixed_limit | ||
| textord_words_definite_spread | ||
| textord_spacesize_ratiofp | ||
| textord_spacesize_ratioprop | ||
| textord_fpiqr_ratio | ||
| textord_max_pitch_iqr | ||
| textord_fp_min_width | ||
| textord_underline_offset | ||
| ambigs_debug_level | ||
| classify_debug_level | ||
| classify_norm_method | ||
| matcher_debug_level | ||
| matcher_debug_flags | ||
| classify_learning_debug_level | ||
| matcher_permanent_classes_min | ||
| matcher_min_examples_for_prototyping | ||
| matcher_sufficient_examples_for_prototyping | ||
| classify_adapt_proto_threshold | ||
| classify_adapt_feature_threshold | ||
| classify_class_pruner_threshold | ||
| classify_class_pruner_multiplier | ||
| classify_cp_cutoff_strength | ||
| classify_integer_matcher_multiplier | ||
| dawg_debug_level | ||
| hyphen_debug_level | ||
| stopper_smallword_size | ||
| stopper_debug_level | ||
| tessedit_truncate_wordchoice_log | ||
| max_permuter_attempts | ||
| repair_unchopped_blobs | ||
| chop_debug | ||
| chop_split_length | ||
| chop_same_distance | ||
| chop_min_outline_points | ||
| chop_seam_pile_size | ||
| chop_inside_angle | ||
| chop_min_outline_area | ||
| chop_centered_maxwidth | ||
| chop_x_y_weight | ||
| wordrec_debug_level | ||
| wordrec_max_join_chunks | ||
| segsearch_debug_level | ||
| segsearch_max_pain_points | ||
| segsearch_max_futile_classifications | ||
| language_model_debug_level | ||
| language_model_ngram_order | ||
| language_model_viterbi_list_max_num_prunable | ||
| language_model_viterbi_list_max_size | ||
| language_model_min_compound_length | ||
| wordrec_display_segmentations | ||
| tessedit_pageseg_mode | ||
| tessedit_ocr_engine_mode | ||
| pageseg_devanagari_split_strategy | ||
| ocr_devanagari_split_strategy | ||
| bidi_debug | ||
| applybox_debug | ||
| applybox_page | ||
| tessedit_bigram_debug | ||
| debug_noise_removal | ||
| noise_maxperblob | ||
| noise_maxperword | ||
| debug_x_ht_level | ||
| quality_min_initial_alphas_reqd | ||
| tessedit_tess_adaption_mode | ||
| multilang_debug_level | ||
| paragraph_debug_level | ||
| tessedit_preserve_min_wd_len | ||
| crunch_rating_max | ||
| crunch_pot_indicators | ||
| crunch_leave_lc_strings | ||
| crunch_leave_uc_strings | ||
| crunch_long_repetitions | ||
| crunch_debug | ||
| fixsp_non_noise_limit | ||
| fixsp_done_mode | ||
| debug_fix_space_level | ||
| x_ht_acceptance_tolerance | ||
| x_ht_min_change | ||
| superscript_debug | ||
| jpg_quality | ||
| user_defined_dpi | ||
| min_characters_to_try | ||
| suspect_level | ||
| suspect_short_words | ||
| tessedit_reject_mode | ||
| tessedit_image_border | ||
| min_sane_x_ht_pixels | ||
| tessedit_page_number | ||
| tessedit_parallelize | ||
| lstm_choice_mode | ||
| lstm_choice_iterations | ||
| tosp_debug_level | ||
| tosp_enough_space_samples_for_median | ||
| tosp_redo_kern_limit | ||
| tosp_few_samples | ||
| tosp_short_row | ||
| tosp_sanity_method | ||
| textord_max_noise_size | ||
| textord_baseline_debug | ||
| textord_noise_sizefraction | ||
| textord_noise_translimit | ||
| textord_noise_sncount | ||
| use_ambigs_for_adaption | ||
| prioritize_division | ||
| classify_enable_learning | ||
| tess_cn_matching | ||
| tess_bn_matching | ||
| classify_enable_adaptive_matcher | ||
| classify_use_pre_adapted_templates | ||
| classify_save_adapted_templates | ||
| classify_enable_adaptive_debugger | ||
| classify_nonlinear_norm | ||
| disable_character_fragments | ||
| classify_debug_character_fragments | ||
| matcher_debug_separate_windows | ||
| classify_bln_numeric_mode | ||
| load_system_dawg | ||
| load_freq_dawg | ||
| load_unambig_dawg | ||
| load_punc_dawg | ||
| load_number_dawg | ||
| load_bigram_dawg | ||
| use_only_first_uft8_step | ||
| stopper_no_acceptable_choices | ||
| segment_nonalphabetic_script | ||
| save_doc_words | ||
| merge_fragments_in_matrix | ||
| wordrec_enable_assoc | ||
| force_word_assoc | ||
| chop_enable | ||
| chop_vertical_creep | ||
| chop_new_seam_pile | ||
| assume_fixed_pitch_char_segment | ||
| wordrec_skip_no_truth_words | ||
| wordrec_debug_blamer | ||
| wordrec_run_blamer | ||
| save_alt_choices | ||
| language_model_ngram_on | ||
| language_model_ngram_use_only_first_uft8_step | ||
| language_model_ngram_space_delimited_language | ||
| language_model_use_sigmoidal_certainty | ||
| tessedit_resegment_from_boxes | ||
| tessedit_resegment_from_line_boxes | ||
| tessedit_train_from_boxes | ||
| tessedit_make_boxes_from_boxes | ||
| tessedit_train_line_recognizer | ||
| tessedit_dump_pageseg_images | ||
| tessedit_do_invert | ||
| tessedit_ambigs_training | ||
| tessedit_adaption_debug | ||
| applybox_learn_chars_and_char_frags_mode | ||
| applybox_learn_ngrams_mode | ||
| tessedit_display_outwords | ||
| tessedit_dump_choices | ||
| tessedit_timing_debug | ||
| tessedit_fix_fuzzy_spaces | ||
| tessedit_unrej_any_wd | ||
| tessedit_fix_hyphens | ||
| tessedit_enable_doc_dict | ||
| tessedit_debug_fonts | ||
| tessedit_debug_block_rejection | ||
| tessedit_enable_bigram_correction | ||
| tessedit_enable_dict_correction | ||
| enable_noise_removal | ||
| tessedit_minimal_rej_pass1 | ||
| tessedit_test_adaption | ||
| test_pt | ||
| paragraph_text_based | ||
| lstm_use_matrix | ||
| tessedit_good_quality_unrej | ||
| tessedit_use_reject_spaces | ||
| tessedit_preserve_blk_rej_perfect_wds | ||
| tessedit_preserve_row_rej_perfect_wds | ||
| tessedit_dont_blkrej_good_wds | ||
| tessedit_dont_rowrej_good_wds | ||
| tessedit_row_rej_good_docs | ||
| tessedit_reject_bad_qual_wds | ||
| tessedit_debug_doc_rejection | ||
| tessedit_debug_quality_metrics | ||
| bland_unrej | ||
| unlv_tilde_crunching | ||
| hocr_font_info | ||
| hocr_char_boxes | ||
| crunch_early_merge_tess_fails | ||
| crunch_early_convert_bad_unlv_chs | ||
| crunch_terrible_garbage | ||
| crunch_leave_ok_strings | ||
| crunch_accept_ok | ||
| crunch_leave_accept_strings | ||
| crunch_include_numerals | ||
| tessedit_prefer_joined_punct | ||
| tessedit_write_block_separators | ||
| tessedit_write_rep_codes | ||
| tessedit_write_unlv | ||
| tessedit_create_txt | ||
| tessedit_create_hocr | ||
| tessedit_create_alto | ||
| tessedit_create_lstmbox | ||
| tessedit_create_tsv | ||
| tessedit_create_wordstrbox | ||
| tessedit_create_pdf | ||
| textonly_pdf | ||
| suspect_constrain_1Il | ||
| tessedit_minimal_rejection | ||
| tessedit_zero_rejection | ||
| tessedit_word_for_word | ||
| tessedit_zero_kelvin_rejection | ||
| tessedit_rejection_debug | ||
| tessedit_flip_0O | ||
| rej_trust_doc_dawg | ||
| rej_1Il_use_dict_word | ||
| rej_1Il_trust_permuter_type | ||
| rej_use_tess_accepted | ||
| rej_use_tess_blanks | ||
| rej_use_good_perm | ||
| rej_use_sensible_wd | ||
| rej_alphas_in_number_perm | ||
| tessedit_create_boxfile | ||
| tessedit_write_images | ||
| interactive_display_mode | ||
| tessedit_override_permuter | ||
| tessedit_use_primary_params_model | ||
| textord_tabfind_show_vlines | ||
| textord_use_cjk_fp_model | ||
| poly_allow_detailed_fx | ||
| tessedit_init_config_only | ||
| textord_equation_detect | ||
| textord_tabfind_vertical_text | ||
| textord_tabfind_force_vertical_text | ||
| preserve_interword_spaces | ||
| pageseg_apply_music_mask | ||
| textord_single_height_mode | ||
| tosp_old_to_method | ||
| tosp_old_to_constrain_sp_kn | ||
| tosp_only_use_prop_rows | ||
| tosp_force_wordbreak_on_punct | ||
| tosp_use_pre_chopping | ||
| tosp_old_to_bug_fix | ||
| tosp_block_use_cert_spaces | ||
| tosp_row_use_cert_spaces | ||
| tosp_narrow_blobs_not_cert | ||
| tosp_row_use_cert_spaces1 | ||
| tosp_recovery_isolated_row_stats | ||
| tosp_only_small_gaps_for_kern | ||
| tosp_all_flips_fuzzy | ||
| tosp_fuzzy_limit_all | ||
| textord_no_rejects | ||
| textord_show_blobs | ||
| textord_show_boxes | ||
| textord_noise_rejwords | ||
| textord_noise_rejrows | ||
| textord_noise_debug | ||
| classify_learn_debug_str | ||
| user_words_file | ||
| user_words_suffix | ||
| user_patterns_file | ||
| user_patterns_suffix | ||
| output_ambig_words_file | ||
| word_to_debug | ||
| tessedit_char_blacklist | ||
| tessedit_char_whitelist | ||
| tessedit_char_unblacklist | ||
| tessedit_write_params_to_file | ||
| applybox_exposure_pattern | ||
| chs_leading_punct | ('`" | |
| chs_trailing_punct1 | ||
| chs_trailing_punct2 | )'`" | |
| outlines_odd | %\| | Non standard number of outlines |
| outlines_2 | ij!?%":; | Non standard number of outlines |
| numeric_punctuation | ., | Punct. chs expected WITHIN numbers |
| unrecognised_char | \| | Output char for unidentified blobs |
| ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej |
| conflict_set_I_l_1 | Il1 [] | Il1 conflict set |
| file_type | .tif | Filename extension |
| tessedit_load_sublangs | ||
| page_separator | ||
| classify_char_norm_range | ||
| classify_max_rating_ratio | ||
| classify_max_certainty_margin | ||
| matcher_good_threshold | ||
| matcher_reliable_adaptive_result | ||
| matcher_perfect_threshold | ||
| matcher_bad_match_pad | ||
| matcher_rating_margin | ||
| matcher_avg_noise_size | ||
| matcher_clustering_max_angle_delta | ||
| classify_misfit_junk_penalty | ||
| rating_scale | ||
| certainty_scale | ||
| tessedit_class_miss_scale | ||
| classify_adapted_pruning_factor | ||
| classify_adapted_pruning_threshold | ||
| classify_character_fragments_garbage_certainty_threshold | ||
| speckle_large_max_size | ||
| speckle_rating_penalty | ||
| xheight_penalty_subscripts | ||
| xheight_penalty_inconsistent | ||
| segment_penalty_dict_frequent_word | ||
| segment_penalty_dict_case_ok | ||
| segment_penalty_dict_case_bad | ||
| segment_penalty_dict_nonword | ||
| certainty_scale | ||
| stopper_nondict_certainty_base | ||
| stopper_phase2_certainty_rejection_offset | ||
| stopper_certainty_per_char | ||
| stopper_allowable_character_badness | ||
| doc_dict_pending_threshold | ||
| doc_dict_certainty_threshold | ||
| tessedit_certainty_threshold | ||
| chop_split_dist_knob | ||
| chop_overlap_knob | ||
| chop_center_knob | ||
| chop_sharpness_knob | ||
| chop_width_change_knob | ||
| chop_ok_split | ||
| chop_good_split | ||
| segsearch_max_char_wh_ratio | ||
For best results, it's recommended to use IronOCR's image preprocessing filters before applying OCR. These filters can dramatically improve accuracy, especially when working with low quality scans or complex documents like tables.
Frequently Asked Questions
How do I configure IronTesseract for OCR in C#?
To configure IronTesseract, create an IronTesseract instance and set properties like Language and Configuration. You can specify the OCR language (from 125 supported languages), enable barcode reading, configure searchable PDF output, and set character whitelisting. For example: var tesseract = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true } };
What input formats does IronTesseract support?
IronTesseract accepts various input formats through the OcrInput class. You can process images (PNG, JPG, etc.), PDF files, and scanned documents. The OcrInput class provides flexible methods for loading these different formats, making it easy to perform OCR on virtually any document containing text.
Can I read barcodes along with text using IronTesseract?
Yes, IronTesseract includes advanced barcode reading capabilities. You can enable barcode detection by setting ReadBarCodes = true in the TesseractConfiguration. This allows you to extract both text and barcode data from the same document in a single OCR operation.
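As a rough illustration of the answer above, the sketch below enables `ReadBarCodes` and collects barcode values alongside the text. The class and method names are hypothetical, and the `Barcodes` collection with a `Value` property on each entry is assumed to be exposed by `OcrResult` in the installed IronOCR version.

```csharp
using System.Collections.Generic;
using IronOcr;

public static class BarcodeOcrSketch
{
    // Builds an engine with barcode detection switched on.
    public static IronTesseract CreateEngine()
    {
        return new IronTesseract
        {
            Configuration = new TesseractConfiguration { ReadBarCodes = true }
        };
    }

    // Returns the recognized text together with any barcode values found.
    // imagePath is a placeholder supplied by the caller.
    public static (string Text, List<string> BarcodeValues) Read(string imagePath)
    {
        IronTesseract ocr = CreateEngine();
        using var input = new OcrInput();
        input.LoadImage(imagePath);
        OcrResult result = ocr.Read(input);

        var values = new List<string>();
        // Assumed API: OcrResult.Barcodes, each entry exposing Value.
        foreach (var barcode in result.Barcodes)
        {
            values.Add(barcode.Value);
        }
        return (result.Text, values);
    }
}
```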
How do I create searchable PDFs from scanned documents?
IronTesseract can convert scanned documents and images into searchable PDFs by setting RenderSearchablePdf = true in the TesseractConfiguration. This creates PDF files where the text is selectable and searchable, while maintaining the original document appearance.
What languages does IronTesseract support for OCR?
IronTesseract supports 125 international languages for text recognition. You can specify the language by setting the Language property on your IronTesseract instance, such as IronOcr.OcrLanguage.English, Spanish, Chinese, Arabic, and many others.
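A configuration sketch for a non-English document might look like the following. It assumes the matching language pack (for example, a Spanish pack from NuGet) is installed, and that `AddSecondaryLanguage` is available on `IronTesseract` in your version; the `LanguageSketch` class name is hypothetical.

```csharp
using IronOcr;

public static class LanguageSketch
{
    // Primary language Spanish, with English as a fallback for mixed documents.
    public static IronTesseract CreateSpanishEnglishEngine()
    {
        var ocr = new IronTesseract
        {
            // Assumes the Spanish language pack is installed.
            Language = OcrLanguage.Spanish
        };

        // Assumed API: adds a second language for documents mixing both.
        ocr.AddSecondaryLanguage(OcrLanguage.English);
        return ocr;
    }
}
```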
Can I restrict which characters are recognized during OCR?
Yes, IronTesseract allows character whitelisting and blacklisting through the WhiteListCharacters and BlackListCharacters properties in TesseractConfiguration. This helps improve accuracy when you know the expected character set, such as limiting recognition to alphanumeric characters only.
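The two approaches can be sketched side by side. The character sets and the `CharacterFilterSketch` class are illustrative choices, not recommended values; only the `WhiteListCharacters` and `BlackListCharacters` properties come from the configuration shown earlier.

```csharp
using IronOcr;

public static class CharacterFilterSketch
{
    // Whitelist: only digits are considered during recognition.
    public static IronTesseract CreateDigitsOnlyEngine()
    {
        return new IronTesseract
        {
            Configuration = new TesseractConfiguration
            {
                WhiteListCharacters = "0123456789"
            }
        };
    }

    // Blacklist: everything is allowed except a few symbols that are
    // often misread from scanning noise (illustrative set).
    public static IronTesseract CreateEngineWithoutNoiseSymbols()
    {
        return new IronTesseract
        {
            Configuration = new TesseractConfiguration
            {
                BlackListCharacters = "`|^~"
            }
        };
    }
}
```

A whitelist suits fields with a known alphabet, such as invoice numbers, while a blacklist is the lighter option when only a handful of characters cause trouble.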
How do I perform OCR on multiple documents simultaneously?
IronTesseract supports multithreading capabilities for batch processing. You can leverage parallel processing to OCR multiple documents simultaneously, significantly improving performance when dealing with large volumes of images or PDFs.
What version of Tesseract does IronOCR use?
IronOCR uses a customized and optimized version of Tesseract 5, known as Iron Tesseract. This enhanced engine provides improved accuracy and performance compared to standard Tesseract implementations while maintaining compatibility with .NET applications.







