How to use Iron Tesseract in C#

Updated: July 19, 2026

To use Iron Tesseract in C#, create an IronTesseract instance, configure its language and OCR settings, then call the Read() method on an OcrInput object containing your images or PDFs. The result contains the recognized text and can be saved as a searchable PDF, using Tesseract 5's optimized engine.

IronOCR provides an intuitive API for Iron Tesseract, a customized and optimized build of Tesseract 5. With IronOCR and IronTesseract, you can convert images of text and scanned documents into text and searchable PDFs. The library supports 125 international languages and includes advanced features such as barcode reading and computer vision.

Quickstart: Set Up IronTesseract Configuration in C#

This example demonstrates how to configure IronTesseract with specific settings and perform OCR in a single line of code.

Install IronOCR with NuGet Package Manager
PM > Install-Package IronOcr

Copy and run this code snippet.

var result = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true, WhiteListCharacters = "ABCabc123" } }.Read("image.png");

Deploy to test on your live environment

Basic OCR Workflow

Install the OCR Library with NuGet to read images
Utilize Custom `Tesseract 5` to perform OCR
Load the desired documents, such as images or PDF files, for processing
Output the extracted text to the console or a file
Save the result as a searchable PDF
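
The workflow steps above can be sketched end to end as follows. This is a minimal illustration rather than a definitive implementation: the file names "scan.png" and "scan-searchable.pdf" are placeholders, and the call to SaveAsSearchablePdf assumes the OcrResult method of that name available in current IronOCR releases.

```csharp
using System;
using IronOcr;

// Step 2: create the customized Tesseract 5 engine with default settings.
var ocr = new IronTesseract();

// Step 3: load the document to process ("scan.png" is a placeholder path).
using var input = new OcrInput();
input.LoadImage("scan.png"); // use input.LoadPdf(...) for a PDF source

// Run OCR on everything loaded into the input.
OcrResult result = ocr.Read(input);

// Step 4: output the extracted text to the console.
Console.WriteLine(result.Text);

// Step 5: save a searchable PDF (assumed API: OcrResult.SaveAsSearchablePdf).
result.SaveAsSearchablePdf("scan-searchable.pdf");
```

The same pattern applies to multi-page PDFs: only the load call changes, while Read() and the output steps stay the same.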

How Do I Create an IronTesseract Instance?

Initialize an IronTesseract object with this code:

using IronOcr;

IronTesseract ocr = new IronTesseract();

Imports IronOcr

Dim ocr As New IronTesseract()

You can customize the behavior of IronTesseract by selecting different languages, enabling barcode reading, and whitelisting/blacklisting characters. IronOCR provides comprehensive configuration options to fine-tune your OCR process:

IronTesseract ocr = new IronTesseract
{
    Configuration = new TesseractConfiguration
    {
        ReadBarCodes = false,
        RenderHocr = true,
        TesseractVariables = null,
        WhiteListCharacters = null,
        BlackListCharacters = "`ë|^",
    },
    MultiThreaded = false,
    Language = OcrLanguage.English,
    EnableTesseractConsoleMessages = true, // False as default
};

Dim ocr As New IronTesseract With {
	.Configuration = New TesseractConfiguration With {
		.ReadBarCodes = False,
		.RenderHocr = True,
		.TesseractVariables = Nothing,
		.WhiteListCharacters = Nothing,
		.BlackListCharacters = "`ë|^"
	},
	.MultiThreaded = False,
	.Language = OcrLanguage.English,
	.EnableTesseractConsoleMessages = True
}

Once configured, you can use Tesseract functionality to read OcrInput objects. The OcrInput class provides flexible methods for loading various input formats:

IronTesseract ocr = new IronTesseract();

using OcrInput input = new OcrInput();
input.LoadImage("attachment.png");
OcrResult result = ocr.Read(input);
string text = result.Text;

Dim ocr As New IronTesseract()

Using input As New OcrInput()
	input.LoadImage("attachment.png")
	Dim result As OcrResult = ocr.Read(input)
	Dim text As String = result.Text
End Using

For batch workloads, you can use IronOCR's multithreading support to process multiple documents in parallel, which can shorten total processing time.

What Are the Advanced Tesseract Configuration Variables?

The IronOCR Tesseract interface gives full control of Tesseract configuration variables through the IronOcr.TesseractConfiguration class. These advanced settings let you tune OCR for specific use cases, such as fixing low-quality scans or reading specific document types.

How Do I Use Tesseract Configuration in Code?

using IronOcr;
using System;

IronTesseract Ocr = new IronTesseract();

Ocr.Language = OcrLanguage.English;
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd;

// Configure Tesseract Engine
Ocr.Configuration.TesseractVariables["tessedit_parallelize"] = false;

using var input = new OcrInput();
input.LoadImage("/path/file.png");

OcrResult Result = Ocr.Read(input);
Console.WriteLine(Result.Text);

Imports IronOcr
Imports System

Dim Ocr As New IronTesseract()

Ocr.Language = OcrLanguage.English
Ocr.Configuration.PageSegmentationMode = TesseractPageSegmentationMode.AutoOsd

' Configure Tesseract Engine
Ocr.Configuration.TesseractVariables("tessedit_parallelize") = False

' Dispose the input when done, matching the C# "using var" above
Using input As New OcrInput()
    input.LoadImage("/path/file.png")

    Dim Result As OcrResult = Ocr.Read(input)
    Console.WriteLine(Result.Text)
End Using

IronOCR also provides specialized configuration for different document types. For instance, when reading passports or processing MICR cheques, you can apply specific preprocessing filters and region detection to improve accuracy.

Example configuration for financial documents:

using IronOcr;
using System.Collections.Generic; // needed for Dictionary<string, object>

// Example: Configure for financial documents
IronTesseract ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration
    {
        PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        TesseractVariables = new Dictionary<string, object>
        {
            ["tessedit_char_whitelist"] = "0123456789.$,",
            ["textord_heavy_nr"] = false,
            ["edges_max_children_per_outline"] = 10
        }
    }
};

// Apply preprocessing filters for better accuracy
using OcrInput input = new OcrInput();
input.LoadPdf("financial-document.pdf");
input.Deskew();
input.EnhanceResolution(300);

OcrResult result = ocr.Read(input);

Imports IronOcr
Imports System.Collections.Generic

' Example: Configure for financial documents
Dim ocr As New IronTesseract With {
    .Language = OcrLanguage.English,
    .Configuration = New TesseractConfiguration With {
        .PageSegmentationMode = TesseractPageSegmentationMode.SingleBlock,
        .TesseractVariables = New Dictionary(Of String, Object) From {
            {"tessedit_char_whitelist", "0123456789.$,"},
            {"textord_heavy_nr", False},
            {"edges_max_children_per_outline", 10}
        }
    }
}

' Apply preprocessing filters for better accuracy
Using input As New OcrInput()
    input.LoadPdf("financial-document.pdf")
    input.Deskew()
    input.EnhanceResolution(300)

    Dim result As OcrResult = ocr.Read(input)
End Using

What Is the Complete List of All Tesseract Configuration Variables?

Each variable in the table below can be set using IronTesseract.Configuration.TesseractVariables["key"] = value. The configuration variables allow you to fine-tune OCR behavior for your specific documents. For detailed guidance on optimizing OCR performance, refer to our fast OCR configuration guide.
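
As a minimal sketch of that indexer syntax, the snippet below sets three variables taken from the table. The chosen values and the file name "invoice.png" are illustrative assumptions, not recommended settings. Passing a bool for 0/1 flags follows the earlier examples on this page, which assign false to tessedit_parallelize and textord_heavy_nr.

```csharp
using System;
using IronOcr;

var ocr = new IronTesseract();

// 0/1 flag from the table (default 0): keep runs of spaces between words.
ocr.Configuration.TesseractVariables["preserve_interword_spaces"] = true;

// Integer variable (default 0): declare the DPI of the input image.
ocr.Configuration.TesseractVariables["user_defined_dpi"] = 300;

// String variable (empty by default): restrict recognition to digits.
ocr.Configuration.TesseractVariables["tessedit_char_whitelist"] = "0123456789";

// "invoice.png" is a placeholder path used only for illustration.
using var input = new OcrInput();
input.LoadImage("invoice.png");

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```

Changing one variable at a time and comparing the output makes it easier to see which setting actually affects a given document.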

Tesseract Config Variable	Default	Meaning
classify_num_cp_levels	3	Number of Class Pruner Levels
textord_debug_tabfind	0	Debug tab finding
textord_debug_bugs	0	Turn on output related to bugs in tab finding
textord_testregion_left	-1	Left edge of debug reporting rectangle
textord_testregion_top	-1	Top edge of debug reporting rectangle
textord_testregion_right	2147483647	Right edge of debug rectangle
textord_testregion_bottom	2147483647	Bottom edge of debug rectangle
textord_tabfind_show_partitions	0	Show partition bounds, waiting if >1
devanagari_split_debuglevel	0	Debug level for split shiro-rekha process.
edges_max_children_per_outline	10	Max number of children inside a character outline
edges_max_children_layers	5	Max layers of nested children inside a character outline
edges_children_per_grandchild	10	Importance ratio for chucking outlines
edges_children_count_limit	45	Max holes allowed in blob
edges_min_nonhole	12	Min pixels for potential char in box
edges_patharea_ratio	40	Max lensq/area for acceptable child outline
textord_fp_chop_error	2	Max allowed bending of chop cells
textord_tabfind_show_images	0	Show image blobs
textord_skewsmooth_offset	4	For smooth factor
textord_skewsmooth_offset2	1	For smooth factor
textord_test_x	-2147483647	coord of test pt
textord_test_y	-2147483647	coord of test pt
textord_min_blobs_in_row	4	Min blobs before gradient counted
textord_spline_minblobs	8	Min blobs in each spline segment
textord_spline_medianwin	6	Size of window for spline segmentation
textord_max_blob_overlaps	4	Max number of blobs a big blob can overlap
textord_min_xheight	10	Min credible pixel xheight
textord_lms_line_trials	12	Number of line fits to do
oldbl_holed_losscount	10	Max lost before fallback line used
pitsync_linear_version	6	Use new fast algorithm
pitsync_fake_depth	1	Max advance fake generation
textord_tabfind_show_strokewidths	0	Show stroke widths
textord_dotmatrix_gap	3	Max pixel gap for broken fixed pitch
textord_debug_block	0	Block to do debug on
textord_pitch_range	2	Max range test on pitch
textord_words_veto_power	5	Rows required to outvote a veto
equationdetect_save_bi_image	0	Save input bi image
equationdetect_save_spt_image	0	Save special character image
equationdetect_save_seed_image	0	Save the seed image
equationdetect_save_merged_image	0	Save the merged image
poly_debug	0	Debug old poly
poly_wide_objects_better	1	More accurate approx on wide things
wordrec_display_splits	0	Display splits
textord_debug_printable	0	Make debug windows printable
textord_space_size_is_variable	0	If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.
textord_tabfind_show_initial_partitions	0	Show partition bounds
textord_tabfind_show_reject_blobs	0	Show blobs rejected as noise
textord_tabfind_show_columns	0	Show column bounds
textord_tabfind_show_blocks	0	Show final block bounds
textord_tabfind_find_tables	1	run table detection
devanagari_split_debugimage	0	Whether to create a debug image for split shiro-rekha process.
textord_show_fixed_cuts	0	Draw fixed pitch cell boundaries
edges_use_new_outline_complexity	0	Use the new outline complexity module
edges_debug	0	turn on debugging for this module
edges_children_fix	0	Remove boxy parents of char-like children
gapmap_debug	0	Say which blocks have tables
gapmap_use_ends	0	Use large space at start and end of rows
gapmap_no_isolated_quanta	0	Ensure gaps not less than 2quanta wide
textord_heavy_nr	0	Vigorously remove noise
textord_show_initial_rows	0	Display row accumulation
textord_show_parallel_rows	0	Display page correlated rows
textord_show_expanded_rows	0	Display rows after expanding
textord_show_final_rows	0	Display rows after final fitting
textord_show_final_blobs	0	Display blob bounds after pre-ass
textord_test_landscape	0	Tests refer to land/port
textord_parallel_baselines	1	Force parallel baselines
textord_straight_baselines	0	Force straight baselines
textord_old_baselines	1	Use old baseline algorithm
textord_old_xheight	0	Use old xheight algorithm
textord_fix_xheight_bug	1	Use spline baseline
textord_fix_makerow_bug	1	Prevent multiple baselines
textord_debug_xheights	0	Test xheight algorithms
textord_biased_skewcalc	1	Bias skew estimates with line length
textord_interpolating_skew	1	Interpolate across gaps
textord_new_initial_xheight	1	Use test xheight mechanism
textord_debug_blob	0	Print test blob information
textord_really_old_xheight	0	Use original wiseowl xheight
textord_oldbl_debug	0	Debug old baseline generation
textord_debug_baselines	0	Debug baseline generation
textord_oldbl_paradef	1	Use para default mechanism
textord_oldbl_split_splines	1	Split stepped splines
textord_oldbl_merge_parts	1	Merge suspect partitions
oldbl_corrfix	1	Improve correlation of heights
oldbl_xhfix	0	Fix bug in modes threshold for xheights
textord_ocropus_mode	0	Make baselines for ocropus
textord_tabfind_only_strokewidths	0	Only run stroke widths
textord_tabfind_show_initialtabs	0	Show tab candidates
textord_tabfind_show_finaltabs	0	Show tab vectors
textord_show_tables	0	Show table regions
textord_tablefind_show_mark	0	Debug table marking steps in detail
textord_tablefind_show_stats	0	Show page stats used in table finding
textord_tablefind_recognize_tables	0	Enables the table recognizer for table layout and filtering.
textord_all_prop	0	All doc is proportional text
textord_debug_pitch_test	0	Debug on fixed pitch test
textord_disable_pitch_test	0	Turn off dp fixed pitch algorithm
textord_fast_pitch_test	0	Do even faster pitch algorithm
textord_debug_pitch_metric	0	Write full metric stuff
textord_show_row_cuts	0	Draw row-level cuts
textord_show_page_cuts	0	Draw page-level cuts
textord_pitch_cheat	0	Use correct answer for fixed/prop
textord_blockndoc_fixed	0	Attempt whole doc/block fixed pitch
textord_show_initial_words	0	Display separate words
textord_show_new_words	0	Display separate words
textord_show_fixed_words	0	Display forced fixed pitch words
textord_blocksall_fixed	0	Moan about prop blocks
textord_blocksall_prop	0	Moan about fixed pitch blocks
textord_blocksall_testing	0	Dump stats when moaning
textord_test_mode	0	Do current test
textord_pitch_rowsimilarity	0.08	Fraction of xheight for sameness
words_initial_lower	0.5	Max initial cluster size
words_initial_upper	0.15	Min initial cluster spacing
words_default_prop_nonspace	0.25	Fraction of xheight
words_default_fixed_space	0.75	Fraction of xheight
words_default_fixed_limit	0.6	Allowed size variance
textord_words_definite_spread	0.3	Non-fuzzy spacing region
textord_spacesize_ratiofp	2.8	Min ratio space/nonspace
textord_spacesize_ratioprop	2	Min ratio space/nonspace
textord_fpiqr_ratio	1.5	Pitch IQR/Gap IQR threshold
textord_max_pitch_iqr	0.2	Xh fraction noise in pitch
textord_fp_min_width	0.5	Min width of decent blobs
textord_underline_offset	0.1	Fraction of x to ignore
ambigs_debug_level	0	Debug level for unichar ambiguities
classify_debug_level	0	Classify debug level
classify_norm_method	1	Normalization Method ...
matcher_debug_level	0	Matcher Debug Level
matcher_debug_flags	0	Matcher Debug Flags
classify_learning_debug_level	0	Learning Debug Level:
matcher_permanent_classes_min	1	Min # of permanent classes
matcher_min_examples_for_prototyping	3	Reliable Config Threshold
matcher_sufficient_examples_for_prototyping	5	Enable adaption even if the ambiguities have not been seen
classify_adapt_proto_threshold	230	Threshold for good protos during adaptive 0-255
classify_adapt_feature_threshold	230	Threshold for good features during adaptive 0-255
classify_class_pruner_threshold	229	Class Pruner Threshold 0-255
classify_class_pruner_multiplier	15	Class Pruner Multiplier 0-255:
classify_cp_cutoff_strength	7	Class Pruner CutoffStrength:
classify_integer_matcher_multiplier	10	Integer Matcher Multiplier 0-255:
dawg_debug_level	0	Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages
hyphen_debug_level	0	Debug level for hyphenated words.
stopper_smallword_size	2	Size of dict word to be treated as non-dict word
stopper_debug_level	0	Stopper debug level
tessedit_truncate_wordchoice_log	10	Max words to keep in list
max_permuter_attempts	10000	Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.
repair_unchopped_blobs	1	Fix blobs that aren't chopped
chop_debug	0	Chop debug
chop_split_length	10000	Split Length
chop_same_distance	2	Same distance
chop_min_outline_points	6	Min Number of Points on Outline
chop_seam_pile_size	150	Max number of seams in seam_pile
chop_inside_angle	-50	Min Inside Angle Bend
chop_min_outline_area	2000	Min Outline Area
chop_centered_maxwidth	90	Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.
chop_x_y_weight	3	X / Y length weight
wordrec_debug_level	0	Debug level for wordrec
wordrec_max_join_chunks	4	Max number of broken pieces to associate
segsearch_debug_level	0	SegSearch debug level
segsearch_max_pain_points	2000	Maximum number of pain points stored in the queue
segsearch_max_futile_classifications	20	Maximum number of pain point classifications per chunk that did not result in finding a better word choice.
language_model_debug_level	0	Language model debug level
language_model_ngram_order	8	Maximum order of the character ngram model
language_model_viterbi_list_max_num_prunable	10	Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs
language_model_viterbi_list_max_size	500	Maximum size of viterbi lists recorded in BLOB_CHOICEs
language_model_min_compound_length	3	Minimum length of compound words
wordrec_display_segmentations	0	Display Segmentations
tessedit_pageseg_mode	6	Page seg mode: 0=osd only, 1=auto+osd, 2=auto_only, 3=auto, 4=column, 5=block_vert, 6=block, 7=line, 8=word, 9=word_circle, 10=char, 11=sparse_text, 12=sparse_text+osd, 13=raw_line (Values from PageSegMode enum in tesseract/publictypes.h)
tessedit_ocr_engine_mode	2	Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available.
pageseg_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.
ocr_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing ocr.
bidi_debug	0	Debug level for BiDi
applybox_debug	1	Debug level
applybox_page	0	Page number to apply boxes from
tessedit_bigram_debug	0	Amount of debug output for bigram correction.
debug_noise_removal	0	Debug reassignment of small outlines
noise_maxperblob	8	Max diacritics to apply to a blob
noise_maxperword	16	Max diacritics to apply to a word
debug_x_ht_level	0	Reestimate debug
quality_min_initial_alphas_reqd	2	alphas in a good word
tessedit_tess_adaption_mode	39	Adaptation decision algorithm for tess
multilang_debug_level	0	Print multilang debug info.
paragraph_debug_level	0	Print paragraph debug info.
tessedit_preserve_min_wd_len	2	Only preserve wds longer than this
crunch_rating_max	10	For adj length in rating per ch
crunch_pot_indicators	1	How many potential indicators needed
crunch_leave_lc_strings	4	Don't crunch words with long lower case strings
crunch_leave_uc_strings	4	Don't crunch words with long upper case strings
crunch_long_repetitions	3	Crunch words with long repetitions
crunch_debug	0	As it says
fixsp_non_noise_limit	1	How many non-noise blobs either side?
fixsp_done_mode	1	What constitutes done for spacing
debug_fix_space_level	0	Contextual fixspace debug
x_ht_acceptance_tolerance	8	Max allowed deviation of blob top outside of font data
x_ht_min_change	8	Min change in xht before actually trying it
superscript_debug	0	Debug level for sub & superscript fixer
jpg_quality	85	Set JPEG quality level
user_defined_dpi	0	Specify DPI for input image
min_characters_to_try	50	Specify minimum characters to try during OSD
suspect_level	99	Suspect marker level
suspect_short_words	2	Don't suspect dict wds longer than this
tessedit_reject_mode	0	Rejection algorithm
tessedit_image_border	2	Reject blobs near image edge limit
min_sane_x_ht_pixels	8	Reject any x-ht lt or eq than this
tessedit_page_number	-1	-1 -> All pages, else specific page to process
tessedit_parallelize	1	Run in parallel where possible
lstm_choice_mode	2	Allows to include alternative symbols choices in the hOCR output. Valid input values are 0, 1 and 2. 0 is the default value. With 1 the alternative symbol choices per timestep are included. With 2 alternative symbol choices are extracted from the CTC process instead of the lattice. The choices are mapped per character.
lstm_choice_iterations	5	Sets the number of cascading iterations for the Beamsearch in lstm_choice_mode. Note that lstm_choice_mode must be set to a value greater than 0 to produce results.
tosp_debug_level	0	Debug data
tosp_enough_space_samples_for_median	3	or should we use mean
tosp_redo_kern_limit	10	No.samples reqd to reestimate for row
tosp_few_samples	40	No.gaps reqd with 1 large gap to treat as a table
tosp_short_row	20	No.gaps reqd with few cert spaces to use certs
tosp_sanity_method	1	How to avoid being silly
textord_max_noise_size	7	Pixel size of noise
textord_baseline_debug	0	Baseline debug level
textord_noise_sizefraction	10	Fraction of size for maxima
textord_noise_translimit	16	Transitions for normal blob
textord_noise_sncount	1	super norm blobs to save row
use_ambigs_for_adaption	0	Use ambigs for deciding whether to adapt to a character
prioritize_division	0	Prioritize blob division over chopping
classify_enable_learning	1	Enable adaptive classifier
tess_cn_matching	0	Character Normalized Matching
tess_bn_matching	0	Baseline Normalized Matching
classify_enable_adaptive_matcher	1	Enable adaptive classifier
classify_use_pre_adapted_templates	0	Use pre-adapted classifier templates
classify_save_adapted_templates	0	Save adapted templates to a file
classify_enable_adaptive_debugger	0	Enable match debugger
classify_nonlinear_norm	0	Non-linear stroke-density normalization
disable_character_fragments	1	Do not include character fragments in the results of the classifier
classify_debug_character_fragments	0	Bring up graphical debugging windows for fragments training
matcher_debug_separate_windows	0	Use two different windows for debugging the matching: One for the protos and one for the features.
classify_bln_numeric_mode	0	Assume the input is numbers [0-9].
load_system_dawg	1	Load system word dawg.
load_freq_dawg	1	Load frequent word dawg.
load_unambig_dawg	1	Load unambiguous word dawg.
load_punc_dawg	1	Load dawg with punctuation patterns.
load_number_dawg	1	Load dawg with number patterns.
load_bigram_dawg	1	Load dawg with special word bigrams.
use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
stopper_no_acceptable_choices	0	Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations
segment_nonalphabetic_script	0	Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch
save_doc_words	0	Save Document Words
merge_fragments_in_matrix	1	Merge the fragments in the ratings matrix and delete them after merging
wordrec_enable_assoc	1	Associator Enable
force_word_assoc	0	force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.
chop_enable	1	Chop enable
chop_vertical_creep	0	Vertical creep
chop_new_seam_pile	1	Use new seam_pile
assume_fixed_pitch_char_segment	0	include fixed-pitch heuristics in char segmentation
wordrec_skip_no_truth_words	0	Only run OCR for words that had truth recorded in BlamerBundle
wordrec_debug_blamer	0	Print blamer debug messages
wordrec_run_blamer	0	Try to set the blame for errors
save_alt_choices	1	Save alternative paths found during chopping and segmentation search
language_model_ngram_on	0	Turn on/off the use of character ngram model
language_model_ngram_use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
language_model_ngram_space_delimited_language	1	Words are delimited by space
language_model_use_sigmoidal_certainty	0	Use sigmoidal score for certainty
tessedit_resegment_from_boxes	0	Take segmentation and labeling from box file
tessedit_resegment_from_line_boxes	0	Conversion of word/line box file to char box file
tessedit_train_from_boxes	0	Generate training data from boxed chars
tessedit_make_boxes_from_boxes	0	Generate more boxes from boxed chars
tessedit_train_line_recognizer	0	Break input into lines and remap boxes if present
tessedit_dump_pageseg_images	0	Dump intermediate images made during page segmentation
tessedit_do_invert	1	Try inverting the image in `LSTMRecognizeWord`
tessedit_ambigs_training	0	Perform training for ambiguities
tessedit_adaption_debug	0	Generate and print debug information for adaption
applybox_learn_chars_and_char_frags_mode	0	Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.
applybox_learn_ngrams_mode	0	Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.
tessedit_display_outwords	0	Draw output words
tessedit_dump_choices	0	Dump char choices
tessedit_timing_debug	0	Print timing stats
tessedit_fix_fuzzy_spaces	1	Try to improve fuzzy spaces
tessedit_unrej_any_wd	0	Don't bother with word plausibility
tessedit_fix_hyphens	1	Crunch double hyphens?
tessedit_enable_doc_dict	1	Add words to the document dictionary
tessedit_debug_fonts	0	Output font info per char
tessedit_debug_block_rejection	0	Block and Row stats
tessedit_enable_bigram_correction	1	Enable correction based on the word bigram dictionary.
tessedit_enable_dict_correction	0	Enable single word correction based on the dictionary.
enable_noise_removal	1	Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise
tessedit_minimal_rej_pass1	0	Do minimal rejection on pass 1 output
tessedit_test_adaption	0	Test adaption criteria
test_pt	0	Test for point
paragraph_text_based	1	Run paragraph detection on the post-text-recognition (more accurate)
lstm_use_matrix	1	Use ratings matrix/beam search with lstm
tessedit_good_quality_unrej	1	Reduce rejection on good docs
tessedit_use_reject_spaces	1	Reject spaces?
tessedit_preserve_blk_rej_perfect_wds	1	Only rej partially rejected words in block rejection
tessedit_preserve_row_rej_perfect_wds	1	Only rej partially rejected words in row rejection
tessedit_dont_blkrej_good_wds	0	Use word segmentation quality metric
tessedit_dont_rowrej_good_wds	0	Use word segmentation quality metric
tessedit_row_rej_good_docs	1	Apply row rejection to good docs
tessedit_reject_bad_qual_wds	1	Reject all bad quality wds
tessedit_debug_doc_rejection	0	Page stats
tessedit_debug_quality_metrics	0	Output data to debug file
bland_unrej	0	unrej potential with no checks
unlv_tilde_crunching	0	Mark v.bad words for tilde crunch
hocr_font_info	0	Add font info to hocr output
hocr_char_boxes	0	Add coordinates for each character to hocr output
crunch_early_merge_tess_fails	1	Before word crunch?
crunch_early_convert_bad_unlv_chs	0	Take out ~^ early?
crunch_terrible_garbage	1	As it says
crunch_leave_ok_strings	1	Don't touch sensible strings
crunch_accept_ok	1	Use acceptability in okstring
crunch_leave_accept_strings	0	Don't pot crunch sensible strings
crunch_include_numerals	0	Fiddle alpha figures
tessedit_prefer_joined_punct	0	Reward punctuation joins
tessedit_write_block_separators	0	Write block separators in output
tessedit_write_rep_codes	0	Write repetition char code
tessedit_write_unlv	0	Write .unlv output file
tessedit_create_txt	0	Write .txt output file
tessedit_create_hocr	0	Write .html hOCR output file
tessedit_create_alto	0	Write .xml ALTO file
tessedit_create_lstmbox	0	Write .box file for LSTM training
tessedit_create_tsv	0	Write .tsv output file
tessedit_create_wordstrbox	0	Write WordStr format .box output file
tessedit_create_pdf	0	Write .pdf output file
textonly_pdf	0	Create PDF with only one invisible text layer
suspect_constrain_1Il	0	UNLV keep 1Il chars rejected
tessedit_minimal_rejection	0	Only reject tess failures
tessedit_zero_rejection	0	Don't reject ANYTHING
tessedit_word_for_word	0	Make output have exactly one word per WERD
tessedit_zero_kelvin_rejection	0	Don't reject ANYTHING AT ALL
tessedit_rejection_debug	0	Adaption debug
tessedit_flip_0O	1	Contextual 0O O0 flips
rej_trust_doc_dawg	0	Use DOC dawg in 11l conf. detector
rej_1Il_use_dict_word	0	Use dictword test
rej_1Il_trust_permuter_type	1	Don't double check
rej_use_tess_accepted	1	Individual rejection control
rej_use_tess_blanks	1	Individual rejection control
rej_use_good_perm	1	Individual rejection control
rej_use_sensible_wd	0	Extend permuter check
rej_alphas_in_number_perm	0	Extend permuter check
tessedit_create_boxfile	0	Output text with boxes
tessedit_write_images	0	Capture the image from the IPE
interactive_display_mode	0	Run interactively?
tessedit_override_permuter	1	According to dict_word
tessedit_use_primary_params_model	0	In multilingual mode use params model of the primary language
textord_tabfind_show_vlines	0	Debug line finding
textord_use_cjk_fp_model	0	Use CJK fixed pitch model
poly_allow_detailed_fx	0	Allow feature extractors to see the original outline
tessedit_init_config_only	0	Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.
textord_equation_detect	0	Turn on equation detector
textord_tabfind_vertical_text	1	Enable vertical detection
textord_tabfind_force_vertical_text	0	Force using vertical text page mode
preserve_interword_spaces	0	Preserve multiple interword spaces
pageseg_apply_music_mask	1	Detect music staff and remove intersecting components
textord_single_height_mode	0	Script has no xheight, so use a single mode
tosp_old_to_method	0	Space stats use prechopping?
tosp_old_to_constrain_sp_kn	0	Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_only_use_prop_rows	1	Block stats to use fixed pitch rows?
tosp_force_wordbreak_on_punct	0	Force word breaks on punct to break long lines in non-space delimited langs
tosp_use_pre_chopping	0	Space stats use prechopping?
tosp_old_to_bug_fix	0	Fix suspected bug in old code
tosp_block_use_cert_spaces	1	Only stat OBVIOUS spaces
tosp_row_use_cert_spaces	1	Only stat OBVIOUS spaces
tosp_narrow_blobs_not_cert	1	Only stat OBVIOUS spaces
tosp_row_use_cert_spaces1	1	Only stat OBVIOUS spaces
tosp_recovery_isolated_row_stats	1	Use row alone when inadequate cert spaces
tosp_only_small_gaps_for_kern	0	Better guess
tosp_all_flips_fuzzy	0	Pass ANY flip to context?
tosp_fuzzy_limit_all	1	Don't restrict kn->sp fuzzy limit to tables
textord_no_rejects	0	Don't remove noise blobs
textord_show_blobs	0	Display unsorted blobs
textord_show_boxes	0	Display unsorted blobs
textord_noise_rejwords	1	Reject noise-like words
textord_noise_rejrows	1	Reject noise-like rows
textord_noise_debug	0	Debug row garbage detector
classify_learn_debug_str		Class str to debug learning
user_words_file		A filename of user-provided words.
user_words_suffix		A suffix of user-provided words located in tessdata.
user_patterns_file		A filename of user-provided patterns.
user_patterns_suffix		A suffix of user-provided patterns located in tessdata.
output_ambig_words_file		Output file for ambiguities found in the dictionary
word_to_debug		Word for which stopper debug information should be printed to stdout
tessedit_char_blacklist		Blacklist of chars not to recognize
tessedit_char_whitelist		Whitelist of chars to recognize
tessedit_char_unblacklist		List of chars to override tessedit_char_blacklist
tessedit_write_params_to_file		Write all parameters to the given file.
applybox_exposure_pattern	.exp	Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif
chs_leading_punct	('`"	Leading punctuation
chs_trailing_punct1	).,;:?!	1st Trailing punctuation
chs_trailing_punct2	)'`"	2nd Trailing punctuation
outlines_odd	%|	Non standard number of outlines
outlines_2	ij!?%":;	Non standard number of outlines
numeric_punctuation	.,	Punct. chs expected WITHIN numbers
unrecognised_char	|	Output char for unidentified blobs
ok_repeated_ch_non_alphanum_wds	-?*=	Allow NN to unrej
conflict_set_I_l_1	Il1 []	Il1 conflict set
file_type	.tif	Filename extension
tessedit_load_sublangs		List of languages to load with this one
page_separator		Page separator (default is form feed control character)
classify_char_norm_range	0.2	Character Normalization Range ...
classify_max_rating_ratio	1.5	Veto ratio between classifier ratings
classify_max_certainty_margin	5.5	Veto difference between classifier certainties
matcher_good_threshold	0.125	Good Match (0-1)
matcher_reliable_adaptive_result	0	Great Match (0-1)
matcher_perfect_threshold	0.02	Perfect Match (0-1)
matcher_bad_match_pad	0.15	Bad Match Pad (0-1)
matcher_rating_margin	0.1	New template margin (0-1)
matcher_avg_noise_size	12	Avg. noise blob length
matcher_clustering_max_angle_delta	0.015	Maximum angle delta for prototype clustering
classify_misfit_junk_penalty	0	Penalty to apply when a non-alnum is vertically out of its expected textline position
rating_scale	1.5	Rating scaling factor
certainty_scale	20	Certainty scaling factor
tessedit_class_miss_scale	0.00390625	Scale factor for features not used
classify_adapted_pruning_factor	2.5	Prune poor adapted results this much worse than best result
classify_adapted_pruning_threshold	-1	Threshold at which classify_adapted_pruning_factor starts
classify_character_fragments_garbage_certainty_threshold	-3	Exclude fragments that do not look like whole characters from training and adaption
speckle_large_max_size	0.3	Max large speckle size
speckle_rating_penalty	10	Penalty to add to worst rating for noise
xheight_penalty_subscripts	0.125	Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.
xheight_penalty_inconsistent	0.25	Score penalty (0.1 = 10%) added if an xheight is inconsistent.
segment_penalty_dict_frequent_word	1	Score multiplier for word matches which have good case and are frequent in the given language (lower is better).
segment_penalty_dict_case_ok	1.1	Score multiplier for word matches that have good case (lower is better).
segment_penalty_dict_case_bad	1.3125	Default score multiplier for word matches, which may have case issues (lower is better).
segment_penalty_dict_nonword	1.25	Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
certainty_scale	20	Certainty scaling factor
stopper_nondict_certainty_base	-2.5	Certainty threshold for non-dict words
stopper_phase2_certainty_rejection_offset	1	Reject certainty offset
stopper_certainty_per_char	-0.5	Certainty to add for each dict char above small word size.
stopper_allowable_character_badness	3	Max certainty variation allowed in a word (in sigma)
doc_dict_pending_threshold	0	Worst certainty for using pending dictionary
doc_dict_certainty_threshold	-2.25	Worst certainty for words that can be inserted into the document dictionary
tessedit_certainty_threshold	-2.25	Good blob limit
chop_split_dist_knob	0.5	Split length adjustment
chop_overlap_knob	0.9	Split overlap adjustment
chop_center_knob	0.15	Split center adjustment
chop_sharpness_knob	0.06	Split sharpness adjustment
chop_width_change_knob	5	Width change adjustment
chop_ok_split	100	OK split limit
chop_good_split	50	Good split limit
segsearch_max_char_wh_ratio	2	Maximum character width-to-height ratio
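
Each row in the list above has three tab-separated fields: the variable name, its default value, and a description. As a minimal sketch, rows in that layout can be parsed into a lookup so that overrides are checked against known names before they are handed to the engine. The helper name `ParseTessParams`, the sample rows, and the override values are illustrative only and are not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parse "name<TAB>default<TAB>description" rows into a lookup.
static Dictionary<string, (string Default, string Description)> ParseTessParams(IEnumerable<string> rows)
{
    var table = new Dictionary<string, (string Default, string Description)>();
    foreach (var row in rows)
    {
        if (string.IsNullOrWhiteSpace(row)) continue;
        // Split into at most three fields so a tab inside a description survives.
        var parts = row.Split('\t', 3);
        if (parts.Length < 3) continue; // skip malformed rows
        table[parts[0].Trim()] = (parts[1], parts[2].Trim());
    }
    return table;
}

// Three rows copied from the list above (an empty default is a valid value).
var sampleRows = new[]
{
    "tessedit_char_whitelist\t\tWhitelist of chars to recognize",
    "numeric_punctuation\t.,\tPunct. chs expected WITHIN numbers",
    "chop_ok_split\t100\tOK split limit",
};

var known = ParseTessParams(sampleRows);

// Illustrative overrides; the second key is misspelled on purpose.
var overrides = new Dictionary<string, object>
{
    ["tessedit_char_whitelist"] = "0123456789",
    ["chop_ok_splt"] = 90,
};

foreach (var (name, value) in overrides)
{
    Console.WriteLine(known.ContainsKey(name)
        ? $"{name} = {value} (default was '{known[name].Default}')"
        : $"{name}: not a known variable, skipping");
}
```

Checking names first is a design choice: a misspelled variable name is easier to catch here than after it has been passed to the OCR engine.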

For best results, apply IronOCR's image preprocessing filters before running OCR. These filters can dramatically improve accuracy, especially when working with low-quality scans or complex documents such as tables.
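
As a hedged sketch of that advice, the snippet below applies two filters before reading. It assumes that `OcrInput` exposes `LoadImage`, `Deskew`, and `DeNoise` as in recent IronOCR releases; the file name is a placeholder, and the right filter choice depends on the scan.

```csharp
using System;
using System.IO;
using IronOcr;

// Placeholder path: substitute a real scan.
const string imagePath = "low-quality-scan.png";

var ocr = new IronTesseract { Language = OcrLanguage.English };

if (File.Exists(imagePath))
{
    using var input = new OcrInput();
    input.LoadImage(imagePath); // assumed loader method
    input.Deskew();             // assumed filter: straighten a rotated page
    input.DeNoise();            // assumed filter: remove speckle noise

    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);
}
else
{
    Console.WriteLine($"No file at {imagePath}; nothing to read.");
}
```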

Frequently Asked Questions

How do I configure IronTesseract for OCR in C#?

To configure IronTesseract, create an IronTesseract instance and set properties like Language and Configuration. You can specify the OCR language (from 125 supported languages), enable barcode reading, configure searchable PDF output, and set character whitelisting. For example: var tesseract = new IronOcr.IronTesseract { Language = IronOcr.OcrLanguage.English, Configuration = new IronOcr.TesseractConfiguration { ReadBarCodes = false, RenderSearchablePdf = true } };

What input formats does IronTesseract support?

IronTesseract accepts various input formats through the OcrInput class. You can process images (PNG, JPG, etc.), PDF files, and scanned documents. The OcrInput class provides flexible methods for loading these different formats, making it easy to perform OCR on virtually any document containing text.

Can I read barcodes along with text using IronTesseract?

Yes, IronTesseract includes advanced barcode reading capabilities. You can enable barcode detection by setting ReadBarCodes = true in the TesseractConfiguration. This allows you to extract both text and barcode data from the same document in a single OCR operation.
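
A minimal sketch of this, assuming the result object exposes a `Barcodes` collection whose items have a `Value` property, and that `OcrInput.LoadImage` is available. The file name is a placeholder.

```csharp
using System;
using System.IO;
using IronOcr;

// Placeholder path: an image containing both text and a barcode.
const string imagePath = "label-with-barcode.png";

var ocr = new IronTesseract();
ocr.Configuration.ReadBarCodes = true; // enable barcode detection alongside text

if (File.Exists(imagePath))
{
    using var input = new OcrInput();
    input.LoadImage(imagePath); // assumed loader method

    OcrResult result = ocr.Read(input);
    Console.WriteLine(result.Text);

    // Assumed shape: one entry per detected barcode.
    foreach (var barcode in result.Barcodes)
    {
        Console.WriteLine($"Barcode: {barcode.Value}");
    }
}
```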

How do I create searchable PDFs from scanned documents?

IronTesseract can convert scanned documents and images into searchable PDFs by setting RenderSearchablePdf = true in the TesseractConfiguration. This creates PDF files where the text is selectable and searchable, while maintaining the original document appearance.
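
The sketch below shows one way this might look, assuming `OcrInput.LoadPdf` and `OcrResult.SaveAsSearchablePdf` are available in your IronOCR version. Both file names are placeholders.

```csharp
using System;
using System.IO;
using IronOcr;

// Placeholder paths.
const string scannedPdf = "scanned.pdf";
const string outputPdf = "searchable.pdf";

var ocr = new IronTesseract();
ocr.Configuration.RenderSearchablePdf = true; // required for searchable PDF output

if (File.Exists(scannedPdf))
{
    using var input = new OcrInput();
    input.LoadPdf(scannedPdf); // assumed loader method

    OcrResult result = ocr.Read(input);
    result.SaveAsSearchablePdf(outputPdf); // assumed export method
    Console.WriteLine($"Wrote {outputPdf}");
}
```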

What languages does IronTesseract support for OCR?

IronTesseract supports 125 international languages for text recognition. You specify the language by setting the Language property on your IronTesseract instance to a value of the OcrLanguage enum, for example IronOcr.OcrLanguage.English. Spanish, Chinese, Arabic, and many other languages are available in the same way.

Can I restrict which characters are recognized during OCR?

Yes. IronTesseract supports character whitelisting through the WhiteListCharacters property and blacklisting through the BlackListCharacters property, both on TesseractConfiguration. Restricting the character set helps improve accuracy when you know what to expect, such as limiting recognition to alphanumeric characters only.
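
For illustration, the sketch below restricts recognition to digits and uppercase letters, which suits something like serial numbers. The character sets and the file name are examples only, and `BlackListCharacters` is assumed to sit alongside `WhiteListCharacters` on the configuration object.

```csharp
using System;
using System.IO;
using IronOcr;

// Placeholder path.
const string imagePath = "serial-number.png";

var ocr = new IronTesseract
{
    Language = OcrLanguage.English,
    Configuration = new TesseractConfiguration
    {
        // Only these characters may appear in the output.
        WhiteListCharacters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ",
        // Assumed property: characters that must never be produced.
        BlackListCharacters = "|`^"
    }
};

if (File.Exists(imagePath))
{
    OcrResult result = ocr.Read(imagePath);
    Console.WriteLine(result.Text);
}
```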

How do I perform OCR on multiple documents simultaneously?

IronTesseract supports multithreading capabilities for batch processing. You can leverage parallel processing to OCR multiple documents simultaneously, significantly improving performance when dealing with large volumes of images or PDFs.
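
One possible shape for batch processing is sketched below using `Parallel.ForEach` from the .NET base library. The file names are placeholders, and the sketch deliberately creates one engine and one input per task, so it does not assume that a shared IronTesseract instance is safe to use from several threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using IronOcr;

// Placeholder file names; only files that exist are processed.
var files = new[] { "page1.png", "page2.png", "page3.png" }
    .Where(File.Exists)
    .ToArray();

// Thread-safe store for the extracted text, keyed by file path.
var texts = new ConcurrentDictionary<string, string>();

Parallel.ForEach(files, path =>
{
    var ocr = new IronTesseract();  // one engine per task (conservative choice)
    using var input = new OcrInput();
    input.LoadImage(path);          // assumed loader method
    texts[path] = ocr.Read(input).Text;
});

foreach (var entry in texts)
{
    Console.WriteLine($"{entry.Key}: {entry.Value.Length} characters");
}
```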

What version of Tesseract does IronOCR use?

IronOCR uses a customized and optimized version of Tesseract 5, known as Iron Tesseract. This enhanced engine provides improved accuracy and performance compared to standard Tesseract implementations while maintaining compatibility with .NET applications.

How can IronOCR improve data accuracy?

IronOCR improves accuracy in two ways: its image correction filters clean up the input before recognition, and its customized Tesseract 5 engine performs the recognition itself. The filters matter most for low-quality scans.

Is there a free trial available for IronOCR?

Yes, Iron Software offers a free trial of IronOCR, allowing users to test its features and capabilities before making a purchase decision.

Curtis Chau

Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Jeffrey T. Fritz

Principal Program Manager - .NET Community Team

Jeff is also a Principal Program Manager for the .NET and Visual Studio teams. He is the executive producer of the .NET Conf virtual conference series and hosts 'Fritz and Friends', a live stream for developers that airs twice weekly, where he talks tech and writes code together with viewers. Jeff writes workshops and presentations and plans content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.
