Why IronOCR and not Tesseract

Accuracy

Tesseract

  • Struggles with images that are rotated, skewed, low DPI, scanned, or affected by background noise
  • Requires image pre-processing with a tool such as Photoshop or ImageMagick
  • Can take a long time to process and still return nonsensical text

IronOCR

  • IronOCR's built-in pre-processing and image filters take this headache away
  • Users often achieve 99.8-100% accuracy with minimal configuration
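
As a rough sketch of what applying those filters can look like, the snippet below loads an image, straightens and cleans it, and then reads the text. It assumes the IronOcr NuGet package is installed. The file name scan.png is a placeholder, and the loader method has been renamed between releases (LoadImage in recent versions, AddImage in older ones), so check it against the version you have installed.

```csharp
using System;
using IronOcr;

class PreprocessExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // OcrInput holds the image and is disposed at the end of the using block.
        using (var input = new OcrInput())
        {
            input.LoadImage("scan.png"); // placeholder path (assumption)

            input.Deskew();   // straighten a rotated or skewed scan
            input.DeNoise();  // reduce background noise

            OcrResult result = ocr.Read(input);
            Console.WriteLine(result.Text);
        }
    }
}
```

Which filters help depends on the image. Clean digital screenshots usually need none, so it is worth trying the unfiltered read first.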

Image Compatibility

Tesseract

  • Only accepts the Leptonica PIX image format, which surfaces in C# as an IntPtr to a native object
  • PIX objects are not managed memory -- failure to handle them with care in C# results in memory leaks
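
To illustrate the care this requires, here is a minimal sketch that assumes the open-source Tesseract NuGet wrapper (namespace Tesseract), which is one of several interop layers and not necessarily the one you use. The tessdata folder and scan.png are placeholder paths. Each native-backed object sits in its own using statement so the unmanaged memory is released even if an exception is thrown.

```csharp
using System;
using Tesseract;

class PixDisposalExample
{
    static void Main()
    {
        // The engine, the Pix image, and the Page all wrap native memory.
        // Omitting any of these using statements leaves that memory
        // allocated until (and unless) a finalizer eventually runs.
        using (var engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
        using (Pix image = Pix.LoadFromFile("scan.png"))
        using (Page page = engine.Process(image))
        {
            Console.WriteLine(page.GetText());
        }
    }
}
```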

IronOCR

  • Image memory is managed automatically
  • PDF and broad image format support:
    -- Multi-frame TIFF
    -- JPEG & JPEG 2000
    -- GIF
    -- PNG
  • System.Drawing bitmaps, streams, and binary image data (byte[]) are accepted for every file format
  • IronSoftware.System.Drawing will soon replace the reliance on System.Drawing (allowing a universal Bitmap format)

Performance

Tesseract

  • Poorly documented settings must be fine-tuned to provide accurate results
  • Dependent on clean documents/pre-processed images

IronOCR

  • Works accurately and quickly for most images with zero configuration
  • Multithreading makes full use of multi-core processors
  • Even low-resolution images generally work with a high degree of accuracy
  • No Photoshop required

API

Tesseract

Little to no support, and not beginner friendly. .NET developers must either:

  1. Work with Interop layers -- many found on GitHub are out of date with unresolved tickets, memory leaks, and console warnings
    -- May not support .NET Core or Standard
  2. Work with the command line EXE -- difficult to deploy and constantly interrupted by virus scanners and security policies

IronOCR

  • A managed and tested .NET Library for Tesseract called IronTesseract
  • Fully documented with IntelliSense support
  • Team of support engineers ready to assist

Languages

Tesseract

  • Only 100 languages

IronOCR

  • Over 127 built-in languages + custom language pack support
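
A minimal sketch of selecting languages follows. It assumes the IronOcr package is installed, and that a non-English language such as Spanish also has its language pack available to the project. The image path and the custom .traineddata file name are placeholders.

```csharp
using System;
using IronOcr;

class LanguageExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Primary language for the document (assumes the Spanish pack is available).
        ocr.Language = OcrLanguage.Spanish;

        // Optional second language for mixed-language documents.
        ocr.AddSecondaryLanguage(OcrLanguage.English);

        // Alternatively, load a custom Tesseract language file (placeholder name):
        // ocr.UseCustomTesseractLanguageFile("custom.traineddata");

        OcrResult result = ocr.Read("invoice.png"); // placeholder path (assumption)
        Console.WriteLine(result.Text);
    }
}
```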

Conclusion

Tesseract is an excellent resource for C++ developers, but it is not a complete OCR library for .NET. Scanned or photographed images must be pre-processed so as to be orthogonal, standardized, high-resolution, and free of digital noise before Tesseract can accurately work with them.

In contrast, IronOCR can do this and more with just a single line of code. It uses a finely tuned build of Tesseract as its internal OCR engine, built for C#, with many performance improvements and features added as standard.
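
For reference, the single-line form looks roughly like this. It is a sketch that assumes the IronOcr NuGet package is installed, and image.png is a placeholder path.

```csharp
using System;
using IronOcr;

class OneLineExample
{
    static void Main()
    {
        // Create the engine, read the image, and take the recognized text in one expression.
        string text = new IronTesseract().Read("image.png").Text;
        Console.WriteLine(text);
    }
}
```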