Identity Documents

Identity documents are, by design, very difficult for OCR engines to read because of the anti-copying and anti-fraud protection (holograms, watermark images, variable digital noise, etc.) added to the background of the card.

This is not to say it is impossible. Results will likely depend on image quality. Image formats with less digital noise, such as TIFF or PNG, are recommended over lossy image formats such as JPEG.

Please also try the following image optimization filters:

  • DeNoise() -- Removes digital noise. This filter should only be used where noise is expected. Flattens alpha channels to white.
  • DeepCleanBackgroundNoise() -- Heavy background noise removal. Use this filter only when extreme background noise is known to be present, because it risks reducing OCR accuracy on clean documents and is very CPU-intensive.
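
As a minimal sketch, the filters above might be applied as follows. It assumes a recent IronOCR release in which OcrInput exposes LoadImage (older releases take the image path in the OcrInput constructor instead), and the file name "id-card.png" is hypothetical. Treat it as a starting point to adapt, not a definitive recipe.

```csharp
using System;
using IronOcr;

// Assumption: recent IronOCR release; older versions use
// `new OcrInput("id-card.png")` rather than LoadImage.
var ocr = new IronTesseract();

using var input = new OcrInput();
input.LoadImage("id-card.png"); // hypothetical file; prefer PNG or TIFF over JPEG

// Start with the lighter filter.
input.DeNoise();

// Heavier and CPU-expensive: enable only if the card background is extremely noisy.
// input.DeepCleanBackgroundNoise();

OcrResult result = ocr.Read(input);
Console.WriteLine(result.Text);
```

Trying DeNoise() on its own first, and enabling DeepCleanBackgroundNoise() only if the result is still poor, keeps processing time down and avoids degrading images that were already clean.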

You may also try crop rectangles to limit OCR to the content area of the card: https://ironsoftware.com/csharp/ocr/examples/net-tesseract-content-area-rectangle-crop/