How to Save Results as hOCR in an HTML File
hOCR, which stands for "HTML-based OCR," is a file format for representing the results of Optical Character Recognition (OCR) in a structured way. hOCR files are written in HTML (Hypertext Markup Language) and store the recognized text together with layout information and the coordinates of the recognized elements within an image or document.
IronOCR can perform optical character recognition on documents and export the results as hOCR in HTML format. The result can be saved to an HTML file or returned as a string.
Install with NuGet
Install-Package IronOcr
Download DLL
Manually install into your project
Export Result as hOCR Example
To export the result as hOCR, first enable the Configuration.RenderHocr property by setting it to true. After obtaining the OCR result object from the Read method, call the SaveAsHocrFile method to export the OCR result as HTML. This method writes an HTML file that contains the reading result of the input documents. The code below uses a sample TIFF file, Potter.tiff.
using IronOcr;
// Instantiate IronTesseract
IronTesseract ocrTesseract = new IronTesseract();
// Enable render as hOCR
ocrTesseract.Configuration.RenderHocr = true;
// Add image
using var imageInput = new OcrImageInput("Potter.tiff");
imageInput.Title = "Html Title";
// Perform OCR
OcrResult ocrResult = ocrTesseract.Read(imageInput);
// Export as HTML
ocrResult.SaveAsHocrFile("result.html");
Imports IronOcr
' Instantiate IronTesseract
Dim ocrTesseract As New IronTesseract()
' Enable render as hOCR
ocrTesseract.Configuration.RenderHocr = True
' Add image
Dim imageInput = New OcrImageInput("Potter.tiff")
imageInput.Title = "Html Title"
' Perform OCR
Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)
' Export as HTML
ocrResult.SaveAsHocrFile("result.html")
Export Result as HTML String
Using the same TIFF sample image, call the SaveAsHocrString method to get the OCR result as an HTML string instead of writing it to a file.
// Export as HTML string
string hocr = ocrResult.SaveAsHocrString();
' Export as HTML string
Dim hocr As String = ocrResult.SaveAsHocrString()
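Once the hOCR is available as a string, the coordinates it carries can be read back out. The following is a minimal sketch of that idea, not part of IronOCR: HocrWord and HocrWords.Parse are hypothetical helpers, and the sample markup is hand-written to follow the common hOCR convention of word spans with class ocrx_word and a title attribute containing "bbox x0 y0 x1 y1". The exact markup that SaveAsHocrString returns may differ, so treat the pattern as an assumption to check against real output.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Illustrative hOCR fragment (hand-written, NOT actual IronOCR output).
// In practice this string would come from ocrResult.SaveAsHocrString().
string hocr =
    "<div class='ocr_page' title='bbox 0 0 800 600'>" +
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Harry</span> " +
    "<span class='ocrx_word' title='bbox 102 92 170 116; x_wconf 93'>Potter</span>" +
    "</div>";

// Print each recognized word with its bounding box.
foreach (var word in HocrWords.Parse(hocr))
{
    Console.WriteLine($"{word.Text}: ({word.X0}, {word.Y0}) - ({word.X1}, {word.Y1})");
}

// Hypothetical type: one recognized word and its bounding box in pixels.
public record HocrWord(string Text, int X0, int Y0, int X1, int Y1);

// Hypothetical helper: pulls word text and bounding boxes out of hOCR markup.
public static class HocrWords
{
    // Assumes the class attribute precedes the title attribute on each word span.
    private static readonly Regex WordPattern = new Regex(
        @"<span[^>]*class=['""]ocrx_word['""][^>]*title=['""][^'""]*?bbox (\d+) (\d+) (\d+) (\d+)[^'""]*['""][^>]*>(.*?)</span>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static List<HocrWord> Parse(string hocr)
    {
        var words = new List<HocrWord>();
        foreach (Match m in WordPattern.Matches(hocr))
        {
            // Strip any nested tags and decode HTML entities in the word text.
            string inner = Regex.Replace(m.Groups[5].Value, "<[^>]+>", "");
            string text = WebUtility.HtmlDecode(inner).Trim();

            words.Add(new HocrWord(
                text,
                int.Parse(m.Groups[1].Value),
                int.Parse(m.Groups[2].Value),
                int.Parse(m.Groups[3].Value),
                int.Parse(m.Groups[4].Value)));
        }
        return words;
    }
}
```

A regular expression keeps the sketch dependency-free, but it is fragile against attribute reordering or nested spans; for production use an HTML parser would be a safer choice.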