In this tutorial, we will learn how to extract hardcoded subtitles from video files. We will take a screenshot from a sample video and extract its hardcoded subtitles into a text file, using a C# .NET program that performs OCR. The tutorial is kept simple so that even a beginner C# programmer can follow it.
We need an efficient Optical Character Recognition (OCR) engine that can process the video and get subtitle files irrespective of the subtitle language.
Many libraries provide OCR. Some are paid, some are difficult to use, and some are neither efficient nor accurate, so it is hard to find one that is free, efficient, easy to use, and accurate.
IronOCR, which is free for development, provides a one-month free trial for commercial purposes. It supports over 150 languages and provides better accuracy than most other OCR libraries available. It is also efficient and easy to use. We will use this library for our demonstration.
IronOCR is a library developed and maintained by Iron Software that helps C# Software Engineers perform OCR, Barcode Scanning, and Text Extraction in .NET projects.
The features of IronOCR include:
Let's develop a demo application to extract hardcoded subtitles.
The first step is to create a new project.
Open Visual Studio. Click on Create New Project, and select the Console Application project template.
Click on the Next button, and name the project. I have named it "OCR Subtitles", but you can choose any name.
Click on the Next button, and select your target framework. Finally, click on the Create button to create the project.
The project will be created as shown below.
Creating a New Project in Visual Studio
Now, we need to install the IronOCR library to use it in our project. The easiest way is to install it via NuGet Package Manager for Solution.
Click on Tools from the top menu bar, and select NuGet Package Manager > Manage NuGet Packages for Solution, as shown below.
Installing IronOCR within Visual Studio
The following window will appear.
Visual Studio NuGet Package Manager UI
Click on Browse, and search for IronOCR. Select the IronOCR package and click on the Install button, as shown below.
Searching for IronOCR in the NuGet Package Manager UI
The IronOCR Library will be installed and ready to use.
Let's write a program to extract hardcoded subtitles.
We are going to use the following screenshot for extracting subtitles.
Sample video screenshot from which text will be extracted
Add the following namespace:
using IronOcr;
Imports IronOcr
Write the following code below the namespace import.
// Initialize IronTesseract object
var ocr = new IronTesseract();
// Create an OCR Input using the specified image path
using (var input = new OcrInput(@"D:\subtitle\subtitle1.png"))
{
    // Perform OCR on the input image to extract text
    var result = ocr.Read(input);
    // Output the extracted text to the console
    Console.WriteLine(result.Text);
}
' Initialize IronTesseract object
Dim ocr = New IronTesseract()
' Create an OCR Input using the specified image path
Using input = New OcrInput("D:\subtitle\subtitle1.png")
    ' Perform OCR on the input image to extract text
    Dim result = ocr.Read(input)
    ' Output the extracted text to the console
    Console.WriteLine(result.Text)
End Using
The code above works as follows:
First, we create an IronTesseract object. This creates a default instance of IronTesseract.
Next, we create an OcrInput object populated with an input image file or PDF document. OcrInput is the preferred input type because it allows OCR of multi-page documents and allows images to be enhanced before OCR to obtain faster, more accurate results.
ocr.Read extracts the subtitles from the given input screenshot.
result.Text returns the entire content extracted from the given input.
The sample program produces the console output below:
Console output generated from performing text extraction on the sample image using IronOCR
Let's suppose that you have a video frame that contains both the title of the video and the subtitles:
A single frame of a longer video containing text regions for the video title and the video subtitles
Our goal is to extract the hardcoded subtitles from the bottom region of the image. In this case, we need to specify the text region in which the subtitle is displayed.
We can use a rectangle (a CropRectangle in the sample code below) to specify the region of the video frame from which the subtitle is read. The unit of measurement is always pixels.
We will use the following sample code to specify the text region.
// Initialize IronTesseract object
var ocr = new IronTesseract();
// Create an OCR Input and specify the region of interest
using (var input = new OcrInput())
{
    // Define the area within the image where subtitles are located for a 41% improvement on speed
    var contentArea = new CropRectangle(x: 189, y: 272, height: 252, width: 77);
    // Add the specific region of the image to the OCR input
    input.AddImage(@"D:\subtitle\image.png", contentArea);
    // Perform OCR on the specified region
    var result = ocr.Read(input);
    // Output the extracted text to the console
    Console.WriteLine(result.Text);
}
' Initialize IronTesseract object
Dim ocr = New IronTesseract()
' Create an OCR Input and specify the region of interest
Using input = New OcrInput()
    ' Define the area within the image where subtitles are located for a 41% improvement on speed
    Dim contentArea = New CropRectangle(x:=189, y:=272, height:=252, width:=77)
    ' Add the specific region of the image to the OCR input
    input.AddImage("D:\subtitle\image.png", contentArea)
    ' Perform OCR on the specified region
    Dim result = ocr.Read(input)
    ' Output the extracted text to the console
    Console.WriteLine(result.Text)
End Using
This yields a 41% speed increase and lets us target exactly the text we want. In contentArea, we specify the starting point (x and y) and then the height and width of the subtitle region.
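When the exact pixel coordinates are not known in advance, the region can be derived from the frame size. The following is a minimal sketch under the assumption that the subtitles sit in a horizontal band at the bottom of the frame. BottomBand, its parameters, and the 1280x720 frame size are hypothetical names and values chosen for illustration; they are not part of IronOCR.

```csharp
using System;

// Hypothetical helper (not part of IronOCR): returns the x, y, width and height,
// in pixels, of a horizontal band at the bottom of a frame.
static (int X, int Y, int Width, int Height) BottomBand(
    int frameWidth, int frameHeight, double bandFraction, int sideMargin)
{
    if (frameWidth <= 0 || frameHeight <= 0)
        throw new ArgumentOutOfRangeException(nameof(frameWidth), "Frame dimensions must be positive.");
    if (bandFraction <= 0 || bandFraction > 1)
        throw new ArgumentOutOfRangeException(nameof(bandFraction));
    if (sideMargin < 0 || sideMargin * 2 >= frameWidth)
        throw new ArgumentOutOfRangeException(nameof(sideMargin));

    // Height of the band, at least one pixel tall
    int height = Math.Max(1, (int)Math.Round(frameHeight * bandFraction));
    // The band starts where the remaining upper part of the frame ends
    int y = frameHeight - height;
    // Trim the same margin from the left and right edges
    int width = frameWidth - 2 * sideMargin;
    return (sideMargin, y, width, height);
}

// Assumed example values: a 1280x720 frame, bottom quarter, 64-pixel side margins
var band = BottomBand(frameWidth: 1280, frameHeight: 720, bandFraction: 0.25, sideMargin: 64);
Console.WriteLine($"x: {band.X}, y: {band.Y}, width: {band.Width}, height: {band.Height}");
// prints "x: 64, y: 540, width: 1152, height: 180"
```

The four values map onto the x, y, width, and height arguments of the region used in the sample code above. The band fraction and margin would need tuning for each video, since subtitle placement varies.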
Let's save the extracted subtitles into a text file.
// Initialize IronTesseract object
var ocr = new IronTesseract();
// Create an OCR Input with the specified image path
using (var input = new OcrInput(@"D:\subtitle\subtitle1.png"))
{
    // Perform OCR on the input image to extract text
    var result = ocr.Read(input);
    // Save the extracted text to a specified file path
    result.SaveAsTextFile(@"D:\subtitle\subtitlefile.txt");
}
' Initialize IronTesseract object
Dim ocr = New IronTesseract()
' Create an OCR Input with the specified image path
Using input = New OcrInput("D:\subtitle\subtitle1.png")
    ' Perform OCR on the input image to extract text
    Dim result = ocr.Read(input)
    ' Save the extracted text to a specified file path
    result.SaveAsTextFile("D:\subtitle\subtitlefile.txt")
End Using
result.SaveAsTextFile takes the output path as an argument and saves the extracted text to a file at that path.
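The example above saves the text of a single screenshot. For a whole video, one would typically run OCR on many frames, and the same subtitle then appears in several consecutive results. The sketch below shows one possible way to collapse those repeats before writing a text file. It uses only the .NET base library; CollapseRepeats is a hypothetical helper, the sample strings are made up rather than real OCR output, and the output file name is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps one copy of each run of identical consecutive
// frame texts and drops frames that contain no text.
static List<string> CollapseRepeats(IEnumerable<string> frameTexts)
{
    var lines = new List<string>();
    var previous = string.Empty;
    foreach (var raw in frameTexts)
    {
        var text = (raw ?? string.Empty).Trim();
        if (text.Length == 0)
        {
            // A frame without text ends the current subtitle, so the same
            // sentence appearing again later is treated as a new subtitle.
            previous = string.Empty;
            continue;
        }
        if (!string.Equals(text, previous, StringComparison.Ordinal))
        {
            lines.Add(text);
            previous = text;
        }
    }
    return lines;
}

// Made-up per-frame texts standing in for the OCR result of each frame
var frameTexts = new[]
{
    "Hello there.", "Hello there.", "", "How are you?", "How are you? ", "Hello there."
};

var subtitles = CollapseRepeats(frameTexts);

// Assumed output location: a file in the system temp directory
var outputPath = Path.Combine(Path.GetTempPath(), "subtitles.txt");
File.WriteAllLines(outputPath, subtitles);

Console.WriteLine(string.Join(Environment.NewLine, subtitles));
```

Exact string comparison is the simplest choice here. OCR output often differs slightly from frame to frame, so a real pipeline would probably need a more tolerant comparison.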
In this tutorial, we have learned to use IronOCR in a very simple program that reads subtitles from a video screenshot. We have also seen how to specify the region from which the text is extracted and how to save the result to a text file.
IronOCR also provides OpenCV features for computer vision and can read text from blurred or low-resolution images. The library is efficient and accurate, and it supports over 150 languages. It is free for development and offers a one-month free trial for commercial use.
In summary, IronOCR provides:
IronOCR is part of Iron Software's suite of libraries useful for reading and writing PDFs, manipulating Excel files, reading text from images, and scraping content from websites. Purchase the complete Iron Suite for the price of two individual libraries.
What is the purpose of this tutorial?
The tutorial aims to teach how to extract hardcoded subtitles from video files using a C# .NET program and the IronOCR library.
What is IronOCR?
IronOCR is an Optical Character Recognition (OCR) library that helps developers extract text from images and videos, supporting over 150 languages. It is known for its accuracy and ease of use.
How is IronOCR installed in a C# project?
IronOCR can be installed in a C# project via the NuGet Package Manager in Visual Studio by searching for the IronOCR package and clicking the Install button.
What are the steps for extracting hardcoded subtitles?
The steps include installing the IronOCR library, importing the video image into an OcrInput instance, preprocessing the image, specifying the subtitle location, and exporting the text to a file.
Is the tutorial suitable for beginners?
The tutorial is designed to be simple and easy to follow, making it suitable for beginner C# programmers.
Does IronOCR support multiple languages?
Yes, IronOCR supports over 150 languages and provides accurate OCR results across these languages.
How does specifying the subtitle location affect performance?
Specifying the subtitle location with a rectangle (a CropRectangle in the sample code) focuses OCR on the region of interest, which improved processing speed by 41% in the tutorial.
Is IronOCR free to use?
IronOCR is free for development purposes and offers a one-month free trial for commercial use.
What output formats does IronOCR support?
IronOCR supports output as text files, structured data, and searchable PDFs.
What else can IronOCR do besides OCR?
Besides OCR, IronOCR can perform barcode scanning and text extraction from various media formats. It also offers features similar to OpenCV for computer vision tasks.