Using IronOCR to read documents on AWS Lambda

This how-to article walks through setting up an AWS Lambda function with IronOCR and using it to read documents from an S3 bucket.

1. Create an AWS Lambda with a container template

For IronOCR to function properly, certain dependencies must be installed on the machine that runs it. Containerizing the Lambda function ensures that these dependencies are installed.

With Visual Studio, creating a containerized AWS Lambda is straightforward: install the AWS Toolkit for Visual Studio, select an AWS Lambda C# template, select a Container Image blueprint, and then select "Finish".

2. Add package dependencies

Modify the project's Dockerfile as follows:

# Set to dotnet:5 or dotnet:6 for .NET 5/6
FROM public.ecr.aws/lambda/dotnet:7

WORKDIR /var/task

RUN yum update -y

RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus

COPY "bin/Release/lambda-publish"  .

3. Install the IronOcr and IronOcr.Linux NuGet packages

To install the IronOcr and IronOcr.Linux packages in Visual Studio:

  1. Go to Project > Manage NuGet Packages...
  2. Select Browse, then search for IronOcr and IronOcr.Linux
  3. Select the packages and install.

4. Modify the FunctionHandler code

This example retrieves an image from an S3 bucket and reads it. It uses the Image class from SixLabors.ImageSharp to load the image into IronOCR. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.

using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
	private readonly IAmazonS3 _s3Client;
	private readonly string accessKey;
	private readonly string secretKey;

	public Function()
	{
		accessKey = "ACCESS-KEY";
		secretKey = "SECRET-KEY";
		_s3Client = new AmazonS3Client(accessKey, secretKey);
	}
	public async Task<string> FunctionHandler(string input, ILambdaContext context)
	{
		IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
		string bucketName = "S3-BUCKET-NAME";

		var getObjectRequest = new GetObjectRequest
		{
			BucketName = bucketName,
			Key = input,
		};

		using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
		{
			// Read the content of the object
			using (Stream responseStream = response.ResponseStream)
			{
				Console.WriteLine("Reading image from S3");
				Image image = Image.Load(responseStream);

				Console.WriteLine("Reading image with IronOCR");
				IronTesseract ironTesseract = new IronTesseract();
				OcrInput ocrInput = new OcrInput(image);
				OcrResult result = ironTesseract.Read(ocrInput);

				return result.Text;
			}
		}
	}
}

The same handler in VB.NET:

Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda

	Public Class [Function]
		Private ReadOnly _s3Client As IAmazonS3
		Private ReadOnly accessKey As String
		Private ReadOnly secretKey As String

		Public Sub New()
			accessKey = "ACCESS-KEY"
			secretKey = "SECRET-KEY"
			_s3Client = New AmazonS3Client(accessKey, secretKey)
		End Sub
		Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
			IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
			Dim bucketName As String = "S3-BUCKET-NAME"

			Dim getObjectRequest As New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = input
			}

			Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
				' Read the content of the object
				Using responseStream As Stream = response.ResponseStream
					Console.WriteLine("Reading image from S3")
					Dim image As Image = SixLabors.ImageSharp.Image.Load(responseStream)

					Console.WriteLine("Reading image with IronOCR")
					Dim ironTesseract As New IronTesseract()
					Dim ocrInput As New OcrInput(image)
					Dim result As OcrResult = ironTesseract.Read(ocrInput)

					Return result.Text
				End Using
			End Using
		End Function
	End Class
End Namespace
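
The sample above hardcodes the access key, secret key, license key, and bucket name for brevity. As a hedged sketch of one alternative, the helper below reads such values from Lambda environment variables. The class name `LambdaSettings` and the variable names `IRONOCR_LICENSE_KEY` and `S3_BUCKET_NAME` are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper (not part of IronOCR or the AWS SDK): reads a required
// setting from an environment variable and fails fast when it is missing.
public static class LambdaSettings
{
    public static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value;
    }
}

// Possible use inside the handler (the variable names are assumptions):
//   IronOcr.License.LicenseKey = LambdaSettings.Require("IRONOCR_LICENSE_KEY");
//   string bucketName = LambdaSettings.Require("S3_BUCKET_NAME");
```

Similarly, the parameterless `new AmazonS3Client()` constructor resolves credentials through the SDK's default credential chain, which inside Lambda picks up the function's execution role. Provided that role is allowed to read the bucket, the access and secret keys need not appear in code.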

5. Increase Memory and Timeout

The amount of memory to allocate in the Lambda will vary based on the size of the documents being read and how many will be read at once. As a baseline, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json:

"function-memory-size" : 512,
"function-timeout" : 300
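
When tuning these two values, it can help to log how much of the budget an invocation has at its start and end. The sketch below only formats such a log line; `BudgetLog.Describe` is a hypothetical helper, and the commented usage assumes the `MemoryLimitInMB` and `RemainingTime` properties of `ILambdaContext`.

```csharp
using System;

// Hypothetical helper: formats the configured memory limit and the time
// still available to the current invocation as a single log line.
public static class BudgetLog
{
    public static string Describe(int memoryLimitMb, TimeSpan remaining)
    {
        int seconds = (int)remaining.TotalSeconds; // truncate to whole seconds
        return $"Memory limit: {memoryLimitMb} MB, time remaining: {seconds} s";
    }
}

// Possible use at the start and end of FunctionHandler:
//   context.Logger.LogLine(
//       BudgetLog.Describe(context.MemoryLimitInMB, context.RemainingTime));
```

If the remaining time logged at the end of a run is close to zero, the timeout is a candidate for raising.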

6. Publish

To publish in Visual Studio, right-click the project in Solution Explorer, select Publish to AWS Lambda..., and then set the configuration options. You can read more about publishing a Lambda on the AWS website.

7. Try it out!

You can invoke the Lambda function either through the Lambda console or through Visual Studio, passing the S3 object key of the image to read as the input.
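
Because the handler takes a `string` and the project uses the System.Text.Json Lambda serializer, the test event should be a JSON string: the object key wrapped in double quotes rather than a bare word. A minimal sketch of producing such a payload follows; the `TestEvent` class and the key `invoice.png` are made-up examples.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: turns an S3 object key into the JSON string payload
// that a handler with a string input parameter can deserialize.
public static class TestEvent
{
    public static string ForObjectKey(string objectKey) =>
        JsonSerializer.Serialize(objectKey);
}

// Example:
//   Console.WriteLine(TestEvent.ForObjectKey("invoice.png"));
//   prints "invoice.png" including the surrounding double quotes
```

The resulting text can be pasted as the test event in the Lambda console or into the sample input box of the Visual Studio invoke window.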