Using IronOCR to read documents on AWS Lambda
This how-to article walks through setting up an AWS Lambda function with IronOCR and using it to read documents from an S3 bucket.
1. Create an AWS Lambda with a container template
IronOCR relies on dependencies that must be installed on the machine it runs on. To ensure these dependencies are present, the Lambda function must be containerized.
With Visual Studio, creating a containerized AWS Lambda is straightforward: install the AWS Toolkit for Visual Studio, select an AWS Lambda C# template, choose a Container Image blueprint, then select "Finish".
2. Add package dependencies
Modify the project's Dockerfile with the following:
# Set to dotnet:5 or dotnet:6 for .NET 5/6
FROM public.ecr.aws/lambda/dotnet:7
WORKDIR /var/task
RUN yum update -y
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus
COPY "bin/Release/lambda-publish" .
3. Install the IronOcr and IronOcr.Linux NuGet packages
To install the IronOcr and IronOcr.Linux packages in Visual Studio:
- Go to Project > Manage NuGet Packages...
- Select Browse, then search for IronOcr and IronOcr.Linux
- Select the packages and install.
4. Modify the FunctionHandler code
This example retrieves an image from an S3 bucket and reads it. It uses the Image class from SixLabors to load the image into IronOCR. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
    private readonly IAmazonS3 _s3Client;
    private readonly string accessKey;
    private readonly string secretKey;

    public Function()
    {
        // Placeholders: replace with your own AWS credentials.
        accessKey = "ACCESS-KEY";
        secretKey = "SECRET-KEY";
        _s3Client = new AmazonS3Client(accessKey, secretKey);
    }

    // input is the key of the S3 object to read; the return value is the recognized text.
    public async Task<string> FunctionHandler(string input, ILambdaContext context)
    {
        IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
        string bucketName = "S3-BUCKET-NAME";

        var getObjectRequest = new GetObjectRequest
        {
            BucketName = bucketName,
            Key = input,
        };

        using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
        {
            // Read the content of the object
            using (Stream responseStream = response.ResponseStream)
            {
                Console.WriteLine("Reading image from S3");
                Image image = Image.Load(responseStream);

                Console.WriteLine("Reading image with IronOCR");
                IronTesseract ironTesseract = new IronTesseract();
                OcrInput ocrInput = new OcrInput(image);
                OcrResult result = ironTesseract.Read(ocrInput);
                return result.Text;
            }
        }
    }
}
' VB.NET version of the same handler.
Imports System.IO
Imports System.Threading.Tasks
Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda
    Public Class [Function]
        Private ReadOnly _s3Client As IAmazonS3
        Private ReadOnly accessKey As String
        Private ReadOnly secretKey As String

        Public Sub New()
            ' Placeholders: replace with your own AWS credentials.
            accessKey = "ACCESS-KEY"
            secretKey = "SECRET-KEY"
            _s3Client = New AmazonS3Client(accessKey, secretKey)
        End Sub

        ' input is the key of the S3 object to read; the return value is the recognized text.
        Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
            IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
            Dim bucketName As String = "S3-BUCKET-NAME"

            Dim getObjectRequest As New GetObjectRequest With {
                .BucketName = bucketName,
                .Key = input
            }

            Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
                ' Read the content of the object
                Using responseStream As Stream = response.ResponseStream
                    Console.WriteLine("Reading image from S3")
                    ' Load with ImageSharp (System.Drawing.Image has no Load method).
                    Dim image As Image = SixLabors.ImageSharp.Image.Load(responseStream)

                    Console.WriteLine("Reading image with IronOCR")
                    Dim ironTesseract As New IronTesseract()
                    Dim ocrInput As New OcrInput(image)
                    Dim result As OcrResult = ironTesseract.Read(ocrInput)
                    Return result.Text
                End Using
            End Using
        End Function
    End Class
End Namespace
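The sample hardcodes an access key and secret key for brevity. Inside Lambda, the AWS SDK can usually resolve credentials from the function's execution role instead, so a variant along the following lines avoids embedding secrets in code. This is a minimal sketch rather than the article's method: it assumes the execution role is allowed to call s3:GetObject on the bucket, and the BUCKET_NAME environment variable and the S3ImageSource class are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Hypothetical helper: fetches an object from S3 without hardcoded credentials.
public class S3ImageSource
{
    // The parameterless constructor resolves credentials and region from the
    // environment; in Lambda these come from the function's execution role.
    private readonly IAmazonS3 _s3Client = new AmazonS3Client();

    // BUCKET_NAME is an assumed environment variable configured on the function.
    private readonly string _bucketName =
        Environment.GetEnvironmentVariable("BUCKET_NAME")
        ?? throw new InvalidOperationException("BUCKET_NAME is not set.");

    // Downloads the object into memory so the caller receives a seekable stream.
    public async Task<MemoryStream> OpenAsync(string key)
    {
        var request = new GetObjectRequest { BucketName = _bucketName, Key = key };
        using GetObjectResponse response = await _s3Client.GetObjectAsync(request);

        var buffer = new MemoryStream();
        await response.ResponseStream.CopyToAsync(buffer);
        buffer.Position = 0;
        return buffer;
    }
}
```

The returned stream could then be passed to Image.Load in place of responseStream in the handler.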
5. Increase Memory and Timeout
The amount of memory to allocate to the Lambda varies with the size of the documents being read and how many are read at once. As a baseline, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json:
"function-memory-size" : 512,
"function-timeout" : 300
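To judge whether these baseline values suit a given workload, it can help to log how much of the time budget a read consumes. The sketch below uses the MemoryLimitInMB and RemainingTime properties of ILambdaContext; the LambdaBudget class and LogBudget method are hypothetical names, and where to call them is left to the reader.

```csharp
using System;
using Amazon.Lambda.Core;

// Hypothetical helper for tuning function-memory-size and function-timeout.
public static class LambdaBudget
{
    // Writes the configured memory limit and the time left before the
    // function times out to the Lambda log.
    public static void LogBudget(ILambdaContext context, string stage)
    {
        context.Logger.LogLine(
            $"[{stage}] memory limit: {context.MemoryLimitInMB} MB, " +
            $"time remaining: {context.RemainingTime.TotalSeconds:F1} s");
    }
}
```

Calling LambdaBudget.LogBudget(context, "before OCR") and again after ironTesseract.Read shows roughly how long the read took. The REPORT line that Lambda writes at the end of each invocation lists the maximum memory used, which indicates whether 512 MB is enough.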
6. Publish
To publish in Visual Studio, right-click the project in Solution Explorer and select Publish to AWS Lambda..., then set the configurations. You can read more about publishing a Lambda on the AWS website.
7. Try it out!
You can invoke the Lambda function through either the Lambda console or Visual Studio. The handler's input is the S3 object key of the image to read, passed as a JSON string.
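The function can also be invoked from code. The sketch below uses the AWSSDK.Lambda NuGet package, which this guide does not otherwise require; the function name IronOcrAWSLambda and the object key sample.png are placeholders. Because the handler takes a string as input, the payload has to be a JSON string, that is, the key wrapped in double quotes.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.Lambda;
using Amazon.Lambda.Model;

public static class InvokeExample
{
    public static async Task Main()
    {
        // Credentials and region are resolved from the local AWS profile or environment.
        using var lambdaClient = new AmazonLambdaClient();

        var request = new InvokeRequest
        {
            FunctionName = "IronOcrAWSLambda",                 // placeholder function name
            Payload = JsonSerializer.Serialize("sample.png")   // placeholder key, sent as a JSON string
        };

        InvokeResponse response = await lambdaClient.InvokeAsync(request);

        // The response payload is a stream containing JSON.
        using var reader = new StreamReader(response.Payload);
        string json = await reader.ReadToEndAsync();

        // FunctionError is set when the handler threw; the payload then describes the error.
        if (!string.IsNullOrEmpty(response.FunctionError))
        {
            Console.WriteLine($"Invocation failed: {json}");
            return;
        }

        // On success the payload is the handler's return value as a JSON string.
        Console.WriteLine(JsonSerializer.Deserialize<string>(json));
    }
}
```

Serializing the key with JsonSerializer, rather than concatenating quotes by hand, keeps the payload valid if the key contains characters that need escaping.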