How to OCR documents on AWS Lambda

By Chaknith Bin

November 21, 2023

Updated June 22, 2025

This how-to article provides a step-by-step guide for setting up an AWS Lambda function using IronOCR. By following this guide, you will learn how to configure IronOCR and efficiently read documents stored in an S3 bucket.

How to OCR documents on AWS Lambda

Download a C# library to perform OCR on documents
Create and choose the project template
Modify the FunctionHandler code
Configure and deploy the project
Invoke the function and check the results in S3

Installation

This article will use an S3 bucket, so the AWSSDK.S3 package is required.

If you are using IronOCR ZIP, you must set the temporary folder. The library needs a writable location for its runtime files, and on AWS Lambda that location is /tmp.

// Set temporary folder path and log file path for IronOCR.
var awsTmpPath = @"/tmp/";
IronOcr.Installation.InstallationPath = awsTmpPath;
IronOcr.Installation.LogFilePath = awsTmpPath;

' Set temporary folder path and log file path for IronOCR.
Dim awsTmpPath = "/tmp/"
IronOcr.Installation.InstallationPath = awsTmpPath
IronOcr.Installation.LogFilePath = awsTmpPath
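
Because IronOCR needs to write into this folder, it can help to confirm the path is writable before configuring the library. The following is a minimal sketch, not part of IronOCR: the TempFolderCheck class and its IsWritable method are hypothetical names introduced here for illustration, and they use only System.IO.

```csharp
using System;
using System.IO;

public static class TempFolderCheck
{
    // Hypothetical helper (not an IronOCR API): returns true when a small
    // probe file can be created and then deleted in the given directory.
    public static bool IsWritable(string directory)
    {
        try
        {
            string probe = Path.Combine(directory, Path.GetRandomFileName());
            File.WriteAllText(probe, "probe");
            File.Delete(probe);
            return true;
        }
        catch (Exception e) when (e is IOException || e is UnauthorizedAccessException)
        {
            // A missing directory or a read-only location ends up here.
            return false;
        }
    }
}
```

One way to use it is to call TempFolderCheck.IsWritable("/tmp/") at the start of the handler and log a warning when it returns false, so a misconfigured path shows up before OCR starts.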

Create an AWS Lambda Project

With Visual Studio, creating a containerized AWS Lambda function takes three steps:

Install the AWS Toolkit for Visual Studio.
Select an 'AWS Lambda Project (.NET Core - C#)'.
Select a '.NET 8 (Container Image)' blueprint, then select 'Finish'.

Select container image

Add Package Dependencies

Using the IronOCR library in .NET 8 does not require additional dependencies to be installed for use on AWS Lambda. Modify the project's Dockerfile with the following:

FROM public.ecr.aws/lambda/dotnet:8

# Update all installed packages
RUN dnf update -y

WORKDIR /var/task

# Copy build artifacts from the host machine into the Docker image
COPY "bin/Release/lambda-publish" .

Modify the FunctionHandler Code

This example retrieves a PDF from an S3 bucket, runs OCR on it, and saves a searchable PDF back to the same bucket. Setting the temporary folder is essential when using IronOCR ZIP, because the library needs write permission to copy the runtime folder out of the DLLs.

using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using IronOcr;
using System;
using System.IO;
using System.Threading.Tasks;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrZipAwsLambda
{
    public class Function
    {
        // Initialize the S3 client with a specific region endpoint
        private static readonly IAmazonS3 _s3Client = new AmazonS3Client(Amazon.RegionEndpoint.APSoutheast1);

        /// <summary>
        /// Function handler to process OCR on the PDF stored in S3.
        /// </summary>
        /// <param name="context">The ILambdaContext that provides methods for logging and describing the Lambda environment.</param>
        public async Task FunctionHandler(ILambdaContext context)
        {
            // Set up necessary paths for IronOCR
            var awsTmpPath = @"/tmp/";
            IronOcr.Installation.InstallationPath = awsTmpPath;
            IronOcr.Installation.LogFilePath = awsTmpPath;

            // Set license key for IronOCR
            IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01";

            string bucketName = "deploymenttestbucket"; // Your bucket name
            string pdfName = "sample";
            string objectKey = $"IronPdfZip/{pdfName}.pdf";
            string objectKeyForSearchablePdf = $"IronPdfZip/{pdfName}-SearchablePdf.pdf";

            try
            {
                // Retrieve the PDF file from S3
                var pdfData = await GetPdfFromS3Async(bucketName, objectKey);

                // Initialize IronTesseract for OCR processing
                IronTesseract ironTesseract = new IronTesseract();
                OcrInput ocrInput = new OcrInput();
                ocrInput.LoadPdf(pdfData);
                OcrResult result = ironTesseract.Read(ocrInput);

                // Log the OCR result
                context.Logger.LogLine($"OCR result: {result.Text}");

                // Upload the searchable PDF to S3
                await UploadPdfToS3Async(bucketName, objectKeyForSearchablePdf, result.SaveAsSearchablePdfBytes());
                context.Logger.LogLine($"PDF uploaded successfully to {bucketName}/{objectKeyForSearchablePdf}");
            }
            catch (Exception e)
            {
                context.Logger.LogLine($"[ERROR] FunctionHandler: {e.Message}");
            }
        }

        /// <summary>
        /// Retrieves a PDF from S3 and returns it as a byte array.
        /// </summary>
        private async Task<byte[]> GetPdfFromS3Async(string bucketName, string objectKey)
        {
            var request = new GetObjectRequest
            {
                BucketName = bucketName,
                Key = objectKey
            };

            using (var response = await _s3Client.GetObjectAsync(request))
            using (var memoryStream = new MemoryStream())
            {
                await response.ResponseStream.CopyToAsync(memoryStream);
                return memoryStream.ToArray();
            }
        }

        /// <summary>
        /// Uploads the generated searchable PDF back to S3.
        /// </summary>
        private async Task UploadPdfToS3Async(string bucketName, string objectKey, byte[] pdfBytes)
        {
            using (var memoryStream = new MemoryStream(pdfBytes))
            {
                var request = new PutObjectRequest
                {
                    BucketName = bucketName,
                    Key = objectKey,
                    InputStream = memoryStream,
                    ContentType = "application/pdf"
                };

                await _s3Client.PutObjectAsync(request);
            }
        }
    }
}

Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports IronOcr
Imports System
Imports System.IO
Imports System.Threading.Tasks

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrZipAwsLambda
	Public Class [Function]
		' Initialize the S3 client with a specific region endpoint
		Private Shared ReadOnly _s3Client As IAmazonS3 = New AmazonS3Client(Amazon.RegionEndpoint.APSoutheast1)

		''' <summary>
		''' Function handler to process OCR on the PDF stored in S3.
		''' </summary>
		''' <param name="context">The ILambdaContext that provides methods for logging and describing the Lambda environment.</param>
		Public Async Function FunctionHandler(ByVal context As ILambdaContext) As Task
			' Set up necessary paths for IronOCR
			Dim awsTmpPath = "/tmp/"
			IronOcr.Installation.InstallationPath = awsTmpPath
			IronOcr.Installation.LogFilePath = awsTmpPath

			' Set license key for IronOCR
			IronOcr.License.LicenseKey = "IRONOCR-MYLICENSE-KEY-1EF01"

			Dim bucketName As String = "deploymenttestbucket" ' Your bucket name
			Dim pdfName As String = "sample"
			Dim objectKey As String = $"IronPdfZip/{pdfName}.pdf"
			Dim objectKeyForSearchablePdf As String = $"IronPdfZip/{pdfName}-SearchablePdf.pdf"

			Try
				' Retrieve the PDF file from S3
				Dim pdfData = Await GetPdfFromS3Async(bucketName, objectKey)

				' Initialize IronTesseract for OCR processing
				Dim ironTesseract As New IronTesseract()
				Dim ocrInput As New OcrInput()
				ocrInput.LoadPdf(pdfData)
				Dim result As OcrResult = ironTesseract.Read(ocrInput)

				' Log the OCR result
				context.Logger.LogLine($"OCR result: {result.Text}")

				' Upload the searchable PDF to S3
				Await UploadPdfToS3Async(bucketName, objectKeyForSearchablePdf, result.SaveAsSearchablePdfBytes())
				context.Logger.LogLine($"PDF uploaded successfully to {bucketName}/{objectKeyForSearchablePdf}")
			Catch e As Exception
				context.Logger.LogLine($"[ERROR] FunctionHandler: {e.Message}")
			End Try
		End Function

		''' <summary>
		''' Retrieves a PDF from S3 and returns it as a byte array.
		''' </summary>
		Private Async Function GetPdfFromS3Async(ByVal bucketName As String, ByVal objectKey As String) As Task(Of Byte())
			Dim request = New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = objectKey
			}

			Using response = Await _s3Client.GetObjectAsync(request)
			Using memoryStream As New MemoryStream()
				Await response.ResponseStream.CopyToAsync(memoryStream)
				Return memoryStream.ToArray()
			End Using
			End Using
		End Function

		''' <summary>
		''' Uploads the generated searchable PDF back to S3.
		''' </summary>
		Private Async Function UploadPdfToS3Async(ByVal bucketName As String, ByVal objectKey As String, ByVal pdfBytes() As Byte) As Task
			Using memoryStream As New MemoryStream(pdfBytes)
				Dim request = New PutObjectRequest With {
					.BucketName = bucketName,
					.Key = objectKey,
					.InputStream = memoryStream,
					.ContentType = "application/pdf"
				}

				Await _s3Client.PutObjectAsync(request)
			End Using
		End Function
	End Class
End Namespace

Before the try block, the handler specifies 'sample.pdf' under the IronPdfZip prefix as the file to read. The GetPdfFromS3Async method then retrieves the PDF as a byte array, which is passed to the LoadPdf method.
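
The handler above hard-codes the bucket name and object keys. As a rough sketch of one alternative, the values could be read from environment variables and the output key derived from the input key. Everything in this block is an assumption made for illustration: the OcrSettings class, its two methods, and the OCR_BUCKET variable name are hypothetical and are not part of IronOCR or the AWS SDK.

```csharp
using System;

public static class OcrSettings
{
    // Hypothetical helper: returns the environment variable's value,
    // or the fallback when the variable is unset or blank.
    public static string GetSetting(string name, string fallback)
    {
        var value = Environment.GetEnvironmentVariable(name);
        return string.IsNullOrWhiteSpace(value) ? fallback : value;
    }

    // Hypothetical helper: inserts a suffix before the file extension,
    // e.g. "IronPdfZip/sample.pdf" becomes "IronPdfZip/sample-SearchablePdf.pdf".
    public static string ToSearchableKey(string objectKey, string suffix = "-SearchablePdf")
    {
        int dot = objectKey.LastIndexOf('.');
        int slash = objectKey.LastIndexOf('/');

        // Treat the dot as an extension separator only when it sits inside
        // the final path segment and is not that segment's first character.
        if (dot <= slash + 1)
        {
            return objectKey + suffix;
        }

        return objectKey.Substring(0, dot) + suffix + objectKey.Substring(dot);
    }
}
```

In the handler, this might look like string bucketName = OcrSettings.GetSetting("OCR_BUCKET", "deploymenttestbucket"); followed by string objectKeyForSearchablePdf = OcrSettings.ToSearchableKey(objectKey);. Keeping these values out of the code means the same container image can be pointed at a different bucket without being rebuilt.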

Increase Memory and Timeout

The amount of memory to allocate to the Lambda function depends on the size of the documents being processed and on how many are processed at the same time. As a baseline, set the memory to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json.

{
    "function-memory-size": 512,
    "function-timeout": 300
}

When the memory is insufficient, the program will throw the error: 'Runtime exited with error: signal: killed.' Increasing the memory size can resolve this issue. For more details, refer to the troubleshooting article: AWS Lambda - Runtime Exited Signal: Killed.
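
When tuning the memory size, it can be useful to log how much memory an invocation actually used. The sketch below is one possible approach that relies only on System.Diagnostics. The MemoryReport class and its methods are hypothetical names introduced for illustration, and the figure it reports is the .NET process's peak working set, which may differ slightly from the "Max Memory Used" value in Lambda's own REPORT log line.

```csharp
using System;
using System.Diagnostics;

public static class MemoryReport
{
    // Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes), rounding up.
    public static long ToMegabytes(long bytes) =>
        (bytes + 1024 * 1024 - 1) / (1024 * 1024);

    // Formats a log line comparing peak usage with the configured limit.
    public static string Describe(long peakBytes, int limitMb) =>
        $"Peak working set: {ToMegabytes(peakBytes)} MB of {limitMb} MB configured";

    // Reads the current process's peak working set and formats it.
    public static string DescribeCurrentProcess(int limitMb)
    {
        using var process = Process.GetCurrentProcess();
        return Describe(process.PeakWorkingSet64, limitMb);
    }
}
```

At the end of the handler, a call such as context.Logger.LogLine(MemoryReport.DescribeCurrentProcess(context.MemoryLimitInMB)); would record the peak next to the configured limit. If the peak is regularly close to the limit, increasing function-memory-size is a reasonable next step.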

Publish

To publish in Visual Studio, right-click on the project and select 'Publish to AWS Lambda...', then configure the necessary settings. You can read more about publishing a Lambda on the AWS website.

Try It Out!

You can activate the Lambda function either through the Lambda console or through Visual Studio.

Frequently Asked Questions

What is this library?

IronOCR is a C# OCR (Optical Character Recognition) library that allows developers to read text from images and PDF documents.

How can I integrate a C# OCR library with AWS Lambda?

To integrate IronOCR with AWS Lambda, you need to download the IronOCR library, create an AWS Lambda project using Visual Studio, modify the FunctionHandler code, and deploy the function.

What are the prerequisites for using a C# OCR library on AWS Lambda?

You need the AWS SDK for S3 (the AWSSDK.S3 package), the IronOCR library, and the AWS Toolkit for Visual Studio. You also need an S3 bucket configured to store and retrieve documents.

How do I set up the AWS Lambda function handler for OCR?

The function handler code initializes the S3 client, retrieves a PDF from S3, processes it with IronOCR, and uploads a searchable PDF back to S3.

What memory and timeout settings are recommended for AWS Lambda using a C# OCR library?

A baseline memory setting of 512 MB and a timeout of 300 seconds are recommended. Adjust these settings based on the size and quantity of documents processed.

How do I publish an AWS Lambda function using Visual Studio?

Right-click on the project in Visual Studio and select 'Publish to AWS Lambda...', then configure the necessary settings. For detailed guidance, refer to the AWS documentation.

What should I do if I encounter a 'Runtime exited with error: signal: killed' error?

This error is typically due to insufficient memory allocation. Increasing the memory size in the Lambda function settings can resolve it.

Do I need additional dependencies for using a C# OCR library in .NET 8 on AWS Lambda?

No additional dependencies are required for using IronOCR in .NET 8 on AWS Lambda, apart from the necessary AWS SDK packages.

Can I test the AWS Lambda function locally?

Yes, you can test AWS Lambda functions locally using the AWS Toolkit for Visual Studio, which provides a local execution environment.

What is the role of the Dockerfile in the AWS Lambda project?

The Dockerfile is used to create a container image for the AWS Lambda function, allowing you to define the environment and dependencies needed for your application.

Chaknith Bin

Software Engineer

Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.
