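Reading Documents on AWS Lambda with IronOCR

This article walks you through setting up an AWS Lambda function with IronOCR. By the end, you will know how to configure IronOCR and read documents from an S3 bucket.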

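1. Create an AWS Lambda with a container template

IronOcr depends on certain native libraries being installed on the machine. To guarantee that these dependencies are present, the Lambda function must be containerized.

With Visual Studio, creating a containerized AWS Lambda is straightforward: install the AWS Toolkit for Visual Studio, select the AWS Lambda C# template, choose the Container Image blueprint, and then select "Finish".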


2. Add package dependencies

Modify the project's Dockerfile as follows:

# For .NET 5/6, set this to dotnet:5 or dotnet:6
FROM public.ecr.aws/lambda/dotnet:7

WORKDIR /var/task

# Install libgdiplus (from the EPEL repository)
RUN yum update -y
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus

COPY "bin/Release/lambda-publish" .

3. Install the IronOcr and IronOcr.Linux NuGet packages

To install the IronOcr and IronOcr.Linux packages in Visual Studio:

1. Go to Project > Manage NuGet Packages...

2. Select "Browse", then search for and install "IronOcr" and "IronOcr.Linux".


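4. Modify the FunctionHandler code

This example retrieves an image from an S3 bucket and reads it. It uses the Image class from SixLabors.ImageSharp to load the image into IronOcr. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.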

using System;
using System.IO;
using System.Threading.Tasks;
using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
	private readonly IAmazonS3 _s3Client;
	private readonly string accessKey;
	private readonly string secretKey;

	public Function()
	{
		// Placeholders: replace with your own credentials
		accessKey = "ACCESS-KEY";
		secretKey = "SECRET-KEY";
		_s3Client = new AmazonS3Client(accessKey, secretKey);
	}

	// "input" is the key of the S3 object to read
	public async Task<string> FunctionHandler(string input, ILambdaContext context)
	{
		IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
		string bucketName = "S3-BUCKET-NAME";

		var getObjectRequest = new GetObjectRequest
		{
			BucketName = bucketName,
			Key = input,
		};

		using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
		{
			// Read the content of the object
			using (Stream responseStream = response.ResponseStream)
			{
				Console.WriteLine("Reading image from S3");
				Image image = Image.Load(responseStream);

				Console.WriteLine("Reading image with IronOCR");
				IronTesseract ironTesseract = new IronTesseract();
				OcrInput ocrInput = new OcrInput(image);
				OcrResult result = ironTesseract.Read(ocrInput);

				return result.Text;
			}
		}
	}
}

The equivalent VB.NET code:

Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda

	Public Class [Function]
		Private ReadOnly _s3Client As IAmazonS3
		Private ReadOnly accessKey As String
		Private ReadOnly secretKey As String

		Public Sub New()
			' Placeholders: replace with your own credentials
			accessKey = "ACCESS-KEY"
			secretKey = "SECRET-KEY"
			_s3Client = New AmazonS3Client(accessKey, secretKey)
		End Sub

		' "input" is the key of the S3 object to read
		Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
			IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
			Dim bucketName As String = "S3-BUCKET-NAME"

			Dim getObjectRequest As New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = input
			}

			Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
				' Read the content of the object
				Using responseStream As Stream = response.ResponseStream
					Console.WriteLine("Reading image from S3")
					' Fully qualified so the ImageSharp loader is used (not System.Drawing)
					Dim s3Image As Image = SixLabors.ImageSharp.Image.Load(responseStream)

					Console.WriteLine("Reading image with IronOCR")
					Dim ironTesseract As New IronTesseract()
					Dim ocrInput As New OcrInput(s3Image)
					Dim result As OcrResult = ironTesseract.Read(ocrInput)

					Return result.Text
				End Using
			End Using
		End Function
	End Class
End Namespace

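
The handler above passes hardcoded access keys and a hardcoded bucket name. As an optional variation, the following minimal sketch shows how the constructor might instead rely on the AWS SDK's default credential resolution, which inside Lambda typically resolves to the function's execution role. It assumes that role has been granted s3:GetObject on the bucket. The class name FunctionWithRoleCredentials and the environment variable OCR_BUCKET_NAME are hypothetical names chosen for illustration; they are not part of the original example.

```csharp
using System;
using Amazon.S3;

namespace IronOcrAWSLambda;

// Hypothetical variant of the Function class, shown only to illustrate
// credential and bucket configuration without hardcoded values.
public class FunctionWithRoleCredentials
{
	private readonly IAmazonS3 _s3Client;
	private readonly string _bucketName;

	public FunctionWithRoleCredentials()
	{
		// No keys are passed: the SDK resolves credentials through its
		// default chain (assumed here to be the Lambda execution role).
		_s3Client = new AmazonS3Client();

		// OCR_BUCKET_NAME is an assumed environment variable name that
		// would be set in the Lambda function's configuration.
		_bucketName = Environment.GetEnvironmentVariable("OCR_BUCKET_NAME")
			?? throw new InvalidOperationException("OCR_BUCKET_NAME is not set.");
	}
}
```

The rest of the handler would stay the same, using _bucketName in place of the local bucketName string.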
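The memory a Lambda function needs depends on the size of the documents being read and how many are read at once. As a baseline, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json:

"function-memory-size" : 512,

"function-timeout" : 300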

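To judge whether the baseline memory and timeout values fit your documents, it can help to log what the function was configured with and how long a read took. The sketch below is one possible way to do that using properties exposed by ILambdaContext; the helper class LambdaDiagnostics and its method name are hypothetical and not part of the original example.

```csharp
using System.Diagnostics;
using Amazon.Lambda.Core;

// Hypothetical helper for illustration only.
public static class LambdaDiagnostics
{
	// Call at the end of FunctionHandler, passing a Stopwatch that was
	// started when the handler began.
	public static void LogResourceUsage(ILambdaContext context, Stopwatch stopwatch)
	{
		// Memory configured for the function (e.g. 512)
		context.Logger.LogLine($"Configured memory: {context.MemoryLimitInMB} MB");

		// Time spent in this invocation so far
		context.Logger.LogLine($"Elapsed: {stopwatch.Elapsed.TotalSeconds:F1} s");

		// Time left before Lambda terminates the invocation
		context.Logger.LogLine($"Remaining before timeout: {context.RemainingTime.TotalSeconds:F1} s");
	}
}
```

The memory actually consumed is not available from the context object; it appears as "Max Memory Used" in the REPORT line that Lambda writes to the function's CloudWatch log after each invocation.
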

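To publish from Visual Studio, right-click the project, select "Publish to AWS Lambda...", and then set the configuration. For more information on publishing a Lambda function, see the AWS website.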


You can invoke the Lambda function through the Lambda console or through Visual Studio.
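
Because FunctionHandler takes a plain string, the test event is a JSON string containing the S3 object key, for example "scans/invoice.png" including the quotes. As a third option besides the console and Visual Studio, the following sketch shows how the function could be invoked from code with the AWS SDK for .NET (AWSSDK.Lambda package). The function name IronOcrAWSLambda and the class InvokeExample are assumptions for illustration; use the name you chose when publishing.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.Lambda;
using Amazon.Lambda.Model;

// Hypothetical client-side helper for illustration only.
public static class InvokeExample
{
	public static async Task<string> ReadDocumentAsync(string objectKey)
	{
		// Credentials and region come from the SDK's default configuration
		using var client = new AmazonLambdaClient();

		var request = new InvokeRequest
		{
			FunctionName = "IronOcrAWSLambda", // assumed function name
			// The handler expects a JSON string, so the key is serialized with quotes
			Payload = JsonSerializer.Serialize(objectKey)
		};

		InvokeResponse response = await client.InvokeAsync(request);

		// The handler's string result comes back as a JSON string
		using var reader = new StreamReader(response.Payload);
		string json = await reader.ReadToEndAsync();
		return JsonSerializer.Deserialize<string>(json) ?? string.Empty;
	}
}
```

If the invocation fails inside the function, response.FunctionError is set and the payload holds the error details rather than the recognized text, so a production caller would want to check that property before deserializing.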