Read Documents on AWS Lambda with IronOCR


This article walks you through setting up an AWS Lambda function with IronOCR. By the end of this guide, you will know how to set up IronOCR and read documents from an S3 bucket.

1. Create an AWS Lambda Function from a Container Template

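IronOCR needs certain system dependencies installed on the machine to run correctly. To guarantee that those dependencies are present, the Lambda function must be containerized.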

Creating a containerized AWS Lambda function in Visual Studio is straightforward. Install the AWS Toolkit for Visual Studio, select an AWS Lambda C# template, choose the Container Image blueprint, and click Finish.

2. Add Package Dependencies

Modify the project's Dockerfile as follows:


# For .NET 5/6, set this to dotnet:5 or dotnet:6
FROM public.ecr.aws/lambda/dotnet:7

WORKDIR /var/task

RUN yum update -y
RUN yum install -y amazon-linux-extras
# EPEL provides the libgdiplus package (the GDI+ implementation for Linux)
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus

COPY "bin/Release/lambda-publish" .

3. Install the IronOcr and IronOcr.Linux NuGet Packages

To install the IronOcr and IronOcr.Linux packages in Visual Studio:

1. Go to Project > Manage NuGet Packages...

2. Select Browse, then search for "IronOcr" and "IronOcr.Linux".

3. Select the packages and install them.

4. Modify the FunctionHandler Code

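This example fetches an image from an S3 bucket and reads it. It uses the Image class from SixLabors.ImageSharp to load the picture into IronOCR. For the example to work, you must set up an S3 bucket and install the SixLabors.ImageSharp NuGet package.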

using System;
using System.IO;
using System.Threading.Tasks;
using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
	private readonly IAmazonS3 _s3Client;
	private readonly string accessKey;
	private readonly string secretKey;

	public Function()
	{
		// Placeholder credentials. Avoid hard-coding real keys: inside Lambda,
		// new AmazonS3Client() with no arguments uses the function's execution role.
		accessKey = "ACCESS-KEY";
		secretKey = "SECRET-KEY";
		_s3Client = new AmazonS3Client(accessKey, secretKey);
	}

	// input: the S3 object key of the image to read
	public async Task<string> FunctionHandler(string input, ILambdaContext context)
	{
		IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
		string bucketName = "S3-BUCKET-NAME";

		var getObjectRequest = new GetObjectRequest
		{
			BucketName = bucketName,
			Key = input,
		};

		using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
		{
			// Read the content of the object
			using (Stream responseStream = response.ResponseStream)
			{
				Console.WriteLine("Reading image from S3");
				// Dispose the decoded image and the OCR input once the text has been read
				using Image image = Image.Load(responseStream);

				Console.WriteLine("Reading image with IronOCR");
				IronTesseract ironTesseract = new IronTesseract();
				using OcrInput ocrInput = new OcrInput(image);
				OcrResult result = ironTesseract.Read(ocrInput);

				return result.Text;
			}
		}
	}
}
VB.NET equivalent:

Imports System.IO
Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda

	Public Class [Function]
		Private ReadOnly _s3Client As IAmazonS3
		Private ReadOnly accessKey As String
		Private ReadOnly secretKey As String

		Public Sub New()
			' Placeholder credentials. Avoid hard-coding real keys: inside Lambda,
			' New AmazonS3Client() with no arguments uses the function's execution role.
			accessKey = "ACCESS-KEY"
			secretKey = "SECRET-KEY"
			_s3Client = New AmazonS3Client(accessKey, secretKey)
		End Sub

		' input: the S3 object key of the image to read
		Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
			IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
			Dim bucketName As String = "S3-BUCKET-NAME"

			Dim getObjectRequest As New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = input
			}

			Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
				' Read the content of the object
				Using responseStream As Stream = response.ResponseStream
					Console.WriteLine("Reading image from S3")
					' Fully qualified because VB names are case-insensitive (image vs. Image)
					Using image As Image = SixLabors.ImageSharp.Image.Load(responseStream)
						Console.WriteLine("Reading image with IronOCR")
						Dim ironTesseract As New IronTesseract()
						Using ocrInput As New OcrInput(image)
							Dim result As OcrResult = ironTesseract.Read(ocrInput)
							Return result.Text
						End Using
					End Using
				End Using
			End Using
		End Function
	End Class
End Namespace
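The handler above hard-codes the bucket name and license key. You could instead read both values from the function's environment variables, which you can set in the Lambda console. This is a minimal sketch that uses only the .NET base library. The variable names S3_BUCKET_NAME and IRONOCR_LICENSE_KEY and the OcrSettings type are hypothetical choices for illustration, not IronOCR or AWS conventions.

```csharp
using System;

// Hypothetical settings holder. The environment variable names are assumptions
// chosen for this sketch; they are not defined by IronOCR or AWS.
public sealed record OcrSettings(string BucketName, string LicenseKey)
{
    public const string BucketVariable = "S3_BUCKET_NAME";
    public const string LicenseVariable = "IRONOCR_LICENSE_KEY";

    // Reads both values from the process environment. It fails fast if either is
    // missing, so a misconfigured function reports a clear error instead of a
    // confusing S3 or licensing failure later in the invocation.
    public static OcrSettings FromEnvironment()
    {
        return new OcrSettings(Require(BucketVariable), Require(LicenseVariable));
    }

    private static string Require(string name)
    {
        string? value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        }
        return value;
    }
}
```

Inside FunctionHandler, you would call `var settings = OcrSettings.FromEnvironment();` and use `settings.LicenseKey` and `settings.BucketName` in place of the placeholder strings. For the S3 credentials themselves, a parameterless `new AmazonS3Client()` running inside Lambda picks up the execution role, so no keys need to appear in code.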

5. Increase Memory and Timeout

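The amount of memory to allocate in Lambda depends on the size of the documents being read and on how many are read at once. As a starting point, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json: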


"function-memory-size" : 512,
"function-timeout" : 300
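Lambda stops the invocation when the timeout expires, even if OCR is still running. If a single invocation reads several images, one option is to check the remaining time before starting each read. The TimeBudget class in this sketch is hypothetical and not part of IronOCR or the Lambda SDK. Inside FunctionHandler you would pass `context.RemainingTime`. The per-read estimate and the safety margin are assumptions that you would measure for your own documents.

```csharp
using System;

// Hypothetical guard for reading several images within one Lambda invocation.
public static class TimeBudget
{
    // True when one more OCR read plus a safety margin (for example, time to
    // return partial results) still fits in the remaining time.
    public static bool CanStartAnotherRead(TimeSpan remaining, TimeSpan estimatedPerRead, TimeSpan safetyMargin)
    {
        return remaining > estimatedPerRead + safetyMargin;
    }

    // How many more reads fit in the remaining time after reserving the margin.
    // Never negative.
    public static int ReadsThatFit(TimeSpan remaining, TimeSpan estimatedPerRead, TimeSpan safetyMargin)
    {
        if (estimatedPerRead <= TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(nameof(estimatedPerRead), "Estimate must be positive.");
        }
        TimeSpan usable = remaining - safetyMargin;
        if (usable <= TimeSpan.Zero)
        {
            return 0;
        }
        return (int)(usable.Ticks / estimatedPerRead.Ticks);
    }
}
```

With the 300-second timeout, an assumed 20 seconds per image and a 10-second margin, `ReadsThatFit` allows 14 reads (290 usable seconds divided by 20, rounded down). A loop such as `while (TimeBudget.CanStartAnotherRead(context.RemainingTime, perRead, margin)) { ... }` then stops before the deadline instead of being cut off mid-read.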

6. Publish

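To publish from Visual Studio, right-click the solution, select "Publish to AWS Lambda...", and then set the configuration. For more information on publishing Lambda functions, see the AWS website.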

7. Try It Out!

You can invoke the Lambda function from the Lambda console or from Visual Studio.
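FunctionHandler takes a plain string (the S3 object key). The test event must therefore be a JSON string literal, such as "receipt.png" with the quotes included, rather than a JSON object. The System.Text.Json serializer registered in the example deserializes that literal into `input`. If you build the payload in your own code, this sketch shows the idea. The TestEvent class and the key receipt.png are placeholders for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for building the request payload of a string-input handler.
public static class TestEvent
{
    // Serializing a .NET string yields a quoted JSON string literal,
    // which is what a string-typed Lambda handler expects as its event.
    public static string ForObjectKey(string objectKey)
    {
        if (string.IsNullOrWhiteSpace(objectKey))
        {
            throw new ArgumentException("An S3 object key is required.", nameof(objectKey));
        }
        return JsonSerializer.Serialize(objectKey);
    }

    // Reads the payload back the way a System.Text.Json-based serializer would.
    public static string? ReadObjectKey(string payload) => JsonSerializer.Deserialize<string>(payload);
}
```

The resulting string is what you would paste into a Lambda console test event. If you use the AWS SDK's Lambda client (AWSSDK.Lambda), it is also what you would assign to the `Payload` property of an `InvokeRequest`.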