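Read Documents on AWS Lambda with IronOCR
This article walks you through setting up an AWS Lambda function with IronOCR. By the end of the guide, you will know how to set up IronOCR and read documents from an S3 bucket.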
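1. Create an AWS Lambda with a Container Template
IronOCR needs certain dependencies installed on the machine in order to run correctly. To make sure those dependencies are present, the Lambda function must be containerized.
Creating a containerized AWS Lambda with Visual Studio is straightforward: install the AWS Toolkit for Visual Studio, choose the AWS Lambda C# template, select the Container Image blueprint, and then select "Finish".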
2. Add Package Dependencies
Modify the project's Dockerfile as follows:
# For .NET 5/6, set this to dotnet:5 or dotnet:6
FROM public.ecr.aws/lambda/dotnet:7
WORKDIR /var/task
RUN yum update -y
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus
COPY "bin/Release/lambda-publish" .
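3. Install the IronOcr and IronOcr.Linux NuGet Packages
In Visual Studio, install the IronOcr and IronOcr.Linux packages:
1. Go to Project > Manage NuGet Packages...
2. Select "Browse", then search for "IronOcr" and "IronOcr.Linux".
3. Select the packages and install them.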
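4. Modify the FunctionHandler Code
This example retrieves an image from an S3 bucket and reads it. It uses the Image class from SixLabors to load the image into IronOCR. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.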
using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
    private readonly IAmazonS3 _s3Client;
    private readonly string accessKey;
    private readonly string secretKey;

    public Function()
    {
        accessKey = "ACCESS-KEY";
        secretKey = "SECRET-KEY";
        _s3Client = new AmazonS3Client(accessKey, secretKey);
    }

    public async Task<string> FunctionHandler(string input, ILambdaContext context)
    {
        IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";

        string bucketName = "S3-BUCKET-NAME";
        var getObjectRequest = new GetObjectRequest
        {
            BucketName = bucketName,
            Key = input,
        };

        using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
        {
            // Read the content of the object
            using (Stream responseStream = response.ResponseStream)
            {
                Console.WriteLine("Reading image from S3");
                Image image = Image.Load(responseStream);

                Console.WriteLine("Reading image with IronOCR");
                IronTesseract ironTesseract = new IronTesseract();
                OcrInput ocrInput = new OcrInput(image);
                OcrResult result = ironTesseract.Read(ocrInput);

                return result.Text;
            }
        }
    }
}
Imports System.IO
Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda
    Public Class [Function]
        Private ReadOnly _s3Client As IAmazonS3
        Private ReadOnly accessKey As String
        Private ReadOnly secretKey As String

        Public Sub New()
            accessKey = "ACCESS-KEY"
            secretKey = "SECRET-KEY"
            _s3Client = New AmazonS3Client(accessKey, secretKey)
        End Sub

        Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
            IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"

            Dim bucketName As String = "S3-BUCKET-NAME"
            Dim getObjectRequest As New GetObjectRequest With {
                .BucketName = bucketName,
                .Key = input
            }

            Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
                ' Read the content of the object
                Using responseStream As Stream = response.ResponseStream
                    Console.WriteLine("Reading image from S3")
                    ' Load with SixLabors.ImageSharp (not System.Drawing), matching the C# sample
                    Dim inputImage As Image = Image.Load(responseStream)

                    Console.WriteLine("Reading image with IronOCR")
                    Dim ironTesseract As New IronTesseract()
                    Dim ocrInput As New OcrInput(inputImage)
                    Dim result As OcrResult = ironTesseract.Read(ocrInput)

                    Return result.Text
                End Using
            End Using
        End Function
    End Class
End Namespace
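5. Increase Memory and Timeout
The amount of memory to allocate in Lambda depends on the size of the documents being read and on how many are read at once. As a baseline, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json:
"function-memory-size" : 512,
"function-timeout" : 300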
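6. Publish
To publish from Visual Studio, right-click the solution and select "Publish to AWS Lambda...", then set the configuration. For more information on publishing a Lambda, see the AWS website.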
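The publish wizard takes its default values from aws-lambda-tools-defaults.json, so it can help to see the memory and timeout settings in the context of the whole file. The following is only a sketch of what that file might look like for a container-image Lambda. The profile name, the region, and the assembly name in "image-command" are placeholder assumptions, and the remaining key names are given as the container template typically generates them, so check them against the file in your own project.

```json
{
  "profile": "default",
  "region": "us-east-1",
  "configuration": "Release",
  "package-type": "image",
  "function-memory-size": 512,
  "function-timeout": 300,
  "image-command": "IronOcrAWSLambda::IronOcrAWSLambda.Function::FunctionHandler",
  "docker-host-build-output-dir": "./bin/Release/lambda-publish"
}
```

The output directory here matches the path that the Dockerfile copies from (bin/Release/lambda-publish), which is why the two need to stay in sync if either one is changed.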
7. Try It Out!
You can invoke the Lambda function either through the Lambda console or through Visual Studio.
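Because FunctionHandler takes a single string and uses it as the S3 object key, the test event should be a plain JSON string, not a JSON object. A minimal test payload, assuming a hypothetical object named sample-document.png exists in the bucket, might be:

```json
"sample-document.png"
```

The quotes are part of the payload: the JSON serializer configured in the sample expects a valid JSON value, so an unquoted key would likely fail to deserialize. On success, the function returns the recognized text as its result.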