Reading Documents on AWS Lambda with IronOCR

This how-to guide walks you through setting up an AWS Lambda function with IronOCR. By the end, you will know how to configure IronOCR and read documents from an S3 bucket.

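1. Create an AWS Lambda with a Container Template

IronOCR needs certain dependencies installed on the machine in order to work. To make sure they are installed, the Lambda function must be containerized.

Creating a containerized AWS Lambda in Visual Studio is straightforward: install the AWS Toolkit for Visual Studio, choose the AWS Lambda C# template, select the Container Image blueprint, and then choose "Finish".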

2. Add Package Dependencies

Modify the project's Dockerfile as follows:

# Set this to dotnet:5 or dotnet:6 for .NET 5/6
FROM public.ecr.aws/lambda/dotnet:7

WORKDIR /var/task

RUN yum update -y

RUN yum install -y amazon-linux-extras

RUN amazon-linux-extras install epel -y

RUN yum install -y libgdiplus

COPY "bin/Release/lambda-publish" .

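3. Install the IronOcr and IronOcr.Linux NuGet Packages

To install the IronOcr and IronOcr.Linux packages in Visual Studio:

  1. Go to Project > Manage NuGet Packages...

  2. Select Browse, then search for IronOcr and IronOcr.Linux

  3. Select each package and install it.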

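4. Modify the FunctionHandler Code

This example retrieves an image from an S3 bucket and reads it. It uses the SixLabors Image class to load the image into IronOCR. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.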

using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
	private readonly IAmazonS3 _s3Client;
	private readonly string accessKey;
	private readonly string secretKey;

	public Function()
	{
		// Placeholder credentials: replace these with your own values.
		accessKey = "ACCESS-KEY";
		secretKey = "SECRET-KEY";
		_s3Client = new AmazonS3Client(accessKey, secretKey);
	}
	public async Task<string> FunctionHandler(string input, ILambdaContext context)
	{
		IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
		string bucketName = "S3-BUCKET-NAME";

		var getObjectRequest = new GetObjectRequest
		{
			BucketName = bucketName,
			Key = input,
		};

		using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
		{
			// Read the content of the object
			using (Stream responseStream = response.ResponseStream)
			{
				Console.WriteLine("Reading image from S3");
				Image image = Image.Load(responseStream);

				Console.WriteLine("Reading image with IronOCR");
				IronTesseract ironTesseract = new IronTesseract();
				OcrInput ocrInput = new OcrInput(image);
				OcrResult result = ironTesseract.Read(ocrInput);

				return result.Text;
			}
		}
	}
}

The same handler in VB.NET:

Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda

	Public Class [Function]
		Private ReadOnly _s3Client As IAmazonS3
		Private ReadOnly accessKey As String
		Private ReadOnly secretKey As String

		Public Sub New()
			accessKey = "ACCESS-KEY"
			secretKey = "SECRET-KEY"
			_s3Client = New AmazonS3Client(accessKey, secretKey)
		End Sub
		Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
			IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
			Dim bucketName As String = "S3-BUCKET-NAME"

			Dim getObjectRequest As New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = input
			}

			Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
				' Read the content of the object
				Using responseStream As Stream = response.ResponseStream
					Console.WriteLine("Reading image from S3")
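					' Load with ImageSharp (System.Drawing.Image has no Load method); fully qualified because VB is case-insensitive.
					Dim image As Image = SixLabors.ImageSharp.Image.Load(responseStream)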

					Console.WriteLine("Reading image with IronOCR")
					Dim ironTesseract As New IronTesseract()
					Dim ocrInput As New OcrInput(image)
					Dim result As OcrResult = ironTesseract.Read(ocrInput)

					Return result.Text
				End Using
			End Using
		End Function
	End Class
End Namespace

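5. Increase Memory and Timeout

The amount of memory to allocate to the Lambda depends on the size of the documents being read and on how many are read at once. As a baseline, set the memory size to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json.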

"function-memory-size" : 512,

"function-timeout" : 300
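
For context, here is a rough sketch of how these two settings might sit inside a complete aws-lambda-tools-defaults.json for a container-image Lambda. Only the memory and timeout values come from this guide. The other keys are what the Container Image blueprint typically generates, and the "image-command" value assumes the assembly is named IronOcrAWSLambda like the namespace used earlier, so adjust it to match your project.

```json
{
  "profile": "",
  "region": "",
  "configuration": "Release",
  "package-type": "image",
  "function-memory-size": 512,
  "function-timeout": 300,
  "image-command": "IronOcrAWSLambda::IronOcrAWSLambda.Function::FunctionHandler",
  "docker-host-build-output-dir": "./bin/Release/lambda-publish"
}
```

The "docker-host-build-output-dir" path matches the directory that the Dockerfile's COPY instruction reads from.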

6. Publish

To publish from Visual Studio, right-click the project, select Publish to AWS Lambda..., and then set up the configuration. You can read more about publishing to Lambda on the AWS website.

7. Try It Out!

You can invoke the Lambda function from the Lambda console or from Visual Studio.
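
If you would rather invoke the function from code, the sketch below shows one possible way to do it from a small .NET console app using the AWSSDK.Lambda NuGet package. It is not part of the original guide. The function name and object key are hypothetical placeholders, and the client relies on the SDK's default credential and region resolution.

```csharp
using System;
using System.IO;
using System.Text.Json;
using Amazon.Lambda;
using Amazon.Lambda.Model;

// Hypothetical values: use your deployed function name and an image key in your bucket.
const string functionName = "IronOcrAWSLambda";
const string objectKey = "sample.png";

// Credentials and region come from the SDK's default resolution chain.
using var lambdaClient = new AmazonLambdaClient();

var request = new InvokeRequest
{
    FunctionName = functionName,
    // FunctionHandler takes a string, so the payload must be a JSON string literal.
    Payload = JsonSerializer.Serialize(objectKey)
};

InvokeResponse response = await lambdaClient.InvokeAsync(request);

// The response payload is a stream containing JSON.
using var reader = new StreamReader(response.Payload);
string json = await reader.ReadToEndAsync();

if (!string.IsNullOrEmpty(response.FunctionError))
{
    // On failure the payload holds the error details instead of the OCR text.
    Console.Error.WriteLine($"Lambda reported an error: {json}");
    return;
}

// The handler's string return value comes back JSON-encoded.
Console.WriteLine(JsonSerializer.Deserialize<string>(json));
```

The input string is passed straight through as the S3 object key, so it must match the key of an image that already exists in the bucket configured in FunctionHandler.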