Reading Documents with IronOCR on AWS Lambda

This how-to guide walks you through setting up an AWS Lambda function with IronOCR. By the end, you will know how to set up IronOCR and read documents from an S3 bucket.

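1. Create an AWS Lambda with a container template

IronOCR needs certain dependencies installed on the machine in order to run. To make sure they are installed, the Lambda function must be containerized.

With Visual Studio, creating a containerized AWS Lambda is straightforward: install the AWS Toolkit for Visual Studio, select an AWS Lambda C# template, choose the Container Image blueprint, and then select "Finish".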

2. Add package dependencies

Modify the project's Dockerfile as follows:

# Set this to dotnet:5 or dotnet:6 for .NET 5/6
FROM public.ecr.aws/lambda/dotnet:7

WORKDIR /var/task

RUN yum update -y

RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y
RUN yum install -y libgdiplus

COPY "bin/Release/lambda-publish" .

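3. Install the IronOcr and IronOcr.Linux NuGet packages

To install the IronOcr and IronOcr.Linux packages in Visual Studio:

  1. Go to Project > Manage NuGet Packages...
  2. Select Browse, then search for IronOcr and IronOcr.Linux.
  3. Select each package and install it.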

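4. Modify the FunctionHandler code

This example retrieves an image from an S3 bucket and reads it. It uses the SixLabors Image class to load the image into IronOCR. For the example to work, an S3 bucket must be set up and the SixLabors.ImageSharp NuGet package must be installed.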

using Amazon;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using SixLabors.ImageSharp;
using IronOcr;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
	private readonly IAmazonS3 _s3Client;
	private readonly string accessKey;
	private readonly string secretKey;

	public Function()
	{
		accessKey = "ACCESS-KEY";
		secretKey = "SECRET-KEY";
		_s3Client = new AmazonS3Client(accessKey, secretKey);
	}
	public async Task<string> FunctionHandler(string input, ILambdaContext context)
	{
		IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY";
		string bucketName = "S3-BUCKET-NAME";

		var getObjectRequest = new GetObjectRequest
		{
			BucketName = bucketName,
			Key = input,
		};

		using (GetObjectResponse response = await _s3Client.GetObjectAsync(getObjectRequest))
		{
			// Read the content of the object
			using (Stream responseStream = response.ResponseStream)
			{
				Console.WriteLine("Reading image from S3");
				Image image = Image.Load(responseStream);

				Console.WriteLine("Reading image with IronOCR");
				IronTesseract ironTesseract = new IronTesseract();
				OcrInput ocrInput = new OcrInput(image);
				OcrResult result = ironTesseract.Read(ocrInput);

				return result.Text;
			}
		}
	}
}

VB.NET version:

Imports Amazon
Imports Amazon.Lambda.Core
Imports Amazon.S3
Imports Amazon.S3.Model
Imports SixLabors.ImageSharp
Imports IronOcr

' Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
<Assembly: LambdaSerializer(GetType(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))>

Namespace IronOcrAWSLambda

	Public Class [Function]
		Private ReadOnly _s3Client As IAmazonS3
		Private ReadOnly accessKey As String
		Private ReadOnly secretKey As String

		Public Sub New()
			accessKey = "ACCESS-KEY"
			secretKey = "SECRET-KEY"
			_s3Client = New AmazonS3Client(accessKey, secretKey)
		End Sub
		Public Async Function FunctionHandler(ByVal input As String, ByVal context As ILambdaContext) As Task(Of String)
			IronOcr.License.LicenseKey = "IRONOCR-LICENSE-KEY"
			Dim bucketName As String = "S3-BUCKET-NAME"

			Dim getObjectRequest As New GetObjectRequest With {
				.BucketName = bucketName,
				.Key = input
			}

			Using response As GetObjectResponse = Await _s3Client.GetObjectAsync(getObjectRequest)
				' Read the content of the object
				Using responseStream As Stream = response.ResponseStream
					Console.WriteLine("Reading image from S3")
					' Load the image with SixLabors.ImageSharp, matching the C# sample
					Dim image As Image = Image.Load(responseStream)

					Console.WriteLine("Reading image with IronOCR")
					Dim ironTesseract As New IronTesseract()
					Dim ocrInput As New OcrInput(image)
					Dim result As OcrResult = ironTesseract.Read(ocrInput)

					Return result.Text
				End Using
			End Using
		End Function
	End Class
End Namespace
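The sample above hard-codes placeholder credentials, a license key, and a bucket name. The following is a minimal sketch of one alternative, not part of the original sample. It assumes the Lambda's execution role has been granted read access to the bucket, and that two environment variables are configured on the function. The variable names `IRONOCR_LICENSE_KEY` and `OCR_BUCKET_NAME` and the `RequireEnv` helper are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Lambda.Core;
using Amazon.S3;
using Amazon.S3.Model;
using IronOcr;
using SixLabors.ImageSharp;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

namespace IronOcrAWSLambda;

public class Function
{
    // The parameterless constructor lets the AWS SDK resolve credentials from its
    // default chain; inside Lambda that is the function's execution role.
    private readonly IAmazonS3 _s3Client = new AmazonS3Client();

    public async Task<string> FunctionHandler(string input, ILambdaContext context)
    {
        // Hypothetical variable names (assumption); configure them on the Lambda function.
        IronOcr.License.LicenseKey = RequireEnv("IRONOCR_LICENSE_KEY");
        string bucketName = RequireEnv("OCR_BUCKET_NAME");

        // `input` is the S3 object key of the image to read.
        using GetObjectResponse response = await _s3Client.GetObjectAsync(bucketName, input);
        using Image image = Image.Load(response.ResponseStream);

        context.Logger.LogLine($"Running OCR on s3://{bucketName}/{input}");
        var ocrInput = new OcrInput(image);
        OcrResult result = new IronTesseract().Read(ocrInput);
        return result.Text;
    }

    // Illustrative helper: fail fast with a clear message when a setting is missing.
    private static string RequireEnv(string name) =>
        Environment.GetEnvironmentVariable(name)
        ?? throw new InvalidOperationException($"Environment variable {name} is not set.");
}
```

Keeping secrets out of the source means the same container image can be deployed against different buckets without a rebuild.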

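5. Increase memory and timeout

The amount of memory to allocate to the Lambda depends on the size of the documents being read and on how many are read at once. As a baseline, set the memory to 512 MB and the timeout to 300 seconds in aws-lambda-tools-defaults.json: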

"function-memory-size" : 512,
"function-timeout" : 300
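To judge whether the baseline values are enough for your documents, it can help to log the limits the function actually runs with. The sketch below is an illustration only; the `LambdaBudget` class and `LogBudget` method are hypothetical names, while `MemoryLimitInMB` and `RemainingTime` are properties of `ILambdaContext`.

```csharp
using Amazon.Lambda.Core;

public static class LambdaBudget
{
    // Call at the start of FunctionHandler, and again after OCR finishes,
    // to see how much of the configured timeout a document consumed.
    public static void LogBudget(ILambdaContext context)
    {
        context.Logger.LogLine($"Memory limit: {context.MemoryLimitInMB} MB");
        context.Logger.LogLine($"Time remaining: {context.RemainingTime.TotalSeconds:F0} s");
    }
}
```

If the remaining time is close to zero after a typical document, raising the timeout or the memory size is a reasonable next step.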

6. Publish

To publish from Visual Studio, right-click the solution, select Publish to AWS Lambda..., and then set the configuration. You can read more about publishing a Lambda on the AWS website.

7. Try it out!

You can invoke the Lambda function through the Lambda console or through Visual Studio.
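Because FunctionHandler takes a string, the test payload is a JSON string containing the S3 object key, for example "sample-image.png". As a sketch of a third option not covered by this guide, the function can also be invoked from a small console program using the AWS SDK for .NET (the AWSSDK.Lambda package). The function name and object key below are hypothetical placeholders.

```csharp
using System;
using System.IO;
using System.Text.Json;
using Amazon.Lambda;
using Amazon.Lambda.Model;

// Credentials and region come from the SDK's default configuration.
using var lambda = new AmazonLambdaClient();

var request = new InvokeRequest
{
    FunctionName = "IronOcrAWSLambda",                       // hypothetical function name
    Payload = JsonSerializer.Serialize("sample-image.png"),  // hypothetical key, sent as a JSON string
};

InvokeResponse response = await lambda.InvokeAsync(request);

// The response payload is a stream holding the JSON returned by the handler.
using var reader = new StreamReader(response.Payload);
string body = await reader.ReadToEndAsync();

if (!string.IsNullOrEmpty(response.FunctionError))
{
    // On failure the payload is an error object rather than the OCR text.
    Console.Error.WriteLine($"Lambda reported an error: {body}");
}
else
{
    // On success the payload is the recognized text as a JSON string.
    Console.WriteLine(JsonSerializer.Deserialize<string>(body));
}
```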