C# Read PDF Form Fields: Extract Form Data Programmatically

Kannapat Udonpant
Updated: December 19, 2025

IronPDF lets you extract data from PDF forms with simple C# code, programmatically reading text fields, checkboxes, radio buttons, and dropdown lists. This removes manual data entry and lets form processing workflows run automatically in seconds.

Working with PDF forms can be a real headache for developers. Whether you are processing job applications, survey responses, or insurance claims, copying form data by hand is slow and error-prone. With IronPDF, a few lines of code extract the field values from the interactive form fields of a PDF document, turning hours of work into seconds.

This article shows how to get every field of a simple form through the form object in C#. The sample code iterates over each field and extracts its value. There is no need to wrestle with a complex PDF viewer or to deal with hidden formatting problems. For DevOps engineers, IronPDF's container-friendly design means a form processing service can be deployed in Docker without managing complex native dependencies.

How do I get started with IronPDF?

Setting up IronPDF to extract PDF form fields requires very little configuration. Install the library through the NuGet Package Manager:

    Install-Package IronPdf

Alternatively, install it through the package manager UI in Visual Studio. IronPDF supports Windows, Linux, macOS, and Docker containers, so it fits a wide range of deployment scenarios. See the IronPDF documentation for detailed setup instructions.

For containerized deployments, IronPDF provides a streamlined Docker setup:

    FROM mcr.microsoft.com/dotnet/runtime:8.0 AS base
    WORKDIR /app

    # Install dependencies for IronPDF on Linux
    RUN apt-get update && apt-get install -y \
        libgdiplus \
        libc6-dev \
        && rm -rf /var/lib/apt/lists/*

    FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
    WORKDIR /src
    COPY ["YourProject.csproj", "."]
    RUN dotnet restore "YourProject.csproj"
    COPY . .
    RUN dotnet build "YourProject.csproj" -c Release -o /app/build

    FROM build AS publish
    RUN dotnet publish "YourProject.csproj" -c Release -o /app/publish

    FROM base AS final
    WORKDIR /app
    COPY --from=publish /app/publish .
    ENTRYPOINT ["dotnet", "YourProject.dll"]

How do I read PDF form data with IronPDF?
The following code shows how to use IronPDF to read every field in an existing PDF file:

    using IronPdf;
    using System;

    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF document containing interactive form fields
            PdfDocument pdf = PdfDocument.FromFile("application_form.pdf");

            // Access the form object and iterate through all fields
            var form = pdf.Form;
            foreach (var field in form)
            {
                Console.WriteLine($"Field Name: {field.Name}");
                Console.WriteLine($"Field Value: {field.Value}");
                Console.WriteLine($"Field Type: {field.GetType().Name}");
                Console.WriteLine("---");
            }
        }
    }

This code loads a PDF file that contains a simple form, iterates over each form field, and prints the field's name, value, and type. The PdfDocument.FromFile() method parses the PDF document, and the Form property gives access to all of its interactive form fields. Each field exposes properties specific to its field type, which makes precise data extraction possible. For more complex scenarios, consult the IronPDF API reference for advanced form manipulation methods.

Output

Figure: A split screen. On the left, a PDF job application form with its fields filled in; on the right, the Visual Studio debug console showing the extracted form field data.

What different form field types can I read?
PDF forms contain a variety of field types, and each needs its own handling. IronPDF identifies the field type automatically and provides access tailored to it:

    using IronPdf;
    using System.Collections.Generic;
    using System.Linq;

    PdfDocument pdf = PdfDocument.FromFile("complex_form.pdf");

    // Text fields - standard input boxes
    var nameField = pdf.Form.FindFormField("fullName");
    string userName = nameField.Value;

    // Checkboxes - binary selections
    var agreeCheckbox = pdf.Form.FindFormField("termsAccepted");
    bool isChecked = agreeCheckbox.Value == "Yes";

    // Radio buttons - single choice from group
    var genderRadio = pdf.Form.FindFormField("gender");
    string selectedGender = genderRadio.Value;

    // Dropdown lists (ComboBox) - predefined options
    var countryDropdown = pdf.Form.FindFormField("country");
    string selectedCountry = countryDropdown.Value;
    // Access all available options
    var availableCountries = countryDropdown.Choices;

    // Multi-line text areas
    var commentsField = pdf.Form.FindFormField("comments_part1_513");
    string userComments = commentsField.Value;

    // Grab all fields that start with "interests_"
    var interestFields = pdf.Form
        .Where(f => f.Name.StartsWith("interests_"));

    // Collect checked interests
    List<string> selectedInterests = new List<string>();
    foreach (var field in interestFields)
    {
        if (field.Value == "Yes") // checkboxes are "Yes" if checked
        {
            // Extract the interest name from the field name
            string interestName = field.Name.Replace("interests_", "");
            selectedInterests.Add(interestName);
        }
    }

The FindFormField() method gives direct access to a specific field by name, with no need to iterate over every form field. A checkbox returns "Yes" when it is checked, and a radio button group returns the selected value. Choice fields, such as dropdown lists and list boxes, provide both the field value and all available options through the Choices property. Together, these methods let developers access and extract data from complex interactive forms. When working with complex forms, consider using IronPDF's form editing features to fill in or modify field values programmatically before extraction.

Here you can see how IronPDF handles a more complex form and extracts the data from its field values:

Figure: On the left, a PDF registration form containing several field types (text fields, checkboxes, radio buttons, a dropdown list); on the right, the Visual Studio debug console showing the form field data extracted programmatically.

How do I process multiple survey forms?

Consider a scenario in which you need to process hundreds of PDF forms from a customer survey. The following code demonstrates batch processing with IronPDF:

    using IronPdf;
    using System;
    using System.IO;
    using System.Linq;
    using System.Text;

    public class SurveyProcessor
    {
        static void Main(string[] args)
        {
            ProcessSurveyBatch(@"C:\Surveys");
        }

        public static void ProcessSurveyBatch(string folderPath)
        {
            StringBuilder csvData = new StringBuilder();
            csvData.AppendLine("Date,Name,Email,Rating,Feedback");

            foreach (string pdfFile in Directory.GetFiles(folderPath, "*.pdf"))
            {
                try
                {
                    PdfDocument survey = PdfDocument.FromFile(pdfFile);

                    // Fall back to an empty string when a field is missing
                    string date = survey.Form.FindFormField("surveyDate")?.Value ?? "";
                    string name = survey.Form.FindFormField("customerName")?.Value ?? "";
                    string email = survey.Form.FindFormField("email")?.Value ?? "";
                    string rating = survey.Form.FindFormField("satisfaction")?.Value ?? "";
                    string feedback = survey.Form.FindFormField("comments")?.Value ?? "";

                    // Quote every value so commas, quotes, or line breaks cannot break the row
                    var row = new[] { date, name, email, rating, feedback }.Select(CsvField);
                    csvData.AppendLine(string.Join(",", row));
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Error processing {pdfFile}: {ex.Message}");
                }
            }

            File.WriteAllText("survey_results.csv", csvData.ToString());
            Console.WriteLine("Survey processing complete!");
        }

        // Flattens line breaks, doubles embedded quotes, and wraps the value in quotes
        private static string CsvField(string value)
        {
            string flat = value.Replace("\r\n", " ").Replace("\n", " ").Replace("\r", " ");
            return "\"" + flat.Replace("\"", "\"\"") + "\"";
        }
    }

This batch processor reads every PDF survey form in a directory, extracts the relevant field data, and exports the results to a CSV file. The null-coalescing operator (??) supplies a default value for missing fields, so extraction stays reliable even when a form is incomplete. Every value is quoted and escaped before it is written, so a comma or quotation mark inside a field does not corrupt the CSV. Error handling catches problematic PDF files without interrupting the batch.

How do I build a scalable form processing service?
For DevOps engineers who want to deploy form processing at scale, here is a starting point for an API service that handles PDF form extraction:

    using IronPdf;
    using Microsoft.AspNetCore.Http;
    using Microsoft.AspNetCore.Mvc;
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    [ApiController]
    [Route("api/[controller]")]
    public class FormProcessorController : ControllerBase
    {
        private static readonly ConcurrentDictionary<string, ProcessingStatus> _processingJobs = new();

        [HttpPost("extract")]
        public async Task<IActionResult> ExtractFormData(IFormFile pdfFile)
        {
            if (pdfFile == null || pdfFile.Length == 0)
                return BadRequest("No file uploaded");

            // Buffer the upload first: the IFormFile is only valid while the request is alive
            var stream = new MemoryStream();
            await pdfFile.CopyToAsync(stream);
            stream.Position = 0; // rewind so the PDF is read from the start

            var jobId = Guid.NewGuid().ToString();
            _processingJobs[jobId] = new ProcessingStatus { Status = "Processing" };

            // Process in the background so the request is not blocked
            _ = Task.Run(() =>
            {
                try
                {
                    using (stream)
                    {
                        var pdf = PdfDocument.FromStream(stream);
                        var extractedData = new Dictionary<string, string>();
                        foreach (var field in pdf.Form)
                        {
                            extractedData[field.Name] = field.Value;
                        }

                        _processingJobs[jobId] = new ProcessingStatus
                        {
                            Status = "Complete",
                            Data = extractedData
                        };
                    }
                }
                catch (Exception ex)
                {
                    _processingJobs[jobId] = new ProcessingStatus
                    {
                        Status = "Error",
                        Error = ex.Message
                    };
                }
            });

            return Accepted(new { jobId });
        }

        [HttpGet("status/{jobId}")]
        public IActionResult GetStatus(string jobId)
        {
            if (_processingJobs.TryGetValue(jobId, out var status))
                return Ok(status);
            return NotFound();
        }

        [HttpGet("health")]
        public IActionResult HealthCheck()
        {
            return Ok(new
            {
                status = "healthy",
                activeJobs = _processingJobs.Count(j => j.Value.Status == "Processing"),
                completedJobs = _processingJobs.Count(j => j.Value.Status == "Complete")
            });
        }
    }

    public class ProcessingStatus
    {
        public string Status { get; set; }
        public Dictionary<string, string> Data { get; set; }
        public string Error { get; set; }
    }

This API service provides asynchronous form processing with job tracking, which suits a microservice architecture. Job state is held in an in-memory dictionary, so it is lost on restart and is not shared between replicas. The /health endpoint lets container orchestrators such as Kubernetes monitor the health of the service. Deploy the service with Docker Compose:

    version: '3.8'
    services:
      form-processor:
        build: .
        ports:
          - "8080:80"
        environment:
          - ASPNETCORE_ENVIRONMENT=Production
          - IRONPDF_LICENSE_KEY=${IRONPDF_LICENSE_KEY}
        healthcheck:
          test: ["CMD", "curl", "-f", "___PROTECTED_URL_7___"]
          interval: 30s
          timeout: 10s
          retries: 3
        deploy:
          resources:
            limits:
              cpus: '2'
              memory: 2G
            reservations:
              cpus: '1'
              memory: 1G

What about performance and resource optimization?

Resource optimization matters when you process a large volume of PDF forms. The following approach raises throughput while keeping memory use bounded:

    using IronPdf;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public class HighPerformanceFormProcessor
    {
        public static async Task ProcessFormsInParallel(string[] pdfPaths)
        {
            // Configure parallelism based on available CPU cores
            var processorCount = Environment.ProcessorCount;

            var actionBlock = new ActionBlock<string>(
                async pdfPath => await ProcessSingleForm(pdfPath),
                new ExecutionDataflowBlockOptions
                {
                    MaxDegreeOfParallelism = processorCount,
                    BoundedCapacity = processorCount * 2 // Limit how many items are queued at once
                });

            // Feed PDFs to the processing pipeline; SendAsync waits while the block is full
            foreach (var path in pdfPaths)
            {
                await actionBlock.SendAsync(path);
            }

            actionBlock.Complete();
            await actionBlock.Completion;
        }

        private static async Task ProcessSingleForm(string pdfPath)
        {
            try
            {
                // Open the file with asynchronous I/O enabled
                using var fileStream = new FileStream(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true);
                var pdf = PdfDocument.FromStream(fileStream);

                // Process form fields
                var results = new Dictionary<string, string>();
                foreach (var field in pdf.Form)
                {
                    results[field.Name] = field.Value;
                }

                // Store results (implement your storage logic)
                await StoreResults(Path.GetFileName(pdfPath), results);
            }
            catch (Exception ex)
            {
                // Log error (implement your logging)
                Console.WriteLine($"Error processing {pdfPath}: {ex.Message}");
            }
        }

        private static async Task StoreResults(string fileName, Dictionary<string, string> data)
        {
            // Implement your storage logic (database, file system, cloud storage)
            await Task.CompletedTask; // Placeholder
        }
    }

This implementation uses TPL Dataflow to build a bounded processing pipeline that prevents memory exhaustion while keeping the CPU busy. The BoundedCapacity setting ensures the pipeline does not load too many PDF files into memory at once, which is essential in memory-constrained container environments.

How do I monitor form processing in production?
For production deployments, comprehensive monitoring keeps form processing reliable. Integrate application metrics with popular observability tools:

    using IronPdf;
    using Prometheus;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    public class MonitoredFormProcessor
    {
        private static readonly Counter ProcessedFormsCounter = Metrics
            .CreateCounter("pdf_forms_processed_total", "Total number of processed PDF forms");

        private static readonly Histogram ProcessingDuration = Metrics
            .CreateHistogram("pdf_form_processing_duration_seconds", "Processing duration in seconds");

        private static readonly Gauge ActiveProcessingGauge = Metrics
            .CreateGauge("pdf_forms_active_processing", "Number of forms currently being processed");

        public Task<FormExtractionResult> ProcessFormWithMetrics(string pdfPath)
        {
            // The timer records the elapsed time into the histogram when disposed
            using (ProcessingDuration.NewTimer())
            {
                ActiveProcessingGauge.Inc();
                try
                {
                    var pdf = PdfDocument.FromFile(pdfPath);
                    var result = new FormExtractionResult
                    {
                        FieldCount = pdf.Form.Count(),
                        Fields = new Dictionary<string, string>()
                    };

                    foreach (var field in pdf.Form)
                    {
                        result.Fields[field.Name] = field.Value;
                    }

                    ProcessedFormsCounter.Inc();
                    // The work above is synchronous, so return an already completed task
                    return Task.FromResult(result);
                }
                finally
                {
                    ActiveProcessingGauge.Dec();
                }
            }
        }
    }

    public class FormExtractionResult
    {
        public int FieldCount { get; set; }
        public Dictionary<string, string> Fields { get; set; }
    }

These Prometheus metrics integrate with Grafana dashboards to show form processing performance in real time. Configure alert rules so that you are notified when processing time exceeds a threshold or the error rate spikes.

Conclusion

IronPDF simplifies PDF form data extraction in C#, turning complex document processing into simple code. From basic field reading to enterprise-scale batch processing, the library handles a wide range of form types efficiently. For DevOps teams, IronPDF's container-friendly architecture and minimal dependencies allow smooth deployment on cloud platforms. The examples above cover practical, real-world scenarios, from a simple console application to a scalable microservice with monitoring.

Whether you are automating survey processing, digitizing paper forms, or building a document management system, IronPDF provides the tools to extract form data reliably. Its cross-platform support means your form processing service behaves consistently across development, test, and production environments.

Frequently Asked Questions

How does IronPDF help read PDF form fields in C#?
IronPDF provides a streamlined way to extract form field data from fillable PDFs in C#, which takes far less time and effort than extracting the data by hand.

Which types of PDF form fields can IronPDF extract?
With IronPDF you can extract a variety of form fields from fillable PDFs, including text inputs, checkboxes, dropdown selections, and more.

What are the benefits of automating PDF form data extraction?
Automating PDF form data extraction with IronPDF saves time, reduces errors, and improves productivity by eliminating manual data entry.

Is IronPDF suitable for processing large numbers of PDF forms?
Yes. IronPDF is designed to process large numbers of PDF forms efficiently, which makes it a good fit for job applications, surveys, and other bulk document tasks.

What advantages does IronPDF have over manual data entry?
IronPDF reduces human error, speeds up data extraction, and lets developers focus on more complex work instead of tedious data entry.

Can IronPDF handle different PDF formats?
IronPDF can process a variety of PDF formats, which keeps it compatible with a wide range of document and form designs.

How does IronPDF improve the accuracy of data extraction?
By automating extraction, IronPDF minimizes the human errors that commonly occur during manual data entry.

Which programming language does IronPDF use?
IronPDF is designed for use with C# and gives developers tools for manipulating PDF documents and extracting data in .NET applications.
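The monitoring section recommends alerting when processing time exceeds a threshold. As a minimal sketch of what such a rule could look like, the Prometheus rule file below fires when the 95th percentile of `pdf_form_processing_duration_seconds` (the histogram defined in the monitoring example) stays above 5 seconds. The group name, alert name, 5-second threshold, time windows, and severity label are all illustrative assumptions, not values taken from IronPDF or from the example service.

```yaml
groups:
  - name: pdf-form-processing          # hypothetical group name
    rules:
      - alert: PdfFormProcessingSlow   # hypothetical alert name
        # 95th percentile of processing time over the last 5 minutes,
        # computed from the histogram's _bucket series
        expr: |
          histogram_quantile(
            0.95,
            sum(rate(pdf_form_processing_duration_seconds_bucket[5m])) by (le)
          ) > 5
        for: 10m                       # must hold for 10 minutes before firing
        labels:
          severity: warning            # assumed label, adjust to your routing
        annotations:
          summary: "PDF form processing p95 latency is above 5 seconds"
```

An error-rate alert would additionally need a failure counter, which the monitoring example does not define, so it is left out of this sketch.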