跳過到頁腳內容
使用 IRONOCR

C# 讀取 PDF 表單欄位:以程式方式抽取表單資料

IronPDF 讓您能夠使用簡單的 C# 程式碼從 PDF 表單中提取數據,以程式設計方式讀取文字欄位、複選框、單選按鈕和下拉清單。 這樣就無需手動輸入數據,並且可以在幾秒鐘內自動完成表單處理工作流程。

對於開發人員來說,處理 PDF 表單可能是一件非常令人頭痛的事情。 無論你是處理求職申請、調查回應還是保險索賠,手動複製表單資料都非常耗時且容易出錯。 使用IronPDF ,您可以跳過所有繁瑣的工作,只需幾行程式碼即可從 PDF 文件中的互動式表單欄位中提取欄位值。 它將原本需要幾個小時才能完成的事情縮短到了幾秒鐘。

在本文中,我將向您展示如何使用 C# 中的表單物件來取得簡單表單中的所有欄位。 範例程式碼示範如何遍歷每個欄位並輕鬆提取其值。 它非常簡單易用,您無需費力地使用複雜的 PDF 檢視器,也無需處理隱藏的格式問題。 對於DevOps工程師來說,IronPDF 的容器化友善設計意味著您可以在 Docker 中部署表單處理服務,而無需處理複雜的本機相依性。

我該如何開始使用 IronPDF?

設定 IronPDF 以提取 PDF 表單欄位只需極少的配置。 透過 NuGet 套件管理器安裝庫:

Install-Package IronPDF

或透過 Visual Studio 的套件管理器 UI。 IronPDF 支援 Windows、Linux、macOS 和Docker 容器,使其能夠靈活應用於各種部署場景。 有關詳細的設定說明,請參閱IronPDF 文件

對於容器化部署,IronPDF 提供了一個簡化的 Docker 設定:

FROM mcr.microsoft.com/dotnet/runtime:8.0 AS base
WORKDIR /app

# Install dependencies for IronPDF on Linux
RUN apt-get update && apt-get install -y \
    libgdiplus \
    libc6-dev \
    && rm -rf /var/lib/apt/lists/*

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY ["YourProject.csproj", "."]
RUN dotnet restore "YourProject.csproj"
COPY . .
RUN dotnet build "YourProject.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "YourProject.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "YourProject.dll"]

如何使用 IronPDF 讀取 PDF 表單資料?

以下程式碼顯示如何使用 IronPDF 讀取現有 PDF 文件中的所有欄位:

using IronPdf;
using System;

class Program
{
    static void Main(string[] args)
    {
        // Load the PDF document containing interactive form fields
        PdfDocument pdf = PdfDocument.FromFile("application_form.pdf");
        // Access the form object and iterate through all fields
        var form = pdf.Form;
        foreach (var field in form)
        {
            Console.WriteLine($"Field Name: {field.Name}");
            Console.WriteLine($"Field Value: {field.Value}");
            Console.WriteLine($"Field Type: {field.GetType().Name}");
            Console.WriteLine("---");
        }
    }
}
using IronPdf;
using System;

class Program
{
    static void Main(string[] args)
    {
        // Load the PDF document containing interactive form fields
        PdfDocument pdf = PdfDocument.FromFile("application_form.pdf");
        // Access the form object and iterate through all fields
        var form = pdf.Form;
        foreach (var field in form)
        {
            Console.WriteLine($"Field Name: {field.Name}");
            Console.WriteLine($"Field Value: {field.Value}");
            Console.WriteLine($"Field Type: {field.GetType().Name}");
            Console.WriteLine("---");
        }
    }
}
Imports IronPdf
Imports System

Class Program
    Shared Sub Main(args As String())
        ' Load the PDF document containing interactive form fields
        Dim pdf As PdfDocument = PdfDocument.FromFile("application_form.pdf")
        ' Access the form object and iterate through all fields
        Dim form = pdf.Form
        For Each field In form
            Console.WriteLine($"Field Name: {field.Name}")
            Console.WriteLine($"Field Value: {field.Value}")
            Console.WriteLine($"Field Type: {field.GetType().Name}")
            Console.WriteLine("---")
        Next
    End Sub
End Class
$vbLabelText   $csharpLabel

這段程式碼載入一個包含簡單表單的 PDF 文件,遍歷每個表單字段,並列印字段名稱、字段值和字段類型。 PdfDocument.FromFile()方法解析 PDF 文檔,而Form屬性提供對所有互動式表單欄位的存取。 每個欄位都公開其欄位類型特有的屬性,從而可以精確地提取資料。 對於更複雜的場景,請查閱IronPDF API 參考文檔,以了解高級表單操作方法。

輸出

分割畫面顯示:左側是已填寫欄位的 PDF 求職申請表,右側是 Visual Studio 偵錯控制台,顯示擷取的表單欄位資料。

我可以讀取哪些不同類型的表單欄位?

PDF 表單包含多種欄位類型,每種類型都需要特定的處理方式。 IronPDF 可自動識別欄位類型並提供客製化存取權限:

using IronPdf;
using System.Collections.Generic;
using System.Linq;

PdfDocument pdf = PdfDocument.FromFile("complex_form.pdf");
// Text fields - standard input boxes
var nameField = pdf.Form.FindFormField("fullName");
string userName = nameField.Value;
// Checkboxes - binary selections
var agreeCheckbox = pdf.Form.FindFormField("termsAccepted");
bool isChecked = agreeCheckbox.Value == "Yes";
// Radio buttons - single choice from group
var genderRadio = pdf.Form.FindFormField("gender");
string selectedGender = genderRadio.Value;
// Dropdown lists (ComboBox) - predefined options
var countryDropdown = pdf.Form.FindFormField("country");
string selectedCountry = countryDropdown.Value;
// Access all available options
var availableCountries = countryDropdown.Choices;
// Multi-line text areas
var commentsField = pdf.Form.FindFormField("comments_part1_513");
string userComments = commentsField.Value;
// Grab all fields that start with "interests_"
var interestFields = pdf.Form
    .Where(f => f.Name.StartsWith("interests_"));
// Collect checked interests
List<string> selectedInterests = new List<string>();
foreach (var field in interestFields)
{
    if (field.Value == "Yes")  // checkboxes are "Yes" if checked
    {
        // Extract the interest name from the field name
        string interestName = field.Name.Replace("interests_", "");
        selectedInterests.Add(interestName);
    }
}
using IronPdf;
using System.Collections.Generic;
using System.Linq;

PdfDocument pdf = PdfDocument.FromFile("complex_form.pdf");
// Text fields - standard input boxes
var nameField = pdf.Form.FindFormField("fullName");
string userName = nameField.Value;
// Checkboxes - binary selections
var agreeCheckbox = pdf.Form.FindFormField("termsAccepted");
bool isChecked = agreeCheckbox.Value == "Yes";
// Radio buttons - single choice from group
var genderRadio = pdf.Form.FindFormField("gender");
string selectedGender = genderRadio.Value;
// Dropdown lists (ComboBox) - predefined options
var countryDropdown = pdf.Form.FindFormField("country");
string selectedCountry = countryDropdown.Value;
// Access all available options
var availableCountries = countryDropdown.Choices;
// Multi-line text areas
var commentsField = pdf.Form.FindFormField("comments_part1_513");
string userComments = commentsField.Value;
// Grab all fields that start with "interests_"
var interestFields = pdf.Form
    .Where(f => f.Name.StartsWith("interests_"));
// Collect checked interests
List<string> selectedInterests = new List<string>();
foreach (var field in interestFields)
{
    if (field.Value == "Yes")  // checkboxes are "Yes" if checked
    {
        // Extract the interest name from the field name
        string interestName = field.Name.Replace("interests_", "");
        selectedInterests.Add(interestName);
    }
}
Imports IronPdf
Imports System.Collections.Generic
Imports System.Linq

Dim pdf As PdfDocument = PdfDocument.FromFile("complex_form.pdf")
' Text fields - standard input boxes
Dim nameField = pdf.Form.FindFormField("fullName")
Dim userName As String = nameField.Value
' Checkboxes - binary selections
Dim agreeCheckbox = pdf.Form.FindFormField("termsAccepted")
Dim isChecked As Boolean = agreeCheckbox.Value = "Yes"
' Radio buttons - single choice from group
Dim genderRadio = pdf.Form.FindFormField("gender")
Dim selectedGender As String = genderRadio.Value
' Dropdown lists (ComboBox) - predefined options
Dim countryDropdown = pdf.Form.FindFormField("country")
Dim selectedCountry As String = countryDropdown.Value
' Access all available options
Dim availableCountries = countryDropdown.Choices
' Multi-line text areas
Dim commentsField = pdf.Form.FindFormField("comments_part1_513")
Dim userComments As String = commentsField.Value
' Grab all fields that start with "interests_"
Dim interestFields = pdf.Form.Where(Function(f) f.Name.StartsWith("interests_"))
' Collect checked interests
Dim selectedInterests As New List(Of String)()
For Each field In interestFields
    If field.Value = "Yes" Then ' checkboxes are "Yes" if checked
        ' Extract the interest name from the field name
        Dim interestName As String = field.Name.Replace("interests_", "")
        selectedInterests.Add(interestName)
    End If
Next
$vbLabelText   $csharpLabel

FindFormField()方法允許按名稱直接存取特定字段,無需遍歷所有表單字段。 複選框選中時返回"是",單選按鈕返回所選值。 選擇欄位(例如下拉式清單和列錶框)透過Choices屬性提供欄位值和所有可用選項。 這套全面的方法使開發人員能夠存取和提取複雜互動式表單中的資料。 處理複雜表單時,請考慮使用IronPDF 的表單編輯功能,在擷取之前以程式設定填寫或修改欄位值。

在這裡,您可以看到 IronPDF 如何處理更複雜的表單,並從表單欄位值中提取資料:

螢幕截圖左側顯示一個 PDF 註冊表單,其中包含各種欄位類型(文字欄位、核取方塊、單選按鈕、下拉清單);右側顯示 Visual Studio 偵錯控制台,其中以程式設計方式顯示了提取的表單欄位資料。

如何處理多個調查表?

設想這樣一個場景:你需要處理數百份來自客戶調查的 PDF 表格。 以下程式碼示範如何使用 IronPDF 進行批次處理:

using IronPdf;
using System;
using System.Text;
using System.IO;
using System.Collections.Generic;

public class SurveyProcessor
{
    static void Main(string[] args)
    {
        ProcessSurveyBatch(@"C:\Surveys");
    }

    public static void ProcessSurveyBatch(string folderPath)
    {
        StringBuilder csvData = new StringBuilder();
        csvData.AppendLine("Date,Name,Email,Rating,Feedback");
        foreach (string pdfFile in Directory.GetFiles(folderPath, "*.pdf"))
        {
            try
            {
                PdfDocument survey = PdfDocument.FromFile(pdfFile);
                string date = survey.Form.FindFormField("surveyDate")?.Value ?? "";
                string name = survey.Form.FindFormField("customerName")?.Value ?? "";
                string email = survey.Form.FindFormField("email")?.Value ?? "";
                string rating = survey.Form.FindFormField("satisfaction")?.Value ?? "";
                string feedback = survey.Form.FindFormField("comments")?.Value ?? "";
                feedback = feedback.Replace("\n", " ").Replace("\"", "\"\"");
                csvData.AppendLine($"{date},{name},{email},{rating},\"{feedback}\"");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing {pdfFile}: {ex.Message}");
            }
        }
        File.WriteAllText("survey_results.csv", csvData.ToString());
        Console.WriteLine("Survey processing complete!");
    }
}
using IronPdf;
using System;
using System.Text;
using System.IO;
using System.Collections.Generic;

public class SurveyProcessor
{
    static void Main(string[] args)
    {
        ProcessSurveyBatch(@"C:\Surveys");
    }

    public static void ProcessSurveyBatch(string folderPath)
    {
        StringBuilder csvData = new StringBuilder();
        csvData.AppendLine("Date,Name,Email,Rating,Feedback");
        foreach (string pdfFile in Directory.GetFiles(folderPath, "*.pdf"))
        {
            try
            {
                PdfDocument survey = PdfDocument.FromFile(pdfFile);
                string date = survey.Form.FindFormField("surveyDate")?.Value ?? "";
                string name = survey.Form.FindFormField("customerName")?.Value ?? "";
                string email = survey.Form.FindFormField("email")?.Value ?? "";
                string rating = survey.Form.FindFormField("satisfaction")?.Value ?? "";
                string feedback = survey.Form.FindFormField("comments")?.Value ?? "";
                feedback = feedback.Replace("\n", " ").Replace("\"", "\"\"");
                csvData.AppendLine($"{date},{name},{email},{rating},\"{feedback}\"");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing {pdfFile}: {ex.Message}");
            }
        }
        File.WriteAllText("survey_results.csv", csvData.ToString());
        Console.WriteLine("Survey processing complete!");
    }
}
Imports IronPdf
Imports System
Imports System.Text
Imports System.IO
Imports System.Collections.Generic

Public Class SurveyProcessor
    Shared Sub Main(args As String())
        ProcessSurveyBatch("C:\Surveys")
    End Sub

    Public Shared Sub ProcessSurveyBatch(folderPath As String)
        Dim csvData As New StringBuilder()
        csvData.AppendLine("Date,Name,Email,Rating,Feedback")
        For Each pdfFile As String In Directory.GetFiles(folderPath, "*.pdf")
            Try
                Dim survey As PdfDocument = PdfDocument.FromFile(pdfFile)
                Dim [date] As String = If(survey.Form.FindFormField("surveyDate")?.Value, "")
                Dim name As String = If(survey.Form.FindFormField("customerName")?.Value, "")
                Dim email As String = If(survey.Form.FindFormField("email")?.Value, "")
                Dim rating As String = If(survey.Form.FindFormField("satisfaction")?.Value, "")
                Dim feedback As String = If(survey.Form.FindFormField("comments")?.Value, "")
                feedback = feedback.Replace(vbLf, " ").Replace("""", """""")
                csvData.AppendLine($"{[date]},{name},{email},{rating},""{feedback}""")
            Catch ex As Exception
                Console.WriteLine($"Error processing {pdfFile}: {ex.Message}")
            End Try
        Next
        File.WriteAllText("survey_results.csv", csvData.ToString())
        Console.WriteLine("Survey processing complete!")
    End Sub
End Class
$vbLabelText   $csharpLabel

此批次程序從目錄中讀取所有 PDF 調查表單,提取相關欄位數據,並將結果匯出到 CSV 檔案。空值合併運算子 ( ?? ) 為缺失欄位提供預設值,即使表單不完整也能確保可靠的資料擷取。 錯誤處理功能會在不中斷批次處理程序的情況下擷取有問題的 PDF 檔案。

如何建立可擴展的表單處理服務?

對於希望大規模部署表單處理的DevOps工程師來說,這裡有一個可用於生產環境的 API 服務,可以處理 PDF 表單提取:

using Microsoft.AspNetCore.Mvc;
using IronPdf;
using System.Collections.Concurrent;

[ApiController]
[Route("api/[controller]")]
public class FormProcessorController : ControllerBase
{
    private static readonly ConcurrentDictionary<string, ProcessingStatus> _processingJobs = new();

    [HttpPost("extract")]
    public async Task<IActionResult> ExtractFormData(IFormFile pdfFile)
    {
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No file uploaded");

        var jobId = Guid.NewGuid().ToString();
        _processingJobs[jobId] = new ProcessingStatus { Status = "Processing" };

        // Process asynchronously to avoid blocking
        _ = Task.Run(async () =>
        {
            try
            {
                using var stream = new MemoryStream();
                await pdfFile.CopyToAsync(stream);
                var pdf = PdfDocument.FromStream(stream);

                var extractedData = new Dictionary<string, string>();
                foreach (var field in pdf.Form)
                {
                    extractedData[field.Name] = field.Value;
                }

                _processingJobs[jobId] = new ProcessingStatus 
                { 
                    Status = "Complete",
                    Data = extractedData
                };
            }
            catch (Exception ex)
            {
                _processingJobs[jobId] = new ProcessingStatus 
                { 
                    Status = "Error",
                    Error = ex.Message
                };
            }
        });

        return Accepted(new { jobId });
    }

    [HttpGet("status/{jobId}")]
    public IActionResult GetStatus(string jobId)
    {
        if (_processingJobs.TryGetValue(jobId, out var status))
            return Ok(status);
        return NotFound();
    }

    [HttpGet("health")]
    public IActionResult HealthCheck()
    {
        return Ok(new 
        { 
            status = "healthy",
            activeJobs = _processingJobs.Count(j => j.Value.Status == "Processing"),
            completedJobs = _processingJobs.Count(j => j.Value.Status == "Complete")
        });
    }
}

public class ProcessingStatus
{
    public string Status { get; set; }
    public Dictionary<string, string> Data { get; set; }
    public string Error { get; set; }
}
using Microsoft.AspNetCore.Mvc;
using IronPdf;
using System.Collections.Concurrent;

[ApiController]
[Route("api/[controller]")]
public class FormProcessorController : ControllerBase
{
    private static readonly ConcurrentDictionary<string, ProcessingStatus> _processingJobs = new();

    [HttpPost("extract")]
    public async Task<IActionResult> ExtractFormData(IFormFile pdfFile)
    {
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No file uploaded");

        var jobId = Guid.NewGuid().ToString();
        _processingJobs[jobId] = new ProcessingStatus { Status = "Processing" };

        // Process asynchronously to avoid blocking
        _ = Task.Run(async () =>
        {
            try
            {
                using var stream = new MemoryStream();
                await pdfFile.CopyToAsync(stream);
                var pdf = PdfDocument.FromStream(stream);

                var extractedData = new Dictionary<string, string>();
                foreach (var field in pdf.Form)
                {
                    extractedData[field.Name] = field.Value;
                }

                _processingJobs[jobId] = new ProcessingStatus 
                { 
                    Status = "Complete",
                    Data = extractedData
                };
            }
            catch (Exception ex)
            {
                _processingJobs[jobId] = new ProcessingStatus 
                { 
                    Status = "Error",
                    Error = ex.Message
                };
            }
        });

        return Accepted(new { jobId });
    }

    [HttpGet("status/{jobId}")]
    public IActionResult GetStatus(string jobId)
    {
        if (_processingJobs.TryGetValue(jobId, out var status))
            return Ok(status);
        return NotFound();
    }

    [HttpGet("health")]
    public IActionResult HealthCheck()
    {
        return Ok(new 
        { 
            status = "healthy",
            activeJobs = _processingJobs.Count(j => j.Value.Status == "Processing"),
            completedJobs = _processingJobs.Count(j => j.Value.Status == "Complete")
        });
    }
}

public class ProcessingStatus
{
    public string Status { get; set; }
    public Dictionary<string, string> Data { get; set; }
    public string Error { get; set; }
}
Imports Microsoft.AspNetCore.Mvc
Imports IronPdf
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading.Tasks

<ApiController>
<Route("api/[controller]")>
Public Class FormProcessorController
    Inherits ControllerBase

    Private Shared ReadOnly _processingJobs As New ConcurrentDictionary(Of String, ProcessingStatus)()

    <HttpPost("extract")>
    Public Async Function ExtractFormData(pdfFile As IFormFile) As Task(Of IActionResult)
        If pdfFile Is Nothing OrElse pdfFile.Length = 0 Then
            Return BadRequest("No file uploaded")
        End If

        Dim jobId = Guid.NewGuid().ToString()
        _processingJobs(jobId) = New ProcessingStatus With {.Status = "Processing"}

        ' Process asynchronously to avoid blocking
        _ = Task.Run(Async Function()
                         Try
                             Using stream As New MemoryStream()
                                 Await pdfFile.CopyToAsync(stream)
                                 Dim pdf = PdfDocument.FromStream(stream)

                                 Dim extractedData As New Dictionary(Of String, String)()
                                 For Each field In pdf.Form
                                     extractedData(field.Name) = field.Value
                                 Next

                                 _processingJobs(jobId) = New ProcessingStatus With {
                                     .Status = "Complete",
                                     .Data = extractedData
                                 }
                             End Using
                         Catch ex As Exception
                             _processingJobs(jobId) = New ProcessingStatus With {
                                 .Status = "Error",
                                 .Error = ex.Message
                             }
                         End Try
                     End Function)

        Return Accepted(New With {Key .jobId = jobId})
    End Function

    <HttpGet("status/{jobId}")>
    Public Function GetStatus(jobId As String) As IActionResult
        Dim status As ProcessingStatus = Nothing
        If _processingJobs.TryGetValue(jobId, status) Then
            Return Ok(status)
        End If
        Return NotFound()
    End Function

    <HttpGet("health")>
    Public Function HealthCheck() As IActionResult
        Return Ok(New With {
            Key .status = "healthy",
            Key .activeJobs = _processingJobs.Count(Function(j) j.Value.Status = "Processing"),
            Key .completedJobs = _processingJobs.Count(Function(j) j.Value.Status = "Complete")
        })
    End Function
End Class

Public Class ProcessingStatus
    Public Property Status As String
    Public Property Data As Dictionary(Of String, String)
    Public Property Error As String
End Class
$vbLabelText   $csharpLabel

此 API 服務提供非同步表單處理和作業追蹤功能,非常適合微服務架構。 /health端點使 Kubernetes 等容器編排器能夠監控服務健康狀況。 使用 Docker Compose 部署此服務:

version: '3.8'
services:
  form-processor:
    build: .
    ports:
      - "8080:80"
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - IRONPDF_LICENSE_KEY=${IRONPDF_LICENSE_KEY}
    healthcheck:
      test: ["CMD", "curl", "-f", "___PROTECTED_URL_7___"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
version: '3.8'
services:
  form-processor:
    build: .
    ports:
      - "8080:80"
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
      - IRONPDF_LICENSE_KEY=${IRONPDF_LICENSE_KEY}
    healthcheck:
      test: ["CMD", "curl", "-f", "___PROTECTED_URL_7___"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
YAML

那麼,效能和資源優化方面呢?

處理大量 PDF 表單時,資源最佳化至關重要。 IronPDF 提供了多種策略來最大限度地提高吞吐量:

using IronPdf;
using System.Threading.Tasks.Dataflow;

public class HighPerformanceFormProcessor
{
    public static async Task ProcessFormsInParallel(string[] pdfPaths)
    {
        // Configure parallelism based on available CPU cores
        var processorCount = Environment.ProcessorCount;
        var actionBlock = new ActionBlock<string>(
            async pdfPath => await ProcessSingleForm(pdfPath),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = processorCount,
                BoundedCapacity = processorCount * 2 // Prevent memory overflow
            });

        // Feed PDFs to the processing pipeline
        foreach (var path in pdfPaths)
        {
            await actionBlock.SendAsync(path);
        }

        actionBlock.Complete();
        await actionBlock.Completion;
    }

    private static async Task ProcessSingleForm(string pdfPath)
    {
        try
        {
            // Use async file reading to avoid blocking I/O
            using var fileStream = new FileStream(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true);
            var pdf = PdfDocument.FromStream(fileStream);

            // Process form fields
            var results = new Dictionary<string, string>();
            foreach (var field in pdf.Form)
            {
                results[field.Name] = field.Value;
            }

            // Store results (implement your storage logic)
            await StoreResults(Path.GetFileName(pdfPath), results);
        }
        catch (Exception ex)
        {
            // Log error (implement your logging)
            Console.WriteLine($"Error processing {pdfPath}: {ex.Message}");
        }
    }

    private static async Task StoreResults(string fileName, Dictionary<string, string> data)
    {
        // Implement your storage logic (database, file system, cloud storage)
        await Task.CompletedTask; // Placeholder
    }
}
using IronPdf;
using System.Threading.Tasks.Dataflow;

public class HighPerformanceFormProcessor
{
    public static async Task ProcessFormsInParallel(string[] pdfPaths)
    {
        // Configure parallelism based on available CPU cores
        var processorCount = Environment.ProcessorCount;
        var actionBlock = new ActionBlock<string>(
            async pdfPath => await ProcessSingleForm(pdfPath),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = processorCount,
                BoundedCapacity = processorCount * 2 // Prevent memory overflow
            });

        // Feed PDFs to the processing pipeline
        foreach (var path in pdfPaths)
        {
            await actionBlock.SendAsync(path);
        }

        actionBlock.Complete();
        await actionBlock.Completion;
    }

    private static async Task ProcessSingleForm(string pdfPath)
    {
        try
        {
            // Use async file reading to avoid blocking I/O
            using var fileStream = new FileStream(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true);
            var pdf = PdfDocument.FromStream(fileStream);

            // Process form fields
            var results = new Dictionary<string, string>();
            foreach (var field in pdf.Form)
            {
                results[field.Name] = field.Value;
            }

            // Store results (implement your storage logic)
            await StoreResults(Path.GetFileName(pdfPath), results);
        }
        catch (Exception ex)
        {
            // Log error (implement your logging)
            Console.WriteLine($"Error processing {pdfPath}: {ex.Message}");
        }
    }

    private static async Task StoreResults(string fileName, Dictionary<string, string> data)
    {
        // Implement your storage logic (database, file system, cloud storage)
        await Task.CompletedTask; // Placeholder
    }
}
Imports IronPdf
Imports System.Threading.Tasks.Dataflow
Imports System.IO

Public Class HighPerformanceFormProcessor
    Public Shared Async Function ProcessFormsInParallel(pdfPaths As String()) As Task
        ' Configure parallelism based on available CPU cores
        Dim processorCount = Environment.ProcessorCount
        Dim actionBlock = New ActionBlock(Of String)(
            Async Function(pdfPath) Await ProcessSingleForm(pdfPath),
            New ExecutionDataflowBlockOptions With {
                .MaxDegreeOfParallelism = processorCount,
                .BoundedCapacity = processorCount * 2 ' Prevent memory overflow
            })

        ' Feed PDFs to the processing pipeline
        For Each path In pdfPaths
            Await actionBlock.SendAsync(path)
        Next

        actionBlock.Complete()
        Await actionBlock.Completion
    End Function

    Private Shared Async Function ProcessSingleForm(pdfPath As String) As Task
        Try
            ' Use async file reading to avoid blocking I/O
            Using fileStream As New FileStream(pdfPath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, True)
                Dim pdf = PdfDocument.FromStream(fileStream)

                ' Process form fields
                Dim results = New Dictionary(Of String, String)()
                For Each field In pdf.Form
                    results(field.Name) = field.Value
                Next

                ' Store results (implement your storage logic)
                Await StoreResults(Path.GetFileName(pdfPath), results)
            End Using
        Catch ex As Exception
            ' Log error (implement your logging)
            Console.WriteLine($"Error processing {pdfPath}: {ex.Message}")
        End Try
    End Function

    Private Shared Async Function StoreResults(fileName As String, data As Dictionary(Of String, String)) As Task
        ' Implement your storage logic (database, file system, cloud storage)
        Await Task.CompletedTask ' Placeholder
    End Function
End Class
$vbLabelText   $csharpLabel

此實作利用 TPL 資料流創建有界處理管道,防止記憶體耗盡,同時最大限度地利用 CPU。 BoundedCapacity設定可確保管道不會同時將過多的 PDF 檔案載入到記憶體中,這對於記憶體受限的容器化環境至關重要。

如何監控生產環境中的表單處理流程?

對於生產環境部署,全面的監控可確保表單處理的可靠性。 使用流行的可觀測性工具整合應用程式指標:

using Prometheus;
using System.Diagnostics;

public class MonitoredFormProcessor
{
    private static readonly Counter ProcessedFormsCounter = Metrics
        .CreateCounter("pdf_forms_processed_total", "Total number of processed PDF forms");

    private static readonly Histogram ProcessingDuration = Metrics
        .CreateHistogram("pdf_form_processing_duration_seconds", "Processing duration in seconds");

    private static readonly Gauge ActiveProcessingGauge = Metrics
        .CreateGauge("pdf_forms_active_processing", "Number of forms currently being processed");

    public async Task<FormExtractionResult> ProcessFormWithMetrics(string pdfPath)
    {
        using (ProcessingDuration.NewTimer())
        {
            ActiveProcessingGauge.Inc();
            try
            {
                var pdf = PdfDocument.FromFile(pdfPath);
                var result = new FormExtractionResult
                {
                    FieldCount = pdf.Form.Count(),
                    Fields = new Dictionary<string, string>()
                };

                foreach (var field in pdf.Form)
                {
                    result.Fields[field.Name] = field.Value;
                }

                ProcessedFormsCounter.Inc();
                return result;
            }
            finally
            {
                ActiveProcessingGauge.Dec();
            }
        }
    }
}

public class FormExtractionResult
{
    public int FieldCount { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}
using Prometheus;
using System.Diagnostics;

public class MonitoredFormProcessor
{
    private static readonly Counter ProcessedFormsCounter = Metrics
        .CreateCounter("pdf_forms_processed_total", "Total number of processed PDF forms");

    private static readonly Histogram ProcessingDuration = Metrics
        .CreateHistogram("pdf_form_processing_duration_seconds", "Processing duration in seconds");

    private static readonly Gauge ActiveProcessingGauge = Metrics
        .CreateGauge("pdf_forms_active_processing", "Number of forms currently being processed");

    public async Task<FormExtractionResult> ProcessFormWithMetrics(string pdfPath)
    {
        using (ProcessingDuration.NewTimer())
        {
            ActiveProcessingGauge.Inc();
            try
            {
                var pdf = PdfDocument.FromFile(pdfPath);
                var result = new FormExtractionResult
                {
                    FieldCount = pdf.Form.Count(),
                    Fields = new Dictionary<string, string>()
                };

                foreach (var field in pdf.Form)
                {
                    result.Fields[field.Name] = field.Value;
                }

                ProcessedFormsCounter.Inc();
                return result;
            }
            finally
            {
                ActiveProcessingGauge.Dec();
            }
        }
    }
}

public class FormExtractionResult
{
    public int FieldCount { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}
Imports Prometheus
Imports System.Diagnostics

Public Class MonitoredFormProcessor
    Private Shared ReadOnly ProcessedFormsCounter As Counter = Metrics.CreateCounter("pdf_forms_processed_total", "Total number of processed PDF forms")

    Private Shared ReadOnly ProcessingDuration As Histogram = Metrics.CreateHistogram("pdf_form_processing_duration_seconds", "Processing duration in seconds")

    Private Shared ReadOnly ActiveProcessingGauge As Gauge = Metrics.CreateGauge("pdf_forms_active_processing", "Number of forms currently being processed")

    Public Async Function ProcessFormWithMetrics(pdfPath As String) As Task(Of FormExtractionResult)
        Using ProcessingDuration.NewTimer()
            ActiveProcessingGauge.Inc()
            Try
                Dim pdf = PdfDocument.FromFile(pdfPath)
                Dim result As New FormExtractionResult With {
                    .FieldCount = pdf.Form.Count(),
                    .Fields = New Dictionary(Of String, String)()
                }

                For Each field In pdf.Form
                    result.Fields(field.Name) = field.Value
                Next

                ProcessedFormsCounter.Inc()
                Return result
            Finally
                ActiveProcessingGauge.Dec()
            End Try
        End Using
    End Function
End Class

Public Class FormExtractionResult
    Public Property FieldCount As Integer
    Public Property Fields As Dictionary(Of String, String)
End Class
$vbLabelText   $csharpLabel

這些 Prometheus 指標與 Grafana 儀表板無縫集成,可即時顯示表單處理效能。 配置警報規則,以便在處理時間超過閾值或錯誤率飆升時發出通知。

結論

IronPDF 簡化了 C# 中的 PDF 表單資料擷取,將複雜的文件處理轉換為簡單的程式碼。 從基本的欄位讀取到企業級批次處理,該程式庫能夠有效率地處理各種表單類型。 對於DevOps團隊而言,IronPDF 的容器友善架構和極少的依賴項使其能夠在雲端平台上順利部署。 提供的範例展示了真實場景的實際應用,從簡單的控制台應用程式到具有監控功能的可擴展微服務。

無論您是自動化調查處理、將紙本表格數位化,或是建立文件管理系統,IronPDF 都能提供可靠擷取表單資料的工具。 其跨平台支援確保您的表單處理服務在開發、測試和生產環境中一致運作。

常見問題解答

IronPDF 如何幫助在 C# 中讀取 PDF 表單欄位?

IronPDF 提供了一個簡化的流程,可以用 C# 從可填寫的 PDF 中抽取表單欄位資料,相較於手動抽取資料,可大幅減少所需的時間與工作量。

使用 IronPDF 可以提取哪些類型的 PDF 表單欄位?

使用 IronPDF,您可以從可填寫的 PDF 中提取各種表單欄位,包括文字輸入、複選框、下拉選項等。

為什麼自動化 PDF 表單資料擷取是有益的?

使用 IronPDF 自動化 PDF 表單資料擷取可節省時間、減少錯誤,並透過消除手動資料輸入的需求來提升生產力。

IronPDF 適合處理大量 PDF 表單嗎?

是的,IronPDF 旨在高效處理大量 PDF 表單,使其成為處理求職申請、調查和其他大量文件任務的理想選擇。

與手動輸入資料相比,使用 IronPdf 有哪些優勢?

IronPdf 可減少人為錯誤、加快資料擷取過程,並讓開發人員專注於更複雜的工作,而非瑣碎的資料輸入。

IronPdf 可以處理不同的 PDF 格式嗎?

IronPDF 能夠處理各種 PDF 格式,確保多樣性以及與各種文件和表格設計的相容性。

IronPDF 如何提高資料擷取的精確度?

透過自動化的擷取程序,IronPDF 可將手動輸入資料時常發生的人為錯誤風險降至最低,進而提高準確性。

IronPDF 使用何種程式語言?

IronPDF 是專為與 C# 搭配使用而設計,為開發人員提供功能強大的工具,以便在 .NET 應用程式中操作 PDF 文件並從中擷取資料。

Kannaopat Udonpant
軟體工程師
在成為軟體工程師之前,Kannapat 完成了日本北海道大學的環境資源博士學位。在攻讀學位期間,Kannapat 也成為生物製造工程系車輛機器人實驗室的成員。2022 年,他利用自己的 C# 技能加入 Iron Software 的工程團隊,主要負責 IronPDF 的開發。Kannapat 非常重視他的工作,因為他可以直接向撰寫 IronPDF 使用的大部分程式碼的開發者學習。除了同儕學習之外,Kannapat 也很享受在 Iron Software 工作的社交生活。不寫程式碼或文件時,Kannapat 通常會用 PS5 玩遊戲或重看《最後的我們》。