How to Redact Text on a PDF Using IronSecureDoc

Chaknith Bin

October 20, 2024

Updated December 17, 2024

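In this article, we discuss redacting text on a PDF using IronSecureDoc. A service or process can redact sensitive information quickly and easily by sending a simple POST request, with the PDF attached, to a running IronSecureDoc server. We demonstrate this visually using the Swagger documentation. The POST request accepts both required and optional parameters and is highly customizable; the response returns the PDF with the text redacted.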

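How to Redact Text on a PDF Using IronSecureDoc

Pull and start the IronSecureDoc Docker image
Test the API using Swagger
Specify the text to redact
Execute the API call with the provided details
Export the redacted PDF document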

Pull and Start IronSecureDoc

If you do not have IronSecureDoc running yet, follow the links below to set it up:

Host Locally	Deploy to Cloud
Hosting on Windows, Hosting on Mac, Hosting on Linux	Deploy on Azure Container, Deploy on AWS Container

[POST] Redact Text API

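The [POST] Redact Text API endpoint lets you hide sensitive text in a PDF document using redaction. This feature is essential for applications that handle confidential documents such as legal contracts, medical records, or financial reports. By using this API, you can ensure that specific text is permanently removed, which improves security and supports compliance with data protection standards.

Please note

Once text has been redacted, the content cannot be recovered.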

Swagger

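Swagger is a powerful tool that lets developers interact with RESTful APIs through a user-friendly web interface. Whether you use Python, Java, or another language, Swagger gives you a convenient way to test and implement this API.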

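Steps to Redact Text with Swagger

Access the Swagger UI:
If your API server is running locally, you can access Swagger by navigating to http://localhost:8080/swagger/index.html in your web browser.
Find the [POST] Redact Text API:
In the Swagger UI, locate the [POST] /v1/document-services/pdfs/redact-text endpoint.
Specify the configuration:
In this example, we provide the PDF file and the word to redact in the POST request. We will redact the word "we" and draw a black box over it. This demonstration uses the 'sample.pdf' file with the following configuration:
- draw_black_box: true
- match_whole_word: true
- words_to_redact: we
Upload the sample PDF:
In the request body, upload the sample PDF file you want to redact. Make sure the file is added as pdf_file.
Execute the request:
Click "Execute" to run the request. The response will include the redacted PDF. Working through the Swagger UI this way lets you test the redaction process easily and get immediate feedback.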
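
Outside the Swagger UI, a client has to write the returned PDF to disk itself. The following is a minimal Python sketch of that export step, assuming the response body is the raw bytes of the redacted PDF. The function name `save_redacted_pdf` and the output file name are illustrative and are not part of IronSecureDoc.

```python
from pathlib import Path

# A PDF file normally begins with this signature.
PDF_MAGIC = b"%PDF-"


def save_redacted_pdf(body: bytes, out_path: str) -> Path:
    """Write a response body to disk after a basic PDF sanity check."""
    if not body.startswith(PDF_MAGIC):
        # The server probably returned an error payload instead of a PDF.
        preview = body[:80].decode("utf-8", errors="replace")
        raise ValueError(f"Response does not look like a PDF: {preview!r}")
    path = Path(out_path)
    path.write_bytes(body)
    return path
```

Checking the signature before writing keeps an error message from being saved under a .pdf name, which is easy to miss when the call runs inside an automated process.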

Using a CURL Request from the Command Prompt

Alternatively, we can achieve the same result from the command prompt with a curl POST request.

# Save the returned PDF to a file; "redacted.pdf" is only an example name.
curl -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
 -H 'accept: */*' \
 -H 'Content-Type: multipart/form-data' \
 -F 'pdf_file=@sample.pdf;type=application/pdf' \
 -F 'words_to_redact="we"' \
 -F 'draw_black_box=true' \
 -F 'match_whole_word=true' \
 --output redacted.pdf
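
For services that call the endpoint from code rather than from a shell, the curl request above can be approximated with Python's standard library. This is a sketch under assumptions, not an official client: the URL and field names are copied from the curl command, while the helper names `encode_multipart` and `redact_text` are hypothetical. It also assumes the server accepts boolean fields as the strings "true" and "false", as in the curl example.

```python
import os
import urllib.request
import uuid

# Endpoint copied from the curl command; adjust host and port to your setup.
URL = "http://localhost:8080/v1/document-services/pdfs/redact-text"


def encode_multipart(fields, file_field, file_name, file_bytes):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        chunks.append(str(value).encode("utf-8") + b"\r\n")
    # The PDF goes in its own part with a file name and content type.
    chunks.append(f"--{boundary}\r\n".encode())
    chunks.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"\r\n'.encode()
    )
    chunks.append(b"Content-Type: application/pdf\r\n\r\n")
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def redact_text(pdf_path, word, url=URL):
    """POST the PDF and the word to redact; return the response body."""
    fields = {
        "words_to_redact": word,
        "draw_black_box": "true",
        "match_whole_word": "true",
    }
    with open(pdf_path, "rb") as pdf:
        body, content_type = encode_multipart(
            fields, "pdf_file", os.path.basename(pdf_path), pdf.read()
        )
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "accept": "*/*"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()

# Example (requires a running server and a local sample.pdf):
#   pdf_bytes = redact_text("sample.pdf", "we")
```

The multipart body is assembled by hand only to keep the sketch free of third-party dependencies; an HTTP client library that supports file uploads would serve equally well.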

By default, PowerShell may interpret curl as an alias for Invoke-WebRequest, a built-in PowerShell cmdlet. If that happens, use curl.exe instead of curl.

curl.exe --version

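Required Request Body Parameters

Name	Description
pdf_file	The PDF file to redact.
words_to_redact	The text to redact.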

Optional Request Body Parameters

Name	Data Type	Description
user_password	string	This is required if the input PDF has a user password. The operation will fail if no password is provided for the password-protected PDF.
owner_password	string	This is required if the input PDF has an owner password. The operation will fail if no password is provided for the password-protected PDF.
specific_pages	array[int]	Allows you to specify which pages to redact text on. By default, the value is null, meaning the provided word in all the pages will be redacted.
draw_black_box	boolean	Allows you to specify whether to draw a black box over the redacted text. By default, this value is set to True.
match_whole_word	boolean	Specifies whether only whole-word matches are redacted. When set to False, partial matches within words are redacted as well. For example, if the provided word is "are," any word containing "are," such as "hare," will have the "are" redacted. By default, this is set to True.
match_case	boolean	Specifies whether the provided word should be an exact match in terms of case. By default, this value is null. Note: Setting this to True means that lowercase and uppercase strings will not be matched. For example, if the provided word is "WE," the lowercase version "we" would not be redacted.
overlay_text	string	It specifies the overlay text, such as words or symbols, over the redacted text. By default, this string is empty.
save_as_pdfa	boolean	Saves the modified PDF with PDF/A-3 compliance. By default, this is set to False.
save_as_pdfua	boolean	Saves the modified PDF with PDF/UA compliance. By default, this is set to False.
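
To get a feel for how match_whole_word and match_case interact before sending a document, the described behavior can be approximated locally. The sketch below is only an illustration with Python regular expressions: it assumes match_whole_word=True restricts matches to whole words (as the parameter name suggests) and that an unset match_case behaves case-insensitively. The server's actual matching rules may differ, and `preview_matches` is a hypothetical helper, not an IronSecureDoc function.

```python
import re


def preview_matches(text, word, match_whole_word=True, match_case=False):
    """Return (start, end) spans that redacting `word` might cover in `text`."""
    pattern = re.escape(word)
    if match_whole_word:
        # \b is Python's word boundary; the server may define words differently.
        pattern = rf"\b{pattern}\b"
    flags = 0 if match_case else re.IGNORECASE
    return [match.span() for match in re.finditer(pattern, text, flags)]
```

With the text "We are here; hare" and the word "are", the whole-word setting reports only the standalone "are", while turning it off also reports the "are" inside "hare".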

Optional Header Parameters

Name	Data Type	Description
author	string	Useful for identifying you as the author of the PDF document. By default, this field is empty.
title	string	Displays the title of the PDF document. By default, this field is empty.
subject	string	Useful for identifying the content of the PDF document at a glance. By default, this field is empty.
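
As a rough illustration of how these metadata values could accompany a request, the sketch below attaches them to a Python urllib request. It assumes the parameters are sent as HTTP request headers under the names listed in the table; the function `build_request` and its arguments are hypothetical. Building the request object does not contact the server.

```python
import urllib.request

# Endpoint used throughout this article; adjust host and port to your setup.
URL = "http://localhost:8080/v1/document-services/pdfs/redact-text"


def build_request(body, content_type, author=None, title=None, subject=None):
    """Build a POST request, adding only the metadata headers that are set."""
    headers = {"Content-Type": content_type}
    for name, value in (("author", author), ("title", title), ("subject", subject)):
        if value:
            # Unset fields are omitted so the server-side defaults apply.
            headers[name] = value
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")
```

Note that HTTP header values are generally limited to a restricted character set, so metadata containing non-ASCII characters may need special handling.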