How to Redact Text on PDF with IronSecureDoc

This article shows how to redact text in a PDF using IronSecureDoc. A service or process can redact sensitive information by sending a single POST request containing the PDF to a running IronSecureDoc server. We demonstrate the request through the Swagger docs. The endpoint accepts both required and optional parameters, and the response returns the PDF with the specified text redacted.

Pull and Start IronSecureDoc

If you don't have IronSecureDoc running yet, please follow the links below to get it set up:

• Host Locally
• Deploy to Cloud

The [POST] Redact Text API

The [POST] Redact Text API endpoint hides sensitive text within a PDF document by redacting it. This is essential for applications that handle confidential documents such as legal contracts, medical records, or financial reports. Redaction permanently removes the specified text, which helps meet data protection requirements.

Please note: Once text is redacted, the content cannot be recovered.

Swagger

Swagger is a powerful tool that enables developers to interact with RESTful APIs through a user-friendly web interface. Whether you're using languages like Python, Java, or others, Swagger offers a convenient way to test and implement this API.

Steps to Redact Text with Swagger

  1. Access the Swagger UI:

    If your API server is running locally, you can access Swagger by navigating to http://localhost:8080/swagger/index.html in your web browser.

    (Screenshot: Swagger docs)

  2. Locate the [POST] Redact Text API:

    Within the Swagger UI, find the [POST] /v1/document-services/pdfs/redact-text endpoint.

    (Screenshot: Redact text endpoint)

  3. Specify Configurations:

    This example sends both the PDF file and the words to redact in the POST request. We will redact the word "we" and draw a black box over it, using the file 'sample.pdf' with the following configuration:

    • draw_black_box: true
    • match_whole_word: true
    • words_to_redact: we
  4. Upload a Sample PDF:

    In the request body, upload a sample PDF file where you want to apply the redaction. Ensure that the file is added as pdf_file.

  5. Execute the Request:

    Click "Execute" to run the request. The response will include the redacted PDF. This Swagger UI interaction allows you to easily test the redaction process, providing immediate feedback.


Use a curl Request from the Command Prompt

Alternatively, we can use the Command Prompt with a curl POST request to achieve the same result.

curl -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
 -H 'accept: */*' \
 -H 'Content-Type: multipart/form-data' \
 -F 'pdf_file=@sample.pdf;type=application/pdf' \
 -F 'words_to_redact="we"' \
 -F 'draw_black_box=true' \
 -F 'match_whole_word=true'
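
The command above sends the response body to the terminal, whereas in practice the redacted PDF is usually written to a file. The following is a minimal sketch, assuming the server from the earlier steps is listening on localhost:8080 and that a successful response body is the PDF itself. The function names redact_to_file and looks_like_pdf, and any output file name, are hypothetical helpers for illustration and are not part of IronSecureDoc.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of IronSecureDoc.

# Send a PDF for redaction and write the response body to a file.
# Usage: redact_to_file INPUT.pdf OUTPUT.pdf WORD
redact_to_file() {
  # --fail makes curl exit non-zero on an HTTP error status.
  curl --fail --silent --show-error -X POST \
    'http://localhost:8080/v1/document-services/pdfs/redact-text' \
    -H 'accept: */*' \
    -F "pdf_file=@$1;type=application/pdf" \
    -F "words_to_redact=$3" \
    -F 'draw_black_box=true' \
    -F 'match_whole_word=true' \
    --output "$2"
}

# Succeed only if the file starts with the PDF signature "%PDF-".
looks_like_pdf() {
  [ "$(head -c 5 "$1")" = '%PDF-' ]
}
```

A call might look like: redact_to_file sample.pdf redacted.pdf we && looks_like_pdf redacted.pdf. The signature check is only a quick sanity test that the server returned a PDF rather than an error page; it does not confirm that the text was redacted.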

Please note: By default, PowerShell may interpret curl as an alias for Invoke-WebRequest, a built-in PowerShell cmdlet. If so, use curl.exe instead of curl.

curl.exe --version

Required Request Body Parameters

Name | Data Type | Description
pdf_file | application/pdf | The PDF file you want to manipulate.
words_to_redact | array[string] | A list of words; text matching any of them is redacted.
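
Because words_to_redact is an array, several words can be sent in one request. The sketch below assumes the server accepts array values as repeated form fields with the same name; this article does not confirm the encoding, so if the server expects something else (for example a single comma-separated value), the loop needs adjusting. The function redact_words and the DRY_RUN switch are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only. Assumption: array parameters are sent as repeated form fields.
# redact_words and DRY_RUN are illustrative names, not part of IronSecureDoc.

# Usage: redact_words INPUT.pdf OUTPUT.pdf WORD [WORD...]
redact_words() {
  pdf=$1
  out=$2
  shift 2

  # Turn each remaining argument into its own form field.
  # --form-string sends the value literally, even if it starts with @ or <.
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --form-string "words_to_redact=$1"
    shift
    n=$((n - 1))
  done

  # Put the fixed arguments in front of the per-word fields.
  set -- -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
    -F "pdf_file=@$pdf;type=application/pdf" "$@" --output "$out"

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"   # print one curl argument per line instead of sending
  else
    curl --fail --silent --show-error "$@"
  fi
}
```

Running DRY_RUN=1 redact_words sample.pdf redacted.pdf we our prints the arguments that would be passed to curl, which is a convenient way to inspect the request before sending it.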

Optional Request Body Parameters

Name | Data Type | Description
user_password | string | Required if the input PDF has a user password. The operation fails if no password is provided for a password-protected PDF.
owner_password | string | Required if the input PDF has an owner password. The operation fails if no password is provided for a password-protected PDF.
specific_pages | array[int] | Specifies which pages to redact text on. By default, the value is null, meaning matches on all pages are redacted.
draw_black_box | boolean | Specifies whether to draw a black box over the redacted text. By default, this is set to True.
match_whole_word | boolean | Specifies whether only whole words are matched. When set to False, partial matches within words are also redacted: for example, if the provided word is "are", the "are" inside "hare" is redacted as well. By default, this is set to True.
match_case | boolean | Specifies whether matching is case-sensitive. By default, this value is null. Note: when set to True, a word matches only text with identical casing. For example, if the provided word is "WE", the lowercase "we" is not redacted.
overlay_text | string | Text, such as words or symbols, to place over the redacted text. By default, this string is empty.
save_as_pdfa | boolean | Saves the modified PDF with PDF/A-3 compliance. By default, this is set to False.
save_as_pdfua | boolean | Saves the modified PDF with PDF/UA compliance. By default, this is set to False.
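
The optional parameters above are sent as additional form fields alongside pdf_file and words_to_redact. A minimal sketch follows, assuming the same local server as in the earlier examples; the function redact_with_options and the DRY_RUN switch are hypothetical names, and the option values shown in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch only: redact_with_options and DRY_RUN are illustrative names,
# not part of IronSecureDoc.

# Usage: redact_with_options INPUT.pdf OUTPUT.pdf WORD [name=value ...]
redact_with_options() {
  pdf=$1
  out=$2
  word=$3
  shift 3

  # Each extra name=value argument becomes one additional form field.
  # --form-string keeps values literal (a password may start with @ or <).
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" --form-string "$1"
    shift
    n=$((n - 1))
  done

  set -- -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
    -F "pdf_file=@$pdf;type=application/pdf" \
    --form-string "words_to_redact=$word" "$@" --output "$out"

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"   # print one curl argument per line instead of sending
  else
    curl --fail --silent --show-error "$@"
  fi
}
```

For example: redact_with_options sample.pdf redacted.pdf we 'overlay_text=[REDACTED]' match_case=true. Passing options as name=value pairs keeps the helper independent of any particular parameter, so it only sends the fields you name and leaves the rest at the server defaults.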

Optional Header Parameters

Name | Data Type | Description
author | string | Identifies you as the author of the PDF document. By default, this field is empty.
title | string | Sets the title of the PDF document. By default, this field is empty.
subject | string | Describes the content of the PDF document at a glance. By default, this field is empty.
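
Unlike the body parameters, these three values travel as HTTP request headers. The sketch below assumes the header names are exactly author, title, and subject as listed in the table; the function redact_with_metadata, the DRY_RUN switch, and the sample values are hypothetical.

```shell
#!/bin/sh
# Sketch only: redact_with_metadata and DRY_RUN are illustrative names,
# not part of IronSecureDoc.

# Usage: redact_with_metadata INPUT.pdf OUTPUT.pdf WORD AUTHOR TITLE SUBJECT
redact_with_metadata() {
  # Metadata goes in -H request headers; the PDF and the word go in form fields.
  set -- -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
    -H "author: $4" \
    -H "title: $5" \
    -H "subject: $6" \
    -F "pdf_file=@$1;type=application/pdf" \
    --form-string "words_to_redact=$3" \
    --output "$2"

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"   # print one curl argument per line instead of sending
  else
    curl --fail --silent --show-error "$@"
  fi
}
```

For example: redact_with_metadata sample.pdf redacted.pdf we 'Jane Doe' 'Q3 Report' 'Finance'. All three metadata values should be non-empty here, since curl drops a header given with no value.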

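Frequently Asked Questions

How do I redact text in a PDF using a POST request?

Send a POST request containing the PDF file and the words to redact to the IronSecureDoc server. The server processes the request and returns the PDF with the text redacted.

What are the steps to redact a PDF with the IronSecureDoc API?

First pull and start the IronSecureDoc Docker image, then test the API with Swagger, specify the text to redact, execute the API call, and finally export the redacted PDF document.

How can I test the IronSecureDoc API before putting it into use?

Open the Swagger UI and test the IronSecureDoc API there. It lets you try out the redaction process against the provided endpoints.

Which parameters can be customized in a PDF redaction request?

A PDF redaction request can be customized with the parameters user_password, owner_password, specific_pages, draw_black_box, match_whole_word, match_case, overlay_text, save_as_pdfa, and save_as_pdfua.

How do I run a PDF redaction request with curl?

Issue a curl POST request from the Command Prompt, specifying the necessary parameters and file path.

What should I do if my PDF is password-protected?

If your PDF is password-protected, include user_password or owner_password among the optional parameters so that the redaction process can access and modify the document.

What does the 'draw_black_box' parameter do?

The 'draw_black_box' parameter specifies whether the redacted text is covered with a black box. This makes the redacted areas visible, and it is enabled by default.

How do I host IronSecureDoc locally for redaction?

Follow the tutorials provided for operating systems such as Windows, Mac, or Linux to host IronSecureDoc locally, which lets you run the redaction process on your own server.

Can I redact only specific pages of a PDF?

Yes. Use the 'specific_pages' parameter to specify the pages to redact, which limits redaction to those pages of the document.

Can I overlay text on the redacted areas of a PDF?

Yes. Use the 'overlay_text' parameter to place text over the redacted areas, replacing the redacted text with a custom message or placeholder.

Is IronSecureDoc compatible with .NET 10 and its client libraries?

Yes. IronSecureDoc provides a .NET client through the NuGet package IronSoftware.SecureDoc.Client, which is compatible with .NET 10 as well as earlier versions such as .NET 6 through 9. This lets you integrate redaction and the related APIs into .NET 10 applications.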

Curtis Chau
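Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. He is passionate about building intuitive, attractive user interfaces, and enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores new ways to integrate hardware and software. In his spare time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.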
