How to Redact Text on PDF with IronSecureDoc


This article shows how to redact text in a PDF using IronSecureDoc. A service or process can redact sensitive information by sending a single POST request containing the PDF to a running IronSecureDoc server. The walkthrough uses the Swagger docs. The request accepts both required and optional parameters, and the response returns the PDF with the text redacted.

Pull and Start IronSecureDoc

If you don't have IronSecureDoc running yet, please follow the links below to get it set up:

• Host Locally
• Deploy to Cloud

The [POST] Redact Text API

The [POST] Redact Text API endpoint hides sensitive text within a PDF document using redaction. This is essential for applications that handle confidential documents, such as legal contracts, medical records, or financial reports. The API permanently removes the specified text, which helps you meet data protection requirements.

Please note: Once text is redacted, the content cannot be recovered.

Swagger

Swagger is a powerful tool that enables developers to interact with RESTful APIs through a user-friendly web interface. Whether you're using languages like Python, Java, or others, Swagger offers a convenient way to test and implement this API.

Steps to Redact Text with Swagger

  1. Access the Swagger UI:

    If your API server is running locally, you can access Swagger by navigating to http://localhost:8080/swagger/index.html in your web browser.


  2. Locate the [POST] Redact Text API:

    Within the Swagger UI, find the [POST] /v1/document-services/pdfs/redact-text endpoint.


  3. Specify Configurations:

    In this example, the POST request provides both the PDF file and the words to redact. We will redact the word "we" and overlay a black box on it. For this demonstration, we will use the 'sample.pdf' file with the following configurations:

    • draw_black_box: true
    • match_whole_word: true
    • words_to_redact: we

  4. Upload a Sample PDF:

    In the request body, upload a sample PDF file where you want to apply the redaction. Ensure that the file is added as pdf_file.

  5. Execute the Request:

    Click "Execute" to run the request. The response will include the redacted PDF. This Swagger UI interaction allows you to easily test the redaction process, providing immediate feedback.
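The request that the Swagger steps above assemble can also be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not an official client: the helper names (`encode_multipart`, `build_redact_request`, `redact_pdf`) are hypothetical, each word is assumed to be sent as its own `words_to_redact` form field, and the response body is assumed to be the redacted PDF bytes.

```python
import uuid
import urllib.request

# Endpoint taken from the article; adjust host and port to your deployment.
REDACT_URL = "http://localhost:8080/v1/document-services/pdfs/redact-text"


def encode_multipart(fields, pdf_name, pdf_bytes):
    """Encode text fields plus one PDF upload as multipart/form-data.

    `fields` is a list of (name, value) pairs so that a name can repeat.
    Returns (content_type_header, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part uses the field name pdf_file, as in the Swagger step.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="pdf_file"; filename="{pdf_name}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n".encode("utf-8")
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


def build_redact_request(pdf_name, pdf_bytes, words,
                         draw_black_box=True, match_whole_word=True):
    """Build (but do not send) the POST request for the redact-text endpoint."""
    # Assumption: the array is encoded by repeating the words_to_redact field.
    fields = [("words_to_redact", word) for word in words]
    fields.append(("draw_black_box", str(draw_black_box).lower()))
    fields.append(("match_whole_word", str(match_whole_word).lower()))
    content_type, body = encode_multipart(fields, pdf_name, pdf_bytes)
    return urllib.request.Request(
        REDACT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "accept": "*/*"},
    )


def redact_pdf(in_path, out_path, words):
    """Send the request and save the response body (assumed to be the PDF)."""
    with open(in_path, "rb") as source:
        request = build_redact_request("sample.pdf", source.read(), words)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as target:
            target.write(response.read())
```

With a server running locally, `redact_pdf("sample.pdf", "redacted.pdf", ["we"])` would mirror the Swagger example. Building the request separately from sending it makes the form fields easy to inspect before anything reaches the server.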


Use CURL Request through Command Prompt

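Alternatively, you can send the same request with curl from the command line. The command below uses Unix shell syntax (single quotes and backslash line continuations); adjust the quoting and line continuations if you run it in Windows Command Prompt.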

curl -X POST 'http://localhost:8080/v1/document-services/pdfs/redact-text' \
 -H 'accept: */*' \
 -H 'Content-Type: multipart/form-data' \
 -F 'pdf_file=@sample.pdf;type=application/pdf' \
 -F 'words_to_redact="we"' \
 -F 'draw_black_box=true' \
 -F 'match_whole_word=true'

Please note: By default, PowerShell may interpret curl as an alias for Invoke-WebRequest, a built-in PowerShell cmdlet. If that happens, use curl.exe instead of curl.

curl.exe --version

Required Request Body Parameters

Name | Data Type | Description
pdf_file | application/pdf | The PDF file you want to redact.
words_to_redact | array[string] | A list of words; text matching any of them is redacted.

Optional Request Body Parameters

Name | Data Type | Description
user_password | string | Required if the input PDF has a user password. The operation fails if no password is provided for a password-protected PDF.
owner_password | string | Required if the input PDF has an owner password. The operation fails if no password is provided for a password-protected PDF.
specific_pages | array[int] | Specifies which pages to redact text on. By default, the value is null, meaning matching words on all pages are redacted.
draw_black_box | boolean | Specifies whether to draw a black box over the redacted text. By default, this is set to True.
match_whole_word | boolean | Specifies whether only whole words are matched. When this is False, partial matches within words are also redacted; for example, if the provided word is "are," any word containing "are," such as "hare," will have the "are" redacted as well. By default, this is set to True.
match_case | boolean | Specifies whether the provided word must match the case of the text exactly. By default, this value is null. Note: When this is True, text that differs only in case is not matched. For example, if the provided word is "WE," the lowercase "we" is not redacted.
overlay_text | string | Text, such as words or symbols, to overlay on the redacted text. By default, this string is empty.
save_as_pdfa | boolean | Saves the modified PDF with PDF/A-3 compliance. By default, this is set to False.
save_as_pdfua | boolean | Saves the modified PDF with PDF/UA compliance. By default, this is set to False.
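The interaction between match_whole_word and match_case can be illustrated locally with regular expressions. This sketch only mimics the matching rules described in the table; it is not the server's implementation, the function name `find_matches` is hypothetical, and it assumes that match_whole_word set to True restricts matches to whole words, as the parameter name suggests.

```python
import re


def find_matches(text, word, match_whole_word, match_case):
    """Return (start, end) spans of `word` in `text` under the two flags.

    Illustration only: the server's own matching logic may differ.
    """
    pattern = re.escape(word)
    if match_whole_word:
        # \b anchors the match to word boundaries, so "are" skips "hare".
        pattern = rf"\b{pattern}\b"
    flags = 0 if match_case else re.IGNORECASE
    return [m.span() for m in re.finditer(pattern, text, flags)]


sample = "We are here; the hare and we"

# Whole-word matching finds only the standalone "are".
whole = find_matches(sample, "are", match_whole_word=True, match_case=True)

# Partial matching also finds the "are" inside "hare".
partial = find_matches(sample, "are", match_whole_word=False, match_case=True)

# Case-sensitive matching of "WE" finds nothing in this sample.
strict = find_matches(sample, "WE", match_whole_word=True, match_case=True)
```

Running a check like this against your own word list before sending the request can reveal when a short word such as "we" or "are" would match more text than intended.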

Optional Header Parameters

Name | Data Type | Description
author | string | Identifies you as the author of the PDF document. By default, this field is empty.
title | string | Sets the title of the PDF document. By default, this field is empty.
subject | string | Identifies the content of the PDF document at a glance. By default, this field is empty.
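As a rough sketch of how these metadata values might be attached to a request in Python, the helpers below build a header dictionary and add it to a standard-library request object. The helper names are hypothetical, and the sketch assumes the server reads the header names exactly as listed in the table (HTTP header names are case-insensitive, and `urllib` normalizes their capitalization).

```python
import urllib.request

# Endpoint taken from the article; adjust host and port to your deployment.
REDACT_URL = "http://localhost:8080/v1/document-services/pdfs/redact-text"


def metadata_headers(author=None, title=None, subject=None):
    """Return only the metadata headers that were actually provided."""
    candidates = {"author": author, "title": title, "subject": subject}
    return {name: value for name, value in candidates.items() if value}


def with_metadata(request, **metadata):
    """Attach the optional metadata headers to an existing request."""
    for name, value in metadata_headers(**metadata).items():
        request.add_header(name, value)
    return request


# Build (but do not send) a request carrying author and title metadata.
example = with_metadata(
    urllib.request.Request(REDACT_URL, method="POST"),
    author="Jane Doe",
    title="Q3 report",
)
```

Omitting unset values keeps the request aligned with the documented defaults, where each of these fields is empty.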

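Frequently Asked Questions

How do I redact text in a PDF using a POST request?

You can redact text in a PDF by sending a POST request to the IronSecureDoc server that includes the PDF file and the text you want to redact. The server processes the request and returns the PDF with the text redacted.

What are the steps to redact a PDF using the IronSecureDoc API?

To redact a PDF with the IronSecureDoc API, first pull and start the IronSecureDoc Docker image, test the API with Swagger, specify the text to redact, execute the API call, and finally export the redacted PDF.

How can I test the IronSecureDoc API before using it in production?

You can test the IronSecureDoc API through the Swagger UI, which lets you simulate the redaction process using the provided endpoints.

Which parameters can be customized in a PDF redaction request?

In a PDF redaction request, you can customize the user_password, owner_password, specific_pages, draw_black_box, match_whole_word, match_case, overlay_text, save_as_pdfa, and save_as_pdfua parameters.

How do I execute a PDF redaction request with curl?

To execute a PDF redaction request with curl, run a curl POST request from the command line and specify the required parameters and the file path.

What should I do if my PDF is password protected?

If your PDF is password protected, include user_password or owner_password among the optional parameters so that the redaction process can access and modify the document.

What is the purpose of the "draw_black_box" parameter in text redaction?

The "draw_black_box" parameter specifies whether a black box is drawn over the redacted text. This option makes the redacted areas visible and is enabled by default.

How can I deploy IronSecureDoc locally for redaction?

You can host IronSecureDoc on your own machine by following the tutorials provided for operating systems such as Windows, Mac, and Linux, and then manage the redaction process on your local server.

Can I redact specific pages of a PDF?

Yes. Use the "specific_pages" parameter to specify the pages to redact, which lets you target particular parts of the document.

Can I overlay text on redacted areas of a PDF?

Yes. Use the "overlay_text" parameter to overlay text on the redacted areas, which lets you replace the redacted text with a custom message or placeholder.

Is IronSecureDoc compatible with .NET 10 and its client library?

Yes. IronSecureDoc provides a .NET client through the NuGet package IronSoftware.SecureDoc.Client, which is compatible with .NET 10 as well as earlier versions such as .NET 6 through 9. This lets you integrate the redaction features and related APIs into .NET 10 applications.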

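Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Curtis is passionate about creating intuitive, visually appealing user interfaces, and enjoys working with modern frameworks and producing well-structured, visually engaging manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores innovative ways to combine hardware and software. In his spare time, he enjoys gaming and building Discord bots, combining technology with creativity.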
