How to Redact Regions in PDF Files


Redacting sensitive information in PDF documents is crucial for ensuring privacy and compliance with data protection regulations. The [POST] Redact Region API from IronSecureDoc hides sensitive text and other content in specific regions of a PDF document using true redaction. The redacted data is completely removed and cannot be recovered, which makes the API well suited to confidential information in legal, financial, or personal documents.

Pull and Start IronSecureDoc

If you don't have IronSecureDoc running yet, please follow the links below to get it set up:

  • Host Locally
  • Deploy to Cloud

The [POST] Redact Region API

The [POST] Redact Region API endpoint allows you to hide sensitive information within specific regions of a PDF document using true redaction. This feature is crucial for applications that manage confidential documents, such as legal contracts, medical records, or financial statements. By leveraging this API, you can ensure that sensitive text within defined areas of a PDF is permanently removed, offering both security and compliance.

Please note: Once a region is redacted, the content within that area cannot be recovered.

Trying It Out in Swagger

Swagger is a tool that lets developers interact with RESTful APIs through a web interface. Whichever language you plan to use, such as Python or Java, Swagger offers a convenient way to test this API before writing any client code.

Steps to Redact Region with Swagger

  1. Access the Swagger UI:

    If your API server is running locally, you can access Swagger by navigating to http://localhost:8080/swagger/index.html in your web browser.

    Swagger docs

  2. Locate the [POST] Redact Region API:

    Within the Swagger UI, find the [POST] /v1/document-services/pdfs/redact-region endpoint.

    Redact regions

  3. Specify Redaction Coordinates:

    In this example, we will remove a table from the PDF on page index 1 (i.e., Page #2). Use the following coordinates to define the redaction region:

    • Page index (specific_pages): 1
    • X Coordinate (region_to_redact_x): 60
    • Y Coordinate (region_to_redact_y): 270
    • Width (region_to_redact_w): 470
    • Height (region_to_redact_h): 200
  4. Set Optional Parameters:

    Optionally, you can supply a user or owner password, restrict the redaction to specific pages, choose whether to draw a black box over the redacted area, and save the document with PDF/A or PDF/UA compliance.

    Input Swagger

  5. Upload a Sample PDF:

    In the request body, upload a sample PDF file where you want to apply the redaction. Ensure that the file is added as pdf_file.

  6. Execute the Request:

    Click "Execute" to run the request. The response will include the redacted PDF, with the table removed from page index 1 as specified.

    Response

    This Swagger UI interaction allows you to easily test the redaction process, providing immediate feedback on how the coordinates affect the PDF content.

  7. Check the Output PDF:

    The redacted region will be on page 2.
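When the same check is done from a script rather than by opening the file, a quick sanity test on the response body can catch the case where the server returned an error message instead of a document. The sketch below relies only on the fact that PDF files begin with the bytes `%PDF-`; the helper name `looks_like_pdf` is hypothetical, and the check says nothing about whether the redaction itself was applied.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return payload.startswith(b"%PDF-")


# Example: a PDF body passes, an error body does not.
print(looks_like_pdf(b"%PDF-1.7\n..."))
print(looks_like_pdf(b'{"error": "bad request"}'))
```

The redacted page still needs to be inspected visually to confirm that the intended region was covered.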


Understanding Input Parameters

Before using this API, it's essential to understand its required and optional input parameters. Together they define the area to redact and how the resulting PDF is saved.

Key Parameters

  • pdf_file: The PDF document you want to redact.
  • region_to_redact_x: X coordinate of the region to redact (starting from the bottom-left of the page).
  • region_to_redact_y: Y coordinate of the region to redact (starting from the bottom-left of the page).
  • region_to_redact_w: Width of the region to redact.
  • region_to_redact_h: Height of the region to redact.
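Because the coordinates are measured from the bottom-left of the page, a rectangle measured from the top-left (as many viewers and image tools report it) has to be flipped vertically first. The helper below is a minimal sketch of that conversion. It assumes that (x, y) names the lower-left corner of the region and that the page height is expressed in the same units as the coordinates; neither point is stated in this article, so verify both against your own output. The function name and the 792-unit page height are illustrative only.

```python
def top_left_to_bottom_left(x, y_from_top, width, height, page_height):
    """Convert a top-left-origin rectangle into region_to_redact_* values.

    Assumption: the API expects the lower-left corner of the rectangle,
    in the same units as page_height.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    y_from_bottom = page_height - y_from_top - height
    if y_from_bottom < 0:
        raise ValueError("region extends past the bottom of the page")
    return {
        "region_to_redact_x": str(x),
        "region_to_redact_y": str(y_from_bottom),
        "region_to_redact_w": str(width),
        "region_to_redact_h": str(height),
    }


# Hypothetical page 792 units tall; the box starts 322 units below the top edge.
region = top_left_to_bottom_left(60, 322, 470, 200, page_height=792)
print(region["region_to_redact_y"])  # 792 - 322 - 200 = 270
```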

Optional Parameters

  • user_password: If the PDF is password-protected, provide the user password.
  • owner_password: Provide the owner password if modifications are restricted.
  • specific_pages: Specify which pages to redact. If not provided, the redaction applies to all pages.
  • save_as_pdfa: Save the PDF with PDF/A-3 compliance.
  • save_as_pdfua: Save the PDF with PDF/UA compliance.
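One way to handle the optional parameters in client code is to build the form fields from only the options that were actually set, so that omitted options fall back to the server defaults. The sketch below uses the parameter names listed above; the helper name is hypothetical, and sending the boolean flags as the string "true" is an assumption about the form encoding that should be checked in Swagger.

```python
def build_optional_fields(user_password=None, owner_password=None,
                          specific_pages=None, save_as_pdfa=False,
                          save_as_pdfua=False):
    """Return only the optional form fields that were requested."""
    fields = {}
    if user_password is not None:
        fields["user_password"] = user_password
    if owner_password is not None:
        fields["owner_password"] = owner_password
    if specific_pages is not None:
        # Page indexes are zero-based in this article (index 1 is page 2).
        fields["specific_pages"] = [int(p) for p in specific_pages]
    # Assumption: boolean flags are sent as the string "true".
    if save_as_pdfa:
        fields["save_as_pdfa"] = "true"
    if save_as_pdfua:
        fields["save_as_pdfua"] = "true"
    return fields


# Redact only page index 1 and request PDF/A output.
print(build_optional_fields(specific_pages=[1], save_as_pdfa=True))
```

The returned dictionary can be merged with the required region_to_redact_* fields before the request is sent.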

API Integration: Python Example

Once you're familiar with the parameters, you can call this API using your preferred programming language. Below is an example of how to integrate this API using Python.

import requests

# Define the API endpoint URL
url = 'http://localhost:8080/v1/document-services/pdfs/redact-region'

# Set the headers for the request (optional document metadata)
headers = {
    'accept': '*/*',
    'author': 'IronSoftware',
    'title': 'REDACT REGION DEMO 2024',
    'subject': 'DEMO EXAMPLE'
}

# Define the coordinates and page for the redaction region
data = {
    'region_to_redact_x': '60',  # X-coordinate, measured from the bottom-left
    'region_to_redact_y': '270', # Y-coordinate, measured from the bottom-left
    'region_to_redact_w': '470', # Width of the region to be redacted
    'region_to_redact_h': '200', # Height of the region to be redacted
    'specific_pages': [1]        # Page index to redact (index 1 is page 2)
}

# Open the PDF in binary read mode; the with-block closes it after the upload
with open('sample_file.pdf', 'rb') as pdf:
    files = {'pdf_file': ('sample_file.pdf', pdf, 'application/pdf')}
    # Make the POST request to the API with the provided parameters and file
    response = requests.post(url, headers=headers, files=files, data=data)

# Stop here if the server reported an error instead of returning a PDF
response.raise_for_status()

# Save the redacted PDF response to a new file
with open('redacted_output.pdf', 'wb') as f:
    f.write(response.content)

print('PDF redacted successfully.')

This code performs the following steps:

  • Load the PDF: The PDF file to be redacted is loaded from the local file system.
  • Set Redaction Parameters: Specify the coordinates (X, Y), width, height, and specific page to redact.
  • Call the API: The [POST] Redact Region API is called, passing in the necessary parameters.
  • Save the Result: The redacted PDF is saved as a new file.

The given region is redacted as shown below.

Redacted output
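For repeated use, the same call can be wrapped in a small function. The following is a sketch rather than an official client: the function name, its arguments, and the 60-second timeout are illustrative choices, while the endpoint path and form field names are the ones used in this article. The `post` argument exists only so the function can be exercised without a running server; when it is omitted, `requests.post` is used.

```python
import os


def redact_region(pdf_path, out_path, x, y, w, h, pages=None,
                  base_url="http://localhost:8080", post=None, timeout=60):
    """Send one PDF to the redact-region endpoint and save the response body."""
    if post is None:
        import requests  # imported lazily so the function can be tested offline
        post = requests.post

    url = base_url + "/v1/document-services/pdfs/redact-region"
    data = {
        "region_to_redact_x": str(x),
        "region_to_redact_y": str(y),
        "region_to_redact_w": str(w),
        "region_to_redact_h": str(h),
    }
    if pages is not None:
        data["specific_pages"] = list(pages)

    # Keep the file open only for the duration of the upload.
    with open(pdf_path, "rb") as pdf:
        files = {"pdf_file": (os.path.basename(pdf_path), pdf, "application/pdf")}
        response = post(url, files=files, data=data, timeout=timeout)

    # Raise on HTTP errors before anything is written to disk.
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)
    return out_path
```

With IronSecureDoc running locally, a call such as `redact_region('sample_file.pdf', 'redacted_output.pdf', 60, 270, 470, 200, pages=[1])` would reproduce the request shown earlier, minus the metadata headers.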

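Frequently Asked Questions

How do I redact a specific region in a PDF file?

You can redact a specific region in a PDF file with the [POST] Redact Region API from IronSecureDoc. Given the coordinates and dimensions of the region, the API ensures that the sensitive information is permanently removed.

What steps are involved in setting up the IronSecureDoc API for redaction?

To set up the IronSecureDoc API for redaction, pull and start the Docker image, configure the API using Swagger, specify the redaction parameters, and execute the API call to redact regions in the PDF document.

Can IronSecureDoc be used on cloud platforms?

Yes, IronSecureDoc can be deployed on cloud platforms such as Azure and AWS, providing a scalable and flexible redaction solution.

How do I specify the region of a PDF to redact with IronSecureDoc?

To specify a redaction region with IronSecureDoc, provide the X and Y coordinates along with the width and height of the region to redact. These parameters define the exact area of the PDF page.

Is there a way to test the redaction process before full implementation?

Yes, you can test the redaction process locally by running the IronSecureDoc API server and interacting with the API through Swagger. This lets you experiment with the redaction parameters and verify the output before full implementation.

Which programming languages can integrate with the IronSecureDoc API?

The IronSecureDoc API can be integrated with any programming language capable of making HTTP requests, such as Python, Java, and C#.

What is true redaction in a PDF, and why does it matter?

True redaction in a PDF ensures that sensitive data is not merely hidden but completely removed from the document. This is essential for maintaining confidentiality and complying with data protection regulations.

Does IronSecureDoc support PDF compliance standards?

Yes, when saving a redacted PDF with IronSecureDoc, you can choose to conform to standards such as PDF/A-3 or PDF/UA to meet specific document requirements.

Can IronSecureDoc handle password-protected PDFs?

Yes, IronSecureDoc can handle password-protected PDFs: supply the required user and owner passwords as optional parameters during redaction.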

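Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. He is passionate about crafting intuitive and aesthetically pleasing user interfaces, and he enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores new ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.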
