Save Searchable PDFs in C# with IronOCR


Curtis Chau

Updated: March 26, 2026


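IronOCR uses OCR technology to let C# developers convert scanned documents and images into searchable PDFs. The output can be a file, a byte array, or a stream, and it takes only a few lines of code.

A searchable PDF, also called an OCR PDF, is a PDF document that contains both the scanned image and machine-readable text. You create one by running OCR (optical character recognition) on a scanned paper document or image. The OCR step recognizes the text in the image and converts it into text that can be selected and searched.

Searchable PDF export is also available on results from ReadPhoto, ReadScreenShot, and ReadDocumentAdvanced. This lets you create searchable PDFs from photos and from advanced document OCR workflows. The feature is especially useful for digitizing paper archives, or for making legacy PDFs searchable to improve document management.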

Quickstart: Export a Searchable PDF in One Line

Set RenderSearchablePdf to true and call SaveAsSearchablePdf(...). That is all IronOCR needs to produce a fully searchable PDF.

Install IronOCR with the NuGet Package Manager
PM > Install-Package IronOcr

Copy and run this code snippet.

new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } } .Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf");


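Minimal Workflow (5 Steps)

Download the C# library for saving results as searchable PDFs
Prepare the images and PDF documents for OCR
Set the RenderSearchablePdf property to true
Use the SaveAsSearchablePdf method to output a searchable PDF file
Export the searchable PDF as bytes or a stream

How do I export OCR results as a searchable PDF?

To export results as a searchable PDF with IronOCR, first set Configuration.RenderSearchablePdf to true before reading. Then call SaveAsSearchablePdf on the resulting OcrResult.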

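Input

The input is a page from a "Harry Potter" novel, scanned as a TIFF file and loaded through OcrImageInput. The page is densely printed, which makes it realistic input for testing the searchable PDF text layer.

potter.tiff: the scanned novel page used as OCR input to generate a searchable PDF with an invisible text layer.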


using IronOcr;

// Create the OCR engine: defaults to English with balanced speed and accuracy
IronTesseract ocrTesseract = new IronTesseract();

// Required: without this flag the text overlay layer is not built, and SaveAsSearchablePdf produces a plain image PDF
ocrTesseract.Configuration.RenderSearchablePdf = true;

// Wrap the TIFF in OcrImageInput: handles DPI detection and page layout automatically
using var imageInput = new OcrImageInput("Potter.tiff");
// Run OCR; returns a result containing the recognized text and spatial layout data
OcrResult ocrResult = ocrTesseract.Read(imageInput);

// Write the output: the original scanned image is preserved with an invisible text layer on top
ocrResult.SaveAsSearchablePdf("searchablePdf.pdf");

Imports IronOcr

' Create the OCR engine: defaults to English with balanced speed and accuracy
Dim ocrTesseract As New IronTesseract()

' Required: without this flag the text overlay layer is not built, and SaveAsSearchablePdf produces a plain image PDF
ocrTesseract.Configuration.RenderSearchablePdf = True

' Wrap the TIFF in OcrImageInput: handles DPI detection and page layout automatically
Using imageInput As New OcrImageInput("Potter.tiff")
    ' Run OCR; returns a result containing the recognized text and spatial layout data
    Dim ocrResult As OcrResult = ocrTesseract.Read(imageInput)

    ' Write the output: the original scanned image is preserved with an invisible text layer on top
    ocrResult.SaveAsSearchablePdf("searchablePdf.pdf")
End Using


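Output

searchablePdf.pdf: the searchable PDF output. Select or search for any word to confirm the OCR text layer.

The generated PDF embeds the original scanned image and places an invisible text layer over each recognized word. Select or search for any word in your viewer to confirm that the text layer is present.

IronOCR uses a specific font for the overlay. As a result, the size of the selected text may differ slightly from the original text.

For multi-page TIFF files or complex documents, IronOCR processes every page automatically and includes each one in the output. The library handles page order and text-overlay positioning, so the text maps accurately onto the image.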
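The accurate text-to-image mapping described above comes down to a coordinate conversion. The following is a minimal conceptual sketch, not IronOCR's internal algorithm. It assumes word boxes are measured in image pixels with the origin at the top-left. It converts them to PDF user space, where the origin is at the bottom-left and one unit is 1/72 inch. The helper name PixelBoxToPdfPoints is hypothetical.

```csharp
using System;

// Hypothetical helper (not part of IronOCR): convert a word's bounding box from
// image pixels (origin top-left, y grows downward) to PDF user-space points
// (origin bottom-left, y grows upward, 72 points per inch).
static (double X, double Y, double Width, double Height) PixelBoxToPdfPoints(
    int left, int top, int width, int height, int imageHeightPx, double dpi)
{
    double scale = 72.0 / dpi;                         // points per pixel
    double x = left * scale;
    double y = (imageHeightPx - top - height) * scale; // flip the y-axis
    return (x, y, width * scale, height * scale);
}

// A 600x60 px word at (300, 300) on a 300 DPI letter-size scan (3300 px tall)
var box = PixelBoxToPdfPoints(300, 300, 600, 60, imageHeightPx: 3300, dpi: 300);
Console.WriteLine($"x={box.X:F1} y={box.Y:F1} w={box.Width:F1} h={box.Height:F1}");
```

In this sketch the box lands 72 points (one inch) from the left edge, and its height shrinks to 14.4 points. This is why the overlay must know the image DPI to line up with the printed text.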

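How do I create searchable PDFs from photos and advanced document scans?

Searchable PDF export is also available when you use ReadPhoto, ReadScreenShot, or ReadDocumentAdvanced. Each of these methods returns a result type that supports SaveAsSearchablePdf.

You can optionally pass a ModelType when calling these methods. The default is Normal. Enhanced offers higher accuracy at the cost of speed.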
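Working through the resize arithmetic shows why Enhanced keeps more detail. This article states that Normal resizes images to 960 pixels, while Enhanced handles images up to 2560 pixels. The sketch below makes three assumptions: the limit applies to the longest side, the aspect ratio is kept, and smaller images are not upscaled. These assumptions and the helper FitToLongestSide are illustrative only and are not taken from IronOCR's implementation.

```csharp
using System;

// Hypothetical helper: the size an image would have after being fitted so its
// longest side does not exceed maxSide. Aspect ratio is kept; no upscaling (assumed).
static (int Width, int Height) FitToLongestSide(int width, int height, int maxSide)
{
    int longest = Math.Max(width, height);
    if (longest <= maxSide) return (width, height); // assumption: small images are left alone
    double scale = (double)maxSide / longest;
    return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
}

const int NormalLimit = 960;    // per this article: ModelType.Normal
const int EnhancedLimit = 2560; // per this article: ModelType.Enhanced

var normal = FitToLongestSide(4000, 3000, NormalLimit);
var enhanced = FitToLongestSide(4000, 3000, EnhancedLimit);
Console.WriteLine($"Normal:   {normal.Width}x{normal.Height}");
Console.WriteLine($"Enhanced: {enhanced.Width}x{enhanced.Height}");
```

Under these assumptions, a 4000x3000 photo becomes 960x720 with Normal and 2560x1920 with Enhanced. That is roughly 2.7 times more pixels along each side, which gives small lettering a better chance of surviving.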

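Input

The input is a photo of a mural with painted lettering, loaded via LoadImage. The scene contains several words embedded in a real-world setting, which makes it a practical test of ReadPhoto with the Enhanced model.

photo.png: a mural photo read by ReadPhoto with the Enhanced model and used to generate a searchable PDF.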

using System;
using System.IO;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("photo.png");

// ReadPhoto with Enhanced model
OcrPhotoResult photoResult = ocr.ReadPhoto(input, ModelType.Enhanced);
Console.WriteLine(photoResult.Text);

// Save as searchable PDF
byte[] pdfBytes = photoResult.SaveAsSearchablePdf();
File.WriteAllBytes("searchable-photo.pdf", pdfBytes);


Imports IronOcr
Imports System.IO

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("photo.png")

    ' ReadPhoto with Enhanced model
    Dim photoResult As OcrPhotoResult = ocr.ReadPhoto(input, ModelType.Enhanced)
    Console.WriteLine(photoResult.Text)

    ' Save as searchable PDF
    Dim pdfBytes As Byte() = photoResult.SaveAsSearchablePdf()
    File.WriteAllBytes("searchable-photo.pdf", pdfBytes)
End Using


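Output

searchable-photo.pdf: the searchable PDF produced from ReadPhoto. Its text layer supports full-text search in any PDF viewer.

The generated searchable PDF includes an invisible text layer over the recognized words. Searching for "Milk" in a PDF viewer returns three matches, all taken directly from the painted text in the original photo.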
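You can also check the recognized text in code instead of in a viewer. The following sketch counts case-insensitive occurrences of a term. In practice you would pass photoResult.Text, but the sample string here is made up. A PDF viewer's search may also treat line breaks, hyphenation, or ligatures differently. CountOccurrences is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: count non-overlapping, case-insensitive occurrences of a term
// in OCR text, as a rough programmatic stand-in for searching the PDF in a viewer.
static int CountOccurrences(string text, string term)
{
    if (string.IsNullOrEmpty(term)) return 0;
    int count = 0, index = 0;
    while ((index = text.IndexOf(term, index, StringComparison.OrdinalIgnoreCase)) >= 0)
    {
        count++;
        index += term.Length; // skip past this match so matches do not overlap
    }
    return count;
}

// Made-up sample; in practice this would be photoResult.Text
string recognized = "MILK\nFresh milk daily\nMilk & Honey";
Console.WriteLine(CountOccurrences(recognized, "Milk"));
```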

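The same approach applies to ReadDocumentAdvanced, which returns an OcrDocAdvancedResult:

Input

The input is a scanned invoice loaded via LoadImage. It contains structured fields (vendor name, line items, totals). The ReadDocumentAdvanced model recognizes these fields and embeds them as a searchable text layer.

invoice.png: a scanned invoice loaded into OcrInput and passed to ReadDocumentAdvanced with the Enhanced model.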

using System.IO;
using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage("invoice.png");

// ReadDocumentAdvanced with Enhanced model
OcrDocAdvancedResult docResult = ocr.ReadDocumentAdvanced(input, ModelType.Enhanced);
byte[] docPdfBytes = docResult.SaveAsSearchablePdf();
File.WriteAllBytes("searchable-doc.pdf", docPdfBytes);


Imports IronOcr
Imports System.IO

Dim ocr As New IronTesseract()
Using input As New OcrInput()
    input.LoadImage("invoice.png")

    ' ReadDocumentAdvanced with Enhanced model
    Dim docResult As OcrDocAdvancedResult = ocr.ReadDocumentAdvanced(input, ModelType.Enhanced)
    Dim docPdfBytes As Byte() = docResult.SaveAsSearchablePdf()
    File.WriteAllBytes("searchable-doc.pdf", docPdfBytes)
End Using


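Output

searchable-doc.pdf: the searchable PDF produced from ReadDocumentAdvanced. Each invoice field can be selected and searched.

SaveAsSearchablePdf is not supported on ReadPassport or ReadLicensePlate results. Calling it on those results throws an exception.

Working with Multi-Page Documents

When running OCR on a multi-page PDF, IronOCR processes each page in sequence and preserves the original document structure.

Input

The input is an 11-page annual report from Hartwell Capital Management, loaded via OcrPdfInput. Pages 1 to 10 (indices 0 to 9) are selected with a PageIndices range and processed in a single Read call.

multi-page-scan.pdf: the 11-page Hartwell Capital Management annual report used as input for the multi-page searchable PDF conversion.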


using IronOcr;
using System.Linq;

// Create the OCR engine. RenderSearchablePdf is false by default; no need to set it when using OcrPdfInput directly
var ocrTesseract = new IronTesseract();

// Load pages 1–10 (indices 0–9) only; PageIndices avoids loading and OCR-ing the full document unnecessarily
using var pdfInput = new OcrPdfInput("multi-page-scan.pdf", PageIndices: Enumerable.Range(0, 10));

// Run OCR across all selected pages in order
OcrResult result = ocrTesseract.Read(pdfInput);

// Write the searchable PDF; true = apply the input's image filters to the embedded page images in the output
result.SaveAsSearchablePdf("searchable-multi-page.pdf", true);

Imports IronOcr
Imports System.Linq

' Create the OCR engine. RenderSearchablePdf is false by default; no need to set it when using OcrPdfInput directly
Dim ocrTesseract As New IronTesseract()

' Load pages 1–10 (indices 0–9) only; PageIndices avoids loading and OCR-ing the full document unnecessarily
Using pdfInput As New OcrPdfInput("multi-page-scan.pdf", PageIndices:=Enumerable.Range(0, 10))
    ' Run OCR across all selected pages in order
    Dim result As OcrResult = ocrTesseract.Read(pdfInput)

    ' Write the searchable PDF; true = apply the input's image filters to the embedded page images in the output
    result.SaveAsSearchablePdf("searchable-multi-page.pdf", True)
End Using


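Output

searchable-multi-page.pdf: the 10-page searchable PDF output. Each page includes an invisible text layer for full-text search.

The generated PDF has 10 pages, matching pages 1 to 10 of the original report. Each page carries an invisible text layer, so the extracted content can be selected and searched in any PDF viewer.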
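PageIndices takes 0-based indices, while people usually number pages from 1. That mismatch is an easy place for an off-by-one error. A small hypothetical helper, PageRangeToIndices, can make the conversion explicit:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: convert a 1-based, inclusive page range ("pages 1 to 10")
// into the 0-based indices that the PageIndices argument expects.
static IEnumerable<int> PageRangeToIndices(int firstPage, int lastPage)
{
    if (firstPage < 1 || lastPage < firstPage)
        throw new ArgumentOutOfRangeException(nameof(firstPage), "Expected 1 <= firstPage <= lastPage.");

    // Page N lives at index N - 1; the count includes both ends of the range
    return Enumerable.Range(firstPage - 1, lastPage - firstPage + 1);
}

// Pages 1-10 become indices 0-9
var indices = PageRangeToIndices(1, 10).ToList();
Console.WriteLine(string.Join(",", indices));
```

With this helper, passing PageIndices: PageRangeToIndices(1, 10) selects the same pages as Enumerable.Range(0, 10) in the multi-page example.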

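How do I apply filters when creating a searchable PDF?

The second parameter of SaveAsSearchablePdf is a Boolean. It controls whether the image filters are applied to the images embedded in the output. Image optimization filters can substantially improve OCR accuracy, especially on low-quality scans.

The following example applies a grayscale filter and passes true as the second argument. This embeds the filtered image in the searchable PDF output.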


using IronOcr;

// Create OCR engine: filters are applied at the OcrInput level, so no configuration changes are needed here
var ocr = new IronTesseract();
using var ocrInput = new OcrInput();

// Load the scanned PDF as the OCR source
ocrInput.LoadPdf("invoice.pdf");

// Convert to grayscale: removes color noise that can reduce OCR accuracy on color-printed documents
ocrInput.ToGrayScale();
// Run OCR on the preprocessed input
OcrResult result = ocr.Read(ocrInput);

// Write the searchable PDF; true = embed the grayscale-filtered image rather than the original color scan
result.SaveAsSearchablePdf("outputGrayscale.pdf", true);

Imports IronOcr

' Create OCR engine: filters are applied at the OcrInput level, so no configuration changes are needed here
Dim ocr As New IronTesseract()
Using ocrInput As New OcrInput()
    ' Load the scanned PDF as the OCR source
    ocrInput.LoadPdf("invoice.pdf")

    ' Convert to grayscale: removes color noise that can reduce OCR accuracy on color-printed documents
    ocrInput.ToGrayScale()
    ' Run OCR on the preprocessed input
    Dim result As OcrResult = ocr.Read(ocrInput)

    ' Write the searchable PDF; True = embed the grayscale-filtered image rather than the original color scan
    result.SaveAsSearchablePdf("outputGrayscale.pdf", True)
End Using
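A grayscale filter collapses each color pixel into a single brightness value. The sketch below uses the Rec. 601 luma weights (0.299, 0.587, 0.114), a common choice for this conversion. IronOCR's ToGrayScale may use a different formula, so treat this only as an illustration of what the filter discards. ToGray is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: Rec. 601 luma, one common way to turn an RGB pixel into a gray level.
// This is an assumption for illustration; ToGrayScale in IronOCR may weight channels differently.
static byte ToGray(byte r, byte g, byte b) =>
    (byte)Math.Round(0.299 * r + 0.587 * g + 0.114 * b);

Console.WriteLine(ToGray(255, 0, 0));     // pure red
Console.WriteLine(ToGray(255, 255, 255)); // white
```

Only brightness survives the conversion. Text and background that differ mainly in hue can therefore end up with similar gray values. In such cases, preview the filtered output before embedding it with true.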


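For the best results, consider using the Filter Wizard. It analyzes the input, suggests suitable preprocessing steps, and automatically determines the best combination of filters for your document type.

How do I fix incorrect characters in a searchable PDF?

Sometimes the text looks correct on the PDF page but comes out garbled when you search or copy it. The cause is the default font used in the searchable text layer. By default, SaveAsSearchablePdf uses Times New Roman, which does not fully support every Unicode character. This affects languages that use accented or other non-ASCII characters.

To fix this, pass a Unicode-compatible font file as the third parameter.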

result.SaveAsSearchablePdf("output.pdf", false, "Fonts/LiberationSerif-Regular.ttf");


result.SaveAsSearchablePdf("output.pdf", False, "Fonts/LiberationSerif-Regular.ttf")


You can also specify a custom font name as the fourth parameter.

result.SaveAsSearchablePdf("output.pdf", false, "Fonts/LiberationSerif-Regular.ttf", "MyFont");


result.SaveAsSearchablePdf("output.pdf", False, "Fonts/LiberationSerif-Regular.ttf", "MyFont")


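This applies to every result type, including OcrDocAdvancedResult, so the fix works no matter which read method produced the result.

Note: For documents typeset in Times New Roman, Liberation Serif is recommended because it is metrically compatible and preserves the original spacing and layout. For general multilingual use, Noto Sans or DejaVu Sans are good alternatives.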
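Before choosing a replacement font, it can help to see which characters in the OCR output fall outside ASCII. This article identifies those characters as the ones at risk with the default font. The following hypothetical helper lists them. It is only a diagnostic heuristic: a character appearing in the list does not prove it will be garbled.

```csharp
using System;
using System.Linq;

// Hypothetical diagnostic: list the distinct non-ASCII characters in OCR text.
// Per this article, characters like these are the ones the default overlay font
// (Times New Roman) may not cover, so they are worth checking after export.
static char[] NonAsciiCharacters(string text) =>
    text.Where(c => c > '\u007F').Distinct().ToArray();

// Sample text for illustration; in practice use the recognized text, e.g. result.Text
string sample = "Café Müller, Straße 5";
char[] suspects = NonAsciiCharacters(sample);

Console.WriteLine(suspects.Length == 0
    ? "ASCII only: the default font is less likely to be a problem"
    : "Consider a Unicode font; found: " + new string(suspects));
```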

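If writing to a file path is not possible, IronOCR can also return the searchable PDF as a byte array or a stream.

How do I export a searchable PDF as bytes or a stream?

The searchable PDF output can also be handled as bytes or as a stream, using the SaveAsSearchablePdfBytes and SaveAsSearchablePdfStream methods respectively. The following code example shows how to use these methods.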


// Return as a byte array: suited for storing in a database or sending in an HTTP response body
byte[] pdfByte = ocrResult.SaveAsSearchablePdfBytes();

// Return as a stream: suited for uploading to cloud storage or piping to another I/O operation without buffering the full file
Stream pdfStream = ocrResult.SaveAsSearchablePdfStream();

' Return as a byte array: suited for storing in a database or sending in an HTTP response body
Dim pdfByte As Byte() = ocrResult.SaveAsSearchablePdfBytes()

' Return as a stream: suited for uploading to cloud storage or piping to another I/O operation without buffering the full file
Dim pdfStream As Stream = ocrResult.SaveAsSearchablePdfStream()
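When you work with the byte or stream output, a quick sanity check before storing or uploading can catch an empty or truncated result early. The following sketch relies on the PDF specification's requirement that a file begin with a %PDF- header. LooksLikePdf and ReadAllBytes are hypothetical helpers. The sketch rewinds seekable streams because this article does not say where SaveAsSearchablePdfStream leaves the stream position.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical check: a PDF file's first line is a header starting with "%PDF-".
static bool LooksLikePdf(byte[] data)
{
    byte[] header = Encoding.ASCII.GetBytes("%PDF-");
    if (data.Length < header.Length) return false;
    for (int i = 0; i < header.Length; i++)
    {
        if (data[i] != header[i]) return false;
    }
    return true;
}

// Hypothetical helper: buffer a stream (e.g. one from SaveAsSearchablePdfStream) into bytes.
static byte[] ReadAllBytes(Stream stream)
{
    // Rewind seekable streams, since the producer's final position is not assumed
    if (stream.CanSeek) stream.Position = 0;
    using var buffer = new MemoryStream();
    stream.CopyTo(buffer);
    return buffer.ToArray();
}

// Stand-in for real output: a stream holding only a PDF header line
using var fake = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7\n"));
byte[] pdfBytes = ReadAllBytes(fake);
Console.WriteLine(LooksLikePdf(pdfBytes)); // True
```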


These output options are especially useful for cloud storage services, databases, or web applications, where file system access may be restricted. The example below shows a practical use:

using IronOcr;
using System.IO;
using System.Threading.Tasks;

public class SearchablePdfExporter
{
    public async Task ProcessAndUploadPdf(string inputPath)
    {
        var ocr = new IronTesseract
        {
            Configuration = { RenderSearchablePdf = true }
        };

        // Run OCR on the input image
        using var input = new OcrImageInput(inputPath);
        var result = ocr.Read(input);

        // Option 1: Save to database as byte array
        byte[] pdfBytes = result.SaveAsSearchablePdfBytes();
        // Store pdfBytes in database BLOB field

        // Option 2: Upload to cloud storage using stream
        using (Stream pdfStream = result.SaveAsSearchablePdfStream())
        {
            // Upload stream to Azure Blob Storage, AWS S3, etc.
            await UploadToCloudStorage(pdfStream, "searchable-output.pdf");
        }

        // Option 3: Return as web response (inside an ASP.NET Core controller action)
        // return File(pdfBytes, "application/pdf", "searchable.pdf");
    }

    // Placeholder: replace the body with your storage SDK's upload call
    private Task UploadToCloudStorage(Stream stream, string fileName)
    {
        // Cloud upload implementation goes here
        return Task.CompletedTask;
    }
}


Imports IronOcr
Imports System.IO
Imports System.Threading.Tasks

Public Class SearchablePdfExporter
    Public Async Function ProcessAndUploadPdf(inputPath As String) As Task
        Dim ocr As New IronTesseract With {
            .Configuration = New TesseractConfiguration With {
                .RenderSearchablePdf = True
            }
        }

        ' Process the input
        Using input As New OcrImageInput(inputPath)
            Dim result = ocr.Read(input)

            ' Option 1: Save to database as byte array
            Dim pdfBytes As Byte() = result.SaveAsSearchablePdfBytes()
            ' Store pdfBytes in database BLOB field

            ' Option 2: Upload to cloud storage using stream
            Using pdfStream As Stream = result.SaveAsSearchablePdfStream()
                ' Upload stream to Azure Blob Storage, AWS S3, etc.
                Await UploadToCloudStorage(pdfStream, "searchable-output.pdf")
            End Using

            ' Option 3: Return as web response
            ' Return File(pdfBytes, "application/pdf", "searchable.pdf")
        End Using
    End Function

    ' Placeholder: replace the body with your storage SDK's upload call
    Private Function UploadToCloudStorage(stream As Stream, fileName As String) As Task
        ' Cloud upload implementation goes here
        Return Task.CompletedTask
    End Function
End Class


Performance Considerations

When processing large volumes of documents, consider multithreaded OCR to improve throughput. IronOCR supports concurrent processing, so multiple documents can be handled at the same time:

using IronOcr;
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class BatchPdfProcessor
{
    private readonly IronTesseract _ocr;

    public BatchPdfProcessor()
    {
        _ocr = new IronTesseract
        {
            // Language is a property of IronTesseract, not of Configuration; English is the default
            Language = OcrLanguage.English,
            Configuration =
            {
                RenderSearchablePdf = true
            }
        };
    }

    public async Task ProcessBatchAsync(string[] filePaths)
    {
        var results = new ConcurrentBag<(string source, string output)>();

        // Parallel.ForEachAsync requires .NET 6 or later; the body is synchronous, so it returns a completed ValueTask
        await Parallel.ForEachAsync(filePaths, (filePath, ct) =>
        {
            using var input = new OcrImageInput(filePath);
            var result = _ocr.Read(input);

            string outputPath = Path.ChangeExtension(filePath, ".searchable.pdf");
            result.SaveAsSearchablePdf(outputPath);

            results.Add((filePath, outputPath));
            return ValueTask.CompletedTask;
        });

        Console.WriteLine($"Processed {results.Count} files");
    }
}


Imports IronOcr
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading.Tasks

Public Class BatchPdfProcessor
    Private ReadOnly _ocr As IronTesseract

    Public Sub New()
        ' Language is a property of IronTesseract, not of Configuration; English is the default
        _ocr = New IronTesseract With {
            .Language = OcrLanguage.English
        }
        _ocr.Configuration.RenderSearchablePdf = True
    End Sub

    Public Async Function ProcessBatchAsync(filePaths As String()) As Task
        Dim results As New ConcurrentBag(Of (source As String, output As String))()

        Await Task.Run(Sub()
                           Parallel.ForEach(filePaths, Sub(filePath)
                                                           Using input As New OcrImageInput(filePath)
                                                               Dim result = _ocr.Read(input)

                                                               Dim outputPath As String = Path.ChangeExtension(filePath, ".searchable.pdf")
                                                               result.SaveAsSearchablePdf(outputPath)

                                                               results.Add((filePath, outputPath))
                                                           End Using
                                                       End Sub)
                       End Sub)

        Console.WriteLine($"Processed {results.Count} files")
    End Function
End Class
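The batch example above has one pitfall. Path.ChangeExtension maps scan.tiff and scan.png in the same folder to the same scan.searchable.pdf, so one output silently overwrites the other. The following sketch avoids that by keeping the original extension in the output name. It also bounds the degree of parallelism, since OCR is CPU- and memory-intensive. The OCR step is abstracted as a delegate so the sketch runs without IronOCR. SearchablePdfPath and ProcessAll are hypothetical helpers. Whether to share one IronTesseract instance across workers is left to the guidance for your IronOCR version.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: keep the original extension so scan.tiff and scan.png produce
// scan.tiff.searchable.pdf and scan.png.searchable.pdf instead of colliding.
static string SearchablePdfPath(string inputPath) =>
    Path.Combine(Path.GetDirectoryName(inputPath) ?? "",
                 Path.GetFileName(inputPath) + ".searchable.pdf");

// Hypothetical helper: run the per-file conversion with bounded parallelism.
// "convert" stands in for the OCR + SaveAsSearchablePdf work from the batch example.
static List<(string Source, string Output)> ProcessAll(
    IEnumerable<string> inputs, Action<string, string> convert, int maxParallelism)
{
    var results = new ConcurrentBag<(string Source, string Output)>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

    Parallel.ForEach(inputs, options, input =>
    {
        string output = SearchablePdfPath(input);
        convert(input, output);
        results.Add((input, output));
    });

    // ConcurrentBag has no ordering guarantee, so sort for stable reporting
    return results.OrderBy(r => r.Source, StringComparer.Ordinal).ToList();
}

var files = new[] { "scan.tiff", "scan.png" };
var processed = ProcessAll(files, (src, dst) => { /* OCR and save here */ }, maxParallelism: 2);
foreach (var item in processed)
    Console.WriteLine($"{item.Source} -> {item.Output}");
```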


Advanced Configuration Options

For more advanced scenarios, you can fine-tune the OCR engine for specific document types or languages with detailed Tesseract configuration:

using IronOcr;
using System.Collections.Generic;

var advancedOcr = new IronTesseract
{
    Configuration = 
    {
        RenderSearchablePdf = true,
        TesseractVariables = new Dictionary<string, object>
        {
            { "preserve_interword_spaces", 1 },
            { "tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" }
        },
        PageSegmentationMode = TesseractPageSegmentationMode.SingleColumn
    },
    Language = OcrLanguage.EnglishBest
};


Imports IronOcr

Dim advancedOcr As New IronTesseract With {
    .Configuration = New TesseractConfiguration With {
        .RenderSearchablePdf = True,
        .TesseractVariables = New Dictionary(Of String, Object) From {
            {"preserve_interword_spaces", 1},
            {"tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}
        },
        .PageSegmentationMode = TesseractPageSegmentationMode.SingleColumn
    },
    .Language = OcrLanguage.EnglishBest
}


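These configuration options apply equally to all three output methods: SaveAsSearchablePdf, SaveAsSearchablePdfBytes, and SaveAsSearchablePdfStream. The summary below recaps the searchable PDF methods and the output formats they support.

Summary

Creating searchable PDFs with IronOCR is simple and flexible. The library provides robust methods for producing searchable PDFs in several formats, whether the source is a single image, a multi-page document, a photo read with ReadPhoto, or an advanced document scan read with ReadDocumentAdvanced. For better accuracy, use the ModelType parameter to choose between the Normal and Enhanced ML models. Export to a file, bytes, or a stream lets the library fit any application architecture, from desktop applications to cloud-based services.

For more advanced OCR scenarios, see the comprehensive code examples. For detailed method signatures and options, see the API documentation.

Frequently Asked Questions

How do I create a searchable PDF from a scanned image in C#?

IronOCR makes it easy to create searchable PDFs from scanned images. Set RenderSearchablePdf to true in the configuration, call Read() on the input image, and then call SaveAsSearchablePdf() with the desired output path. IronOCR runs OCR on the image and produces a PDF with selectable, searchable text overlaid on the original image.

Which file formats can be converted to searchable PDFs?

IronOCR can convert image formats including JPG, PNG, and TIFF, as well as existing PDF documents, into searchable PDFs. It supports both single-page images and multi-page documents such as TIFF files. Every page is processed automatically, and the output searchable PDF keeps the correct page order.

Can I export a searchable PDF as a byte array or stream instead of a file?

Yes. Besides saving directly to a file with SaveAsSearchablePdf(), you can export the OCR result as a byte array with SaveAsSearchablePdfBytes() or as a stream with SaveAsSearchablePdfStream().

What is the minimum code required to create a searchable PDF?

You can create a searchable PDF with IronOCR in a single line of code: new IronOcr.IronTesseract { Configuration = { RenderSearchablePdf = true } }.Read(new IronOcr.OcrImageInput("file.jpg")).SaveAsSearchablePdf("searchable.pdf"); This shows how streamlined IronOCR's API design is.

How does the invisible text layer in a searchable PDF work?

IronOCR automatically positions the recognized text and places it as an invisible layer over the original image in the PDF. The text stays accurately mapped to the image, so users can select and search it while the document keeps its original appearance. IronOCR uses a dedicated font and positioning algorithm to achieve this.

Can I create searchable PDFs from photos or screenshots?

Yes. SaveAsSearchablePdf is supported on results from ReadPhoto, ReadScreenShot, and ReadDocumentAdvanced. Each method returns a result type that supports searchable PDF export, so you can easily convert real-world photos, screenshots, and complex scanned documents into searchable PDFs.

What does the ModelType parameter do?

The ModelType parameter controls which pre-trained ML model is used for OCR. The default is Normal, which resizes images to 960 pixels for fast results. Enhanced supports images up to 2560 pixels, which preserves finer detail and improves accuracy on high-resolution input.

Why do copied or searched characters appear corrupted in my searchable PDF?

The default font used in the searchable text layer (Times New Roman) does not fully support every Unicode character. To fix this, pass a Unicode-compatible font file as the third parameter of SaveAsSearchablePdf. If the document was originally typeset in Times New Roman and other fonts cause spacing mismatches, try Liberation Serif. It shares the same glyph metrics and preserves the original layout.

How does IronOCR improve data accuracy?

IronOCR combines image preprocessing filters, such as grayscale conversion, with a choice of ML models through the ModelType parameter. Together these help it extract text reliably and accurately.

Is a free trial of IronOCR available?

Yes. Iron Software offers a free trial of IronOCR, so you can test its features and capabilities before deciding to buy.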

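Curtis Chau

Technical Writer

Curtis Chau holds a Bachelor's degree in Computer Science from Carleton University. He specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Curtis is passionate about crafting intuitive and attractive user interfaces, and he enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.

Jeffrey T. Fritz

Principal Program Manager - .NET Community Team

Jeff is a Principal Program Manager on the .NET and Visual Studio team. He is the executive producer of the .NET Conf virtual conference series. He also hosts "Fritz and Friends," a twice-weekly live stream for developers where he talks about technology and writes code together with viewers. Jeff creates workshops, presentations, and content for the largest Microsoft developer events, including Microsoft Build, Microsoft Ignite, .NET Conf, and the Microsoft MVP Summit.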
