Class PdfTextExtractor

java.lang.Object
com.itextpdf.text.pdf.parser.PdfTextExtractor

public final class PdfTextExtractor extends Object
Extracts text from a PDF file.
Since:
2.1.4
  • Constructor Details

    • PdfTextExtractor

      private PdfTextExtractor()
      This class only contains static methods.
  • Method Details

    • getTextFromPage

      public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String,ContentOperator> additionalContentOperators) throws IOException
      Extract text from a specified page using an extraction strategy. Also allows registration of custom ContentOperators.
      Parameters:
      reader - the reader to extract text from
      pageNumber - the page to extract text from
      strategy - the strategy to use for extracting text
      additionalContentOperators - an optional map of custom ContentOperators for rendering instructions
      Returns:
      the extracted text
      Throws:
      IOException - if any operation fails while reading from the provided PdfReader
    • getTextFromPage

      public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy) throws IOException
      Extract text from a specified page using an extraction strategy.
      Parameters:
      reader - the reader to extract text from
      pageNumber - the page to extract text from
      strategy - the strategy to use for extracting text
      Returns:
      the extracted text
      Throws:
      IOException - if any operation fails while reading from the provided PdfReader
      Since:
      5.0.2
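A minimal sketch of calling this overload once per page with a new strategy object each time. It assumes that strategy objects keep the text they have seen, so reusing one across pages could mix pages together; check this against the strategy you use. `FreshStrategyPerPage`, `PageExtractor`, and `extractAll` are hypothetical names, not part of iText, and the `StringBuilder` stands in for a strategy so the block compiles with the JDK alone.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class FreshStrategyPerPage {

    /** Hypothetical adapter: extracts one page's text using the given strategy object. */
    @FunctionalInterface
    public interface PageExtractor<S> {
        String extract(int pageNumber, S strategy) throws IOException;
    }

    /**
     * Extracts pages 1..pageCount (page numbers assumed to be 1-based), asking the
     * factory for a new strategy object before every page so no state is shared.
     */
    public static <S> List<String> extractAll(int pageCount,
                                              Supplier<S> strategyFactory,
                                              PageExtractor<S> extractor) throws IOException {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            S strategy = strategyFactory.get(); // new instance for each page
            pages.add(extractor.extract(page, strategy));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in "strategy": a StringBuilder that keeps whatever is appended to it.
        List<String> pages = FreshStrategyPerPage.<StringBuilder>extractAll(
                2,
                StringBuilder::new,
                (page, sb) -> sb.append("page ").append(page).toString());
        System.out.println(pages); // prints [page 1, page 2]
    }
}
```

With iText on the classpath, the stand-ins would presumably be replaced by `SimpleTextExtractionStrategy::new` as the factory, `(page, s) -> PdfTextExtractor.getTextFromPage(reader, page, s)` as the extractor, and `reader.getNumberOfPages()` as the page count.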
    • getTextFromPage

      public static String getTextFromPage(PdfReader reader, int pageNumber) throws IOException
      Extract text from a specified page using the default strategy.

      Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfReader, int, TextExtractionStrategy).

      Parameters:
      reader - the reader to extract text from
      pageNumber - the page to extract text from
      Returns:
      the extracted text
      Throws:
      IOException - if any operation fails while reading from the provided PdfReader
      Since:
      5.0.2
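A minimal sketch of collecting the text of every page with the two-argument overload. It assumes page numbers start at 1, as they do for PdfReader's page accessors. `PageTextCollector`, `PageTextSource`, and `collect` are hypothetical names, not part of iText; the adapter interface only exists so the block compiles and runs with the JDK alone.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PageTextCollector {

    /** Hypothetical adapter: returns the text of one page, given its page number. */
    @FunctionalInterface
    public interface PageTextSource {
        String textOf(int pageNumber) throws IOException;
    }

    /**
     * Collects the text of pages 1..pageCount in order.
     * Assumes 1-based page numbers; a pageCount of 0 yields an empty list.
     */
    public static List<String> collect(int pageCount, PageTextSource source) throws IOException {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            pages.add(source.textOf(page));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in source so the sketch runs without a PDF on disk.
        List<String> pages = collect(3, page -> "text of page " + page);
        System.out.println(pages); // prints [text of page 1, text of page 2, text of page 3]
    }
}
```

With iText on the classpath, the stand-in lambda would presumably become `page -> PdfTextExtractor.getTextFromPage(reader, page)`, with `reader.getNumberOfPages()` passed as the page count. Because the default strategy is subject to change, output from this overload may differ between iText versions.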