Class PdfTextExtractor

java.lang.Object
com.aowagie.text.pdf.parser.PdfTextExtractor

class PdfTextExtractor extends Object
Extracts text from a PDF file.
Since:
2.1.4
  • Constructor Details

    • PdfTextExtractor

      public PdfTextExtractor(PdfReader reader)
      Creates a new text extractor for the given reader.
      Parameters:
      reader - the reader holding the PDF to extract text from
  • Method Details

    • getContentBytesForPage

      private byte[] getContentBytesForPage(int pageNum) throws IOException
      Gets the content stream of a page.
      Parameters:
      pageNum - the number of the page whose content stream you want to get
      Returns:
      a byte array with the content stream of a page
      Throws:
      IOException
    • getTextFromPage

      public String getTextFromPage(int page) throws IOException
      Gets the text from a page.
      Parameters:
      page - the number of the page to extract the text from
      Returns:
      a String with the content as plain text (without PDF syntax)
      Throws:
      IOException
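
      To make "plain text (without PDF syntax)" concrete, here is a minimal, self-contained sketch of pulling the shown strings out of a page content stream. It is not this class's implementation: the class name ContentStreamTextSketch and the method extractShownText are hypothetical, and the sketch only handles literal strings passed to the Tj operator, with no escaped or nested parentheses, no TJ arrays, and no font encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of com.aowagie.text.pdf.parser.
public class ContentStreamTextSketch {

    // Matches a literal string operand followed by the Tj (show text) operator.
    // Simplifying assumption: the string contains no parentheses or backslashes.
    private static final Pattern SHOW_TEXT =
            Pattern.compile("\\(([^()\\\\]*)\\)\\s*Tj");

    // Concatenates the operands of every Tj operator found in the content stream.
    static String extractShownText(byte[] contentBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so no bytes are lost.
        String content = new String(contentBytes, StandardCharsets.ISO_8859_1);
        StringBuilder text = new StringBuilder();
        Matcher m = SHOW_TEXT.matcher(content);
        while (m.find()) {
            text.append(m.group(1));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // A tiny hand-written content stream: select a font, move, show two strings.
        byte[] stream = "BT /F1 12 Tf 72 712 Td (Hello) Tj (, world) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extractShownText(stream));
    }
}
```

      The operators BT, Tf, Td and ET are the kind of PDF syntax that the returned String leaves out; only the string operands survive. A real extractor also has to interpret fonts, encodings and text positioning, which this sketch deliberately ignores.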