Class PdfTextExtractor


  • class PdfTextExtractor
    extends java.lang.Object
    Extracts text from a PDF file.
    Since:
    2.1.4
    • Constructor Detail

      • PdfTextExtractor

        public PdfTextExtractor(PdfReader reader)
        Creates a new text extractor for the given reader.
        Parameters:
        reader - the PdfReader that provides the PDF document
    • Method Detail

      • getContentBytesForPage

        private byte[] getContentBytesForPage(int pageNum)
                                       throws java.io.IOException
        Gets the content stream of a page.
        Parameters:
        pageNum - the number of the page whose content stream you want to get
        Returns:
        a byte array with the content stream of a page
        Throws:
        java.io.IOException
      • getTextFromPage

        public java.lang.String getTextFromPage(int page)
                                         throws java.io.IOException
        Gets the text from a page.
        Parameters:
        page - the page number of the page
        Returns:
        a String with the content as plain text (without PDF syntax)
        Throws:
        java.io.IOException
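
        Example (a sketch, not part of the documented API):
        Because getTextFromPage works on one page at a time, extracting a whole document means calling it in a loop. The helper below illustrates that pattern without depending on the PDF library itself. PageTextCollector, PageTextSource and collect are hypothetical names introduced for illustration only; the single method of PageTextSource mirrors the getTextFromPage(int) signature shown above. The page range is passed in explicitly because this page does not state whether page numbers start at 0 or 1.

```java
import java.io.IOException;

public class PageTextCollector {

    /**
     * Hypothetical stand-in for any object that exposes
     * getTextFromPage(int), such as a PdfTextExtractor.
     */
    @FunctionalInterface
    public interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /**
     * Joins the text of pages firstPage..lastPage (both inclusive),
     * separating pages with a newline.
     */
    public static String collect(PageTextSource source, int firstPage, int lastPage)
            throws IOException {
        StringBuilder out = new StringBuilder();
        for (int page = firstPage; page <= lastPage; page++) {
            if (page > firstPage) {
                out.append('\n'); // separator between consecutive pages
            }
            out.append(source.getTextFromPage(page));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Fake source used only so the sketch runs without a PDF file.
        PageTextSource fake = page -> "text of page " + page;
        System.out.println(collect(fake, 1, 3));
    }
}
```

        With a real extractor, the method reference extractor::getTextFromPage could be passed as the source. The last page would typically come from the reader, for example reader.getNumberOfPages(), assuming the PdfReader in use provides that method and numbers pages from 1.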