Package com.aowagie.text.pdf.parser
Class PdfTextExtractor
java.lang.Object
com.aowagie.text.pdf.parser.PdfTextExtractor
Extracts text from a PDF file.
- Since: 2.1.4
Field Summary
Fields
- private final SimpleTextExtractingPdfContentStreamProcessor extractionProcessor: The processor that will extract the text.
- private final PdfReader reader: The PdfReader that holds the PDF file.
Constructor Summary
Constructors
- PdfTextExtractor(PdfReader reader): Creates a new Text Extractor object.
Method Summary
- private byte[] getContentBytesForPage(int pageNum): Gets the content stream of a page.
- String getTextFromPage(int page): Gets the text from a page.
Field Details
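reader
private final PdfReader reader
The PdfReader that holds the PDF file.
extractionProcessor
private final SimpleTextExtractingPdfContentStreamProcessor extractionProcessor
The processor that will extract the text.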
Constructor Details
PdfTextExtractor
Creates a new Text Extractor object.
- Parameters:
reader - the reader with the PDF
Method Details
getContentBytesForPage
Gets the content stream of a page.
- Parameters:
pageNum - the number of the page whose content stream you want
- Returns:
a byte array with the content stream of the page
- Throws:
IOException
getTextFromPage
Gets the text from a page.
- Parameters:
page - the page number
- Returns:
a String with the content as plain text (without PDF syntax)
- Throws:
IOException
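A minimal sketch of how getTextFromPage(int) might be called once per page. To stay self-contained, it does not depend on the library: PageTextSource is a hypothetical stand-in interface that mirrors only the documented method signature, and PageTextSketch and extractAllPages are illustrative names that are not part of this package. The sketch also assumes that page numbers start at 1 and that the caller already knows the page count; neither is stated on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class PageTextSketch {

    /** Hypothetical stand-in mirroring the documented getTextFromPage(int) signature. */
    @FunctionalInterface
    interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /**
     * Collects the plain text of pages 1..pageCount, one list entry per page.
     * Assumption: page numbering is 1-based.
     */
    static List<String> extractAllPages(PageTextSource source, int pageCount) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            try {
                pages.add(source.getTextFromPage(page));
            } catch (IOException e) {
                // Rethrow unchecked so callers are not forced to handle IOException.
                throw new UncheckedIOException("Could not read page " + page, e);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stub source used only for illustration. With the real class, an
        // extractor built from a PdfReader would presumably take its place,
        // for example: new PdfTextExtractor(reader)::getTextFromPage
        PageTextSource stub = page -> "text of page " + page;
        for (String text : extractAllPages(stub, 3)) {
            System.out.println(text);
        }
    }
}
```

Wrapping IOException in UncheckedIOException is a design choice of this sketch; code that prefers checked exceptions can declare throws IOException on the helper instead.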