Package com.aowagie.text.pdf.parser
Class PdfTextExtractor
java.lang.Object
    com.aowagie.text.pdf.parser.PdfTextExtractor
class PdfTextExtractor extends java.lang.Object
Extracts text from a PDF file.
Since:
    2.1.4
Field Summary
Modifier and Type | Field | Description
private SimpleTextExtractingPdfContentStreamProcessor | extractionProcessor | The processor that will extract the text.
private PdfReader | reader | The PdfReader that holds the PDF file.
Constructor Summary
Constructor | Description
PdfTextExtractor(PdfReader reader) | Creates a new Text Extractor object.
Method Summary
Modifier and Type | Method | Description
private byte[] | getContentBytesForPage(int pageNum) | Gets the content stream of a page.
java.lang.String | getTextFromPage(int page) | Gets the text from a page.
Field Detail
reader
private final PdfReader reader
The PdfReader that holds the PDF file.
extractionProcessor
private final SimpleTextExtractingPdfContentStreamProcessor extractionProcessor
The processor that will extract the text.
Constructor Detail
PdfTextExtractor
public PdfTextExtractor(PdfReader reader)
Creates a new Text Extractor object.
Parameters:
    reader - the reader with the PDF
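The members documented on this page fit together simply: the constructor stores the reader, a private helper fetches a page's content bytes, and the public method hands those bytes to the extraction processor. The sketch below mirrors that layout with stand-in types so it compiles without the library. PageContentSource, PlaceholderProcessor and TextExtractor are hypothetical names, not part of com.aowagie.text; the placeholder processor does not interpret PDF operators; and how the real class wires its members together is an assumption based only on the signatures listed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ExtractorStructureSketch {

    /** Hypothetical stand-in for PdfReader: supplies the raw content bytes of a page. */
    interface PageContentSource {
        byte[] getPageContent(int pageNum) throws IOException;
    }

    /**
     * Hypothetical placeholder for SimpleTextExtractingPdfContentStreamProcessor.
     * It only decodes the bytes as ISO-8859-1; it does NOT parse PDF operators.
     */
    static final class PlaceholderProcessor {
        String process(byte[] contentBytes) {
            return new String(contentBytes, StandardCharsets.ISO_8859_1);
        }
    }

    /** Mirrors the documented members: two final fields, one constructor, two methods. */
    static final class TextExtractor {
        private final PageContentSource reader;
        private final PlaceholderProcessor extractionProcessor;

        TextExtractor(PageContentSource reader) {
            this.reader = reader;
            // Created once here and reused for every page requested later.
            this.extractionProcessor = new PlaceholderProcessor();
        }

        /** Fetches the content bytes of one page from the reader. */
        private byte[] getContentBytesForPage(int pageNum) throws IOException {
            return reader.getPageContent(pageNum);
        }

        /** Runs the page's content bytes through the processor. */
        public String getTextFromPage(int page) throws IOException {
            return extractionProcessor.process(getContentBytesForPage(page));
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake source: the "content" of each page is just a short label.
        PageContentSource source =
                pageNum -> ("content of page " + pageNum).getBytes(StandardCharsets.ISO_8859_1);
        TextExtractor extractor = new TextExtractor(source);
        System.out.println(extractor.getTextFromPage(2)); // prints: content of page 2
    }
}
```

Keeping the reader behind a small interface is what lets the sketch run without a real PDF; the real class instead holds a PdfReader directly.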
Method Detail
getContentBytesForPage
private byte[] getContentBytesForPage(int pageNum) throws java.io.IOException
Gets the content stream of a page.
Parameters:
    pageNum - the number of the page whose content stream you want to get
Returns:
    a byte array with the content stream of the page
Throws:
    java.io.IOException
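In PDF, a page's /Contents entry may be a single stream or an array of streams, and consumers treat an array as one stream formed by concatenating its parts, so a method returning "the content stream of a page" has to yield one combined byte array. The following is a minimal sketch of that joining step, assuming the streams have already been decoded (for example, Flate-decompressed). ContentStreamConcat and concatenate are hypothetical names, and this page does not say whether the class joins streams exactly this way.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ContentStreamConcat {

    /**
     * Joins the decoded streams of a page's /Contents array into one byte array.
     * A newline is written between streams so that the last token of one stream
     * cannot merge with the first token of the next.
     */
    static byte[] concatenate(List<byte[]> decodedStreams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < decodedStreams.size(); i++) {
            byte[] part = decodedStreams.get(i);
            out.write(part, 0, part.length);
            if (i < decodedStreams.size() - 1) {
                out.write('\n'); // separator only between streams, not after the last
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<byte[]> parts = List.of(
                "BT /F1 12 Tf".getBytes(StandardCharsets.ISO_8859_1),
                "72 712 Td (Hello) Tj ET".getBytes(StandardCharsets.ISO_8859_1));
        String joined = new String(concatenate(parts), StandardCharsets.ISO_8859_1);
        System.out.println(joined); // two lines: the first stream, then the second
    }
}
```

Without the separator, the sample would produce `Tf72`, a single unknown token instead of the operator `Tf` followed by the operand `72`.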
getTextFromPage
public java.lang.String getTextFromPage(int page) throws java.io.IOException
Gets the text from a page.
Parameters:
    page - the number of the page
Returns:
    a String with the page content as plain text (without PDF syntax)
Throws:
    java.io.IOException
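Plain text "without PDF syntax" means the strings painted by text-showing operators, stripped of the operators and operands around them. The toy sketch below shows the idea on an uncompressed content stream: it collects the literal strings shown with Tj and joins them with spaces. It is not how SimpleTextExtractingPdfContentStreamProcessor works. It ignores fonts and encodings, text positioning, TJ arrays, hex strings and comments, and NaiveTjExtractor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveTjExtractor {

    /**
     * Toy extractor: collects the literal strings shown with the Tj operator and
     * joins them with single spaces. Only \n, \r and \t escapes are translated;
     * any other escaped character is copied without its backslash.
     */
    static String extract(String content) {
        List<String> shown = new ArrayList<>();
        String pending = null; // most recent literal-string operand, if any
        int n = content.length();
        int i = 0;
        while (i < n) {
            char c = content.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                // Literal string: balanced parentheses plus backslash escapes.
                StringBuilder s = new StringBuilder();
                int depth = 1;
                i++;
                while (i < n && depth > 0) {
                    char d = content.charAt(i);
                    if (d == '\\' && i + 1 < n) {
                        char e = content.charAt(i + 1);
                        s.append(e == 'n' ? '\n' : e == 'r' ? '\r' : e == 't' ? '\t' : e);
                        i += 2;
                        continue;
                    }
                    if (d == '(') depth++;
                    if (d == ')') depth--;
                    if (depth > 0) s.append(d); // the closing ')' is not part of the text
                    i++;
                }
                pending = s.toString();
            } else {
                // Bare token: an operator, number, name, or other syntax.
                int start = i;
                while (i < n && !Character.isWhitespace(content.charAt(i))
                        && content.charAt(i) != '(') {
                    i++;
                }
                String token = content.substring(start, i);
                if (token.equals("Tj") && pending != null) {
                    shown.add(pending);
                }
                pending = null; // Tj's single operand must come immediately before it
            }
        }
        return String.join(" ", shown);
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 72 712 Td (Hello) Tj 0 -14 Td (PDF \\(world\\)) Tj ET";
        System.out.println(extract(stream)); // prints: Hello PDF (world)
    }
}
```

Clearing the pending string on every bare token works because Tj takes exactly one operand that immediately precedes it; anything else in between means the string was an operand of some other operator.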