Package com.itextpdf.text.pdf.parser
Class PdfTextExtractor
java.lang.Object
com.itextpdf.text.pdf.parser.PdfTextExtractor
Extracts text from a PDF file.
Since:
2.1.4

Constructor Summary
Constructors
private PdfTextExtractor()
    This class only contains static methods.

Method Summary
static String getTextFromPage(PdfReader reader, int pageNumber)
    Extract text from a specified page using the default strategy.
static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy)
    Extract text from a specified page using an extraction strategy.
static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators)
    Extract text from a specified page using an extraction strategy.

Constructor Details
PdfTextExtractor
private PdfTextExtractor()
This class only contains static methods.

Method Details
getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators) throws IOException
Extract text from a specified page using an extraction strategy. Also allows registration of custom ContentOperators.
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
strategy - the strategy to use for extracting text
additionalContentOperators - an optional map of custom ContentOperators for rendering instructions
Returns:
the extracted text
Throws:
IOException - if any operation fails while reading from the provided PdfReader

getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy) throws IOException
Extract text from a specified page using an extraction strategy.
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
strategy - the strategy to use for extracting text
Returns:
the extracted text
Throws:
IOException - if any operation fails while reading from the provided PdfReader
Since:
5.0.2

getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber) throws IOException
Extract text from a specified page using the default strategy.
Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfReader, int, TextExtractionStrategy).
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
Returns:
the extracted text
Throws:
IOException - if any operation fails while reading from the provided PdfReader
Since:
5.0.2
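
Example (illustrative sketch, not part of the library):
All three overloads extract a single page per call, so reading a whole document means calling getTextFromPage once per page. The sketch below shows one way to structure that loop. The names PageTextCollector, PageTextSource and extractAll are hypothetical and exist only for this example; the per-page call is passed in as a callback so the sketch runs without iText or a PDF on disk. It assumes page numbers start at 1, as they do in PdfReader. The comments show how the callback would typically be wired to PdfTextExtractor, including the variant that pins a strategy explicitly, which the note on the default strategy recommends when the choice of strategy matters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this example only; not part of iText.
final class PageTextCollector {

    /** Hypothetical callback that returns the text of one page (1-based page number). */
    @FunctionalInterface
    interface PageTextSource {
        String textOf(int pageNumber) throws IOException;
    }

    /**
     * Collects the text of pages 1..pageCount, one list entry per page.
     * An IOException from the callback is rethrown as UncheckedIOException.
     */
    static List<String> extractAll(int pageCount, PageTextSource source) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must not be negative");
        }
        List<String> pages = new ArrayList<>(pageCount);
        for (int page = 1; page <= pageCount; page++) {
            try {
                pages.add(source.textOf(page));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stand-in source so the sketch runs without a PDF on disk.
        List<String> pages = extractAll(3, page -> "text of page " + page);
        System.out.println(String.join("\n", pages));

        // With iText 5 on the classpath, the wiring would typically look like this
        // ("input.pdf" is a placeholder path):
        //
        //   PdfReader reader = new PdfReader("input.pdf");
        //   try {
        //       // Default strategy (subject to change between versions):
        //       List<String> a = extractAll(reader.getNumberOfPages(),
        //               page -> PdfTextExtractor.getTextFromPage(reader, page));
        //
        //       // Pinned strategy; a fresh instance per page, because
        //       // SimpleTextExtractionStrategy accumulates the text it renders:
        //       List<String> b = extractAll(reader.getNumberOfPages(),
        //               page -> PdfTextExtractor.getTextFromPage(
        //                       reader, page, new SimpleTextExtractionStrategy()));
        //   } finally {
        //       reader.close();
        //   }
    }
}
```

Keeping the per-page call behind a callback is a design choice for the sketch only: it lets the same loop serve any of the three overloads without depending on which strategy is the current default.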