Package com.itextpdf.text.pdf.parser
Class PdfTextExtractor
java.lang.Object
    com.itextpdf.text.pdf.parser.PdfTextExtractor
public final class PdfTextExtractor extends java.lang.Object
Extracts text from a PDF file.
Since:
2.1.4
Constructor Summary
private PdfTextExtractor()
    This class only contains static methods.
Method Summary
static java.lang.String getTextFromPage(PdfReader reader, int pageNumber)
    Extract text from a specified page using the default strategy.
static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy)
    Extract text from a specified page using an extraction strategy.
static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, java.util.Map<java.lang.String,ContentOperator> additionalContentOperators)
    Extract text from a specified page using an extraction strategy.
Method Detail
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, java.util.Map<java.lang.String,ContentOperator> additionalContentOperators) throws java.io.IOException
Extract text from a specified page using an extraction strategy. Also allows registration of custom ContentOperators.
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
strategy - the strategy to use for extracting text
additionalContentOperators - an optional map of custom ContentOperators for rendering instructions
Returns:
the extracted text
Throws:
java.io.IOException - if any operation fails while reading from the provided PdfReader
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy) throws java.io.IOException
Extract text from a specified page using an extraction strategy.
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
strategy - the strategy to use for extracting text
Returns:
the extracted text
Throws:
java.io.IOException - if any operation fails while reading from the provided PdfReader
Since:
5.0.2
getTextFromPage
public static java.lang.String getTextFromPage(PdfReader reader, int pageNumber) throws java.io.IOException
Extract text from a specified page using the default strategy.
Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfReader, int, TextExtractionStrategy).
Parameters:
reader - the reader to extract text from
pageNumber - the page to extract text from
Returns:
the extracted text
Throws:
java.io.IOException - if any operation fails while reading from the provided PdfReader
Since:
5.0.2
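Because every method here works on one page at a time, extracting a whole document means calling getTextFromPage once per page. The following is a minimal sketch of that loop, not part of this class. PageTextCollector, PageTextSource and collect are hypothetical names introduced for illustration, and the per-page call is passed in as a function so that the sketch runs without iText on the classpath. It assumes page numbers start at 1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText.
class PageTextCollector {

    // Stand-in for a per-page call such as
    // PdfTextExtractor.getTextFromPage(reader, pageNumber).
    @FunctionalInterface
    interface PageTextSource {
        String textOf(int pageNumber) throws IOException;
    }

    // Collects the text of pages 1..pageCount in order
    // (assumption: page numbering is 1-based).
    static List<String> collect(int pageCount, PageTextSource source) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            try {
                pages.add(source.textOf(page));
            } catch (IOException e) {
                // Report which page failed instead of losing that context.
                throw new UncheckedIOException("failed on page " + page, e);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Fake source so the sketch runs on its own; a real caller would
        // delegate to PdfTextExtractor here.
        List<String> pages = collect(3, page -> "text of page " + page);
        System.out.println(String.join("\n", pages));
    }
}
```

With iText available, the source would typically be the lambda page -> PdfTextExtractor.getTextFromPage(reader, page), with the page count taken from the reader. To keep results stable across library versions, the three-argument overload with an explicit strategy can be substituted in the same place.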