Class PdfTextExtractor
- java.lang.Object
  - com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor

public final class PdfTextExtractor extends java.lang.Object
Constructor Summary
- private PdfTextExtractor()
Method Summary
- static java.lang.String getTextFromPage(PdfPage page)
  Extract text from a specified page using the default strategy.
- static java.lang.String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy)
  Extract text from a specified page using an extraction strategy.
- static java.lang.String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy, java.util.Map<java.lang.String,IContentOperator> additionalContentOperators)
  Extract text from a specified page using an extraction strategy.
Method Detail
getTextFromPage
public static java.lang.String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy, java.util.Map<java.lang.String,IContentOperator> additionalContentOperators)
Extract text from a specified page using an extraction strategy. Also allows registration of custom IContentOperators that can influence how (and whether or not) the PDF instructions will be parsed. A new extraction strategy object must be passed for every page.
Parameters:
- page - the page to extract text from
- strategy - the strategy to use for extracting text
- additionalContentOperators - an optional map of custom IContentOperators for rendering instructions
Returns:
- the extracted text
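As a minimal sketch of this overload, the snippet below registers a do-nothing handler for the `Do` operator (paint XObject), so that text drawn inside form XObjects is not visited during extraction. The class name `SkipXObjects`, the method name, and the choice of operator are illustrative assumptions, not part of this API. The sketch also assumes that a map entry replaces the processor's default handler for the same operator string.

```java
import com.itextpdf.kernel.pdf.PdfLiteral;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.canvas.parser.IContentOperator;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, shown for illustration only.
public final class SkipXObjects {
    private SkipXObjects() {
    }

    /** Extracts page text while ignoring every "Do" (paint XObject) instruction. */
    public static String pageTextWithoutXObjects(PdfPage page) {
        // A custom operator that deliberately does nothing when invoked.
        IContentOperator ignore = new IContentOperator() {
            @Override
            public void invoke(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {
                // Intentionally empty: the instruction is skipped.
            }
        };

        // Assumption: an entry keyed by "Do" overrides the default XObject handling.
        Map<String, IContentOperator> operators = new HashMap<>();
        operators.put("Do", ignore);

        // A fresh strategy instance is created for this single page.
        return PdfTextExtractor.getTextFromPage(page, new SimpleTextExtractionStrategy(), operators);
    }
}
```

Text drawn directly in the page content stream is still extracted, because only the `Do` handler is replaced.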
getTextFromPage
public static java.lang.String getTextFromPage(PdfPage page, ITextExtractionStrategy strategy)
Extract text from a specified page using an extraction strategy. A new extraction strategy object must be passed for every page.
Parameters:
- page - the page to extract text from
- strategy - the strategy to use for extracting text
Returns:
- the extracted text
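The one-strategy-per-page requirement can be sketched as a loop that constructs a new strategy on every iteration. The class name `PerPageExtraction` and the method name are hypothetical, and `LocationTextExtractionStrategy` is just one possible choice of strategy.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, shown for illustration only.
public final class PerPageExtraction {
    private PerPageExtraction() {
    }

    /** Returns the extracted text of every page, in page order. */
    public static List<String> extractAllPages(PdfDocument pdf) {
        List<String> pages = new ArrayList<>();
        // Page numbers are 1-based.
        for (int i = 1; i <= pdf.getNumberOfPages(); i++) {
            // A new strategy per page, so text collected from an earlier page
            // cannot carry over into the result for this one.
            String text = PdfTextExtractor.getTextFromPage(
                    pdf.getPage(i), new LocationTextExtractionStrategy());
            pages.add(text);
        }
        return pages;
    }
}
```

Reusing a single strategy instance across the loop is the mistake this pattern avoids, since a strategy accumulates the text it has seen.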
getTextFromPage
public static java.lang.String getTextFromPage(PdfPage page)
Extract text from a specified page using the default strategy. Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfPage, ITextExtractionStrategy).
Parameters:
- page - the page to extract text from
Returns:
- the extracted text
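The difference between relying on the default and pinning a strategy might look like the sketch below. The class and method names are hypothetical, and `LocationTextExtractionStrategy` stands in for whichever strategy the caller actually wants to depend on.

```java
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

// Hypothetical helper class, shown for illustration only.
public final class DefaultVersusPinned {
    private DefaultVersusPinned() {
    }

    /** Uses whatever strategy the library currently treats as the default. */
    public static String withDefaultStrategy(PdfPage page) {
        return PdfTextExtractor.getTextFromPage(page);
    }

    /** Names the strategy explicitly, so a change of default does not affect the caller. */
    public static String withPinnedStrategy(PdfPage page) {
        return PdfTextExtractor.getTextFromPage(page, new LocationTextExtractionStrategy());
    }
}
```

The single-argument form is convenient for quick inspection; code whose output is compared or stored across library upgrades is probably better served by the pinned form.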