Class PdfTextExtractor


  • public final class PdfTextExtractor
    extends java.lang.Object
    • Constructor Detail

      • PdfTextExtractor

        private PdfTextExtractor()
    • Method Detail

      • getTextFromPage

        public static java.lang.String getTextFromPage(PdfPage page,
                                                       ITextExtractionStrategy strategy,
                                                       java.util.Map<java.lang.String, IContentOperator> additionalContentOperators)
        Extracts text from the specified page using the given extraction strategy. Custom IContentOperators can also be registered; they influence how, and whether, the PDF instructions are parsed. A new strategy object must be passed for every page.
        Parameters:
        page - the page to extract text from
        strategy - the strategy to use for extracting text
        additionalContentOperators - an optional map of custom IContentOperators for rendering instructions
        Returns:
        the extracted text
      • getTextFromPage

        public static java.lang.String getTextFromPage(PdfPage page,
                                                       ITextExtractionStrategy strategy)
        Extracts text from the specified page using the given extraction strategy. A new strategy object must be passed for every page.
        Parameters:
        page - the page to extract text from
        strategy - the strategy to use for extracting text
        Returns:
        the extracted text
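
        The requirement to pass a new strategy object for every page is easiest to see with a strategy that keeps state. The following is a minimal sketch using stand-in types only: TextStrategy, AccumulatingStrategy, and the local getTextFromPage helper are hypothetical and are not this library's API. It assumes, as a likely reason for the rule, that a strategy accumulates the text it receives and is never reset between calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FreshStrategyPerPage {

    /** Hypothetical stand-in for an extraction strategy (not the real interface). */
    interface TextStrategy {
        void eventOccurred(String chunk);
        String getResultantText();
    }

    /** Assumed behaviour: every chunk is appended to one buffer that is never cleared. */
    static final class AccumulatingStrategy implements TextStrategy {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void eventOccurred(String chunk) {
            buffer.append(chunk);
        }

        @Override
        public String getResultantText() {
            return buffer.toString();
        }
    }

    /** Stand-in for getTextFromPage(page, strategy): a "page" is just a list of text chunks. */
    static String getTextFromPage(List<String> pageChunks, TextStrategy strategy) {
        for (String chunk : pageChunks) {
            strategy.eventOccurred(chunk);
        }
        return strategy.getResultantText();
    }

    /** Recommended pattern: a new strategy object for each page. */
    static List<String> extractAll(List<List<String>> pages) {
        List<String> texts = new ArrayList<>();
        for (List<String> page : pages) {
            texts.add(getTextFromPage(page, new AccumulatingStrategy()));
        }
        return texts;
    }

    /** Pattern to avoid: one strategy shared by all pages keeps earlier pages' text. */
    static List<String> extractAllReusing(List<List<String>> pages) {
        List<String> texts = new ArrayList<>();
        TextStrategy shared = new AccumulatingStrategy();
        for (List<String> page : pages) {
            texts.add(getTextFromPage(page, shared));
        }
        return texts;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("Hello ", "world"),
                Arrays.asList("Second page"));
        System.out.println(extractAll(pages));        // [Hello world, Second page]
        System.out.println(extractAllReusing(pages)); // [Hello world, Hello worldSecond page]
    }
}
```

        With the real class, the same pattern would mean constructing the strategy inside the page loop, at the point where getTextFromPage(PdfPage, ITextExtractionStrategy) is called, rather than once before the loop.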
      • getTextFromPage

        public static java.lang.String getTextFromPage(PdfPage page)
        Extracts text from the specified page using the default strategy. Note: the default strategy is subject to change. If a specific strategy is important, use getTextFromPage(PdfPage, ITextExtractionStrategy) instead.
        Parameters:
        page - the page to extract text from
        Returns:
        the extracted text