Class LocationTextExtractionStrategy

  • All Implemented Interfaces:
    RenderListener, TextExtractionStrategy

    public class LocationTextExtractionStrategy
    extends Object
    implements TextExtractionStrategy
    Development preview - this class (like all of the parser classes) is still under heavy development, and both its behavior and its interface are subject to change.
    A text extraction renderer that keeps track of the relative position of text on the page. The resulting text will be relatively consistent with the physical layout that most PDF files have on screen.
    This renderer keeps track of each piece of text's orientation and its distance (both perpendicular and parallel) relative to the unit vector of that orientation. Text is ordered by orientation, then by perpendicular distance, then by parallel distance. Text with the same perpendicular distance but a different parallel distance is treated as being on the same line.
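The ordering rule described above can be illustrated with a small standalone sketch. It does not use iText: the `Chunk` class, its integer `orientation` and `perpendicular` fields, and the `extract` method are hypothetical simplifications introduced only for this example. It also assumes that smaller values sort first and, for simplicity, always puts one space between chunks on the same line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LayoutOrderSketch {

    /** Hypothetical, simplified stand-in for a located piece of text (not an iText class). */
    static final class Chunk {
        final String text;
        final int orientation;   // writing direction, reduced to an integer key
        final int perpendicular; // baseline position, measured across the writing direction
        final float parallel;    // start position, measured along the writing direction

        Chunk(String text, int orientation, int perpendicular, float parallel) {
            this.text = text;
            this.orientation = orientation;
            this.perpendicular = perpendicular;
            this.parallel = parallel;
        }
    }

    /** Orders chunks by orientation, then perpendicular, then parallel distance, and joins them. */
    static String extract(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingInt((Chunk c) -> c.orientation)
                .thenComparingInt(c -> c.perpendicular)
                .thenComparingDouble(c -> c.parallel));

        StringBuilder out = new StringBuilder();
        Chunk previous = null;
        for (Chunk c : sorted) {
            if (previous != null) {
                // Same orientation and same perpendicular distance means same line.
                boolean sameLine = previous.orientation == c.orientation
                        && previous.perpendicular == c.perpendicular;
                out.append(sameLine ? ' ' : '\n');
            }
            out.append(c.text);
            previous = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Chunks arrive in content-stream order, which need not match reading order.
        List<Chunk> chunks = Arrays.asList(
                new Chunk("world", 0, 10, 50f),
                new Chunk("Second", 0, 20, 0f),
                new Chunk("Hello", 0, 10, 0f));
        System.out.println(extract(chunks));
        // prints:
        // Hello world
        // Second
    }
}
```

The point of the sketch is only the three-level sort key and the same-line test. How the real renderer derives these values from the text render information is not shown here.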
    This renderer also uses a simple strategy based on the font metrics to determine whether a blank space should be inserted into the output.
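A font-metric-based space heuristic of this kind might look like the following standalone sketch. The thresholds are assumptions chosen for illustration: a gap wider than half the width of a space character, or a backward jump of more than one full space width. They are not confirmed by this page, and `needsSpace` and `joinLine` are hypothetical names, not part of the iText API.

```java
class SpaceInsertionSketch {

    /**
     * Decides whether a space belongs between two chunks on the same line.
     * prevEnd and nextStart are positions along the writing direction;
     * spaceWidth is the width of a single space character in the current font.
     */
    static boolean needsSpace(float prevEnd, float nextStart, float spaceWidth) {
        float gap = nextStart - prevEnd;
        // Assumed thresholds: a large backward jump or a gap wider than half a space.
        return gap < -spaceWidth || gap > spaceWidth / 2.0f;
    }

    /** Joins chunks that are already sorted along one line, inserting spaces where needed. */
    static String joinLine(String[] texts, float[] starts, float[] ends, float spaceWidth) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < texts.length; i++) {
            if (i > 0 && needsSpace(ends[i - 1], starts[i], spaceWidth)) {
                line.append(' ');
            }
            line.append(texts[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] texts = {"Hel", "lo", "world"};
        float[] starts = {0f, 15f, 30f};
        float[] ends = {15f, 25f, 55f};
        // "Hel" and "lo" touch (gap 0), while "lo" and "world" are 5 units apart.
        System.out.println(joinLine(texts, starts, ends, 4f)); // prints "Hello world"
    }
}
```

Because a PDF often draws a single word as several adjacent chunks, a check like this keeps such fragments together while still separating words that are merely positioned apart without an explicit space character.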
    Since:
    5.0.2
    • Constructor Detail

      • LocationTextExtractionStrategy

        public LocationTextExtractionStrategy()
        Creates a new text extraction renderer.
      • LocationTextExtractionStrategy

        public LocationTextExtractionStrategy(LocationTextExtractionStrategy.TextChunkLocationStrategy strat)
        Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
        Parameters:
        strat - the custom strategy