Class LocationTextExtractionStrategy

    • Field Detail

      • DUMP_STATE

        private static boolean DUMP_STATE
        set to true for debugging
      • locationalResult

        private final java.util.List<TextChunk> locationalResult
        a summary of all found text
      • useActualText

        private boolean useActualText
      • rightToLeftRunDirection

        private boolean rightToLeftRunDirection
    • Constructor Detail

      • LocationTextExtractionStrategy

        public LocationTextExtractionStrategy()
        Creates a new text extraction renderer.
      • LocationTextExtractionStrategy

        public LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat)
        Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
        Parameters:
        strat - the custom strategy
    • Method Detail

      • setUseActualText

        public LocationTextExtractionStrategy setUseActualText(boolean useActualText)
        Changes the behavior of text extraction so that, if the parameter is set to true, the /ActualText marked content property will be used instead of the raw decoded bytes. Beware: the logic is not stable yet.
        Parameters:
        useActualText - true to use /ActualText, false otherwise
        Returns:
        this object
      • setRightToLeftRunDirection

        public LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection)
        Sets whether text flows from left to right or from right to left. Call this method with a true argument when extracting Arabic, Hebrew or other text with a right-to-left writing direction.
        Parameters:
        rightToLeftRunDirection - value specifying whether the direction should be right to left
        Returns:
        this object
      • isUseActualText

        public boolean isUseActualText()
        Gets the value of the property that determines whether /ActualText will be used when extracting the text.
        Returns:
        true if /ActualText value is used, false otherwise
      • eventOccurred

        public void eventOccurred(IEventData data,
                                  EventType type)
        Description copied from interface: IEventListener
        Called when some event occurs during parsing a content stream.
        Specified by:
        eventOccurred in interface IEventListener
        Parameters:
        data - Combines the data required for processing corresponding event type.
        type - Event type.
      • getSupportedEvents

        public java.util.Set<EventType> getSupportedEvents()
        Description copied from interface: IEventListener
        Provides the set of event types this listener supports. Returns null if all possible event types are supported.
        Specified by:
        getSupportedEvents in interface IEventListener
        Returns:
        Set of event types supported by this listener or null if all possible event types are supported.
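The null convention described above can be illustrated with a minimal sketch of the check a caller might perform before delivering an event. This is not the library's dispatch code: `SupportedEventsSketch`, `SketchEventType` and `shouldDispatch` are hypothetical names introduced only for this illustration.

```java
import java.util.Set;

// Hypothetical illustration of the "null means all event types" convention.
final class SupportedEventsSketch {
    private SupportedEventsSketch() {}

    // Hypothetical stand-in for the library's EventType enum.
    enum SketchEventType { RENDER_TEXT, RENDER_IMAGE, RENDER_PATH }

    /**
     * Returns true if a listener that reports the given supported set
     * should receive an event of the given type.
     */
    static boolean shouldDispatch(Set<SketchEventType> supported, SketchEventType type) {
        // A null set means no filtering: every event type is accepted.
        return supported == null || supported.contains(type);
    }
}
```

With this check, a listener that returns null receives every event, while a listener that returns a set receives only the types contained in it.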
      • isChunkAtWordBoundary

        protected boolean isChunkAtWordBoundary(TextChunk chunk,
                                                TextChunk previousChunk)
        Determines if a space character should be inserted between a previous chunk and the current chunk. This method is exposed as a callback so subclasses can fine-tune the algorithm for determining whether a space should be inserted. By default, this method will insert a space if there is a gap of more than half the font space character width between the end of the previous chunk and the beginning of the current chunk. It will also indicate that a space is needed if the starting point of the new chunk appears *before* the end of the previous chunk (i.e. overlapping text).
        Parameters:
        chunk - the new chunk being evaluated
        previousChunk - the chunk that appeared immediately before the current chunk
        Returns:
        true if the two chunks represent different words (i.e. should have a space between them). False otherwise.
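The default rule described above can be sketched in one dimension, treating chunk positions as distances along the text baseline. This is a simplified illustration of the documented behavior, not the library's implementation: `WordBoundarySketch`, `needsSpace` and its three parameters are hypothetical, and the real method works on TextChunk objects rather than raw coordinates.

```java
// Hypothetical 1-D illustration of the documented word-boundary rule.
final class WordBoundarySketch {
    private WordBoundarySketch() {}

    /**
     * @param previousEnd  position along the baseline where the previous chunk ends
     * @param currentStart position along the baseline where the current chunk starts
     * @param spaceWidth   width of the font's space character, in the same units
     * @return true if a space should be inserted between the two chunks
     */
    static boolean needsSpace(float previousEnd, float currentStart, float spaceWidth) {
        float gap = currentStart - previousEnd;
        // Overlapping text: the new chunk starts before the previous one ends.
        if (gap < 0) {
            return true;
        }
        // Otherwise a space is needed only if the gap exceeds half a space width.
        return gap > spaceWidth / 2.0f;
    }
}
```

A subclass that wants a different threshold would override isChunkAtWordBoundary and apply its own comparison in place of the half-width test.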
      • startsWithSpace

        private boolean startsWithSpace(java.lang.String str)
        Checks whether the string starts with a space character. Returns false if the string is empty or starts with a non-space character.
        Parameters:
        str - the string to be checked
        Returns:
        true if the string starts with a space character, false if the string is empty or starts with a non-space character
      • endsWithSpace

        private boolean endsWithSpace(java.lang.String str)
        Checks whether the string ends with a space character. Returns false if the string is empty or ends with a non-space character.
        Parameters:
        str - the string to be checked
        Returns:
        true if the string ends with a space character, false if the string is empty or ends with a non-space character
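The contract of the two space checks above can be sketched as follows. Because the actual methods are private, this is only an illustration of the documented behavior: `SpaceCheckSketch` is a hypothetical class, and the sketch assumes a non-null argument and treats only the plain space character ' ' as a space.

```java
// Hypothetical illustration of the documented startsWithSpace/endsWithSpace contract.
final class SpaceCheckSketch {
    private SpaceCheckSketch() {}

    /** Returns true only if str is non-empty and its first character is a space. */
    static boolean startsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(0) == ' ';
    }

    /** Returns true only if str is non-empty and its last character is a space. */
    static boolean endsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(str.length() - 1) == ' ';
    }
}
```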
      • dumpState

        private void dumpState()
        Used for debugging only
      • findLastTagWithActualText

        private CanvasTag findLastTagWithActualText(java.util.List<CanvasTag> canvasTagHierarchy)
      • sortWithMarks

        private void sortWithMarks(java.util.List<TextChunk> textChunks)