Class LocationTextExtractionStrategy
- java.lang.Object
  - com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy
- All Implemented Interfaces:
IEventListener, ITextExtractionStrategy
public class LocationTextExtractionStrategy extends java.lang.Object implements ITextExtractionStrategy
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class):
static interface LocationTextExtractionStrategy.ITextChunkLocationStrategy
private static class LocationTextExtractionStrategy.ITextChunkLocationStrategyImpl
private static class LocationTextExtractionStrategy.TextChunkMarks
-
Field Summary
Fields (Modifier and Type, Field, Description):
private static boolean DUMP_STATE - set to true for debugging
private TextRenderInfo lastTextRenderInfo
private java.util.List<TextChunk> locationalResult - a summary of all found text
private boolean rightToLeftRunDirection
private LocationTextExtractionStrategy.ITextChunkLocationStrategy tclStrat
private boolean useActualText
-
Constructor Summary
Constructors (Constructor, Description):
LocationTextExtractionStrategy() - Creates a new text extraction renderer.
LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat) - Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
-
Method Summary
Methods (Modifier and Type, Method, Description):
private void dumpState() - Used for debugging only.
private boolean endsWithSpace(java.lang.String str) - Checks if the string ends with a space character, false if the string is empty or ends with a non-space character.
void eventOccurred(IEventData data, EventType type) - Called when some event occurs during parsing a content stream.
private CanvasTag findLastTagWithActualText(java.util.List<CanvasTag> canvasTagHierarchy)
java.lang.String getResultantText() - Returns the text that has been processed so far.
java.util.Set<EventType> getSupportedEvents() - Provides the set of event types this listener supports.
protected boolean isChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk) - Determines if a space character should be inserted between a previous chunk and the current chunk.
boolean isUseActualText() - Gets the value of the property which determines if /ActualText will be used when extracting the text.
LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection) - Sets if text flows from left to right or from right to left.
LocationTextExtractionStrategy setUseActualText(boolean useActualText) - Changes the behavior of text extraction so that if the parameter is set to true, /ActualText marked content property will be used instead of raw decoded bytes.
private void sortWithMarks(java.util.List<TextChunk> textChunks)
private boolean startsWithSpace(java.lang.String str) - Checks if the string starts with a space character, false if the string is empty or starts with a non-space character.
-
-
-
Field Detail
-
DUMP_STATE
private static boolean DUMP_STATE
set to true for debugging
-
locationalResult
private final java.util.List<TextChunk> locationalResult
a summary of all found text
-
tclStrat
private final LocationTextExtractionStrategy.ITextChunkLocationStrategy tclStrat
-
useActualText
private boolean useActualText
-
rightToLeftRunDirection
private boolean rightToLeftRunDirection
-
lastTextRenderInfo
private TextRenderInfo lastTextRenderInfo
-
-
Constructor Detail
-
LocationTextExtractionStrategy
public LocationTextExtractionStrategy()
Creates a new text extraction renderer.
-
LocationTextExtractionStrategy
public LocationTextExtractionStrategy(LocationTextExtractionStrategy.ITextChunkLocationStrategy strat)
Creates a new text extraction renderer, with a custom strategy for creating new TextChunkLocation objects based on the input of the TextRenderInfo.
Parameters:
strat - the custom strategy
-
-
Method Detail
-
setUseActualText
public LocationTextExtractionStrategy setUseActualText(boolean useActualText)
Changes the behavior of text extraction so that if the parameter is set to true, /ActualText marked content property will be used instead of raw decoded bytes. Beware: the logic is not stable yet.
Parameters:
useActualText - true to use /ActualText, false otherwise
Returns:
this object
-
setRightToLeftRunDirection
public LocationTextExtractionStrategy setRightToLeftRunDirection(boolean rightToLeftRunDirection)
Sets whether text flows from left to right or from right to left. Call this method with a true argument when extracting Arabic, Hebrew or other text with a right-to-left writing direction.
Parameters:
rightToLeftRunDirection - value specifying whether the direction should be right to left
Returns:
this object
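Because both setters return this object, calls can be chained. The following is a minimal sketch of that pattern only: FluentStrategySketch is a hypothetical stand-in written for illustration, not the iText class, and its isRightToLeftRunDirection() getter is an assumption added so the result can be inspected.

```java
// Hypothetical stand-in for illustration only; this is NOT the iText class.
// It mirrors the two documented setters, each of which returns "this object".
class FluentStrategySketch {
    private boolean useActualText;
    private boolean rightToLeftRunDirection;

    FluentStrategySketch setUseActualText(boolean useActualText) {
        this.useActualText = useActualText;
        return this; // returning this is what makes chaining possible
    }

    FluentStrategySketch setRightToLeftRunDirection(boolean rightToLeftRunDirection) {
        this.rightToLeftRunDirection = rightToLeftRunDirection;
        return this;
    }

    boolean isUseActualText() {
        return useActualText;
    }

    // Assumed getter, present here only so the sketch can be checked.
    boolean isRightToLeftRunDirection() {
        return rightToLeftRunDirection;
    }

    // Configures an instance for right-to-left text in one chained expression.
    static FluentStrategySketch forRightToLeftText() {
        return new FluentStrategySketch()
                .setUseActualText(true)
                .setRightToLeftRunDirection(true);
    }
}
```

With the real class, the same chaining style would presumably apply directly to a new LocationTextExtractionStrategy instance.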
-
isUseActualText
public boolean isUseActualText()
Gets the value of the property which determines if /ActualText will be used when extracting the text.
Returns:
true if /ActualText value is used, false otherwise
-
eventOccurred
public void eventOccurred(IEventData data, EventType type)
Description copied from interface: IEventListener
Called when some event occurs during parsing a content stream.
Specified by:
eventOccurred in interface IEventListener
Parameters:
data - Combines the data required for processing corresponding event type.
type - Event type.
-
getSupportedEvents
public java.util.Set<EventType> getSupportedEvents()
Description copied from interface: IEventListener
Provides the set of event types this listener supports. Returns null if all possible event types are supported.
Specified by:
getSupportedEvents in interface IEventListener
Returns:
Set of event types supported by this listener or null if all possible event types are supported.
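The null-means-everything contract is easy to get wrong on the caller side. The sketch below shows one way a dispatcher might interpret the returned set. SketchEventType and shouldDeliver are hypothetical names introduced for illustration; they are not part of iText, and the enum constants are assumptions rather than the real EventType values.

```java
import java.util.Set;

class SupportedEventsSketch {
    // Hypothetical stand-in for EventType; constant names are illustrative assumptions.
    enum SketchEventType { TEXT, IMAGE, PATH }

    // A null set is treated as "all event types are supported",
    // matching the contract described for getSupportedEvents().
    static boolean shouldDeliver(Set<SketchEventType> supported, SketchEventType type) {
        return supported == null || supported.contains(type);
    }
}
```

Note that an empty set and null mean opposite things under this contract: an empty set accepts nothing, while null accepts everything.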
-
getResultantText
public java.lang.String getResultantText()
Description copied from interface: ITextExtractionStrategy
Returns the text that has been processed so far.
Specified by:
getResultantText in interface ITextExtractionStrategy
Returns:
String instance with the current resultant text
-
isChunkAtWordBoundary
protected boolean isChunkAtWordBoundary(TextChunk chunk, TextChunk previousChunk)
Determines if a space character should be inserted between a previous chunk and the current chunk. This method is exposed as a callback so subclasses can fine-tune the algorithm for determining whether a space should be inserted or not. By default, this method will insert a space if there is a gap of more than half the font space character width between the end of the previous chunk and the beginning of the current chunk. It will also indicate that a space is needed if the starting point of the new chunk appears *before* the end of the previous chunk (i.e. overlapping text).
Parameters:
chunk - the new chunk being evaluated
previousChunk - the chunk that appeared immediately before the current chunk
Returns:
true if the two chunks represent different words (i.e. should have a space between them). False otherwise.
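The default rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the library's implementation: chunks are reduced to one-dimensional positions along a single text line, and WordBoundarySketch, needsSpace and its parameters are hypothetical names. The real method works on TextChunk objects and may apply different thresholds, particularly for overlapping text.

```java
class WordBoundarySketch {
    // previousEnd:  position where the previous chunk ends (assumed 1-D coordinate)
    // currentStart: position where the current chunk starts
    // spaceWidth:   width of the font's space character, in the same units
    static boolean needsSpace(float previousEnd, float currentStart, float spaceWidth) {
        float gap = currentStart - previousEnd;
        boolean wideGap = gap > spaceWidth / 2.0f; // more than half a space width apart
        boolean overlaps = gap < 0;                // new chunk starts before the previous one ends
        return wideGap || overlaps;
    }
}
```

A subclass that overrides isChunkAtWordBoundary could swap in a different threshold, for example to be stricter with tightly kerned fonts.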
-
startsWithSpace
private boolean startsWithSpace(java.lang.String str)
Checks if the string starts with a space character, false if the string is empty or starts with a non-space character.
Parameters:
str - the string to be checked
Returns:
true if the string starts with a space character, false if the string is empty or starts with a non-space character
-
endsWithSpace
private boolean endsWithSpace(java.lang.String str)
Checks if the string ends with a space character, false if the string is empty or ends with a non-space character.
Parameters:
str - the string to be checked
Returns:
true if the string ends with a space character, false if the string is empty or ends with a non-space character
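The documented behavior of startsWithSpace and endsWithSpace can be reproduced as follows. Both methods are private in the real class, so this is a sketch written from the descriptions above rather than the library source. SpaceCheckSketch is a hypothetical class name, and treating "space character" as exactly ' ' (U+0020) is an assumption.

```java
class SpaceCheckSketch {
    // True only when the string is non-empty and its first character is a space.
    static boolean startsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(0) == ' ';
    }

    // True only when the string is non-empty and its last character is a space.
    static boolean endsWithSpace(String str) {
        return !str.isEmpty() && str.charAt(str.length() - 1) == ' ';
    }
}
```

Checks like these would let the extractor avoid inserting a second space where a chunk already carries one at its edge.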
-
dumpState
private void dumpState()
Used for debugging only
-
findLastTagWithActualText
private CanvasTag findLastTagWithActualText(java.util.List<CanvasTag> canvasTagHierarchy)
-
sortWithMarks
private void sortWithMarks(java.util.List<TextChunk> textChunks)
-
-