Package com.lowagie.text.pdf.parser
Class MarkedUpTextAssembler
- java.lang.Object
-
- com.lowagie.text.pdf.parser.MarkedUpTextAssembler
-
- All Implemented Interfaces:
TextAssembler
public class MarkedUpTextAssembler extends java.lang.Object implements TextAssembler
This assembler is called with a variety of marked-section content (possibly including the results of nested sections) and assembles it into order as best it can.
-
-
Field Summary
Fields (modifier and type, field, description)
private ParsedTextImpl inProgress
private int page
private java.util.List<TextAssemblyBuffer> partialWords
    as we get new content (final or not), we accumulate it until we reach the end of a parsing unit
private PdfReader reader
(package private) java.util.List<FinalText> result
    our result may be partially processed already, in which case we'll just add things to it, once ready.
private boolean usePdfMarkupElements
private int wordIdCounter
-
Constructor Summary
Constructors
MarkedUpTextAssembler(PdfReader reader)
MarkedUpTextAssembler(PdfReader reader, boolean usePdfMarkupElements)
-
Method Summary
Methods (modifier and type, method, description)
private void clearAccumulator()
private FinalText concatenateResult(java.lang.String containingElementName)
FinalText endParsingContext(java.lang.String containingElementName)
protected PdfReader getReader()
    Getter.
java.lang.String getWordId()
    assembler can calculate an identifier for each word on a page, for use in markup.
void process(FinalText completed, java.lang.String contextName)
    Slot fully-assembled chunk into our result at the current location.
void process(ParsedText unassembled, java.lang.String contextName)
    Remember an unassembled chunk until we hit the end of this element, or we hit an assembled chunk, and need to pull things together.
void process(Word completed, java.lang.String contextName)
void renderText(FinalText finalText)
void renderText(ParsedTextImpl partialWord)
    Captures text using a simplified algorithm for inserting hard returns and spaces
void reset()
void setPage(int page)
-
-
-
Field Detail
-
result
java.util.List<FinalText> result
our result may be partially processed already, in which case we'll just add things to it, once ready.
-
reader
private PdfReader reader
-
inProgress
private ParsedTextImpl inProgress
-
page
private int page
-
wordIdCounter
private int wordIdCounter
-
usePdfMarkupElements
private boolean usePdfMarkupElements
-
partialWords
private java.util.List<TextAssemblyBuffer> partialWords
As we get new content (final or not), we accumulate it until we reach the end of a parsing unit. Each parsing unit may have a tag name that should wrap its content.
-
-
Method Detail
-
process
public void process(ParsedText unassembled, java.lang.String contextName)
Remember an unassembled chunk until we hit the end of this element, or we hit an assembled chunk, and need to pull things together.
- Specified by:
process in interface TextAssembler
- Parameters:
unassembled - chunk of text rendering instruction to contribute to final text
contextName - Name of the element context we are in. Null value if it's an Artifact.
-
process
public void process(FinalText completed, java.lang.String contextName)
Slot fully-assembled chunk into our result at the current location. If there are unassembled chunks waiting, assemble them first.
- Specified by:
process in interface TextAssembler
- Parameters:
completed - This is a chunk from a nested element
contextName - Name of the element context we are in. Null value if it's an Artifact.
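The ordering rule described by the two process overloads above (hold unassembled chunks, and assemble them before slotting in a completed one) can be illustrated with a minimal sketch. It does not use the library's types: `AssemblerSketch`, its method names, and the plain string concatenation are all hypothetical stand-ins chosen for illustration, and the real class may assemble chunks quite differently (for example, by inserting spaces or markup).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, NOT the library class: shows accumulate-then-flush ordering only. */
final class AssemblerSketch {
    // Raw chunks waiting to be assembled (plays the role of partialWords).
    private final List<String> pending = new ArrayList<>();
    // Assembled output, in document order (plays the role of result).
    private final List<String> result = new ArrayList<>();

    /** Remember an unassembled chunk until something forces assembly. */
    void processUnassembled(String chunk) {
        pending.add(chunk);
    }

    /** Assemble any waiting chunks first, then slot the completed chunk in after them. */
    void processCompleted(String completed) {
        flushPending();
        result.add(completed);
    }

    /** Join waiting chunks into one entry (naive concatenation is an assumption) and clear the buffer. */
    private void flushPending() {
        if (!pending.isEmpty()) {
            result.add(String.join("", pending));
            pending.clear();
        }
    }

    /** Number of chunks still waiting to be assembled. */
    int pendingCount() {
        return pending.size();
    }

    /** Defensive copy of the assembled output so far. */
    List<String> result() {
        return new ArrayList<>(result);
    }
}
```

In this sketch, calling `processUnassembled` twice and then `processCompleted` once yields two result entries: the joined pending chunks, followed by the completed chunk.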
-
process
public void process(Word completed, java.lang.String contextName)
- Specified by:
process in interface TextAssembler
- Parameters:
completed - process a complete chunk -- just add this subsection into the proper place.
contextName - Name of the element context we are in. Null value if it's an Artifact.
- See Also:
TextAssembler.process(Word, String)
-
clearAccumulator
private void clearAccumulator()
-
concatenateResult
private FinalText concatenateResult(java.lang.String containingElementName)
-
endParsingContext
public FinalText endParsingContext(java.lang.String containingElementName)
- Specified by:
endParsingContext in interface TextAssembler
- Parameters:
containingElementName - This is an element name to surround the extracted text
- Returns:
the final text for the set of fragments and fully parsed items we were passed during processing.
- See Also:
TextAssembler.endParsingContext(String)
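As a rough illustration of closing a parsing context, the sketch below concatenates the fragments collected so far and surrounds them with the containing element name. Everything here is an assumption made for the example: `ContextWrapSketch` and its methods are hypothetical, the XML-style tag syntax is a guess at what "surround" means, and treating a null name as "no wrapper" simply mirrors the note elsewhere on this page that a null context name denotes an Artifact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, NOT the library class: shows wrapping collected text in an element name. */
final class ContextWrapSketch {
    // Fragments collected since the current parsing context began.
    private final List<String> fragments = new ArrayList<>();

    /** Collect one fragment of text for the current context. */
    void add(String fragment) {
        fragments.add(fragment);
    }

    /**
     * Concatenate the collected fragments, wrap them in the given element name,
     * and clear the buffer so the next context starts empty.
     * A null name leaves the text unwrapped (assumption for this sketch).
     */
    String endContext(String containingElementName) {
        String body = String.join("", fragments);
        fragments.clear();
        if (containingElementName == null) {
            return body;
        }
        return "<" + containingElementName + ">" + body + "</" + containingElementName + ">";
    }
}
```

The buffer is cleared inside `endContext` so that a second call without new fragments returns an empty body rather than repeating the previous text.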
-
reset
public void reset()
- Specified by:
reset in interface TextAssembler
- See Also:
TextAssembler.reset()
-
renderText
public void renderText(FinalText finalText)
- Specified by:
renderText in interface TextAssembler
- Parameters:
finalText - process a complete chunk -- just add this subsection into the proper place.
-
renderText
public void renderText(ParsedTextImpl partialWord)
Captures text using a simplified algorithm for inserting hard returns and spaces.
- Specified by:
renderText in interface TextAssembler
- Parameters:
partialWord - process one of a number of raw pdf text chunks, with placement, font, etc.
- See Also:
GraphicsState, Matrix
-
getReader
protected PdfReader getReader()
Getter.
- Returns:
reader
-
setPage
public void setPage(int page)
- Specified by:
setPage in interface TextAssembler
- Parameters:
page - number of the page we are assembling
- See Also:
TextAssembler.setPage(int)
-
getWordId
public java.lang.String getWordId()
The assembler can calculate an identifier for each word on a page, for use in markup.
- Specified by:
getWordId in interface TextAssembler
- Returns:
the new unique id.
- See Also:
TextAssembler.getWordId()
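One simple way to produce a unique identifier per word is to combine the current page number with a running counter, which matches the page and wordIdCounter fields listed above. The sketch below is only an illustration under that assumption: `WordIdSketch`, `nextWordId`, and the "word" + page + "-" + counter format are hypothetical, and this page does not document the actual id format.

```java
/** Hypothetical stand-in, NOT the library class: shows a page-plus-counter word id scheme. */
final class WordIdSketch {
    private int page;          // page currently being assembled
    private int wordIdCounter; // running count of ids handed out so far

    /** Record the page number that subsequent ids should carry. */
    void setPage(int page) {
        this.page = page;
    }

    /** Return a fresh id and advance the counter; the format is an assumption for this sketch. */
    String nextWordId() {
        return "word" + page + "-" + wordIdCounter++;
    }
}
```

Because the counter only ever increases, two ids from the same instance never collide, even if the page number is set back to an earlier value.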
-
-