Package com.lowagie.text.pdf.parser
Class MarkedUpTextAssembler
java.lang.Object
com.lowagie.text.pdf.parser.MarkedUpTextAssembler
- All Implemented Interfaces:
TextAssembler
We get called on a variety of marked section content (perhaps including the results of nested sections), and
assemble it into order as best we can.
-
Field Summary
Fields (Modifier and Type, Field, Description):
private ParsedTextImpl inProgress
private int page
private List<TextAssemblyBuffer> partialWords - as we get new content (final or not), we accumulate it until we reach the end of a parsing unit
private PdfReader reader
result - our result may be partially processed already, in which case we'll just add things to it, once ready.
private boolean usePdfMarkupElements
private int wordIdCounter
-
Constructor Summary
Constructors:
MarkedUpTextAssembler(PdfReader reader)
MarkedUpTextAssembler(PdfReader reader, boolean usePdfMarkupElements)
-
Method Summary
Methods (Modifier and Type, Method, Description):
private void clearAccumulator()
private FinalText concatenateResult(String containingElementName)
endParsingContext(String containingElementName)
protected PdfReader getReader() - Getter.
getWordId() - The assembler can calculate an identifier for each word on a page, for use in markup.
void process(completed, contextName) - Slot a fully assembled chunk into our result at the current location.
void process(ParsedText unassembled, String contextName) - Remember an unassembled chunk until we hit the end of this element, or until we hit an assembled chunk and need to pull things together.
void process(completed, contextName)
void renderText(FinalText finalText)
void renderText(ParsedTextImpl partialWord) - Captures text using a simplified algorithm for inserting hard returns and spaces.
void reset()
void setPage(int page)
-
Field Details
-
result
Our result may be partially processed already, in which case we just add things to it once ready.
-
reader
-
inProgress
-
page
private int page
-
wordIdCounter
private int wordIdCounter
-
usePdfMarkupElements
private boolean usePdfMarkupElements
-
partialWords
As we get new content (final or not), we accumulate it until we reach the end of a parsing unit. Each parsing unit may have a tag name that should wrap its content.
-
-
Constructor Details
-
Method Details
-
process
Remember an unassembled chunk until we hit the end of this element, or until we hit an assembled chunk and need to pull things together.
- Specified by:
process in interface TextAssembler
- Parameters:
unassembled - chunk of text rendering instructions to contribute to the final text
contextName - name of the element context we are in; null if it is an Artifact
-
process
Slot a fully assembled chunk into our result at the current location. If there are unassembled chunks waiting, assemble them first.
- Specified by:
process in interface TextAssembler
- Parameters:
completed - a chunk from a nested element
contextName - name of the element context we are in; null if it is an Artifact
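The ordering rule described here (assemble any waiting fragments before slotting in a completed chunk) can be sketched as follows. This is a minimal illustration only, not the library's implementation: the class AccumulatorSketch and its methods processFragment, processCompleted, and flushPending are hypothetical names, plain strings stand in for ParsedText and FinalText, and joining fragments with no separator is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the assembler; plain strings replace ParsedText and FinalText. */
class AccumulatorSketch {
    final List<String> pending = new ArrayList<>(); // unassembled fragments, in arrival order
    final List<String> result = new ArrayList<>();  // assembled chunks, in document order

    /** Remember a raw fragment until something forces assembly. */
    void processFragment(String fragment) {
        pending.add(fragment);
    }

    /** Slot a finished chunk into the result, assembling waiting fragments first. */
    void processCompleted(String completed) {
        flushPending();
        result.add(completed);
    }

    /** Join the waiting fragments into one chunk and clear the accumulator. */
    void flushPending() {
        if (pending.isEmpty()) {
            return;
        }
        // Assumption: fragments are simply concatenated; real spacing rules are more involved.
        result.add(String.join("", pending));
        pending.clear();
    }

    public static void main(String[] args) {
        AccumulatorSketch sketch = new AccumulatorSketch();
        sketch.processFragment("Hel");
        sketch.processFragment("lo");
        sketch.processCompleted("<b>nested</b>");
        sketch.processFragment("!");
        System.out.println(sketch.result);  // [Hello, <b>nested</b>]
        System.out.println(sketch.pending); // [!]
    }
}
```

Flushing inside processCompleted keeps the result in reading order: text that arrived before the nested element ends up ahead of it, while the trailing fragment waits for the next flush.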
-
process
- Specified by:
process in interface TextAssembler
- Parameters:
completed - a complete chunk to process; this subsection is just added into the proper place
contextName - name of the element context we are in; null if it is an Artifact
-
clearAccumulator
private void clearAccumulator() -
concatenateResult
-
endParsingContext
- Specified by:
endParsingContext in interface TextAssembler
- Parameters:
containingElementName - an element name to surround the extracted text
- Returns:
the final text for the set of fragments and fully parsed items we were passed during processing
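As a rough illustration of surrounding the extracted text with the containing element name, the sketch below concatenates the collected parts and wraps them in a tag. It is not the library's code: ContextWrapSketch and endContext are hypothetical names, the angle-bracket tag syntax is an assumption, and so is the choice to leave the text unwrapped when the name is null.

```java
import java.util.Arrays;
import java.util.List;

class ContextWrapSketch {
    /**
     * Concatenate the parts and surround them with the element name.
     * Assumption: a null name leaves the text unwrapped.
     */
    static String endContext(String containingElementName, List<String> parts) {
        String body = String.join("", parts);
        if (containingElementName == null) {
            return body;
        }
        return "<" + containingElementName + ">" + body + "</" + containingElementName + ">";
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("Hello", " ", "world");
        System.out.println(endContext("P", parts));  // <P>Hello world</P>
        System.out.println(endContext(null, parts)); // Hello world
    }
}
```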
-
reset
public void reset()
- Specified by:
reset in interface TextAssembler
-
renderText
- Specified by:
renderText in interface TextAssembler
- Parameters:
finalText - a complete chunk to process; this subsection is just added into the proper place
-
renderText
Captures text using a simplified algorithm for inserting hard returns and spaces.
- Specified by:
renderText in interface TextAssembler
- Parameters:
partialWord - one of a number of raw PDF text chunks to process, with placement, font, etc.
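The page does not spell out the simplified algorithm, so the following is only one plausible shape for it, under stated assumptions: a hard return is inserted when the baseline changes, and a space when the horizontal gap to the previous chunk exceeds half a space width. The class SpacingSketch, its Chunk type, the LINE_TOLERANCE constant, and the half-space threshold are all hypothetical and are not taken from the library.

```java
import java.util.Arrays;
import java.util.List;

class SpacingSketch {
    /** Hypothetical positioned chunk: text plus start x, end x, and baseline y. */
    static final class Chunk {
        final String text;
        final float startX;
        final float endX;
        final float baseline;

        Chunk(String text, float startX, float endX, float baseline) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.baseline = baseline;
        }
    }

    /** Assumed tolerance for treating two baselines as the same line. */
    static final float LINE_TOLERANCE = 1.0f;

    /** Join chunks, inserting '\n' on a baseline change and ' ' on a wide horizontal gap. */
    static String assemble(List<Chunk> chunks, float spaceWidth) {
        StringBuilder out = new StringBuilder();
        Chunk previous = null;
        for (Chunk current : chunks) {
            if (previous != null) {
                if (Math.abs(current.baseline - previous.baseline) > LINE_TOLERANCE) {
                    out.append('\n'); // new line of text
                } else if (current.startX - previous.endX > spaceWidth / 2) {
                    out.append(' ');  // gap wide enough to count as a word break
                }
            }
            out.append(current.text);
            previous = current;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk("Hello", 0f, 30f, 700f),
                new Chunk("world", 35f, 65f, 700f),
                new Chunk("Next", 0f, 25f, 688f));
        System.out.println(assemble(chunks, 4f)); // prints "Hello world" then "Next" on a second line
    }
}
```

Chunks that touch (no gap) are concatenated directly, which is how fragments of a single word would be rejoined under these assumptions.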
-
getReader
Getter.
- Returns:
reader
-
setPage
public void setPage(int page)
- Specified by:
setPage in interface TextAssembler
- Parameters:
page - number of the page we are assembling
-
getWordId
The assembler can calculate an identifier for each word on a page, for use in markup.
- Specified by:
getWordId in interface TextAssembler
- Returns:
the new unique id
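One way such an identifier could be derived from the page and wordIdCounter fields is sketched below. The String return type and the "word<page>_<counter>" format are assumptions made for illustration, and WordIdSketch is a hypothetical class, not the library's.

```java
class WordIdSketch {
    private int page;          // page currently being assembled
    private int wordIdCounter; // incremented on every id handed out

    void setPage(int page) {
        this.page = page;
    }

    /** Assumed format: "word" + page + "_" + counter; the counter is never reset here. */
    String getWordId() {
        return "word" + page + "_" + wordIdCounter++;
    }

    public static void main(String[] args) {
        WordIdSketch ids = new WordIdSketch();
        ids.setPage(2);
        System.out.println(ids.getWordId()); // word2_0
        System.out.println(ids.getWordId()); // word2_1
    }
}
```

Because the counter only ever increases, ids stay unique even when the page changes; whether the real class resets the counter per page is not stated on this page.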
-