Class MarkedUpTextAssembler

java.lang.Object
com.lowagie.text.pdf.parser.MarkedUpTextAssembler
All Implemented Interfaces:
TextAssembler

public class MarkedUpTextAssembler extends Object implements TextAssembler
We get called on a variety of marked-section content (possibly including the results of nested sections) and assemble it into as coherent an order as we can.
  • Field Details

    • result

      List<FinalText> result
      Our result may already be partially processed, in which case we just add new content to it once that content is ready.
    • reader

      private PdfReader reader
    • inProgress

      private ParsedTextImpl inProgress
    • page

      private int page
    • wordIdCounter

      private int wordIdCounter
    • usePdfMarkupElements

      private boolean usePdfMarkupElements
    • partialWords

      private List<TextAssemblyBuffer> partialWords
      As we get new content (final or not), we accumulate it until we reach the end of a parsing unit.

      Each parsing unit may have a tag name that should wrap its content.
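
The accumulate-then-flush behaviour described for partialWords and result can be sketched roughly as follows. This is a minimal illustration only, not the library's implementation: the class AssemblySketch, its use of plain String values in place of ParsedText and FinalText, the space used to join chunks, and the handling of a null tag name are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the assembler; plain strings replace the real text types. */
final class AssemblySketch {
    // Fragments seen so far that have not yet been assembled (assumed analogue of partialWords).
    private final List<String> partialWords = new ArrayList<>();
    // Chunks that are already assembled, in document order (assumed analogue of result).
    private final List<String> result = new ArrayList<>();

    /** Remember a raw fragment until the current parsing unit ends. */
    void processFragment(String fragment) {
        partialWords.add(fragment);
    }

    /** Slot in an assembled chunk, assembling any waiting fragments first to keep the order. */
    void processCompleted(String completed) {
        flushPending();
        result.add(completed);
    }

    /** End the parsing unit and wrap its content in the tag name, if one is given. */
    String endParsingContext(String tagName) {
        flushPending();
        String body = String.join(" ", result);
        result.clear();
        // Assumption: a null tag name means the content is returned unwrapped.
        return tagName == null ? body : "<" + tagName + ">" + body + "</" + tagName + ">";
    }

    /** Join the waiting fragments into one chunk and move it to the result. */
    private void flushPending() {
        if (!partialWords.isEmpty()) {
            result.add(String.join("", partialWords));
            partialWords.clear();
        }
    }

    public static void main(String[] args) {
        AssemblySketch assembler = new AssemblySketch();
        assembler.processFragment("Hel");
        assembler.processFragment("lo");
        assembler.processCompleted("<b>world</b>");
        System.out.println(assembler.endParsingContext("p")); // prints <p>Hello <b>world</b></p>
    }
}
```

The point of flushing before adding a completed chunk is that fragments received earlier always end up ahead of a nested element's text received later.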

  • Constructor Details

    • MarkedUpTextAssembler

      MarkedUpTextAssembler(PdfReader reader)
    • MarkedUpTextAssembler

      MarkedUpTextAssembler(PdfReader reader, boolean usePdfMarkupElements)
  • Method Details

    • process

      public void process(ParsedText unassembled, String contextName)
      Remember an unassembled chunk until we reach the end of this element or encounter an assembled chunk, at which point we need to pull things together.
      Specified by:
      process in interface TextAssembler
      Parameters:
      unassembled - chunk from a text rendering instruction, to be contributed to the final text
      contextName - Name of the element context we are in. Null value if it's an Artifact.
    • process

      public void process(FinalText completed, String contextName)
      Slot a fully assembled chunk into our result at the current location. If there are unassembled chunks waiting, assemble them first.
      Specified by:
      process in interface TextAssembler
      Parameters:
      completed - This is a chunk from a nested element
      contextName - Name of the element context we are in. Null value if it's an Artifact.
    • process

      public void process(Word completed, String contextName)
      Specified by:
      process in interface TextAssembler
      Parameters:
      completed - a complete chunk; this subsection is simply added in the proper place.
      contextName - Name of the element context we are in. Null value if it's an Artifact.
    • clearAccumulator

      private void clearAccumulator()
    • concatenateResult

      private FinalText concatenateResult(String containingElementName)
    • endParsingContext

      public FinalText endParsingContext(String containingElementName)
      Specified by:
      endParsingContext in interface TextAssembler
      Parameters:
      containingElementName - This is an element name to surround the extracted text
      Returns:
      the final text for the set of fragments and fully parsed items we were passed during processing.
    • reset

      public void reset()
      Specified by:
      reset in interface TextAssembler
    • renderText

      public void renderText(FinalText finalText)
      Specified by:
      renderText in interface TextAssembler
      Parameters:
      finalText - a complete chunk; this subsection is simply added in the proper place.
    • renderText

      public void renderText(ParsedTextImpl partialWord)
      Captures text using a simplified algorithm for inserting hard returns and spaces.
      Specified by:
      renderText in interface TextAssembler
      Parameters:
      partialWord - one of a number of raw PDF text chunks, with placement, font, etc.
    • getReader

      protected PdfReader getReader()
      Getter.
      Returns:
      reader
    • setPage

      public void setPage(int page)
      Specified by:
      setPage in interface TextAssembler
      Parameters:
      page - number of the page we are assembling
    • getWordId

      public String getWordId()
      The assembler can calculate an identifier for each word on a page, for use in markup.
      Specified by:
      getWordId in interface TextAssembler
      Returns:
      the new unique id.
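
One way the page and wordIdCounter fields could combine into the identifier returned by getWordId is sketched below. The class WordIdSketch and the "word<page>-<counter>" format are hypothetical; this page does not specify the identifier format, nor whether the counter restarts when the page changes, so the sketch simply keeps counting.

```java
/** Hypothetical illustration of per-word identifiers; not the library's code. */
final class WordIdSketch {
    private int page;          // number of the page being assembled
    private int wordIdCounter; // incremented once per identifier handed out

    /** Record the page whose words are being assembled. */
    void setPage(int page) {
        this.page = page;
    }

    /** Return a new identifier; the format is an assumption made for this example. */
    String getWordId() {
        return "word" + page + "-" + wordIdCounter++;
    }

    public static void main(String[] args) {
        WordIdSketch ids = new WordIdSketch();
        ids.setPage(3);
        System.out.println(ids.getWordId()); // prints word3-0
        System.out.println(ids.getWordId()); // prints word3-1
    }
}
```

Including the page number in the identifier keeps ids distinct across pages even if a real implementation were to restart the counter per page.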