Interface IMarkupHandler

All Superinterfaces:
IAttributeSequenceHandler, ICDATASectionHandler, ICommentHandler, IDocTypeHandler, IDocumentHandler, IElementHandler, IProcessingInstructionHandler, ITextHandler, IXMLDeclarationHandler
All Known Implementing Classes:
AbstractChainedMarkupHandler, AbstractMarkupHandler, AttributeSelectionMarkingMarkupHandler, BlockSelectorMarkupHandler, DiscardMarkupHandler, DOMBuilderMarkupHandler, DuplicateMarkupHandler, HtmlMarkupHandler, MarkupEventProcessorHandler, MinimizeHtmlMarkupHandler, NodeSelectorMarkupHandler, OutputMarkupHandler, PrettyHtmlMarkupHandler, SimplifierMarkupHandler, TextOutputMarkupHandler, TraceBuilderMarkupHandler

Interface to be implemented by all Markup Handlers.

Markup handlers are the objects that receive the events produced during parsing and perform the operations users need. This interface is the main entry point for using AttoParser.

Markup handlers can be stateful, in which case a new instance of the markup handler class should be created for each parsing operation. Such implementations are not required to be thread-safe.

There is an abstract, basic, no-op implementation of this interface called AbstractMarkupHandler which can be used for easily creating new handlers by overriding only the relevant event handling methods.
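
As a rough illustration of the no-op base class pattern described above, the following sketch uses hypothetical stand-in types (MiniHandler, AbstractMiniHandler, ElementNameCollector) rather than the real AttoParser classes, and a hard-coded event sequence in place of a parser. It only assumes that events arrive as (buffer, offset, len) slices of the document; the actual interface declares many more events and parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the AttoParser classes.
class NoOpBaseSketch {

    // Simplified two-event contract (assumed for this sketch).
    interface MiniHandler {
        void handleOpenElement(char[] buffer, int offset, int len);
        void handleText(char[] buffer, int offset, int len);
    }

    // No-op base: every event is ignored unless a subclass overrides it.
    static abstract class AbstractMiniHandler implements MiniHandler {
        @Override public void handleOpenElement(char[] buffer, int offset, int len) { }
        @Override public void handleText(char[] buffer, int offset, int len) { }
    }

    // Stateful handler: it accumulates element names, so one instance serves one parse.
    static final class ElementNameCollector extends AbstractMiniHandler {
        private final List<String> names = new ArrayList<>();

        @Override
        public void handleOpenElement(char[] buffer, int offset, int len) {
            names.add(new String(buffer, offset, len));
        }

        List<String> names() {
            return names;
        }
    }

    // Plays the role of the parser: fires a fixed event sequence for "<p>Hi<b>!</b></p>".
    static List<String> collectNames() {
        char[] doc = "<p>Hi<b>!</b></p>".toCharArray();
        ElementNameCollector handler = new ElementNameCollector(); // fresh instance per operation
        handler.handleOpenElement(doc, 1, 1); // "p"
        handler.handleText(doc, 3, 2);        // "Hi", ignored by the inherited no-op
        handler.handleOpenElement(doc, 6, 1); // "b"
        handler.handleText(doc, 8, 1);        // "!", ignored by the inherited no-op
        return handler.names();
    }

    public static void main(String[] args) {
        System.out.println(collectNames());
    }
}
```

Because the collector keeps its results in a field, sharing one instance between parsing operations would mix their results; creating it inside collectNames() avoids that.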

Note also there is a simplified version of this interface that reduces the number of events and also simplifies the operations on textual buffers, called ISimpleMarkupHandler, which can be easily used with the convenience ad-hoc parser class SimpleMarkupParser.

AttoParser provides several useful implementations of this interface out-of-the-box:

Markup output

OutputMarkupHandler
For writing the received events to a specified Writer object, without any loss of information (case, white space, etc.). This handler is useful for performing filtering/transformation operations on the parsed markup: placed at the end of the handler chain, it outputs the final results of those operations.
TextOutputMarkupHandler
For writing the received events to a specified Writer object as mere text, ignoring all non-text events. This will effectively strip all markup elements, comments, DOCTYPEs, etc. from the original markup.
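
The idea of ending a handler chain with an output handler can be sketched as follows. All types here (MiniHandler, VerbatimOutput, TextOnlyOutput, UpperCaseText) are hypothetical stand-ins written for this example, not the AttoParser classes, and the event sequence is fired by hand instead of by a parser. The sketch assumes a transformation step that upper-cases text and forwards everything to whichever handler ends the chain.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Locale;

// Hypothetical stand-in types for illustration; these are NOT the AttoParser classes.
class OutputChainSketch {

    // Simplified contract: one event for markup structures, one for text.
    interface MiniHandler {
        void handleMarkup(char[] buffer, int offset, int len);
        void handleText(char[] buffer, int offset, int len);
    }

    // End of chain: writes every event verbatim to the Writer.
    static final class VerbatimOutput implements MiniHandler {
        private final Writer writer;
        VerbatimOutput(Writer writer) { this.writer = writer; }
        @Override public void handleMarkup(char[] buffer, int offset, int len) { write(writer, buffer, offset, len); }
        @Override public void handleText(char[] buffer, int offset, int len) { write(writer, buffer, offset, len); }
    }

    // End of chain: writes text events only, which strips the markup.
    static final class TextOnlyOutput implements MiniHandler {
        private final Writer writer;
        TextOnlyOutput(Writer writer) { this.writer = writer; }
        @Override public void handleMarkup(char[] buffer, int offset, int len) { /* ignored */ }
        @Override public void handleText(char[] buffer, int offset, int len) { write(writer, buffer, offset, len); }
    }

    // Middle of chain: upper-cases text, passes markup through unchanged.
    static final class UpperCaseText implements MiniHandler {
        private final MiniHandler next;
        UpperCaseText(MiniHandler next) { this.next = next; }
        @Override public void handleMarkup(char[] buffer, int offset, int len) { next.handleMarkup(buffer, offset, len); }
        @Override public void handleText(char[] buffer, int offset, int len) {
            char[] upper = new String(buffer, offset, len).toUpperCase(Locale.ROOT).toCharArray();
            next.handleText(upper, 0, upper.length);
        }
    }

    private static void write(Writer writer, char[] buffer, int offset, int len) {
        try {
            writer.write(buffer, offset, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of the parser for the document "<p>Hi <b>there</b></p>".
    static String run(boolean textOnly) {
        char[] doc = "<p>Hi <b>there</b></p>".toCharArray();
        StringWriter out = new StringWriter();
        MiniHandler end = textOnly ? new TextOnlyOutput(out) : new VerbatimOutput(out);
        MiniHandler chain = new UpperCaseText(end);
        chain.handleMarkup(doc, 0, 3);  // "<p>"
        chain.handleText(doc, 3, 3);    // "Hi "
        chain.handleMarkup(doc, 6, 3);  // "<b>"
        chain.handleText(doc, 9, 5);    // "there"
        chain.handleMarkup(doc, 14, 4); // "</b>"
        chain.handleMarkup(doc, 18, 4); // "</p>"
        return out.toString();
    }
}
```

With the verbatim output at the end, run(false) yields the transformed markup; with the text-only output, run(true) yields just the transformed text.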

Format conversion and transformation operations

DOMBuilderMarkupHandler
For building a DOM tree as a result of parsing a document. This DOM tree will be created using the classes at the org.attoparser.dom package. This handler can be more easily applied by using the convenience ad-hoc parser class DOMMarkupParser.
SimplifierMarkupHandler
For transforming the produced markup parsing events into a much simpler format, removing much of the complexity of these parsing events and allowing users to create their handlers by means of the ISimpleMarkupHandler interface. Note this handler can be more easily applied by using the convenience ad-hoc parser class SimpleMarkupParser.
MinimizeHtmlMarkupHandler
For minimizing (compacting) HTML markup: remove excess white space, unquote attributes, etc.

Fragment selection and event management

BlockSelectorMarkupHandler
For applying block selection (element + subtree) on the parsed markup, based on a set of specified markup selectors (see org.attoparser.select).
NodeSelectorMarkupHandler
For applying node selection (element, no subtree) on the parsed markup, based on a set of specified markup selectors (see org.attoparser.select).
AttributeSelectionMarkingMarkupHandler
For synthetically adding an attribute (with the specified name) to markup elements, indicating which of the specified selectors (block or node) match those elements.
DuplicateMarkupHandler
For duplicating parsing events, sending each of them to two different implementations of IMarkupHandler.
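
Event duplication, as described for the last handler above, amounts to forwarding each event to two delegates in turn. The sketch below shows that pattern with hypothetical stand-in types (MiniHandler, Duplicating, TextLength, TextCapture) and a single simplified event; it is not the DuplicateMarkupHandler implementation, and the events are fired by hand rather than by a parser.

```java
// Hypothetical stand-in types for illustration; these are NOT the AttoParser classes.
class DuplicateSketch {

    // Simplified single-event contract (assumed for this sketch).
    interface MiniHandler {
        void handleText(char[] buffer, int offset, int len);
    }

    // Forwards every event to both delegates, first then second.
    static final class Duplicating implements MiniHandler {
        private final MiniHandler first;
        private final MiniHandler second;

        Duplicating(MiniHandler first, MiniHandler second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void handleText(char[] buffer, int offset, int len) {
            first.handleText(buffer, offset, len);
            second.handleText(buffer, offset, len);
        }
    }

    // Delegate 1: counts the characters of text seen.
    static final class TextLength implements MiniHandler {
        int total;
        @Override public void handleText(char[] buffer, int offset, int len) { total += len; }
    }

    // Delegate 2: keeps a copy of the text seen.
    static final class TextCapture implements MiniHandler {
        final StringBuilder text = new StringBuilder();
        @Override public void handleText(char[] buffer, int offset, int len) { text.append(buffer, offset, len); }
    }

    // Plays the role of the parser: fires two text events over "one two".
    static String demo() {
        char[] doc = "one two".toCharArray();
        TextLength counter = new TextLength();
        TextCapture capture = new TextCapture();
        MiniHandler both = new Duplicating(counter, capture);
        both.handleText(doc, 0, 4); // "one "
        both.handleText(doc, 4, 3); // "two"
        return counter.total + ":" + capture.text;
    }
}
```

Both delegates observe the same events in the same order, so two independent operations can be driven by a single parsing pass.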

Testing and Debugging

PrettyHtmlMarkupHandler
For creating an HTML document visually explaining all the events that occurred during the parsing of a document: elements, attributes, auto-closing of elements, unmatched artifacts, etc.
TraceBuilderMarkupHandler
For building a trace of parsing events (a list of MarkupTraceEvent objects) detailing all the events launched during the parsing of a document.
Since:
2.0.0
  • Method Details

    • setParseConfiguration

      void setParseConfiguration(ParseConfiguration parseConfiguration)

      Sets the ParseConfiguration object that will be used during the parsing operation. This object will normally have been specified to the parser object during its instantiation or initialization.

      This method is always called by the parser before calling any other event handling method.

      Note that this method can be safely ignored by most implementations, as there are very few scenarios in which this kind of interaction would be considered relevant.

      Parameters:
      parseConfiguration - the configuration object.
    • setParseStatus

      void setParseStatus(ParseStatus status)

      Sets the ParseStatus object that will be used during the parsing operation. This object can be used for instructing the parser about specific low-level conditions that arise during event handling.

      This method is always called by the parser before calling any other event handling method.

      Note that this method can be safely ignored by most implementations, as there are very few and very specific scenarios in which this kind of interaction with the parser would be needed. It is therefore mainly for internal use.

      Parameters:
      status - the status object.
    • setParseSelection

      void setParseSelection(ParseSelection selection)

      Sets the ParseSelection object that represents the different levels of selectors (if any) that are currently active for the fired events.

      This method is always called by the parser before calling any other event handling method.

      Note that this method can be safely ignored by most implementations, as there are very few scenarios in which this kind of interaction would be considered relevant.

      Parameters:
      selection - the selection object.