Class TextOutputMarkupHandler

java.lang.Object
org.attoparser.AbstractMarkupHandler
org.attoparser.output.TextOutputMarkupHandler
All Implemented Interfaces:
IAttributeSequenceHandler, ICDATASectionHandler, ICommentHandler, IDocTypeHandler, IDocumentHandler, IElementHandler, IMarkupHandler, IProcessingInstructionHandler, ITextHandler, IXMLDeclarationHandler

public final class TextOutputMarkupHandler extends AbstractMarkupHandler

Implementation of IMarkupHandler that writes the received parsing events as text output, ignoring all events except Text events. As a result, this handler effectively strips away all markup tags (and other structures such as comments, CDATA sections, etc.).

Note that, as with most handlers, this class is not thread-safe. Also, instances of this class should not be reused across parsing operations.
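The filtering described above can be sketched without attoparser on the classpath. The following is a hypothetical, reduced model: MiniHandler and TextOnlyHandler are illustrative names, not attoparser types, and the event sequence is fed by hand rather than produced by a parser. It shows the pattern of starting from callbacks that do nothing and overriding only the text callback.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class TextOnlySketch {

    // Hypothetical, heavily reduced event interface (NOT attoparser's IMarkupHandler).
    // Every event defaults to a no-op.
    interface MiniHandler {
        default void handleOpenElement(String name) {}
        default void handleCloseElement(String name) {}
        default void handleComment(String content) {}
        default void handleText(char[] buffer, int offset, int len) {}
    }

    // Overrides only the text event; every other event falls through to the no-op default.
    static final class TextOnlyHandler implements MiniHandler {
        private final Writer writer;

        TextOnlyHandler(Writer writer) {
            this.writer = writer;
        }

        @Override
        public void handleText(char[] buffer, int offset, int len) {
            try {
                writer.write(buffer, offset, len);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Feeds a hand-written event sequence for "<p>Hi <!-- note --><b>there</b></p>".
    static String run() {
        StringWriter out = new StringWriter();
        MiniHandler h = new TextOnlyHandler(out);
        h.handleOpenElement("p");
        char[] t1 = "Hi ".toCharArray();
        h.handleText(t1, 0, t1.length);
        h.handleComment(" note ");   // ignored: comments produce no output
        h.handleOpenElement("b");
        char[] t2 = "there".toCharArray();
        h.handleText(t2, 0, t2.length);
        h.handleCloseElement("b");
        h.handleCloseElement("p");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Hi there"
    }
}
```

Only the two text events reach the writer; the element and comment events are silently dropped, which is the stripping effect described above.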

Sample usage:


   // 'parser' is assumed to be an IMarkupParser; 'document' holds the markup to parse
   final Writer writer = new StringWriter();
   final IMarkupHandler handler = new TextOutputMarkupHandler(writer);
   parser.parse(document, handler);
   return writer.toString();
 
Since:
2.0.0
  • Field Details

    • writer

      private final Writer writer
  • Constructor Details

    • TextOutputMarkupHandler

      public TextOutputMarkupHandler(Writer writer)

      Creates a new instance of this handler.

      Parameters:
      writer - the writer to which output will be written.
  • Method Details

    • handleText

      public void handleText(char[] buffer, int offset, int len, int line, int col) throws ParseException
      Description copied from interface: ITextHandler

      Called when a text artifact is found.

      A sequence of chars is considered to be text when no structures of any kind are contained inside it. In markup parsers, for example, this means that the sequence contains no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc.

      Text sequences might include any number of new line and/or control characters.
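To illustrate what counts as a text artifact, the following sketch splits a string into the text runs that lie between structures. It is a deliberately naive, hypothetical helper (textRuns is not part of attoparser) that treats every <...> span as a structure; it is not how attoparser actually tokenizes markup.

```java
import java.util.ArrayList;
import java.util.List;

public class TextRunsSketch {

    // Hypothetical, deliberately naive splitter: treats every "<...>" span as a structure
    // (element, comment, DOCTYPE, ...) and everything between structures as a text run.
    // It does not handle '>' inside attribute values, comments containing '>', CDATA, etc.
    static List<String> textRuns(String markup) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inStructure = false;
        for (char c : markup.toCharArray()) {
            if (!inStructure && c == '<') {
                // A structure starts: flush the text collected so far, if any.
                if (current.length() > 0) {
                    runs.add(current.toString());
                    current.setLength(0);
                }
                inStructure = true;
            } else if (inStructure && c == '>') {
                inStructure = false;
            } else if (!inStructure) {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Text runs may contain newlines and other control characters.
        System.out.println(textRuns("<p>Hello\n<b>world</b>!</p>"));
    }
}
```

For the input above the runs are "Hello\n", "world" and "!"; note that the first run keeps its newline.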

      Text artifacts are reported using the document buffer directly, and this buffer should not be considered immutable, so reported texts should be copied if they need to be stored (either by copying len chars from the buffer char[] starting at offset, or by creating a String from that same range).

      Implementations of this handler should never modify the document buffer.
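As a sketch of the copying advice above, the hypothetical TextCollector below (not an attoparser class) stores each reported text either as a String built from the reported range or as an explicit char[] copy. It then overwrites the source buffer to show that the stored copies are unaffected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyTextSketch {

    // Hypothetical collector: stores every reported text artifact safely by copying it,
    // because the buffer may change after the callback returns.
    static final class TextCollector {
        final List<String> texts = new ArrayList<>();
        final List<char[]> rawCopies = new ArrayList<>();

        void handleText(char[] buffer, int offset, int len) {
            // Option 1: a String built from the same range copies the chars.
            texts.add(new String(buffer, offset, len));
            // Option 2: an explicit copy of len chars starting at offset.
            rawCopies.add(Arrays.copyOfRange(buffer, offset, offset + len));
            // Never write to 'buffer' here: it is the parser's document buffer.
        }
    }

    static String demo() {
        char[] buffer = "xxHelloyy".toCharArray();
        TextCollector collector = new TextCollector();
        collector.handleText(buffer, 2, 5);   // reports "Hello"
        Arrays.fill(buffer, '#');             // simulate the buffer being overwritten later
        return collector.texts.get(0) + "/" + new String(collector.rawCopies.get(0));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello/Hello"
    }
}
```

Storing the buffer reference itself (together with offset and len) would instead expose the later overwrite.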

      Specified by:
      handleText in interface ITextHandler
      Overrides:
      handleText in class AbstractMarkupHandler
      Parameters:
      buffer - the document buffer (not copied)
      offset - the offset (position in buffer) where the text artifact starts.
      len - the length (in chars) of the text artifact, starting at offset.
      line - the line in the original document where this text artifact starts.
      col - the column in the original document where this text artifact starts.
      Throws:
      ParseException - if any exceptions occur during handling.
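A minimal sketch of how this method could satisfy the contract above: write exactly len chars starting at offset to the configured writer, leave the buffer untouched, and wrap I/O failures. This is an assumption about the implementation, not the library's source. ParseException here is a local stand-in for org.attoparser.ParseException so that the sketch compiles on its own, and the line/column values in the demo assume 1-based numbering.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class HandleTextSketch {

    // Stand-in for org.attoparser.ParseException so this sketch compiles on its own.
    static final class ParseException extends Exception {
        ParseException(Throwable cause) {
            super(cause);
        }
    }

    private final Writer writer;

    HandleTextSketch(Writer writer) {
        this.writer = writer;
    }

    // Assumed behaviour: copy exactly len chars starting at offset to the writer,
    // leaving the buffer untouched; line and col are not needed for plain text output.
    public void handleText(char[] buffer, int offset, int len, int line, int col)
            throws ParseException {
        try {
            writer.write(buffer, offset, len);
        } catch (IOException e) {
            // Wrap I/O failures in the handler's declared exception type.
            throw new ParseException(e);
        }
    }

    static String demo() throws ParseException {
        StringWriter out = new StringWriter();
        HandleTextSketch handler = new HandleTextSketch(out);
        char[] doc = "<p>Hello</p>".toCharArray();
        // "Hello" occupies chars 3..7; assuming 1-based numbering it starts at line 1, column 4.
        handler.handleText(doc, 3, 5, 1, 4);
        return out.toString();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(demo()); // prints "Hello"
    }
}
```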