Class TextOutputMarkupHandler
- All Implemented Interfaces:
IAttributeSequenceHandler, ICDATASectionHandler, ICommentHandler, IDocTypeHandler, IDocumentHandler, IElementHandler, IMarkupHandler, IProcessingInstructionHandler, ITextHandler, IXMLDeclarationHandler
Implementation of IMarkupHandler used for writing received parsing events as text output, ignoring all events except the text ones. This means the handler effectively strips away all markup tags, along with other structures such as comments, CDATA sections, etc.
Note that, as with most handlers, this class is not thread-safe. Also, instances of this class should not be reused across parsing operations.
Sample usage:
final Writer writer = new StringWriter();
final IMarkupHandler handler = new TextOutputMarkupHandler(writer);
parser.parse(document, handler);
return writer.toString();
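To illustrate what "ignoring all events except the Text ones" amounts to, here is a minimal, self-contained sketch. It does not use AttoParser: TextOnlySketch, handleOther and demo are hypothetical names invented for this illustration, and the events are fed by hand rather than by a parser. It assumes the handler simply forwards each reported text range to the Writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in, NOT the AttoParser class: it only mirrors the described behaviour.
final class TextOnlySketch {

    private final Writer writer;

    TextOnlySketch(final Writer writer) {
        this.writer = writer;
    }

    // Text event: write exactly len chars of the buffer, starting at offset.
    void handleText(final char[] buffer, final int offset, final int len,
                    final int line, final int col) throws IOException {
        this.writer.write(buffer, offset, len);
    }

    // Any non-text event (element, comment, CDATA...) is ignored in this sketch.
    void handleOther(final char[] buffer, final int offset, final int len) {
        // intentionally empty
    }

    // Feeds hand-made events for "<b>Hi</b> there" and returns the collected text.
    static String demo() {
        final char[] doc = "<b>Hi</b> there".toCharArray();
        final Writer out = new StringWriter();
        final TextOnlySketch handler = new TextOnlySketch(out);
        try {
            handler.handleOther(doc, 0, 3);        // <b>
            handler.handleText(doc, 3, 2, 1, 4);   // Hi
            handler.handleOther(doc, 5, 4);        // </b>
            handler.handleText(doc, 9, 6, 1, 10);  // " there"
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        System.out.println(demo()); // prints "Hi there"
    }
}
```

Because the sketch keeps a single Writer as state, it also suggests why an instance should be created per parsing operation: output from a second document would be appended to the same Writer.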
- Since:
- 2.0.0
-
Field Summary
Fields -
Constructor Summary
Constructors
TextOutputMarkupHandler(Writer writer)
- Creates a new instance of this handler.
-
Method Summary
void handleText(char[] buffer, int offset, int len, int line, int col)
- Called when a text artifact is found.
Methods inherited from class org.attoparser.AbstractMarkupHandler
handleAttribute, handleAutoCloseElementEnd, handleAutoCloseElementStart, handleAutoOpenElementEnd, handleAutoOpenElementStart, handleCDATASection, handleCloseElementEnd, handleCloseElementStart, handleComment, handleDocType, handleDocumentEnd, handleDocumentStart, handleInnerWhiteSpace, handleOpenElementEnd, handleOpenElementStart, handleProcessingInstruction, handleStandaloneElementEnd, handleStandaloneElementStart, handleUnmatchedCloseElementEnd, handleUnmatchedCloseElementStart, handleXmlDeclaration, setParseConfiguration, setParseSelection, setParseStatus
-
Field Details
-
writer
-
-
Constructor Details
-
TextOutputMarkupHandler
Creates a new instance of this handler.
- Parameters:
writer
- the writer to which output will be written.
-
-
Method Details
-
handleText
Description copied from interface: ITextHandler
Called when a text artifact is found.
A sequence of chars is considered to be text when it contains no structures of any kind. In markup parsers, for example, this means the sequence contains no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc.
Text sequences might include any number of new line and/or control characters.
Text artifacts are reported using the document buffer directly. This buffer should not be considered immutable, so reported texts should be copied if they need to be stored, either by copying len chars from the buffer char[] starting at offset or by creating a String from the same range.
Implementations of this handler should never modify the document buffer.
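The two copying strategies mentioned above can be sketched as follows. This is an illustrative example using only the JDK; BufferCopySketch and its method names are hypothetical and not part of AttoParser. The main method overwrites the buffer after copying, as a stand-in for the parser reusing it.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of AttoParser.
final class BufferCopySketch {

    // Creates a String from len chars of the buffer, starting at offset.
    // The String constructor copies the chars, so later buffer changes do not affect it.
    static String copyAsString(final char[] buffer, final int offset, final int len) {
        return new String(buffer, offset, len);
    }

    // Copies len chars of the buffer, starting at offset, into a new char[].
    static char[] copyAsChars(final char[] buffer, final int offset, final int len) {
        return Arrays.copyOfRange(buffer, offset, offset + len);
    }

    public static void main(final String[] args) {
        final char[] buffer = "<p>Hello</p>".toCharArray();
        final String text = copyAsString(buffer, 3, 5);
        // Simulate the buffer being overwritten after the event was reported.
        Arrays.fill(buffer, 'x');
        System.out.println(text); // prints "Hello"
    }
}
```

Neither helper writes to the buffer, which keeps the sketch in line with the rule that handlers should never modify it.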
- Specified by:
handleText in interface ITextHandler
- Overrides:
handleText in class AbstractMarkupHandler
- Parameters:
buffer
- the document buffer (not copied).
offset
- the offset (position in buffer) where the text artifact starts.
len
- the length (in chars) of the text artifact, starting at offset.
line
- the line in the original document where this text artifact starts.
col
- the column in the original document where this text artifact starts.
- Throws:
ParseException
- if any exceptions occur during handling.
-