Class MinimizeHtmlMarkupHandler

  • All Implemented Interfaces:
    IAttributeSequenceHandler, ICDATASectionHandler, ICommentHandler, IDocTypeHandler, IDocumentHandler, IElementHandler, IMarkupHandler, IProcessingInstructionHandler, ITextHandler, IXMLDeclarationHandler

    public final class MinimizeHtmlMarkupHandler
    extends AbstractChainedMarkupHandler

    Implementation of IMarkupHandler used for minimizing (compacting) HTML markup.

    The minimization operations that can be performed are:

    • White space minimization: excess white space is removed from text. White space is also removed from between attributes in a tag, and from between block and structural tags (where it would be ignored anyway).
    • Unquoting of attributes: quotes are removed from attributes whose values contain only alphanumeric characters.
    • Collapsing of boolean attributes: boolean attributes (selected, required, disabled, etc.) are detected and collapsed to their no-value form (e.g. selected="selected" -> selected).
    • De-minimization of standalone elements: minimized standalone elements are de-minimized to save the slash character (e.g. <meta /> -> <meta>).
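
    The attribute rules listed above can be illustrated with a minimal, library-independent sketch. The class name, the method names and the short list of boolean attributes are assumptions made for this example; the handler's own logic lives in private methods whose exact behavior is not documented here (for instance, this sketch collapses a boolean attribute whatever its value is).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical,
// not part of the library.
final class AttributeMinimizationSketch {

    // A small, assumed subset of HTML boolean attribute names.
    private static final Set<String> BOOLEAN_ATTRIBUTES =
            new HashSet<>(Arrays.asList("selected", "required", "disabled", "checked"));

    // True if the value is non-empty and made only of ASCII letters and digits.
    static boolean canBeUnquoted(final String value) {
        if (value.isEmpty()) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            final char c = value.charAt(i);
            final boolean alphanumeric =
                    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alphanumeric) {
                return false;
            }
        }
        return true;
    }

    // Renders one attribute in its minimized form.
    static String minimizeAttribute(final String name, final String value) {
        if (BOOLEAN_ATTRIBUTES.contains(name.toLowerCase(Locale.ROOT))) {
            // Assumption: boolean attributes collapse regardless of their value.
            return name;
        }
        if (canBeUnquoted(value)) {
            // Only alphanumeric characters: the quotes can be dropped.
            return name + "=" + value;
        }
        // Anything else keeps its quotes.
        return name + "=\"" + value + "\"";
    }
}
```

    Under these assumptions, minimizeAttribute("class", "main") yields class=main, while a value containing a space or a slash keeps its quotes.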

    Note that, although the HTML rules would theoretically allow it, no tags are created or removed during minimization. This keeps the impact on the DOM of the resulting markup as low as possible (ideally zero, except for white space in text).

    Two minimization modes are available: MinimizeHtmlMarkupHandler.MinimizeMode.ONLY_WHITE_SPACE and MinimizeHtmlMarkupHandler.MinimizeMode.COMPLETE. The former only minimizes white space whereas the latter performs all the available minimizations.
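
    As a rough idea of what white space minimization means for a single text event, the following sketch collapses every run of white space in a buffer region into one space. It is a simplification built on a hypothetical helper: the handler itself also takes block and preformatted elements into account, which this example does not attempt.

```java
// Illustrative sketch only: hypothetical helper, not part of the library.
final class WhiteSpaceSketch {

    // Collapses every run of white space in the given buffer region into a single space.
    static String collapseWhiteSpace(final char[] buffer, final int offset, final int len) {
        final StringBuilder out = new StringBuilder(len);
        boolean lastWasWhiteSpace = false;
        for (int i = offset; i < offset + len; i++) {
            final char c = buffer[i];
            if (Character.isWhitespace(c)) {
                // Emit one space for the first white space char of a run, skip the rest.
                if (!lastWasWhiteSpace) {
                    out.append(' ');
                }
                lastWasWhiteSpace = true;
            } else {
                out.append(c);
                lastWasWhiteSpace = false;
            }
        }
        return out.toString();
    }
}
```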

    Note that, as with most handlers, this class is not thread-safe. Also, instances of this class should not be reused across parsing operations.

    Sample usage:

    
    
       final Writer writer = new StringWriter();
    
       // The output handler will be the last in the handler chain
       final IMarkupHandler outputHandler = new OutputMarkupHandler(writer);
    
       // The minimizer handler will do its job before events reach output handler
       final IMarkupHandler handler = new MinimizeHtmlMarkupHandler(MinimizeMode.COMPLETE, outputHandler);
    
       parser.parse(document, handler);
    
       return writer.toString();
    
     
    Since:
    2.0.0
    • Method Summary

      Modifier and Type Method Description
      private static boolean canAttributeValueBeUnquoted​(char[] buffer, int valueContentOffset, int valueContentLen, int valueOuterOffset, int valueOuterLen)  
      private void flushPendingInterBlockElementWhiteSpace​(boolean ignore)  
      void handleAttribute​(char[] buffer, int nameOffset, int nameLen, int nameLine, int nameCol, int operatorOffset, int operatorLen, int operatorLine, int operatorCol, int valueContentOffset, int valueContentLen, int valueOuterOffset, int valueOuterLen, int valueLine, int valueCol)
      Called when an attribute is found.
      void handleAutoCloseElementEnd​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called for signaling the end of an auto-close element, created for balancing an unclosed tag.
      void handleAutoCloseElementStart​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called for signaling the start of an auto-close element (a synthetic close tag), created for balancing an unclosed tag.
      void handleAutoOpenElementEnd​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called for signaling the end of an auto-open element (a synthetic open tag), created for adapting parsed markup to a specification such as, for example, HTML5.
      void handleAutoOpenElementStart​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called for signaling the start of an auto-open element (a synthetic open tag), created for adapting parsed markup to a specification such as, for example, HTML5.
      void handleCDATASection​(char[] buffer, int contentOffset, int contentLen, int outerOffset, int outerLen, int line, int col)
      Called when a CDATA section is found.
      void handleCloseElementEnd​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when the end of a close element (a close tag) is found.
      void handleCloseElementStart​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when the start of a close element (a close tag) is found.
      void handleComment​(char[] buffer, int contentOffset, int contentLen, int outerOffset, int outerLen, int line, int col)
      Called when a comment is found.
      void handleDocType​(char[] buffer, int keywordOffset, int keywordLen, int keywordLine, int keywordCol, int elementNameOffset, int elementNameLen, int elementNameLine, int elementNameCol, int typeOffset, int typeLen, int typeLine, int typeCol, int publicIdOffset, int publicIdLen, int publicIdLine, int publicIdCol, int systemIdOffset, int systemIdLen, int systemIdLine, int systemIdCol, int internalSubsetOffset, int internalSubsetLen, int internalSubsetLine, int internalSubsetCol, int outerOffset, int outerLen, int outerLine, int outerCol)
      Called when a DOCTYPE clause is found.
      void handleDocumentEnd​(long endTimeNanos, long totalTimeNanos, int line, int col)
      Called at the end of document parsing.
      void handleDocumentStart​(long startTimeNanos, int line, int col)
      Called at the beginning of document parsing.
      void handleInnerWhiteSpace​(char[] buffer, int offset, int len, int line, int col)
      Called when an amount of white space is found inside an element.
      void handleOpenElementEnd​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when the end of an open element (an open tag) is found.
      void handleOpenElementStart​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when an open element (an open tag) is found.
      void handleProcessingInstruction​(char[] buffer, int targetOffset, int targetLen, int targetLine, int targetCol, int contentOffset, int contentLen, int contentLine, int contentCol, int outerOffset, int outerLen, int line, int col)
      Called when a Processing Instruction is found.
      void handleStandaloneElementEnd​(char[] buffer, int nameOffset, int nameLen, boolean minimized, int line, int col)
      Called when the end of a standalone element (an element with no closing tag) is found.
      void handleStandaloneElementStart​(char[] buffer, int nameOffset, int nameLen, boolean minimized, int line, int col)
      Called when a standalone element (an element with no closing tag) is found.
      void handleText​(char[] buffer, int offset, int len, int line, int col)
      Called when a text artifact is found.
      void handleUnmatchedCloseElementEnd​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when the end of an unmatched close element (close tag) is found.
      void handleUnmatchedCloseElementStart​(char[] buffer, int nameOffset, int nameLen, int line, int col)
      Called when the start of an unmatched close element (close tag) is found.
      void handleXmlDeclaration​(char[] buffer, int keywordOffset, int keywordLen, int keywordLine, int keywordCol, int versionOffset, int versionLen, int versionLine, int versionCol, int encodingOffset, int encodingLen, int encodingLine, int encodingCol, int standaloneOffset, int standaloneLen, int standaloneLine, int standaloneCol, int outerOffset, int outerLen, int line, int col)
      Called when an XML Declaration is found.
      private static boolean isBlockElement​(char[] buffer, int nameOffset, int nameLen)  
      private static boolean isBooleanAttribute​(char[] buffer, int nameOffset, int nameLen)  
      private static boolean isPreformattedElement​(char[] buffer, int nameOffset, int nameLen)  
      private static boolean isWhitespace​(char c)  
      void setParseConfiguration​(ParseConfiguration parseConfiguration)
      Sets the ParseConfiguration object that will be used during the parsing operation.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • BLOCK_ELEMENTS

        private static final java.lang.String[] BLOCK_ELEMENTS
      • PREFORMATTED_ELEMENTS

        private static final java.lang.String[] PREFORMATTED_ELEMENTS
      • BOOLEAN_ATTRIBUTE_NAMES

        private static final java.lang.String[] BOOLEAN_ATTRIBUTE_NAMES
      • SIZE_ONE_WHITE_SPACE

        private static final char[] SIZE_ONE_WHITE_SPACE
      • ATTRIBUTE_OPERATOR

        private static final char[] ATTRIBUTE_OPERATOR
      • internalBuffer

        private char[] internalBuffer
      • lastTextEndedInWhiteSpace

        private boolean lastTextEndedInWhiteSpace
      • lastOpenElementWasBlock

        private boolean lastOpenElementWasBlock
      • lastClosedElementWasBlock

        private boolean lastClosedElementWasBlock
      • lastVisibleEventWasElement

        private boolean lastVisibleEventWasElement
      • pendingInterBlockElementWhiteSpace

        private boolean pendingInterBlockElementWhiteSpace
      • inPreformattedElement

        private boolean inPreformattedElement
      • pendingEventLine

        private int pendingEventLine
      • pendingEventCol

        private int pendingEventCol
    • Constructor Detail

      • MinimizeHtmlMarkupHandler

        public MinimizeHtmlMarkupHandler​(MinimizeHtmlMarkupHandler.MinimizeMode minimizeMode,
                                         IMarkupHandler next)

        Creates a new instance of this handler, specifying the minimization mode and the handler to which minimized events will be delegated.

        Parameters:
        minimizeMode - the minimization mode to be used.
        next - the handler to which events will be delegated after minimization.
    • Method Detail

      • setParseConfiguration

        public void setParseConfiguration​(ParseConfiguration parseConfiguration)
        Description copied from interface: IMarkupHandler

        Sets the ParseConfiguration object that will be used during the parsing operation. This object will normally have been specified to the parser object during its instantiation or initialization.

        This method is always called by the parser before calling any other event handling method.

        Note that this method can be safely ignored by most implementations, as there are very few scenarios in which this kind of interaction would be considered relevant.

        Specified by:
        setParseConfiguration in interface IMarkupHandler
        Overrides:
        setParseConfiguration in class AbstractChainedMarkupHandler
        Parameters:
        parseConfiguration - the configuration object.
      • handleDocumentStart

        public void handleDocumentStart​(long startTimeNanos,
                                        int line,
                                        int col)
                                 throws ParseException
        Description copied from interface: IDocumentHandler

        Called at the beginning of document parsing.

        Specified by:
        handleDocumentStart in interface IDocumentHandler
        Overrides:
        handleDocumentStart in class AbstractChainedMarkupHandler
        Parameters:
        startTimeNanos - the current time (in nanoseconds) obtained when parsing starts.
        line - the line of the document where parsing starts (usually number 1).
        col - the column of the document where parsing starts (usually number 1).
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleDocumentEnd

        public void handleDocumentEnd​(long endTimeNanos,
                                      long totalTimeNanos,
                                      int line,
                                      int col)
                               throws ParseException
        Description copied from interface: IDocumentHandler

        Called at the end of document parsing.

        Specified by:
        handleDocumentEnd in interface IDocumentHandler
        Overrides:
        handleDocumentEnd in class AbstractChainedMarkupHandler
        Parameters:
        endTimeNanos - the current time (in nanoseconds) obtained when parsing ends.
        totalTimeNanos - the difference between current times at the start and end of parsing (in nanoseconds).
        line - the line of the document where parsing ends (usually the last one).
        col - the column of the document where the parsing ends (usually the last one).
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleText

        public void handleText​(char[] buffer,
                               int offset,
                               int len,
                               int line,
                               int col)
                        throws ParseException
        Description copied from interface: ITextHandler

        Called when a text artifact is found.

        A sequence of chars is considered to be text when no structures of any kind are contained inside it. In markup parsers, for example, this means no tags (a.k.a. elements), DOCTYPEs, processing instructions, etc. are contained in the sequence.

        Text sequences might include any number of new line and/or control characters.

        Text artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported texts should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.
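
    The copying advice above can be sketched with a small collector that stores each reported text as a String. The class is hypothetical and only mirrors the (buffer, offset, len) part of the handleText signature; it does not implement any library interface.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a hypothetical collector, not a library class.
final class TextCollectorSketch {

    private final List<String> texts = new ArrayList<>();

    // Mirrors the (buffer, offset, len) part of the handleText signature.
    void onText(final char[] buffer, final int offset, final int len) {
        // new String(char[], int, int) copies the chars, so later changes
        // to the buffer do not affect the stored text.
        texts.add(new String(buffer, offset, len));
    }

    List<String> getTexts() {
        return texts;
    }
}
```

    Storing the buffer reference together with the offset and length instead would not be safe, since the buffer contents may change once the event has been handled.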

        Specified by:
        handleText in interface ITextHandler
        Overrides:
        handleText in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        offset - the offset (position in buffer) where the text artifact starts.
        len - the length (in chars) of the text artifact, starting in offset.
        line - the line in the original document where this text artifact starts.
        col - the column in the original document where this text artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • flushPendingInterBlockElementWhiteSpace

        private void flushPendingInterBlockElementWhiteSpace​(boolean ignore)
                                                      throws ParseException
        Throws:
        ParseException
      • handleComment

        public void handleComment​(char[] buffer,
                                  int contentOffset,
                                  int contentLen,
                                  int outerOffset,
                                  int outerLen,
                                  int line,
                                  int col)
                           throws ParseException
        Description copied from interface: ICommentHandler

        Called when a comment is found.

        Two [offset, len] pairs are provided for two partitions (outer and content):

        <!-- this is a comment -->
        |   [CONTENT----------]  |
        [OUTER-------------------]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.
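
    To make the relation between the two partitions concrete, the sketch below derives the content partition of a comment from its outer partition by skipping the "<!--" prefix and the "-->" suffix. The helper class is hypothetical; in real use the parser reports both pairs, so nothing needs to be computed.

```java
// Illustrative sketch only: hypothetical helper, not part of the library.
final class CommentPartitionsSketch {

    private static final int PREFIX_LEN = "<!--".length();
    private static final int SUFFIX_LEN = "-->".length();

    // Returns {contentOffset, contentLen} for a comment whose outer partition is given.
    static int[] contentPartition(final int outerOffset, final int outerLen) {
        return new int[] { outerOffset + PREFIX_LEN, outerLen - PREFIX_LEN - SUFFIX_LEN };
    }

    // Copies a partition out of the buffer as a String.
    static String slice(final char[] buffer, final int offset, final int len) {
        return new String(buffer, offset, len);
    }
}
```

    For the buffer <p><!-- this is a comment --></p>, the outer partition is (offset 3, length 26) and the derived content partition is (offset 7, length 19), which includes the spaces next to the delimiters.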

        Specified by:
        handleComment in interface ICommentHandler
        Overrides:
        handleComment in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        contentOffset - offset for the content partition.
        contentLen - length of the content partition.
        outerOffset - offset for the outer partition.
        outerLen - length of the outer partition.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleCDATASection

        public void handleCDATASection​(char[] buffer,
                                       int contentOffset,
                                       int contentLen,
                                       int outerOffset,
                                       int outerLen,
                                       int line,
                                       int col)
                                throws ParseException
        Description copied from interface: ICDATASectionHandler

        Called when a CDATA section is found.

        Two [offset, len] pairs are provided for two partitions (outer and content):

        <![CDATA[ this is a CDATA section ]]>
        |        [CONTENT----------------]  |
        [OUTER------------------------------]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleCDATASection in interface ICDATASectionHandler
        Overrides:
        handleCDATASection in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        contentOffset - offset for the content partition.
        contentLen - length of the content partition.
        outerOffset - offset for the outer partition.
        outerLen - length of the outer partition.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleStandaloneElementStart

        public void handleStandaloneElementStart​(char[] buffer,
                                                 int nameOffset,
                                                 int nameLen,
                                                 boolean minimized,
                                                 int line,
                                                 int col)
                                          throws ParseException
        Description copied from interface: IElementHandler

        Called when a standalone element (an element with no closing tag) is found. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleStandaloneElementStart in interface IElementHandler
        Overrides:
        handleStandaloneElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        minimized - whether the element has been found minimized (<element/>) in code or not.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleStandaloneElementEnd

        public void handleStandaloneElementEnd​(char[] buffer,
                                               int nameOffset,
                                               int nameLen,
                                               boolean minimized,
                                               int line,
                                               int col)
                                        throws ParseException
        Description copied from interface: IElementHandler

        Called when the end of a standalone element (an element with no closing tag) is found.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleStandaloneElementEnd in interface IElementHandler
        Overrides:
        handleStandaloneElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        minimized - whether the element has been found minimized (<element/>) in code or not.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleOpenElementStart

        public void handleOpenElementStart​(char[] buffer,
                                           int nameOffset,
                                           int nameLen,
                                           int line,
                                           int col)
                                    throws ParseException
        Description copied from interface: IElementHandler

        Called when an open element (an open tag) is found. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleOpenElementStart in interface IElementHandler
        Overrides:
        handleOpenElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleOpenElementEnd

        public void handleOpenElementEnd​(char[] buffer,
                                         int nameOffset,
                                         int nameLen,
                                         int line,
                                         int col)
                                  throws ParseException
        Description copied from interface: IElementHandler

        Called when the end of an open element (an open tag) is found.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleOpenElementEnd in interface IElementHandler
        Overrides:
        handleOpenElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleAutoOpenElementStart

        public void handleAutoOpenElementStart​(char[] buffer,
                                               int nameOffset,
                                               int nameLen,
                                               int line,
                                               int col)
                                        throws ParseException
        Description copied from interface: IElementHandler

        Called for signaling the start of an auto-open element (a synthetic open tag), created for adapting parsed markup to a specification such as, for example, HTML5. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleAutoOpenElementStart in interface IElementHandler
        Overrides:
        handleAutoOpenElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleAutoOpenElementEnd

        public void handleAutoOpenElementEnd​(char[] buffer,
                                             int nameOffset,
                                             int nameLen,
                                             int line,
                                             int col)
                                      throws ParseException
        Description copied from interface: IElementHandler

        Called for signaling the end of an auto-open element (a synthetic open tag), created for adapting parsed markup to a specification such as, for example, HTML5. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleAutoOpenElementEnd in interface IElementHandler
        Overrides:
        handleAutoOpenElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleCloseElementStart

        public void handleCloseElementStart​(char[] buffer,
                                            int nameOffset,
                                            int nameLen,
                                            int line,
                                            int col)
                                     throws ParseException
        Description copied from interface: IElementHandler

        Called when the start of a close element (a close tag) is found. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleCloseElementStart in interface IElementHandler
        Overrides:
        handleCloseElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleCloseElementEnd

        public void handleCloseElementEnd​(char[] buffer,
                                          int nameOffset,
                                          int nameLen,
                                          int line,
                                          int col)
                                   throws ParseException
        Description copied from interface: IElementHandler

        Called when the end of a close element (a close tag) is found.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleCloseElementEnd in interface IElementHandler
        Overrides:
        handleCloseElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleAutoCloseElementStart

        public void handleAutoCloseElementStart​(char[] buffer,
                                                int nameOffset,
                                                int nameLen,
                                                int line,
                                                int col)
                                         throws ParseException
        Description copied from interface: IElementHandler

        Called for signaling the start of an auto-close element (a synthetic close tag), created for balancing an unclosed tag. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleAutoCloseElementStart in interface IElementHandler
        Overrides:
        handleAutoCloseElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleAutoCloseElementEnd

        public void handleAutoCloseElementEnd​(char[] buffer,
                                              int nameOffset,
                                              int nameLen,
                                              int line,
                                              int col)
                                       throws ParseException
        Description copied from interface: IElementHandler

        Called for signaling the end of an auto-close element, created for balancing an unclosed tag.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleAutoCloseElementEnd in interface IElementHandler
        Overrides:
        handleAutoCloseElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleUnmatchedCloseElementStart

        public void handleUnmatchedCloseElementStart​(char[] buffer,
                                                     int nameOffset,
                                                     int nameLen,
                                                     int line,
                                                     int col)
                                              throws ParseException
        Description copied from interface: IElementHandler

        Called when the start of an unmatched close element (close tag) is found. The name of the element is also reported.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleUnmatchedCloseElementStart in interface IElementHandler
        Overrides:
        handleUnmatchedCloseElementStart in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleUnmatchedCloseElementEnd

        public void handleUnmatchedCloseElementEnd​(char[] buffer,
                                                   int nameOffset,
                                                   int nameLen,
                                                   int line,
                                                   int col)
                                            throws ParseException
        Description copied from interface: IElementHandler

        Called when the end of an unmatched close element (close tag) is found.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleUnmatchedCloseElementEnd in interface IElementHandler
        Overrides:
        handleUnmatchedCloseElementEnd in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - the offset (position in buffer) where the element name appears.
        nameLen - the length (in chars) of the element name.
        line - the line in the original document where the element ending structure appears.
        col - the column in the original document where the element ending structure appears.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleAttribute

        public void handleAttribute​(char[] buffer,
                                    int nameOffset,
                                    int nameLen,
                                    int nameLine,
                                    int nameCol,
                                    int operatorOffset,
                                    int operatorLen,
                                    int operatorLine,
                                    int operatorCol,
                                    int valueContentOffset,
                                    int valueContentLen,
                                    int valueOuterOffset,
                                    int valueOuterLen,
                                    int valueLine,
                                    int valueCol)
                             throws ParseException
        Description copied from interface: IAttributeSequenceHandler

        Called when an attribute is found.

        Four [offset, len] pairs are provided for four partitions (name, operator, valueContent and valueOuter):

        class="basic_column"
        [NAM]* [VALUECONTE]| (*) = [OPERATOR]
        |     [VALUEOUTER--]
        [OUTER-------------]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleAttribute in interface IAttributeSequenceHandler
        Overrides:
        handleAttribute in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        nameOffset - offset for the name partition.
        nameLen - length of the name partition.
        nameLine - the line in the original document where the name partition starts.
        nameCol - the column in the original document where the name partition starts.
        operatorOffset - offset for the operator partition.
        operatorLen - length of the operator partition.
        operatorLine - the line in the original document where the operator partition starts.
        operatorCol - the column in the original document where the operator partition starts.
        valueContentOffset - offset for the valueContent partition.
        valueContentLen - length of the valueContent partition.
        valueOuterOffset - offset for the valueOuter partition.
        valueOuterLen - length of the valueOuter partition.
        valueLine - the line in the original document where the value (outer) partition starts.
        valueCol - the column in the original document where the value (outer) partition starts.
        Throws:
        ParseException - if any exceptions occur during handling.
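        To illustrate how the attribute partitions relate to the buffer, here is a minimal sketch that copies each [offset, len] pair into a String. The AttributeParts class and its copyOf method are hypothetical names used only for this example, and the line/col parameters are left out for brevity.

```java
// Hypothetical value holder for the attribute partitions; not a library class.
final class AttributeParts {

    final String name;
    final String operator;
    final String valueContent;
    final String valueOuter;

    private AttributeParts(String name, String operator,
                           String valueContent, String valueOuter) {
        this.name = name;
        this.operator = operator;
        this.valueContent = valueContent;
        this.valueOuter = valueOuter;
    }

    // Copies each [offset, len] pair out of the buffer so it can be stored safely.
    static AttributeParts copyOf(char[] buffer,
                                 int nameOffset, int nameLen,
                                 int operatorOffset, int operatorLen,
                                 int valueContentOffset, int valueContentLen,
                                 int valueOuterOffset, int valueOuterLen) {
        return new AttributeParts(
                new String(buffer, nameOffset, nameLen),
                new String(buffer, operatorOffset, operatorLen),
                new String(buffer, valueContentOffset, valueContentLen),
                new String(buffer, valueOuterOffset, valueOuterLen));
    }

    public static void main(String[] args) {
        char[] buffer = "class=\"basic_column\"".toCharArray();
        // name = [0, 5], operator = [5, 1], valueContent = [7, 12], valueOuter = [6, 14]
        AttributeParts parts = copyOf(buffer, 0, 5, 5, 1, 7, 12, 6, 14);
        System.out.println(parts.name);
        System.out.println(parts.operator);
        System.out.println(parts.valueContent);
        System.out.println(parts.valueOuter);
    }
}
```

        Note that valueOuter includes the surrounding quotes, whereas valueContent does not.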
      • handleInnerWhiteSpace

        public void handleInnerWhiteSpace​(char[] buffer,
                                          int offset,
                                          int len,
                                          int line,
                                          int col)
                                   throws ParseException
        Description copied from interface: IAttributeSequenceHandler

        Called when an amount of white space is found inside an element.

        These attribute separators can contain any amount of white space, including line feeds:

        <div id="main"        class="basic_column">
                      [INNWSP]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleInnerWhiteSpace in interface IAttributeSequenceHandler
        Overrides:
        handleInnerWhiteSpace in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        offset - offset for the artifact.
        len - length of the artifact.
        line - the line in the original document where the artifact starts.
        col - the column in the original document where the artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
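        The class description states that white space between attributes is minimized. A minimal sketch of that idea, assuming each reported run is replaced by a single space, could look as follows. InnerWhiteSpaceSketch, its StringBuilder sink and the collapseAt helper are hypothetical and do not reflect the actual implementation.

```java
// Hypothetical sketch; not the library's implementation.
final class InnerWhiteSpaceSketch {

    // Same leading parameters as handleInnerWhiteSpace, plus an output sink.
    static void handleInnerWhiteSpace(char[] buffer, int offset, int len,
                                      StringBuilder out) {
        // Assumption: any non-empty run between attributes becomes one space.
        if (len > 0) {
            out.append(' ');
        }
    }

    // Rewrites a tag, replacing the white space run at [offset, offset + len).
    static String collapseAt(String tag, int offset, int len) {
        char[] buffer = tag.toCharArray();
        StringBuilder out = new StringBuilder();
        out.append(buffer, 0, offset);
        handleInnerWhiteSpace(buffer, offset, len, out);
        int rest = offset + len;
        out.append(buffer, rest, buffer.length - rest);
        return out.toString();
    }

    public static void main(String[] args) {
        String head = "<div id=\"main\"";
        String gap = "  \n      "; // spaces and a line feed between attributes
        String tail = "class=\"basic_column\">";
        System.out.println(collapseAt(head + gap + tail, head.length(), gap.length()));
    }
}
```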
      • handleDocType

        public void handleDocType​(char[] buffer,
                                  int keywordOffset,
                                  int keywordLen,
                                  int keywordLine,
                                  int keywordCol,
                                  int elementNameOffset,
                                  int elementNameLen,
                                  int elementNameLine,
                                  int elementNameCol,
                                  int typeOffset,
                                  int typeLen,
                                  int typeLine,
                                  int typeCol,
                                  int publicIdOffset,
                                  int publicIdLen,
                                  int publicIdLine,
                                  int publicIdCol,
                                  int systemIdOffset,
                                  int systemIdLen,
                                  int systemIdLine,
                                  int systemIdCol,
                                  int internalSubsetOffset,
                                  int internalSubsetLen,
                                  int internalSubsetLine,
                                  int internalSubsetCol,
                                  int outerOffset,
                                  int outerLen,
                                  int outerLine,
                                  int outerCol)
                           throws ParseException
        Description copied from interface: IDocTypeHandler

        Called when a DOCTYPE clause is found.

        This method reports the DOCTYPE clause splitting it into its different parts.

        Seven [offset, len] pairs are provided for seven partitions (outer, keyword, elementName, type, publicId, systemId and internalSubset) of the DOCTYPE clause:

        <!DOCTYPE html PUBLIC ".........." ".........." [................]>
        | [KEYWO] [EN] [TYPE]  [PUBLICID]   [SYSTEMID]   [INTERNALSUBSET] |
        [OUTER------------------------------------------------------------]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleDocType in interface IDocTypeHandler
        Overrides:
        handleDocType in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        keywordOffset - offset for the keyword partition.
        keywordLen - length of the keyword partition.
        keywordLine - the line in the original document where the keyword partition starts.
        keywordCol - the column in the original document where the keyword partition starts.
        elementNameOffset - offset for the elementName partition.
        elementNameLen - length of the elementName partition.
        elementNameLine - the line in the original document where the elementName partition starts.
        elementNameCol - the column in the original document where the elementName partition starts.
        typeOffset - offset for the type partition.
        typeLen - length of the type partition.
        typeLine - the line in the original document where the type partition starts.
        typeCol - the column in the original document where the type partition starts.
        publicIdOffset - offset for the publicId partition.
        publicIdLen - length of the publicId partition.
        publicIdLine - the line in the original document where the publicId partition starts.
        publicIdCol - the column in the original document where the publicId partition starts.
        systemIdOffset - offset for the systemId partition.
        systemIdLen - length of the systemId partition.
        systemIdLine - the line in the original document where the systemId partition starts.
        systemIdCol - the column in the original document where the systemId partition starts.
        internalSubsetOffset - offset for the internalSubset partition.
        internalSubsetLen - length of the internalSubset partition.
        internalSubsetLine - the line in the original document where the internalSubset partition starts.
        internalSubsetCol - the column in the original document where the internalSubset partition starts.
        outerOffset - offset for the outer partition.
        outerLen - length of the outer partition.
        outerLine - the line in the original document where this artifact starts.
        outerCol - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • handleXmlDeclaration

        public void handleXmlDeclaration​(char[] buffer,
                                         int keywordOffset,
                                         int keywordLen,
                                         int keywordLine,
                                         int keywordCol,
                                         int versionOffset,
                                         int versionLen,
                                         int versionLine,
                                         int versionCol,
                                         int encodingOffset,
                                         int encodingLen,
                                         int encodingLine,
                                         int encodingCol,
                                         int standaloneOffset,
                                         int standaloneLen,
                                         int standaloneLine,
                                         int standaloneCol,
                                         int outerOffset,
                                         int outerLen,
                                         int line,
                                         int col)
                                  throws ParseException
        Description copied from interface: IXMLDeclarationHandler

        Called when an XML Declaration is found.

        Five [offset, len] pairs are provided for five partitions (outer, keyword, version, encoding and standalone):

        <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
        | [K]          [V]            [ENC]              [S]  |
        [OUTER------------------------------------------------]

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleXmlDeclaration in interface IXMLDeclarationHandler
        Overrides:
        handleXmlDeclaration in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        keywordOffset - offset for the keyword partition.
        keywordLen - length of the keyword partition.
        keywordLine - the line in the original document where the keyword partition starts.
        keywordCol - the column in the original document where the keyword partition starts.
        versionOffset - offset for the version partition.
        versionLen - length of the version partition.
        versionLine - the line in the original document where the version partition starts.
        versionCol - the column in the original document where the version partition starts.
        encodingOffset - offset for the encoding partition.
        encodingLen - length of the encoding partition.
        encodingLine - the line in the original document where the encoding partition starts.
        encodingCol - the column in the original document where the encoding partition starts.
        standaloneOffset - offset for the standalone partition.
        standaloneLen - length of the standalone partition.
        standaloneLine - the line in the original document where the standalone partition starts.
        standaloneCol - the column in the original document where the standalone partition starts.
        outerOffset - offset for the outer partition.
        outerLen - length of the outer partition.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
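        The following sketch shows one way the five partitions could be copied out of the buffer into a map. XmlDeclarationParts and copyOf are hypothetical names, the offsets are located with indexOf purely for illustration, and declarations with missing optional parts are not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not a library class.
final class XmlDeclarationParts {

    // Copies each [offset, len] pair into a String keyed by partition name.
    static Map<String, String> copyOf(char[] buffer,
                                      int keywordOffset, int keywordLen,
                                      int versionOffset, int versionLen,
                                      int encodingOffset, int encodingLen,
                                      int standaloneOffset, int standaloneLen,
                                      int outerOffset, int outerLen) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("keyword", new String(buffer, keywordOffset, keywordLen));
        parts.put("version", new String(buffer, versionOffset, versionLen));
        parts.put("encoding", new String(buffer, encodingOffset, encodingLen));
        parts.put("standalone", new String(buffer, standaloneOffset, standaloneLen));
        parts.put("outer", new String(buffer, outerOffset, outerLen));
        return parts;
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
        // In real use the parser supplies these offsets; indexOf stands in for it here.
        Map<String, String> parts = copyOf(decl.toCharArray(),
                decl.indexOf("xml"), 3,
                decl.indexOf("1.0"), 3,
                decl.indexOf("UTF-8"), 5,
                decl.indexOf("yes"), 3,
                0, decl.length());
        System.out.println(parts);
    }
}
```

        As in the diagram above, the version, encoding and standalone partitions cover only the values, without their quotes.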
      • handleProcessingInstruction

        public void handleProcessingInstruction​(char[] buffer,
                                                int targetOffset,
                                                int targetLen,
                                                int targetLine,
                                                int targetCol,
                                                int contentOffset,
                                                int contentLen,
                                                int contentLine,
                                                int contentCol,
                                                int outerOffset,
                                                int outerLen,
                                                int line,
                                                int col)
                                         throws ParseException
        Description copied from interface: IProcessingInstructionHandler

        Called when a Processing Instruction is found.

        Three [offset, len] pairs are provided for three partitions (outer, target and content):

        <?xml-stylesheet somePar1="a" somePar2="b"?>
        | [TARGET------] [CONTENT----------------] |
        [OUTER-------------------------------------]

        Note that, although XML Declarations have the same format as processing instructions, they are not considered as such and therefore are handled through a different handling method.

        Artifacts are reported using the document buffer directly, and this buffer should not be considered to be immutable, so reported structures should be copied if they need to be stored (either by copying len chars from the buffer char[] starting in offset or by creating a String from it using the same specification).

        Implementations of this handler should never modify the document buffer.

        Specified by:
        handleProcessingInstruction in interface IProcessingInstructionHandler
        Overrides:
        handleProcessingInstruction in class AbstractChainedMarkupHandler
        Parameters:
        buffer - the document buffer (not copied)
        targetOffset - offset for the target partition.
        targetLen - length of the target partition.
        targetLine - the line in the original document where the target partition starts.
        targetCol - the column in the original document where the target partition starts.
        contentOffset - offset for the content partition.
        contentLen - length of the content partition.
        contentLine - the line in the original document where the content partition starts.
        contentCol - the column in the original document where the content partition starts.
        outerOffset - offset for the outer partition.
        outerLen - length of the outer partition.
        line - the line in the original document where this artifact starts.
        col - the column in the original document where this artifact starts.
        Throws:
        ParseException - if any exceptions occur during handling.
      • canAttributeValueBeUnquoted

        private static boolean canAttributeValueBeUnquoted​(char[] buffer,
                                                           int valueContentOffset,
                                                           int valueContentLen,
                                                           int valueOuterOffset,
                                                           int valueOuterLen)
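        This private method is undocumented. Based only on the class description (quotes are removed from attributes whose values contain only alphanumeric characters), a check of this kind might be sketched as below. UnquoteCheckSketch and canBeUnquoted are hypothetical names, and treating empty or already unquoted values as not unquotable is an assumption of this sketch.

```java
// Hypothetical sketch of an "unquotable value" check; not the library's code.
final class UnquoteCheckSketch {

    static boolean canBeUnquoted(char[] buffer,
                                 int valueContentOffset, int valueContentLen,
                                 int valueOuterOffset, int valueOuterLen) {
        // Assumption: empty values keep their quotes, and values whose outer
        // and content partitions are the same length have no quotes to remove.
        if (valueContentLen == 0 || valueOuterLen == valueContentLen) {
            return false;
        }
        int end = valueContentOffset + valueContentLen;
        for (int i = valueContentOffset; i < end; i++) {
            char c = buffer[i];
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!alphanumeric) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] quoted = "id=\"main\"".toCharArray();
        // valueContent = [4, 4], valueOuter = [3, 6]
        System.out.println(canBeUnquoted(quoted, 4, 4, 3, 6));

        char[] underscore = "class=\"basic_column\"".toCharArray();
        // valueContent = [7, 12], valueOuter = [6, 14]; '_' is not alphanumeric
        System.out.println(canBeUnquoted(underscore, 7, 12, 6, 14));
    }
}
```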
      • isWhitespace

        private static boolean isWhitespace​(char c)
      • isBlockElement

        private static boolean isBlockElement​(char[] buffer,
                                              int nameOffset,
                                              int nameLen)
      • isPreformattedElement

        private static boolean isPreformattedElement​(char[] buffer,
                                                     int nameOffset,
                                                     int nameLen)
      • isBooleanAttribute

        private static boolean isBooleanAttribute​(char[] buffer,
                                                  int nameOffset,
                                                  int nameLen)
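        The private helpers above are undocumented. As a rough illustration of the boolean attribute collapsing described for this class (e.g. selected="selected" -> selected), the sketch below checks an attribute name against a small set and emits the no-value form. BooleanAttributeSketch, its collapse method and the attribute list (an illustrative subset of HTML boolean attributes) are assumptions, not the library's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the set below is an illustrative subset only.
final class BooleanAttributeSketch {

    private static final Set<String> BOOLEAN_ATTRIBUTES = new HashSet<>(
            Arrays.asList("checked", "disabled", "multiple",
                          "readonly", "required", "selected"));

    // Case-insensitive lookup of the name partition.
    static boolean isBooleanAttribute(char[] buffer, int nameOffset, int nameLen) {
        String name = new String(buffer, nameOffset, nameLen).toLowerCase(Locale.ROOT);
        return BOOLEAN_ATTRIBUTES.contains(name);
    }

    // Returns only the name for boolean attributes, the whole attribute otherwise.
    static String collapse(char[] buffer, int nameOffset, int nameLen,
                           int outerOffset, int outerLen) {
        if (isBooleanAttribute(buffer, nameOffset, nameLen)) {
            return new String(buffer, nameOffset, nameLen);
        }
        return new String(buffer, outerOffset, outerLen);
    }

    public static void main(String[] args) {
        char[] selected = "selected=\"selected\"".toCharArray();
        System.out.println(collapse(selected, 0, 8, 0, selected.length));

        char[] other = "class=\"x\"".toCharArray();
        System.out.println(collapse(other, 0, 5, 0, other.length));
    }
}
```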