Class SimpleXMLParser


  • public final class SimpleXMLParser
    extends java.lang.Object
    A simple XML and HTML parser. Like the SAX parser, this parser is event-based, but it offers much less functionality.

    The parser has the following capabilities:

    • It recognizes the encoding used
    • It recognizes all the elements' start tags and end tags
    • It lists attributes, where attribute values can be enclosed in single or double quotes
    • It recognizes the <![CDATA[ ... ]]> construct
    • It recognizes the standard entities: &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
    • It maps the line endings \r\n and \r to \n on input, in accordance with Section 2.11 of the XML specification
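The end-of-line handling in the last item can be illustrated with a small standalone sketch. It is not the parser's own code: the class and method names are hypothetical, and it works on a String rather than a Reader to keep the example short.

```java
final class EolNormalizationSketch {

    /** Returns the input with every "\r\n" and every lone "\r" mapped to "\n". */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean previousWasCr = false; // true if the last character seen was '\r'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');      // a "\r" always yields exactly one "\n"
                previousWasCr = true;
            } else if (c == '\n') {
                if (!previousWasCr) {
                    out.append('\n');  // a lone "\n" is kept as-is
                }
                // a "\n" directly after "\r" was already emitted above
                previousWasCr = false;
            } else {
                out.append(c);
                previousWasCr = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String normalized = normalize("a\r\nb\rc\nd");
        // Show the newlines visibly; prints a\nb\nc\nd
        System.out.println(normalized.replace("\n", "\\n"));
    }
}
```

Tracking only whether the previous character was a carriage return is enough to treat "\r\n" as a single line break without any lookahead.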
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int ATTRIBUTE_EQUAL  
      private static int ATTRIBUTE_KEY  
      private static int ATTRIBUTE_VALUE  
      (package private) java.lang.String attributekey
      the attribute key.
      (package private) java.util.Map<java.lang.String,​java.lang.String> attributes
      current attributes
      (package private) java.lang.String attributevalue
      the attribute value.
      private static int CDATA  
      (package private) int character
      The current character.
      (package private) int columns
      the column where the current character occurs
      (package private) SimpleXMLDocHandlerComment comment
      The handler to which we are going to forward comments.
      private static int COMMENT  
      (package private) SimpleXMLDocHandler doc
      The handler to which we are going to forward document content
      (package private) java.lang.StringBuffer entity
      current entity (whatever is encountered between & and ;)
      private static int ENTITY  
      (package private) boolean eol
      was the last character equivalent to a newline?
      private static int EXAMIN_TAG  
      (package private) boolean html
      Are we parsing HTML?
      private static int IN_CLOSETAG  
      (package private) int lines
      the line we are currently reading
      (package private) int nested
      Keeps track of the number of tags that are open.
      (package private) boolean nowhite
      A boolean indicating if the next character should be taken into account if it's a space character.
      private static int PI  
      (package private) int previousCharacter
      The previous character.
      private static int QUOTE  
      (package private) int quoteCharacter
      the quote character that was used to open the quote.
      private static int SINGLE_TAG  
      (package private) java.util.Stack<java.lang.Integer> stack
      the state stack
      (package private) int state
      the current state
      (package private) java.lang.String tag
      current tagname
      private static int TAG_ENCOUNTERED  
      private static int TAG_EXAMINED  
      (package private) java.lang.StringBuffer text
      current text (whatever is encountered between tags)
      private static int TEXT  
      private static int UNKNOWN
      possible states
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private static java.util.Optional<java.nio.charset.Charset> detectCharsetFromBOM​(byte[] bom)
      Detect charset from BOM, as per Unicode FAQ.
      private void doTag()
      Sets the name of the tag.
      private void flush()
      Flushes the text that is currently in the buffer.
      private static java.lang.String getDeclaredEncoding​(java.lang.String decl)  
      private void go​(java.io.Reader r)
      Does the actual parsing.
      private void initTag()
      Initializes the tag name and attributes.
      static void parse​(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html)
      Parses the XML document firing the events to the handler.
      static void parse​(SimpleXMLDocHandler doc, java.io.InputStream in)
      Parses the XML document firing the events to the handler.
      static void parse​(SimpleXMLDocHandler doc, java.io.Reader r)  
      private void processTag​(boolean start)
      processes the tag.
      private int restoreState()
      Gets a state from the stack
      private void saveState​(int s)
      Adds a state to the stack.
      private void throwException​(java.lang.String s)
      Throws an exception
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • stack

        java.util.Stack<java.lang.Integer> stack
        the state stack
      • character

        int character
        The current character.
      • previousCharacter

        int previousCharacter
        The previous character.
      • lines

        int lines
        the line we are currently reading
      • columns

        int columns
        the column where the current character occurs
      • eol

        boolean eol
        was the last character equivalent to a newline?
      • nowhite

        boolean nowhite
        A boolean indicating whether the next character should be taken into account if it is a space character. When nowhite is false, the previous character wasn't whitespace.
        Since:
        2.1.5
      • state

        int state
        the current state
      • html

        boolean html
        Are we parsing HTML?
      • text

        java.lang.StringBuffer text
        current text (whatever is encountered between tags)
      • entity

        java.lang.StringBuffer entity
        current entity (whatever is encountered between & and ;)
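As an illustration of what can be done with the text collected between & and ;, the following sketch resolves the five standard entities and numeric entities. It is an assumption-laden example rather than the parser's implementation: the class, decode and decodeAll are hypothetical names, and unrecognized entities are simply left untouched.

```java
final class EntityDecodingSketch {

    /** Decodes the name found between '&' and ';'; returns null if it is not recognized. */
    static String decode(String name) {
        if (name.startsWith("#")) {
            try {
                int codePoint = name.startsWith("#x")
                        ? Integer.parseInt(name.substring(2), 16)  // hexadecimal form, e.g. #x42
                        : Integer.parseInt(name.substring(1), 10); // decimal form, e.g. #65
                return new String(Character.toChars(codePoint));
            } catch (IllegalArgumentException e) {
                return null; // not a number, or not a valid code point
            }
        }
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return null;
        }
    }

    /** Replaces every recognized entity in text; anything else is copied unchanged. */
    static String decodeAll(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i + 1) : -1;
            String decoded = (semi > 0) ? decode(text.substring(i + 1, semi)) : null;
            if (decoded != null) {
                out.append(decoded);
                i = semi + 1;   // skip past the closing ';'
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAll("1 &lt; 2 &amp;&amp; &#65; == &#x41;"));
    }
}
```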
      • tag

        java.lang.String tag
        current tagname
      • attributes

        java.util.Map<java.lang.String,​java.lang.String> attributes
        current attributes
      • nested

        int nested
        Keeps track of the number of tags that are open.
      • quoteCharacter

        int quoteCharacter
        the quote character that was used to open the quote.
      • attributekey

        java.lang.String attributekey
        the attribute key.
      • attributevalue

        java.lang.String attributevalue
        the attribute value.
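The roles of quoteCharacter, attributekey and attributevalue can be illustrated with a simplified standalone sketch that reads key/value pairs whose values are enclosed in single or double quotes. The class and method names are hypothetical, the real parser works character by character through its state machine rather than on a complete string, and this sketch neither decodes entities inside values nor handles attributes without a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class QuotedAttributeSketch {

    /** Parses text such as: a="1" b='2' into an ordered map. Stops at the first malformed pair. */
    static Map<String, String> parseAttributes(String s) {
        Map<String, String> attributes = new LinkedHashMap<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            int keyStart = i;
            while (i < n && s.charAt(i) != '=' && !Character.isWhitespace(s.charAt(i))) i++;
            String attributekey = s.substring(keyStart, i);
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            if (i >= n || s.charAt(i) != '=') break;   // no '=' after the key
            i++;
            while (i < n && Character.isWhitespace(s.charAt(i))) i++;
            if (i >= n) break;
            char quoteCharacter = s.charAt(i);         // remember which quote opened the value
            if (quoteCharacter != '"' && quoteCharacter != '\'') break;
            int end = s.indexOf(quoteCharacter, i + 1); // only the same quote closes the value
            if (end < 0) break;
            String attributevalue = s.substring(i + 1, end);
            attributes.put(attributekey, attributevalue);
            i = end + 1;
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(parseAttributes("a=\"1\" b = 'it\"s'"));
    }
}
```

Remembering the opening quote character is what allows the other kind of quote to appear inside a value.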
    • Method Detail

      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 SimpleXMLDocHandlerComment comment,
                                 java.io.Reader r,
                                 boolean html)
                          throws java.io.IOException
        Parses the XML document firing the events to the handler.
        Parameters:
        doc - the document handler
        comment - the handler to which comments are forwarded
        r - the document. The encoding is already resolved. The reader is not closed
        html - true if the document is to be parsed as HTML
        Throws:
        java.io.IOException - on error
      • detectCharsetFromBOM

        private static java.util.Optional<java.nio.charset.Charset> detectCharsetFromBOM​(byte[] bom)
        Detect charset from BOM, as per Unicode FAQ.
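A minimal sketch of BOM-based detection follows. It is not the body of this private method: the class and helper names are hypothetical, and only the UTF-8 and UTF-16 byte order marks are checked so that the example can rely on charsets every Java platform guarantees. A complete detector would test the four-byte UTF-32 marks first, because the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class BomDetectionSketch {

    /** Returns the charset indicated by a leading byte order mark, if one is present. */
    static Optional<Charset> detect(byte[] bom) {
        if (startsWith(bom, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(bom, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(bom, 0xFF, 0xFE)) {
            // Simplification: UTF-32LE is not distinguished here (see the note above).
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** Compares the first bytes of data, taken as unsigned values, with the given prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<'};
        System.out.println(detect(head)); // prints Optional[UTF-8]
    }
}
```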
      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 java.io.InputStream in)
                          throws java.io.IOException
        Parses the XML document firing the events to the handler.
        Parameters:
        doc - the document handler
        in - the document. The encoding is deduced from the stream. The stream is not closed
        Throws:
        java.io.IOException - on error
      • getDeclaredEncoding

        private static java.lang.String getDeclaredEncoding​(java.lang.String decl)
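This page does not describe the method, so the following is only a sketch of how an encoding name might be read from an XML declaration. It assumes that decl holds the text of the declaration (for example version="1.0" encoding="UTF-8"). The class name, the regular expression and the Optional return type are choices made for this illustration and differ from the String return type in the signature above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeclaredEncodingSketch {

    // encoding = "name" or encoding = 'name'; the closing quote must match the opening one.
    // The name pattern follows the EncName production of the XML specification.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the encoding name declared in decl, if any. */
    static Optional<String> declaredEncoding(String decl) {
        Matcher m = ENCODING.matcher(decl);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(declaredEncoding("version=\"1.0\" encoding='ISO-8859-1'"));
    }
}
```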
      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 java.io.Reader r)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • go

        private void go​(java.io.Reader r)
                 throws java.io.IOException
        Does the actual parsing. Perform this immediately after creating the parser object.
        Throws:
        java.io.IOException
      • restoreState

        private int restoreState()
        Gets a state from the stack
        Returns:
        the previous state
      • saveState

        private void saveState​(int s)
        Adds a state to the stack.
        Parameters:
        s - a state to add to the stack
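The interplay of saveState and restoreState can be sketched as follows. The class name and the constant values are hypothetical, and returning UNKNOWN when the stack is empty is an assumption that this page does not confirm.

```java
import java.util.Stack;

final class StateStackSketch {
    // Hypothetical values; the real constants are private to the parser.
    static final int UNKNOWN = 0;
    static final int TEXT = 1;
    static final int ENTITY = 2;

    private final Stack<Integer> stack = new Stack<>();

    /** Adds a state to the stack. */
    void saveState(int s) {
        stack.push(s);
    }

    /** Gets the most recently saved state, or UNKNOWN if nothing was saved (assumed behavior). */
    int restoreState() {
        return stack.empty() ? UNKNOWN : stack.pop();
    }

    public static void main(String[] args) {
        StateStackSketch parser = new StateStackSketch();
        int state = TEXT;
        parser.saveState(state);        // an '&' is seen inside text: remember where we were
        state = ENTITY;                 // ... the entity is read until ';'
        state = parser.restoreState();  // resume the interrupted state
        System.out.println(state == TEXT);
    }
}
```

A stack suits this purpose because constructs such as entities and quoted values interrupt another state and must return to it once they end.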
      • flush

        private void flush()
        Flushes the text that is currently in the buffer. Depending on the current state, the text can be ignored, added to the document as content, added as a comment, and so on.
      • initTag

        private void initTag()
        Initializes the tag name and attributes.
      • doTag

        private void doTag()
        Sets the name of the tag.
      • processTag

        private void processTag​(boolean start)
        processes the tag.
        Parameters:
        start - if true we are dealing with a tag that has just been opened; if false we are closing a tag.
      • throwException

        private void throwException​(java.lang.String s)
                             throws java.io.IOException
        Throws an exception
        Throws:
        java.io.IOException