Class SimpleXMLParser


  • public class SimpleXMLParser
    extends java.lang.Object
    A simple XML and HTML parser. Like a SAX parser, it is event-based, but it offers much less functionality.

    The parser can:

    • recognize the encoding used
    • recognize the start tags and end tags of all elements
    • list attributes, where attribute values can be enclosed in single or double quotes
    • recognize the <![CDATA[ ... ]]> construct
    • recognize the standard entities: &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
    • map the line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11

    The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with additional code from Xerces to recognize the encoding.
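
    The line-ending rule in the list above can be illustrated with a short, standalone sketch. It is not the parser's own code; the class and method names (LineEndingSketch, normalizeLineEndings) are hypothetical and only show what mapping \r\n and \r to \n means in practice.

```java
final class LineEndingSketch {

    /** Maps each \r\n pair and each lone \r to a single \n (XML 1.0, Section 2.11). */
    static String normalizeLineEndings(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Skip the \n of a \r\n pair so it is not emitted twice.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String normalized = normalizeLineEndings("a\r\nb\rc\nd");
        System.out.println(normalized.equals("a\nb\nc\nd")); // prints "true"
    }
}
```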

    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private SimpleXMLParser()  
    • Method Summary

      Modifier and Type Method Description
      static char decodeEntity​(java.lang.String s)  
      static java.lang.String escapeXML​(java.lang.String s, boolean onlyASCII)
      Escapes a string with the appropriate XML codes.
      private static void exc​(java.lang.String s, int line, int col)  
      private static java.lang.String getDeclaredEncoding​(java.lang.String decl)  
      private static java.lang.String getEncodingName​(byte[] b4)  
      static java.lang.String getJavaEncoding​(java.lang.String iana)
      Gets the Java encoding name for an IANA encoding name.
      static void parse​(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html)
      Parses the XML document, firing events to the handler.
      static void parse​(SimpleXMLDocHandler doc, java.io.InputStream in)
      Parses the XML document, firing events to the handler.
      static void parse​(SimpleXMLDocHandler doc, java.io.Reader r)  
      private static int popMode​(java.util.Stack st)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SimpleXMLParser

        private SimpleXMLParser()
    • Method Detail

      • popMode

        private static int popMode​(java.util.Stack st)
      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 java.io.InputStream in)
                          throws java.io.IOException
        Parses the XML document, firing events to the handler.
        Parameters:
        doc - the document handler
        in - the document. The encoding is deduced from the stream. The stream is not closed.
        Throws:
        java.io.IOException - on error
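
        How an encoding might be deduced from the start of a stream can be sketched as follows. This is a simplified illustration loosely following Appendix F of the XML 1.0 specification, not the parser's actual logic: the names EncodingSniffSketch and guessCharset are hypothetical, the sketch ignores UCS-4 and EBCDIC, and it does not read the encoding pseudo-attribute of the XML declaration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSniffSketch {

    /** Guesses a charset from up to four leading bytes; falls back to UTF-8. */
    static Charset guessCharset(byte[] b4) {
        int n = b4.length;
        // Read the bytes as unsigned values; -1 marks a missing byte.
        int b0 = n > 0 ? b4[0] & 0xFF : -1;
        int b1 = n > 1 ? b4[1] & 0xFF : -1;
        int b2 = n > 2 ? b4[2] & 0xFF : -1;
        int b3 = n > 3 ? b4[3] & 0xFF : -1;

        // Byte order marks.
        if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
        if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return StandardCharsets.UTF_8;

        // No BOM: look for "<?" encoded as two-byte code units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return StandardCharsets.UTF_16BE;
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return StandardCharsets.UTF_16LE;

        // Assumption for this sketch: anything else is treated as UTF-8.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C};
        System.out.println(guessCharset(head));
    }
}
```

        A caller would typically read the first four bytes, pick the charset, and then wrap the stream (including those bytes) in a java.io.InputStreamReader.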
      • getDeclaredEncoding

        private static java.lang.String getDeclaredEncoding​(java.lang.String decl)
      • getJavaEncoding

        public static java.lang.String getJavaEncoding​(java.lang.String iana)
        Gets the Java encoding name for an IANA encoding name. If no mapping is found, the input is returned unchanged.
        Parameters:
        iana - the IANA encoding
        Returns:
        the Java encoding
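
        The lookup-with-fallback behavior described above can be sketched as below. The class name EncodingNameSketch, the method toJavaEncoding, the four table entries, and the case-insensitive lookup are all assumptions made for illustration; the real mapping table is not shown in this documentation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class EncodingNameSketch {

    // Illustrative subset only: IANA names mapped to historical Java encoding names.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        IANA_TO_JAVA.put("WINDOWS-1252", "Cp1252");
    }

    /** Returns the mapped Java name, or the input itself when no mapping exists. */
    static String toJavaEncoding(String iana) {
        // Assumption: the lookup ignores case by upper-casing the key.
        String javaName = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        return javaName != null ? javaName : iana;
    }

    public static void main(String[] args) {
        System.out.println(toJavaEncoding("iso-8859-1")); // prints "ISO8859_1"
        System.out.println(toJavaEncoding("x-unknown"));  // prints "x-unknown"
    }
}
```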
      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 java.io.Reader r)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • parse

        public static void parse​(SimpleXMLDocHandler doc,
                                 SimpleXMLDocHandlerComment comment,
                                 java.io.Reader r,
                                 boolean html)
                          throws java.io.IOException
        Parses the XML document, firing events to the handler.
        Parameters:
        doc - the document handler
        comment - the comment handler
        r - the document. The encoding is already resolved. The reader is not closed.
        html - whether the document is parsed as HTML
        Throws:
        java.io.IOException - on error
      • exc

        private static void exc​(java.lang.String s,
                                int line,
                                int col)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • escapeXML

        public static java.lang.String escapeXML​(java.lang.String s,
                                                 boolean onlyASCII)
        Escapes a string with the appropriate XML codes.
        Parameters:
        s - the string to be escaped
        onlyASCII - if true, characters with codes above 127 are always escaped as &#nn;
        Returns:
        the escaped string
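
        As a rough illustration of the escaping described above, here is a standalone sketch. It is not the library's implementation: EscapeXmlSketch and escapeXml are hypothetical names, the sketch assumes the five predefined entities are always escaped, and it works on code points so that characters outside the Basic Multilingual Plane become a single numeric reference.

```java
final class EscapeXmlSketch {

    /** Escapes markup characters; optionally escapes everything above 127 as &#nn;. */
    static String escapeXml(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyAscii && cp > 127) {
                        // Decimal numeric character reference.
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a<b & \"c\" \u00e9", true));
        // prints: a&lt;b &amp; &quot;c&quot; &#233;
    }
}
```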
      • decodeEntity

        public static char decodeEntity​(java.lang.String s)
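
        This method has no description here, so the following is only a sketch of what decoding an entity could look like. The assumptions: the argument is the entity name without the leading & and trailing ; (for example "lt", "#65", or "#x41"), and an unknown or malformed entity yields '\0'. EntitySketch and decode are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class EntitySketch {

    // The five entities predefined by XML.
    private static final Map<String, Character> PREDEFINED = new HashMap<>();
    static {
        PREDEFINED.put("amp", '&');
        PREDEFINED.put("lt", '<');
        PREDEFINED.put("gt", '>');
        PREDEFINED.put("quot", '"');
        PREDEFINED.put("apos", '\'');
    }

    /** Decodes a named or numeric entity; returns '\0' when it cannot be decoded. */
    static char decode(String name) {
        try {
            if (name.startsWith("#x")) {
                return (char) Integer.parseInt(name.substring(2), 16); // hexadecimal form
            }
            if (name.startsWith("#")) {
                return (char) Integer.parseInt(name.substring(1), 10); // decimal form
            }
        } catch (NumberFormatException e) {
            return '\0';
        }
        Character c = PREDEFINED.get(name);
        return c != null ? c : '\0';
    }

    public static void main(String[] args) {
        System.out.println(decode("lt"));   // prints "<"
        System.out.println(decode("#x41")); // prints "A"
    }
}
```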
      • getEncodingName

        private static java.lang.String getEncodingName​(byte[] b4)