Class SimpleXMLParser
- java.lang.Object
-
- com.gitlab.pdftk_java.com.lowagie.text.pdf.SimpleXMLParser
-
public class SimpleXMLParser extends java.lang.Object
A simple XML and HTML parser. Like the SAX parser, it is event based, but it offers much less functionality. The parser:
- recognizes the encoding used
- recognizes all the elements' start tags and end tags
- lists attributes, where attribute values can be enclosed in single or double quotes
- recognizes the <![CDATA[ ... ]]> construct
- recognizes the standard entities &amp;, &lt;, &gt;, &quot;, and &apos;, as well as numeric entities
- maps line endings \r\n and \r to \n on input, in accordance with the XML Specification, Section 2.11
The code is based on http://www.javaworld.com/javaworld/javatips/javatip128/ with some extra code from XERCES to recognize the encoding.
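The line-ending rule listed above (\r\n and a lone \r both become \n) can be illustrated with a minimal sketch. The class and method names here are hypothetical, and the code is not taken from SimpleXMLParser; it only shows the normalization that Section 2.11 of the XML Specification describes.

```java
/** Sketch of XML 1.0 Section 2.11 end-of-line handling; not the parser's actual code. */
final class LineEndingSketch {

    /** Maps "\r\n" and a lone "\r" to "\n", leaving every other character unchanged. */
    static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // Skip the '\n' of a "\r\n" pair so the pair yields a single '\n'.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append('\n');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String normalized = normalize("a\r\nb\rc\n");
        System.out.println(normalized.equals("a\nb\nc\n")); // prints "true"
    }
}
```

A streaming parser would more likely apply the same rule character by character while reading, rather than on a whole string.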
-
-
Field Summary
Fields
Modifier and Type    Field    Description
private static int
ATTRIBUTE_EQUAL
private static int
ATTRIBUTE_LVALUE
private static int
ATTRIBUTE_RVALUE
private static int
CDATA
private static int
CLOSE_TAG
private static int
COMMENT
private static int
DOCTYPE
private static int
DONE
private static int
ENTITY
private static java.util.HashMap
entityMap
private static java.util.HashMap
fIANA2JavaMap
private static int
IN_TAG
private static int
OPEN_TAG
private static int
PRE
private static int
QUOTE
private static int
SINGLE_TAG
private static int
START_TAG
private static int
TEXT
-
Constructor Summary
Constructors
Modifier    Constructor    Description
private
SimpleXMLParser()
-
Method Summary
Modifier and Type    Method    Description
static char
decodeEntity(java.lang.String s)
static java.lang.String
escapeXML(java.lang.String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes.
private static void
exc(java.lang.String s, int line, int col)
private static java.lang.String
getDeclaredEncoding(java.lang.String decl)
private static java.lang.String
getEncodingName(byte[] b4)
static java.lang.String
getJavaEncoding(java.lang.String iana)
Gets the Java encoding from the IANA encoding.
static void
parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html)
Parses the XML document firing the events to the handler.
static void
parse(SimpleXMLDocHandler doc, java.io.InputStream in)
Parses the XML document firing the events to the handler.
static void
parse(SimpleXMLDocHandler doc, java.io.Reader r)
private static int
popMode(java.util.Stack st)
-
-
-
Field Detail
-
fIANA2JavaMap
private static final java.util.HashMap fIANA2JavaMap
-
entityMap
private static final java.util.HashMap entityMap
-
TEXT
private static final int TEXT
- See Also:
- Constant Field Values
-
ENTITY
private static final int ENTITY
- See Also:
- Constant Field Values
-
OPEN_TAG
private static final int OPEN_TAG
- See Also:
- Constant Field Values
-
CLOSE_TAG
private static final int CLOSE_TAG
- See Also:
- Constant Field Values
-
START_TAG
private static final int START_TAG
- See Also:
- Constant Field Values
-
ATTRIBUTE_LVALUE
private static final int ATTRIBUTE_LVALUE
- See Also:
- Constant Field Values
-
ATTRIBUTE_EQUAL
private static final int ATTRIBUTE_EQUAL
- See Also:
- Constant Field Values
-
ATTRIBUTE_RVALUE
private static final int ATTRIBUTE_RVALUE
- See Also:
- Constant Field Values
-
QUOTE
private static final int QUOTE
- See Also:
- Constant Field Values
-
IN_TAG
private static final int IN_TAG
- See Also:
- Constant Field Values
-
SINGLE_TAG
private static final int SINGLE_TAG
- See Also:
- Constant Field Values
-
COMMENT
private static final int COMMENT
- See Also:
- Constant Field Values
-
DONE
private static final int DONE
- See Also:
- Constant Field Values
-
DOCTYPE
private static final int DOCTYPE
- See Also:
- Constant Field Values
-
PRE
private static final int PRE
- See Also:
- Constant Field Values
-
CDATA
private static final int CDATA
- See Also:
- Constant Field Values
-
-
Method Detail
-
popMode
private static int popMode(java.util.Stack st)
-
parse
public static void parse(SimpleXMLDocHandler doc, java.io.InputStream in) throws java.io.IOException
Parses the XML document firing the events to the handler.
- Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed
- Throws:
java.io.IOException - on error
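How the encoding is deduced from the stream is not documented here. One common approach, described in the XML Specification's appendix on encoding autodetection, is to inspect the first four bytes for a byte order mark or for the byte pattern of "<?". The sketch below assumes that approach; the class name, method name, and returned labels are hypothetical and do not describe the private getEncodingName method.

```java
/** Sketch of encoding sniffing from the first four bytes; an assumption, not the library's code. */
final class EncodingSniffSketch {

    /** Returns a guessed IANA encoding name, defaulting to UTF-8 when nothing matches. */
    static String guessEncoding(byte[] b4) {
        if (b4 == null || b4.length < 4) {
            return "UTF-8";
        }
        // Mask to 0..255 because Java bytes are signed.
        int b0 = b4[0] & 0xFF;
        int b1 = b4[1] & 0xFF;
        int b2 = b4[2] & 0xFF;
        int b3 = b4[3] & 0xFF;

        // Byte order marks. UTF-32 is deliberately ignored in this sketch.
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";

        // No byte order mark: look for "<?" encoded as two 16-bit code units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // Fallback when neither a byte order mark nor a 16-bit pattern is found.
        return "UTF-8";
    }
}
```

A fuller implementation would then read the XML declaration, if present, and prefer the encoding named there over the guess.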
-
getDeclaredEncoding
private static java.lang.String getDeclaredEncoding(java.lang.String decl)
-
getJavaEncoding
public static java.lang.String getJavaEncoding(java.lang.String iana)
Gets the Java encoding from the IANA encoding. If the encoding cannot be found, it returns the input.
- Parameters:
iana - the IANA encoding
- Returns:
the Java encoding
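The lookup-with-fallback behaviour described above can be sketched as follows. The map entries are an illustrative subset chosen for this example, not the documented contents of fIANA2JavaMap, and the upper-casing of the key is an assumption; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch of an IANA-to-Java encoding lookup that falls back to its input. */
final class EncodingLookupSketch {

    // Illustrative subset only; the real map's contents are not shown on this page.
    private static final Map<String, String> IANA_TO_JAVA = new HashMap<>();
    static {
        IANA_TO_JAVA.put("UTF-8", "UTF8");
        IANA_TO_JAVA.put("US-ASCII", "ASCII");
        IANA_TO_JAVA.put("ISO-8859-1", "ISO8859_1");
    }

    /** Returns the mapped Java name, or the input unchanged when no mapping exists. */
    static String toJavaEncoding(String iana) {
        // Assumption: keys are matched case-insensitively by upper-casing the input.
        String javaName = IANA_TO_JAVA.get(iana.toUpperCase(Locale.ROOT));
        return javaName != null ? javaName : iana;
    }
}
```

Under these assumptions, toJavaEncoding("utf-8") returns "UTF8", while an unmapped name such as "x-unknown" comes back unchanged.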
-
parse
public static void parse(SimpleXMLDocHandler doc, java.io.Reader r) throws java.io.IOException
- Throws:
java.io.IOException
-
parse
public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, java.io.Reader r, boolean html) throws java.io.IOException
Parses the XML document firing the events to the handler.
- Parameters:
doc - the document handler
r - the document. The encoding is already resolved. The reader is not closed
- Throws:
java.io.IOException - on error
-
exc
private static void exc(java.lang.String s, int line, int col) throws java.io.IOException
- Throws:
java.io.IOException
-
escapeXML
public static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes.
- Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
- Returns:
the escaped string
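The escaping rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class and method names are hypothetical, and the sketch escapes only the five predefined XML entities plus, when onlyASCII is true, every code point above 127 as a decimal character reference.

```java
/** Sketch of XML escaping with an ASCII-only option; not the library's implementation. */
final class EscapeXmlSketch {

    static String escape(String s, boolean onlyASCII) {
        StringBuilder sb = new StringBuilder(s.length());
        // Iterate by code point so characters outside the BMP become one reference.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyASCII && cp > 127) {
                        // Decimal numeric character reference, e.g. &#233;
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & c", false)); // prints "a &lt; b &amp; c"
    }
}
```

With onlyASCII set to true the output is safe to write in any ASCII-compatible encoding, at the cost of longer text for non-ASCII input.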
-
decodeEntity
public static char decodeEntity(java.lang.String s)
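This method carries no description. Based on the class overview, which mentions the standard entities and numeric entities, a decoder could look like the sketch below. It assumes the argument is the entity name without the leading & and trailing ; and that an unrecognized name yields the NUL character; both are assumptions, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of entity decoding for the five standard entities and numeric references. */
final class EntityDecodeSketch {

    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
        ENTITIES.put("quot", '"');
        ENTITIES.put("apos", '\'');
    }

    /** Decodes a name such as "lt", "#65" or "#x41"; returns '\0' when it cannot. */
    static char decode(String name) {
        try {
            if (name.startsWith("#x")) {
                return (char) Integer.parseInt(name.substring(2), 16); // hexadecimal reference
            }
            if (name.startsWith("#")) {
                return (char) Integer.parseInt(name.substring(1)); // decimal reference
            }
        } catch (NumberFormatException e) {
            return '\0'; // malformed number
        }
        Character c = ENTITIES.get(name);
        return c != null ? c : '\0';
    }

    public static void main(String[] args) {
        System.out.println(decode("#x41")); // prints "A"
    }
}
```

Because the return type is char, a sketch like this cannot represent code points above U+FFFF; it truncates them.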
-
getEncodingName
private static java.lang.String getEncodingName(byte[] b4)
-
-