Package org.ccil.cowan.tagsoup
Class Parser
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.ccil.cowan.tagsoup.Parser
- All Implemented Interfaces:
ScanHandler, ContentHandler, DTDHandler, EntityResolver, ErrorHandler, LexicalHandler, XMLReader
The SAX parser class.
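Because Parser implements XMLReader, it can be driven like any other SAX reader: register a ContentHandler, then call parse. The following is a minimal sketch of that pattern, not code from TagSoup. The class `ElementLister` and its helpers are hypothetical names, and the demonstration uses the JDK's built-in reader so that it runs without extra jars. Passing `new org.ccil.cowan.tagsoup.Parser()` as the reader is assumed to work the same way once TagSoup is on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementLister {
    /** Collects the name of every start-tag the reader reports, in document order. */
    static List<String> elementNames(XMLReader reader, String markup) {
        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // localName may be empty when namespace processing is off, so fall back to qName.
                names.add(localName.isEmpty() ? qName : localName);
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    /** Stand-in reader from the JDK (an assumption made so the sketch is self-contained). */
    static XMLReader jdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNames(jdkReader(), "<html><body><p>hi</p></body></html>"));
    }
}
```

The helper takes the reader as a parameter so that the handler code does not depend on which XMLReader implementation is in use.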
-
Field Summary
Fields
Modifier and Type | Field | Description
static final String | autoDetectorProperty | Specifies the AutoDetector (for encoding detection) this Parser uses.
static final String | bogonsEmptyFeature | A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
static final String | CDATAElementsFeature | A value of "true" indicates that the parser will treat CDATA elements specially.
static final String | defaultAttributesFeature | A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
static final String | externalGeneralEntitiesFeature | Reports whether this parser processes external general entities (it doesn't).
static final String | externalParameterEntitiesFeature | Reports whether this parser processes external parameter entities (it doesn't).
static final String | ignorableWhitespaceFeature | A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
static final String | ignoreBogonsFeature | A value of "true" indicates that the parser will ignore unknown elements.
static final String | isStandaloneFeature | May be examined only during a parse, after the startDocument() callback has been completed; read-only.
static final String | lexicalHandlerParameterEntitiesFeature | A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
static final String | lexicalHandlerProperty | Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
static final String | namespacePrefixesFeature | A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
static final String | namespacesFeature | A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
static final String | resolveDTDURIsFeature | A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
static final String | restartElementsFeature | A value of "true" indicates that the parser will attempt to restart the restartable elements.
static final String | rootBogonsFeature | A value of "true" indicates that the parser will allow unknown elements to be the root element.
static final String | scannerProperty | Specifies the Scanner object this Parser uses.
static final String | schemaProperty | Specifies the Schema object this Parser uses.
static final String | stringInterningFeature | Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
static final String | translateColonsFeature | A value of "true" indicates that the parser will translate colons into underscores in names.
static final String | unicodeNormalizationCheckingFeature | Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
static final String | useAttributes2Feature | Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
static final String | useEntityResolver2Feature | Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
static final String | useLocator2Feature | Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
static final String | validationFeature | Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
static final String | XML11Feature | Returns "true" if the parser supports both XML 1.1 and XML 1.0.
static final String | xmlnsURIsFeature | Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
-
Constructor Summary
Constructors
Parser()
-
Method Summary
Modifier and Type | Method | Description
void | adup(char[] buff, int offset, int length) | Reports an attribute name without a value.
void | aname(char[] buff, int offset, int length) | Reports an attribute name; a value will follow.
void | aval(char[] buff, int offset, int length) | Reports an attribute value.
void | cdsect(char[] buff, int offset, int length) | Reports the content of a CDATA section (not a CDATA element).
void | cmnt(char[] buff, int offset, int length) | Reports a comment.
void | comment(char[] ch, int start, int length) |
void | decl(char[] buff, int offset, int length) | Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
void | endCDATA() |
void | endDTD() |
void | endEntity(String) |
void | entity(char[] buff, int offset, int length) | Reports an entity reference or character reference.
void | eof(char[] buff, int offset, int length) | Reports EOF.
void | etag(char[] buff, int offset, int length) | Reports an end-tag.
void | etag_basic(char[] buff, int offset, int length) |
boolean | etag_cdata(char[] buff, int offset, int length) |
ContentHandler | getContentHandler() |
DTDHandler | getDTDHandler() |
int | getEntity() | Returns the value of the last entity or character reference reported.
EntityResolver | getEntityResolver() |
ErrorHandler | getErrorHandler() |
boolean | getFeature(String name) |
Object | getProperty(String name) |
void | gi(char[] buff, int offset, int length) | Reports the general identifier (element type name) of a start-tag.
void | parse(String) |
void | parse(InputSource input) |
void | pcdata(char[] buff, int offset, int length) | Reports character content.
void | pi(char[] buff, int offset, int length) | Reports the data part of a processing instruction.
void | pitarget(char[] buff, int offset, int length) | Reports the target part of a processing instruction.
void | setContentHandler(ContentHandler handler) |
void | setDTDHandler(DTDHandler handler) |
void | setEntityResolver(EntityResolver resolver) |
void | setErrorHandler(ErrorHandler handler) |
void | setFeature(String name, boolean value) |
void | setProperty(String name, Object value) |
void | stagc(char[] buff, int offset, int length) | Reports the close of a start-tag.
void | stage(char[] buff, int offset, int length) | Reports the close of an empty-tag.
void | startCDATA() |
void | startDTD(String, String, String) |
void | startEntity(String name) |
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.xml.sax.ContentHandler
declaration
-
Field Details
-
namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
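As an illustration of what "namespace URIs and unprefixed local names" means for a handler, the sketch below turns on the standard SAX2 namespaces feature and builds an expanded name from the `uri` and `localName` arguments of startElement. It assumes this constant holds the standard URI `http://xml.org/sax/features/namespaces`, and it uses the JDK's built-in reader as a stand-in so it can run on its own; `NamespaceDemo` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDemo {
    // Standard SAX2 feature URI; assumed to be the value of Parser.namespacesFeature.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    /** Returns the root element's name in "{namespace-uri}local-name" form. */
    static String expandedRootName(String xml) {
        String[] result = new String[1];
        try {
            // JDK reader used as a stand-in for any XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    // Only record the first (root) element.
                    if (result[0] == null) {
                        result[0] = "{" + uri + "}" + localName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(expandedRootName("<x:a xmlns:x='urn:x'/>"));
    }
}
```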
-
namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.
-
externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).
-
externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).
-
isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)
-
lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
-
resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)
-
stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)
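The point about fast equality testing can be shown with plain Java strings, independent of any parser. In this sketch (the class `InternDemo` and its constant are hypothetical), a reference comparison with `==` succeeds only when the incoming name has been interned, which is the guarantee this feature describes.

```java
class InternDemo {
    // String literals are interned by the JVM, so this constant is the canonical instance.
    static final String BODY = "body";

    /** Reference comparison: only valid when the caller guarantees interned names. */
    static boolean isBody(String name) {
        return name == BODY;
    }

    public static void main(String[] args) {
        String fresh = new String("body");          // distinct object with equal contents
        System.out.println(isBody(fresh));          // false: not the interned instance
        System.out.println(isBody(fresh.intern())); // true: intern() returns the canonical instance
    }
}
```

Handlers that may also be used with readers that do not intern names should keep using `String.equals()`.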
-
useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)
-
useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)
-
useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)
-
validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
-
unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)
-
xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)
-
XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)
-
ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.
-
bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
-
rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.
-
defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
-
translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.
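The mapping described here can be pictured with a one-line string transformation. This is only an illustration of the stated effect on a name, not the parser's own code, and it makes the simplifying assumption that every colon is replaced; `ColonTranslation` is a hypothetical name.

```java
class ColonTranslation {
    /** Replaces each ':' in a name with '_', e.g. for names such as "o:p". */
    static String translateColons(String name) {
        return name.replace(':', '_');
    }

    public static void main(String[] args) {
        System.out.println(translateColons("o:p"));
    }
}
```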
-
restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.
-
ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.
-
CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.
-
lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.
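A lexical handler is registered through setProperty rather than through a dedicated setter. The sketch below collects comment text that way. It assumes this constant holds the standard SAX2 property URI `http://xml.org/sax/properties/lexical-handler`, and it uses the JDK's built-in reader as a stand-in so that it runs without TagSoup; `CommentCollector` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector {
    // Standard SAX2 property URI; assumed to be the value of Parser.lexicalHandlerProperty.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Returns the text of every comment reported while parsing the markup. */
    static List<String> comments(String markup) {
        List<String> found = new ArrayList<>();
        // DefaultHandler2 implements LexicalHandler, so one object serves both roles.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty(LEXICAL_HANDLER, handler);
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(comments("<a><!-- note --></a>"));
    }
}
```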
-
scannerProperty
Specifies the Scanner object this Parser uses.
-
schemaProperty
Specifies the Schema object this Parser uses.
-
autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.
-
-
Constructor Details
-
Parser
public Parser()
-
-
Method Details
-
getFeature
- Specified by:
getFeature in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
setFeature
public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
setFeature in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
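Since setFeature can throw either SAXNotRecognizedException or SAXNotSupportedException, callers that want to set optional features often wrap the call. The following is a sketch of such a wrapper; `FeatureToggle` and its methods are hypothetical names, and the JDK's built-in reader is used as a stand-in so the example runs without TagSoup.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureToggle {
    /** Tries to set a feature; returns false if the reader rejects the name or the value. */
    static boolean trySetFeature(XMLReader reader, String name, boolean value) {
        try {
            reader.setFeature(name, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Unknown feature name, or a value this reader cannot honor.
            return false;
        }
    }

    /** Stand-in reader from the JDK (an assumption made so the sketch is self-contained). */
    static XMLReader jdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = jdkReader();
        System.out.println(trySetFeature(reader, "http://xml.org/sax/features/namespaces", true));
        System.out.println(trySetFeature(reader, "http://example.invalid/no-such-feature", true));
    }
}
```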
-
getProperty
- Specified by:
getProperty in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
setProperty
public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException
- Specified by:
setProperty in interface XMLReader
- Throws:
SAXNotRecognizedException
SAXNotSupportedException
-
setEntityResolver
- Specified by:
setEntityResolver in interface XMLReader
-
getEntityResolver
- Specified by:
getEntityResolver in interface XMLReader
-
setDTDHandler
- Specified by:
setDTDHandler in interface XMLReader
-
getDTDHandler
- Specified by:
getDTDHandler in interface XMLReader
-
setContentHandler
- Specified by:
setContentHandler in interface XMLReader
-
getContentHandler
- Specified by:
getContentHandler in interface XMLReader
-
setErrorHandler
- Specified by:
setErrorHandler in interface XMLReader
-
getErrorHandler
- Specified by:
getErrorHandler in interface XMLReader
-
parse
- Specified by:
parse in interface XMLReader
- Throws:
IOException
SAXException
-
parse
- Specified by:
parse in interface XMLReader
- Throws:
IOException
SAXException
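An InputSource passed to parse can wrap a byte stream together with an encoding hint. The sketch below extracts the character content of a document supplied as UTF-8 bytes. `TextExtractor` and its helpers are hypothetical names, and the JDK's built-in reader stands in for any XMLReader; how this Parser combines the encoding hint with its AutoDetector is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TextExtractor {
    /** Parses UTF-8 encoded markup and returns all character content concatenated. */
    static String textOf(XMLReader reader, byte[] utf8Markup) {
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text, so append each chunk.
                text.append(ch, start, length);
            }
        });
        InputSource source = new InputSource(new ByteArrayInputStream(utf8Markup));
        source.setEncoding("UTF-8"); // encoding hint for the byte stream
        try {
            reader.parse(source);
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    /** Stand-in reader from the JDK (an assumption made so the sketch is self-contained). */
    static XMLReader jdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<p>caf\u00e9 <b>au</b> lait</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(textOf(jdkReader(), doc));
    }
}
```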
-
adup
Description copied from interface: ScanHandler
Reports an attribute name without a value.
- Specified by:
adup in interface ScanHandler
- Throws:
SAXException
-
aname
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.
- Specified by:
aname in interface ScanHandler
- Throws:
SAXException
-
aval
Description copied from interface: ScanHandler
Reports an attribute value.
- Specified by:
aval in interface ScanHandler
- Throws:
SAXException
-
entity
Description copied from interface: ScanHandler
Reports an entity reference or character reference.
- Specified by:
entity in interface ScanHandler
- Throws:
SAXException
-
eof
Description copied from interface: ScanHandler
Reports EOF.
- Specified by:
eof in interface ScanHandler
- Throws:
SAXException
-
etag
Description copied from interface: ScanHandler
Reports an end-tag.
- Specified by:
etag in interface ScanHandler
- Throws:
SAXException
-
etag_cdata
- Throws:
SAXException
-
etag_basic
- Throws:
SAXException
-
decl
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep ::= PEReference | S
intSubset ::= (markupdecl | DeclSep)*
markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
- Specified by:
decl in interface ScanHandler
- Throws:
SAXException
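To make the "simple cases" concrete, here is a sketch that pulls the document element name and the public and system identifiers out of a DOCTYPE declaration with a regular expression. It is not the logic this method uses; `DoctypeSketch` is a hypothetical class, it expects the full declaration text including the angle brackets, and it skips over any internal subset without interpreting it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoctypeSketch {
    // Name, then an optional SYSTEM or PUBLIC external ID, then an optional [internal subset].
    private static final Pattern DOCTYPE = Pattern.compile(
        "<!DOCTYPE\\s+([^\\s\\[>]+)"
        + "(?:\\s+(?:SYSTEM\\s+(\"[^\"]*\"|'[^']*')"
        + "|PUBLIC\\s+(\"[^\"]*\"|'[^']*')\\s+(\"[^\"]*\"|'[^']*')))?"
        + "\\s*(?:\\[.*\\]\\s*)?>",
        Pattern.DOTALL);

    /** Returns {name, publicId, systemId}; absent IDs are null. Returns null if nothing matches. */
    static String[] parse(String declaration) {
        Matcher m = DOCTYPE.matcher(declaration.trim());
        if (!m.matches()) {
            return null;
        }
        String publicId = unquote(m.group(3));
        // The system literal sits in group 2 for SYSTEM and in group 4 for PUBLIC.
        String systemId = unquote(m.group(2) != null ? m.group(2) : m.group(4));
        return new String[] { m.group(1), publicId, systemId };
    }

    private static String unquote(String literal) {
        return literal == null ? null : literal.substring(1, literal.length() - 1);
    }

    public static void main(String[] args) {
        String[] parts = parse("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");
        System.out.println(String.join(" | ", parts));
    }
}
```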
-
gi
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.
- Specified by:
gi in interface ScanHandler
- Throws:
SAXException
-
cdsect
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).
- Specified by:
cdsect in interface ScanHandler
- Throws:
SAXException
-
pcdata
Description copied from interface: ScanHandler
Reports character content.
- Specified by:
pcdata in interface ScanHandler
- Throws:
SAXException
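The ScanHandler callbacks all share the (char[] buff, int offset, int length) shape, meaning the reported text is the slice of `buff` that starts at `offset` and runs for `length` characters. A small sketch of reading such a slice follows; `SliceDemo` is a hypothetical name. Copying the slice into a String is a defensive choice made here on the assumption that the buffer may be reused between callbacks, which this page does not state.

```java
class SliceDemo {
    /** Copies the reported slice out of the shared buffer into an independent String. */
    static String slice(char[] buff, int offset, int length) {
        return new String(buff, offset, length);
    }

    public static void main(String[] args) {
        char[] buff = "xxhelloyy".toCharArray();
        // Characters 2..6 of the buffer are the reported content.
        System.out.println(slice(buff, 2, 5));
    }
}
```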
-
pitarget
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.
- Specified by:
pitarget in interface ScanHandler
- Throws:
SAXException
-
pi
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.
- Specified by:
pi in interface ScanHandler
- Throws:
SAXException
-
stagc
Description copied from interface: ScanHandler
Reports the close of a start-tag.
- Specified by:
stagc in interface ScanHandler
- Throws:
SAXException
-
stage
Description copied from interface: ScanHandler
Reports the close of an empty-tag.
- Specified by:
stage in interface ScanHandler
- Throws:
SAXException
-
cmnt
Description copied from interface: ScanHandler
Reports a comment.
- Specified by:
cmnt in interface ScanHandler
- Throws:
SAXException
-
getEntity
public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.
- Specified by:
getEntity in interface ScanHandler
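Because the value is returned as an int, it reads naturally as a Unicode code point. The sketch below shows one way a numeric character reference could be turned into such a value and then into text. It is an illustration only: `CharRefSketch` is hypothetical, the input format (`#65`, `#x41`, without the surrounding `&` and `;`) is an assumption, and returning 0 for anything that is not a numeric reference is a convention chosen for this example.

```java
class CharRefSketch {
    /** Decodes "#65" or "#x41" style references; returns 0 for anything else. */
    static int decode(String ref) {
        if (ref.length() < 2 || ref.charAt(0) != '#') {
            return 0;
        }
        try {
            char second = ref.charAt(1);
            if (second == 'x' || second == 'X') {
                return Integer.parseInt(ref.substring(2), 16); // hexadecimal form
            }
            return Integer.parseInt(ref.substring(1), 10);     // decimal form
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /** Converts a code point to text; values above U+FFFF become a surrogate pair. */
    static String asText(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(asText(decode("#x41")));
        System.out.println(asText(decode("#233")));
    }
}
```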
-
comment
- Specified by:
comment in interface LexicalHandler
- Throws:
SAXException
-
endCDATA
- Specified by:
endCDATA in interface LexicalHandler
- Throws:
SAXException
-
endDTD
- Specified by:
endDTD in interface LexicalHandler
- Throws:
SAXException
-
endEntity
- Specified by:
endEntity in interface LexicalHandler
- Throws:
SAXException
-
startCDATA
- Specified by:
startCDATA in interface LexicalHandler
- Throws:
SAXException
-
startDTD
- Specified by:
startDTD in interface LexicalHandler
- Throws:
SAXException
-
startEntity
- Specified by:
startEntity in interface LexicalHandler
- Throws:
SAXException
-