Package org.cyberneko.html
Class HTMLConfiguration
- java.lang.Object
  - org.apache.xerces.util.ParserConfigurationSettings
    - org.cyberneko.html.HTMLConfiguration
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration
public class HTMLConfiguration extends org.apache.xerces.util.ParserConfigurationSettings implements org.apache.xerces.xni.parser.XMLPullParserConfiguration
An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML documents, or in conjunction with any XNI-based tools, such as the Xerces2 implementation.
This configuration recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/report-errors/simple
- http://cyberneko.org/html/features/balance-tags
- and the features supported by the scanner and tag balancer components.
This configuration recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/filters
- http://cyberneko.org/html/properties/error-reporter
- and the properties supported by the scanner and tag balancer.
For complete usage information, refer to the documentation.
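The feature identifiers listed above are plain strings, and the matching constants on this class are declared protected, so calling code normally passes the literal identifiers to setFeature(String, boolean). The following is a minimal sketch of that idea using only the JDK. The class name NekoFeatureSettings, the method names, and the chosen states are assumptions made for illustration; they are not part of NekoHTML, and the states are not the library's defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Illustrative helper (hypothetical, not part of NekoHTML): collects feature
// states and hands them to any setter shaped like setFeature(String, boolean).
class NekoFeatureSettings {
    // Feature identifiers copied from the list above.
    static final String AUGMENTATIONS = "http://cyberneko.org/html/features/augmentations";
    static final String REPORT_ERRORS = "http://cyberneko.org/html/features/report-errors";
    static final String SIMPLE_ERROR_FORMAT = "http://cyberneko.org/html/features/report-errors/simple";
    static final String BALANCE_TAGS = "http://cyberneko.org/html/features/balance-tags";

    // An example choice of states, picked for illustration only.
    static Map<String, Boolean> exampleStates() {
        Map<String, Boolean> states = new LinkedHashMap<>();
        states.put(BALANCE_TAGS, true);
        states.put(REPORT_ERRORS, true);
        states.put(SIMPLE_ERROR_FORMAT, true);
        states.put(AUGMENTATIONS, false);
        return states;
    }

    // Applies each state through the supplied setter, in insertion order.
    static void apply(Map<String, Boolean> states, BiConsumer<String, Boolean> setter) {
        states.forEach(setter);
    }

    public static void main(String[] args) {
        // Stand-in setter that only prints each pair.
        apply(exampleStates(), (id, state) -> System.out.println(id + " = " + state));
    }
}
```

With NekoHTML and Xerces on the classpath, the stand-in setter could presumably be replaced by a method reference such as config::setFeature, where config is an HTMLConfiguration instance created by the caller.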
- Version:
- $Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
- Author:
- Andy Clark
- See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter
-
-
Nested Class Summary
Nested Classes
- protected class HTMLConfiguration.ErrorReporter: Defines an error reporter for reporting HTML errors.
-
Field Summary
Fields
- protected static java.lang.String AUGMENTATIONS: Include infoset augmentations.
- protected static java.lang.String BALANCE_TAGS: Balance tags.
- protected static java.lang.String ERROR_DOMAIN: Error domain.
- protected static java.lang.String ERROR_REPORTER: Error reporter.
- protected boolean fCloseStream: Stream opened by parser.
- protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler: Document handler.
- protected HTMLScanner fDocumentScanner: Document scanner.
- protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler: DTD content model handler.
- protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler: DTD handler.
- protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver: Entity resolver.
- protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler: Error handler.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected java.util.Vector fHTMLComponents: Components.
- protected static java.lang.String FILTERS: Pipeline filters.
- protected java.util.Locale fLocale: Locale.
- protected NamespaceBinder fNamespaceBinder: Namespace binder.
- protected HTMLTagBalancer fTagBalancer: HTML tag balancer.
- protected static java.lang.String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static java.lang.String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static java.lang.String NAMESPACES: Namespaces.
- protected static java.lang.String REPORT_ERRORS: Report errors.
- protected static java.lang.String SIMPLE_ERROR_FORMAT: Simple report format.
- protected static boolean XERCES_2_0_0: Parser version is Xerces 2.0.0.
- protected static boolean XERCES_2_0_1: Parser version is Xerces 2.0.1.
- protected static boolean XML4J_4_0_x: Parser version is XML4J 4.0.x.
-
Constructor Summary
Constructors
- HTMLConfiguration(): Default constructor.
-
Method Summary
Methods
- protected void addComponent(HTMLComponent component): Adds a component.
- void cleanup(): If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing.
- protected HTMLScanner createDocumentScanner()
- void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): EXPERIMENTAL: may change in next release. Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler(): Returns the DTD content model handler.
- org.apache.xerces.xni.XMLDTDHandler getDTDHandler(): Returns the DTD handler.
- org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver(): Returns the entity resolver.
- org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler(): Returns the error handler.
- java.util.Locale getLocale(): Returns the locale.
- boolean parse(boolean complete): Parses the document in a pull parsing fashion.
- void parse(org.apache.xerces.xni.parser.XMLInputSource source): Parses a document.
- void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Pushes an input source onto the current entity stack.
- protected void reset(): Resets the parser configuration.
- void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler): Sets the document handler.
- void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler): Sets the DTD content model handler.
- void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler): Sets the DTD handler.
- void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver): Sets the entity resolver.
- void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler): Sets the error handler.
- void setFeature(java.lang.String featureId, boolean state): Sets a feature.
- void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Sets the input source for the document to parse.
- void setLocale(java.util.Locale locale): Sets the locale.
- void setProperty(java.lang.String propertyId, java.lang.Object value): Sets a property.
Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
-
-
-
-
Field Detail
-
NAMESPACES
protected static final java.lang.String NAMESPACES
Namespaces.
- See Also:
- Constant Field Values
-
AUGMENTATIONS
protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.
- See Also:
- Constant Field Values
-
REPORT_ERRORS
protected static final java.lang.String REPORT_ERRORS
Report errors.
- See Also:
- Constant Field Values
-
SIMPLE_ERROR_FORMAT
protected static final java.lang.String SIMPLE_ERROR_FORMAT
Simple report format.
- See Also:
- Constant Field Values
-
BALANCE_TAGS
protected static final java.lang.String BALANCE_TAGS
Balance tags.
- See Also:
- Constant Field Values
-
NAMES_ELEMS
protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
NAMES_ATTRS
protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
- See Also:
- Constant Field Values
-
FILTERS
protected static final java.lang.String FILTERS
Pipeline filters.
- See Also:
- Constant Field Values
-
ERROR_REPORTER
protected static final java.lang.String ERROR_REPORTER
Error reporter.
- See Also:
- Constant Field Values
-
ERROR_DOMAIN
protected static final java.lang.String ERROR_DOMAIN
Error domain.
- See Also:
- Constant Field Values
-
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
Document handler.
-
fDTDHandler
protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
DTD handler.
-
fDTDContentModelHandler
protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.
-
fErrorHandler
protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
Error handler.
-
fEntityResolver
protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
Entity resolver.
-
fLocale
protected java.util.Locale fLocale
Locale.
-
fCloseStream
protected boolean fCloseStream
Stream opened by parser. The stream must therefore be closed manually when parsing terminates.
-
fHTMLComponents
protected final java.util.Vector fHTMLComponents
Components.
-
fDocumentScanner
protected final HTMLScanner fDocumentScanner
Document scanner.
-
fTagBalancer
protected final HTMLTagBalancer fTagBalancer
HTML tag balancer.
-
fNamespaceBinder
protected final NamespaceBinder fNamespaceBinder
Namespace binder.
-
fErrorReporter
protected final HTMLErrorReporter fErrorReporter
Error reporter.
-
XERCES_2_0_0
protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.
-
XERCES_2_0_1
protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.
-
XML4J_4_0_x
protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.
-
-
Method Detail
-
createDocumentScanner
protected HTMLScanner createDocumentScanner()
-
pushInputSource
public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the current entity, the scanner returns to where it left off at the time this entity source was pushed.
Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.
- Parameters:
inputSource - The new input source to start scanning.
- See Also:
evaluateInputSource(XMLInputSource)
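
The stack behavior described above can be pictured with a toy model. The sketch below uses only the JDK and is not NekoHTML's scanner: the class EntityStackSketch, its methods, and the '|' marker that stands in for an embedded script are all assumptions made for illustration. It shows pushed content being read first, after which reading resumes in the original source, and it follows the hint by buffering the whole script output before pushing it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an entity stack (hypothetical, not NekoHTML code).
class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(Reader document) {
        stack.push(document);
    }

    // Counterpart of pushInputSource: later reads come from this reader first.
    void push(Reader source) {
        stack.push(source);
    }

    // Returns the next character, or -1 once every source is exhausted.
    int read() {
        try {
            while (!stack.isEmpty()) {
                int c = stack.peek().read();
                if (c != -1) {
                    return c;
                }
                // Current entity finished: close it and resume the one beneath.
                stack.pop().close();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scans "ab|cd"; the '|' marker stands in for an embedded script.
    static String demo() {
        EntityStackSketch scanner = new EntityStackSketch(new StringReader("ab|cd"));
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = scanner.read()) != -1) {
            if (c == '|') {
                // Buffer the entire script output first, then push it in one piece.
                String buffered = "XY";
                scanner.push(new StringReader(buffered));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "abXYcd"
    }
}
```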
-
evaluateInputSource
public void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
EXPERIMENTAL: may change in next release
Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- Parameters:
inputSource - The new input source to start scanning.
- See Also:
pushInputSource(XMLInputSource)
-
setFeature
public void setFeature(java.lang.String featureId, boolean state) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature.
- Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Overrides:
setFeature in class org.apache.xerces.util.ParserConfigurationSettings
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setProperty
public void setProperty(java.lang.String propertyId, java.lang.Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property.
- Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Overrides:
setProperty in class org.apache.xerces.util.ParserConfigurationSettings
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
- Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
setDTDHandler
public void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
Sets the DTD handler.
- Specified by:
setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getDTDHandler
public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
Returns the DTD handler.
- Specified by:
getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
setDTDContentModelHandler
public void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
Sets the DTD content model handler.
- Specified by:
setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getDTDContentModelHandler
public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.
- Specified by:
getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
setErrorHandler
public void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
Sets the error handler.
- Specified by:
setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getErrorHandler
public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
Returns the error handler.
- Specified by:
getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
setEntityResolver
public void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
Sets the entity resolver.
- Specified by:
setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getEntityResolver
public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
Returns the entity resolver.
- Specified by:
getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
setLocale
public void setLocale(java.util.Locale locale)
Sets the locale.
- Specified by:
setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
getLocale
public java.util.Locale getLocale()
Returns the locale.
- Specified by:
getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
-
parse
public void parse(org.apache.xerces.xni.parser.XMLInputSource source) throws org.apache.xerces.xni.XNIException, java.io.IOException
Parses a document.
- Specified by:
parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Throws:
org.apache.xerces.xni.XNIException
java.io.IOException
-
setInputSource
public void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource) throws org.apache.xerces.xni.parser.XMLConfigurationException, java.io.IOException
Sets the input source for the document to parse.
- Specified by:
setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
- Parameters:
inputSource - The document's input source.
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
java.io.IOException - Thrown on I/O error.
- See Also:
parse(boolean)
-
parse
public boolean parse(boolean complete) throws org.apache.xerces.xni.XNIException, java.io.IOException
Parses the document in a pull parsing fashion.
- Specified by:
parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
- Parameters:
complete - True if the pull parser should parse the remaining document completely.
- Returns:
True if there is more document to parse.
- Throws:
org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
java.io.IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
- See Also:
setInputSource(org.apache.xerces.xni.parser.XMLInputSource)
-
cleanup
public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example by closing all opened streams.
- Specified by:
cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
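
The pull-parsing contract described for parse(boolean) and cleanup() can be sketched as a loop: call parse(false) while it reports that more of the document remains, and call cleanup() if the application stops before the end. The code below is a JDK-only illustration under stated assumptions. The PullParser interface and ToyParser class are hypothetical stand-ins, not the Xerces or NekoHTML types, and the one-event-per-call granularity is an assumption of the toy rather than documented behavior. The real methods also declare XNIException and IOException.

```java
import java.util.ArrayList;
import java.util.List;

class PullLoopSketch {
    // Hypothetical stand-in for the two pull-parsing calls documented above.
    interface PullParser {
        boolean parse(boolean complete); // true while more of the document remains
        void cleanup();                  // frees resources after an early stop
    }

    // Toy parser: each parse(false) call consumes one queued event.
    static final class ToyParser implements PullParser {
        final List<String> consumed = new ArrayList<>();
        boolean cleanedUp = false;
        private final String[] events;
        private int next = 0;

        ToyParser(String... events) {
            this.events = events;
        }

        @Override
        public boolean parse(boolean complete) {
            do {
                if (next < events.length) {
                    consumed.add(events[next++]);
                }
            } while (complete && next < events.length);
            return next < events.length;
        }

        @Override
        public void cleanup() {
            cleanedUp = true;
        }
    }

    // Pulls one step at a time; stops early (and cleans up) once stopAt is seen
    // while more of the document remains. Returns true if parsing stopped early.
    static boolean pullUntil(ToyParser parser, String stopAt) {
        while (parser.parse(false)) {
            int last = parser.consumed.size() - 1;
            if (parser.consumed.get(last).equals(stopAt)) {
                parser.cleanup(); // terminating before the end: release resources
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyParser parser = new ToyParser("html", "head", "body", "p");
        boolean stoppedEarly = pullUntil(parser, "head");
        System.out.println(parser.consumed + " stoppedEarly=" + stoppedEarly
                + " cleanedUp=" + parser.cleanedUp);
    }
}
```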
-
addComponent
protected void addComponent(HTMLComponent component)
Adds a component.
-
reset
protected void reset() throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the parser configuration.
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
-
-