Package org.htmlunit.cyberneko
Class HTMLConfiguration
- java.lang.Object
  - org.htmlunit.cyberneko.xerces.util.ParserConfigurationSettings
    - org.htmlunit.cyberneko.HTMLConfiguration
All Implemented Interfaces:
XMLComponentManager, XMLParserConfiguration
public class HTMLConfiguration extends ParserConfigurationSettings implements XMLParserConfiguration
An XNI-based parser configuration for parsing HTML documents. It can be used directly to parse HTML documents, or in conjunction with any XNI-based tool, such as the Xerces2 implementation.
This configuration recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/report-errors/simple
- all features supported by the scanner and tag balancer components.
This configuration recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/filters
- http://cyberneko.org/html/properties/error-reporter
- all properties supported by the scanner and tag balancer components.
For complete usage information, refer to the documentation.
See Also:
HTMLScanner, HTMLTagBalancer, HTMLErrorReporter
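The filters property plugs extra stages into the parsing pipeline. As we understand it, each filter receives document events from the stage before it and forwards them, possibly modified, to the next stage, with the document handler at the end of the chain. The sketch below is a stdlib-only model of that chaining idea, not CyberNeko code: `PipelineSketch`, `Stage`, `Filter`, and the two example filters are hypothetical, events are reduced to element names, and the exact order in which `HTMLConfiguration` connects its built-in components is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only model of an XNI-style filter pipeline; not the CyberNeko API.
final class PipelineSketch {

    /** A pipeline stage that receives start-element events. */
    interface Stage {
        void startElement(String name);
    }

    /** A filter is a stage that forwards (possibly modified) events to the next stage. */
    abstract static class Filter implements Stage {
        protected Stage next;

        void setNext(Stage next) {
            this.next = next;
        }
    }

    /** Example filter: lower-cases element names before passing them on. */
    static final class LowerCaseFilter extends Filter {
        @Override
        public void startElement(String name) {
            next.startElement(name.toLowerCase(Locale.ROOT));
        }
    }

    /** Example filter: drops SCRIPT elements (compared case-insensitively). */
    static final class DropScriptFilter extends Filter {
        @Override
        public void startElement(String name) {
            if (!name.equalsIgnoreCase("script")) {
                next.startElement(name);
            }
        }
    }

    /** Terminal stage standing in for the document handler: records what arrives. */
    static final class RecordingHandler implements Stage {
        final List<String> received = new ArrayList<>();

        @Override
        public void startElement(String name) {
            received.add(name);
        }
    }

    /** Chains the filters in array order, ends the chain with the handler, returns the head. */
    static Stage connect(Filter[] filters, Stage handler) {
        Stage head = handler;
        for (int i = filters.length - 1; i >= 0; i--) {
            filters[i].setNext(head);
            head = filters[i];
        }
        return head;
    }

    /** Simulates a scanner emitting element names into the pipeline. */
    static List<String> run(List<String> scannedNames, Filter... filters) {
        RecordingHandler handler = new RecordingHandler();
        Stage head = connect(filters, handler);
        for (String name : scannedNames) {
            head.startElement(name);
        }
        return handler.received;
    }
}
```

Running `HTML`, `BODY`, `SCRIPT`, `P` through a lower-casing filter followed by a script-dropping filter delivers `html`, `body`, `p` to the handler; swapping the filter order would change what each filter sees, which is why filter order matters.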
Nested Class Summary
- protected class HTMLConfiguration.ErrorReporter: Defines an error reporter for reporting HTML errors.
Field Summary
- protected static java.lang.String AUGMENTATIONS: Include infoset augmentations.
- private boolean closeStream_: Stream opened by parser.
- private XMLDocumentHandler documentHandler_: Document handler.
- (package private) HTMLScanner documentScanner_: Document scanner.
- protected static java.lang.String ERROR_DOMAIN: Error domain.
- protected static java.lang.String ERROR_REPORTER: Error reporter.
- (package private) XMLErrorHandler errorHandler_: Error handler.
- static java.lang.String FILTERS: Pipeline filters.
- private java.util.List<HTMLComponent> htmlComponents_: Components.
- private HTMLElements htmlElements_
- protected static java.lang.String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static java.lang.String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- private NamespaceBinder namespaceBinder_: Namespace binder.
- protected static java.lang.String NAMESPACES: Namespaces.
- protected static java.lang.String REPORT_ERRORS: Report errors.
- protected static java.lang.String SIMPLE_ERROR_FORMAT: Simple report format.
- private HTMLTagBalancer tagBalancer_: HTML tag balancer.
Constructor Summary
- HTMLConfiguration(): Default constructor.
- HTMLConfiguration(HTMLElements htmlElements)
Method Summary
- protected void addComponent(HTMLComponent component)
- void cleanup(): If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing.
- protected HTMLScanner createDocumentScanner()
- void evaluateInputSource(XMLInputSource inputSource): EXPERIMENTAL: may change in the next release. Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- XMLDocumentHandler getDocumentHandler()
- HTMLScanner getDocumentScanner()
- XMLErrorHandler getErrorHandler()
- java.util.List<HTMLComponent> getHtmlComponents()
- HTMLElements getHtmlElements()
- NamespaceBinder getNamespaceBinder()
- HTMLTagBalancer getTagBalancer()
- boolean parse(boolean complete): Parses the document in a pull-parsing fashion.
- void parse(XMLInputSource source): Parses a document.
- void pushInputSource(XMLInputSource inputSource): Pushes an input source onto the current entity stack.
- protected void reset(): Resets the parser configuration.
- void setDocumentHandler(XMLDocumentHandler handler): Sets the document handler to receive information about the document.
- void setErrorHandler(XMLErrorHandler handler): Sets the error handler.
- void setFeature(java.lang.String featureId, boolean state): Sets the state of a feature.
- void setInputSource(XMLInputSource inputSource): Sets the input source for the document to parse.
- void setProperty(java.lang.String propertyId, java.lang.Object value): Sets the value of a property.
Methods inherited from class org.htmlunit.cyberneko.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.htmlunit.cyberneko.xerces.xni.parser.XMLParserConfiguration
addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty

Field Detail
NAMESPACES
protected static final java.lang.String NAMESPACES
Namespaces.
See Also: Constant Field Values

AUGMENTATIONS
protected static final java.lang.String AUGMENTATIONS
Include infoset augmentations.
See Also: Constant Field Values

REPORT_ERRORS
protected static final java.lang.String REPORT_ERRORS
Report errors.
See Also: Constant Field Values

SIMPLE_ERROR_FORMAT
protected static final java.lang.String SIMPLE_ERROR_FORMAT
Simple report format.
See Also: Constant Field Values

NAMES_ELEMS
protected static final java.lang.String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
See Also: Constant Field Values

NAMES_ATTRS
protected static final java.lang.String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
See Also: Constant Field Values
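The NAMES_ELEMS and NAMES_ATTRS properties each take one of three values. A minimal sketch of what those values select, assuming that "upper" and "lower" convert a name's case and "default" leaves the name as it appeared in the source; `NameCase.modify` is a hypothetical helper, not library code. The sketch rejects any other value; how the library itself treats unexpected values is not stated here.

```java
import java.util.Locale;

// Hypothetical helper modelling the "upper" / "lower" / "default" name modes; not the library's code.
final class NameCase {

    private NameCase() {
    }

    /**
     * Returns the name converted according to the mode. Locale.ROOT avoids
     * locale-specific case rules such as the Turkish dotless i.
     */
    static String modify(String name, String mode) {
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ROOT);
            case "lower":
                return name.toLowerCase(Locale.ROOT);
            case "default":
                return name; // assumption: keep the name exactly as scanned
            default:
                throw new IllegalArgumentException("unknown name mode: " + mode);
        }
    }
}
```

For example, a tag written as `Div` would be reported as `DIV`, `div`, or `Div` under the three modes respectively.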

FILTERS
public static final java.lang.String FILTERS
Pipeline filters.
See Also: Constant Field Values

ERROR_REPORTER
protected static final java.lang.String ERROR_REPORTER
Error reporter.
See Also: Constant Field Values

ERROR_DOMAIN
protected static final java.lang.String ERROR_DOMAIN
Error domain.
See Also: Constant Field Values

documentHandler_
private XMLDocumentHandler documentHandler_
Document handler.

errorHandler_
XMLErrorHandler errorHandler_
Error handler.

closeStream_
private boolean closeStream_
Whether the input stream was opened by the parser itself. If so, the parser must close the stream manually when parsing terminates.

htmlComponents_
private final java.util.List<HTMLComponent> htmlComponents_
Components.

documentScanner_
final HTMLScanner documentScanner_
Document scanner.

tagBalancer_
private final HTMLTagBalancer tagBalancer_
HTML tag balancer.

namespaceBinder_
private final NamespaceBinder namespaceBinder_
Namespace binder.

htmlElements_
private final HTMLElements htmlElements_

Constructor Detail
HTMLConfiguration
public HTMLConfiguration()
Default constructor.

HTMLConfiguration
public HTMLConfiguration(HTMLElements htmlElements)

Method Detail
createDocumentScanner
protected HTMLScanner createDocumentScanner()

pushInputSource
public void pushInputSource(XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). When the pushed entity ends, the scanner resumes where it left off at the time the entity was pushed.
Hint: To use this feature to insert the output of <SCRIPT> tags, buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.
Parameters:
inputSource - The new input source to start scanning.
See Also:
evaluateInputSource(XMLInputSource)
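The entity-stack behaviour described above can be modelled with a stack of readers: characters come from the most recently pushed source, and once it is exhausted, reading resumes in the source underneath exactly where it stopped. This is a minimal stdlib-only sketch; `EntityStack` and its methods are hypothetical and do not reproduce HTMLScanner's implementation or use XMLInputSource.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an entity stack; not CyberNeko's HTMLScanner.
final class EntityStack {
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStack(Reader document) {
        stack.push(document);
    }

    /** Makes the new source current; the previous one resumes once this one ends. */
    void pushInputSource(Reader source) {
        stack.push(source);
    }

    /** Returns the next character, or -1 when every source on the stack is exhausted. */
    int read() {
        try {
            while (!stack.isEmpty()) {
                int c = stack.peek().read();
                if (c != -1) {
                    return c;
                }
                // Current entity finished: close it and fall back to the one below.
                stack.pop().close();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the scanner has consumed `<p>` from `<p>A</p>` and the fully buffered script output `B` is then pushed, the characters read are `<p>BA</p>`. Pushing partial output in several pieces at different times is what the hint above warns can reorder events.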

evaluateInputSource
public void evaluateInputSource(XMLInputSource inputSource)
EXPERIMENTAL: may change in the next release.
Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
Parameters:
inputSource - The new input source to start scanning.
See Also:
pushInputSource(XMLInputSource)

setFeature
public void setFeature(java.lang.String featureId, boolean state) throws XMLConfigurationException
Description copied from class: ParserConfigurationSettings
Sets the state of a feature. The parser might not recognize the feature, and even if it does, it might not be able to fulfill the request.
Specified by:
setFeature in interface XMLParserConfiguration
Overrides:
setFeature in class ParserConfigurationSettings
Parameters:
featureId - The unique identifier (URI) of the feature.
state - The requested state of the feature (true or false).
Throws:
XMLConfigurationException - If the requested feature is not known.
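ParserConfigurationSettings tracks a set of recognized feature IDs (see its inherited addRecognizedFeatures and checkFeature methods), and setFeature rejects identifiers outside that set. The stdlib-only sketch below models that check; `FeatureRegistry` is hypothetical, it throws IllegalArgumentException where the real class throws XMLConfigurationException, and its default of false for unset features is a choice of the sketch, not the library's defaults.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of recognized-feature checking; not the library's ParserConfigurationSettings.
final class FeatureRegistry {
    private final Set<String> recognized = new HashSet<>();
    private final Map<String, Boolean> states = new HashMap<>();

    /** Registers feature IDs that setFeature and getFeature will accept. */
    void addRecognizedFeatures(String... featureIds) {
        Collections.addAll(recognized, featureIds);
    }

    /** Stores the state, rejecting IDs that were never registered. */
    void setFeature(String featureId, boolean state) {
        checkFeature(featureId);
        states.put(featureId, state);
    }

    /** Returns the stored state; recognized but unset features read as false here. */
    boolean getFeature(String featureId) {
        checkFeature(featureId);
        return states.getOrDefault(featureId, false);
    }

    private void checkFeature(String featureId) {
        if (!recognized.contains(featureId)) {
            throw new IllegalArgumentException("feature not recognized: " + featureId);
        }
    }
}
```

Registering `http://cyberneko.org/html/features/augmentations` and then setting it to true succeeds, while setting an unregistered URI fails immediately instead of being silently ignored.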

setProperty
public void setProperty(java.lang.String propertyId, java.lang.Object value) throws XMLConfigurationException
Description copied from class: ParserConfigurationSettings
Sets the value of a property.
Specified by:
setProperty in interface XMLParserConfiguration
Overrides:
setProperty in class ParserConfigurationSettings
Parameters:
propertyId - The property ID.
value - The value.
Throws:
XMLConfigurationException - If the requested property is not known.

setDocumentHandler
public void setDocumentHandler(XMLDocumentHandler handler)
Description copied from interface: XMLParserConfiguration
Sets the document handler to receive information about the document.
Specified by:
setDocumentHandler in interface XMLParserConfiguration
Parameters:
handler - The document handler.

getDocumentHandler
public XMLDocumentHandler getDocumentHandler()
Specified by:
getDocumentHandler in interface XMLParserConfiguration
Returns:
the document handler.

setErrorHandler
public void setErrorHandler(XMLErrorHandler handler)
Description copied from interface: XMLParserConfiguration
Sets the error handler.
Specified by:
setErrorHandler in interface XMLParserConfiguration
Parameters:
handler - The error handler.

getErrorHandler
public XMLErrorHandler getErrorHandler()
Specified by:
getErrorHandler in interface XMLParserConfiguration
Returns:
the error handler.

getHtmlElements
public HTMLElements getHtmlElements()
Returns:
the HTMLElements

getHtmlComponents
public java.util.List<HTMLComponent> getHtmlComponents()
Returns:
the list of HTMLComponents

getDocumentScanner
public HTMLScanner getDocumentScanner()
Returns:
the DocumentScanner

getTagBalancer
public HTMLTagBalancer getTagBalancer()
Returns:
the TagBalancer

getNamespaceBinder
public NamespaceBinder getNamespaceBinder()
Returns:
the NamespaceBinder

parse
public void parse(XMLInputSource source) throws XNIException, java.io.IOException
Parses a document.
Specified by:
parse in interface XMLParserConfiguration
Parameters:
source - The input source for the top level of the XML document.
Throws:
XNIException - Any XNI exception, possibly wrapping another exception.
java.io.IOException - An I/O exception from the parser, possibly from a byte stream or character stream supplied by the parser.

setInputSource
public void setInputSource(XMLInputSource inputSource) throws XMLConfigurationException, java.io.IOException
Sets the input source for the document to parse.
Specified by:
setInputSource in interface XMLParserConfiguration
Parameters:
inputSource - The document's input source.
Throws:
XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
java.io.IOException - Thrown on I/O error.
See Also:
parse(boolean)

parse
public boolean parse(boolean complete) throws XNIException, java.io.IOException
Parses the document in a pull-parsing fashion.
Specified by:
parse in interface XMLParserConfiguration
Parameters:
complete - True if the pull parser should parse the remaining document completely.
Returns:
true if there is more of the document to parse.
Throws:
XNIException - Any XNI exception, possibly wrapping another exception.
java.io.IOException - An I/O exception from the parser, possibly from a byte stream or character stream supplied by the parser.
See Also:
setInputSource(org.htmlunit.cyberneko.xerces.xni.parser.XMLInputSource)
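Pull parsing hands control back to the caller between steps: after setInputSource, each parse(false) call processes part of the document and reports whether more remains, while parse(true) finishes the rest. The sketch models only that calling contract with a hypothetical `ChunkedPullParser` that treats each line as one step; how much of a document HTMLConfiguration actually processes per call is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the parse(boolean) pull-parsing contract; not CyberNeko code.
final class ChunkedPullParser {
    private final List<String> events = new ArrayList<>();
    private String[] lines = new String[0];
    private int position;

    /** Stands in for setInputSource: prepares a new document without parsing it. */
    void setInputSource(String document) {
        lines = document.split("\n");
        position = 0;
        events.clear();
    }

    /**
     * Processes one line, or all remaining lines when complete is true.
     * Returns true while part of the document is still left to parse.
     */
    boolean parse(boolean complete) {
        do {
            if (position < lines.length) {
                events.add(lines[position++]);
            }
        } while (complete && position < lines.length);
        return position < lines.length;
    }

    List<String> events() {
        return events;
    }
}
```

A typical driver calls `setInputSource(...)` once and then loops `while (parser.parse(false)) { ... }`, doing other work between steps. A caller that abandons such a loop early should call cleanup() so that resources allocated during parsing are released.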

cleanup
public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example, to close all opened streams.
Specified by:
cleanup in interface XMLParserConfiguration
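The closeStream_ field records whether the parser opened the input stream itself; only such streams are the parser's responsibility to close, which is what cleanup() handles when parsing stops early. The sketch models that ownership rule; `OwnedStream` and its opener argument are hypothetical stand-ins, not library API, and the real class may track more state than a single stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical model of stream ownership during parsing; not CyberNeko code.
final class OwnedStream {
    private final InputStream stream;
    private final boolean closeStream; // true only if this object opened the stream

    /** Uses the caller's stream if one is given; otherwise opens one and takes ownership. */
    OwnedStream(InputStream callerStream, Supplier<InputStream> opener) {
        if (callerStream != null) {
            this.stream = callerStream;
            this.closeStream = false;
        } else {
            this.stream = opener.get();
            this.closeStream = true;
        }
    }

    InputStream stream() {
        return stream;
    }

    /** Mirrors the idea behind cleanup(): release only what the parser itself opened. */
    void cleanup() {
        if (closeStream) {
            try {
                stream.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Leaving a caller-supplied stream open lets the caller keep using it after parsing; the caller remains responsible for closing it.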

addComponent
protected void addComponent(HTMLComponent component)

reset
protected void reset() throws XMLConfigurationException
Resets the parser configuration.
Throws:
XMLConfigurationException
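The htmlComponents_ list and addComponent suggest a register-then-reset pattern: components are registered once, and reset() lets each of them re-read the current settings before parsing. A minimal sketch under that assumption; `ComponentRegistry`, its nested `Component` interface, and the settings map are hypothetical stand-ins, not HTMLComponent or the real reset() logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model of registering components and resetting them together; not CyberNeko code.
final class ComponentRegistry {

    /** Stand-in for a pipeline component that reconfigures itself from shared settings. */
    interface Component {
        void reset(Map<String, Object> settings);
    }

    private final List<Component> components = new ArrayList<>();

    /** Registers a component so that later resets reach it. */
    void addComponent(Component component) {
        components.add(component);
    }

    /** Read-only view of the registered components (a choice of this sketch). */
    List<Component> getComponents() {
        return Collections.unmodifiableList(components);
    }

    /** Gives every registered component the same settings, in registration order. */
    void reset(Map<String, Object> settings) {
        for (Component component : components) {
            component.reset(settings);
        }
    }
}
```

Because every component is reset from the same settings object, a change such as switching element names to "lower" reaches all registered components on the next reset rather than only the one that was configured directly.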