Package nu.validator.htmlparser.io
Class Driver
- java.lang.Object
-
- nu.validator.htmlparser.io.Driver
-
- All Implemented Interfaces:
EncodingDeclarationHandler
public class Driver extends java.lang.Object implements EncodingDeclarationHandler
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
private class Driver.ReparseException
-
Field Summary
Fields (Modifier and Type, Field, Description)
private boolean allowRewinding
private Encoding characterEncoding
private CharacterHandler[] characterHandlers
    Used for NFC checking if non-null, source code capture, etc.
private Confidence confidence
private Heuristics heuristics
private java.io.Reader reader
    The input UTF-16 code unit stream.
private RewindableInputStream rewindableInputStream
    The reference to the rewindable byte stream.
private boolean swallowBom
private Tokenizer tokenizer
-
Method Summary
All Methods / Instance Methods / Concrete Methods (Modifier and Type, Method, Description)
void addCharacterHandler(CharacterHandler characterHandler)
private void becomeConfident()
(package private) void dontSwallowBom()
protected Encoding encodingFromExternalDeclaration(java.lang.String encoding)
    Initializes a decoder from an external declaration.
java.lang.String getCharacterEncoding()
    Queries the environment for the encoding in use (for error reporting).
org.xml.sax.Locator getDocumentLocator()
boolean internalEncodingDeclaration(java.lang.String internalCharset)
    Indicates that the parser has found an internal encoding declaration with the charset value charset.
boolean isAllowRewinding()
    Returns the allowRewinding flag.
boolean isCheckingNormalization()
    Queries whether normalization checking is on.
(package private) void notifyAboutMetaBoundary()
private void runStates()
void setAllowRewinding(boolean allowRewinding)
    Sets the allowRewinding flag.
void setCheckingNormalization(boolean enable)
    Turns NFC checking on or off.
void setCommentPolicy(XmlViolationPolicy commentPolicy)
void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
void setEncoding(Encoding encoding, Confidence confidence)
void setErrorHandler(org.xml.sax.ErrorHandler eh)
void setHeuristics(Heuristics heuristics)
    Sets the encoding sniffing heuristics.
void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
void setNamePolicy(XmlViolationPolicy namePolicy)
void setTransitionHandler(TransitionHandler transitionHandler)
void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
void tokenize(org.xml.sax.InputSource is)
    Runs the tokenization.
protected void warnWithoutLocation(java.lang.String message)
    Reports a warning without line/column information.
protected Encoding whineAboutEncodingAndReturnActual(java.lang.String encoding, Encoding cs)
-
-
-
Field Detail
-
reader
private java.io.Reader reader
The input UTF-16 code unit stream. If a byte stream was given, this object is an instance of HtmlInputStreamReader.
-
rewindableInputStream
private RewindableInputStream rewindableInputStream
The reference to the rewindable byte stream. null if prohibited or no longer needed.
-
swallowBom
private boolean swallowBom
-
characterEncoding
private Encoding characterEncoding
-
allowRewinding
private boolean allowRewinding
-
heuristics
private Heuristics heuristics
-
tokenizer
private final Tokenizer tokenizer
-
confidence
private Confidence confidence
-
characterHandlers
private CharacterHandler[] characterHandlers
Used for NFC checking if non-null, source code capture, etc.
-
-
Constructor Detail
-
Driver
public Driver(Tokenizer tokenizer)
-
-
Method Detail
-
isAllowRewinding
public boolean isAllowRewinding()
Returns the allowRewinding flag.
- Returns:
the allowRewinding flag
-
setAllowRewinding
public void setAllowRewinding(boolean allowRewinding)
Sets the allowRewinding flag.
- Parameters:
allowRewinding - the allowRewinding value to set
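As an illustration of what rewinding a byte stream involves, the following minimal sketch decodes the same bytes twice: once with a tentative encoding, and again from the start with a different one. It uses only JDK classes. RewindSketch and decodeTwice are hypothetical names, and the sketch makes no claim about how RewindableInputStream or Driver implement this internally.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not part of the htmlparser API. */
final class RewindSketch {

    /**
     * Decodes the input with the tentative charset, rewinds the stream
     * to its start, and decodes it again with the actual charset.
     */
    static String[] decodeTwice(byte[] input, Charset tentative, Charset actual) {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(input))) {
            in.mark(input.length + 1);            // remember the start of the stream
            String firstPass = new String(in.readAllBytes(), tentative);
            in.reset();                           // rewind to the marked position
            String secondPass = new String(in.readAllBytes(), actual);
            return new String[] { firstPass, secondPass };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8); // e-acute as two bytes
        String[] passes = decodeTwice(utf8, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(passes[0].length()); // two characters under the wrong guess
        System.out.println(passes[1].length()); // one character after the re-decode
    }
}
```

Buffering everything read so far is what makes the second pass possible, which is presumably why a caller might want to prohibit rewinding for large inputs.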
-
setCheckingNormalization
public void setCheckingNormalization(boolean enable)
Turns NFC checking on or off.
- Parameters:
enable - true if checking on
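For readers unfamiliar with NFC (Unicode Normalization Form C), here is a minimal sketch of what such a check tests, using the JDK's java.text.Normalizer. NfcCheckSketch and isNfc are hypothetical names. According to the field description, the driver's own check runs through its CharacterHandler instances, so this shows the concept only, not the library's implementation.

```java
import java.text.Normalizer;

/** Hypothetical illustration; not part of the htmlparser API. */
final class NfcCheckSketch {

    /** Returns true if the text is already in Unicode Normalization Form C. */
    static boolean isNfc(CharSequence text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // precomposed e-acute
        String decomposed = "e\u0301";  // e followed by a combining acute accent
        System.out.println(isNfc(composed));
        System.out.println(isNfc(decomposed));
        // Normalizing the decomposed form yields the precomposed one.
        System.out.println(Normalizer.normalize(decomposed, Normalizer.Form.NFC).equals(composed));
    }
}
```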
-
addCharacterHandler
public void addCharacterHandler(CharacterHandler characterHandler)
-
isCheckingNormalization
public boolean isCheckingNormalization()
Queries whether normalization checking is on.
- Returns:
true if checking on
-
tokenize
public void tokenize(org.xml.sax.InputSource is) throws org.xml.sax.SAXException, java.io.IOException
Runs the tokenization. This is the main entry point.
- Parameters:
is - the input source
- Throws:
org.xml.sax.SAXException - on fatal error (if configured to treat XML violations as fatal) or if the token handler threw
java.io.IOException - if the stream threw
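The argument to tokenize is a standard SAX InputSource. The sketch below shows one way to build one from raw bytes using only JDK classes. InputSourceSketch and fromBytes are hypothetical names. That the driver treats the encoding set on the InputSource as an external declaration is an assumption here, not something this page states.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

/** Hypothetical illustration; not part of the htmlparser API. */
final class InputSourceSketch {

    /**
     * Wraps raw HTML bytes in an InputSource backed by a byte stream.
     * declaredEncoding may be null when no external declaration is known.
     */
    static InputSource fromBytes(byte[] html, String systemId, String declaredEncoding) {
        InputSource is = new InputSource(new ByteArrayInputStream(html));
        is.setSystemId(systemId);
        if (declaredEncoding != null) {
            is.setEncoding(declaredEncoding);
        }
        return is;
    }

    public static void main(String[] args) {
        byte[] html = "<!DOCTYPE html><title>t</title>".getBytes(StandardCharsets.UTF_8);
        InputSource is = fromBytes(html, "http://example.invalid/doc.html", "UTF-8");
        System.out.println(is.getSystemId() + " " + is.getEncoding());
        // With a configured Driver instance (construction not shown here),
        // the call would then be: driver.tokenize(is);
    }
}
```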
-
dontSwallowBom
void dontSwallowBom()
-
runStates
private void runStates() throws org.xml.sax.SAXException, java.io.IOException
- Throws:
org.xml.sax.SAXException
java.io.IOException
-
setEncoding
public void setEncoding(Encoding encoding, Confidence confidence)
-
internalEncodingDeclaration
public boolean internalEncodingDeclaration(java.lang.String internalCharset) throws org.xml.sax.SAXException
Description copied from interface: EncodingDeclarationHandler
Indicates that the parser has found an internal encoding declaration with the charset value charset.
- Specified by:
internalEncodingDeclaration in interface EncodingDeclarationHandler
- Parameters:
internalCharset - the charset name found.
- Returns:
true if the value of charset was an encoding name for a supported ASCII-superset encoding.
- Throws:
org.xml.sax.SAXException - if something went wrong
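To make "supported ASCII-superset encoding" concrete, the following minimal sketch tests a charset name with java.nio.charset: the name must resolve to a charset available in the running JVM, and that charset must encode every ASCII code point to the identical single byte. AsciiSupersetSketch and isSupportedAsciiSuperset are hypothetical names. The library resolves names through its own Encoding class, so its answers may differ from this approximation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration; not part of the htmlparser API. */
final class AsciiSupersetSketch {

    /**
     * Returns true if the name resolves to a charset that encodes
     * U+0000..U+007F exactly as US-ASCII does.
     */
    static boolean isSupportedAsciiSuperset(String name) {
        Charset cs;
        try {
            cs = Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers null, illegal, and unsupported charset names.
            return false;
        }
        if (!cs.canEncode()) {
            return false;
        }
        StringBuilder probe = new StringBuilder(128);
        for (char c = 0; c < 128; c++) {
            probe.append(c);
        }
        byte[] encoded = probe.toString().getBytes(cs);
        byte[] ascii = probe.toString().getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(encoded, ascii);
    }

    public static void main(String[] args) {
        System.out.println(isSupportedAsciiSuperset("UTF-8"));
        System.out.println(isSupportedAsciiSuperset("UTF-16"));
        System.out.println(isSupportedAsciiSuperset("no-such-charset"));
    }
}
```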
-
becomeConfident
private void becomeConfident()
-
setHeuristics
public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
- Parameters:
heuristics - the heuristics to set
-
warnWithoutLocation
protected void warnWithoutLocation(java.lang.String message) throws org.xml.sax.SAXException
Reports a warning without line/column information.
- Parameters:
message - the message
- Throws:
org.xml.sax.SAXException
-
encodingFromExternalDeclaration
protected Encoding encodingFromExternalDeclaration(java.lang.String encoding) throws org.xml.sax.SAXException
Initializes a decoder from an external declaration.
- Throws:
org.xml.sax.SAXException
-
whineAboutEncodingAndReturnActual
protected Encoding whineAboutEncodingAndReturnActual(java.lang.String encoding, Encoding cs) throws org.xml.sax.SAXException
- Parameters:
encoding
cs
- Returns:
- Throws:
org.xml.sax.SAXException
-
notifyAboutMetaBoundary
void notifyAboutMetaBoundary()
-
setCommentPolicy
public void setCommentPolicy(XmlViolationPolicy commentPolicy)
- Parameters:
commentPolicy
- See Also:
Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
-
setContentNonXmlCharPolicy
public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
- Parameters:
contentNonXmlCharPolicy
- See Also:
Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
-
setContentSpacePolicy
public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
- Parameters:
contentSpacePolicy
- See Also:
Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
-
setErrorHandler
public void setErrorHandler(org.xml.sax.ErrorHandler eh)
- Parameters:
eh
- See Also:
Tokenizer.setErrorHandler(org.xml.sax.ErrorHandler)
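The eh parameter is a standard SAX ErrorHandler. A minimal sketch of one that collects messages rather than printing them might look like the following. CollectingErrorHandler is a hypothetical name, and rethrowing on fatalError is a design choice of this sketch, not a requirement stated by this page.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical illustration; not part of the htmlparser API. */
final class CollectingErrorHandler implements ErrorHandler {

    /** Messages in the order they were reported, prefixed by severity. */
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        messages.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add("fatal: " + e.getMessage());
        throw e; // stop processing on a fatal error
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        handler.warning(new SAXParseException("sample warning", null));
        System.out.println(handler.messages);
        // With a Driver instance in hand: driver.setErrorHandler(handler);
    }
}
```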
-
setTransitionHandler
public void setTransitionHandler(TransitionHandler transitionHandler)
-
setHtml4ModeCompatibleWithXhtml1Schemata
public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
- Parameters:
html4ModeCompatibleWithXhtml1Schemata
- See Also:
Tokenizer.setHtml4ModeCompatibleWithXhtml1Schemata(boolean)
-
setMappingLangToXmlLang
public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
- Parameters:
mappingLangToXmlLang
- See Also:
Tokenizer.setMappingLangToXmlLang(boolean)
-
setNamePolicy
public void setNamePolicy(XmlViolationPolicy namePolicy)
- Parameters:
namePolicy
- See Also:
Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
-
setXmlnsPolicy
public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
- Parameters:
xmlnsPolicy
- See Also:
Tokenizer.setXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)
-
getCharacterEncoding
public java.lang.String getCharacterEncoding() throws org.xml.sax.SAXException
Description copied from interface: EncodingDeclarationHandler
Queries the environment for the encoding in use (for error reporting).
- Specified by:
getCharacterEncoding in interface EncodingDeclarationHandler
- Returns:
the encoding in use
- Throws:
org.xml.sax.SAXException - if something went wrong
-
getDocumentLocator
public org.xml.sax.Locator getDocumentLocator()
-
-