Class ParseConfiguration
java.lang.Object
  org.attoparser.config.ParseConfiguration
All Implemented Interfaces: java.io.Serializable, java.lang.Cloneable
public final class ParseConfiguration extends java.lang.Object implements java.io.Serializable, java.lang.Cloneable
Models a series of parsing configurations that can be applied during document parsing by MarkupParser and its variants SimpleMarkupParser and DOMMarkupParser.
Among others, the parameters that can be configured are:
- The parsing mode: XML or HTML.
- Whether to expect XML-well-formed code or not.
- Whether to perform automatic tag balancing or not.
- Whether we will allow parsing of markup fragments or just entire documents.
The htmlConfiguration() and xmlConfiguration() static methods act as starting points for configuration. Once one of these pre-initialized configurations has been created, it can be fine-tuned for the user's needs.
Note that these configuration objects are mutable, so they should not be modified once they have been passed to a parser in order to initialize it.
Instances of this class can be cloned, so creating a variant of an already-tuned configuration is easy.
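The clone-then-tune pattern described above can be sketched as follows. To keep the snippet compilable without the library, it uses a hypothetical stand-in class, SketchConfig, which is not part of AttoParser and only mirrors the documented shape (mutable setters plus a clone() that declares CloneNotSupportedException). With the real class, the same steps would presumably start from ParseConfiguration.htmlConfiguration() or ParseConfiguration.xmlConfiguration().

```java
public class CloneThenTuneSketch {

    // Hypothetical stand-in for ParseConfiguration: mutable and cloneable.
    static final class SketchConfig implements Cloneable {
        private boolean textSplittable = false;

        boolean isTextSplittable() {
            return textSplittable;
        }

        void setTextSplittable(boolean textSplittable) {
            this.textSplittable = textSplittable;
        }

        @Override
        public SketchConfig clone() throws CloneNotSupportedException {
            return (SketchConfig) super.clone();
        }
    }

    // Tunes only the clone and returns {base flag, variant flag}.
    public static boolean[] tuneVariant() {
        SketchConfig base = new SketchConfig();
        try {
            SketchConfig variant = base.clone();
            // The base object may already be in use by a parser, so only the copy is modified.
            variant.setTextSplittable(true);
            return new boolean[] { base.isTextSplittable(), variant.isTextSplittable() };
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] flags = tuneVariant();
        System.out.println("base=" + flags[0] + ", variant=" + flags[1]);
    }
}
```

Cloning before tuning leaves the already-shared configuration untouched, which is the reason the class description warns against modifying an instance after it has been handed to a parser.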
- Since: 2.0.0
- See Also: Serialized Form
-
Nested Class Summary
- static class ParseConfiguration.ElementBalancing: Enumeration representing the possible actions to be taken with regard to element balancing.
- static class ParseConfiguration.ParsingMode: Enumeration used for determining the parsing mode, which will affect the parser's behaviour.
- static class ParseConfiguration.PrologParseConfiguration: Class encapsulating the configuration parameters used for parsing and validating the "prolog" section of a markup document.
- static class ParseConfiguration.PrologPresence: Enumeration used for determining whether an element in the document prolog (DOCTYPE, XML Declaration) or the prolog itself should be allowed, required or even forbidden.
- static class ParseConfiguration.UniqueRootElementPresence: Enumeration used for determining the behaviour the parser should have with respect to the presence and number of root elements in the parsed document.
-
Field Summary
- private boolean caseSensitive
- private static ParseConfiguration DEFAULT_HTML_PARSE_CONFIGURATION
- private static ParseConfiguration DEFAULT_XML_PARSE_CONFIGURATION
- private ParseConfiguration.ElementBalancing elementBalancing
- private ParseConfiguration.ParsingMode mode
- private boolean noUnmatchedCloseElementsRequired
- private ParseConfiguration.PrologParseConfiguration prologParseConfiguration
- private static long serialVersionUID
- private boolean textSplittable
- private boolean uniqueAttributesInElementRequired
- private ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence
- private boolean xmlWellFormedAttributeValuesRequired
-
Constructor Summary
- private ParseConfiguration()
-
Method Summary
- ParseConfiguration clone()
- ParseConfiguration.ElementBalancing getElementBalancing(): Returns the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed.
- ParseConfiguration.ParsingMode getMode(): Return the parsing mode to be used.
- ParseConfiguration.PrologParseConfiguration getPrologParseConfiguration(): Returns the ParseConfiguration.PrologParseConfiguration object determining the way in which the prolog (XML Declaration, DOCTYPE) will be dealt with during parsing.
- ParseConfiguration.UniqueRootElementPresence getUniqueRootElementPresence(): This value determines whether it will be required that the document has a unique root element.
- static ParseConfiguration htmlConfiguration(): Return an instance of ParseConfiguration containing a valid configuration set for most HTML scenarios.
- boolean isCaseSensitive(): Returns whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
- boolean isNoUnmatchedCloseElementsRequired(): Returns whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
- boolean isTextSplittable(): Returns whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.
- boolean isUniqueAttributesInElementRequired(): Returns whether attributes should never appear duplicated in elements.
- boolean isXmlWellFormedAttributeValuesRequired(): Returns whether element attributes will be required to be well-formed from the XML standpoint.
- void setCaseSensitive(boolean caseSensitive): Specify whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
- void setElementBalancing(ParseConfiguration.ElementBalancing elementBalancing): Specify the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed.
- void setMode(ParseConfiguration.ParsingMode mode): Specify the parsing mode to be used.
- void setNoUnmatchedCloseElementsRequired(boolean noUnmatchedCloseElementsRequired): Specify whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
- void setTextSplittable(boolean textSplittable): Specify whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.
- void setUniqueAttributesInElementRequired(boolean uniqueAttributesInElementRequired): Specify whether attributes should never appear duplicated in elements.
- void setUniqueRootElementPresence(ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence): This value determines whether it will be required that the document has a unique root element.
- void setXmlWellFormedAttributeValuesRequired(boolean xmlWellFormedAttributeValuesRequired): Specify whether element attributes will be required to be well-formed from the XML standpoint.
- private static void validateNotNull(java.lang.Object obj, java.lang.String message)
- static ParseConfiguration xmlConfiguration(): Return an instance of ParseConfiguration containing a valid configuration set for most XML scenarios.
-
-
-
Field Detail
-
serialVersionUID
private static final long serialVersionUID
- See Also: Constant Field Values
-
DEFAULT_HTML_PARSE_CONFIGURATION
private static final ParseConfiguration DEFAULT_HTML_PARSE_CONFIGURATION
-
DEFAULT_XML_PARSE_CONFIGURATION
private static final ParseConfiguration DEFAULT_XML_PARSE_CONFIGURATION
-
mode
private ParseConfiguration.ParsingMode mode
-
caseSensitive
private boolean caseSensitive
-
textSplittable
private boolean textSplittable
-
elementBalancing
private ParseConfiguration.ElementBalancing elementBalancing
-
noUnmatchedCloseElementsRequired
private boolean noUnmatchedCloseElementsRequired
-
xmlWellFormedAttributeValuesRequired
private boolean xmlWellFormedAttributeValuesRequired
-
uniqueAttributesInElementRequired
private boolean uniqueAttributesInElementRequired
-
prologParseConfiguration
private ParseConfiguration.PrologParseConfiguration prologParseConfiguration
-
uniqueRootElementPresence
private ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence
-
-
Method Detail
-
htmlConfiguration
public static ParseConfiguration htmlConfiguration()
Return an instance of ParseConfiguration containing a valid configuration set for most HTML scenarios.
- Mode: ParseConfiguration.ParsingMode.HTML
- Text splittable: false
- Element balancing: ParseConfiguration.ElementBalancing.AUTO_CLOSE
- No unmatched close elements required: false
- Unique attributes in elements required: false
- Xml-well-formed attribute values required: false
- Unique root element presence: ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED
- Validate Prolog: false
- Returns: a valid default configuration object for HTML parsing.
-
xmlConfiguration
public static ParseConfiguration xmlConfiguration()
Return an instance of ParseConfiguration containing a valid configuration set for most XML scenarios.
- Mode: ParseConfiguration.ParsingMode.XML
- Text splittable: false
- Element balancing: ParseConfiguration.ElementBalancing.REQUIRE_BALANCED
- No unmatched close elements required: true
- Unique attributes in elements required: true
- Xml-well-formed attribute values required: true
- Unique root element presence: ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE
- Validate Prolog: true
- Prolog presence: ParseConfiguration.PrologPresence.ALLOWED
- XML Declaration presence: ParseConfiguration.PrologPresence.ALLOWED
- DOCTYPE presence: ParseConfiguration.PrologPresence.ALLOWED
- Require DOCTYPE keyword to be uppercase: true
- Returns: a valid default configuration object for XML parsing.
-
getMode
public ParseConfiguration.ParsingMode getMode()
Return the parsing mode to be used. Can be XML or HTML.
Depending on the selected mode, parsers will behave differently, since HTML has some specific rules which are not XML-compatible (like void elements, which might appear unclosed, such as <meta>).
- Returns: the parsing mode to be used.
-
setMode
public void setMode(ParseConfiguration.ParsingMode mode)
Specify the parsing mode to be used. Can be XML or HTML.
Depending on the selected mode, parsers will behave differently, since HTML has some specific rules which are not XML-compatible (like void elements, which might appear unclosed, such as <meta>).
- Parameters: mode - the parsing mode to be used.
-
isCaseSensitive
public boolean isCaseSensitive()
Returns whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
HTML requires this parameter to be false. Default for XML is true.
- Returns: whether validations should be case sensitive or not.
-
setCaseSensitive
public void setCaseSensitive(boolean caseSensitive)
Specify whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, document root element name, element open vs close elements, etc.)
HTML requires this parameter to be false. Default for XML is true.
- Parameters: caseSensitive - whether validations should be case sensitive or not.
-
isTextSplittable
public boolean isTextSplittable()
Returns whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.
Default is false.
- Returns: whether text fragments can be split or not.
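As a rough illustration of what this flag permits, the sketch below splits a text run into buffer-sized chunks when splitting is allowed and keeps it whole otherwise. It is a simplified model, not the parser's implementation: the class TextSplitSketch, the method textNodes and the explicit bufferSize parameter are all hypothetical, and how the real parser chooses split points is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class TextSplitSketch {

    // Models how one text run could be reported as one or several text nodes.
    public static List<String> textNodes(String text, int bufferSize, boolean textSplittable) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<String> nodes = new ArrayList<>();
        if (!textSplittable || text.length() <= bufferSize) {
            // Not splittable, or the text fits in one buffer: a single node.
            nodes.add(text);
            return nodes;
        }
        // Splittable and larger than the buffer: one node per buffer-sized chunk.
        for (int start = 0; start < text.length(); start += bufferSize) {
            int end = Math.min(text.length(), start + bufferSize);
            nodes.add(text.substring(start, end));
        }
        return nodes;
    }

    public static void main(String[] args) {
        System.out.println(textNodes("abcdefgh", 3, true));
        System.out.println(textNodes("abcdefgh", 3, false));
    }
}
```

Handlers that assume one text event per text run would need the flag left at its default of false.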
-
setTextSplittable
public void setTextSplittable(boolean textSplittable)
Specify whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.
Default is false.
- Parameters: textSplittable - whether text fragments can be split or not.
-
getElementBalancing
public ParseConfiguration.ElementBalancing getElementBalancing()
Returns the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed.
Possible values are:
- ParseConfiguration.ElementBalancing.NO_BALANCING: Do not perform element balancing checks at all. Events will be reported as they appear. There is no guarantee that a DOM tree can be built from the fired events, though.
- ParseConfiguration.ElementBalancing.REQUIRE_BALANCED: Require that elements are already correctly balanced in markup, throwing an exception if not. Note that when in HTML mode, this does not require the specification of optional tags such as <tbody>. Also note that this will automatically consider the setNoUnmatchedCloseElementsRequired(boolean) flag to be set to true.
- ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE: Auto open and close elements, which includes both those elements that, according to the HTML spec (when in HTML mode), have optional start or end tags (see http://www.w3.org/html/wg/drafts/html/master/syntax.html#optional-tags) and those that simply are unclosed at the moment a parent element needs to be closed (so their closing is forced). As an example of optional tags, the HTML5 spec establishes that <html>, <body> and <tbody> are optional, and that an <li> will close any currently open <li> elements. This is not really ill-formed code, but something allowed by the spec. All of these will be reported as auto-* events by the parser.
- ParseConfiguration.ElementBalancing.AUTO_CLOSE: Equivalent to ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE but not performing any auto-open operations, so that processing of HTML fragments is possible (no <html> or <body> elements are automatically added).
- Returns: the level of element balancing.
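The forced-close part of AUTO_CLOSE (children still open when a parent closes) can be modelled with a stack. The sketch below is only an approximation written for this page: the class AutoCloseSketch, the event strings and the decision to auto-close leftovers at end of input are assumptions, and it does not implement the HTML optional-tag rules or the library's actual event API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class AutoCloseSketch {

    // Takes tags such as "<div>" or "</div>" and returns the events a balancing pass could report.
    public static List<String> balance(List<String> tags) {
        Deque<String> open = new ArrayDeque<>();
        List<String> events = new ArrayList<>();
        for (String tag : tags) {
            if (tag.startsWith("</")) {
                String name = tag.substring(2, tag.length() - 1);
                if (!open.contains(name)) {
                    // No matching open element: this sketch reports it and moves on.
                    events.add("unmatched-close:" + name);
                    continue;
                }
                // Force-close every element opened after the matching one.
                while (!open.peek().equals(name)) {
                    events.add("auto-close:" + open.pop());
                }
                open.pop();
                events.add("close:" + name);
            } else {
                String name = tag.substring(1, tag.length() - 1);
                open.push(name);
                events.add("open:" + name);
            }
        }
        // Assumption of this sketch: elements still open at end of input are closed automatically.
        while (!open.isEmpty()) {
            events.add("auto-close:" + open.pop());
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [open:div, open:p, open:b, auto-close:b, auto-close:p, close:div]
        System.out.println(balance(Arrays.asList("<div>", "<p>", "<b>", "</div>")));
    }
}
```

Under REQUIRE_BALANCED the same input would instead be rejected, since <p> and <b> are never closed explicitly.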
-
setElementBalancing
public void setElementBalancing(ParseConfiguration.ElementBalancing elementBalancing)
Specify the level of element balancing required for the document being parsed, enabling auto-closing of elements if needed.
Possible values are:
- ParseConfiguration.ElementBalancing.NO_BALANCING: Do not perform element balancing checks at all. Events will be reported as they appear. There is no guarantee that a DOM tree can be built from the fired events, though.
- ParseConfiguration.ElementBalancing.REQUIRE_BALANCED: Require that elements are already correctly balanced in markup, throwing an exception if not. Note that when in HTML mode, this does not require the specification of optional tags such as <tbody>. Also note that this will automatically consider the setNoUnmatchedCloseElementsRequired(boolean) flag to be set to true.
- ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE: Auto open and close elements, which includes both those elements that, according to the HTML spec (when in HTML mode), have optional start or end tags (see http://www.w3.org/html/wg/drafts/html/master/syntax.html#optional-tags) and those that simply are unclosed at the moment a parent element needs to be closed (so their closing is forced). As an example of optional tags, the HTML5 spec establishes that <html>, <body> and <tbody> are optional, and that an <li> will close any currently open <li> elements. This is not really ill-formed code, but something allowed by the spec. All of these will be reported as auto-* events by the parser.
- ParseConfiguration.ElementBalancing.AUTO_CLOSE: Equivalent to ParseConfiguration.ElementBalancing.AUTO_OPEN_CLOSE but not performing any auto-open operations, so that processing of HTML fragments is possible (no <html> or <body> elements are automatically added).
- Parameters: elementBalancing - the level of element balancing.
-
getPrologParseConfiguration
public ParseConfiguration.PrologParseConfiguration getPrologParseConfiguration()
Returns the ParseConfiguration.PrologParseConfiguration object determining the way in which the prolog (XML Declaration, DOCTYPE) will be dealt with during parsing.
- Returns: the configuration object.
-
isNoUnmatchedCloseElementsRequired
public boolean isNoUnmatchedCloseElementsRequired()
Returns whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
- Returns: whether unmatched close elements will be allowed (false) or not (true).
-
setNoUnmatchedCloseElementsRequired
public void setNoUnmatchedCloseElementsRequired(boolean noUnmatchedCloseElementsRequired)
Specify whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.
- Parameters: noUnmatchedCloseElementsRequired - whether unmatched close elements will be allowed (false) or not (true).
-
isXmlWellFormedAttributeValuesRequired
public boolean isXmlWellFormedAttributeValuesRequired()
Returns whether element attributes will be required to be well-formed from the XML standpoint. This means:
- Attributes should always have a value.
- Attribute values should be surrounded by double-quotes.
- Returns: whether attributes should be XML-well-formed or not.
-
setXmlWellFormedAttributeValuesRequired
public void setXmlWellFormedAttributeValuesRequired(boolean xmlWellFormedAttributeValuesRequired)
Specify whether element attributes will be required to be well-formed from the XML standpoint. This means:
- Attributes should always have a value.
- Attribute values should be surrounded by double-quotes.
- Parameters: xmlWellFormedAttributeValuesRequired - whether attributes should be XML-well-formed or not.
-
isUniqueAttributesInElementRequired
public boolean isUniqueAttributesInElementRequired()
Returns whether attributes should never appear duplicated in elements.
- Returns: whether attributes should never appear duplicated in elements.
-
setUniqueAttributesInElementRequired
public void setUniqueAttributesInElementRequired(boolean uniqueAttributesInElementRequired)
Specify whether attributes should never appear duplicated in elements.
- Parameters: uniqueAttributesInElementRequired - whether attributes should never appear duplicated in elements.
-
getUniqueRootElementPresence
public ParseConfiguration.UniqueRootElementPresence getUniqueRootElementPresence()
This value determines whether it will be required that the document has a unique root element.
If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, then a document with more than one element at the root level will never be considered valid. In addition, if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, it will be checked that the root name established at the DOCTYPE clause is the same as the name of the document's root element.
If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is false, multiple document root elements will be allowed.
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true:
  - If there is a DOCTYPE clause, a unique root element will be required, and its name will be checked against the name specified at the DOCTYPE clause.
  - If there is no DOCTYPE clause (even if it is forbidden), multiple document root elements will be allowed.
If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element/s.
Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.
- Returns: the configuration value for validating the presence of a unique root element.
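The rules above reduce to a small decision table, sketched here as two pure functions. The enum Presence is a hypothetical stand-in that only mirrors the constant names of ParseConfiguration.UniqueRootElementPresence, and the boolean parameters stand for the result of isValidateProlog() and for whether the document contains a DOCTYPE clause. This is a reading of the documentation, not the library's code.

```java
public class UniqueRootSketch {

    // Hypothetical stand-in mirroring ParseConfiguration.UniqueRootElementPresence.
    public enum Presence { REQUIRED_ALWAYS, DEPENDS_ON_PROLOG_DOCTYPE, NOT_VALIDATED }

    // Whether the document must have exactly one root element.
    public static boolean uniqueRootRequired(Presence presence, boolean validateProlog, boolean hasDoctype) {
        switch (presence) {
            case REQUIRED_ALWAYS:
                return true;
            case DEPENDS_ON_PROLOG_DOCTYPE:
                // Only a validated prolog that contains a DOCTYPE triggers the requirement.
                return validateProlog && hasDoctype;
            default:
                return false;
        }
    }

    // Whether the root element name is compared with the name given in the DOCTYPE clause.
    public static boolean rootNameChecked(Presence presence, boolean validateProlog, boolean hasDoctype) {
        return presence != Presence.NOT_VALIDATED && validateProlog && hasDoctype;
    }

    public static void main(String[] args) {
        for (Presence presence : Presence.values()) {
            System.out.println(presence
                    + " unique=" + uniqueRootRequired(presence, true, false)
                    + " nameChecked=" + rootNameChecked(presence, true, false));
        }
    }
}
```

Note that with DEPENDS_ON_PROLOG_DOCTYPE a document without a DOCTYPE is allowed several root elements even when the prolog is validated, which is what makes this setting workable for markup fragments.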
-
setUniqueRootElementPresence
public void setUniqueRootElementPresence(ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence)
This value determines whether it will be required that the document has a unique root element.
If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, then a document with more than one element at the root level will never be considered valid. In addition, if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, it will be checked that the root name established at the DOCTYPE clause is the same as the name of the document's root element.
If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is false, multiple document root elements will be allowed.
- If ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true:
  - If there is a DOCTYPE clause, a unique root element will be required, and its name will be checked against the name specified at the DOCTYPE clause.
  - If there is no DOCTYPE clause (even if it is forbidden), multiple document root elements will be allowed.
If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element/s.
Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.
- Parameters: uniqueRootElementPresence - the configuration value for validating the presence of a unique root element.
-
clone
public ParseConfiguration clone() throws java.lang.CloneNotSupportedException
- Overrides: clone in class java.lang.Object
- Throws: java.lang.CloneNotSupportedException
-
validateNotNull
private static void validateNotNull(java.lang.Object obj, java.lang.String message)
-
-