Class ParseConfiguration

java.lang.Object
org.attoparser.config.ParseConfiguration
All Implemented Interfaces:
Serializable, Cloneable

public final class ParseConfiguration extends Object implements Serializable, Cloneable

Models a series of parsing configurations that can be applied during document parsing by MarkupParser and its variants SimpleMarkupParser and DOMMarkupParser.

Among others, the parameters that can be configured are:

  • The parsing mode: XML or HTML.
  • Whether to expect XML-well-formed code or not.
  • Whether to perform automatic tag balancing or not.
  • Whether to allow parsing of markup fragments or only entire documents.

The htmlConfiguration() and xmlConfiguration() static methods act as starting points for configuration. Once one of these pre-initialized configurations has been created, it can be fine-tuned for the user's needs.

Note that these configuration objects are mutable, so they should not be modified after they have been passed to a parser to initialize it.

Instances of this class can be cloned, so creating a variant of an already-tuned configuration is easy.
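The clone-then-tune pattern described above can be sketched as follows. To stay runnable without the library on the classpath, the sketch uses a hypothetical stand-in class, SketchConfig, which is not part of AttoParser and mirrors only two of the documented properties. With the real class, the corresponding steps would presumably be ParseConfiguration.htmlConfiguration(), then clone(), then the setters documented below.

```java
// Hypothetical stand-in for a mutable, cloneable configuration object.
// It is NOT the AttoParser class; it only mirrors two documented properties.
final class SketchConfig implements Cloneable {
    private boolean caseSensitive;
    private boolean textSplittable;

    // Plays the role of a pre-initialized starting point such as htmlConfiguration().
    static SketchConfig htmlLikeDefaults() {
        SketchConfig config = new SketchConfig();
        config.caseSensitive = false;   // documented as required for HTML
        config.textSplittable = false;  // documented default
        return config;
    }

    boolean isCaseSensitive() { return caseSensitive; }
    boolean isTextSplittable() { return textSplittable; }
    void setTextSplittable(boolean textSplittable) { this.textSplittable = textSplittable; }

    @Override
    public SketchConfig clone() {
        try {
            return (SketchConfig) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}

final class CloneSketch {
    // Tunes a clone and reports { base.isTextSplittable(), variant.isTextSplittable() }.
    static boolean[] tuneVariant() {
        SketchConfig base = SketchConfig.htmlLikeDefaults();
        SketchConfig variant = base.clone();   // independent copy of the tuned base
        variant.setTextSplittable(true);       // only the variant changes
        return new boolean[] { base.isTextSplittable(), variant.isTextSplittable() };
    }
}
```

Because the base object is never touched after cloning, it stays safe to hand to a parser while the variant is still being adjusted.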

Since:
2.0.0
  • Field Details

  • Constructor Details

    • ParseConfiguration

      private ParseConfiguration()
  • Method Details

    • htmlConfiguration

      public static ParseConfiguration htmlConfiguration()

      Return an instance of ParseConfiguration containing a valid configuration set for most HTML scenarios.

      Returns:
      a valid default configuration object for HTML parsing.
    • xmlConfiguration

      public static ParseConfiguration xmlConfiguration()

      Return an instance of ParseConfiguration containing a valid configuration set for most XML scenarios.

      Returns:
      a valid default configuration object for XML parsing.
    • getMode

      public ParseConfiguration.ParsingMode getMode()

      Return the parsing mode to be used. Can be XML or HTML.

      Parsers behave differently depending on the selected mode, because HTML has some specific rules that are not XML-compatible (such as void elements like <meta>, which may appear unclosed).

      Returns:
      the parsing mode to be used.
    • setMode

      public void setMode(ParseConfiguration.ParsingMode mode)

      Specify the parsing mode to be used. Can be XML or HTML.

      Parsers behave differently depending on the selected mode, because HTML has some specific rules that are not XML-compatible (such as void elements like <meta>, which may appear unclosed).

      Parameters:
      mode - the parsing mode to be used.
    • isCaseSensitive

      public boolean isCaseSensitive()

      Returns whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, the document root element name, matching of open elements against close elements, etc.).

      HTML requires this parameter to be false. Default for XML is true.

      Returns:
      whether validations should be case sensitive or not.
    • setCaseSensitive

      public void setCaseSensitive(boolean caseSensitive)

      Specify whether validations performed on the parsed document should be case sensitive or not (e.g. attribute names, the document root element name, matching of open elements against close elements, etc.).

      HTML requires this parameter to be false. Default for XML is true.

      Parameters:
      caseSensitive - whether validations should be case sensitive or not.
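As a minimal sketch of what case-sensitive versus case-insensitive validation means when matching an open element against its close element: the NameMatchSketch class and its sameName method are hypothetical names introduced for illustration, not the library's internal implementation.

```java
// Hypothetical helper: NOT part of AttoParser, illustration only.
final class NameMatchSketch {
    // Compares an open element name with a close element name,
    // honouring a caseSensitive flag like the one documented above.
    static boolean sameName(String openName, String closeName, boolean caseSensitive) {
        return caseSensitive
                ? openName.equals(closeName)
                : openName.equalsIgnoreCase(closeName);
    }
}
```

Under this sketch, <DIV> closed by </div> matches when the flag is false (the HTML requirement) and does not match when it is true (the XML default).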
    • isTextSplittable

      public boolean isTextSplittable()

      Returns whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.

      Default is false.

      Returns:
      whether text fragments can be split or not.
    • setTextSplittable

      public void setTextSplittable(boolean textSplittable)

      Specify whether a text fragment in markup can be split into more than one text node if it is larger than an entire buffer.

      Default is false.

      Parameters:
      textSplittable - whether text fragments can be split or not.
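The effect of this flag can be pictured with a small sketch. Everything here is an assumption made for illustration: the TextSplitSketch class, the textNodes method, and the idea that a split happens exactly at buffer-size boundaries are hypothetical and do not describe how the parser actually manages its buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: NOT the parser's real buffering logic.
final class TextSplitSketch {
    // Returns the text nodes that one text fragment would be reported as.
    static List<String> textNodes(String text, int bufferSize, boolean textSplittable) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<String> nodes = new ArrayList<>();
        if (!textSplittable || text.length() <= bufferSize) {
            nodes.add(text);  // reported as a single text node
            return nodes;
        }
        // Splittable and larger than the buffer: emit one node per buffer-sized chunk.
        for (int start = 0; start < text.length(); start += bufferSize) {
            int end = Math.min(text.length(), start + bufferSize);
            nodes.add(text.substring(start, end));
        }
        return nodes;
    }
}
```

Handlers that need the complete text of a fragment would have to concatenate consecutive nodes themselves when splitting is enabled.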
    • getElementBalancing

      public ParseConfiguration.ElementBalancing getElementBalancing()

      Returns the level of element balancing required of the document being parsed, which can enable auto-closing of elements where needed.

      Possible values are the constants defined by ParseConfiguration.ElementBalancing.

      Returns:
      the level of element balancing.
    • setElementBalancing

      public void setElementBalancing(ParseConfiguration.ElementBalancing elementBalancing)

      Specify the level of element balancing required of the document being parsed, which can enable auto-closing of elements where needed.

      Possible values are the constants defined by ParseConfiguration.ElementBalancing.

      Parameters:
      elementBalancing - the level of element balancing.
    • getPrologParseConfiguration

      public ParseConfiguration.PrologParseConfiguration getPrologParseConfiguration()

      Returns the ParseConfiguration.PrologParseConfiguration object determining the way in which prolog (XML Declaration, DOCTYPE) will be dealt with during parsing.

      Returns:
      the configuration object.
    • isNoUnmatchedCloseElementsRequired

      public boolean isNoUnmatchedCloseElementsRequired()

      Returns whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.

      Returns:
      whether unmatched close elements will be allowed (false) or not (true).
    • setNoUnmatchedCloseElementsRequired

      public void setNoUnmatchedCloseElementsRequired(boolean noUnmatchedCloseElementsRequired)

      Specify whether unmatched close elements (those not matching any equivalent open elements) are allowed or not.

      Parameters:
      noUnmatchedCloseElementsRequired - whether unmatched close elements will be allowed (false) or not (true).
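A minimal sketch of detecting an unmatched close element, assuming a simplified event encoding in which "+name" stands for an open element and "-name" for a close element. The UnmatchedCloseSketch class, the encoding, and the choice to pair a close element with the nearest open element of the same name are hypothetical; the parser's real balancing logic is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical checker: NOT part of AttoParser, illustration only.
final class UnmatchedCloseSketch {
    // events use "+name" for an open element and "-name" for a close element.
    static boolean hasUnmatchedClose(String... events) {
        Deque<String> open = new ArrayDeque<>();
        for (String event : events) {
            String name = event.substring(1);
            if (event.charAt(0) == '+') {
                open.push(name);
            } else if (!open.remove(name)) {
                // remove() found no open element with this name,
                // so this close element is unmatched.
                return true;
            }
        }
        return false;
    }
}
```

With the flag set to true, a document for which this check returns true would be rejected; with false it would be tolerated.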
    • isXmlWellFormedAttributeValuesRequired

      public boolean isXmlWellFormedAttributeValuesRequired()

      Returns whether element attributes will be required to be well-formed from the XML standpoint. This means:

      • Attributes should always have a value.
      • Attribute values should be surrounded by double-quotes.
      Returns:
      whether attributes should be XML-well-formed or not.
    • setXmlWellFormedAttributeValuesRequired

      public void setXmlWellFormedAttributeValuesRequired(boolean xmlWellFormedAttributeValuesRequired)

      Specify whether element attributes will be required to be well-formed from the XML standpoint. This means:

      • Attributes should always have a value.
      • Attribute values should be surrounded by double-quotes.
      Parameters:
      xmlWellFormedAttributeValuesRequired - whether attributes should be XML-well-formed or not.
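The two conditions listed above can be sketched as a check on a single raw attribute string. The AttributeFormSketch class and its isXmlWellFormed method are hypothetical and only encode the two rules as stated in this documentation; they are not the library's validation code.

```java
// Hypothetical checker: NOT part of AttoParser, illustration only.
final class AttributeFormSketch {
    // rawAttribute is the attribute as written in markup, e.g. id="main".
    static boolean isXmlWellFormed(String rawAttribute) {
        int equalsIndex = rawAttribute.indexOf('=');
        if (equalsIndex < 0) {
            return false;  // rule 1: the attribute has no value
        }
        String value = rawAttribute.substring(equalsIndex + 1).trim();
        // rule 2: the value must be surrounded by double quotes
        return value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
    }
}
```

HTML-style attributes such as a bare checked, an unquoted id=main, or a single-quoted id='main' fail this check, while id="main" passes.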
    • isUniqueAttributesInElementRequired

      public boolean isUniqueAttributesInElementRequired()

      Returns whether attributes should never appear duplicated in elements.

      Returns:
      whether attributes should never appear duplicated in elements.
    • setUniqueAttributesInElementRequired

      public void setUniqueAttributesInElementRequired(boolean uniqueAttributesInElementRequired)

      Specify whether attributes should never appear duplicated in elements.

      Parameters:
      uniqueAttributesInElementRequired - whether attributes should never appear duplicated in elements.
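A minimal sketch of a duplicate-attribute check. The UniqueAttributesSketch class is hypothetical, and combining the check with a caseSensitive flag is an assumption based on the note that case sensitivity applies to attribute names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical checker: NOT part of AttoParser, illustration only.
final class UniqueAttributesSketch {
    // Returns true if any attribute name appears more than once in one element.
    static boolean hasDuplicates(List<String> attributeNames, boolean caseSensitive) {
        Set<String> seen = new HashSet<>();
        for (String name : attributeNames) {
            // Assumption: in case-insensitive mode, names are compared after lower-casing.
            String key = caseSensitive ? name : name.toLowerCase(Locale.ROOT);
            if (!seen.add(key)) {
                return true;  // add() returns false when the key was already present
            }
        }
        return false;
    }
}
```

Under this sketch, an element carrying both id and ID counts as having a duplicate only when validation is case insensitive.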
    • getUniqueRootElementPresence

      public ParseConfiguration.UniqueRootElementPresence getUniqueRootElementPresence()

      This value determines whether it will be required that the document has a unique root element.

      If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, a document with more than one element at the root level will never be considered valid. In addition, if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, the root name declared in the DOCTYPE clause will be checked against the name of the document's root element.

      If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:

      If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element(s).

      Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.

      Returns:
      the configuration value for validating the presence of a unique root element.
    • setUniqueRootElementPresence

      public void setUniqueRootElementPresence(ParseConfiguration.UniqueRootElementPresence uniqueRootElementPresence)

      This value determines whether it will be required that the document has a unique root element.

      If set to ParseConfiguration.UniqueRootElementPresence.REQUIRED_ALWAYS, a document with more than one element at the root level will never be considered valid. In addition, if ParseConfiguration.PrologParseConfiguration.isValidateProlog() is true and there is a DOCTYPE clause, the root name declared in the DOCTYPE clause will be checked against the name of the document's root element.

      If set to ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE, then:

      If set to ParseConfiguration.UniqueRootElementPresence.NOT_VALIDATED, then nothing will be checked regarding the name of the root element(s).

      Default value is ParseConfiguration.UniqueRootElementPresence.DEPENDS_ON_PROLOG_DOCTYPE.

      Parameters:
      uniqueRootElementPresence - the configuration value for validating the presence of a unique root element.
    • clone

      Overrides:
      clone in class Object
      Throws:
      CloneNotSupportedException
    • validateNotNull

      private static void validateNotNull(Object obj, String message)