Class CleanerProperties

java.lang.Object
org.htmlcleaner.CleanerProperties
All Implemented Interfaces:
HtmlModificationListener

public class CleanerProperties extends Object implements HtmlModificationListener
Properties defining the cleaner's behaviour.
  • Field Details

    • DEFAULT_CHARSET

      public static final String DEFAULT_CHARSET
    • BOOL_ATT_SELF

      public static final String BOOL_ATT_SELF
    • BOOL_ATT_EMPTY

      public static final String BOOL_ATT_EMPTY
    • BOOL_ATT_TRUE

      public static final String BOOL_ATT_TRUE
    • tagInfoProvider

      private ITagInfoProvider tagInfoProvider
    • advancedXmlEscape

      private boolean advancedXmlEscape
      If this parameter is set to true, an ampersand sign (&) that precedes a valid XML character sequence (&XXX;) will not be escaped as &amp;XXX;
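As a rough illustration of the rule above, the following sketch escapes only those ampersands that do not already begin an entity-like sequence. The class name `AmpersandEscapeSketch`, the method `escape`, and the regular expression are assumptions made for this example. It is not HtmlCleaner's implementation, and the pattern only approximates what counts as a valid sequence.

```java
import java.util.regex.Pattern;

final class AmpersandEscapeSketch {
    // Hypothetical pattern: an '&' that is NOT followed by "name;", "#123;" or "#x1F;".
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // With the flag on, entity-like sequences are left alone;
    // with the flag off, every ampersand is escaped.
    static String escape(String text, boolean advancedXmlEscape) {
        if (advancedXmlEscape) {
            return BARE_AMPERSAND.matcher(text).replaceAll("&amp;");
        }
        return text.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escape("AT&T &lt; 5", true));   // AT&amp;T &lt; 5
        System.out.println(escape("AT&T &lt; 5", false));  // AT&amp;T &amp;lt; 5
    }
}
```

With the flag off, the existing `&lt;` is escaped a second time, which is the effect the flag is meant to avoid.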
    • useCdataFor

      private String useCdataFor
    • useCdataForList

      private List<String> useCdataForList
    • translateSpecialEntities

      private boolean translateSpecialEntities
    • recognizeUnicodeChars

      private boolean recognizeUnicodeChars
    • omitUnknownTags

      private boolean omitUnknownTags
    • treatUnknownTagsAsContent

      private boolean treatUnknownTagsAsContent
    • omitDeprecatedTags

      private boolean omitDeprecatedTags
    • omitComments

      private boolean omitComments
    • treatDeprecatedTagsAsContent

      private boolean treatDeprecatedTagsAsContent
    • omitXmlDeclaration

      private OptionalOutput omitXmlDeclaration
    • omitDoctypeDeclaration

      private OptionalOutput omitDoctypeDeclaration
    • omitHtmlEnvelope

      private OptionalOutput omitHtmlEnvelope
    • useEmptyElementTags

      private boolean useEmptyElementTags
    • allowMultiWordAttributes

      private boolean allowMultiWordAttributes
    • booleanAttributeValues

      private String booleanAttributeValues
    • ignoreQuestAndExclam

      private boolean ignoreQuestAndExclam
    • allowHtmlInsideAttributes

      private boolean allowHtmlInsideAttributes
    • namespacesAware

      private boolean namespacesAware
    • transSpecialEntitiesToNCR

      private boolean transSpecialEntitiesToNCR
    • omitCdataOutsideScriptAndStyle

      private boolean omitCdataOutsideScriptAndStyle
    • deserializeEntities

      private boolean deserializeEntities
    • trimAttributeValues

      private boolean trimAttributeValues
    • htmlVersion

      private int htmlVersion
    • allowInvalidAttributeNames

      private boolean allowInvalidAttributeNames
    • invalidAttributeNamePrefix

      private String invalidAttributeNamePrefix
    • maxDepth

      private int maxDepth
      Maximum recursion depth; the limit is arbitrary (reset() sets it to 1000).
    • addNewlineToHeadAndBody

      private boolean addNewlineToHeadAndBody
      The cleaner cannot keep track of whitespace at that level, because two lists are built: one for the head and one for the body. Whitespace that falls outside the head and body is therefore not preserved; this option creates at least a newline break. Truly preserving the whitespace would take more work than is wanted at this point.
    • keepWhitespaceAndCommentsInHead

      private boolean keepWhitespaceAndCommentsInHead
      Tries to keep all whitespace and comments that were originally inside the head.
    • hyphenReplacementInComment

      private String hyphenReplacementInComment
    • pruneTags

      private String pruneTags
    • allowTags

      private String allowTags
    • cleanerTransformations

      private CleanerTransformations cleanerTransformations
    • htmlModificationListeners

      private List<HtmlModificationListener> htmlModificationListeners
    • pruneTagSet

      private Set<ITagNodeCondition> pruneTagSet
      blacklist of tags
    • allowTagSet

      private Set<ITagNodeCondition> allowTagSet
      The list of allowed tags (whitelist approach, as opposed to the blacklist approach of pruneTags).
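The difference between the two sets can be sketched as a single keep-or-drop decision. This is a simplified illustration, not the library's logic: it uses plain tag names instead of `ITagNodeCondition` objects, and it assumes that an empty allow set means "no whitelist" and that the prune set takes precedence. The names `TagFilterSketch`, `isKept`, and `setOf` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TagFilterSketch {
    // Small helper so the example does not depend on a particular Java version.
    static Set<String> setOf(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Assumption: a tag on the blacklist is always dropped; if a whitelist
    // exists (non-empty), only tags on it are kept.
    static boolean isKept(String tagName, Set<String> pruneTags, Set<String> allowTags) {
        if (pruneTags.contains(tagName)) {
            return false;
        }
        return allowTags.isEmpty() || allowTags.contains(tagName);
    }

    public static void main(String[] args) {
        System.out.println(isKept("p", setOf("script"), setOf()));      // true
        System.out.println(isKept("script", setOf("script"), setOf())); // false
        System.out.println(isKept("p", setOf(), setOf("div")));         // false
    }
}
```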
    • charset

      private String charset
    • transResCharsToNCR

      private boolean transResCharsToNCR
  • Constructor Details

    • CleanerProperties

      public CleanerProperties()
    • CleanerProperties

      public CleanerProperties(ITagInfoProvider tagInfoProvider)
      Parameters:
      tagInfoProvider -
  • Method Details

    • getMaxDepth

      public int getMaxDepth()
    • setMaxDepth

      public void setMaxDepth(int maxDepth)
    • setTagInfoProvider

      void setTagInfoProvider(ITagInfoProvider tagInfoProvider)
      Parameters:
      tagInfoProvider - the tagInfoProvider to set
    • getTagInfoProvider

      public ITagInfoProvider getTagInfoProvider()
    • isAdvancedXmlEscape

      public boolean isAdvancedXmlEscape()
    • setAdvancedXmlEscape

      public void setAdvancedXmlEscape(boolean advancedXmlEscape)
    • isTransResCharsToNCR

      public boolean isTransResCharsToNCR()
    • setTransResCharsToNCR

      public void setTransResCharsToNCR(boolean transResCharsToNCR)
    • isUseCdataForScriptAndStyle

      public boolean isUseCdataForScriptAndStyle()
    • setUseCdataForScriptAndStyle

      public void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle)
    • setUseCdataFor

      public void setUseCdataFor(String useCdataFor)
    • getUseCdataFor

      public String getUseCdataFor()
    • isUseCdataFor

      public boolean isUseCdataFor(String useCdataFor)
    • isTranslateSpecialEntities

      public boolean isTranslateSpecialEntities()
    • setTranslateSpecialEntities

      public void setTranslateSpecialEntities(boolean translateSpecialEntities)
      TODO : use OptionalOutput
      Parameters:
      translateSpecialEntities -
    • isRecognizeUnicodeChars

      public boolean isRecognizeUnicodeChars()
    • setRecognizeUnicodeChars

      public void setRecognizeUnicodeChars(boolean recognizeUnicodeChars)
    • isOmitUnknownTags

      public boolean isOmitUnknownTags()
    • setOmitUnknownTags

      public void setOmitUnknownTags(boolean omitUnknownTags)
    • isTreatUnknownTagsAsContent

      public boolean isTreatUnknownTagsAsContent()
    • setTreatUnknownTagsAsContent

      public void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent)
    • isOmitDeprecatedTags

      public boolean isOmitDeprecatedTags()
    • setOmitDeprecatedTags

      public void setOmitDeprecatedTags(boolean omitDeprecatedTags)
    • isTreatDeprecatedTagsAsContent

      public boolean isTreatDeprecatedTagsAsContent()
    • setTreatDeprecatedTagsAsContent

      public void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent)
    • isOmitComments

      public boolean isOmitComments()
    • setOmitComments

      public void setOmitComments(boolean omitComments)
    • isOmitXmlDeclaration

      public boolean isOmitXmlDeclaration()
    • setOmitXmlDeclaration

      public void setOmitXmlDeclaration(boolean omitXmlDeclaration)
    • isOmitDoctypeDeclaration

      public boolean isOmitDoctypeDeclaration()
      Returns:
      true if the doctype declaration is omitted; also returns true when the HTML envelope is omitted
    • setOmitDoctypeDeclaration

      public void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration)
    • isOmitHtmlEnvelope

      public boolean isOmitHtmlEnvelope()
    • setOmitHtmlEnvelope

      public void setOmitHtmlEnvelope(boolean omitHtmlEnvelope)
    • isUseEmptyElementTags

      public boolean isUseEmptyElementTags()
    • setUseEmptyElementTags

      public void setUseEmptyElementTags(boolean useEmptyElementTags)
    • isAllowMultiWordAttributes

      public boolean isAllowMultiWordAttributes()
    • setAllowMultiWordAttributes

      public void setAllowMultiWordAttributes(boolean allowMultiWordAttributes)
    • isAllowHtmlInsideAttributes

      public boolean isAllowHtmlInsideAttributes()
    • setAllowHtmlInsideAttributes

      public void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes)
    • isIgnoreQuestAndExclam

      public boolean isIgnoreQuestAndExclam()
    • setIgnoreQuestAndExclam

      public void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam)
    • isNamespacesAware

      public boolean isNamespacesAware()
    • setNamespacesAware

      public void setNamespacesAware(boolean namespacesAware)
    • isAddNewlineToHeadAndBody

      public boolean isAddNewlineToHeadAndBody()
    • setAddNewlineToHeadAndBody

      public void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody)
    • isKeepWhitespaceAndCommentsInHead

      public boolean isKeepWhitespaceAndCommentsInHead()
    • setKeepWhitespaceAndCommentsInHead

      public void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace)
    • getHyphenReplacementInComment

      public String getHyphenReplacementInComment()
    • setHyphenReplacementInComment

      public void setHyphenReplacementInComment(String hyphenReplacementInComment)
    • getPruneTags

      public String getPruneTags()
    • isOmitCdataOutsideScriptAndStyle

      public boolean isOmitCdataOutsideScriptAndStyle()
    • setOmitCdataOutsideScriptAndStyle

      public void setOmitCdataOutsideScriptAndStyle(boolean value)
    • isDeserializeEntities

      public boolean isDeserializeEntities()
    • setDeserializeEntities

      public void setDeserializeEntities(boolean deserializeEntities)
    • setHtmlVersion

      public void setHtmlVersion(int version)
      Sets the HTML version according to the parameter. It also sets the tag provider to the appropriate version.
      Parameters:
      version - 4 for HTML4 or 5 for HTML5
    • getHtmlVersion

      public int getHtmlVersion()
      Returns the HTML version.
      Returns:
      the HTML version
    • isTrimAttributeValues

      public boolean isTrimAttributeValues()
    • setTrimAttributeValues

      public void setTrimAttributeValues(boolean trimAttributeValues)
    • setPruneTags

      public void setPruneTags(String pruneTags)
      Resets the prune tag set and adds tag name conditions to it. All the tags listed in the pruneTags parameter are added.
      Parameters:
      pruneTags -
    • addPruneTagNodeCondition

      public void addPruneTagNodeCondition(ITagNodeCondition condition)
      Adds the condition to the existing prune tag set.
      Parameters:
      condition -
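The contrast between the two methods above (one resets the set, the other appends to it) can be sketched as follows. This is a minimal illustration under stated assumptions: tag names are stored as plain strings rather than `ITagNodeCondition` objects, and the input is assumed to be comma-separated. `PruneTagsSketch` and `addPruneTag` are hypothetical names, not part of the library.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class PruneTagsSketch {
    private final Set<String> pruneTagSet = new LinkedHashSet<>();

    // Resets the set, then adds every name found in the list.
    // Assumption: names are separated by commas; blanks are ignored.
    void setPruneTags(String pruneTags) {
        pruneTagSet.clear();
        if (pruneTags == null) {
            return;
        }
        for (String name : pruneTags.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                pruneTagSet.add(trimmed);
            }
        }
    }

    // Adds to the existing set without resetting it.
    void addPruneTag(String tagName) {
        pruneTagSet.add(tagName);
    }

    Set<String> getPruneTagSet() {
        return pruneTagSet;
    }

    public static void main(String[] args) {
        PruneTagsSketch sketch = new PruneTagsSketch();
        sketch.setPruneTags("script, style");
        sketch.addPruneTag("iframe");
        System.out.println(sketch.getPruneTagSet()); // [script, style, iframe]
        sketch.setPruneTags("div");
        System.out.println(sketch.getPruneTagSet()); // [div]
    }
}
```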
    • getPruneTagSet

      public Set<ITagNodeCondition> getPruneTagSet()
    • getAllowTags

      public String getAllowTags()
    • setAllowTags

      public void setAllowTags(String allowTags)
    • setAllowTagSet

      private void setAllowTagSet(String allowTags)
    • isTransSpecialEntitiesToNCR

      public boolean isTransSpecialEntitiesToNCR()
    • setTransSpecialEntitiesToNCR

      public void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR)
    • addTagNameConditions

      private void addTagNameConditions(Set<ITagNodeCondition> tagSet, String tagsNameStr)
      Parameters:
      tagSet -
      tagsNameStr -
    • getAllowTagSet

      public Set<ITagNodeCondition> getAllowTagSet()
    • setCharset

      public void setCharset(String charset)
      Parameters:
      charset - the charset to set
    • getCharset

      public String getCharset()
      Returns:
      the charset
    • getBooleanAttributeValues

      public String getBooleanAttributeValues()
    • setBooleanAttributeValues

      public void setBooleanAttributeValues(String booleanAttributeValues)
    • reset

      public void reset()
      advancedXmlEscape = true;
      setUseCdataFor("script,style");
      translateSpecialEntities = true;
      recognizeUnicodeChars = true;
      omitUnknownTags = false;
      treatUnknownTagsAsContent = false;
      omitDeprecatedTags = false;
      treatDeprecatedTagsAsContent = false;
      omitComments = false;
      omitXmlDeclaration = OptionalOutput.alwaysOutput;
      omitDoctypeDeclaration = OptionalOutput.alwaysOutput;
      omitHtmlEnvelope = OptionalOutput.alwaysOutput;
      useEmptyElementTags = true;
      allowMultiWordAttributes = true;
      allowHtmlInsideAttributes = false;
      ignoreQuestAndExclam = true;
      namespacesAware = true;
      keepHeadWhitespace = true;
      addNewlineToHeadAndBody = true;
      hyphenReplacementInComment = "=";
      pruneTags = null;
      allowTags = null;
      booleanAttributeValues = BOOL_ATT_SELF;
      collapseNullHtml = CollapseHtml.none;
      charset = "UTF-8";
      trimAttributeValues = true;
      tagInfoProvider = HTML5TagProvider.INSTANCE;
      maxDepth = 1000;
    • resetPruneTagSet

      private void resetPruneTagSet()
    • getCleanerTransformations

      public CleanerTransformations getCleanerTransformations()
      Returns:
      the cleanerTransformations
    • setCleanerTransformations

      public void setCleanerTransformations(CleanerTransformations cleanerTransformations)
    • addHtmlModificationListener

      public void addHtmlModificationListener(HtmlModificationListener listener)
      Adds a listener to the list of objects that will be notified about changes the cleaner makes during the cleanup process.
      Parameters:
      listener - listener object to be notified of the changes.
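The registration described above follows the usual observer pattern. The sketch below shows the general shape only; it assumes that registered listeners are notified in registration order, and it uses a hypothetical one-method interface `ModificationListener` in place of the real `HtmlModificationListener`, whose callbacks take different parameters.

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerSketch {
    // Hypothetical stand-in for the real listener interface.
    interface ModificationListener {
        void onModification(String description);
    }

    private final List<ModificationListener> listeners = new ArrayList<>();

    // Registers a listener to be notified of later changes.
    void addListener(ModificationListener listener) {
        listeners.add(listener);
    }

    // Forwards one change notification to every registered listener.
    void fire(String description) {
        for (ModificationListener listener : listeners) {
            listener.onModification(description);
        }
    }

    public static void main(String[] args) {
        ListenerSketch hub = new ListenerSketch();
        hub.addListener(d -> System.out.println("first: " + d));
        hub.addListener(d -> System.out.println("second: " + d));
        hub.fire("deprecated tag removed");
    }
}
```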
    • fireConditionModification

      public void fireConditionModification(ITagNodeCondition condition, TagNode tagNode)
      Description copied from interface: HtmlModificationListener
      Fired when the cleaner modifies HTML because of an ITagNodeCondition match.
      Specified by:
      fireConditionModification in interface HtmlModificationListener
      Parameters:
      condition - the condition that was applied to make the modification
      tagNode - the problematic node.
    • fireHtmlError

      public void fireHtmlError(boolean certainty, TagNode startTagToken, ErrorType type)
      Description copied from interface: HtmlModificationListener
      Fired when the cleaner fixes an error in HTML syntax.
      Specified by:
      fireHtmlError in interface HtmlModificationListener
      Parameters:
      certainty - true if the change made does not harm the end document.
      startTagToken - the problematic node.
      type -
    • fireUglyHtml

      public void fireUglyHtml(boolean certainty, TagNode startTagToken, ErrorType errorType)
      Description copied from interface: HtmlModificationListener
      Fired when the cleaner fixes ugly HTML, that is, when the syntax was correct but the task was implemented by weird code, for example when deprecated tags are removed.
      Specified by:
      fireUglyHtml in interface HtmlModificationListener
      Parameters:
      certainty - true if the change made does not harm the end document.
      startTagToken - the problematic node.
      errorType -
    • fireUserDefinedModification

      public void fireUserDefinedModification(boolean certainty, TagNode tagNode, ErrorType errorType)
      Description copied from interface: HtmlModificationListener
      Fired when the cleaner modifies HTML due to user-specified rules.
      Specified by:
      fireUserDefinedModification in interface HtmlModificationListener
      Parameters:
      certainty - true if the change made does not harm the end document.
      tagNode - the problematic node.
      errorType -
    • getInvalidXmlAttributeNamePrefix

      public String getInvalidXmlAttributeNamePrefix()
      Get the prefix to use to try to make valid attribute names
      Returns:
      invalidAttributeNamePrefix
    • setInvalidXmlAttributeNamePrefix

      public void setInvalidXmlAttributeNamePrefix(String invalidXmlAttributePrefix)
      Sets the prefix to use for xml attributes that are invalid
      Parameters:
      invalidXmlAttributePrefix - the prefix to use
    • setAllowInvalidAttributeNames

      public void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)
      Set whether to allow invalid attribute names, or to try to fix or omit them
      Parameters:
      allowInvalidAttributeNames - True if invalid attributes allowed
    • isAllowInvalidAttributeNames

      public boolean isAllowInvalidAttributeNames()
      If false, when outputting XML, if an attribute name is not valid, attempt to fix it by using a prefix and removing invalid characters. Otherwise, omit invalid attributes
      Returns:
      True if invalid attribute names are allowed.
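The fix-or-omit behaviour described for invalid attribute names might look roughly like the sketch below. It rests on several assumptions: the validity check is a simplified ASCII-only approximation of XML name rules, the prefix is only added when stripping alone does not produce a valid name, and "omit" is modelled as returning `null` when the name still cannot be made valid. `AttributeNameSketch`, `isValid`, and `fix` are hypothetical names and do not reflect the library's code.

```java
import java.util.regex.Pattern;

final class AttributeNameSketch {
    // Simplified, ASCII-only approximation of a valid XML attribute name.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    // Returns a usable name, or null when the attribute should be omitted.
    static String fix(String name, String prefix) {
        if (name == null) {
            return null;
        }
        if (isValid(name)) {
            return name;
        }
        // Remove characters that are not allowed anywhere in the name.
        String stripped = name.replaceAll("[^A-Za-z0-9_.-]", "");
        if (stripped.isEmpty()) {
            return null;
        }
        // Add the prefix only if the stripped name is still invalid.
        String candidate = isValid(stripped) ? stripped : prefix + stripped;
        return isValid(candidate) ? candidate : null;
    }

    public static void main(String[] args) {
        System.out.println(fix("data@id", "x-")); // dataid
        System.out.println(fix("1abc", "x-"));    // x-1abc
        System.out.println(fix("@@", "x-"));      // null
    }
}
```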