Class CleanerProperties

    • Field Detail

      • advancedXmlEscape

        private boolean advancedXmlEscape
        If this parameter is set to true, an ampersand sign (&) that precedes a valid XML character sequence (&XXX;) will not be escaped as &amp;XXX;
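
        The escaping rule described for advancedXmlEscape can be illustrated with a small plain-Java sketch. It is not HtmlCleaner's implementation: the class name, the method name and the regular expression are assumptions made for illustration, and "valid" is checked here only by shape (a name, a decimal code or a hex code followed by a semicolon).

```java
import java.util.regex.Pattern;

public class AmpersandEscapeSketch {
    // Matches an ampersand that is NOT followed by something shaped like an
    // entity or character reference: a name (&lt;), a decimal code (&#169;)
    // or a hex code (&#xA9;), each terminated by ';'.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Hypothetical helper: escapes ampersands according to the flag. */
    public static String escapeAmpersands(String text, boolean advancedXmlEscape) {
        if (advancedXmlEscape) {
            // Leave entity-like sequences alone, escape only bare ampersands.
            return BARE_AMPERSAND.matcher(text).replaceAll("&amp;");
        }
        // Without the flag, every ampersand is escaped.
        return text.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String input = "Tom & Jerry &lt;3 &#169;";
        System.out.println(escapeAmpersands(input, true));
        // prints: Tom &amp; Jerry &lt;3 &#169;
        System.out.println(escapeAmpersands(input, false));
        // prints: Tom &amp; Jerry &amp;lt;3 &amp;#169;
    }
}
```

        With the flag off, sequences that were already escaped get escaped a second time, which is the situation the flag is meant to avoid.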
      • useCdataFor

        private java.lang.String useCdataFor
      • useCdataForList

        private java.util.List<java.lang.String> useCdataForList
      • translateSpecialEntities

        private boolean translateSpecialEntities
      • recognizeUnicodeChars

        private boolean recognizeUnicodeChars
      • omitUnknownTags

        private boolean omitUnknownTags
      • treatUnknownTagsAsContent

        private boolean treatUnknownTagsAsContent
      • omitDeprecatedTags

        private boolean omitDeprecatedTags
      • omitComments

        private boolean omitComments
      • treatDeprecatedTagsAsContent

        private boolean treatDeprecatedTagsAsContent
      • omitDoctypeDeclaration

        private OptionalOutput omitDoctypeDeclaration
      • useEmptyElementTags

        private boolean useEmptyElementTags
      • allowMultiWordAttributes

        private boolean allowMultiWordAttributes
      • booleanAttributeValues

        private java.lang.String booleanAttributeValues
      • ignoreQuestAndExclam

        private boolean ignoreQuestAndExclam
      • allowHtmlInsideAttributes

        private boolean allowHtmlInsideAttributes
      • namespacesAware

        private boolean namespacesAware
      • transSpecialEntitiesToNCR

        private boolean transSpecialEntitiesToNCR
      • omitCdataOutsideScriptAndStyle

        private boolean omitCdataOutsideScriptAndStyle
      • deserializeEntities

        private boolean deserializeEntities
      • trimAttributeValues

        private boolean trimAttributeValues
      • htmlVersion

        private int htmlVersion
      • allowInvalidAttributeNames

        private boolean allowInvalidAttributeNames
      • invalidAttributeNamePrefix

        private java.lang.String invalidAttributeNamePrefix
      • maxDepth

        private int maxDepth
        An arbitrary limit on recursion depth.
      • addNewlineToHeadAndBody

        private boolean addNewlineToHeadAndBody
        The cleaner cannot keep track of whitespace at that level: two lists are built, one for the head and one for the body, so whitespace that falls outside the head and body is not preserved. This option creates at least a newline break. Actually preserving the whitespace would be more work than is wanted at this point.
      • keepWhitespaceAndCommentsInHead

        private boolean keepWhitespaceAndCommentsInHead
        Tries to keep all whitespace and comments that were originally inside the head.
      • hyphenReplacementInComment

        private java.lang.String hyphenReplacementInComment
      • pruneTags

        private java.lang.String pruneTags
      • allowTags

        private java.lang.String allowTags
      • pruneTagSet

        private java.util.Set<ITagNodeCondition> pruneTagSet
        The blacklist of tags to prune.
      • allowTagSet

        private java.util.Set<ITagNodeCondition> allowTagSet
        The list of allowed tags (a whitelist approach, as opposed to the blacklist approach of pruneTags).
      • charset

        private java.lang.String charset
      • transResCharsToNCR

        private boolean transResCharsToNCR
    • Constructor Detail

      • CleanerProperties

        public CleanerProperties()
      • CleanerProperties

        public CleanerProperties​(ITagInfoProvider tagInfoProvider)
        Parameters:
        tagInfoProvider -
    • Method Detail

      • getMaxDepth

        public int getMaxDepth()
      • setMaxDepth

        public void setMaxDepth​(int maxDepth)
      • setTagInfoProvider

        void setTagInfoProvider​(ITagInfoProvider tagInfoProvider)
        Parameters:
        tagInfoProvider - the tagInfoProvider to set
      • isAdvancedXmlEscape

        public boolean isAdvancedXmlEscape()
      • setAdvancedXmlEscape

        public void setAdvancedXmlEscape​(boolean advancedXmlEscape)
      • isTransResCharsToNCR

        public boolean isTransResCharsToNCR()
      • setTransResCharsToNCR

        public void setTransResCharsToNCR​(boolean transResCharsToNCR)
      • isUseCdataForScriptAndStyle

        public boolean isUseCdataForScriptAndStyle()
      • setUseCdataForScriptAndStyle

        public void setUseCdataForScriptAndStyle​(boolean useCdataForScriptAndStyle)
      • setUseCdataFor

        public void setUseCdataFor​(java.lang.String useCdataFor)
      • getUseCdataFor

        public java.lang.String getUseCdataFor()
      • isUseCdataFor

        public boolean isUseCdataFor​(java.lang.String useCdataFor)
      • isTranslateSpecialEntities

        public boolean isTranslateSpecialEntities()
      • setTranslateSpecialEntities

        public void setTranslateSpecialEntities​(boolean translateSpecialEntities)
        TODO : use OptionalOutput
        Parameters:
        translateSpecialEntities -
      • isRecognizeUnicodeChars

        public boolean isRecognizeUnicodeChars()
      • setRecognizeUnicodeChars

        public void setRecognizeUnicodeChars​(boolean recognizeUnicodeChars)
      • isOmitUnknownTags

        public boolean isOmitUnknownTags()
      • setOmitUnknownTags

        public void setOmitUnknownTags​(boolean omitUnknownTags)
      • isTreatUnknownTagsAsContent

        public boolean isTreatUnknownTagsAsContent()
      • setTreatUnknownTagsAsContent

        public void setTreatUnknownTagsAsContent​(boolean treatUnknownTagsAsContent)
      • isOmitDeprecatedTags

        public boolean isOmitDeprecatedTags()
      • setOmitDeprecatedTags

        public void setOmitDeprecatedTags​(boolean omitDeprecatedTags)
      • isTreatDeprecatedTagsAsContent

        public boolean isTreatDeprecatedTagsAsContent()
      • setTreatDeprecatedTagsAsContent

        public void setTreatDeprecatedTagsAsContent​(boolean treatDeprecatedTagsAsContent)
      • isOmitComments

        public boolean isOmitComments()
      • setOmitComments

        public void setOmitComments​(boolean omitComments)
      • isOmitXmlDeclaration

        public boolean isOmitXmlDeclaration()
      • setOmitXmlDeclaration

        public void setOmitXmlDeclaration​(boolean omitXmlDeclaration)
      • isOmitDoctypeDeclaration

        public boolean isOmitDoctypeDeclaration()
        Returns:
        true if the doctype declaration is omitted; also returns true if the HTML envelope is omitted
      • setOmitDoctypeDeclaration

        public void setOmitDoctypeDeclaration​(boolean omitDoctypeDeclaration)
      • isOmitHtmlEnvelope

        public boolean isOmitHtmlEnvelope()
      • setOmitHtmlEnvelope

        public void setOmitHtmlEnvelope​(boolean omitHtmlEnvelope)
      • isUseEmptyElementTags

        public boolean isUseEmptyElementTags()
      • setUseEmptyElementTags

        public void setUseEmptyElementTags​(boolean useEmptyElementTags)
      • isAllowMultiWordAttributes

        public boolean isAllowMultiWordAttributes()
      • setAllowMultiWordAttributes

        public void setAllowMultiWordAttributes​(boolean allowMultiWordAttributes)
      • isAllowHtmlInsideAttributes

        public boolean isAllowHtmlInsideAttributes()
      • setAllowHtmlInsideAttributes

        public void setAllowHtmlInsideAttributes​(boolean allowHtmlInsideAttributes)
      • isIgnoreQuestAndExclam

        public boolean isIgnoreQuestAndExclam()
      • setIgnoreQuestAndExclam

        public void setIgnoreQuestAndExclam​(boolean ignoreQuestAndExclam)
      • isNamespacesAware

        public boolean isNamespacesAware()
      • setNamespacesAware

        public void setNamespacesAware​(boolean namespacesAware)
      • isAddNewlineToHeadAndBody

        public boolean isAddNewlineToHeadAndBody()
      • setAddNewlineToHeadAndBody

        public void setAddNewlineToHeadAndBody​(boolean addNewlineToHeadAndBody)
      • isKeepWhitespaceAndCommentsInHead

        public boolean isKeepWhitespaceAndCommentsInHead()
      • setKeepWhitespaceAndCommentsInHead

        public void setKeepWhitespaceAndCommentsInHead​(boolean keepHeadWhitespace)
      • getHyphenReplacementInComment

        public java.lang.String getHyphenReplacementInComment()
      • setHyphenReplacementInComment

        public void setHyphenReplacementInComment​(java.lang.String hyphenReplacementInComment)
      • getPruneTags

        public java.lang.String getPruneTags()
      • isOmitCdataOutsideScriptAndStyle

        public boolean isOmitCdataOutsideScriptAndStyle()
      • setOmitCdataOutsideScriptAndStyle

        public void setOmitCdataOutsideScriptAndStyle​(boolean value)
      • isDeserializeEntities

        public boolean isDeserializeEntities()
      • setDeserializeEntities

        public void setDeserializeEntities​(boolean deserializeEntities)
      • setHtmlVersion

        public void setHtmlVersion​(int version)
        Sets the HTML version according to the parameter. It also sets the tag provider to the one for that version.
        Parameters:
        version - Number 4 for html4 or 5 for html5
      • getHtmlVersion

        public int getHtmlVersion()
        Returns the HTML version.
        Returns:
        the HTML version
      • isTrimAttributeValues

        public boolean isTrimAttributeValues()
      • setTrimAttributeValues

        public void setTrimAttributeValues​(boolean trimAttributeValues)
      • setPruneTags

        public void setPruneTags​(java.lang.String pruneTags)
        Resets the prune tag set and adds tag name conditions to it. All the tags listed in the pruneTags parameter are added.
        Parameters:
        pruneTags -
      • addPruneTagNodeCondition

        public void addPruneTagNodeCondition​(ITagNodeCondition condition)
        Adds the condition to the existing prune tag set.
        Parameters:
        condition -
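
        The difference between setPruneTags (reset, then add) and addPruneTagNodeCondition (add only) can be sketched in plain Java. This is an illustration, not the library's code: Predicate<String> stands in for ITagNodeCondition, the class and the shouldPrune and size helpers are hypothetical, and the comma-separated, case-insensitive tag list is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

public class PruneTagsSketch {
    // Stand-in for Set<ITagNodeCondition>: each condition tests a tag name.
    private final Set<Predicate<String>> pruneTagSet = new LinkedHashSet<>();

    /** Clears the set, then adds one name condition per listed tag. */
    public void setPruneTags(String pruneTags) {
        pruneTagSet.clear();
        if (pruneTags == null) {
            return;
        }
        for (String name : pruneTags.split(",")) {
            String tag = name.trim();
            if (!tag.isEmpty()) {
                pruneTagSet.add(candidate -> candidate.equalsIgnoreCase(tag));
            }
        }
    }

    /** Adds a condition without clearing what is already in the set. */
    public void addPruneTagNodeCondition(Predicate<String> condition) {
        pruneTagSet.add(condition);
    }

    /** Hypothetical helper: true if any condition matches the tag name. */
    public boolean shouldPrune(String tagName) {
        for (Predicate<String> condition : pruneTagSet) {
            if (condition.test(tagName)) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical helper: number of conditions currently held. */
    public int size() {
        return pruneTagSet.size();
    }

    public static void main(String[] args) {
        PruneTagsSketch props = new PruneTagsSketch();
        props.setPruneTags("script, style");
        props.addPruneTagNodeCondition(name -> name.startsWith("x-"));
        System.out.println(props.shouldPrune("x-widget")); // prints: true
        System.out.println(props.shouldPrune("div"));      // prints: false

        // A second setPruneTags call discards all earlier conditions.
        props.setPruneTags("iframe");
        System.out.println(props.shouldPrune("script"));   // prints: false
    }
}
```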
      • getAllowTags

        public java.lang.String getAllowTags()
      • setAllowTags

        public void setAllowTags​(java.lang.String allowTags)
      • setAllowTagSet

        private void setAllowTagSet​(java.lang.String allowTags)
      • isTransSpecialEntitiesToNCR

        public boolean isTransSpecialEntitiesToNCR()
      • setTransSpecialEntitiesToNCR

        public void setTransSpecialEntitiesToNCR​(boolean transSpecialEntitiesToNCR)
      • addTagNameConditions

        private void addTagNameConditions​(java.util.Set<ITagNodeCondition> tagSet,
                                          java.lang.String tagsNameStr)
        Parameters:
        tagSet -
        tagsNameStr -
      • setCharset

        public void setCharset​(java.lang.String charset)
        Parameters:
        charset - the charset to set
      • getCharset

        public java.lang.String getCharset()
        Returns:
        the charset
      • getBooleanAttributeValues

        public java.lang.String getBooleanAttributeValues()
      • setBooleanAttributeValues

        public void setBooleanAttributeValues​(java.lang.String booleanAttributeValues)
      • reset

        public void reset()
        advancedXmlEscape = true;
        setUseCdataFor("script,style");
        translateSpecialEntities = true;
        recognizeUnicodeChars = true;
        omitUnknownTags = false;
        treatUnknownTagsAsContent = false;
        omitDeprecatedTags = false;
        treatDeprecatedTagsAsContent = false;
        omitComments = false;
        omitXmlDeclaration = OptionalOutput.alwaysOutput;
        omitDoctypeDeclaration = OptionalOutput.alwaysOutput;
        omitHtmlEnvelope = OptionalOutput.alwaysOutput;
        useEmptyElementTags = true;
        allowMultiWordAttributes = true;
        allowHtmlInsideAttributes = false;
        ignoreQuestAndExclam = true;
        namespacesAware = true;
        keepHeadWhitespace = true;
        addNewlineToHeadAndBody = true;
        hyphenReplacementInComment = "=";
        pruneTags = null;
        allowTags = null;
        booleanAttributeValues = BOOL_ATT_SELF;
        collapseNullHtml = CollapseHtml.none;
        charset = "UTF-8";
        trimAttributeValues = true;
        tagInfoProvider = HTML5TagProvider.INSTANCE;
        maxDepth = 1000;
      • resetPruneTagSet

        private void resetPruneTagSet()
      • getCleanerTransformations

        public CleanerTransformations getCleanerTransformations()
        Returns:
        the cleanerTransformations
      • setCleanerTransformations

        public void setCleanerTransformations​(CleanerTransformations cleanerTransformations)
      • addHtmlModificationListener

        public void addHtmlModificationListener​(HtmlModificationListener listener)
        Adds a listener to the list of objects that will be notified about changes the cleaner makes during the cleanup process.
        Parameters:
        listener - the listener object to be notified of the changes.
      • fireHtmlError

        public void fireHtmlError​(boolean certainty,
                                  TagNode startTagToken,
                                  ErrorType type)
        Description copied from interface: HtmlModificationListener
        Fired when the cleaner fixes an error in HTML syntax.
        Specified by:
        fireHtmlError in interface HtmlModificationListener
        Parameters:
        certainty - true if the change made does not harm the resulting document.
        startTagToken - the problematic node.
      • fireUglyHtml

        public void fireUglyHtml​(boolean certainty,
                                 TagNode startTagToken,
                                 ErrorType errorType)
        Description copied from interface: HtmlModificationListener
        Fired when the cleaner fixes ugly HTML, that is, when the syntax was correct but the task was implemented with odd code. For example, when deprecated tags are removed.
        Specified by:
        fireUglyHtml in interface HtmlModificationListener
        Parameters:
        certainty - true if the change made does not harm the resulting document.
        startTagToken - the problematic node.
      • getInvalidXmlAttributeNamePrefix

        public java.lang.String getInvalidXmlAttributeNamePrefix()
        Gets the prefix used when trying to make attribute names valid.
        Returns:
        invalidAttributeNamePrefix
      • setInvalidXmlAttributeNamePrefix

        public void setInvalidXmlAttributeNamePrefix​(java.lang.String invalidXmlAttributePrefix)
        Sets the prefix to use for XML attributes that are invalid.
        Parameters:
        invalidXmlAttributePrefix - the prefix to use
      • setAllowInvalidAttributeNames

        public void setAllowInvalidAttributeNames​(boolean allowInvalidAttributeNames)
        Sets whether to allow invalid attribute names, or to try to fix or omit them.
        Parameters:
        allowInvalidAttributeNames - True if invalid attributes allowed
      • isAllowInvalidAttributeNames

        public boolean isAllowInvalidAttributeNames()
        If false, then when outputting XML, an attribute name that is not valid is fixed where possible by using a prefix and removing invalid characters. Otherwise, invalid attributes are omitted.
        Returns:
        True if invalid attribute names are allowed.
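
        The "fix by using a prefix and removing invalid characters" step can be sketched as follows. This is a sketch under stated assumptions, not the library's algorithm: fixAttributeName is a hypothetical helper, the name rules are a simplified ASCII subset of XML's, the prefix is added only when the cleaned name does not start with a valid character, and a name with nothing left after cleaning is treated as one to omit.

```java
public class AttributeNameFixSketch {
    // Simplified ASCII-only rule for the first character of a name.
    private static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Simplified ASCII-only rule for the remaining characters of a name.
    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /**
     * Hypothetical helper: returns the attribute name to emit,
     * or null when the attribute should be omitted.
     */
    public static String fixAttributeName(String name, String prefix) {
        StringBuilder kept = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (isNameChar(c)) {
                kept.append(c); // drop every character that is not allowed
            }
        }
        if (kept.length() == 0) {
            return null; // nothing usable left: omit the attribute
        }
        // Assumption: the prefix is only needed when the first character is invalid.
        return isNameStart(kept.charAt(0)) ? kept.toString() : prefix + kept;
    }

    public static void main(String[] args) {
        System.out.println(fixAttributeName("title", "p-"));   // prints: title
        System.out.println(fixAttributeName("data@id", "p-")); // prints: dataid
        System.out.println(fixAttributeName("1col", "p-"));    // prints: p-1col
        System.out.println(fixAttributeName("@@", "p-"));      // prints: null
    }
}
```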