Package org.htmlcleaner
Class CleanerProperties
java.lang.Object
org.htmlcleaner.CleanerProperties
- All Implemented Interfaces:
HtmlModificationListener
Properties defining cleaner's behaviour
-
Field Summary
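Fields
Modifier and Type, Field, Description
private boolean addNewlineToHeadAndBody
    Because the cleaner cannot keep track of whitespace at that level, two lists are built: one for the head and one for the body.
private boolean advancedXmlEscape
    If this parameter is set to true, an ampersand sign (&) that precedes a valid XML character sequence (&XXX;) will not be escaped as &amp;XXX;.
private boolean allowHtmlInsideAttributes
private boolean allowInvalidAttributeNames
private boolean allowMultiWordAttributes
private String allowTags
private Set<ITagNodeCondition> allowTagSet
    The list of allowed tags (a whitelist approach, as opposed to the blacklist approach of pruneTags).
static final String BOOL_ATT_EMPTY
static final String BOOL_ATT_SELF
static final String BOOL_ATT_TRUE
private String booleanAttributeValues
private String charset
private CleanerTransformations cleanerTransformations
static final String DEFAULT_CHARSET
private boolean deserializeEntities
private List<HtmlModificationListener> htmlModificationListeners
private int htmlVersion
private String hyphenReplacementInComment
private boolean ignoreQuestAndExclam
private String invalidAttributeNamePrefix
private boolean keepWhitespaceAndCommentsInHead
    Tries to keep inside head all whitespace and comments that were originally there.
private int maxDepth
    Provides an arbitrary recursion depth limit.
private boolean namespacesAware
private boolean omitCdataOutsideScriptAndStyle
private boolean omitComments
private boolean omitDeprecatedTags
private OptionalOutput omitDoctypeDeclaration
private OptionalOutput omitHtmlEnvelope
private boolean omitUnknownTags
private OptionalOutput omitXmlDeclaration
private String pruneTags
private Set<ITagNodeCondition> pruneTagSet
    Blacklist of tags.
private boolean recognizeUnicodeChars
private ITagInfoProvider tagInfoProvider
private boolean translateSpecialEntities
private boolean transResCharsToNCR
private boolean transSpecialEntitiesToNCR
private boolean treatDeprecatedTagsAsContent
private boolean treatUnknownTagsAsContent
private boolean trimAttributeValues
private String useCdataFor
private boolean useEmptyElementTags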
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
void addHtmlModificationListener(HtmlModificationListener listener)
    Adds a listener to the list of objects that are notified about the changes the cleaner makes during the cleanup process.
void addPruneTagNodeCondition(ITagNodeCondition condition)
    Adds the condition to the existing prune tag set.
private void addTagNameConditions(Set<ITagNodeCondition> tagSet, String tagsNameStr)
void fireConditionModification(ITagNodeCondition condition, TagNode tagNode)
    Fired when the cleaner modifies HTML due to an ITagNodeCondition match.
void fireHtmlError(boolean certainty, TagNode startTagToken, ErrorType type)
    Fired when the cleaner fixes some error in HTML syntax.
void fireUglyHtml(boolean certainty, TagNode startTagToken, ErrorType errorType)
    Fired when the cleaner fixes ugly HTML, that is, when the syntax was correct but the task was implemented by weird code.
void fireUserDefinedModification(boolean certainty, TagNode tagNode, ErrorType errorType)
    Fired when the cleaner modifies HTML due to user-specified rules.
int getHtmlVersion()
    Returns the HTML version.
String getInvalidXmlAttributeNamePrefix()
    Gets the prefix to use to try to make valid attribute names.
int getMaxDepth()
boolean isAddNewlineToHeadAndBody()
boolean isAdvancedXmlEscape()
boolean isAllowHtmlInsideAttributes()
boolean isAllowInvalidAttributeNames()
    If false, when outputting XML, if an attribute name is not valid, attempt to fix it by using a prefix and removing invalid characters.
boolean isAllowMultiWordAttributes()
boolean isDeserializeEntities()
boolean isIgnoreQuestAndExclam()
boolean isKeepWhitespaceAndCommentsInHead()
boolean isNamespacesAware()
boolean isOmitCdataOutsideScriptAndStyle()
boolean isOmitComments()
boolean isOmitDeprecatedTags()
boolean isOmitDoctypeDeclaration()
boolean isOmitHtmlEnvelope()
boolean isOmitUnknownTags()
boolean isOmitXmlDeclaration()
boolean isRecognizeUnicodeChars()
boolean isTranslateSpecialEntities()
boolean isTransResCharsToNCR()
boolean isTransSpecialEntitiesToNCR()
boolean isTreatDeprecatedTagsAsContent()
boolean isTreatUnknownTagsAsContent()
boolean isTrimAttributeValues()
boolean isUseCdataFor(String useCdataFor)
boolean isUseCdataForScriptAndStyle()
boolean isUseEmptyElementTags()
void reset()
    advancedXmlEscape = true; setUseCdataFor("script,style"); translateSpecialEntities = true; recognizeUnicodeChars = true; omitUnknownTags = false; treatUnknownTagsAsContent = false; omitDeprecatedTags = false; treatDeprecatedTagsAsContent = false; omitComments = false; omitXmlDeclaration = OptionalOutput.alwaysOutput; omitDoctypeDeclaration = OptionalOutput.alwaysOutput; omitHtmlEnvelope = OptionalOutput.alwaysOutput; useEmptyElementTags = true; allowMultiWordAttributes = true; allowHtmlInsideAttributes = false; ignoreQuestAndExclam = true; namespacesAware = true; keepHeadWhitespace = true; addNewlineToHeadAndBody = true; hyphenReplacementInComment = "="; pruneTags = null; allowTags = null; booleanAttributeValues = BOOL_ATT_SELF; collapseNullHtml = CollapseHtml.none; charset = "UTF-8"; trimAttributeValues = true; tagInfoProvider = HTML5TagProvider.INSTANCE; maxDepth = 1000
private void resetPruneTagSet()
void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody)
void setAdvancedXmlEscape(boolean advancedXmlEscape)
void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes)
void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)
    Sets whether to allow invalid attribute names, or to try to fix or omit them.
void setAllowMultiWordAttributes(boolean allowMultiWordAttributes)
void setAllowTags(String allowTags)
private void setAllowTagSet(String allowTags)
void setBooleanAttributeValues(String booleanAttributeValues)
void setCharset(String charset)
void setCleanerTransformations(CleanerTransformations cleanerTransformations)
void setDeserializeEntities(boolean deserializeEntities)
void setHtmlVersion(int version)
    Sets the HTML version according to the parameter. It also sets the tag provider to the one matching that version.
void setHyphenReplacementInComment(String hyphenReplacementInComment)
void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam)
void setInvalidXmlAttributeNamePrefix(String invalidXmlAttributePrefix)
    Sets the prefix to use for XML attributes that are invalid.
void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace)
void setMaxDepth(int maxDepth)
void setNamespacesAware(boolean namespacesAware)
void setOmitCdataOutsideScriptAndStyle(boolean value)
void setOmitComments(boolean omitComments)
void setOmitDeprecatedTags(boolean omitDeprecatedTags)
void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration)
void setOmitHtmlEnvelope(boolean omitHtmlEnvelope)
void setOmitUnknownTags(boolean omitUnknownTags)
void setOmitXmlDeclaration(boolean omitXmlDeclaration)
void setPruneTags(String pruneTags)
    Resets the prune tag set and adds tag name conditions to it.
void setRecognizeUnicodeChars(boolean recognizeUnicodeChars)
(package private) void setTagInfoProvider(ITagInfoProvider tagInfoProvider)
void setTranslateSpecialEntities(boolean translateSpecialEntities)
    TODO: use OptionalOutput
void setTransResCharsToNCR(boolean transResCharsToNCR)
void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR)
void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent)
void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent)
void setTrimAttributeValues(boolean trimAttributeValues)
void setUseCdataFor(String useCdataFor)
void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle)
void setUseEmptyElementTags(boolean useEmptyElementTags)
-
Field Details
-
DEFAULT_CHARSET
- See Also:
-
BOOL_ATT_SELF
- See Also:
-
BOOL_ATT_EMPTY
- See Also:
-
BOOL_ATT_TRUE
- See Also:
-
tagInfoProvider
-
advancedXmlEscape
private boolean advancedXmlEscape
If this parameter is set to true, an ampersand sign (&) that precedes a valid XML character sequence (&XXX;) will not be escaped as &amp;XXX;.
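The advancedXmlEscape rule can be illustrated with a small stand-alone sketch. This is not HtmlCleaner's implementation: the class AmpersandEscapeSketch, its escape method, and the regular expression are assumptions made for this example, and the pattern only checks that a sequence is shaped like &name;, &#123; or &#x1F; without validating the entity name.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names belong to HtmlCleaner.
class AmpersandEscapeSketch {

    // Matches an ampersand that does NOT begin an entity-shaped sequence
    // such as &name; &#123; or &#x1F; (shape check only, an assumption).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    static String escape(String text, boolean advancedXmlEscape) {
        if (!advancedXmlEscape) {
            // Plain mode: every ampersand is escaped, even inside &lt; and friends.
            return text.replace("&", "&amp;");
        }
        // Advanced mode: only bare ampersands are escaped.
        return BARE_AMPERSAND.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // Prints: Fish &amp; Chips &lt; more
        System.out.println(escape("Fish & Chips &lt; more", true));
    }
}
```

With the flag off, the same input would come out with the existing entity double-escaped (&amp;lt;), which is the case the flag is meant to avoid.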
useCdataFor
-
useCdataForList
-
translateSpecialEntities
private boolean translateSpecialEntities -
recognizeUnicodeChars
private boolean recognizeUnicodeChars -
omitUnknownTags
private boolean omitUnknownTags -
treatUnknownTagsAsContent
private boolean treatUnknownTagsAsContent -
omitDeprecatedTags
private boolean omitDeprecatedTags -
omitComments
private boolean omitComments -
treatDeprecatedTagsAsContent
private boolean treatDeprecatedTagsAsContent -
omitXmlDeclaration
-
omitDoctypeDeclaration
-
omitHtmlEnvelope
-
useEmptyElementTags
private boolean useEmptyElementTags -
allowMultiWordAttributes
private boolean allowMultiWordAttributes -
booleanAttributeValues
-
ignoreQuestAndExclam
private boolean ignoreQuestAndExclam -
allowHtmlInsideAttributes
private boolean allowHtmlInsideAttributes -
namespacesAware
private boolean namespacesAware -
transSpecialEntitiesToNCR
private boolean transSpecialEntitiesToNCR -
omitCdataOutsideScriptAndStyle
private boolean omitCdataOutsideScriptAndStyle -
deserializeEntities
private boolean deserializeEntities -
trimAttributeValues
private boolean trimAttributeValues -
htmlVersion
private int htmlVersion -
allowInvalidAttributeNames
private boolean allowInvalidAttributeNames -
invalidAttributeNamePrefix
-
maxDepth
private int maxDepth
Provides an arbitrary recursion depth limit.
-
addNewlineToHeadAndBody
private boolean addNewlineToHeadAndBody
Because the cleaner cannot keep track of whitespace at that level, two lists are built: one for the head and one for the body. Whitespace that falls outside the head and body is therefore not preserved; this setting creates at least a newline break. Actually preserving the whitespace would take more work than is wanted at this point.
-
keepWhitespaceAndCommentsInHead
private boolean keepWhitespaceAndCommentsInHead
Tries to keep inside head all whitespace and comments that were originally there.
-
hyphenReplacementInComment
-
pruneTags
-
allowTags
-
cleanerTransformations
-
htmlModificationListeners
-
pruneTagSet
blacklist of tags -
allowTagSet
The list of allowed tags (a whitelist approach, as opposed to the blacklist approach of pruneTags).
-
charset
-
transResCharsToNCR
private boolean transResCharsToNCR
-
-
Constructor Details
-
CleanerProperties
public CleanerProperties() -
CleanerProperties
- Parameters:
tagInfoProvider
-
-
-
Method Details
-
getMaxDepth
public int getMaxDepth() -
setMaxDepth
public void setMaxDepth(int maxDepth) -
setTagInfoProvider
- Parameters:
tagInfoProvider
- the tagInfoProvider to set
-
getTagInfoProvider
-
isAdvancedXmlEscape
public boolean isAdvancedXmlEscape() -
setAdvancedXmlEscape
public void setAdvancedXmlEscape(boolean advancedXmlEscape) -
isTransResCharsToNCR
public boolean isTransResCharsToNCR() -
setTransResCharsToNCR
public void setTransResCharsToNCR(boolean transResCharsToNCR) -
isUseCdataForScriptAndStyle
public boolean isUseCdataForScriptAndStyle() -
setUseCdataForScriptAndStyle
public void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle) -
setUseCdataFor
-
getUseCdataFor
-
isUseCdataFor
-
isTranslateSpecialEntities
public boolean isTranslateSpecialEntities() -
setTranslateSpecialEntities
public void setTranslateSpecialEntities(boolean translateSpecialEntities)
TODO: use OptionalOutput
Parameters:
translateSpecialEntities -
-
isRecognizeUnicodeChars
public boolean isRecognizeUnicodeChars() -
setRecognizeUnicodeChars
public void setRecognizeUnicodeChars(boolean recognizeUnicodeChars) -
isOmitUnknownTags
public boolean isOmitUnknownTags() -
setOmitUnknownTags
public void setOmitUnknownTags(boolean omitUnknownTags) -
isTreatUnknownTagsAsContent
public boolean isTreatUnknownTagsAsContent() -
setTreatUnknownTagsAsContent
public void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent) -
isOmitDeprecatedTags
public boolean isOmitDeprecatedTags() -
setOmitDeprecatedTags
public void setOmitDeprecatedTags(boolean omitDeprecatedTags) -
isTreatDeprecatedTagsAsContent
public boolean isTreatDeprecatedTagsAsContent() -
setTreatDeprecatedTagsAsContent
public void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent) -
isOmitComments
public boolean isOmitComments() -
setOmitComments
public void setOmitComments(boolean omitComments) -
isOmitXmlDeclaration
public boolean isOmitXmlDeclaration() -
setOmitXmlDeclaration
public void setOmitXmlDeclaration(boolean omitXmlDeclaration) -
isOmitDoctypeDeclaration
public boolean isOmitDoctypeDeclaration()
Returns:
true if the doctype declaration is omitted; also returns true if the HTML envelope is omitted
-
setOmitDoctypeDeclaration
public void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration) -
isOmitHtmlEnvelope
public boolean isOmitHtmlEnvelope() -
setOmitHtmlEnvelope
public void setOmitHtmlEnvelope(boolean omitHtmlEnvelope) -
isUseEmptyElementTags
public boolean isUseEmptyElementTags() -
setUseEmptyElementTags
public void setUseEmptyElementTags(boolean useEmptyElementTags) -
isAllowMultiWordAttributes
public boolean isAllowMultiWordAttributes() -
setAllowMultiWordAttributes
public void setAllowMultiWordAttributes(boolean allowMultiWordAttributes) -
isAllowHtmlInsideAttributes
public boolean isAllowHtmlInsideAttributes() -
setAllowHtmlInsideAttributes
public void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes) -
isIgnoreQuestAndExclam
public boolean isIgnoreQuestAndExclam() -
setIgnoreQuestAndExclam
public void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam) -
isNamespacesAware
public boolean isNamespacesAware() -
setNamespacesAware
public void setNamespacesAware(boolean namespacesAware) -
isAddNewlineToHeadAndBody
public boolean isAddNewlineToHeadAndBody() -
setAddNewlineToHeadAndBody
public void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody) -
isKeepWhitespaceAndCommentsInHead
public boolean isKeepWhitespaceAndCommentsInHead() -
setKeepWhitespaceAndCommentsInHead
public void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace) -
getHyphenReplacementInComment
-
setHyphenReplacementInComment
-
getPruneTags
-
isOmitCdataOutsideScriptAndStyle
public boolean isOmitCdataOutsideScriptAndStyle() -
setOmitCdataOutsideScriptAndStyle
public void setOmitCdataOutsideScriptAndStyle(boolean value) -
isDeserializeEntities
public boolean isDeserializeEntities() -
setDeserializeEntities
public void setDeserializeEntities(boolean deserializeEntities) -
setHtmlVersion
public void setHtmlVersion(int version)
Sets the HTML version according to the parameter. It also sets the tag provider to the one matching that version.
Parameters:
version - 4 for HTML4 or 5 for HTML5
-
getHtmlVersion
public int getHtmlVersion()
Returns the HTML version.
Returns:
the HTML version
-
isTrimAttributeValues
public boolean isTrimAttributeValues() -
setTrimAttributeValues
public void setTrimAttributeValues(boolean trimAttributeValues) -
setPruneTags
Resets the prune tag set and adds tag name conditions to it. All tags listed in the pruneTags parameter are added.
Parameters:
pruneTags -
-
addPruneTagNodeCondition
Adds the condition to the existing prune tag set.
Parameters:
condition -
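The relationship between setPruneTags and addPruneTagNodeCondition can be sketched as follows. This is a simplified stand-in, not the library's code: TagCondition is a hypothetical reduction of ITagNodeCondition to a tag-name check, shouldPrune is an invented helper, and the comma-separated, case-insensitive handling of the tag list is an assumption (modelled on the "script,style" string passed to setUseCdataFor in reset()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these types belong to HtmlCleaner.
class PruneTagsSketch {

    /** Stand-in for ITagNodeCondition, reduced to a check on the tag name. */
    interface TagCondition {
        boolean satisfy(String tagName);
    }

    private final List<TagCondition> pruneTagSet = new ArrayList<>();

    /** Resets the set, then adds one name condition per listed tag. */
    void setPruneTags(String pruneTags) {
        pruneTagSet.clear();
        if (pruneTags == null) {
            return;
        }
        // Assumption: tag names are separated by commas.
        for (String raw : pruneTags.split(",")) {
            final String name = raw.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                pruneTagSet.add(tag -> name.equals(tag.toLowerCase(Locale.ROOT)));
            }
        }
    }

    /** Adds a condition to the existing set without resetting it. */
    void addPruneTagNodeCondition(TagCondition condition) {
        pruneTagSet.add(condition);
    }

    /** Invented helper: true if any registered condition matches. */
    boolean shouldPrune(String tagName) {
        for (TagCondition condition : pruneTagSet) {
            if (condition.satisfy(tagName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PruneTagsSketch sketch = new PruneTagsSketch();
        sketch.setPruneTags("script,style");
        sketch.addPruneTagNodeCondition(tag -> tag.startsWith("x-"));
        System.out.println(sketch.shouldPrune("style"));
        System.out.println(sketch.shouldPrune("x-widget"));
    }
}
```

The point of the sketch is the ordering: a later call to setPruneTags discards conditions added earlier, whereas addPruneTagNodeCondition only appends.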
-
getPruneTagSet
-
getAllowTags
-
setAllowTags
-
setAllowTagSet
-
isTransSpecialEntitiesToNCR
public boolean isTransSpecialEntitiesToNCR() -
setTransSpecialEntitiesToNCR
public void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR) -
addTagNameConditions
Parameters:
tagSet -
tagsNameStr -
-
getAllowTagSet
-
setCharset
- Parameters:
charset
- the charset to set
-
getCharset
- Returns:
- the charset
-
getBooleanAttributeValues
-
setBooleanAttributeValues
-
reset
public void reset()
advancedXmlEscape = true;
setUseCdataFor("script,style");
translateSpecialEntities = true;
recognizeUnicodeChars = true;
omitUnknownTags = false;
treatUnknownTagsAsContent = false;
omitDeprecatedTags = false;
treatDeprecatedTagsAsContent = false;
omitComments = false;
omitXmlDeclaration = OptionalOutput.alwaysOutput;
omitDoctypeDeclaration = OptionalOutput.alwaysOutput;
omitHtmlEnvelope = OptionalOutput.alwaysOutput;
useEmptyElementTags = true;
allowMultiWordAttributes = true;
allowHtmlInsideAttributes = false;
ignoreQuestAndExclam = true;
namespacesAware = true;
keepHeadWhitespace = true;
addNewlineToHeadAndBody = true;
hyphenReplacementInComment = "=";
pruneTags = null;
allowTags = null;
booleanAttributeValues = BOOL_ATT_SELF;
collapseNullHtml = CollapseHtml.none;
charset = "UTF-8";
trimAttributeValues = true;
tagInfoProvider = HTML5TagProvider.INSTANCE;
maxDepth = 1000
-
resetPruneTagSet
private void resetPruneTagSet() -
getCleanerTransformations
- Returns:
- the cleanerTransformations
-
setCleanerTransformations
-
addHtmlModificationListener
Adds a listener to the list of objects that are notified about the changes the cleaner makes during the cleanup process.
Parameters:
listener - the listener object to be notified of the changes
-
fireConditionModification
Description copied from interface: HtmlModificationListener
Fired when the cleaner modifies HTML due to an ITagNodeCondition match.
Specified by:
fireConditionModification in interface HtmlModificationListener
Parameters:
condition - the condition that was applied to make the modification
tagNode - the problematic node
-
fireHtmlError
Description copied from interface: HtmlModificationListener
Fired when the cleaner fixes some error in HTML syntax.
Specified by:
fireHtmlError in interface HtmlModificationListener
Parameters:
certainty - true if the change made doesn't hurt the end document
startTagToken - the problematic node
type -
-
fireUglyHtml
Description copied from interface: HtmlModificationListener
Fired when the cleaner fixes ugly HTML, that is, when the syntax was correct but the task was implemented by weird code. For example, when deprecated tags are removed.
Specified by:
fireUglyHtml in interface HtmlModificationListener
Parameters:
certainty - true if the change made doesn't hurt the end document
startTagToken - the problematic node
errorType -
-
fireUserDefinedModification
Description copied from interface: HtmlModificationListener
Fired when the cleaner modifies HTML due to user-specified rules.
Specified by:
fireUserDefinedModification in interface HtmlModificationListener
Parameters:
certainty - true if the change made doesn't hurt the end document
tagNode - the problematic node
errorType -
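Since this class both keeps a list of listeners and implements HtmlModificationListener itself, one plausible reading is that each fire method forwards the event to every registered listener. The sketch below shows that fan-out pattern in isolation. It is an assumption about the design, not the library's source: ModificationListener is a hypothetical one-method stand-in for HtmlModificationListener, and the node and error type are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types belong to HtmlCleaner.
class ListenerFanOutSketch {

    /** Stand-in for HtmlModificationListener with a single callback. */
    interface ModificationListener {
        void fireHtmlError(boolean certainty, String startTag, String errorType);
    }

    /** Stand-in for the properties object: a listener that forwards to others. */
    static class PropertiesSketch implements ModificationListener {
        private final List<ModificationListener> listeners = new ArrayList<>();

        void addHtmlModificationListener(ModificationListener listener) {
            listeners.add(listener);
        }

        @Override
        public void fireHtmlError(boolean certainty, String startTag, String errorType) {
            // Forward the event to every registered listener, in registration order.
            for (ModificationListener listener : listeners) {
                listener.fireHtmlError(certainty, startTag, errorType);
            }
        }
    }

    public static void main(String[] args) {
        PropertiesSketch props = new PropertiesSketch();
        props.addHtmlModificationListener((certainty, tag, type) ->
                System.out.println("fixed <" + tag + ">: " + type + " (safe=" + certainty + ")"));
        // "UnclosedTag" is just a sample label chosen for this sketch.
        props.fireHtmlError(true, "b", "UnclosedTag");
    }
}
```

Under this reading, code that performs the cleanup only needs a reference to the properties object in order to report a change; it does not need to know who is listening.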
-
getInvalidXmlAttributeNamePrefix
Gets the prefix to use to try to make valid attribute names.
Returns:
invalidAttributeNamePrefix
-
setInvalidXmlAttributeNamePrefix
Sets the prefix to use for XML attributes that are invalid.
Parameters:
invalidXmlAttributePrefix - the prefix to use
-
setAllowInvalidAttributeNames
public void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)
Sets whether to allow invalid attribute names, or to try to fix or omit them.
Parameters:
allowInvalidAttributeNames - true if invalid attribute names are allowed
-
isAllowInvalidAttributeNames
public boolean isAllowInvalidAttributeNames()
If false, when outputting XML, if an attribute name is not valid, attempt to fix it by using a prefix and removing invalid characters. Otherwise, omit invalid attributes.
Returns:
true if invalid attribute names are allowed
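The "use a prefix and remove invalid characters" strategy can be sketched roughly as below. This is only an illustration under stated assumptions: AttributeNameSketch and sanitize are invented names, the notion of a valid name is deliberately simplified (letters, digits, '_', '-' and '.', starting with a letter or '_'), and returning null to signal "omit the attribute" is a choice made for this example rather than documented behaviour.

```java
// Hypothetical illustration only; none of these names belong to HtmlCleaner.
class AttributeNameSketch {

    /**
     * Tries to turn an arbitrary attribute name into a valid one.
     * Returns null when nothing usable remains, meaning "omit the attribute".
     */
    static String sanitize(String name, String prefix) {
        StringBuilder kept = new StringBuilder();
        for (char c : name.toCharArray()) {
            // Simplified validity rule (an assumption, not the full XML Name production).
            if (Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.') {
                kept.append(c);
            }
        }
        if (kept.length() == 0) {
            return null;
        }
        char first = kept.charAt(0);
        boolean validStart = Character.isLetter(first) || first == '_';
        // A name that starts with a digit, '-' or '.' is rescued by the prefix.
        return validStart ? kept.toString() : prefix + kept;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("data$id", "attr-"));
        System.out.println(sanitize("1col", "attr-"));
    }
}
```

In this sketch the prefix plays the role of the value configured through setInvalidXmlAttributeNamePrefix: it is applied only when the cleaned-up name would otherwise start with a character that cannot begin a name.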
-