Class Cleaner
java.lang.Object
com.itextpdf.styledxmlparser.jsoup.safety.Cleaner
The safelist-based HTML cleaner. Use it to ensure that end-user provided HTML contains only the elements and attributes
that you are expecting; no junk, and no cross-site scripting attacks!
The HTML cleaner parses the input as HTML and then runs it through a safelist, so the output can only contain HTML that the safelist allows.
The input HTML is assumed to be a body fragment; the clean methods only pull from the source's body, and the canned safelists only allow tags that belong in the body.
Rather than interacting directly with a Cleaner object, generally see the clean methods in Jsoup.
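As a rough illustration of the convenience route described above, the sketch below sanitizes a body fragment through Jsoup rather than through a Cleaner instance. It assumes this fork keeps the upstream jsoup signatures Jsoup.clean(String, Safelist) and Safelist.basic(); the class name CleanFragmentSketch and the sample input are hypothetical.

```java
import com.itextpdf.styledxmlparser.jsoup.Jsoup;
import com.itextpdf.styledxmlparser.jsoup.safety.Safelist;

public class CleanFragmentSketch {

    // Returns the body fragment with everything outside the basic safelist removed.
    // Assumption: Jsoup.clean(String, Safelist) is available in this fork.
    public static String sanitize(String untrustedBodyHtml) {
        return Jsoup.clean(untrustedBodyHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        // The <script> element is not on the basic safelist, so it is dropped;
        // <p> and <b> are allowed and survive.
        String dirty = "<p>Hello <script>alert(1)</script><b>world</b></p>";
        System.out.println(sanitize(dirty));
    }
}
```

Going through Jsoup keeps the parse, clean, and serialize steps in one call, which is usually enough when the input and output are both strings.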
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
private final class: Iterates the input and copies trusted nodes (tags, attributes, text) into the destination.
private static class
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description:
Document clean(Document dirtyDocument): Creates a new, clean document, from the original dirty document, containing only elements allowed by the safelist.
private int copySafeNodes(Element source, Element dest)
private Cleaner.ElementMeta createSafeElement(Element sourceEl)
boolean isValid(Document dirtyDocument): Determines if the input document body is valid, against the safelist.
boolean isValidBodyHtml(String bodyHtml)
-
Field Details
-
safelist
-
-
Constructor Details
-
Cleaner
Create a new cleaner that sanitizes documents using the supplied safelist.
Parameters:
safelist - safelist to clean with
-
Cleaner
Deprecated. As of 1.14.1. Use Cleaner(Safelist) instead.
-
-
Method Details
-
clean
Creates a new, clean document, from the original dirty document, containing only elements allowed by the safelist. The original document is not modified. Only elements from the dirty document's body are used. The OutputSettings of the original document are cloned into the clean document.
Parameters:
dirtyDocument - Untrusted base document to clean.
Returns:
cleaned document.
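The behaviour described for clean can be sketched as follows: a minimal example, assuming the fork exposes Jsoup.parseBodyFragment(String), Safelist.basic(), and Document.body() as upstream jsoup does. The class name CleanDocumentSketch and the sample markup are hypothetical.

```java
import com.itextpdf.styledxmlparser.jsoup.Jsoup;
import com.itextpdf.styledxmlparser.jsoup.nodes.Document;
import com.itextpdf.styledxmlparser.jsoup.safety.Cleaner;
import com.itextpdf.styledxmlparser.jsoup.safety.Safelist;

public class CleanDocumentSketch {

    // Builds a new document that holds only the safelisted parts of dirty's body.
    public static Document cleanCopy(Document dirty) {
        Cleaner cleaner = new Cleaner(Safelist.basic());
        return cleaner.clean(dirty);
    }

    public static void main(String[] args) {
        // onclick is not an allowed attribute in the basic safelist.
        Document dirty = Jsoup.parseBodyFragment("<p onclick=\"steal()\">Hi <i>there</i></p>");
        Document clean = cleanCopy(dirty);

        // The cleaned copy has lost the onclick attribute.
        System.out.println(clean.body().html());
        // The original document still carries it, since clean does not modify its input.
        System.out.println(dirty.body().html());
    }
}
```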
-
isValid
Determines if the input document body is valid, against the safelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the safelist, and there is no content in the head.
This method can be used as a validator for user input. An invalid document will still be cleaned successfully using the clean(Document) method. If using as a validator, it is recommended to still clean the document to ensure enforced attributes are set correctly, and that the output is tidied.
Parameters:
dirtyDocument - document to test
Returns:
true if no tags or attributes need to be removed; false if they do
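A hedged sketch of the validate-then-clean pattern recommended above. It assumes Safelist.basic() behaves as in upstream jsoup, where links receive an enforced rel="nofollow" attribute; the class name ValidateThenCleanSketch, its helper methods, and the sample inputs are hypothetical.

```java
import com.itextpdf.styledxmlparser.jsoup.Jsoup;
import com.itextpdf.styledxmlparser.jsoup.nodes.Document;
import com.itextpdf.styledxmlparser.jsoup.safety.Cleaner;
import com.itextpdf.styledxmlparser.jsoup.safety.Safelist;

public class ValidateThenCleanSketch {

    private static final Cleaner CLEANER = new Cleaner(Safelist.basic());

    // True when nothing in the body fragment would have to be removed.
    public static boolean isAcceptable(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        return CLEANER.isValid(dirty);
    }

    // Cleans the fragment regardless of validity, so enforced attributes get applied.
    public static String cleanedBody(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        return CLEANER.clean(dirty).body().html();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("<p>ok</p>"));
        System.out.println(isAcceptable("<p>ok</p><script>x()</script>"));
        // Cleaning is still worthwhile here because the safelist
        // adds its enforced attributes to the output.
        System.out.println(cleanedBody("<a href=\"http://example.com/\">link</a>"));
    }
}
```

Validation alone only reports whether something would be stripped; the cleaned output is what should be stored or rendered.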
-
isValidBodyHtml
-
copySafeNodes
-
createSafeElement
-