Package org.htmlcleaner
Class TagNode
- java.lang.Object
-
- org.htmlcleaner.BaseTokenImpl
-
- org.htmlcleaner.BaseHtmlNode
-
- org.htmlcleaner.TagToken
-
- org.htmlcleaner.TagNode
-
- Direct Known Subclasses:
ProxyTagNode, Serializer.HeadlessTagNode
public class TagNode extends TagToken implements HtmlNode
XML node tag: the basic node of the cleaned HTML tree. It also represents a start tag token after the HTML parsing phase and before the cleaning phase. After cleaning, the tree consists of tag nodes (TagNode), content (text nodes, ContentNode), comments (CommentNode), and optionally a doctype node (DoctypeToken).
-
-
Field Summary
Fields
Modifier and Type / Field / Description
private java.util.LinkedHashMap<java.lang.String,java.lang.String>
attributes
private boolean
autoGenerated
Used to indicate a start tag that was auto generated because TagInfo.isContinueAfter(String) (closedTag.getName()) returned true.
private java.util.List<BaseToken>
children
private DoctypeToken
docType
private boolean
foreignMarkupFlagSet
This flag is set if foreignMarkup is set; if it is false it means that the tagnode tree has not been built and so it isn't known whether this node is a HTML node or foreign markup such as SVG.
private boolean
isCopy
Indicates that the node is a copy of another node.
private boolean
isForeignMarkup
This flag is set if we are using namespace aware setting, and the tagnode belongs to a non-HTML namespace.
private boolean
isFormed
private boolean
isTrimAttributeValues
This flag is set if attribute values should be trimmed.
private java.util.List<BaseToken>
itemsToMove
private java.util.Map<java.lang.String,java.lang.String>
nsDeclarations
private boolean
pruned
Indicates that the node was marked to be pruned out of the tree.
-
Fields inherited from class org.htmlcleaner.BaseHtmlNode
parent
-
-
Method Summary
Modifier and Type / Method / Description
void
addAttribute(java.lang.String attName, java.lang.String attValue)
Adds specified attribute to this tag or overrides existing one.
void
addChild(java.lang.Object child)
void
addChildren(java.util.List newChildren)
Add all elements from specified list to this node.
(package private) void
addItemForMoving(java.lang.Object item)
void
addNamespaceDeclaration(java.lang.String nsPrefix, java.lang.String nsURI)
Adds namespace declaration to the node.
private java.util.Map<java.lang.String,java.lang.String>
attributesToLowerCase()
Returns a copy of the set of attributes for this node with lowercase names.
(package private) void
collectNamespacePrefixesOnPath(java.util.Set<java.lang.String> prefixes)
Collect all prefixes in namespace declarations up the path to the document root from the specified node.
java.lang.Object[]
evaluateXPath(java.lang.String xPathExpression)
Evaluates XPath expression on given node.
private TagNode
findElement(ITagNodeCondition condition, boolean isRecursive)
Finds first element in the tree that satisfies specified condition.
TagNode
findElementByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
TagNode
findElementByName(java.lang.String findName, boolean isRecursive)
TagNode
findElementHavingAttribute(java.lang.String attName, boolean isRecursive)
private java.util.List<TagNode>
findMatchingTagNodes(ITagNodeCondition condition, boolean isRecursive)
Get all elements in the tree that satisfy specified condition.
java.util.List<? extends BaseToken>
getAllChildren()
TagNode[]
getAllElements(boolean isRecursive)
java.util.List<? extends TagNode>
getAllElementsList(boolean isRecursive)
java.lang.String
getAttributeByName(java.lang.String attName)
java.util.Map<java.lang.String,java.lang.String>
getAttributes()
Returns the attributes of the tagnode.
java.util.Map<java.lang.String,java.lang.String>
getAttributesInLowerCase()
Returns the attributes of the tagnode in lower case.
int
getChildIndex(HtmlNode child)
java.util.List<TagNode>
getChildren()
Deprecated. Use getChildTagList(); will be refactored and possibly removed in future versions.
java.util.List<TagNode>
getChildTagList()
TagNode[]
getChildTags()
DoctypeToken
getDocType()
java.util.List<? extends TagNode>
getElementList(ITagNodeCondition condition, boolean isRecursive)
Get all elements in the tree that satisfy specified condition.
java.util.List<? extends TagNode>
getElementListByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
java.util.List<? extends TagNode>
getElementListByName(java.lang.String findName, boolean isRecursive)
java.util.List<? extends TagNode>
getElementListHavingAttribute(java.lang.String attName, boolean isRecursive)
private TagNode[]
getElements(ITagNodeCondition condition, boolean isRecursive)
TagNode[]
getElementsByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
TagNode[]
getElementsByName(java.lang.String findName, boolean isRecursive)
TagNode[]
getElementsHavingAttribute(java.lang.String attName, boolean isRecursive)
(package private) java.util.List<? extends BaseToken>
getItemsToMove()
java.lang.String
getName()
java.util.Map<java.lang.String,java.lang.String>
getNamespaceDeclarations()
(package private) java.lang.String
getNamespaceURIOnPath(java.lang.String nsPrefix)
java.lang.CharSequence
getText()
private void
handleInterruption()
Called whenever the thread is interrupted.
boolean
hasAttribute(java.lang.String attName)
Checks existence of specified attribute.
boolean
hasChildren()
void
insertChild(int index, HtmlNode childToAdd)
Inserts specified node at specified position in array of children.
void
insertChildAfter(HtmlNode node, HtmlNode nodeToInsert)
Inserts specified node in the list of children after specified child.
void
insertChildBefore(HtmlNode node, HtmlNode nodeToInsert)
Inserts specified node in the list of children before specified child.
boolean
isAutoGenerated()
boolean
isCopy()
boolean
isEmpty()
boolean
isForeignMarkup()
(package private) boolean
isFormed()
boolean
isPruned()
boolean
isTrimAttributeValues()
TagNode
makeCopy()
void
removeAllChildren()
Removes all children (subelements and text content).
void
removeAttribute(java.lang.String attName)
Removes specified attribute from this tag.
boolean
removeChild(java.lang.Object child)
Remove specified child element from this node.
boolean
removeFromTree()
Remove this node from the tree.
private void
replaceAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
Clears existing attributes and puts replacement attributes.
void
serialize(Serializer serializer, java.io.Writer writer)
void
setAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
Replace the current set of attributes with a new set.
void
setAutoGenerated(boolean autoGenerated)
void
setChildren(java.util.List<? extends BaseToken> children)
void
setDocType(DoctypeToken docType)
void
setForeignMarkup(boolean isForeignMarkup)
(package private) void
setFormed()
(package private) void
setFormed(boolean isFormed)
(package private) void
setItemsToMove(java.util.List<BaseToken> itemsToMove)
void
setPruned(boolean pruned)
void
setTrimAttributeValues(boolean isTrimAttributeValues)
void
traverse(TagNodeVisitor visitor)
Traverses the tree and performs visitor's action on each node.
private boolean
traverseInternally(TagNodeVisitor visitor)
-
Methods inherited from class org.htmlcleaner.BaseHtmlNode
getParent, getSiblings, setParent
-
Methods inherited from class org.htmlcleaner.BaseTokenImpl
getCol, getRow, setCol, setRow
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.htmlcleaner.HtmlNode
getParent, getSiblings, setParent
-
-
-
-
Field Detail
-
attributes
private final java.util.LinkedHashMap<java.lang.String,java.lang.String> attributes
-
children
private final java.util.List<BaseToken> children
-
docType
private DoctypeToken docType
-
itemsToMove
private java.util.List<BaseToken> itemsToMove
-
nsDeclarations
private java.util.Map<java.lang.String,java.lang.String> nsDeclarations
-
isFormed
private transient boolean isFormed
-
autoGenerated
private boolean autoGenerated
Used to indicate a start tag that was auto generated because TagInfo.isContinueAfter(String) (closedTag.getName()) returned true. For example, foobar
would result in a new tag being created, resulting in foobar
The second opening tag is marked as autogenerated. This allows the autogenerated tag to be removed if it is unneeded.
-
isForeignMarkup
private boolean isForeignMarkup
This flag is set if we are using namespace aware setting, and the tagnode belongs to a non-HTML namespace.
-
foreignMarkupFlagSet
private boolean foreignMarkupFlagSet
This flag is set if foreignMarkup is set; if it is false it means that the tagnode tree has not been built and so it isn't known whether this node is a HTML node or foreign markup such as SVG.
-
isTrimAttributeValues
private boolean isTrimAttributeValues
This flag is set if attribute values should be trimmed.
-
pruned
private boolean pruned
Indicates that the node was marked to be pruned out of the tree.
-
isCopy
private final boolean isCopy
Indicates that the node is a copy of another node.
- See Also:
makeCopy()
-
-
Method Detail
-
getAttributeByName
public java.lang.String getAttributeByName(java.lang.String attName)
- Parameters:
attName
- Returns:
- Value of the specified attribute, or null if this tag doesn't contain it.
-
getAttributes
public java.util.Map<java.lang.String,java.lang.String> getAttributes()
Returns the attributes of the tagnode.
- Returns:
- Map instance containing all attribute name/value pairs.
-
getAttributesInLowerCase
public java.util.Map<java.lang.String,java.lang.String> getAttributesInLowerCase()
Returns the attributes of the tagnode in lower case.
- Returns:
- Map instance containing all attribute name/value pairs, with attribute names transformed to lower case.
-
setAttributes
public void setAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
Replace the current set of attributes with a new set.
- Parameters:
attributes
-
replaceAttributes
private void replaceAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
Clears existing attributes and puts replacement attributes.
- Parameters:
attributes
- the attributes to set
-
hasAttribute
public boolean hasAttribute(java.lang.String attName)
Checks existence of specified attribute.
- Parameters:
attName
- Returns:
- true if TagNode has attribute
-
addAttribute
public void addAttribute(java.lang.String attName, java.lang.String attValue)
Adds specified attribute to this tag or overrides existing one.
- Specified by:
addAttribute in class TagToken
- Parameters:
attName
attValue
-
removeAttribute
public void removeAttribute(java.lang.String attName)
Removes specified attribute from this tag.
- Parameters:
attName
-
getChildren
@Deprecated public java.util.List<TagNode> getChildren()
Deprecated. Use getChildTagList(); will be refactored and possibly removed in future versions. TODO: This method should be refactored because it does not properly match the commonly used Java getter/setter strategy.
- Returns:
- List of child TagNode objects.
-
setChildren
public void setChildren(java.util.List<? extends BaseToken> children)
-
getAllChildren
public java.util.List<? extends BaseToken> getAllChildren()
-
getChildTagList
public java.util.List<TagNode> getChildTagList()
- Returns:
- List of child TagNode objects.
-
hasChildren
public boolean hasChildren()
- Returns:
- Whether this node has child elements or not.
-
getChildTags
public TagNode[] getChildTags()
- Returns:
- An array of child TagNode instances.
-
getText
public java.lang.CharSequence getText()
- Returns:
- Text content of this node and its subelements.
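The "node and its subelements" wording can be pictured as a recursive concatenation. The following is a minimal stand-alone sketch, not HtmlCleaner's implementation: the Node class and its element helper are hypothetical stand-ins for TagNode and ContentNode, and document-order concatenation is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class TextContentSketch {
    // Hypothetical node: a text leaf (text != null) or an element with children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }

        // Builds an element node (no text of its own) holding the given children.
        static Node element(Node... kids) {
            Node n = new Node(null);
            for (Node kid : kids) {
                n.children.add(kid);
            }
            return n;
        }

        // Concatenates the text of this node and all descendants, in document order.
        CharSequence getText() {
            StringBuilder sb = new StringBuilder();
            if (text != null) {
                sb.append(text);
            }
            for (Node child : children) {
                sb.append(child.getText());
            }
            return sb;
        }
    }

    public static void main(String[] args) {
        Node p = Node.element(new Node("Hello, "), Node.element(new Node("world")), new Node("!"));
        System.out.println(p.getText()); // Hello, world!
    }
}
```

The return type is CharSequence, as in the signature above, so callers that need a String would call toString() on the result.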
-
getChildIndex
public int getChildIndex(HtmlNode child)
- Parameters:
child
- Child to find index of
- Returns:
- Index of the specified child node inside this node's children, or -1 if the node is not a child
-
insertChild
public void insertChild(int index, HtmlNode childToAdd)
Inserts specified node at specified position in array of children.
- Parameters:
index
childToAdd
-
insertChildBefore
public void insertChildBefore(HtmlNode node, HtmlNode nodeToInsert)
Inserts specified node in the list of children before specified child.
- Parameters:
node
- Child before which to insert new node
nodeToInsert
- Node to be inserted at specified position
-
insertChildAfter
public void insertChildAfter(HtmlNode node, HtmlNode nodeToInsert)
Inserts specified node in the list of children after specified child.
- Parameters:
node
- Child after which to insert new node
nodeToInsert
- Node to be inserted at specified position
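The relationship between getChildIndex, insertChildBefore and insertChildAfter can be sketched over a plain list. This is an illustrative model only: it uses a generic java.util.List in place of the node's child list, and the choice to silently ignore a reference child that is not found is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InsertChildSketch {
    // Index of child in the list, or -1 if it is not present.
    static <T> int getChildIndex(List<T> children, T child) {
        return children.indexOf(child);
    }

    // Inserts nodeToInsert directly before node; does nothing if node is absent (assumed).
    static <T> void insertChildBefore(List<T> children, T node, T nodeToInsert) {
        int index = getChildIndex(children, node);
        if (index >= 0) {
            children.add(index, nodeToInsert);
        }
    }

    // Inserts nodeToInsert directly after node; does nothing if node is absent (assumed).
    static <T> void insertChildAfter(List<T> children, T node, T nodeToInsert) {
        int index = getChildIndex(children, node);
        if (index >= 0) {
            children.add(index + 1, nodeToInsert);
        }
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>(Arrays.asList("head", "body"));
        insertChildBefore(children, "body", "nav");
        insertChildAfter(children, "body", "footer");
        insertChildAfter(children, "missing", "ignored");
        System.out.println(children); // [head, nav, body, footer]
    }
}
```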
-
getDocType
public DoctypeToken getDocType()
-
setDocType
public void setDocType(DoctypeToken docType)
-
addChild
public void addChild(java.lang.Object child)
-
addChildren
public void addChildren(java.util.List newChildren)
Add all elements from specified list to this node.
- Parameters:
newChildren
-
findElement
private TagNode findElement(ITagNodeCondition condition, boolean isRecursive)
Finds first element in the tree that satisfies specified condition.
- Parameters:
condition
isRecursive
- Returns:
- First TagNode found, or null if there is no such element.
-
findMatchingTagNodes
private java.util.List<TagNode> findMatchingTagNodes(ITagNodeCondition condition, boolean isRecursive)
Get all elements in the tree that satisfy specified condition.
- Parameters:
condition
isRecursive
- Returns:
- List of TagNode instances.
-
getElementList
public java.util.List<? extends TagNode> getElementList(ITagNodeCondition condition, boolean isRecursive)
Get all elements in the tree that satisfy specified condition.
- Parameters:
condition
isRecursive
- Returns:
- List of TagNode instances that satisfy the specified condition.
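A condition-based search with an isRecursive switch could look roughly like the sketch below. It is a stand-alone model under stated assumptions: Node is a hypothetical stand-in for TagNode, java.util.function.Predicate stands in for ITagNodeCondition, and it is assumed that the search covers descendants only (not the node itself) and that isRecursive = false limits it to direct children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ConditionSearchSketch {
    // Hypothetical stand-in for a tag node with a name and child elements.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        // Collects matching children; descends into subtrees only when isRecursive is true.
        List<Node> getElementList(Predicate<Node> condition, boolean isRecursive) {
            List<Node> result = new ArrayList<>();
            for (Node child : children) {
                if (condition.test(child)) {
                    result.add(child);
                }
                if (isRecursive) {
                    result.addAll(child.getElementList(condition, true));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("div", new Node("a"), new Node("p", new Node("a")));
        Predicate<Node> isLink = node -> node.name.equals("a");
        System.out.println(root.getElementList(isLink, false).size()); // 1
        System.out.println(root.getElementList(isLink, true).size());  // 2
    }
}
```

The by-name, having-attribute and by-attribute-value variants listed on this page can be read as the same search with a different condition plugged in.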
-
getElements
private TagNode[] getElements(ITagNodeCondition condition, boolean isRecursive)
- Parameters:
condition
isRecursive
- Returns:
- The array of all subelements that satisfy specified condition.
-
getAllElementsList
public java.util.List<? extends TagNode> getAllElementsList(boolean isRecursive)
-
getAllElements
public TagNode[] getAllElements(boolean isRecursive)
-
findElementByName
public TagNode findElementByName(java.lang.String findName, boolean isRecursive)
-
getElementListByName
public java.util.List<? extends TagNode> getElementListByName(java.lang.String findName, boolean isRecursive)
-
getElementsByName
public TagNode[] getElementsByName(java.lang.String findName, boolean isRecursive)
-
findElementHavingAttribute
public TagNode findElementHavingAttribute(java.lang.String attName, boolean isRecursive)
-
getElementListHavingAttribute
public java.util.List<? extends TagNode> getElementListHavingAttribute(java.lang.String attName, boolean isRecursive)
-
getElementsHavingAttribute
public TagNode[] getElementsHavingAttribute(java.lang.String attName, boolean isRecursive)
-
findElementByAttValue
public TagNode findElementByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
-
getElementListByAttValue
public java.util.List<? extends TagNode> getElementListByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
-
getElementsByAttValue
public TagNode[] getElementsByAttValue(java.lang.String attName, java.lang.String attValue, boolean isRecursive, boolean isCaseSensitive)
-
evaluateXPath
public java.lang.Object[] evaluateXPath(java.lang.String xPathExpression) throws XPatherException
Evaluates XPath expression on given node.
This is not a fully supported XPath parser and evaluator. The examples below show supported elements:
- //div//a
- //div//a[@id][@class]
- /body/*[1]/@type
- //div[3]//a[@id][@href='r/n4']
- (//div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a
- //div[2]/@*[2]
- data(//div//a[@id][@class])
- //p/last()
- //body//div[3][@class]//span[12.2
- data(//a['v' < @id])
- Parameters:
xPathExpression
- Returns:
- result of XPather evaluation.
- Throws:
XPatherException
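To make the first two example expressions concrete, the sketch below evaluates them with the JDK's own javax.xml.xpath engine on a small well-formed document. This is a stand-in, not HtmlCleaner's XPather: evaluateXPath returns an Object[] and supports only part of XPath, so its results may differ. The class name, the helper and the sample markup are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathShapeSketch {
    // Counts the nodes a standard XPath expression selects in a well-formed XML string.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            // Rethrow unchecked so callers of this sketch need no throws clause.
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body>"
                + "<div><a id='a1' class='nav' href='/x'>one</a><a id='a2'>two</a></div>"
                + "<div><p><a href='/y'>three</a></p></div>"
                + "</body></html>";
        System.out.println(countMatches(xml, "//div//a"));              // 3
        System.out.println(countMatches(xml, "//div//a[@id][@class]")); // 1
    }
}
```

With HtmlCleaner itself, the corresponding call would be evaluateXPath on a cleaned TagNode, with XPatherException handled by the caller.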
-
removeFromTree
public boolean removeFromTree()
Remove this node from the tree.
- Returns:
- True if element is removed (if it is not root node).
-
removeChild
public boolean removeChild(java.lang.Object child)
Remove specified child element from this node.
- Parameters:
child
- Returns:
- True if child object existed in the children list.
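The return values of removeFromTree and removeChild can be modelled with a parent pointer and a child list. The sketch below is a hypothetical stand-alone model (the Node class is not part of HtmlCleaner); clearing the parent reference on removal is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class RemoveFromTreeSketch {
    // Hypothetical stand-in for a tag node that knows its parent.
    static final class Node {
        Node parent;
        final List<Node> children = new ArrayList<>();

        void addChild(Node child) {
            child.parent = this;
            children.add(child);
        }

        // True if the child was present in the children list.
        boolean removeChild(Node child) {
            boolean existed = children.remove(child);
            if (existed) {
                child.parent = null; // assumed: a removed node no longer points at its old parent
            }
            return existed;
        }

        // A root node has no parent to be removed from, so it reports false.
        boolean removeFromTree() {
            return parent != null && parent.removeChild(this);
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node child = new Node();
        root.addChild(child);
        System.out.println(child.removeFromTree()); // true
        System.out.println(root.removeFromTree());  // false
    }
}
```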
-
removeAllChildren
public void removeAllChildren()
Removes all children (subelements and text content).
-
addItemForMoving
void addItemForMoving(java.lang.Object item)
-
getItemsToMove
java.util.List<? extends BaseToken> getItemsToMove()
-
setItemsToMove
void setItemsToMove(java.util.List<BaseToken> itemsToMove)
-
isFormed
boolean isFormed()
-
setFormed
void setFormed(boolean isFormed)
-
setFormed
void setFormed()
-
setAutoGenerated
public void setAutoGenerated(boolean autoGenerated)
- Parameters:
autoGenerated
- the autoGenerated to set
-
isAutoGenerated
public boolean isAutoGenerated()
- Returns:
- the autoGenerated
-
isPruned
public boolean isPruned()
- Returns:
- true, if node was marked to be pruned.
-
setPruned
public void setPruned(boolean pruned)
-
isEmpty
public boolean isEmpty()
-
addNamespaceDeclaration
public void addNamespaceDeclaration(java.lang.String nsPrefix, java.lang.String nsURI)
Adds namespace declaration to the node.
- Parameters:
nsPrefix
- Namespace prefix
nsURI
- Namespace URI
-
collectNamespacePrefixesOnPath
void collectNamespacePrefixesOnPath(java.util.Set<java.lang.String> prefixes)
Collect all prefixes in namespace declarations up the path to the document root from the specified node.
- Parameters:
prefixes
- Set of prefixes to be collected
-
getNamespaceURIOnPath
java.lang.String getNamespaceURIOnPath(java.lang.String nsPrefix)
-
getNamespaceDeclarations
public java.util.Map<java.lang.String,java.lang.String> getNamespaceDeclarations()
- Returns:
- Map of namespace declarations for this node
-
serialize
public void serialize(Serializer serializer, java.io.Writer writer) throws java.io.IOException
- Specified by:
serialize in interface BaseToken
- Overrides:
serialize in class BaseHtmlNode
- Throws:
java.io.IOException
-
makeCopy
public TagNode makeCopy()
-
isCopy
public boolean isCopy()
-
traverse
public void traverse(TagNodeVisitor visitor)
Traverses the tree and performs visitor's action on each node. It stops when it has visited the whole tree or when the visitor returns false.
- Parameters:
visitor
- TagNodeVisitor implementation
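The stop-on-false contract can be illustrated with a small visitor sketch. Everything here is a hypothetical stand-in: Visitor models TagNodeVisitor, Node models TagNode, and depth-first, parent-before-children order is an assumption rather than something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;

final class TraverseSketch {
    // Hypothetical stand-in for TagNodeVisitor: return false to stop the walk.
    interface Visitor {
        boolean visit(Node parent, Node node);
    }

    // Hypothetical stand-in for a tree node.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        void traverse(Visitor visitor) {
            traverseInternally(null, visitor);
        }

        // Visits this node, then its children; a false result aborts the whole traversal.
        private boolean traverseInternally(Node parent, Visitor visitor) {
            if (!visitor.visit(parent, this)) {
                return false;
            }
            for (Node child : children) {
                if (!child.traverseInternally(this, visitor)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Records visited names, asking the traversal to stop once stopAt has been seen.
    static List<String> namesUntil(Node root, String stopAt) {
        List<String> seen = new ArrayList<>();
        root.traverse((parent, node) -> {
            seen.add(node.name);
            return !node.name.equals(stopAt);
        });
        return seen;
    }

    public static void main(String[] args) {
        Node body = new Node("body").add(new Node("div")).add(new Node("p"));
        Node root = new Node("html").add(new Node("head")).add(body);
        System.out.println(namesUntil(root, "body")); // [html, head, body]
    }
}
```

Returning false from the visitor is the only early exit in this model; a visitor that always returns true sees every node once.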
-
traverseInternally
private boolean traverseInternally(TagNodeVisitor visitor)
-
isForeignMarkup
public boolean isForeignMarkup()
- Returns:
- the isForeignMarkup
-
setForeignMarkup
public void setForeignMarkup(boolean isForeignMarkup)
- Parameters:
isForeignMarkup
- the isForeignMarkup to set
-
isTrimAttributeValues
public boolean isTrimAttributeValues()
- Returns:
- the isTrimAttributeValues
-
setTrimAttributeValues
public void setTrimAttributeValues(boolean isTrimAttributeValues)
- Parameters:
isTrimAttributeValues
- the isTrimAttributeValues to set
-
attributesToLowerCase
private java.util.Map<java.lang.String,java.lang.String> attributesToLowerCase()
Returns a copy of the set of attributes for this node with lowercase names. Where there are duplicate attributes (e.g. class, CLASS) the first value is retained.
- Returns:
- a map of attributes in key/value pairs with names in lowercase
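The "first value is retained" rule for duplicates such as class and CLASS can be sketched with a LinkedHashMap. This is an illustrative stand-alone version, not the library's code; the class and method names are hypothetical, and Locale.ROOT is an assumption chosen to keep lowercasing independent of the default locale.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class LowerCaseAttributesSketch {
    // Returns a copy with lowercase keys; when two keys collide, the first value wins.
    static Map<String, String> toLowerCaseKeys(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            if (!result.containsKey(key)) {
                result.put(key, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("class", "first");
        attributes.put("CLASS", "second");
        attributes.put("ID", "main");
        System.out.println(toLowerCaseKeys(attributes)); // {class=first, id=main}
    }
}
```

Because the input map keeps insertion order, "first" here means first in the order the attributes were added.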
-
handleInterruption
private void handleInterruption()
Called whenever the thread is interrupted. Currently this is a placeholder, but could hold cleanup methods and user interaction.
-
-