Class TagNode

  • All Implemented Interfaces:
    BaseToken, HtmlNode
    Direct Known Subclasses:
    ProxyTagNode, Serializer.HeadlessTagNode

    public class TagNode
    extends TagToken
    implements HtmlNode

    XML node tag: the basic node of the cleaned HTML tree. It also represents a start tag token after the HTML parsing phase and before the cleaning phase. After the cleaning process, the tree structure contains tag nodes (TagNode class), content (text nodes, ContentNode), comments (CommentNode) and, optionally, a doctype node (DoctypeToken).

    • Field Detail

      • attributes

        private final java.util.LinkedHashMap<java.lang.String,java.lang.String> attributes
      • children

        private final java.util.List<BaseToken> children
      • itemsToMove

        private java.util.List<BaseToken> itemsToMove
      • nsDeclarations

        private java.util.Map<java.lang.String,java.lang.String> nsDeclarations
      • isFormed

        private transient boolean isFormed
      • autoGenerated

        private boolean autoGenerated
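        Used to indicate a start tag that was auto-generated because TagInfo.isContinueAfter(String), called with closedTag.getName(), returned true. For example, if a tag is still open at the point where an enclosing tag is closed, the cleaner closes it together with the enclosing tag and then opens a new start tag of the same name after the enclosing end tag, so that it continues to apply to the content that follows.
        This second opening tag is marked as auto-generated, which allows it to be removed if it is unneeded.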
      • isForeignMarkup

        private boolean isForeignMarkup
        This flag is set if the namespace-aware setting is enabled and the tag node belongs to a non-HTML namespace.
      • foreignMarkupFlagSet

        private boolean foreignMarkupFlagSet
        This flag is set once the foreign markup flag has been set. If it is false, the tag node tree has not been built yet, so it is not known whether this node is an HTML node or foreign markup such as SVG.
      • isTrimAttributeValues

        private boolean isTrimAttributeValues
        This flag is set if attribute values should be trimmed.
      • pruned

        private boolean pruned
        Indicates that the node was marked to be pruned out of the tree.
      • isCopy

        private final boolean isCopy
        Indicates that the node is a copy of another node.
        See Also:
        makeCopy()
    • Constructor Detail

      • TagNode

        public TagNode(java.lang.String name)
      • TagNode

        private TagNode(java.lang.String name,
                        boolean isCopy)
    • Method Detail

      • getName

        public java.lang.String getName()
        Overrides:
        getName in class TagToken
      • getAttributeByName

        public java.lang.String getAttributeByName(java.lang.String attName)
        Parameters:
        attName - name of the attribute to look up
        Returns:
        Value of the specified attribute, or null if this tag doesn't contain it.
      • getAttributes

        public java.util.Map<java.lang.String,java.lang.String> getAttributes()
        Returns the attributes of this tag node.
        Returns:
        Map instance containing all attribute name/value pairs.
      • getAttributesInLowerCase

        public java.util.Map<java.lang.String,java.lang.String> getAttributesInLowerCase()
        Returns the attributes of this tag node, with attribute names converted to lower case.
        Returns:
        Map instance containing all attribute name/value pairs, with attribute names transformed to lower case.
      • setAttributes

        public void setAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
        Replaces the current set of attributes with a new set.
        Parameters:
        attributes - the new attribute name/value pairs
      • replaceAttributes

        private void replaceAttributes(java.util.Map<java.lang.String,java.lang.String> attributes)
        Clears the existing attributes and puts the replacement attributes.
        Parameters:
        attributes - the attributes to set
      • hasAttribute

        public boolean hasAttribute(java.lang.String attName)
        Checks whether the specified attribute exists.
        Parameters:
        attName - name of the attribute to check
        Returns:
        true if this TagNode has the specified attribute
      • addAttribute

        public void addAttribute(java.lang.String attName,
                                 java.lang.String attValue)
        Adds the specified attribute to this tag, or overrides the existing one.
        Specified by:
        addAttribute in class TagToken
        Parameters:
        attName - name of the attribute
        attValue - value of the attribute
      • removeAttribute

        public void removeAttribute(java.lang.String attName)
        Removes the specified attribute from this tag.
        Parameters:
        attName - name of the attribute to remove
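        The attribute accessors above behave like operations on an insertion-ordered map (the attributes field is declared as a LinkedHashMap). The following minimal sketch uses a hypothetical SimpleTag class, not HtmlCleaner's own TagNode, to show adding, overriding, checking and removing attributes. Returning a read-only view from getAttributes() is a choice made for this sketch only; the documentation above does not say whether TagNode returns a copy or a view.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TagNode's attribute handling; not HtmlCleaner code.
class AttributeDemo {
    public static void main(String[] args) {
        SimpleTag a = new SimpleTag("a");
        a.addAttribute("href", "/home");
        a.addAttribute("class", "nav");
        a.addAttribute("href", "/index");          // overrides the value, keeps the original position
        System.out.println(a.getAttributes());     // {href=/index, class=nav}
        a.removeAttribute("class");
        System.out.println(a.hasAttribute("class")); // false
    }
}

class SimpleTag {
    private final String name;
    // LinkedHashMap keeps attributes in the order they were first added.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SimpleTag(String name) { this.name = name; }

    String getName() { return name; }

    // Adds the attribute, or overrides the existing value for the same name.
    void addAttribute(String attName, String attValue) { attributes.put(attName, attValue); }

    // Returns null if the attribute is absent.
    String getAttributeByName(String attName) { return attributes.get(attName); }

    boolean hasAttribute(String attName) { return attributes.containsKey(attName); }

    void removeAttribute(String attName) { attributes.remove(attName); }

    // Read-only view, so callers cannot bypass the methods above.
    Map<String, String> getAttributes() { return Collections.unmodifiableMap(attributes); }
}
```

        Re-putting an existing key in a LinkedHashMap does not move it, which is why "href" stays first after being overridden.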
      • getChildren

        @Deprecated
        public java.util.List<TagNode> getChildren()
        Deprecated.
        Use getChildTagList() instead. This method will be refactored and possibly removed in future versions. TODO: refactor this method because it does not follow the getter/setter conventions commonly used in Java.
        Returns:
        List of child TagNode objects.
      • setChildren

        public void setChildren(java.util.List<? extends BaseToken> children)
      • getAllChildren

        public java.util.List<? extends BaseToken> getAllChildren()
      • getChildTagList

        public java.util.List<TagNode> getChildTagList()
        Returns:
        List of child TagNode objects.
      • hasChildren

        public boolean hasChildren()
        Returns:
        Whether this node has child elements or not.
      • getChildTags

        public TagNode[] getChildTags()
        Returns:
        An array of child TagNode instances.
      • getText

        public java.lang.CharSequence getText()
        Returns:
        Text content of this node and its subelements.
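        Because the children list holds BaseToken objects (tags, text and comments), getChildTagList() and getText() both have to tell token types apart. A minimal sketch of both, using hypothetical Token, Text, Comment and Tag classes in place of HtmlCleaner's BaseToken, ContentNode, CommentNode and TagNode; treating comments as contributing no text is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of TagNode's mixed child list; not HtmlCleaner code.
class MixedChildrenDemo {
    public static void main(String[] args) {
        Tag p = new Tag("p");
        Tag b = new Tag("b");
        b.add(new Text("world"));
        p.add(new Text("Hello, "));
        p.add(b);
        p.add(new Comment(" ignored "));
        p.add(new Text("!"));
        System.out.println(p.getChildTagList().size()); // 1
        System.out.println(p.getText());                // Hello, world!
    }
}

interface Token {}

final class Text implements Token {
    final String content;
    Text(String content) { this.content = content; }
}

final class Comment implements Token {
    final String content;
    Comment(String content) { this.content = content; }
}

final class Tag implements Token {
    final String name;
    // All children in document order: tags, text and comments.
    private final List<Token> children = new ArrayList<>();

    Tag(String name) { this.name = name; }

    void add(Token child) { children.add(child); }

    // Element children only; text and comment tokens are skipped.
    List<Tag> getChildTagList() {
        List<Tag> tags = new ArrayList<>();
        for (Token child : children) {
            if (child instanceof Tag) {
                tags.add((Tag) child);
            }
        }
        return tags;
    }

    // Text of this node and all descendants, in document order; comments are skipped.
    CharSequence getText() {
        StringBuilder text = new StringBuilder();
        for (Token child : children) {
            if (child instanceof Text) {
                text.append(((Text) child).content);
            } else if (child instanceof Tag) {
                text.append(((Tag) child).getText());
            }
        }
        return text;
    }
}
```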
      • getChildIndex

        public int getChildIndex(HtmlNode child)
        Parameters:
        child - the child whose index is to be found
        Returns:
        Index of the specified child node among this node's children, or -1 if the node is not a child.
      • insertChild

        public void insertChild(int index,
                                HtmlNode childToAdd)
        Inserts the specified node at the specified position in the list of children.
        Parameters:
        index - position at which to insert the node
        childToAdd - node to insert
      • insertChildBefore

        public void insertChildBefore(HtmlNode node,
                                      HtmlNode nodeToInsert)
        Inserts the specified node into the list of children, before the specified child.
        Parameters:
        node - Child before which to insert new node
        nodeToInsert - Node to be inserted at specified position
      • insertChildAfter

        public void insertChildAfter(HtmlNode node,
                                     HtmlNode nodeToInsert)
        Inserts the specified node into the list of children, after the specified child.
        Parameters:
        node - Child after which to insert new node
        nodeToInsert - Node to be inserted at specified position
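        The index-based operations above can be sketched with a plain list. This is a hypothetical ParentNode/Node pair, not HtmlCleaner code; comparing children by identity and silently ignoring a reference node that is not a child are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of index-based child insertion; not HtmlCleaner code.
class InsertChildDemo {
    public static void main(String[] args) {
        ParentNode ul = new ParentNode();
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        ul.insertChild(0, b);
        ul.insertChildBefore(b, a);
        ul.insertChildAfter(b, c);
        System.out.println(ul.children());                   // [a, b, c]
        System.out.println(ul.getChildIndex(new Node("b"))); // -1: same name, different object
    }
}

class Node {
    final String name;
    Node(String name) { this.name = name; }
    @Override
    public String toString() { return name; }
}

class ParentNode {
    private final List<Node> children = new ArrayList<>();

    List<Node> children() { return children; }

    // Position of the child, compared by identity, or -1 if it is not a child.
    int getChildIndex(Node child) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == child) {
                return i;
            }
        }
        return -1;
    }

    void insertChild(int index, Node childToAdd) {
        children.add(index, childToAdd);
    }

    // Assumption of this sketch: nothing happens if `node` is not a child.
    void insertChildBefore(Node node, Node nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index, nodeToInsert);
        }
    }

    void insertChildAfter(Node node, Node nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index + 1, nodeToInsert);
        }
    }
}
```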
      • setDocType

        public void setDocType(DoctypeToken docType)
      • addChild

        public void addChild(java.lang.Object child)
      • addChildren

        public void addChildren(java.util.List newChildren)
        Adds all elements from the specified list to this node.
        Parameters:
        newChildren - list of children to add
      • findElement

        private TagNode findElement(ITagNodeCondition condition,
                                    boolean isRecursive)
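        Finds the first element in the tree that satisfies the specified condition.
        Parameters:
        condition - condition the element must satisfy
        isRecursive - whether to search the whole subtree rather than only the direct children
        Returns:
        First TagNode found, or null if there is no such element.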
      • findMatchingTagNodes

        private java.util.List<TagNode> findMatchingTagNodes(ITagNodeCondition condition,
                                                             boolean isRecursive)
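        Gets all elements in the tree that satisfy the specified condition.
        Parameters:
        condition - condition the elements must satisfy
        isRecursive - whether to search the whole subtree rather than only the direct children
        Returns:
        List of TagNode instances.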
      • getElementList

        public java.util.List<? extends TagNode> getElementList(ITagNodeCondition condition,
                                                                boolean isRecursive)
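        Gets all elements in the tree that satisfy the specified condition.
        Parameters:
        condition - condition the elements must satisfy
        isRecursive - whether to search the whole subtree rather than only the direct children
        Returns:
        List of TagNode instances that satisfy the specified condition.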
      • getElements

        private TagNode[] getElements(ITagNodeCondition condition,
                                      boolean isRecursive)
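        Parameters:
        condition - condition the elements must satisfy
        isRecursive - whether to search the whole subtree rather than only the direct children
        Returns:
        An array of all subelements that satisfy the specified condition.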
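        The isRecursive flag decides whether only the direct children or the whole subtree is searched. A minimal sketch, assuming a hypothetical El class and using java.util.function.Predicate as a stand-in for ITagNodeCondition; the pre-order search (a child is tested before its own descendants) and not testing the starting node itself are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; Predicate stands in for ITagNodeCondition. Not HtmlCleaner code.
class FindElementsDemo {
    public static void main(String[] args) {
        El body = new El("body");
        El div = body.add(new El("div"));
        div.add(new El("a"));
        body.add(new El("a"));
        Predicate<El> isLink = e -> e.name.equals("a");
        System.out.println(body.getElementList(isLink, false).size()); // 1
        System.out.println(body.getElementList(isLink, true).size());  // 2
        System.out.println(body.findElement(isLink, true) == div.children.get(0)); // true
    }
}

class El {
    final String name;
    final List<El> children = new ArrayList<>();

    El(String name) { this.name = name; }

    El add(El child) {
        children.add(child);
        return child;
    }

    // First match in pre-order, or null; this node itself is not tested.
    El findElement(Predicate<El> condition, boolean isRecursive) {
        for (El child : children) {
            if (condition.test(child)) {
                return child;
            }
            if (isRecursive) {
                El found = child.findElement(condition, true);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    // All matches; with isRecursive == false only direct children are examined.
    List<El> getElementList(Predicate<El> condition, boolean isRecursive) {
        List<El> result = new ArrayList<>();
        for (El child : children) {
            if (condition.test(child)) {
                result.add(child);
            }
            if (isRecursive) {
                result.addAll(child.getElementList(condition, true));
            }
        }
        return result;
    }
}
```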
      • getAllElementsList

        public java.util.List<? extends TagNode> getAllElementsList(boolean isRecursive)
      • getAllElements

        public TagNode[] getAllElements(boolean isRecursive)
      • findElementByName

        public TagNode findElementByName(java.lang.String findName,
                                         boolean isRecursive)
      • getElementListByName

        public java.util.List<? extends TagNode> getElementListByName(java.lang.String findName,
                                                                      boolean isRecursive)
      • getElementsByName

        public TagNode[] getElementsByName(java.lang.String findName,
                                           boolean isRecursive)
      • findElementHavingAttribute

        public TagNode findElementHavingAttribute(java.lang.String attName,
                                                  boolean isRecursive)
      • getElementListHavingAttribute

        public java.util.List<? extends TagNode> getElementListHavingAttribute(java.lang.String attName,
                                                                               boolean isRecursive)
      • getElementsHavingAttribute

        public TagNode[] getElementsHavingAttribute(java.lang.String attName,
                                                    boolean isRecursive)
      • findElementByAttValue

        public TagNode findElementByAttValue(java.lang.String attName,
                                             java.lang.String attValue,
                                             boolean isRecursive,
                                             boolean isCaseSensitive)
      • getElementListByAttValue

        public java.util.List<? extends TagNode> getElementListByAttValue(java.lang.String attName,
                                                                          java.lang.String attValue,
                                                                          boolean isRecursive,
                                                                          boolean isCaseSensitive)
      • getElementsByAttValue

        public TagNode[] getElementsByAttValue(java.lang.String attName,
                                               java.lang.String attValue,
                                               boolean isRecursive,
                                               boolean isCaseSensitive)
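        The isCaseSensitive flag of findElementByAttValue, getElementListByAttValue and getElementsByAttValue can be pictured as choosing between String.equals and String.equalsIgnoreCase. The sketch below assumes that the flag applies only to the attribute value and that attribute names are matched exactly; the documentation above states neither detail. AttValueDemo and hasAttValue are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of an attribute-value condition; not HtmlCleaner code.
class AttValueDemo {
    // Builds a condition matching attribute maps whose attribute equals the given value.
    static Predicate<Map<String, String>> hasAttValue(String attName, String attValue, boolean isCaseSensitive) {
        return attrs -> {
            String actual = attrs.get(attName);
            if (actual == null) {
                return false; // attribute absent
            }
            return isCaseSensitive ? actual.equals(attValue) : actual.equalsIgnoreCase(attValue);
        };
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("class", "Menu");
        System.out.println(hasAttValue("class", "menu", true).test(attrs));  // false
        System.out.println(hasAttValue("class", "menu", false).test(attrs)); // true
    }
}
```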
      • evaluateXPath

        public java.lang.Object[] evaluateXPath(java.lang.String xPathExpression)
                                         throws XPatherException
        Evaluates an XPath expression on the given node.
        This is not a fully compliant XPath parser and evaluator. The examples below show the supported constructs:
        • //div//a
        • //div//a[@id][@class]
        • /body/*[1]/@type
        • //div[3]//a[@id][@href='r/n4']
        • (//div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a
        • //div[2]/@*[2]
        • data(//div//a[@id][@class])
        • //p/last()
        • //body//div[3][@class]//span[12.2
        • data(//a['v' < @id])
        Parameters:
        xPathExpression - the XPath expression to evaluate
        Returns:
        Result of the XPather evaluation.
        Throws:
        XPatherException
      • removeFromTree

        public boolean removeFromTree()
        Remove this node from the tree.
        Returns:
        true if the element was removed (that is, if it is not the root node).
      • removeChild

        public boolean removeChild(java.lang.Object child)
        Removes the specified child element from this node.
        Parameters:
        child - the child to remove
        Returns:
        true if the child object existed in the children list.
      • removeAllChildren

        public void removeAllChildren()
        Removes all children (subelements and text content).
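        Removing a node from the tree amounts to asking its parent to remove it, which is why removeFromTree() reports false for the root. A minimal sketch with a hypothetical parent-linked TreeNode class (not HtmlCleaner code); clearing the removed child's parent reference is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parent-linked removal; not HtmlCleaner code.
class RemovalDemo {
    public static void main(String[] args) {
        TreeNode root = new TreeNode("html");
        TreeNode body = root.addChild(new TreeNode("body"));
        TreeNode p = body.addChild(new TreeNode("p"));
        System.out.println(p.removeFromTree());    // true
        System.out.println(root.removeFromTree()); // false: the root has no parent
        System.out.println(body.removeChild(p));   // false: already removed
    }
}

class TreeNode {
    final String name;
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode addChild(TreeNode child) {
        children.add(child);
        child.parent = this;
        return child;
    }

    // true if the child was present in the children list (identity comparison).
    boolean removeChild(TreeNode child) {
        boolean removed = children.remove(child);
        if (removed) {
            child.parent = null;
        }
        return removed;
    }

    // Detaches this node from its parent; a node without a parent cannot be removed.
    boolean removeFromTree() {
        return parent != null && parent.removeChild(this);
    }

    // Detaches every child.
    void removeAllChildren() {
        for (TreeNode child : children) {
            child.parent = null;
        }
        children.clear();
    }

    int childCount() { return children.size(); }
}
```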
      • addItemForMoving

        void addItemForMoving(java.lang.Object item)
      • getItemsToMove

        java.util.List<? extends BaseToken> getItemsToMove()
      • setItemsToMove

        void setItemsToMove(java.util.List<BaseToken> itemsToMove)
      • isFormed

        boolean isFormed()
      • setFormed

        void setFormed(boolean isFormed)
      • setFormed

        void setFormed()
      • setAutoGenerated

        public void setAutoGenerated(boolean autoGenerated)
        Parameters:
        autoGenerated - whether this start tag was auto-generated
      • isAutoGenerated

        public boolean isAutoGenerated()
        Returns:
        true if this start tag was auto-generated
      • isPruned

        public boolean isPruned()
        Returns:
        true if the node was marked to be pruned.
      • setPruned

        public void setPruned(boolean pruned)
      • isEmpty

        public boolean isEmpty()
      • addNamespaceDeclaration

        public void addNamespaceDeclaration(java.lang.String nsPrefix,
                                            java.lang.String nsURI)
        Adds a namespace declaration to this node.
        Parameters:
        nsPrefix - Namespace prefix
        nsURI - Namespace URI
      • collectNamespacePrefixesOnPath

        void collectNamespacePrefixesOnPath(java.util.Set<java.lang.String> prefixes)
        Collects all prefixes from namespace declarations along the path from this node up to the document root.
        Parameters:
        prefixes - Set of prefixes to be collected
      • getNamespaceURIOnPath

        java.lang.String getNamespaceURIOnPath(java.lang.String nsPrefix)
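        Namespace lookups walk from a node up through its ancestors. The sketch below assumes, as in XML namespace scoping, that the nearest declaration of a prefix wins in getNamespaceURIOnPath; NsNode is hypothetical, and the urn:example URIs are placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of namespace lookup along the ancestor path; not HtmlCleaner code.
class NamespaceDemo {
    public static void main(String[] args) {
        NsNode html = new NsNode(null);
        html.addNamespaceDeclaration("svg", "http://www.w3.org/2000/svg");
        NsNode div = new NsNode(html);
        div.addNamespaceDeclaration("x", "urn:example:outer");
        NsNode span = new NsNode(div);
        span.addNamespaceDeclaration("x", "urn:example:inner");

        Set<String> prefixes = new TreeSet<>();
        span.collectNamespacePrefixesOnPath(prefixes);
        System.out.println(prefixes);                         // [svg, x]
        System.out.println(span.getNamespaceURIOnPath("x"));  // urn:example:inner
        System.out.println(div.getNamespaceURIOnPath("svg")); // http://www.w3.org/2000/svg
    }
}

class NsNode {
    private final NsNode parent;
    private final Map<String, String> nsDeclarations = new HashMap<>();

    NsNode(NsNode parent) { this.parent = parent; }

    void addNamespaceDeclaration(String nsPrefix, String nsURI) {
        nsDeclarations.put(nsPrefix, nsURI);
    }

    // Adds every prefix declared on this node or on any ancestor.
    void collectNamespacePrefixesOnPath(Set<String> prefixes) {
        for (NsNode n = this; n != null; n = n.parent) {
            prefixes.addAll(n.nsDeclarations.keySet());
        }
    }

    // The nearest declaration wins; null if the prefix is undeclared on the path.
    String getNamespaceURIOnPath(String nsPrefix) {
        for (NsNode n = this; n != null; n = n.parent) {
            String uri = n.nsDeclarations.get(nsPrefix);
            if (uri != null) {
                return uri;
            }
        }
        return null;
    }
}
```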
      • getNamespaceDeclarations

        public java.util.Map<java.lang.String,java.lang.String> getNamespaceDeclarations()
        Returns:
        Map of namespace declarations for this node
      • makeCopy

        public TagNode makeCopy()
      • isCopy

        public boolean isCopy()
      • traverse

        public void traverse(TagNodeVisitor visitor)
        Traverses the tree and performs the visitor's action on each node. It stops after the whole tree has been processed or as soon as the visitor returns false.
        Parameters:
        visitor - TagNodeVisitor implementation
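        The stop-early behavior of traverse() can be sketched as a pre-order walk whose recursive helper propagates the visitor's false result upward. Visitor and VNode are hypothetical stand-ins for TagNodeVisitor and TagNode, and visiting the starting node itself is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of visitor traversal with early stop; not HtmlCleaner code.
class TraverseDemo {
    public static void main(String[] args) {
        VNode root = new VNode("html");
        VNode body = root.add(new VNode("body"));
        body.add(new VNode("p"));
        body.add(new VNode("div"));
        List<String> seen = new ArrayList<>();
        // Stop as soon as a "p" node has been visited.
        root.traverse((parent, node) -> {
            seen.add(node.name);
            return !node.name.equals("p");
        });
        System.out.println(seen); // [html, body, p]
    }
}

interface Visitor {
    // Return false to stop the whole traversal.
    boolean visit(VNode parent, VNode node);
}

class VNode {
    final String name;
    final List<VNode> children = new ArrayList<>();

    VNode(String name) { this.name = name; }

    VNode add(VNode child) {
        children.add(child);
        return child;
    }

    void traverse(Visitor visitor) {
        traverseInternally(null, visitor);
    }

    // Pre-order walk; returns false once the visitor has asked to stop.
    private boolean traverseInternally(VNode parent, Visitor visitor) {
        if (!visitor.visit(parent, this)) {
            return false;
        }
        for (VNode child : children) {
            if (!child.traverseInternally(this, visitor)) {
                return false;
            }
        }
        return true;
    }
}
```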
      • traverseInternally

        private boolean traverseInternally(TagNodeVisitor visitor)
      • isForeignMarkup

        public boolean isForeignMarkup()
        Returns:
        true if this node is foreign markup, that is, it belongs to a non-HTML namespace
      • setForeignMarkup

        public void setForeignMarkup(boolean isForeignMarkup)
        Parameters:
        isForeignMarkup - whether this node belongs to a non-HTML namespace
      • isTrimAttributeValues

        public boolean isTrimAttributeValues()
        Returns:
        true if attribute values should be trimmed
      • setTrimAttributeValues

        public void setTrimAttributeValues(boolean isTrimAttributeValues)
        Parameters:
        isTrimAttributeValues - whether attribute values should be trimmed
      • attributesToLowerCase

        private java.util.Map<java.lang.String,java.lang.String> attributesToLowerCase()
        Returns a copy of the set of attributes for this node with lowercase names. Where there are duplicate attributes (e.g. class, CLASS) the first value is retained.
        Returns:
        a map of attributes in key/value pairs with names in lowercase
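        The first-value-wins rule for names that differ only by case can be sketched by skipping keys already present in the result. This is a hypothetical standalone helper, not the private TagNode method. It assumes the input map iterates in document order (as a LinkedHashMap does) and uses Locale.ROOT so that lower-casing does not depend on the default locale.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of lower-casing attribute names; not HtmlCleaner code.
class LowerCaseAttributesDemo {
    static Map<String, String> attributesToLowerCase(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            String key = e.getKey().toLowerCase(Locale.ROOT);
            // Keep the first value when two names differ only by case.
            if (!result.containsKey(key)) {
                result.put(key, e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("class", "first");
        attrs.put("CLASS", "second");
        attrs.put("ID", "main");
        System.out.println(attributesToLowerCase(attrs)); // {class=first, id=main}
    }
}
```

        The input map is left unchanged; the helper returns a new copy, matching the "returns a copy" wording above.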
      • handleInterruption

        private void handleInterruption()
        Called whenever the thread is interrupted. Currently this is a placeholder, but it could hold cleanup methods and user interaction.