Class TagNode

All Implemented Interfaces:
BaseToken, HtmlNode
Direct Known Subclasses:
ProxyTagNode, Serializer.HeadlessTagNode

public class TagNode extends TagToken implements HtmlNode

XML node tag: the basic node of the cleaned HTML tree. It also represents a start-tag token after the HTML parsing phase and before the cleaning phase. After cleaning, the remaining tree structure consists of tag nodes (TagNode), content (text nodes, ContentNode), comments (CommentNode) and, optionally, a doctype node (DoctypeToken).

  • Field Details

    • attributes

      private final LinkedHashMap<String,String> attributes
    • children

      private final List<BaseToken> children
    • docType

      private DoctypeToken docType
    • itemsToMove

      private List<BaseToken> itemsToMove
    • nsDeclarations

      private Map<String,String> nsDeclarations
    • isFormed

      private transient boolean isFormed
    • autoGenerated

      private boolean autoGenerated
      Indicates a start tag that was auto-generated because TagInfo.isContinueAfter(String), called with closedTag.getName(), returned true. For example,
       <b><i>foo</b>bar
       
      would result in a new <i> being created, giving
       <b><i>foo</i></b><i>bar</i>
       
      The second opening <i> tag is marked as auto-generated. This allows the auto-generated tag to be removed if it is not needed.
    • isForeignMarkup

      private boolean isForeignMarkup
      This flag is set if the namespace-aware setting is in use and the tag node belongs to a non-HTML namespace.
    • foreignMarkupFlagSet

      private boolean foreignMarkupFlagSet
      This flag is set once isForeignMarkup has been set. If it is false, the tag node tree has not been built yet, so it is not known whether this node is an HTML node or foreign markup such as SVG.
    • isTrimAttributeValues

      private boolean isTrimAttributeValues
      This flag is set if attribute values should be trimmed.
    • pruned

      private boolean pruned
      Indicates that the node was marked to be pruned out of the tree.
    • isCopy

      private final boolean isCopy
      Indicates that the node is a copy of another node.
  • Constructor Details

    • TagNode

      public TagNode(String name)
    • TagNode

      private TagNode(String name, boolean isCopy)
  • Method Details

    • getName

      public String getName()
      Overrides:
      getName in class TagToken
    • getAttributeByName

      public String getAttributeByName(String attName)
      Parameters:
      attName -
      Returns:
      Value of the specified attribute, or null if this tag doesn't contain it.
    • getAttributes

      public Map<String,String> getAttributes()
      Returns the attributes of the tagnode.
      Returns:
      Map instance containing all attribute name/value pairs.
    • getAttributesInLowerCase

      public Map<String,String> getAttributesInLowerCase()
      Returns the attributes of the tagnode in lower case.
      Returns:
      Map instance containing all attribute name/value pairs, with attribute names transformed to lower case
    • setAttributes

      public void setAttributes(Map<String,String> attributes)
      Replace the current set of attributes with a new set.
      Parameters:
      attributes -
    • replaceAttributes

      private void replaceAttributes(Map<String,String> attributes)
      Clears the existing attributes and adds the replacement attributes.
      Parameters:
      attributes - the attributes to set
    • hasAttribute

      public boolean hasAttribute(String attName)
      Checks existence of specified attribute.
      Parameters:
      attName -
      Returns:
      true if TagNode has attribute
    • addAttribute

      public void addAttribute(String attName, String attValue)
      Adds specified attribute to this tag or overrides existing one.
      Specified by:
      addAttribute in class TagToken
      Parameters:
      attName -
      attValue -
    • removeAttribute

      public void removeAttribute(String attName)
      Removes specified attribute from this tag.
      Parameters:
      attName -
    • getChildren

      @Deprecated public List<TagNode> getChildren()
      Deprecated.
      Use getChildTagList() instead; this method will be refactored and possibly removed in future versions. TODO: This method should be refactored because it does not properly follow the common Java getter/setter convention.
      Returns:
      List of child TagNode objects.
    • setChildren

      public void setChildren(List<? extends BaseToken> children)
    • getAllChildren

      public List<? extends BaseToken> getAllChildren()
    • getChildTagList

      public List<TagNode> getChildTagList()
      Returns:
      List of child TagNode objects.
    • hasChildren

      public boolean hasChildren()
      Returns:
      Whether this node has child elements or not.
    • getChildTags

      public TagNode[] getChildTags()
      Returns:
      An array of child TagNode instances.
    • getText

      public CharSequence getText()
      Returns:
      Text content of this node and its subelements.
    • getChildIndex

      public int getChildIndex(HtmlNode child)
      Parameters:
      child - Child to find index of
      Returns:
      Index of the specified child node among this node's children, or -1 if the node is not a child.
    • insertChild

      public void insertChild(int index, HtmlNode childToAdd)
      Inserts the specified node at the specified position in the list of children.
      Parameters:
      index -
      childToAdd -
    • insertChildBefore

      public void insertChildBefore(HtmlNode node, HtmlNode nodeToInsert)
      Inserts specified node in the list of children before specified child
      Parameters:
      node - Child before which to insert new node
      nodeToInsert - Node to be inserted at specified position
    • insertChildAfter

      public void insertChildAfter(HtmlNode node, HtmlNode nodeToInsert)
      Inserts specified node in the list of children after specified child
      Parameters:
      node - Child after which to insert new node
      nodeToInsert - Node to be inserted at specified position
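      The index and insertion methods above can be pictured with a plain list. The following is a minimal sketch using only the JDK, not HtmlCleaner's implementation: the class ChildInsertionSketch is hypothetical, and both the identity comparison in getChildIndex and the choice to do nothing when the reference node is not a child are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node's child list; not HtmlCleaner code.
final class ChildInsertionSketch {
    private final List<Object> children = new ArrayList<>();

    void addChild(Object child) {
        children.add(child);
    }

    /** Returns the position of the child, or -1 if absent (identity comparison is an assumption). */
    int getChildIndex(Object child) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == child) {
                return i;
            }
        }
        return -1;
    }

    void insertChild(int index, Object childToAdd) {
        children.add(index, childToAdd);
    }

    /** Inserts before the reference child; assumed to be a no-op if the reference is not a child. */
    void insertChildBefore(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index, nodeToInsert);
        }
    }

    /** Inserts after the reference child; assumed to be a no-op if the reference is not a child. */
    void insertChildAfter(Object node, Object nodeToInsert) {
        int index = getChildIndex(node);
        if (index >= 0) {
            insertChild(index + 1, nodeToInsert);
        }
    }

    List<Object> getAllChildren() {
        return children;
    }

    public static void main(String[] args) {
        ChildInsertionSketch parent = new ChildInsertionSketch();
        String first = "first";
        String last = "last";
        parent.addChild(first);
        parent.addChild(last);
        parent.insertChildAfter(first, "middle");
        parent.insertChildBefore(first, "zeroth");
        System.out.println(parent.getAllChildren()); // prints [zeroth, first, middle, last]
    }
}
```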
    • getDocType

      public DoctypeToken getDocType()
    • setDocType

      public void setDocType(DoctypeToken docType)
    • addChild

      public void addChild(Object child)
    • addChildren

      public void addChildren(List newChildren)
      Add all elements from specified list to this node.
      Parameters:
      newChildren -
    • findElement

      private TagNode findElement(ITagNodeCondition condition, boolean isRecursive)
      Finds the first element in the tree that satisfies the specified condition.
      Parameters:
      condition -
      isRecursive -
      Returns:
      First TagNode found, or null if there is no such element.
    • findMatchingTagNodes

      private List<TagNode> findMatchingTagNodes(ITagNodeCondition condition, boolean isRecursive)
      Get all elements in the tree that satisfy specified condition.
      Parameters:
      condition -
      isRecursive -
      Returns:
      List of TagNode instances.
    • getElementList

      public List<? extends TagNode> getElementList(ITagNodeCondition condition, boolean isRecursive)
      Get all elements in the tree that satisfy specified condition.
      Parameters:
      condition -
      isRecursive -
      Returns:
      List of TagNode instances that satisfy the specified condition.
    • getElements

      private TagNode[] getElements(ITagNodeCondition condition, boolean isRecursive)
      Parameters:
      condition -
      isRecursive -
      Returns:
      The array of all subelements that satisfy specified condition.
    • getAllElementsList

      public List<? extends TagNode> getAllElementsList(boolean isRecursive)
    • getAllElements

      public TagNode[] getAllElements(boolean isRecursive)
    • findElementByName

      public TagNode findElementByName(String findName, boolean isRecursive)
    • getElementListByName

      public List<? extends TagNode> getElementListByName(String findName, boolean isRecursive)
    • getElementsByName

      public TagNode[] getElementsByName(String findName, boolean isRecursive)
    • findElementHavingAttribute

      public TagNode findElementHavingAttribute(String attName, boolean isRecursive)
    • getElementListHavingAttribute

      public List<? extends TagNode> getElementListHavingAttribute(String attName, boolean isRecursive)
    • getElementsHavingAttribute

      public TagNode[] getElementsHavingAttribute(String attName, boolean isRecursive)
    • findElementByAttValue

      public TagNode findElementByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
    • getElementListByAttValue

      public List<? extends TagNode> getElementListByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
    • getElementsByAttValue

      public TagNode[] getElementsByAttValue(String attName, String attValue, boolean isRecursive, boolean isCaseSensitive)
    • evaluateXPath

      public Object[] evaluateXPath(String xPathExpression) throws XPatherException
      Evaluates an XPath expression on the given node.
      This is not a fully featured XPath parser and evaluator. The examples below show the supported elements:
      • //div//a
      • //div//a[@id][@class]
      • /body/*[1]/@type
      • //div[3]//a[@id][@href='r/n4']
      • (//div[last() >= 4]//./div[position() = last()])[position() > 22]//li[2]//a
      • //div[2]/@*[2]
      • data(//div//a[@id][@class])
      • //p/last()
      • //body//div[3][@class]//span[12.2<position()]/@id
      • data(//a['v' < @id])
      Parameters:
      xPathExpression -
      Returns:
      result of XPather evaluation.
      Throws:
      XPatherException
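      To make one of the simpler expressions above concrete, the sketch below evaluates //div//a[@id][@class] on a small well-formed document. It deliberately uses the JDK's built-in javax.xml.xpath engine rather than HtmlCleaner's own evaluator, so it only illustrates standard XPath semantics for this expression; the class XPathSemanticsSketch, its helper selectIds and the sample markup are hypothetical, and results from evaluateXPath may differ for expressions outside the supported subset.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration with the JDK XPath engine; this is NOT HtmlCleaner's XPath evaluator.
final class XPathSemanticsSketch {

    /** Returns the id attribute of every element selected by the expression, in document order. */
    static List<String> selectIds(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><div>"
                + "<a id=\"one\" class=\"nav\" href=\"r/n4\">first</a>"
                + "<a id=\"two\">second</a>"
                + "<p><a id=\"three\" class=\"nav\">third</a></p>"
                + "</div></body></html>";
        // Only anchors below a div that carry both an id and a class attribute match.
        System.out.println(selectIds(xml, "//div//a[@id][@class]")); // prints [one, three]
    }
}
```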
    • removeFromTree

      public boolean removeFromTree()
      Remove this node from the tree.
      Returns:
      True if the element is removed (that is, if it is not the root node).
    • removeChild

      public boolean removeChild(Object child)
      Remove specified child element from this node.
      Parameters:
      child -
      Returns:
      True if child object existed in the children list.
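      The return values of removeFromTree and removeChild can be illustrated with a tiny parent/child model. This is a sketch under assumptions, not the library's code: the classes RemovalSketch and Node are hypothetical, and removeFromTree is assumed to delegate to the parent's removeChild.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented return values; not HtmlCleaner code.
final class RemovalSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String name) {
            this.name = name;
        }

        void addChild(Node child) {
            child.parent = this;
            children.add(child);
        }

        /** True if the child existed in the children list. */
        boolean removeChild(Node child) {
            return children.remove(child);
        }

        /** True if the node was removed; a root node has no parent and cannot be removed. */
        boolean removeFromTree() {
            return parent != null && parent.removeChild(this);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("html");
        Node body = new Node("body");
        root.addChild(body);
        System.out.println(body.removeFromTree()); // prints true
        System.out.println(root.removeFromTree()); // prints false
        System.out.println(root.removeChild(body)); // prints false, body is already gone
    }
}
```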
    • removeAllChildren

      public void removeAllChildren()
      Removes all children (subelements and text content).
    • addItemForMoving

      void addItemForMoving(Object item)
    • getItemsToMove

      List<? extends BaseToken> getItemsToMove()
    • setItemsToMove

      void setItemsToMove(List<BaseToken> itemsToMove)
    • isFormed

      boolean isFormed()
    • setFormed

      void setFormed(boolean isFormed)
    • setFormed

      void setFormed()
    • setAutoGenerated

      public void setAutoGenerated(boolean autoGenerated)
      Parameters:
      autoGenerated - the autoGenerated to set
    • isAutoGenerated

      public boolean isAutoGenerated()
      Returns:
      the autoGenerated
    • isPruned

      public boolean isPruned()
      Returns:
      true, if node was marked to be pruned.
    • setPruned

      public void setPruned(boolean pruned)
    • isEmpty

      public boolean isEmpty()
    • addNamespaceDeclaration

      public void addNamespaceDeclaration(String nsPrefix, String nsURI)
      Adds namespace declaration to the node
      Parameters:
      nsPrefix - Namespace prefix
      nsURI - Namespace URI
    • collectNamespacePrefixesOnPath

      void collectNamespacePrefixesOnPath(Set<String> prefixes)
      Collect all prefixes in namespace declarations up the path to the document root from the specified node
      Parameters:
      prefixes - Set of prefixes to be collected
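      Collecting prefixes "up the path" means visiting the node itself and then each ancestor in turn, while ignoring siblings. A minimal sketch of that walk follows, assuming each node keeps a parent reference and its own declaration map; the classes NamespacePathSketch and Node are hypothetical and the namespace URIs are sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of walking from a node to the root; not HtmlCleaner code.
final class NamespacePathSketch {

    static final class Node {
        final Node parent;
        final Map<String, String> nsDeclarations = new LinkedHashMap<>();

        Node(Node parent) {
            this.parent = parent;
        }

        void addNamespaceDeclaration(String nsPrefix, String nsURI) {
            nsDeclarations.put(nsPrefix, nsURI);
        }

        /** Adds this node's prefixes, then recurses into the parent until the root is reached. */
        void collectNamespacePrefixesOnPath(Set<String> prefixes) {
            prefixes.addAll(nsDeclarations.keySet());
            if (parent != null) {
                parent.collectNamespacePrefixesOnPath(prefixes);
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node(null);
        root.addNamespaceDeclaration("svg", "http://www.w3.org/2000/svg");
        Node child = new Node(root);
        child.addNamespaceDeclaration("xlink", "http://www.w3.org/1999/xlink");
        Node sibling = new Node(root);
        sibling.addNamespaceDeclaration("m", "http://www.w3.org/1998/Math/MathML");

        Set<String> prefixes = new TreeSet<>();
        child.collectNamespacePrefixesOnPath(prefixes);
        // The sibling's prefix is not on the path from child to root.
        System.out.println(prefixes); // prints [svg, xlink]
    }
}
```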
    • getNamespaceURIOnPath

      String getNamespaceURIOnPath(String nsPrefix)
    • getNamespaceDeclarations

      public Map<String,String> getNamespaceDeclarations()
      Returns:
      Map of namespace declarations for this node
    • serialize

      public void serialize(Serializer serializer, Writer writer) throws IOException
      Specified by:
      serialize in interface BaseToken
      Overrides:
      serialize in class BaseHtmlNode
      Throws:
      IOException
    • makeCopy

      public TagNode makeCopy()
    • isCopy

      public boolean isCopy()
    • traverse

      public void traverse(TagNodeVisitor visitor)
      Traverses the tree and performs the visitor's action on each node. Traversal stops when the whole tree has been visited or when the visitor returns false.
      Parameters:
      visitor - TagNodeVisitor implementation
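      The stop-on-false contract can be sketched with a small tree and a visitor callback. The code below is a JDK-only illustration, not the library's traversal: the Visitor interface, the Node class and the helper visitUntil are hypothetical, and the depth-first, parent-before-children order is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a visitor-driven traversal; not HtmlCleaner code.
final class TraversalSketch {

    /** Stand-in visitor: return false to stop the traversal. */
    interface Visitor {
        boolean visit(Node parent, Node node);
    }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        void traverse(Visitor visitor) {
            traverseInternally(visitor);
        }

        /** Visits this node, then its children; returns false as soon as the visitor asks to stop. */
        private boolean traverseInternally(Visitor visitor) {
            if (!visitor.visit(parent, this)) {
                return false;
            }
            // Iterate over a copy so a visitor may modify the child list safely.
            for (Node child : new ArrayList<>(children)) {
                if (!child.traverseInternally(visitor)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Records visited names and stops right after the node called stopAt. */
    static List<String> visitUntil(Node root, String stopAt) {
        List<String> seen = new ArrayList<>();
        root.traverse((parent, node) -> {
            seen.add(node.name);
            return !node.name.equals(stopAt);
        });
        return seen;
    }

    public static void main(String[] args) {
        Node body = new Node("body").add(new Node("p")).add(new Node("div"));
        Node root = new Node("html").add(new Node("head")).add(body);
        System.out.println(visitUntil(root, "p"));    // prints [html, head, body, p]
        System.out.println(visitUntil(root, "none")); // prints [html, head, body, p, div]
    }
}
```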
    • traverseInternally

      private boolean traverseInternally(TagNodeVisitor visitor)
    • isForeignMarkup

      public boolean isForeignMarkup()
      Returns:
      the isForeignMarkup
    • setForeignMarkup

      public void setForeignMarkup(boolean isForeignMarkup)
      Parameters:
      isForeignMarkup - the isForeignMarkup to set
    • isTrimAttributeValues

      public boolean isTrimAttributeValues()
      Returns:
      the isTrimAttributeValues
    • setTrimAttributeValues

      public void setTrimAttributeValues(boolean isTrimAttributeValues)
      Parameters:
      isTrimAttributeValues - the isTrimAttributeValues to set
    • attributesToLowerCase

      private Map<String,String> attributesToLowerCase()
      Returns a copy of the set of attributes for this node with lowercase names. Where there are duplicate attributes (e.g. class, CLASS) the first value is retained.
      Returns:
      a map of attributes in key/value pairs with names in lowercase
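      The "first value is retained" rule for duplicate names can be sketched with an insertion-ordered map. This is a minimal illustration of the documented behaviour, not the library's implementation: the class LowerCaseAttributesSketch and its method toLowerCaseNames are hypothetical, and the use of Locale.ROOT for lowercasing is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that models the documented behaviour; not HtmlCleaner code.
final class LowerCaseAttributesSketch {

    /** Returns a copy of the map with lowercase names; on a name clash the first value wins. */
    static Map<String, String> toLowerCaseNames(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            // putIfAbsent keeps the value that was stored first for this key.
            result.putIfAbsent(key, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("class", "first");
        attributes.put("CLASS", "second");
        attributes.put("Href", "/index.html");
        System.out.println(toLowerCaseNames(attributes)); // prints {class=first, href=/index.html}
    }
}
```

      The input map is left untouched, which matches the description of the result as a copy.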
    • handleInterruption

      private void handleInterruption()
      Called whenever the thread is interrupted. Currently this is a placeholder, but it could hold cleanup logic and user interaction.