Package org.w3c.tidy
Class Node
- java.lang.Object
-
- org.w3c.tidy.Node
-
public class Node extends java.lang.Object
Used for elements and text nodes. The element name is null for text nodes. start and end are offsets into lexbuf, which contains the textual content of all elements in the parse tree. parent and content allow traversal of the parse tree in any direction. Attributes are represented as a linked list of AttVal nodes, which hold the strings for attribute/value pairs.
- Version:
- $Revision$ ($Author$)
- Author:
- Dave Raggett dsr@w3.org , Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina
-
-
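The linkage described in the class comment (parent, prev, next, content, last) can be pictured with a small standalone model. The `SketchNode` class below is hypothetical and is not the JTidy implementation; it only borrows the field names from this page, and it assumes that content points at the first child and last at the last child.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parse-tree node; not the real org.w3c.tidy.Node. */
final class SketchNode {
    final String element;   // tag name, null for text nodes
    SketchNode parent;      // enclosing node
    SketchNode prev;        // previous sibling
    SketchNode next;        // next sibling
    SketchNode content;     // first child (assumed meaning)
    SketchNode last;        // last child (assumed meaning)

    SketchNode(String element) {
        this.element = element;
    }

    /** Appends a child and updates every link that is affected. */
    void appendChild(SketchNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    /** Collects the element names of the direct children, first to last. */
    List<String> childNames() {
        List<String> names = new ArrayList<>();
        for (SketchNode n = content; n != null; n = n.next) {
            names.add(n.element);
        }
        return names;
    }

    public static void main(String[] args) {
        SketchNode body = new SketchNode("body");
        body.appendChild(new SketchNode("h1"));
        body.appendChild(new SketchNode("p"));
        System.out.println(body.childNames()); // prints [h1, p]
    }
}
```

Keeping both a first-child and a last-child pointer makes appending constant time, and the prev/next pair lets siblings be walked in either direction.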
Field Summary
Fields (each entry: modifier and type, field name, description)
protected org.w3c.dom.Node adapter - DOM adapter.
static short ASP_TAG - node type: asp tag.
protected AttVal attributes - Attribute/Value linked list.
static short CDATA_TAG - node type: CDATA.
protected boolean closed - true if closed by explicit end tag.
static short COMMENT_TAG - node type: comment.
protected Node content - Contained node.
static short DOCTYPE_TAG - node type: doctype.
protected java.lang.String element - Tag name.
protected int end - end of span onto text array.
static short END_TAG - End tag.
protected boolean implicit - true if inferred.
static short JSTE_TAG - node type: jste tag.
protected Node last - last node.
protected boolean linebreak - true if followed by a line break.
protected Node next - next node.
protected Node parent - parent node.
static short PHP_TAG - node type: php tag.
protected Node prev - previous node.
static short PROC_INS_TAG - node type: processing instruction.
static short ROOT_NODE - node type: root.
static short SECTION_TAG - node type: section tag.
protected int start - start of span onto text array.
static short START_END_TAG - Start-end tag, i.e. a self-closing tag such as <br/>.
static short START_TAG - Start tag.
protected Dict tag - tag's dictionary definition.
static short TEXT_NODE - node type: text.
protected byte[] textarray - the text array.
protected short type - TextNode, StartTag, EndTag etc.
protected Dict was - old tag when it was changed.
static short XML_DECL - node type: XML declaration.
-
Method Summary
Methods (each entry: modifier and type, method, description)
void addAttribute(java.lang.String name, java.lang.String value) - Adds an attribute to the node.
void addClass(java.lang.String classname) - Add a CSS class to the node.
void checkAttributes(Lexer lexer) - Default method for checking an element's attributes.
boolean checkNodeIntegrity() - Checks for node integrity.
protected Node cloneNode(boolean deep) - Clone this node.
static void coerceNode(Lexer lexer, Node node, Dict tag) - Coerce a node.
void discardDocType() - Discard the doctype node.
static Node discardElement(Node element) - Remove node from markup tree and discard it.
protected static Node escapeTag(Lexer lexer, Node element) - Escapes the given tag.
boolean expectsContent() - Does the node expect contents?
Node findBody(TagTable tt) - Find the body node.
Node findDocType() - Find the doctype element.
Node findHEAD(TagTable tt) - Find the head tag.
Node findHTML(TagTable tt) - Find the "html" element.
Node findTITLE(TagTable tt)
static void fixEmptyRow(Lexer lexer, Node row) - If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
protected org.w3c.dom.Node getAdapter() - Returns a DOM Node which wraps the current tidy Node.
AttVal getAttrByName(java.lang.String name) - Returns an attribute with the given name in the current node.
boolean hasOneChild() - Does the node have one (and only one) child?
static void insertDocType(Lexer lexer, Node element, Node doctype) - The doctype has been found after other tags, and needs moving to before the html element.
static boolean insertMisc(Node element, Node node) - Insert a node at the end.
void insertNodeAfterElement(Node node) - Insert node into markup tree after this element.
static void insertNodeAsParent(Node element, Node node) - Insert node into markup tree in place of element, which is moved to become the child of the node.
void insertNodeAtEnd(Node node) - Insert node into markup tree at the end of this node's content.
void insertNodeAtStart(Node node) - Insert a node into markup tree at the start of this node's content.
static void insertNodeBeforeElement(Node element, Node node) - Insert node into markup tree before element.
boolean isBlank(Lexer lexer) - Is the node content empty or blank? Assumes node is a text node.
boolean isDescendantOf(Dict tag) - Is this node contained in a given tag?
boolean isElement() - Is the node an element?
boolean isJavaScript() - Used to check script node for script language.
boolean isNewNode() - Is this a new (user defined) node? Used to determine how attributes without values should be printed.
static void moveBeforeTable(Node row, Node node, TagTable tt) - Unexpected content in table row is moved to just before the table in accordance with Netscape and IE.
void removeAttribute(AttVal attr) - Remove an attribute from node and then free it.
void removeNode() - Extract this node and its children from a markup tree.
void repairDuplicateAttributes(Lexer lexer) - The same attribute name can't be used more than once in each element.
protected void setType(short newType) - Setter for node type.
java.lang.String toString()
static void trimEmptyElement(Lexer lexer, Node element) - Trim an empty element.
static void trimInitialSpace(Lexer lexer, Node element, Node text) - This maps <p>hello<em> world</em> to <p>hello <em>world</em>.
static void trimSpaces(Lexer lexer, Node element) - Move initial and trailing space out.
static void trimTrailingSpace(Lexer lexer, Node element, Node last) - This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
-
-
-
Field Detail
-
ROOT_NODE
public static final short ROOT_NODE
node type: root.- See Also:
- Constant Field Values
-
DOCTYPE_TAG
public static final short DOCTYPE_TAG
node type: doctype.- See Also:
- Constant Field Values
-
COMMENT_TAG
public static final short COMMENT_TAG
node type: comment.- See Also:
- Constant Field Values
-
PROC_INS_TAG
public static final short PROC_INS_TAG
node type: processing instruction.- See Also:
- Constant Field Values
-
TEXT_NODE
public static final short TEXT_NODE
node type: text.- See Also:
- Constant Field Values
-
START_TAG
public static final short START_TAG
Start tag.- See Also:
- Constant Field Values
-
END_TAG
public static final short END_TAG
End tag.- See Also:
- Constant Field Values
-
START_END_TAG
public static final short START_END_TAG
Start-end tag, i.e. a self-closing tag such as <br/>.- See Also:
- Constant Field Values
-
CDATA_TAG
public static final short CDATA_TAG
node type: CDATA.- See Also:
- Constant Field Values
-
SECTION_TAG
public static final short SECTION_TAG
node type: section tag.- See Also:
- Constant Field Values
-
ASP_TAG
public static final short ASP_TAG
node type: asp tag.- See Also:
- Constant Field Values
-
JSTE_TAG
public static final short JSTE_TAG
node type: jste tag.- See Also:
- Constant Field Values
-
PHP_TAG
public static final short PHP_TAG
node type: php tag.- See Also:
- Constant Field Values
-
XML_DECL
public static final short XML_DECL
node type: XML declaration.- See Also:
- Constant Field Values
-
parent
protected Node parent
parent node.
-
prev
protected Node prev
previous node.
-
next
protected Node next
next node.
-
last
protected Node last
last node.
-
start
protected int start
start of span onto text array.
-
end
protected int end
end of span onto text array.
-
textarray
protected byte[] textarray
the text array.
-
type
protected short type
TextNode, StartTag, EndTag etc.
-
closed
protected boolean closed
true if closed by explicit end tag.
-
implicit
protected boolean implicit
true if inferred.
-
linebreak
protected boolean linebreak
true if followed by a line break.
-
was
protected Dict was
old tag when it was changed.
-
tag
protected Dict tag
tag's dictionary definition.
-
element
protected java.lang.String element
Tag name.
-
attributes
protected AttVal attributes
Attribute/Value linked list.
-
content
protected Node content
Contained node.
-
adapter
protected org.w3c.dom.Node adapter
DOM adapter.
-
-
Constructor Detail
-
Node
public Node()
Instantiates a new text node.
-
Node
public Node(short type, byte[] textarray, int start, int end)
Instantiates a new node.
- Parameters:
type
- node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray
- array of bytes contained in the Node
start
- start position
end
- end position
-
Node
public Node(short type, byte[] textarray, int start, int end, java.lang.String element, TagTable tt)
Instantiates a new node.
- Parameters:
type
- node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
textarray
- array of bytes contained in the Node
start
- start position
end
- end position
element
- tag name
tt
- tag table instance
-
-
Method Detail
-
getAttrByName
public AttVal getAttrByName(java.lang.String name)
Returns an attribute with the given name in the current node.
- Parameters:
name
- attribute name.
- Returns:
- AttVal instance or null if no attribute with the given name is found
-
checkAttributes
public void checkAttributes(Lexer lexer)
Default method for checking an element's attributes.
- Parameters:
lexer
- Lexer
-
repairDuplicateAttributes
public void repairDuplicateAttributes(Lexer lexer)
The same attribute name can't be used more than once in each element. Discard or join attributes according to configuration.
- Parameters:
lexer
- Lexer
-
addAttribute
public void addAttribute(java.lang.String name, java.lang.String value)
Adds an attribute to the node.
- Parameters:
name
- attribute name
value
- attribute value
-
removeAttribute
public void removeAttribute(AttVal attr)
Remove an attribute from node and then free it.
- Parameters:
attr
- attribute to remove
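As a rough illustration of the attribute methods above, here is a minimal sketch of a singly linked attribute list with lookup, append and removal. `AttrCell` and `AttrListSketch` are hypothetical stand-ins, not the real AttVal or Node classes; the case-insensitive name match and the append-at-tail behaviour are assumptions made for the example.

```java
/** Hypothetical attribute/value cell, standing in for AttVal. */
final class AttrCell {
    final String name;
    final String value;
    AttrCell next; // next attribute in the list, or null

    AttrCell(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical holder for the list head; not the real org.w3c.tidy.Node. */
final class AttrListSketch {
    AttrCell attributes; // head of the linked list

    /** Returns the first cell with the given name, or null if there is none. */
    AttrCell getAttrByName(String name) {
        for (AttrCell a = attributes; a != null; a = a.next) {
            if (a.name.equalsIgnoreCase(name)) { // assumption: case-insensitive match
                return a;
            }
        }
        return null;
    }

    /** Appends a new cell at the tail of the list (assumed position). */
    void addAttribute(String name, String value) {
        AttrCell cell = new AttrCell(name, value);
        if (attributes == null) {
            attributes = cell;
            return;
        }
        AttrCell tail = attributes;
        while (tail.next != null) {
            tail = tail.next;
        }
        tail.next = cell;
    }

    /** Unlinks the given cell; does nothing if it is not in the list. */
    void removeAttribute(AttrCell attr) {
        AttrCell prev = null;
        for (AttrCell a = attributes; a != null; prev = a, a = a.next) {
            if (a == attr) {
                if (prev == null) {
                    attributes = a.next;
                } else {
                    prev.next = a.next;
                }
                a.next = null;
                return;
            }
        }
    }

    public static void main(String[] args) {
        AttrListSketch node = new AttrListSketch();
        node.addAttribute("href", "index.html");
        node.addAttribute("class", "nav");
        node.removeAttribute(node.getAttrByName("href"));
        System.out.println(node.getAttrByName("href"));        // prints null
        System.out.println(node.getAttrByName("class").value); // prints nav
    }
}
```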
-
findDocType
public Node findDocType()
Find the doctype element.
- Returns:
- doctype node or null if not found
-
discardDocType
public void discardDocType()
Discard the doctype node.
-
discardElement
public static Node discardElement(Node element)
Remove node from markup tree and discard it.
- Parameters:
element
- discarded node
- Returns:
- next node
-
insertNodeAtStart
public void insertNodeAtStart(Node node)
Insert a node into markup tree at the start of this node's content.
- Parameters:
node
- node to insert
-
insertNodeAtEnd
public void insertNodeAtEnd(Node node)
Insert node into markup tree at the end of this node's content.
- Parameters:
node
- Node to insert
-
insertNodeAsParent
public static void insertNodeAsParent(Node element, Node node)
Insert node into markup tree in place of element, which is moved to become the child of the node.
- Parameters:
element
- child node. Will be inserted as a child of node
node
- parent node
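The re-linking that insertNodeAsParent describes can be sketched on a standalone model. `WrapNode` is a hypothetical class that only mirrors the link fields listed on this page, and the sketch assumes the wrapping node starts out with no children and no links of its own.

```java
/** Hypothetical node model; not the real org.w3c.tidy.Node. */
final class WrapNode {
    final String name;
    WrapNode parent, prev, next, content, last;

    WrapNode(String name) {
        this.name = name;
    }

    /** Appends a child and returns this node so calls can be chained. */
    WrapNode append(WrapNode child) {
        child.parent = this;
        child.prev = last;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
        return this;
    }

    /** Puts node where element was and makes element the only child of node. */
    static void insertAsParent(WrapNode element, WrapNode node) {
        // node takes over element's position among its siblings
        node.parent = element.parent;
        node.prev = element.prev;
        node.next = element.next;
        if (node.prev != null) {
            node.prev.next = node;
        } else if (node.parent != null) {
            node.parent.content = node;
        }
        if (node.next != null) {
            node.next.prev = node;
        } else if (node.parent != null) {
            node.parent.last = node;
        }
        // element becomes the single child of node
        element.parent = node;
        element.prev = null;
        element.next = null;
        node.content = element;
        node.last = element;
    }

    /** Space-separated names of the direct children. */
    String children() {
        StringBuilder sb = new StringBuilder();
        for (WrapNode n = content; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WrapNode b = new WrapNode("b");
        WrapNode body = new WrapNode("body")
                .append(new WrapNode("a")).append(b).append(new WrapNode("c"));
        WrapNode div = new WrapNode("div");
        insertAsParent(b, div);
        System.out.println(body.children()); // prints a div c
        System.out.println(div.children());  // prints b
    }
}
```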
-
insertNodeBeforeElement
public static void insertNodeBeforeElement(Node element, Node node)
Insert node into markup tree before element.
- Parameters:
element
- existing node; node will be inserted before it
node
- node to insert
-
insertNodeAfterElement
public void insertNodeAfterElement(Node node)
Insert node into markup tree after this element.
- Parameters:
node
- new node to insert
-
trimEmptyElement
public static void trimEmptyElement(Lexer lexer, Node element)
Trim an empty element.
- Parameters:
lexer
- Lexer
element
- empty node to be removed
-
trimTrailingSpace
public static void trimTrailingSpace(Lexer lexer, Node element, Node last)
This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>. If the last child of element is a text node, then trim the trailing white space character, moving it to after element's end tag.
- Parameters:
lexer
- Lexer
element
- node
last
- last child of element
-
escapeTag
protected static Node escapeTag(Lexer lexer, Node element)
Escapes the given tag.
- Parameters:
lexer
- Lexer
element
- node to be escaped
- Returns:
- escaped node
-
isBlank
public boolean isBlank(Lexer lexer)
Is the node content empty or blank? Assumes node is a text node.
- Parameters:
lexer
- Lexer
- Returns:
- true if the node content is empty or blank
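To make the start/end span idea concrete, the sketch below models a text node as a pair of offsets into a shared byte array and checks whether that span is blank. `SpanSketch` is hypothetical; it assumes that end is exclusive and that "blank" means empty or whitespace only, which may be broader than the test the real method applies.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical text span over a shared byte buffer; not the real org.w3c.tidy.Node. */
final class SpanSketch {
    final byte[] textarray; // shared buffer holding the document text
    final int start;        // first byte of this node's text
    final int end;          // one past the last byte (assumed to be exclusive)

    SpanSketch(byte[] textarray, int start, int end) {
        this.textarray = textarray;
        this.start = start;
        this.end = end;
    }

    /** Decodes the bytes covered by this span. */
    String text() {
        return new String(textarray, start, end - start, StandardCharsets.UTF_8);
    }

    /** True if the span is empty or holds only whitespace bytes (assumed definition). */
    boolean isBlank() {
        for (int i = start; i < end; i++) {
            if (!Character.isWhitespace((char) (textarray[i] & 0xFF))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buf = "<p> </p>hi".getBytes(StandardCharsets.UTF_8);
        System.out.println(new SpanSketch(buf, 3, 4).isBlank());  // prints true
        System.out.println(new SpanSketch(buf, 8, 10).isBlank()); // prints false
        System.out.println(new SpanSketch(buf, 8, 10).text());    // prints hi
    }
}
```

Because every node points into the same buffer, no text is copied when nodes are created; only the two offsets differ.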
-
trimInitialSpace
public static void trimInitialSpace(Lexer lexer, Node element, Node text)
This maps <p>hello<em> world</em> to <p>hello <em>world</em>. Trims initial space by moving it before the start tag or, if this element is the first in its parent's content, by discarding the space.
- Parameters:
lexer
- Lexer
element
- parent node
text
- text node
-
trimSpaces
public static void trimSpaces(Lexer lexer, Node element)
Move initial and trailing space out. This routine maps hello<em> world</em> to hello <em>world</em>, and <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
- Parameters:
lexer
- Lexer
element
- Node
-
isDescendantOf
public boolean isDescendantOf(Dict tag)
Is this node contained in a given tag?
- Parameters:
tag
- tag to look for among this node's ancestors
- Returns:
- true if node is contained in tag
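A containment check of this kind is usually a walk up the parent chain. The sketch below shows that idea with a hypothetical `AncestorSketch` class that identifies tags by name; the real method takes a Dict, and starting the walk at the parent rather than at the node itself is an assumption.

```java
/** Hypothetical node with a parent link; not the real org.w3c.tidy.Node. */
final class AncestorSketch {
    final String tag;            // tag name used in place of a Dict entry
    final AncestorSketch parent; // enclosing node, null at the root

    AncestorSketch(String tag, AncestorSketch parent) {
        this.tag = tag;
        this.parent = parent;
    }

    /** True if any ancestor of this node has the wanted tag name. */
    boolean isDescendantOf(String wanted) {
        for (AncestorSketch p = parent; p != null; p = p.parent) {
            if (wanted.equals(p.tag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AncestorSketch table = new AncestorSketch("table", null);
        AncestorSketch tr = new AncestorSketch("tr", table);
        AncestorSketch td = new AncestorSketch("td", tr);
        System.out.println(td.isDescendantOf("table")); // prints true
        System.out.println(td.isDescendantOf("ul"));    // prints false
    }
}
```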
-
insertDocType
public static void insertDocType(Lexer lexer, Node element, Node doctype)
The doctype has been found after other tags, and needs moving to before the html element.
- Parameters:
lexer
- Lexer
element
- document
doctype
- doctype node to insert at the beginning of element
-
findBody
public Node findBody(TagTable tt)
Find the body node.
- Parameters:
tt
- tag table
- Returns:
- body node
-
isElement
public boolean isElement()
Is the node an element?
- Returns:
- true if type is START_TAG or START_END_TAG
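The element check described above amounts to comparing the node type against two constants. In the sketch below the numeric values are assumed for illustration only (the real ones are listed under Constant Field Values), and `TypeCheckSketch` is a hypothetical class. The point it illustrates is that the test is a logical either/or comparison, not a bitwise OR of the two constants.

```java
/** Hypothetical type check; constant values are assumed, not taken from JTidy. */
final class TypeCheckSketch {
    static final short TEXT_NODE = 4;     // assumed value
    static final short START_TAG = 5;     // assumed value
    static final short END_TAG = 6;       // assumed value
    static final short START_END_TAG = 7; // assumed value

    /** True for start tags and start-end tags, false for every other type. */
    static boolean isElement(short type) {
        return type == START_TAG || type == START_END_TAG;
    }

    public static void main(String[] args) {
        System.out.println(isElement(START_TAG));     // prints true
        System.out.println(isElement(START_END_TAG)); // prints true
        System.out.println(isElement(TEXT_NODE));     // prints false
        System.out.println(isElement(END_TAG));       // prints false
    }
}
```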
-
moveBeforeTable
public static void moveBeforeTable(Node row, Node node, TagTable tt)
Unexpected content in table row is moved to just before the table in accordance with Netscape and IE. This code assumes that node hasn't been inserted into the row.
- Parameters:
row
- Row node
node
- Node which should be moved before the table
tt
- tag table
-
fixEmptyRow
public static void fixEmptyRow(Lexer lexer, Node row)
If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
- Parameters:
lexer
- Lexer
row
- row node
-
coerceNode
public static void coerceNode(Lexer lexer, Node node, Dict tag)
Coerce a node.
- Parameters:
lexer
- Lexer
node
- Node
tag
- tag dictionary reference
-
removeNode
public void removeNode()
Extract this node and its children from a markup tree.
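Extracting a node while keeping its children attached comes down to unlinking it from its parent and siblings only. The following is a minimal sketch on a hypothetical `DetachNode` model, not the JTidy source.

```java
/** Hypothetical node model; not the real org.w3c.tidy.Node. */
final class DetachNode {
    final String name;
    DetachNode parent, prev, next, content, last;

    DetachNode(String name) {
        this.name = name;
    }

    /** Appends a child and returns this node so calls can be chained. */
    DetachNode append(DetachNode child) {
        child.parent = this;
        child.prev = last;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
        return this;
    }

    /** Unlinks this node from its parent and siblings; its own children stay attached. */
    void removeNode() {
        if (parent != null) {
            if (parent.content == this) {
                parent.content = next;
            }
            if (parent.last == this) {
                parent.last = prev;
            }
        }
        if (prev != null) {
            prev.next = next;
        }
        if (next != null) {
            next.prev = prev;
        }
        parent = null;
        prev = null;
        next = null;
    }

    /** Space-separated names of the direct children. */
    String children() {
        StringBuilder sb = new StringBuilder();
        for (DetachNode n = content; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DetachNode li2 = new DetachNode("li2").append(new DetachNode("em"));
        DetachNode ul = new DetachNode("ul")
                .append(new DetachNode("li1")).append(li2).append(new DetachNode("li3"));
        li2.removeNode();
        System.out.println(ul.children());  // prints li1 li3
        System.out.println(li2.children()); // prints em
    }
}
```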
-
insertMisc
public static boolean insertMisc(Node element, Node node)
Insert a node at the end.
- Parameters:
element
- parent node
node
- will be inserted at the end of element
- Returns:
- true if the node has been inserted
-
isNewNode
public boolean isNewNode()
Is this a new (user defined) node? Used to determine how attributes without values should be printed. This was introduced to deal with user defined tags e.g. Cold Fusion.
- Returns:
- true if this node represents a user-defined tag.
-
hasOneChild
public boolean hasOneChild()
Does the node have one (and only one) child?
- Returns:
- true if the node has one child
-
findHTML
public Node findHTML(TagTable tt)
Find the "html" element.
- Parameters:
tt
- tag table
- Returns:
- html node
-
findHEAD
public Node findHEAD(TagTable tt)
Find the head tag.
- Parameters:
tt
- tag table
- Returns:
- head node
-
checkNodeIntegrity
public boolean checkNodeIntegrity()
Checks for node integrity.
- Returns:
- false if node is not consistent
-
addClass
public void addClass(java.lang.String classname)
Add a CSS class to the node. If a class attribute already exists, adds the value to the existing attribute.
- Parameters:
classname
- CSS class name
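The "add to the existing attribute" behaviour can be sketched with a plain map standing in for the attribute list. `AddClassSketch` is hypothetical, and joining class names with a single space is an assumption based on how the HTML class attribute is written rather than on the JTidy source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical attribute holder; not the real org.w3c.tidy.Node. */
final class AddClassSketch {
    // attribute name -> value, in insertion order
    final Map<String, String> attributes = new LinkedHashMap<>();

    /** Creates the class attribute, or appends to it if it already has a value. */
    void addClass(String classname) {
        String existing = attributes.get("class");
        if (existing == null || existing.isEmpty()) {
            attributes.put("class", classname);
        } else {
            attributes.put("class", existing + " " + classname); // assumed separator
        }
    }

    public static void main(String[] args) {
        AddClassSketch node = new AddClassSketch();
        node.addClass("note");
        node.addClass("warning");
        System.out.println(node.attributes.get("class")); // prints note warning
    }
}
```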
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
- See Also:
Object.toString()
-
getAdapter
protected org.w3c.dom.Node getAdapter()
Returns a DOM Node which wraps the current tidy Node.
- Returns:
- org.w3c.dom.Node instance
-
cloneNode
protected Node cloneNode(boolean deep)
Clone this node.
- Parameters:
deep
- if true, deep clone the node (also clones all the contained nodes)
- Returns:
- cloned node
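The difference between a shallow and a deep clone can be seen on a small standalone model. `CloneSketch` is hypothetical and copies only a name and the child list; the real method presumably copies more state, such as attributes, which is not modelled here.

```java
/** Hypothetical node model; not the real org.w3c.tidy.Node. */
final class CloneSketch {
    final String name;
    CloneSketch next, content, last;

    CloneSketch(String name) {
        this.name = name;
    }

    /** Appends a child and returns this node so calls can be chained. */
    CloneSketch append(CloneSketch child) {
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
        return this;
    }

    /** Copies this node; with deep set, also copies every contained node recursively. */
    CloneSketch cloneNode(boolean deep) {
        CloneSketch copy = new CloneSketch(name);
        if (deep) {
            for (CloneSketch c = content; c != null; c = c.next) {
                copy.append(c.cloneNode(true));
            }
        }
        return copy;
    }

    /** Number of direct children. */
    int childCount() {
        int count = 0;
        for (CloneSketch c = content; c != null; c = c.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CloneSketch ul = new CloneSketch("ul")
                .append(new CloneSketch("li")).append(new CloneSketch("li"));
        System.out.println(ul.cloneNode(false).childCount()); // prints 0
        System.out.println(ul.cloneNode(true).childCount());  // prints 2
    }
}
```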
-
setType
protected void setType(short newType)
Setter for node type.
- Parameters:
newType
- a valid node type constant
-
isJavaScript
public boolean isJavaScript()
Used to check script node for script language.
- Returns:
- true if the script node contains javascript
-
expectsContent
public boolean expectsContent()
Does the node expect contents?
- Returns:
- false if this node should be empty
-
-