Package org.attoparser.select
Handlers for filtering a part or several parts of markup during parsing in a fast and efficient way.
Handler Implementations
There are two main handlers (implementations of IMarkupHandler) for markup selection in this package:
BlockSelectorMarkupHandler
- For selecting entire blocks of markup (i.e. elements and all the nodes in their subtrees). This can be used, for example, to extract fragments of markup while the document is being parsed, so that discarded markup never reaches higher layers of the document processing infrastructure.
NodeSelectorMarkupHandler
- For selecting only specific nodes in markup (i.e. not including their subtrees). This can be used for modifying certain specific tags in markup during parsing, for example by adding additional attributes to them that are not present in the original parsed markup.
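The difference between block selection and node selection can be illustrated with a small, self-contained model. This is only a sketch and does not use the AttoParser API: the class `SelectionModeSketch`, its methods and the string-based event list are hypothetical, and treating the close tag as part of the selected node is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not AttoParser code. Events are plain strings:
// "<name>" opens an element, "</name>" closes it, anything else is text.
public class SelectionModeSketch {

    // Block selection: keeps the matching element and everything in its subtree.
    public static List<String> selectBlocks(List<String> events, String name) {
        List<String> selected = new ArrayList<>();
        int depth = 0; // nesting depth inside a selected block; 0 means "outside"
        for (String event : events) {
            boolean open = event.startsWith("<") && !event.startsWith("</");
            boolean close = event.startsWith("</");
            if (depth == 0) {
                if (event.equals("<" + name + ">")) {
                    depth = 1;
                    selected.add(event);
                }
            } else {
                selected.add(event);
                if (open) {
                    depth++;
                } else if (close) {
                    depth--;
                }
            }
        }
        return selected;
    }

    // Node selection: keeps only the matching element's own tags, not its subtree.
    // Assumption: the close tag counts as part of the selected node.
    public static List<String> selectNodes(List<String> events, String name) {
        List<String> selected = new ArrayList<>();
        for (String event : events) {
            if (event.equals("<" + name + ">") || event.equals("</" + name + ">")) {
                selected.add(event);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList(
                "<html>", "<div>", "hello", "<p>", "world", "</p>", "</div>",
                "<span>", "bye", "</span>", "</html>");
        System.out.println(selectBlocks(events, "div"));
        System.out.println(selectNodes(events, "div"));
    }
}
```

In this model, block selection of div returns the div tags together with the text and the nested p element, while node selection returns only the two div tags.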
Markup Selector Syntax
Markup selectors used by handlers in this package use a specific syntax with features borrowed from XPath, CSS and jQuery selectors, to make them easy to use for most users. There are often several ways to express the same selector, depending on the user's preferences.
For example, all of the following selectors are equivalent and select every <div> with class content, at any position in the markup:
//div[class='content']
//div[@class='content']
div[class='content']
div[@class='content']
//div.content
div.content
These are the different operations this syntax allows:
Basic selectors
- x, //x
- Both are equivalent, and mean elements with name x at any depth below the current node. If a reference resolver is being used, they will also be equivalent to %x (see below).
- /x
- Means direct children of the current node with name x.
- x/y
- Means direct children with name y of elements with name x, where the parent x elements can be at any level in the markup.
- x//y
- Means descendants (at any level) with name y of elements with name x, where the ancestor x elements can also be at any level in the markup.
- text(), comment(), cdata(), doctype(), xmldecl(), procinstr()
- These can be used like x (in the same places) but instead of selecting elements (i.e. tags) will select, respectively: text nodes, comments, CDATA sections, DOCTYPE clauses, XML Declarations and Processing Instructions.
- content()
- This selector can be used for selecting the entire contents of an element (i.e. all its body), including all texts, comments, elements, etc. inside it. Note that the container element itself is not selected.
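The depth rules of the basic selectors above can be sketched as simple checks over a node's path. This is an illustrative model only, not AttoParser's matching code: the class `PathSelectorSketch` and its methods are hypothetical, and a path is assumed to be the list of element names from the first level below the current node down to the node being tested.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the basic selector forms; not AttoParser code.
public class PathSelectorSketch {

    // "x" or "//x": the node is named x, at any depth.
    public static boolean matchesAnyDepth(List<String> path, String x) {
        return !path.isEmpty() && path.get(path.size() - 1).equals(x);
    }

    // "/x": the node is named x and is a direct child of the current node.
    public static boolean matchesDirectChild(List<String> path, String x) {
        return path.size() == 1 && path.get(0).equals(x);
    }

    // "x/y": the node is named y and its immediate parent is named x.
    public static boolean matchesChildOf(List<String> path, String x, String y) {
        int n = path.size();
        return n >= 2 && path.get(n - 1).equals(y) && path.get(n - 2).equals(x);
    }

    // "x//y": the node is named y and some ancestor (at any level) is named x.
    public static boolean matchesDescendantOf(List<String> path, String x, String y) {
        int n = path.size();
        return n >= 2 && path.get(n - 1).equals(y) && path.subList(0, n - 1).contains(x);
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("html", "body", "div", "p");
        System.out.println(matchesAnyDepth(path, "p"));             // true
        System.out.println(matchesDirectChild(path, "p"));          // false
        System.out.println(matchesChildOf(path, "div", "p"));       // true
        System.out.println(matchesChildOf(path, "body", "p"));      // false
        System.out.println(matchesDescendantOf(path, "body", "p")); // true
    }
}
```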
Attribute matching
- x[z='v'], x[z="v"], x[@z='v'], x[@z="v"]
- All four are equivalent, and mean elements with name x and an attribute called z with value v. Note that attribute values can be surrounded by single or double quotes, and attribute names can be specified with a leading @ (as in XPath) or without it (more similar to jQuery). For the sake of simplicity, only the single-quoted, no-@ syntax will be used for the rest of the examples below.
- [z='v'], //[z='v']
- Means any elements with an attribute called z with value v.
- x[z]
- Means elements with name x and an attribute called z, with any value.
- x[!z]
- Means elements with name x and no attribute called z.
- x[z1='v1' and z2='v2']
- Means elements with name x and attributes z1 and z2 with values v1 and v2, respectively.
- x[z1='v1' or z2='v2']
- Means elements with name x and either an attribute z1 with value v1 or an attribute z2 with value v2.
- x[z1='v1' and (z2='v2' or z3='v3')]
- Selects according to the specified complex attribute expression. As can be seen, these expressions can be parenthesized to force a certain evaluation order.
- x[z!='v'], x[z^='v'], x[z$='v'], x[z*='v']
- Similar to x[z='v'] but applying different operators to attribute matching instead of equality (=). Respectively: not equal (!=), starts with (^=), ends with ($=) and contains (*=).
- x.z, x[class='z']
- When parsing in HTML mode (and only then), these two selectors will be completely equivalent. In this case the selector will look for an x element which has the z class, taking into account that HTML's class attribute allows the specification of several classes separated by white space. So something like <x class="z y w"> will be matched by this selector.
- x#z, x[id='z']
- When parsing in HTML mode (and only then), these two selectors will be completely equivalent, matching those x elements that have an ID with value z.
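The comparison operators and the HTML class rule described above can be sketched with plain string operations. This is a hypothetical model rather than AttoParser's implementation: the class `AttributeOperatorSketch` and its methods are invented for illustration, and the behavior for elements that lack the attribute altogether is deliberately not modeled.

```java
import java.util.Arrays;

// Hypothetical model of attribute matching; not AttoParser code.
public class AttributeOperatorSketch {

    // Applies one selector operator to an attribute value that is present.
    // Assumption: the attribute exists; missing attributes are not modeled here.
    public static boolean matches(String actual, String operator, String expected) {
        switch (operator) {
            case "=":
                return actual.equals(expected);
            case "!=":
                return !actual.equals(expected);
            case "^=":
                return actual.startsWith(expected);
            case "$=":
                return actual.endsWith(expected);
            case "*=":
                return actual.contains(expected);
            default:
                throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    // HTML-mode class matching (x.z): z must be one of the
    // whitespace-separated tokens of the class attribute value.
    public static boolean hasClass(String classAttributeValue, String className) {
        return Arrays.asList(classAttributeValue.trim().split("\\s+")).contains(className);
    }

    public static void main(String[] args) {
        System.out.println(matches("content-main", "^=", "content")); // true
        System.out.println(matches("content-main", "$=", "content")); // false
        System.out.println(matches("content-main", "*=", "tent-"));   // true
        System.out.println(hasClass("z y w", "z"));                   // true
        System.out.println(hasClass("zeta y", "z"));                  // false
    }
}
```

Matching class names token by token, rather than by substring, is what keeps a selector such as x.z from matching an element whose only class is zeta.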
Index-based matching
- x[i]
- Means elements with name x positioned at index i among their siblings. Sibling here means a node that is a child of the same parent element and matches the same conditions (in this case "having x as name"). Note that indexes start at 0.
- x[z='v'][i]
- Means elements with name x and attribute z with value v, positioned at index i among their siblings (same name, same attribute with that value).
- x[even()], x[odd()]
- Means elements with name x positioned at an even (or odd) index among their siblings. Note that even includes index 0.
- x[>i], x[<i]
- Means elements with name x positioned at an index greater (or less) than i among their siblings.
- text()[i], comment()[>i]
- Applies the specified index-based matching operations to nodes of types other than elements: texts, comments, CDATA sections, etc.
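The index forms above can be read as predicates over the zero-based position of a node among the siblings that match the same conditions. The following is a minimal sketch of that reading, not AttoParser's implementation: the class `IndexSelectorSketch` and its methods are hypothetical, and the input list is assumed to already contain only the matching siblings, in document order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of index-based matching; not AttoParser code.
public class IndexSelectorSketch {

    // Turns the text between the brackets ("2", "even()", "odd()", ">1", "<1")
    // into a predicate over zero-based sibling indexes.
    public static IntPredicate parse(String index) {
        if (index.equals("even()")) {
            return i -> i % 2 == 0; // index 0 counts as even
        }
        if (index.equals("odd()")) {
            return i -> i % 2 == 1;
        }
        if (index.startsWith(">")) {
            int lowerBound = Integer.parseInt(index.substring(1));
            return i -> i > lowerBound;
        }
        if (index.startsWith("<")) {
            int upperBound = Integer.parseInt(index.substring(1));
            return i -> i < upperBound;
        }
        int exact = Integer.parseInt(index);
        return i -> i == exact;
    }

    // Keeps the siblings whose position satisfies the index expression.
    public static <T> List<T> select(List<T> matchingSiblings, String index) {
        IntPredicate predicate = parse(index);
        List<T> selected = new ArrayList<>();
        for (int i = 0; i < matchingSiblings.size(); i++) {
            if (predicate.test(i)) {
                selected.add(matchingSiblings.get(i));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> siblings = Arrays.asList("a", "b", "c", "d");
        System.out.println(select(siblings, "2"));      // [c]
        System.out.println(select(siblings, "even()")); // [a, c]
        System.out.println(select(siblings, "odd()"));  // [b, d]
        System.out.println(select(siblings, ">1"));     // [c, d]
        System.out.println(select(siblings, "<1"));     // [a]
    }
}
```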
Reference-based matching
- x%ref
- Means elements with name x that match the markup selector reference with value ref. These markup selector references usually have a user-defined meaning and are resolved to a markup selector without references by an instance of the IMarkupSelectorReferenceResolver interface passed to the selecting markup handlers (BlockSelectorMarkupHandler and NodeSelectorMarkupHandler) during construction. For example, a reference resolver could be configured that converts (resolves) %someref into div[class='someref' or id='someref']. Also, the Thymeleaf template engine uses this mechanism for resolving %fragmentName (or simply fragmentName, as explained below) into //[th:fragment='fragmentName' or data-th-fragment='fragmentName'].
- %ref
- Means any elements (whichever the name) matching reference with value ref.
- ref
- Equivalent to %ref. When a markup selector reference resolver has been configured, ref can mean both "element with name ref" and "element matching reference ref" (both will match).
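Reference resolution can be pictured as a function from a reference name to a reference-free selector. The sketch below reproduces the two resolutions mentioned above using a hypothetical `ReferenceResolver` interface; it is a stand-in for illustration, not the actual IMarkupSelectorReferenceResolver interface (whose method signature is not shown in this page), and it only expands the standalone %ref form.

```java
// Hypothetical model of reference resolution; not AttoParser code.
public class ReferenceResolverSketch {

    // Hypothetical stand-in for a reference resolver: maps a reference
    // name to a markup selector that contains no references.
    public interface ReferenceResolver {
        String resolve(String reference);
    }

    // Resolves %someref into div[class='someref' or id='someref'].
    public static final ReferenceResolver CLASS_OR_ID =
            ref -> "div[class='" + ref + "' or id='" + ref + "']";

    // Resolves %fragmentName the way the text describes for Thymeleaf.
    public static final ReferenceResolver FRAGMENT =
            ref -> "//[th:fragment='" + ref + "' or data-th-fragment='" + ref + "']";

    // Expands a selector of the form "%ref"; anything else is returned unchanged.
    // Assumption: the x%ref and bare ref forms are not handled in this sketch.
    public static String expand(String selector, ReferenceResolver resolver) {
        if (selector.startsWith("%")) {
            return resolver.resolve(selector.substring(1));
        }
        return selector;
    }

    public static void main(String[] args) {
        System.out.println(expand("%someref", CLASS_OR_ID));
        System.out.println(expand("%fragmentName", FRAGMENT));
        System.out.println(expand("div.content", CLASS_OR_ID));
    }
}
```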
Classes
AttributeSelectionMarkingMarkupHandler
- Implementation of IMarkupHandler that adds an attribute (with a user-specified name) to all elements that match one or more selectors, as determined by a BlockSelectorMarkupHandler or NodeSelectorMarkupHandler handler.
BlockSelectorMarkupHandler
- Implementation of IMarkupHandler able to apply block selection based on a set of specified markup selectors (see org.attoparser.select).
IMarkupSelectorReferenceResolver
- Interface modeling reference resolvers, the objects that can be used for tuning the selector matching operations done by BlockSelectorMarkupHandler and NodeSelectorMarkupHandler.
NodeSelectorMarkupHandler
- Implementation of IMarkupHandler able to apply node selection based on a set of specified markup selectors (see org.attoparser.select).
ParseSelection
- Class used for reporting the current selectors matching the different levels of selection specified at the handler chain by means of BlockSelectorMarkupHandler and NodeSelectorMarkupHandler instances.