Expressions and Location Paths

XML Tutorials - Herong's Tutorial Examples

This section describes expressions and location paths. An expression is a sequence of data objects, operators, grouping parentheses and function calls. An expression is called a location path if it is used to represent a set of nodes in the source XML document.

Expression: A sequence of data objects, operators, grouping parentheses and function calls. An example of an expression:

( 6 + 2 ) * 3 div 4
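As a rough cross-check, the same arithmetic can be written in Python. This is only a sketch: it assumes XPath 1.0 semantics, where all numbers are double-precision floating-point values and 'div' is floating-point division, so Python's '/' operator is the closest match. The variable name "result" is just an illustration.

```python
# XPath 1.0 'div' is floating-point division, so '/' is used here
# rather than Python's integer division operator '//'.
result = (6 + 2) * 3 / 4
print(result)
```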

Location Path: An expression that results in a node set from the source XML document. The syntax of a location path is:

LocationPath ::= RelativeLocationPath | AbsoluteLocationPath

AbsoluteLocationPath ::= '/' | '/'RelativeLocationPath

RelativeLocationPath ::= LocationStep 
   | RelativeLocationPath'/'LocationStep

LocationStep ::= AxisName'::'NodeTest | LocationStep'['Predicate']'
   | LocationStep'|'LocationStep

AxisName ::= 'ancestor' | 'ancestor-or-self' | 'attribute' | 'child' 
   | 'descendant' | 'descendant-or-self' | 'following' 
   | 'following-sibling' | 'namespace' | 'parent' | 'preceding' 
   | 'preceding-sibling' | 'self' 

NodeTest ::= NameTest | 'node()' | 'text()' | 'comment()' 
   | 'processing-instruction()' 
   | 'processing-instruction(' StringLiteral ')' 
         
NameTest ::= '*' | NodeName | NameSpacePrefix':*' | NameSpacePrefix':'NodeName

NodeName ::= Any node name in the XML structure

StringLiteral ::= A sequence of valid characters enclosed in quotes

Predicate ::= A boolean expression | AxisName'::'NodeTest

Note that two operators are introduced here:

'/': Location step delimiter.
'|': Location step "or" operator.

Evaluation rules for location paths:

If a location path starts with '/', it is an absolute path. The '/' changes the context node to the root node of the document, which is the parent of the root element.
If a location step is followed by another location step with '/' in between, each node in the node set produced by the first step is used as the context node for the second step. The final node set is the union of all node sets produced by applying the second step to each node from the first step.
If a location step is followed by another location step with '|' in between, the final node set is the union of the two node sets produced by the first and the second location steps.
The axis defines where the node test is applied. For example, 'child' applies the node test to the child nodes of the context node. 'attribute' applies the node test to the attributes of the context node.
The axis also affects the order of nodes in the resulting node set. A forward axis produces a node set with nodes in the same order as they appear in the XML structure. A reverse axis produces a node set with nodes in the reverse of that order.
'*' represents any node name. Text nodes are not matched by '*', since text nodes have no names.
If a predicate is specified, each node in the node set produced before this predicate is used as the context node to evaluate the predicate. If the predicate evaluates to false, that node is removed from the node set.
If a predicate results in a number, it is converted to true if the number is equal to the context position, and to false otherwise.
If a predicate results in a string, it is converted to true if the string's length is greater than zero.
If a predicate results in a node set, it is converted to true if the set is not empty.
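The '/' rule and the predicate conversion rules above can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration, not how an XPath engine is actually implemented: the sample document, the helper names child_step() and predicate_holds(), and the use of Python lists as node sets are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
doc = ET.fromstring(
    "<root>"
    "<name1><name2>a</name2><name2>b</name2></name1>"
    "<name1><name2>c</name2></name1>"
    "<other><name2>d</name2></other>"
    "</root>"
)

def child_step(context_nodes, name):
    """Apply the step child::name to every context node and merge the results."""
    result = []
    for context in context_nodes:
        for child in context:  # iterating an Element yields its child elements
            if child.tag == name and child not in result:
                result.append(child)
    return result

def predicate_holds(value, position):
    """Convert a predicate result to a boolean, following the rules above."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):
        return value == position  # number: true only at that context position
    if isinstance(value, str):
        return len(value) > 0  # string: true if not empty
    return len(value) > 0  # node set (a list here): true if not empty

# name1/name2 evaluated step by step: the nodes produced by the first
# step become the context nodes of the second step.
first = child_step([doc], "name1")
second = child_step(first, "name2")
manual = [e.text for e in second]

# The same path evaluated by ElementTree's built-in path support.
builtin = [e.text for e in doc.findall("name1/name2")]
print(manual, builtin)
```

Both lists should contain the same text values, since findall() applies the second step to each node matched by the first step in the same way.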

There are a number of abbreviations for the axis name and node test parts of the location step:

.       self::node()
..      parent::node()
name    child::name
@name   attribute::name
//      /descendant-or-self::node()/
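The abbreviation table can be applied mechanically. The Python sketch below rewrites an abbreviated location path into its unabbreviated form. The function names expand_step() and expand_path() are hypothetical, and the sketch assumes a simple path with no '/' characters inside predicates or string literals; predicate contents are left unexpanded.

```python
def expand_step(step):
    """Expand one abbreviated location step using the table above."""
    if step == "":
        return ""  # an empty piece marks a leading '/' of an absolute path
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if "::" in step.split("[")[0]:
        return step  # the axis is already written out
    if step.startswith("@"):
        return "attribute::" + step[1:]
    return "child::" + step  # 'child' is the default axis

def expand_path(path):
    """Expand '//' first, then expand each step between '/' delimiters."""
    path = path.replace("//", "/descendant-or-self::node()/")
    return "/".join(expand_step(step) for step in path.split("/"))

for sample in [".//name", "../@id", "name1/name2", "//name[1]"]:
    print(sample, "->", expand_path(sample))
```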

Let's look at some examples of location paths:

"name" matches any child element nodes named "name".
"./name" matches any child element nodes named "name".
"name1/name2" matches any grand child element nodes named "name2" inside any child element nodes named "name1".
".//name" matches any element nodes named "name" in the current sub-tree.
"*" matches any child element nodes.
"*/*" matches any grand child element nodes in the current sub-tree.
"name1|name2" matches any child element nodes named "name1" or "name2".
"node()" matches any child nodes. This is a super-set of "*".
"next()" matches any child text nodes. This is a sub-set of "node()".
"name[1]" matches the first child element node named "name".
"name[position()>1]" matches any child element nodes named "name", except the first one.
"name1[name2]" matches any child element nodes named "name1", who has at least one child element node named "name2".
"name1[name2='text']" matches any child element nodes named "name1", who has at least one child element node named "name2" with context equal to "text".
"@name" matches any attribute nodes named "name".
"@*" matches any attribute nodes.
"name1[@name2]" matches any child element nodes named "name1", who has at least one attribute node named "name2".
"name1[@name2='value']" matches any child element nodes named "name1", who has at least one attribute node named "name2" with value equal to "value".