This section provides a tutorial example on how to write a DOM object browser, DOMBrowser.java, to browse through the DOM object tree structure and print the content at each tree node.
In DOM, an XML file is represented with a tree structure, called "document".
Every piece of information in an XML file is abstracted as an org.w3c.dom.Node object,
and represented by a node in the tree.
"Node" is actually an interface. It is implemented by many DOM classes to represent
different types of information in an XML file. Features that are common to these classes
are defined as methods in the Node interface. Major get methods of Node include:
getNodeType(): Returns the node type.
getNodeName(): Returns the node name.
getNodeValue(): Returns the value associated with this node.
getChildNodes(): Returns a list of nodes nested inside this node.
getAttributes(): Returns a map of nodes that represent the attributes
of this node, or null if this node is not an element.
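The get methods listed above are enough to walk a whole tree. The following is a minimal sketch of that idea, not the DOMBrowser.java source from this tutorial. The names NodeWalkSketch, describe, walk and parse are made up for illustration, and the XML is parsed from an in-memory string rather than a file. Attribute nodes are not visited here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeWalkSketch {
    // Formats one node as "name: type, childCount, attributeCount, value".
    // The attribute count is -1 when getAttributes() returns null.
    static String describe(Node node) {
        NamedNodeMap attrs = node.getAttributes();
        int attrCount = (attrs == null) ? -1 : attrs.getLength();
        return node.getNodeName() + ": " + node.getNodeType() + ", "
            + node.getChildNodes().getLength() + ", " + attrCount + ", "
            + node.getNodeValue();
    }

    // Appends one line for the node, then recurses into its child nodes,
    // indenting by two spaces per tree level.
    static void walk(Node node, String indent, StringBuilder out) {
        out.append(indent).append(describe(node)).append('\n');
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), indent + "  ", out);
        }
    }

    // Hypothetical helper: parses XML text held in a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        StringBuilder out = new StringBuilder();
        walk(parse("<body>Hello world!</body>"), "", out);
        System.out.print(out);
    }
}
```

With the parser bundled in the JDK, this one-element document should produce three lines: one for the document, one for the "body" element, and one for the "#text" node.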
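Here is a list of node types that are supported by DOM, with the numeric code returned by getNodeType():
ELEMENT_NODE = 1: An element.
ATTRIBUTE_NODE = 2: An attribute of an element.
TEXT_NODE = 3: Text content of an element or an attribute.
CDATA_SECTION_NODE = 4: A CDATA section.
ENTITY_REFERENCE_NODE = 5: An entity reference.
ENTITY_NODE = 6: An entity.
PROCESSING_INSTRUCTION_NODE = 7: A processing instruction.
COMMENT_NODE = 8: A comment.
DOCUMENT_NODE = 9: The document itself.
DOCUMENT_TYPE_NODE = 10: The document type declaration.
DOCUMENT_FRAGMENT_NODE = 11: A document fragment.
NOTATION_NODE = 12: A notation.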
The Document object is also a Node object, which is represented by the
first line in the output.
The "xml" declaration at the top of the file does not appear as a node
in the document object tree.
The second line in the output says that the root element is named
"body", is of type 1, has 1 child node, has 0 attributes, and has no value.
The third line in the output says that there is a child node nested
inside the "body" node. The child node is called "#text", is of type 3,
has 0 child nodes, cannot have any attributes, and has a value of string
"Hello world!".
Note that the text enclosed by the "body" tags is parsed into
a node separate from the "body" node. So how can we link that text with
the tag name "body"?
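One possible answer uses the parent and child links that every Node carries. The sketch below is an illustration under assumptions: the class and method names (TextParentSketch, viaParent, viaElement, parse) are hypothetical, and the one-element document is built from a string. It goes from the text node up to its element with getParentNode(), and from the element down to its text with getTextContent().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TextParentSketch {
    // Starts at a text node and looks up the name of the enclosing node.
    static String viaParent(Node text) {
        return text.getParentNode().getNodeName() + " -> " + text.getNodeValue();
    }

    // Starts at an element and collects the text of all its descendants.
    static String viaElement(Element element) {
        return element.getNodeName() + " -> " + element.getTextContent();
    }

    // Hypothetical helper: parses XML text held in a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Element body = parse("<body>Hello world!</body>").getDocumentElement();
        Node text = body.getFirstChild();       // the "#text" child node
        System.out.println(viaParent(text));    // body -> Hello world!
        System.out.println(viaElement(body));   // body -> Hello world!
    }
}
```

Note that getTextContent() concatenates the text of every descendant, so on an element with nested elements it returns more than the value of a single "#text" node.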
Here is another XML file with more elements, user.xml:
<?xml version="1.0"?>
<user status="active">
<!-- This is not a real user. -->
<first_name>John</first_name>
<last_name>Smith</last_name>
</user>
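If you run DOMBrowser with this XML file, you will get the output below. Each line shows the node name, followed by the node type, the number of child nodes, the number of attributes (-1 if the node cannot have attributes), and the node value: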
#document: 9, 1, -1, null
user: 1, 7, 1, null
|-status: 2, 0, -1, active
#text: 3, 0, -1,
#comment: 8, 0, -1, This is not a real user.
#text: 3, 0, -1,
first_name: 1, 1, 0, null
#text: 3, 0, -1, John
#text: 3, 0, -1,
last_name: 1, 1, 0, null
#text: 3, 0, -1, Smith
#text: 3, 0, -1,
The output is more interesting:
Line breaks are also parsed into "#text" nodes. This is why node "user"
has 7 child nodes: 4 line breaks, 1 comment, and 2 elements: "first_name"
and "last_name".
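The count of 7 can be checked with a short program. This is a sketch under assumptions: ChildTypeCountSketch, countChildTypes and parse are illustrative names, and user.xml is embedded as a string with one line break after each line, matching the listing above. It also assumes a DocumentBuilderFactory with default settings, which keeps comments and whitespace-only text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildTypeCountSketch {
    // The content of user.xml, with a line break at the end of each line.
    static final String USER_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<user status=\"active\">\n"
        + "<!-- This is not a real user. -->\n"
        + "<first_name>John</first_name>\n"
        + "<last_name>Smith</last_name>\n"
        + "</user>\n";

    // Counts the direct children of a node, indexed by node type code.
    static int[] countChildTypes(Node parent) {
        int[] counts = new int[13];   // node type codes run from 1 to 12
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            counts[children.item(i).getNodeType()]++;
        }
        return counts;
    }

    // Hypothetical helper: parses XML text held in a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Node user = parse(USER_XML).getDocumentElement();
        int[] counts = countChildTypes(user);
        System.out.println("children: " + user.getChildNodes().getLength());
        System.out.println("text:     " + counts[Node.TEXT_NODE]);
        System.out.println("comment:  " + counts[Node.COMMENT_NODE]);
        System.out.println("element:  " + counts[Node.ELEMENT_NODE]);
    }
}
```

Code that only cares about elements usually skips these "#text" nodes, for example by testing getNodeType() against Node.ELEMENT_NODE while looping over the children.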
For a node that represents an attribute of an element, the node value is
the attribute value. See node "status" under "user".
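Reading attribute nodes can be sketched as follows. The names AttributeSketch, describeAttributes and parse are hypothetical, and the "user" element is reduced to an empty element to keep the example short. The sketch loops over the map returned by getAttributes() and prints the name, type and value of each attribute node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class AttributeSketch {
    // Builds one "name: type, value" line per attribute of the element.
    static String describeAttributes(Element element) {
        StringBuilder out = new StringBuilder();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append(attr.getNodeName()).append(": ")
               .append(attr.getNodeType()).append(", ")
               .append(attr.getNodeValue()).append('\n');
        }
        return out.toString();
    }

    // Hypothetical helper: parses XML text held in a string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Element user = parse("<user status=\"active\"/>").getDocumentElement();
        System.out.print(describeAttributes(user));        // status: 2, active
        // Shortcut on Element when only the value is needed.
        System.out.println(user.getAttribute("status"));   // active
    }
}
```

Attribute nodes are reached through getAttributes() only; they are not included in the list returned by getChildNodes().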