DOMBrowser.java - Browsing DOM Tree Structure

JDK Tutorials - Herong's Tutorial Examples

This section provides a tutorial example on how to write a DOM object browser, DOMBrowser.java, to browse through the DOM object tree structure and print the content at each tree node.

In DOM, an XML file is represented as a tree structure called a "document". Each piece of information in the XML file, such as an element, an attribute, a block of text or a comment, is abstracted as an org.w3c.dom.Node object and represented by a node in the tree.

"Node" is actually an interface. It is extended, directly or indirectly, by more specific DOM interfaces such as Element, Attr, Text and Comment, which represent the different types of information in an XML file. Features common to all of them are defined as methods in the Node interface. Major get methods of Node include:

getNodeType(): Returns the node type.
getNodeName(): Returns the node name.
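getNodeValue(): Returns the value associated with this node, or null for node types that have no value, such as elements and the document itself.
getChildNodes(): Returns a NodeList of the nodes nested directly inside this node.
getAttributes(): Returns a NamedNodeMap of the attribute nodes of this node if it is an element, or null otherwise.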
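To see these methods in isolation, before walking a whole tree, here is a minimal sketch. It assumes a hypothetical class NodeMethodsDemo with a hypothetical parse() helper that reads XML from a string through org.xml.sax.InputSource instead of from a file. Note in particular that getAttributes() returns null for a text node, which is why DOMBrowser later prints -1 for such nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethodsDemo {
   // Hypothetical helper: parse an XML string into a DOM Document.
   // Checked exceptions are wrapped so callers need no try/catch.
   static Document parse(String xml) {
      try {
         return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      } catch (Exception e) {
         throw new IllegalArgumentException(e);
      }
   }

   public static void main(String[] args) {
      Document d = parse("<user status=\"active\">John</user>");
      Element root = d.getDocumentElement();
      Node text = root.getFirstChild();
      // An element has a name and a type, but no value.
      System.out.println(root.getNodeName() + " " + root.getNodeType()
         + " " + root.getNodeValue());                          // user 1 null
      // getAttributes() returns a NamedNodeMap for an element ...
      System.out.println(root.getAttributes().getLength());    // 1
      // ... and null for other node types, such as this text node.
      System.out.println(text.getAttributes());                // null
      System.out.println(text.getNodeName() + " " + text.getNodeValue()); // #text John
   }
}
```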

Here is a list of node types supported by DOM, with the value of each type constant defined in the Node interface:

 2 ATTRIBUTE_NODE
 4 CDATA_SECTION_NODE
 8 COMMENT_NODE
11 DOCUMENT_FRAGMENT_NODE
 9 DOCUMENT_NODE
10 DOCUMENT_TYPE_NODE
 1 ELEMENT_NODE
 6 ENTITY_NODE
 5 ENTITY_REFERENCE_NODE
12 NOTATION_NODE
 7 PROCESSING_INSTRUCTION_NODE
 3 TEXT_NODE
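Since getNodeType() returns one of these short values, a small lookup can make output like DOMBrowser's easier to read. The sketch below is a hypothetical helper (the class NodeTypeNames and method typeName() are not part of the DOM API) that switches on the constants defined in org.w3c.dom.Node.

```java
import org.w3c.dom.Node;

class NodeTypeNames {
   // Map a value returned by Node.getNodeType() to the name of the
   // matching constant in org.w3c.dom.Node.
   static String typeName(short type) {
      switch (type) {
         case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
         case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
         case Node.TEXT_NODE:                   return "TEXT_NODE";
         case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
         case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
         case Node.ENTITY_NODE:                 return "ENTITY_NODE";
         case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
         case Node.COMMENT_NODE:                return "COMMENT_NODE";
         case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
         case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
         case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
         case Node.NOTATION_NODE:               return "NOTATION_NODE";
         default:                               return "UNKNOWN(" + type + ")";
      }
   }

   public static void main(String[] args) {
      // Print the table above in numeric order.
      for (short t = 1; t <= 12; t++) {
         System.out.println(t + " " + typeName(t));
      }
   }
}
```

In DOMBrowser, printing NodeTypeNames.typeName(n.getNodeType()) instead of the raw number would show, for example, DOCUMENT_NODE instead of 9.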

The following program illustrates how an XML file can be parsed into a DOM document tree, and how the get methods of Node can be used to browse the tree:

/* DOMBrowser.java
 * Copyright (c) HerongYang.com. All Rights Reserved.
 */
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
class DOMBrowser {
   public static void main(String[] args) {
      if (args.length < 1) {
         System.out.println("Usage: java DOMBrowser <xml-file>");
         return;
      }
      try {
         // Parse the XML file into a DOM Document tree
         File x = new File(args[0]);
         DocumentBuilderFactory f
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder b = f.newDocumentBuilder();
         Document d = b.parse(x);
         // Walk the tree, starting at the Document node itself
         printNode(d, "");
      } catch (ParserConfigurationException | SAXException | IOException e) {
         System.out.println(e.toString());
      }
   }
   // Print one node as "name: type, child count, attribute count, value",
   // then recurse into its attributes (prefixed with "|-") and its
   // child nodes. The attribute count is -1 when getAttributes()
   // returns null, which happens for every node that is not an element.
   static void printNode(Node n, String p) {
      NodeList l = n.getChildNodes();
      NamedNodeMap m = n.getAttributes();
      int ml = -1;
      if (m!=null) ml = m.getLength();
      System.out.println(p+n.getNodeName()+": "+n.getNodeType()+", "
         +l.getLength()+", "+ml+", "+n.getNodeValue());
      // Attribute nodes are not children, so they are visited separately
      for (int i=0; i<ml; i++) {
         Node c = m.item(i);
         printNode(c,p+" |-");
      }
      // Child nodes are indented one more space than their parent
      for (int i=0; i<l.getLength(); i++) {
         Node c = l.item(i);
         printNode(c,p+" ");
      }
   }
}

Now let's use this program to browse my first XML file, hello.xml:

<?xml version="1.0"?>
<body>Hello world!</body>

You will get the following output:

herong> java DOMBrowser.java hello.xml

#document: 9, 1, -1, null
 body: 1, 1, 0, null
  #text: 3, 0, -1, Hello world!

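Here is how to read the output. Each line is printed by printNode() in the format "name: type, number of child nodes, number of attributes, value", where an attribute count of -1 means that getAttributes() returned null because the node cannot have attributes:

The Document object is also a Node object, and it is represented by the first line in the output: "#document", of type 9 (DOCUMENT_NODE), with 1 child node.
The XML declaration, <?xml version="1.0"?>, looks like a processing instruction, but it is not represented as a node in the document tree.
The second line in the output says that the root element is named "body", is of type 1 (ELEMENT_NODE), has 1 child node, has 0 attributes, and has no value (null).
The third line in the output says that there is a child node nested inside the "body" node. The child node is named "#text", is of type 3 (TEXT_NODE), has 0 child nodes, cannot have any attributes (-1), and has the string value "Hello world!".
Note that the text enclosed by the "body" tags is parsed into a separate node, a child of the "body" node, rather than into the value of the "body" node itself. So how can we link that text with the tag name "body"?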
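One way to make that link is to navigate between the two nodes. The sketch below assumes DOM Level 3 methods are available (they are in JDK 5 and later) and uses a hypothetical class ElementTextDemo with its own parse() helper. It goes down from the element to its text child, up from the text node to its parent, and also uses Node.getTextContent(), which concatenates all text inside an element. It additionally shows that the XML declaration is exposed as a Document property rather than as a node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementTextDemo {
   // Hypothetical helper: parse an XML string into a DOM Document.
   static Document parse(String xml) {
      try {
         return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      } catch (Exception e) {
         throw new IllegalArgumentException(e);
      }
   }

   public static void main(String[] args) {
      Document d = parse("<?xml version=\"1.0\"?><body>Hello world!</body>");
      Element body = d.getDocumentElement();
      // Down: from the element to the text node nested inside it.
      Node text = body.getFirstChild();
      System.out.println(text.getNodeValue());                 // Hello world!
      // Up: from the text node to the element that encloses it.
      System.out.println(text.getParentNode().getNodeName());  // body
      // Shortcut: all descendant text of the element as one string.
      System.out.println(body.getTextContent());               // Hello world!
      // The declaration is not a child node, but its version is kept.
      System.out.println(d.getChildNodes().getLength());       // 1
      System.out.println(d.getXmlVersion());                   // 1.0
   }
}
```

getTextContent() is convenient when an element holds only text; when it has child elements too, their text is included in the result as well.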

Here is another XML file with more elements, user.xml:

<?xml version="1.0"?>
<user status="active">
 <!-- This is not a real user. -->
 <first_name>John</first_name>
 <last_name>Smith</last_name>
</user>

Running DOMBrowser on this XML file with JDK 17, I got:

herong> \progra~1\java\jdk-17.0.1\bin\java DOMBrowser user.xml

#document: 9, 1, -1, null
 user: 1, 7, 1, null
  |-status: 2, 1, -1, active
  |- #text: 3, 0, -1, active
  #text: 3, 0, -1,

  #comment: 8, 0, -1,  This is not a real user.
  #text: 3, 0, -1,

  first_name: 1, 1, 0, null
   #text: 3, 0, -1, John
  #text: 3, 0, -1,

  last_name: 1, 1, 0, null
   #text: 3, 0, -1, Smith
  #text: 3, 0, -1,

The output is more interesting:

Line breaks, together with the indentation spaces that follow them, are also parsed into "#text" nodes. This is why node "user" has 7 child nodes: 4 whitespace text nodes, 1 comment, and 2 elements, "first_name" and "last_name".
For a node that represents an attribute of an element, the node value is the attribute value. See node "status" under "user".
The attribute node also has a "#text" child node that holds the attribute value as well. In other words, the value of a single attribute appears twice in the DOM tree.
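Whitespace-only text nodes are often unwanted when browsing. According to the DocumentBuilderFactory Javadoc, setIgnoringElementContentWhitespace(true) only takes effect when the parser validates against a DTD, so for a file like user.xml a simple alternative is to skip such nodes yourself. The sketch below does that; the class WhitespaceFilter and its methods are hypothetical, and "whitespace-only" is approximated with String.trim(), which removes spaces, tabs, carriage returns and line feeds.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceFilter {
   // Hypothetical helper: parse an XML string into a DOM Document.
   static Document parse(String xml) {
      try {
         return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      } catch (Exception e) {
         throw new IllegalArgumentException(e);
      }
   }

   // True for a text node holding nothing but whitespace, such as the
   // line breaks and indentation between elements.
   static boolean isWhitespaceText(Node n) {
      return n.getNodeType() == Node.TEXT_NODE
         && n.getNodeValue().trim().isEmpty();
   }

   // Return the child nodes of n, skipping whitespace-only text nodes.
   static List<Node> significantChildren(Node n) {
      List<Node> result = new ArrayList<>();
      NodeList l = n.getChildNodes();
      for (int i = 0; i < l.getLength(); i++) {
         Node c = l.item(i);
         if (!isWhitespaceText(c)) result.add(c);
      }
      return result;
   }

   public static void main(String[] args) {
      String xml = "<?xml version=\"1.0\"?>\n"
         + "<user status=\"active\">\n"
         + " <!-- This is not a real user. -->\n"
         + " <first_name>John</first_name>\n"
         + " <last_name>Smith</last_name>\n"
         + "</user>\n";
      Node user = parse(xml).getDocumentElement();
      System.out.println(user.getChildNodes().getLength());   // 7
      // Prints #comment, first_name and last_name on separate lines.
      for (Node c : significantChildren(user)) {
         System.out.println(c.getNodeName());
      }
   }
}
```

Calling significantChildren() in place of getChildNodes() in printNode() would remove the blank "#text" lines from DOMBrowser's output.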

But when I ran DOMBrowser with JDK 1.8, I got a different result: the "status" attribute node has 0 child nodes and no "#text" child:

herong> \progra~1\java\jdk1.8.0\bin\java DOMBrowser user.xml

#document: 9, 1, -1, null
 user: 1, 7, 1, null
  |-status: 2, 0, -1, active
  #text: 3, 0, -1,

  #comment: 8, 0, -1,  This is not a real user.
  #text: 3, 0, -1,

  first_name: 1, 1, 0, null
   #text: 3, 0, -1, John
  #text: 3, 0, -1,

  last_name: 1, 1, 0, null
   #text: 3, 0, -1, Smith
  #text: 3, 0, -1,
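The two runs disagree on whether the "status" attribute node has a "#text" child, so code that needs an attribute value is safer not relying on that child. Element.getAttribute() and Attr.getValue() return the value directly. Below is a minimal sketch; the class AttributeValueDemo and its parse() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeValueDemo {
   // Hypothetical helper: parse an XML string into a DOM Document.
   static Document parse(String xml) {
      try {
         return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
      } catch (Exception e) {
         throw new IllegalArgumentException(e);
      }
   }

   public static void main(String[] args) {
      Element user = parse("<user status=\"active\"/>").getDocumentElement();
      // Element.getAttribute() returns the value, or "" if it is absent.
      System.out.println(user.getAttribute("status"));          // active
      // The Attr node gives the same string through two methods.
      Attr status = user.getAttributeNode("status");
      System.out.println(status.getValue());                    // active
      System.out.println(status.getNodeValue());                // active
      // The child count of the Attr node is the detail that may differ
      // between JDK versions, as the two runs above show.
      System.out.println(status.getChildNodes().getLength());
   }
}
```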