JDK (Java Development Kit) Tutorials
Dr. Herong Yang, Version 5.00

DOMBrowser.java - Browsing DOM Tree Structure

This section provides a tutorial example on how to write a DOM object browser, DOMBrowser.java, to browse through the DOM object tree structure and print the content at each tree node.

In DOM, an XML file is represented as a tree structure called a "document". Every piece of information in an XML file is abstracted as an org.w3c.dom.Node object, and represented by a node in the tree.

"Node" is actually an interface. It is extended by other DOM interfaces, such as Document, Element, Attr, Text, and Comment, each representing a different type of information in an XML file. Features that are common to all node types are defined as methods in the Node interface. Major get methods of Node include:

  • getNodeType(): Returns the node type.
  • getNodeName(): Returns the node name.
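  • getNodeValue(): Returns the value associated with this node, or null if this type of node has no value (for example, an element or a document).
  • getChildNodes(): Returns a NodeList of the nodes nested directly inside this node.
  • getAttributes(): Returns a NamedNodeMap of the nodes that represent the attributes of this node if it is an element, or null otherwise.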

Here is a list of the node types that are supported by DOM, with their numeric codes:

 2 ATTRIBUTE_NODE
 4 CDATA_SECTION_NODE
 8 COMMENT_NODE
11 DOCUMENT_FRAGMENT_NODE
 9 DOCUMENT_NODE
10 DOCUMENT_TYPE_NODE
 1 ELEMENT_NODE
 6 ENTITY_NODE
 5 ENTITY_REFERENCE_NODE
12 NOTATION_NODE
 7 PROCESSING_INSTRUCTION_NODE
 3 TEXT_NODE
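
Because getNodeType() returns only the numeric code, a small lookup helper can make browser output easier to read. The following is a minimal sketch; the class name NodeTypeNames and the method name() are hypothetical and not part of the original tutorial. It relies only on the constants defined in org.w3c.dom.Node.

```java
import org.w3c.dom.Node;

class NodeTypeNames {
   // Hypothetical helper: maps a code returned by Node.getNodeType()
   // to the name of the matching constant in org.w3c.dom.Node.
   public static String name(short type) {
      switch (type) {
         case Node.ELEMENT_NODE: return "ELEMENT_NODE";
         case Node.ATTRIBUTE_NODE: return "ATTRIBUTE_NODE";
         case Node.TEXT_NODE: return "TEXT_NODE";
         case Node.CDATA_SECTION_NODE: return "CDATA_SECTION_NODE";
         case Node.ENTITY_REFERENCE_NODE: return "ENTITY_REFERENCE_NODE";
         case Node.ENTITY_NODE: return "ENTITY_NODE";
         case Node.PROCESSING_INSTRUCTION_NODE:
            return "PROCESSING_INSTRUCTION_NODE";
         case Node.COMMENT_NODE: return "COMMENT_NODE";
         case Node.DOCUMENT_NODE: return "DOCUMENT_NODE";
         case Node.DOCUMENT_TYPE_NODE: return "DOCUMENT_TYPE_NODE";
         case Node.DOCUMENT_FRAGMENT_NODE: return "DOCUMENT_FRAGMENT_NODE";
         case Node.NOTATION_NODE: return "NOTATION_NODE";
         default: return "UNKNOWN(" + type + ")";
      }
   }
   public static void main(String[] args) {
      System.out.println(name(Node.DOCUMENT_NODE)); // prints DOCUMENT_NODE
      System.out.println(name(Node.TEXT_NODE));     // prints TEXT_NODE
   }
}
```

Using the named constants in the switch, rather than the literal numbers, keeps the helper readable and tied to the values defined by the DOM API.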

The following program illustrates how an XML file can be parsed into a DOM document tree, and how the get methods of Node can be used to browse the tree:

/**
 * DOMBrowser.java
 * Copyright (c) 2002 by Dr. Herong Yang
 */
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
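class DOMBrowser {
   public static void main(String[] args) {
      // The XML file name is expected as the first argument
      if (args.length < 1) {
         System.out.println("Usage: java DOMBrowser xml_file");
         return;
      }
      try {
         File x = new File(args[0]);
         DocumentBuilderFactory f
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder b = f.newDocumentBuilder();
         // Parse the XML file into a DOM document tree
         Document d = b.parse(x);
         // The Document object is the top node of the tree
         printNode(d, "");
      } catch (ParserConfigurationException e) {
         System.out.println(e.toString());
      } catch (SAXException e) {
         System.out.println(e.toString());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   // Prints one line for node n in the format
   // "name: type, child count, attribute count, value",
   // then prints its attributes and child nodes recursively.
   // p is the indentation prefix for the current tree level.
   static void printNode(Node n, String p) {
      NodeList l = n.getChildNodes();
      NamedNodeMap m = n.getAttributes();
      // getAttributes() returns null for non-element nodes,
      // which is reported as an attribute count of -1
      int ml = -1;
      if (m!=null) ml = m.getLength();
      System.out.println(p+n.getNodeName()+": "+n.getNodeType()+", "
         +l.getLength()+", "+ml+", "+n.getNodeValue());
      // Attribute nodes are marked with "|-"
      for (int i=0; i<ml; i++) {
         Node c = m.item(i);
         printNode(c,p+" |-");
      }
      // Child nodes are indented by one more space
      for (int i=0; i<l.getLength(); i++) {
         Node c = l.item(i);
         printNode(c,p+" ");
      }
   }
}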

Now let's use this program to browse my first XML file, hello.xml:

<?xml version="1.0"?>
<body>Hello world!</body>

You will get the following output:

#document: 9, 1, -1, null
 body: 1, 1, 0, null
  #text: 3, 0, -1, Hello world!

Here is how to read the output:

  • The Document object is also a Node object, which is represented by the first line in the output.
  • The XML declaration, <?xml version="1.0"?>, does not appear as a node in the document tree.
  • The second line in the output says that the root element is named "body", is of type 1, has 1 child node, has 0 attributes, and has no value.
  • The third line in the output says that there is a child node nested inside the "body" node. The child node is called "#text", is of type 3, has 0 child nodes, cannot have attributes (getAttributes() returns null, which the program prints as -1), and has the string value "Hello world!".
  • Note that the text enclosed by the "body" tags is parsed into a node separate from the "body" node. The two stay linked through the tree structure: the "#text" node is a child of the "body" node, so calling getParentNode() on it returns the "body" element.
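
The link between the text and its enclosing tag can be followed in both directions through the tree. The sketch below is one way to do that, not part of the original tutorial: the class name ElementText and its parse() helper are hypothetical, and the XML is parsed from an in-memory string instead of hello.xml so that the example is self-contained. It uses only the standard getParentNode(), getFirstChild(), and getTextContent() methods of Node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ElementText {
   // Hypothetical helper: parses an XML string into a DOM document.
   public static Document parse(String xml) {
      try {
         DocumentBuilder b
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         return b.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)));
      } catch (Exception e) {
         throw new IllegalArgumentException("Cannot parse XML", e);
      }
   }
   public static void main(String[] args) {
      String xml = "<?xml version=\"1.0\"?><body>Hello world!</body>";
      Element root = parse(xml).getDocumentElement();
      // Going down: the text is the first child of the element
      Node text = root.getFirstChild();
      System.out.println(text.getNodeValue());
      // Going up: the parent of the text node is the element
      System.out.println(text.getParentNode().getNodeName());
      // Shortcut: concatenated text of all descendants of the element
      System.out.println(root.getTextContent());
   }
}
```

For this input the three lines printed are "Hello world!", "body", and "Hello world!". getTextContent() is convenient when an element holds only text, but it also concatenates the text of any nested elements, so walking the child nodes gives finer control.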

Here is another XML file with more elements, user.xml:

<?xml version="1.0"?>
<user status="active">
 <!-- This is not a real user. -->
 <first_name>John</first_name>
 <last_name>Smith</last_name>
</user>

If you run DOMBrowser with this XML file, you will get:

#document: 9, 1, -1, null
 user: 1, 7, 1, null
  |-status: 2, 0, -1, active
  #text: 3, 0, -1,

  #comment: 8, 0, -1,  This is not a real user.
  #text: 3, 0, -1,

  first_name: 1, 1, 0, null
   #text: 3, 0, -1, John
  #text: 3, 0, -1,

  last_name: 1, 1, 0, null
   #text: 3, 0, -1, Smith
  #text: 3, 0, -1,

The output is more interesting:

  • Line breaks and the indentation that follows them are also parsed into "#text" nodes. This is why node "user" has 7 child nodes: 4 whitespace-only text nodes, 1 comment, and 2 elements, "first_name" and "last_name".
  • For a node that represents an attribute of an element, the node value is the attribute value. See node "status" under "user".
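
Both observations can be checked in code. The following sketch is an illustration rather than part of the original tutorial: the class name UserXmlReader and the methods parse() and countSignificantChildren() are hypothetical, and the content of user.xml is embedded as a string so that the example runs on its own. It counts child nodes while skipping whitespace-only text nodes, and reads the "status" attribute in two equivalent ways.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class UserXmlReader {
   // Hypothetical helper: parses an XML string into a DOM document.
   public static Document parse(String xml) {
      try {
         DocumentBuilder b
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         return b.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)));
      } catch (Exception e) {
         throw new IllegalArgumentException("Cannot parse XML", e);
      }
   }
   // Counts the child nodes of n, ignoring text nodes that
   // contain nothing but whitespace (line breaks, indentation).
   public static int countSignificantChildren(Node n) {
      int count = 0;
      NodeList l = n.getChildNodes();
      for (int i=0; i<l.getLength(); i++) {
         Node c = l.item(i);
         boolean blank = c.getNodeType()==Node.TEXT_NODE
            && c.getNodeValue().trim().isEmpty();
         if (!blank) count++;
      }
      return count;
   }
   public static void main(String[] args) {
      String xml = "<?xml version=\"1.0\"?>\n"
         + "<user status=\"active\">\n"
         + " <!-- This is not a real user. -->\n"
         + " <first_name>John</first_name>\n"
         + " <last_name>Smith</last_name>\n"
         + "</user>\n";
      Element user = parse(xml).getDocumentElement();
      System.out.println(user.getChildNodes().getLength());  // prints 7
      System.out.println(countSignificantChildren(user));    // prints 3
      // Attribute value through the Element interface
      System.out.println(user.getAttribute("status"));       // prints active
      // Same value through the attribute node, as DOMBrowser does
      System.out.println(user.getAttributes()
         .getNamedItem("status").getNodeValue());            // prints active
   }
}
```

Filtering in the application code keeps the parser in its default configuration. The 3 remaining children are the comment and the two elements.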

Last update: 2006.

Dr. Herong Yang, updated in 2008