JDK Tutorials - Herong's Tutorial Notes
Dr. Herong Yang, Version 4.32, 2006

Simple API for XML (SAX)

(Continued from previous part...)

My SAX Based XML Browser

Let's build a simple SAX based XML browser by handling the events in the ContentHandler interface:

/**
 * SAXBrowser.java
 * Copyright (c) 2002 by Dr. Herong Yang
 */
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
class SAXBrowser {
   public static void main(String[] args) {
      // The XML file name is required as the first argument
      if (args.length < 1) {
         System.out.println("Usage: java SAXBrowser xml_file");
         return;
      }
      try {
         File x = new File(args[0]);
         SAXParserFactory f = SAXParserFactory.newInstance();
         SAXParser p = f.newSAXParser();
         DefaultHandler h = new MyContentHandler();
         p.parse(x,h);
      } catch (ParserConfigurationException e) {
         System.out.println(e.toString());
      } catch (SAXException e) {
         System.out.println(e.toString());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   private static class MyContentHandler extends DefaultHandler {
      // Indentation prefix: one "_" per nesting level, kept per handler
      private String p = "_";
      public void startDocument() throws SAXException {
         System.out.println("Starting document...");
      }
      public void endDocument() throws SAXException {
         System.out.println("Ending document...");
      }
      public void startElement(String ns, String sName, String qName,
         Attributes attrs) throws SAXException {
         String eName = sName;
         if (sName.equals("")) eName = qName;
         System.out.println("e"+p+eName);
         if (attrs!=null) {
            for (int i=0; i<attrs.getLength(); i++) {
               String aName = attrs.getLocalName(i);
               if (aName.equals("")) aName = attrs.getQName(i);
               System.out.println("a"+p+" "+aName+"="
                  +attrs.getValue(i));
            }
         }
         p = p + "_";
      }
      public void endElement(String ns, String sName, String qName)
         throws SAXException {
         p = p.replaceFirst("__", "_");
      }
      public void characters(char buf[], int offset, int len)
         throws SAXException {
         String s = new String(buf, offset, len);
         System.out.println("c"+p+s);
      }
      public void ignorableWhitespace(char buf[], int offset, int len)
         throws SAXException {
         String s = new String(buf, offset, len);
         System.out.println("i"+p+s);
      }
   }
}

Note that:

  • I cheated a little bit. Instead of implementing the ContentHandler interface directly, I extended the DefaultHandler class, which implements the handling methods for all events as methods that do nothing. This way, I only need to override the handling methods that I am interested in.
  • The "_" character is used to indent sub-elements in nested elements: one more "_" is added to the prefix for each level of nesting.
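
The two notes above can be sketched in a smaller, separate program. This is only an illustration, not part of SAXBrowser: the class name ElementOutline and the method outline() are hypothetical names made up for this sketch. It overrides just startElement() and endElement(), leaves every other event to DefaultHandler, and uses an integer depth counter instead of a prefix string to build the "_" indentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original tutorial code.
public class ElementOutline extends DefaultHandler {
   private int depth = 0;                                  // current nesting level
   private final List<String> lines = new ArrayList<String>();

   // Only two events are overridden; DefaultHandler ignores the rest.
   @Override
   public void startElement(String ns, String sName, String qName,
      Attributes attrs) {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i <= depth; i++) sb.append('_');     // depth+1 underscores
      lines.add(sb.append(qName).toString());
      depth++;
   }

   @Override
   public void endElement(String ns, String sName, String qName) {
      depth--;
   }

   // Parses an XML string and returns one indented line per element.
   public static List<String> outline(String xml) throws Exception {
      ElementOutline h = new ElementOutline();
      SAXParserFactory.newInstance().newSAXParser()
         .parse(new InputSource(new StringReader(xml)), h);
      return h.lines;
   }

   public static void main(String[] args) throws Exception {
      String xml = "<user status=\"active\"><first_name>John</first_name>"
         + "<last_name>Smith</last_name></user>";
      for (String s : outline(xml)) System.out.println(s);
   }
}
```

The sketch assumes the default, non-namespace-aware parser factory, so it reads the element name from qName only.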

Let's try this with hello.xml:

<?xml version="1.0"?>
<p>Hello world!</p>

Running java SAXBrowser hello.xml, I got:

Starting document...
e_p
c__Hello world!
Ending document...

Excellent. The program seems to be working.

Let's try it with user.xml:

<?xml version="1.0"?>
<user status="active">
 <!-- This is not a real user. -->
 <first_name>John</first_name>
 <last_name>Smith</last_name>
</user>

Running java SAXBrowser user.xml, I got:

Starting document...
e_user
a_ status=active
c__
c__

c__
c__
c__

c__
e__first_name
c___John
c__
c__

c__
e__last_name
c___Smith
c__
c__

Ending document...

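The program still works. But why did the parser fire so many "characters()" events? It looks like the parser didn't group the space, line feed, and carriage return characters between elements into a single char[] and fire one "characters()" event. It fired multiple events, apparently one per character. This is allowed: the ContentHandler documentation says that a parser may return contiguous character data in a single chunk or split it into several chunks, so a handler should not assume one "characters()" call per run of text.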
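
One common way to cope with split "characters()" events is to append every chunk to a buffer and look at the buffer only when the next element starts or ends. The following is a minimal sketch of that idea, not a change to SAXBrowser: the class name TextCoalescer and the method collectText() are hypothetical, and trimming the text and dropping whitespace-only runs is my own simplification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original tutorial code.
public class TextCoalescer extends DefaultHandler {
   private final StringBuilder buf = new StringBuilder();
   private final List<String> runs = new ArrayList<String>();

   // Collect every chunk; one run of text may arrive in several calls.
   @Override
   public void characters(char[] ch, int offset, int len) {
      buf.append(ch, offset, len);
   }

   // An element boundary ends the current run of text.
   @Override
   public void startElement(String ns, String sName, String qName,
      Attributes attrs) {
      flush();
   }

   @Override
   public void endElement(String ns, String sName, String qName) {
      flush();
   }

   // Assumption for this sketch: trim the run and skip it if nothing is left.
   private void flush() {
      String s = buf.toString().trim();
      if (s.length() > 0) runs.add(s);
      buf.setLength(0);
   }

   // Parses an XML string and returns the non-blank text runs in order.
   public static List<String> collectText(String xml) throws Exception {
      SAXParser p = SAXParserFactory.newInstance().newSAXParser();
      TextCoalescer h = new TextCoalescer();
      p.parse(new InputSource(new StringReader(xml)), h);
      return h.runs;
   }

   public static void main(String[] args) throws Exception {
      String xml = "<user status=\"active\">\n"
         + " <!-- This is not a real user. -->\n"
         + " <first_name>John</first_name>\n"
         + " <last_name>Smith</last_name>\n"
         + "</user>";
      for (String s : collectText(xml)) System.out.println("c " + s);
   }
}
```

With this approach the number of "characters()" calls no longer matters; for the user.xml content above, only the John and Smith text runs are reported. A browser that needs to preserve whitespace would skip the trim() step.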

Source: Herong's Notes on XML.
