The Merits of SAX for Parsing XML Documents
by Murali Kashaboina, Senior Software Engineer, Novell
Date Created: 2001-06-11 14:05:00.000
  Introduction
  Design Model
  Utility class
  XMLReader class
  DataUpload class
  Conclusions
  References

Introduction
In the ever-expanding technological arena, XML has evolved into a standard markup language for many kinds of applications. These include Internet communications, business-to-business data exchange, offloading and reloading database contents, electronic commerce, and a variety of scientific applications, and the number of XML applications continues to grow.

An XML document holds data in a structured form that must conform to the syntax rules defined by the XML standard. Every XML-based application has to access the information in an XML document at some point in its life cycle. To do so, it needs a software mechanism that can read and interpret the data while checking it against those syntax rules. This mechanism must interpret the document precisely and accurately, because an XML document may contain many kinds of information: processing instructions, a document type definition (DTD), namespaces, entity definitions, element and attribute definitions, a character encoding declaration, comments, and element values.

As a developer, you may wonder whether you first have to write special programs just to read and interpret XML documents before you can build XML-based applications. The answer is a solid "no." Software components known as XML parsers were introduced specifically to relieve developers of the job of parsing XML documents. An XML parser gives programs access to the contents of an XML document while shielding the programmer from the intricacies of XML syntax.

There are two main types of parser APIs: the Simple API for XML (SAX) and the Document Object Model (DOM) API. Both were created to provide access to the information contained in an XML document, but they work in quite different ways.

A DOM parser builds a hierarchical tree object in memory that represents the entire XML document. The tree contains nodes in the same hierarchy and order as the actual XML document. The data is accessed by navigating these nodes programmatically, much as a tree model is navigated in Swing. This type of access suits situations in which the structure and order of the whole document matter.

Consider, for example, exchanging documents between two different word processors. Here, information such as paragraph properties, styles, formats, page layout, fonts, and indentation must be interpreted in the right order. If some data is read out of order, the document may not render correctly, or at all.

The major limitation of this approach is that the entire document is turned into an object in memory. What happens if the XML document contains huge amounts of data, or if there is no need to keep track of the data sequence at all? In such cases, the object that represents the document consumes a great deal of memory, which is usually undesirable and hard to justify. An alternative is needed for situations like these, and that is exactly where SAX comes into the picture.
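To make the tree-based model concrete, here is a minimal sketch of DOM access. It uses the JAXP DocumentBuilder API that ships with the JDK, not the Xerces-specific classes mentioned later. The class name DomSketch and the sample data are illustrative assumptions. The whole document is parsed into an in-memory tree first, and only then is the tree searched for LASTNAME elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {
    // Parses the entire document into a tree, then navigates the tree
    // to collect the text of every <LASTNAME> element in document order.
    public static String[] lastNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("LASTNAME");
        String[] names = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            names[i] = nodes.item(i).getTextContent();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><user><LASTNAME>Smith</LASTNAME></user>"
                   + "<user><LASTNAME>Jones</LASTNAME></user></root>";
        String[] names = lastNames(xml);
        System.out.println(names.length + " " + names[0] + " " + names[1]);
    }
}
```

Even though only the LASTNAME values are needed, the whole tree stays in memory until the Document is discarded. That is the cost the SAX approach avoids.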

SAX, the Simple API for XML, is, as its name suggests, simpler and lighter than DOM because it does not create a tree object in memory. SAX is an event-based API rather than an object-based API like DOM. As the SAX parser reads the XML document and interprets its syntax, it generates lexical events that describe what it has just read. It can generate the following lexical events:

  • Document start
  • Processing instruction encountered
  • Comment encountered (reported only in SAX 2.0, through the optional LexicalHandler extension)
  • Element start
  • Text encountered
  • Element end
  • Document end
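The sequence of events can be observed with a small handler that simply records each callback. This sketch is an illustration, not the article's code. It uses the SAX 2.0 class org.xml.sax.helpers.DefaultHandler through JAXP, rather than the SAX 1.0 HandlerBase used later in this article. The class name SaxEventTrace and the sample document are hypothetical. DefaultHandler does not receive comment events, so none are recorded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace extends DefaultHandler {
    // Each callback appends a short label, so the list records the order
    // in which the parser fired its events.
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void processingInstruction(String target, String data) {
        events.add("pi:" + target);
    }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("text:" + text);  // skip pure white space
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    public static List<String> trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<?xml version=\"1.0\"?><?report draft?><user><SSN>123</SSN></user>"));
    }
}
```

For the sample in main, the recorded sequence is startDocument, pi:report, start:user, start:SSN, text:123, end:SSN, end:user, endDocument. Note that the XML declaration itself is not reported as a processing instruction.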


By intercepting these events, an application can build custom objects on the fly to process the XML content. The SAX API provides a set of interfaces and classes, known as "handlers," for handling the events generated while an XML document is parsed. Each handler declares a set of callback methods that the SAX parser invokes as it fires the corresponding lexical events. To receive these callbacks, your class implements the handler interfaces (or extends the handler base class that implements them) and is registered with the SAX parser. The following table lists the available handlers and their methods, and briefly explains what each method is intended for.



These classes can be found in Apache's xerces.jar file. For more information on the SAX API, please refer to the API documentation at the following Web site:

http://xml.apache.org/xerces-j/

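Apart from the interfaces listed above, the SAX API provides a default implementation of these interfaces in the form of the HandlerBase class. Its methods do nothing, except fatalError, which throws the exception it receives. Application writers can extend this class and override only the methods they need. (In SAX 2.0, HandlerBase is deprecated in favor of org.xml.sax.helpers.DefaultHandler.)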
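The following sketch shows this adapter idea. A handler that counts elements by name overrides only startElement and inherits empty implementations of every other callback. It uses DefaultHandler, the SAX 2.0 counterpart of HandlerBase. The class name ElementCounter is illustrative.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Overrides a single callback; all other events fall through to the
// empty implementations inherited from DefaultHandler.
public class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);  // increment the count for this element name
    }

    public static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter counter = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), counter);
        return counter.counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<root><user/><user/></root>").get("user"));
    }
}
```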

The following example illustrates how the SAX API can be used in a practical application. It takes a builder-pattern approach, creating an XML element on the fly as the SAX parser reads data from the XML document.

Example Problem Statement
The example problem describes a case in which certain user information, contained in an XML document, needs to be uploaded into a database. Each user's information is represented as a user element in the XML document, and this element can contain multiple child elements. The validity rules for each element are defined in a separate DTD file. The same structure is mirrored by the user table in the database.

The example problem also requires that each user element be validated against the DTD before the user's data is uploaded into the database. If a user's data does not comply with the DTD, an error should be logged to an error file.

The sample XML document, DTD, and user table structure are as follows:

XML document



<?xml version="1.0"?>
<root>
    <user>
        <SSN></SSN>
        <LASTNAME></LASTNAME>
        <FIRSTNAME></FIRSTNAME>
        <MI></MI>
        <DATEOFBIRTH></DATEOFBIRTH>
        <GENDER></GENDER>
        <STREET></STREET>
        <CITY></CITY>
        <STATE></STATE>
        <ZIP></ZIP>
        <COUNTRY></COUNTRY>
        <PHONE></PHONE>
        <FAX></FAX>
        <EMAIL></EMAIL>
        <JOBTITLE></JOBTITLE>
    </user>
</root>



DTD


<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT user ( SSN, LASTNAME, FIRSTNAME, MI?,
                 DATEOFBIRTH?, GENDER?, STREET?, CITY?, STATE?,
                 ZIP?, COUNTRY?, PHONE?, FAX?, EMAIL?, JOBTITLE? )>
<!ELEMENT SSN (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT DATEOFBIRTH (#PCDATA)>
<!ELEMENT GENDER (#PCDATA)>
<!ELEMENT STREET (#PCDATA)>
<!ELEMENT CITY (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ELEMENT COUNTRY (#PCDATA)>
<!ELEMENT PHONE (#PCDATA)>
<!ELEMENT FAX (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT JOBTITLE (#PCDATA)>



User Table Structure

Design Model
An object-oriented approach was taken in designing the example program for several reasons:
  • to avoid redundant code;
  • to reuse existing functionality;
  • to confine each set of related functions to a separate object;
  • to keep the code clear and simple so that it is easy to understand.

The class diagram showing the member variables and methods is as follows:



There are three important classes defined in the model: Utility, XMLReader, and DataUpload. The class we are most interested in is DataUpload, because it is the class that actually implements the example problem. The following sections explain these classes in detail.
Utility class
The Utility class contains static implementations of common database and XML-related methods. The most important methods in this class are as follows:

  • getDTDString: reads the specified DTD file and wraps its declarations in a DOCTYPE declaration for the specified root element name, producing an internal DTD in string form.
  • getInputSource: creates an InputSource object by combining an XML declaration, the internal DTD string, and the specified string containing the XML document body.
  • getInsertString: creates a SQL INSERT statement for the specified table name from the column names and values in a hash table.
  • getUpdateString: creates a SQL UPDATE statement for the specified table name from the column names and values in a hash table.
  • getID: executes a SQL query and returns the first column of the first row (the row ID) as a Number object, or null if no row is found.
  • loadData: executes a SQL statement that either inserts or updates data in the database.
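The way getDTDString and getInputSource combine a DTD file with a fragment of XML can be sketched with plain strings. This is an illustration under stated assumptions, not the article's code. The class and method names are hypothetical. The DTD declarations are passed in directly instead of being read from a file. A StringReader replaces the article's round trip through a UTF-8 byte array.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InternalDtdSketch {
    // Wraps DTD declarations in a DOCTYPE for the given root element,
    // as getDTDString does after dropping lines that start with "<?".
    static String internalDtd(String rootName, String declarations) {
        return "<!DOCTYPE " + rootName + " [\n" + declarations + "\n]>\n";
    }

    // XML declaration + internal DTD + body = a self-contained document.
    static String standaloneDocument(String dtd, String body) {
        return "<?xml version=\"1.0\"?>\n" + dtd + body;
    }

    // Reading characters directly avoids encoding the string to bytes first.
    static InputSource toInputSource(String document) {
        return new InputSource(new StringReader(document));
    }

    public static void main(String[] args) throws Exception {
        String dtd = internalDtd("user", "<!ELEMENT user (SSN)>\n<!ELEMENT SSN (#PCDATA)>");
        String doc = standaloneDocument(dtd, "<user><SSN>123</SSN></user>");
        System.out.println(doc);
        // Well-formedness check: throws SAXParseException if the pieces do not fit together.
        SAXParserFactory.newInstance().newSAXParser().parse(toInputSource(doc), new DefaultHandler());
    }
}
```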


The complete definition of the Utility class is as follows:


import java.io.*;
import java.util.*;
import java.sql.*;
import org.xml.sax.*;
import javax.xml.parsers.*;

public class Utility
{
    // Builds an in-memory XML document from an XML declaration, the
    // internal DTD string, and the XML document body, and returns it as
    // an InputSource that a SAX parser can read.
    public static InputSource getInputSource( String pstrXmlDocBody, String pstrDTDContent )
    {
        InputSource mInputSource = null;

        try
        {
            ByteArrayOutputStream mByteArrayOutputStream = new ByteArrayOutputStream();
            OutputStreamWriter mOutputStreamWriter = new OutputStreamWriter( mByteArrayOutputStream, "UTF-8" );
            mOutputStreamWriter.write( "<?xml version=\"1.0\"?>\r\n" );
            mOutputStreamWriter.write( pstrDTDContent );
            mOutputStreamWriter.write( pstrXmlDocBody );
            mOutputStreamWriter.flush();
            byte[] mbytes = mByteArrayOutputStream.toByteArray();
            mOutputStreamWriter.close();
            mByteArrayOutputStream.close();
            ByteArrayInputStream mByteArrayInputStream = new ByteArrayInputStream( mbytes );
            mInputSource = new InputSource( mByteArrayInputStream );
        }
        catch( Exception e )
        {
            System.err.println( e );
            e.printStackTrace();
        }

        return mInputSource;
    }

    // Reads a DTD file and wraps its declarations in a DOCTYPE declaration
    // for the given root element, producing an internal DTD string.
    public static String getDTDString( File pFile, String pstrRootName )
    {
        StringBuffer mStringBuffer = new StringBuffer();
        mStringBuffer.append( "<!DOCTYPE " );
        mStringBuffer.append( pstrRootName );
        mStringBuffer.append( " [ \n" );

        try
        {
            FileInputStream mFileInputStream = new FileInputStream( pFile );
            BufferedReader mBufferedReader = new BufferedReader( new InputStreamReader( mFileInputStream ) );
            String mstrValue = null;
            while( ( mstrValue = mBufferedReader.readLine() ) != null )
            {
                mstrValue = mstrValue.trim();
                // Skip lines starting with "<?", such as the DTD file's own
                // "<?xml ...?>" declaration, which is not allowed inside an
                // internal DTD subset.
                if( !mstrValue.startsWith( "<?" ) )
                {
                    mStringBuffer.append( mstrValue );
                    mStringBuffer.append( "\n" );
                }
            }

            mBufferedReader.close();
            mBufferedReader = null;
            mFileInputStream.close();
            mFileInputStream = null;
        }
        catch( Exception pException )
        {
            System.out.println( "Error in getDTDString Method" );
            pException.printStackTrace();
        }

        mStringBuffer.append( "]> \n" );
        return mStringBuffer.toString();
    }

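    // Returns true if pstrValue parses as an int; the parsed value is
    // passed back to the caller through pintValue[0].
    public static boolean isNumeric( String pstrValue, int[] pintValue )
    {
        try
        {
            pintValue[0] = Integer.valueOf( pstrValue ).intValue();
        }
        catch( NumberFormatException pNumberException )
        {
            return false;
        }

        return true;
    }
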
    // Runs the query and returns the first column of the first row as a
    // Number (the row ID), or null if the query returns no rows.
    public static synchronized Number getID( Connection pConnection, String pstrSQLQuery ) throws Exception
    {
        Statement mStatement = null;
        ResultSet mResultSet = null;
        Number mID = null;

        try
        {
            mStatement = pConnection.createStatement();
            mResultSet = mStatement.executeQuery( pstrSQLQuery );
            if( mResultSet.next() )
            {
                mID = ( Number ) mResultSet.getObject( 1 );
            }
        }
        catch( SQLException pSQLException )
        {
            System.out.println( "Error in getID Method " );
            pSQLException.printStackTrace();
            throw new Exception( pSQLException.getMessage() );
        }
        finally
        {
            try
            {
                if( mResultSet != null )
                {
                    mResultSet.close();
                    mResultSet = null;
                }

                if( mStatement != null )
                {
                    mStatement.close();
                    mStatement = null;
                }
            }
            catch( SQLException pSQLException )
            {
            }
        }

        return mID;
    }

    // Executes an INSERT or UPDATE statement on the given connection.
    public static synchronized void loadData( Connection pConnection, String pstrSQLQuery ) throws Exception
    {
        Statement mStatement = null;

        try
        {
            mStatement = pConnection.createStatement();
            mStatement.execute( pstrSQLQuery );
        }
        catch( SQLException pSQLException )
        {
            System.out.println( "Error in loadData Method " );
            pSQLException.printStackTrace();
            throw new Exception( pSQLException.getMessage() );
        }
        finally
        {
            try
            {
                if( mStatement != null )
                {
                    mStatement.close();
                    mStatement = null;
                }
            }
            catch( SQLException pSQLException )
            {
            }
        }
    }

    // Returns true if the query returns at least one row.
    public static synchronized boolean isRecordExist( Connection pConnection, String pstrSQLQuery )
    {
        Statement mStatement = null;
        ResultSet mResultSet = null;
        boolean mblnRecordExists = false;

        try
        {
            mStatement = pConnection.createStatement();
            mResultSet = mStatement.executeQuery( pstrSQLQuery );
            mblnRecordExists = mResultSet.next();
            return mblnRecordExists;
        }
        catch( SQLException pSQLException )
        {
            System.out.println( "Error in isRecordExist Method " );
            pSQLException.printStackTrace();
        }
        finally
        {
            try
            {
                if( mResultSet != null )
                {
                    mResultSet.close();
                    mResultSet = null;
                }

                if( mStatement != null )
                {
                    mStatement.close();
                    mStatement = null;
                }
            }
            catch( Exception pException )
            {
                System.out.println( "Error in isRecordExist Method finally block " );
                pException.printStackTrace();
            }
        }

        return mblnRecordExists;
    }

    // Builds "INSERT INTO table( col, ... ) VALUES( 'value', ... )" from
    // the column names (keys) and values in the hash table.
    public static String getInsertString( Hashtable pHashtable, String pstrTableName )
    {
        String mstrInsertString = "INSERT INTO " + pstrTableName + "( ";
        String mstrValuesString = "VALUES( ";
        String mstrKey = "";
        String mstrValue = "";
        boolean mblnFirstEntry = true;
        Enumeration mEnumKeys = pHashtable.keys();

        while( mEnumKeys.hasMoreElements() )
        {
            mstrKey = ( String ) mEnumKeys.nextElement();
            // Double any single quotes so that a value such as O'Brien does
            // not terminate the SQL string literal early.
            mstrValue = ( ( String ) pHashtable.get( mstrKey ) ).replace( "'", "''" );

            if( mblnFirstEntry )
            {
                mstrInsertString = mstrInsertString + mstrKey;
                mstrValuesString = mstrValuesString + "'" + mstrValue + "'";
                mblnFirstEntry = false;
            }
            else
            {
                mstrInsertString = mstrInsertString + "," + mstrKey;
                mstrValuesString = mstrValuesString + "," + "'" + mstrValue + "'";
            }
        }

        mstrInsertString = mstrInsertString + ") \n";
        mstrValuesString = mstrValuesString + ") \n";
        mstrInsertString = mstrInsertString + mstrValuesString;
        return mstrInsertString;
    }

    // Builds "UPDATE table SET col = 'value', ..." from the hash table;
    // the caller appends the WHERE clause.
    public static String getUpdateString( Hashtable pHashtable, String pstrTableName )
    {
        String mstrUpdateString = "UPDATE " + pstrTableName + " SET ";
        String mstrKey = "";
        String mstrValue = "";
        boolean mblnFirstEntry = true;
        Enumeration mEnumKeys = pHashtable.keys();

        while( mEnumKeys.hasMoreElements() )
        {
            mstrKey = ( String ) mEnumKeys.nextElement();
            // Double any single quotes so the value stays inside its literal.
            mstrValue = ( ( String ) pHashtable.get( mstrKey ) ).replace( "'", "''" );

            if( mblnFirstEntry )
            {
                mstrUpdateString = mstrUpdateString + mstrKey + " = '" + mstrValue + "' ";
                mblnFirstEntry = false;
            }
            else
            {
                mstrUpdateString = mstrUpdateString + "," + mstrKey + " = '" + mstrValue + "' ";
            }
        }

        return mstrUpdateString;
    }
}
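getInsertString and getUpdateString build SQL by splicing values into string literals. That breaks on values containing a quote unless they are escaped. It also leaves the statements open to SQL injection if the XML comes from an untrusted source. A common alternative is to bind values through java.sql.PreparedStatement placeholders.

The sketch below is hypothetical: the class and method names are not part of the article. It assumes that column names come from trusted element names already checked against the DTD, because identifiers cannot be bound as parameters.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParameterizedSql {
    // Builds "INSERT INTO t (c1, c2) VALUES (?, ?)". Column names are copied
    // into the SQL text, so they must come from a trusted source.
    static String insertSql(String table, Map<String, String> row) {
        List<String> marks = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) marks.add("?");
        return "INSERT INTO " + table + " (" + String.join(", ", row.keySet())
                + ") VALUES (" + String.join(", ", marks) + ")";
    }

    // Builds "UPDATE t SET c1 = ?, c2 = ? WHERE key = ?"; the last
    // placeholder is for the row ID.
    static String updateSql(String table, Map<String, String> row, String keyColumn) {
        List<String> sets = new ArrayList<>();
        for (String column : row.keySet()) sets.add(column + " = ?");
        return "UPDATE " + table + " SET " + String.join(", ", sets)
                + " WHERE " + keyColumn + " = ?";
    }

    // Prepares the INSERT and binds values in the map's iteration order,
    // which matches the column order above. The driver handles quoting,
    // so a value like "O'Brien" needs no manual escaping.
    static PreparedStatement prepareInsert(Connection con, String table,
                                           Map<String, String> row) throws SQLException {
        PreparedStatement ps = con.prepareStatement(insertSql(table, row));
        int index = 1;
        for (String value : row.values()) ps.setString(index++, value);
        return ps;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("SSN", "123");
        row.put("LASTNAME", "O'Brien");
        System.out.println(insertSql("USER", row));
        // INSERT INTO USER (SSN, LASTNAME) VALUES (?, ?)
        System.out.println(updateSql("USER", row, "UserID"));
        // UPDATE USER SET SSN = ?, LASTNAME = ? WHERE UserID = ?
    }
}
```

In the update case, the values would be bound the same way, followed by the row ID returned by getID in the final position.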
XMLReader class
The XMLReader class is a wrapper for the SAXParser and holds a SAXParser instance as a private member. Depending on its boolean parameter, the constructor creates either a validating or a non-validating SAXParser. A validating parser checks a given XML InputSource against a DTD while parsing it, whereas a non-validating parser only parses the InputSource without checking it against a DTD. The XMLReader class also holds a private HandlerBase instance, which serves as the default handler for SAX events while an XML document is parsed. The main methods of the XMLReader class are "parse" and "validate."

The parse method reads an XML InputSource. It delegates to the parse method of the SAXParser instance, passing the specified InputSource and HandlerBase as parameters. If the SAXParser throws an exception while reading the InputSource, the method returns false; if parsing succeeds, it returns true. A custom HandlerBase subclass can create an instance of XMLReader and pass itself as the handler when invoking this method, which registers it as the listener for the lexical events.

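The validate method parses a given XML InputSource and validates it against the DTD declared in the document that the InputSource represents. It simply calls the parse method, passing the specified InputSource and the default HandlerBase object. To use this method, a class must create a validating XMLReader.

Note that a validating SAXParser reports DTD violations as recoverable errors through the handler's error method, which HandlerBase implements as a no-op. The default handler must therefore override error to rethrow the exception; otherwise validate returns true even for an invalid document.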
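The effect of the handler's error method can be seen in the following sketch. It is an illustration under stated assumptions, not the article's XMLReader. It uses the SAX 2.0 DefaultHandler, whose error method, like HandlerBase's, does nothing by default. It also uses a cut-down internal DTD and hypothetical class and method names. With the default handler, an invalid document parses without an exception; with a handler whose error method rethrows, the same document is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdValidationSketch {
    // Rethrows validity errors so that they abort the parse.
    static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Parses with a validating parser; any exception means "not valid".
    static boolean isValid(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    static final String DTD =
          "<!DOCTYPE user [\n"
        + "<!ELEMENT user (SSN, LASTNAME)>\n"
        + "<!ELEMENT SSN (#PCDATA)>\n"
        + "<!ELEMENT LASTNAME (#PCDATA)>\n"
        + "]>\n";

    public static void main(String[] args) {
        String good = DTD + "<user><SSN>1</SSN><LASTNAME>Smith</LASTNAME></user>";
        String bad  = DTD + "<user><LASTNAME>Smith</LASTNAME></user>";  // SSN missing
        System.out.println(isValid(good, new StrictHandler()));   // true
        System.out.println(isValid(bad, new StrictHandler()));    // false
        System.out.println(isValid(bad, new DefaultHandler()));   // true: error() ignored
    }
}
```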

The complete code for the XMLReader class is as follows:




import org.xml.sax.*;
import javax.xml.parsers.*;
import java.io.*;

public class XMLReader
{
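    public XMLReader( boolean mblnValidating )
    {
        // HandlerBase.error() silently ignores recoverable errors, which is
        // how a validating parser reports DTD violations. Override it so
        // that validate() really fails on invalid input.
        cDTDHandlerBase = new HandlerBase()
        {
            public void error( SAXParseException pSAXParseException ) throws SAXException
            {
                throw pSAXParseException;
            }
        };
        SAXParserFactory mfactory = SAXParserFactory.newInstance();
        mfactory.setValidating( mblnValidating );

        try
        {
            csaxParser = mfactory.newSAXParser();
        }
        catch( SAXException pSAXException )
        {
            System.out.println( "Error in XMLReader Constructor " );
            pSAXException.printStackTrace();
        }
        catch( ParserConfigurationException pParserConfigExcep )
        {
            System.out.println( "Error in XMLReader Constructor " );
            pParserConfigExcep.printStackTrace();
        }
    }
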
    // Parses the InputSource with the default handler (cDTDHandlerBase).
    public boolean validate( InputSource pInputSource )
    {
        return parse( pInputSource, cDTDHandlerBase );
    }

    // Parses the InputSource, sending SAX events to pHandlerBase.
    // Returns true on success and false if any exception is thrown.
    public boolean parse( InputSource pInputSource, HandlerBase pHandlerBase )
    {
        boolean mblnValue = false;

        try
        {
            csaxParser.parse( pInputSource, pHandlerBase );
            mblnValue = true;
            return mblnValue;
        }
        catch( SAXParseException pSAXParseException )
        {
            System.out.println( "Error in read InputSource Method " );
            pSAXParseException.printStackTrace();
        }
        catch( SAXException pSAXException )
        {
            System.out.println( "Error in read InputSource Method " );
            pSAXException.printStackTrace();
        }
        catch( IOException pIOException )
        {
            System.out.println( "Error in read InputSource Method " );
            pIOException.printStackTrace();
        }
        catch( Exception pRunTimeException )
        {
            System.out.println( "Error in read InputSource Method " );
            pRunTimeException.printStackTrace();
        }

        return mblnValue;
    }

    private SAXParser csaxParser = null;
    private HandlerBase cDTDHandlerBase = null;
}
DataUpload class
DataUpload is the main class of interest. It contains the actual implementation of the example problem and uses the services of both the XMLReader and Utility classes. DataUpload extends the HandlerBase class from the SAX API and overrides three of its methods: "startElement," "characters," and "endElement." Because these are the only events we are interested in, the other HandlerBase methods are not overridden. The other main methods of DataUpload are "initDataSource," "setDTDContent," "setErrorFile," "uploadFromXMLFile," and "writeToErrorFile." The class also holds a member instance of a validating XMLReader, named cXMLValidator, which is used to validate each user element against the DTD.

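The initDataSource method initializes a database connection to the specified data source. To conserve resources, only one connection is used for the entire process. The connection's auto-commit mode is turned off, so all inserts and updates are committed together when uploadFromXMLFile finishes parsing. Note that the JDBC-ODBC bridge driver (sun.jdbc.odbc.JdbcOdbcDriver) that this method loads was removed in Java SE 8; on current Java versions, a vendor-supplied JDBC driver and URL must be used instead.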

The setDTDContent method is used to specify the DTD that will be used by the DataUpload object to validate each element in the XML document before uploading the data.

The setErrorFile method specifies the file to which error messages are logged; the messages themselves are written by the writeToErrorFile method. In this example, elements in the XML document that do not comply with the specified DTD are discarded and logged to the error file.

uploadFromXMLFile is the main method of the DataUpload class; it starts the actual data-loading process. The logic behind this process is simple and involves the following steps:
  • Create an XML representation for each user element and validate it against the specified DTD.
  • If the validation is successful, determine if the user data already exists in the database. If it exists, update the database with the new data. If it does not exist, insert new user data in the database.
  • If the validation fails, log the user data and the error message to the error file.


The method first creates an InputSource from the specified XML file. It then creates a non-validating XMLReader and a StringBuffer. (The purpose of the StringBuffer becomes clear when we discuss the overridden HandlerBase methods.) Finally, it invokes XMLReader's parse method, passing the InputSource as the source of XML and the current DataUpload object as the lexical event handler. This starts the actual processing of each user element, as the SAXParser fires lexical events while parsing the XML InputSource. The implementation of this method is as follows:




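public void uploadFromXMLFile( File pFile )
{
    FileInputStream mFileInputStream = null;

    try
    {
        mFileInputStream = new FileInputStream( pFile );
        InputSource mInputSource = new InputSource( mFileInputStream );
        // A non-validating reader parses the whole file; each user element
        // is validated separately in endElement.
        XMLReader mXMLReader = new XMLReader( false );
        cStringBuffer = new StringBuffer();
        mXMLReader.parse( mInputSource, this );
        // Auto-commit is off, so all inserts and updates are committed here.
        cSqlConnection.commit();
    }
    catch( Exception pException )
    {
        System.err.println( "Error occurred while parsing the XML file! Check the error log for more details." );
        pException.printStackTrace();
    }
    finally
    {
        // Close the file whether or not parsing succeeded.
        try
        {
            if( mFileInputStream != null )
                mFileInputStream.close();
        }
        catch( IOException pIOException )
        {
        }
    }
}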



DataUpload overrides the startElement method it inherits from the HandlerBase class. Following is the implementation of the startElement method:


public void startElement( String pstrName, AttributeList pAttributeList ) throws SAXException
{
    String mstrElementName = pstrName.trim();

    if( mstrElementName.equalsIgnoreCase( "Root" ) )
        return;

    // A new user element: discard the previous user's XML and values.
    if( mstrElementName.equalsIgnoreCase( "User" ) )
    {
        int mintLen = cStringBuffer.length();
        cStringBuffer.delete( 0, mintLen );
        cUserHashtable.clear();
    }

    // Reset the current element value and append the start tag.
    cstrElementValue = "";
    cStringBuffer.append( "\n <" );
    cStringBuffer.append( mstrElementName );
    cStringBuffer.append( ">" );
}



The SAXParser invokes startElement whenever it encounters the start of an element in the XML document. The method first checks which element was encountered. If it is the root element, the method simply returns. If it is a user element, meaning that data for a new user is about to be read, the method clears both the StringBuffer and the Hashtable.

The StringBuffer, created in the uploadFromXMLFile method, is used to build a string representation of an XML document for each user element. The Hashtable temporarily stores the values of the user's child elements. For every element other than root, including the user element itself, the method resets the current element value and appends the element's start tag to the string buffer, as shown in the implementation.

The characters method is another HandlerBase method that DataUpload overrides. The SAXParser invokes it whenever it encounters text within an element in the XML document. The text is passed as a character array, together with the start index and length of the text within that array. Following is the implementation of the characters method in the DataUpload class:


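public void characters( char[] ch, int start, int length ) throws SAXException
{
    String mstrValue = new String( ch, start, length );
    mstrValue = mstrValue.trim();
    if( mstrValue.length() > 0 )
    {
        cstrElementValue = cstrElementValue + mstrValue;
        // The parser has already expanded entity references such as &amp;,
        // so re-escape '&' and '<' before adding the text to the XML string
        // that will be parsed again for validation.
        cStringBuffer.append( mstrValue.replace( "&", "&amp;" ).replace( "<", "&lt;" ) );
    }
}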



This method first creates a string from the encountered text and trims any surrounding white space. If anything remains, it appends the string to the StringBuffer and to the member string that holds the current element's value. The endElement method later stores that member string in the Hashtable.

The SAX parser is free to deliver the text of a single element in several chunks, for example at line breaks, at entity references, or at internal buffer boundaries. As a result, characters may be invoked several times for one element, which is why the method accumulates the text rather than assigning it. Because each chunk is trimmed separately, however, white space at a chunk boundary can be lost. Only when the SAXParser invokes endElement is it certain that the entire text within the element has been read.
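A more robust way to collect element text, sketched below, appends every chunk to a per-element buffer and trims only once, in endElement. This way, white space inside the value survives however the parser splits it. The sketch also shows re-escaping '&' and '<' before text is written back into XML, because the parser has already expanded entity references such as &amp;. It uses the SAX 2.0 DefaultHandler rather than HandlerBase, and the class name TextCollector is hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextCollector extends DefaultHandler {
    final Map<String, String> values = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                       // new element: start a fresh buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);          // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();   // trim once, after all chunks have arrived
        if (!value.isEmpty()) values.put(qName, value);
        text.setLength(0);
    }

    // Re-escapes markup characters before text is written back into XML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static Map<String, String> collect(String xml) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<user><JOBTITLE>R &amp; D Manager</JOBTITLE></user>"));
    }
}
```

For example, collecting "<user><JOBTITLE>R &amp; D Manager</JOBTITLE></user>" maps JOBTITLE to "R & D Manager", no matter how many characters calls the parser makes for that element.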

The endElement method is yet another HandlerBase method that DataUpload overrides. It is a very important method because it marks the end of an element: the SAXParser invokes it whenever it reaches the end of an element while parsing an XML document. Following is the implementation of this method in the DataUpload class:


public void endElement( String pstrName ) throws SAXException
{
    String mstrElementName = pstrName.trim();
    if( mstrElementName.equalsIgnoreCase( "Root" ) )
        return;

    cStringBuffer.append( "</" );
    cStringBuffer.append( mstrElementName );
    cStringBuffer.append( ">" );

    if( mstrElementName.equalsIgnoreCase( "User" ) )
    {
        // The buffer now holds one complete user element.
        String mstrXmlDocBody = cStringBuffer.toString();
        InputSource mInputSource = null;

        try
        {
            // Prepend the XML declaration and internal DTD, then validate.
            mInputSource = Utility.getInputSource( mstrXmlDocBody, cstrDTDContent );

            if( cXMLValidator.validate( mInputSource ) )
            {
                String mstrSSN = ( String ) cUserHashtable.get( "SSN" );
                mstrSSN = mstrSSN.trim();
                // getID reads the first column, so select the ID explicitly.
                String mstrQuery = "SELECT UserID FROM USER WHERE SSN = '" + mstrSSN + "'";
                Number mNumID = Utility.getID( cSqlConnection, mstrQuery );

                // No existing row: insert; otherwise update that row.
                if( mNumID == null )
                {
                    mstrQuery = Utility.getInsertString( cUserHashtable, "USER" );
                }
                else
                {
                    mstrQuery = Utility.getUpdateString( cUserHashtable, "USER" );
                    mstrQuery = mstrQuery + " WHERE UserID = " + mNumID.longValue();
                }

                Utility.loadData( cSqlConnection, mstrQuery );
            }
            else
            {
                throw new Exception( "Bad Data" );
            }
        }
        catch( Exception pException )
        {
            writeToErrorFile( "Bad Xml Data For User: " );
            writeToErrorFile( mstrXmlDocBody );
            writeToErrorFile( "Reason: " + pException.getMessage() );
        }
    }
    else
    {
        // A child element has ended: store its value under its name.
        cUserHashtable.put( mstrElementName.toUpperCase(), cstrElementValue );
    }
}



This method first checks which element has ended. If it is the root element, the SAXParser has parsed all of the elements in the XML document, and the method simply returns. Otherwise, it appends an end tag for the current element to the StringBuffer.

If the current element is a user element, the SAXParser has finished reading the information for one user. In that case, the method takes the XML string for the current user from the string buffer. It creates an InputSource from this XML body and the DTD content by invoking Utility's "getInputSource" method, and validates it by invoking validate on the validating XMLReader member. If the entry is valid, the method checks whether the user already exists in the database and either updates the existing row or inserts a new one. If the user data is invalid, the method logs an error message to the error file.

For any other element, the method stores the element's value in the Hashtable under the element's name in upper case.

The complete code for the DataUpload class is as follows:


import java.sql.*;
import java.io.*;
import java.util.*;
import org.xml.sax.*;

public class DataUpload extends HandlerBase
{
    public DataUpload()
    {
        cUserHashtable = new Hashtable();
        // Validating reader used to check each user element against the DTD.
        cXMLValidator = new XMLReader( true );
    }

    // Opens the single database connection used for the whole upload.
    // Auto-commit is off, so changes are committed in uploadFromXMLFile.
    public void initDataSource( String pstrDataSource, String pstrUserName, String pstrPassword ) throws Exception
    {
        String mstrDBUri = "jdbc:odbc:" + pstrDataSource;
        Class.forName( "sun.jdbc.odbc.JdbcOdbcDriver" );
        cSqlConnection = DriverManager.getConnection( mstrDBUri, pstrUserName, pstrPassword );
        cSqlConnection.setAutoCommit( false );
    }

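    public void uploadFromXMLFile( File pFile )
    {
        FileInputStream mFileInputStream = null;
        try
        {
            mFileInputStream = new FileInputStream( pFile );
            InputSource mInputSource = new InputSource( mFileInputStream );
            // Non-validating reader that streams SAX events to this handler
            XMLReader mXMLReader = new XMLReader( false );
            cStringBuffer = new StringBuffer();
            mXMLReader.parse( mInputSource, this );
            cSqlConnection.commit();
        }
        catch( Exception pException )
        {
            // Record the cause so that the error log actually contains the details
            writeToErrorFile( "Error while parsing " + pFile + ": " + pException );
            System.err.println( "Error occurred while parsing the XML file! Check the error log for more details." );
        }
        finally
        {
            try
            {
                if( mFileInputStream != null )
                    mFileInputStream.close();
            }
            catch( IOException pIOException )
            {
                // Ignore failures while closing the input file
            }
        }
    }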

    public void startElement( String pstrName, AttributeList pAttributeList ) throws SAXException
    {
        String mstrElementName = pstrName.trim();

        // The Root element only wraps the User records; there is nothing to record
        if( mstrElementName.equalsIgnoreCase( "Root" ) )
            return;

        // A new User record starts: reset the buffers that are reused for every user
        if( mstrElementName.equalsIgnoreCase( "User" ) )
        {
            cStringBuffer.setLength( 0 );
            cUserHashtable.clear();
        }

        // Start collecting this element's value and echo its start tag
        cstrElementValue = "";
        cStringBuffer.append( "\n <" );
        cStringBuffer.append( mstrElementName );
        cStringBuffer.append( ">" );
    }

    public void endElement( String pstrName ) throws SAXException
    {
        String mstrElementName = pstrName.trim();
        if( mstrElementName.equalsIgnoreCase( "Root" ) )
            return;

        // Echo the end tag into the buffer holding the current user's XML
        cStringBuffer.append( "</" );
        cStringBuffer.append( mstrElementName );
        cStringBuffer.append( ">" );

        if( mstrElementName.equalsIgnoreCase( "User" ) )
        {
            // One complete user record: validate it, then insert or update it
            String mstrXmlDocBody = cStringBuffer.toString();
            InputSource mInputSource = null;

            try
            {
                mInputSource = Utility.getInputSource( mstrXmlDocBody, cstrDTDContent );

                if( cXMLValidator.validate( mInputSource ) )
                {
                    String mstrSSN = ( String ) cUserHashtable.get( "SSN" );
                    mstrSSN = mstrSSN.trim();
                    String mstrQuery = "SELECT * FROM User WHERE SSN = '" + mstrSSN + "'";
                    Number mNumID = Utility.getID( cSqlConnection, mstrQuery );

                    if( mNumID == null )
                    {
                        // No existing row for this SSN: insert a new one
                        mstrQuery = Utility.getInsertString( cUserHashtable, "USER" );
                    }
                    else
                    {
                        // Existing row: update it by its UserID
                        mstrQuery = Utility.getUpdateString( cUserHashtable, "USER" );
                        mstrQuery = mstrQuery + " WHERE UserID = " + mNumID.longValue();
                    }

                    Utility.loadData( cSqlConnection, mstrQuery );
                }
                else
                {
                    throw new Exception( "Bad Data" );
                }
            }
            catch( Exception pException )
            {
                writeToErrorFile( "Bad Xml Data For User: " );
                writeToErrorFile( mstrXmlDocBody );
            }
        }
        else
        {
            // Leaf element: remember its value, keyed by the upper-cased element name
            cUserHashtable.put( mstrElementName.toUpperCase(), cstrElementValue );
        }
    }

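    public void characters( char[] ch, int start, int length ) throws SAXException
    {
        String mstrValue = new String( ch, start, length ).trim();
        // Ignore whitespace-only text between elements
        if( mstrValue.length() > 0 )
        {
            cstrElementValue = cstrElementValue + mstrValue;
            // The parser has already expanded entities such as &amp;, so escape the
            // text again before adding it to the XML that will be re-validated
            cStringBuffer.append( escapeXml( mstrValue ) );
        }
    }

    // Escapes the characters that cannot appear literally in XML character data
    private static String escapeXml( String pstrText )
    {
        StringBuffer mEscaped = new StringBuffer( pstrText.length() );
        for( int i = 0; i < pstrText.length(); i++ )
        {
            char c = pstrText.charAt( i );
            if( c == '&' )
                mEscaped.append( "&amp;" );
            else if( c == '<' )
                mEscaped.append( "&lt;" );
            else if( c == '>' )
                mEscaped.append( "&gt;" );
            else
                mEscaped.append( c );
        }
        return mEscaped.toString();
    }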

    public void setErrorFile( File pFile )
    {
        try
        {
            FileOutputStream mFileOutputStream = new FileOutputStream( pFile );
            cPrintWriter = new PrintWriter( mFileOutputStream );
        }
        catch( Exception pException )
        {
            System.out.println( "Error in setErrorFile Method " );
        }
    }

    public void setDTDContent( String pstrDTDContent )
    {
        cstrDTDContent = pstrDTDContent;
    }

    public void close( )
    {
        try
        {
            if( cPrintWriter != null )
            {
                cPrintWriter.flush();
                cPrintWriter.close();
            }

            // Close the connection even if no error file was set
            if( cSqlConnection != null )
            {
                cSqlConnection.close();
            }
        }
        catch( Exception pException )
        {
            System.out.println( "Error in Close Method " );
        }
    }

    private void writeToErrorFile( String pstrErrorMessage )
    {
        try
        {
            if( cPrintWriter != null && pstrErrorMessage != null )
            {
                cPrintWriter.println( pstrErrorMessage );
                cPrintWriter.flush();
            }
        }
        catch( Exception pException )
        {
            System.out.println( "Error in writeToErrorFile Method " );
        }
    }

    private PrintWriter cPrintWriter = null;
    private XMLReader cXMLValidator = null;
    private Connection cSqlConnection = null;
    private StringBuffer cStringBuffer = null;
    private String cstrDTDContent = "";
    private String cstrElementValue = "";
    private Hashtable cUserHashtable = null;
}
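One caveat about the characters() method in the listing above: SAX allows a parser to deliver a single element's text in several characters() calls (for example, split around an entity reference such as &amp;), so trimming each chunk separately can drop spaces inside a value. The sketch below, which uses the SAX 2 DefaultHandler rather than the article's HandlerBase, accumulates the raw text in a StringBuilder and trims it only when the element ends. The class name FieldCollector and its collect method are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each leaf element name (upper-cased) to its trimmed text.
public class FieldCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    final Map<String, String> fields = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim(); // trim once, after all chunks have arrived
        if (value.length() > 0) {
            fields.put(qName.toUpperCase(), value);
        }
        text.setLength(0);
    }

    static Map<String, String> collect(String xml) throws Exception {
        FieldCollector handler = new FieldCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<User><Name> John &amp; Sons </Name></User>"));
        // {NAME=John & Sons}
    }
}
```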



The most important thing to note from this entire process is how simple it is to parse XML documents with a SAXParser. Nowhere in the implementation did we create an object representing the entire XML document; at any moment, only the data for the current user is held in memory. Moreover, the SAXParser let us separate the business logic from the XML parsing logic and reuse a few long-lived objects: we used only one StringBuffer, one validating XMLReader, and one SQL Connection for the entire upload.
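The same record-at-a-time approach can be written with the SAX 2 API that later replaced HandlerBase. The sketch below is a hypothetical RecordStreamer (not part of the article's code) that hands each completed User record to a callback and then clears its reused buffers, so only one record is held in memory regardless of file size. The element names Root and User follow the example document; the java.util.function.Consumer callback is an illustrative stand-in for the database upload.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: streams <User> records to a callback one at a time.
public class RecordStreamer extends DefaultHandler {
    private final Consumer<Map<String, String>> sink;
    private final Map<String, String> record = new HashMap<>(); // reused for every User
    private final StringBuilder text = new StringBuilder();     // reused for every field

    RecordStreamer(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equalsIgnoreCase("User")) {
            record.clear(); // start a fresh record
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("User")) {
            // Business logic runs here, while the parser is still reading the file.
            // A copy is passed so the callback may keep it after the buffer is cleared.
            sink.accept(new HashMap<>(record));
        } else if (!qName.equalsIgnoreCase("Root")) {
            record.put(qName.toUpperCase(), text.toString().trim());
        }
        text.setLength(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root>"
                + "<User><Name>Ann</Name><SSN>111</SSN></User>"
                + "<User><Name>Bob</Name><SSN>222</SSN></User>"
                + "</Root>";
        List<String> ssns = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                       new RecordStreamer(r -> ssns.add(r.get("SSN"))));
        System.out.println(ssns); // [111, 222]
    }
}
```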

Handler implementations in real applications can be more complicated than the one illustrated here; how complicated depends on the business requirements and the context. In this example, the implementation was intentionally kept simple to emphasize the fundamentals of parsing XML documents using SAX.
conclusions
Using SAX can make XML parsing more efficient. As the example application shows, the SAX API offers the following merits:
  1. Since SAX is an event-based API, it offers more control over parsing XML documents by letting the developer handle each parsing event as the application requires.
  2. SAX does not build a hierarchical tree representation of the entire XML document in memory, so it makes economical use of resources, especially for large documents.
  3. SAX enables developers to create their own custom object models.
  4. Parsing XML documents using SAX can be very fast when the custom object model is not too complicated.
  5. SAX separates the business logic from the actual parser logic. This offers the additional advantage that the application can start business processes, such as uploading each user's record, while the SAXParser is still reading data from the XML document.
  6. A validating SAXParser can validate an XML document against a DTD.
  7. The SAX API offers a set of standard classes and interfaces that can be used to parse XML documents in custom applications.
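As a small illustration of the first merit (control over parsing), a SAX handler can stop the parser as soon as it has what it needs, whereas a DOM parser typically reads the whole document before returning the tree. The sketch below uses a common SAX idiom, not something the article's code does: the hypothetical SsnFinder handler throws a SAXException from endElement once the requested SSN is found, and records the match in a field so the caller can tell the early stop apart from a genuine parse error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: finds the User with a given SSN and stops parsing there.
public class SsnFinder extends DefaultHandler {
    private final String targetSsn;
    private final StringBuilder text = new StringBuilder();
    private int usersSeen = 0;  // number of <User> start tags seen so far
    private int matchedAt = -1; // 1-based position of the matching User, if any

    SsnFinder(String targetSsn) {
        this.targetSsn = targetSsn;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("User")) {
            usersSeen++;
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("SSN") && text.toString().trim().equals(targetSsn)) {
            matchedAt = usersSeen;
            // Throwing from a handler aborts the parse; no further events are delivered
            throw new SAXException("SSN found; stopping early");
        }
    }

    // Returns the 1-based position of the User with the given SSN, or -1 if absent.
    static int find(String xml, String ssn) throws Exception {
        SsnFinder finder = new SsnFinder(ssn);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), finder);
        } catch (SAXException e) {
            if (finder.matchedAt < 0) {
                throw e; // a genuine parse error, not the early stop
            }
        }
        return finder.matchedAt;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root>"
                + "<User><Name>Ann</Name><SSN>111</SSN></User>"
                + "<User><Name>Bob</Name><SSN>222</SSN></User>"
                + "<User><Name>Cy</Name><SSN>333</SSN></User>"
                + "</Root>";
        System.out.println(find(xml, "222")); // 2
        System.out.println(find(xml, "999")); // -1
    }
}
```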
references
James W. Cooper, Java Design Patterns (Addison-Wesley, 2000)
Elliotte Rusty Harold, XML Bible (IDG Books Worldwide, 1999)
Benoit Marchal, XML by Example (QUE, 1999)
Brett McLaughlin, Java and XML (O'Reilly & Associates, 2000)