Web Services
April 2003
DeveloperNet University Course

XML

Most people already have some understanding of XML, or at least some familiarity with it. This section gives an overview of what XML is and how it is used. XML, the eXtensible Markup Language, is the fundamental technology behind Web Services.

Simply put, XML represents data. XML is a markup language similar to HTML, although its primary purpose is very different. XML describes data and gives it a structured format, with the specific purpose of transferring that data to another entity.

Consider this example. Imagine trying to programmatically parse an HTML file. HTML tags are inconsistent: some tags have a closing tag, and some do not. Parsing an HTML file is difficult because you never have a consistent point of reference and cannot count on a particular structure of tags.

XML solves this problem. XML defines a syntax framework that must be followed strictly. This allows you to easily parse through the file and find the data that is needed.

XML is really a way of attaching meta-data to data. It is structured in tags similar to HTML, but unlike HTML it does not define a fixed set of tags. Instead, you give each XML tag a name that describes the data it surrounds. The following is an example of an XML file.


<employee>
  <name>Jeff Fischer</name>
  <address>888 Novell Place</address>
  <title>DNU Developer</title>
</employee>

In this file, I have defined an element called employee. The employee element has three child elements called name, address, and title, where I specify the data that I want to associate with the employee. Using this file as an example, it is easy to see how XML is used to transfer data. If I needed to transfer the data in this file to another department in my company, I could send them the file. Their application could easily parse the file and extract the data (Jeff Fischer, 888 Novell Place, and DNU Developer) for its own needs.
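
As a rough illustration of the receiving application, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The choice of Python and the helper name read_employee are assumptions made for this example; the course itself does not prescribe a language or parser.

```python
import xml.etree.ElementTree as ET

# The employee document from the example above, held in a string.
EMPLOYEE_XML = """
<employee>
  <name>Jeff Fischer</name>
  <address>888 Novell Place</address>
  <title>DNU Developer</title>
</employee>
"""

def read_employee(xml_text):
    """Return the child elements of the root as a dict of tag name -> text.

    read_employee is a hypothetical helper written for this sketch.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

employee = read_employee(EMPLOYEE_XML)
print(employee["name"])
print(employee["address"])
print(employee["title"])
```

Because the tag names describe the data, the receiving code can ask for each value by name instead of by position in the file.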

The XML recommendation does not define any tags. It is up to the document writer to choose the tags that fit the data that is going to be transferred. Since Jeff Fischer is an employee of Novell, it was easy for me to decide to use employee as the name of the tag, but I could just as easily have used person, worker, or even a non-descriptive name such as 'a'. I recommend that you choose tag names that describe the data. That will help you and others who try to understand the data you transfer.

What Does XML Do?

You may ask, what exactly does XML do? The answer is nothing, by itself. XML originates from Standard Generalized Markup Language, or SGML. The purpose of SGML is to define a structure for a document, not a look and feel for it. The purpose of XML is along the same line: to define the structure of a document and make it easy to transfer the data. To do something with XML, you must use another XML-based technology such as XSL, or a programming language such as Java. We'll discuss what you do with XML in a little bit.

A major difference between HTML and XML is that HTML defines how to present data, and XML defines the data. XML does not define how to present it. The architecture of XML clearly separates the data from the presentation. This provides many benefits to an enterprise that uses XML technologies. Businesses have long struggled to find people who are both good HTML developers and good back-end programmers to develop a web site. The XML architecture allows businesses to employ programmers to develop the data and user interface designers to develop the user interface.

XML Parsers

An XML parser is what its name suggests: a program or library that reads an XML document element by element. It makes working with an XML document easier because the parser already knows how to break the document into elements, so you do not have to define how to parse it by symbol, word, or other point of reference. Parsers are available for many languages and platforms, including Java, C++, PHP, and .NET. Parsers have been developed by Microsoft, IBM, and the Apache Group, to name a few.

XML APIs

In addition to parsers, standard APIs have been developed that specify how an application accesses an XML document, and the parsers implement these APIs. These standards give you a simple-to-use API for accessing parts of an XML document, including functions to create, edit, read, and extract data. The standard APIs for XML include DOM (Document Object Model), which loads the whole document into a tree in memory, and SAX (Simple API for XML), which reports the document to your code as a stream of events.
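
To make the difference between the two APIs concrete, here is a small sketch that reads the same value once with DOM and once with SAX. It uses Python's standard xml.dom.minidom and xml.sax modules. The sample document, the class OwnerHandler, and the two function names are assumptions invented for this illustration.

```python
import xml.dom.minidom
import xml.sax

# A small sample document (illustrative only).
BANK_XML = (
    "<bankaccount>"
    "<type>Checking</type>"
    "<owner>Jeff Fischer</owner>"
    "</bankaccount>"
)

def owner_with_dom(xml_text):
    """DOM: load the whole document into a tree, then navigate the tree."""
    document = xml.dom.minidom.parseString(xml_text)
    owner = document.getElementsByTagName("owner")[0]
    return owner.firstChild.data

class OwnerHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it reads each part of the document."""

    def __init__(self):
        super().__init__()
        self.in_owner = False
        self.owner = ""

    def startElement(self, name, attrs):
        # Remember whether the element that just opened is <owner>.
        self.in_owner = (name == "owner")

    def characters(self, content):
        # Collect text only while inside <owner>.
        if self.in_owner:
            self.owner += content

    def endElement(self, name):
        self.in_owner = False

def owner_with_sax(xml_text):
    handler = OwnerHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.owner

print(owner_with_dom(BANK_XML))
print(owner_with_sax(BANK_XML))
```

DOM is usually the more convenient choice for small documents that you want to navigate or edit. SAX never holds the whole document in memory, which matters for very large files.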

The Structure of an XML Document

An XML document has the following two parts:

  • Prolog

  • Body

XML Prolog

The prolog is the "header" of the document. It consists of the top few lines of XML code and must come before the body. The prolog contains instructions to the XML parser, written between question marks in a format like this: <?...?>. The most common one is the XML declaration, which states the XML version being used. (Strictly speaking, the XML recommendation does not classify the XML declaration as a processing instruction, although it uses the same syntax.) Here is an example of a typical declaration that appears at the top of an XML file.


<?xml version="1.0"?>

This example is the simplest form of the declaration. In addition to the XML version, the declaration can include other attributes, such as the character encoding scheme and whether or not the document is standalone.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>

XML Body

The body of an XML document contains the data of the XML document. Data is enclosed in tags, just like in HTML or other markup languages. Each pair of tags, together with its content, is called an element, and its name is created or "invented" by the document writer. Elements are the most important part of the document because they hold the data of the document. An element can also have attributes that specify additional information about the data inside the tags.

The most important element of the document is the root element. It is the central element of the document and every XML document must contain a root element. The tags of the root element surround the entire document, similar to the <HTML> tag of an HTML document. The root element represents the entity of the XML document you are creating. For example, if you are creating an XML document that would include data about a bank account, then you might want to call the root element <bankaccount>.


<bankaccount>
<type>Checking</type>
<accountnumber>4568218</accountnumber>
<owner>Jeff Fischer</owner>
</bankaccount>
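
The same document can also be created from code instead of being typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name build_bank_account is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def build_bank_account(account_type, number, owner):
    """Build the <bankaccount> document: one root element with three children.

    build_bank_account is a hypothetical helper written for this sketch.
    """
    root = ET.Element("bankaccount")
    ET.SubElement(root, "type").text = account_type
    ET.SubElement(root, "accountnumber").text = number
    ET.SubElement(root, "owner").text = owner
    return root

account = build_bank_account("Checking", "4568218", "Jeff Fischer")

# Serialize the tree to a string; encoding="unicode" returns str, not bytes.
xml_text = ET.tostring(account, encoding="unicode")
print(xml_text)
```

Everything hangs off the single root element, which is why the root is created first and every other element is added beneath it.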

Syntax Rules of XML

XML defines a strict syntax. As a document writer, you need to follow all of the rules for XML syntax. The following list outlines several XML syntax rules.

  • Quotation marks must surround all attribute values. For example, the value of the attribute name for the element bankaccount must be quoted, as in <bankaccount name="checking">.

  • Every tag must have a beginning and an end tag. For example, a tag named <bankaccount> must have a closing tag to match it, such as </bankaccount>. An element with no content can be written as a single self-closing tag, such as <bankaccount/>.

  • Elements are case sensitive. Element <BankAccount> is different from element <bankaccount>.

  • Tag names must begin with a letter or an underscore.

  • Only a few punctuation marks are allowed inside a tag name: the hyphen, the underscore, and the period. The colon is reserved in XML for namespaces.

  • Tags cannot overlap. This example is not allowed in XML.


<bankaccount><bank>Jeff Fischer</bankaccount></bank>

    If you close the most recently opened tag first, it works.

<bankaccount><bank>Jeff Fischer</bank></bankaccount>

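  • White space inside an element is preserved in XML and passed on to the application. This is different from HTML, which collapses runs of white space when it displays a page.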

  • Comments in XML are written with <!-- Comment. -->


<!--  This is a comment in XML.  --> 
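
Several of the rules above can be checked by handing small fragments to a parser and seeing whether it accepts them. The sketch below uses Python's standard xml.etree.ElementTree module. The helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports a syntax error.

    is_well_formed is a hypothetical helper written for this sketch.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags are accepted.
nested_ok = is_well_formed("<bankaccount><bank>Jeff Fischer</bank></bankaccount>")

# Overlapping tags are rejected.
overlap_ok = is_well_formed("<bankaccount><bank>Jeff Fischer</bankaccount></bank>")

# An attribute value without quotation marks is rejected.
unquoted_ok = is_well_formed("<bankaccount name=checking></bankaccount>")

# Element names are case sensitive, so these two tags do not match.
case_ok = is_well_formed("<BankAccount></bankaccount>")

# White space inside an element is handed to the application unchanged.
owner_text = ET.fromstring("<owner>  Jeff   Fischer  </owner>").text

print(nested_ok, overlap_ok, unquoted_ok, case_ok)
print(repr(owner_text))
```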

Sample XML Document

Here is a sample XML document representing an email:


<?xml version="1.0"?>
<email>
<to>DNU Team</to>
<from>Jeff Fischer</from>
<cc>My boss</cc>
<subject>Learn XML</subject>
<date>Today</date>
<body>We all need to learn the fundamentals of XML in order to write
good Web Services.</body>
</email>

Schemas and DTDs

In XML we have a way of defining the structure of the document, including which elements must be present and which elements can be child elements of other elements. This structure is defined by a Document Type Definition, more commonly referred to as a DTD, or by a Schema document. Both XML Schemas and DTDs accomplish the same thing, which is to define the structure of the XML document, but each does it in its own way. Schema documents are XML documents themselves, while DTDs use their own, non-XML syntax. Defining the syntax of DTDs and Schemas is beyond the scope of this course, but you should be aware that XML documents can be, and most often will be, accompanied by either a DTD or a Schema document, especially when XML documents are used in a Web Service.

Well-formed and Valid XML Documents

You should also be familiar with the following terms as they relate to XML documents. A well-formed XML document is one that strictly adheres to all of the rules of XML syntax. Each time you write an XML document, you should check that it contains proper XML syntax. This is much like compiling a program: the document is rejected unless it follows the rules of XML.

An XML document is said to be valid when it follows the structure of the accompanying DTD or Schema document. The XML parser you use will check whether the document is well-formed, and a validating parser will also validate it against its DTD or Schema. Many XML IDEs have been developed to facilitate the creation of DTDs and Schemas.

I have included two very simple examples of using a Schema and DTD to define an XML document in the download section at the bottom of the page.

XML Namespaces

Since XML does not define any tags, it is very possible that two document writers will define the same tag name. If an application references an element name that appears in two different documents, but the document writers intended the elements to be completely different, we have a name conflict. This name conflict needs to be resolved in order for the application to understand how to treat each element.

Many programming languages provide a mechanism called namespaces to resolve this kind of conflict. In C++, you use the namespace keyword to create a namespace for your code. That way, the variable and function names in your program will not conflict with those of another library. In small, simple programs, it is also common to use the using namespace std; directive to open up the standard C++ namespace.

XML also uses namespaces to solve the potential for element name conflicts in an XML document. To use a namespace, you use the namespace attribute, xmlns. The attribute defines a prefix for the namespace and binds it to a URI, or Uniform Resource Identifier, that distinguishes the namespace from other namespaces. The URI is normally a URL. The parser does not retrieve anything from the URI, and it does not even have to point to a valid location. Its only purpose is to make the namespace unique.

The syntax of the namespace attribute is as follows:


<namespace_name:element_name xmlns:namespace_name="URI">
</namespace_name:element_name>

Look at the following examples:


<foo:table xmlns:foo="http://www.novell.com/table">
<foo:tablename>My Table</foo:tablename>
<foo:tabletype>Kitchen</foo:tabletype>
<foo:tablewood>Oak</foo:tablewood>
</foo:table>


<bar:table xmlns:bar="http://www.novell.com/table">
<bar:tablename>My Table</bar:tablename>
<bar:tabletype>Kitchen</bar:tabletype>
<bar:tablewood>Oak</bar:tablewood>
</bar:table>

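The above examples show an element called table defined with the namespace prefixes foo and bar. Table is a good example because <table> is also defined in HTML. If I did not define a separate namespace for <table>, there would be a name conflict between my table, which I have defined as a kitchen table, and an HTML table that formats data for display. Note that a namespace is identified by its URI, not by its prefix: because both examples use the same URI, foo:table and bar:table name the same element. To keep two different table elements apart, give each namespace a different URI.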
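
To see how a namespace-aware parser treats these names, here is a small sketch using Python's standard xml.etree.ElementTree module, which expands every prefix into the form {URI}name. The helper table_xml and the second URI, http://www.example.com/furniture, are assumptions invented for this illustration.

```python
import xml.etree.ElementTree as ET

def table_xml(prefix, uri):
    """Build a tiny table document with the given prefix and namespace URI.

    table_xml is a hypothetical helper written for this sketch.
    """
    return (
        f'<{prefix}:table xmlns:{prefix}="{uri}">'
        f"<{prefix}:tablewood>Oak</{prefix}:tablewood>"
        f"</{prefix}:table>"
    )

NOVELL_URI = "http://www.novell.com/table"
OTHER_URI = "http://www.example.com/furniture"  # made-up URI for comparison

foo_root = ET.fromstring(table_xml("foo", NOVELL_URI))
bar_root = ET.fromstring(table_xml("bar", NOVELL_URI))
other_root = ET.fromstring(table_xml("foo", OTHER_URI))

# The parser replaces the prefix with the URI it is bound to.
print(foo_root.tag)

# Different prefixes, same URI: the element names are identical.
same_name = foo_root.tag == bar_root.tag

# Same prefix, different URI: the element names differ.
different_name = foo_root.tag != other_root.tag

# Look up a child by namespace. The prefix "t" is local to this lookup.
wood = foo_root.find("t:tablewood", {"t": NOVELL_URI}).text
print(same_name, different_name, wood)
```

The prefix is only a local shorthand. What actually distinguishes one table element from another is the URI.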

It is important to understand namespaces before learning about SOAP and WSDL.

How Does XML Relate to Web Services?

XML is the "communication language" of a Web Service. Web Services communicate with each other and with end users through XML. Web Services use XML because it is an open standard, it is relatively easy to write, and it is plain text that can be carried over the HTTP protocol, so no adjustments to the company firewall have to be made to allow specialized communication on a separate port. Also, because XML has a defined syntax and structure, it is easy for the programmer to parse, either manually or using an XML parser. DTDs and Schemas make it easy to communicate using XML because the DTD or Schema document, which defines the structure of the XML document, can be shared. With that in place, another business can receive XML documents easily because the structure for communication has already been defined. XML, the fundamental pillar of Web Services, makes it easy to exchange information.

download sample file