What is XML?
I've described how to
write XML. But in order to move on to the next level, we need to take a step back
and figure out what XML actually is.
XML is a language for
writing documents that can hold any arbitrary data. But you can also think of
XML as a language for writing languages. You can write a document that's just
in XML, as in the "EmployeesList" example above. But you can also define a tag-set
and rules, to create a language with a specific purpose, such as a language
for describing a chemical compound, or a magazine article, or the user interface
components of a program. Then you can write a document of that particular kind -
such as a Chemical Markup Language (CML) document, or an XML User-Interface
Language (XUL) document.
Let's compare XML with
HTML. HTML is designed for one particular purpose: to format data for presentation
to a human. HTML defines which tags are allowed, what they mean, and which tags
can go inside which other tags. Similarly, CML was designed for one particular
purpose: to describe chemical compounds. It also has a defined set of tags,
rules, and meanings.
XML, by contrast, is the
parent of
CML, XUL, and lots of other special-purpose languages. XML itself does not define
any tags or rules or meanings. You can use any tags you want - or, you
can define a particular language based on XML, like CML or XUL, which has tags,
rules, and meanings that you specify. The file that stores these tags and rules
is called a DTD (Document
Type Definition). For example, the CML DTD defines the tags and rules that a
CML document must use. (The DTD doesn't explicitly define the meaning of the
tags; you have to document these yourself, in English or another human language.
You can put this documentation in comments in your DTD, though.)
You can think of an analogy
with files on your hard drive. You can store any arbitrary bytes you want in
a file, just as you can write an XML document with any tags and any structure
you like. However, if you want to make a file on your hard drive a particular
kind of
file, such as a GIF, JPEG, or a Microsoft Word document, you have to make sure
that its bytes follow the rules of the particular kind of document you're trying
to write. Similarly in XML, if you want to write a CML document, you must make
sure that the document follows the rules for CML.
DTD overview
A correctly-written XML
document is called "well-formed". It's very easy to write a well-formed XML
document: start tags must match end tags, tags must be properly nested, and
so on. All XML documents must be well-formed in order for them to work correctly.
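To make "well-formed" concrete, here is a minimal sketch that simply asks a parser whether it can read a document. It uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are my own, for illustration only.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<Items><el>Forms/test1</el></Items>"
mismatched = "<Items><el>Forms/test1</Items>"   # <el> is never closed
badly_nested = "<a><b></a></b>"                 # tags overlap instead of nesting
```

Note that this only checks well-formedness. The parser has no idea whether "Items" or "el" are tags that any particular language allows.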
But there's another level
that an XML document can aspire to: it can be "valid." This means that not only
is the document a well-formed XML document, but it also follows the rules for
its particular kind of
XML document. For example, you may want to write a document that follows the
rules of the "XML User-Interface Language" (XUL),
created by the Mozilla/Netscape people. (XUL is used to specify a user interface;
it knows about windows, menus, and the like.)
In order to be valid,
an XML document must specify which DTD it follows. In this case, it's the XUL
DTD. This DTD specifies that a XUL document talks about "windows", and that
windows can contain "menubars", and menubars contain "menus", etc. The software
that reads the XML document can then validate
it against the specified DTD. If it finds, for example, that a menubar contains
a window, or that a "hippopotamus" tag shows up anywhere, then the document
is not a valid XUL document (even though it might be well-formed).
DTD Elements
SilverStream has provided
DTDs which specify the XML input for each SilverCmd. Here's the "itemlist.dtd,"
which is the DTD for several SilverCmds, including Delete, Build, and Publish.
<!-- This is the itemlist DTD -->
<!ELEMENT obj_ItemList (Items)>
<!ELEMENT Items (el*)>
<!ATTLIST Items type (StringArray) "StringArray">
<!ELEMENT el (#PCDATA)>
To help you follow along,
here's a sample XML file that conforms to this DTD:
<!-- This is a sample itemlist document -->
<obj_ItemList>
  <Items type="StringArray">
    <el>Forms/test1</el>
    <el>Objects/com/myco/pkg1/MyObject</el>
  </Items>
</obj_ItemList>
First, notice that comments
are the same in DTDs as in XML documents. (That's about all that's the same.)
Element types are declared
in the DTD by the <!ELEMENT>
tags. The first line declares that there is an element
type named "obj_ItemList," which must contain exactly one "Items"
tag, and nothing else. In turn, "Items" may contain zero or more
"el" tags, and nothing else.
By combining symbols (* + , | ?) a DTD can express many different types of structures:
A*      The item 'A' may occur zero or more times - it's optional, and can appear more than once.
A+      The item 'A' may occur one or more times - it's required, and can appear more than once.
A?      The item 'A' may occur zero or one times - it's optional, and can't appear more than once.
A       The item 'A' must occur exactly once - it's required, and can't appear more than once.
A, B    The items 'A' and 'B' must occur in the order specified.
A | B   Either item 'A' or 'B' may occur at this position.
Here are some other sample
element declarations, to show you all the possibilities:
<!ELEMENT A (B, C, D?)>
Element A must contain a B, followed by a C, optionally followed by a D. That is, either
BCD or BC.
<!ELEMENT E (F*, G+, H)>
Element E must contain zero or more Fs, followed by one or more Gs, followed by exactly
one H. For example, FFFFGGGGH, GH, or FGH. However, FH is not valid.
<!ELEMENT I (J | K | (L, M?))>
Element I can contain either a J, or a K, or an L optionally followed by an M. That is,
either J, K, L, or LM. However, JK is not valid.
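If you already know regular expressions, the content-model symbols will look familiar. The sketch below is only an analogy, not how a validating parser is implemented: it writes the child sequence of an element as a string of single-letter names and checks it with Python's re module. The CONTENT_MODELS table and the children_allowed helper are hypothetical names of my own.

```python
import re

# Each content model from the declarations above, rewritten as a regular
# expression over a string of one-letter child element names.
CONTENT_MODELS = {
    "A": re.compile(r"BCD?"),      # (B, C, D?)
    "E": re.compile(r"F*G+H"),     # (F*, G+, H)
    "I": re.compile(r"J|K|LM?"),   # (J | K | (L, M?))
}

def children_allowed(element, children):
    """Return True if the whole child sequence fits the element's content model."""
    return CONTENT_MODELS[element].fullmatch(children) is not None
```

For example, children_allowed("E", "FGH") is accepted, while children_allowed("E", "FH") is rejected because at least one G is required.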
Jumping down to the last
line of our example, it specifies that the "el" element contains text data ("parsed
character data"). Instead of containing other elements, the "el" element contains
text (such as "Forms/test1").
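To see what the three element declarations of the itemlist DTD demand of a document, here is a hand-written check that mirrors them one rule at a time. It is a sketch using Python's standard xml.etree.ElementTree module, not a DTD validator: it does not read the DTD, it ignores attributes and stray text between elements, and the function name follows_itemlist_structure is my own.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<obj_ItemList>
  <Items type="StringArray">
    <el>Forms/test1</el>
    <el>Objects/com/myco/pkg1/MyObject</el>
  </Items>
</obj_ItemList>"""

def follows_itemlist_structure(xml_text):
    """Check the element structure described by the itemlist DTD, by hand."""
    root = ET.fromstring(xml_text)
    if root.tag != "obj_ItemList":
        return False
    children = list(root)
    # obj_ItemList (Items): exactly one Items child, and nothing else.
    if len(children) != 1 or children[0].tag != "Items":
        return False
    # Items (el*): zero or more el children, and nothing else.
    # el (#PCDATA): an el holds text only, so it has no child elements.
    return all(child.tag == "el" and len(child) == 0 for child in children[0])
```

A real validating parser derives these checks from the DTD itself, which is the whole point of writing the rules down in a DTD instead of in code.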
DTD Attributes
The other type of declaration
in our example is the <!ATTLIST>
declaration, which describes the attributes that
an element can have. Here's the declaration again:
<!ATTLIST Items type (StringArray) "StringArray">
This declares an attribute
named "type," which applies to the element named "Items." It says that the value
of this attribute can only be "StringArray," and furthermore that the default
value is also "StringArray." (In other words, the "type" of an "Items" element
is always "StringArray".)
Here's how ATTLIST works:
<!ATTLIST [element-name]
          [attribute-name]
          [CDATA or enumeration]
          [default value or #REQUIRED or #IMPLIED]>
The name of the element that the attribute belongs to is listed first, followed
by the name of the attribute we're describing. The next parameter describes the
allowed values of the attribute. This can be either the keyword CDATA, which means
that any text string is allowed, or an enumeration of allowed values, like
(apple | orange | pear). The final parameter can be one of three things: a default
value (for example "pear"); the keyword #REQUIRED, which means that there's no
default value, and that the XML file must provide a value for this attribute; or
the keyword #IMPLIED, which means that there's no default value, and it's OK if
the XML file doesn't provide a value for this attribute. (The word "implied" is
kind of confusing here; perhaps "optional" would have been clearer.)
Here are some examples,
with their meanings:
<!ATTLIST Items type CDATA "Ohio">
The element named "Items" contains an attribute called "type". The attribute's value can
be any text ("character data"), and it has a default value of "Ohio." (If the
attribute is omitted in the XML document, the value of "Ohio" is assumed.)
<!ATTLIST Items type (red | green | blue) #REQUIRED>
The element named "Items" contains an attribute called "type". The attribute's value can
be either "red", "green", or "blue", and it is required. (Because this attribute
is required, there is no default value. The attribute must always be specified
and given a value in the XML document.)
<!ATTLIST Items type (table | chair | lamp) #IMPLIED>
The element named "Items" contains an attribute called "type". The attribute's value can
be either "table", "chair", or "lamp". (#IMPLIED specifies that the attribute
is not required and there's no default value. If the attribute is omitted in the
XML, then the element "Items" simply has no value for attribute "type".)
Unlike element declarations,
attribute declarations can only use the | symbol ("or"). They cannot use any of (+ ? , *).
To put it another way, an attribute value may either be CDATA (meaning any text),
or an enumeration of possible values.
You might wonder, why
is it called "ATTLIST"? Because a single ATTLIST declaration can describe more
than one attribute of an element. For example, the following two snippets are
equivalent:
<!ATTLIST Item name CDATA #REQUIRED
               size (small | medium | large) "medium">
and
<!ATTLIST Item name CDATA #REQUIRED>
<!ATTLIST Item size (small | medium | large) "medium">
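One way to convince yourself that the two forms are equivalent is to ask a parser what it sees. The sketch below uses the AttlistDeclHandler callback of Python's standard xml.parsers.expat module, which is called once per declared attribute. To keep the example self-contained, I've assumed the declarations sit inside the document's DOCTYPE (an "internal subset") rather than in a separate DTD file; the helper name declared_attributes and the sample Item element are also my own.

```python
import xml.parsers.expat

def declared_attributes(xml_text):
    """Collect one (element, attribute, default, required) tuple per declared attribute."""
    found = []

    def on_attlist(elname, attname, type_, default, required):
        # default is None when the declaration gives no default value.
        found.append((elname, attname, default, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return found

# One ATTLIST declaring two attributes ...
COMBINED = """<!DOCTYPE Item [
<!ATTLIST Item name CDATA #REQUIRED
               size (small | medium | large) "medium">
]>
<Item name="widget"/>"""

# ... versus two ATTLISTs declaring one attribute each.
SEPARATE = """<!DOCTYPE Item [
<!ATTLIST Item name CDATA #REQUIRED>
<!ATTLIST Item size (small | medium | large) "medium">
]>
<Item name="widget"/>"""
```

Calling declared_attributes on COMBINED and on SEPARATE should give the same list, with one entry for "name" and one for "size".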
Also, note that like XML,
DTDs are case-sensitive. "ELEMENT", "ATTLIST", "CDATA", etc. must be all upper-case.
More on DTDs and Validation
As I mentioned earlier,
XML is designed so it can be used with or without a DTD. But this can get a
bit confusing. An XML document must have a DTD if you want to validate it. But
a DTD can also change the meaning (value) of the XML document, with or without
validation.
For example, suppose you
define an attribute "size" with the default value "medium." If an element in
the XML document doesn't contain this attribute explicitly, it still contains
the implied value "medium" - as long as the DTD is present. If the DTD
is not present, then the element has no value at all for this attribute.
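Here is a small sketch of that effect, using Python's standard xml.parsers.expat module, which does not validate but does apply attribute defaults it has read. As an assumption made purely to keep the example self-contained, the ATTLIST declaration is placed inside the document's own DOCTYPE instead of in a separate DTD file. The helper name attributes_seen and the Item element are hypothetical.

```python
import xml.parsers.expat

def attributes_seen(xml_text):
    """Return {element name: attributes} as reported by the parser."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

# The same empty <Item/> element, with and without the declaration
# that gives "size" a default value.
WITH_DTD = """<!DOCTYPE Item [
<!ATTLIST Item size (small | medium | large) "medium">
]>
<Item/>"""

WITHOUT_DTD = "<Item/>"
```

With the declaration present, the parser reports a "size" attribute for Item even though the element never mentions it; without the declaration, Item has no attributes at all.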
You'll notice this in
SilverStream XML documents. For example, as we saw earlier, the itemlist DTD
defines an attribute named "type" for the element "Items", and it specifies
that the default value (and in fact, the only allowed value) of that attribute
is "StringArray". But if you look at delete_sample.xml (which follows the itemlist
DTD), you'll notice that it explicitly gives a value of "StringArray" for the
"type" attribute. It does this in case the XML file is used without the DTD.
How does an XML document
specify that it uses a DTD? By adding a line near the top of the file (after
the <?xml?>
section, if there is one), like
this:
<!DOCTYPE obj_ItemList SYSTEM "c:\SilverStream\DTDs\itemlist.dtd">
The second part ("obj_ItemList")
must match the topmost (root) element in the XML file. Basically, the DOCTYPE
declaration is saying, "This XML document contains an element called 'obj_ItemList'.
You can find the definition of the 'obj_ItemList' element (and all of the
elements it contains) in the following DTD file."
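The rule that the DOCTYPE name must match the root element is easy to check mechanically. The following sketch uses the StartDoctypeDeclHandler callback of Python's standard xml.parsers.expat module to read the DOCTYPE line, then compares its name with the first element it meets. The function name doctype_matches_root is my own, and the sample uses the relative location "itemlist.dtd" as an assumption; the parser here only reports that location and does not open the file.

```python
import xml.parsers.expat

def doctype_matches_root(xml_text):
    """Return (names match?, DTD location) for a document with a DOCTYPE."""
    info = {"doctype": None, "system_id": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        info["doctype"] = name
        info["system_id"] = system_id

    def on_start(name, attrs):
        if info["root"] is None:   # remember only the topmost element
            info["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return info["doctype"] == info["root"], info["system_id"]

MATCHING = """<?xml version="1.0"?>
<!DOCTYPE obj_ItemList SYSTEM "itemlist.dtd">
<obj_ItemList><Items type="StringArray"/></obj_ItemList>"""

MISMATCHED = MATCHING.replace("<!DOCTYPE obj_ItemList", "<!DOCTYPE Items")
```

A validating parser would reject the MISMATCHED document; a non-validating one, like the parser used here, reads it without complaint, so the comparison has to be made explicitly.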
Why do the sample XML
files that SilverStream provides in version 3.0 lack a DOCTYPE declaration?
There are a few reasons, but the main reason is to cut down on confusion over
the location of the DTD. The DTD location can be either an absolute or a relative
path to the DTD file. We can't put an absolute path in the sample XML files,
because we don't know where you're going to install SilverStream. We could put
a relative path (just "itemlist.dtd"), which would work fine for the examples,
since they're in the same directory as the DTD. But as soon as someone modified
a sample for their own use, and put it in a new location, it would fail. (We
also could have used a URL to a location on our Web site, instead of a filename,
but then you'd have to be connected to the Internet to use the XML file.)
So we decided to leave
the DOCTYPE out, and simply add a comment to each sample XML file that says
which DTD it uses. This means that the XML document will not be validated when
it's edited or used, unless you add a DOCTYPE statement to it. This is usually
fine when an XML file is used (parsed), but if you use an XML-aware editor to
edit your files (instead of a plain text editor), you might want to add the
DOCTYPE statement. That way, the XML editor can check to make sure that the
XML you're writing is valid, according to that DTD.
Note that, even if the
XML document specifies a DTD, the software that reads it doesn't necessarily have
to validate the document against it. It's up to the particular program that's
reading the XML. If a program does try to validate the XML, and the XML doesn't
specify a DTD, validation fails with an error. (SilverCmd doesn't perform XML
validation on its inputs, although of course it will give you an error if some
required piece of information is missing.)