XML Syntax Standards

The body of standards defining XML is quite large, but only two core specifications directly concern users of Serena XML: the XML Version 1.0 syntax specification and the XML Schema specification. These and other XML specifications are established by the World Wide Web Consortium (W3C) and are published online at http://www.w3.org.

To use the Serena XML programming interface to XML Services, you first need a basic familiarity with this core XML syntax.

XML Tag Names

Programmers familiar with Web markup will note that XML syntax resembles HTML syntax. Like HTML, XML makes use of tags (of the form <tag>) and attributes (of the form name="value"). Like HTML tags, XML tags delimit units of content and identify that content by tag name. Generally, XML statements look something like this:

<tag attribute="value">data value or structured content</tag>

In standard-compliant XML, tag and attribute names are case-sensitive; that is, <tag> is not the same as <Tag>. A name must begin with a letter or an underscore, and may otherwise include alphanumeric characters, hyphens, underscores, and periods. Other punctuation marks are generally prohibited, since they may have special meanings in XML.
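
The case-sensitivity rule can be observed with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the <root> and <Package> tag names are illustrative assumptions, not Serena XML definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the subtag is spelled with a capital "P".
root = ET.fromstring("<root><Package>ACCT000025</Package></root>")

# Lookups are case-sensitive, so "package" does not match "Package".
lower_match = root.find("package")
exact_match = root.find("Package")

# A closing tag that differs only in case does not close the opening tag.
try:
    ET.fromstring("<tag>data</Tag>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(lower_match is None, exact_match.text, mismatch_rejected)
```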

XML Data Elements

Functionally, XML tags mark data elements in text. Data elements are of two types:

Simple data elements contain basic data types such as integers, dotted decimal numbers, dates, times, fixed-length or variable-length character strings, or the like. Simple data elements cannot be decomposed into subordinate XML data elements; they are, in that sense, “atomic” units of data. Such a tag might look something like this:

    <package>ACCT000025</package>

Complex data elements contain a data structure composed of one or more subordinate XML data elements, each delimited by its own pair of subtags within the main tag pair. The subordinate elements may themselves be either simple or complex. Complex tags may be built up from successively simpler tags to form a hierarchical tree structure. A complex tag structure with just one level of subtags might look something like this:

    <response>
        <statusMessage>CMN8700I - LIST Package service completed</statusMessage>
        <statusReturnCode>00</statusReturnCode>
        <statusReasonCode>8700</statusReasonCode> 
    </response>
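
As an illustration of how an application might read such a structure, here is a minimal sketch that parses the example above with Python's standard xml.etree.ElementTree module. It is not part of the Serena XML interface; it only shows that each subtag becomes a child node of the complex element.

```python
import xml.etree.ElementTree as ET

document = """
<response>
    <statusMessage>CMN8700I - LIST Package service completed</statusMessage>
    <statusReturnCode>00</statusReturnCode>
    <statusReasonCode>8700</statusReasonCode>
</response>
""".strip()

root = ET.fromstring(document)

# Each simple subtag is a child node; collect tag name -> text content.
fields = {child.tag: child.text for child in root}

for name, value in fields.items():
    print(name, "=", value)
```

Note that the parser returns every value as a character string. Interpreting "00" as a number is left to the application or to schema validation.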

The contents of an actual data element must conform to whatever data validation restrictions are imposed by the tag definition. For simple data elements, such restrictions would include data type, data pattern, allowable value range, and/or membership in a predefined value list.

For complex data elements, the data structure must also conform to the tag definition. Restrictions at this level include allowable subtags, subtag sequencing, mutually exclusive subtag choices, and mandatory subtag inclusion. Restrictions on the minimum and maximum number of consecutive tag repetitions, if any, must also be met.
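
In practice these restrictions are declared in the schema and enforced by a validating parser. The hand-written sketch below only illustrates the kinds of checks involved. The required subtag sequence and the two-digit return code pattern are assumptions made for this example, not the actual Serena XML definitions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules for illustration; real rules come from the schema.
REQUIRED_IN_ORDER = ["statusMessage", "statusReturnCode", "statusReasonCode"]
RETURN_CODE_PATTERN = re.compile(r"[0-9]{2}")

def check_response(text):
    """Return a list of problems found; an empty list means all checks passed."""
    root = ET.fromstring(text)
    problems = []

    # Structure check: allowed subtags, mandatory inclusion, and sequence.
    names = [child.tag for child in root]
    if names != REQUIRED_IN_ORDER:
        problems.append("expected subtags %s, found %s" % (REQUIRED_IN_ORDER, names))

    # Simple-element check: the value must match a data pattern.
    code = root.findtext("statusReturnCode", default="")
    if not RETURN_CODE_PATTERN.fullmatch(code):
        problems.append("statusReturnCode must be two digits, found %r" % code)

    return problems

good = ("<response><statusMessage>ok</statusMessage>"
        "<statusReturnCode>00</statusReturnCode>"
        "<statusReasonCode>8700</statusReasonCode></response>")
bad = ("<response><statusReturnCode>0</statusReturnCode>"
       "<statusMessage>ok</statusMessage></response>")

print(check_response(good))
print(check_response(bad))
```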

XML Tag Attributes

Attributes qualify the manner in which a tag is used or processed. One tag may have multiple attributes, so each attribute must be explicitly named. The value assigned to an attribute must appear in double quotes and must be a simple data type — such as a date, a character string, or an integer.

Attributes are not (or should not be) used to hold application data. That’s what data elements — i.e., tags and subtags — are for! Attributes are used to:

Identify the subtype of a tag that is complex enough to have alternative formats, substructures, or validation requirements.
Identify a particular tag instance to distinguish it uniquely from other instances of use.
Set a flag for the target application to use when choosing among several data interpretations or processing options.

In the case of Serena XML, attributes are used primarily to identify which of many alternative data structures is intended when a particular tag is used. Depending on the value of the attribute, the allowed subtag content and sequence may vary.
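
The idea of an attribute selecting among alternative structures can be sketched as follows. The tag name <request>, the attribute name type, its values, and the subtag sets are all hypothetical; consult the Serena XML schema for the real names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from attribute value to the subtags allowed for it.
ALLOWED_SUBTAGS = {
    "list": {"package"},
    "create": {"package", "packageTitle"},
}

document = '<request type="list"><package>ACCT000025</package></request>'
root = ET.fromstring(document)

# Attributes are read by name; a missing attribute yields None.
kind = root.get("type")
absent = root.get("scope")

# The attribute value decides which subtags are acceptable.
allowed = ALLOWED_SUBTAGS[kind]
unexpected = [child.tag for child in root if child.tag not in allowed]

print(kind, absent, unexpected)
```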

Comments

In addition to tags and attributes, standard-compliant XML allows comments. XML comments, like those in HTML, begin with <!-- and end with -->. Multi-line comments are permitted. The end-of-comment delimiter must be preceded by a blank or be the first item on a new line. Double hyphens cannot appear anywhere within the comment body.

An XML comment might look something like this:

<!-- This is a comment, line 1.
     This is a comment, line 2. -->
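
The sketch below, which uses Python's standard xml.etree.ElementTree module rather than the XML Services parser, shows two of these points: a comment is not treated as data, and a double hyphen inside the comment body is rejected. The surrounding <response> tag is only a wrapper for the example.

```python
import xml.etree.ElementTree as ET

document = """<response>
    <!-- This is a comment, line 1.
         This is a comment, line 2. -->
    <statusReturnCode>00</statusReturnCode>
</response>"""

# With the default parser settings, comments are discarded, not kept as nodes.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]

def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A double hyphen inside the comment body makes the document ill-formed.
double_hyphen_accepted = parses("<response><!-- bad -- comment --></response>")

print(child_tags, double_hyphen_accepted)
```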

Character Entities

XML relies on reserved characters (e.g., angle brackets and double quotes) to delimit language-specific constructs (e.g., tags and attribute values). If you include one of XML’s reserved characters in your tag data or in attribute values, the XML parser will attempt to treat it as a reserved character — e.g., as the opening angle bracket for a tag name — with unpredictable results. To get around this difficulty, XML provides a mechanism for escaping these characters from the special treatment they normally receive, so that they can be included in ordinary data. This is achieved using character entity codes.

Character entity codes begin with an ampersand (&) and end with a semicolon (;). Between these delimiters is a character entity name that identifies the character represented by the entity code. Numeric character entity codes are also allowed in generic XML; however, the XML Services parser does not support numeric character entities at this time.

Five character entities have predefined names in XML. They are listed in the following table:

Exhibit 3-1. XML Character Entities

Entity Code	Character Represented
`&lt;`	Less-than symbol or opening angle bracket (<)
`&gt;`	Greater-than symbol or closing angle bracket (>)
`&amp;`	Ampersand (&)
`&quot;`	Straight, double quotation mark (")
`&apos;`	Apostrophe or straight, single quotation mark (')

For example, you might use ampersands in the names of program modules that you mention in your package implementation instructions. Simply typing an ampersand would, in most cases, generate a parser error. To insert the ampersand without generating an error, use the &amp; character entity where you would normally type an ampersand. For example:

<packageImplInst>Requires prior execution of USR&amp;001.</packageImplInst>

XML parsers vary in their sensitivity to the occurrence of reserved characters in data. You can usually get away with using a regular apostrophe (') instead of the &apos; character entity in data strings, for example. But you should always escape any ampersands or angle brackets in your data strings, and escape all special characters in attribute values.

Tip

Use character entities instead of special characters in data or attribute values.
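
When XML is generated by a program, escaping is best done by a library routine rather than by hand. The following minimal sketch uses escape() from Python's standard xml.sax.saxutils module, which emits only named entities. The sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "Requires prior execution of USR&001."

# By default escape() replaces &, < and > with named entities.
escaped = escape(raw_text)
document = "<packageImplInst>%s</packageImplInst>" % escaped

# A parser turns the entity back into the original character.
round_trip = ET.fromstring(document).text

# Quotes are not escaped by default; pass extra entities for attribute values.
attr_safe = escape('say "hi" & <go>', {'"': "&quot;", "'": "&apos;"})

print(escaped)
print(round_trip)
print(attr_safe)
```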

XML Documents as Complex Data Elements

XML documents as a whole are themselves defined as complex data elements. The start and end of the document are identified by a root tag. Nested within the root tag are the subtags that make up the content of an instance document (that is, an actual XML document containing data). There is one and only one root element in an XML document, and the overall structure of the document is always a hierarchical tree. Data structures that loop back upon themselves are forbidden anywhere in an XML document.

The structure of an XML document and its component data elements is defined externally in one of two types of files: a Document Type Definition (DTD) or an XML schema. XML Services uses the schema approach, because schemas support more sophisticated and rigorous data typing than DTDs. XML documents can be validated against the relevant schema by an XML parser to ensure data validity.

Well-Formed Documents

The elements of XML syntax must be combined in a way that conforms to XML rules for a well-formed document. If XML Services receives XML input that is not well-formed, it will return an error and make no attempt to process the service request.

XML rules for a well-formed document mirror those in the latest version of HTML. Unlike past practice with HTML, however, the rules for XML are strictly enforced. In particular:

Only one root tag is allowed in a document. A well-formed XML document must map to an n-way tree data structure. Such a tree has exactly one root node. The root node may have multiple branches to lower-level nodes, each of which may also branch similarly to any depth. Nodes in the tree structure correspond to tags in the XML syntax.
Every opening tag must be matched by a closing tag. Closing tags have the same tag name as the opening tag, preceded by a forward slash. For example, the opening tag <tag> must be paired with the closing tag </tag>.
Standalone tags must be self-closing. Standalone tags are defined to mark points in a document rather than contain data; they are explicitly declared to be “empty” in the XML schema. Since it contains no data, the standalone opening tag is also the closing tag. As such, it includes a final slash just before the ending angle bracket. For example:
```
<tagname/>
```
Attribute values must be enclosed in double quotes. The quotes are never optional. For example:
```
<tag attribute="value">
```
Nested tags must be opened and closed in the proper order. The rules for pairing the opening and closing tags in a nested data structure are the same as those for pairing the opening and closing parentheses in a mathematical expression. The first tag opened must be the last tag closed, the next tag opened must be the next-to-last tag closed, and the last tag opened must be the first tag closed. Visually:
```
<firstTag><nextTag><lastTag> . . . . . </lastTag></nextTag></firstTag>
```
XML comments are comments — and nothing else. The frequent HTML practice of embedding non-markup processing instructions in comments is not allowed in XML. Instead, non-XML processing instructions and other non-XML declarations should precede the root tag in the document file.
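
Several of the rules above can be tried out before submitting a document. The sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for a general-purpose parser; it is not the XML Services parser, and the sample tag names are arbitrary.

```python
import xml.etree.ElementTree as ET

def well_formed_error(text):
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

# One well-formed sample followed by one violation of each rule.
samples = {
    "single root": "<a><b/></a>",
    "two roots": "<a/><b/>",
    "missing close": "<a><b></a>",
    "unquoted attribute": "<a x=1/>",
    "bad nesting": "<a><b><c></b></c></a>",
}

results = {name: well_formed_error(text) is None for name, text in samples.items()}

for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")
```

Only the first sample is accepted. A check like this confirms well-formedness only; it does not validate the document against a schema.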

Strict enforcement of these syntax rules prevents ambiguity when interpreting XML documents. This is vital in XML, because general-purpose XML parsers, unlike their HTML counterparts, can’t rely on the names of tags to help resolve ambiguity.

For example, if you see the tag <p> in an HTML file, you can assume it marks a paragraph. This works because HTML predefines what each tag and attribute name means in advance and all HTML parsers build in at least some of that knowledge.

However, in XML, you cannot assume anything about the tag <p>. XML leaves the interpretation of document markup and document content completely to the application that reads it. Tag meaning is defined externally to the document in either a DTD specification or an XML schema specification.