How to read the DTD Tree

Element Type Relationships

Each node in a DTD tree represents an element type. The branches connecting the nodes represent the parent-child relationships between the element types. Let's consider a simple DTD:
<!ELEMENT root (a, b)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
The above DTD is represented in a DTD tree as:
root
|
|- a
|
+- b
This shows the content of element type root: it has two mandatory children, a and b, neither of which has any child elements. If element type a had child elements, they would be drawn directly below it, indented one level:
<!ELEMENT root (a, b)>
<!ELEMENT a (c, d)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT d (#PCDATA)>
<!ELEMENT c (#PCDATA)>
Its DTD tree:
root
|
|- a
|  |
|  |- c
|  |
|  +- d
|
+- b
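The layout above can be produced by a short recursive walk. The Java sketch below illustrates the idea and is not Matra's implementation. It assumes a hypothetical in-memory model that maps each element type to the ordered names of its child element types, where a missing or empty entry means no child elements. The class and method names are made up for this example, and it ignores cardinality, attributes, and repeated or recursive element types.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the basic DTD-tree layout; not part of Matra's API.
class DtdTreeSketch {

    // model: element type -> ordered child element types (absent or empty = no children)
    static String render(String root, Map<String, List<String>> model) {
        StringBuilder out = new StringBuilder(root).append('\n');
        renderChildren(root, model, "", out);
        return out.toString();
    }

    private static void renderChildren(String parent, Map<String, List<String>> model,
                                       String indent, StringBuilder out) {
        List<String> children = model.getOrDefault(parent, List.of());
        for (int i = 0; i < children.size(); i++) {
            String child = children.get(i);
            boolean last = i == children.size() - 1;
            out.append(indent).append("|\n");                    // spacer line before each child
            out.append(indent).append(last ? "+- " : "|- ").append(child).append('\n');
            // Below the last child the vertical rail ends, so pad with spaces instead of '|'.
            renderChildren(child, model, indent + (last ? "   " : "|  "), out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> model = Map.of(
                "root", List.of("a", "b"),
                "a", List.of("c", "d"));
        System.out.print(render("root", model));   // the second tree shown above
    }
}
```

The "|- " and "+- " connectors distinguish a middle child from the last one. The indentation passed down to the children records whether the parent's vertical rail must continue below it.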

Cardinality

The cardinality of an element within its parent element's content model is shown by a single character following the element name. This mirrors the DTD's own occurrence indicators and takes one of four forms:
  1. no character - mandatory (exactly one)
  2. ? - optional (zero or one)
  3. + - one or more
  4. * - zero or more
If b were optional in the content model for root:
<!ELEMENT root (a, b?)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
then the tree would be:
root
|
|- a
|
+- b?
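Because the suffix is the same character as the DTD's occurrence indicator, a tool can read it straight off the content model. The Java 17 sketch below uses hypothetical names and is not Matra's API. It assumes only simple sequence models such as (a, b?). Choice groups (|), nested groups and mixed content are deliberately out of scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads child labels from a simple sequence content model.
class SequenceModel {

    enum Cardinality { ONE, OPTIONAL, ONE_OR_MORE, ZERO_OR_MORE }

    record Child(String name, Cardinality card) {
        // Tree label: the element name plus its suffix character (none for ONE).
        String label() {
            String suffix = switch (card) {
                case ONE -> "";
                case OPTIONAL -> "?";
                case ONE_OR_MORE -> "+";
                case ZERO_OR_MORE -> "*";
            };
            return name + suffix;
        }
    }

    // Assumption: model is a comma-separated sequence in parentheses, e.g. "(a, b?)".
    static List<Child> parse(String model) {
        String body = model.trim();
        body = body.substring(1, body.length() - 1);        // drop the outer parentheses
        List<Child> children = new ArrayList<>();
        for (String token : body.split(",")) {
            String t = token.trim();
            Cardinality card = switch (t.charAt(t.length() - 1)) {
                case '?' -> Cardinality.OPTIONAL;
                case '+' -> Cardinality.ONE_OR_MORE;
                case '*' -> Cardinality.ZERO_OR_MORE;
                default -> Cardinality.ONE;
            };
            String name = card == Cardinality.ONE ? t : t.substring(0, t.length() - 1);
            children.add(new Child(name, card));
        }
        return children;
    }

    public static void main(String[] args) {
        for (Child c : parse("(a, b?)")) {
            System.out.println(c.name() + " -> " + c.card() + " -> label " + c.label());
        }
    }
}
```

For the DTD above, parse("(a, b?)") yields the labels a and b?, matching the two nodes under root.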

Repeated Element Types

It is quite possible for an element type to occur in the content models of more than one element type. Consider this DTD:
<!ELEMENT root (a, b?)>
<!ELEMENT a (c, d)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
<!ELEMENT b (a)>
This is represented as:
root
|
|- a
|  |
|  |- c
|  |
|  +- d
|
+- b?
   |
   +- a  -->
The "-->" after a under element b indicates that the structure of a has already been drawn elsewhere in the tree. This abbreviation is optional, though: a user or tool may draw the structure of the repeated element type again, provided it matches the earlier rendering exactly. So the above DTD could also be represented as:
root
|
|- a
|  |
|  |- c
|  |
|  +- d
|
+- b?
   |
   +- a
      |
      |- c
      |
      +- d
The "-->" notation is used only when the repeated element type has child elements. A repeated element type without children is simply drawn again. So the DTD:
<!ELEMENT root (a, b)>
<!ELEMENT a (c, d)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
<!ELEMENT b (c)>
is represented as:
root
|
|- a
|  |
|  |- c
|  |
|  +- d
|
+- b
   |
   +- c
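One way to implement the "-->" rule is to remember which element types have already been expanded during a depth-first walk. The Java sketch below is an illustration and not Matra's code. It assumes a hypothetical map from each element type to the ordered names of its children, and for brevity it omits cardinality suffixes. A repeated element type that has children is printed with "-->" and not expanded again. A repeated leaf, such as c above, is printed as usual. The sketch does not distinguish recursive inclusions from ordinary repeats.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "-->" convention for repeated element types.
class RepeatedSketch {

    static String render(String root, Map<String, List<String>> model) {
        StringBuilder out = new StringBuilder(root).append('\n');
        Set<String> drawn = new HashSet<>();
        drawn.add(root);
        walk(root, model, "", drawn, out);
        return out.toString();
    }

    private static void walk(String parent, Map<String, List<String>> model, String indent,
                             Set<String> drawn, StringBuilder out) {
        List<String> children = model.getOrDefault(parent, List.of());
        for (int i = 0; i < children.size(); i++) {
            String child = children.get(i);
            boolean last = i == children.size() - 1;
            boolean hasChildren = !model.getOrDefault(child, List.of()).isEmpty();
            boolean seenBefore = !drawn.add(child);   // add() returns false if already drawn
            out.append(indent).append("|\n")
               .append(indent).append(last ? "+- " : "|- ").append(child);
            if (seenBefore && hasChildren) {
                out.append("  -->\n");                // structure already shown: don't expand
                continue;
            }
            out.append('\n');
            walk(child, model, indent + (last ? "   " : "|  "), drawn, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> model = Map.of(
                "root", List.of("a", "b"),
                "a", List.of("c", "d"),
                "b", List.of("a"));
        System.out.print(render("root", model));
    }
}
```

Because an element is recorded as drawn before its children are visited, the walk always terminates, even for content models that include an ancestor.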

Recursive Inclusion

The content model of a child element type may also contain one of its own ancestors. Expanding such a model would produce a tree that extends infinitely, so the recursive inclusion is marked with "**" next to the element type node and is not expanded further. For example, consider the DTD:
<!ELEMENT root (a, b?)>
<!ELEMENT a (c, d)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (a?, b)>
<!ELEMENT b (#PCDATA)>
The DTD tree for this DTD is:
root
|
|- a
|  |
|  |- c
|  |
|  +- d
|     |
|     |- a?  **
|     |
|     +- b
|
+- b?
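Of course, if a were not optional in element d's content model, no valid XML document could be created for the DTD: every a would require a d, and every d would in turn require another a, so the nesting could never end.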
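Detecting a recursive inclusion only requires tracking the chain of ancestors on the way down. The Java sketch below is one possible approach and not Matra's implementation. It assumes a hypothetical map from each element type to its child particles, written with their DTD suffixes (for example "a?") so that the labels keep their cardinality. A child whose name is already on the ancestor path, including the current element itself, is marked "**" and not expanded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "**" convention for recursive inclusion.
class RecursionSketch {

    // model: element type -> child particles such as "a?" (absent or empty = no children)
    static String render(String root, Map<String, List<String>> model) {
        StringBuilder out = new StringBuilder(root).append('\n');
        Deque<String> path = new ArrayDeque<>();
        path.push(root);
        walk(root, model, "", path, out);
        return out.toString();
    }

    // Strips a trailing occurrence indicator: "a?" -> "a".
    private static String nameOf(String particle) {
        return particle.replaceAll("[?+*]$", "");
    }

    private static void walk(String parent, Map<String, List<String>> model, String indent,
                             Deque<String> path, StringBuilder out) {
        List<String> children = model.getOrDefault(parent, List.of());
        for (int i = 0; i < children.size(); i++) {
            String particle = children.get(i);
            String name = nameOf(particle);
            boolean last = i == children.size() - 1;
            out.append(indent).append("|\n")
               .append(indent).append(last ? "+- " : "|- ").append(particle);
            if (path.contains(name)) {
                out.append("  **\n");   // ancestor (or self): expanding would never end
                continue;
            }
            out.append('\n');
            path.push(name);
            walk(name, model, indent + (last ? "   " : "|  "), path, out);
            path.pop();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> model = Map.of(
                "root", List.of("a", "b?"),
                "a", List.of("c", "d"),
                "d", List.of("a?", "b"));
        System.out.print(render("root", model));
    }
}
```

Unlike a global "already drawn" set, the path is popped when a subtree is finished. As a result, b is expanded normally under d because it is not an ancestor there.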

PCDATA vs EMPTY Content model

There are two content models that allow no child elements: #PCDATA and EMPTY. [Note: the ANY content model, by contrast, permits child elements.] An EMPTY element is distinguished in the tree by a '=' character attaching it to the tree instead of a '-'. Consider the DTD:
<!ELEMENT root (a, b)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b EMPTY>
This DTD is represented as:
root
|
|- a
|
+= b
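The only difference between the two leaf kinds is the connector character. The Java sketch below shows that choice in isolation. The ContentKind enum and the method names are hypothetical, and it renders just one level of leaf children, which is enough for this example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choosing the connector for a child node.
class EmptyConnectorSketch {

    enum ContentKind { EMPTY, PCDATA, CHILDREN, ANY }

    // '=' attaches EMPTY elements; every other content kind uses '-'.
    static String connector(boolean last, ContentKind kind) {
        char bar = kind == ContentKind.EMPTY ? '=' : '-';
        return (last ? "+" : "|") + bar + " ";
    }

    // Renders one level of leaf children under root.
    static String renderLeaves(String root, List<String> children, Map<String, ContentKind> kinds) {
        StringBuilder out = new StringBuilder(root).append('\n');
        for (int i = 0; i < children.size(); i++) {
            String child = children.get(i);
            boolean last = i == children.size() - 1;
            out.append("|\n").append(connector(last, kinds.get(child))).append(child).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderLeaves("root", List.of("a", "b"),
                Map.of("a", ContentKind.PCDATA, "b", ContentKind.EMPTY)));
    }
}
```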

ANY Content model

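An ANY content model specifies that the element type can have any of the declared element types as its children. In this case the DTD tree enumerates all of the possible children. Any that are ancestors of the ANY element, or the element itself, are marked "**" as recursive inclusions. For the DTD: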
<!ELEMENT root (a, b?)>
<!ELEMENT a (c, d?)>
<!ELEMENT b EMPTY>
<!ELEMENT d ANY>
<!ELEMENT c (#PCDATA)>
the DTD tree is:
root
|
|- a
|  |
|  |- c
|  |
|  +- d?
|     |
|     |= b
|     |
|     |- a  **
|     |
|     |- root  **
|     |
|     |- d  **
|     |
|     +- c
|
+= b?
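The Java 17 sketch below combines ANY expansion with the '=' and "**" conventions. It uses a hypothetical in-memory DTD and is not Matra's implementation. Its assumptions:
  1. (#PCDATA) is modelled as an empty sequence.
  2. Every referenced element type is declared.
  3. ANY expands to all declared element types in declaration order.
The notation does not prescribe an order, and the tree above happens to list the same children in a different one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch combining ANY expansion with the "=" and "**" conventions.
class AnySketch {

    enum Kind { SEQUENCE, EMPTY, ANY }   // (#PCDATA) is modelled as an empty SEQUENCE

    record Decl(Kind kind, List<String> particles) {
        static Decl seq(String... particles) { return new Decl(Kind.SEQUENCE, List.of(particles)); }
        static Decl empty() { return new Decl(Kind.EMPTY, List.of()); }
        static Decl any() { return new Decl(Kind.ANY, List.of()); }
    }

    // The DTD from the example above, in declaration order.
    static LinkedHashMap<String, Decl> exampleDtd() {
        LinkedHashMap<String, Decl> dtd = new LinkedHashMap<>();
        dtd.put("root", Decl.seq("a", "b?"));
        dtd.put("a", Decl.seq("c", "d?"));
        dtd.put("b", Decl.empty());
        dtd.put("d", Decl.any());
        dtd.put("c", Decl.seq());
        return dtd;
    }

    static String render(String root, LinkedHashMap<String, Decl> dtd) {
        StringBuilder out = new StringBuilder(root).append('\n');
        Deque<String> path = new ArrayDeque<>(List.of(root));
        walk(root, dtd, "", path, out);
        return out.toString();
    }

    // ANY lists every declared element type; otherwise use the declared particles.
    private static List<String> childrenOf(String name, Map<String, Decl> dtd) {
        Decl decl = dtd.get(name);
        return decl.kind() == Kind.ANY ? new ArrayList<>(dtd.keySet()) : decl.particles();
    }

    private static void walk(String parent, Map<String, Decl> dtd, String indent,
                             Deque<String> path, StringBuilder out) {
        List<String> children = childrenOf(parent, dtd);
        for (int i = 0; i < children.size(); i++) {
            String particle = children.get(i);
            String name = particle.replaceAll("[?+*]$", "");
            boolean last = i == children.size() - 1;
            char bar = dtd.get(name).kind() == Kind.EMPTY ? '=' : '-';
            out.append(indent).append("|\n")
               .append(indent).append(last ? '+' : '|').append(bar).append(' ').append(particle);
            if (path.contains(name)) {              // ancestor or self: recursive inclusion
                out.append("  **\n");
                continue;
            }
            out.append('\n');
            path.push(name);
            walk(name, dtd, indent + (last ? "   " : "|  "), path, out);
            path.pop();
        }
    }

    public static void main(String[] args) {
        System.out.print(render("root", exampleDtd()));
    }
}
```

Run on the example DTD, this prints the same nodes, connectors and markers as the tree above. Under d?, the children appear in the order root, a, b, d, c.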

Attribute Lists

The attributes of an element are listed within parentheses next to the element node. The DTD:
<!ELEMENT root (a, b)>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a
	a1	CDATA	#REQUIRED
>
<!ELEMENT b (#PCDATA)>
is represented by the DTD tree:
root
|
|- a (a1)
|
+- b

Attribute Cardinality

An attribute has one of four cardinalities:
  1. Required - the attribute is mandatory
  2. Implied - the attribute is optional
  3. Fixed - the attribute has a fixed value
  4. Default - the attribute may be omitted; if it is, it takes the specified default value
The cardinality of attributes is depicted as follows. Required attributes have no decoration. Implied attributes have a '?' after the attribute name. For fixed attributes, the attribute name is followed by '==' and the fixed value. For default attributes, the attribute name is followed by '=' and the default value. [Note that this is a single equals sign, as opposed to the double equals sign used for fixed attributes.] So the DTD:
<!ELEMENT root (a, b)>
<!ATTLIST root
	version	CDATA	#FIXED "1.0"
>
<!ELEMENT a (#PCDATA)>
<!ATTLIST a
	a1	CDATA	#REQUIRED
	a2	CDATA	#IMPLIED
	a3	CDATA	"default"
>
<!ELEMENT b (#PCDATA)>
is represented by the DTD tree:
root (version=="1.0")
|
|- a (a1, a2?, a3="default")
|
+- b
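Formatting a node label from its attribute list is a direct translation of the rules above. The Java 17 sketch below is illustrative only and is not Matra's API. It assumes a hypothetical Attr record holding the attribute name, its default declaration and, for fixed and default attributes, the value. Parsing the ATTLIST itself is out of scope, and values are wrapped in double quotes as in the tree above.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of the attribute-list notation; not Matra's API.
class AttributeSketch {

    // The four attribute cardinalities (#REQUIRED, #IMPLIED, #FIXED, plain default).
    enum DefaultDecl { REQUIRED, IMPLIED, FIXED, DEFAULT }

    // value is only meaningful for FIXED and DEFAULT attributes.
    record Attr(String name, DefaultDecl kind, String value) {
        String label() {
            return switch (kind) {
                case REQUIRED -> name;                           // no decoration
                case IMPLIED -> name + "?";                      // optional
                case FIXED -> name + "==\"" + value + "\"";      // double equals: fixed value
                case DEFAULT -> name + "=\"" + value + "\"";     // single equals: default value
            };
        }
    }

    // Appends " (a1, a2?, ...)" to the element name; elements without attributes are unchanged.
    static String nodeLabel(String element, List<Attr> attrs) {
        if (attrs.isEmpty()) {
            return element;
        }
        StringJoiner joined = new StringJoiner(", ", " (", ")");
        for (Attr attr : attrs) {
            joined.add(attr.label());
        }
        return element + joined;
    }

    public static void main(String[] args) {
        System.out.println(nodeLabel("root", List.of(new Attr("version", DefaultDecl.FIXED, "1.0"))));
        System.out.println(nodeLabel("a", List.of(
                new Attr("a1", DefaultDecl.REQUIRED, null),
                new Attr("a2", DefaultDecl.IMPLIED, null),
                new Attr("a3", DefaultDecl.DEFAULT, "default"))));
    }
}
```

The main method prints the labels of the root and a nodes exactly as they appear in the tree above.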

About Matra

Matra is a Java-based XML DTD parser utility. It is freely available from http://matra.sourceforge.net under the open-source MPL 1.1 license.