Matra - An XML DTD Parser Utility

Introduction

Matra started as a simple dtdtree generator. A dtdtree is a visual representation of the structure of a dtd - i.e. the relationships between the element types declared in the dtd. I came up with the tree representation in 1999 for the dtds I was working on. You can find details of the dtdtree structure in the article How to read the DTD Tree. Until 2000 I generated these dtdtrees manually, and then I wrote Matra to automate that process. Ironically, one of the first dtdtrees I generated using Matra was for the XML Schema.

So how did the dtd parser come into being? Well, to generate the dtdtree I needed a dtd parser. Since I couldn't find one in Java, I wrote one to parse the element type and attribute list declarations. Over the various versions (and years), I've refactored the code to separate the dtdtree generator from the dtd parser. Matra continues to evolve this way to this day.

So, what's the use of a dtdtree?

I use the dtdtree for two purposes - to understand an existing dtd and to design new dtds.

Before I sit down to write the dtd, I write the structure of the dtd using the dtdtree. I refine the tree and then finally convert it to a dtd. [I do this part manually now, though I had written a few Word macros to do that. I misplaced the macros somehow :)]

On the other hand, when I encounter a new or complex dtd, I use the dtdtree to visually understand the dtd structure. And while writing xml documents conformant to a dtd, I use it as a reference. Say, while writing an svg file I need to find out the attributes of the line element type - all I need to do is check the svg dtdtree instead of poring through the svg specs to determine that.

Matra can also be used to check a dtd. When dealing with large dtds it's easy to leave an element type out of the content models. I call these element types "hanging elements" or "orphan elements", for lack of a better term. These elements do not have a content model and do not belong to the content models of any other element types. Such element types are easy to track using the dtdtree. An example would be the MARC dtds, which have a few hanging elements - see the dtdtrees for MARC's Authority/Classification Record and Bibliographic/Holdings/Community Information Record dtds. [Update: I notified MARC about the problem (in the Feb 19th version of the dtds) and they have fixed it in the current versions of the dtds. The dtdtrees for the modified dtds are here - MARC Authority/Classification Record and MARC Bibliographic/Holdings/Community Information Record dtds.]
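The hanging-element check described above can be sketched in a few lines of Java. This is only an illustration and not Matra's own code: the class and method names (HangingElementFinder, childrenByElement, hangingElements) are made up for this example, the regular expressions only cope with simple dtds (no comments, no parameter entities), and I am assuming that "no content model" means the element type has no child element types of its own.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Matra: finds "hanging" element types in a simple dtd.
class HangingElementFinder {

    // Matches <!ELEMENT name contentspec>. Assumes no comments or parameter entities.
    private static final Pattern ELEMENT_DECL =
            Pattern.compile("<!ELEMENT\\s+([^\\s>]+)\\s+([^>]*)>");

    // Matches element type names inside a content model (simplified name syntax).
    private static final Pattern NAME = Pattern.compile("[A-Za-z_:][\\w.:-]*");

    // Maps each declared element type to the element types named in its content model.
    static Map<String, Set<String>> childrenByElement(String dtd) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        Matcher decl = ELEMENT_DECL.matcher(dtd);
        while (decl.find()) {
            String spec = decl.group(2).trim();
            Set<String> children = new LinkedHashSet<>();
            // EMPTY and ANY name no specific children; #PCDATA is not an element type.
            if (!spec.equals("EMPTY") && !spec.equals("ANY")) {
                Matcher name = NAME.matcher(spec.replace("#PCDATA", " "));
                while (name.find()) {
                    children.add(name.group());
                }
            }
            result.put(decl.group(1), children);
        }
        return result;
    }

    // Hanging = no child element types and not referenced by any other element type.
    static Set<String> hangingElements(String dtd) {
        Map<String, Set<String>> children = childrenByElement(dtd);
        Set<String> referenced = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : children.entrySet()) {
            for (String child : entry.getValue()) {
                if (!child.equals(entry.getKey())) {
                    referenced.add(child);
                }
            }
        }
        Set<String> hanging = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> entry : children.entrySet()) {
            if (entry.getValue().isEmpty() && !referenced.contains(entry.getKey())) {
                hanging.add(entry.getKey());
            }
        }
        return hanging;
    }

    public static void main(String[] args) {
        String dtd = String.join("\n",
                "<!ELEMENT root (a, d)>",
                "<!ELEMENT a (b | c)>",
                "<!ELEMENT b (#PCDATA)>",
                "<!ELEMENT c EMPTY>",
                "<!ELEMENT d (#PCDATA)>",
                "<!ELEMENT e (#PCDATA)>");
        System.out.println(hangingElements(dtd)); // prints [e]
    }
}
```

Here e is declared but never used in any content model, so it is reported. The root element type is not reported because it has children of its own.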

Does it represent the true/complete structure of the dtd?

The dtdtree represents a partial view of the dtd. You cannot determine the true content model by viewing the dtdtree.

For example, looking at the dtdtree below, one cannot determine if the content model of a is (b | c) or (b, c). We just know that both b and c are valid children of a. [I plan to modify the dtdtree to distinguish the two models in a later version.]


root (version=="1.0")
|
|- a
|  |
|  |- b
|  |
|  +- c
|
+- d
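To make the loss of information concrete, here is a minimal Java sketch that reduces a content model to its list of child names and prints a tree in the layout shown above. It is not how Matra itself is implemented: DtdTreeSketch, childNames and render are hypothetical names, attributes are left out, and the renderer assumes the content models are not recursive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Matra's code: prints a dtdtree-like view of element types.
class DtdTreeSketch {

    // Keeps only the element names of a content model. The separators ("|" and ",")
    // and the occurrence indicators are dropped, which is why the tree is a partial view.
    static List<String> childNames(String contentModel) {
        List<String> names = new ArrayList<>();
        String withoutText = contentModel.replace("#PCDATA", " ");
        for (String token : withoutText.split("[^A-Za-z0-9_.:-]+")) {
            if (!token.isEmpty()) {
                names.add(token);
            }
        }
        return names;
    }

    // Renders the tree below the given root. Assumes no recursive content models.
    static String render(String root, Map<String, List<String>> children) {
        StringBuilder out = new StringBuilder(root).append('\n');
        appendChildren(root, "", children, out);
        return out.toString();
    }

    private static void appendChildren(String parent, String prefix,
            Map<String, List<String>> children, StringBuilder out) {
        List<String> kids = children.getOrDefault(parent, Collections.<String>emptyList());
        for (int i = 0; i < kids.size(); i++) {
            boolean last = (i == kids.size() - 1);
            out.append(prefix).append("|\n"); // spacer line above each child
            out.append(prefix).append(last ? "+- " : "|- ").append(kids.get(i)).append('\n');
            appendChildren(kids.get(i), prefix + (last ? "   " : "|  "), children, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("root", Arrays.asList("a", "d"));
        // Swapping in "(b, c)" here gives exactly the same tree.
        children.put("a", childNames("(b | c)"));
        System.out.print(render("root", children));
    }
}
```

Since childNames returns the same list for (b | c) and (b, c), the rendered tree cannot tell the two content models apart.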

For more information on what is represented in a dtdtree, read the article How to read the DTD Tree.

Class Diagrams

I generated the class diagrams using Omondo EclipseUML and exported them in the svg format. I modified the svg output before placing it online. You'll need an svg viewer plugin (e.g., Adobe SVG Viewer) to view the Matra Class Diagrams.

Where to find Matra

The Matra project is located at http://matra.sourceforge.net.

You can download the latest release from the SourceForge download site.

Or you could browse the CVS repository.

The javadoc documentation for the Matra classes is available at the Matra home page.

Feedback

Feedback is definitely welcome on any aspect of the Matra project. If you find a bug, please report it at the project's SourceForge site. You can file feature requests there too.

Where else to find Matra

I have registered the Matra project at these sites -