Charles Engelke's Blog

December 4, 2003

XML Basics

Filed under: docbook — Charles Engelke @ 7:33 pm

Before we get into the guts of what a DocBook book
is, let’s cover some XML fundamentals. It won’t take long.

First, as you’ve seen, an XML file begins with an XML declaration
and then a DOCTYPE declaration. Actually, both those declarations
are optional in general, but we need to use them for DocBook
XML files. The contents of the XML file must be enclosed in
a pair of matching tags. In this case, those tags are
<book> and </book>.
If the DOCTYPE had said we were writing an article,
then the enclosing tags would be <article>
and </article>.
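
If you want to see that structure checked by a machine, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The DocBook version (4.2) and the identifiers in the DOCTYPE are assumptions for illustration; use whatever your own files declare. The parser only checks that the file is well-formed. It does not fetch the DTD or validate against it.

```python
import xml.etree.ElementTree as ET

# Assumed declarations: the DocBook version and its public/system identifiers
# are illustrative choices, not something the post specifies.
document = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"\n'
    '  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">\n'
    '<book>\n'
    '</book>\n'
)

# Well-formedness check only; the DTD named above is never downloaded.
root = ET.fromstring(document)
root_tag = root.tag  # name of the outermost pair of tags
print(root_tag)
```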

The content between the first tag and its eventual match consists
of plain text, elements (which are plain text
enclosed in matching tags), and entities (which
are shorthand ways to write special characters or plug in a
block of text). Tags match when they have
the same tag name (as in
book in our file), and the closing tag
has a slash before the tag name.
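
A rough sketch of how text and elements mix, again with Python's standard parser. The title element and the sample text are made up for the example, and the second half shows that a closing tag with the wrong name is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <title> and the text are only here to show nesting.
book = ET.fromstring("<book>Some text <title>XML Basics</title> more text</book>")

leading_text = book.text     # plain text before the first nested element
child = book[0]              # the nested <title> element
trailing_text = child.tail   # plain text that follows </title>

# Tags must close in matching pairs; crossing them is an error.
try:
    ET.fromstring("<book><title>XML Basics</book></title>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
```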

If matching tags enclose no text, the pair of tags can be
abbreviated as one tag with a slash after
the tag name. For example, our
<book></book> empty book could
have just been written as <book/>.
This is pretty rare in DocBook files, but you might see it.
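
To convince yourself the two spellings mean the same thing, one option is to parse both and compare. This is only a sketch with Python's standard parser, not something a DocBook toolchain requires.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<book></book>")
short_form = ET.fromstring("<book/>")

# Both spellings yield an element with the same name, no text, and no children.
same_name = long_form.tag == short_form.tag
both_empty = (
    long_form.text is None and short_form.text is None
    and len(long_form) == 0 and len(short_form) == 0
)
```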

Element names are case-sensitive, so if you try to open an
element with <chapter> but close it
with </Chapter>, you’ve made an error
and DocBook processors won’t accept it.
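
Any XML parser will complain about this, not just DocBook tools. Here is a small sketch with Python's standard parser; the helper name is_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

matching_case = is_well_formed("<chapter>Text</chapter>")
mixed_case = is_well_formed("<chapter>Text</Chapter>")  # capital C in the closing tag
```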

Opening tags can have attributes defined within
them. Attributes are entered by typing the name, then
=, then an attribute value enclosed in
quotation marks. For example, we could have started our
book with the tag
<book status="draft">. What
effect the attribute values have often depends on the processor
we use to publish the document. For example, labelling the
book as a draft could result in a watermark of the word
“draft” on every page of a PDF rendering of the document.
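
Here is a sketch of how a program might read that attribute, using Python's standard parser. What a real publishing tool does with the value is up to that tool; the is_draft flag below is just a stand-in for such a decision.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring('<book status="draft"></book>')

# Attributes are available by name on the element that carries them.
status = book.get("status")

# Hypothetical decision a processor could make from the attribute value.
is_draft = status == "draft"
```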

Entities have a lot of uses, but we’re mostly going to use them
for special characters for now. Entities begin with an
ampersand (&), then the entity name, then a semicolon.
For example, the bullet character in Trns•port is entered
as the entity &bull; and the entire word is entered as
Trns&bull;port.
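
A sketch of entity expansion with Python's standard parser. One assumption to flag: the bull entity is not built into XML itself, so the example declares it inline in the DOCTYPE so that it runs on its own. In a real DocBook file the definition would normally come from the DTD. The para element is just a convenient container for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: bull is declared here, in an internal subset, so the snippet
# works without access to the DocBook DTD.
document = (
    '<!DOCTYPE para [<!ENTITY bull "&#8226;">]>'
    '<para>Trns&bull;port</para>'
)

para = ET.fromstring(document)
word = para.text  # the entity reference is replaced by the bullet character
```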

Finally, we have special characters. The only ones we need to
care about (at least for now, maybe forever) are the tag
delimiters < and >, and the entity flag &. If we want
these characters in our text, we need to enter them as the
entities &lt;, &gt;, and &amp;.
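
If the text comes from a program rather than your keyboard, the escaping can be automated. Below is a sketch using escape from Python's standard xml.sax.saxutils module; the sample sentence and the para wrapper are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if a < b && b > c"

# escape() replaces &, <, and > with their entity forms.
escaped = escape(raw_text)

# Parsing the escaped text gives back the original characters.
para = ET.fromstring("<para>" + escaped + "</para>")
round_tripped = para.text
```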

Okay, all the boring fundamentals are over. Next time, we
write and publish a short book!

