Charles Engelke's Blog

December 7, 2003

Entering Some Content

Filed under: docbook — Charles Engelke @ 7:06 pm

Okay, we’ve created a legal DocBook file. Now let’s create a
potentially useful one, with content between <book>
and </book>. But we can’t just put
raw text there, or it won’t be a legal DocBook file, and we
won’t be able to process it.

What can we put inside the book
element? Where can we find the rules? Our DOCTYPE
declaration at the beginning of the file tells us that: the
rules are at
http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd.
Open that link in your browser (you might have to download it
and open it in a text editor instead, because some browsers see
the file type and don’t realize it’s plain text they could
just display). It just looks like gibberish. That’s
partly because of the DTD syntax it’s written in, and partly
because most of the rules live in other files that are included
by reference. This isn’t going to help.

Lucky for us there are DocBook references available that are
written for people to read, not programs. The canonical
reference for DocBook has to be DocBook: The Definitive Guide.
The book is even available for free online, in several different
formats. However, that edition isn’t up to date. A draft of the
next edition is available as well. In fact, you can download it
in Microsoft HTML Help format, which I find very convenient for
reference.

Whichever version you use, you can look up the book
element in the DocBook Element Reference section and see its
“content model”:

book ::=
((title,subtitle?,titleabbrev?)?,
bookinfo?,
(dedication|toc|lot|glossary|bibliography|preface|chapter|
reference|part|article|appendix|index|setindex|colophon)*)

This is still technical, but much more readable. The first line
says that this is the syntactic definition of a
book. The second line shows content
that is completely optional (that’s what the ? at the end means).
However, if we choose to include it, the second line says the
content must start with a title element, then
can have an optional subtitle element, and
then an optional titleabbrev element.

The next line says that we can have a bookinfo
element next, or skip it if we wish. Finally, the last two lines
say we can have zero or more (the meaning of the * at the end)
elements, each of which can be a dedication,
toc, lot,
glossary, bibliography,
preface, chapter,
reference, part,
article, appendix,
index, setindex,
or colophon.
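
To make that concrete, here is a minimal sketch of a book that uses each part of the content model once, in the required order. The titles and text are made up for illustration, the DOCTYPE declaration is left out for brevity, and the contents of the preface and chapter elements use rules we get to further on:

```xml
<book>

<!-- Optional title group: title first, then subtitle?, then titleabbrev? -->
<title>A Hypothetical Book</title>
<subtitle>Made Up for Illustration</subtitle>
<titleabbrev>Hypothetical</titleabbrev>

<!-- bookinfo? is skipped here, which the ? allows -->

<!-- Zero or more components, each picked from the list of alternatives -->
<preface>
  <title>Preface</title>
  <para>Some introductory text.</para>
</preface>

<chapter>
  <title>The Only Chapter</title>
  <para>Some chapter text.</para>
</chapter>

</book>
```

Putting the subtitle before the title would break the rule on the second line of the content model. Dropping the subtitle, the titleabbrev, or the whole title group would not.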

So everything that can go inside a book
element is optional, which is why the empty DocBook from our
last entry was legal. Now let’s start filling it in.

We could start with a title element,
but I peeked ahead and saw that the bookinfo
element can contain a title. It makes
more sense to me to put all the information about
the book in an element named bookinfo,
so I’m going to start with that. I have a lot of choices
next, but since this is a simple book, I’ll just have a sequence
of a few chapter elements. The skeleton of my
DocBook file looks like this (it isn’t legal yet, because the
empty elements still need content):

<?xml version="1.0"?>
<!DOCTYPE book
PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>

<bookinfo>
</bookinfo>

<chapter>
</chapter>

<chapter>
</chapter>

</book>

Since the bookinfo and chapter
elements are contained in the book
element, programmers like me would tend to indent them to show
that relationship. However, after a few tries, I found that
all my real content was pushed way over to the right in my
text editor, because of so many levels of indentation. So I’m
not going to start indenting until we get to the next level of
content.

Now we need to see what we can put in the bookinfo
element, and what can go in the chapter
elements. Returning to The Definitive Guide, we see that the
contents of a bookinfo element are one or more (the + sign at
the end) elements, each of which can be any of a few dozen
element types. Scanning around, I ended up deciding that my
simple document would have title, subtitle, author,
and copyright elements, as below:

<bookinfo>
<title></title>
<subtitle></subtitle>
<author></author>
<copyright></copyright>
</bookinfo>

But what goes inside those elements? Will
this recursive search never end? Well, it eventually does.
The title and subtitle
elements are allowed to contain #PCDATA.
That really just means plain text. It stands for
“parsed character data”, which means it can contain any text,
but that the parser is going to look for the special characters
< and & as the beginning of enclosed elements or entities,
so you’d better not include them.
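
If a title really does need one of those characters, XML’s predefined entities &lt; and &amp; stand in for them. A small sketch, with a made-up title:

```xml
<!-- &amp; yields a literal ampersand, &lt; a literal less-than sign -->
<title>Q&amp;A: Is 1 &lt; 2?</title>
```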

The author and copyright
elements have subelements that must be included, too. You can
look those up yourselves. Cutting to the chase, the
bookinfo element could look like the
following:

<bookinfo>
  <title>My First DocBook</title>
  <subtitle>Is It Worth the Hassle?</subtitle>
  <author>
    <firstname>Charles</firstname>
    <surname>Engelke</surname>
  </author>
  <copyright>
    <year>2003</year>
    <holder>Info Tech, Inc.</holder>
  </copyright>
</bookinfo>

You’ll note that I added white space to make it easier to see
the structure. That’s almost always okay; the extra white space
will be ignored. We’ll worry about exceptions much later on!
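
As a small illustration of that point, the two author elements below differ only in the white space between the tags, and a DocBook processor should treat them the same. This is a sketch of the general rule rather than output from any particular tool:

```xml
<!-- Compact form -->
<author><firstname>Charles</firstname><surname>Engelke</surname></author>

<!-- Same content, spread out to show the structure -->
<author>
  <firstname>Charles</firstname>
  <surname>Engelke</surname>
</author>
```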

Finally, what goes into a chapter? The documentation
again gives us far more choices than we really want at this
point. Working through all the options, I ended up saying that,
for my first DocBook file, a chapter
will start with a title (which, unlike the title of a book, is required)
and then have a sequence of para (paragraph)
elements. Each of those can just contain text, so we get a
file like the following:

<?xml version="1.0"?>
<!DOCTYPE book
PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>

<bookinfo>
  <title>My First DocBook</title>
  <subtitle>Is It Worth the Hassle?</subtitle>
  <author>
    <firstname>Charles</firstname>
    <surname>Engelke</surname>
  </author>
  <copyright>
    <year>2003</year>
    <holder>Info Tech, Inc.</holder>
  </copyright>
</bookinfo>

<chapter>
  <title>Why DocBook?</title>

  <para>
    Well, why not DocBook?  It seems like a structured approach to
    developing and especially maintaining documents.  It's particularly
    well suited to technical documents.
  </para>

  <para>
    Really, would I lie to you?
  </para>
</chapter>

<chapter>
  <title>Creating a DocBook File</title>

  <para>
    It's easy to create a DocBook file.  Just use a text editor.
  </para>

  <para>
    Oh, you want to know what you type into the text editor?  That's
    beyond the scope of this chapter.
  </para>
</chapter>

</book>

I used DocBook publishing tools to render that sample file
in HTML, PDF, and Microsoft HTML Help. I even created a
Microsoft Rich Text Format version that you can open in Word
and edit.

It’s taken us a while to get to this point, but we finally
know how to create a fairly simple kind of DocBook file. There’s
a lot more we can do with DocBook, but I’m leaving that for
later. Next time, we’re going to start setting up
our DocBook workbench so that we can easily publish the files
we write.
