In this note we are going to figure out what a DocBook file is,
how to create it, and how to edit it. By the end of this note
we will have a legal DocBook file.
A DocBook file is just a plain text file. It contains the text
you want to publish, along with additional markup that describes
the structure of your text. Basically, what you do when you write
a DocBook file is describe your document, so
that a program can publish it for you. Suppose you had a person
who was going to handle publishing your book. You’d have to tell
him or her the title and author of the book, and describe each
chapter. For each chapter, you’d give the publisher the title,
then each paragraph in order. Once in a while you’ll want to
deviate from straightforward text, perhaps making it more
emphatic. A standard way to do that is to
underline the text. The published document won’t actually have
underlining in it; that’s just an instruction you use to tell the
publisher to put the text into italics, or bold type, or whatever
method is going to be used to emphasize text.
A DocBook file is really just that set of instructions in a
standardized, computer-usable format. This isn’t a new
idea; publishing systems have worked this way for decades. At
first each publishing layout system had its own syntax for doing
the markup, then some non-proprietary syntaxes (like
troff and TeX)
were created.
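To make the publisher analogy concrete, here is a rough sketch of how those instructions (title, author, chapter, paragraphs, emphasis) might look as DocBook markup. The title, author name, and text are invented placeholders, and the pieces of the file are explained step by step later in this note.

```xml
<?xml version="1.0"?>
<!DOCTYPE book
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<!-- Placeholder content: the title, author, and text are invented. -->
<book>
  <bookinfo>
    <title>An Example Book</title>
    <author><firstname>Pat</firstname><surname>Example</surname></author>
  </bookinfo>
  <chapter>
    <title>The First Chapter</title>
    <para>This is the first paragraph of the chapter.</para>
    <!-- emphasis marks the text; the publishing program picks the styling -->
    <para>This paragraph contains <emphasis>emphasized</emphasis> text.</para>
  </chapter>
</book>
```

Notice that nothing in the markup says "italic" or "bold"; it only says which text is emphasized, just like the underlining in a manuscript.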
In a way, DocBook is just one more syntax for markup, no better or
worse than those that came before it. But DocBook has one virtue
that they lack: it’s built on top of a general syntax called
XML (eXtensible Markup Language). Well, that’s
not quite true; DocBook is about five years older than XML, and
was originally built on top of an ancestor of XML called
SGML. However, we’re going to be sticking to
XML in these notes.
Why is building on top of XML a good thing? Because it means that
we can use all sorts of tools that were created for XML in general,
and apply them to DocBook. We can use XML editors to work with
our documents if we want, instead of having to stick with plain
text editors or create an editor just for our syntax. We can
build our publishing tools using general XML transformation tools,
rather than having to build them from scratch. And there’s a whole
bunch of syntactic issues that come up whenever you design a
computer language. For example, how do you publish text that
looks like a command in the language, without it being interpreted
as a command? How do you work with special characters that
your editor doesn’t support, such as accented letters?
XML has dealt with all these issues in general, so DocBook doesn’t
have to reinvent those wheels.
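As a small illustration of the two problems just mentioned, the fragment below shows the general XML answers: entity references and CDATA sections for text that looks like markup, and numeric character references for characters that are hard to type. The paragraphs are invented examples and would sit inside a chapter of a real document.

```xml
<!-- &lt; and &gt; stand for the literal < and > characters -->
<para>An opening tag looks like &lt;book&gt; when it is printed.</para>

<!-- Inside a CDATA section, markup characters are treated as plain text -->
<para><![CDATA[Here <book> is just text, not a tag.]]></para>

<!-- &#233; is a numeric character reference for the letter e with an acute accent -->
<para>The word caf&#233; can be typed on any keyboard this way.</para>
```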
So I guess we should start out by learning all about XML, right?
Well, wrong. XML consists of all sorts of technologies, some of
which are pretty complicated. People who start learning DocBook
by first learning XML are likely to drop out before long. There’s
too much foundation work before any payoff. Instead, we will
mention XML issues only as they relate to DocBook.
Our First DocBook File
DocBook files are just plain text, so we can create them with
any text editor. In Windows, Notepad will work fine (select
Start/Run, and enter
notepad to launch it). The command-line
edit command also works fine. Later on, we
will look at using a programmer’s text editor such as
GWD Text Editor to
make editing files more efficient.
Start whichever text editor you want, and begin entering text.
DocBook files are XML files, so they should begin with an XML
declaration. This declaration must
start in the first column of the first line, and at a minimum
contain the following:
<?xml version="1.0"?>
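The declaration may also name the character encoding of the file. This is optional; when it is left out, XML processors assume UTF-8 (or UTF-16 if the file starts with a byte order mark). A sketch, assuming your editor saves the file as UTF-8:

```xml
<?xml version="1.0" encoding="UTF-8"?>
```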
The next thing we need to put in our DocBook file is a
declaration that this is indeed a DocBook file, instead of some
other kind of XML file. We do that
with the lines:
<!DOCTYPE book
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
The first line says that this document will be a book.
The second line says what defines a book for the
purposes of this document (in this case, the English-language
Document Type Definition for the XML version of DocBook,
version 4.2, as maintained by the OASIS standards body). The
third line tells any processing programs where to find the
document type definition referenced in line two. In this case,
that’s at a web address, but it could refer to a local file
instead.
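For example, a declaration that points at a local copy of the document type definition might look like the sketch below. The file path is hypothetical; it depends entirely on where the DocBook DTD files are installed on your machine. In a real file the XML declaration still comes first.

```xml
<!-- Hypothetical local path; point it at wherever docbookx.dtd is installed. -->
<!DOCTYPE book
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "file:///usr/share/xml/docbook/4.2/docbookx.dtd">
```

Only the last part changes; the public identifier on the second line stays the same, because it names the definition rather than its location.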
Now we get to our document itself. In the DOCTYPE
declaration we said we were writing a book, so the
body of our document will be enclosed in book
tags:
<book>
</book>
All our content goes between the <book>
opening tag and the </book> closing
tag. DocBook doesn’t only define what a book is,
though; it also defines article, set
(of books), and refentry (a Unix-style “man” page).
Just use any of those names in place of book, in both the DOCTYPE
declaration and the tags, to create that type of document.
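As a sketch of that substitution, here is what a minimal article might look like. The name article appears in the DOCTYPE declaration and in both tags. Unlike a book, an article is not allowed to be completely empty under the DocBook 4.2 definition, so this example includes a placeholder title and a single paragraph; both are invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE article
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article>
  <!-- Placeholder title and text -->
  <title>An Example Article</title>
  <para>This is the only paragraph of the article.</para>
</article>
```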
So far, we have a complete, but pretty useless, DocBook file:
<?xml version="1.0"?>
<!DOCTYPE book
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book>
</book>
Still, it’s a legal DocBook file, and if we knew how to publish
such files, we could generate a (blank) PDF file from it, or
a (blank) web page, or a (blank) Windows help file.