Charles Engelke's Blog

December 19, 2003

DocBook to HTML

Filed under: docbook — Charles Engelke @ 1:10 pm

It turns out that DocBook conversion tools have already been written,
though not as separate traditional programs. Instead, DocBook
developers have set up a joint open-source project at SourceForge that
hosts many DocBook conversion tools. Rather than writing these tools in
traditional programming languages, they’ve built them on top of general
XML conversion engines (and older SGML conversion platforms, but we’re
sticking with XML). The conversion engines being used are called XSLT
processors. An XSLT (Extensible Stylesheet Language Transformations)
processor takes an XML file as input, along with an XSLT style sheet
(which is another XML file, written in XSLT syntax), and produces as
output whatever the style sheet specifies.

We want to perform these transformations, so we need at least the XSLT
style sheets and an XSLT processor. We’ll start by getting the style
sheets and putting them in a C:\Program Files\docbook directory:

Procedure 3.1. Initial docbook Directory Setup.

  1. Create a folder called docbook in C:\Program Files.

  2. Download the latest stable distribution of the DocBook XSL style
    sheets to a temporary directory. You’ll want the ZIP file version
    for Windows machines. The tar.gz version has the exact same
    contents, but would be harder to use on a typical Windows PC.

  3. Extract the files in the ZIP file you just downloaded to your
    C:\Program Files\docbook folder. This will create a folder named
    something like C:\Program Files\docbook\docbook-xsl-1.64.1
    (the latest version when I wrote this note was 1.64.1).

  4. Rename the newly created folder to xsl. That way we can always
    just refer to the C:\Program Files\docbook\xsl folder without
    worrying about exactly which version we have.
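
If you would rather script steps 1, 3, and 4 of the procedure than click through them, here is a minimal Python sketch. The function name install_stylesheets is my own invention, and the code assumes the ZIP file holds a single top-level folder (such as docbook-xsl-1.64.1), which you should check against the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def install_stylesheets(zip_path, docbook_dir):
    """Extract the style sheet ZIP into docbook_dir and rename its folder to xsl."""
    docbook_dir = Path(docbook_dir)
    docbook_dir.mkdir(parents=True, exist_ok=True)       # step 1: create the folder

    target = docbook_dir / "xsl"
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove or rename it first")

    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: everything in the archive lives under one top-level folder.
        top_level = {name.split("/")[0] for name in archive.namelist() if name.strip("/")}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
        archive.extractall(docbook_dir)                   # step 3: extract the files

    (docbook_dir / top_level.pop()).rename(target)        # step 4: rename to xsl
    return target


# Hypothetical usage; the download location is only an example:
# install_stylesheets(r"C:\Temp\docbook-xsl-1.64.1.zip", r"C:\Program Files\docbook")
```

The sketch refuses to run if an xsl folder is already there, so an older copy of the style sheets is never silently overwritten.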

You now have the style sheets available to use. If you look at
the folder you just created, you’ll see that it has a fairly complex
structure:

Directory of C:\Program Files\docbook\xsl

12/19/2003  12:18 PM    <DIR>          .
12/19/2003  12:18 PM    <DIR>          ..
11/02/1999  09:18 AM               240 BUGS
12/19/2003  09:04 AM             7,871 ChangeLog
12/19/2003  12:18 PM    <DIR>          common
12/19/2003  12:18 PM    <DIR>          doc
12/19/2003  12:18 PM    <DIR>          docsrc
12/19/2003  12:18 PM    <DIR>          eclipse
12/19/2003  12:18 PM    <DIR>          extensions
12/19/2003  12:18 PM    <DIR>          fo
12/19/2003  12:18 PM    <DIR>          html
12/19/2003  12:18 PM    <DIR>          htmlhelp
12/19/2003  12:18 PM    <DIR>          images
12/19/2003  12:18 PM    <DIR>          javahelp
12/19/2003  12:18 PM    <DIR>          lib
12/19/2003  12:18 PM    <DIR>          manpages
12/19/2003  12:18 PM    <DIR>          params
12/19/2003  12:18 PM    <DIR>          profiling
10/23/2002  07:00 AM             3,803 README
12/19/2003  09:00 AM            44,884 RELEASE-NOTES.html
12/19/2003  08:50 AM            33,104 RELEASE-NOTES.xml
12/19/2003  12:18 PM    <DIR>          template
04/02/2001  08:44 AM                70 TODO
12/19/2003  12:18 PM    <DIR>          tools
12/17/2003  09:26 AM             2,900 VERSION
12/19/2003  09:06 AM            12,004 WhatsNew
12/19/2003  12:18 PM    <DIR>          xhtml
8 File(s)        104,876 bytes

The style sheet that converts DocBook to HTML is docbook.xsl in the
html folder. Other subfolders hold style sheets for other target
formats, or files that help in using those style sheets.
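
To see which subfolders offer a conversion of their own, a short Python sketch can scan the xsl folder. The function name find_stylesheets is hypothetical, and it assumes that other formats name their main style sheet docbook.xsl the way the html folder does, which may not hold for every folder.

```python
from pathlib import Path


def find_stylesheets(xsl_dir, entry_point="docbook.xsl"):
    """Map each subfolder of xsl_dir that contains entry_point to that file's path."""
    found = {}
    for child in sorted(Path(xsl_dir).iterdir()):
        candidate = child / entry_point
        # Skip plain files such as README and folders without the entry point.
        if child.is_dir() and candidate.is_file():
            found[child.name] = candidate
    return found


# Hypothetical usage:
# for name, path in find_stylesheets(r"C:\Program Files\docbook\xsl").items():
#     print(name, path)
```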

To use a style sheet, you’ll also need an XSLT processor. There are
many to choose from, and I’ve tried several. For example, under
Windows, you might want to use MSXSL. It’s described and available for
download at that link (until Microsoft moves it; search Google for
msxsl site:microsoft.com to find it again).

Once you’ve downloaded and installed MSXSL, you can convert your
DocBook files to HTML with a simple command. The command goes all on
one line; I’ve used a trailing \ to show where it continues onto the
next line here, so don’t type the \ itself:

msxsl myfile.docbook \
"c:\Program Files\docbook\xsl\html\docbook.xsl"

The HTML will be sent to standard output. You can either redirect that
output to a file or use the -o parameter to specify an output file:

msxsl -o myfile.html myfile.docbook \
"c:\Program Files\docbook\xsl\html\docbook.xsl"

These commands are a bit unwieldy, so we will create batch files
for them. Before doing that, however, we will install a different
XSLT processor for our use. It turns out that some output formats
require the XSLT processor to have capabilities that aren’t yet
standard, and MSXSL doesn’t have all
the ones we need. Next time we will install and use
Saxon, an XSLT processor that does
seem to meet all DocBook requirements.
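
Until those batch files exist, the msxsl commands shown earlier can be wrapped in a small Python helper. This is only a sketch: the names msxsl_command and convert are mine, it assumes msxsl is on your PATH, and it puts the -o option in the same position as the command in this note.

```python
import subprocess

# Path taken from the commands earlier in this note.
STYLESHEET = r"c:\Program Files\docbook\xsl\html\docbook.xsl"


def msxsl_command(source, stylesheet=STYLESHEET, output=None):
    """Build the msxsl argument list; with no output, HTML goes to standard output."""
    command = ["msxsl"]
    if output is not None:
        command += ["-o", output]
    command += [source, stylesheet]
    return command


def convert(source, output, stylesheet=STYLESHEET):
    """Run msxsl and raise CalledProcessError if it reports a failure."""
    subprocess.run(msxsl_command(source, stylesheet, output), check=True)


# Hypothetical usage:
# convert("myfile.docbook", "myfile.html")
```

Passing the arguments as a list means the space in Program Files needs no extra quoting, which removes one of the things that makes the raw command unwieldy.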
