Charles Engelke's Blog

July 7, 2003

XSLT: A Tutorial

Filed under: OSCON 2003 — Charles Engelke @ 2:45 pm

We’re just about to start the session, provided by Mike Fitzgerald of
Wy’east Communications. He’s starting by telling us about some of the
books he recommends:

XSLT is Extensible Stylesheet Language Transformations. XPath is a companion
specification in non-XML syntax. Both were published by the W3C on November 16, 1999.
Both are at version 1.0, with working drafts of version 2.0. XSL-FO
(formatting objects) deals with appearance. XSLT and XSL-FO were originally
one specification, XSL; they were split in April 1999.

The basics: XSLT defines and describes transformations, mostly from XML to other
formats. The result can also be HTML or plain text.
A source tree is an XML document, which can be a file or stream. The
result tree can be almost anything.

Templates are the heart of XSLT. Each one matches a pattern, which is
generally a location in the structure of the source tree. When the
processor finds a node that matches a template’s pattern, it instantiates
the template, which creates part of the result in the output.

Here’s a ridiculous but legal XML document:

<msg/>

Here’s a brief stylesheet:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
<template match="msg">Found it!</template>
</stylesheet>

Put this file in msg.xsl, with the XML file in msg.xml, then run
saxon msg.xml msg.xsl. It prints Found it!. (Actually, I had
to run msxsl msg.xml msg.xsl, and it printed F o u n d i t !
because that tool outputs UTF-16, and the command line expects
ASCII.)

Here’s a cute XML file:

<?xml-stylesheet href="msg.xsl" type="text/xsl"?>
<msg/>

Put that in a file and open it with the web browser. The
browser will apply the stylesheet, which searches the XML document for
a msg element. It finds it, and outputs the result of the
stylesheet (Found it!) rather than the contents of the XML file.

More sophistication. message.xml contains

<?xml version="1.0"?>

<message priority="high">Welcome to OSCON 2003</message>

and message.xsl has

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="message">
<value-of select="text()"/>
</template>

</stylesheet>

When run (I used msxsl because the other tools aren’t working for me)
here’s what happens

C:\>msxsl message.xml message.xsl

gets

Welcome to OSCON 2003

(but in UTF-16, unlike the other tools).
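
XSLT 1.0 does let a stylesheet ask for a specific output encoding, through the encoding attribute of the output element. Here is a minimal sketch based on message.xsl. It is an assumption that any particular tool, msxsl included, honors the request when it writes to the console rather than to a file.

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<!-- encoding is a standard XSLT 1.0 attribute of the output element;
     whether a given processor honors it on the console is an assumption -->
<output method="text" encoding="UTF-8"/>

<template match="message">
<value-of select="text()"/>
</template>

</stylesheet>
```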

Formatting dates. Here’s a nice example. Suppose you have dates stored
in a reasonable XML way, as in

<?xml version="1.0"?>

<date>
<year>2003</year>
<month>07</month>
<day>07</day>
</date>

Then you can use different stylesheets to get this data in
different formats. To get ISO format, use this stylesheet:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
<strip-space elements="*"/>

<template match="date">Today's date: <apply-templates/>
</template>

<template match="year"><value-of select="."/>-</template>

<template match="month"><value-of select="."/>-</template>

<template match="day"><value-of select="."/></template>

</stylesheet>

and you get

Today's date: 2003-07-07

But use this stylesheet:

<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
<strip-space elements="*"/>

<template match="date">Today's date: <apply-templates select="month"/>
<apply-templates select="day"/>
<apply-templates select="year"/>
</template>

<template match="year"><value-of select="."/></template>

<template match="month"><value-of select="."/>/</template>

<template match="day"><value-of select="."/>/</template>

</stylesheet>

and the output is

Today's date: 07/07/2003

Pretty slick.

Note the use of “.” for value-of select instead of “text()”. He’s not
clear on what the difference is, except that usually they do the same thing
and “.” is just “better”.
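
For what it's worth, the XPath 1.0 rules do give a concrete difference. "." is the string value of the whole element, including the text of any child elements, while "text()" selects only the element's own text node children, and value-of prints just the first of them. The two agree whenever the element holds a single run of text, which is why they usually look the same. Here is a small sketch that shows them diverging; the source file mixed.xml is made up for illustration and is not from the session.

```xml
<!-- hypothetical source, mixed.xml:
     <message>Welcome to <b>OSCON</b> 2003</message> -->
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<template match="message">
<!-- "." is the string value of the element, descendants included -->
<text>dot:    [</text><value-of select="."/><text>]&#10;</text>
<!-- "text()" selects the text node children; value-of uses only the first -->
<text>text(): [</text><value-of select="text()"/><text>]&#10;</text>
</template>

</stylesheet>
```

Against that source the first line comes out as dot:    [Welcome to OSCON 2003] and the second as text(): [Welcome to ], because the text inside b and the trailing " 2003" are not part of the first text node.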

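What if you left out that <strip-space elements="*"/>? For the second
stylesheet, which selects month, day, and year explicitly, no difference. But
the first stylesheet uses a bare apply-templates, which also visits the
whitespace-only text nodes (the line breaks) between the year, month, and day
elements, so in a conforming processor those line breaks would show up in the
output, too. Note that strip-space only drops text nodes that are entirely
whitespace; it doesn’t trim whitespace around the text inside an element.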
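
The elements attribute of strip-space is a list of parent element names whose whitespace-only text children get discarded, and "*" simply means all of them. In the date file the only such nodes are the line breaks directly inside date, so a narrower variant of the ISO stylesheet (an illustration, not something shown in the session) should behave the same way:

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>
<!-- discard whitespace-only text nodes whose parent is a date element -->
<strip-space elements="date"/>

<template match="date">Today's date: <apply-templates/>
</template>

<template match="year"><value-of select="."/>-</template>

<template match="month"><value-of select="."/>-</template>

<template match="day"><value-of select="."/></template>

</stylesheet>
```

Whitespace that sits next to real text, as in <year> 2003 </year>, is not touched by strip-space at all. Wrapping the select in the XPath function normalize-space, as in normalize-space(.), is one way to trim it.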

Some templates are built-in and can cause confusion.
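
For reference, the XSLT 1.0 specification says a processor behaves as if every stylesheet also contained the rules below, written here in the same unprefixed style as the other examples. The comments are my own summary rather than the speaker's wording.

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">

<!-- elements and the root node: keep walking down to the children -->
<template match="*|/"><apply-templates/></template>

<!-- text and attribute nodes: copy their value to the output -->
<template match="text()|@*"><value-of select="."/></template>

<!-- processing instructions and comments: do nothing -->
<template match="processing-instruction()|comment()"/>

</stylesheet>
```

The confusion usually comes from the second rule: text you never wrote a template for still lands in the output. Attribute values don't leak the same way, because apply-templates visits only child nodes and attributes are not children.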

XPath sees an XML document as a tree of one or more nodes of seven types:

  1. Root (called the document node in XPath 2.0; the root element isn’t
    the root node, which is confusing)
  2. Element
  3. Attribute
  4. Text
  5. Namespace
  6. Comment
  7. Processing Instruction

The root node is the root of the tree. It has at most one element node
child, which is the root or document element.
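
To make the node types a bit more concrete, here is a small sketch (my own, not from the session) that matches the root node and counts the nodes of several types using standard XPath 1.0 node tests. Namespace nodes live on their own axis (//namespace::*) and are left out to keep it short.

```xml
<stylesheet version="1.0" xmlns="http://www.w3.org/1999/XSL/Transform">
<output method="text"/>

<!-- "/" matches the root node, which is not the same as the root element -->
<template match="/">
<text>elements: </text><value-of select="count(//*)"/>
<text>&#10;attributes: </text><value-of select="count(//@*)"/>
<text>&#10;text nodes: </text><value-of select="count(//text())"/>
<text>&#10;comments: </text><value-of select="count(//comment())"/>
<text>&#10;processing instructions: </text><value-of select="count(//processing-instruction())"/>
<text>&#10;</text>
</template>

</stylesheet>
```

Run against message.xml this should report one element, one attribute, one text node, and no comments or processing instructions. The XML declaration on the first line is not a processing instruction in the XPath data model, so it isn't counted.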

I’m finding it really difficult to keep up with these notes, not so much
because of the session, but because the tools aren’t working for me and I’m
trying to download new ones, but the network wasn’t working! (It just
started working.) “Instant saxon” isn’t working because it uses the
Microsoft Java VM, and Windows XP doesn’t have that, and Microsoft won’t
let you get it. Java saxon probably isn’t working because I haven’t set up
the Java environment right. And msxsl is working, but it defaults to UTF-16
output which looks s p r e a d o u t to programs expecting ASCII (or
UTF-8).
