University of Victoria Humanities Computing and Media Centre. Released under the Mozilla Public Licence 1.1.
No source: created in machine-readable form.
Over the last few years, the Humanities Computing community has been producing XML documents at a remarkable rate. Generally, these documents are presented to readers through a Web interface by using a two-stage process: first, they are converted to HTML using XSLT and other transformation methods, and then they are styled for the browser using CSS. Most Web browsers, however, are perfectly capable of presenting XML documents directly using CSS. In many contexts, it is actually unnecessary to go through the intermediate step of transforming XML to HTML (a process that usually involves discarding much useful information from the original markup). Moreover, CSS is easier to learn and more human-readable than XSLT. This workshop will introduce participants to the practice of styling XML through CSS, with particular focus on the XML of the Text Encoding Initiative (TEI).
In the early days of XML usage, XML code was often processed using text-manipulation languages such as Perl, or template scripting languages such as XMLScript. These approaches can be very fast and effective, especially on files with simple structures such as XML output from a database query, but they become rather unwieldy when complex, deeply nested documents such as TEI files are involved.
The typical way to produce PDF from XML is to use XSLT to create XSL:FO, which a formatting processor then renders as PDF.
In all of these approaches, the core idea is that XML is useful as a source document for data storage, but it's no good for presentation; it must be converted into something else which is more appropriate (HTML or PDF).
If HTML is the output format from a conversion, it's typically styled for presentation using CSS. This is a very roundabout way of approaching the display of XML; you have to learn extra languages such as XSLT, and create and maintain extra files. If you change the XML structure (by, say, adding a feature), you have to change the XSLT, to create a different HTML output, then change the CSS for the HTML to handle the presentation of the new feature. In the case of PDF, the situation is similar, although even more complicated. XSL:FO is itself heavily dependent on CSS properties and values to define the appearance of elements.
In other words, you'll probably have to write CSS anyway. Why not write CSS that applies directly to the XML, and keep it simple?
As usual, Internet Explorer is only partially compliant with the specifications.
It will happily apply the CSS it understands to any XML document, and its XML support is excellent (multiple namespaces etc.). However, its support for CSS is limited. For example, IE6 does not support position: fixed, nor does it support more sophisticated selectors such as >, +, or attribute-based selectors. IE7 supports more of CSS 2.1 and 3, but our testing has shown that it is still unable to render many documents properly. Never mind. IE is awful in many other ways, too. Let's not use it, eh?
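To make those limitations concrete, here is a minimal sketch of the kind of rules IE6 would ignore. The element names are ordinary TEI elements; the type="editorial" value and the layout numbers are assumptions chosen only for illustration.

```css
/* Child selector (>): only head elements that are direct children of a div */
div > head {
  display: block;
  font-weight: bold;
}

/* Adjacent sibling selector (+): a p that immediately follows a head */
head + p {
  margin-top: 0;
}

/* Attribute selector plus fixed positioning: pin one kind of note
   to the top right of the window while the text scrolls.
   The attribute value "editorial" is a hypothetical example. */
note[type="editorial"] {
  display: block;
  position: fixed;
  top: 1em;
  right: 1em;
  width: 20%;
}
```

A browser that does not understand a selector simply skips the whole rule, so the document still displays, only without that piece of styling.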
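A stylesheet is linked to an XML document with an xml-stylesheet processing instruction, placed after the XML declaration: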
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="my-style.css" type="text/css"?>
p{
font-size: 12pt;
text-align: justify;
}
div{...}
(This is the simplest type of selector. It selects all elements with the tag name "div".)
div p{...}
(This selects all p elements which are descendants of div.)
title[level="m"]{...}
(This selects all title elements which have an attribute level="m".)
quote:before{...} and quote:after{...}
(These pseudo-selectors are a little more complicated; they enable you to add content which will appear before or after the element. Using these selectors, for instance, you can add opening and closing quotation marks before and after a quote.)
These selectors are all defined in the CSS 2.1 specification; CSS 3 defines many more selectors and pseudo-selectors, but only a few are currently supported by browsers.
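Here is a small sketch that puts the four kinds of selector described above to work on TEI elements. The particular property values are assumptions made for the sake of the example, not recommendations.

```css
/* Type selector: every div element */
div {
  display: block;
  margin: 1em 5%;
}

/* Descendant selector: p elements anywhere inside a div */
div p {
  display: block;
  text-align: justify;
}

/* Attribute selector: titles of monographs only */
title[level="m"] {
  font-style: italic;
}

/* Generated content: curly quotation marks around a quote.
   \201C and \201D are the Unicode code points for the
   left and right double quotation marks. */
quote:before {
  content: "\201C";
}
quote:after {
  content: "\201D";
}
```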
display: block | inline | none;
(hiding and showing elements) Elements can be displayed as blocks (such as paragraphs, which are multi-line wrapping blocks of text), inline (words or phrases that occur within a block, such as italicized terms) or "none" (which means not displayed at all).
width: 60%;
(sizing elements) Width and height can be specified for any block elements, in percentages (relative to the containing block) or units such as pixels (400px) or inches (6in).
margin-top: 1em;
(space around elements) A margin setting does just what you would think: it puts a space between the element and anything contiguous to it. Margins can be set in percentages or units such as pixels, inches or ems (the height of the element's font). Margins can be set separately on all four sides of the element, or in a single setting (margin: 2%).
text-align: left | right | center | justify;
font-size: 150%;
font-family: georgia, "times new roman", serif;
The font-family property takes a series of comma-separated values, in order of preference; the browser will use the first one which is available. Font names with spaces should be enclosed in quotes, and the series should end with one of the generic font families (serif, sans-serif, cursive, fantasy or monospace).
font-style: italic;
font-weight: bold;
color: black;
background-color: white;
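As a rough illustration of how these properties combine in a single rule, the sketch below styles the TEI text element as a centred column. Choosing text as the target, and the specific sizes, are assumptions for the example.

```css
text {
  display: block;            /* treat the whole text as a block */
  width: 60%;                /* relative to the containing block */
  margin: 2% auto;           /* single setting: 2% top and bottom,
                                auto left and right (centres the block) */
  text-align: left;
  font-size: 100%;
  /* first available font wins; serif is the generic fallback */
  font-family: georgia, "times new roman", serif;
  color: black;
  background-color: white;
}
```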
Some elements are not normally viewed in a document; one good example might be the teiHeader, which is often used only by metadata parsers or coders comfortable with reading XML. Such elements can be hidden by setting display: none.
By default, all elements are displayed as inline. The first step in organizing the layout is to determine which elements should be block elements. Typical examples in TEI would be head tags and p tags.
Space out the elements on your page by setting margin values. Other properties to consider at this stage are padding and border.
Some elements (such as headings) might be centred, others (such as paragraphs) will look best when justified.
The hierarchy of a text is often signalled through setting different font sizes for different levels of heading. Blockquotes are sometimes shown with a smaller font size than the surrounding paragraph text.
TEI emph tags will typically be italicized, acronyms or abbreviations might be bold, and monograph titles will typically be in italics.
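The steps above can be sketched as one short stylesheet. Treat it as a starting point under stated assumptions: the element choices follow the examples in the text, while the specific sizes and spacing values are invented for illustration.

```css
/* Hide what readers don't need to see */
teiHeader { display: none; }

/* Decide which elements are blocks (everything else stays inline) */
div, head, p { display: block; }

/* Space the blocks out */
head { margin: 1.5em 0 0.5em 0; }
p    { margin: 0 0 1em 0; }

/* Align: headings centred, paragraphs justified */
head { text-align: center; }
p    { text-align: justify; }

/* Signal the hierarchy through font size */
div head     { font-size: 150%; }  /* top-level headings */
div div head { font-size: 120%; }  /* headings of nested divs */
quote        { font-size: 90%; }   /* quotations slightly smaller */

/* Inline styling */
emph             { font-style: italic; }
abbr             { font-weight: bold; }
title[level="m"] { font-style: italic; }
```

The rule for nested headings wins over the top-level one because its selector is more specific, so the order of those two rules does not matter.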
(X)HTML has built-in event handlers which can be assigned during an XSLT transformation to scripting actions (such as hiding or showing elements when something is clicked on). Also, some (X)HTML attributes come with built-in functionality (such as the anchor tag with its href attribute). Pure CSS can't provide any interactivity like this (although there are ways to attach script actions to XML elements, and this will become easier in future).
Key print-related properties such as page-break-before and page-break-inside are not supported by any major browser, so content blocks are often unacceptably fragmented across pages. In addition, printing from a browser requires manual intervention to set up the browser print properties and page settings, so that backgrounds print, margins are correct, and no extra information such as the page URL or the page count is added to headers and footers.
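For reference, this is roughly what such print rules look like when written out. It is only a sketch: the choice of elements is an assumption, and a browser that does not implement these properties will simply ignore them and break pages wherever it likes.

```css
@media print {
  /* Ask for each top-level division to start on a new page */
  text > body > div {
    page-break-before: always;
  }

  /* Ask the browser not to split paragraphs or quotations across pages */
  p, quote {
    page-break-inside: avoid;
  }
}
```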
Given a generic TEI XML document, a browser can't be expected to know that (for instance) a graphic element refers to an image which can be retrieved from the location specified by its url attribute. There are workarounds using script (such as dynamically replacing graphic elements with img elements in the XHTML namespace) but support for this kind of thing is patchy at best.
CSS cascades only in one direction (top down). Thus we can apply style to a term element based on the fact that it is inside a p element, but not vice versa; we cannot apply style to a p element on the basis that it contains a term element. This is part of the basic design of CSS; traversing a document tree in both directions is time-consuming and would make CSS-based rendering very slow. However, it does represent a limitation on what we can do (as opposed to what can be done with XSLT, for instance).
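A short sketch may make the one-way limitation clearer. The first rule is the direction that works. The second shows one possible workaround, which is to record the fact in the markup itself; the rend value "contains-term" is a hypothetical convention invented for this example, not part of TEI.

```css
/* Works: style a term because of where it sits */
p term {
  font-weight: bold;
}

/* CSS 2.1 has no selector meaning "a p that contains a term".
   Possible workaround (an assumption, not a standard practice):
   mark such paragraphs in the XML, e.g. <p rend="contains-term">,
   and select on that attribute instead. */
p[rend="contains-term"] {
  border-left: 3px solid gray;
  padding-left: 1em;
}
```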
If your document is fairly simple, and your objective is to create an attractive onscreen rendering of it which allows the user to search it (using Control + F) and perform operations such as copy/paste, then CSS is a useful and simple approach.
CSS can provide one or more simple, clear views of the text of an XML document, and these can be used for proofing and checking during and after the XML markup process. Alternate stylesheets can be switched in and out to highlight different key elements.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet title="Full view" href="gowers_full.css" type="text/css" ?>
<?xml-stylesheet title="Names only" href="names_only.css" alternate="yes" type="text/css" ?>
<?xml-stylesheet title="teiHeader only" href="teiheader_only.css" alternate="yes" type="text/css" ?>
Using processing instructions as shown above, you can include more than one stylesheet in your document. Each is given a title, and alternative stylesheets have the attribute alternate="yes". You can then switch between them using the browser's View menu.
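What goes into an alternate stylesheet is up to you. As a guess at what a "Names only" proofing view might contain, the sketch below greys out the running text and highlights name elements. The contents are entirely hypothetical; the actual names_only.css file is not reproduced here.

```css
/* Hypothetical contents for a names-only proofing view */

teiHeader { display: none; }
div, head, p { display: block; }

/* Fade the running text so that tagged names stand out */
text {
  color: gray;
}

/* Highlight the TEI naming elements */
name, persName, placeName {
  color: black;
  background-color: yellow;
  font-weight: bold;
}
```

Switching to this view makes it easy to spot a name that was missed during markup, because it stays grey.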
<script xmlns="http://www.w3.org/1999/xhtml" type="application/ecmascript" src="xml_events.js"></script>
Using another namespace, you can add a link to a JavaScript file containing code which will operate on the XML DOM.
window.addEventListener('load', doSetup, false);
You can put this line in your JavaScript file (external to any function), causing the doSetup function to be called when the browser loads the XML file.
The code in this library does a number of things:
Replaces <graphic> elements with <img> elements in the XHTML namespace, using the same attribute values, so that images are displayed by the browser.
Adds title attributes in the XHTML namespace to TEI <abbr> elements, using the expan sibling element's value, so that abbreviations have tooltip popups.
It's certainly arguable that once we start adding this kind of code to a project, we might as well be writing XSLT and creating XHTML from the beginning, but it is intriguing to experiment with this kind of code.