eECHO BLOG

Introduction to Docutils

By Ollie Rutherfurd | 2004-10-06

Introduction

Docutils is a set of tools for translating documents written in plain text into other formats, such as HTML, XML, and LaTeX. Docutils documents are authored in reStructuredText (reST), an easy-to-read (and easy-to-write) plain-text markup.

In this article, the first of two on Docutils and reST, let’s look at some of the most commonly used markup in reST documents, including titles, sections, paragraphs, lists, hyperlinks, roles, literal blocks, doctest blocks, tables, and comments. Let’s also see how to use directives to generate a table of contents, include files and images, and use admonitions. And, since reST is intended to be converted to other formats, let’s learn how to do that, and see how to customize the styling of generated documents. (If the article refers to a construct but doesn’t go into detail about it, a link points to the construct’s section in the reStructuredText Markup Specification.)

In Part Two, we’ll delve into Docutils code to see how to programmatically convert reST to HTML and how to customize and extend Docutils.

Getting Started

reST markup is designed to be easy to read and easy to write. Even without any prior knowledge of reST, you should be able to read and understand reST documents. Commonly used constructs, such as styling text and delineating blocks of code, are unobtrusive and derived from informal conventions used in plain-text documents. Simple needs have simple solutions; more advanced and rarely used reST constructs may have more complicated markup.

To get started, let’s look at a short reST document:

Using reST to Write Documents
=============================

Overview
========

This document contains the following:

* A document title (“Using reST to Write Documents”)
* A section (titled “Overview”)
* A paragraph (“This document contains…”)
* A bullet list (this)

In reST, document and section titles are “underlined” using a string of suitable characters, such as ========. The underline must be at least as long as the title and must appear on the line immediately below it.
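To make the underline rule concrete, here is a minimal Python sketch. `is_valid_title_underline` is a hypothetical helper, not part of Docutils. It assumes a single-line title and counts characters with `len()`, so it ignores wide East Asian characters. It treats any printable, non-alphanumeric ASCII character as a valid adornment.

```python
import string


def is_valid_title_underline(title, underline):
    """Check a title/underline pair against the underline rule (hypothetical helper)."""
    underline = underline.rstrip()
    if not underline:
        return False
    char = underline[0]
    return (
        char in string.punctuation                 # printable, non-alphanumeric ASCII
        and underline == char * len(underline)     # one character, repeated
        and len(underline) >= len(title.strip())   # at least as long as the title
    )
```

For instance, `is_valid_title_underline("Overview", "========")` returns `True`. A 22-character underline beneath the 29-character "Using reST to Write Documents" returns `False`.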

In a reST document, the first title found is the title for the document, and subsequent titles are section titles. In the example above, Using reST to Write Documents is the document title, and Overview is a section title.

To create subsections within a section, use a different underline character. For example, if you use a string of = characters for the document title and the title of sections, use a string of hyphens for subsections. Recommended title characters include = (equals), - (hyphen), ` (backquote), : (colon), . (period), ' (apostrophe), " (quote), ~ (tilde), ^ (caret), _ (underscore), * (asterisk), + (plus), and # (octothorpe). (See http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#sections for a complete list of valid characters.) reST doesn’t require you to use a specific character for sections, subsections, sub-subsections, and so on, but your usage must be consistent.

Section titles serve two purposes. Aside from being the title of the section, a section title also marks the start of the section. A section continues, including subsections, until the end of the document or a new section of the same or higher level is found.
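reST does not fix a character per level. Instead, levels follow the order in which adornment styles are first encountered, and a title may go at most one level deeper than the title before it. The sketch below models that rule. `assign_section_levels` is a hypothetical helper, and it assumes each title has already been reduced to an (adornment character, has_overline) pair.

```python
def assign_section_levels(styles):
    """Return a nesting level (1 = outermost) for each title's adornment style.

    Each style is a hashable description of a title's adornment, e.g.
    ("=", False) for an "=" underline, or ("=", True) for "=" above and below.
    """
    known = []      # styles in the order they were first encountered
    levels = []
    current = 0     # level of the previous title (0 before the first title)
    for style in styles:
        if style not in known:
            known.append(style)          # a new style becomes the next level down
        level = known.index(style) + 1
        if level > current + 1:
            # e.g. jumping from a section straight to a sub-subsection
            raise ValueError(f"title level inconsistent for style {style!r}")
        levels.append(level)
        current = level
    return levels
```

For example, a document whose titles use =, -, -, = in that order gets levels `[1, 2, 2, 1]`. Going from an = title straight back to a previously used third-level style raises an error.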

Besides titles, the example above also includes a paragraph and a bullet list. Paragraphs are blocks of text separated by one or more blank lines. Bullet list items begin with - (hyphen), +, or *, followed by whitespace and the item’s content.

Lists

In addition to bullet lists, reST offers four other kinds of lists: enumerated lists, definition lists, field lists, and option lists.

Enumerated lists are lists of items prefixed by an enumeration sequence, such as “1.”, “2.”, “3.”, or “a.”, “b.”, “c.”. As with lists in other markup languages, a reST list item may contain sub-lists.

For example, here’s an enumerated list where the third item contains a bullet list.

1. one
2. two
3. three

   - a
   - b
   - c

4. four

As shown above, each item in the list starts with its sequence number (or letter), followed by whitespace and the item’s content. To nest a list, indent it so it lines up with the text of its parent item, and separate it from the surrounding items with blank lines.

Definition lists define a set of terms. You can use definition lists for a glossary or to document a list of method parameters. Each definition list item consists of a term, an optional classifier (separated from the term by a space, a colon, and a space), and the definition, indented on the following line. As with other list types, definition list items may be separated by blank lines, but don’t have to be. However, there must not be a blank line between the term and its definition.

Here’s a definition list with two items:

term
   This term has no classifier.
term : classifier
   This term has a classifier…

   …and a second paragraph.

A classifier lets you provide additional information about the term. For example, if you use definition lists to document function parameters, the classifier can be used to show the expected types:

def foo(x, y):
    """
    x : `int`
        parameter, the first
    y : `int`
        parameter, the second
    """

Hyperlinks

reST supports several ways to create hyperlinks to locations both inside and outside a document.

1. Plain URIs

If your document contains a URI, for example http://docutils.sf.net/, Docutils inserts a link and automatically uses that URI as both the link text and the link target.

2. Reference and target pair

A hyperlink reference is a named reference with an associated target, where the reference is the text displayed and the target is the hyperlink target. You specify the reference with:

`reference`_

And you specify the target with:

.. _reference: target

When processed, “reference” will be a hyperlink to “target”.

Here’s an example:

The `Docutils`_ project is hosted by Sourceforge.net.

.. _Docutils: http://docutils.sourceforge.net/

A hyperlink target may only be defined once, but may be referenced multiple times.

One nice feature of Docutils is that it automatically creates hyperlink targets for all section titles in a document:

A Section
=========

Here’s a link to `A Section`_.
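Because a target name may be defined only once, a quick check for duplicates can catch mistakes before Docutils reports them. The sketch below is a hypothetical helper built on Python's `re` module. It only looks at explicit targets of the form `.. _name: target`. It normalizes names the way reST does, which is case-insensitive with runs of whitespace collapsed. It ignores the implicit targets created from section titles.

```python
import re
from collections import Counter

# Explicit named targets such as ".. _Docutils: http://docutils.sourceforge.net/".
# Simplification: skips anonymous targets (".. __: ...") and backquoted names.
TARGET_RE = re.compile(r"^\.\. _(?!_)([^:`]+):", re.MULTILINE)


def find_duplicate_targets(text):
    """Return the normalized names of targets that are defined more than once."""
    names = [" ".join(m.group(1).split()).lower() for m in TARGET_RE.finditer(text)]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Because names are compared case-insensitively, `.. _Docutils:` and `.. _docutils:` count as the same target.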

3. Embedded URIs

Embedded URIs let you specify the reference and target in one go:

`reference <target>`_

For example, the snippet:

Docutils is written in `Python <http://www.python.org>`_.

is equivalent to:

Docutils is written in `Python`_.

.. _Python: http://www.python.org

At first, embedded URIs may seem tempting, as they require less work. However, they can quickly make a document cluttered and difficult to read.
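To show that an embedded URI and a reference/target pair are equivalent, the following sketch rewrites one form into the other. `split_embedded` is a hypothetical helper, not part of Docutils. It handles only the simple `` `text <uri>`_ `` form described above.

```python
import re

# `link text <uri>`_  (single trailing underscore: a named reference)
EMBEDDED_RE = re.compile(r"`([^`<]+?)\s*<([^>]+)>`_(?!_)")


def split_embedded(text):
    """Rewrite embedded URIs as named references; return (text, targets)."""
    targets = []

    def to_named_reference(match):
        label, uri = match.group(1).strip(), match.group(2).strip()
        targets.append(f".. _{label}: {uri}")
        return f"`{label}`_"

    return EMBEDDED_RE.sub(to_named_reference, text), targets
```

Collecting the generated targets at the end of a document is one way to keep paragraph text free of long URIs.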

For more information about hyperlinks, see Hyperlink References in the reStructuredText Markup Specification.

Text Roles

reST allows for inline markup of text using roles. Roles include text effects such as emphasis, strong emphasis, title reference, literal, subscript, and superscript.

The syntax for assigning a role to text is:

:role:`text`

If :role: isn’t specified, the default role (“title reference”) is used.

For example, the text:

You can use roles for :subscript:`subscript` and
:superscript:`superscript`.

generates:

You can use roles for subscript and superscript.

For roles such as subscript and superscript, you need to use the (explicit) role syntax shown above. However, for other, more commonly used roles, shorter markup exists: *emphasis* shows emphasis; **strong emphasis** yields strong emphasis; and ``literal`` produces literal text.
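The sketch below pulls interpreted text out of a line and reports the role that applies. When no `:role:` prefix is given, it falls back to the default title-reference role. `find_roles` is a hypothetical helper. It skips ``` ``literal`` ``` text and `` `hyperlink`_ `` references. It does not handle the less common `` `text`:role: `` suffix form.

```python
import re

# Optional ":role:" prefix, then `text`, excluding ``literal`` and `link`_ markup.
ROLE_RE = re.compile(r"(?<!`)(?::([A-Za-z][\w.+-]*):)?`([^`]+)`(?![`_])")


def find_roles(text, default_role="title-reference"):
    """Return (role, text) pairs for each piece of interpreted text."""
    return [(m.group(1) or default_role, m.group(2)) for m in ROLE_RE.finditer(text)]
```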
Literal Blocks

Literal blocks are equivalent to HTML’s <pre> element: text contained within a literal block isn't interpreted as reST. Literal blocks are used for code examples and pre-formatted text, among others.

A literal block is an indented block introduced by a paragraph that either ends with :: or consists of :: alone. Here's an example of the former:

A simple Python script::

    if __name__ == '__main__':
        print('Hello, World!')

And an example of the latter, where :: stands on its own:

Blah, blah, blah...

::

    if __name__ == '__main__':
        print('Hello, World!')

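If a paragraph ends with :: directly after its text, the :: is replaced with : (colon) in the generated document. If the :: is preceded by whitespace, or makes up the whole paragraph as in the second example, it is removed entirely.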
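How Docutils displays the :: marker depends on what precedes it:

* Directly after text, it becomes a single colon.
* After whitespace, it disappears.
* A paragraph consisting only of :: is dropped.

The sketch below encodes those three cases. `render_literal_marker` is hypothetical and works on a single paragraph string, not on Docutils' document tree.

```python
def render_literal_marker(paragraph):
    """Return the displayed text of a paragraph that may end with "::".

    Returns None when the paragraph is removed from the output entirely.
    """
    if not paragraph.endswith("::"):
        return paragraph                  # no literal block follows
    text = paragraph[:-2]
    if not text.strip():
        return None                       # "::" alone: paragraph dropped
    if text[-1].isspace():
        return text.rstrip()              # "text ::" -> "text"
    return text + ":"                     # "text::"  -> "text:"
```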
Doctest Blocks

As Docutils is written in Python, it is well-suited for writing Python documentation. If a paragraph begins with >>>, the Python interpreter's main prompt, the paragraph is treated as a Doctest Block, and displayed as a literal block in generated documents. Here's an example:

A Doctest Block follows:

>>> 2 + 2
4

Doctest blocks let you write docstrings in reST and still test their examples with Python's doctest module.
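Here is a minimal sketch of a docstring that is both readable reST and a runnable test. `add` is a made-up function used only for illustration. The examples are run with the standard library's `doctest.run_docstring_examples`.

```python
import doctest


def add(a, b):
    """Return the sum of *a* and *b*.

    The paragraph below starts with the interpreter prompt, so Docutils
    renders it as a doctest block, and doctest can run it as a test:

    >>> add(2, 2)
    4
    """
    return a + b


if __name__ == "__main__":
    # Check the examples in add()'s docstring; prints nothing if they pass.
    doctest.run_docstring_examples(add, {"add": add})
```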
Tables

reST supports two types of tables: simple tables and grid tables.

Simple tables use "=" to lay out columns and headers, with one or more spaces between columns. The underlines and overlines for each column must span the column's contents:

=========================  ===================
Album                      Artist
=========================  ===================
Heartattack & Vine         Tom Waits
Mermaid Avenue             Billy Bragg & Wilco
Rain Dogs                  Tom Waits
Where'd You Hide the Body  James McMurtry
=========================  ===================

Column headers are optional.
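Each column's border must span its widest cell, which makes simple tables easy to generate. The following hypothetical helper, not part of Docutils, builds one from Python lists. It assumes single-line cells and a non-empty first column, because a blank first-column cell marks a continuation line in a simple table.

```python
def simple_table(header, rows):
    """Render a reST simple table; pass header=None for a table without headers."""
    all_rows = ([header] if header else []) + rows
    # Each column is as wide as its widest cell.
    widths = [max(len(row[i]) for row in all_rows) for i in range(len(all_rows[0]))]
    border = "  ".join("=" * width for width in widths)

    def fmt(row):
        return "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()

    lines = [border]
    if header:
        lines += [fmt(header), border]   # the header is closed by a second border
    lines += [fmt(row) for row in rows]
    lines.append(border)
    return "\n".join(lines)
```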
Directives

Directives are an extension mechanism for reST. They provide a way to add features without adding new syntax. You can use directives to include files, insert images, highlight admonitions, create figures and sidebars, and generate tables of contents.

The syntax for using a directive is:

.. name:: [arguments]
   :option: [value]

   [content]

The arguments, options, and content may or may not be relevant, depending on the directive. Options and content are indented beneath the directive line.
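The general shape of a directive is a name with an optional argument on the first line, then an option list, then indented content. The sketch below splits a directive block into those parts. `parse_directive` is a hypothetical, simplified helper and not how Docutils parses directives internally. It strips indentation and drops blank lines.

```python
import re

# First line: ".. name:: optional argument"
DIRECTIVE_RE = re.compile(r"^\.\. ([\w-]+)::(?:\s+(.*))?$")
# Option lines: ":name: optional value"
OPTION_RE = re.compile(r"^:([\w-]+):(?:\s+(.*))?$")


def parse_directive(block):
    """Split a directive into (name, argument, options, content lines)."""
    lines = block.splitlines()
    match = DIRECTIVE_RE.match(lines[0])
    if match is None:
        # e.g. ".. image: image.png" is a comment, not a directive
        raise ValueError(f"not a directive: {lines[0]!r}")
    name, argument = match.group(1), (match.group(2) or "").strip()
    body = [line.strip() for line in lines[1:]]
    options = {}
    index = 0
    # Options come first, directly below the directive line.
    while index < len(body):
        option = OPTION_RE.match(body[index])
        if option is None:
            break
        options[option.group(1)] = (option.group(2) or "").strip()
        index += 1
    content = [line for line in body[index:] if line]
    return name, argument, options, content
```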
Table of Contents

The contents directive generates a table of contents. The directive is replaced by the table of contents, listing all sections and subsections in the document. The directive accepts an optional title as its argument; if none is given, "Contents" is used. You can also limit the depth of the table of contents with the :depth: option.

Here's an example:

.. contents:: Table of Contents
   :depth: 2

Including Files

You can include another document in your document by using the include directive. By default, the include directive behaves much like the C preprocessor's #include directive: the directive is replaced with the contents of the included file.

For example, this includes the file chapter1.txt when the enclosing file is processed:

.. include:: chapter1.txt

Using the literal flag, you can include a file as a literal block, which is useful for example code and configuration files:

.. include:: example.py
   :literal:

Including Images

Images can be included using the image directive. The image directive requires an image path, and accepts height, width, scale, alt, target, and align options.

For instance, the following snippet includes the file image.png and provides an alternate description if the image cannot be rendered:

.. image:: image.png
   :alt: some image

Admonitions

Admonitions are a class of directives to call attention to something. reST supports the following admonitions: attention, caution, danger, error, hint, important, note, tip, and warning. Admonition content must be indented.

Here's an example:

.. warning::

   Consider yourself warned!

Comments

While comments are not directives, the syntax for comments is similar. Comments begin with .. followed by a space:

.. This is a comment.

Unfortunately, the similarity between comments and directives is an easy source of mistakes. For example, the following image directive is missing a second :, so rather than including an image, you get a comment (and no image):

.. image: image.png
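Docutils quietly treats such a mistyped directive as a comment, so a simple line-based check can help. This hypothetical helper flags lines that look like `.. name: ...`. A genuine comment can look the same (for example `.. Note: ...`), so treat its output as lines to review, not as errors.

```python
import re

# ".. name: ..." with a single colon, where a directive would need "::".
# Target lines (".. _name: ...") start with "_" and are not matched.
SUSPICIOUS_RE = re.compile(r"^\.\. ([A-Za-z][\w-]*): \S")


def find_mistyped_directives(text):
    """Return (line number, line) pairs that may be directives missing a colon."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPICIOUS_RE.match(line)
    ]
```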

Converting reST to Other Formats

Now that you've learned enough markup to write reST documents, let's see how to convert reST documents to other formats.

HTML

The Docutils front-end tool rst2html.py converts reST to HTML. By default, it reads from stdin and writes to stdout (so if you run rst2html.py with no arguments, it simply waits for input).

To convert a reST source file to HTML, type:

$ rst2html.py <source> <destination>

For a complete list of options supported by rst2html.py, use --help:

$ rst2html.py --help

Docutils includes a CSS stylesheet for generated HTML documents. If you've installed from source, you'll find it at tools/stylesheets/default.css. You may specify an alternate stylesheet to use, but if none is supplied, the generated file references "default.css" in the document's output directory.

If you want to change the look of your documents, create a second stylesheet that imports default.css, overrides existing styles, and adds new styles. That way, you won't have to integrate your customizations with future versions of default.css.

For example:

/* custom.css */

@import url(default.css); /* import default */

/* customizations */
p { text-align: justify; }
pre { font-family: "Lucida Console", monospace; }

Then, when generating HTML, do:

$ rst2html.py --stylesheet=custom.css <source> <destination>

PDF

While Docutils lacks native support for generating Adobe PDF documents directly from reST, you can use Docutils' LaTeX writer to generate LaTeX from reST and then generate a PDF from LaTeX.

To generate LaTeX from reST, type:

$ rst2latex.py --embed-stylesheet \
    --stylesheet-path=~/code/docutils/tools/style.tex \
    <source> <destination>

And to generate PDF from LaTeX:

$ pdflatex

As with the HTML stylesheet, you need the Docutils source package to get style.tex, because it isn't installed when you install Docutils.

Docbook XML

Docutils has an experimental yet supported writer for generating Docbook XML, available in the Docutils Sandbox.

To install Docbook support, you must first check out the code from Docutils' CVS repository:

$ cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/docutils login
$ cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/docutils \
co -d rst2docbook sandbox/oliverr/docbook

Then install it:

$ cd rst2docbook
$ python setup.py install

Usage is similar to other front-end scripts:

$ rst2docbook.py <source> <destination>

To specify what type of docbook document (article, book, or chapter) to generate, use the --doctype option, as in:

$ rst2docbook.py --doctype=article <source> <destination>

The Docbook writer is not yet included as part of Docutils, because its support for bibliographic elements is a bit flaky. (And since it's not part of Docutils, it tends to lag behind in supporting newly added features.)

Editors

For a programmer, one great advantage of plain text documentation is not having to use a word processor like Microsoft Word or OpenOffice.org Writer -- you can just use whichever editor you're already familiar with and attached to.

Here's a brief overview of editors with support for reST:

* Emacs: Some Emacs support is available with Docutils, in "tools/editors/emacs". You can also see what's available at http://cvs.sourceforge.net/viewcvs.py/docutils/docutils/tools/editors/emacs/.
* jEdit: As of 4.2, jEdit comes with syntax highlighting for reST.
* ReSTedit: A minimal editor designed for reST editing. ReSTedit does on-the-fly HTML rendering, which makes it a handy tool for learning and experimenting with reST. (Mac OS X only.)
* Vim: As of 6.3, Vim comes with syntax highlighting for reST.

Summary

Docutils and reST provide an easy-to-use and capable set of markup and tools for generating documentation.

In this article, you've seen markup for commonly used reST constructs, learned how to convert reST to other formats, and explored how to customize the style of your generated HTML.

In the second part of this article, we'll dig deeper into Docutils. We'll see how to use Docutils to programmatically convert reST to HTML in your applications, and how to extend Docutils by implementing a custom text role and a custom directive.

References

For more information on reST and Docutils, see:

* Docutils Project Documentation Overview (http://docutils.sourceforge.net/docs/index.html)
* reStructuredText Primer (http://docutils.sourceforge.net/docs/user/rst/quickstart.html)
* Quick reStructuredText (http://docutils.sourceforge.net/docs/user/rst/quickref.html)
* reST Cheatsheet (http://docutils.sourceforge.net/docs/user/rst/cheatsheet.txt)
* reStructuredText Markup Specification (http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html)
