Pandoc User's Guide

John MacFarlane


Table of Contents

Using Pandoc
Character encodings
Wrappers
markdown2pdf
hsmarkdown
Command-line options
Templates
Pandoc's markdown vs. standard markdown
Backslash escapes
Subscripts and superscripts
Strikeout
Nested Lists
Ordered Lists
Definition lists
Reference links
Footnotes
Tables
Delimited Code blocks
Images with captions
Title blocks
Markdown in HTML blocks
Header identifiers in HTML
Blank lines before headers and blockquotes
Math
Inline TeX
Producing S5 with Pandoc
Literate Haskell support

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) reStructuredText, HTML, and LaTeX; and it can write plain text, markdown, reStructuredText, HTML, LaTeX, ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, groff man pages, and S5 HTML slide shows. Pandoc's enhanced version of markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, delimited code blocks, superscript, subscript, strikeout, title blocks, automatic tables of contents, embedded LaTeX math, and markdown inside HTML block elements. (These enhancements can be disabled if a drop-in replacement for Markdown.pl is desired.)

In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.

© 2006–9 John MacFarlane (jgm at berkeley dot edu). Released under the GPL, version 2 or greater. This software carries no warranty of any kind. (See COPYRIGHT for full copyright and warranty notices.) Contributors: Recai Oktaş (build system, debian package, wrapper scripts), Peter Wang (Texinfo writer), Andrea Rossato (OpenDocument writer).

Using Pandoc

If you run pandoc without arguments, it will accept input from stdin. If you run it with file names as arguments, it will take input from those files. By default, pandoc writes its output to stdout.[1] If you want to write to a file, use the -o option:

pandoc -o hello.html hello.txt

Note that you can specify multiple input files on the command line. pandoc will concatenate them all (with blank lines between them) before parsing:

pandoc -s ch1.txt ch2.txt refs.txt > book.html

(The -s option here tells pandoc to produce a standalone HTML file, with a proper header, rather than a fragment. For more details on this and many other command-line options, see below.)
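The blank-line separator matters when an input file does not end with a newline. The following is a minimal sketch of joining files by hand in the same spirit, for example before piping them to pandoc on stdin. The file names and contents are hypothetical, and the loop only imitates the behaviour described above; it is not pandoc's own code.

```shell
# Hypothetical sample inputs; the first deliberately lacks a trailing newline.
printf 'Chapter one text.' > ch1.txt
printf 'Chapter two text.\n' > ch2.txt

# Join the files, writing a newline plus an empty line after each one,
# so the end of one file never runs into the start of the next.
for f in ch1.txt ch2.txt; do
    cat "$f"
    printf '\n\n'
done > combined.txt

# Show the result: the two paragraphs stay separated by an empty line.
cat combined.txt
```

A plain `cat ch1.txt ch2.txt` would instead fuse the two sentences onto one line. Running `pandoc -s combined.txt > book.html` on the joined file should then give much the same result as listing the files separately, though that equivalence is an assumption rather than something the guide states.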

Instead of a filename, you can specify an absolute URI. In this case pandoc will attempt to download the content via HTTP:

pandoc -f html -t markdown http://www.fsf.org

The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -r/--read or -f/--from options, the output format using the -w/--write or -t/--to options. Thus, to convert hello.txt from markdown to LaTeX, you could type:

pandoc -f markdown -t latex hello.txt

To convert hello.html from HTML to markdown:

pandoc -f html -t markdown hello.html

Supported output formats include markdown, latex, context (ConTeXt), html, rtf (rich text format), rst (reStructuredText), docbook (DocBook XML), opendocument (OpenDocument XML), odt (OpenOffice text document), texinfo (GNU Texinfo), mediawiki (MediaWiki markup), man (groff man), and s5 (which produces an HTML file that acts like a PowerPoint slide show).

Supported input formats include markdown, html, latex, and rst. Note that the rst reader only parses a subset of reStructuredText syntax. For example, it doesn't handle tables, option lists, or footnotes. But for simple documents it should be adequate. The latex and html readers are also limited in what they can do.

If you don't specify a reader or writer explicitly, pandoc will try to determine the input and output format from the extensions of the input and output filenames. Thus, for example,

pandoc -o hello.tex hello.txt

will convert hello.txt from markdown to LaTeX. If no output file is specified (so that output goes to stdout), or if the output file's extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from stdin), or if the input files' extensions are unknown, the input format will be assumed to be markdown unless explicitly specified.
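The fallback rule for the output format can be modelled as a small shell function. This is only a sketch: guess_output_format is a hypothetical helper, not part of pandoc, and apart from .tex (taken from the example above) the extension mappings are assumptions made for illustration.

```shell
# Hypothetical helper that models the output-format fallback rule.
# It is not pandoc's implementation; only the .tex mapping and the
# HTML default are taken from the text above.
guess_output_format() {
    # No output file given: output goes to stdout and defaults to HTML.
    if [ -z "$1" ]; then
        echo html
        return
    fi
    case "$1" in
        *.tex) echo latex ;;   # from the hello.tex example above
        *.rst) echo rst ;;     # assumed mapping
        *.rtf) echo rtf ;;     # assumed mapping
        *.odt) echo odt ;;     # assumed mapping
        *)     echo html ;;    # unknown extension: default to HTML
    esac
}

guess_output_format hello.tex
guess_output_format notes.xyz
guess_output_format ""
```

When the guess would be wrong, naming the format explicitly with -t (and -f for the input) avoids relying on the extension at all.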

Character encodings

All input is assumed to be in the UTF-8 encoding, and all output is in UTF-8 (unless your version of pandoc was compiled using GHC 6.12 or higher, in which case the local encoding will be used). If your local character encoding is not UTF-8 and your text contains non-ASCII characters (accented letters, for example), you should pipe the input and output through iconv. For example,

iconv -t utf-8 source.txt | pandoc | iconv -f utf-8 > output.html

will convert source.txt from the local encoding to UTF-8, then convert it to HTML, then convert back to the local encoding, putting the output in output.html.
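If a file's encoding is known and differs from the local one, iconv's -f option can name it explicitly. The sketch below shows only the iconv step, so it runs without pandoc. The file names are hypothetical, and ISO-8859-1 is chosen purely as an example encoding.

```shell
# Create a small ISO-8859-1 file; \351 is the Latin-1 byte for "é".
printf 'caf\351\n' > latin1.txt

# Name the source encoding explicitly instead of relying on the locale.
iconv -f ISO-8859-1 -t UTF-8 latin1.txt > utf8.txt

# The UTF-8 copy is one byte longer (6 bytes rather than 5),
# because "é" is encoded as two bytes in UTF-8.
wc -c < latin1.txt
wc -c < utf8.txt
```

The converted file can then be passed to pandoc as usual, for instance with pandoc utf8.txt.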



[1] The exception is for odt. Since this is a binary output format, an output file must be specified explicitly.