Pandoc is a Haskell
library for converting from one markup format to another, and a
command-line tool that uses this library. It can read
markdown
and (subsets of)
reStructuredText,
HTML, and
LaTeX; and it
can write plain text,
markdown,
reStructuredText,
HTML,
LaTeX,
ConTeXt,
RTF,
DocBook XML,
OpenDocument XML,
ODT,
GNU Texinfo,
MediaWiki markup,
groff man
pages, and
S5 HTML
slide shows. Pandoc's enhanced version of markdown includes syntax
for footnotes, tables, flexible ordered lists, definition lists,
delimited code blocks, superscript, subscript, strikeout, title
blocks, automatic tables of contents, embedded LaTeX math, and
markdown inside HTML block elements. (These enhancements can be
disabled if a drop-in replacement for
Markdown.pl is desired.)
In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.
© 2006–9 John MacFarlane (jgm at berkeley dot edu). Released under the GPL, version 2 or greater. This software carries no warranty of any kind. (See COPYRIGHT for full copyright and warranty notices.) Contributors: Recai Oktaş (build system, Debian package, wrapper scripts), Peter Wang (Texinfo writer), Andrea Rossato (OpenDocument writer).
If you run pandoc without arguments, it will
accept input from stdin. If you run it with file names as
arguments, it will take input from those files. By default,
pandoc writes its output to
stdout.[1]
If you want to write to a file, use the -o
option:
pandoc -o hello.html hello.txt
You can specify multiple input files on the command line;
pandoc will concatenate them all (with blank
lines between them) before parsing:
pandoc -s ch1.txt ch2.txt refs.txt > book.html
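To see roughly what the parser receives in the example above, here is a shell sketch that mimics the concatenation step. Pandoc does this internally. The function name `concat_inputs` is hypothetical, and the sketch only approximates the "blank lines between them" behavior described above.

```shell
#!/bin/sh
# Hypothetical helper: print the given files one after another,
# emitting a newline between consecutive files. When each file ends
# with a newline, this leaves one blank line between them, roughly
# what pandoc parses when given several input files.
concat_inputs() {
  first=1
  for f in "$@"; do
    # Separate this file from the previous one.
    if [ "$first" -eq 0 ]; then
      echo
    fi
    cat "$f"
    first=0
  done
}

# Usage: inspect the combined input for the book example.
#   concat_inputs ch1.txt ch2.txt refs.txt | less
```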
(The -s option here tells
pandoc to produce a standalone HTML file, with a
proper header, rather than a fragment. For more details on this and
many other command-line options, see below.)
Instead of a filename, you can specify an absolute URI. In this case pandoc will attempt to download the content via HTTP:
pandoc -f html -t markdown http://www.fsf.org
The format of the input and output can be specified explicitly
using command-line options. The input format can be specified using
the -r/--read or -f/--from
options, the output format using the -w/--write
or -t/--to options. Thus, to convert
hello.txt from markdown to LaTeX, you could
type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from HTML to markdown:
pandoc -f html -t markdown hello.html
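The explicit options also combine well with a shell loop. This sketch converts every `.txt` file in a directory from markdown to LaTeX and writes `NAME.tex` next to each `NAME.txt`. The function name `txt_to_latex` and the `.txt` naming convention are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert each markdown *.txt file in a directory
# to LaTeX, writing NAME.tex alongside NAME.txt.
txt_to_latex() {
  dir=${1:-.}
  for src in "$dir"/*.txt; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$src" ] || continue
    pandoc -f markdown -t latex -o "${src%.txt}.tex" "$src"
  done
}

# Usage:
#   txt_to_latex          # current directory
#   txt_to_latex notes    # ./notes/*.txt
```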
Supported output formats include markdown,
latex, context (ConTeXt),
html, rtf (rich text format),
rst (reStructuredText),
docbook (DocBook XML),
opendocument (OpenDocument XML),
odt (OpenOffice text document),
texinfo (GNU Texinfo),
mediawiki (MediaWiki markup),
man (groff man), and s5
(which produces an HTML slide show that behaves like a PowerPoint presentation).
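Because `-t` selects the writer, one source can be rendered into several of these formats in a loop. The sketch below always passes `-o`, since odt is a binary format that must be written to a file (see note [1]). The helper name `render_all` and the `BASE.FORMAT` output naming are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: render one markdown source into several output
# formats, writing BASE.FORMAT for each (e.g. doc.latex, doc.odt).
# -o is always given because odt cannot be written to stdout.
render_all() {
  src=$1
  shift
  name=$(basename "$src")
  # Strip only the final extension of the file name, not of the directory.
  base=$(dirname "$src")/${name%.*}
  for fmt in "$@"; do
    pandoc -f markdown -t "$fmt" -o "$base.$fmt" "$src" || return 1
  done
}

# Usage:
#   render_all report.txt html latex rtf odt
```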
Supported input formats include markdown,
html, latex, and
rst. Note that the rst reader
only parses a subset of reStructuredText syntax. For example, it
doesn't handle tables, option lists, or footnotes. But for simple
documents it should be adequate. The latex and
html readers are also limited in what they can
do.
If you don't specify a reader or writer explicitly,
pandoc will try to determine the input and
output format from the extensions of the input and output
filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX.
If no output file is specified (so that output goes to stdout), or
if the output file's extension is unknown, the output format will
default to HTML. If no input file is specified (so that input comes
from stdin), or if the input files' extensions are unknown, the
input format will be assumed to be markdown unless explicitly
specified.
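The extension rule can be pictured as a lookup with a default. The following sketch mirrors the fallbacks described above: HTML for output, markdown for input. The function names are hypothetical, and the extension tables are illustrative assumptions rather than pandoc's actual lists (only `.tex` → LaTeX is stated above).

```shell
#!/bin/sh
# Hypothetical sketch of writer selection by output-file extension.
# The table is illustrative, not pandoc's real mapping.
guess_writer() {
  case "$1" in
    *.tex)  echo latex ;;
    *.rtf)  echo rtf ;;
    *.rst)  echo rst ;;
    *.odt)  echo odt ;;
    *.texi) echo texinfo ;;
    *)      echo html ;;      # no output file (stdout) or unknown extension
  esac
}

# Hypothetical sketch of reader selection by input-file extension.
guess_reader() {
  case "$1" in
    *.html|*.htm) echo html ;;
    *.tex)        echo latex ;;
    *.rst)        echo rst ;;
    *)            echo markdown ;;  # stdin or unknown extension
  esac
}

# Usage:
#   guess_writer hello.tex    # latex
#   guess_writer ""           # html (output to stdout)
#   guess_reader hello.txt    # markdown
```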
All input is assumed to be in the UTF-8 encoding, and all output is
in UTF-8 (unless your version of pandoc was compiled using GHC 6.12
or higher, in which case the local encoding will be used). If your
local character encoding is not UTF-8 and you use accented or
foreign characters, you should pipe the input and output through
iconv.
For example,
iconv -t utf-8 source.txt | pandoc | iconv -f utf-8 > output.html
will convert source.txt from the local encoding
to UTF-8, then convert it to HTML, then convert back to the local
encoding, putting the output in output.html.
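If you are unsure whether a file is already UTF-8, iconv can check: a UTF-8 to UTF-8 conversion fails on invalid input. The sketch below uses this to transcode only when needed. The helper names are hypothetical, and it assumes non-UTF-8 files are ISO-8859-1; set `FROM_ENC` to match your local encoding.

```shell
#!/bin/sh
# Assumption: files that are not valid UTF-8 are in FROM_ENC.
FROM_ENC=${FROM_ENC:-ISO-8859-1}

# Hypothetical helper: succeed (exit 0) only if the file is valid UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the file as UTF-8, transcoding from
# FROM_ENC only when it is not already valid UTF-8.
as_utf8() {
  if is_utf8 "$1"; then
    cat "$1"
  else
    iconv -f "$FROM_ENC" -t UTF-8 "$1"
  fi
}

# Usage (requires pandoc):
#   as_utf8 source.txt | pandoc | iconv -f UTF-8 -t "$FROM_ENC" > output.html
```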
[1]
The exception is odt. Since this is a binary
output format, an output file must be specified explicitly.