Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, and DocBook XML; and it can write plain text, markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, Word docx, GNU Texinfo, MediaWiki markup, EPUB (v2 or v3), FictionBook2, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and Slidy, Slideous, DZSlides, or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX is installed.
Pandoc’s enhanced version of markdown includes syntax for footnotes,
tables, flexible ordered lists, definition lists, fenced code
blocks, superscript, subscript, strikeout, title blocks, automatic
tables of contents, embedded LaTeX math, citations, and markdown
inside HTML block elements. (These enhancements, described below
under Pandoc’s markdown, can
be disabled using the markdown_strict input or
output format.)
In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.
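The reader/writer split described above can be illustrated with a toy model. This is a minimal sketch, not Pandoc's actual code (Pandoc is written in Haskell and its native representation is far richer); the names Header, Para, read_markdown, write_html, write_latex, READERS, WRITERS, and convert are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical, heavily simplified "native representation":
# a document is a flat list of blocks.
@dataclass
class Header:
    level: int
    text: str


@dataclass
class Para:
    text: str


Block = Union[Header, Para]
Document = List[Block]


def read_markdown(source: str) -> Document:
    """Toy reader: understands only '#' headers and plain paragraphs."""
    doc: Document = []
    for chunk in source.strip().split("\n\n"):
        chunk = " ".join(chunk.split())  # collapse internal whitespace
        if not chunk:
            continue
        if chunk.startswith("#"):
            level = len(chunk) - len(chunk.lstrip("#"))
            doc.append(Header(level, chunk[level:].strip()))
        else:
            doc.append(Para(chunk))
    return doc


def write_html(doc: Document) -> str:
    """Toy writer: no escaping of special characters is attempted."""
    out = []
    for block in doc:
        if isinstance(block, Header):
            out.append(f"<h{block.level}>{block.text}</h{block.level}>")
        else:
            out.append(f"<p>{block.text}</p>")
    return "\n".join(out)


def write_latex(doc: Document) -> str:
    """Toy writer: maps header levels to sectioning commands."""
    commands = {1: "section", 2: "subsection"}
    out = []
    for block in doc:
        if isinstance(block, Header):
            name = commands.get(block.level, "subsubsection")
            out.append("\\" + name + "{" + block.text + "}")
        else:
            out.append(block.text)
    return "\n\n".join(out)


# Supporting a new format means adding one entry to one of these tables.
READERS: Dict[str, Callable[[str], Document]] = {"markdown": read_markdown}
WRITERS: Dict[str, Callable[[Document], str]] = {
    "html": write_html,
    "latex": write_latex,
}


def convert(source: str, from_format: str, to_format: str) -> str:
    """Parse with the chosen reader, then render with the chosen writer."""
    return WRITERS[to_format](READERS[from_format](source))
```

In this sketch, every reader works with every writer because both sides depend only on the shared Document type, which is the property the paragraph above attributes to the modular design.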
If no input-file is specified, input is read
from stdin. Otherwise, the
input-files are concatenated (with a blank
line between each) and used as input. Output goes to
stdout by default (though output to
stdout is disabled for the
odt, docx,
epub, and epub3 output
formats). For output to a file, use the -o
option:
pandoc -o output.html input.txt
Instead of a file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP:
pandoc -f html -t markdown http://www.fsf.org
If multiple input files are given, pandoc will
concatenate them all (with blank lines between them) before
parsing.
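The effect of joining inputs with a blank line can be sketched as follows. This is an illustration under stated assumptions, not Pandoc's implementation: the function name concatenate_inputs is hypothetical, and the sketch assumes that a missing trailing newline is added before joining.

```python
from typing import List


def concatenate_inputs(texts: List[str]) -> str:
    """Join input texts so that a blank line separates consecutive inputs.

    Assumption for this sketch: each text is first made to end with a
    newline, so the separator always yields exactly one blank line when
    the text had no trailing blank lines of its own.
    """
    normalized = [t if t.endswith("\n") else t + "\n" for t in texts]
    return "\n".join(normalized)
```

The blank line matters because it keeps the last paragraph of one file from running into the first paragraph of the next when the combined text is parsed as a single document.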
The format of the input and output can be specified explicitly
using command-line options. The input format can be specified
using the -r/--read or
-f/--from options, the output format using the
-w/--write or -t/--to
options. Thus, to convert hello.txt from
markdown to LaTeX, you could type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from html to markdown:
pandoc -f html -t markdown hello.html
Supported output formats are listed below under the
-t/--to option. Supported input formats are
listed below under the -f/--from option. Note
that the rst, textile,
latex, and html readers are
not complete; there are some constructs that they do not parse.
If the input or output format is not specified explicitly,
pandoc will attempt to guess it from the
extensions of the input and output filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX.
If no output file is specified (so that output goes to
stdout), or if the output file’s extension is
unknown, the output format will default to HTML. If no input file
is specified (so that input comes from
stdin), or if the input files’ extensions are
unknown, the input format will be assumed to be markdown unless
explicitly specified.
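The fallback rules above can be summarized in a short sketch. It is not Pandoc's actual lookup: the extension table below is a small illustrative subset chosen for this example, the names EXTENSION_FORMATS, guess_format, and resolve_formats are hypothetical, and using the first recognized extension among several input files is an assumption of the sketch.

```python
import os
from typing import List, Optional, Tuple

# Illustrative subset only; the real table of known extensions is larger.
EXTENSION_FORMATS = {
    ".txt": "markdown",
    ".md": "markdown",
    ".html": "html",
    ".tex": "latex",
    ".rst": "rst",
}


def guess_format(filenames: List[str], default: str) -> str:
    """Return the format for the first recognized extension, else default."""
    for name in filenames:
        extension = os.path.splitext(name)[1].lower()
        if extension in EXTENSION_FORMATS:
            return EXTENSION_FORMATS[extension]
    return default


def resolve_formats(
    inputs: List[str],
    output: Optional[str] = None,
    from_format: Optional[str] = None,
    to_format: Optional[str] = None,
) -> Tuple[str, str]:
    """Explicit formats win; otherwise guess; otherwise markdown in, HTML out."""
    reader = from_format or guess_format(inputs, "markdown")
    writer = to_format or guess_format([output] if output else [], "html")
    return reader, writer
```

For example, resolve_formats(["hello.txt"], "hello.tex") yields ("markdown", "latex"), matching the hello.tex example above, while a call with no input or output file yields ("markdown", "html").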
Pandoc uses the UTF-8 character encoding for both input and
output. If your local character encoding is not UTF-8, you should
pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
Earlier versions of pandoc came with a program,
markdown2pdf, that used pandoc and pdflatex to
produce a PDF. This is no longer needed, since
pandoc can now produce PDF
output itself. To produce a PDF, simply specify an output file
with a .pdf extension. Pandoc will create a
LaTeX file and use pdflatex (or another engine; see
--latex-engine) to convert it to PDF:
pandoc test.txt -o test.pdf
Production of a PDF requires that a LaTeX engine be installed (see
--latex-engine, below), and assumes that the
following LaTeX packages are available:
amssymb, amsmath,
ifxetex, ifluatex,
listings (if the --listings
option is used), fancyvrb,
longtable, url,
graphicx, hyperref,
ulem, babel (if the
lang variable is set),
fontspec (if xelatex or
lualatex is used as the LaTeX engine),
xltxtra and xunicode (if
xelatex is used).
A user who wants a drop-in replacement for
Markdown.pl may create a symbolic link to the
pandoc executable called
hsmarkdown. When invoked under the name
hsmarkdown, pandoc will
behave as if invoked with
-f markdown_strict --email-obfuscation=references,
and all command-line options will be treated as regular arguments.
However, this approach does not work under Cygwin, due to problems
with its simulation of symbolic links.
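Dispatching on the name a program was invoked under is a common technique, and the behavior described above can be sketched as follows. This is a hedged illustration rather than Pandoc's code: the function name interpret_invocation and the shape of its return value are hypothetical, and only the implied options are taken from the text above.

```python
import os
from typing import List, Tuple

# Options the text says are implied when running under the name hsmarkdown.
HSMARKDOWN_OPTIONS = ["-f", "markdown_strict", "--email-obfuscation=references"]


def interpret_invocation(argv: List[str]) -> Tuple[List[str], List[str], bool]:
    """Return (implied_options, arguments, parse_options).

    When the program name is hsmarkdown, the remaining arguments are
    passed through untouched and parse_options is False, meaning they
    are to be treated as regular arguments rather than as options.
    """
    program = os.path.basename(argv[0])
    if program == "hsmarkdown":
        return HSMARKDOWN_OPTIONS, list(argv[1:]), False
    return [], list(argv[1:]), True
```

Because the check looks only at the invoked name, a symbolic link named hsmarkdown that points at the same executable is enough to select the alternate behavior on systems where such links work.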