Pandoc for Haskell Hackers

John MacFarlane

BayHac 2014

Why am I here?

I created a virus…

…that spreads GHC




          Debian popcon   Hackage
pandoc             1997     26220
darcs              1908      3420
xmonad             1733      6432

How did it happen?

[Images: TRS-80 and KIM-1 computers]

First release

Libraries begat libraries

  • highlighting-kate
  • zip-archive
  • texmath
  • pandoc-citeproc (citeproc-hs)
  • gitit

The command-line tool

A quick demonstration.

git clone https://github.com/jgm/BayHac2014
cd BayHac2014/demo
vim script.txt

By default pandoc works as a pipe, reading from stdin and writing to stdout. Try it:

pandoc

Hit Ctrl-D (Ctrl-Z on Windows) when you’re finished typing text.

You can use options. This one triggers “smart typography” (quotes, dashes, ellipses).

pandoc --smart

Let’s convert to LaTeX instead of HTML.

pandoc --to latex

Or to MediaWiki:

pandoc --to mediawiki

Let’s convert a LaTeX file to markdown:

pandoc -f latex -t markdown example.tex

For help and information on which options pandoc supports:

pandoc --help

More detail can be found in the pandoc README. Or:

man pandoc
man pandoc_markdown

The --standalone or -s option creates a standalone document with header, footer, and metadata:

pandoc --standalone --smart -o r.html -t html5 README

Let’s add a table of contents and use some custom CSS:

pandoc --standalone --smart --toc -o r.html -t html5 \
  --css my.css README

Standalone documents are constructed from templates. To see the default template for a format, use -D:

pandoc -D html5 > my.html5

The template language is documented in the Templates section of the pandoc README.

sample1.txt contains some nice structured metadata. This is YAML but with strings interpreted as markdown.

---
title: My demonstration
author:
 - Kurt Gödel
 - Haskell Curry
version:
 - number: 1.0
   date: July 13, 1945
   log:  Initial commit
 - number: 1.1
   date: August 14, 1946
   log:  Added some math
---

The metadata has a nice version history. Let’s edit my.html5 to include this before the </header> tag:

<ul class="versions">
$for(version)$
<li>Version $version.number$ ($version.date$): $version.log$</li>
$endfor$
</ul>
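
To make the loop concrete, here is a rough Haskell model of what the $for(version)$ block expands to for the metadata above. It is only a sketch: the Version record, its field names, and renderVersions are hypothetical and not part of pandoc, all fields are treated as plain strings, and pandoc's real template engine may differ in whitespace and in how it renders values.

```haskell
-- Toy model of the $for(version)$ ... $endfor$ expansion.
-- Version and renderVersions are illustrative names, not pandoc API.
data Version = Version
  { vNumber :: String
  , vDate   :: String
  , vLog    :: String
  }

-- The version history from sample1.txt, written out by hand.
sampleVersions :: [Version]
sampleVersions =
  [ Version "1.0" "July 13, 1945" "Initial commit"
  , Version "1.1" "August 14, 1946" "Added some math"
  ]

-- One <li> per list entry, wrapped in the surrounding <ul>.
renderVersions :: [Version] -> String
renderVersions vs =
  unlines (["<ul class=\"versions\">"] ++ map item vs ++ ["</ul>"])
  where
    item v =
      "<li>Version " ++ vNumber v ++ " (" ++ vDate v ++ "): "
        ++ vLog v ++ "</li>"
```

In ghci, putStr (renderVersions sampleVersions) prints the list with one item per version.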

Let’s try our custom template:

pandoc -s -S --template my.html5 -t html5 sample1.txt \
  -o sample1.html --mathjax

We can create a PDF. Pandoc shells out to pdflatex for this.

pandoc sample1.txt -o sample1.pdf

If you want to use xelatex instead, use --latex-engine=xelatex.

We can create a Word document without opening Word:

pandoc sample1.txt -o sample1.docx

Note that the TeX math in the markdown file gets converted to native Word equations.

Or an EPUB:

pandoc sample1.txt -t epub3 -o sample1.epub

Pandoc can process citations using BibTeX bibliographies (or several other formats). Take a look at sample2.txt and sample2.bib.

We tell it to use pandoc-citeproc as a filter:

pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.docx

Try changing the bibliography style. Edit sample2.txt to uncomment this line:

csl: chicago-fullnote-bibliography.csl

Then:

pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.docx

Citations work in all formats supported by pandoc:

pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.org
emacs sample2.org

Source code highlighting is automatic for marked code blocks. It works in HTML, PDF, and docx:

pandoc -s sample3.txt -o sample3.html
pandoc -s sample3.txt -o sample3.pdf
pandoc -s sample3.txt -o sample3.docx

You can change the highlighting style:

pandoc -s sample3.txt -o sample3.html --highlight-style=monochrome

Pandoc has native support for literate Haskell.

paste.lhs is a literate Haskell file with markdown text:

ghci paste.lhs
pandoc paste.lhs -f markdown+lhs -t html -s -o paste.html
pandoc paste.lhs -f markdown+lhs -t latex+lhs -s -o paste.lhs.tex

Pandoc can also convert to and from haddock, though this needs updating in light of recent changes in haddock’s markup.

pandoc -f markdown -t haddock
pandoc -f haddock -t markdown

Pandoc also supports beamer and several HTML slide show formats. This slide show was written with pandoc:

pandoc slides.txt -o slides.html -t revealjs --css slides.css \
  -S --highlight-style=espresso

A tour of pandoc’s API

Readers and writers

Text.Pandoc

Prelude> :m + Text.Pandoc
Text.Pandoc> let doc = readMarkdown def "*hi*"
Text.Pandoc> doc
Pandoc (Meta {unMeta = fromList []}) [Para [Emph [Str "hi"]]]
Text.Pandoc> writeLaTeX def doc
"\\emph{hi}"
Text.Pandoc> readMarkdown def{readerSmart = True} "dog's"
Pandoc (Meta {unMeta = fromList []}) [Para [Str "dog\8217s"]]

The Pandoc types

Text.Pandoc.Definition

You can use pandoc -t native and pandoc -f native to explore:

% echo "[*link*](/foo)" | pandoc -t native
[Para [Link [Emph [Str "link"]] ("/foo","")]]

Builder

Text.Pandoc.Builder

Repeatedly concatenating lists is slow, since each ++ takes time proportional to the length of its left argument. So we use special types Inlines and Blocks that wrap Sequences of Inline and Block elements.
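
The idea can be sketched with a toy model that needs only the containers package. Everything below is a stand-in: this Inline has three constructors rather than pandoc's full set, and Inlines, str, space and emph only imitate the shape of their namesakes in Text.Pandoc.Builder, whose real definitions may differ.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Toy stand-in for pandoc's Inline type (not the real definition).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- Toy stand-in for the Builder's Inlines: a Seq instead of a list,
-- so appending does not have to copy the whole left operand.
newtype Inlines = Inlines (Seq Inline)
  deriving (Eq, Show)

instance Semigroup Inlines where
  Inlines a <> Inlines b = Inlines (a <> b)

instance Monoid Inlines where
  mempty = Inlines Seq.empty

-- Small builder functions, each producing a one-element sequence.
str :: String -> Inlines
str = Inlines . Seq.singleton . Str

space :: Inlines
space = Inlines (Seq.singleton Space)

emph :: Inlines -> Inlines
emph = Inlines . Seq.singleton . Emph . toInlineList

-- Convert back to an ordinary list once the document is assembled.
toInlineList :: Inlines -> [Inline]
toInlineList (Inlines s) = toList s
```

In ghci, toInlineList (str "hello" <> space <> emph (str "world")) gives back an ordinary list of Inline values; documents are built with <> and converted to lists only at the end.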

A simple example

Here’s a JSON data source about CNG fueling stations in the Chicago area: cng_fuel_chicago.json. Boss says: write me a letter in Word listing all the stations that take the Voyager card.

No need to open Word for this job! See fuel.hs.

Transforming a Pandoc document

Text.Pandoc.Generic

Text.Pandoc.Walk

Example: walk

module AllCaps (allCaps) where
import Text.Pandoc.Definition
import Data.Char (toUpper)

allCaps :: Inline -> Inline
allCaps (Str xs) = Str $ map toUpper xs
allCaps x = x

% ghci AllCaps.hs
*AllCaps> Text.Pandoc.Walk.walk allCaps $ Para [Emph [Str "hi"]]
Para [Emph [Str "HI"]]
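
To see what a traversal like walk has to do, here is a hand-written version over a cut-down document type. This is a sketch only: the Inline and Block types below are toy stand-ins with a few constructors, walkInline and walkBlock are hypothetical names, and the children-first order is an assumption about how such a traversal is usually written, not a statement about pandoc's implementation.

```haskell
import Data.Char (toUpper)

-- Toy stand-ins for pandoc's types (not the real definitions).
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Eq, Show)

data Block = Para [Inline] | BlockQuote [Block]
  deriving (Eq, Show)

-- Rewrite the children first, then apply f to the node itself.
walkInline :: (Inline -> Inline) -> Inline -> Inline
walkInline f (Emph xs)   = f (Emph (map (walkInline f) xs))
walkInline f (Strong xs) = f (Strong (map (walkInline f) xs))
walkInline f x           = f x

-- Blocks are not Inlines, so we only descend through them.
walkBlock :: (Inline -> Inline) -> Block -> Block
walkBlock f (Para xs)       = Para (map (walkInline f) xs)
walkBlock f (BlockQuote bs) = BlockQuote (map (walkBlock f) bs)

-- Same transformation as in AllCaps.hs, on the toy type.
allCaps :: Inline -> Inline
allCaps (Str xs) = Str (map toUpper xs)
allCaps x        = x
```

Here walkBlock allCaps (Para [Emph [Str "hi"]]) evaluates to Para [Emph [Str "HI"]], matching the ghci session above. The point of a generic walk is that nobody has to write this boilerplate per type.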

Filters

Suppose we have a program that defines a transformation

f :: Pandoc -> Pandoc

Since Pandoc has Read and Show instances, we can write a pipe:

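-- f.hs
import Text.Pandoc.Definition (Pandoc)  -- Read/Show instances; define f :: Pandoc -> Pandoc here
main :: IO ()
main = interact (show . f . read)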

And use it thus:

pandoc -t native -s | runghc f.hs | pandoc -f native -s -t latex
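
The same read-transform-show pattern can be tried without pandoc installed, using a toy type that derives Read and Show. The Doc and Inline types and the transform function below are illustrative assumptions, standing in for Pandoc and for the pipe stage.

```haskell
import Data.Char (toUpper)

-- Toy stand-ins deriving Read and Show, like pandoc's own types do.
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Read, Show)

newtype Doc = Doc [Inline]
  deriving (Eq, Read, Show)

-- The transformation f :: Doc -> Doc; here, upper-case every string.
f :: Doc -> Doc
f (Doc xs) = Doc (map upper xs)
  where
    upper (Str s)   = Str (map toUpper s)
    upper (Emph ys) = Emph (map upper ys)
    upper x         = x

-- Parse the textual form, transform, print it back.
transform :: String -> String
transform = show . f . read
```

Wrapping it as main = interact transform turns the file into a pipe stage of the same shape as f.hs.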

JSON filters

Read and Show are really slow. Better to use JSON serialization:

pandoc -t json -s | runghc fjson.hs | pandoc -f json -s -t latex

To simplify this pattern, we added --filter:

pandoc -s -t latex --filter fjson.hs

toJSONFilter

Text.Pandoc.JSON

toJSONFilter takes any function a -> a or a -> [a] or a -> IO a, where a is a Pandoc type, and turns it into a JSON filter.

import Text.Pandoc.JSON
import AllCaps (allCaps)

main :: IO ()
main = toJSONFilter allCaps

Example: emphToCaps.hs

-- pandoc --filter ./emphToCaps.hs
import Text.Pandoc.JSON
import Text.Pandoc.Walk
import AllCaps (allCaps)

emphToCaps :: Inline -> [Inline]
emphToCaps (Emph xs) = walk allCaps xs
emphToCaps x = [x]

main :: IO ()
main = toJSONFilter emphToCaps
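
A function of type a -> [a] can delete an element (return []) or replace it with several, so the results have to be spliced back into the surrounding list. The sketch below shows that splicing on a toy Inline type. The names spliceInlines and upper are hypothetical, and this models the idea only, not how toJSONFilter is implemented.

```haskell
import Data.Char (toUpper)

-- Toy stand-in for pandoc's Inline type (not the real definition).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- Apply a list-returning function everywhere, children first,
-- and flatten the results into the enclosing list.
spliceInlines :: (Inline -> [Inline]) -> [Inline] -> [Inline]
spliceInlines f = concatMap go
  where
    go (Emph xs) = f (Emph (spliceInlines f xs))
    go x         = f x

upper :: Inline -> Inline
upper (Str s) = Str (map toUpper s)
upper x       = x

-- Replace an Emph node by its upper-cased contents.
emphToCaps :: Inline -> [Inline]
emphToCaps (Emph xs) = map upper xs
emphToCaps x         = [x]
```

Applied to [Str "a", Space, Emph [Str "b", Space, Str "c"]], spliceInlines emphToCaps drops the Emph wrapper and leaves its upper-cased contents in place in the outer list.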

Output format conditionalization

pandoc --filter passes the name of the output format as the first argument to the filter. So the filter’s behavior can depend on the output format.

toJSONFilter makes this easy: just use a function whose first argument is Maybe Format.

Example: emphToCaps2.hs

Emph as Small Caps in LaTeX and HTML, ALL CAPS otherwise:

-- pandoc --filter ./emphToCaps2.hs
import Text.Pandoc.JSON
import Text.Pandoc.Walk
import AllCaps (allCaps)

emphToCaps :: Maybe Format -> Inline -> [Inline]
emphToCaps (Just f) (Emph xs)
  | f == Format "html" || f == Format "latex" = [SmallCaps xs]
emphToCaps _ (Emph xs) = walk allCaps xs
emphToCaps _ x = [x]

main :: IO ()
main = toJSONFilter emphToCaps
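
For intuition about where the Maybe Format comes from, here is a sketch of turning the filter's command-line arguments into an optional format and branching on it. The Format newtype here is a toy stand-in, and formatFromArgs, currentFormat and emphStyle are hypothetical names; toJSONFilter does the equivalent work for you.

```haskell
import System.Environment (getArgs)

-- Toy stand-in for pandoc's Format type.
newtype Format = Format String
  deriving (Eq, Show)

-- The first argument, if any, names the output format.
formatFromArgs :: [String] -> Maybe Format
formatFromArgs (fmt : _) = Just (Format fmt)
formatFromArgs []        = Nothing

-- Read the format from the real command line.
currentFormat :: IO (Maybe Format)
currentFormat = fmap formatFromArgs getArgs

-- Decide how emphasis should be rendered for a given format.
emphStyle :: Maybe Format -> String
emphStyle (Just (Format f))
  | f `elem` ["html", "latex"] = "small caps"
emphStyle _ = "ALL CAPS"
```

When the filter is run by hand with no arguments, formatFromArgs yields Nothing, which is why the format-dependent clause needs a fallback.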

Exercises

http://johnmacfarlane.net/BayHac2014/exercises.pdf