% Pandoc for Haskell Hackers
% John MacFarlane
% BayHac 2014

# Why am I here?

<div class="notes">
I'm a philosophy professor.  And this is just the sort of question
you expect a philosophy professor to ask---but I mean it in the
most mundane sense.  I have no formal background whatsoever
in computer science.  I think I'm a decent Haskell programmer, but nothing
close to an expert.  So why am I giving a talk at a Haskell hackathon?
</div>

# I created a virus...

---

# ...that spreads GHC

<div class="notes">
Because I created the most effective virus for spreading
GHC installations: pandoc.
</div>

| ![](images/virus1.png)
| ![](images/virus2.png)
| ![](images/virus3.png)
| ![](images/virus4.png)

----


           Debian popcon            Hackage
-------- --------------- ------------------
pandoc              1997              26220
darcs               1908               3420
xmonad              1733               6432
-------- --------------- ------------------

![](images/downloads-google.png)

<!--

![Debian popcon graph](images/popcon-graph.png)

-->


# How did it happen?

![trs80](images/TRS80.jpg) ![kim1](images/KIM1.jpg)

<div class="notes">
I grew up programming video games in BASIC on a TRS-80 and 6502 assembly
on a KIM-1 that my grandfather bought (he must have been among the first
microcomputer hobbyists).  In college I worked summers as a programmer,
writing scientific software in Pascal on VAX workstations with whopping
20 MB hard drives.

I got out of programming when GUIs took over, but started to get
interested again in 2004 after installing Linux.  I messed around with
Python, Lisp, and Ruby, and wrote a website for my department.  Then
in 2005, an Australian philosopher/logician colleague of mine, Greg
Restall, mentioned that his favorite programming language was Haskell.
So I checked it out, and found that I liked it too.

I started messing around with parsec, the parser combinator library.
I asked myself, "what would be fun to try to parse?"  I'd gotten
interested in lightweight markup languages and was using
reStructuredText for lecture notes and handouts.  And I'd just seen
this "markdown" thing that John Gruber introduced.  So I said to
myself, "Self, let's try writing a markdown parser."  That seemed
like a good challenge, because markdown is about as parser-unfriendly a
language as you can get.  An asterisk, for example, can be either an
open or close tag for emphasis, or *part of* an open or close tag for
strong emphasis, or just a literal asterisk---and which it is often
depends on what comes after.  At the time, the only markdown parsers
around were just big regex transformations.  Parsec worked well for
this.  Before long I had a working parser.  It was considerably faster
and more accurate than John Gruber's Perl script.  And it was much
easier to maintain and extend.

One thing led to another.  I wasn't entirely happy with the docutils
(reStructuredText) tool chain, and I saw some advantages to "making my
own tools" for writing.  I needed to convert my existing documents in
reStructuredText to markdown, so I added a reStructuredText parser
and a markdown writer.  I needed output in LaTeX as well as HTML, so
I added a LaTeX writer.  I needed footnotes, inline LaTeX math, and
other features, so I extended pandoc's markdown dialect.  The project
provided hours of pleasant procrastination.
</div>

# First release

![](images/first-pandoc-web-page.png)

<div class="notes">
Then something possessed me to release it. In August 2006 I released the
first version.  Not long after that, a Debian developer, Recai Oktas,
contacted me offering to add it to Debian.  So I became an open-source
developer, and I've worked on pandoc in my spare time ever since.
</div>

# Libraries begat libraries

> - highlighting-kate
> - zip-archive
> - texmath
> - pandoc-citeproc (citeproc-hs)
> - gitit

# The command-line tool

A quick demonstration.

    git clone https://github.com/jgm/BayHac2014
    cd BayHac2014/demo
    vim script.txt

<div class="notes">
Our goal is to learn how to use pandoc as a library,
but let's first have a quick demo of the command-line tool.

Run through `script.txt` in `demo/`.
</div>

----

By default pandoc works as a pipe, reading from stdin and writing to
stdout.  Try it:

    pandoc


Hit Ctrl-D (Ctrl-Z on Windows) when you're finished typing text.

---

You can use options.  This one triggers "smart typography" (quotes,
dashes, ellipses).

    pandoc --smart

---

Let's convert to LaTeX instead of HTML.

    pandoc --to latex

. . .

Or to MediaWiki:

    pandoc --to mediawiki

---

Let's convert a LaTeX file to markdown:

    pandoc -f latex -t markdown example.tex

---

For help and information on which options pandoc supports:

    pandoc --help

More detail can be found in the pandoc
[README](http://johnmacfarlane.net/pandoc/README.html).  Or:

    man pandoc
    man pandoc_markdown

---

The `--standalone` or `-s` option creates a standalone document with
header, footer, and metadata:

    pandoc --standalone --smart -o r.html -t html5 README

. . .

Let's add a table of contents and use some custom CSS:

    pandoc --standalone --smart --toc -o r.html -t html5 \
      --css my.css README

---

Standalone documents are constructed from templates.  To see the
default template for a format, use `-D`:

    pandoc -D html5 > my.html5

The template language is documented
[here](http://johnmacfarlane.net/pandoc/README.html#templates).

---

[`sample1.txt`](demo/sample1.txt) contains some nice structured
metadata.  This is YAML but with strings interpreted as markdown.

    ---
    title: My demonstration
    author:
     - Kurt Gödel
     - Haskell Curry
    version:
     - number: 1.0
       date: July 13, 1945
       log:  Initial commit
     - number: 1.1
       date: August 14, 1946
       log:  Added some math
    ---

---

The metadata has a nice version history.
Let's edit `my.html5` to include this before the `</header>` tag:

    <ul class="versions">
    $for(version)$
    <li>Version $version.number$ ($version.date$): $version.log$</li>
    $endfor$
    </ul>

---

Let's try our custom template:

    pandoc -s -S --template my.html5 -t html5 sample1.txt \
      -o sample1.html --mathjax

---

We can create a PDF. Pandoc shells out to pdflatex for this.

    pandoc sample1.txt -o sample1.pdf

If you want to use xelatex instead, use `--latex-engine=xelatex`.

---

We can create a Word document without opening Word:

    pandoc sample1.txt -o sample1.docx

Note that the TeX math in the markdown file gets converted to
native Word equations.

. . .

Or an EPUB:

    pandoc sample1.txt -t epub3 -o sample1.epub

---

Pandoc can process citations using BibTeX bibliographies (or several
other formats).  Take a look at [`sample2.txt`](demo/sample2.txt) and
[`sample2.bib`](demo/sample2.bib).

We tell it to use `pandoc-citeproc` as a filter:

    pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.docx

---

Try changing the bibliography style.  Edit
[`sample2.txt`](demo/sample2.txt) to uncomment

    csl: chicago-fullnote-bibliography.csl

Then:

    pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.docx

---

Citations work in all formats supported by pandoc:

    pandoc -s --filter pandoc-citeproc sample2.txt -o sample2.org
    emacs sample2.org

---

Source code highlighting is automatic for marked code blocks.
It works in HTML, PDF, and docx:

    pandoc -s sample3.txt -o sample3.html
    pandoc -s sample3.txt -o sample3.pdf
    pandoc -s sample3.txt -o sample3.docx

. . .

You can change the highlighting style:

    pandoc -s sample3.txt -o sample3.html --highlight-style=monochrome

---

Pandoc has native support for literate Haskell.

[`paste.lhs`](demo/paste.lhs) is a literate Haskell file with markdown
text:

    ghci paste.lhs
    pandoc paste.lhs -f markdown+lhs -t html -s -o paste.html
    pandoc paste.lhs -f markdown+lhs -t latex+lhs -s -o paste.lhs.tex

---

Pandoc can also convert to and from Haddock, though this needs updating
in light of recent changes in Haddock's markup.

    pandoc -f markdown -t haddock
    pandoc -f haddock -t markdown

---

Pandoc also supports beamer and several HTML slide show formats.
[This slide show](slides.txt) was written with pandoc:

    pandoc slides.txt -o slides.html -t revealjs --css slides.css \
      -S --highlight-style=espresso

# A tour of pandoc's API

# Readers and writers

[Text.Pandoc](doc/pandoc/Text-Pandoc.html)

. . .

``` haskell
Prelude> :m + Text.Pandoc
Text.Pandoc> let doc = readMarkdown def "*hi*"
Text.Pandoc> doc
Pandoc (Meta {unMeta = fromList []}) [Para [Emph [Str "hi"]]]
Text.Pandoc> writeLaTeX def doc
"\\emph{hi}"
Text.Pandoc> readMarkdown def{readerSmart = True} "dog's"
Pandoc (Meta {unMeta = fromList []}) [Para [Str "dog\8217s"]]
```
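
The same calls work outside GHCi.  Here is a minimal sketch of a
markdown-to-LaTeX pipe.  It assumes the API shown in the session above,
where `readMarkdown` and `writeLaTeX` are pure functions; other pandoc
versions may have different signatures.  The file name `md2latex.hs` and
the helper `convert` are made up for this example.

```haskell
-- md2latex.hs (hypothetical name): markdown on stdin, LaTeX on stdout.
-- Assumes the API from the GHCi session above, where readMarkdown and
-- writeLaTeX are pure functions taking an options record.
import Text.Pandoc

-- Parse with smart typography switched on, then render as LaTeX.
convert :: String -> String
convert = writeLaTeX def . readMarkdown def { readerSmart = True }

main :: IO ()
main = interact convert
```

Try it with `echo "*hi*" | runghc md2latex.hs`.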

# The Pandoc types

[Text.Pandoc.Definition](doc/pandoc-types/Text-Pandoc-Definition.html)

. . .

You can use `pandoc -t native` and `pandoc -f native` to explore:

```
% echo "[*link*](/foo)" | pandoc -t native
[Para [Link [Emph [Str "link"]] ("/foo","")]]
```


# Builder

[Text.Pandoc.Builder](doc/pandoc-types/Text-Pandoc-Builder.html)


Concatenating lists is slow.  So we use special types `Inlines` and
`Blocks` that wrap `Seq`s (from `Data.Sequence`) of `Inline` and `Block`
elements.
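
A rough sketch of how this looks in practice: the combinators in
`Text.Pandoc.Builder` return `Inlines` and `Blocks`, which are joined
with `<>` and turned into a `Pandoc` value by `doc`.  The content and the
names `greeting`, `body`, and `letter` are invented for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc.Builder

-- With OverloadedStrings, string literals can be used as Inlines.
greeting :: Inlines
greeting = "Hello, " <> emph "BayHac" <> "!"

-- Blocks are concatenated the same way as Inlines.
body :: Blocks
body = header 1 "Greetings"
    <> para greeting
    <> bulletList [plain "first item", plain "second item"]

-- doc turns Blocks into a Pandoc document; setTitle fills in metadata.
letter :: Pandoc
letter = setTitle "A builder demo" (doc body)

main :: IO ()
main = print letter
```

Printing `letter` shows the ordinary `Pandoc` value underneath, so you
can compare it with what `pandoc -t native` produces.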

# A simple example

Here's a JSON data source about CNG fueling stations in the Chicago
area:  [cng_fuel_chicago.json](cng_fuel_chicago.json.html).
Boss says:  write me a letter in Word listing all the stations
that take the Voyager card.

. . .

No need to open Word for this job! [fuel.hs](./fuel.hs.html)

# Transforming a Pandoc document

[Text.Pandoc.Generic](doc/pandoc-types/Text-Pandoc-Generic.html)

[Text.Pandoc.Walk](doc/pandoc-types/Text-Pandoc-Walk.html)

# Example: `walk`

```haskell
module AllCaps (allCaps) where
import Text.Pandoc.Definition
import Data.Char (toUpper)

allCaps :: Inline -> Inline
allCaps (Str xs) = Str $ map toUpper xs
allCaps x = x
```
. . .

```
% ghci AllCaps.hs
*AllCaps> Text.Pandoc.Walk.walk allCaps $ Para [Emph [Str "hi"]]
Para [Emph [Str "HI"]]
```
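
`Text.Pandoc.Walk` also exports `query`, which collects a result from a
structure instead of rewriting it.  A small sketch; `countStrs` is a
name made up for this example:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query)

-- Emit one unit for every Str element, at any depth, and count them.
countStrs :: Block -> Int
countStrs = length . query isStr
  where
    isStr :: Inline -> [()]
    isStr (Str _) = [()]
    isStr _       = []

main :: IO ()
main = print $ countStrs (Para [Emph [Str "hi"], Space, Str "there"])
-- prints 2
```

The result type only has to be a `Monoid`, so a list is the simplest
thing that works here.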

# Filters

Suppose we have a program that defines a transformation

```haskell
f :: Pandoc -> Pandoc
```

Since `Pandoc` has `Read` and `Show` instances, we can
write a pipe:

```haskell
-- f.hs
main = interact (show . f . read)
```

And use it thus:

    pandoc -t native -s | runghc f.hs | pandoc -f native -s -t latex
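
For concreteness, here is one possible complete `f.hs`.  The
transformation (turning emphasis into strong emphasis) is an arbitrary
choice for illustration, and `strengthen` is a made-up name:

```haskell
-- f.hs: a complete native-format pipe.
import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- Placeholder transformation: Emph becomes Strong, everything else is kept.
strengthen :: Inline -> Inline
strengthen (Emph xs) = Strong xs
strengthen x         = x

-- Apply the inline transformation everywhere in the document.
f :: Pandoc -> Pandoc
f = walk strengthen

-- Read a Pandoc value from stdin, transform it, and show it on stdout.
main :: IO ()
main = interact (show . f . read)
```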

# JSON filters

`Read` and `Show` are really slow.  Better to use JSON serialization:

    pandoc -t json -s | runghc fjson.hs | pandoc -f json -s -t latex

. . .

To simplify this pattern, we added `--filter`:

    pandoc -s -t latex --filter fjson.hs
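
A hand-written `fjson.hs` could be sketched as follows.  It assumes the
`aeson` and `bytestring` packages are installed and relies on the JSON
instances that `pandoc-types` provides for `Pandoc`.  The transformation
`f` and the helper `transform` are placeholders chosen for illustration.

```haskell
-- fjson.hs: the same pipe, but exchanging JSON instead of Show/Read output.
import Data.Aeson (eitherDecode, encode)
import qualified Data.ByteString.Lazy as BL
import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- Placeholder transformation: Emph becomes Strong.
f :: Pandoc -> Pandoc
f = walk strengthen
  where
    strengthen :: Inline -> Inline
    strengthen (Emph xs) = Strong xs
    strengthen x         = x

-- Decode a Pandoc document from JSON, transform it, and re-encode it.
-- A decoding failure aborts with aeson's error message.
transform :: BL.ByteString -> BL.ByteString
transform = either error (encode . f) . eitherDecode

main :: IO ()
main = BL.interact transform
```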

# toJSONFilter

[Text.Pandoc.JSON](doc/pandoc-types/Text-Pandoc-JSON.html)

`toJSONFilter` takes any function `a -> a` or `a -> [a]` or
`a -> IO a`, where `a` is a Pandoc type, and turns it into a
JSON filter.

```haskell
import Text.Pandoc.JSON
import AllCaps (allCaps)

main = toJSONFilter allCaps
```
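
The `a -> IO a` form lets a filter perform side effects while it walks
the document.  A minimal sketch, with `logHeader` as a made-up name:
it reports each header on stderr and leaves the document unchanged.

```haskell
import Text.Pandoc.JSON
import System.IO (hPutStrLn, stderr)

-- Report every header on stderr; return each block untouched.
logHeader :: Block -> IO Block
logHeader b@(Header level _ _) = do
  hPutStrLn stderr ("saw a level-" ++ show level ++ " header")
  return b
logHeader b = return b

main :: IO ()
main = toJSONFilter logHeader
```

Writing to stderr matters here: stdout carries the JSON document back
to pandoc.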

# Example: `emphToCaps.hs`

```haskell
-- pandoc --filter ./emphToCaps.hs
import Text.Pandoc.JSON
import Text.Pandoc.Walk
import AllCaps (allCaps)

emphToCaps :: Inline -> [Inline]
emphToCaps (Emph xs) = walk allCaps xs
emphToCaps x = [x]

main :: IO ()
main = toJSONFilter emphToCaps
```

# Output format conditionalization

`pandoc --filter` passes the name of the output format as the
first argument to the filter.  So the filter's behavior
can depend on the output format.

`toJSONFilter` makes this easy:  just use a function whose
first argument is `Maybe Format`.

# Example:  `emphToCaps2.hs`

Emph as <span style="font-variant:small-caps;">Small Caps</span>
in LaTeX and HTML, ALL CAPS otherwise:

```haskell
-- pandoc --filter ./emphToCaps2.hs
import Text.Pandoc.JSON
import Text.Pandoc.Walk
import AllCaps (allCaps)

emphToCaps :: Maybe Format -> Inline -> [Inline]
emphToCaps (Just f) (Emph xs)
  | f == Format "html" || f == Format "latex" = [SmallCaps xs]
emphToCaps _ (Emph xs) = walk allCaps xs
emphToCaps _ x = [x]

main :: IO ()
main = toJSONFilter emphToCaps
```

# Exercises

<http://johnmacfarlane.net/BayHac2014/exercises.pdf>