html2rst

html2rst - a partial converter from HTML to reStructuredText

There are two versions, one in Haskell and one in OCaml. The OCaml version is faster and has a few more features, but I think the Haskell version works better on the whole. Neither is a complete solution (for example, neither handles tables), but either will get you about 90% of the way.

Haskell version

Haskell source code: html2rst.hs

To compile (with the GHC Haskell compiler):

ghc --make -package-name hxt -o html2rst html2rst.hs

You’ll need the Haskell XML Toolbox (HXT).

Update: With GHC 6.6 and the latest HXT, a different incantation is needed:

ghc -fglasgow-exts --make

Usage:

html2rst http://mypage.mydomain.org

prints a reStructuredText version of the web page at the URL given.
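
For readers who have not seen reStructuredText, the following toy program gives a rough idea of the kind of mapping involved. It is only a sketch and is not taken from html2rst.hs: the Inline and Block types and the functions inlineToRst, blockToRst and toRst are hypothetical names made up for this illustration, and HTML parsing is skipped entirely (the input is already a small document tree).

```haskell
-- Illustrative only: a toy document model and reStructuredText renderer.
-- None of these names come from html2rst.hs.
module Main where

import Data.List (intercalate)

-- Inline content, roughly: plain text, <em>, <strong>, <a href="...">.
data Inline
  = Str String
  | Emph [Inline]
  | Strong [Inline]
  | Link [Inline] String   -- link text, target URL

-- Block content, roughly: <h1>, <p>, <ul>.
data Block
  = Header [Inline]
  | Para [Inline]
  | BulletList [[Inline]]

-- Note: reStructuredText does not allow nested inline markup, so a real
-- converter has to flatten it. This sketch does not attempt that.
inlineToRst :: Inline -> String
inlineToRst (Str s)       = s
inlineToRst (Emph xs)     = "*" ++ concatMap inlineToRst xs ++ "*"
inlineToRst (Strong xs)   = "**" ++ concatMap inlineToRst xs ++ "**"
inlineToRst (Link xs url) =
  "`" ++ concatMap inlineToRst xs ++ " <" ++ url ++ ">`_"

blockToRst :: Block -> String
blockToRst (Header xs) =
  -- A section title is underlined with a line at least as long as the title.
  let title = concatMap inlineToRst xs
  in title ++ "\n" ++ replicate (length title) '='
blockToRst (Para xs) = concatMap inlineToRst xs
blockToRst (BulletList items) =
  intercalate "\n" [ "- " ++ concatMap inlineToRst item | item <- items ]

-- Blocks are separated by one blank line.
toRst :: [Block] -> String
toRst = intercalate "\n\n" . map blockToRst

main :: IO ()
main = putStrLn (toRst
  [ Header [Str "My page"]
  , Para [ Str "A ", Emph [Str "short"], Str " paragraph with a "
         , Link [Str "link"] "http://mypage.mydomain.org", Str "." ]
  , BulletList [[Str "first item"], [Str "second item"]]
  ])
```

Tables are left out of the toy model as well, which mirrors the limitation mentioned above.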

(C) 2006 John MacFarlane. This program may be freely used and distributed. It carries no warranty of any kind.

OCaml version

OCaml source code: html2rst.ml

I recommend using an OCaml distribution built from source using GODI. GODI will install a basic OCaml system, including a package manager (godi_console) that allows you to install third-party libraries. Use godi_console to install the following packages: godi-pcre, base-pcre, godi-ocamlnet, godi-netclient. Then compile and link html2rst with the following command:

ocamlfind ocamlopt -package netstring -package str -package netclient -linkpkg -o html2rst html2rst.ml

(C) 2004–6 John MacFarlane. This program may be freely used and distributed. It carries no warranty of any kind.