Preprocessing in OCaml

The compiler pipeline

Input file → Compiler → Output file

Source Preprocessor

PreProcessor eXtension

Examples:

#define PI 3

return PI;

→

return 3;

#if OCAML_VERSION >= (4,13)
  | Type_variant (tll,_) ->  
#else
  | Type_variant tll ->
#endif
        do_something

→

# 764 "src/loader/cmi.ml"
  | Type_variant (tll,_) ->  

# 768 "src/loader/cmi.ml"
        do_something

module Pretty = struct
  #include "prettyprint.ml"
end

→

module Pretty = struct
# 1 "prettyprint.ml"
< content of the file >
# 1 "prettyprint.ml"
end

  • Breaks most developer tools.

  • Good for "completely new syntax".

  • Difficult to do anything "OCaml-specific".

  • Handled by build system or ocamlopt's -pp option.
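To see why a source preprocessor cannot do anything "OCaml-specific", here is a minimal sketch of a `#define`-style substitution working on raw text. The function name `substitute` and its labelled arguments are made up for this illustration; real tools such as cppo are far more careful.

```ocaml
(* Characters that may appear in an identifier. *)
let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Replace every whole-word occurrence of [name] in [line] by [value].
   The function only sees characters: it has no idea what an OCaml
   expression, string literal or comment is. *)
let substitute ~name ~value line =
  let n = String.length line in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if is_ident_char line.[i] then begin
      (* Read a maximal run of identifier characters. *)
      let j = ref i in
      while !j < n && is_ident_char line.[!j] do incr j done;
      let word = String.sub line i (!j - i) in
      Buffer.add_string buf (if word = name then value else word);
      go !j
    end else begin
      Buffer.add_char buf line.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

let () =
  (* The intended use. *)
  print_endline (substitute ~name:"PI" ~value:"3" "return PI;");
  (* The same rule also rewrites the inside of a string literal. *)
  print_endline (substitute ~name:"PI" ~value:"3" "let s = \"PI\"")
```

The second call shows the problem: this toy rewrites inside a string literal because it works on text, not on syntax.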

Don't use source preprocessing. But let's do a quick exercise anyway 🤨

Start with 0_source_preprocessing_exercise/README.md (10min)

What is the result of parsing?

In OCaml: a Parsetree.

Expressions

(x + 0) * (2 + 3)

→
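As a rough picture of the tree for `(x + 0) * (2 + 3)`, here is a deliberately simplified model. The constructor names are borrowed from the real `Parsetree`, but the payloads are assumptions made for readability: the real nodes also carry locations, attributes, argument labels and many more cases.

```ocaml
(* A simplified stand-in for Parsetree.expression. *)
type expression =
  | Pexp_ident of string                        (* x, +, List.map *)
  | Pexp_constant of int                        (* 0, 2, 3 *)
  | Pexp_apply of expression * expression list  (* f a b *)

(* (x + 0) * (2 + 3): infix operators are ordinary function applications. *)
let example =
  Pexp_apply
    ( Pexp_ident "*",
      [ Pexp_apply (Pexp_ident "+", [ Pexp_ident "x"; Pexp_constant 0 ]);
        Pexp_apply (Pexp_ident "+", [ Pexp_constant 2; Pexp_constant 3 ]) ] )

(* Count the nodes of a tree, to show that it is plain data we can walk. *)
let rec size = function
  | Pexp_ident _ | Pexp_constant _ -> 1
  | Pexp_apply (f, args) ->
      1 + size f + List.fold_left (fun acc a -> acc + size a) 0 args

let () = Printf.printf "%d nodes\n" (size example)
```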

List.map
  (fun x -> expr)
  (1 :: [])

→

Types

float -> int option

→

Modules

module X = struct
  let n : float -> int option =
    expr
  let n2 : t = expr
end

→

A walk in the Parsetree

Try anything you are curious about!

How to rewrite the parsetree?

  1. Paperwork

  2. val transform : Parsetree.t -> Parsetree.t

  3. Paperwork
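Step 2 can be sketched on a toy tree. The type `expr` below is an assumption made for this example, not the real Parsetree (whose top-level types are `Parsetree.structure` and `Parsetree.signature`); the "paperwork" of reading and writing the tree is left out entirely.

```ocaml
(* A toy expression tree standing in for the Parsetree. *)
type expr =
  | Ident of string
  | Int of int
  | Apply of string * expr list  (* operator or function name, arguments *)

(* Rewrite [e + 0] and [0 + e] into [e], everywhere in the tree. *)
let rec transform (e : expr) : expr =
  match e with
  | Ident _ | Int _ -> e
  | Apply (f, args) -> (
      (* Transform the children first, then look at the current node. *)
      match (f, List.map transform args) with
      | "+", [ e'; Int 0 ] | "+", [ Int 0; e' ] -> e'
      | f, args -> Apply (f, args))

(* (x + 0) * (2 + 3) *)
let before =
  Apply ("*", [ Apply ("+", [ Ident "x"; Int 0 ]);
                Apply ("+", [ Int 2; Int 3 ]) ])

(* x * (2 + 3) *)
let after = transform before
```

Note that `transform` receives the whole tree and may change any part of it, which is exactly what makes handwritten rewriters hard to compose.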

Don't write PPX by hand. But let's do a quick exercise anyway 🤨

Start with 1_handmade_ppx_exercise/README.md (20min)

What could go wrong?

Everything. Here is a small list:

  • Composability
  • Opaqueness
  • Efficiency
  • Compatibility
  • Maintenance
  • Boilerplate
  • Build complexity

We have a solution: ppxlib!

  • Handles the boilerplate
  • Handles the compatibility
  • Orchestrates rewriters
  • Cooperates with dune
  • Better performance
  • Better composability
  • Less opaqueness

Don't write global transformations by hand. But let's do a quick exercise anyway 🤨

Start with 4_global_transformations/README.md (10min)

Also, record your ideas for an interesting PPX.

Restricting the rewriting

We have seen that dune and ppxlib cooperate to take care of the boilerplate and the build complexity.

How about opaqueness and composability?

We need two Jokers 🃏🃏.

Attributes, an attached Joker 🃏

Attributes are extra named Parsetree nodes that can be attached to a Parsetree node.

Expressions/types/...

g[@inlined] x

→

Structure/signature items

val f : int -> int
[@@ocaml.deprecated
  "Please use function g instead"]

→

Standalone

[@@@ocaml.warning "-42"]

→

Extension nodes, a replacing Joker 🃏

Extension nodes are named Parsetree nodes that can replace a Parsetree node.

Expressions/types/...

1 + [%hello "world"]

→

Structure/signature items

module M = struct
  [%%rewrite_me let f x = x]

  let%rewrite_me f x = x
end

→

Back to restricting

We are going to restrict transformations in two ways:

  • Their input: No full Parsetree as input. Context-free.

  • Their effect: No full Parsetree as output. Local.

          Deriver                         Extender
Input     Attributed node (right name)    Extension node (right name)
Output    Nodes to append                 Node to replace
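The two kinds of rules can be modelled on a toy structure. Everything here is hypothetical (the types `expr` and `item`, and the functions `derive_show`, `expand_hello` and `rewrite`); it is a sketch of the contract, not of ppxlib's API.

```ocaml
(* Toy nodes; these are not the real Parsetree types. *)
type expr =
  | String of string
  | Extension of string * string       (* [%name "payload"] *)

type item =
  | Type_decl of string * string list  (* type name, attached attributes *)
  | Value of string * expr             (* let name = expr *)

(* Deriver: sees one attributed node, returns the nodes to append after it. *)
let derive_show = function
  | Type_decl (name, attrs) when List.mem "deriving show" attrs ->
      [ Value ("show_" ^ name, String "<generated>") ]
  | _ -> []

(* Extender: sees one extension node, returns the node that replaces it. *)
let expand_hello = function
  | Extension ("hello", payload) -> String ("hello " ^ payload)
  | e -> e

(* The driver, not the rewriters, walks the whole structure. *)
let rewrite (items : item list) : item list =
  List.concat_map
    (fun item ->
      let item =
        match item with
        | Value (name, e) -> Value (name, expand_hello e)
        | Type_decl _ -> item
      in
      item :: derive_show item)
    items

let output =
  rewrite
    [ Value ("before", String "1");
      Type_decl ("foo", [ "deriving show" ]);
      Value ("greeting", Extension ("hello", "world")) ]
```

Neither rule ever sees the full list of items: the deriver can only append after its own node, and the extender can only replace its own node. Those two guarantees are what allow a driver to run many rewriters in a single pass.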

Examples

let before = 1

type foo = Bar of int | Baz [@@deriving show]

let after = 1

rewritten into

let before = 1

type foo =
  | Bar of int
  | Baz [@@deriving show]
let rec pp_foo : Format.formatter -> foo -> unit =
  fun fmt -> function
   | Bar a0 ->
       Format.fprintf fmt "(@[<2>Bar@ ";
       Format.fprintf fmt "%d" a0;
       Format.fprintf fmt "@])"
   | Baz -> Format.pp_print_string fmt "Baz"
and show_foo : foo -> string =
  fun x -> Format.asprintf "%a" pp_foo x

let after = 1
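To check what the appended functions do, the expansion above can be copied by hand into a standalone file. This sketch drops the `[@@deriving show]` attribute and the `let rec ... and` chaining, which are not needed when the code is written manually.

```ocaml
type foo = Bar of int | Baz

(* Same body as the generated pp_foo. *)
let pp_foo : Format.formatter -> foo -> unit =
  fun fmt -> function
   | Bar a0 ->
       Format.fprintf fmt "(@[<2>Bar@ ";
       Format.fprintf fmt "%d" a0;
       Format.fprintf fmt "@])"
   | Baz -> Format.pp_print_string fmt "Baz"

(* Same body as the generated show_foo. *)
let show_foo : foo -> string =
  fun x -> Format.asprintf "%a" pp_foo x

let () =
  print_endline (show_foo (Bar 42));  (* (Bar 42) *)
  print_endline (show_foo Baz)        (* Baz *)
```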

Examples

let before = 1

let to_ocaml = [%html "<a href='ocaml.org'>OCaml!</a>"]

let after = 1

rewritten into

let before = 1

let to_ocaml =
  Html.a ~a:[Html.a_href (Html.Xml.W.return "ocaml.org")]
    (Html.Xml.W.cons
       (Html.Xml.W.return (Html.txt (Html.Xml.W.return "OCaml!")))
       (Html.Xml.W.nil ()))

let after = 1

  • Less opaqueness

  • More guarantees

  • More composability

  • Many new ideas for transformations!

Derivers:

  • ppx_deriving.show for deriving pretty-printers
  • ppx_deriving.eq for deriving equality
  • ppx_deriving.ord for deriving comparison
  • ppx_deriving.enum for deriving enumerators
  • ppx_deriving.iter for deriving iters
  • ppx_deriving.map for deriving maps
  • ppx_deriving.fold for deriving folds
  • ppx_deriving.make for deriving constructors
  • ppx_show for deriving pretty-printers
  • ppx_yojson_conv for deriving json converters
  • ppx_deriving_yaml for deriving yaml converters
  • ppx_sexp_conv for deriving sexp converters
  • ppx_accessor for deriving record accessors

Extenders:

  • ppx_expect for expect tests
  • ppx_inline_test for inline tests
  • ppx_lwt for monadic Lwt
  • ppx_tyxml for writing HTML naturally
  • ppx_rapper for writing SQL naturally

Let's use some of them:

Feel free to reuse in everyday life!

Start with 2_use_deriver_exercise/README.md

Then, 3_use_extender_exercise/README.md (15 min total)

Let's write a deriver now 😱

Feel free to reuse in everyday life!

Start with write_deriver_exercise/README.md (150 min total)

Some comments:

  • ppxlib has documentation.

  • Remembering how ppxlib and dune cooperate can help.

  • The Parsetree is a big and unstable type.

  • ppxlib provides some stability (at the expense of a little complexity, especially for pattern-matching).

  • Be careful with locations. Getting them wrong harms the user experience, for instance by confusing Merlin.

  • Be careful with shadowing. If you access a module, do not expect any module to be open, qualify everything completely, and do not shadow existing definitions.
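The shadowing advice can be illustrated without any PPX machinery. In this sketch, `User_code` plays the role of a file in which a user has redefined `compare`, and the two `equal_*` functions stand for code that a deriver might generate; all names are made up for the example.

```ocaml
module User_code = struct
  (* The user shadows the standard comparison with their own function. *)
  let compare _ _ = 0

  (* Generated code that relies on whatever [compare] is in scope. *)
  let equal_fragile a b = compare a b = 0

  (* Generated code that qualifies the function completely. *)
  let equal_robust a b = Stdlib.compare a b = 0
end

let () =
  (* The fragile version silently picks up the user's definition. *)
  Printf.printf "fragile: %b\n" (User_code.equal_fragile 1 2);
  Printf.printf "robust:  %b\n" (User_code.equal_robust 1 2)
```

The fragile version claims that 1 and 2 are equal, and nothing in the user's source file hints at why.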