writing-ppxs (ppxlib.writing-ppxs)

Defining a transformation

For ppxlib, a transformation is a description of a way to modify a given AST into another one. A transformation can be:

A global transformation, which takes the simple form of a function of type structure -> structure or signature -> signature, and can sometimes take extra information as additional arguments. Such a transformation is applied in the global transformation phase, unless it has a good reason to have been registered in another phase.
A context-free transformation, which only acts on a portion of the AST. In the ppxlib framework, those transformations are represented by values of type Context_free.Rule.t, and are executed in the context-free phase.

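Whenever possible, a context-free transformation should be preferred: global transformations are harder to predict, harder to combine with one another, and each needs its own pass over the AST, while context-free ones are local and are applied together in the context-free phase.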

In order to register a transformation with the ppxlib driver, one should use the Driver.V2.register_transformation function. This function is used to register every type of rewriter in every phase, except derivers, which are abstracted away in Deriving.

Context-free transformation

In ppxlib, the type for context-free transformations is Context_free.Rule.t. Rules are applied during the top-down traversal of the AST performed by the context-free pass. A rule contains the information about when it should be applied in the traversal, as well as the transformation to apply.

Currently, rules can only be defined to apply in four different contexts:

On extension points, such as [%ext_point payload],
on some structure or signature items with an attribute, such as type t = Nil [@@deriving show],
on literals with modifiers, such as 41g or 43.2x,
on function applications or identifiers, such as meta_function "99" and meta_constant.

In order to define rules on extension points, we will use the Extension module. In order to define rules on attributed items, we will use the Deriving module. For the two other kinds of rules, we will directly use the Context_free.Rule module.

Extenders

An extender is characterised by several things:

The context on which it applies: indeed, extension points can be used in many different places: expression, pattern, core type... and the extender should be restricted to one context, as it produces code of a single type.
The name of the extension points on which it is triggered,
the structure of the payload it expects,
the expander, a function outputting the generated AST.

The extender context

The context is a value of type Extension.Context.t. For instance, to define an extender for expression-extension points, the correct context is Extension.Context.expression. Consult the API of the Extension.Context module for the list of all contexts!

# let context = Extension.Context.expression;;
val context : expression Extension.Context.t =
  Ppxlib.Extension.Context.Expression

The extender name

The extension point name on which it applies is simply a string.

# let extender_name = "add_suffix" ;;
val extender_name : string = "add_suffix"

See below for examples of when the above name and context will trigger rewriting:

(* will trigger rewriting: *)
let _ = [%add_suffix "payload"]

(* won't trigger rewriting: *)
let _ = [%other_name "payload"] (* wrong name *)
let _ = match () with [%add_suffix "payload"] -> () (* wrong context *)

The payload extraction

An extension node contains a payload, which will be passed to the transformation function. However, while this payload contains all information, it is not always structured the best way for the transformation function. For instance, in [%add_suffix "payload"], the string "payload" is encoded as a structure item consisting of the evaluation of an expression which is a constant which is a string...

ppxlib allows separating the transformation function from the extraction of the relevant information from the payload. As explained in depth in the Destructing AST nodes chapter, this extraction is done by destructing the structure of the payload (which is therefore restricted: [%add_suffix 12] would be refused by the rewriter of the example below). The extraction is defined by a value of type Ast_pattern.t. The Ast_pattern module provides a form of pattern-matching on AST nodes: a way to structurally extract values from an AST node, in order to generate a value of another kind.

For instance, a value of type (payload, int -> float -> expression, expression) Ast_pattern.t defines a way to extract an int and a float from a payload, which are then combined to define a value of type expression.

In our case, the matched value will always be a payload, as that's the type for extension points' payloads. The type of the produced node will have to match the type of extension node we rewrite, expression in our example.

# let extracter () = Ast_pattern.(single_expr_payload (estring __)) ;;
val extracter : unit -> (payload, string -> 'a, 'a) Ast_pattern.t = <fun>

The unit argument is not important: it is added so that the value restriction does not add noise to the type variables. See the Destructing AST nodes chapter and the Ast_pattern API for more explanation on the types and functions used above.
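To see what such a pattern does, it can be run by hand on a payload, outside of any rewriter. The following is a minimal sketch, assuming ppxlib is available and that Ast_pattern.parse takes the pattern, a location, the value to match and a continuation, in that order. The names pattern, make_payload and extract_string are made up for this illustration.

```ocaml
open Ppxlib

(* Same pattern as [extracter] above: a payload made of one string literal. *)
let pattern () = Ast_pattern.(single_expr_payload (estring __))

(* Hypothetical helper: builds by hand the payload of [%add_suffix "..."]. *)
let make_payload s =
  let loc = Location.none in
  PStr [ Ast_builder.Default.(pstr_eval ~loc (estring ~loc s) []) ]

(* The continuation receives the extracted string; here it just returns it. *)
let extract_string payload =
  Ast_pattern.parse (pattern ()) Location.none payload (fun s -> s)
```

A payload that does not have the expected shape, such as the one of [%add_suffix 12], makes the match fail with a located error instead of calling the continuation.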

The expand function

The expander is the function that takes the values extracted from the payload, and produces the value that replaces the extension node.

Building and inspecting AST nodes can be painful, due to how large the AST type is. ppxlib provides several helper modules to ease this generation, such as Ast_builder, Ppxlib_metaquot, Ast_pattern and Ast_traverse, which are explained in their own chapters: Generating AST nodes, Destructing AST nodes and Traversing AST nodes.

In the example below, you can ignore the body of the function until reading those chapters.

# let expander ~ctxt s =
    let loc = Expansion_context.Extension.extension_point_loc ctxt in
    Ast_builder.Default.(estring ~loc (s ^ "_suffixed")) ;;
val expander : ctxt:Expansion_context.Extension.t -> string -> expression =
<fun>

The expander takes ctxt as a named argument, which is used here only to retrieve the location of the extension node. More generally, this argument carries additional information about the expansion. It is of type Expansion_context.Extension.t, and includes:

The location of the extension node,
the tool that called the rewriting (merlin, ocamlc, ocaml, ocamlopt, ...),
the name of the input file given to the driver (see Expansion_context.Base.input_name),
the code_path, see Expansion_context.Base.code_path and Code_path.
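As an illustration of how this context can be consulted, here is a minimal sketch of an extender whose result depends only on the expansion context. The extension name current_file and the value names are hypothetical. The sketch assumes that Expansion_context.Extension.code_path and Code_path.file_path are available in your version of ppxlib.

```ocaml
open Ppxlib

(* Hypothetical extender: [%current_file] expands to the path of the file
   being rewritten, as recorded in the code path of the context. *)
let expander ~ctxt =
  let loc = Expansion_context.Extension.extension_point_loc ctxt in
  let code_path = Expansion_context.Extension.code_path ctxt in
  Ast_builder.Default.estring ~loc (Code_path.file_path code_path)

let current_file =
  Extension.V3.declare "current_file" Extension.Context.expression
    Ast_pattern.(pstr nil) (* the payload must be empty *)
    expander
```

Since the pattern extracts nothing, the expander takes no argument besides ~ctxt.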

Declaring an extender

When we have defined the four prerequisites, we are able to combine all of them to define an extender, using the Extension.V3.declare function.

# V3.declare ;;
  string ->
  'context Context.t ->
  (payload, 'a, 'context) Ast_pattern.t ->
  (ctxt:Expansion_context.Extension.t -> 'a) ->
  t

Note that the type is consistent: the context on which the expander is applied and the value produced by the expander need to be of the same type (indeed, 'a must be of the form 'extracted_1 -> 'extracted_2 -> ... -> 'context, by the constraints given by Ast_pattern).

We are thus able to create the extender given by the previous examples:

# let my_extender = Extension.V3.declare extender_name context (extracter()) expander ;;
val my_extender : Extension.t = <abstr>

Note that we use the V3 version of the declare function, which passes the expansion context to the expander. Previous versions were kept for backward compatibility.

We can finally turn the extender into a rule, using Context_free.Rule.extension, and register it to the driver:

# let extender_rule = Context_free.Rule.extension my_extender ;;
val extender_rule : Context_free.Rule.t = <abstr>
# Driver.register_transformation ~rules:[extender_rule] "rien" ;;
- : unit = ()

Now, the following:

let () = print_endline [%add_suffix "helloworld"]

would be rewritten by the PPX in:

let () = print_endline "helloworld_suffixed"

Derivers

A deriver is characterised by several things:

The way to parse the arguments passed through the attribute payload,
the set of other derivers that need to be run before it is applied,
the actual generator function.

Contrary to extenders, the registration of the deriver as a Context_free.Rule.t is not made by the user via Driver.register_transformation, but rather by Deriving.add.

Derivers arguments

In ppxlib, a deriver is applied by adding an attribute containing the names of the derivers to apply:

type tree = Leaf | Node of tree * tree  [@@deriving show, yojson]

However, it is also possible to pass arguments to the derivers, either through a record or through labelled arguments:

type tree = Leaf | Node of tree * tree  [@@deriving my_deriver ~flag ~option1:52]

type tree = Leaf | Node of tree * tree  [@@deriving my_deriver { flag; option1=52 }]

The flag argument is a flag: it can only be present or absent, and cannot take a value. The option1 argument is a regular argument: it is also optional, but can take a value.

In ppxlib, arguments have the type Deriving.Args.t. Similarly to the Ast_pattern.t type, a value of type (int -> string -> structure, structure) Args.t provides a way to extract an integer and a string from the arguments, to be later combined to create a structure.

The way to define a Deriving.Args.t value is to start with the value describing an empty set of arguments, Deriving.Args.empty, and add the arguments one by one, using the combinator Deriving.Args.(+>). Each argument is created using either Deriving.Args.arg for optional arguments (with a value extracted using Ast_pattern), or Deriving.Args.flag for optional arguments without values.

# let args () = Deriving.Args.(empty +> arg "option1" (eint __) +> flag "flag") ;;
val args : unit -> (int option -> bool -> 'a, 'a) Deriving.Args.t = <fun>

Derivers dependency

ppxlib allows declaring that a deriver depends on the previous application of another deriver. This is expressed simply as a list of derivers. For instance, the csv deriver depends on the fields deriver to be run first.

# let deps = [] ;;
val deps : 'a list = []

In this example, we do not include any dependency.
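For comparison, a non-empty dependency list could look like the following minimal sketch. Both derivers are hypothetical placeholders that generate no code. The sketch assumes that dependencies are passed through the optional ~deps argument of the Deriving.Generator.V2 constructors.

```ocaml
open Ppxlib

(* Hypothetical deriver with no output, used only as a dependency. *)
let base_demo =
  Deriving.add "base_demo"
    ~str_type_decl:(Deriving.Generator.V2.make_noarg (fun ~ctxt:_ _ -> []))

(* Hypothetical deriver declaring that [base_demo] must be applied first. *)
let dependent_demo =
  Deriving.add "dependent_demo"
    ~str_type_decl:
      (Deriving.Generator.V2.make_noarg ~deps:[ base_demo ]
         (fun ~ctxt:_ _ -> []))
```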

Generator function

Similarly to the expand function of an extender, the function generating new code in derivers also takes a context, and the arguments extracted from the attribute payload. Here again, the body of the function given as example can be safely ignored as it relies on later chapters.

# let generate_impl ~ctxt _ast option1 flag =
    let return s =  (* See the "Generating AST nodes" chapter *)
      let loc = Expansion_context.Deriver.derived_item_loc ctxt in
      [ Ast_builder.Default.(pstr_eval ~loc (estring ~loc s) []) ]
    in
    if flag then return "flag is on"
    else
      match option1 with
      | Some i -> return (Printf.sprintf "option is %d" i)
      | None -> return "flag and option are not set" ;;
val generate_impl :
ctxt:Expansion_context.Deriver.t ->
'a -> int option -> bool -> structure_item list = <fun>

Similarly to extenders, there is an additional argument to the function: the context, used in the example only to retrieve a location. This time, the context is of type Expansion_context.Deriver.t and includes:

The location of the derived item,
whether the code generation is going to be inlined (see Inlining transformations),
the tool that called the rewriting (merlin, ocamlc, ocaml, ocamlopt, ...),
the name of the input file given to the driver (see Expansion_context.Base.input_name),
the code_path, see Expansion_context.Base.code_path and Code_path.

Registering a deriver

Once the generator function is defined, we can combine the argument extraction and the generator function to create a Deriving.Generator.t:

# let generator () = Deriving.Generator.V2.make (args()) generate_impl ;;
val generator : unit -> (structure_item list, 'a) Deriving.Generator.t = <fun>

This generator can then be registered as a deriver through the Deriving.add function. Note that Deriving.add will call Driver.register_transformation itself, so you won't need to do it yourself. Adding a deriver is done in such a way that no two derivers with the same name can be registered. This includes derivers registered through the ppx_deriving library.

# let my_deriver = Deriving.add "my_deriver" ~str_type_decl:(generator()) ;;
val my_deriver : Deriving.t = <abstr>

The different optional named arguments allow registering generators, to be applied in different contexts, in one function call. Indeed, remember that you can only add one deriver with a given name, even if it applies to different contexts. As the API shows, derivers are restricted to the following contexts:

type declarations (type t = Foo of int),
type extensions (type t += Foo of int),
exceptions (exception E of int),
module type declarations (module type T = sig end)

in both structures and signatures.
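A single call to Deriving.add can therefore cover several of these contexts. The following minimal sketch registers one hypothetical deriver, both_demo, for type declarations in structures and in signatures. Both generators are placeholders that produce no code.

```ocaml
open Ppxlib

(* One name, two contexts: .ml files use the ~str_type_decl generator,
   .mli files use the ~sig_type_decl one. *)
let both_demo =
  Deriving.add "both_demo"
    ~str_type_decl:(Deriving.Generator.V2.make_noarg (fun ~ctxt:_ _ -> []))
    ~sig_type_decl:(Deriving.Generator.V2.make_noarg (fun ~ctxt:_ _ -> []))
```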

Constant rewriting

OCaml integrates a syntax to define special constants: any g..z or G..Z suffix appended to a float or int literal is accepted by the parser (but rejected later by the compiler). This means that such literals have to be rewritten by a PPX.

ppxlib provides the Context_free.Rule.constant function to rewrite those literal constants. The suffix character (between g and z or G and Z) has to be provided, as well as the constant kind (float or int); both the location and the literal as a string (without its suffix) will be passed to the rewriting function:

# let kind = Context_free.Rule.Constant_kind.Integer ;;
val kind : Context_free.Rule.Constant_kind.t =
  Ppxlib.Context_free.Rule.Constant_kind.Integer
# let rewriter loc s = Ast_builder.Default.eint ~loc (int_of_string s * 100) ;;
val rewriter : location -> string -> expression = <fun>
# let rule = Context_free.Rule.constant kind 'g' rewriter ;;
val rule : Context_free.Rule.t = <abstr>
# Driver.register_transformation ~rules:[ rule ] "constant" ;;
- : unit = ()

As an example, with the above transformation, let x = 2g + 3g will be rewritten to let x = 200 + 300.
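The same mechanism applies to float literals. The following minimal sketch, with hypothetical names scale, kilo and kilo_rule, uses the k suffix to multiply a float literal by one thousand, so that 1.5k would be rewritten to 1500. (with a trailing dot).

```ocaml
open Ppxlib

(* Pure part of the rewriter: works on the literal as a string. *)
let scale s = string_of_float (float_of_string s *. 1000.)

(* The rewriter receives the location and the literal as a string. *)
let kilo loc s = Ast_builder.Default.efloat ~loc (scale s)

let kilo_rule =
  Context_free.Rule.constant Context_free.Rule.Constant_kind.Float 'k' kilo

let () =
  Driver.register_transformation ~rules:[ kilo_rule ] "kilo_constant_demo"
```

Keeping the computation in a separate function such as scale makes it possible to test it without building any AST.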

Special functions

ppxlib supports registering functions to be applied at compile time. A registered identifier f_macro will trigger rewriting in two situations:

When it plays the role of the function in a function application,
when it appears on its own as an identifier, anywhere in an expression.

For instance, in

let _ = (f_macro arg1 arg2, f_macro)

the rewriting will be triggered once for the left-hand side f_macro arg1 arg2, and once for the right-hand side f_macro. The expansion function is responsible for distinguishing between the two cases, using pattern-matching to tell a function application from a single identifier.

In order to register a special function, one needs to use Context_free.Rule.special_function, indicating the name of the special function and the rewriter. The rewriter takes the expression (without expansion context) and should output an expression option, where:

None signifies that no rewriting should be done: the top-down pass can continue (potentially inside the expression).
Some expr signifies that the original expression should be replaced by expr. The top-down pass continues with expr.

The difference between fun expr -> None and fun expr -> Some expr is that the former will continue the top-down pass inside expr, while the latter will continue the top-down pass from expr (included), therefore starting an infinite loop.

# let expand e =
    let return n = Some (Ast_builder.Default.eint ~loc:e.pexp_loc n) in
    match e.pexp_desc with
    | Pexp_apply (_, arg_list) -> return (List.length arg_list)
    | _ -> return 0
  ;;
val expand : expression -> expression option = <fun>
# let rule = Context_free.Rule.special_function "n_args" expand ;;
val rule : Context_free.Rule.t = <abstr>
# Driver.register_transformation ~rules:[ rule ] "special_function_demo" ;;
- : unit = ()

With such a rewriter registered:

# Printf.printf "n_args is applied with %d arguments\n" (n_args ignored "arguments");;
n_args is applied with 2 arguments
- : unit = ()
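The difference between answering None and answering Some expr can be made concrete on a toy model. The sketch below does not use ppxlib: the expr type, the rewrite function and the fuel counter are all hypothetical, and only mimic the behaviour of the top-down pass as described above.

```ocaml
(* Toy AST, hypothetical and much smaller than ppxlib's Parsetree. *)
type expr =
  | Int of int
  | Ident of string
  | Apply of expr * expr list

exception Out_of_fuel

(* Top-down pass with a step budget. When the rule answers [Some e'], the
   pass restarts on [e'] itself; when it answers [None], the pass descends
   into the children of the node. *)
let rec rewrite ~fuel rule e =
  if !fuel <= 0 then raise Out_of_fuel;
  decr fuel;
  match rule e with
  | Some e' -> rewrite ~fuel rule e'
  | None -> (
      match e with
      | Apply (f, args) ->
          Apply (rewrite ~fuel rule f, List.map (rewrite ~fuel rule) args)
      | Int _ | Ident _ -> e)

(* Counterpart of the n_args rewriter on the toy AST. *)
let count_args = function
  | Apply (Ident "n_args", args) -> Some (Int (List.length args))
  | Ident "n_args" -> Some (Int 0)
  | _ -> None

(* Returns the matched expression unchanged: the pass revisits it forever,
   which the step budget turns into an exception. *)
let looping = function
  | (Apply (Ident "n_args", _) | Ident "n_args") as e -> Some e
  | _ -> None
```

With count_args, the pass terminates because the replacement no longer mentions n_args. With looping, the budget runs out, which is the toy equivalent of the infinite loop.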

Global transformation

Global transformations are the most general kind of transformation. As such, they allow doing virtually any modifications, but this comes with several drawbacks:

It is harder for the user to know exactly what parts of the AST will be changed.
It is harder for ppxlib to combine several global transformations: there is no guarantee that the effect of one will work well with the effect of another.
The work done by two global transformations (e.g. an AST traversal) cannot be shared, resulting in slower compilation.

For all these reasons, a global transformation should be avoided whenever a context-free transformation could do the job, which in our experience is most of the time.

The API for defining a global transformation is easy: a global transformation consists simply of the function, and can directly be registered with Driver.register_transformation.

# let f str = List.filter (fun _ -> Random.bool ()) str;; (* Randomly omit structure items *)
val f : 'a list -> 'a list = <fun>
# Driver.register_transformation ~impl:f "absent_minded_transformation" ;;
- : unit = ()
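A deterministic variant is easier to reason about and to test. The following minimal sketch, with the hypothetical names drop_evals and drop_evals_demo, removes every top-level expression evaluation from a structure and keeps all other items.

```ocaml
open Ppxlib

(* Global transformation: filter out structure items of the form [expr;;]. *)
let drop_evals (str : structure) : structure =
  List.filter
    (fun item ->
      match item.pstr_desc with
      | Pstr_eval _ -> false
      | _ -> true)
    str

let () = Driver.register_transformation ~impl:drop_evals "drop_evals_demo"
```

Since the transformation is an ordinary function on structures, it can be applied directly to a parsed file, for instance one obtained with Parse.implementation.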

Inlining transformations

When using a PPX, the transformation happens at compile time, and the produced code could be directly inlined into the original code. This makes it possible to drop the dependency on ppxlib and on the PPX used to generate the code.

This mechanism is implemented for derivers implemented in ppxlib, and is convenient to use, especially in conjunction with Dune. When applying a deriver, using [@@deriving_inline deriver_name] will apply the inline mode of deriver_name instead of the normal mode.

Inline derivers will generate a .corrected version of the file, that Dune can use to promote your file. For more information on how to use this feature to remove from your project a dependency on ppxlib and a specific PPX, refer to this guide.
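In source code, the inline mode could be used as in the following minimal sketch, which reuses the my_deriver name from the deriver example of this chapter. It assumes the usual convention that the inlined region is closed by a floating [@@@end] attribute.

```ocaml
(* Before promotion, the region between the type and [@@@end] is empty. *)
type tree = Leaf | Node of tree * tree [@@deriving_inline my_deriver]

[@@@end]
```

After promotion, the generated code is expected to appear between the type declaration and [@@@end], where it can be read and compiled like hand-written code.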

Integration with Dune

If your PPX is written as a dune project, you'll need to specify the kind field in your dune file, with one of the following two values:

ppx_rewriter, or
ppx_deriver.

The two possible values exist to allow co-existence of old ppx_deriving derivers and ppxlib derivers in the toplevel (see this issue for a discussion on this). In all other respects, both values are equivalent. If you are interested in toplevel support, here are the implications of the choice:

If you use ppx_rewriter, your rewriter will work out of the box for compilation using ocamlfind. As a consequence, it will also work in utop, which uses ocamlfind. However, it won't be possible to use any other ppx_deriving deriver in the toplevel.
If you use ppx_deriver, co-existence of your deriver and ppx_deriving derivers in utop will be ensured. However, ppx_deriving has to become a dependency of your project for it to work in the toplevel, even without ppx_deriving plugins.

Here is a minimal dune stanza for a rewriter:

(library
  (public_name my_ppx_rewriter)
  (kind ppx_rewriter)
  (libraries ppxlib))

The public name you choose here is the name your users will use to refer to your ppx in the preprocess field. E.g. here, to use this ppx rewriter, one would add (preprocess (pps my_ppx_rewriter)) to their library or executable stanza.
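For completeness, here is a sketch of the two other stanzas mentioned in this section: a deriver library using the other kind, and a user project applying the rewriter. The names my_ppx_deriver and main are hypothetical, and the two stanzas would live in the dune files of different directories.

```dune
; Hypothetical deriver library, in the dune file of the PPX.
(library
  (public_name my_ppx_deriver)
  (kind ppx_deriver)
  (libraries ppxlib))

; Hypothetical user executable, in the dune file of the user project.
(executable
  (name main)
  (preprocess (pps my_ppx_rewriter)))
```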

Defining AST transformations

In this chapter, we only focused on all the ppxlib ceremony to declare all kinds of transformations. However, we did not cover at all how to write the actual generative function, backbone of the transformation. ppxlib provides several modules to help with code generation and matching, which are covered in more depth in the next chapters of this documentation:

Ast_traverse, which helps in defining AST traversals, such as maps, folds, iter,...
Ast_helper and Ast_builder, for generating AST nodes in a simpler way than directly dealing with the Parsetree types, while providing a more stable API,
Ast_pattern, the sibling of Ast_builder for matching on AST nodes and extracting values from them,
Ppxlib_metaquot, a PPX to manipulate code more simply, by quoting and unquoting code.

This documentation also includes some guidelines on how to generate nice code, that you should read and follow to produce high quality PPXs:

A section on good error reporting,
A section on the mechanism,
A section on how to test your PPX,
A section on how to collaborate well with Merlin by being careful with locations.
