ppxlib sits between PPX authors and the compiler toolchain. For the PPX author, it provides an API to define a transformation and register it with ppxlib. All registered transformations can then be linked into a single executable, called the driver, which is responsible for applying all the transformations and which is called by the compiler.
The PPX authors register their transformations using the Driver.register_transformation function, as explained in the Writing PPXs section. The different arguments of this function correspond to the different kinds of PPXs supported by ppxlib, or to the phase at which they are executed.
The driver is created by calling either Driver.standalone or Driver.run_as_ppx_rewriter. These functions interpret the command-line arguments and start the rewriting accordingly. Note that, when ppxlib is used through Dune, neither of these functions needs to be called by the PPX author: as we will see, Dune is responsible for generating the driver once all required PPXs, from different libraries, have been registered.
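As a minimal sketch of this workflow, assuming the ppxlib library is available, the following registers a whole-AST transformation and exposes an entry point for a hand-written driver. The names my_identity_ppx, identity_impl and main are hypothetical and chosen only for illustration.

```ocaml
(* Requires the ppxlib library. *)
open Ppxlib

(* Hypothetical transformation: returns the implementation AST unchanged. *)
let identity_impl (str : structure) : structure = str

(* Registration is a side effect performed when this module is linked. *)
let () = Driver.register_transformation ~impl:identity_impl "my_identity_ppx"

(* Entry point of a hand-written driver. It is not needed when Dune
   generates the driver. Driver.run_as_ppx_rewriter can be called
   instead to obtain a driver suitable for the -ppx option. *)
let main () = Driver.standalone ()
```

A hand-written driver executable would call main () as its last top-level action, after all the libraries registering transformations have been linked in.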
The Driver.standalone function creates an executable that parses an OCaml file, transforms it according to the registered transformations, and outputs the transformed file. This makes it suitable for use with the -pp option of the OCaml compiler. It is a preprocessor for sources, and is standalone in the sense that it can be called independently of the OCaml compiler (e.g. it includes an OCaml parser).
On the other hand, the driver generated by Driver.run_as_ppx_rewriter is a proper PPX: it reads and outputs a marshalled Parsetree value directly. This version is suitable for use with the -ppx option of the OCaml compiler, as well as with any tool that requires control over the parsing of the file. For instance, Merlin includes an OCaml parser that tries hard to recover from errors in order to generate a valid AST most of the time.
Several arguments can be passed to the driver when executing it. Those arguments can also be passed through Dune, as explained in its manual. PPX authors can add arguments to their generated drivers using Driver.add_arg. Here are the default arguments for drivers generated with standalone and run_as_ppx_rewriter, respectively:
driver.exe [extra_args] [<files>]
  -as-ppx                      Run as a -ppx rewriter (must be the first argument)
  --as-ppx                     Same as -as-ppx
  -as-pp                       Shorthand for: -dump-ast -embed-errors
  --as-pp                      Same as -as-pp
  -o <filename>                Output file (use '-' for stdout)
  -                            Read input from stdin
  -dump-ast                    Dump the marshaled ast to the output file instead of pretty-printing it
  --dump-ast                   Same as -dump-ast
  -dparsetree                  Print the parsetree (same as ocamlc -dparsetree)
  -embed-errors                Embed errors in the output AST (default: true when -dump-ast, false otherwise)
  -null                        Produce no output, except for errors
  -impl <file>                 Treat the input as a .ml file
  --impl <file>                Same as -impl
  -intf <file>                 Treat the input as a .mli file
  --intf <file>                Same as -intf
  -debug-attribute-drop        Debug attribute dropping
  -print-transformations       Print linked-in code transformations, in the order they are applied
  -print-passes                Print the actual passes over the whole AST in the order they are applied
  -ite-check                   (no effect -- kept for compatibility)
  -pp <command>                Pipe sources through preprocessor <command> (incompatible with -as-ppx)
  -reconcile                   (WIP) Pretty print the output using a mix of the input source and the generated code
  -reconcile-with-comments     (WIP) same as -reconcile but uses comments to enclose the generated code
  -no-color                    Don't use colors when printing errors
  -diff-cmd                    Diff command when using code expectations (use - to disable diffing)
  -pretty                      Instruct code generators to improve the prettiness of the generated code
  -styler                      Code styler
  -output-metadata FILE        Where to store the output metadata
  -corrected-suffix SUFFIX     Suffix to append to corrected files
  -loc-filename <string>       File name to use in locations
  -reserve-namespace <string>  Mark the given namespace as reserved
  -no-check                    Disable checks (unsafe)
  -check                       Enable checks
  -no-check-on-extensions      Disable checks on extension point only
  -check-on-extensions         Enable checks on extension point only
  -no-locations-check          Disable locations check only
  -locations-check             Enable locations check only
  -apply <names>               Apply these transformations in order (comma-separated list)
  -dont-apply <names>          Exclude these transformations
  -no-merge                    Do not merge context free transformations (better for debugging rewriters). As a result, the context-free transformations are not all applied before all impl and intf.
  -cookie NAME=EXPR            Set the cookie NAME to EXPR
  --cookie                     Same as -cookie
  -help                        Display this list of options
  --help                       Display this list of options
and
driver.exe [extra_args] <infile> <outfile>
  -loc-filename <string>       File name to use in locations
  -reserve-namespace <string>  Mark the given namespace as reserved
  -no-check                    Disable checks (unsafe)
  -check                       Enable checks
  -no-check-on-extensions      Disable checks on extension point only
  -check-on-extensions         Enable checks on extension point only
  -no-locations-check          Disable locations check only
  -locations-check             Enable locations check only
  -apply <names>               Apply these transformations in order (comma-separated list)
  -dont-apply <names>          Exclude these transformations
  -no-merge                    Do not merge context free transformations (better for debugging rewriters). As a result, the context-free transformations are not all applied before all impl and intf.
  -cookie NAME=EXPR            Set the cookie NAME to EXPR
  --cookie                     Same as -cookie
  -help                        Display this list of options
  --help                       Display this list of options
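To illustrate how a PPX author might extend these lists with Driver.add_arg, here is a small sketch. The flag name -my-ppx-verbose and the verbose reference are hypothetical; the option specification uses the standard library Arg module.

```ocaml
(* Requires the ppxlib library. *)

(* Hypothetical option state, to be read later by the transformation. *)
let verbose = ref false

(* Adds the flag to the command line of any driver this module is linked into. *)
let () =
  Ppxlib.Driver.add_arg "-my-ppx-verbose"
    (Stdlib.Arg.Set verbose)
    ~doc:" Print debugging information while rewriting"
```

The flag should then be listed by -help alongside the default arguments of the driver.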
Cookies are values that are passed to the driver via the command line, or set as side effects of transformations, and which can be accessed by the transformations. They have a name to identify them, and a value consisting of an OCaml expression. The module to access cookies is Driver.Cookies.
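As a sketch of how a transformation might read a cookie, the following stores the value of a cookie named library-name when the driver provides it. The cookie name and the library_name reference are assumptions made for this example, and the handler is assumed to be called by the driver with None when the cookie is not set.

```ocaml
(* Requires the ppxlib library. *)
open Ppxlib

(* Hypothetical storage for the cookie value. *)
let library_name : string option ref = ref None

(* The pattern [estring __] only accepts a cookie whose value is a
   string literal, and passes the string to the handler. *)
let () =
  Driver.Cookies.add_simple_handler "library-name"
    Ast_pattern.(estring __)
    ~f:(fun value -> library_name := value)
```

With such a handler, the cookie could be set from the command line with something like -cookie 'library-name="foo"', following the -cookie NAME=EXPR syntax listed above.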
The Dune build system is well integrated with the ppxlib mechanism of registering transformations. Dune reads from each dune file the set of PPXs that are to be used. For a given set of rewriters, it generates a driver using Driver.run_as_ppx_rewriter, containing the registered transformations of the whole set. Using a single driver for multiple transformations from multiple PPXs ensures better composition semantics and improves the speed of the combined transformations.
Moreover, ppxlib communicates with Dune through .corrected files, to allow for promotion. Inlining rewriters generate such files, but PPX authors can generate their own promotion suggestions using the Driver.register_correction function.
One of the important issues with working with the Parsetree is that the API is not stable. For instance, in the 4.13 release of OCaml, two changes were made to the Parsetree type. Although they are small changes, they may break any PPX that is written by directly manipulating the (evolving) type.
This instability causes an issue with maintenance: PPX authors wish to maintain a single version of their PPX, not one per OCaml version, and ideally not to have to update their code when a field that is irrelevant to them changes in the Parsetree.
ppxlib helps to solve both issues. The first one, maintaining a single PPX version that works with every OCaml version, is solved by migrating the Parsetree. The PPX author only maintains a version working with the latest Parsetree version, and the ppxlib driver is responsible for converting values from one version to another.
Concretely, say a deriver is applied in the context of OCaml 4.08. After the 4.08 Parsetree has been given to it, the ppxlib driver will migrate this value into the latest Parsetree version, using the Astlib module. The "latest" here depends on the version of ppxlib, but at any given time, the latest released version of ppxlib will always use the latest released version of the Parsetree.
After the migration to the latest Parsetree has occurred, the driver runs all transformations on it, which ends with a rewritten Parsetree of the latest version. However, since the context of rewriting is OCaml 4.08 (in this example), the driver needs to migrate the rewritten Parsetree back to the OCaml 4.08 version. Again, ppxlib uses the Astlib module for this migration. Once the OCaml 4.08 rewritten AST is obtained, the compilation can continue as usual.
ppxlib defines several kinds of transformations whose core property is that they can only read and modify the code locally: the parts of the AST that are given to the transformation are only portions of the whole AST. In this regard, they are usually called context-free transformations. While not as general-purpose as plain AST transformations, they are sufficient more often than not and have many nice properties, such as well-defined semantics for composition. The two most important kinds of context-free transformations are derivers and extenders.
A deriver is a context-free transformation that, given a structure or signature item, generates code to append after this item. The given code is never modified. A deriver can be very useful to generate values depending on the structure of a user-defined type, for instance converters between values of that type and JSON values. A deriver is triggered by adding an attribute to a structure or signature item. For instance, the following code:
type ty = Int of int | Float of float [@@deriving yojson]
let x = ...
would be rewritten to:
type ty = Int of int | Float of float [@@deriving yojson]
let ty_of_yojson = ...
let ty_to_yojson = ...
let x = ...
An extender is a context-free transformation that is triggered on extension nodes, and that replaces the extension node with code generated from the payload of the extension node. This can be very useful to generate values of a DSL using a more user-friendly syntax, for instance to generate OCaml values from a JSON-like syntax.
For instance, the following code:
let json =
  [%yojson
    [ { name = "Anne"; grades = ["A"; "B-"; "B+"] }
    ; { name = "Bernard"; grades = ["B+"; "A"; "B-"] }
    ]
  ]
could be rewritten into:
let json =
  `List
    [ `Assoc
        [ ("name", `String "Anne")
        ; ("grades", `List [`String "A"; `String "B-"; `String "B+"])
        ]
    ; `Assoc
        [ ("name", `String "Bernard")
        ; ("grades", `List [`String "B+"; `String "A"; `String "B-"])
        ]
    ]
The advantages of using context-free transformations are multiple. First, they give the user a much clearer understanding of which parts of the AST will be rewritten than fully general AST rewriting does. Second, they provide much better composition semantics, which do not depend on the order of application. Finally, context-free transformations are applied in a single phase that factors out the work shared by all transformations, resulting in a much faster driver than one combining multiple whole-AST transformations. More details on the execution of this phase are given in the dedicated section.
See the Writing PPXs section for how to define derivers and extenders.
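As a rough illustration of what registering an extender can look like, the sketch below declares an extension that replaces [%forty_two] with the integer literal 42. The extension name forty_two and the expand function are hypothetical, and the Writing PPXs section remains the reference for the API.

```ocaml
(* Requires the ppxlib library. *)
open Ppxlib

(* Builds the replacement expression, located at the extension point. *)
let expand ~ctxt =
  let loc = Expansion_context.Extension.extension_point_loc ctxt in
  Ast_builder.Default.eint ~loc 42

(* [pstr nil] only accepts an empty payload, as in [%forty_two]. *)
let extension =
  Extension.V3.declare "forty_two" Extension.Context.expression
    Ast_pattern.(pstr nil)
    expand

(* The extender is registered as a context-free rule. *)
let () =
  Driver.register_transformation
    ~rules:[ Context_free.Rule.extension extension ]
    "forty_two"
```

Because the rule only sees the extension node and its payload, it cannot inspect or modify the surrounding code, which is what makes it context-free.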
The actual rewriting of the AST is done in multiple phases, described below in the order in which they are run: linting, preprocessing, instrumentation, context-free rewriting, global transformation, and a final phase for a transformation that must run last.
When registering a transformation through the Driver.register_transformation function, the phase in which the transformation has to be applied is specified. The multiplicity of phases is mostly there to account for potential constraints on the order of execution. However, most of the time there are no such constraints, and in this case either the context-free or the global transformation phase should be used (note that a context-free transformation is almost always possible, and is the better choice whenever it is). If you register in another phase, be sure to know what you are doing.
Linters are preprocessors that take the whole AST as input and output a list of "lint" errors. Such an error is of type Driver.Lint_error.t and includes a string (the error message) and the location of the error. The errors are reported as preprocessor warnings.
Linting is the first phase, so lint errors can only be reported for code handwritten by the user.
An example of a PPX registered in this phase is ppx_js_style.
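A linter could be sketched as follows, assuming the ppxlib library is available. The rule itself (flagging top-level open statements) and the name no_toplevel_open are hypothetical and only serve as an example.

```ocaml
(* Requires the ppxlib library. *)
open Ppxlib

(* Hypothetical rule: report every top-level [open] of an implementation. *)
let lint_impl (str : structure) : Driver.Lint_error.t list =
  Stdlib.List.filter_map
    (fun (item : structure_item) ->
      match item.pstr_desc with
      | Pstr_open _ ->
          Some
            (Driver.Lint_error.of_string item.pstr_loc
               "top-level open is discouraged")
      | _ -> None)
    str

(* The linter never modifies the AST: it only returns a list of errors. *)
let () = Driver.register_transformation ~lint_impl "no_toplevel_open"
```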
The preprocessing phase is the first transformation that actually alters the AST. In fact, the property of being the "first transformation applied" is what defines this phase, and ppxlib will thus ensure that only one transformation is registered in this phase: it will generate an error otherwise.
An example of a PPX registered in this phase is metapp.
The instrumentation phase is for transformations that need to be run before the context-free phase. Historically, it was meant for instrumentation-related PPXs, hence the name. Unlike the preprocessing phase, registering in this phase provides no guarantee that the transformation is run early in the rewriting: there is no limit on the number of transformations registered in this phase, and they are applied in the alphabetical order of their names.
If it is not crucial for a transformation to be run before the context-free phase, it should be registered to the global transformation phase.
The execution of all registered context-free rules is done in a single top-down pass through the AST. Whenever the top-down pass encounters a situation triggering rewriting, the corresponding transformation is called. For instance, when encountering an extension point corresponding to a rewriting rule, the extension point is replaced by the result of executing the rule, and the top-down pass continues inside the generated code. Similarly, when a deriving attribute is found attached to a structure or signature item, the result of applying the deriving rule is appended to the AST, and the top-down pass continues in the generated code.
Note that the code generation for derivers is applied when "leaving" the AST node, that is, once all other rewriters have been run on that node. Indeed, a deriver like this:
type t = [%my_type] [@@deriving deriver_from_type]
would need the information generated by the my_type extender to match on the structure of t.
Also note that in this phase, the executions of the context-free rules are interleaved, so it would not make sense to speak of an order of application, contrary to the next phase.
The global transformation phase is the phase where registered transformations, seen as functions from and to the Parsetree, are run. The order in which they are applied might matter and change the outcome, but since ppxlib knows nothing about the transformations, they are applied in the alphabetical order of their names.
The last phase allows a global transformation to escape the alphabetical order and be executed at the very end. For instance, bisect_ppx needs to be executed after all rewriting has occurred.
Note that only one global transformation can be executed last. If several transformations rely on being the last transformation, it will be true for only one of them. Thus, only register your transformation in this phase if it is absolutely vital for it to be the last transformation, as your PPX will become incompatible with any other PPX that registers a transformation in this phase.
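As a hedged sketch, registering a transformation in this last phase might look like the following, assuming the Driver.Instrument module of ppxlib and its After position. The names instrument_impl and my_coverage_ppx are hypothetical, and the transformation is a placeholder that leaves the AST unchanged.

```ocaml
(* Requires the ppxlib library. *)
open Ppxlib

(* Hypothetical instrumentation: a real one would insert coverage or
   profiling code; this placeholder returns the structure unchanged. *)
let instrument_impl (str : structure) : structure = str

(* The After position asks for the transformation to be run last. The
   Before position is assumed to select the instrumentation phase that
   runs before the context-free phase. *)
let () =
  Driver.register_transformation
    ~instrument:
      (Driver.Instrument.make instrument_impl
         ~position:Driver.Instrument.After)
    "my_coverage_ppx"
```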