How to Drive odoc

odoc is a CLI tool that generates API references and documentation pages for OCaml projects. However, it operates at a rather low level, taking individual files through several distinct phases until the HTML output is generated.

For this reason, just like when building any multi-file OCaml project, odoc needs to be driven by a higher-level tool. The driver takes care of calling the odoc command with the right arguments throughout the different phases. Several drivers for odoc exist, such as dune and odig.

odoc also ships with a "reference driver" that is kept up to date with the latest developments of odoc.

This document explains how to drive odoc, as of version 3. You do not need to know any of this to use odoc: it is aimed at driver authors, authors of tools that interact with odoc, and any curious passerby. It covers three subjects: documentation clusters, the doc generation pipeline, and the convention for installed packages.

In addition to the documentation, the reference driver is a good tool to understand how to build odoc projects. It can be useful to look at the implementation code, but it can also help to simply look at all invocations of odoc during a run of the driver.

Cluster of documentation

In its third major version, odoc has been improved so that the same documentation works in multiple scenarios, from local switches to big monorepos to the ocaml.org documentation hub for all packages, without anything breaking, references in particular.

The idea is that we have named groups of documentation, which we call clusters here. There are two kinds of them: page clusters and module clusters. Inside a cluster, everything is managed by odoc. Outside of the clusters, the driver is free to arrange them however it likes. In order to reference another cluster, a documentation author can use the name of the cluster in the reference.

Different situations will give different meanings to the clusters. In the case of opam packages, though, there is a natural meaning to give to those clusters (you'll find more details in the convention for opam-installed packages). Any opam package will have an associated "documentation cluster", named with the name of the package. Any of its libraries will have an associated "module cluster", named with the name of the library. Another package can thus refer to the doc using the package name, or to any of its libraries using the library name, no matter where the package is located in the hierarchy.

The doc generation pipeline

Just like when compiling OCaml modules, generating docs for these modules needs to be run in a specific order, as some information for generating docs for a file might reside in another one. However, odoc actually allows a particular file to reference a module that depends on it, seemingly creating a circular dependency.

This circular dependency problem is one of the reasons we have several phases in odoc. Let's review them:

The compile phase

The compile phase takes as input a set of .cm{i,t,ti} files as well as .mld files, and builds a directory hierarchy of .odoc files.

There are distinct commands for this phase: odoc compile for interfaces and pages, odoc compile-impl for implementations, and odoc compile-asset for assets.

Compiling interfaces

Let's have a look at a generic invocation of odoc during the compile phase:

$ odoc compile --output-dir <od> --parent-id <pid> -I <dir1> -I <dir2> <input-file>.<ext>

A concrete example of such a command would be:

$ odoc compile ~/.opam/5.2.0/lib/ppxlib/ppxlib__Extension.cmti --output-dir _odoc/ -I _odoc/ocaml-base-compiler/lib/compiler-libs.common -I _odoc/ocaml-base-compiler/lib/stdlib -I _odoc/ocaml-compiler-libs/lib/ocaml-compiler-libs.common -I _odoc/ppxlib/lib/ppxlib -I _odoc/ppxlib/lib/ppxlib.ast -I _odoc/ppxlib/lib/ppxlib.astlib -I _odoc/ppxlib/lib/ppxlib.stdppx -I _odoc/ppxlib/lib/ppxlib.traverse_builtins -I _odoc/sexplib0/lib/sexplib0 --parent-id ppxlib/lib/ppxlib
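A driver typically assembles this command from the list of directories the unit depends on. The following is a minimal sketch of that assembly, using only the flags shown above. The function name compile_cmd and its argument order are illustrative and not part of odoc, and the sketch only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: assemble an `odoc compile` command line for one interface file.
# compile_cmd is a hypothetical helper; it does not handle paths with spaces.

# usage: compile_cmd <output-dir> <parent-id> <input-file> [<include-dir>...]
compile_cmd() {
  od=$1 pid=$2 input=$3
  shift 3
  cmd="odoc compile $input --output-dir $od"
  # One -I per directory holding the .odoc files of a dependency.
  for dir in "$@"; do
    cmd="$cmd -I $dir"
  done
  printf '%s\n' "$cmd --parent-id $pid"
}

# Prints a shortened variant of the concrete example above.
compile_cmd _odoc/ ppxlib/lib/ppxlib \
  "$HOME/.opam/5.2.0/lib/ppxlib/ppxlib__Extension.cmti" \
  _odoc/ocaml-base-compiler/lib/stdlib _odoc/sexplib0/lib/sexplib0
```

Printing the command first makes it easy to compare a new driver's invocations against those of the reference driver.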

Compiling implementations

A compile-impl command is pretty similar:

$ odoc compile-impl --output-dir <od> --source-id <sid> --parent-id <pid> -I <dir1> -I <dir2> <input-file>.<ext>

A concrete example of such a command would be:

$ odoc compile-impl ~/.opam/5.2.0/lib/ppxlib/ppxlib__Spellcheck.cmt --output-dir _odoc/ -I _odoc/ocaml-base-compiler/lib/compiler-libs.common -I _odoc/ocaml-base-compiler/lib/stdlib -I _odoc/ocaml-compiler-libs/lib/ocaml-compiler-libs.common -I _odoc/ppxlib/lib/ppxlib -I _odoc/ppxlib/lib/ppxlib.ast -I _odoc/ppxlib/lib/ppxlib.astlib -I _odoc/ppxlib/lib/ppxlib.stdppx -I _odoc/sexplib0/lib/sexplib0 --enable-missing-root-warning --parent-id ppxlib/lib/ppxlib --source-id ppxlib/src/ppxlib/spellcheck.ml

Compiling assets

The assets themselves are only passed during the generation phase, but we still need to create an .odoc file for each of them, for odoc's resolution mechanism.

$ odoc compile-asset --output-dir <od> --parent-id <pid> --name <assetname>

The link phase

The link phase takes the directory produced by the compile phase and generates a set of .odocl files. This phase resolves references and canonicals.

A generic link command is:

$ odoc link
    -I <dir1> -I <dir2>
    -P <pname1>:<pdir1> -P <pname2>:<pdir2>
    -L <lname1>:<ldir1> -L <lname2>:<ldir2>
    <path/to/file.odoc>
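As a rough sketch of how a driver might put this command together, the helper below takes the include directories and the page and library clusters as space-separated lists. The name link_cmd, its argument layout, and the example paths are assumptions made for illustration; only the -I, -P and -L flags come from the generic command above.

```shell
#!/bin/sh
# Sketch: assemble an `odoc link` command line (printed, not executed).
# link_cmd is a hypothetical helper; entries must not contain spaces.

# usage: link_cmd <file.odoc> <include-dirs> <page-clusters> <lib-clusters>
# <page-clusters> and <lib-clusters> are lists of <name>:<dir> entries.
link_cmd() {
  file=$1 includes=$2 pages=$3 libs=$4
  cmd="odoc link"
  # The lists are intentionally unquoted so the shell splits them on spaces.
  for d in $includes; do cmd="$cmd -I $d"; done
  for p in $pages; do cmd="$cmd -P $p"; done
  for l in $libs; do cmd="$cmd -L $l"; done
  printf '%s\n' "$cmd $file"
}

# Illustrative values only.
link_cmd _odoc/ppxlib/doc/page-index.odoc \
  "_odoc/ppxlib/lib/ppxlib" \
  "ppxlib:_odoc/ppxlib/doc" \
  "ppxlib:_odoc/ppxlib/lib/ppxlib ppxlib.ast:_odoc/ppxlib/lib/ppxlib.ast"
```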

The indexing phase

The indexing phase refers to the "crunching" of information that is split across several .odocl files. Currently, there are two use cases for this phase: counting occurrences and indexing entries.

Counting occurrences

This step counts the number of occurrences of each value/type/... in the implementation, and stores them in a big table. A generic invocation is:

$ odoc count-occurrences <dir1> <dir2> -o <path/to/name.odoc-occurrences>

An example of such a command:

$ odoc count-occurrences _odoc/ -o _odoc/occurrences-all.odoc-occurrences

Indexing entries

The odoc compile-index command produces an .odoc-index file from .odocl files, other .odoc-index files, and possibly some .odoc-occurrences files.

To create an index for the page and module units, we use the -P and -L arguments.

$ odoc compile-index -o path/to/<indexname>.odoc-index -P <pname1>:<ppath1> -P <pname2>:<ppath2> -L <lname1>:<lpath1> -L <lname2>:<lpath2> --occurrences <path/to/name.odoc-occurrences>

An example of such a command:

$ odoc compile-index -o _odoc/ppxlib/index.odoc-index -P ppxlib:_odoc/ppxlib/doc -L ppxlib:_odoc/ppxlib/lib/ppxlib -L ppxlib.ast:_odoc/ppxlib/lib/ppxlib.ast -L ppxlib.astlib:_odoc/ppxlib/lib/ppxlib.astlib -L ppxlib.metaquot:_odoc/ppxlib/lib/ppxlib.metaquot -L ppxlib.metaquot_lifters:_odoc/ppxlib/lib/ppxlib.metaquot_lifters -L ppxlib.print_diff:_odoc/ppxlib/lib/ppxlib.print_diff -L ppxlib.runner:_odoc/ppxlib/lib/ppxlib.runner -L ppxlib.runner_as_ppx:_odoc/ppxlib/lib/ppxlib.runner_as_ppx -L ppxlib.stdppx:_odoc/ppxlib/lib/ppxlib.stdppx -L ppxlib.traverse:_odoc/ppxlib/lib/ppxlib.traverse -L ppxlib.traverse_builtins:_odoc/ppxlib/lib/ppxlib.traverse_builtins --occurrences _odoc/occurrences-all.odoc-occurrences
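The long example above follows a regular pattern: one -P for the package and one -L per library. A hedged sketch of generating it is shown below. It assumes the canonical roots described later in this document (<p>/doc and <p>/lib/<l>) and places the index at <output-dir>/<package>/index.odoc-index as in the example; that location is a choice, not a rule, and index_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: assemble an `odoc compile-index` command for one package.
# index_cmd is a hypothetical helper; it only prints the command.

# usage: index_cmd <output-dir> <package> <occurrences-file> <library>...
index_cmd() {
  od=$1 pkg=$2 occ=$3
  shift 3
  cmd="odoc compile-index -o $od/$pkg/index.odoc-index -P $pkg:$od/$pkg/doc"
  # One -L per library, rooted at <package>/lib/<library>.
  for lib in "$@"; do
    cmd="$cmd -L $lib:$od/$pkg/lib/$lib"
  done
  printf '%s\n' "$cmd --occurrences $occ"
}

index_cmd _odoc ppxlib _odoc/occurrences-all.odoc-occurrences ppxlib ppxlib.ast
```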

The generation phase

The generation phase takes all the information computed in the previous phases and actually generates the documentation. The output can be HTML, LaTeX, or man pages, although HTML is currently the odoc backend that supports the most functionality (such as images, videos, ...).

In this manual, we describe the situation for generating HTML. Usually, generating for another backend boils down to replacing html-generate with latex-generate or man-generate. Refer to the manpage to see the diverging options.

Given an .odocl file, odoc might generate a single .html file or a complete directory of .html files. The --output-dir option specifies the root directory for the generated output.

A JavaScript file for search requests

odoc provides a way to plug in a JavaScript file containing the code that answers users' queries. So that the UI is never blocked, this file is loaded in a web worker to perform searches.

Interfaces and pages

A generic html-generate command for interfaces has the following form:

$ odoc html-generate
  --output-dir <odir>
  --index <path/to/file.odoc-index>
  --search-uri <relative/to/output-dir/file.js> --search-uri <relative/to/output-dir/file2.js>
  <path/to/file.odocl>

The output directory or file can be computed from this command's --output-dir, the initial --parent-id given when creating the .odoc file, as well as the unit name. In the case of a module, the output is a directory named with the name of the module. In the case of a page, the output is a file with the name of the page and the .html extension.
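A driver that needs to know where the output lands can compute it ahead of time. The sketch below assumes that the output is placed under <output-dir>/<parent-id>/, which is our reading of the paragraph above and not something it states explicitly. The helper name html_output and the kind argument are hypothetical, and the unit name is used exactly as given.

```shell
#!/bin/sh
# Sketch: predict where html-generate writes its output.
# Assumption: the output lives under <output-dir>/<parent-id>/.

# usage: html_output <output-dir> <parent-id> <module|page> <unit-name>
html_output() {
  odir=${1%/} pid=$2 kind=$3 name=$4   # ${1%/} drops one trailing slash
  case $kind in
    module) printf '%s\n' "$odir/$pid/$name/" ;;      # a directory
    page)   printf '%s\n' "$odir/$pid/$name.html" ;;  # a single file
    *)      echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}

html_output _html/ ppxlib/doc page index
html_output _html/ ppxlib/lib/ppxlib module Ppxlib
```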

An example of such a command is:

$ odoc html-generate _odoc/ppxlib/doc/page-index.odocl --index _odoc/ppxlib/index.odoc-index --search-uri ppxlib/sherlodoc_db.js --search-uri sherlodoc.js -o _html/

Source code

$ odoc html-generate-source --output-dir <odir> --impl <path/to/impl-file.odocl> <path/to/source/file.ml>

The output file can be computed from this command's --output-dir, and the initial --source-id and --name given when creating the impl-*.odoc file.

An example of such a command is:

$ odoc html-generate-source --impl _odoc/ppxlib/lib/ppxlib/impl-ppxlib__Reconcile.odocl ~/.opam/5.2.0/lib/ppxlib/reconcile.ml -o _html/

Generating docs for assets

This is the phase where we pass the actual asset. We pass it as a positional argument, and give the asset unit using --asset-unit.

$ odoc html-generate-asset --output-dir <odir> --asset-unit <path/to/asset-file.odocl> <path/to/asset/file.ext>

Convention for installed packages

In order to build the documentation for installed packages, the driver needs to give a meaning to several of the concepts above. In particular, it needs to define the page and library clusters, to know where to find the pages and assets and what id to give them, and to know, when linking, which clusters an artifact may link to.

So that the different drivers and installed packages play well together, we define here a convention for building installed packages. If both the package and the driver follow it, building the docs should go well!

The -P and -L clusters, and their root ids

Each package defines a set of clusters, each of them having a root id. These roots will be used in --parent-id and in -P and -L.

The driver can choose any set of mutually disjoint roots without causing problems for reference resolution. For instance, both -P pkg:<output_dir>/pkg/doc and -P pkg:<output_dir>/pkg/version/doc are acceptable choices. However, we define here "canonical" roots:

Each installed package <p> defines a single page root id: <p>/doc.

For each package <p>, each library <l> defines a library root id: <p>/lib/<l>.

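For instance, a package foo with two libraries, foo and foo.bar, will define three clusters: a page cluster foo with root id foo/doc, a module cluster foo with root id foo/lib/foo, and a module cluster foo.bar with root id foo/lib/foo.bar.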
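The canonical roots for this example can be enumerated mechanically. The following sketch applies the two rules above (<p>/doc and <p>/lib/<l>) to a package and its libraries. The function name clusters and its output format are purely illustrative.

```shell
#!/bin/sh
# Sketch: list the clusters of a package together with their canonical roots.
# clusters is a hypothetical helper, not an odoc command.

# usage: clusters <package> <library>...
clusters() {
  pkg=$1; shift
  printf 'page cluster %s: %s/doc\n' "$pkg" "$pkg"
  for lib in "$@"; do
    printf 'module cluster %s: %s/lib/%s\n' "$lib" "$pkg" "$lib"
  done
}

clusters foo foo foo.bar
# page cluster foo: foo/doc
# module cluster foo: foo/lib/foo
# module cluster foo.bar: foo/lib/foo.bar
```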

Installed opam packages need to specify which clusters they may reference during the link phase, so that the proper -P and -L arguments are added. (Note that these dependencies can be circular without any problem, as they arise during the link phase and only require the artifacts from the compile phase.)

An installed package <p> specifies its cluster dependencies in a file at <opam root>/doc/<p>/odoc-config.sexp. This file contains s-expressions.

Stanzas of the form (packages p1 p2 ...) specify that page clusters p1, p2, ..., should be added using the -P argument: with the canonical roots, it would be -P p1:<output_dir>/p1/doc -P p2:<output_dir>/p2/doc -P ....

Stanzas of the form (libraries l1 l2 ...) specify that module clusters l1, l2, ..., should be added using the -L argument: with the canonical roots, it would be -L l1:<output_dir>/p1/lib/l1 -L l2:<output_dir>/p2/lib/l2 -L ..., where p1 is the package l1 is in, etc.
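Once the stanzas have been read, turning them into link arguments is mechanical. The sketch below assumes the driver has already parsed the file and, for each library, found the package it belongs to. Libraries are therefore passed as <package>/<library> pairs, a format invented here for illustration. The helper names page_args and lib_args are hypothetical as well.

```shell
#!/bin/sh
# Sketch: turn parsed (packages ...) and (libraries ...) stanzas into
# -P and -L arguments, using the canonical roots.

# usage: page_args <output-dir> <package>...
page_args() {
  od=$1; shift
  args=
  for p in "$@"; do args="$args -P $p:$od/$p/doc"; done
  printf '%s\n' "${args# }"   # drop the leading space
}

# usage: lib_args <output-dir> <package>/<library>...
lib_args() {
  od=$1; shift
  args=
  for pl in "$@"; do
    pkg=${pl%%/*} lib=${pl#*/}   # split the pair at the first slash
    args="$args -L $lib:$od/$pkg/lib/$lib"
  done
  printf '%s\n' "${args# }"
}

page_args _odoc p1 p2
lib_args _odoc p1/l1 p2/l2
```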

The units

The module units of a package p are all files installed by p that can be found in <opam root>/lib/p/ or a subdirectory.

The page units are those files that can be found in <opam root>/doc/p/odoc-pages/ or a subdirectory, and that have an .mld extension.

The asset units are those files that can be found in <opam root>/doc/p/odoc-pages/ or a subdirectory, but that do not have an .mld extension. Additionally, all files found in <opam root>/doc/p/odoc-assets/ are asset units.
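The page and asset rules can be expressed as a small classifier. In this sketch, paths are given relative to the directory that contains odoc-pages/ and odoc-assets/. The function name unit_kind and the "other" fallback are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: classify a documentation file as a page unit or an asset unit.
# unit_kind is a hypothetical helper.

# usage: unit_kind <path relative to the directory holding odoc-pages/>
unit_kind() {
  case $1 in
    odoc-pages/*.mld) echo page ;;               # .mld files are pages
    odoc-pages/*|odoc-assets/*) echo asset ;;    # everything else is an asset
    *) echo other ;;
  esac
}

unit_kind odoc-pages/index.mld
unit_kind odoc-pages/img/logo.png
unit_kind odoc-assets/style.css
```

In a case pattern, * also matches slashes, so files in subdirectories are classified the same way as files at the top level.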

The --parent-id arguments

Interface and implementation units have as parent id the root of the library cluster they belong to: with "canonical" roots, <pkgname>/lib/<libname>.

Page units that are found in <opam root>/doc/<pkgname>/odoc-pages/<relpath>/<name>.mld have the parent id from their page cluster, followed by <relpath>. So, with canonical roots, <pkgname>/doc/<relpath>.

Asset units that are found in <opam root>/doc/<pkgname>/odoc-pages/<relpath>/<name>.<ext> have the parent id from their page cluster, followed by <relpath>. With canonical roots, <pkgname>/doc/<relpath>.

Asset units that are found in <opam root>/doc/<pkgname>/odoc-assets/<filename> have the parent id from their page cluster, followed by _assets/<filename>. With canonical roots, <pkgname>/doc/_assets/<filename>.
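For units found under odoc-pages/, the parent id follows directly from the relative path. The sketch below computes it with canonical roots. The helper name pages_parent_id is hypothetical, and units found under odoc-assets/ are deliberately left out.

```shell
#!/bin/sh
# Sketch: compute the --parent-id of a page or asset unit found under
# <opam root>/doc/<pkgname>/odoc-pages/, with canonical roots.

# usage: pages_parent_id <pkgname> <path relative to odoc-pages/>
pages_parent_id() {
  pkg=$1 rel=$(dirname "$2")
  if [ "$rel" = . ]; then
    # The file sits directly in odoc-pages/: <relpath> is empty.
    printf '%s\n' "$pkg/doc"
  else
    printf '%s\n' "$pkg/doc/$rel"
  fi
}

pages_parent_id foo index.mld
pages_parent_id foo tutorial/intro.mld
```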

The --source-id arguments

The driver could choose the source id freely without breaking references. However, following the canonical roots convention, implementation units must have as source id: <pkgname>/src/<libraryname>/<filename>.ml.
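Put together with the parent id rule for implementation units, this gives both ids of a compile-impl invocation. The sketch below prints such a command with canonical roots. The -I arguments are omitted for brevity, and compile_impl_cmd is a hypothetical helper that takes the source file name as an argument instead of deriving it.

```shell
#!/bin/sh
# Sketch: assemble an `odoc compile-impl` command with canonical ids.
# The -I arguments a real invocation needs are left out here.

# usage: compile_impl_cmd <output-dir> <pkgname> <libraryname> <filename>.ml <input>.cmt
compile_impl_cmd() {
  od=$1 pkg=$2 lib=$3 src=$4 input=$5
  printf '%s\n' "odoc compile-impl $input --output-dir $od --parent-id $pkg/lib/$lib --source-id $pkg/src/$lib/$src"
}

compile_impl_cmd _odoc/ ppxlib ppxlib spellcheck.ml ppxlib__Spellcheck.cmt
```

With these arguments, the printed ids match those of the concrete compile-impl example given earlier: ppxlib/lib/ppxlib and ppxlib/src/ppxlib/spellcheck.ml.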