---
title: Why OCaml
author: hannes
tags: overview, background
abstract: a gentle introduction into OCaml
---

## Programming

For me, programming is fun.  I enjoy doing it, every single second.  All the way
from designing over experimenting to debugging why it does not do what I want.
In the end, the computer is dumb and executes only what you (or code from
someone else which you rely on) tell it to do.

To abstract from assembly code, which is not portable, programming languages were
developed.  Different flavoured languages vary in
expressive power and static guarantees.  Many claim to be general purpose or
systems languages; depending on the choices of
the language designer and tooling around the language, it is a language which lets you conveniently develop programs in.

A language designer decides on the builtin abstraction mechanisms, each of which
is both a burden and a blessing:  it might be interfering (which to use? `for` or `while`, `trait` or `object`),
orthogonal (one way to do it), or even synergistic (higher order functions and anonymous functions).  Another choice is whether the language includes a type
system, and if the developer can cheat on it (by allowing arbitrary type casts, a *weak* type system).  A strong static type system
allows a developer to encode invariants, without the need to defer to runtime
assertions.  Type systems differ in their expressive power ([dependent typing](https://en.wikipedia.org/wiki/Dependent_type) are the hot research area at the moment).  Tooling depends purely
on the community size, natural selection will prevail the useful tools
(community size gives inertia to other factors: demand for libraries, package manager, activity on stack overflow, etc.).


## Why OCaml?

As already mentioned in [other](/Posts/About)
[articles](/Posts/OperatingSystem) here, it is a
combination of sufficiently large community, runtime stability and performance, modularity,
carefully thought out abstraction mechanisms, maturity (OCaml recently turned 20), and functional features.

The latter is squishy, I'll try to explain it a bit: you define your concrete
*data types* as *products* (`int * int`, a tuple of integers), *records* (`{
foo : int ; bar : int }` to name fields), sums (`type state = Initial | WaitingForKEX | Established`, or variants, or tagged union in C).
These are called [*algebraic data types*](https://en.wikipedia.org/wiki/Algebraic_data_type).  Whenever you have a
state machine, you can encode the state as a variant and use a
pattern match to handle the different cases.  The compiler checks whether your pattern match is complete
(contains a line for each member of the variant).  Another important aspect of
functional programming is that you can pass functions to other functions
(*higher-order functions*).  Also, *recursion* is fundamental for functional
programming: a function calls itself -- combined with a variant type (such as
`type 'a list = Nil | Cons of 'a * 'a list`) it is trivial to show termination.

*Side effects* make the program interesting, because they
communicate with other systems or humans.  Side effects should be isolated and
explicitly stated (in the type!).  Algorithm and protocol
implementations should not deal with side effects internally, but leave this to an
effectful layer on top of it.  The internal pure functions
(which receive arguments and return values, no other way of communication) inside
preserve [*referential
transparency*](https://en.wikipedia.org/wiki/Referential_transparency_%28computer_science%29).
Modularity helps to separate the concerns.

The holy grail is [declarative programing](https://en.wikipedia.org/wiki/Declarative_programming), write *what*
a program should achieve, not *how to* achieve it (like often done in an imperative language).

OCaml has a object and class system, which I do not use.  OCaml also contains
exceptions (and annoyingly the standard library (e.g. `List.find`) is full of
them), which I avoid as well.  Libraries should not expose any exception (apart from out of memory, a really exceptional situation).  If your
code might end up in an error state (common for parsers which process input
from the network), return a variant type as value (`type ('a, 'b) result = Ok of 'a | Error of 'b`).
That way, the caller has to handle
both the success and failure case explicitly.

## Where to start?

The [OCaml website](https://ocaml.org) contains a [variety of
tutorials](https://ocaml.org/learn/tutorials/) and examples, including
[introductionary
material](https://ocaml.org/learn/tutorials/get_up_and_running.html) how to get
started with a new library.  Editor integration (at least for emacs, vim, and
atom) is via [merlin](https://github.com/the-lambda-church/merlin/wiki)
available.

A very good starting book is [OCaml from the very
beginning](http://ocaml-book.com/) to learn the functional ideas in OCaml (also
its successor [More
OCaml](http://ocaml-book.com/more-ocaml-algorithms-methods-diversions/)).
Another good book is [real world OCaml](https://realworldocaml.org), though it
is focussed around the "core" library (which I do not recommend due to its
huge size).

There are [programming
guidelines](https://ocaml.org/learn/tutorials/guidelines.html), best to re-read
on a regular schedule.  Daniel wrote [guidelines](http://erratique.ch/software/rresult/doc/Rresult.html#usage) how to handle with errors and results.

[Opam](https://opam.ocaml.org) is the OCaml package manager.
The [opam repository](https://opam.ocaml.org/packages/) contains over 1000
libraries.  The quality varies, I personally like the small libraries done by
[Daniel Bünzli](http://erratique.ch/software), as well as our
[nqsb](https://nqsb.io) libraries (see [mirleft org](https://github.com/mirleft)),
[notty](https://github.com/pqwy/notty).
A concise library (not much code),
including tests, documentation, etc. is
[hkdf](https://github.com/hannesm/ocaml-hkdf).  For testing I currently prefer
[alcotest](https://github.com/mirage/alcotest).  For cooperative tasks,
[lwt](https://github.com/ocsigen/lwt) is decent (though it is a bit convoluted by
integrating too many features).

I try to stay away from big libraries such as ocamlnet, core, extlib, batteries.
When I develop a library I do not want to force anyone into using such large
code bases.  Since opam is widely used, distributing libraries became easier,
thus the trend is towards small libraries (such as
[astring](http://erratique.ch/software/astring),
[ptime](http://erratique.ch/software/ptime),
[PBKDF](https://github.com/abeaumont/ocaml-pbkdf), [scrypt](https://github.com/abeaumont/ocaml-scrypt-kdf)).

What is needed?  This depends on your concrete goal.  There are lots of
issues in lots of libraries, the MirageOS project also has a [list of
Pioneer projects](https://github.com/mirage/mirage-www/wiki/Pioneer-Projects) which
would be useful to have.  I personally would like to have a native [simple
authentication and security layer (SASL)](https://tools.ietf.org/html/rfc4422)
implementation in OCaml soon (amongst other things, such as using an [ELF section for
data](https://github.com/mirage/mirage/issues/489),
[strtod](https://github.com/mirage/mirage-platform/issues/118)).

A [dashboard](https://github.com/rudenoise/mirage-dashboard) for MirageOS is
under development, which will hopefully ease tracking of what is being actively
developed within MirageOS.  Because I'm impatient, I setup an [atom
feed](https://github.com/miragebot.private.atom?token=ARh4hnusZ1kC_bQ_Q6_HUzQteEEGTqy8ks61Fm2LwA==)
which watches lots of MirageOS-related repositories.

I hope I gave some insight into OCaml, and why I currently enjoy it.  A longer read on applicability of OCaml is our Usenix 2015 paper
[Not-quite-so-broken TLS: lessons in re-engineering a security protocol
specification and
implementation](https://nqsb.io/nqsbtls-usenix-security15.pdf).  I'm interested in feedback, either via
[twitter](https://twitter.com/h4nnes) or as an issue on the [data repository on
GitHub](https://github.com/hannesm/hannes.nqsb.io/issues).

## Other updates in the MirageOS ecosystem

- Canopy now sends out appropriate [content type](https://github.com/Engil/Canopy/pull/23) HTTP headers
- [mirage-http 2.5.2](https://github.com/mirage/mirage-http/releases/tag/v2.5.2) was released to [opam](https://opam.ocaml.org/packages/mirage-http/mirage-http.2.5.2/) which fixes the resource leak
- regression in [mirage-net-xen 1.6.0](https://github.com/mirage/mirage-net-xen/issues/39), I'm back on 1.4.1
- I stumbled upon [too large crunch for MirageOS](https://github.com/mirage/mirage/issues/396), no solution apart from using a FAT image ([putting the data into an ELF section](https://github.com/mirage/mirage/issues/489) would solve the issue, if anyone is interested in MirageOS, that'd be a great project to start with)
- unrelated, [X.509 0.5.2](https://opam.ocaml.org/packages/x509/x509.0.5.2/) fixes [this bug](https://github.com/mirleft/ocaml-x509/commit/1a1476308d24bdcc49d45c4cd9ef539ca57461d2) in certificate chain construction