---
title: Exfiltrating log data using syslog
author: hannes
tags: mirageos, protocol, logging
abstract: sometimes preservation of data is useful
---

It has been a while since my last entry... I've been busy working on too many
projects in parallel, and was also travelling on several continents.  I hope to
get back to a biweekly cycle.

## What is syslog?

According to [Wikipedia](https://en.wikipedia.org/wiki/Syslog), syslog is a
standard for message logging.  Syslog permits separation of the software which
generates, stores, reports, and analyses the message.  A syslog message contains
at least a timestamp, a facility, and a severity.  It was initially specified in
[RFC 3164](https://tools.ietf.org/html/rfc3164), though usage predates this RFC.

For a unikernel, which likely won't have any persistent storage, syslog is a way
to emit log messages (HTTP access log, debug messages, ...) via the network, and
defer the persistency problem to some other service.

Lots of programming languages have logger libraries, which reflect the different
severity of syslog roughly as log levels (debug, informational, warning, error,
fatal).  So does OCaml since the beginning of 2016, there is the
[Logs](http://erratique.ch/software/logs) library which separates log message
generation from reporting: the closure producing the log string is only
evaluated if there is a reporter which needs to send it out.  Additionally, the
reporter can extend the message with the log source name, a timestamp, etc.

The Logs library is slowly getting adopted by the MirageOS community (you can
see an incomplete list
[here](https://opam.ocaml.org/packages/logs/logs.0.6.2/)), there are reporters
available which integrate into [Apple System
Log](https://github.com/mirage/ocaml-asl), [Windows event
log](https://github.com/djs55/ocaml-win-eventlog), and also for [MirageOS
console](https://github.com/mirage/mirage-logs).  There is a command-line
argument interface to set the log levels of your individual sources, which is
pretty neat.  For debugging and running on Unix, console output is usually
sufficient, but for production usage having a console in some `screen` or `tmux`
or dumped to a file is usually annoying.

Gladly there was already the
[syslog-message](https://github.com/verbosemode/syslog-message) library, which
encodes and decodes syslog messages from the wire format to a typed
representation.  I plugged those together and [implemented a
reporter](https://hannesm.github.io/logs-syslog/doc/Logs_syslog.html).  The
[simplest
one](https://github.com/hannesm/logs-syslog/blob/e35ffe704e998d9a6867f3f504c103861a4408ef/src/logs_syslog_unix.ml#L4-L32)
emits each log message via UDP to a log collector.  All reporters contain a
socket and handle socket errors themselves (trying to recover) - your
application (or unikernel) shouldn't fail just because the log collector is
currently offline.

The setup for Unix is straightforward:

```OCaml
Logs.set_reporter (udp_reporter (Unix.inet_addr_of_string "127.0.0.1") ())
```

It will report all log messages (you have to set the Log level yourself,
defaults to warning) to your local syslog.  You might have already listening a
collector on your host, look in `netstat -an` for UDP port 514 (and in your
`/etc/syslog.conf` to see where log messages are routed to).

You can even do this from the OCaml toplevel (after `opam install logs-syslog`):
```OCaml
$ utop
# #require "logs-syslog.unix";;
# Logs.set_reporter (Logs_syslog_unix.udp_reporter (Unix.inet_addr_of_string "127.0.0.1") ());;
# Logs.app (fun m -> m "hello, syslog world");;
```

I configured my syslog to have all `informational` messages routed to
`/var/log/info.log`, you can also try `Logs.err (fun m -> m "err");;` and look
into your `/var/log/messages`.

This is a good first step, but we want more: on the one side integration into
MirageOS, and a more reliable log stream (what about authentication and
encryption?).  I'll cover both topics in the rest of this article.

### MirageOS integration

Since Mirage3, syslog is integrated (see
[documentation](http://docs.mirage.io/mirage/Mirage/index.html#type-syslog_config)).
Some additions to your `config.ml` are needed, see [ns
example](https://github.com/hannesm/ns.nqsb.io/blob/master/config.ml) or
[marrakech
example](https://github.com/mirage/marrakech2017/blob/master/config.ml).

```OCaml
let logger =
  syslog_udp (* or _tcp or _tls *)
    (syslog_config ~truncate:1484 "my_first_unikernel"
       (Ipaddr.V4.of_string_exn "10.0.0.1")) (* your log host *)
    stack

let () =
  register "my_first_unikernel" [
    foreign ~deps:[abstract logger]
    ...
```

### Reliable syslog

The old BSD syslog RFC is obsoleted by [RFC
5424](https://tools.ietf.org/html/rfc5424), which describes a new wire format,
and also a transport over TCP, and [TLS](https://tools.ietf.org/html/rfc5425) in
a subsequent RFC.  Unfortunately the `syslog-message` library does not yet
support the new format (which supports user-defined structured data (key/value
fields), and unicode encoding), but I'm sure one day it will.

Another competing syslog [RFC 3195](https://tools.ietf.org/html/rfc3195) uses
XML encoding, but I have not bothered to look deeper into that one.

I implemented both the transport via TCP and via TLS.  There are various
solutions used for framing (as described in [RFC
6587](https://tools.ietf.org/html/rfc6587)): either prepend a decimal encoded
length (also specified in RFC6524, but obviously violates streaming
characteristics: the log source needs to have the full message in memory before
sending it out), or have a special delimiter between messages (0 byte, line
feed, CR LN, a custom byte sequence).

The [TLS
reporter](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt_tls.html)
uses our TLS library written entirely in OCaml, and requires mutual
authentication, both the log reporter has a private key and certificate, and the
log collector needs to present a certificate chain rooted in a provided CA
certificate.

Logs supports synchronous and asynchronous logging (where the latter is the
default, please read the [note on synchronous
logging](http://erratique.ch/software/logs/doc/Logs.html#sync)).  In logs-syslog
this behaviour is not altered.  There is no buffer or queue and single writer
task to emit log messages, but a mutex and error recovery which tries to
reconnect once for each log message (of course only if there is not already a
working connection).  It is still not clear to me what the desired behaviour
should be, but when introducing buffers I'd loose the synchronous logging (or
will have to write rather intricate code).

To rewrap, `logs-syslog` implements the old BSD syslog protocol via UDP, TCP,
and TLS.  There are reporters available using only the Caml
[Unix](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_unix.html) module
(dependency-free!), using
[Lwt](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt.html) (also
[lwt-tls](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt_tls.html),
and using [MirageOS
interface](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_mirage.html)
(also
[TLS](https://hannesm.github.io/logs-syslog/doc/Logs_syslog_mirage_tls.html)).
The code size is below 500 lines in total.

### MirageOS syslog in production

As collector I use syslog-ng, which is capable of receiving both the new and the
old syslog messages on all three transports.  The configuration snippet for a
BSD syslog TLS collector is as following:

```
source s_tls {
  tcp(port(6514)
      tls(peer-verify(require-trusted)
          cert-file("/etc/log/server.pem")
          key-file("/etc/log/server.key")
          ca-dir("/etc/log/certs"))); };

destination d_tls { file("/var/log/ng-tls.log"); };

log { source(s_tls); destination(d_tls); };
```

The `"/etc/log/certs"` directory contains the CA certificates, together with
links to their hashes (with a 0 appended: ``ln -s cacert.pem `openssl x509
-noout -hash -in cacert.pem`.0``).  I used
[certify](https://github.com/yomimono/ocaml-certify) to generate the CA
infrastructure (CA cert, a server certificate for syslog-ng, and a client
certificate for my MirageOS unikernel).

It is running since a week like a
charm (already collected 700KB of HTTP access log), and feels much better than
previous ad-hoc solutions to exfiltrate log data.

The downside of syslog is obviously that it only works when the network is up --
thus it does not work while booting, or when a persistent network failure
occured.

[Code is on GitHub](https://github.com/hannesm/logs-syslog), [documentation is
online](https://hannesm.github.io/logs-syslog/doc), released in opam.

I'm interested in feedback, either via
[twitter](https://twitter.com/h4nnes) or as an issue on the [data repository on
GitHub](https://github.com/hannesm/hannes.nqsb.io/issues).