<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Exfiltrating log data using syslog</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="Exfiltrating log data using syslog" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="post"><h2>Exfiltrating log data using syslog</h2><span class="author">Written by hannes</span><br/><div class="tags">Classified under: <a href="/tags/mirageos" class="tag">mirageos</a><a href="/tags/protocol" class="tag">protocol</a><a href="/tags/logging" class="tag">logging</a></div><span class="date">Published: 2016-11-05 (last updated: 2024-10-11)</span><article><p>It has been a while since my last entry... I've been busy working on too many
projects in parallel, and was also travelling on several continents. I hope to
get back to a biweekly cycle.</p>
<h2 id="what-is-syslog">What is syslog?</h2>
<p>According to <a href="https://en.wikipedia.org/wiki/Syslog">Wikipedia</a>, syslog is a
standard for message logging. It permits separating the software which generates
messages from the systems which store, report, and analyse them. A syslog message
contains at least a timestamp, a facility, and a severity. It was initially
specified in <a href="https://tools.ietf.org/html/rfc3164">RFC 3164</a>, though usage predates this RFC.</p>
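<p>As a rough illustration of how facility and severity travel on the wire: RFC
3164 packs both into a single priority value, facility * 8 + severity, which is
written in angle brackets at the start of each message (for example
<code>&lt;14&gt;</code>). The following is a minimal sketch using only the OCaml
standard library. The function names are hypothetical, and this is not how the
syslog-message library implements its encoding.</p>
<pre><code class="language-OCaml">(* Sketch only: the RFC 3164 priority value combines facility and severity.
   Function names are made up for illustration. *)

(* Numeric codes defined by RFC 3164. *)
let user_facility = 1   (* user-level messages *)
let informational = 6   (* informational severity *)

(* PRI = facility * 8 + severity *)
let priority ~facility ~severity = (facility * 8) + severity

(* A collector recovers both parts with division and remainder. *)
let facility_of pri = pri / 8
let severity_of pri = pri mod 8

let () =
  let pri = priority ~facility:user_facility ~severity:informational in
  Printf.printf "PRI %d, facility %d, severity %d\n"
    pri (facility_of pri) (severity_of pri)
</code></pre>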
<p>For a unikernel, which likely won't have any persistent storage, syslog is a way
to emit log messages (HTTP access log, debug messages, ...) via the network, and
defer the persistence problem to some other service.</p>
<p>Lots of programming languages have logger libraries, which roughly reflect the
different syslog severities as log levels (debug, informational, warning, error,
fatal). OCaml has had one since the beginning of 2016, the
<a href="http://erratique.ch/software/logs">Logs</a> library. It separates log message
generation from reporting: the closure producing the log string is only
evaluated if there is a reporter which needs to send it out. Additionally, the
reporter can extend the message with the log source name, a timestamp, etc.</p>
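<p>The deferred evaluation can be illustrated with a toy model. This is a minimal
sketch using only the standard library, not the Logs API: the names
<code>level</code>, <code>current</code>, <code>report</code>, and <code>log</code>
are hypothetical, and the real library additionally has log sources, format
strings, and pluggable reporters.</p>
<pre><code class="language-OCaml">(* Toy model of deferred log message construction. Not the Logs API:
   all names here are hypothetical. *)

type level = Error | Warning | Info | Debug

(* Lower rank means more severe. *)
let rank = function Error -> 0 | Warning -> 1 | Info -> 2 | Debug -> 3

(* Messages at this severity or more severe are reported. *)
let current = ref Warning

(* Stand-in for a reporter: collects the emitted lines. *)
let reported = ref []
let report line = reported := line :: !reported

(* The message is passed as a closure, which only runs if the level is
   enabled; a disabled message costs a comparison, no string building. *)
let log lvl (msg : unit -> string) =
  if rank !current >= rank lvl then report (msg ())

(* Counts how often the message closure is actually evaluated. *)
let evaluations = ref 0
let expensive () = incr evaluations; "state dump: ..."

let () =
  log Debug expensive;   (* Debug is disabled: the closure never runs *)
  log Error expensive;   (* Error is enabled: the closure runs once *)
  Printf.printf "closure evaluated %d time(s)\n" !evaluations
</code></pre>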
<p>The Logs library is slowly getting adopted by the MirageOS community (you can
see an incomplete list
<a href="https://opam.ocaml.org/packages/logs/logs.0.6.2/">here</a>). There are reporters
available which integrate into the <a href="https://github.com/mirage/ocaml-asl">Apple System
Log</a>, the <a href="https://github.com/djs55/ocaml-win-eventlog">Windows event
log</a>, and the <a href="https://github.com/mirage/mirage-logs">MirageOS
console</a>. There is a command-line
argument interface to set the log levels of your individual sources, which is
pretty neat. For debugging and running on Unix, console output is usually
sufficient, but for production usage, having the console in some <code>screen</code> or <code>tmux</code>
session, or dumped to a file, is usually annoying.</p>
<p>Luckily, there was already the
<a href="https://github.com/verbosemode/syslog-message">syslog-message</a> library, which
converts syslog messages between the wire format and a typed
representation. I plugged the two together and <a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog.html">implemented a
reporter</a>. The
<a href="https://github.com/hannesm/logs-syslog/blob/e35ffe704e998d9a6867f3f504c103861a4408ef/src/logs_syslog_unix.ml#L4-L32">simplest
one</a>
emits each log message via UDP to a log collector. All reporters contain a
socket and handle socket errors themselves (trying to recover): your
application (or unikernel) shouldn't fail just because the log collector is
currently offline.</p>
<p>The setup for Unix is straightforward:</p>
<pre><code class="language-OCaml">Logs.set_reporter (Logs_syslog_unix.udp_reporter (Unix.inet_addr_of_string "127.0.0.1") ())
</code></pre>
<p>It will report all log messages to your local syslog (you have to set the log
level yourself, it defaults to warning). You might already have a collector
listening on your host: look in <code>netstat -an</code> for UDP port 514 (and in your
<code>/etc/syslog.conf</code> to see where log messages are routed to).</p>
<p>You can even do this from the OCaml toplevel (after <code>opam install logs-syslog</code>):</p>
<pre><code class="language-OCaml">$ utop
# #require "logs-syslog.unix";;
# Logs.set_reporter (Logs_syslog_unix.udp_reporter (Unix.inet_addr_of_string "127.0.0.1") ());;
# Logs.app (fun m -> m "hello, syslog world");;
</code></pre>
<p>I configured my syslog to have all <code>informational</code> messages routed to
<code>/var/log/info.log</code>. You can also try <code>Logs.err (fun m -> m "err");;</code> and look
into your <code>/var/log/messages</code>.</p>
<p>This is a good first step, but we want more: on the one hand integration into
MirageOS, and on the other a more reliable log stream (what about authentication
and encryption?). I'll cover both topics in the rest of this article.</p>
<h3 id="mirageos-integration">MirageOS integration</h3>
<p>Since Mirage3, syslog has been integrated (see the
<a href="http://docs.mirageos.org/mirage/Mirage/index.html#type-syslog_config">documentation</a>).
Some additions to your <code>config.ml</code> are needed, see the <a href="https://github.com/hannesm/ns.nqsb.io/blob/master/config.ml">ns
example</a> or the
<a href="https://github.com/mirage/marrakech2017/blob/master/config.ml">marrakech
example</a>.</p>
<pre><code class="language-OCaml">let logger =
  syslog_udp (* or _tcp or _tls *)
    (syslog_config ~truncate:1484 "my_first_unikernel"
       (Ipaddr.V4.of_string_exn "10.0.0.1")) (* your log host *)
    stack

let () =
  register "my_first_unikernel" [
    foreign ~deps:[abstract logger]
    ...
</code></pre>
<h3 id="reliable-syslog">Reliable syslog</h3>
<p>The old BSD syslog RFC is obsoleted by <a href="https://tools.ietf.org/html/rfc5424">RFC
5424</a>, which describes a new wire format;
transports over TCP and <a href="https://tools.ietf.org/html/rfc5425">TLS</a> are
described in subsequent RFCs. Unfortunately the <code>syslog-message</code> library does not yet
support the new format (which supports user-defined structured data (key/value
fields) and Unicode encoding), but I'm sure one day it will.</p>
<p>A competing syslog specification, <a href="https://tools.ietf.org/html/rfc3195">RFC 3195</a>, uses
XML encoding, but I have not bothered to look deeper into that one.</p>
<p>I implemented both the transport via TCP and via TLS. There are various
solutions used for framing (as described in <a href="https://tools.ietf.org/html/rfc6587">RFC
6587</a>): either prepend a decimal encoded
length (also specified in RFC 5425, but it obviously violates streaming
characteristics: the log source needs to have the full message in memory before
sending it out), or have a special delimiter between messages (0 byte, line
feed, CR LF, a custom byte sequence).</p>
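<p>The two framing styles can be sketched as follows. This is a minimal
illustration using only the standard library; the function names are
hypothetical and this is not the code used in logs-syslog. RFC 6587 calls the
first style octet counting (decimal length, a space, then the message) and the
second non-transparent framing.</p>
<pre><code class="language-OCaml">(* Sketch only: two ways to frame syslog messages on a byte stream.
   Function names are made up for illustration. *)

(* Octet counting: decimal length, one space, then the message.
   The whole message must be known before the first byte is sent. *)
let frame_counted msg =
  Printf.sprintf "%d %s" (String.length msg) msg

(* Delimiter framing: the message followed by a trailer such as "\n" or
   "\000". The message itself must not contain the trailer. *)
let frame_delimited ~trailer msg = msg ^ trailer

(* Receiver side for octet counting: split one frame off the front of a
   buffer, returning the message and the remaining bytes. *)
let unframe_counted buf =
  match String.index_opt buf ' ' with
  | None -> None
  | Some sp ->
    match int_of_string_opt (String.sub buf 0 sp) with
    | Some len when len >= 0 ->
      let start = sp + 1 in
      let available = String.length buf - start in
      if available >= len then
        Some (String.sub buf start len,
              String.sub buf (start + len) (available - len))
      else None (* frame incomplete, wait for more data *)
    | _ -> None (* malformed length prefix *)

let () =
  print_string (frame_delimited ~trailer:"\n" "hello");
  print_endline (frame_counted "hello")
</code></pre>
<p>With octet counting the receiver never has to scan the payload, but the
sender pays for it by buffering each message completely, which is the streaming
violation mentioned above.</p>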
<p>The <a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt_tls.html">TLS
reporter</a>
uses our TLS library written entirely in OCaml, and requires mutual
authentication: the log reporter needs a private key and certificate, and the
log collector needs to present a certificate chain rooted in a provided CA
certificate.</p>
<p>Logs supports synchronous and asynchronous logging (where the latter is the
default, please read the <a href="http://erratique.ch/software/logs/doc/Logs.html#sync">note on synchronous
logging</a>). In logs-syslog
this behaviour is not altered. There is no buffer or queue with a single writer
task emitting log messages; instead there is a mutex and error recovery which tries to
reconnect once for each log message (of course only if there is not already a
working connection). It is still not clear to me what the desired behaviour
should be, but when introducing buffers I'd lose the synchronous logging (or
would have to write rather intricate code).</p>
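<p>The reconnect-once policy can be sketched roughly like this. It is a
simplified model using only the standard library, not the actual logs-syslog
code: the <code>transport</code> record and its <code>connect</code> and
<code>write</code> callbacks are hypothetical stand-ins for a real socket, and
the mutex that serialises writers is left out.</p>
<pre><code class="language-OCaml">(* Simplified model of "reconnect at most once per log message".
   Not the logs-syslog implementation: the types and callbacks are
   hypothetical, and the mutex guarding concurrent writers is omitted. *)

type 'c transport = {
  connect : unit -> 'c option;   (* None if the collector is unreachable *)
  write : 'c -> string -> bool;  (* false if the connection broke *)
  conn : 'c option ref;          (* current connection, if any *)
}

(* Returns true if the message was delivered. A failure never raises:
   the message is dropped and the application keeps running. *)
let send t msg =
  (* One reconnection attempt, followed by one write attempt. *)
  let retry () =
    t.conn := t.connect ();
    match !(t.conn) with
    | None -> false
    | Some c ->
      let ok = t.write c msg in
      if not ok then t.conn := None;
      ok
  in
  match !(t.conn) with
  | None -> retry ()
  | Some c -> if t.write c msg then true else retry ()

let () =
  (* Fake collector: down for the first connection attempt, up afterwards. *)
  let attempts = ref 0 in
  let t = {
    connect = (fun () -> incr attempts; if !attempts > 1 then Some () else None);
    write = (fun () _msg -> true);
    conn = ref None;
  } in
  let first = send t "dropped while offline" in
  let second = send t "delivered after reconnect" in
  Printf.printf "first: %b, second: %b\n" first second
</code></pre>
<p>The caller blocks only for the duration of one write and at most one
reconnect, which keeps logging synchronous at the price of dropping messages
while the collector is away.</p>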
<p>To recap, <code>logs-syslog</code> implements the old BSD syslog protocol via UDP, TCP,
and TLS. There are reporters available using only the OCaml
<a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_unix.html">Unix</a> module
(dependency-free!), using
<a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt.html">Lwt</a> (also
<a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_lwt_tls.html">lwt-tls</a>),
and using the <a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_mirage.html">MirageOS
interface</a>
(also
<a href="https://hannesm.github.io/logs-syslog/doc/Logs_syslog_mirage_tls.html">TLS</a>).
The code size is below 500 lines in total.</p>
<h3 id="mirageos-syslog-in-production">MirageOS syslog in production</h3>
<p>As collector I use syslog-ng, which is capable of receiving both the new and the
old syslog messages on all three transports. The configuration snippet for a
BSD syslog TLS collector is as follows:</p>
<pre><code>source s_tls {
  tcp(port(6514)
      tls(peer-verify(require-trusted)
          cert-file("/etc/log/server.pem")
          key-file("/etc/log/server.key")
          ca-dir("/etc/log/certs"))); };

destination d_tls { file("/var/log/ng-tls.log"); };

log { source(s_tls); destination(d_tls); };
</code></pre>
<p>The <code>"/etc/log/certs"</code> directory contains the CA certificates, together with
links to their hashes (with a 0 appended: <code>ln -s cacert.pem `openssl x509 -noout -hash -in cacert.pem`.0</code>). I used
<a href="https://github.com/yomimono/ocaml-certify">certify</a> to generate the CA
infrastructure (CA cert, a server certificate for syslog-ng, and a client
certificate for my MirageOS unikernel).</p>
<p>It has been running like a charm for a week
(already collected 700KB of HTTP access log), and feels much better than
previous ad-hoc solutions to exfiltrate log data.</p>
<p>The downside of syslog is obviously that it only works when the network is up --
thus it does not work while booting, or during a persistent network
failure.</p>
<p><a href="https://github.com/hannesm/logs-syslog">Code is on GitHub</a>, <a href="https://hannesm.github.io/logs-syslog/doc">documentation is
online</a>, released in opam.</p>
<p>I'm interested in feedback, either via
<a href="https://twitter.com/h4nnes">twitter</a> or via eMail.</p>
</article></div></div></main></body></html>