diff --git a/Posts/Traceroute b/Posts/Traceroute new file mode 100644 index 0000000..9c24883 --- /dev/null +++ b/Posts/Traceroute @@ -0,0 +1,381 @@ +--- +title: Traceroute +author: hannes +tags: mirageos, protocol +abstract: A MirageOS unikernel which traces the path between itself and a remote host. +--- + +## Traceroute + +Is a diagnostic utility which displays the route and measures transit delays of +packets across an Internet protocol (IP) network. + +```bash +$ doas solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --host=198.167.222.207 + | ___| + __| _ \ | _ \ __ \ +\__ \ ( | | ( | ) | +____/\___/ _|\___/____/ +Solo5: Bindings version v0.6.5 +Solo5: Memory map: 512 MB addressable: +Solo5: reserved @ (0x0 - 0xfffff) +Solo5: text @ (0x100000 - 0x212fff) +Solo5: rodata @ (0x213000 - 0x24bfff) +Solo5: data @ (0x24c000 - 0x317fff) +Solo5: heap >= 0x318000 < stack < 0x20000000 +2020-06-22 15:41:25 -00:00: INF [netif] Plugging into service with mac 76:9b:36:e0:e5:74 mtu 1500 +2020-06-22 15:41:25 -00:00: INF [ethernet] Connected Ethernet interface 76:9b:36:e0:e5:74 +2020-06-22 15:41:25 -00:00: INF [ARP] Sending gratuitous ARP for 10.0.42.2 (76:9b:36:e0:e5:74) +2020-06-22 15:41:25 -00:00: INF [udp] UDP interface connected on 10.0.42.2 +2020-06-22 15:41:25 -00:00: INF [application] 1 10.0.42.1 351us +2020-06-22 15:41:25 -00:00: INF [application] 2 192.168.42.1 1.417ms +2020-06-22 15:41:25 -00:00: INF [application] 3 192.168.178.1 1.921ms +2020-06-22 15:41:25 -00:00: INF [application] 4 88.72.96.1 16.716ms +2020-06-22 15:41:26 -00:00: INF [application] 5 * +2020-06-22 15:41:27 -00:00: INF [application] 6 92.79.215.112 16.794ms +2020-06-22 15:41:27 -00:00: INF [application] 7 145.254.2.215 21.305ms +2020-06-22 15:41:27 -00:00: INF [application] 8 145.254.2.217 22.05ms +2020-06-22 15:41:27 -00:00: INF [application] 9 195.89.99.1 21.088ms +2020-06-22 15:41:27 -00:00: INF [application] 10 62.115.9.133 20.105ms +2020-06-22 15:41:27 -00:00: INF [application] 11 213.155.135.82 30.861ms +2020-06-22 15:41:27 -00:00: INF [application] 12 80.91.246.200 30.716ms +2020-06-22 15:41:27 -00:00: INF [application] 13 80.91.253.163 28.315ms +2020-06-22 15:41:27 -00:00: INF [application] 14 62.115.145.27 30.436ms +2020-06-22 15:41:27 -00:00: INF [application] 15 80.67.4.239 42.826ms +2020-06-22 15:41:27 -00:00: INF [application] 16 80.67.10.147 47.213ms +2020-06-22 15:41:27 -00:00: INF [application] 17 198.167.222.207 48.598ms +Solo5: solo5_exit(0) called +``` + +This means with a traceroute utility you can investigate which route is taken +to a destination host, and what the round trip time(s) on the path are. The +sample output above is taken from a virtual machine on my laptop to the remote +host 198.167.222.207. You can see there are 17 hops between us, with the first +being my laptop with a tiny round trip time of 351us, the second and third are +using private IP addresses, and are my home network. The round trip time of the +fourth hop is much higher, this is the first hop on the other side of my DSL +modem. You can see various hops on the public Internet: the packets pass from +my Internet provider's backbone across some exchange points to the destination +Internet provider somewhere in Sweden. + +The implementation of traceroute relies mainly on the time-to-live (ttl) field +(in IPv6 lingua it is "hop limit") of IP packets, which is meant to avoid route +cycles that would infinitely forward IP packets in circles. Every router, when +forwarding an IP packet, first checks that the ttl field is greater than zero, +and then forwards the IP packet where the ttl is decreased by one. If the ttl +field is zero, instead of forwarding, an ICMP time exceeded packet is sent back +to the source. + +Traceroute works by exploiting this mechanism: a series of IP packets with +increasing ttls is sent to the destination. Since upfront the length of the +path is unknown, it is a reactive system: first send an IP packet with a ttl of +one, if a ICMP time exceeded packet is returned, send an IP packet with a ttl of +two, etc. -- until an ICMP packet of type destination unreachable is received. +Since some hosts do not reply with a time exceeded message, it is crucial for +not getting stuck to use a timeout for each packet: when the timeout is reached, +an IP packet with an increased ttl is sent and an unknown for the ttl is +printed (see the fifth hop in the example above). + +The packets send out are conventionally UDP packets without payload. From a +development perspective, one question is how to correlate the ICMP packet +with the sent UDP packet. Conveniently, ICMP packets contain the IP header and +the first eight bytes of the next protocol - the UDP header containing source +port, destination port, checksum, and payload length (each fields of size two +bytes). This means when we record the outgoing ports together with the sent +timestamp, and correlate the later received ICMP packet to the sent packet. +Great. + +But as a functional programmer, let's figure whether we can abolish the +(globally shared) state. Since the ICMP packet contains the original IP +header and the first eight bytes of the UDP header, this is where we will +embed data. As described above, the data is the sent timestamp and the value +of the ttl field. For the latter, we can arbitrarily restrict it to 31 (5 bits). +For the timestamp, it is mainly a question about precision and maximum expected +round trip time. Taking the source and destination port are 32 bits, using 5 for +ttl, remaining are 27 bits (an unsigned value up to 134217727). Looking at the +decimal representation, 1 second is likely too small, 13 seconds are sufficient +for the round trip time measurement. This implies our precision is 100ns, by +counting the digits. + +Finally to the code. First we need forth and back conversions between ports +and ttl, timestamp: + +```OCaml +(* takes a time-to-live (int) and timestamp (int64, nanoseconda), encodes them + into 16 bit source port and 16 bit destination port: + - the timestamp precision is 100ns (thus, it is divided by 100) + - use the bits 27-11 of the timestamp as source port + - use the bits 11-0 as destination port, and 5 bits of the ttl +*) +let ports_of_ttl_ts ttl ts = + let ts = Int64.div ts 100L in + let src_port = 0xffff land (Int64.(to_int (shift_right ts 11))) + and dst_port = 0xffe0 land (Int64.(to_int (shift_left ts 5))) lor (0x001f land ttl) + in + src_port, dst_port + +(* inverse operation of ports_of_ttl_ts for the range (src_port and dst_port + are 16 bit values) *) +let ttl_ts_of_ports src_port dst_port = + let ttl = 0x001f land dst_port in + let ts = + let low = Int64.of_int (dst_port lsr 5) + and high = Int64.(shift_left (of_int src_port) 11) + in + Int64.add low high + in + let ts = Int64.mul ts 100L in + ttl, ts +``` + +They should be inverse over the range of valid input: ports are 16 bit numbers, +ttl expected to be at most 31, ts a int64 expressed in nanoseconds. + +Related is the function to print out one hop and round trip measurement: + +```OCaml +(* write a log line of a hop: the number, IP address, and round trip time *) +let log_one now ttl sent ip = + let now = Int64.(mul (logand (div now 100L) 0x7FFFFFFL) 100L) in + let duration = Mtime.Span.of_uint64_ns (Int64.sub now sent) in + Logs.info (fun m -> m "%2d %a %a" ttl Ipaddr.V4.pp ip Mtime.Span.pp duration) +``` + +The most logic is when a ICMP packet is received: + +```OCaml +module Icmp = struct + type t = { + send : int -> unit Lwt.t ; + log : int -> int64 -> Ipaddr.V4.t -> unit ; + task_done : unit Lwt.u ; + } + + let connect send log task_done = + let t = { send ; log ; task_done } in + Lwt.return t + + (* This is called for each received ICMP packet. *) + let input t ~src ~dst buf = + let open Icmpv4_packet in + (* Decode the received buffer (the IP header has been cut off already). *) + match Unmarshal.of_cstruct buf with + | Error s -> + Lwt.fail_with (Fmt.strf "ICMP: error parsing message from %a: %s" Ipaddr.V4.pp src s) + | Ok (message, payload) -> + let open Icmpv4_wire in + (* There are two interesting cases: Time exceeded (-> send next packet), + and Destination (port) unreachable (-> we reached the final host and can exit) *) + match message.ty with + | Time_exceeded -> + (* Decode the payload, which should be an IPv4 header and a protocol header *) + begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with + | Ok (pkt, off) when + (* Ensure this packet matches our sent packet: the protocol is UDP + and the destination address is the host we're tracing *) + pkt.Ipv4_packet.proto = Ipv4_packet.Marshal.protocol_to_int `UDP && + Ipaddr.V4.compare pkt.Ipv4_packet.dst (Key_gen.host ()) = 0 -> + let src_port = Cstruct.BE.get_uint16 payload off + and dst_port = Cstruct.BE.get_uint16 payload (off + 2) + in + (* Retrieve ttl and sent timestamp, encoded in the source port and + destination port of the UDP packet we sent, and received back as + ICMP payload. *) + let ttl, sent = ttl_ts_of_ports src_port dst_port in + (* Log this hop. *) + t.log ttl sent src; + (* Sent out the next UDP packet with an increased ttl. *) + let ttl' = succ ttl in + Logs.debug (fun m -> m "ICMP time exceeded from %a to %a, now sending with ttl %d" + Ipaddr.V4.pp src Ipaddr.V4.pp dst ttl'); + t.send ttl' + | Ok (pkt, _) -> + (* Some stray ICMP packet. *) + Logs.debug (fun m -> m "unsolicited time exceeded from %a to %a (proto %X dst %a)" + Ipaddr.V4.pp src Ipaddr.V4.pp dst pkt.Ipv4_packet.proto Ipaddr.V4.pp pkt.Ipv4_packet.dst); + Lwt.return_unit + | Error e -> + (* Decoding error. *) + Logs.warn (fun m -> m "couldn't parse ICMP time exceeded payload (IPv4) (%a -> %a) %s" + Ipaddr.V4.pp src Ipaddr.V4.pp dst e); + Lwt.return_unit + end + | Destination_unreachable when Ipaddr.V4.compare src (Key_gen.host ()) = 0 -> + (* We reached the final host, and the destination port was not listened to *) + begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with + | Ok (_, off) -> + let src_port = Cstruct.BE.get_uint16 payload off + and dst_port = Cstruct.BE.get_uint16 payload (off + 2) + in + (* Retrieve ttl and sent timestamp. *) + let ttl, sent = ttl_ts_of_ports src_port dst_port in + (* Log the final hop. *) + t.log ttl sent src; + (* Wakeup the waiter task to exit the unikernel. *) + Lwt.wakeup t.task_done (); + Lwt.return_unit + | Error e -> + (* Decoding error. *) + Logs.warn (fun m -> m "couldn't parse ICMP unreachable payload (IPv4) (%a -> %a) %s" + Ipaddr.V4.pp src Ipaddr.V4.pp dst e); + Lwt.return_unit + end + | ty -> + Logs.debug (fun m -> m "ICMP unknown ty %s from %a to %a: %a" + (ty_to_string ty) Ipaddr.V4.pp src Ipaddr.V4.pp dst + Cstruct.hexdump_pp payload); + Lwt.return_unit +end +``` + +Now, the remaining main unikernel is the module `Main`: + +```OCaml +module Main (R : Mirage_random.S) (M : Mirage_clock.MCLOCK) (Time : Mirage_time.S) (N : Mirage_net.S) = struct + module ETH = Ethernet.Make(N) + module ARP = Arp.Make(ETH)(Time) + module IPV4 = Static_ipv4.Make(R)(M)(ETH)(ARP) + module UDP = Udp.Make(IPV4)(R) + + (* Global mutable state: the timeout task for a sent packet. *) + let to_cancel = ref None + + (* Send a single packet with the given time to live. *) + let rec send_udp udp ttl = + (* This is called by the ICMP handler which successfully received a + time exceeded, thus we cancel the timeout task. *) + (match !to_cancel with + | None -> () + | Some t -> Lwt.cancel t ; to_cancel := None); + (* Our hop limit is 31 - 5 bit - should be sufficient for most networks. *) + if ttl > 31 then + Lwt.return_unit + else + (* Create a timeout task which: + - sleeps for --timeout interval + - logs an unknown hop + - sends another packet with increased ttl + *) + let cancel = + Lwt.catch (fun () -> + Time.sleep_ns (Duration.of_ms (Key_gen.timeout ())) >>= fun () -> + Logs.info (fun m -> m "%2d *" ttl); + send_udp udp (succ ttl)) + (function Lwt.Canceled -> Lwt.return_unit | exc -> Lwt.fail exc) + in + (* Assign this timeout task. *) + to_cancel := Some cancel; + (* Figure out which source and destination port to use, based on ttl + and current timestamp. *) + let src_port, dst_port = ports_of_ttl_ts ttl (M.elapsed_ns ()) in + (* Send packet via UDP. *) + UDP.write ~ttl ~src_port ~dst:(Key_gen.host ()) ~dst_port udp Cstruct.empty >>= function + | Ok () -> Lwt.return_unit + | Error e -> Lwt.fail_with (Fmt.strf "while sending udp frame %a" UDP.pp_error e) + + (* The main unikernel entry point. *) + let start () () () net = + let cidr = Key_gen.ipv4 () + and gateway = Key_gen.ipv4_gateway () + in + let log_one = fun port ip -> log_one (M.elapsed_ns ()) port ip + (* Create a task to wait on and a waiter to wakeup. *) + and t, w = Lwt.task () + in + (* Setup network stack: ethernet, ARP, IPv4, UDP, and ICMP. *) + ETH.connect net >>= fun eth -> + ARP.connect eth >>= fun arp -> + IPV4.connect ~cidr ~gateway eth arp >>= fun ip -> + UDP.connect ip >>= fun udp -> + let send = send_udp udp in + Icmp.connect send log_one w >>= fun icmp -> + + (* The callback cascade for an incoming network packet. *) + let ethif_listener = + ETH.input + ~arpv4:(ARP.input arp) + ~ipv4:( + IPV4.input + ~tcp:(fun ~src:_ ~dst:_ _ -> Lwt.return_unit) + ~udp:(fun ~src:_ ~dst:_ _ -> Lwt.return_unit) + ~default:(fun ~proto ~src ~dst buf -> + match proto with + | 1 -> Icmp.input icmp ~src ~dst buf + | _ -> Lwt.return_unit) + ip) + ~ipv6:(fun _ -> Lwt.return_unit) + eth + in + (* Start the callback in a separate asynchronous task. *) + Lwt.async (fun () -> + N.listen net ~header_size:Ethernet_wire.sizeof_ethernet ethif_listener >|= function + | Ok () -> () + | Error e -> Logs.err (fun m -> m "netif error %a" N.pp_error e)); + (* Send the initial UDP packet with a ttl of 1. This entails the domino + effect to receive ICMP packets, send out another UDP packet with ttl + increased by one, etc. - until a destination unreachable is received, + or the hop limit is reached. *) + send 1 >>= fun () -> + t +end +``` + +The configuration (`config.ml`) for this unikernel is as follows: + +```OCaml +open Mirage + +let host = + let doc = Key.Arg.info ~doc:"The host to trace." ["host"] in + Key.(create "host" Arg.(opt ipv4_address (Ipaddr.V4.of_string_exn "141.1.1.1") doc)) + +let timeout = + let doc = Key.Arg.info ~doc:"Timeout (in millisecond)" ["timeout"] in + Key.(create "timeout" Arg.(opt int 1000 doc)) + +let ipv4 = + let doc = Key.Arg.info ~doc:"IPv4 address" ["ipv4"] in + Key.(create "ipv4" Arg.(required ipv4 doc)) + +let ipv4_gateway = + let doc = Key.Arg.info ~doc:"IPv4 gateway" ["ipv4-gateway"] in + Key.(create "ipv4-gateway" Arg.(required ipv4_address doc)) + +let main = + let packages = [ + package ~sublibs:["ipv4"; "udp"; "icmpv4"] "tcpip"; + package "ethernet"; + package "arp-mirage"; + package "mirage-protocols"; + package "mtime"; + ] in + foreign + ~keys:[Key.abstract ipv4 ; Key.abstract ipv4_gateway ; Key.abstract host ; Key.abstract timeout] + ~packages + "Unikernel.Main" + (random @-> mclock @-> time @-> network @-> job) + +let () = + register "traceroute" + [ main $ default_random $ default_monotonic_clock $ default_time $ default_network ] +``` + +And voila, that's all the code. If you copy it together (or download the two +files from [the GitHub repository](https://github.com/roburio/traceroute)), +and have OCaml, opam, and [mirage (>= 3.8.0)](https://mirage.io/wiki/install) installed, +you should be able to: +```bash +$ mirage configure -t hvt +$ make depend +$ make +$ solo5-hvt --net:service=tap0 -- traceroute.hvt ... +... get the output shown at top ... +``` + +Enhancements may be to use a different protocol (TCP? or any other protocol ID (may be used to encode more information), encode data into IPv4 ID, or the full 8 bytes of the upper protocol), encrypt/authenticate the data transmitted (and verify it has not been tampered with in the ICMP reply), improve error handling and recovery, send multiple packets for improved round trip time measurements, ... + +If you develop enhancements you'd like to share, please sent a pull request to the git repository. + +Motivation for this traceroute unikernel was while talking with [Aaron](https://twitter.com/networkservice) and [Paul](https://github.com/phaer), who contributed several patches to the IP stack which pass the ttl through. + +If you want to support our work on MirageOS unikernels, please [donate to robur](https://robur.coop/Donate). I'm interested in feedback, either via [twitter](https://twitter.com/h4nnes), [hannesm@mastodon.social](https://mastodon.social/@hannesm) or an issue on the [data repository](https://github.com/hannesm/hannes.nqsb.io/issues).