## BAD RECORD MAC
Roughly 2 weeks ago, a user informed me that a TLS alert pops up in their browser sometimes when they read this website. Their browser reported that the [message authentication code](https://en.wikipedia.org/wiki/Message_authentication_code) was wrong. From [RFC 5246](https://tools.ietf.org/html/rfc5246): "This message is always fatal and should never be observed in communication between proper implementations (except when messages were corrupted in the network)."
I tried hard but could not reproduce it; still, I was very worried and eager to find the root cause (some little fear remained that it was in our TLS stack). I set up this website with some TLS-level tracing (extending the code from our [TLS handshake server](https://tls.openmirage.org)). We tried to reproduce the issue, with traces and packet captures in place (both on the client and the server side), from our computer labs office, with no success. Later, the user tried from their home and, after 45MB of wire data, ran into the issue. Finally, evidence! Isolating the TCP flow with the alert resulted in just about 200KB of packet capture data (the TLS ASCII trace is around 650KB).
![encrypted alert](https://www.cl.cam.ac.uk/~hm519/encrypted-alert.png)
What is happening on the wire? After some data is successfully transferred, at some point the client sends an encrypted alert (see above). The TLS session used an RSA key exchange, so I could decrypt the TLS stream with Wireshark, which revealed that the alert was indeed a bad record MAC. Wireshark's "follow SSL stream" showed all client requests, but not all server responses. The TLS-level trace from the server showed properly encrypted data. I tried to spot the TCP payload which caused the bad record MAC, starting from the alert in the client capture (the offending TCP frame should be shortly before the alert).
![client TCP frame](https://www.cl.cam.ac.uk/~hm519/tcp-frame-client.png)
There is plaintext data which looks like an HTTP request in the TCP frame sent by the server to the client? WTF? This should never happen! The same TCP frame on the server side looked even stranger: it had an invalid checksum.
![server TCP frame](https://www.cl.cam.ac.uk/~hm519/tcp-frame-server.png)
What do we have so far? We spotted some plaintext data in a TCP frame which is part of a TLS session. The TCP checksum is invalid.
This at least explains why we were not able to reproduce from our office: usually, TCP frames with invalid checksums are dropped by the receiving TCP stack, and the sender will retransmit TCP frames which have not been acknowledged by the recipient. However, this mechanism only works as long as no well-meaning middleman rewrites an invalid checksum into a correct one! Our traces are from a client behind a router doing [network address translation](https://en.wikipedia.org/wiki/Network_address_translation), which has to recompute the [TCP checksum](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation) because it modifies the destination IP address and port. It seems this specific router does not validate the TCP checksum before recomputing it, so it replaced the invalid TCP checksum with a valid one.
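To make the checksum handling concrete, here is a minimal sketch of the [RFC 1071](https://tools.ietf.org/html/rfc1071) one's complement checksum that TCP uses; it is illustrative only, not mirage-tcpip's actual implementation (which is linked further below). The input is assumed to be the IPv4 pseudo-header (source and destination address, protocol, TCP length) followed by the TCP header and payload:
```OCaml
(* Minimal sketch of the RFC 1071 Internet checksum; illustrative only,
   not mirage-tcpip's implementation. [data] is the IPv4 pseudo-header
   followed by the TCP header and payload. *)
let ones_complement_sum data =
  let len = Bytes.length data in
  let sum = ref 0 and i = ref 0 in
  (* sum the data as big-endian 16-bit words *)
  while !i + 1 < len do
    sum := !sum + (Char.code (Bytes.get data !i) lsl 8)
                + Char.code (Bytes.get data (!i + 1));
    i := !i + 2
  done;
  (* an odd trailing byte is treated as if padded with a zero byte *)
  if len land 1 = 1 then
    sum := !sum + (Char.code (Bytes.get data (len - 1)) lsl 8);
  (* fold the carries back into the low 16 bits *)
  while !sum lsr 16 <> 0 do
    sum := (!sum land 0xffff) + (!sum lsr 16)
  done;
  lnot !sum land 0xffff
```
Applied to an intact segment with the checksum field left in place, this function returns 0; that is exactly the validation step this router apparently skips before it recomputes and rewrites the field.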
Next steps: what did the TLS layer intend to send? And why was a TCP frame with an invalid checksum emitted?
Looking into the TLS trace, the TCP payload in question should have started with:
```
0090 95 38 FD AD 4A D6 35 E4 66 86 6E 03 AF 2C C9 00 .8..J.5. f.n..,..
```
The ethernet (14 bytes), IP (20 bytes), and TCP (20 bytes) headers add up to 54 (0x36) bytes, thus we have to compare starting at 0x0036 in the screenshot above. The first 74 bytes (till 0x007F in the screenshot, 0x0049 in the text dump) are very much the same, but then they diverge (for another 700 bytes).
I manually computed the TCP checksum using the TCP/IP payload from the TLS trace, and it matches the one reported as invalid. Thus, a big relief: both the TLS and the TCP/IP stack have used the correct data. Our memory disclosure issue must occur after the [TCP checksum is computed](https://github.com/mirage/mirage-tcpip/blob/1617953b4674c9c832786c1ab3236b91d00f5c25/tcp/wire.ml#L78). After this:
* the [IP frame header is filled](https://github.com/mirage/mirage-tcpip/blob/1617953b4674c9c832786c1ab3236b91d00f5c25/lib/ipv4.ml#L98)
* the [MAC addresses are put](https://github.com/mirage/mirage-tcpip/blob/1617953b4674c9c832786c1ab3236b91d00f5c25/lib/ipv4.ml#L126) into the ethernet frame
* the frame is then passed to [mirage-net-xen for sending](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L538) via the Xen hypervisor. (As mentioned [earlier](https://hannes.nqsb.io/Posts/OCaml) I'm still using mirage-net-xen release 1.4.1.)
Communication with the Xen hypervisor is done via shared memory. The memory is allocated by mirage-net-xen, which then grants access to the hypervisor using [Xen grant tables](http://wiki.xen.org/wiki/Grant_Table). The TX protocol is implemented [here in mirage-net-xen](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L84-L132), which includes allocation of a [ring buffer](https://github.com/mirage/shared-memory-ring/blob/2955bf502c79bc963a02d090481b0e8958cc0c49/lwt/lwt_ring.mli). The TX protocol also covers writing requests and waiting for responses, both of which are identified using a 16-bit integer. When a response has arrived from the hypervisor, the respective page is returned into the pool of [shared pages](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L134-L213), to be reused by the next packet to be transmitted.
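As a toy model of this bookkeeping, consider the sketch below; the `Pool` module and its functions are invented for illustration (this is not mirage-net-xen's API), and it assumes what the protocol relies on: one identifier per buffer in flight.
```OCaml
(* Toy model of the TX bookkeeping: a request takes a free block and
   tags it with an id; the response carrying that id releases exactly
   that block. Invented for illustration, not mirage-net-xen's API. *)
module Pool = struct
  type t = {
    mutable free : Bytes.t list;           (* blocks ready for reuse *)
    in_flight : (int, Bytes.t) Hashtbl.t;  (* id -> block awaiting a response *)
  }

  let create n =
    { free = List.init n (fun _ -> Bytes.create 2048);
      in_flight = Hashtbl.create n }

  (* write a request: take a free block and remember it under [id] *)
  let alloc t ~id =
    match t.free with
    | [] -> None
    | b :: rest ->
      t.free <- rest;
      Hashtbl.replace t.in_flight id b;
      Some b

  (* the response for [id] arrived: the block may now be reused *)
  let release t ~id =
    match Hashtbl.find_opt t.in_flight id with
    | Some b -> Hashtbl.remove t.in_flight id; t.free <- b :: t.free
    | None -> ()
end
```
As long as every block in flight carries its own id, a response can only ever release the block it actually refers to.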
Instead of a whole page (4096 bytes) per request/response, each page is [split into two blocks](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L194-L198) (since the most common [MTU](https://en.wikipedia.org/wiki/Maximum_transmission_unit) for ethernet is 1500 bytes). The [identifier in use](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L489-L490) is the grant reference, which is unique per page, but not per block.
Thus, when two blocks of the same page are requested to be sent, the first [polled response](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L398) will immediately [release](https://github.com/mirage/mirage-net-xen/blob/541e86f53cb8cf426aabdd7f090779fc5ea9fe93/lib/netif.ml#L182-L185) both into the list of free blocks. When another packet is sent, the block still waiting in the ring buffer to be transmitted can be reused. This leads to corrupt data being sent.
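Here is a self-contained sketch of that failure mode; the names and the list-based bookkeeping are invented (this is not the actual mirage-net-xen code), but it shows how a release keyed by the grant reference alone frees the half of the page that is still queued:
```OCaml
(* Sketch of the flaw: each 4096-byte page is split into two blocks
   which share the page's grant reference, and a response releases
   every block carrying that reference. Invented names, not the real
   mirage-net-xen code. *)
type block = { gref : int; half : int; data : Bytes.t }

let () =
  let gref = 42 in
  let block0 = { gref; half = 0; data = Bytes.create 2048 }
  and block1 = { gref; half = 1; data = Bytes.create 2048 } in
  (* both halves are queued in the TX ring *)
  let in_flight = ref [ block0; block1 ] and free = ref [] in
  (* dom0 acknowledges block0, but the release is keyed by gref only,
     so block1 is freed as well -- although it was never transmitted *)
  let released, queued =
    List.partition (fun b -> b.gref = gref) !in_flight in
  in_flight := queued;
  free := released @ !free;
  (* the next packet grabs a "free" block and may get block1, whose
     previous contents are still waiting in the ring *)
  let reused = List.find (fun b -> b.half = 1) !free in
  Bytes.blit_string "next packet's data" 0 reused.data 0 18;
  Printf.printf "overwrote still-queued half of grant %d\n" reused.gref
```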
The fix was already applied [back in December](https://github.com/mirage/mirage-net-xen/commit/47de2edfad9c56110d98d0312c1a7e0b9dcc8fbf) to the master branch of mirage-net-xen, and has now been [backported to the 1.4 branch](https://github.com/mirage/mirage-net-xen/pull/40/commits/ec9b1046b75cba5ae3473b2d3b223c3d1284489d). In addition, a patch to [avoid collisions on the receiving side](https://github.com/mirage/mirage-net-xen/commit/0b1e53c0875062a50e2d5823b7da0d8e0a64dc37) has been applied to both branches (and released in versions 1.4.2 and 1.6.1, respectively).
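In terms of the sketch above, the remedy corresponds to identifying an in-flight block by the pair of grant reference and half rather than by the grant reference alone, so a response can only release the block it belongs to (the real patches, linked above, restructure the identifier handling inside mirage-net-xen):
```OCaml
(* Sketch of the remedy, reusing the [block] type from above: release
   selects on (grant reference, half), so only the acknowledged block
   returns to the free list. *)
let release_one in_flight ~gref ~half =
  (* returns (released blocks, blocks still in flight) *)
  List.partition (fun b -> b.gref = gref && b.half = half) in_flight
```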
What can we learn from this? Read the interface documentation (if there is any), and make sure unique identifiers are really unique. Think about the lifecycle of pieces of memory.
We have seen plain data in a TLS encrypted stream. The plain data was intended to be sent to dom0 for logging accesses to the webserver. The [same code](https://github.com/mirleft/btc-pinata/blob/master/logger.ml) is used in our [Piñata](http://ownme.ipredator.se), thus the leaked data might have been yours (although I tried hard and couldn't get the Piñata to leak data).
---