As discordian folk celebrated Jake Day yesterday, decided that I've had it with random hanging userspace state-machines, stuck forever with tcp connections that are not legitimately dead, just waiting on both sides.
And since pretty much every tool can handle transient connection failures and reconnects, decided to come up with some simple and robust-enough solution to break such links without (or rather before) patching all the apps to behave.
I've used such technique to test some twisted-things in the past, so it was easy to dig scapy-automata code for doing that, though the real trick is not to craft/send FIN or RST packet, but rather to guess TCP seq/ack numbers to stamp it with.
Alas, none of the existing tools (e.g. tcpkill) seem to do anything clever in this regard.
cutter states that
There is a feature of the TCP/IP protocol that we could use to good effect here - if a packet (other than an RST) is received on a connection that has the wrong sequence number, then the host responds by sending a corrective "ACK" packet back.
But neither the tool itself nor the technique described seem to work, and I actually failed to find (or recall) any mentions (or other uses) of such corrective behavior. Maybe it was so waaay back, dunno.
Naturally, as I can run such tool on the host where socket endpoint is, local kernel has these numbers stored, but apparently no one really cared (or had a legitimate enough use-case) to expose these to the userspace... until very recently, that is.
Patching kernel to just expose stuff via /proc looks like bit of a burden as well, though an isolated module code would probably do the job well. Weird that there doesn't seem to be one of these around already, closest one being tcp_probe.c code, which hooks into tcp_recv code-path and doesn't really get seqs without some traffic either.
One interesting idea that got my attention and didn't require a single line of extra code was proposed on the local xmpp channel - to use tcp keepalives.
Sure, they won't make kernel drop connection when it's userspace that hangs on both ends, with connection itself being perfectly healthy, but every one of these carries a seq number that can be spoofed and used to destroy that "healthy" state.
Pity these are optional and can't be just turned on for all sockets system-wide on linux (unlike some BSD systems, apparently), and nothing uses these much by choice (which can be seen in netstat --timer).
And tuning keepalives to be frequent-enough seem to be a no-brainer and shouldn't have any effect on 99% of legitimate connections at all, as they probably pass some traffic every other second, not after minutes or hours.
net.ipv4.tcp_keepalive_time = 900 net.ipv4.tcp_keepalive_probes = 5 net.ipv4.tcp_keepalive_intvl = 156
(default is to send empty keepalive packet after 2 hours of idleness)
With that, tool has to run ~7 min on average to kill any tcp connection in the system, which totally acceptable, and no fragile non-portable ptrace-shellcode magic involved (at least yet, I bet it'd be much easier to do in the future).
Code and some docs for the tool/approach can be found on github.
More of the same (update 2013-08-11):
Actually, lacking some better way to send RST/FIN from a machine to itself than swapping MACs (and hoping that router is misconfigured enough to bounce packet "from itself" back) or "-j REJECT --reject-with tcp-reset" (plus a "recent" match or transient-port matching, to avoid blocking reconnect as well), countdown for a connection should be ~7 + 15 min, as only next keepalive will reliably produce RST response.
With a bit of ipset/iptables/nflog magic, it was easy to make the one-time REJECT rule, snatching seq from dropped packet via NFLOG and using that to produce RST for the other side as well.
Whole magic there goes like this:
-A conn_cutter ! -p tcp -j RETURN -A conn_cutter -m set ! --match-set conn_cutter src,src -j RETURN -A conn_cutter -p tcp -m recent --set --name conn_cutter --rsource -A conn_cutter -p tcp -m recent ! --rcheck --seconds 20\ --hitcount 2 --name conn_cutter --rsource -j NFLOG -A conn_cutter -p tcp -m recent ! --rcheck --seconds 20\ --hitcount 2 --name conn_cutter --rsource -j REJECT --reject-with tcp-reset -I OUTPUT -j conn_cutter
"recent" matcher there is a bit redundant in most cases, as outgoing connections usually use transient-range tcp ports, which shouldn't match for different attempts, but some apps might bind these explicitly.
ipset turned out to be quite a neat thing to avoid iptables manipulations (to add/remove match).
It's interesting that this set of rules handles RST to both ends all by itself if packet arrives from remote first - response (e.g. ACK) from local socket will get RST but won't reach remote, and retransmit from remote will get RST because local port is legitimately closed by then.
Current code allows to optionally specify ipset name, whether to use nflog (via spin-off scapy-nflog-capture driver) or raw sockets, and doesn't do any mac-swapping, only sending RST to remote (which, again, should still be sufficient with frequent-enough keepalives).