Shoot yourself in the foot with iptables and kmod auto-loading

As some of you might know, we had an outage yesterday. We believe there is something to learn from every mistake, so after each outage we write a post-mortem. Usually we do this internally, because the issues we run into are very specific to our infrastructure.

This time we ran into quite a nasty issue that could affect anyone running a Linux system with a lot of sessions on it, and we thought you might be interested to know about that pitfall.

What happened?

At 4:40pm CEST, we got reports about Yikes (503/504 errors) on SoundCloud. Around the same time, our monitoring alerted on a high rate of 503s at our caching layer, and right after that one of our L7 routing nginx instances was reported down.

We were still able to log into that system. dmesg showed:

Aug 13 14:46:52 ams-mid006.int.s-cloud.net kernel: [8623919.136122] nf_conntrack: table full, dropping packet.
Aug 13 14:46:52 ams-mid006.int.s-cloud.net kernel: [8623919.136138] nf_conntrack: table full, dropping packet.

N.B.: Our systems are set to the UTC timezone.

That wasn’t expected. The first thought was: “Someone must have changed the sysctl tunings for that”. Then we realized that this system has no need for connection tracking, so nf_conntrack shouldn’t have been loaded at all. As a quick countermeasure we raised net.ipv4.netfilter.ip_conntrack_max, which fixed the situation and brought the service back up.
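For the record, raising that limit at runtime is a one-liner. The value below is just an example, and on newer kernels the sysctl is named net.netfilter.nf_conntrack_max instead:

# example value; pick one that fits your traffic
sysctl -w net.ipv4.netfilter.ip_conntrack_max=262144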

Why did it happen?

After bringing the site back up, we investigated what had caused the kernel to enable connection tracking. Running lsmod showed that the connection tracking and iptables modules were indeed loaded. Another look into dmesg revealed that right before the outage the ip_tables netfilter module had been loaded:

Aug 13 14:38:27 ams-mid006.int.s-cloud.net kernel: [8623415.007818] ip_tables: (C) 2000-2006 Netfilter Core Team
Aug 13 14:38:35 ams-mid006.int.s-cloud.net kernel: [8623422.444931] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)

So what happened? One of our engineers was doing some preparations for scaling that layer of our infrastructure. To verify that we don’t use any specific iptables rules on that system, he ran:

iptables -L
iptables -t nat -L

Those commands themselves are pretty harmless. They just list the configured iptables rules: the first one the rules in the filter table, the second one those in the nat table. Nothing that should change any system configuration, right? Nope. Let’s try to reproduce it. Just boot up some system (I’ve tried it on my Ubuntu laptop). No iptables module should be loaded:

root@apollon:~# lsmod|grep ipt

Now just list your iptables rules:

iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

And check again for loaded modules:

root@apollon:~# lsmod|grep ipt
iptable_filter 12810 0
ip_tables      27473 1 iptable_filter
x_tables       29846 2 iptable_filter,ip_tables

Okay, that loaded some iptables modules to make it possible to add rules to the filter table. This shouldn’t cause any problems, since without any actual rules the impact on the kernel is negligible. But now check your nat table:

root@apollon:~# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination

Completely empty as well. But now look at your kernel modules:

root@apollon:~# lsmod|grep ipt
iptable_nat        13229 0
nf_nat             25891 1 iptable_nat
nf_conntrack_ipv4  19716 3 iptable_nat,nf_nat
nf_conntrack       81926 3 iptable_nat,nf_nat,nf_conntrack_ipv4
iptable_filter     12810 0
ip_tables          27473 2 iptable_nat,iptable_filter
x_tables           29846 3 iptable_nat,iptable_filter,ip_tables

By just listing the iptables rules for the nat table, the kernel loaded nf_conntrack, which enabled connection tracking. See dmesg:

[75024.007681] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)

On your laptop you probably don’t care – it’s even quite convenient. But on a production server that handles a large number of connections, the fairly small default nf_conntrack table will overflow quite fast and cause dropped connections.
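If you want to check how close a system is to that limit, the kernel exposes both the current entry count and the maximum (paths as on kernels that use the nf_conntrack naming; older ones use ip_conntrack instead):

cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max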

How do we prevent it?

iptables doesn’t load nf_conntrack itself. It only loads ip_tables, which in turn pulls in the modules it depends on via the kernel’s kmod facility.
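You can inspect that dependency chain without loading anything, for example with modinfo (the exact output varies with your kernel version):

modinfo -F depends iptable_nat   # e.g. nf_nat,ip_tables
modinfo -F depends nf_nat        # e.g. nf_conntrack_ipv4,nf_conntrack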

But since kmod loads modules through the user-space modprobe helper, the auto-loading process will honour modprobe.d/ settings. Unfortunately there is no easy way to disable the loading of a module altogether, but there is a workaround for that.
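The helper the kernel invokes is configurable and on a stock system points at modprobe:

cat /proc/sys/kernel/modprobe
/sbin/modprobe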

Since we don’t need iptables at all on that system, we’ve created a /etc/modprobe.d/netfilter.conf like this:

 alias ip_tables off
 alias iptable off
 alias iptable_nat off
 alias iptable_filter off
 alias x_tables off
 alias nf_nat off
 alias nf_conntrack_ipv4 off
 alias nf_conntrack off

This makes modprobe try to load a module named off instead of the actual kernel module – and no such module exists.

Trying to run any iptables command should now give you:

iptables -t nat -L
FATAL: Module off not found.
iptables v1.4.12: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
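As a side note, a related modprobe.d feature that achieves the same effect (we didn’t use it here, but it’s worth knowing) is the install directive, which makes modprobe run an arbitrary command instead of loading the module:

# run /bin/false instead of loading the module; modprobe then fails
install ip_tables /bin/false
install nf_conntrack /bin/false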