As you are aware, no service runs in isolation on a server. You always have to take into account the communication layer between your server and client services.
One of the trickiest things that many people overlook is the “housekeeping” of the communication layer.
In a properly set up enterprise environment, the communication layer (firewalls, routers, switches) performs many self-maintenance and cleanup jobs behind the scenes, and these can affect the application layer without the developer being aware of it.
One example is the cleanup of long-running TCP connections, a maintenance task usually configured on firewalls. At a given moment (usually midnight), the firewall kills all connections to ensure that dead, timed-out or improperly closed connections are not hogging its resources. The intention is good, but it can break an application layer that is unaware of this cleanup task and assumes that a connection can last forever.
The assumption that a connection will last forever usually shows up in long-running tasks, such as data replication between services. To address this, the Linux kernel offers a way to keep a TCP connection active even during prolonged inactivity: the so-called “keepalive” feature.
In short, this kernel feature keeps a TCP connection active by simulating traffic on it, so that the communication layer does not mark it as inactive. From this manual entry I extracted only the most important details.
First, three kernel parameters control this feature:
tcp_keepalive_time: the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; once the connection has been marked as needing keepalive, this counter is not used any further
tcp_keepalive_intvl: the interval between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime
tcp_keepalive_probes: the number of unacknowledged probes to send before considering the connection dead and notifying the application layer
Note that the first two parameters are expressed in seconds, and the last one is a plain count.
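To check which values a machine currently uses, and how long an unresponsive peer survives with them, a small Python sketch can read the three parameters from /proc/sys/net/ipv4 (where Linux exposes them) and add up the idle time and the probe intervals. The helper names read_keepalive_settings and dead_peer_detection_seconds are hypothetical, and the sum assumes the peer never answers a probe:

```python
from pathlib import Path

# The three keepalive parameters; Linux exposes them as files under
# /proc/sys/net/ipv4 (readable without root).
KEEPALIVE_PARAMS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(base: str = "/proc/sys/net/ipv4") -> dict[str, int]:
    """Return the current values: idle seconds, interval seconds, probe count."""
    return {name: int(Path(base, name).read_text().strip()) for name in KEEPALIVE_PARAMS}


def dead_peer_detection_seconds(settings: dict[str, int]) -> int:
    """Seconds from the last activity until a silent peer is declared dead:
    the idle time before the first probe, plus one interval per unanswered probe."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])
```

On a Linux box, dead_peer_detection_seconds(read_keepalive_settings()) gives the figure for the running kernel. With the kernel defaults (7200, 75 and 9) the sum is 7875 seconds, a bit over two hours, which is why the idle time usually has to be lowered when a firewall drops idle connections sooner than that.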
The easiest way to change these parameters (and make the change persist across reboots) is to put them in /etc/sysctl.conf or in a new file under /etc/sysctl.d/, depending on your Linux distribution:
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
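Then apply the new settings as root using the command below (if you used a file under /etc/sysctl.d/, pass that file's path instead, or run sysctl --system to reload all configuration files):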
# sysctl -p /etc/sysctl.conf
The above parameters will cause the following behavior:
If a TCP connection has been idle for 60 seconds (and neither endpoint has explicitly closed it), the kernel starts keepalive on it, provided the application enabled keepalive on that socket with the SO_KEEPALIVE option; the sysctl values are only the defaults for such sockets. From then on, a probe (an empty ACK segment that simulates traffic) is sent every 5 seconds. If 3 consecutive probes go unanswered, the kernel marks the connection as dead and reports an error to the application layer.
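The sysctl values only take effect on sockets where the application has set SO_KEEPALIVE; on Linux a program can additionally override the three timers for a single socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. A minimal Python sketch, assuming Linux and the standard socket module; the function name enable_keepalive and the 60/5/3 values (copied from the sysctl example) are illustrative choices, not taken from any particular application:

```python
import socket

# Illustrative values mirroring the sysctl example; adjust to your environment.
IDLE_SECONDS = 60      # per-socket equivalent of net.ipv4.tcp_keepalive_time
INTERVAL_SECONDS = 5   # per-socket equivalent of net.ipv4.tcp_keepalive_intvl
PROBE_COUNT = 3        # per-socket equivalent of net.ipv4.tcp_keepalive_probes


def enable_keepalive(sock: socket.socket,
                     idle: int = IDLE_SECONDS,
                     interval: int = INTERVAL_SECONDS,
                     probes: int = PROBE_COUNT) -> None:
    """Turn on TCP keepalive for one socket and override the system-wide timers.

    SO_KEEPALIVE alone makes the socket use the net.ipv4.tcp_keepalive_* defaults;
    the Linux-specific TCP_KEEP* options replace them for this socket only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    # The options can be set before connect(); they apply once the connection exists.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
        print("TCP_KEEPIDLE:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
```

With these values a peer that stops answering is declared dead about 60 + 3 * 5 = 75 seconds after the last activity, and the next read or write on the socket then fails, typically with ETIMEDOUT. Setting the timers per socket is useful when a replication job needs tighter values than the rest of the machine.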