TCP backlog

First, the TCP state machine. [Figure: TCP state machine diagram]

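When a server accepts a new TCP connection, the three-way handshake passes through an intermediate state, SYN_RECV: the server has received the client's SYN and has replied with SYN+ACK. Only when the client's ACK arrives does the state change to ESTABLISHED. The kernel needs queues to hold connections in these stages: a SYN queue and an accept queue.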

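  • SYN queue

Holds connections for which the server has received the client's SYN and replied with SYN+ACK, but has not yet received the final ACK (state SYN_RECV). The application cannot control this queue's size; it is set globally in /proc/sys/net/ipv4/tcp_max_syn_backlog.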

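  • accept queue

Holds connections that are fully established (the client's ACK has been received) and are waiting to be handed to the application through accept(). Its size is min(backlog, somaxconn), where backlog is the argument the application passes to listen() and somaxconn is the kernel parameter in /proc/sys/net/core/somaxconn.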
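To make the min(backlog, somaxconn) rule concrete, here is a minimal Python sketch. The function names and the fallback value of 128 are my own assumptions for illustration; only the /proc path and the min() rule come from the text above.

```python
from pathlib import Path

# Linux-only path taken from the text above.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(default: int = 128) -> int:
    """Return the kernel's somaxconn, or `default` if it cannot be read.

    The default of 128 is an assumption used only as a fallback on hosts
    where the /proc file is missing (for example, non-Linux systems).
    """
    try:
        return int(SOMAXCONN_PATH.read_text().strip())
    except (OSError, ValueError):
        return default


def effective_accept_backlog(backlog: int, somaxconn: int) -> int:
    """Accept queue limit as described above: min(backlog, somaxconn)."""
    return min(backlog, somaxconn)


if __name__ == "__main__":
    limit = read_somaxconn()
    # A listen(1024) call is capped by somaxconn when somaxconn is smaller.
    print("somaxconn:", limit)
    print("effective backlog for listen(1024):", effective_accept_backlog(1024, limit))
```

So asking for a large backlog in listen() has no effect unless somaxconn is raised as well.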

Any queue can fill up and overflow. To check the overflow counters for the two queues:

# netstat -s | grep listen -i
    113 times the listen queue of a socket overflowed
    113 SYNs to LISTEN sockets dropped

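The first line means the accept queue overflowed 113 times. The second line means 113 SYNs were dropped on listening sockets, which includes drops caused by a full SYN queue.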
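If you want to watch these counters from a script, the two lines can be picked out of the `netstat -s` text. This is only a sketch: the function name and the dictionary keys are hypothetical, and it assumes the wording of the two lines matches the output shown above.

```python
import re

# Sample text copied from the netstat output above.
SAMPLE = """\
    113 times the listen queue of a socket overflowed
    113 SYNs to LISTEN sockets dropped
"""


def parse_listen_counters(netstat_output: str) -> dict:
    """Extract the two listen-related counters from `netstat -s` text.

    The keys 'accept_queue_overflows' and 'syns_dropped' are names chosen
    for this example, not something netstat itself reports.
    """
    counters = {}
    for line in netstat_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        count, description = int(match.group(1)), match.group(2).lower()
        if "listen queue of a socket overflowed" in description:
            counters["accept_queue_overflows"] = count
        elif "syns to listen sockets dropped" in description:
            counters["syns_dropped"] = count
    return counters


if __name__ == "__main__":
    print(parse_listen_counters(SAMPLE))
```

Sampling the counters twice and comparing the values shows whether overflows are still happening or are just historical.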

Taking backlog=2 as an example, check the queue usage of a specific port:

# ss -nat | grep 8888
LISTEN     3      2                         *:8888                     *:*
ESTAB      0      0            192.168.21.231:8888        192.168.21.130:60069
SYN-RECV   0      0            192.168.21.231:8888        192.168.21.130:61266
ESTAB      0      0            192.168.21.231:8888        192.168.21.130:60070
ESTAB      0      0            192.168.21.231:8888        192.168.21.130:60068

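The output shows 3 ESTABLISHED connections and one SYN_RECV connection sitting in the SYN queue. In the LISTEN row, Recv-Q (3) is the current accept queue length and Send-Q (2) is the queue limit, i.e. the backlog.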
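A situation like the one above can be reproduced with a server that listens with backlog=2 and never calls accept(), so completed handshakes pile up in the accept queue. The sketch below is my own reconstruction, not the program used for the capture; the function name is hypothetical and port 8888 is taken from the ss output.

```python
import socket
import time


def make_idle_listener(port: int, backlog: int = 2, host: str = "0.0.0.0") -> socket.socket:
    """Create a listening TCP socket that the caller never accepts on."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)  # backlog is capped by somaxconn in the kernel
    return srv


if __name__ == "__main__":
    listener = make_idle_listener(8888, backlog=2)
    print("listening on", listener.getsockname())
    # Never call accept(): established connections stay in the accept queue.
    while True:
        time.sleep(60)
```

With this running, opening several client connections to port 8888 and then running `ss -nat | grep 8888` should show Recv-Q on the LISTEN row grow until the queue is full.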

Once the accept queue is full, when the final ACK of the client's three-way handshake arrives, what happens next depends on the value of /proc/sys/net/ipv4/tcp_abort_on_overflow.

With tcp_abort_on_overflow=0, the server ignores the client's ACK and retransmits the SYN+ACK on a timer:

14:30:02.943805 IP 192.168.21.130.60062 > 192.168.21.231.8888: Flags [S], seq 1666001410, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 524108570 ecr 0,sackOK,eol], length 0
14:30:02.944141 IP 192.168.21.231.8888 > 192.168.21.130.60062: Flags [S.], seq 2229961269, ack 1666001411, win 28960, options [mss 1460,sackOK,TS val 59931201 ecr 524108570,nop,wscale 7], length 0
14:30:02.946989 IP 192.168.21.130.60062 > 192.168.21.231.8888: Flags [.], ack 1, win 2058, options [nop,nop,TS val 524108652 ecr 59931201], length 0
14:30:03.967467 IP 192.168.21.231.8888 > 192.168.21.130.60062: Flags [S.], seq 2229961269, ack 1666001411, win 28960, options [mss 1460,sackOK,TS val 59931304 ecr 524108652,nop,wscale 7], length 0
14:30:04.000055 IP 192.168.21.130.60062 > 192.168.21.231.8888: Flags [.], ack 1, win 2058, options [nop,nop,TS val 524109689 ecr 59931201], length 0
14:30:06.047440 IP 192.168.21.231.8888 > 192.168.21.130.60062: Flags [S.], seq 2229961269, ack 1666001411, win 28960, options [mss 1460,sackOK,TS val 59931512 ecr 524109689,nop,wscale 7], length 0
14:30:06.049674 IP 192.168.21.130.60062 > 192.168.21.231.8888: Flags [.], ack 1, win 2058, options [nop,nop,TS val 524111747 ecr 59931201], length 0

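The number of retries is set by /proc/sys/net/ipv4/tcp_synack_retries (default 5). The retry interval starts at 1s and doubles each time, so the 5 retries are sent after 1s, 2s, 4s, 8s, and 16s, 31s in total. After the 5th retry is sent, the server waits another 32s before it knows that the 5th one has also timed out. So in total it takes 1s + 2s + 4s + 8s + 16s + 32s = 63s before the server drops the connection.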
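The arithmetic above can be checked with a few lines of Python. The function name and parameters are made up for this example; the 1s initial timeout and the doubling rule are the ones described in the paragraph.

```python
def synack_retry_schedule(retries: int = 5, initial_timeout: int = 1):
    """Return (intervals before each retry, total seconds until give-up).

    Each interval is double the previous one. After the last retry the
    server still waits one more doubled interval before giving up.
    """
    intervals = [initial_timeout * 2 ** i for i in range(retries)]
    final_wait = initial_timeout * 2 ** retries
    return intervals, sum(intervals) + final_wait


if __name__ == "__main__":
    intervals, total = synack_retry_schedule()
    print("retry intervals (s):", intervals)
    print("total before the connection is dropped (s):", total)
```

Changing `retries` shows how quickly the total grows, since every extra retry roughly doubles the time a half-open connection can linger.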

With tcp_abort_on_overflow=1, the server replies to the client with an RST right away:

14:31:27.329427 IP 192.168.21.130.60071 > 192.168.21.231.8888: Flags [S], seq 3194421208, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 524192788 ecr 0,sackOK,eol], length 0
14:31:27.329722 IP 192.168.21.231.8888 > 192.168.21.130.60071: Flags [S.], seq 1085891051, ack 3194421209, win 28960, options [mss 1460,sackOK,TS val 59939640 ecr 524192788,nop,wscale 7], length 0
14:31:27.332744 IP 192.168.21.130.60071 > 192.168.21.231.8888: Flags [.], ack 1, win 2058, options [nop,nop,TS val 524192791 ecr 59939640], length 0
14:31:27.333010 IP 192.168.21.231.8888 > 192.168.21.130.60071: Flags [R], seq 1085891052, win 0, length 0
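The two behaviors seen in the captures can be summarized as a small decision function. This is a toy model of what the traces show, not kernel code; the function name and the returned strings are invented for illustration.

```python
def on_final_ack(accept_queue_full: bool, tcp_abort_on_overflow: int) -> str:
    """Model what the server does when the handshake's last ACK arrives."""
    if not accept_queue_full:
        # Normal case: the connection moves to the accept queue.
        return "ESTABLISHED"
    if tcp_abort_on_overflow == 1:
        # Second capture: the server answers the ACK with an RST.
        return "send RST"
    # First capture: the ACK is ignored and SYN+ACK is retransmitted later.
    return "ignore ACK, retransmit SYN+ACK"


if __name__ == "__main__":
    for full in (False, True):
        for flag in (0, 1):
            print(f"full={full} abort_on_overflow={flag} ->", on_final_ack(full, flag))
```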

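One thing I don't quite understand: in the ss output above, Recv-Q reaches 3, yet the backlog passed in was 2. Repeated experiments show that max(Recv-Q) = backlog + 1. I don't understand why this happens and couldn't find the relevant code in the kernel. If anyone knows, please explain.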
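One commonly cited explanation, offered here as an assumption rather than something verified against the kernel used in this experiment: the kernel helper sk_acceptq_is_full() in include/net/sock.h compares the current queue length to the limit with `>` rather than `>=`. A queue whose length equals backlog therefore still counts as not full, and one more connection gets in. The Python below only models that comparison; the function names are hypothetical.

```python
def accept_queue_is_full(current_len: int, max_backlog: int) -> bool:
    """Model of the check: full only when length is strictly greater than the limit."""
    return current_len > max_backlog


def max_observable_queue_len(backlog: int) -> int:
    """Keep adding connections until the queue reports full, then return its length."""
    queue_len = 0
    while not accept_queue_is_full(queue_len, backlog):
        queue_len += 1  # a connection is admitted because the queue is "not full"
    return queue_len


if __name__ == "__main__":
    for backlog in (0, 1, 2):
        print(f"backlog={backlog} -> max Recv-Q {max_observable_queue_len(backlog)}")
```

Under this model, backlog=2 gives a maximum queue length of 3, which matches the Recv-Q value in the ss output.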