tcp连接在断网后的恢复能力
时间:2024-08-23 05:10:35 点击:198

  做项目中遇到一个问题。两台机器上用socket建立一个TCP连接,双向通信,流量很大,这时,通过在路由器上设置100%的丢包率将网络断开,这时 socket当然是发不了包,也收不了,出现大量的重传,然后,取消路由器上的设置,恢复网络,结果,TCP连接client去往server的流量正常了,但server去往client却不通,任凭你如何使劲的send,返回值就是0,而且errno为EAGAIN。

  我用tcpdump看了一下此时的包数据(tc2是server,tc1是client):

  12:08:21.020291 IP tc1.corp.com.42171 > tc2.corp.com.3003: S 4009389430:4009389430(0) win 5840

  12:08:21.020571 IP tc2.corp.com.3003 > tc1.corp.com.42171: R 0:0(0) ack 4009389431 win 0

  12:08:38.934329 IP tc2.corp.com.3903 > tc1.corp.com.3904: P 2398055392:2398056153(761) ack 2538876742 win 724

  12:08:38.934519 IP tc1.corp.com.3904 > tc2.corp.com.3903: . ack 2165 win 13756

  12:08:39.958457 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1:763(762) ack 2165 win 13756

  12:08:39.958485 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 763 win 1448

  12:08:39.958653 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 763:881(118) ack 2165 win 13756

  12:08:39.958660 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 881:997(116) ack 2165 win 13756

  12:08:39.958719 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 997 win 1448

  12:08:39.958890 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 997:1114(117) ack 2165 win 13756

  12:08:39.958898 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1114:1232(118) ack 2165 win 13756

  12:08:39.958903 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1232:1349(117) ack 2165 win 13756

  12:08:39.958971 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1349 win 1448

  12:08:39.959141 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1349:1466(117) ack 2165 win 13756

  12:08:39.959149 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1466:1583(117) ack 2165 win 13756

  12:08:39.959154 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1583:1700(117) ack 2165 win 13756

  12:08:39.959222 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1700 win 1448

  tc2不发自己的数据,却只是一味的ACK从tc1传来的数据,等上半个小时,依然如此。它为什么不发呢?

  最后发现是因为我们在socket上设了TCP_NODELAY。去掉这个设置,重启程序,断网恢复以后,TCP双向正常工作。同样用tcpdump看:

  16:05:38.782427 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: P 0:887(887) ack 1 win 26064

  16:05:38.782619 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 3783 win 25352

  16:05:38.782634 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 3783:5231(1448) ack 1 win 26064

  16:05:38.782637 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 5231:6679(1448) ack 1 win 26064

  16:05:38.782890 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 5231 win 25352

  16:05:38.782896 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 6679:8127(1448) ack 1 win 26064

  16:05:38.782898 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 8127:9575(1448) ack 1 win 26064

  16:05:38.782901 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 6679 win 25352

  16:05:38.782904 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 9575:11023(1448) ack 1 win 26064

  16:05:38.783183 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 8127 win 25352

  16:05:38.783188 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 11023:12471(1448) ack 1 win 26064

  16:05:38.783191 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 9575 win 25352

  16:05:38.783193 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 12471:13919(1448) ack 1 win 26064

  16:05:38.783196 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 11023 win 25352

  16:05:38.783199 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 13919:15367(1448) ack 1 win 26064

  16:05:38.783201 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 15367:16815(1448) ack 1 win 26064

  16:05:38.783502 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 12471 win 25352

  16:05:38.783506 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 16815:18263(1448) ack 1 win 26064

  16:05:38.783509 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 13919 win 25352

  16:05:38.783512 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 18263:19711(1448) ack 1 win 26064

  16:05:38.783514 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 15367 win 25352

  16:05:38.783517 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 19711:21159(1448) ack 1 win 26064

  16:05:38.783519 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 16815 win 25352

  tc2这次发自己的数据流了,tc1对其ACK,过了一段时间,tc1也开始发数据,最后双向正常。

  为什么带了TCP_NODEALY的socket,在网络好了以后恢复不了正常?

  看看recv系统调用的实现(2.6.9内核),一直追溯到tcp_recvmsg函数:

  [net/ipv4/tcp.c --> tcp_recvmsg]

  813 while (--iovlen >= 0) {

  814 int seglen = iov->iov_len;

  815 unsigned char __user *from = iov->iov_base;

  816

  817 iov++;

  818

  819 while (seglen > 0) {

  820 int copy;

  821

  822 skb = sk->sk_write_queue.prev;

  823

  824 if (!sk->sk_send_head ||

  825 (copy = mss_now - skb->len) <= 0) {

  826

#p#副标题#e#

  827 new_segment:

  828 /* Allocate new segment. If the interface is SG,

  829 * allocate skb fitting to single page.

  830 */

  831 if (!sk_stream_memory_free(sk))

  832 goto wait_for_sndbuf;

  833

  834 skb = sk_stream_alloc_pskb(sk, select_size(sk, tp),

  835 0, sk->sk_allocation);

  836 if (!skb)

  837 goto wait_for_memory;

  831行判断sndbuf里还有没有空间,如果没有,跳到wait_for_sndbuf

  [n

展开 ↓

最新游戏更多

最新软件更多

  • 玩家推荐
  • 游戏攻略

峰溢下载站 Copyright(C) 2008- ytdonghua.net All Rights Reserved!

闽ICP备2023006282号-2| 免责声明