We have a cluster of three servers (an Apache Doris cluster, in which the servers exchange data through the third-party library brpc). When data is sent between the servers over TCP, we frequently see segment retransmissions (possibly caused by packet loss; the cause is currently unknown).
Using tcpdump packet captures, we found that SACK is enabled on the system, and that the sender starts retransmitting only after receiving hundreds of dup ACKs.
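To narrow down which loss-detection settings are in play, the kernel's TCP tunables can be read directly on each server. The sketch below assumes the servers run Linux (the post does not say so) and that `tcp_sack`, `tcp_reordering`, `tcp_max_reordering` and `tcp_recovery` are the settings worth checking; the helper name `read_tcp_sysctls` is made up for illustration.

```python
from pathlib import Path

# Assumption: Linux hosts. These sysctls are a guess at what is relevant here:
# SACK on/off, initial and maximum reordering level, and RACK loss detection.
TCP_SYSCTLS = ("tcp_sack", "tcp_reordering", "tcp_max_reordering", "tcp_recovery")


def read_tcp_sysctls(base="/proc/sys/net/ipv4", names=TCP_SYSCTLS):
    """Return {name: value as a string, or None if the file cannot be read}."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Missing file (older kernel, non-Linux host) or no permission.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"net.ipv4.{name} = {value}")
```

Running this on both the sender (64) and the receiver (67) and comparing the output would at least show whether the two ends are configured differently.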
As I understand it, under the basic fast retransmit rule the sender initiates a fast retransmission after receiving 3 dup ACKs. I also understand that some RFCs allow the system to adjust DupThresh dynamically, but in my tcpdump capture the retransmission was initiated only after 429 dup ACKs (it may actually have been an RTO timeout). Can the dynamic DupThresh really get that large? Or are there other retransmission rules? Please enlighten me!
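To make the expectation concrete, here is a minimal sketch of the basic rule as I understand it: count consecutive ACKs carrying the same acknowledgment number and trigger once the count of duplicates reaches DupThresh. It is deliberately simplified; it looks only at ACK numbers and ignores the other conditions a real stack checks (no payload, unchanged window, outstanding data, SACK blocks). The function name and the sample ACK numbers are made up for illustration.

```python
DUP_THRESH = 3  # classic fast retransmit threshold


def fast_retransmit_points(acks, dup_thresh=DUP_THRESH):
    """Return the indices in `acks` at which the basic rule would fire.

    `acks` is the sequence of acknowledgment numbers seen by the sender,
    in arrival order. The first ACK for a given number is the original;
    every repeat after it counts as one dup ACK.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_thresh:
                triggers.append(i)  # fire once per run of duplicates
        else:
            last_ack = ack
            dup_count = 0
    return triggers


if __name__ == "__main__":
    # One original ACK followed by 429 duplicates, as in my capture.
    acks = [1000] * 430
    print(fast_retransmit_points(acks))
```

Under this simplified model the retransmission would be triggered by the 4th ACK (the 3rd duplicate), which is why waiting until the 430th looks so strange to me.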
Below are some screenshots of the tcpdump results. I’ll post the complete packet capture files later. Take packet No. 486831 as an example, which was sent from server 64 to server 67. Packet No. 486888 is the first ACK from 67 for this packet:
Then 64 sent several more packets to 67. Packet No. 496991 is the second ACK for packet No. 486831, packet No. 486995 is the third, packet No. 487045 is the fourth, and so on:
It was not until packet No. 496896 that the server retransmitted packet No. 486831, after receiving the 430th ACK, by which time more than 7 seconds had passed since the first transmission of this data packet: