I like this paper, but the authors have not convinced me that exponential backoff needs to be removed.
The core argument made by the authors is that as long as the packet conservation principle is followed, exponential backoff is not necessary for ensuring stability. In other words, if the RTO is really a good upper bound on RTT even in periods of heavy congestion, then there is no need to increase it further, even if packets are dropped.
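The difference between the two timer policies can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: the function names are ours, and the TCP($\inf$) behavior is simplified to "keep the RTO fixed across consecutive timeouts."

```python
def next_rto_backoff(rto, attempt, max_rto=64.0):
    """Standard TCP: double the retransmission timeout on each
    consecutive loss of the same packet (capped, here at 64 s)."""
    return min(rto * (2 ** attempt), max_rto)

def next_rto_no_backoff(rto, attempt):
    """TCP(inf)-style sketch: keep the RTO fixed. Packet conservation
    alone limits the sender to one retransmission per RTO interval,
    which is the stability argument the authors rely on."""
    return rto

# With an initial RTO of 1 s, five consecutive timeouts give:
backoff = [next_rto_backoff(1.0, i) for i in range(5)]    # 1, 2, 4, 8, 16
flat = [next_rto_no_backoff(1.0, i) for i in range(5)]    # 1, 1, 1, 1, 1
```

The doubling sequence is what parks a flow in the long pauses discussed below; the flat sequence is what the paper argues is sufficient, provided the RTO remains a good upper bound on the RTT.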
The authors deserve high marks for boldness: they question one of the basic mechanisms that (supposedly) ensures the stability of the internet.
However, showing that exponential backoff is unnecessary is not sufficient to justify its removal.
There are three problems.
First, as the authors themselves show, flows that don’t use exponential backoff can impact the performance of “traditional” TCP flows (yes, even the TCP*(3) version can increase the response time of vanilla TCP by 20-40%).
(*) We agree with the reviewer’s concern. However, given that it is not possible to upgrade all TCP stacks to TCP($\inf$) overnight, a two-step deployment process looks most promising to us. If the above degradation is an issue, it can be effectively addressed by adopting an approach with more than two steps. Still, the key point is that a realistic incremental deployment of the proposed change is feasible. We address this issue by adding text in Section 6.
Second, it is not clear that removing exponential backoff brings significant benefit. Even in Figure 3, where the authors claim that the performance improvement is “dramatic”, only about 1% of the flows see any improvement in response time (later, the authors themselves point this out in Fig 4).
(*) Figure 4 does not show that only 1% of the flows see any improvement in response time. Rather, it compares the 99th percentile of response times for the TCP and TCP($\inf$) stacks. Furthermore, the key conclusion from the figure is that in a dynamic network environment dominated by short flows, admitting new flows while not serving already admitted flows (because they are moved into long pauses by the exponential backoff mechanism) degrades the overall response times of all flows.
In general, we agree that the performance improvement is not the main issue with respect to our paper. In Section 8 (Conclusions), we explicitly address this issue by stating that “Our key contribution is not in bringing massive performance improvements, but rather in radically increasing our knowledge about congestion control fundamentals.”
Third, given that we currently believe that this mechanism is essential for the stability of the Internet, more evaluation is needed. Specifically, direct evaluation in the wide-area Internet would have been very useful (e.g., using PlanetLab).
(*) There are three issues here. First, we cannot modify the TCP stack of PlanetLab machines to deploy the TCP($\inf$) stack. Second, even if we used a user-level TCP stack for our experiments, exponential backoff has no impact on response time unless the load on the bottleneck link exceeds 90-95%, and creating this much load on the paths between PlanetLab nodes requires cross-traffic on the order of hundreds of Mbps. Third, in a setup where we have no knowledge of the cross traffic, it is not possible to tell whether an observed improvement is due to the TCP($\inf$) stack itself or due to TCP($\inf$) starving the TCP cross traffic. Thus, we evaluate TCP($\inf$) in a controlled yet realistic setup with a kernel-level TCP implementation, a realistic flow size distribution, and a heterogeneous RTT distribution.
In short, this is an interesting, thought-provoking and well-written paper, but I am not yet ready to dump the good old exponential backoff!
Here's my take on "Removing Exponential Backoff from TCP" by Mondal and Kuzmanovic.
They make an interesting argument that exponential backoff was adopted in haste to solve an immediate problem and thus never underwent a rigorous review. They further argue that upon review, it doesn’t appear to be appropriate. The paper raises some intriguing questions; I wish I could have devoted more time to this review.
My inclination would be to accept.
Reasons to accept:
- Paper is mostly well-written, with only a few notable lapses in places.
- The authors are willing to revisit and re-examine one of the long-held tenets of TCP.
- They provide a reasonable background explanation of the current status-quo.
- They make reasonable, if limited, test studies of their proposed alternatives.
- They report on actual experiments in a test network instead of relying solely on simulation results.
Reasons to reject:
- While they put a lot of effort into making their claim that TCP's exponential backoff behavior is not needed (at least with regards to preventing congestive collapse), they put very little effort into explaining why it is harmful. Mostly they argue that it is not provably appropriate.
(*) In Section 4.2, we show that exponential backoff can degrade the response times of short flows by an order of magnitude due to unnecessary backoff during light-to-moderate congestion scenarios. In Section 4.4, we show that in a dynamic network environment, admitting new flows into the system while not serving the already admitted flows (as they are moved into long pauses due to exponential backoff) degrades the overall response times of all flows.
- The third of their three "rationales for revision" is very muddled. They make a statement about current Internet characteristics, then say that their objection to TCP's exponential backoff algorithm is independent of current Internet characteristics, and this is somehow a rationale for revision? Instead, their objections to TCP's exponential backoff (as exhibited in rationales one and two) are the actual rationales.
(*) We change the text in the manuscript.
“Moreover, finite flow sizes, dynamic network environment ….”
- What they call the "implicit packet conservation principle" is fairly well-known, although it's not entirely clear whether or not they are claiming to be the first to have made this observation.
(*) As we explain in Section 7, the “implicit packet conservation principle” is closely related to the so-called request conservation principle from reference , which is still an explicit conservation approach. The key difference is that we show that a similar approach can effectively be applied to network bandwidth as a resource. We agree with the reviewer that from the congestion control perspective, the “implicit packet conservation principle” is fairly simple. Still, our key contribution lies in providing an important implication of this principle: TCP’s exponential backoff mechanism can be removed.
- When making a strong claim against a long-held piece of conventional wisdom, it behooves one to have done extensive studies that support one's argument. While they did do some test studies of limited topologies in their lab and some larger ones in simulation, more studies are needed covering a wider set of situations. Some of their experiments, such as the measurement of the degree to which exponential backoff happens in today's Internet, were done on questionably small scales.
(*) We have done extensive evaluation in Emulab with a kernel-level TCP implementation, as well as simulations with large complex topologies, realistic traffic distributions, TCP variants, queuing disciplines, and a range of bottleneck rates to justify our claim. We believe that our work will spur further research in this direction.
Regarding the observation that the measurement of the degree to which exponential backoff happens in today’s Internet is small in scale, note that this is not the case. As explained on page 4, we probed about 75,000 distinct web servers in the Internet.
The paper poses an intriguing question: is the exponential backoff in TCP really necessary? It correctly identifies that what is minimally needed for stability is not exponential backoff but packet conservation. It then shows, using simulation and emulation, that all hell does not break loose and that performance improves under certain scenarios.
The exploration, however, is by no means a slam dunk. There are several issues that are left unexplored. This is perhaps unavoidable for a CCR submission, especially because the question is a fundamental one. But I do hope that you’ll explore this question further in more detail. For instance, one downside of what you propose is the time it takes for the network to recover after heavy congestion. While both exponential backoff and packet conservation produce stability, the former presumably leads to faster recovery. Another issue, which I think you do allude to in the paper, is what happens when the RTT changes all of a sudden and RTO < RTT. This can happen, for instance, when routing changes over to a congested path.
(*) There are two issues here. First, we agree that exponential backoff leads to faster recovery and quickly clears the congestion. However, since no admission control is deployed in the current Internet, while the already admitted flows run into long backoff, new flows can enter the system. Thus, during persistently overloaded scenarios, exponential backoff degrades the overall response times of all the flows. We explored this issue in depth in Sections 4.1 and 4.4. Second, regarding the issue when the RTT changes all of a sudden and RTO < RTT
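The RTO < RTT scenario the reviewer raises can be made concrete with the standard RTO estimator of RFC 6298. The following is a minimal sketch under our own simplifications (steady-state update only, clock granularity ignored, variable names ours); it shows how a sudden RTT increase, such as a route change to a congested path, leaves the RTO below the new RTT until fresh samples pull the estimate up:

```python
# Minimal sketch of the RFC 6298 retransmission-timer estimator.
# ALPHA and BETA are the standard smoothing gains; G (clock granularity)
# is set to zero for simplicity. All variable names are illustrative.
ALPHA, BETA, G = 1 / 8, 1 / 4, 0.0

def update_rto(srtt, rttvar, sample):
    """One RTT-sample update in the steady state, per RFC 6298."""
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * sample
    rto = srtt + max(G, 4 * rttvar)
    return srtt, rttvar, rto

# Converge on a stable 100 ms path...
srtt, rttvar = 0.1, 0.0
for _ in range(20):
    srtt, rttvar, rto = update_rto(srtt, rttvar, 0.1)

# ...then the path RTT jumps to 500 ms: the current RTO lags behind it,
# so the sender times out and retransmits even though no packet was lost.
assert rto < 0.5
```

On the stable path the estimator settles at RTO ≈ 100 ms, so a sudden jump to a 500 ms RTT triggers spurious timeouts until the estimator catches up; this is the window in which the paper's fixed-RTO behavior matters most.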
While the presentation quality of the paper is good, some of the claims should be toned down or backed up properly. For instance, I don’t understand how your work “opens the doors for evaluating other well-accepted pieces of TCP congestion control.” Also, your point about incremental deployability, which you bring up prominently in several places, is a bit ludicrous. You essentially say that because one big step is too big, let’s have two small steps. What is the transition plan? It seems that you’ll have TCP and TCP* operating simultaneously. (And I would have liked to see that the two small steps are small enough.)
(*) Regarding the first issue raised by the reviewer, i.e., the statement that our work “opens the doors for evaluating other well-accepted pieces of TCP congestion control,” we cite reference . In , the authors question the need for the TCP startup phase, which is another well-accepted piece of TCP’s congestion control algorithm. Our paper questions another well-accepted piece of congestion control, exponential backoff, and hence we hope that this trend might encourage others to revisit other pieces of TCP’s congestion control.
Regarding the incremental deployability issue, a similar concern has been raised by reviewer I. Given that it is not possible to upgrade all TCP stacks to TCP($\inf$) overnight, a two-step deployment process looks most promising to us. However, if necessary, an approach with more than two steps can be adopted. Still, the key point is that a realistic incremental deployment of the proposed change is feasible. We change the text in Section 6 to address this issue.
Finally, how does your work relate to various high-speed TCP strains and TCP over wireless? Like you, both try to reduce the impact of individual losses on the sending rate of TCP.
(*) Both our work and the work on high-speed TCP and TCP over wireless are similar in spirit -- the commonality lies in promoting a more agile endpoint behavior.