Review for The Power of Explicit Congestion Notification

----------------PC Summary ---------------------

The committee liked this paper for the thoroughness of the analysis, the clarity of the writing, and the significant performance gains achieved. There were concerns that the security "benefits" of only marking the SYN_ACK packets were overstated, and that the traffic mix used in the analysis might not be sufficiently representative. These issues should be addressed, and a shepherd will be assigned.

------------------Review #1---------------------

Rating: Top 50% but not top 25%

Qualification: I know a lot about this area

Summary: The paper proposes a small modification to the ECN mechanism and conducts simulations and experiments to show that this modification improves the performance of TCP traffic. The modification consists of setting the ECN bit in the IP header of the TCP SYN and ACK packets. Currently, the ECN bit is set in the IP header only after the two peers of the TCP connection verify that they are both ECN capable. By setting the ECN bit in the IP header of the TCP control packets (SYN and ACK), those two control packets are marked rather than dropped at the onset of congestion (if the routers support ECN), which avoids the long timeout at the beginning of the connection and improves the performance.

Strength: A modification to the ECN proposal, validated with a campaign of simulations and experiments. The modification is shown to improve the throughput and response time.

Weakness: A small modification to an already mature protocol. The simulation scenario considered doesn't account for cases where this modification can hurt the performance. The incremental deployability of routers and servers is not studied, only that of clients is studied.

Comments: The modification proposed by the paper seems reasonable; however, it is a small modification that I don't believe makes a paper up to the standard of SIGCOMM. It is more a modification that can be the subject of an Internet draft, with the hope that it becomes an RFC if the Internet committee is convinced. At the same time, I am convinced that there is a fundamental reason for not setting the ECN bit in the control packets at the beginning of a TCP connection (and I doubt the ECN designers failed to think carefully about this modification). The reason is simply not to overload the network in case the server is not ECN capable. Indeed, when the ECN bit is set and the server is not ECN capable, the SYN packet is marked rather than dropped. The server, not being ECN capable, will ignore the set bit and will not echo it back to the client. Instead it will send a normal ACK. The client will continue increasing its window as if no congestion had occurred. This will overload the network in the forward direction, which is already overloaded (otherwise the router would not have marked the packet in the forward direction). I agree that this overload may not be that large, but if many SYN packets happen to be sent, having all of them get through will penalize the other normal traffic, since it will push the router into the always-drop-packets mode.

I have comments about the simulations conducted by the authors. The simulations are done in such a way that servers support ECN and only routers in the reverse direction are congested. This scenario does not allow one to study the side effect of the modification described above. It is clear that if all servers are assumed to support ECN and if congestion only appears in the reverse direction, the new modification (i.e. ECN+) will show a gain over normal ECN. I suggest that the authors consider the scenario I mentioned.

A better solution to the above problem could be the client setting the ECN bit in the TCP header and the server setting the bit in the IP header if it was ECN capable, the server being sure that the client supports ECN.

I have a few other comments:

- The model proposed by the authors for the Internet congested routers is not new. It was already proposed in the following reference. I suggest that the authors look at this reference:

Urtzi Ayesta, Kostya Avrachenkov, Eitan Altman, Chadi Barakat, Parijat Dube, "Multilevel Approach for Modeling TCP/IP", in proceedings of PFLDnet'03: workshop on Protocols for Fast Long-Distance networks, CERN-Geneva, Switzerland, February 2003.

- explain what you mean by response time.

- what is the definition of offered load? explain how you compute it and how you implement it in your simulation.

- the simulation scenario needs to be better explained. a figure is necessary to explain the topology you are simulating.

- how many simulation runs do you conduct? is there any confidence interval associated with your results?

- how can you have RED* without ECN? RED* marks all packets above the threshold, but if there is no ECN, what do you do?

- footnote 6, "... in this mode." which mode? please explain.

- you state that in figure 7 the delay is moderate. which criterion do you use to classify it as moderate?

- in figure 11, why are the second, fourth, and sixth lines different? they should be the same since there is no ECN, if I am not wrong.

- you consider the incremental client deployability. what about incremental router deployability? incremental server deployability?

------------------Review #2---------------------

Rating: Top 25% but not top 10%

Qualification: I have passing familiarity

Summary: The paper suggests that a major problem with ECN is that it hurts small flows, because (as defined) it doesn't protect SYN+ACK packets from being lost, leading to lots of 3 (or more) second timeouts. ECN+ fixes this problem; the paper asserts that ECN+ is a way to improve any queue management scheme, including RED, and will not lead to new DoS attacks. The paper includes lots of simulation results and some simple experimental results.

Strength: The insight that a minor change to ECN can greatly improve its performance seems like a very useful result, and the paper does a good job of analyzing the situation from several different angles.

The paper shows what appears to be a good command of the topic area and of the related work (with one exception, see below).

Weakness: The paper makes some unsupported statements, and there are some questionable choices of what to simulate or experiment with that might tend to make ECN+ look better than it really is.

Lots of English grammar/punctuation/usage errors.

Comments: The authors have clearly chosen to ignore the requirements for 1-inch margins, and the request to number pages.

The paper makes several questionable statements (or at least they need to be supported by citations or other evidence): (1) "Despite [ECN] being implemented in vast majority of Internet routers and end-hosts" -- is this really true? (2) Section 4.3.1, re: RED and RED*: "Interestingly, both versions are represented in today's Internet." -- basis for this? (3) section 5.2.2, "the initial window size is two packets[7]". Is this true in reality?

Section 4.2 assumes non-persistent HTTP connections (contrasted with section 7, which does use persistent connections). How does the analysis depend on this? I would think that the use of non-persistent connections would bias the results in favor of ECN+, because this reduces the number of chances for a SYN+ACK to be lost.

The simulations in section 4 all use some sort of AQM (RED, RED*, REM, PI). Section 5 extends this to look at the no-AQM case, but only "Threshold with ECN+". I think it would be a good idea to include the "Threshold, no ECN" case as well, as a baseline against which to compare all of the other schemes. Maybe this is just repeating old work, but it would help to validate whether the simulations are consistent with other studies. It would also help to have one graph that directly shows the comparison between Threshold, RED, RED*, REM, PI, all with ECN+. (Actually, versions at both 90% and 105% load would be helpful.)

Given your results, is there ever any reason to use RED*, or is one of the other AQM schemes always better, even with ECN+?

And given that the simulations show RED* to be worse than RED, I would have preferred the analysis in section 6 and the experiments in section 7 to use RED rather than RED*.

Beyond that, section 6 should probably analyze the situation where some servers use ECN+ and others use ECN. That is, since ECN+ flows will probably be successful more often than ECN flows (that is the whole point of ECN+), would this lead to unfairness against ECN flows in the same network?

Note re: incremental deployment of TCP Vegas or TCP FAST, there is some possibly relevant but hypothetical work in: Parveen Patel, Andrew Whitaker, David Wetherall, Jay Lepreau, Tim Stack, "Upgrading Transport Protocols Using Mobile Code", 19th Symposium on Operating Systems Principles, Oct. 2003.

I'm a little surprised this paper didn't cite:

K. K. Ramakrishnan and Raj Jain, A Binary Feedback Scheme for Congestion Avoidance in Computer Networks, ACM Transactions on Computer Systems, Vol.8, No.2, pp. 158-181, May 1990.

which (as I understand it) was the original ECN proposal (if you don't count ICMP Source Quench, which is different in some ways).

------------------Review #3---------------------

Rating: Top 10% but not top 5%

Qualification: I know a lot about this area

Summary: This paper performs a comprehensive analysis of a rather simple enhancement to ECN (called ECN+), in which TCP SYN-ACK packets are given full ECN treatment. The results are impressive in terms of the improved latency for short TCP transfers, and some improvements in throughput also result. It is also shown that ECN+ provides good incentives for incremental deployment, in that early adopters see immediate improvement while laggards are not greatly penalized.

Most of the analysis is via simulation, while there is also some mathematical analysis and some testbed experiments.

Strength: Potentially significant results regarding the effectiveness of AQM and ECN.

Very thorough analysis from multiple angles using simulation, math and experiment.

Good coverage of prior work

Weakness: The actual new idea here (change to ECN treatment of TCP SYN-ACKs) is a rather modest advance.

"Proof" by simulation is likely to be a hard sell.

I thought the mathematical model was a bit too simplified.

Comments: I enjoyed this paper for several reasons. The results are striking, and helped to explain why ECN and AQM have not been as successful as one would have wished. The paper is very clear, and the analysis is admirably thorough. I felt that you did a generally good job of simulating realistic scenarios, although it may be argued that by simulating networks with only a single congested link and no reverse path congestion you may miss some important effects. I thought you did a very good job deciding which graphs to include and which to omit to meet the space restrictions.

I have a few minor comments and suggestions for improvement.

It wasn't completely clear early on whether you decided to mark the SYN packets or only the SYN-ACK packets in "ECN+". This needs to be made more explicit when you define ECN+ in Section 3.

Section 4.2 - A small nit - light *load* is the norm in the Internet backbone, not light *congestion* as you state (loads < 40% would be more common). This of course means that AQM is rarely used in the Internet backbone.

Since most of the benefits seem to come from the avoidance of unnecessary timeouts at connection establishment, I wondered what effect reducing the 3 second timeout on a dropped SYN or SYN-ACK (say, to 1.5 seconds) would have. Has this idea been proposed or studied at all?
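To make the timeout question above concrete, here is a rough back-of-the-envelope sketch (not taken from the paper). It assumes each handshake packet is dropped independently with probability p_drop, that the retransmission timer doubles after every loss, and that the connection never gives up within max_retries; the function name and all parameter values are hypothetical.

```python
def expected_handshake_penalty(p_drop, initial_rto=3.0, max_retries=5):
    """Expected extra delay (seconds) caused by dropped handshake packets.

    Assumptions (illustrative only): independent drops with probability
    p_drop, timer doubling after each loss, and the rare case where every
    attempt is lost is ignored.
    """
    penalty = 0.0
    waited = 0.0  # total time spent in timeouts before attempt k is sent
    for k in range(max_retries + 1):
        # k earlier attempts were dropped and attempt k gets through
        penalty += (p_drop ** k) * (1.0 - p_drop) * waited
        waited += initial_rto * (2 ** k)
    return penalty


if __name__ == "__main__":
    for rto in (3.0, 1.5):
        print(rto, expected_handshake_penalty(0.1, initial_rto=rto))
```

Under these assumptions the expected penalty scales linearly with the initial timeout, so halving it to 1.5 seconds halves the penalty but does not remove it, whereas marking instead of dropping removes it altogether.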

The assumptions built into the queueing model of 5.2.2 seem pretty open to question. It was nice that you were able to show simulation to back up the model, but I wondered how much credence to place in the model. Since your small queue depths seem to line up somewhat with the results from your reference [8], perhaps you should make that connection more explicitly.

I think the claim about poor incremental deployment for TCP Vegas is overstated. I don't have the paper handy but I thought they actually showed reasonable incremental deployment.

There are a few typos. Notably:

penultimate para of Section 3.2.1:

"prevent network stability" should be "protect network stability"

Section 5.3.2

"the paste" should be "pace"

------------------Review #4---------------------

Rating: Top 25% but not top 10%

Qualification: I know a lot about this area

Summary: The main contribution of this paper is to look at using ECN beyond its use for data packets, to also use it on SYN-ACKs for TCP, so as to prevent their loss during congestion, even when intelligent Active Queue Management mechanisms are used. By avoiding the loss of SYN-ACKs, they avoid timeouts and exponential backoffs. The ECN specification (RFC 3168) avoided using ECN on TCP control packets (SYNs and SYN-ACKs) because of security concerns. This paper makes the case that it is a good idea to use ECN at least for the SYN-ACKs, although not on SYNs, thereby avoiding exposure to SYN attacks. Their claim is that because congestion is in the server-to-client direction and there are far fewer servers, incremental deployment is worthwhile and feasible (although I am not sure).

The benefit would primarily be for short flows because of the higher proportion of SYNs/SYN-ACKs. They have covered the details of reacting to the ECN/CE on the SYN-ACKs properly (don't react, send the subsequent packet anyway).

The paper focuses on web response time as the primary metric. This may be a somewhat limited view.

Overall, a paper with some reasonably thorough simulations. There are deficiencies in the metrics they examine. But the basic idea of adding ECN on SYN-ACKs is a relatively simple one. It is worthwhile to have a short paper on it, and for the standards community to seriously consider incorporating the suggestion. However, I don't see it as a significant and substantially novel contribution worthy of a Sigcomm paper.

Strength: I agree with the paper that it would really be nice to not drop TCP control packets, where possible, to improve the performance of TCP. They do a reasonably thorough analysis to demonstrate that latency is improved when TCP SYN-ACKs are not dropped by routers, especially when they implement AQM schemes. They also do a thorough job of examining what happens when a subset of the end-systems implement ECN+, and show that there is a good incentive for them to implement it.

One more interesting aspect of the paper is its examination of the utility of the various AQM mechanisms in the context of TCP. They model the queueing behavior with a relatively simple G/M/1 model, taking into account that TCP's window size limits the bursty arrival process. As such, the probability that the size of the queue at the bottleneck rises above a reasonable level even with persistently congested links is pretty small. As such, they conclude that the usefulness of AQM mechanisms beyond simple threshold based schemes and RED is not that great.
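The queue-tail argument can be illustrated with the textbook G/M/1 result: an arriving packet finds at least n packets in the system with probability sigma^n, where sigma is the root in (0, 1) of sigma = A*(mu (1 - sigma)) and A* is the Laplace-Stieltjes transform of the interarrival distribution. The sketch below is a generic illustration, not the paper's model; the arrival distributions, the 0.9 load, and the function names are assumptions chosen only for the example.

```python
import math


def gm1_sigma(lst_interarrival, mu, tol=1e-12, max_iter=100000):
    """Solve sigma = A*(mu * (1 - sigma)) by fixed-point iteration from 0.

    lst_interarrival(s) is the Laplace-Stieltjes transform of the
    interarrival time; mu is the exponential service rate.
    """
    sigma = 0.0
    for _ in range(max_iter):
        nxt = lst_interarrival(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def prob_queue_at_least(n, sigma):
    """Probability that an arrival finds at least n packets in the system."""
    return sigma ** n


lam, mu = 0.9, 1.0  # hypothetical arrival and service rates (load 0.9)
sigma_poisson = gm1_sigma(lambda s: lam / (lam + s), mu)      # M/M/1 case
sigma_smooth = gm1_sigma(lambda s: math.exp(-s / lam), mu)    # D/M/1 case

if __name__ == "__main__":
    for name, sigma in (("Poisson", sigma_poisson), ("deterministic", sigma_smooth)):
        print(name, sigma, prob_queue_at_least(20, sigma))
```

With Poisson arrivals sigma equals the load; with smoother (here deterministic) arrivals sigma is smaller, so the tail decays faster. This is consistent with the observation that a window-limited, less bursty arrival process keeps large queues unlikely even at high load.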

The paper makes a good case for adding ECN to SYN-ACKs. Going from no ECN to ECN there is an order of magnitude drop in the response time for the workloads they consider (including those used in the testbed). Then, going from ECN to ECN+, there is a further order of magnitude drop in response time, which is worthwhile. If there are remaining security considerations, the paper makes a good case for seeking a separate solution for them, while allowing ECN to be used for SYN-ACKs also.

Weakness: The paper deals with a relatively simple solution to a problem that probably deserves some attention. It does however only apply to client-server environments, where the servers get updated to enable ECN for SYN-ACKs. In terms of incremental deployment however, I suspect it is not that easy anymore. There are a large enough number of servers out there, that the deployment of additional functionality is just as hard to do on servers as it used to be to upgrade clients in the past.

They perform a bunch of testbed experiments, which I didn't find to be that impressive. There wasn't substantial new information to be gleaned from the testbed experiments that I didn't learn from the simulations already.

Comments: The paper suggests that, when using ECN, RED* (which marks all packets) is supposed to be better than RED (which only marks packets when the avg. queue is below maxth). However, as observed in Figure 3, with just ECN (setting aside the issue of SYN-ACKs being dropped or not), the right decision is to drop packets when the avg. queue is above maxth. Thus, ECN with RED appears to be the right choice, rather than ECN with RED* as the paper suggests. However, I can understand the desired change when using ECN+.
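The distinction between RED and RED* discussed above can be written down as a small decision function. This is a simplified sketch, not the authors' or ns-2's implementation: it omits RED's averaging and count-based probability adjustment, assumes that packets that are not ECN-capable are dropped wherever an ECN-capable packet would be marked, and assumes both endpoints have negotiated ECN. All names and thresholds are hypothetical.

```python
import random


def is_ect(packet_kind, ecn_plus):
    """Whether a packet is sent as ECN-capable (ECN already negotiated)."""
    if packet_kind == "data":
        return True
    if packet_kind == "syn-ack":
        return ecn_plus  # only ECN+ makes SYN-ACKs ECN-capable
    return False  # SYNs are never ECN-capable in either scheme


def red_action(avg_queue, ect, min_th, max_th, max_p,
               mark_above_max, rng=random.random):
    """Return "enqueue", "mark" or "drop" for one arriving packet.

    mark_above_max=False models RED (drop everything above max_th);
    mark_above_max=True models RED* (keep marking above max_th).
    """
    if avg_queue < min_th:
        return "enqueue"
    if avg_queue >= max_th:
        return "mark" if (ect and mark_above_max) else "drop"
    # Between the thresholds: probability grows linearly up to max_p
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng() < p:
        return "mark" if ect else "drop"
    return "enqueue"


if __name__ == "__main__":
    for ecn_plus in (False, True):
        for red_star in (False, True):
            ect = is_ect("syn-ack", ecn_plus)
            print(ecn_plus, red_star, red_action(20, ect, 5, 15, 0.1, red_star))
```

In this sketch a SYN-ACK arriving when the average queue is above maxth survives only when ECN+ and RED* are combined, which is one way to read why the preferred choice changes once ECN+ is used.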

------------------Review #5---------------------

Rating: Top 50% but not top 25%

Qualification: I know the material, but am not an expert

Summary: The authors propose a variant of ECN, where the SYN+ACK packet can be ECN-marked, to avoid it being dropped (and hence slowing connection establishment). They then demonstrate that this scheme is superior to ECN when coupled with a variety of queue management disciplines, such as RED and REM.

Strength: The paper combines formal models, simulations, and an experimental testbed.

Weakness: I think some of the claims are too sweeping. Beyond that, I think that other simulations and measurements are needed.

4.1: The behavior of the Linux kernel as a router is hardly indicative of the behavior of ISP routers.

4.2: you really should have a diagram of the topology; words are ambiguous. For example, you say there is a "web-client and a web-server pool". Is the client side a pool, too, or just the server side? (Yes, I read the next sentence.)

I was disturbed that you claim repeatedly that ECN+, by only marking SYN+ACK packets instead of SYN packets, helps against DoS attacks. Your choice does indeed prevent more serious SYN floods. However, it still causes problems for DDoS attacks, since marked packets of any sort (including, of course, any ECN-marked packets) are more troublesome, since routers are less likely to drop them.

Your simulations should have been run against a background of very long-lived connections -- most of the backbone traffic these days consists of transfers of large (i.e., mp3 or video) files.

------------------Review #6---------------------

Rating: Top 10% but not top 5%

Qualification: I know the material, but am not an expert

Summary: This paper investigates the utility of Explicit Congestion Notification (ECN). It suggests using ECN on the SYN,ACK of the TCP handshake (but not the SYN!) and shows that this can very significantly improve performance for short flows characteristic of web traffic. Here "using ECN" means that the router *DOES NOT DROP* SYN,ACKs when first congested, but rather marks them using ECN. Furthermore, the client immediately ACKs a SYN,ACK and sends the first data packet (e.g., HTTP request) despite an ECN bit. The paper argues that this will not cause instability. This augmented ECN is called ECN+.

The paper then does extensive, large-scale ns-2 simulations and testbed experiments to compare ECN and ECN+ with a variety of AQM mechanisms - RED, REM, and PI; there are also some analytic results. It concludes that ECN+ is a very good thing indeed. Finally, it argues that implementing ECN+ on a server provides immediate performance improvements to those clients that also implement it (assuming that routers implement ECN). Hence, it argues for successful incremental deployability.

Strength:

1. It provides comprehensive, persuasive support in theory, simulation, and experiment for the ECN+ modification of ECN.

2. It is a clean, thorough piece of work that has the potential of significant impact on the real Internet. Its consideration of incremental deployability is crucial.

3. The graphs are actually clear, even in black and white.

Weakness:

1. Minor problems with English.

2. I think the author cheated on font size.

Comments:

o The first sentence of the Abstract violates English grammar something awful.

o Incorrect words/phrases/constructions: "primarily security reasons indeed prevent", "is capable to", "origins ... are versatile...", "particularity for", "Not only that we show...", "non-web-only", "that not only that", "...not used... in neither ... nor...", "...distinguish between ... from ...", "are also possible to generate", "well expected", "Not only that...", "...Rs is the reminder", "more frequently congestion indications", "keep the paste with", "a significant performance differences".

o A bunch of missing articles, usually "the".

o I like the concept of TCP admission control.

o "We enable server to use ECN..." should be "to set ECN...".

o Need to note: Fig 1 and discussion assume congestion is server -> user only.

o Fig 1 says "TCP Echo" bit, but text says "ECN Echo" bit.

o Sect 3.1: it is not true that AIMD causes fair bandwidth sharing, famously.

o Sect 5 talks about "supremacy [should be superiority] of ECN+ over other AQM mechanisms". This seems confused.

o It was not obvious from Fig 13 that avg client response time improved by nearly an order of magnitude.