Review for Low-Rate TCP-Targeted Denial of Service Attacks (The Shrew vs. the Mice and Elephants)



Review #1

1. What is your familiarity with this area?:
I am well-versed in this area, but it is not my current specialty

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective
networking conferences?:
Top 5% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
Yes. The authors show that TCP flows can be induced to synchronize by a low
average-rate square-wave flow.

4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
Yes. It should cause one to rethink TCP's congestion control, which relies
on congestion recovery.

5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
It is an interesting twist of a very traditional area.

6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
It shows that TCP flows can be shut off due to TCP's timeout mechanism. The
authors show that this is the case even when there is a large number of TCP
flows, when these flows have varying RTTs, when the attack flow is distorted
by interfering traffic, when RED is in use, and when different variants of
TCP ranging from Reno and NewReno to SACK are in use. The authors describe
the attack mechanism analytically and present empirical data to back up the
analysis.

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
It will give ideas to hackers everywhere.

8. Detailed comments on the paper (primarily for the authors):
I am surprised that RED fails to counter the attack. I understand why it
fails to throttle the DoS stream. However, it should have prevented the
synchronized window-drop of existing TCP flows due to its "Early" detection.
Unless the DoS stream manages to take RED from below min_th to max_th in a
step function?

-----
(*) The reviewer correctly points out that the attacker needs to quickly fill
the buffer from min_th to max_th (ideally in a step function) in order to
make the attack effective against RED. However, since this is not possible in
reality, RED manages to only (marginally) randomize TCP flows, as shown in
Figure 9(b). We further address this issue in the text of Section 7.1:
``Thus, while RED and RED-PD's randomization has lessened the severity of
the null, the DoS attack remains effective overall.''
-----
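The averaging that blunts RED's "early" detection against short bursts can be seen in a minimal sketch. This is our illustration, not the paper's simulation setup: the queue weight w_q = 0.002 is the commonly recommended default, and min_th, max_th, and the capacity are assumed values.

```python
# RED keeps an EWMA of the queue length and drops probabilistically only
# while that average sits between min_th and max_th.  With the commonly
# recommended weight w_q = 0.002, a short burst drives the *instantaneous*
# queue to capacity long before the average catches up.

def red_avg(samples, w_q=0.002, avg=0.0):
    """RED's average queue estimate after seeing the given queue samples."""
    for q in samples:
        avg = (1 - w_q) * avg + w_q * q
    return avg

min_th, max_th, capacity = 50, 150, 200   # packets (assumed values)

# Steady state: the instantaneous queue hovers near min_th.
avg = red_avg([min_th] * 5000)            # converges to ~50

# One attack burst fills the queue for 150 packet arrivals, roughly one
# pulse of the square wave.
avg_after = red_avg([capacity] * 150, avg=avg)

print(f"avg before burst: {avg:.1f}")                            # 50.0
print(f"avg after burst:  {avg_after:.1f} (max_th = {max_th})")  # 88.9
# The average never reaches max_th, so RED's early detection cannot ramp
# up its drop rate before tail drops at the full buffer have already
# caused the synchronized losses.
```

This is why the burst would need to look like a step function to RED: only a sustained overload moves the average past max_th.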

Would a congestion control algorithm that does not operate so near the cliff
of the load/delay power curve be the appropriate counter to this attack?

-----
(*) It is possible that a congestion control algorithm that would operate
closer to the knee of the load/delay power curve would decrease the control
time-scales for end-users and would give more freedom to a
counter-DoS technique to detect malicious flows. However, to make this
strategy successful, all end-hosts in the Internet would have to use this
algorithm. Also, it would be problematic to incrementally deploy such an
algorithm due to unfairness problems of ``knee'' vs. ``cliff'' algorithms
operating in the same network. In any case, we find the suggestion
interesting, yet beyond the scope of this paper.
-----



Review #2

1. What is your familiarity with this area?:
I am well-versed in this area, but it is not my current specialty

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 10% but not top 5% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
I believe so. Security of the Internet is obviously of long-standing
concern, and that concern has been heightened recently. Understanding the
vulnerabilities of its most frequently used protocol seems important.

4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
Yes, I believe the results are useful and interesting. It suggests that a
low-rate attack on TCP can be effective and difficult to detect. There are
countermeasures described in the paper (and others one could imagine) which
could be useful to others. I'm not sure it opens up a new area, but it does
seem to contribute to an emerging one.

5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
Relatively new. Perhaps not "brand new", but new within the last few years
at least.

6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below. (1-3
sentences):
It points out vulnerabilities in TCP. These are presumably known already
(at least in the abstract), but the authors have constructed details.

It points out that these vulnerabilities can be exploited in a way that is
difficult to detect.

It suggests some countermeasures.

I believe these reasons dominate #7.

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
The biggest issue I have is whether there would in fact be ways of detecting
the attack profile. This doesn't bother me profoundly, but with more room
the authors could certainly go into this in more detail.

-----
(*) We have studied two types of mechanisms: a router-based and an
end-host-based solution. We agree that the development of prevention
mechanisms that detect malicious low-rate flows remains an important area
for future research, but find it to be beyond the scope of this work.
-----

8. Detailed comments on the paper (primarily for the authors):

I don't have much to say regarding detailed comments. After reading this
paper I was wondering whether modest modifications of the current schemes
(e.g., RED variants using a shorter-term detector, or something that
inspects flows in other ways) would solve the problem. Or whether some form
of enforced pacing at the network edge(s) would solve the problem. The fact
that these questions arise is not particularly a negative reflection on the
paper.

-----
(*) We discuss the use of algorithms that use very short time scales to
detect malicious flows in Section 7.1. We believe that such solutions are
suitable for homogeneous-RTT, high-speed network environments and are not
generally applicable in the Internet due to RTT heterogeneity. We improve
the text in Section 7.1 to address this issue more clearly.
-----




Review #3

1. What is your familiarity with this area?:
This is my area

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 10% but not top 5% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
Yes, frequency transients in network protocols represent an interesting new
source of vulnerability.

4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
Yes, the authors successfully demonstrate that it is possible to leverage
the frequency response of TCP's long-term congestion control to deny service
with minimal effort on the part of the attacker (making it harder to detect
an attack and easier to mount).

5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
While DoS is not a new area, this kind of attack is new. I suspect that this
paper will spur additional work.

6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
Vulnerabilities caused by frequency-response dynamics represent an
interesting new class of vulnerabilities. This paper is a well-written and
easy-to-follow demonstration of such vulnerabilities in TCP. It is
particularly nice how the authors are able to _induce_ synchronization to
support their attack.

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
Not sure; I like this paper. It is not super flashy, and if you aren't
convinced that this style of attack represents a distinct threat (i.e., has
advantages over just flooding) then it is less interesting. An obvious
criticism along these lines is that network worms allow an attacker to
compromise thousands of machines, and therefore an attacker can mount very
large attacks and does not require any stealth.

8. Detailed comments on the paper (primarily for the authors):
Minor comments:
1) While it's clear that you guys understand fast-retransmit, you should be
clear, up front, that you need to trigger enough losses (or have small
enough windows) that an RTO is caused. This is confusing on the first page.

----
(*) We have modified the text on page 1 to describe this issue more clearly.
----


2) There are two reasons why the attack is effective:
a) You force RTOs to be synchronized across flows
b) It takes each flow some time to regain its fair share of the network
bandwidth
It would be interesting to know how much each of these issues contributes to
the overall effectiveness of the attacks and which defenses help each aspect.
For example, the randomized RTO clearly only impacts a). Similarly, if we
deployed XCP, that might address point b) effectively, but would still allow
for synchronized timeouts.

-----
(*) The reviewer is right in pointing out that it takes some time for
TCP flows to regain their fair share of the network bandwidth and that TCP
flows are most vulnerable when their window size is small. This is exactly
what Figure 12(b) of the paper shows, ``which indicates that TCP is the most
vulnerable to DoS in the 1 - 1.2 sec time-scale region''. We also state that
``During this period, TCP flows are in slow-start and have small window
sizes such that a smaller number of packet losses are needed to force them
to enter retransmission time-out''. It is possible that XCP can address
point b) effectively, yet exploring this is beyond the scope of this work.
-----

3) RED-PD seems like a bit of a straw man given the short time scale of your
attack (anything with a weighted moving average will have this property).
Why wouldn't you consider an algorithm that operates instantaneously like
fair-queuing? (assume that source address validation is enforced).

-----
(*) We use the RED-PD algorithm because we believe that relatively
long-time-scale measurements are required to determine with confidence that
a flow is transmitting at an excessively high rate and should be dropped. On
the other hand, we believe that fair queuing (with dynamic buffer limiting
[5]) is suitable only for a homogeneous-RTT network environment, and not for
the Internet, due to both RTT heterogeneity and scalability constraints. We
improve the text in Section 7.1 to address this issue more clearly.
-----



Review #4

1. What is your familiarity with this area?:
I am well-versed in this area, but it is not my current specialty

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 10% but not top 5% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
It is a new style of DOS attack.

4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
A DOS attack that is conservative in the number of packets
it sends to achieve its result is certainly of interest.

5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
I don't think it creates a new area, but it is certainly innovative.

6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
* New way to perform DOS
* good simulations with real traffic tests on the Internet

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
minor issues discussed below

8. Detailed comments on the paper (primarily for the authors):
My comments are mostly about edge issues that the paper doesn't
handle well.

Probably the biggest question is why only focus on the basic
RTO algorithm, when clearly, at some point, flows are affected
by Karn's algorithm (which comes into play on any transmission
after a timeout)? See Karn's SIGCOMM '87 paper. I think the
result will be similar, but Karn's algorithm means that successive
transmissions, both affected by losses, will take even longer than
predicted here.

-----
(*) The authors are aware of Karn's algorithm and the possibility that
successive transmissions, both affected by losses, will take even longer
than predicted in the paper. However, as stated in Section 3, while an
attack that uses exponential backoff is potentially effective against a
single flow, a DoS attack on TCP aggregates in which flows continually
arrive and depart requires periodic (vs. exponentially spaced) outages at
the minRTO time scale.
-----

Another issue that struck me is whether the periodicity of the DOS
attacks could be discovered. They've got a strong pulse and so,
I suspect, a frequency analysis (perhaps as simple as an FFT or
perhaps a Lomb approach) would discover them. The DOS
attack itself might require substantial randomization to stay
undetected. See the WISE '02 paper on traffic analysis.

-----
(*) While it is possible that the periodicity of the DoS attacks could be
discovered by frequency-analysis techniques, there are two important issues:
1) Many data sources (audio, video) can have strongly periodic behavior, so
one should be careful in declaring periodic flows malicious. 2) The DoS
attacker might use simple randomization techniques (e.g., T=uniform(1,1.5))
that would slightly decrease the effectiveness of the attack, but would also
blur the strong energy component in the frequency domain to elude detection.
However, we believe that the above discussion is beyond the scope of the
paper.
-----
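Both points above can be made concrete with a small sketch of the detector the reviewer suggests. This is our illustration, not code from the paper: the sampling rate, burst length, and observation window are assumed, and the "detector" is just the DFT magnitude at the pulse frequency.

```python
# Build the on/off transmission pattern of a square-wave attack and
# measure the spectral energy at the pulse frequency.  A fixed period
# T = 1 s produces a strong line at 1 Hz; drawing each period from
# uniform(1, 1.5) smears that line out, as claimed in the response.

import cmath
import random

def spectral_mag(signal, freq, fs):
    """Magnitude of the DFT of `signal` at frequency `freq` (Hz)."""
    w = -2j * cmath.pi * freq / fs
    return abs(sum(x * cmath.exp(w * k)
                   for k, x in enumerate(signal))) / len(signal)

def attack_signal(next_period, burst=0.1, duration=60.0, fs=100):
    """0/1 samples: `burst` seconds 'on' at each period boundary."""
    sig, t = [0] * int(duration * fs), 0.0
    while t < duration:
        for k in range(int(t * fs), min(int((t + burst) * fs), len(sig))):
            sig[k] = 1
        t += next_period()
    return sig

random.seed(1)
fs = 100
periodic = attack_signal(lambda: 1.0, fs=fs)                       # T = 1 s
jittered = attack_signal(lambda: random.uniform(1.0, 1.5), fs=fs)  # randomized T

peak_periodic = spectral_mag(periodic, 1.0, fs)
peak_jittered = spectral_mag(jittered, 1.0, fs)
print(f"1 Hz energy, periodic T:   {peak_periodic:.4f}")
print(f"1 Hz energy, randomized T: {peak_jittered:.4f}")
# The periodic attack stands out by a large factor; the randomized one
# hides near the noise floor, at the cost of some attack effectiveness.
```

The same magnitude would also flag a strictly periodic audio or video source, which is point 1) above: a spectral line alone does not prove malice.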

Typo in para two from the end of 5.1: 2/minRTO is, I believe, supposed
to be minRTO/2.

-----
(*) The reviewer correctly points out that the time scale of interest is
T=minRTO/2. However, in 5.1 we refer to the frequency f=1/T. Thus, 2/minRTO
is not a typo.
-----

Minor point -- only some shrews are venomous and the amount
of venom in even the venomous species is very mild.



Review #5

1. What is your familiarity with this area?:
I am well-versed in this area, but it is not my current specialty

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 10% but not top 5% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
Yes, it's the robustness of protocols to denial-of-service attacks, this time
studying congestion control.

4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
Yes. Who'd have thunk it? The paper shows that real TCP variants are
vulnerable to a fairly simple interference attack that significantly degrades
their throughput. This is unexpected, and it will likely cause others to
rethink parts of TCP and other transports.

5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
It combines two traditional areas, DOS and congestion control, in an unusual
way.

6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
It's unexpected, has something perhaps fundamental to say about congestion
control, and it matters in terms of the real Internet.

A nice mix of analysis, simulation and experimentation.

It should be accepted.

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
Some might say that the paper mostly presents vulnerabilities and not
solutions, such that it is irresponsible to publish at this stage. I disagree.

8. Detailed comments on the paper (primarily for the authors):
Nice job! My comments are mainly focused on improving your paper, both
presentation and content.

This paper cries out for SACK from the start, and at least NewReno (as the
dominant implementation according to TBIT surveys I believe?) where you have
used Reno. Almost as soon as the problem was presented I began to wonder to
what extent it would be ameliorated by SACK. And yet SACK does not really
emerge other than two paragraphs in 5.4! Your Internet tests also lack SACK!
The latter should be required as part of revision.

-----
(*) TCP SACK was used not only in Section 5.4, but also in Section 7.1,
where it was used together with RED and RED-PD. Beyond this, \emph{all} the
Internet experiments were done with TCP SACK. At first we thought that the
version used in the experiments was NewReno. However, a later inspection
showed that the Linux 2.4.18 stack that we used actually enables TCP SACK
by default.
-----

How much packet loss is required to trigger a timeout for the different TCP
versions you consider? You should state this explicitly, since while the
attack causes loss it does not cause total packet loss and this begs the
question of how much loss is enough -- especially NewReno or SACK versus Reno.

-----
(*) Exactly how many packets need to be dropped for different versions of
TCP is not a trivial question and depends on the window size. A detailed
analysis of this issue is given in reference [7]. We improve the discussion
of this issue in Section 5.4 and explicitly point the reader to reference
[7].
-----
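A back-of-the-envelope version of the window-size dependence can be sketched as follows. This is a crude illustration of ours, not the detailed per-variant analysis of reference [7]: it captures only the necessary condition that fast retransmit needs at least three duplicate ACKs.

```python
def forces_timeout(window, losses, dupack_thresh=3):
    """Crude necessary condition for avoiding an RTO: after `losses`
    packets from a `window`-packet flight are dropped, at most
    (window - losses) packets survive to generate duplicate ACKs, and
    fast retransmit needs `dupack_thresh` of them.  (An illustration
    only; see reference [7] for the real analysis.)"""
    surviving = max(window - losses, 0)
    return surviving < dupack_thresh

# A slow-start flow with a 4-packet window times out after 2 losses...
assert forces_timeout(window=4, losses=2)
# ...while a 20-packet window can lose 5 packets and still see enough
# duplicate ACKs.  Whether it then *recovers* all losses without an RTO
# depends on the variant (Reno usually will not; SACK often can).
assert not forces_timeout(window=20, losses=5)
```

This is exactly why small-window (slow-start) flows are the easiest targets, and why the Reno-vs-NewReno-vs-SACK distinction matters mainly for large windows.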

For some reason I became confused about whether your attack was working by
causing loss in the forward or reverse direction, even though the latter
doesn't make much sense. (It makes some: if you can lose all the ACKs then it
doesn't matter whether the data arrived. But you need to lose almost all of
them.) Perhaps this is because your attack is unlikely to have an impact on
Web servers unless it can cause its effects via the reverse path. That is,
this attack is unlikely to shut down content providers; it may be a nuisance,
but won't disrupt the Web. True?

-----
(*) The reviewer is right in pointing out that the attack could succeed if
the reverse path were attacked, but only if almost all ACKs from a window of
data were lost, which we believe is not impossible, especially for short-RTT
flows. We have treated the impact of DoS attacks on web traffic in two
generic scenarios in Section 5.3. Thus, while the reviewer points out an
important yet specific scenario of reverse traffic, we believe that it is
beyond the scope of the paper.
-----

Be clear where you use simulation and what you're simulating with. It seems
likely that the effectiveness of your attack depends on the quirks of TCP
handling in different cases. So we need to know exactly what ns or real TCP
you're reporting on if anyone is to have a chance of reproducing this. You
might even nmap the real boxes. And it would be good to test with more of
them, over more paths. Can't you use something like Planetlab or RON/Emulab
for vantage points?

-----
(*) We will post our ns-2 code on the web so that anyone should be able to
reproduce the results. Also, we will provide the software that we used to
perform DoS attacks, but only to colleagues from the research community, in
order to prevent possible malicious use. In addition, in the revised version
of the paper, we reveal the network maps of the sites where we performed the
experiments. These were not provided in the first version due to the
double-blind reviewing process.
-----

Can't read Figure 12, too small.

-----
(*) We have increased the font on Figure 12.
-----

I'd like to know in places like Figure 10 what the bandwidth of the attack
stream is. It is often increasing or decreasing as we move along the axis.
The reason for putting it there is that it wouldn't be surprising if your
DOS stream of 50% bandwidth were degrading the throughput by 50% ... BTW,
are you accounting for the stolen bandwidth as part of your normalization?

-----
(*) We add another curve in Figure 10 that shows the bandwidth of the attack
stream. Another such curve is shown in Figure 15(b). Yes, we have accounted
for the stolen bandwidth as part of normalization.
-----

Your simulations didn't expose one of the parameters I expected to be key:
the bandwidth-delay product measured in packets. If this is small then the
flow seems more vulnerable to disruption. Is this true?

-----
(*) Yes, this is true. The smaller the bandwidth-delay product, the easier
it is to throttle down TCP flows. See also the answer to reviewer #6.
-----

It would be great if you went further in your paper on how to design TCP
variants that are more robust to this attack. I'm not sure I believe the
tradeoff is as fundamental and we're as hosed as you seem to imply. First,
randomization seems good, and why not do it over a considerably larger
interval than you show (1 to 1.2 seconds)? Second, there may be very
specific SACK implementation choices that are causing problems here.
How many lost segments in a window will SACK retransmit and what is causing
the limit? In theory I don't see why a SACK-like transport needs to take a
timeout even if every second packet were being lost as long as the window is
above some minimum. You could be more radical in speculating as to what scope
of redesign is needed to address this or the limits of robustness to this
attack. BTW, you need a name, as you don't seem to use Shrew in the text.
I'd note that you seem to have the ultimate in TCP-unfriendly transports.

-----
(*)
- We agree that it would be interesting to analyze how to design TCP variants
that are more robust to this attack, yet we find it to be beyond the scope of
the paper.

- The reasons not to enlarge parameter b are provided in Section 7, where
we say that ``increasing b is not a good option for low aggregation
regimes since the throughput can become too low according to Equation (8).
Moreover, excessively large b could significantly degrade the throughput of
short-lived HTTP flows, which form the majority of today's traffic.''

- The TCP SACK question is discussed above.
-----

Please include more information about RED-PD. Isn't your conclusion
predicated on a particular RED-PD parameter setting? Could I easily make
RED-PD do better?

-----
(*) We have included more information on RED-PD in Section 7.1. We believe
that the problem cannot be solved by tuning parameters in RED-PD and that
the problem is in a tradeoff induced by a mismatch of defense and attack
timescales, as stated in the conclusions.
-----



Review #6

1. What is your familiarity with this area?:
This is my area

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 25% but not top 10% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):
Yes, protocol vulnerabilities are an important problem; solutions are more
helpful, though, and this paper only identifies the problem without
addressing a solution.


4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):
The paper illustrates a practical vulnerability of TCP to a low (average)
rate of attack packets, provided those packets are precisely timed. It also
argues that there are limited ways for TCP implementers to respond to this
problem. I would guess the paper will be referenced in the future.


5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):
not a new area, although it doesn't reference the earlier work in the area (!)


6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
As a high level idea, the paper is a clear accept. As a paper submitted to
HotNets, I'd take it in a second.


7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
The paper's execution is awful. I usually argue for accepting papers
despite their flaws, but this one is the exception that proves the rule.

I'm also not sure about the paper's long term impact. Of course, there are
fundamental limitations of cooperative protocols to malicious participants.
This isn't the first (or sadly, the last) to point this out. But these kinds
of attacks are easy to detect, despite what the authors claim.



8. Detailed comments on the paper (primarily for the authors):
I was rooting for this paper from the title onward, and it pretty much went
straight downhill -- the more I read, the less I liked it. It was a 5 after
the abstract, a 4 after the intro, and a 3 by the first proof.

First, the paper has a hopelessly naive model of DoS detection. The state of
the art in the commercial world is to look for anomalous traffic patterns,
not simply whether a protocol is sending too fast over the long term. This
is well beyond what RED-PD does, and could be found by anyone reading any of
the trade rags. This particular attack even has its own term, the
so-called "pulsing zombie".

Yet I think the paper has some insight, even if the spin is all wrong --
what's the minimal number of packets a (mostly TCP conforming) flow has to
send to disrupt its peer flows? The authors don't answer this question! But
they come close enough that I can guess the answer.

So at this point I'm still in favor of publishing -- I can get something out
of a paper with the wrong spin. But the paper compounds this by being
embarrassingly bad at execution. For example:

the model in Section 3.2 is described as a "result", but it is no such
thing. The formula is never motivated in detail, and seems clearly wrong in
several key respects.

For example, the formula doesn't include the RTT, but suppose I have a tiny
RTT (e.g., 1 usec)? Then if the DoS attacker floods my bottleneck every
three seconds (say), I'll lose out for minRTO (1 second), and then have two
seconds to ramp up to full bandwidth. If my RTT is large, then I won't be
able to send any meaningful traffic before the DoS attack kicks in.

-----
(*) The reviewer correctly points out that the formula doesn't include RTT.
We improve the presentation of Section 3.2 and add appropriate explanation.
-----

This is clarified several paragraphs down, where it becomes clear that when
the authors say "result" they mean "model" -- i.e., they have a formula that
seems to describe their simulated data, but they have no deep insight as to
why.

-----
(*) The formula matches the simulation results well because TCP is able
to quickly utilize the available bandwidth after exiting the timeout phase,
which was one of the model's assumptions. Thus, the throughput losses due
to the slow-start phase were not significant in simulations.
-----
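For concreteness, the sawtooth shape such a model produces can be reproduced with a two-line formula. This is our paraphrase under the assumptions just stated (an outage every T seconds forces an RTO, and the flow instantly regains full rate after the next RTO expiry), not the paper's exact Section 3.2 equation; note it indeed contains no RTT term, which is the reviewer's point.

```python
import math

def normalized_throughput(T, minRTO=1.0):
    """Model sketch: an outage every T seconds forces a retransmission
    timeout; the flow is silent until the next RTO expiry and then (by
    assumption) instantly uses the full link until the next outage."""
    n = math.ceil(minRTO / T)          # outages per silenced interval
    return (n * T - minRTO) / (n * T)  # fraction of time spent sending

assert normalized_throughput(1.0) == 0.0   # deepest null at T = minRTO
assert normalized_throughput(0.5) == 0.0   # second null at minRTO / 2
print(round(normalized_throughput(1.5), 2))  # 0.33: longer periods give throughput back
```

Because recovery is assumed instantaneous, slow-start and RTT drop out of the model, which is consistent with the response above: the formula fits the simulations only to the extent that post-timeout ramp-up is fast.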

The second "result" is similar -- not a result, just an empirical observation
(of simulations!). The paper doesn't show that the DoS attack will leave
other flows not meeting the conditions unaffected, despite what's said.
(In particular, if a flow has SRTT + RTTVAR > minRTO, it might still be
captured by the DoS attack.)

-----
(*) We treat the case of heterogeneous values of retransmission time-outs
in Section 7. This is indicated in the footnote on page 3.
-----

OK, what about the simulation results? Well, there they also have problems.
They arbitrarily pick the bottleneck queue to be 10 times the
delay-bandwidth product, when one times is more typical. Why? The authors
never study the sensitivity to this parameter.

-----
(*) The bottleneck queue was not set to 10 times the delay-bandwidth
product: the RTTs (without queuing) of the flows used in the simulation
varied from 20ms up to 460ms (based on a representative Internet measurement
from [18]). Thus, the queue size varied from 5 times the delay-bandwidth
product (for 20ms flows) down to 0.4 times (for the longest-RTT flow).
In any case, setting the queue size to a smaller value, as indicated by the
reviewer, would only improve the effectiveness of the attack.
-----

Rather, they assume that the DoS attack lasts longer than the RTT +
buffering/link capacity, despite saying later that the attacker needs to
know nothing about the underlying characteristics of the bottleneck! And
this is central to the technique -- if you don't know the queue size and the
link capacity, it won't work.

-----
(*) The reviewer correctly points out that the knowledge of buffering/link
capacity is important for the effectiveness of the attack. In the paper, we
state that ``an attacker can use a number of existing techniques to estimate
the bottleneck link capacity [3,4,16,19,27], bottleneck-bandwidth queue size
[21] and secondary bottleneck rate [26]''. On the other hand, if the attacker
does not know these parameters (and even cannot send at the bottleneck rate),
he/she can still make successful attacks as shown in Section 5.2.3.
-----

The authors also state early on that the attacker only needs to fill the
bottleneck with attack packets "aggregated with existing traffic". But
all the experiments are designed to wipe out the bottleneck independent of
what else is using the pipe. So which is it? Alas, the technique really
does need to wipe out the pipe. For example, they don't consider the case
where their initial attack packets cause some flows to start backing off;
the only way they can ensure they nail everyone is to blast enough packets
to fill the buffers, plus consume the link capacity, for the entire maximum
possible RTT.

-----
(*) The above issue is addressed in Section 5.2.3 where we explore the
effectiveness of the attack to heterogeneous-RTT flows. There, we show that
longer RTT flows play the role of background traffic that enables
lower-than-bottleneck peak DoS rates to cause outages.
-----

As yet another example, they claim that the two-phase attack (fill the
queue, then reduce to fill the pipe) is experimentally equivalent to
blasting at full rate. But the test they run to show this is nonsense -- it
compares the two algorithms using the same number of packets. (Presumably,
this means the two-phase attack runs for longer.) Most practical detection
algorithms would look for anomalous rates of traffic over some window of
time and would be equally capable of detecting both variants. Reducing the
number of packets needed for the attack does seem to be an interesting
question, but it isn't answered here.

-----
(*) First, the reviewer correctly points out that the two-phase attack runs
for longer (because it ``optimally'' sends packets into the network). We have
shown that even if the square-wave ``on'' period is shorter (because it is
not optimal), the square-wave stream still manages to deny the TCP flows'
throughput such that the resulting frequency responses are nearly identical.
We agree that the development of prevention mechanisms that detect malicious
low-rate flows remains an important area for future research, yet one beyond
the scope of this work.
-----

I could keep going, but I'll stop with one from the end of the paper. Why
don't the authors consider removing the minRTO constraint? I doubt Allman
and Paxson would make their recommendation today, having read the paper. Yet
almost all of the results of the paper disappear without this constraint.
There's no law that says that minRTO has to be larger than the largest
possible RTT -- sure, it helps stabilize the network in some cases, but it's
not really required (the exponential backoff in RTO accomplishes much the
same thing).

-----
(*) The reviewer correctly points out that removing the minRTO would solve
the problem. However, as stated in the conclusions, this would degrade TCP's
performance in the absence of attacks.
-----

And the authors didn't bother to spellcheck; not surprising given the sloppy
execution of everything else.

-----
(*) We have spell-checked the paper.
-----



Review #7

1. What is your familiarity with this area?:
I am well-versed in this area, but it is not my current specialty

2. Overall Evaluation. In your estimation, how would you rank this paper with
respect to others that have been submitted to Sigcomm this year (and in
recent years, if you're familiar with those)? If you're unfamiliar with
Sigcomm submissions, how would you rank this paper w.r.t. papers that are
submitted to a broad range of relatively selective networking conferences?:
Top 25% but not top 10% of submitted papers

3. Is the *particular* problem(s) or issue(s) addressed by this paper
important and/or interesting? (1-3 sentences):


4. Are the *results* of this paper significant? Will anyone benefit from it?
Will it be used by others in their research? Does it open up new areas or
resolve an important open issue? (1-3 sentences):


5. Does the work in the paper push into a new area, or is it in a traditional
area? (1 sentence):


6. What are the most important reasons to accept this paper, in order of
importance? Say whether these reasons dominate the reasons below.
(1-3 sentences):
Very well written. A clever idea, thoroughly analyzed.

7. What are the most important reasons NOT to accept this paper, in order of
importance? (e.g., the paper has serious technical mistakes, isn't novel,
doesn't demonstrate its point by proofs, simulations or experiments, makes
very unreasonable assumptions, etc.) If the overall conclusions are still
likely to hold despite these flaws, please say so. Say whether these reasons
dominate the reasons above. (1-3 sentences):
In order for the attack to be successful, the attacker has to send at
about 10% of the rate that an ordinary DoS would require. It's a bit
speculative that an attacker would care about the 90% reduction in
average b/w, particularly since the attack isn't as effective as
brute-force DDoS.

8. Detailed comments on the paper (primarily for the authors):
Summary: Sending a link-rate burst every second is likely to keep all
TCPs using the link in timeout all the time. This allows a DoS
attacker to reduce the average attack b/w by about 10x.
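The burst pattern the summary describes can be sketched as a square wave
(the peak rate, period, and burst length here are illustrative values,
not measurements from the paper):

```python
def attack_rate(t, period=1.0, burst_len=0.1, peak_rate=10e6):
    """Square-wave DoS stream: send at the (assumed) link rate for
    burst_len seconds out of every period seconds, idle otherwise."""
    return peak_rate if (t % period) < burst_len else 0.0

# Sampling one period at 1 ms resolution: the average attack bandwidth
# is ~10% of the peak (brute-force) rate.
avg = sum(attack_rate(t / 1000.0) for t in range(1000)) / 1000
print(avg / 10e6)  # 0.1
```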

It would be good to more precisely explain why this technique would be
more attractive to attackers than brute force DDoS. It basically
allows a 90% reduction in the amount of attack traffic (100ms burst
every one second). Is the point that this is impossible to detect
and/or trace back to the sender? Since the attack traffic is pretty
bursty, this is not clear; the bursts might be almost as conspicuous
as brute force DDoS traffic. It's not obvious why DDoS detection and
trace-back mechanisms that would work with a 100% DoS would not work
with this technique. Also I'm not sure DoS attackers care too much
whether the attack machines can be located, since they don't tend to
launch attacks from their own machines.

-----
(*) Yes, the reviewer correctly points out that the main difference from
brute-force DDoS is that this attack is hard/impossible to detect due to
the high false-alarm probability, since legitimate Internet traffic is also
very bursty. DDoS detection and trace-back mechanisms (such as RED-PD,
which is used as a detection mechanism for a trace-back mechanism) would
not work well exactly because of this tradeoff. Moreover, in Section 5.2.3
we showed that less-than-bottleneck bursts can significantly degrade
traffic. This type of attack does not have to be launched from the
attacker's own machine; it can also be launched from compromised machines.
Attackers are interested in making effective DoS attacks: the longer the
attack lasts (i.e., the longer the machines go unlocated), the more
effective the attack. We improve the text to bring out these issues more
clearly.
-----

The effect of the technique seems to be to degrade performance (3x to
5x slowdown for web traffic), rather than render the attacked link
useless. In contrast, the DDoS attacks we hear about have often
completely devastated the target. Again, it would be good if you
explained why an attacker would prefer this technique to a brute force
DDoS in light of its partial effectiveness.

-----
(*) An attacker would prefer this technique because it can significantly
degrade system performance for a longer time. Also, the 3-5x figure is the
average delay degradation, biased downward by very short flows that can
slip between attack bursts. Some of the downloads were degraded by more
than 1000 times (see Figure 9(a)).
-----

Is it the case that this technique might allow an attacker to attack
links that would be too fast to attack with brute force DDoS? I
suspect not: this technique requires the attacker to generate traffic
at the full link rate, so the attacker could in principle just
generate traffic at that rate all the time.

-----
(*) In Section 5.2.3 we showed that less-than-bottleneck bursts can
significantly degrade traffic.
-----

Part of the reason for considering whether the attack is likely to be
popular is to help decide how important it is to develop and deploy
counter-measures. So you should comment on this more explicitly.

-----
(*) We analyzed two counter-DoS techniques in Section 7 and believe that
prevention mechanisms that detect low-rate flows remain an important area
for future research, one that is beyond the scope of this paper.
-----