===== Review =====

*** Originality: How new are the results/ideas presented?
Top 25%, but not top 10% (3)

*** Technical Merit: Please rate the correctness and soundness of the
scientific methodology.
Top 50%, but not top 25% (2)

*** Readability: Effectiveness of presentation and readability
Top 25%, but not top 10% (3)

*** Relevance: Please rate the expected interest in the paper by the
participants
Top 10% (4)

*** Reviewer Confidence: How confident are you in your evaluation?
Very confident, my research area (4)

*** Overall Rating: Summary score
Possible accept (good, top 25%, but not top 15%) (3)

*** Short summary assessment: (REQUIRED) What are the main
contributions of this paper? Do you consider the issues addressed
important and/or interesting? Comment on novelty, creativity, impact
and technical depth(1-5 sentences)
This paper describes how delay can be measured on targeted path
segments using measurements taken at endpoints. The key idea is
that both endpoints of a path are under user control and that clock
synchronization issues can be avoided. Using these ideas the method
combines one-way end-to-end delay measurements with round-trip delay
measurements to network-internal nodes.
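
To make the combination concrete, the arithmetic is roughly of the following form (a rough Python sketch with my own names and numbers, not the authors' exact estimator; in particular, halving the round-trip time is only an illustration of the symmetric-path case):

    # Illustration only: isolate the delay on the path suffix x -> dst by
    # combining a one-way end-to-end delay with a round-trip probe to an
    # internal node x.  Halving the RTT assumes symmetric paths to x, which
    # the paper treats more carefully than this sketch does.
    def suffix_delay_s(owd_src_dst_s, rtt_src_x_s):
        fwd_delay_to_x_s = rtt_src_x_s / 2.0      # symmetric-path assumption
        return owd_src_dst_s - fwd_delay_to_x_s

    # Hypothetical numbers: 80 ms end-to-end one-way delay, 30 ms RTT to x.
    print(round(suffix_delay_s(0.080, 0.030), 3))  # 0.065 s attributed to x -> dst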

*** Strengths:: (REQUIRED) What are the most important reasons to
accept this paper? (1-3 sentences)
The paper contributes a new probing scheme that addresses an open
problem in a new way. The method is likely to be used in practice.
The paper describes a (complex) implementation and demonstrates that
the methods can be deployed. The paper does a good job of
considering the issues associated with asymmetric paths.

*** Weaknesses:: (REQUIRED) What are the most important reasons NOT
to accept this paper? (1-3 sentences).
Apart from the basic idea behind the method, most of the rest of the
paper is very ad-hoc. Validation of the method is quite weak. The
paper defines a large set of metrics whose definitions are not
theoretically justified. The authors use the tool to take extensive
measurements but the methodology is poor and the conclusions drawn
from those measurements are likely to be misleading.

*** Detailed Comment to the Authors: (REQUIRED) Please provide
detailed comments that can be used by the authors to improve this
paper.
Specifically, if you gave a low originality grade please support with
relevant citations. Also if you gave a low technical merit grade
please specify the location of error(s) in the paper. In general I
liked some aspects of this paper and disliked other aspects. On the
whole I am slightly leaning toward accepting this paper although I am
concerned that its results will be misinterpreted.

The parts of this paper I liked were the definition of a new probing
method and the discussion of how to adapt the probing method to
asymmetric paths. If the paper had simply focused on a careful
evaluation of the strengths and weaknesses of this method I would
have been very favorable toward the paper.

Unfortunately, once the method is defined the paper seems to rush
forward and treat a number of subsequent topics quite poorly. First,
the validation of the method is entirely unconvincing. The authors
state that out of 35,000 samples they decide to focus on just 8 links
for validation. I got the strong sense that we were only seeing the
strengths of the method and the weaknesses were being swept under the
rug. This feeling grew stronger when I read footnote 1 which stated
that 'measuring one-way delays has become commonplace and no hard
synchronization between endpoints is needed' (with a cite to a paper
that doesn't support this statement at all as far as I could tell).
This is a crucial issue in the design of the method, it's not a
solved problem as far as I know, and to simply wave it away is pretty
concerning. Further concern arises when the authors discuss the
issue of clock skew and clock jumps (sec 3.2.3). In practice this is
a very difficult problem to solve (quite a few papers have been
written about it, eg see sec 4.2 of Crovella & Krishnamurthy) and the
authors describe only vaguely a very ad-hoc way of addressing it.
Again, given that one-way delay measurements are a key part of the
method, clock skew and jumps are a crucial issue to address clearly.

These factors diminish my confidence in the accuracy of the tool.
The problem is exacerbated by the metrics used as outputs of the
tool. Sec 2.1.7 is very unclear as to why these metrics are the
'right' ones (or even how exactly they are defined, in the case of
Cx). How is Cx 'normalized', and why should that be interpreted as
'confidence'? Why is the definition of Px the right one to measure
*probability* of congestion? The rest of the paper uses these
metrics extensively but they don't seem to be clearly interpretable
right from the start.

After all this, it is simply premature to do a study like that in
Section 4 and to draw strong conclusions. First of all, as already
stated, we don't know *when* the tool is accurate or what its outputs
mean. Beyond that, Sec 4 introduces new ad-hoc metrics (first/last
20% of a path is the 'edge' -??) and is sloppy about intra- vs
inter-AS (another knotty problem that can't simply be hand-waved
away; lots of papers on this issue too!). Then it performs
extensive experiments on Planetlab which is well known to be a poor
way to obtain 'representative' views of the Internet since almost all
paths traverse the GREN (see Tim Griffin's papers on this). After
all this, presenting results such as 'congestion is stronger on
intra-AS than inter-AS links' is completely unsupported and could
very well be misleading.

Finally, the key assumption behind this method is that both endpoints
are under user control. This is rarely the case in settings of
practical interest. As such, the methods developed here are of
limited applicability. This fact is never acknowledged in the
abstract, intro, or conclusion and only indirectly mentioned in the
related work (paragraph 4).

In summary I liked sections 1 and 2 (except 2.1.7). Section 3 is a
mixed bag. Section 4 should be tossed in favor of a better
understanding of when the tool can and can't be trusted.

===== Review =====

*** Originality: How new are the results/ideas presented?
Top 10% (4)

*** Technical Merit: Please rate the correctness and soundness of the
scientific methodology.
Top 10% (4)

*** Readability: Effectiveness of presentation and readability
Top 25%, but not top 10% (3)

*** Relevance: Please rate the expected interest in the paper by the
participants
Top 25%, but not top 10% (3)

*** Reviewer Confidence: How confident are you in your evaluation?
Informed outsider (2)

*** Overall Rating: Summary score
Likely accept (very good, top 15% but not top 6%) (4)

*** Short summary assessment: (REQUIRED) What are the main
contributions of this paper? Do you consider the issues addressed
important and/or interesting? Comment on novelty, creativity, impact
and technical depth(1-5 sentences)
The paper presents a probe-based methodology for detecting congested
links on the Internet. The essential idea is that queuing delay
along packet paths can be "sensed" by differencing the minimum
observed delay from a particular delay in question. By measuring
paths redundantly from several locales, and by sending multiple
packet pairs, the methodology identifies gateways that are likely to
be experiencing congestion.
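
As I read it, the differencing idea amounts to something like the following (my own Python illustration with made-up probe delays; the real tool of course also has to contend with clock offset and skew):

    # "Sensing" queueing delay by differencing against the minimum observed
    # delay: the minimum over many probes is taken as the queue-free baseline
    # (propagation + transmission), and any excess is attributed to queueing.
    delays_ms = [41.2, 40.8, 47.5, 40.9, 55.3, 41.0]     # hypothetical samples
    baseline_ms = min(delays_ms)
    queueing_ms = [round(d - baseline_ms, 1) for d in delays_ms]
    print(queueing_ms)                 # [0.4, 0.0, 6.7, 0.1, 14.5, 0.2]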

*** Strengths:: (REQUIRED) What are the most important reasons to accept
this paper? (1-3 sentences)
The probing methodology and resulting readings are new and interesting.

*** Weaknesses:: (REQUIRED) What are the most important reasons NOT to
accept this paper? (1-3 sentences).
Not clear what the accuracy is. Is this actually measuring anything?
That is a difficult question to answer, but without an answer, this
may just be an interesting programming exercise.

*** Detailed Comment to the Authors: (REQUIRED) Please provide
detailed comments that can be used by the authors to improve this
paper. Specifically, if you gave a low originality grade please
support with relevant citations. Also if you gave a low technical
merit grade please specify the location of error(s) in the paper.
The method is interesting and appears novel. By looking at the
difference between bounce packet times and round trip times, it is
possible to discern bounds on delay. It is also quite interesting to
note that the methodology deals with asymmetric paths by "demoting"
the measurement of certain links automatically. Clearly the authors
are concerned with building a real tool at least as much as (if not
more than) they are concerned with purely academic results. These
qualities speak highly for the paper.

The presentation is not as good as it could be. First, it introduces
a complete menagerie of terms. This circumstance, by itself, is not
damning (although it does make the paper very difficult to read and
to follow). What is vexing is that some of the terms are reused (T
is both a threshold in 2.1.7 and a time interval in 2.3) and others
(C_x in section 2.3) seem to be defined but not to serve a specific
purpose. Moreover, several of these terms (C_x, CI(x,T), CC(x)) get
"normalized" to values between 0 and 1 and then treated as
probabilities. This is obviously a careful implementation of work
that has been thoughtfully proposed. It is disappointing to see
terms like "confidence," which have very specific meanings
statistically, used in what appear to be a haphazard way.

The evaluation is another disappointment in terms of its
organization. Section 2.4 explains that the method is evaluated
using ns-2 simulation, but no results are presented. Instead, the
results presented are empirical and a "self-consistency" validation is
proposed. Both are needed. It would have been valuable to see how
well "Pong" identifies congestion points in simulation (where the
congestion points are known) and how sensitive it is when the queuing
delays vary (e.g., when the size of the congestion "spikes" varies).
Then, armed with an evaluation of how well Pong works when conditions
are known, we might be assured of Pong's value in a real setting
where ground truth is elusive.

Still, as a tool, the approach appears novel and it is clear that the
authors have attempted to address a large number of problems. It may
just be that in doing so, the complexity of the approach becomes so
great as to not lend itself well to a clear presentation in the space
available.

===== Review =====

*** Originality: How new are the results/ideas presented?
Top 25%, but not top 10% (3)

*** Technical Merit: Please rate the correctness and soundness of the
scientific methodology.
Top 10% (4)

*** Readability: Effectiveness of presentation and readability
Top 25%, but not top 10% (3)

*** Relevance: Please rate the expected interest in the paper by the
participants
Top 25%, but not top 10% (3)

*** Reviewer Confidence: How confident are you in your evaluation?
Very confident, my research area (4)

*** Overall Rating: Summary score
Possible accept (good, top 25%, but not top 15%) (3)

*** Short summary assessment: (REQUIRED) What are the main
contributions of this paper? Do you consider the issues addressed
important and/or interesting? Comment on novelty, creativity, impact
and technical depth(1-5 sentences)
This paper presents an active probing technique for locating
congested links in a WAN, and reports the results of extensive
measurements on Planetlab.

*** Strengths:: (REQUIRED) What are the most important reasons to
accept this paper? (1-3 sentences)
The work is technically strong. The proposed technique is novel
(although there is a lot of similar work in this area) and apparently
effective. The experiments are executed well and the results reveal
some interesting things about the location of congestion in the
Internet (although there are no great surprises). The presentation
is generally good, although there are a few sections that are
unclear.

*** Weaknesses:: (REQUIRED) What are the most important reasons NOT
to accept this paper? (1-3 sentences).
One weakness of the paper is that it is not clear that the authors
are addressing an important problem, as opposed to measuring what
they can measure. Is there a motivating application for this tool?

*** Detailed Comment to the Authors: (REQUIRED) Please provide
detailed comments that can be used by the authors to improve this
paper. Specifically, if you gave a low originality grade please
support with relevant citations. Also if you gave a low technical
merit grade please specify the location of error(s) in the paper.
Overall I think this is a good paper, and solid technical work. My
suggestions below are mostly minor. My reservation about accepting
the paper is primarily because I am not sure that the problem is
compelling to sigmetrics attendees.

A large-scale question I have about the project is how the proposed
technique distinguishes between congestion episodes and routing
changes. The paper doesn't mention routing changes, but the issue
must have come up!

On page 2, the authors say that "congestion at the edges tends to be
more clustered in time". I didn't understand what this meant when I
read the intro, and I didn't find the section of the paper that
presents this result (but I might have missed it).

At the beginning of Section 2.1.1, the authors introduce fmin without
explaining what set of measurements it is the minimum of. The reader
can infer that the probing process being described here will be
repeated, and fmin is the smallest of the repeated measurements,
right? This should be explained more clearly. Also, this might be a
good place to address routing changes, since they will cause problems
with estimating fmin.

Figure 1 is hard to read -- it is not clear which line each letter is
meant to correspond to. Making the figure bigger might be a step
toward making it more readable.

On page 3, citation [33] is introduced with e.g., but I don't think
e.g. is the right relationship between this citation and the context
where it is cited.

In section 2.1.7, the authors propose that the "probability that a
segment is congested" is a "more appropriate measure", but it's not
clear what it's more appropriate _for_, which gets to my previous
comment that I'm not sure this project has a motivating application.
The thresholds the authors use to define the "probability that a
segment is congested" seem ad hoc to me, and it is not clear that
they are extracting information from their measurements in a
well-motivated way.

The same issue comes up at the beginning of section 4.1. The authors
discard almost 30% of their measurements because they have
"non-negligible measurement errors". It is a feature of the proposed
method that it can indicate when the measurements are uncertain (and
uncertainty might be a better way to describe what is known, rather
than error), but it is hard to evaluate whether a technique that
fails 30% of the time is effective.

At the end of Section 4.2, the authors claim that their observations
are internally consistent. Specifically, they report that when a
congested link is observed on two paths at the same time, the
observations are consistent in "scale and time pattern". I'm not
sure what that last phrase means, but it seems to me that this claim
is very important, so I think it is worth spending some column-inches
demonstrating it more quantitatively. Is there some visualization or
summary statistic that would give the reader a more precise idea of
how many opportunities there were for this kind of checking, and how
repeatable the estimates are?

I always find it funny when related work is tacked onto the end,
especially when other related work is discussed in the introduction.
I think it would be better to make Section 5 a subsection of Section
1.

The conclusions section is more of a summary than a concise
presentation of the primary conclusions of the paper.

But in general I think the quality of presentation, and the
organization of the paper, are good.

===== Review =====

*** Originality: How new are the results/ideas presented?
Top 50%, but not top 25% (2)

*** Technical Merit: Please rate the correctness and soundness of the
scientific methodology.
Top 50%, but not top 25% (2)

*** Readability: Effectiveness of presentation and readability
Top 25%, but not top 10% (3)

*** Relevance: Please rate the expected interest in the paper by the
participants
Top 10% (4)

*** Reviewer Confidence: How confident are you in your evaluation?
Very confident, my research area (4)

*** Overall Rating: Summary score
Likely reject (top 50%, but not top 25%) (2)

*** Short summary assessment: (REQUIRED) What are the main
contributions of this paper? Do you consider the issues addressed
important and/or interesting? Comment on novelty, creativity, impact
and technical depth(1-5 sentences)
The paper proposes active probing techniques for the estimation of
queueing delays at arbitrary links of an Internet path. The paper
then applies these techniques in the detection and localization of
congestion. An extensive measurement study using Planetlab nodes is
conducted and the results attempt to quantify the frequency and
intensity of congestion at interdomain/intradomain links at the
network edge and network core.

*** Strengths:: (REQUIRED) What are the most important reasons to
accept this paper? (1-3 sentences)
The main strength of the paper is what it promises to do. Detection
and localization of congestion is a very ambitious goal in the area
of Internet measurement. It would be great if we had a technique that
can do so. Additionally, the paper is well executed, looking at
several different directions in 12 very packed pages.

*** Weaknesses:: (REQUIRED) What are the most important reasons NOT
to accept this paper? (1-3 sentences).
The key weakness of the paper is that the proposed techniques are
based on assumptions that do not hold in the Internet. I do not
believe that the proposed techniques would work in practice, in the
sense of accurately detecting the presence, duration or intensity of
congestion events. I will elaborate on these weaknesses next.

*** Detailed Comment to the Authors: (REQUIRED) Please provide
detailed comments that can be used by the authors to improve this
paper. Specifically, if you gave a low originality grade please
support with relevant citations. Also if you gave a low technical
merit grade please specify the location of error(s) in the paper.
A key weakness of the paper is the fact that it requires measurement
of absolute one-way delays, not relative one-way delays. The former
require an accurate estimate of the clock offset, something that
cannot be done without GPS-synchronized clocks or similar technologies.
NTP does not provide sufficient accuracy for measurements of queueing
delays in the network. The paper "hides" this important issue in
Footnote 1. Even worse, it states that "measuring one-way delays has
become commonplace and no hard synchronization is needed." This
statement is just not true. The reference that the paper provides
([5] is the PCP paper) does not give a mechanism to measure absolute
one-way delays. I found this part of the paper quite misleading, as
it tries to make the reader believe that the clock synchronization
issue, down to the accuracy of a few milliseconds, is solved.
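
To put numbers on this (my own back-of-the-envelope assumptions, not figures from the paper), both the residual NTP offset and ordinary clock skew are of the same order as, or larger than, the millisecond-scale queueing delays the method tries to isolate:

    # Back-of-the-envelope, with assumed (not measured) values.
    offset_err_ms = 5.0    # plausible residual NTP offset between two hosts
    skew_ppm      = 50.0   # plausible relative clock skew (50 us per second)
    signal_ms     = 2.0    # size of the queueing delay we would like to detect

    drift_ms_per_min = round(skew_ppm * 1e-6 * 60 * 1e3, 1)   # 3.0 ms per minute
    print(offset_err_ms, drift_ms_per_min, signal_ms)         # 5.0 3.0 2.0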

Note that even if we had GPS at the end-hosts, we would still not be
able to use the proposed techniques because they require measurement
of one-way delays between the end-points and intermediate routers
(using ICMP-based responses).

Another important problem is that the paper does not present a convincing
validation study. It mentions some simulations in Section 2.4, but of
course a simulation study cannot realistically capture clock
offset and skew, random ICMP delays, and the timescales of congestion
occurrence in the Internet. The "self-consistency validation" of
Section 4.2 is also not convincing. The reason is simple: measurements
from different endpoints can be equally inaccurate if they use the
same flawed methodology. Consistency does not mean accuracy. I
suggest that the authors use testbed experiments to examine the
accuracy of their methods. There are several such infrastructures
available to researchers, such as Emulab, WAIL, etc.

A major problem with the proposed methodology is the assumed relation
between queueing delay and congestion. The method assumes that a link
is congested if the queueing delay in that link is more than a
threshold. But of course, the absolute magnitude of the queueing
delay depends on the link capacity. A delay of 1-2 msec represents the
transmission delay of just a single MTU packet at a 10 Mbps Ethernet,
but it would also represent a large backlog of 100 packets at a
Gigabit Ethernet. Clearly, the threshold-based approach that the
paper is based on cannot detect congestion or quantify its intensity
or frequency.
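
The arithmetic behind this objection is easy to reproduce (a small Python sketch; the 1500-byte MTU and the 1.2 msec delay are my own illustrative choices):

    # How many MTU-sized packets a fixed queueing delay corresponds to,
    # as a function of link capacity (1500-byte MTU assumed).
    MTU_BITS = 1500 * 8

    def backlog_packets(queueing_delay_s, capacity_bps):
        return queueing_delay_s * capacity_bps / MTU_BITS

    print(round(backlog_packets(1.2e-3, 10e6)))   # 10 Mbps:  ~1 packet
    print(round(backlog_packets(1.2e-3, 1e9)))    # 1 Gbps: ~100 packets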

Another major problem with the proposed methodology is the timescales
of congestion in the Internet. The proposed method attempts to sample
the queueing delay at a specific link every few hundreds of
milliseconds. This is a very long period compared to the timescales
with which the queues of a high-speed link (say, more than 10 Mbps)
vary. A 100 Mbps link transmits a packet every 120 microseconds; a
backlog of 100 packets can be transmitted within 12 milliseconds.
Whether Pong will be able to detect this backlog or correctly
estimate its magnitude is just a matter of luck.
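
Again, the numbers are easy to check (Python sketch; the 1500-byte MTU, the 0.5 s probing period, and the assumption of a single isolated backlog are mine, for illustration):

    # A 100-packet backlog on a 100 Mbps link drains much faster than the
    # probing interval, so an individual probe can easily miss it.
    MTU_BITS       = 1500 * 8
    capacity_bps   = 100e6
    backlog_pkts   = 100
    probe_period_s = 0.5

    drain_time_s = backlog_pkts * MTU_BITS / capacity_bps
    print(drain_time_s)                     # 0.012 s to drain the backlog
    print(drain_time_s / probe_period_s)    # 0.024: a probe overlaps such an
                                            # isolated episode only ~2% of the time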

For all the previous reasons, I am not convinced about the validity
of the claims in Section 4.4. It is possible that they are actually
flawed and opposite to what really happens.

Other issues:
- Why do you propose fsd and fsb, given that they are very rarely
applicable in practice?
- A similar issue: the 2-packet probing method is used in about 50%
of the cases. But the queueing delay bounds are very loose with that
method, raising even more concerns about the reported results.
- In section 2.1.7: the definition of P_x does not explain why that
metric is actually the probability that the segment is congested.
That section seems very ad-hoc overall.
- The claims about improving the sample frequency are not convincing:
you simply sample different links. Each link is still sampled every I
seconds, where I is 0.5-1 seconds, meaning that the queueing delay
would be very undersampled in practice.
- The paper often includes statements such as "we omit the details
due to space constraints". This is annoying especially when the
reviewer understands that "the devil is in the details" in such
probing methods. I suggest that the authors keep the paper more
focused on the proposed methods and on its validation. If they
present a convincing study for the accuracy of the proposed method,
then they can examine additional directions (such as the application
of Pong in characterizing congestion in the Internet) in follow-up
papers.
- In Section 4.1 you mention that you removed 30% of the data because
they have non-negligible errors. Isn't that biasing the results?
- It seems that you are confident about the determination of which
links are interdomain vs. intradomain. In section 4.1.2 it is stated
that you always mark two links as interdomain, meaning that for every
true interdomain link an intradomain link is also falsely classified
as interdomain.