===========================================================================
IMC '08 Review #82A
Updated Thursday 29 May 2008 12:30:51pm CDT
---------------------------------------------------------------------------
Paper #82: Thinning Akamai
---------------------------------------------------------------------------

Reviewer qualification: 5. I am an expert on this topic
Overall merit: 5. Top 5% of submitted papers!
Novelty: 4. New contribution

===== Paper summary =====

The paper examines vulnerabilities in a popular content distribution
network, designs suitable experiments to identify and measure them,
and offers (partial) potential solutions for the vulnerabilities.

===== Reasons for acceptance =====

This is a well-researched, well-written paper with a reasonable
measurement methodology, clear goals, and interesting insights into a
problem that is not very well known. The paper straddles reverse
engineering, security, and measurement, remaining understated in tone
and delivering what it promises.

===== Reasons against acceptance =====

None. If this paper is not accepted at IMC, I'd expect it to be accepted
at a better venue.

===== Comments to author(s) =====

Congratulations on a good paper!

I liked the paper very much and my comments are simply aimed at
improving it. There are some obvious English errors; please re-read
the paper and fix them (surprisingly sloppy for an otherwise
well-written paper).

As you might expect, many of the shortcomings and vulnerabilities you
point out are likely to be fixed. But in presenting a general
methodology you should anticipate some of these fixes and explain how
you would still be able to repeat such an experiment to look for
remaining vulnerabilities. Section 3.1 talks about ARLs: it would be
trivial for this CDN (or any other CDN with a similar problem) to
obscure the ARLs - how would you go about reverse engineering then?
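
To be concrete about what the methodology currently rests on: the DNS probing
behind the ARL-based reverse engineering amounts to something like the sketch
below (my own illustration, not your tooling; hostnames are placeholders and
the third-party dnspython package is assumed). It is worth asking which of
these steps would survive if the ARL/CNAME structure were obscured.

    import dns.resolver  # third-party package "dnspython" (assumed)

    def resolve_chain(name, max_depth=10):
        """Follow the CNAME chain for `name` and return it plus the final A records."""
        chain = [name]
        for _ in range(max_depth):
            try:
                ans = dns.resolver.resolve(chain[-1], "CNAME")
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                break
            chain.append(str(ans[0].target).rstrip("."))
        addrs = [r.address for r in dns.resolver.resolve(chain[-1], "A")]
        return chain, addrs

    # Placeholder hostname; substitute a customer's stream/ARL hostname.
    chain, addrs = resolve_chain("stream.customer.example.net")
    print(" -> ".join(chain))
    print("edge servers:", addrs)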

Sec 3.2: it is a bit risky to use the word "representative" (especially
when you don't need to) with a very small sample. The streams and
customers you chose to study appear to be arbitrary, so it is hard to
make the case (on that basis alone) that the problem is widespread.

Why does the experiment last for four days? What takes the most time
and why? How can it be shortened?

Sec 3.3 good observation about being able to impact specific customers
(targeted DoS) - all the more reason to have looked at a diverse set
of customers to show how widespread the problem really is.

3.3.1 Implications: how do we know that the observed 'significant'
overlap is not simply an artifact of the small sample of streams you
happened to choose?

3.3.2 What can you say about a larger sample? Why is it hard to gather
more data? When you say 'additional experiments (not shown)' - how many
were done? How large a data set? Why can't the size be disclosed even if
you don't draw all the graphs, and why can't the aggregate data be shown
without drawing individual lines?

Again, the geographic mapping is easy for the CDN to mask - the fact that
they expose it now does not mean it can't be trivially obscured in the
future. So the Implications paragraph in 3.3.2 is a bit glib: it may not
remain easy to locate overlapping customers by looking for the same 'g'
names.
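
If you want to argue that the overlap finding is robust to such masking, a
naming-independent check would help: establish co-location directly from the
edge-server IP sets that two customers' hostnames resolve to, rather than from
the 'g' component of the CNAMEs. A rough sketch of what I mean (placeholder
hostnames; dnspython assumed):

    import time
    import dns.resolver  # third-party package "dnspython" (assumed)

    def observed_edges(name, rounds=20, gap=30):
        """Collect the edge-server A records returned for `name` over repeated queries."""
        seen = set()
        for _ in range(rounds):
            seen.update(r.address for r in dns.resolver.resolve(name, "A"))
            time.sleep(gap)  # spaced roughly one redirection-update interval apart
        return seen

    # Placeholder hostnames for two customers' streams.
    edges_a = observed_edges("stream-a.example.net")
    edges_b = observed_edges("stream-b.example.net")
    print("shared edge servers:", sorted(edges_a & edges_b))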

The end of 3.3 points to a risk of such experiments being easily
identified when you use a pay-per-view channel, as customer
identification is available there.

3.4 There is an unnecessary and quite possibly bogus claim about redirection
to "distant clusters, e.g., to different continents". There are many problems
with this broad, unsubstantiated claim. What does 'distant' mean? What is
the cost of this 'distance'? How often does continent hopping occur, who is
affected by it, and for how long? How can you draw such a sweeping conclusion
from the (very) limited tests you have done? This is needlessly distracting.

Sec 4: You may want to look at the related concept of carrying out
progressively larger experiments without the Internet servers detecting
them, described in a recent USENIX paper, "Remote Profiling of Resource
Constraints of Web Servers Using Mini-Flash Crowds".

Sec 4.2.1 - What is the reasoning behind choosing 7 prober machines?
Why not fewer or more? You don't say where the tipping point is or why
this is the right choice (for causing degradation).

Sec 4.2.2 I found it problematic that you did the No-isolation
experiment just once.

Sec 4.2.3 Playing videos back to "colleagues" (how many? 2? 20? 200?)
is not a way to measure 'degradation'. And this experiment absolutely
cannot be used to support the transcontinental alarm that you raise
earlier.

Sec 4.3 What are these mysterious 'additional experiments' that you
mention to ensure the absence of bottlenecks (column 2, page 11)?
Last para - was this experiment repeated?

Typos: (many)

3.3 vatage -> vantage

3.3.1 'find totally' -> 'find a total of'

3.3.2 'the common for the two CNAME' -> 'common string in the two CNAMEs'

4.2.1 page 10 "yet at a different server" -> "but at a different server"

5.2.2 'vincible' -> 'visible'

===========================================================================
IMC '08 Review #82B
Updated Sunday 8 Jun 2008 7:57:54pm CDT
---------------------------------------------------------------------------
Paper #82: Thinning Akamai
---------------------------------------------------------------------------

Reviewer qualification: 3. I know the material, but am not an expert
Overall merit: 2. Top 50% but not top 25% of submitted papers
Novelty: 3. Incremental improvement

===== Paper summary =====

This paper describes the Akamai streaming infrastructure and enumerates a set
of vulnerabilities that attackers could exploit to degrade its performance.
The authors also describe a set of mechanisms Akamai could employ to defeat
(or at least hinder) these attacks.

===== Reasons for acceptance =====

The authors present and demonstrate a set of effective mechanisms to DoS Akamai.

===== Reasons against acceptance =====

See above. Moreover, none of the attacks are particularly insightful or
illuminating---they follow directly from the mechanisms employed by Akamai.

===== Comments to author(s) =====

This is a fine study and a generally nice piece of work, but it seems more
appropriate as contract labor for Akamai than as a research study. There is
nothing fundamental about the way Akamai has built its streaming system, and
all of your attacks depend on their system's architecture. While you claim
"the lessons learned from our study can be generalized not only to other
DNS-driven multicast streaming," it seems they could also have been
generalized from existing DoS techniques.

Turning to the technical detail of the paper, the study plotted in Figure 3
seems poorly executed. I infer your sampling interval was 30 seconds, which
seems too coarse given that your "additional experiments" put the actual
minimum switching time at 20 seconds. Why were those experiments not
included instead (or in addition)?

Perhaps the one aspect that is not Akamai-specific is section 4.2.3, which
attempts to quantify the difference in quality under intercontinental
redirection. Unfortunately, the technical methodology employed was to "record
the three traces and replay them to colleagues." While I certainly appreciate
the validity of human-factors studies, such an anecdotal report seems to lack
sufficient rigor.

At a nit level, there are a number of typos and misspellings. The paper
would be well-served by a spellcheck. (e.g., I think you'll find Dixon et
al.'s system is called Phalanx, not Planx, and I think you mean 8:37, not 6:37
in 4.2.2.)

===========================================================================
IMC '08 Review #82C
Updated Thursday 26 Jun 2008 6:23:21am CDT
---------------------------------------------------------------------------
Paper #82: Thinning Akamai
---------------------------------------------------------------------------

Reviewer qualification: 4. I know a lot about this area
Overall merit: 3. Top 25% but not top 10% of submitted papers
Novelty: 3. Incremental improvement

===== Paper summary =====

The paper examines the streaming and VoD services offered by
the Akamai CDN. It reverse-engineers the structure of the CDN
and identifies the key elements of the Akamai infrastructure.
It then proposes and evaluates a number of "attacks" that can
degrade the performance of the CDN. It concludes by proposing
a set of mechanisms to guard the CDN from these kinds of DoS
attacks.

===== Reasons for acceptance =====

A well-done study that reverse-engineers and exposes flaws
in the Akamai CDN.

===== Reasons against acceptance =====

It is unclear what the general relevance of the results is -- the paper
demonstrates that Akamai has predictable flaws, but it does not provide
any additional insight into how CDN streaming systems ought to be
designed.

===== Comments to author(s) =====

In general, I thought that this was a nice paper with a well-done
study. It was fun to read and says a number of interesting things
about Akamai's architecture.

My biggest concern, though, is the larger relevance of the study.
Akamai got it wrong in certain ways -- it updates its DNS redirections
with too little agility, uses naming conventions and mapping mechanisms
that are too predictable, does not rebind clients to other edge servers
when performance degrades, etc. But neither the attacks nor the proposed
defenses are anything the research community did not already know.

In spite of this complaint, I am overall in favor of accepting the
paper, with the following reservations about its experiments,
presentation, and implications.

a) The authors say multiple times that DNS-based systems are
incapable of reacting quickly to overload conditions. I don't quite
believe that. The problem with Akamai is that it seems to use a
hard-coded constant of 30 seconds for updating the DNS entries in its
redirection servers. There is nothing fundamentally infeasible about
making this much smaller. (Note that there is no issue of DNS TTL
values as far as I can tell.) A quick polling sketch after this list
illustrates the check I have in mind.

b) What are the implications of the experiment in 4.2.2?
There is no congestion observed inbound into the edge
server, but there is a whole lot of flapping on the output
streams. Why is that the case?

c) There is a claim that the redirectors carry a higher load than the
edge servers. This seems like an unsubstantiated claim (section 4.3).

d) 5.1.1 seems to have a naive analysis of stream replication.
What about digital fountain-like techniques?
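
Regarding point (a): the check I have in mind is simple to run. Poll a
redirector or stream hostname at a fine grain, logging the answer TTL and
every change in the returned records, so the 30-second update constant can be
separated from any TTL effect. A rough sketch (placeholder hostname;
dnspython assumed):

    import time
    import dns.resolver  # third-party package "dnspython" (assumed)

    def watch(name, period=2, duration=600):
        """Log the answer TTL and every change in the A records returned for `name`."""
        prev, start = None, time.time()
        while time.time() - start < duration:
            ans = dns.resolver.resolve(name, "A")
            ips = sorted(r.address for r in ans)
            if ips != prev:
                print(f"{time.time() - start:7.1f}s  ttl={ans.rrset.ttl}  {ips}")
                prev = ips
            time.sleep(period)

    watch("redirector.example.net")  # placeholder hostname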

Writing issues:

The writing can be tightened quite a bit. In addition there are
a number of typos:

2.2.4: "to to detect"

3.1: "different set reflectors"

3.3: "vatage points"

3.3.2: CNAME is defined here although it has already been used earlier

4.1: "exam the status"

4.2.1: "start streaming", "observering"

4.2.2: should be 8:36 and 8:37

5.1.3: "successfull"

References: Planx should be Phalanx