Review for Drafting Behind Akamai (Travelocity-Based Detouring)

Public Review (by Z. Morley Mao)

Due to policy constraints and the lack of load-sensitive routing, today's IP routing often cannot satisfy the performance and robustness requirements of real-time applications such as Voice over IP and of many other important applications such as financial transactions. To deal with path inflation and degraded performance of existing IP paths, overlay routing has been proposed in an attempt to bypass bottlenecks in real time. Extensive performance monitoring across potential overlay nodes is required to effectively identify preferred overlay paths with a high probability of overcoming the performance bottlenecks affecting an existing IP path. Although existing work showed that randomly picking an intermediate node to perform one-hop source routing yields a high success rate, network monitoring, often requiring active probing, appears inevitable if the selected overlay node is to provide strong guarantees of good performance. As overlay networks grow in scale, such performance monitoring becomes more challenging.

The main contribution of this paper is to propose a novel use of existing Content Distribution Networks (CDNs) such as Akamai by identifying low-latency Internet paths suggested by the CDNs for the purpose of improving overlay routing. Certainly other applications can also make use of the discovered low-latency paths with no additional active probing. The paper focuses specifically on Akamai given its large size and, more importantly, its open redirection mechanism, which is easily queried to discover "nearby" servers from a given client's perspective. Although the redirection algorithm is proprietary, the authors showed through extensive measurements that latency is a primary metric and thus can be taken advantage of for overlay node selection. The novel insight of making use of an existing system's collected performance information for other applications requiring similar information in a non-intrusive and light-weight fashion is refreshing. As the Internet keeps growing, with more applications relying on dynamic adaptation based on network performance that is very likely discovered through active probing, it is important to find more scalable ways to identify dynamic network behavior. Using information already available from an existing application without requiring cooperation from the application is one such approach to reduce probing overhead.

Despite this novel insight, the paper has several shortcomings, which may offer potential avenues for future work. The authors did a thorough job in the measurement analysis for verifying that Akamai's server redirections strongly correlate with network conditions measured by latency of the path between clients and servers. However, when it comes to the evaluation to support using discovered preferred servers for the purpose of one-hop source overlay routing, there are still a number of questions unanswered. The measurement performed is quite limited and the obtained performance improvement is based on comparing within the limited set of returned Akamai servers rather than "all" Akamai edge servers. Arguably, this part of the work requires additional investigation with a larger set of measurement data to better understand how information obtained from Akamai can be used effectively for applications such as overlay routing and what the potential limitations are. For example, it would be useful to quantify under what types of scenarios information from Akamai can be beneficial. For smaller overlay networks or networks that do not share many network locations with Akamai servers, the proposed system appears less applicable. One limitation that has been underemphasized is the potential implication of the proposed system if it becomes wildly successful, i.e., a large number of overlay networks start to latch onto Akamai. The discovered paths may no longer be optimal if too many users start to shift their traffic to them. This may in turn result in more frequent probing of the Akamai DNS system and unnecessary overhead.

Finally, this work encourages us to think of other novel ways to make use of information collected by existing systems for our own applications. If such systems attempt to hide such information, it may be useful to understand how we can create sufficient incentives to share this information. For certain systems such as Akamai, it appears that it may not be easy to completely deter the work proposed in this paper, as long as the redirection mechanism is public. Instead of inferring properties from existing systems, it also behooves us to think about designing common infrastructures such as a "network weather service" to allow sharing of commonly used network performance related information.

Review #1
Attribute Value
Provide a short summary of the paper the paper reports on an extensive measurement study of the akamai cdn. the objective is to explore to what extent network measurements (and operations) performed by akamai can be used to infer and utilize quality short time-scale information regarding network conditions. some of the key findings are (i) akamai redirection times are sufficiently low to be useful for network control, (ii) akamai server-redirections strongly correlate with network conditions experienced on the paths between clients and servers, and (iii) in more than half the scenarios studied, taking the route "recommended" by akamai is better than using the direct route.

What is the strength of the paper? (1-3 sentences) excellent paper, with a great title.

one always had the feeling that it should be possible for a third party to take advantage of the extensive measurement infrastructure and partially known operations of a CDN like akamai and infer/utilize akamai-based information for one's own purpose. this paper shows how such akamai-supported inference is possible and why akamai-based information regarding network conditions has value for non-akamai customers.

What is the weakness of the paper? (1-3 sentences) just fine as is. small issues here and there (see below), but fine piece of work overall.
Your qualifications to review this paper I know a lot about this area
Novelty of paper This is very novel
Overall paper merit Score 5: Top 10%. Strong Accept. Great paper. I really want to see it in the program and will argue for it to be accepted.
Provide detailed comments to the author - in section 3, how is "proximity" defined? RTT, loss, hop count?

- what's the significance (if any) of the steep changes in the cdf in fig 3 around 125 and (less so) around 110?

- it would be useful to see more details about the grouping shown in fig 10. there are a number of claims in the text describing this figure, but no evidence is provided for, say, the fact that the 0-20 group is dominated by nodes that have a large server and path diversity and a small redirection frequency. there should be ways to show this info in conjunction with fig 10.

- in section 5, it would be worthwhile to dig somewhat deeper. for example, the direct path (obtained by the source node pinging the destination) may well be inflated due to policy routing. this brings up the question to what degree the differences between the akamai-based one-hop source routing and the direct-path routing can be explained by path inflation. depending on the answer, there may be simple and equally effective alternatives to akamai-based one-hop source routing in an appropriate overlay network.

- if a cdn like akamai resents this sort of "free riding", what are obvious strategies that would prevent a third party from taking advantage of the cdn's measurements/operations (or at least make it more difficult to infer/utilize them)?

Review #2
Attribute Value
Provide a short summary of the paper This is primarily a measurement paper giving an assessment of how good a job Akamai's redirection solution does, when evaluated in terms of picking lower latency network paths. In carrying out this investigation, the paper provides an interesting dissection of the behavior and characteristics of Akamai's redirection scheme and how it varies across geographies and proximity of clients to Akamai servers.

The paper's secondary focus (less than a third of the pages), although it represents the initial motivation for the work, is an investigation of whether and how it is possible to use information derived from Akamai's redirection choices to build a large-scale one-hop overlay network.

The motivation for this approach is that realizing such a solution calls for either performing extensive measurements to be able to continuously select the best (one-hop overlay or direct) path among many possible choices, or for relying on some other means for deciding ahead of time what alternate paths might be good choices. The paper's premise, which it supports to some extent through the evaluation of Section 5, is that it is possible to leverage Akamai's extensive measurement infrastructure, and its results as available through its redirection recommendations, to achieve this goal.
What is the strength of the paper? (1-3 sentences) The paper's main strength is really in its extensive investigation of the behavior and performance of Akamai's measurement infrastructure and how it is used to make redirection decisions.

This part alone is of interest even if it might be a better fit for a conference such as IMC than SIGCOMM.
What is the weakness of the paper? (1-3 sentences) The paper's weakness is really in that it leaves quite a few loose ends in its attempt at convincing us that using Akamai's redirection information can provide a viable solution to the choice of good alternate, one-hop overlay paths.

The weakness arises not only because the benefits of using Akamai's redirection results vary widely (as per the statistics provided in the paper), but also because tying it to the route selection problem is not done very convincingly. In particular, as the authors point out at the end, Akamai's information only gives insight into one half of the path, and it is far from clear whether that always (or even in a majority of cases) results in a good overall choice (from source to destination and back). I would really have liked to see more data on this in order to be convinced that this was even a good idea. Unfortunately, the routing evaluation section (Section 5) is relatively succinct and not as thorough as the rest of the paper.

Another aspect that the paper does not even mention, but that is probably important to at least discuss, is that of the impact that the widespread use of the proposed technique would have on its performance. In other words, if the approach is successful and widely followed, it is likely to affect the performance of the network paths that Akamai identifies as "good". This could defeat the original purpose of performance improvement, and may even lead Akamai to change its approach, or possibly try to somewhat hide its results. There should at a minimum be an acknowledgment that this could be an issue.
Your qualifications to review this paper I know a lot about this area
Novelty of paper This is a new contribution to an established area
Overall paper merit Score 4: Top 10-20%. Soft Accept. I'm inclined to accept it - I would like to see it in the program, but I am not arguing strongly in favor of the paper. Note that most papers in the program will probably have an average score of about 4.
Provide detailed comments to the author Let me state up front that I like the initial idea of leveraging someone else's work/infrastructure in order to solve a challenging problem.

Using large-scale overlays to allow one-hop alternate routes to bypass performance problems on the direct path is clearly an interesting proposition, especially in light of the increasing availability of P2P technology that can relatively easily make it feasible. In such a context, the problem is more to find what are good alternate paths (out of possibly many), than to allow/support the use of an alternate path (e.g., see the forthcoming INFOCOM 2006 paper entitled "How to Select a Good Alternate Path in Large Peer-to-Peer Networks" for a similarly motivated paper). In that respect, the idea of leveraging the fact that Akamai is continuously monitoring a large number of paths is clearly a good idea.

The one caveat with this approach, one that is acknowledged in Section 5.4, is that Akamai will only give you half of the answer, and it is also not clear how to get the "right" answer from Akamai for different destinations, i.e., does querying Akamai for Yahoo, as you do in Section 5.1, always provide the best (or a good) answer across destinations all over the Internet? You provide some evidence that this may be OK, but the data is far too limited to enable a solid and convincing conclusion.

Let me next make a few more pointed comments directed to specific places in the paper.

In the intro, you mention that for overlay routing to be able to use the Akamai info, it is also necessary that the network be able to map some of its nodes to Akamai edge servers, but you say nothing of how this can be done until later in the paper. I would suggest addressing this up front.

I'll come back to that later, but having the Akamai redirected paths outperform the direct paths 25% of the time is not a great statistic. In the context of a large overlay network, you should compare this to what randomly picking an overlay node would yield.
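One way to set up the random-pick comparison suggested above is sketched below. The function name `random_pick_win_rate`, the node labels, and every latency value are hypothetical and purely illustrative; none of them come from the paper. The sketch assumes that one-hop latencies through every candidate overlay node have been measured for a single source-destination pair.

```python
def random_pick_win_rate(direct_ms, via_node_ms):
    """Probability that a uniformly random one-hop node beats the direct path."""
    wins = sum(1 for latency in via_node_ms.values() if latency < direct_ms)
    return wins / len(via_node_ms)


# Made-up measurements for one source-destination pair (milliseconds).
direct_ms = 100.0
via_node_ms = {"n1": 90.0, "n2": 120.0, "n3": 150.0, "n4": 95.0, "n5": 130.0}

# Baseline: how often a blind random choice would already win.
baseline = random_pick_win_rate(direct_ms, via_node_ms)

# Hypothetical Akamai-suggested node; here it is assumed to be "n1".
suggested_wins = via_node_ms["n1"] < direct_ms

print(f"random baseline win rate: {baseline:.2f}, suggested node wins: {suggested_wins}")
```

Averaging `baseline` over many pairs would give the number against which a figure such as 25% could be judged.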

You only define that performance really means latency in the last paragraph before Section 1.2, while you have repeatedly mentioned performance before. In addition, you wait until page 12 to argue why latency is the most important metric. I'm not sure I agree with that position, but irrespective of that you need to have that discussion earlier on.

Figure 8 had me puzzled as to why the ordering of nodes was so consistent (continuously increasing average rank across all Akamai customers) until I spotted footnote 8. You may want to move that explanation into the text or the caption.

Speaking of figures, I found figure 10 confusing as you never directly state that nodes on the left and right of the range 20-30 correspond to different relative values giving the same absolute value. It's sort of there in the text, but easy to miss.

As you point out, the setup of Fig. 11 forces a symmetric path. In addition to the problem of performance information for the segment between the overlay node and the destination not being always available, it is also not clear that this always represents the best choice, e.g., the direct return path might be much better (no congestion) than the forward direct path.

The discussion of what Fig. 13 reports is very confusing. You say it reports differences in latency between the one-hop overlay and the direct path measured over short time-scales, but then you have an experiment running over 3 days. So is Fig. 13 reporting the average of these small time-scale differences, and if yes what is the duration of each measurement interval? You need to better explain the data for this figure.

Still in relation to Fig. 13, you say that for 50% of the scenarios the best measured one-hop Akamai path outperforms the direct path, but this is a bit of a stretch in that for close to 30% of them the difference is pretty much nil.

In the path pruning scenario, I am assuming that the Akamai DNS is queried for a given customer that is independent of the actual destination for the direct path. As discussed earlier, it is key to properly assess the sensitivity of the scheme to this choice across a broad range of destinations, especially since you now have no visibility into the segment from the redirect server to the destination.

It is very hard to distinguish the different line styles in Fig. 14.

You point to a "sharp" decline at 2hrs in the performance of BTAS, but omit to mention that there is a steady and non-negligible decline up to that point, i.e., the difference in slopes before and after is not that substantial. In addition, can you explain why performance seems to be improving as the update interval increases beyond 500 minutes or so? This seems counter-intuitive.
Review #3
Attribute Value
Provide a short summary of the paper The paper used 140 PlanetLab nodes to conduct a measurement study of Akamai networks, showing that Akamai's redirection correlates well with the network latencies on the paths between Akamai edge servers and the clients. Further it claims that this Akamai redirection can be used by other overlay networks in selecting their one hop source routing.
What is the strength of the paper? (1-3 sentences) The paper did a fairly broad measurement study of how Akamai redirection works using 140 PlanetLab nodes. It shows that the DNS-based redirection Akamai uses for network control works well in most cases. The idea of using that Akamai redirection information to do one-hop source routing for other general overlay networks seems to be interesting.
What is the weakness of the paper? (1-3 sentences) The metrics ("top 10 path", "rank") used in the paper are somewhat misleading. The evaluation of one-hop routing is based on a very small subset of nodes. The conclusion is not sufficiently supported by the evidence presented.

The paper did not show in detail how other overlay applications can use the Akamai redirection information for their source routes. (1) It is unclear how the Akamai redirection information will be collected and made available to other overlay applications in non-intrusive, low-overhead ways. (2) The paper selects the best Akamai edge server for detour service; it is unclear how to select the overlay node.
Your qualifications to review this paper I am an expert on this topic
Novelty of paper This is a new contribution to an established area
Overall paper merit Score 3: Top 20-35%. Soft Reject. Worth considering, but I would prefer not to see this paper in the program.
Provide detailed comments to the author You did an interesting study. However, the paper can be significantly improved by applying more sound methodology and drawing more qualified/precise conclusions from the available evidence.

A key metric is the latency benefit of Akamai-selected paths over the "10 best paths". However, the definition of "10 best paths" is misleading. Ideally, the top 10 paths should be selected among the paths between the client and each of ALL Akamai edge servers. However, in the paper, the selection is based only on the edge servers returned by Akamai. So there is an implicit assumption that Akamai is returning the good paths, and from those paths, it is able to select the best path.

Another metric is "rank". Rank reflects only the relative ordering. However, the key benefit may need to be measured by a combination of (a) the difference between delays on two paths and (b) the ratio between delays on two paths. For example, if the delays of the "10 best paths" have very small differences, the ranking does not really matter.
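The point about rank can be illustrated with a minimal sketch that reports rank alongside the absolute difference and the ratio. The helper `compare_to_best`, the server labels, and the latency values are all hypothetical; they are not the paper's metrics or data.

```python
def compare_to_best(latencies_ms, chosen):
    """Return (rank, gap in ms, ratio) of the chosen path relative to the fastest path."""
    ordered = sorted(latencies_ms, key=latencies_ms.get)
    best = ordered[0]
    rank = ordered.index(chosen) + 1  # 1 means fastest
    gap_ms = latencies_ms[chosen] - latencies_ms[best]
    ratio = latencies_ms[chosen] / latencies_ms[best]
    return rank, gap_ms, ratio


# Two made-up sets of measured paths; the chosen server has the same rank in both.
tight = {"s1": 20.0, "s2": 20.5, "s3": 21.0, "s4": 21.5}
spread = {"s1": 20.0, "s2": 40.0, "s3": 80.0, "s4": 160.0}

tight_result = compare_to_best(tight, "s4")
spread_result = compare_to_best(spread, "s4")

print("tight:", tight_result)
print("spread:", spread_result)
```

In both sets the chosen server ranks last, yet in the first it is only 1.5 ms behind the fastest path while in the second it is 140 ms behind, which is why rank alone can mislead.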

A number of conclusions do not seem to have strong supporting evidence. For example, in Section 5.3, Figure 13 shows the latency gains for using Akamai paths versus direct paths for pairs of hosts. The best and worst values carry no notion of the number of time intervals in which the best path was seen versus the total number of times the measurement was conducted. The statement that in 50% of scenarios the best Akamai path outperforms the direct path is misleading. The best path could have been measured only once in the 3 days, and including it in the 50% is not fair. The real result seems to be that in 50% of the pairs evaluated over three days, there was at least one measurement which led to a better path than the direct path.
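The distinction drawn here, between a pair where the Akamai path was ever better and a pair where it was usually better, can be made concrete with a small sketch. The function `summarize_pair`, the pair labels, and the per-interval gains are invented for illustration and do not reproduce Figure 13.

```python
def summarize_pair(gains_ms):
    """gains_ms holds per-interval (direct - overlay) latency; positive means overlay was faster."""
    wins = sum(1 for gain in gains_ms if gain > 0)
    return {
        "ever_better": wins > 0,
        "win_fraction": wins / len(gains_ms),
        "best_gain_ms": max(gains_ms),
    }


# Made-up per-interval gains for three host pairs.
pairs = {
    "A-B": [12.0, -3.0, -5.0, -4.0],  # overlay wins in a single interval only
    "C-D": [8.0, 6.0, 9.0, 7.0],      # overlay wins in every interval
    "E-F": [-2.0, -1.0, -3.0, -2.0],  # overlay never wins
}

summaries = {name: summarize_pair(gains) for name, gains in pairs.items()}

# Share of pairs with at least one winning interval versus a majority of winning intervals.
ever = sum(s["ever_better"] for s in summaries.values()) / len(summaries)
mostly = sum(s["win_fraction"] > 0.5 for s in summaries.values()) / len(summaries)

print(f"ever better: {ever:.2f}, better in most intervals: {mostly:.2f}")
```

With these numbers two of three pairs count as "ever better" but only one is better in most intervals, so reporting the win fraction per pair alongside the best value would address the concern.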

For the one-hop overlay routing, only a very small number of sites were selected. Given that this is the claimed contribution of the paper, I would expect a much more thorough evaluation based on more comprehensive data.

Other minor comments:

. pg.6 first paragraph: "on the same networks as Akamai edge servers", does this mean they are on the same LAN, which does not seem to be likely, or on the same ISP network, which then could still be pretty far. There are several other places that mention "same networks" without explaining what it really means.

. pg.8 Section 5: "finding hosts that share networks with Akamai edge servers" here again, what "share networks" means is not clear.

. pg.9 Section 5.1: just below figure 11. "The source node iteratively issues a DNS query for an Akamai customer." As mentioned in the paper, different Akamai customers will return very different sets of servers. Even the number of servers returned could be very different, ranging from 2 servers to several hundred servers. The practical question is then, for each client of the service, how to pick the Akamai customer to query.

. Following the above point, if every overlay application that does one-hop source routing issues DNS queries for Akamai customers for its own purposes, will this put a lot of undue burden on Akamai's networks?

. pg.9 last paragraph: the measurement based on the Akamai "data" is asymmetric and can be quite different depending on which node is assigned to be the source. But the usual RTT (round trip) from A to B and from B to A should be very similar. So we have 4 cases: (1) A to "an Akamai server" then to B; (2) A to B and back to A; (3) B to A and back to B; (4) B to "an Akamai server" then to A. Cases (2) and (3) should be similar. And if (1) and (2) are similar (for good cases), and (3) and (4) are similar (for good cases), then (1) and (4) should be similar also. It seems that for the cases where we can use Akamai, (1) and (4) should be similar, and if they are different, most likely we cannot use Akamai's "recommendation"?

The paper mentions "share networks with Akamai edge servers" in several places, but it does not explain what this really means. Does it assume the overlay network needs to have nodes that are co-located with all Akamai servers? What level of co-location is required? Same PoP? Same network? This seems to be very restrictive for the overlay network design.
Review #4
Attribute Value
Provide a short summary of the paper This paper describes a way to use the existing Akamai CDN network to help select overlay nodes to do one-hop source routing. It justifies the technique through an extensive measurement study showing that Akamai's redirection mechanism often chooses an edge server with low network latency to the client. The optimization metric for overlay routing is low latency.
What is the strength of the paper? (1-3 sentences) The paper demonstrates a novel use of an existing infrastructure (Akamai's CDN) for a common application (overlay routing).
What is the weakness of the paper? (1-3 sentences) Aside from the novel use of the CDN for indirectly obtaining latency measurements, the paper didn't provide convincing evidence that this provides substantial improvement over existing overlay routing schemes (for example UW's one-hop source routing scheme). Furthermore, given the proprietary nature of the CDN servers, the optimization criteria used for server selection are never clearly known. The obtained overlay node only indicates a "close-by" network location with no additional information on other performance metrics such as bandwidth, and with no information on its path to the destination.
Your qualifications to review this paper I know a lot about this area
Novelty of paper Incremental improvement
Overall paper merit Score 3: Top 20-35%. Soft Reject. Worth considering, but I would prefer not to see this paper in the program.
Provide detailed comments to the author To be convincing, the paper needs to compare with the work by UW on one-hop source routing [12] to demonstrate either an improvement over their performance or simplicity in the design of the overlay node selection scheme.

If many users decide to use Akamai's server selection information to choose overlay nodes, the previously uncongested paths may become congested within a short time period, before the server information reflects the change. What are the negative implications of this and how can this problem be resolved?

Akamai's redirection scheme is still proprietary. It probably optimizes using a number of factors in addition to latency, e.g., bandwidth, server load, network cost. In such cases, the hint provided by Akamai is not going to be very useful given that the actually selected overlay node may not always be located in the same network as the Akamai edge server or be close to it. Akamai can decide to change its algorithm for redirection tomorrow. The measurement results in the first part of the paper depend heavily on how Akamai operates its system. For instance, if Akamai only allocates a small number of servers to the customer queried (or to any customer), the selection would be limited to specifically those edge servers. This can be a dynamically determined policy by Akamai in mapping the name request for a given customer to a dynamically selected set of servers.

The underlying assumption of an overlay routing system that takes advantage of Akamai's redirection scheme is that a significant number of nodes participate in the overlay, so that it is highly likely that an overlay node can be found on the same network as, and close to, an Akamai edge server. This may not always be the case.

Figure 5: there is no discussion on the stability of the data and it is unclear over what time duration the study is done.

Page 8, end of section 4.2: it is premature to conclude that CDN services perform significantly better than the traditional server farm approach using data centers. The authors didn't present sufficient evidence in the paper to show that.

The proposed overlay routing algorithm uses the information about Akamai's edge server to select an intermediate server without considering the network performance from the intermediate overlay node to the target destination. If the overlay node is selected to be on the same network as the edge server, then that network is very likely a well-provisioned ISP network with low latency to the target destination. The authors didn't discuss the implications if the intermediate overlay node is not on the same network, but physically close by.

Review #5
Attribute Value
Provide a short summary of the paper  
What is the strength of the paper? (1-3 sentences) clever hack
What is the weakness of the paper? (1-3 sentences) evaluation incomplete: assumes ability to indirect through Akamai nodes (which is potentially far beyond the best case)
Your qualifications to review this paper I know a lot about this area
Novelty of paper This is a new contribution to an established area
Overall paper merit Score 4: Top 10-20%. Soft Accept. I'm inclined to accept it - I would like to see it in the program, but I am not arguing strongly in favor of the paper. Note that most papers in the program will probably have an average score of about 4.
Provide detailed comments to the author  
Review #6
Attribute Value
Provide a short summary of the paper The authors consider leveraging the Akamai infrastructure to guide the selection of overlay routes. Specifically, the authors focus on a scenario where nodes have the ability to use one-hop detour routes through (a node colocated with) an Akamai server. The main contributions of the paper are a deeper measurement-based understanding of the Akamai infrastructure (where there has already been considerable work), the idea of bootstrapping overlay routing on Akamai, and an experimental evaluation that demonstrates that this can provide a decent advantage.
What is the strength of the paper? (1-3 sentences) Nice idea: the premise of avoiding measurements by leveraging an existing piece of private infrastructure is clever.

Extensive set of measurements: the authors do a good job of exploring the key questions surrounding this problem.

Accessible and interesting read.
What is the weakness of the paper? (1-3 sentences) Seems like a bit of a hack: Akamai would almost surely employ counter-measures to prevent users from exploiting their measurements in this way in practice.

Questions remain regarding practicality. Foremost among these is how much leverage you can actually gain from Akamai when your overlay infrastructure has a smaller set of nodes that are not fully colocated with Akamai's nodes. This question could have been considered in this paper.
Your qualifications to review this paper I know the material, but am not an expert
Novelty of paper This is a new contribution to an established area
Overall paper merit Score 4: Top 10-20%. Soft Accept. I'm inclined to accept it - I would like to see it in the program, but I am not arguing strongly in favor of the paper. Note that most papers in the program will probably have an average score of about 4.
Provide detailed comments to the author This paper has an interesting premise, as it investigates the value of side information obtained from Akamai regarding nearest available servers when used in overlay settings. Unfortunately, much or most of the paper focuses on the well-trodden ground of the mechanics of how Akamai works and conducts measurements to quantify how well Akamai works. There is an abundance of related work on these topics and the material presented here, while presented well, does not break lots of new ground. The new material, on so-called one-hop source routing, is more interesting, but I am skeptical on practicability. In dense networks like Kazaa, there will of course be nodes colocated with most Akamai servers, but it is not clear how one would identify these colocated nodes, to say nothing of the load that would be introduced on these nodes if a feature like this were available. In sparser networks, it is not clear whether the Akamai information buys you all that much. Were it the case, and had the authors presented experimental evidence to that effect, then I would be a stronger advocate of this paper.

Another question I had is "Why Akamai?" Is it the case that the Akamai data is that much better than what could be provided by, say, a well-designed beaconing service? This is important, because Akamai surely would not allow users to leverage their infrastructure in this way, but a public service designed for this purpose might be an alternative. A concrete comparison to other forms of detouring systems would also be apropos.

Other more superficial issues:

There is quite a bit of hype about locating high quality paths without performing extensive monitoring. But in practice, the nodes do have to perform extensive monitoring -- in the form of queries to Akamai.

The heuristics investigated in 5.4 seem pretty ad hoc. The measurement load to do well is quite high here.

Lines on many of the figures are very hard to distinguish, e.g. Figures 12, 14. Also, figs are out of order.

Lots of italics and footnotes seem gratuitous. Surely the reader doesn't need a footnote in the title to have the term "drafting" explained.

I don't get the Travelocity reference. Maybe I do need a footnote for that part.