======= Review 1 =======

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
A = (top 10% of reviewer's perception of all INFOCOM submissions, but not top 5%) (5)

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

ISPs desire to inspect users' packet content in order to serve them targeted advertising; however, this is prevented today by legal stipulations. Leveraging the fact that the law does not prevent packet headers from being inspected, this paper proposes a mechanism to inspect TCP headers and learn users' browsing patterns by correlating this information with profiles of websites. The authors exploit properties such as "requests for web pages typically have the TCP PSH flag set" in order to demarcate pages within a website.
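As a minimal sketch of the kind of header-only signal involved (assuming raw 20-byte TCP headers from a trace; the check below is this reviewer's paraphrase, not the paper's full algorithm):

    import struct

    TCP_PSH = 0x08  # PSH bit in the TCP flags byte

    def looks_like_page_request(tcp_header: bytes) -> bool:
        """Header-only heuristic: a client segment with PSH set often
        carries an HTTP request, and so can demarcate page fetches."""
        return bool(tcp_header[13] & TCP_PSH)  # byte 13 holds the flags

    # Demo on a hand-built 20-byte header with PSH+ACK (0x18) set.
    hdr = struct.pack("!HHIIBBHHH", 49152, 80, 1000, 2000, 5 << 4, 0x18,
                      8192, 0, 0)
    print(looks_like_page_request(hdr))  # True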

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

Even though I do not like the very goal of the paper, and the technical approach is somewhat hackish at a high level, I like the paper overall and believe it deserves to be published. The work is systematic, the algorithm uses sound heuristics, and the results are interesting.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

- the validity of the approach may degrade as HTTP evolves
- no information on how long the algorithm takes to run; is it feasible for an ISP to run at scale?
- profiling only 6 websites is not enough to draw general conclusions

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

Suggestions for improving the writing:

- webpage --> web page ("website" as a single word is accepted, but I think "web server" and "web page" should each be two words)
- when referring to the Web, capitalize. This is done in most places, but one instance where it is not is the second paragraph, first line of II.A. Another instance: III.A, middle of the second paragraph.
- pointed by --> pointed to by (several instances)
- II.B, first sentence under "Locality": "The crawler records the location of each of the root files and objects." could be better clarified as "The crawler records the location of each root file and its corresponding objects."
- II.B: Web user behavior --> Web-user behavior

======= Review 2 =======

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
B = (top 30% of reviewer's perception of all INFOCOM submissions, but not top 20%) - weak accept (3)

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

This paper proposes and evaluates a method for ISPs to identify the webpages a user is visiting using only the information in TCP headers. The authors relate this problem to behavioral ad targeting and view the proposed solution as the "game changer" that lets ISPs participate in the online advertising business without violating existing law. However, it is not clear how the webpage-identification problem fits into the overall behavioral ad targeting scheme. The proposed solution is simple and has room for improvement, but that is acceptable since the paper is the first to consider the problem. The evaluation is reasonable, showing proof-of-concept results that demonstrate the potential of the approach.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

The idea and the problem formulation are new and interesting. Many of the technical challenges are well discussed. The evaluation and experimental study contain valuable insights worth publishing.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

The big picture is missing - how do ISPs make use of this information in advertising? Do ISPs identify webpages and modify the ads in real time? What information does behavioral ad targeting depend on? All these factors determine whether the problem under study is meaningful, but they are not discussed.

Some technical details are unclear. The reviewer has to guess what the authors mean in many cases.

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

This paper brings attention to a new and interesting topic -- how can ISPs overcome legal barriers and participate in online advertising (with the hope that ISPs have better knowledge of user browsing profiles and can hence help improve the relevance of the ads)? However, the paper quickly focuses on a very specific problem, i.e., can ISPs identify the webpage by examining only the TCP header information, without a discussion of the big picture. This makes it hard for the reader to appreciate the contribution. For example, how can behavioral ad targeting benefit from this study?
Existing behavioral ad targeting typically focuses on analyzing the history of user responses (e.g., clicks) when certain ads are displayed, which is quite different from what this paper studies. Web browsing history can arguably fit into a broader sense of behavioral targeting. However, it is not clear that correctly identifying the webpage is what advertising requires. Should link clicks be of more interest? Or is identifying the category of webpage content (e.g., furniture shopping vs. automobile service) sufficient, rather than the page itself? Without answers to these questions, it is hard to judge whether the paper studies the right problem.

Many technical details are quite obscure. To give a few examples: in the detection algorithm, what is the "highest percentage of identified objects"? Does it apply on a slice-by-slice basis or combined? In the tagging algorithm, how does it distinguish the compressed from the non-compressed version? In the evaluation, what is your "false positive"? Is it a subset of your "false negatives"? Is your result weighted by distinct pages or by distinct requests (different pages have different request frequencies)?

The analysis of "dynamic website behavior" is cast as analyzing how frequently root files change. However, a much more important factor is that some websites adjust content based on a user's login information (the simplest example being a welcome message addressing the user by name) or based on the user's query. This can change the size of root files on a user-by-user basis, which seems more troublesome for the proposed algorithm.
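A small illustration of the concern (all sizes and the tolerance are made up for this example): a short personalized greeting may still fall within a plausible size-matching tolerance, but query-dependent content easily will not:

    base_root = 14200  # profiled root-file size (bytes); made-up number
    tol = 0.05         # hypothetical relative size-matching tolerance

    for label, delta in [("personalized greeting", 25),
                         ("query-dependent listing", 4800)]:
        observed = base_root + delta  # per-user root file differs from profile
        matched = abs(observed - base_root) <= tol * base_root
        print(label, "matched" if matched else "missed")
    # personalized greeting matched; query-dependent listing missed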

======= Review 3 =======

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
B+ = (top 20% of reviewer's perception of all INFOCOM submissions, but not top 10%) (4)

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

This paper proposes an approach that can allow ISPs to identify the webpage visited by a source IP and then potentially send targeted ads to the consumers behind that source IP. This is an important problem to solve.

More specifically, the paper profiles websites using features such as the sizes of the webpage root file and its objects, external/internal links, etc., extracts these features from TCP headers and TCP packet timing information, and then compares the two to determine which webpages (on the site given by the destination IP) the user visited.
The effectiveness is shown using both controlled experiments and some small-scale real-world experiments. The reasonably high success ratio and low false-positive ratio are attributed to the inherent diversity of today's web content. The paper also does a reasonably thorough job of identifying the sources of potential errors, measuring their prevalence in the real world, and evaluating their impact on the paper's approach.
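To make the comparison step concrete, here is a minimal sketch of size-based page matching as this reviewer understands it (the page names, sizes, and 5% tolerance are illustrative, not the paper's actual parameters):

    def match_page(observed_sizes, site_profile, tol=0.05):
        """Score each profiled page by the fraction of its transfer sizes
        (root file + objects, inferable from TCP sequence numbers) that
        the observed flow matches within a relative tolerance."""
        best_page, best_score = None, 0.0
        for page, profile_sizes in site_profile.items():
            hits = sum(1 for p in profile_sizes
                       if any(abs(o - p) <= tol * p for o in observed_sizes))
            score = hits / len(profile_sizes) if profile_sizes else 0.0
            if score > best_score:
                best_page, best_score = page, score
        return best_page, best_score

    profile = {
        "/index.html": [14200, 3100, 820],  # root file + objects (bytes)
        "/cars/suv":   [22050, 3100, 640],
    }
    print(match_page([14180, 3095, 825], profile))  # ('/index.html', 1.0)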

The paper appears to be the first to apply the idea of comparing webpage and TCP features to the targeted-ad problem.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

Works on an important problem.

Appears to be the first to apply the idea of comparing webpage and TCP features to the targeted-ad problem.

Does a reasonably thorough job in the detailed approach and evaluation.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

It is not a new idea to determine the webpages visited by a source IP by comparing website features (e.g., page and object sizes) with those identified in TCP headers. Such an idea has been used to identify the encrypted web traffic of a source IP [21-26]. There does not seem to be a fundamental difference between the basic approach used in those works and this paper (although this paper appears to be the first to apply the idea to targeted advertising).

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

This paper proposes an approach that can allow ISPs to identify the webpage visited by a source IP and then potentially send targeted ads to the consumers behind that source IP. More specifically, the paper profiles websites using features such as the sizes of the webpage root file and its objects, external/internal links, etc., extracts these features from TCP headers and TCP packet timing information, and then compares the two to determine which webpages (on the site given by the destination IP) the user visited. The effectiveness is shown using both controlled experiments and some small-scale real-world experiments. The reasonably high success ratio and low false-positive ratio are attributed to the inherent diversity of today's web content. The paper also does a reasonably thorough job of identifying the sources of potential errors, measuring their prevalence in the real world, and evaluating their impact on the paper's approach.

It is not a new idea to determine the webpages visited by a source IP by comparing website features (e.g., page and object sizes) with those identified in TCP headers. Such an idea has been used to identify the encrypted web traffic of a source IP [21-26]. There does not seem to be a fundamental difference between the basic approach used in those works and this paper (although this paper appears to be the first to apply the idea to targeted advertising).


minor comments:
- Why does the change rate go down a little for the top curve in Figure 5(b)? One would expect the change rate to gradually increase over time since the trace is becoming more outdated, right?

- The paper mentions twice that "websites have no incentives to apply countermeasures since they are the primary beneficiaries of the advertising business". But I am missing one thing -- why would a website like NYTimes (one of the sites the paper studied) have no incentive to counter the ISP's targeted-ad attempts? It is not a site selling something, like Toyota or Ikea; instead, it hosts the ads. I.e., isn't it a competitor of the ISP in terms of ads?

- Section III.F shows that as the website profile size increases, the success ratio drops and false positives increase non-trivially. This is for a relatively static website (Toyota, with 900 total pages). One wonders what would happen with a more dynamic and potentially much larger website like NYTimes or even Yahoo. The reviewer would like to see more results along these lines.