***************************
* Comment 1 (Reviewer #1) *
***************************

The only minor concern that I have regards the excessive length of the paper, which might be shrunk a bit without losing expressiveness. Some of the concepts are somewhat repeated (especially in the introduction and in Section 2, which deals with the paper's motivation in a very detailed fashion).

***************** Response *****************

We have revised Section 1 (Introduction) and Section 2 (Motivation and Scope), removing or rewriting sentences that go into too much detail or repeat earlier concepts.

********************************************

***************************
* Comment 2 (Reviewer #2) *
***************************

First, while the authors claim that their defense solutions can also be used to mitigate the pollution attack in p2p systems discussed in [14], I doubt the claim. Unlike the two Web cache pollution attacks defined by the authors, the p2p pollution attack is different in that a pollution server populates supernodes' indexes with pointers to corrupted files for target contents. The goal of this p2p pollution attack is to increase the chances of a client downloading a target content from the pollution server. Since the target content is not necessarily unpopular in the p2p system, the p2p pollution attack can be neither a locality-disruption attack nor a false-locality attack.

***************** Response *****************

We have modified the first paragraph of Section 7 (Related Work), page 32, to make the difference between p2p and proxy-cache-targeted pollution attacks clearer. We point out that the p2p pollution attack, which populates supernodes' indexes with pointers to corrupted files, falls outside the research scope of our work. We also note that countering the p2p pollution attack requires proper media matching systems, which is fundamentally different from the countermeasures against proxy-cache-targeted pollution attacks.

********************************************

***************************
* Comment 3 (Reviewer #2) *
***************************

Second, the experiment methods have two major weaknesses. The first is the Web workload model. The authors should look at the paper by Paul Barford: http://citeseer.ist.psu.edu/barford98generating.html. If the workload model that the authors chose is better than the one suggested by the above paper, I encourage the authors to explain why. Otherwise, it may appear that the workload could be chosen in favor of showing the effectiveness of the attack/defense methods.

***************** Response *****************

We have added a new paragraph at the end of Section 7 (Related Work), page 33, to compare our model with the one proposed by Paul Barford ([51]). We point out that, like the model in [51], which captures the high variability of the major parameters affecting server resource consumption, our model captures similar high variability for all the major parameters that affect cache efficiency. We also explain why it is valid to use a simpler model for some parts. In addition, we discuss two important properties that we have carefully modeled but that are not included in the model of [51]: (i) the heavy-tailed distribution of the request rates of different clients and (ii) the NAT effect. These two properties are specific to our research scope, since they significantly affect the efficiency of our countermeasures. Furthermore, we list several miscellaneous features of our model, namely new client arrivals, new object arrivals, and time-of-day effects, which make the model more realistic.
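The two workload properties mentioned above can be illustrated with a minimal sketch (purely illustrative, not the paper's actual model; all parameter values are invented): per-client request rates are drawn from a heavy-tailed Pareto distribution, and the NAT effect is approximated by summing the rates of several clients sharing one public IP.

```python
import random

random.seed(7)  # reproducible illustration

# Heavy-tailed per-client request rates: a few clients generate most of
# the traffic, while the rest are mostly idle.
NUM_CLIENTS = 1000
rates = [random.paretovariate(1.2) for _ in range(NUM_CLIENTS)]

# NAT effect: several clients share one public IP, so the proxy observes
# the aggregate request rate of each NAT group, not of each client.
CLIENTS_PER_NAT = 4
nat_rates = [sum(rates[i:i + CLIENTS_PER_NAT])
             for i in range(0, NUM_CLIENTS, CLIENTS_PER_NAT)]

top = sorted(rates, reverse=True)
heavy_share = sum(top[:NUM_CLIENTS // 10]) / sum(rates)
print(f"top 10% of clients generate {heavy_share:.0%} of the traffic")
```

Both properties matter for a detection scheme that counts per-IP requests: a NAT gateway legitimately aggregates many clients, and a heavy-tailed rate distribution means some honest clients are far more active than the average.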

********************************************

***************************
* Comment 4 (Reviewer #2) *
***************************

The second weakness is the results presented in Section 3.2.1. It looks to me (by examining Figure 4) that the attack traffic (and background traffic) was generated against an empty cache. Otherwise, for LFU and for the locality-disruption attack, the extremely low cache hit ratio does not make sense. In practice, an attack is likely to happen on a cache that has been in operation for a while and has certain locality already established. In this case, the locality-disruption attack against a cache employing LFU can't be as effective as shown in Figure 4(a).

***************** Response *****************

The reviewer's observation is correct. We address this issue as follows:

We have added a new paragraph (para. 2 in Section 3.2.1, pages 12-13). We point out that the hit ratios shown in Figure 4(a) are stable-state values, which are quickly reached when an initially empty cache starts serving requests from both regular users and attackers. We have also added a new figure (Figure 5) to depict the case of a non-empty cache. For a non-empty cache, the hit ratio may remain relatively high in the beginning, but it declines as time elapses and eventually converges to the stable state. Our experiments also show that the convergence time is determined by both the replacement algorithm and the aggregate request rate of the attackers.
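The warm-cache convergence behavior can be reproduced with a toy simulation (a sketch for illustration only, not the paper's experimental setup; the LRU policy, cache size, and traffic mix are assumptions): the cache is pre-warmed with popular objects, then mixed user/attacker traffic drives the hit ratio down toward its stable state.

```python
import random
from collections import OrderedDict

random.seed(42)

class LRUCache:
    """Toy LRU cache: request() returns True on a hit, False on a miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def request(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # refresh recency
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = True
        return False

cache = LRUCache(capacity=100)
POPULAR = [f"pop-{i}" for i in range(100)]

# Warm the cache so locality is established before the attack starts.
for obj in POPULAR:
    cache.request(obj)

# Mixed traffic: regular users re-request popular objects, while the
# attacker issues one-time requests for unpopular objects
# (locality disruption). Track the hit ratio per window of requests.
hit_ratios, attack_id = [], 0
for window in range(4):
    hits = 0
    for _ in range(500):
        if random.random() < 0.5:              # attacker request
            attack_id += 1
            cache.request(f"junk-{attack_id}")
        else:                                  # regular user request
            hits += cache.request(random.choice(POPULAR))
    hit_ratios.append(hits / 500)

print(hit_ratios)  # starts higher, then declines toward a stable state
```

With a stronger or weaker attacker request rate the decline is correspondingly faster or slower, which matches the observation that convergence time depends on the aggregate attack rate and on the replacement algorithm.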

********************************************

***************************
* Comment 5 (Reviewer #2) *
***************************

Third, as mentioned in Section 4, the workload to popular dynamic contents or contents with a short time-to-live (e.g., nytimes.com) may look similar to pollution attacks in that the ratio of the requests to the unique client IPs can be high. However, I do not see any discussion on how to identify these files. Moreover, I believe that if evaluated against real Web cache workload, the proposed defense methods would generate many false alarms because of these dynamic contents. I strongly suggest that the authors should evaluate their schemes with real Web workload.

***************** Response *****************

In Section 4.1, page 19, we have added a paragraph that addresses this problem from two perspectives. First, many requests to the same website (e.g., www.google.com) are in fact requests for different URLs because of the distinct query strings they carry. Second, for the very few popular websites (e.g., cnn.com) where many people tend to load the same page, we argue that a "white list" can be set up to address this issue.

In particular, the paragraph now looks as follows:

"While some clients may load dynamic content, including the documents that change on every access, documents that are results of queries, and documents that embody client-specific information (cookies), Bent et al. found that about 60% of HTTP requests are generated for dynamic content and are thus uncacheable [37]. The second caveat is that certain programs, e.g., web crawlers, repeatedly request the same file until a successful download occurs. The cache server and our detection system can recognize such failed requests and exclude them from counting. Finally, it is also possible that some clients may keep loading certain news web sites, stock web sites and search engines. Usually the webpage of the news web sites and stock market, e.g., "www.cnn.com", has a short time-to-live (TTL) parameter. They always need to reload the content from the server after the expiration time, while the unpopular files of pollution attacks do not have such patterns. Many people may request the search engines multiple times a day, but most URL queries of the search engine include the different request contents in the URLs, such as "http://www.google.com/search?hl=en\&q=abc". That means the probability for one client reload the same page with large frequency in a short time is small. Although it is also possible for some clients to solely keep loading the same web page, e.g., "http://www.google.com," without placing any queries, there is only a very small number of such popular web pages with large TTL. In practice, we can create a "white list" of such URLs to reduce detection false positives."

********************************************

***************************
* Comment 6 (Reviewer #2) *
***************************

==========
minor nits
==========
- page 3: no existing schemes is -> are
- page 6: Such servers resign -> reside
- page 22: false-locality attack -> locality-disruption attack
- there are many citations in the references section that are missing a journal title.

***************** Response *****************

All the minor nits pointed out have been fixed.

********************************************