This paper presents an approach to mitigating so-called pollution attacks, i.e., attacks aimed at degrading the performance of a proxy cache by manipulating the locality of the files it serves. The degradation is achieved by polluting the cache with unpopular content. Pollution can be created either by ruining the existing file locality (locality disruption) or by inducing a false one (false locality). Through simulations, the authors study in depth two scenarios that may be subject to this kind of attack, namely p2p and Web-based client/server. They show that resilience to pollution depends strongly on the cache replacement algorithm adopted (the paper analyses Greedy Dual-Size Frequency, Least Frequently Used and Least Recently Used). Observing that none of these replacement algorithms effectively protects the system against pollution attacks, the authors devise a set of countermeasures that let caches survive this kind of Denial of Service. The proposed mechanisms are all based on streaming computation techniques (Bloom filters and probabilistic counting).
The idea behind the paper is quite interesting, and it is presented clearly. The authors demonstrate a good knowledge of the issues that would have to be faced should pollution attacks become widespread on the Internet. The study is thorough, and the proposed countermeasures look interesting.
I enjoyed reading this paper. It is well written and clearly explains the context and the authors' contributions. My only minor concern is the excessive length of the paper, which could be shortened somewhat without any loss of expressiveness. Some concepts are repeated, especially in the introduction and in Section 2, which discusses the paper's motivation in great detail.
I recommend accepting the paper.
Major weak points
First, the authors claim that their defense solutions can also mitigate the pollution attack in p2p systems discussed in , but I doubt this claim. Unlike the two Web cache pollution attacks defined by the authors, the p2p pollution attack works by having a pollution server populate supernodes' indexes with pointers to corrupted files for the target content. The goal of this p2p pollution attack is to increase the chance that a client downloads the target content from the pollution server. Since the target content is not necessarily unpopular in the p2p system, the p2p pollution attack can be classified as neither a locality-disruption attack nor a false-locality attack.
Second, the experimental methodology has two major weaknesses. The first is the Web workload model. The authors should look at the paper by Paul Barford: http://citeseer.ist.psu.edu/barford98generating.html
If the workload model the authors chose is better than the one proposed in that paper, I encourage the authors to explain why.
Otherwise, it may appear that the workload was chosen to favor the effectiveness of the attacks and defense methods. The second weakness concerns the results presented in Section 3.2.1. Judging from Figure 4, the attack traffic (and the background traffic) appears to have been directed at an initially empty cache; otherwise, the extremely low cache hit ratio for LFU under the locality-disruption attack does not make sense. In practice, an attack is likely to hit a cache that has been in operation for a while and already has some established locality. In that case, popular files would already have accumulated high frequency counts and would not be evicted in favor of the attacker's unpopular files, so the locality-disruption attack against an LFU cache cannot be as effective as shown in Figure 4(a).
Third, as mentioned in Section 4, requests for popular dynamic content or content with a short time-to-live (e.g., nytimes.com) may look similar to pollution attacks, in that the ratio of requests to unique client IPs can be high. However, I do not see any discussion of how to identify these files. Moreover, I believe that, if evaluated against a real Web cache workload, the proposed defense methods would generate many false alarms because of such dynamic content. I strongly suggest that the authors evaluate their schemes with a real Web workload.
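To make the false-alarm concern concrete, here is a minimal sketch of a per-file "requests per unique client IP" test. It is not the authors' detector: it uses exact per-file sets instead of the Bloom filters and probabilistic counters the paper relies on, and the threshold, URLs and IP addresses are hypothetical values chosen only for illustration.

```python
from collections import defaultdict


def request_to_client_ratio(log):
    """Map each URL to (number of requests) / (number of unique client IPs).

    `log` is an iterable of (client_ip, url) pairs. Exact sets are used for
    clarity; a streaming defense would approximate them instead.
    """
    requests = defaultdict(int)
    clients = defaultdict(set)
    for ip, url in log:
        requests[url] += 1
        clients[url].add(ip)
    return {url: requests[url] / len(clients[url]) for url in requests}


def flag_suspicious(log, threshold):
    """Return, sorted, the URLs whose ratio meets a hypothetical threshold."""
    ratios = request_to_client_ratio(log)
    return sorted(url for url, ratio in ratios.items() if ratio >= threshold)


# Hypothetical trace:
# - one attacker repeatedly fetches an unpopular file (false locality),
# - three legitimate clients keep reloading a short-TTL news page,
# - thirty clients each fetch a static page once.
log = [("10.0.0.1", "/junk")] * 50
for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
    log += [(ip, "/news")] * 20
log += [(f"198.51.100.{i}", "/static") for i in range(30)]

print(flag_suspicious(log, threshold=10))  # ['/junk', '/news']
```

Here the legitimately popular `/news` page (ratio 20) is flagged together with the attacker's `/junk` file (ratio 50), which is the kind of false alarm that an evaluation on a real Web workload would need to quantify.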
- page 3: no existing schemes is -> are
- page 6: Such servers resign -> reside
- page 22: false-locality attack -> locality-disruption attack
- many citations in the references section are missing a journal title.