review for taming

======= Review 1 =======

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

This paper considers how the large amount of user upload activity can be supported economically in mobile networks. The proposal revolves around delayed uploads: the authors observe that many types of uploads can tolerate some delay. They propose an architecture in which service providers selectively upgrade certain access locations with higher capacity; these are referred to as Drop Zones. The idea is that the user hands the content to a backend application, which later uploads the content when the user is near a Drop Zone. They formulate Drop Zone placement as a static optimization problem (set cover) and propose a greedy solution. Trace-based evaluations indicate the approach is effective.

The authors target an important research problem in an active area: with the increasing capability of smartphones and other mobile devices, the data consumption and generation by these devices is increasing rapidly, putting stress on existing infrastructure and spurring research on how to handle this traffic. Based on the evaluations, the proposed technique appears to be effective. The paper is overall well written.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

Well-written paper on an important practical issue. They analyze user data for a large base of users to make a convincing case for the Drop Zone idea. Overall a solid work with some limitations.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

The proposal and evaluation depend on getting a critical piece of information right: the base station location of the users in the data trace. Certain results, e.g., that a user uploads from only a few base stations, depend on this data being accurate. The authors indicate they get this from billing information. However, as I describe below, the base station information in billing records can be quite inaccurate, due to user mobility and the fidelity at which billing systems obtain base station information.

My concern is how this inaccuracy impacts the key results in the paper. At the least, the authors should explain how the base station information was populated in their trace, how accurate it is, and whether they validated the quality of the base station data they use. They should also quantify how inaccuracies in this data impact their findings.

The paper misses basic cross-validation experiments for the placement algorithm, without which it is difficult to evaluate the general effectiveness of the scheme.

The user analysis does not consider WiFi offloading. WiFi hotspots can bias the user behavior they are seeing in their analysis. Also they should compare their approach to one being used in industry where WiFi hotspots are being selectively beefed up.

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

I liked the work. There are some issues that should be addressed: some may be resolved through better clarification, while others (e.g., the cross-validation) may need some work.

- Effect of WiFi

Your user analysis results are not absolute, as they do not account for the presence of WiFi hotspots, which are likely to bias the results you are seeing. At work and at home, people who have WiFi tend to use it. The top locations you see for a user may be a function of WiFi availability for that user. Both the amount of time and the amount of data one uses at a location will be influenced by whether one has WiFi connectivity there.
Make it clear in the paper that your analysis is conditioned on WiFi deployment.

Instead of ramping up the bandwidth in the mobility network, another option is to selectively deploy WiFi hotspots. Many providers are taking this route. This is attractive since the most constrained resource in mobility networks is often the last-mile spectrum. How would that compare to your proposal in terms of cost and effectiveness?

- How do you get the base station location of the user from the billing trace? How accurate is it?

As I understand it, the base station location in these types of billing systems is obtained from tunnel status messages for the GTP tunnel between the SGSN and GGSN over which a handset's data connection runs. The base station info is correct when the GTP session is initiated, but afterwards it can become stale if the user is moving. GTP update messages only update the base station info for handovers that involve a change of SGSN. For handovers between base stations covered by the same SGSN, the GTP info is not updated. A single SGSN typically covers many base stations over a large area, so your base station information can be quite stale, particularly since GTP tunnels and their state can last on the order of many days.
It's possible that the authors' trace has been augmented with information from other sources to rectify the above problem. The authors should explain how the base station information was populated in their trace, how accurate it is, and whether they validated the quality of the base station data they use. They should also quantify how inaccuracies in this data impact their findings.
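
To make the staleness concern concrete, here is a minimal sketch under my own assumptions: the billing record is refreshed only at session start and on inter-SGSN handovers, and the handset path and the names `bs1`, `sgsnA`, etc. are hypothetical. It is not meant to describe how the authors' trace was actually collected.

```python
from typing import List, Tuple


def recorded_base_stations(path: List[Tuple[str, str]]) -> List[str]:
    """Return the base station a billing record would show at each step.

    `path` is the sequence of (base_station, sgsn) pairs the handset
    actually visits. Assumption: the record is refreshed only when the
    session starts or when the serving SGSN changes.
    """
    recorded = []
    last_bs, last_sgsn = None, None
    for bs, sgsn in path:
        if last_sgsn is None or sgsn != last_sgsn:
            # Session start or inter-SGSN handover: record is refreshed.
            last_bs, last_sgsn = bs, sgsn
        # Intra-SGSN handover: record keeps the old base station.
        recorded.append(last_bs)
    return recorded


def stale_fraction(path: List[Tuple[str, str]]) -> float:
    """Fraction of steps where the recorded base station is wrong."""
    if not path:
        return 0.0
    recorded = recorded_base_stations(path)
    wrong = sum(1 for (bs, _), rec in zip(path, recorded) if bs != rec)
    return wrong / len(path)


# Hypothetical handset path: three cells under one SGSN, then two under another.
path = [
    ("bs1", "sgsnA"), ("bs2", "sgsnA"), ("bs3", "sgsnA"),
    ("bs4", "sgsnB"), ("bs5", "sgsnB"),
]
print(recorded_base_stations(path))
print(stale_fraction(path))
```

In this toy path the record is wrong at three of the five steps, which is the kind of error rate I would like the authors to measure on their real trace.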

- Evaluations: The DropZone placement is formulated as a static optimization problem. I had expected to see some validation experiments where the placement is done using one part of the data and evaluations using other data. I did not see that. This is pretty standard and required to understand what happens when the user behavior changes from the one it was optimized for.
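
A minimal sketch of the validation I have in mind, under simplifying assumptions of my own: trace records are hypothetical `(user, cell, day)` tuples, a user counts as covered if they appear in at least one selected cell, and the greedy rule is a generic maximum-coverage heuristic rather than the authors' exact algorithm (delay bounds and upload volume are ignored).

```python
from collections import defaultdict


def users_per_cell(records):
    """Map each cell to the set of users seen there."""
    cells = defaultdict(set)
    for user, cell, _day in records:
        cells[cell].add(user)
    return cells


def greedy_placement(records, k):
    """Pick up to k cells, each time taking the cell that covers the most
    not-yet-covered users (ties broken by cell name)."""
    cells = users_per_cell(records)
    chosen, covered = [], set()
    for _ in range(k):
        best = max(sorted(cells), key=lambda c: len(cells[c] - covered),
                   default=None)
        if best is None or not (cells[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= cells[best]
    return chosen


def coverage(records, drop_zones):
    """Fraction of users in `records` who visit at least one drop zone."""
    cells = users_per_cell(records)
    all_users = {user for user, _cell, _day in records}
    if not all_users:
        return 0.0
    covered = set()
    for cell in drop_zones:
        covered |= cells.get(cell, set())
    return len(covered) / len(all_users)


# Hypothetical trace: day 1 is used for placement, day 2 for evaluation.
trace = [
    ("u1", "c1", 1), ("u2", "c1", 1), ("u3", "c2", 1),
    ("u1", "c1", 2), ("u2", "c3", 2), ("u3", "c2", 2),
]
train = [r for r in trace if r[2] == 1]
test = [r for r in trace if r[2] == 2]

zones = greedy_placement(train, k=1)
print(zones, coverage(train, zones), coverage(test, zones))
```

In this toy trace the single drop zone chosen from day 1 covers two of three users on day 1 but only one of three on day 2. A gap of this kind between in-sample and out-of-sample coverage is what the requested experiment would reveal.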

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
B+ = (top 20% of reviewer's perception of all INFOCOM submissions, but not top 10%) (4)

======= Review 2 =======

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

This paper proposes using so-called 'drop zones' to opportunistically transfer user-generated content in mobile networks in a delayed fashion, so as to temporarily address the bandwidth challenge posed to mobile service providers by the increasing amount of user-generated data traffic. The addressed problem is very important and timely. Although the idea itself is not new, the proposed solution is interesting and can be useful in practice.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

Interesting idea, backed with analysis from real traces.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

Some of the technical details need to be clarified.

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

1) The measurement and analysis in Sec II is interesting and insightful. But it also leaves some questions unanswered. For instance, how are the drop zones typically shared among users according to their mobility patterns?

2) The drop zone placement method described in Sec III-A needs some clarification: how are the drop zones selected based on user mobility patterns (in addition to the other constraints)? The greedy algorithm does not seem to take user mobility pattern into consideration.

3) The discussion on 'infrastructure needs' is confusing. The proposed method does not need any additional infrastructure. It basically selects a subset of existing cells and transfers content only when users move into these cells. Actually I think the more interesting problem is to study how to upgrade the bandwidth of a small number of cells, and analyze the trade-off between the cost of such upgrades and the effectiveness in content upload delays.

4) I don't understand the argument on page 7 regarding Figure 8(b). The more drop zones we have, the more locations there are at which users can upload their content, so why does the travel distance increase?

5) In the proposed scheme, what happens if a crowd of users moves into a drop zone and creates contention when uploading their content?

6) The '14-year' conclusion in Sec IV-E-1) may be too optimistic. Cellular data networks have problems mostly in densely populated areas like Manhattan, and user mobility is largely concentrated in a small region. Using the proposed method, all cells in that small region will tend to be selected as drop zones, and the bandwidth problem can be solved without upgrading the cells or suppressing user traffic. This points back to comment 3), which I think is a more practical problem to solve. Relatedly, it would be helpful if the paper provided the total number of base stations in the data set studied and the percentage of them selected as drop zones. This would make some of the analysis much clearer than the absolute number of drop zones currently shown in Figs. 6 and 9.

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
B+ = (top 20% of reviewer's perception of all INFOCOM submissions, but not top 10%) (4)

======= Review 3 =======

> *** Contributions: What are the major issues addressed in the paper? Do you consider them important? Comment on the novelty, creativity, impact, and technical depth in the paper.

The paper relies on real mobility traces of the users of a cellular network to characterize their data uploading process. The conclusions of this study are that each user uploads most of the content from a limited number of locations and that she/he usually uploads the content well after the time it was generated. These observations motivate the proposed solution: upgrade the cellular infrastructure only at some specific locations (drop zones) and provide incentives for users to postpone their uploads until they reach these zones.
A heuristic is proposed to determine the position of such locations, and experiments on the same traces are carried out to show the number of drop zones needed to upload a given percentage of the total traffic through the drop zones, under a constraint on the maximum time an upload is postponed.
The conclusions from the analysis of the traces are interesting, even if I have some remarks about their validity. To the best of my knowledge the proposed solution is new, even if it is not a breakthrough. It is not clear why the drop zones could not be simply WiFi access points. Performance evaluation would benefit from a comparison with the performance of the existing cellular network.

> *** Strengths: What are the major reasons to accept the paper? [Be brief.]

The change in traffic patterns due to the spread of smartphones is a real challenge for network operators.
The analysis of real traces is the key element in this paper.

> *** Weaknesses: What are the major reasons NOT to accept the paper? [Be brief.]

The conclusion about users being willing to postpone their uploads is questionable.
It is not clear why users could not be incentivized to use alternative access networks like WiFi, or wired networks.

> *** Detailed Comments: Please provide detailed comments that will help the TPC assess the paper and help provide feedback to the authors.

I first explain my remarks about the main weaknesses of the paper.

1) The conclusion about users being willing to postpone their uploads is questionable.

The authors show that users often upload content many hours after it was generated. From this observation they conclude that users may be willing to postpone their uploads, and that this would support their drop zone solution.
In reality, it seems reasonable that the upload occurs at a later time for one of the following reasons:
i) the users have not decided yet to upload the content,
ii) they do not have time to upload it at generation time (they are moving and were only just able to take the photo),
iii) they have experienced bad performance when uploading while moving and so they postpone their upload to a later moment when they are at one of their comfort zones.
These reasons are in agreement with the fact that they upload from a limited number of locations. At home or at work they have the time to process the content generated and decide what to do with it, they have the time to launch the upload, and they may experience better uploading performance.
Then, in my opinion, the observation of late uploads does not provide evidence that users may be willing to postpone them.

2) It is not clear why users could not be incentivized to use alternative access networks like WiFi, or wired networks.

The authors suggest that software should be installed on smartphones to ask users whether they are willing to postpone their upload and to perform it transparently when the user reaches a drop zone. The authors think that users should be incentivized to choose this option. It seems that these incentives could equally promote the use of other, cheaper networks that may become available later, like WiFi connections (most smartphones support WiFi) or wired connections.
If users can really be convinced to postpone their uploads, why would it be necessary to deploy drop zones rather than take advantage of existing alternatives?

OTHER REMARKS
- A comparison with the current cellular architecture is missing.

Figure 9 motivates the conclusion that deploying 963 base stations with LTE technology could handle a traffic increase of 4 orders of magnitude (expected to happen in 14 years) while still managing 50% of the content upload through drop zones. I would have appreciated a comparison with the current cellular network: what percentage of traffic increase could it handle if no upgrade was performed?

- Figure 8(b), about the average distance from a drop zone

As the authors state, the increase in the average distance is counter-intuitive, and I have not understood their explanation. It is possible that the authors are not showing the average distance of a user from her/his closest drop zone (where the average is evaluated across users), but rather the average distance of users from all the drop zones (where the average is evaluated across users and, for each user, across drop zones). This would justify the result. Please clarify.
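
A small sketch of the two candidate definitions, with hypothetical user and drop zone coordinates on a plane. It only illustrates my guess about the metric and is not the authors' computation.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mean_distance_to_closest(users, zones):
    """Average over users of the distance to their nearest drop zone."""
    return sum(min(dist(u, z) for z in zones) for u in users) / len(users)


def mean_distance_to_all(users, zones):
    """Average over all (user, drop zone) pairs."""
    total = sum(dist(u, z) for u in users for z in zones)
    return total / (len(users) * len(zones))


# Hypothetical layout: two users, one central drop zone, then a distant one added.
users = [(0.0, 0.0), (2.0, 0.0)]
few_zones = [(1.0, 0.0)]
more_zones = few_zones + [(10.0, 0.0)]

print(mean_distance_to_closest(users, few_zones),
      mean_distance_to_closest(users, more_zones))
print(mean_distance_to_all(users, few_zones),
      mean_distance_to_all(users, more_zones))
```

Adding a drop zone can never increase the closest-zone average, since each user's minimum is taken over a superset. The all-pairs average, however, rises as soon as the added zones are farther away than the existing ones, which would reproduce the trend in the figure.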

> *** Recommendation: Your overall rating (Please try giving as few borderlines as possible).
B+ = (top 20% of reviewer's perception of all INFOCOM submissions, but not top 10%) (4)