W-MUST Review #8A
                Updated Monday 25 Apr 2011 8:15:16am PDT
     Paper #8: Understanding End-user Perception of Network Problems

                     Overall merit: 2. Weak reject
                Reviewer expertise: 4. Expert

                        ===== Paper summary =====

This paper collects network and system performance data at endhosts
together with user irritation (i.e., moments when users click a button
to indicate that they are unhappy with their network performance). The
paper studies 32 users from Northwestern University. The goal of the
analysis in the paper is to verify assumptions that people may have about
how network performance metrics correlate with user irritation.

                      ===== Reasons to Accept =====

Interesting problem of understanding user perception of network performance.

The tool has some nice properties, for instance: it forces logging of
all metrics when user reports irritation, which guarantees that it has
measurements when it matters.

                      ===== Reasons to Reject =====

It is unclear where the assumptions in the paper come from and whether
the collected data and the analysis methods can support the

The paper is sometimes hard to follow.

                ===== Further comments for authors =====

This paper presents an ambitious effort to collect endhost data
together with the user irritation. The user study and the data should
help understand under which network performance properties users feel
irritated with their network experience. The paper is well on target
for the workshop and should generate discussion.

My main concern with this paper is with the conclusions drawn from the
analysis. First, the paper says that the hypotheses it studies are
common assumptions of user perception of performance. If this is the
case, are there any references for these assumptions? Second, the
current presentation of the results leaves many questions about the
relevance and accuracy of the conclusions. More detailed comments

The introduction, at the end of the first paragraph, says that engineers
work based on assumptions. Although engineers do make assumptions and we
clearly need a better understanding of user perception of performance,
the situation is not as bad as the introduction suggests. There have
been user studies for many applications like voice, video, gaming, or

Sec. 3: Do users give any feedback on irritation or just the click?
Are they supposed to click multiple times if the event persists?

How do you guarantee that users are rating irritation consistently?
Did you interview the users at the end?

Sec. 4:

- If users do not rate irritation consistently, then the method of
comparing irritation vs. non-irritation periods may lead to wrong

- It would be nice to have some citations for each hypothesis. Some of
these hypotheses do not seem to be common network-engineering hypotheses.
For instance, why is hypothesis 1 relevant?

- Under hypothesis 1: throughput values during irritation are higher.
It is hard to interpret this. If people are getting higher throughput
then they should be happy with their network. On the other hand, this
may also mean that the user is generating lots of competing traffic
and some of the applications may work poorly because of that.

- The definition of the connections associated with an irritation
could be refined. Instead of taking a time window around the event,
you could look at the application that the user is using at the time
of the irritation.

- In hypothesis 2: what is a small flow? Total bytes? Low rate? It
would be helpful to define. At the end of this hypothesis it would be
nice to give some examples of small flows that are long lived, the
ones that are often associated with irritation.

- Hypothesis 3: It is hard to discuss irritation with a given AS without
considering the applications it supports. These factors are not
independent. Maybe the ASes that are responsible for most irritation
are just ASes with more sensitive applications.

- Hypothesis 4: here you have a heuristic to classify a flow as
streaming and you say that 90% of the flows do indeed correspond to
content delivery networks. What about the flows that you do not
classify as streaming? If you want to compare the two distributions
you need to be sure that there are no false negatives as well. Also,
the metric used to study irritation with streaming versus non-streaming
is not very natural. I'm not sure I understand what it captures. Maybe
you are just looking at the wrong metric and your conclusions are just
because of the metric and a bad classification of streaming content.
I'm also not sure the hypothesis itself is correct. People have a
hypothesis that users are sensitive to streaming, but not necessarily
*more* sensitive, which is your hypothesis. I always thought that the
most sensitive application is gaming.

- Second Hypothesis 4 (irritation is stateful): the first paragraph is

- Hypothesis 5: you say and the curve shows that users are more likely
to be irritated under lower signal quality, but then you conclude that
signal quality is not a factor. This is confusing.

- The text explaining Figure 9 talks about a baseline. Is this
presented in the figure? What is it?

- The conclusion in Hypothesis 6 seems to contradict Hypothesis 5. If
wireless quality is not a factor, then why is the access point a

The writing needs a lot more detail in places. You could cut some of
the hypotheses and explain a few of them well.

                            W-MUST Review #8B
                Updated Saturday 2 Apr 2011 3:22:17pm PDT
     Paper #8: Understanding End-user Perception of Network Problems

                     Overall merit: 4. Accept
                Reviewer expertise: 2. Some familiarity

                        ===== Paper summary =====

This paper presents the design and evaluation of SoylentLogger, a
client-based logging tool that captures network data as well as
feedback from end-users. The paper also presents results of an
analysis of the data with several interesting results about network
traffic flows and end-users' perception of irritation.

                      ===== Reasons to Accept =====

- Pretty interesting tool
- Good analysis of the data
- Good linkage between low-level network data and higher-level
end-user perceptions

                      ===== Reasons to Reject =====

- Used only technical people in the user study

                ===== Further comments for authors =====

Overall, I found this to be a really interesting paper. I think it
would spur a lot of good discussion, and as such would argue for
accepting it.

Having highly technical participants often isn't a good thing for user
studies, but I think it's useful and appropriate here, in terms of
being consistent in data collection and getting user data that will be
more likely to be accurate. However, it also does cast into question
hypothesis 1 for a general audience (Users can distinguish between
local and network sources of irritation).

For hypothesis 2, it would be good to define "small flows".

The figures appear out of order: Figure 5, Figure 7, then Figure 6
(which is also more of a table than a figure).

- "Our subjects use a variety of network services and that few have
academic experience in CS, CE, or EE." -> reads a bit awkward

                            W-MUST Review #8C
               Updated Saturday 23 Apr 2011 8:02:15am PDT
     Paper #8: Understanding End-user Perception of Network Problems

                     Overall merit: 1. Reject
                Reviewer expertise: 3. Knowledgeable

                        ===== Paper summary =====

The authors conducted a study in which they collected end-host network
traffic along with user labels that indicate when a user was annoyed.
Then they propose 6 hypotheses and test them with their data. For
example, they examine whether user irritation is affected by flow
size, location, wireless link quality, the AS’s they visit and so on.
While this paper poses nice questions, the results they present either
in support of or against the hypotheses are often unconvincing and/or
problematic in their construction. (See details below.) This paper is
not ready for publication.

                      ===== Reasons to Accept =====

This paper poses an interesting question, namely, what network
characteristics lead to user irritation. They also have an interesting
dataset of network data that is labeled by users when moments of
irritation occur.

                      ===== Reasons to Reject =====

The paper is poorly written, including key elements such as
ambiguously worded hypotheses. Also, this reviewer believes that 3 of
the 6 hypotheses are not demonstrated in the paper – either due to
ambiguity, weak metrics or insufficient exploration of the
implications of the hypothesis.

                ===== Further comments for authors =====

Hypothesis 1. I’m not convinced that the metric you use to support
hypothesis 1 actually confirms the claim. Perhaps the 2 distributions
of page fault rate during time windows of irritation and time windows
of non-irritation don’t differ because there were no page fault
events. Did you check this? If so then your claim doesn’t quite work.
You may conclude that users can perceive network events that impact
throughput, but you cannot conclude that users are unaffected by page
fault events. You would need to show instances of high CPU utilization
under which no irritation event was logged (same for page faults), in
order to claim that “users can distinguish between local and machine
sources of irritation”. Also, the text in this section implies (if the
claim were true) that users don’t care about high CPU utilization or
page faults, but they do care about high network throughput. Is that
your point - that even when page faults occur users do not get
annoyed? Note also that high throughput can be a GOOD thing, especially when
downloading video clips. In this case the user should be happy that
the video is downloading quickly.  High throughput becomes a problem
only when it surpasses a threshold such that traffic flow gets
interrupted or delayed. It would be interesting to know what is the
total throughput capacity of the network connection of your users’
machines – this affects the critical threshold.
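
To make the page-fault check suggested above concrete, here is a minimal
sketch of what I have in mind. It is only an illustration: the function
names, the threshold, and the sample rates are all made up by me and are
not taken from the paper or its tool. The idea is to confirm that page
faults occurred at all before reading anything into two matching
distributions, and then to count the non-irritation windows that had
heavy page-fault activity.

```python
def count_active_windows(rates, threshold=0.0):
    """Count windows whose page-fault rate exceeds `threshold`."""
    return sum(1 for r in rates if r > threshold)


def comparison_is_informative(irritated, calm, threshold=0.0):
    """Return True only if some window in either group saw page-fault activity.

    If every window in both groups is at or below the threshold, the two
    distributions match trivially and say nothing about how users react
    to page faults.
    """
    active = (count_active_windows(irritated, threshold)
              + count_active_windows(calm, threshold))
    return active > 0


# Made-up per-window page-fault rates (faults per second), illustration only.
irritated_windows = [0.0, 0.0, 0.0]
calm_windows = [0.0, 0.0, 0.0, 0.0]
print(comparison_is_informative(irritated_windows, calm_windows))  # False

# Non-irritation windows with heavy page-fault activity are the evidence
# needed to argue that users are not bothered by local slowdowns.
calm_with_faults = [0.0, 120.0, 0.0, 300.0]
print(count_active_windows(calm_with_faults, threshold=100.0))  # 2
```

The same count could be repeated for CPU utilization with a suitable
threshold.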

Hypothesis 2. The statement of the result for this hypothesis in the
highlighted box is ambiguous. What does it mean to say “further
evidence”? This is not a complete English sentence. Does it mean that
further evidence is needed to support your claim? If so, why did you
include this in the paper? You should provide a reference to the
statement “it is widely assumed that small flows are critical to the
end-user experience and that poor performance of small flows dominates
a user’s perspective”.

Hypothesis 3: In figure 6b you say for the top 3 ASes in terms of
highest irritation rate, you observe a small number of very large
flows that are responsible for this irritation.  Does this contradict
your conclusion from hypothesis #2 that the performance of small flows
dominates a user’s experience? Hypothesis 3 is also poorly worded. I
wouldn’t say that “user irritation is dependent upon the services …”
This sounds so generic as to be obvious. What you really mean is that
their irritation is dependent upon the AS hosting the service.
Alternatively put, “their irritation is highly uneven across AS’s”, or
something like that.

Hypothesis 4: Again the statement of the hypothesis is extremely
ambiguous. If you are presenting this as a hypothesis the reader
should not have to read the following paragraph to know what you are
talking about. Why not make a precise statement in the hypothesis
itself? You are talking here about “persistence” of the user remaining
in an annoyed state. However, I don’t consider a 20 second period to
be indicative of “statefulness” or “persistence” of user irritation.
Some people can take a long time to calm down after getting annoyed –
this is just personality. The more direct question is how long does
the underlying cause of irritation last? Since you have the low level
network measurements, why not look at what else was going on at the
time of irritation. What is poor throughput? Did poor throughput
continue throughout the 20 second period? Also, if the flows causing
irritation are long-lived (as you showed in hypothesis 2), then it
seems logical that irritation episodes last 15-20 seconds. So in the end,
are you saying
that user irritation lasts as long as the underlying cause of

There are other statements in the paper that are claimed but not
demonstrated. For example, in section 3 you say that “the interarrival
time distribution for irritation events from an active individual
users appears to fit a power law.” I couldn’t find any evidence of
this claim in the paper (nor a reference).
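
As an example of the kind of evidence that would back the power-law
statement, the sketch below computes the standard maximum-likelihood
estimate of the exponent for a continuous power-law tail,
alpha = 1 + n / sum(ln(x_i / x_min)). This is my own illustration, not the
authors' method: the function name, the choice of x_min, and the sample
interarrival times are assumptions. The estimate alone does not show that
a power law fits; a goodness-of-fit test and a comparison against
alternatives (e.g. lognormal or exponential) would still be needed.

```python
import math


def powerlaw_alpha_mle(samples, x_min):
    """Maximum-likelihood exponent for a continuous power-law tail.

    Only samples with value >= x_min are used. x_min must be positive.
    """
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Made-up interarrival times between irritation clicks, in seconds.
interarrivals = [12.0, 15.0, 20.0, 31.0, 48.0, 90.0, 240.0, 1100.0]
alpha = powerlaw_alpha_mle(interarrivals, x_min=10.0)
print(f"estimated exponent: {alpha:.2f}")
```

Reporting such an estimate per user, together with a fit test, would let
readers judge the claim.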