Aleksandar Kuzmanovic, Professor
Seeley Mudd 3519, 847-467-5519, akuzma@northwestern.edu
Office Hours: By appointment
Time/Place:
Lectures: TuTh 12:30PM-1:50PM
Technological Institute L170
The course will cover a broad range of topics including congestion control,
routing, analysis and design of network protocols (both wired and wireless),
data centers, analysis and performance of content distribution networks,
network security, vulnerability, and defenses, net neutrality, and online social
networks. In particular:
Datacenter architectures: coping with heterogeneity
Today’s data centers may contain tens of thousands of computers with significant
aggregate bandwidth requirements. The network architecture typically consists
of a tree of routing and switching elements with progressively more specialized
and expensive equipment moving up the network hierarchy. Non-uniform bandwidth
among data center nodes complicates application design and limits overall
system performance. We will study a range of datacenter network architectures
that aim to resolve the above heterogeneity-induced problems. In the context of
heterogeneity, we will further analyze architectures that aim to resolve the
problems experienced by small jobs, which are typically run for interactive
data analyses in datacenters, and which continue to be plagued by
disproportionately long-running tasks called stragglers. We will analyze
mitigation techniques based on speculation and job cloning.
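As a rough illustration of why cloning can help small jobs, the sketch below compares a job's completion time with and without clones. The task durations and function names are made-up assumptions for illustration, not taken from any system covered in the course. The model is deliberately simple: a job finishes when its slowest task finishes, and with cloning each task takes only as long as its fastest copy.

```python
def job_time(task_durations):
    """A job completes only when its slowest task completes."""
    return max(task_durations)


def job_time_with_cloning(cloned_durations):
    """Each task runs as several clones; the first clone to finish wins.

    cloned_durations[i] lists the durations of all copies of task i.
    """
    return max(min(copies) for copies in cloned_durations)


# Hypothetical durations in seconds; the third task hits a straggler (40 s).
single_copy = [5, 6, 40, 5]
# The same tasks with one extra clone each (illustrative numbers).
two_copies = [[5, 7], [6, 6], [40, 6], [5, 9]]

print(job_time(single_copy))              # 40
print(job_time_with_cloning(two_copies))  # 6
```

The price of the shorter completion time is the extra resources consumed by the clones, which is why such techniques are usually discussed for small jobs.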
Cloud applications and usage
Reliability at massive scale is one of the biggest challenges for cloud applications. Even
the slightest outage has significant financial consequences and impacts
customer trust. Large-scale platforms are implemented on top of an
infrastructure of tens of thousands of servers and network components located
in many datacenters around the world. At this scale, small and large components
fail continuously and the way persistent state is managed in the face of these
failures drives the reliability and scalability of the software systems. We
will analyze methods that make it possible to provide an “always-on”
experience, despite the component failures. We will further explain the
underlying tradeoffs between service availability and data consistency.
Cloud application migration: hybrid architectures
We will learn about challenges in migrating enterprise services into hybrid
cloud-based deployments, where enterprise operations are partly hosted
on-premises and partly in the cloud. Such hybrid architectures enable
enterprises to benefit from cloud-based architectures, while honoring
application performance requirements, and privacy restrictions on what services
may be migrated to the cloud. We will analyze the complexity inherent in
enterprise applications today in terms of their multi-tiered nature, large
number of application components, and interdependencies. We will shed light
on security policies associated with enterprise applications in data centers.
We will also discuss the importance of ensuring that security policies are
reconfigured as enterprise applications are migrated to the cloud.
Network resource sharing
The network, similar to CPU and memory, is a critical and shared resource in the
cloud. However, unlike other resources, it is neither shared proportionally to
payment, nor do cloud providers offer minimum guarantees on network bandwidth.
We will analyze the fundamental tradeoffs when sharing cloud networks by
studying different allocation policies that allow users to navigate the
tradeoff space.
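To make the idea of an allocation policy concrete, here is a minimal sketch of two textbook policies for dividing one link among tenants: sharing in proportion to payment, and max-min fairness computed by water-filling. The tenant names, payments, and demands are hypothetical, and these two policies are only illustrative points in the tradeoff space, not the specific policies studied in the course.

```python
def proportional_to_payment(capacity, payments):
    """Split link capacity in proportion to what each tenant pays."""
    total = sum(payments.values())
    return {t: capacity * p / total for t, p in payments.items()}


def max_min_fair(capacity, demands):
    """Water-filling: repeatedly split the leftover capacity equally
    among tenants whose demand is not yet satisfied."""
    alloc = {t: 0.0 for t in demands}
    active = set(demands)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        satisfied = {t for t in active if demands[t] - alloc[t] <= share}
        if not satisfied:
            # Nobody can be fully satisfied: everyone gets an equal share.
            for t in active:
                alloc[t] += share
            remaining = 0.0
        else:
            # Fully satisfy the small demands, then redistribute the rest.
            for t in satisfied:
                remaining -= demands[t] - alloc[t]
                alloc[t] = float(demands[t])
            active -= satisfied
    return alloc


# Hypothetical tenants sharing a 10 Gbps link.
payments = {"A": 1, "B": 1, "C": 2}
demands = {"A": 2, "B": 4, "C": 10}

print(proportional_to_payment(10, payments))  # {'A': 2.5, 'B': 2.5, 'C': 5.0}
print(max_min_fair(10, demands))              # A gets 2.0, B gets 4.0, C gets 4.0
```

In this example the payment-proportional split gives tenant A more than it asks for, so capacity would sit idle unless it is redistributed, while max-min fairness ignores payment entirely. That tension is one simple instance of the tradeoffs mentioned above.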
Cloud-specific congestion control schemes
TCP incast is a network transport pathology that affects
many-to-one communication patterns in datacenters. It is caused by a complex
interplay between datacenter applications, the underlying switches, network
topology, and TCP, which was originally designed for wide-area networks. Incast increases the queuing delay of flows and decreases
application-level throughput to far below the link
bandwidth. The problem especially affects computing paradigms in which
distributed processing cannot progress until all parallel threads in a stage
complete. Examples of such paradigms include distributed file systems, web
search, advertisement selection, and other applications with partition or
aggregation semantics. We will analyze effective solutions to the TCP incast problem. Furthermore, we will study other congestion
control mechanisms and protocols proposed for datacenters.
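The many-to-one pattern behind incast can be pictured with a toy calculation. The sketch below is a deliberately crude model under stated assumptions: all senders burst one full window at the same instant into a single switch port buffer, and the buffer does not drain during the burst. The function name and all numbers are hypothetical and do not describe any real switch or TCP implementation.

```python
def incast_drops(num_senders, window_pkts, buffer_pkts):
    """Packets dropped when every sender bursts one window at once
    into a shared output-port buffer (no draining during the burst)."""
    burst = num_senders * window_pkts
    return max(0, burst - buffer_pkts)


# Hypothetical port buffer of 64 packets and a 10-packet window per sender.
for senders in (4, 8, 32):
    print(senders, incast_drops(senders, window_pkts=10, buffer_pkts=64))
# 4 0
# 8 16
# 32 256
```

Even in this simplified picture, drops grow quickly with the number of synchronized senders. In a real network those drops can trigger retransmission timeouts, which is what pushes application throughput far below the link bandwidth.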
Data placement
By offering storage services in several geographically distributed data centers,
cloud computing platforms enable applications to offer low latency access to
user data. However, application developers are left to deal with the
complexities associated with choosing the storage services at which any object
is replicated and maintaining consistency across these replicas. We will
analyze a key-value store that exports a unified view of storage services in
geographically distributed data centers. We will further show that cloud
tenants can do a better job placing applications by understanding the
underlying cloud network as well as the demands of the applications. To do so,
tenants must be able to quickly and accurately measure the cloud network and
profile their applications, and then use a network-aware placement method to
place applications.
Security and privacy issues in cloud services
We will analyze new application platforms that prevent apps from misusing
information about their users. To strike a useful balance between users’
privacy and apps’ functional needs, such platforms shift much of the
responsibility for protecting privacy from the app and its users to the
platform itself. To achieve this, the platforms deploy a sandbox that spans the
user’s device and the cloud, along with specialized storage and communication
channels that enable common app functionalities.
Datacenter performance characterization
Although there is tremendous interest in designing improved networks for data centers,
very little is known about the network-level traffic characteristics of current
data centers. We will analyze results from empirical studies of the network
traffic in data centers belonging to different types of organizations. We will
analyze SNMP statistics, topology, and packet-level traces. We will examine the
range of applications deployed in these data centers and their placement, the
flow-level and packet-level transmission properties of these applications, and
their impact on network utilization, link utilization, congestion, and packet
drops. We will describe the implications of the observed traffic patterns for
data center internal traffic engineering as well as for architectures for data
center networks. We will further analyze upgrades made to the HTTP/1.1 protocol
to improve the user-perceived performance.
Software-defined networking
We will analyze a new network architecture for the
enterprise. The architecture allows managers to define a single network-wide
fine-grain policy, and then enforces it directly. It couples extremely simple
flow-based Ethernet switches with a centralized controller that manages the
admittance and routing of flows. While radical, this design is
backwards-compatible with existing hosts and switches. We will further analyze
the design, implementation, and evaluation of an API for applications to
control a software-defined network (SDN). The API addresses two key
challenges: how to safely decompose control and visibility of the network, and
how to resolve conflicts between untrusted users and across requests, while
maintaining baseline levels of fairness and security.
Energy usage
Large-scale Internet applications, such as content distribution networks, are deployed
across multiple datacenters and consume massive amounts of electricity. To provide
uniformly low access latencies, these datacenters are geographically
distributed and the deployment size at each location reflects the regional
demand for the application. Consequently, an application’s environmental impact
can vary significantly depending on the geographical distribution of end-users,
as electricity cost and carbon footprint per watt are location-specific. We will
analyze a flow-optimization-based framework for request routing and traffic
engineering. It dynamically controls the fraction of user traffic directed to
each datacenter in response to changes in both request workload and carbon
footprint. It allows an operator to navigate the three-way tradeoff between
access latency, carbon footprint, and electricity costs and to determine an
optimal datacenter upgrade plan in response to increases in traffic load.
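The three-way tradeoff can be sketched with a much simpler stand-in for a flow optimization: a greedy router that fills datacenters in order of a weighted per-request score. Everything here is an assumption made for illustration (the datacenter names, capacities, per-request latency, carbon, and cost figures, and the weights). It is not the framework analyzed in the course, only a small example of how changing the weights shifts traffic.

```python
def route_traffic(demand, datacenters, w_latency, w_carbon, w_cost):
    """Greedily assign demand to datacenters in order of weighted score.

    datacenters maps a name to a dict with 'capacity', 'latency',
    'carbon', and 'cost'. Returns name -> fraction of total demand.
    """
    def score(dc):
        return (w_latency * dc["latency"]
                + w_carbon * dc["carbon"]
                + w_cost * dc["cost"])

    fractions = {name: 0.0 for name in datacenters}
    remaining = demand
    for name in sorted(datacenters, key=lambda n: score(datacenters[n])):
        served = min(remaining, datacenters[name]["capacity"])
        fractions[name] = served / demand
        remaining -= served
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return fractions


# Hypothetical datacenters; all units are arbitrary.
dcs = {
    "east": {"capacity": 60, "latency": 20, "carbon": 5, "cost": 3},
    "west": {"capacity": 80, "latency": 40, "carbon": 1, "cost": 2},
}

print(route_traffic(100, dcs, w_latency=1, w_carbon=0, w_cost=0))
# {'east': 0.6, 'west': 0.4}
print(route_traffic(100, dcs, w_latency=0, w_carbon=1, w_cost=0))
# {'east': 0.2, 'west': 0.8}
```

Optimizing for latency alone sends most traffic to the nearby "east" site, while optimizing for carbon alone shifts it to "west". An operator would pick weights somewhere in between.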
Students will form teams of two or three; each team will tackle a well-defined
research project during the quarter. A list of suggested project topics will be
provided. All projects are subject to approval by the instructor. The project
component will include a short written project proposal, a short mid-term
project report, a final project presentation, and a final project report. Each
component adds some significant element to the paper, and the overall
project grade will be based on the quality of each component of your work.
The above project components are due by email to the instructor by the end of
the given day of the respective week.
1. Week 1 (Thursday 3/30) Project presentations by group leaders
2. Week 2 (Tuesday 4/4) Form groups of 2 or 3, choose a topic for your project, and meet with the project leader.
3. Week 3 (Tuesday 4/11) Write an introduction describing the problem and how you plan to
approach it (what will you actually do?). Include motivation (why does the
problem matter?) and related work (what have others already done about it?).
2 pages total.
4. Week 6 (Tuesday 5/2) Midterm presentation. Update your paper to include your
preliminary results. 5 pages total.
5. Week 10 (Thursday 6/1) Presentations by all groups.
6. Week 11 (Friday 6/9) Turn in your completed paper. 10 pages total. You should
incorporate the comments received during the presentation.
Each team will have a weekly meeting with project leaders.
1. Paper reviews (15%), presentations (20%), and debating in the class (15%): 50%
2. Projects: 50% (project proposal: 5%; midterm report: 5%; weekly report and meeting: 10%; project presentation: 10%; final project report: 20%)
3. Research idea report (optional, 3 pages): 10%
Recommended: CS 340 or equivalent networking course
There will be no textbook for
this class. Classes will be a combination of lectures conducted by the
professor and discussions led by students. In cases when students are assigned
to discuss a topic in the class, they will review and discuss research papers
related to networking problems in cloud computing. Students must read the
assigned papers and submit paper reviews before each lecture. Two teams of
students will be chosen to debate and lead the discussion. One team will be
designated the offense and the other the defense. In class, the defense team
will present first. For 30 minutes the team will discuss the work as if it were
their own.
1. The team should present the work and make a compelling case why the contribution is
significant. This will include the context of the contribution, prior work, and
in cases where papers are previously published, how the work has influenced the
research community or industry's directions (impact). If the paper is very
recent, the defense should present arguments for the potential impact. Coming
up with potential future work can show how the paper opens doors to new
research.
2. The presentation should go well beyond a paper "summary". The defense
should not critique the work other than to try to pre-empt attacks from the
offense (e.g., by explicitly limiting the scope of the contribution).
3. The defense should also try to look up related work to support their case (CiteSeer is a good place to start looking).
After the defense presentation, the offense team will state their case for
20 minutes.
1. This team should critique the work,
and make a case for missing links, unaddressed issues, lack of impact,
inappropriateness of the problem formulation, etc.
2. The more insightful and less obvious the criticisms, the better.
3. While the offense should prepare
remarks in advance, they should also react to the points made by the defense.
4. The offense should also try to look
up related work to support their case.
Next, the defense and offense will be allowed follow-up arguments. Finally,
the class will question either side, whether for clarification or to add to
the discussion and controversy and make their own points on either side.
The presentations should be prepared in PowerPoint format and will be posted
on the course web page after each class.
All students must read the assigned papers and write reviews for the papers
before each lecture. Email the reviews to the instructor and TA (akuzma@cs.northwestern.edu
and yunming.xiao@u.northwestern.edu) prior to each lecture. Periodically,
a random subset of the reviews will be evaluated and feedback will be provided
directly to students.
Please send one review per email, as plain text in the body of the message.
A review should summarize the paper sufficiently to demonstrate your
understanding, and should point out the paper's contributions, strengths, and
weaknesses. Think in terms of: What makes good research? What qualities make a
good paper? What are the potential future impacts of the work? Note that there
is no right or wrong answer to these questions. A review's quality will mainly
depend on its thoughtfulness. Restating the abstract/conclusion of the paper
will not earn a top grade. Reviews should cover all of the following aspects:
1. What is the main result of the paper? (One or
two sentence summary)
2. What strengths do you see in this paper? (Your
review needs to have at least one or two positive things to say.)
3. What are some key limitations, unproven
assumptions, or methodological problems with the work?
4. How could the work be improved?
5. What is its relevance today, or what future work
does it suggest?
6. Overall score? 1 (worst) - 5 (best) (no 3s)
Course web site: http://networks.cs.northwestern.edu/CS497-s23/
Check it regularly for schedule changes and other course-related announcements.
Canvas Webpage: https://canvas.northwestern.edu/courses/190403
Students in this course are required to comply with the policies found in the booklet, "Academic Integrity at Northwestern University: A Basic Guide". All papers submitted for credit in this course must be submitted electronically unless otherwise instructed by the professor. Your written work may be tested for plagiarized content. For details regarding academic integrity at Northwestern or to download the guide, visit: https://www.northwestern.edu/provost/policies-procedures/academic-integrity/index.html
Northwestern University is committed to providing the most accessible learning environment possible for students with disabilities. Should you anticipate or experience disability-related barriers in the academic setting, please contact AccessibleNU to move forward with the university's established accommodation process (e: accessiblenu@northwestern.edu; p: 847-467-5530). If you already have established accommodations with AccessibleNU, please let me know as soon as possible, preferably within the first two weeks of the term, so we can work together to implement your disability accommodations. Disability information, including academic accommodations, is confidential under the Family Educational Rights and Privacy Act.
This course strives to be an inclusive learning community, respecting those of differing backgrounds and beliefs. As a community, we aim to be respectful to all students in this class, regardless of race, ethnicity, socio-economic status, religion, gender identity or sexual orientation.
Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings - including distributing or posting them - is also prohibited. Under the University's Copyright Policy, faculty own the copyright to instructional materials - including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display, or distribute these materials. Students who engage in unauthorized recording, unauthorized use of a recording, or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.
Northwestern University is committed to supporting the wellness of our students. Student Affairs has multiple resources to support student wellness and mental health. If you are feeling distressed or overwhelmed, please reach out for help. Students can access confidential resources through the Counseling and Psychological Services (CAPS), Religious and Spiritual Life (RSL) and the Center for Awareness, Response and Education (CARE). Additional information on all of the resources mentioned above can be found here:
March 2023, Aleksandar Kuzmanovic