CS 397/497: Selected Topics in Computer Networks

Instructor:

Aleksandar Kuzmanovic, Professor
Seeley Mudd 3519, 847-467-5519, akuzma@northwestern.edu
Office Hours: By appointment

Time/Place:

Lectures: TuTh 12:30PM-1:50PM
Technological Institute L160

Overview

The course will cover a broad range of topics, including congestion control; routing; analysis and design of network protocols (both wired and wireless); data centers; analysis and performance of content distribution networks; network security, vulnerabilities, and defenses; net neutrality; and online social networks. In particular:

Datacenter architectures: coping with heterogeneity

Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. We will study a range of datacenter network architectures that aim to resolve the above heterogeneity-induced problems. In the context of heterogeneity, we will further analyze architectures that aim to resolve the problems experienced by small jobs, which are typically run for interactive data analyses in datacenters, and which continue to be plagued by disproportionately long-running tasks called stragglers. We will analyze mitigation techniques based on speculation and job cloning. 
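
As a rough illustration of why cloning helps small jobs, the sketch below treats a job as finished when its slowest task finishes, and a cloned task as finished when its fastest copy finishes. The function name and the durations are hypothetical and chosen only for illustration; they do not come from any system covered in the course.

```python
def job_completion_time(task_copies):
    """Return the completion time of a job.

    task_copies holds one list per task, containing the running time of
    every copy of that task. A task is done when its fastest copy is done;
    the job is done when its slowest task is done.
    """
    return max(min(copies) for copies in task_copies)


# Hypothetical running times (seconds) for three tasks, two copies each.
# The first copy of the second task is a straggler.
durations = [[2.0, 2.2], [9.0, 2.1], [2.5, 8.0]]

# Without cloning, only the first copy of each task runs.
without_cloning = job_completion_time([[copies[0]] for copies in durations])
with_cloning = job_completion_time(durations)

print("without cloning:", without_cloning)
print("with cloning:   ", with_cloning)
```

The price of cloning is extra resource usage, which is one reason it is usually discussed for small jobs.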

Cloud applications and usage

Reliability at massive scale is one of the biggest challenges for cloud applications. Even the slightest outage has significant financial consequences and impacts customer trust. Large-scale platforms are implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. We will analyze methods that make it possible to provide an “always-on” experience, despite the component failures. We will further explain the underlying tradeoffs between service availability and data consistency. 

Cloud application migration: hybrid architectures

We will learn about challenges in migrating enterprise services into hybrid cloud-based deployments, where enterprise operations are partly hosted on-premises and partly in the cloud. Such hybrid architectures enable enterprises to benefit from cloud-based architectures while honoring application performance requirements and privacy restrictions on what services may be migrated to the cloud. We will analyze the complexity inherent in enterprise applications today in terms of their multi-tiered nature, large number of application components, and interdependencies. We will shed light on security policies associated with enterprise applications in data centers, and we will discuss the importance of ensuring that security policies are reconfigured as enterprise applications are migrated to the cloud.

Network resource sharing

The network, similar to CPU and memory, is a critical and shared resource in the cloud. However, unlike other resources, it is neither shared proportionally to payment, nor do cloud providers offer minimum guarantees on network bandwidth. We will analyze the fundamental tradeoffs when sharing cloud networks by studying different allocation policies that allow users to navigate the tradeoff space.
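
To make "shared proportionally to payment" concrete, here is a minimal sketch of one possible allocation policy: each tenant receives a share of a link's capacity in proportion to what it pays. The function name, tenant names, and numbers are assumptions made for illustration only; the policies studied in class may differ.

```python
def proportional_shares(link_capacity_mbps, payments):
    """Split a link's capacity among tenants in proportion to payment.

    payments maps a tenant name to the amount that tenant pays.
    """
    total = sum(payments.values())
    return {
        tenant: link_capacity_mbps * paid / total
        for tenant, paid in payments.items()
    }


# Hypothetical example: a 10 Gbps link shared by two tenants.
shares = proportional_shares(10_000, {"tenant_a": 1, "tenant_b": 3})
print(shares)
```

Note that this policy alone gives no minimum guarantee: a tenant's share shrinks whenever other tenants on the same link pay more.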

Cloud-specific congestion control schemes

TCP incast is a network transport pathology that affects many-to-one communication patterns in datacenters. It is caused by a complex interplay between datacenter applications, the underlying switches, network topology, and TCP, which was originally designed for wide area networks. Incast increases the queuing delay of flows, and decreases application level throughput to far below the link bandwidth. The problem especially affects computing paradigms in which distributed processing cannot progress until all parallel threads in a stage complete. Examples of such paradigms include distributed file systems, web search, advertisement selection, and other applications with partition or aggregation semantics. We will analyze effective solutions to the TCP incast problem. Furthermore, we will study other congestion control mechanisms and protocols proposed for datacenters.
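
The many-to-one pattern behind incast can be illustrated with a deliberately simplified buffer model. It ignores TCP dynamics (retransmission timeouts, window adaptation) entirely and only counts how many packets overflow a switch port when several senders burst at once. The function name and every number below are assumptions for illustration.

```python
def dropped_packets(senders, window_pkts, buffer_pkts, drain_pkts):
    """Packets lost when `senders` servers each send `window_pkts` packets
    to the same switch port at the same time.

    The port can queue `buffer_pkts` packets and forward `drain_pkts`
    packets while the burst arrives; everything beyond that is dropped.
    """
    offered = senders * window_pkts
    return max(0, offered - (buffer_pkts + drain_pkts))


# Hypothetical setup: 64-packet port buffer, 10 packets drained during the
# burst, and an 8-packet window per sender.
for n in (4, 8, 16, 32):
    drops = dropped_packets(n, window_pkts=8, buffer_pkts=64, drain_pkts=10)
    print(f"{n:2d} senders -> {drops} packets dropped")
```

In this toy model, drops appear abruptly once the number of synchronized senders exceeds what the buffer can absorb.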

Data placement

By offering storage services in several geographically distributed data centers, cloud computing platforms enable applications to offer low latency access to user data. However, application developers are left to deal with the complexities associated with choosing the storage services at which any object is replicated and maintaining consistency across these replicas. We will analyze a key-value store that exports a unified view of storage services in geographically distributed data centers. We will further show that cloud tenants can do a better job placing applications by understanding the underlying cloud network as well as the demands of the applications. To do so, tenants must be able to quickly and accurately measure the cloud network and profile their applications, and then use a network-aware placement method to place applications.

Security and privacy issues in cloud services

We will analyze new application platforms that prevent apps from misusing information about their users. To strike a useful balance between users’ privacy and apps’ functional needs, such platforms shift much of the responsibility for protecting privacy from the app and its users to the platform itself. To achieve this, the platforms deploy a sandbox that spans the user’s device and the cloud, along with specialized storage and communication channels that enable common app functionality.

Datacenter performance characterization

Although there is tremendous interest in designing improved networks for data centers, very little is known about the network-level traffic characteristics of current data centers. We will analyze results from empirical studies of the network traffic in data centers belonging to different types of organizations. We will analyze SNMP statistics, topology, and packet-level traces. We will examine the range of applications deployed in these data centers and their placement, the flow-level and packet-level transmission properties of these applications, and their impact on network utilization, link utilization, congestion, and packet drops. We will describe the implications of the observed traffic patterns for data center internal traffic engineering as well as for architectures for data center networks. We will further analyze upgrades made to the HTTP/1.1 protocol to improve the user-perceived performance.

Software defined networking

We will analyze a new network architecture for the enterprise. The architecture allows managers to define a single network-wide, fine-grained policy, and then enforces it directly. It couples extremely simple flow-based Ethernet switches with a centralized controller that manages the admittance and routing of flows. While radical, this design is backwards-compatible with existing hosts and switches. We will further analyze the design, implementation, and evaluation of an API for applications to control a software-defined network (SDN). The API addresses two key challenges: how to safely decompose control and visibility of the network, and how to resolve conflicts between untrusted users and across requests, while maintaining baseline levels of fairness and security.

Energy usage

Large-scale Internet applications, such as content distribution networks, are deployed across multiple datacenters and consume massive amounts of electricity. To provide uniformly low access latencies, these datacenters are geographically distributed, and the deployment size at each location reflects the regional demand for the application. Consequently, an application’s environmental impact can vary significantly depending on the geographical distribution of end-users, as electricity cost and carbon footprint per watt are location-specific. We will analyze a flow-optimization-based framework for request routing and traffic engineering. It dynamically controls the fraction of user traffic directed to each datacenter in response to changes in both request workload and carbon footprint. It allows an operator to navigate the three-way tradeoff between access latency, carbon footprint, and electricity costs and to determine an optimal datacenter upgrade plan in response to increases in traffic load.
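
The three-way tradeoff can be sketched with a toy request router. It is not the framework analyzed in class: it simply scores each datacenter by a weighted sum of latency, carbon intensity, and electricity price, then fills datacenters in order of increasing score until the demand is served. All names, fields, and numbers are hypothetical.

```python
def route_requests(demand, datacenters, weights):
    """Assign `demand` requests to datacenters, cheapest weighted cost first.

    weights = (w_latency, w_carbon, w_price) sets how much the operator
    cares about each of the three objectives.
    """
    w_latency, w_carbon, w_price = weights

    def cost(dc):
        return (w_latency * dc["latency_ms"]
                + w_carbon * dc["carbon_g_per_kwh"]
                + w_price * dc["price_per_mwh"])

    allocation = {}
    remaining = demand
    for dc in sorted(datacenters, key=cost):
        share = min(remaining, dc["capacity"])
        allocation[dc["name"]] = share
        remaining -= share
    if remaining > 0:
        raise ValueError("demand exceeds total datacenter capacity")
    return allocation


# Two hypothetical datacenters: A is close but carbon-intensive,
# B is farther away but cleaner.
datacenters = [
    {"name": "A", "latency_ms": 20, "carbon_g_per_kwh": 500,
     "price_per_mwh": 40, "capacity": 60},
    {"name": "B", "latency_ms": 50, "carbon_g_per_kwh": 100,
     "price_per_mwh": 60, "capacity": 80},
]

latency_first = route_requests(100, datacenters, weights=(1, 0, 0))
carbon_first = route_requests(100, datacenters, weights=(0, 1, 0))
print("latency-first:", latency_first)
print("carbon-first: ", carbon_first)
```

Changing the weights moves traffic between the two sites, which is the kind of operator-controlled tradeoff the paragraph above describes.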

Projects

Students will form teams of two or three; each team will tackle a well-defined research project during the quarter. A list of suggested project topics will be provided. All projects are subject to approval by the instructor. The project component will include a short written project proposal, a short mid-term project report, a final project presentation, and a final project report. Each component adds some significant element to the paper, and the overall project grade will be based on the quality of each component of your work. The above project components are due by email to the instructor by the end of the given day of the respective week.

1.   Week 1 (Thursday 3/30) Project presentations by group leaders

2.   Week 2 (Tuesday 4/4) Form groups of 2 or 3, choose a topic for your project, and meet with the project leader.

3.   Week 3 (Tuesday 4/11) Write an introduction describing the problem and how you plan to approach it (what will you actually do?). Include motivation (why does the problem matter?) and related work (what have others already done about it?). 2 pages total.

4.   Week 6 (Tuesday 5/2) Midterm presentation. Update your paper to include your preliminary results. 5 pages total.

5.   Week 10 (Thursday 6/1) Presentations by all groups.

6.   Week 11 (Friday 6/9) Turn in your completed paper. 10 pages total. You should incorporate the comments received during the presentation.

Each team will have a weekly meeting with project leaders.

Grading

1.   Paper reviews (15%), presentations (20%), and in-class debating (15%): 50%

2.   Projects: 50% (project proposal: 5%; midterm report: 5%; weekly report and meeting: 10%; project presentation: 10%; final project report: 20%)

3.   Research idea report (optional, 3 pages): 10%

Prerequisites

Recommended: CS 340 or equivalent networking course

Classes, Textbook, and other readings

There will be no textbook for this class. Classes will be a combination of lectures conducted by the professor and discussions led by students. In cases when students are assigned to discuss a topic in the class, they will review and discuss research papers related to networking problems in cloud computing. Students must read the assigned papers and submit paper reviews before each lecture. Two teams of students will be chosen to debate and lead the discussion. One team will be designated the offense and the other the defense. In class, the defense team will present first. For 30 minutes the team will discuss the work as if it were their own.

1.    The team should present the work and make a compelling case why the contribution is significant. This will include the context of the contribution, prior work, and in cases where papers are previously published, how the work has influenced the research community or industry's directions (impact). If the paper is very recent, the defense should present arguments for the potential impact. Coming up with potential future work can show how the paper opens doors to new research.

2.    The presentation should go well beyond a paper "summary". The defense should not critique the work other than to try to pre-empt attacks from the offense (e.g., by explicitly limiting the scope of the contribution).

3.     The defense should also try to look up related work to support their case (CiteSeer is a good place to start looking).

After the defense presentation, the offense team will state their case for 20 minutes.

1.    This team should critique the work, and make a case for missing links, unaddressed issues, lack of impact, inappropriateness of the problem formulation, etc.

2.    The more insightful and less obvious the criticisms the better.

3.    While the offense should prepare remarks in advance, they should also react to the points made by the defense.

4.    The offense should also try to look up related work to support their case.

Next, the defense and offense will be allowed follow-up arguments. Finally, the class will question either side, whether to ask for clarification, to add to the discussion and controversy, or to make their own points. The presentations should be in PowerPoint format and will be posted on the course web page after each class.

Writing and Submitting Reviews

All students must read the assigned papers and write reviews for the papers before each lecture. Email the reviews to the instructor and TA (akuzma@cs.northwestern.edu and yunming.xiao@u.northwestern.edu) prior to each lecture. Periodically, a random subset of the reviews will be evaluated and feedback will be provided directly to students.

Please send one review per email, in plain text in the body of the message.

A review should summarize the paper sufficiently to demonstrate your understanding and should point out the paper's contributions, strengths, and weaknesses. Think in terms of the following questions: What makes good research? What qualities make a good paper? What are the potential future impacts of the work? Note that there are no right or wrong answers to these questions. A review's quality will mainly depend on its thoughtfulness. Restating the abstract/conclusion of the paper will not earn a top grade. Reviews should cover all of the following aspects:

1.     What is the main result of the paper? (One or two sentence summary)

2.     What strengths do you see in this paper? (Your review needs to have at least one or two positive things to say)

3.     What are some key limitations, unproven assumptions, or methodological problems with the work?

4.     How could the work be improved?

5.     What is its relevance today, or what future work does it suggest?

6.     Overall score? 1 (worst) - 5 (best) (no 3s)

Communication

Course web site: http://networks.cs.northwestern.edu/CS497-s23/
Check it regularly for schedule changes and other course-related announcements.

Canvas Webpage: https://canvas.northwestern.edu/courses/190403

Academic Integrity

Students in this course are required to comply with the policies found in the booklet, "Academic Integrity at Northwestern University: A Basic Guide". All papers submitted for credit in this course must be submitted electronically unless otherwise instructed by the professor. Your written work may be tested for plagiarized content. For details regarding academic integrity at Northwestern or to download the guide, visit: https://www.northwestern.edu/provost/policies-procedures/academic-integrity/index.html

Accessibility

Northwestern University is committed to providing the most accessible learning environment possible for students with disabilities. Should you anticipate or experience disability-related barriers in the academic setting, please contact AccessibleNU to move forward with the university's established accommodation process (e: accessiblenu@northwestern.edu; p: 847-467-5530). If you already have established accommodations with AccessibleNU, please let me know as soon as possible, preferably within the first two weeks of the term, so we can work together to implement your disability accommodations. Disability information, including academic accommodations, is confidential under the Family Educational Rights and Privacy Act.

Diversity, Equity, and Inclusion

This course strives to be an inclusive learning community, respecting those of differing backgrounds and beliefs. As a community, we aim to be respectful to all students in this class, regardless of race, ethnicity, socio-economic status, religion, gender identity or sexual orientation.

Prohibition of Recording of Class Sessions by Students

Unauthorized student recording of classroom or other academic activities (including advising sessions or office hours) is prohibited. Unauthorized recording is unethical and may also be a violation of University policy and state law. Students requesting the use of assistive technology as an accommodation should contact AccessibleNU. Unauthorized use of classroom recordings - including distributing or posting them - is also prohibited. Under the University's Copyright Policy, faculty own the copyright to instructional materials - including those resources created specifically for the purposes of instruction, such as syllabi, lectures and lecture notes, and presentations. Students cannot copy, reproduce, display, or distribute these materials. Students who engage in unauthorized recording, unauthorized use of a recording, or unauthorized distribution of instructional materials will be referred to the appropriate University office for follow-up.

Support for Wellness and Mental Health

Northwestern University is committed to supporting the wellness of our students. Student Affairs has multiple resources to support student wellness and mental health. If you are feeling distressed or overwhelmed, please reach out for help. Students can access confidential resources through the Counseling and Psychological Services (CAPS), Religious and Spiritual Life (RSL) and the Center for Awareness, Response and Education (CARE). Additional information on all of the resources mentioned above can be found here:


March 2024, Aleksandar Kuzmanovic