
NETS 2120: Scalable and Cloud Computing (Fall 2026)

What is the "cloud"? How do we build software systems and components that scale to millions of users and petabytes of data, and are "always available"?

On the modern Internet, virtually all large Web services run atop multiple geographically distributed data centers: Google, Yahoo, Facebook, iTunes, Amazon, eBay, Bing, etc. Services must scale across hundreds of thousands of machines, tolerate faults, and support millions of concurrent requests. The major providers (including Amazon, Google, and Microsoft) also host third-party applications in their data centers, forming so-called "cloud computing" services. This course, aimed at sophomores with exposure to basic programming on a single machine, focuses on the issues and programming models related to such cloud and distributed data processing technologies: data partitioning, storage schemes, scalable analytics, and massively parallel algorithms and data structures.

NETS 2120 is a core requirement for the Data Science Minor. It counts as a project elective for CSCI and ASCS, and as an AI Project Elective for the AI BSE.

Instructor

Andreas Haeberlen
Office hours: TBA (Levine 560)

Teaching assistants

Edwin Mui edwinmui@sas.upenn.edu OH: Time and location TBA
Nicolas Cloutier nac1@seas.upenn.edu OH: Time and location TBA
Justin Zhou jclaner@seas.upenn.edu OH: Time and location TBA
Rayyan Shaik rashaik@seas.upenn.edu OH: Time and location TBA

Format

The format will be two 1.5-hour lectures per week, plus assigned readings. There will be regular homework assignments with code reviews, three midterms, and a term project. We will use an online forum for course-related discussions.

Time and location

Tuesdays/Thursdays 10:15-11:45am (Location TBA)

Prerequisites

CIS 1200, Programming Languages and Techniques
CIS 1600, Mathematical Foundations of Computer Science

Textbooks

Spark: The Definitive Guide, by Bill Chambers and Matei Zaharia (O'Reilly)
ISBN 9781491912218; read online for free, or buy for approx. $54.

Additional materials will be provided as handouts or in the form of light technical papers.

Grading

Homework 20%, Term project 25%, Exams 50%, Participation/quizzes 5%

Policies

You can find a list of key course policies here.

Assignments

Homework assignments will be available for download; solutions should be submitted via GradeScope.

Tentative schedule

Date | Topic | Details | Reading | Remarks
25-Aug Introduction Course introduction
What is the Cloud, and why is it interesting?
Data-centric computing
Course goals
Logistics
Policies
27-Aug The Cloud What is the Cloud?
Cloud hardware
Problems with classical scaling
Utility computing
Kinds of clouds
Virtualization
Cloud challenges
Armbrust et al.: "A view of cloud computing" HW0 released
01-Sep Concurrency Scalability and parallelization; Amdahl's law
Synchronization/concurrency/consistency
Mutual exclusion and locking
NUMA and shared-nothing
Frontend/backend
Sharding
Vogels: "Eventually consistent"
03-Sep The Internet The Internet; packet switching
Path properties; TCP
HW1 overview
MDN: "JavaScript language overview" HW0 due; HW1 released
08-Sep Faults and Failures Fault models
Examples of non-crash faults
Replication; durability and availability
Primary-backup replication
Quorum replication
Network partitions; CAP theorem
Tseitlin: "The antifragile organization"
10-Sep Cloud basics History of cloud computing
Interacting with the cloud
EC2 basics
EBS basics
Overview of some other AWS services
Puthal et al.: "Cloud computing features, issues, and challenges: a big picture" HW1MS1 due
15-Sep Cloud storage Key-value stores
KVS and concurrency
KVS and the Cloud
Case study: S3
Case study: DynamoDB
Cooper et al.: "PNUTS to Sherpa - Lessons from Yahoo!'s Cloud Database"
17-Sep Spark Introduction to scalable analytics
MapReduce
The Streams API
Apache Spark
Lambdas and serialization
Spark textbook, Chapters 2 and 3 HW1MS2 due; HW2 released
22-Sep Programming in Spark Spark jobs
Working with files
Spark transformations
Spark actions
The Structured API
Distributed shared variables
Spark textbook, Chapters 4-8
24-Sep Understanding Spark Origins of Spark
The HDFS file system
Using HDFS
Apache Livy
Zaharia et al.: "Spark: Cluster Computing with Working Sets" HW2MS1 due
29-Sep (Exam) First midterm exam
01-Oct Fall break (October 1-4)
05-Oct Last day to drop
06-Oct Graph algorithms Distributed graph algorithms
Distributed graphs
Graph algorithms in Spark
Single-source shortest path
K-Means clustering
Naive Bayes learning
08-Oct Random-walk algorithms Random-surfer model
Naive PageRank
Full PageRank
Adsorption / label propagation
Baluja et al.: "Video Suggestion and Discovery for YouTube" HW2MS2 due; HW3 released
13-Oct Iterative processing Iterative processing
Bulk synchronous parallelism
Pregel and graph processing
Overview of deep neural nets
Malewicz et al.: "Pregel - A System for Large-Scale Graph Processing"
15-Oct Web programming Web overview
HTML and CSS
Client/server model
The Domain Name System
Ghedini et al.: "HTTP/3: The past, the present, and the future"
20-Oct Web programming (continued) HTTP and HTTPS
HTTP/2 and HTTP/3
Server design
22-Oct Node.js Motivation: CGI and servlets
Node.js; basic operation
Hello world with Node
Accessing data
Cookies and sessions
"Node at LinkedIn: the pursuit of thinner, lighter, faster" HW3 due; HW4 and project handout released
23-Oct Last day to pass/fail
27-Oct Dynamic content Project overview
Project advice
The Document Object Model
XMLHttpRequest
"React: Facebook's Functional Turn on Writing JavaScript"
29-Oct (Exam) Second midterm exam
02-Nov Last day to withdraw
03-Nov AJAX AJAX overview
AJAX with jQuery
socket.io and async
Working with APIs
HW4MS1 due; team formation deadline; project begins
05-Nov Web services Web services
Data interchange; challenges
Data formats
10-Nov Security Cryptography; RSA
Digital signatures
Attacks and Defenses
OWASP Top 10 HW4MS2 due; first project check-in
12-Nov Databases Motivations for databases and data management
Relational model, data streams
SQL basics; declarative approach; query optimization
Transactions; ACID
Shute et al.: "F1: A Distributed SQL Database That Scales" HW4MS3 due
17-Nov Peer-to-peer Decentralization
Partly centralized systems; BitTorrent
Unstructured overlays; epidemic protocols
Structured overlays; consistent hashing; KBR
Case study: Pastry
Security challenges
Rodrigues and Druschel: "P2P systems" Second project check-in
19-Nov Case study: Bitcoin Distributed ledgers
Bitcoin and Proof-of-Work
Bitcoin Script
Challenges in Bitcoin
Nakamoto: "Bitcoin: A Peer-to-Peer Electronic Cash System"
24-Nov Case study: Facebook Facebook's TAO
Scalability in TAO
Fault handling in TAO
Facebook's Haystack
Haystack design
Bronson et al.: "TAO: Facebook's Distributed Data Store for the Social Graph" Third project check-in
26-Nov Thanksgiving break (November 26-29)
01-Dec Special topics TBA Fourth project check-in
03-Dec (Exam) Third midterm exam
08-Dec Reading days (December 8-9)
10-Dec Finals period; in-person project demos (December 10-17)