CIS 6930.5: Federated Distributed Systems

Fall 2005

Professor: Adriana Iamnitchi (Anda)
Semester: Fall 2005
Time and Venue: MW 9:30-10:45 in BSN 1400
Office Hours: Wednesdays 10:45-1:00 and by appointment
Office: ENB 334




Announcements

November 28: Project presentations:
  1. Rahul and Kiran's project on Wednesday, November 30, usual time and place.
  2. Shyamala's and Chris's projects on Tuesday, December 6, 9:00 am in ENB 337 (conference room in the department). This class replaces the class on Monday, December 5.
  3. The rest of the projects will be presented during last class, December 7, usual time and place.
November 28: The class of Monday, December 5 is moved to Tuesday, December 6 at 9:00 am in ENB 337 (conference room). Note the earlier start time, which means a longer class.
August 29: We'll use H2O for posting reviews. Please join this project: http://h2o.law.harvard.edu/ViewProject.do?projectId=396
August 22: Some project ideas have been posted.
August 18: The course web page has been posted. The class schedule is tentative; it will be completed in the next two weeks and refined during the semester.


Syllabus

Federated distributed systems are collections of Internet-connected autonomous computing nodes spread across administrative domains. Participation in these federated systems allows access to potentially unique or large sets of resources such as data, storage space, computing power, or services. Examples of federated systems include computational grids, peer-to-peer networks, and wide-area testbeds such as PlanetLab.

This course is a tour through various research topics in federated distributed systems. We will explore solutions and learn design principles for building large network-based computational systems. Our readings and discussions will help us identify research problems and understand methods and general approaches to design, implement and evaluate distributed systems. Topics include resource management (discovery, allocation), data management (replication, location), security, fault-tolerance, system characterization, and overlay construction. Our discussions will often be grounded in the context of deployed distributed systems such as Grids and peer-to-peer networks.

The course involves discussions of four papers a week and a final project.

Grading is based on paper reviews and contributions to the class discussions (45%) and the final project (55%).

Reading materials: Most of the papers are available on the Internet. We will also read some chapters from "The Grid 2: Blueprint for a New Computing Infrastructure" (Grid2) by Ian Foster and Carl Kesselman.

Prerequisites: This is a graduate-level class. Undergraduate students are welcome with the instructor's consent (email to anda at cse dot usf dot edu).



Course format

The course is structured to provide (a) an in-depth understanding of current topics in large-scale distributed systems research; (b) experience with reviewing and presenting advanced technical material; and (c) practice in writing papers. The class workload has a participation component and a final project.

Participation

In each class we discuss two research papers. Read the papers before class (be an efficient reader!) and write a review for each paper that includes the following:

  1. State the main contribution of the paper.
  2. Critique the main contribution.
    1. Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
    2. Rate how convincing the methodology is. You may consider some of the following questions (use what is relevant): do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)
    3. What is the most important limitation of the approach?
  3. What are the three strongest and/or most interesting ideas in the paper?
  4. What are the three most striking weaknesses in the paper?
  5. Name three questions that you would like to ask the authors.
  6. Detail an interesting extension to the work not mentioned in the future work section.
  7. Optional comments on the paper that you’d like to see discussed in class.

Reviews must be submitted by midnight before class to the relevant Rotisserie Discussion on H2O. Papers are discussed in class. Discussions will be led by one or more students and may include a brief (5-minute) presentation of the paper. Discussion leaders do not need to submit reviews, but they need to:


Final Project

The final project is an opportunity for hands-on research in distributed systems. It involves a literature survey, programming, running experiments or analytical modeling, analyzing results, and writing a 10-15-page report. A list of project ideas is posted, but students are highly encouraged to propose topics of their own interest. Teams of two students are highly recommended. Please see me if you want to form a 3-student team.

Milestones (tentative dates):



Schedule


DATE
TOPICS AND ARTICLES
EXTRA PAPERS (optional unless you're doing a project in this area)
DISCUSSION LEADERS
8/29
Introduction to the class, goals, and structure. [ppt]

Anda
8/31

Background reading on Grid and P2P applications and systems:

  1. Concepts and Architecture, Foster and Kesselman, Grid2-Ch4.
  2. Peer-to-Peer Technologies. Crowcroft, J., Moreton, T., Pratt, I. and Twigg, A. Grid2-Ch29.
  3. Scientific Data Federation: The World-Wide Telescope, Szalay and Gray, Grid2-Ch7.
  4. Medical Data Federation: The Biomedical Informatics Research Network, Ellisman and Peltier, Grid2-Ch8.
On death, taxes and the convergence of peer-to-peer and grid computing, Foster and Iamnitchi, IPTPS 2003
Anda
9/5
Labor Day.
9/7
The structure of networks (pick 2):
  1. Small-world file sharing communities, Iamnitchi, Ripeanu, Foster. Infocom 2004.
  2. On Power-Law Relationships of the Internet Topology, Faloutsos, Faloutsos, and Faloutsos, SIGCOMM 1999 [pdf]
  3. Mapping the Gnutella network, M. Ripeanu et al., IEEE Internet Computing Journal 2002.
  1. Survey (excellent!): The structure and function of complex networks, M. E. J. Newman, SIAM Review 45, 167-256 (2003).
  2. The origin of power law in the internet revisited, Chen et al. [pdf]
  3. Implicit structure and the nature of blogspace, Adar et al.
  4. The anatomy of a large-scale hypertextual web search engine, Brin and Page
  5. DHT routing using social links, S. Marti et al, IPTPS 2004.
  6. Duncan Watts's Small World Project, Columbia
  7. Jon Kleinberg's Structure of Information Networks course
  8. Just for fun: Could it be a Big World After All? The 'Six Degrees of Separation' Myth, J. Kleinfeld, Society, April 2002.
  9. Graph structure in the web, A. Broder et al. 9th International World Wide Web Conference, May 2000.
Anda, Matt
9/12
System characterization:
  1. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload, Gummadi et al, [pdf]
  2. Understanding Availability, Bhagwan et al., [pdf]
  1. A measurement study of peer-to-peer file sharing systems, S. Saroiu et al, MMCN 2002
  2. Free riding on Gnutella, Adar and Huberman, First Monday, 2000
Anda, Chamara
9/14
P2P Systems:
  1. A Survey and Comparison of Peer-to-Peer Overlay Network Schemes, Lua et al.
  2. Is P2P dying or just hiding? Karagiannis et al, [pdf]
 
Anda, Cesar
9/15
Project proposals due. [12pt font, 1 page max]
9/19
In-class discussion of project proposals.
No reviews required. Good reading:
  1. You and Your Research, R. W. Hamming [pdf] [html]
  2. Technology and Courage, I. Sutherland [pdf]


9/21
Step back: Concepts of Distributed Systems [1] [ppt]
Anda (lecture)
9/26
Overlay Networks
  1. Resilient Overlay Networks, Andersen et al. [html]
  2. Structure Management for Scalable Overlay Service Construction, Shen [pdf]

Xu
9/28
Tools for research in distributed systems:
  1. Overview of the Globus Toolkit: http://www.globus.org [ppt]
  2. Macedon Project: http://www.cs.duke.edu/~razor/MACEDON/ [ppt]

Chris on Macedon
Anda on Globus
10/3
Distributed Hash Tables (pick 2, at least one in bold)
  1. Chord [html]
  2. CAN [html]
  3. The impact of DHT routing on resilience and proximity [pdf]
  4. Deconstructing DHTs [pdf]
  1. Tapestry [html]
  2. Plaxton Networks [html]
  3. SkipNet [html]
  4. Kademlia [pdf]
  5. The Design and Implementation of a Next Generation Name Service for the Internet, Ramasubramanian and Sirer [html]
  6. Prefix Hash Trees: An Indexing Data Structure over Distributed Hash Tables, Ramabhadran et al., [pdf]
Shyamala
10/5
Data movement:
  1. Slurpie: A Cooperative Bulk Data Transfer Protocol, Sherwood et al, [pdf]
  2. Incentives Build Robustness in BitTorrent, Bram Cohen, 2003.
  1. The Livny and Plank-Beck Problems: Studies in Data Movement on the Computational Grid, Allen and Wolski [pdf]
Jen
10/10
We need to reschedule class due to an NSF workshop that Anda needs to attend.

10/11
Literature surveys due [12pt font, 3 pages max]
10/12
Step back: Concepts of Distributed Systems [2]
Anda (lecture)
10/17
Availability and Monitoring:
  1. Total Recall: System Support for Automated Availability Management, Bhagwan et al [pdf]
  2. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Massie, Chun, and Culler. Parallel Computing, Vol. 30, Issue 7, July 2004. [PDF]

Rahul
10/19
File management
  1. Preserving Peer Replicas By Rate-Limited Sampled Voting, Maniatis et al. [pdf]
  2. The Google File System, Ghemawat et al. [pdf]
  3. Stupid File Systems Are Better, Stein, [pdf]
Taming Aggressive Replication in the Pangaea Wide-Area File System, Saito et al. [pdf]
Chamara
10/24
Class cancelled due to Hurricane Wilma.


10/26
Replica Placement
  1. Choosing Replica Placement Heuristics for Wide-Area Systems, Karlsson and Karamanolis [pdf]
  2. Latency-Driven Replica Placement, Szymaniak, Pierre and van Steen [pdf]

Kiran
10/31
Replication in Large-Scale Distributed Systems
  1. PRACTI Replication for Large-Scale Systems, M. Dahlin, L. Gao, A. Nayate, A. Venkataramani, P. Yalagandula, J. Zheng [pdf]
  2. Dual-Quorum Replication for Edge Services, L. Gao, M. Dahlin, J. Zheng, L. Alvisi, A. Iyengar, [pdf]

Xu
11/2
Fault Tolerance
  1. The Byzantine Generals Problem, L. Lamport et al, TOPLAS 1982
  2. BAR Tolerance for Cooperative Services, A. Aiyer, L. Alvisi, A. Clement, M. Dahlin, J. Martin, C. Porth [pdf]


Chris
11/7
The End-to-End Argument:
  1. End-to-end Arguments in System Design, J. Saltzer, D. Reed, and D. Clark, ACM Transactions on Computer Systems, Vol. 2, No. 4, pp. 195-206, 1984. [pdf]

  2. An End-to-End Approach to Globally Scalable Network Storage, Micah Beck, Terry Moore, James S. Plank, SIGCOMM 2002

The End-to-End Argument -- original and revisited:

  1. Rethinking the design of the Internet: The end to end arguments vs. the brave new world, D. Clark and M. Blumenthal, Workshop on Policy Implications of End-to-End. December 1, 2001. [pdf]
  2. Network Infrastructure, J. Touch and J. Postel (Grid2-Ch30)

Shyamala
11/9


Tools for research in distributed systems (2)
  1. Overview of Otter (CAIDA Tools) (Matt)
  2. ModelNet (Cesar) [ppt]
  3. Overview of LinkRank (CAIDA Tools) (Kiran)


11/14
Midterm project reports due. (no class)
11/16
No class, time to work on projects.
11/21
Logistical Networks and the End-to-End Argument:
  1. The Internet BackPlane Protocol: A Study in Resource Sharing, Second IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID 2002), Berlin, Germany, May, 2002
  2. Logistical Multicast for Data Distribution, Workshop on Grids and Advanced Networks, Cardiff, UK, May, 2005
http://loci.cs.utk.edu/
11/23

Security:

  1. Automated Worm Fingerprinting, Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage, OSDI 2004.
  2. Vigilante: End-to-End Containment of Internet Worms. Manuel Costa et al, SOSP 2005.


11/28
Virtualization:
  1. Distributed File System Support for Virtual Machines in Grid Computing, Ming Zhao, Jian Zhang, Renato Figueiredo, HPDC 2004.
  2. Towards Virtual Networks for Virtual Machine Grid Computing. A. Sundararaj and P. Dinda. 3rd USENIX Conference on Virtual Machine Technology, 2004.
  1. Xen and the Art of Virtualization, Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, SOSP 2003.
  2. Live Migration of Virtual Machines, Christopher Clark et al, NSDI 2005.
  3. From Virtualized Resources to Virtual Computing Grids: The In-VIGO System. Adabala, S., Chadha, V., Chawla, P., Figueiredo, R., Fortes, J., Krsul, I., Matsunaga, A., Tsugawa, M., Zhang, J., Zhao, M., Zhu, L. and Zhu, X., Future Generation Computer Systems. 2004.
Chris and Shyamala
11/30
Real Systems and Markets:
  1. ACMS: The Akamai Configuration Management System, Alex Sherman et al, NSDI 2005
  2. Why Markets Could (But Don't Currently) Solve Resource Allocation Problems in Systems, Jeffrey Shneidman et al, HOTOS 2005
Final project presentations: Kiran and Rahul

Jen
12/6, 9am, ENB 337
Wrap-up: Class Review and Research Directions in Distributed Systems
Final project presentations: Chris and Shyamala
Directions in Network Research:
 - A Knowledge Plane for the Internet, Clark, Partridge, Ramming, Wroclawski [pdf]
 - A Layered Naming Architecture for the Internet, Balakrishnan et al., [pdf]

12/7
Final project presentations.
12/16
Final project reports due. [12pt font, 10 pages max]




Projects

Some ideas are here. You're strongly encouraged to propose your own project ideas. Be innovative and aim high!





Adriana Iamnitchi (anda at cse usf edu)