Syllabus for Roster(s):

  • 17Sp CS 4434-001 (ENGR)
  • 17Sp CS 4434-001 (ENGR) Waitlist
  • 17Sp CS 6434-001 (ENGR)
  • 17Sp ECE 4434-001 (ENGR)
  • 17Sp ECE 6434-001 (ENGR)
  • 17Sp SYS 4582-009 (ENGR)
  • 17Sp SYS 6582-006 (ENGR)
In the UVaCollab course site: Dependable Computing SP17

Course Description

Computing systems are used in many critical domains, including aerospace, energy, transportation, healthcare, and commerce. Failures of these systems may lead to catastrophic consequences such as injury, loss of life, damage to equipment, or financial loss. This course focuses on techniques for designing and analyzing dependable computing systems that continue to operate correctly in the presence of software and hardware problems. We will learn what can go wrong; how to predict, prevent, and detect faults and errors; and how to design systems that tolerate faults and recover from failures.

Topics:

  • Introduction to dependable computing
  • Basic terminology, attributes, and evaluation techniques
  • Combinatorial and state-space modeling
  • Hardware fault tolerance
  • Information redundancy
  • Software fault tolerance
  • Checkpointing and recovery
  • Reliable networked systems
  • Error detection techniques
  • Dependability evaluation techniques
  • Safety and Security


Time: Mon/Wed/Fri 9:00AM - 9:50AM
Location: Thornton Hall E304
Office Hours: Wed 10:00AM - 11:00AM, Thornton Hall E314

Schedule and Activities

This is a tentative timeline for the class and is subject to change.

Week 1 (Jan 18, Jan 20): Background and Motivation
  • Lectures: Lecture 1, Lecture 2
  • Assignments: Homework 1 (Jan 20)
  • Reading: Chapter 1; Probability Refresher

Week 2 (Jan 23, Jan 25, Jan 27): Basic Dependability Concepts
  • Lectures: Lecture 3, Lecture 4
  • Assignments: Pre-Assessment Quiz and Solution (Jan 27)
  • Reading: Basic Concepts & Taxonomy

Week 3 (Jan 30, Feb 1, Feb 3): Combinatorial/State-Space Modeling
  • Lectures: Lecture 5, Lecture 6
  • In-class activities: Short Presentation 1, Group Activity 1 (Feb 3)
  • Assignments: Homework 2 and Solution (Feb 3)
  • Reading: Chapter 2; FailureOblivious; Reliability Models

Week 4 (Feb 6, Feb 8, Feb 10): Hardware Fault Tolerance
  • Lectures: Lecture 7, Lecture 8, Lecture 9
  • In-class activities: Short Presentation 2 (Feb 10)

Week 5 (Feb 13, Feb 15, Feb 17): Hardware Fault Tolerance; Information Redundancy (Guest Lecture, Feb 17)
  • Lectures: Lecture 10; Lecture 11 (Feb 17)
  • In-class activities: Short Presentation 3 (Feb 13); Short Presentation 4, Group Activity 2 (Feb 15)

Week 6 (Feb 20, Feb 22, Feb 24): Information Redundancy
  • Lectures: Lecture 12; Lecture 13 (Feb 24)
  • In-class activities: Short Presentation 5 (Feb 20); Short Presentation 6 (Feb 22); Short Presentation 7 (Feb 24)
  • Assignments: Homework 3 and Solution (Feb 20)
  • Reading: Parity Prediction

Week 7 (Feb 27, Mar 1, Mar 3): Information Redundancy (Cont.); Midterm Review
  • Lectures: Lecture 14, Lecture 15
  • In-class activities: Short Presentation 8 (Feb 27)
  • Mar 3: Midterm Exam; Midterm Solution

Spring Recess (Mar 4-12)

Week 8 (Mar 13, Mar 15, Mar 17): Error Detection Techniques; Final Project Overview (Mar 17)
  • Lectures: Lecture 16, Lecture 17; Lecture 18 (Mar 17)
  • In-class activities: Short Presentation 9 (Mar 13); Short Presentation 10 (Mar 17)
  • Assignments: Homework 4 and Solution (Mar 13); Mini Project (Mar 17)
  • Reading: Heartbeat Models; Software Control Flow Checking; Chapter 10: Fault Injection

Week 9 (Mar 20, Mar 22, Mar 24): Software Fault Tolerance; Experimental Evaluation (Validation)
  • Lectures: Lecture 19, Lecture 20, Lecture 21
  • In-class activities: Short Presentation 11 (Mar 20); Short Presentation 12 (Mar 22); Short Presentation 13 (Mar 24)
  • Reading: Chapter 7: Software Detection; N-version Programming

Week 10 (Mar 27, Mar 29, Mar 31): Checkpointing and Recovery
  • Lectures: Lecture 22, Lecture 23
  • In-class activities: Short Presentation 14 (Mar 27); Group Activity 3 (Mar 31)
  • Assignments: Final Project Topics (Mar 31)
  • Reading: Chapter 8

Week 11 (Apr 3, Apr 5, Apr 7): Processor-Level Detection and Recovery
  • Lectures: Lecture 24
  • In-class activities: Final Project Topic Presentations (Apr 5); Paper Presentation 1: Ted Xie (slides), Group Activity 4 (Apr 7)
  • Reading: Chapter 3

Week 12 (Apr 10, Apr 12, Apr 14): Processor-Level Detection and Recovery
  • Lectures: Lecture 25, Lecture 26
  • In-class activities: Paper Presentation 2: Minyan Gao, Group Activity 5 (Apr 14)
  • Assignments: Homework 5 and Solution (Apr 14)
  • Reading: SMP/CMP; SRT, SRTR; DIVA; RSE

Week 13 (Apr 17, Apr 19, Apr 21): Distributed Systems/Network-Specific Issues
  • Lectures: Lecture 27, Lecture 28
  • In-class activities: Paper Presentation 3: Atallah Hezbor, Group Activity 6 (Apr 21)
  • Reading: Chapter 6; Byzantine Generals Problem

Week 14 (Apr 24, Apr 26, Apr 28): Final Exam Review
  • Lectures: Lecture 29
  • In-class activities: Paper Presentation 4: Minghui Sun, Group Activity 7 (Apr 24); Paper Presentation 5: Xiyuan Ge, Group Activity 8 (Apr 28)
  • Assignments: Homework 6 and Solution (Apr 24); Homework 7 and Solution (Apr 28)

Finals
  • May 1: No class (office hours only)
  • May 2: Final exam released
  • May 5: Final exam due
  • May 12: Final project presentations
  • Assignments: Final Exam (take-home); Project Report

References

The lectures and assignments are based on the following references:

  • I. Koren and C. Mani Krishna, Fault-Tolerant Systems, 1st edition, 2007, Morgan Kaufmann. (Read online through UVA Library)
  • J. Knight, Fundamentals of Dependable Computing for Software Engineers, 2012, CRC Press. (Read online through UVA Library)
  • K. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd edition, 2001, John Wiley & Sons.
  • D. K. Pradhan, Fault-Tolerant Computer System Design, 1st edition, 1996, Prentice-Hall.
  • An unpublished textbook by R. K. Iyer, Z. Kalbarczyk, and N. Nakka of the University of Illinois at Urbana-Champaign, who have agreed to let us use a pre-publication copy. The book consists of multiple chapters, each in a separate PDF file that will be shared internally with the class. Please do not redistribute.

 

Grading

Weights are given as Undergraduate / Graduate.

  • Class Participation/Activity: 5% / 5%
  • Short Presentations: 5% / 2%
  • Paper Presentations *: n/a / 8%
  • Homework and Mini Project **: 25% / 20%
  • Final Project ***: 30% / 30%
  • Midterm Exam: 15% / 15%
  • Final Exam (Take-Home): 20% / 20%

* Each graduate student will select a paper on a special topic (of mutual interest) and will:

  • Present the paper to the class (20 minutes).
  • Prepare a short homework assignment based on the lecture material and the paper.
  • Grade the homework assignment and provide a solution to the class.

The paper must be made available to the class at least one week before the presentation, and the homework assignment must be graded within one week of its submission.

** Late assignments incur a 10% penalty per school day.

*** Final projects will be carried out by teams consisting of both graduate and undergraduate students. Each team will propose a project on a related topic of interest, define measurable outcomes and deliverables for the project, and present the results as a short paper and a lecture to the class at the end of the semester. All students are required to participate actively in different aspects of their project.