Fault-Tolerant Systems


  • Israel Koren, University of Massachusetts, Amherst
  • C. Krishna, University of Massachusetts, Amherst

In many applications, the overall system must be far more reliable than its individual components. In such cases, designers devise mechanisms and architectures that allow the system either to mask the effects of a component failure completely or to recover from it so quickly that the application is not seriously affected. This is the work of fault-tolerance designers, and it is becoming more important and more complex, not only because the number of “mission-critical” applications is growing, but also because the diminishing reliability of hardware means that even systems for non-critical applications will need to be designed with fault tolerance in mind. Reflecting the real-world challenges faced by designers of these systems, this book addresses fault-tolerant design with a systems approach to both hardware and software. No other text on the market takes this approach or offers the comprehensive and up-to-date treatment Koren and Krishna provide. Students, designers, and architects of high-performance processors will value this comprehensive overview of the field.


Senior undergraduate and graduate students in fault-tolerant computing courses; professional architects of high-performance processors; and designers of enterprise computing systems.


Book information

  • Published: March 2007
  • ISBN: 978-0-12-088525-1

Table of Contents

  • Chap. 1: Introduction
  • Chap. 2: Hardware Fault Tolerance
  • Chap. 3: Information Redundancy
  • Chap. 4: Checkpointing
  • Chap. 5: Software Fault Tolerance
  • Chap. 6: Fault-tolerant Networks
  • Chap. 7: Case Studies
  • Chap. 8: Defect Tolerance in VLSI
  • Chap. 9: Fault Tolerance in Cryptography
  • Chap. 10: Experimental and Simulation Techniques