Self-Checking and Fault-Tolerant Digital Design


  • Parag Lala, North Carolina Agricultural and Technical State University

With VLSI chip transistors getting smaller and smaller, today's digital systems are more complex than ever before. This increased complexity leads to more cross-talk, noise, and other sources of transient errors during normal operation. Traditional off-line testing strategies cannot guarantee detection of these transient faults. And with critical applications relying on faster, more powerful chips, fault-tolerant, self-checking mechanisms must be built in to assure reliable operation.

Self-Checking and Fault-Tolerant Digital Design deals extensively with self-checking design techniques and is the only book that emphasizes major techniques for hardware fault tolerance. Graduate students in VLSI design courses as well as practicing designers will appreciate this balanced treatment of the concepts and theory underlying fault tolerance along with the practical techniques used to create fault-tolerant systems.


Book information

  • Published: June 2000
  • ISBN: 978-0-12-434370-2

Table of Contents

Chapter 1 - Fundamentals of Reliability
  1.1 Reliability and Failure Rate
  1.2 Relation between Reliability and Mean-Time-Between-Failures
  1.3 Maintainability
  1.4 Availability
  1.5 Series and Parallel Systems
  1.6 Dependability
  References

Chapter 2 - Error Detecting and Correcting Codes
  2.1 Parity Code
  2.2 Multiple Error Detecting Codes
    2.2.1 Unordered Codes for Unidirectional Error Detection
      m-out-of-n Codes
      Berger Code
    2.2.2 t-unidirectional Error Detecting Codes
      Borden Code
      Bose-Lin Codes
    2.2.3 Burst Unidirectional Error Detecting Code
  2.3 Residue Codes
  2.4 Cyclic Codes
  2.5 Error-Correcting Codes
    2.5.1 Hamming Code
    2.5.2 Hsiao Code
    2.5.3 Reed-Solomon Code
  References

Chapter 3 - Self-Checking Combinational Logic Design
  3.1 Strongly Fault-secure Circuits
  3.2 Strongly Code-disjoint Circuits
  3.3 Terminology
  3.4 Bidirectional Error Free Combinational Circuit Design
  3.5 Detection of Input Fault Induced Bidirectional Errors
  3.6 Techniques for Bidirectional Error Elimination
    3.6.1 Input Encoding
    3.6.2 Output Encoding
  3.7 Self-dual Parity Checking
  3.8 Self-Checking Design using Low-Cost Residue Code
  3.9 Totally Self-Checking PLA Design
  3.10 Fail-safe Combinational Circuit Design
  References

Chapter 4 - Self-Checking Checkers
  4.1 The Two-rail Checker
  4.2 Totally Self-Checking Checkers for m-out-of-n codes
    4.2.1 Pass Transistor-based Checker Design for a subset of m-out-of-2m codes
    4.2.2 Totally Self-Checking Checker for 1-out-of-n code
  4.3 Totally Self-Checking Checkers for Berger code
  4.4 Totally Self-Checking Checkers for Low-cost Residue code
  References

Chapter 5 - Self-Checking Sequential Circuit Design
  5.1 Faults in State Machines
  5.2 Self-Checking State Machine Design Techniques
  5.3 Elimination of Bidirectional Errors
  5.4 Synthesis of Redundant Fault-free State Machines
  5.5 Decomposition of Finite State Machines
  5.6 Self-Checking Interacting State Machine Design
  5.7 Fail-safe State Machine Design
  References

Chapter 6 - Fault-Tolerant Design
  6.1 Hardware Redundancy
    6.1.1 Static Redundancy
      Triple Modular Redundancy
    6.1.2 Dynamic Redundancy
    6.1.3 Hybrid redundancy
  6.2 Information Redundancy
    6.2.1 Fault-tolerant state machine design using Hamming codes
    6.2.2 Error Checking and Correction (ECC) in Memory Systems
    6.2.3 Improvement in Reliability with ECC
    6.2.4 Multiple Error Correction using Orthogonal Latin Square Configuration
    6.2.5 Soft error Correction using Horizontal and Vertical Parity Method
  6.3 Time Redundancy
  6.4 Software Redundancy
  6.5 System Level Fault Tolerance
    6.5.1 Byzantine Fault Model
    6.5.2 System Level Fault Detection
    6.5.3 Backward Recovery Schemes
    6.5.4 Forward Recovery Schemes
  References

Appendix - Markov Models