There are four aspects of fault tolerance:
i. Failure detection: The system must detect that a particular state combination has resulted
or will result in a system failure.
ii. Damage assessment: The parts of the system state that have been affected by the
failure must be identified.
iii. Fault recovery: The system must restore its state to a known ‘safe’ state. This may
be achieved by correcting the damaged state or by restoring the system to
a known ‘safe’ state.
iv. Fault repair: This involves modifying the system
so that the fault does not recur. In many cases, software failures are
transient and arise from a peculiar combination of system inputs. In those cases no repair is
necessary, as normal processing can resume immediately after fault recovery.
This is an important distinction between hardware and software faults.
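
The first three aspects can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `TankController` class, its `level` and `valve_open` fields, and the invariants are hypothetical and do not come from any particular system. Recovery here is done by restoring a saved copy of the state, which is only one of the two options described above. Fault repair does not appear in the code because it means changing the software itself.

```python
import copy


class TankController:
    """Hypothetical component whose state must satisfy simple invariants."""

    MAX_LEVEL = 100  # assumed upper bound, chosen only for this example

    def __init__(self):
        self.state = {"level": 50, "valve_open": False}
        # Copy of the last state known to be 'safe'.
        self.checkpoint = copy.deepcopy(self.state)

    def damaged_fields(self):
        """Damage assessment: list the state fields that violate an invariant."""
        bad = []
        if not 0 <= self.state["level"] <= self.MAX_LEVEL:
            bad.append("level")
        if not isinstance(self.state["valve_open"], bool):
            bad.append("valve_open")
        return bad

    def failure_detected(self):
        """Failure detection: True if the current state breaks any invariant."""
        return bool(self.damaged_fields())

    def recover(self):
        """Fault recovery: restore the state saved in the checkpoint."""
        self.state = copy.deepcopy(self.checkpoint)

    def apply(self, delta):
        """Change the level by delta; return the damaged fields, if any."""
        self.checkpoint = copy.deepcopy(self.state)  # save the state first
        self.state["level"] += delta
        if self.failure_detected():
            damaged = self.damaged_fields()
            self.recover()
            return damaged
        return []
```

On a fresh controller, `apply(500)` returns `["level"]` and leaves the level at 50, so processing can continue with the next input. This mirrors the transient-failure case in which no repair is needed.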