Triple modular redundancy

In computing, triple modular redundancy (TMR), sometimes called triple-mode redundancy,[1] is a fault-tolerant form of N-modular redundancy in which three systems perform the same process and their results are passed to a majority-voting system to produce a single output. If any one of the three systems fails, the other two can correct and mask the fault.

Triple modular redundancy. Three identical logic circuits (logic gates) are used to compute the specified Boolean function. The data at the input of the first circuit is identical to the data at the inputs of the second and third circuits.

The TMR concept can be applied to many forms of redundancy, such as software redundancy in the form of N-version programming, and is commonly found in fault-tolerant computer systems.
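The software form of the idea can be illustrated with a minimal sketch. It assumes three independently written versions of the same function, all of which are hypothetical names invented for this example (`square_a`, `square_b`, `square_faulty`), and a small voter, `tmr_call`, that returns whichever result at least two versions agree on.

```python
from collections import Counter
from typing import Callable, Sequence


def tmr_call(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run three versions on the same input and return the majority result."""
    if len(versions) != 3:
        raise ValueError("TMR needs exactly three versions")
    results = [version(x) for version in versions]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        # All three disagree: the fault can be detected but not masked.
        raise RuntimeError(f"no majority among results {results}")
    return value


# Hypothetical versions of "square a number"; the third is deliberately wrong.
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    return x * x + 1  # stands in for a faulty version


# The two correct versions outvote the faulty one.
print(tmr_call([square_a, square_b, square_faulty], 7))  # prints 49
```

The voter here only compares results for equality; real N-version systems may need tolerance-based comparison when outputs are floating-point values.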

Space satellite systems often use TMR,[2][3] although satellite RAM usually uses Hamming error correction.[4]

Some ECC memory uses triple modular redundancy hardware (rather than the more common Hamming code), because triple modular redundancy hardware is faster than Hamming error correction hardware.[5] Some communication systems use N-modular redundancy as a simple form of forward error correction, known as a repetition code. For example, 5-modular redundancy communication systems (such as FlexRay) use the majority of 5 samples: if any 2 of the 5 results are erroneous, the other 3 can correct and mask the fault.
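A repetition code with majority decoding can be sketched in a few lines. The functions `encode` and `decode` below are illustrative names only and do not model the actual sampling logic of FlexRay or any other protocol; they simply repeat each bit five times and recover it by majority vote.

```python
def encode(bits, n=5):
    """Repeat every bit n times (n-modular redundancy on the channel)."""
    return [b for b in bits for _ in range(n)]


def decode(samples, n=5):
    """Recover each bit as the majority of its n received samples."""
    if len(samples) % n != 0:
        raise ValueError("sample count must be a multiple of n")
    decoded = []
    for i in range(0, len(samples), n):
        group = samples[i:i + n]
        decoded.append(1 if sum(group) > n // 2 else 0)
    return decoded


message = [1, 0, 1]
received = encode(message)

# Corrupt two samples of the first bit and one sample of the second bit.
received[0] ^= 1
received[3] ^= 1
received[7] ^= 1

print(decode(received) == message)  # prints True
```

With n = 5, up to two corrupted samples per bit are masked; a third corrupted sample in the same group would flip the decoded bit.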

Modular redundancy is a basic concept dating to antiquity; the first use of TMR in a computer was in the Czechoslovak computer SAPO, in the 1950s.

General case

The general case of TMR is called N-modular redundancy, in which any positive number of replications of the same action is used. The number is typically at least three, so that errors can be corrected by majority vote, and usually odd, so that no ties can occur.[6]
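The role of an odd N can be seen in a small sketch of a generic voter. The function name `nmr_vote` is an assumption made for this example; it returns the value produced by a strict majority of the N modules, or `None` when no such majority exists.

```python
from collections import Counter


def nmr_vote(outputs):
    """Return the strict-majority value among N module outputs, else None."""
    if not outputs:
        raise ValueError("at least one module output is required")
    value, votes = Counter(outputs).most_common(1)[0]
    return value if votes > len(outputs) // 2 else None


print(nmr_vote([1, 1, 0]))        # prints 1 (N = 3 masks one fault)
print(nmr_vote([1, 1, 0, 0, 1]))  # prints 1 (N = 5 masks two faults)
print(nmr_vote([1, 0]))           # prints None (N = 2 detects but cannot correct)
print(nmr_vote([1, 1, 0, 0]))     # prints None (even N allows a tie)
```

With binary outputs and odd N, a strict majority always exists, so the voter can mask up to (N - 1) / 2 faulty modules.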

Majority logic gate

3-input majority gate

3-input majority gate using 4 NAND gates

The output of a 3-input majority gate is 1 if two or more of its inputs are 1, and 0 if two or more of its inputs are 0. The majority function is therefore identical to the carry output of a full adder; in effect, the majority gate is a voting machine.[7]
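The equivalence with the full-adder carry can be checked exhaustively. The sketch below uses two illustrative helper functions, `majority` and `full_adder`, and compares them over all eight input combinations.

```python
from itertools import product


def majority(a, b, c):
    """Return 1 if at least two of the three input bits are 1."""
    return 1 if a + b + c >= 2 else 0


def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) for a one-bit full adder."""
    total = a + b + carry_in
    return total % 2, total // 2


# The carry output matches the majority vote for every input combination.
for a, b, c in product((0, 1), repeat=3):
    _, carry_out = full_adder(a, b, c)
    assert carry_out == majority(a, b, c)

print("carry-out equals majority for all 8 input combinations")
```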

The 3-input majority gate can be represented by the following Boolean equation and truth table:

Q = (A ∧ B) ∨ (B ∧ C) ∨ (A ∧ C)

INPUT       OUTPUT
A   B   C   Q
0   0   0   0
0   0   1   0
0   1   0   0
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
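The truth table can be regenerated from the usual sum-of-products form of the majority function, Q = AB + BC + AC. The sketch below also checks one possible four-NAND realisation, assumed here to be three 2-input NAND gates feeding a 3-input NAND gate; the gate arrangement is an assumption for illustration, since the figure itself is not reproduced in the text.

```python
from itertools import product


def nand(*inputs):
    """NAND gate with any number of inputs (each 0 or 1)."""
    return 0 if all(inputs) else 1


def majority_sop(a, b, c):
    """Sum-of-products form: Q = AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


def majority_nand(a, b, c):
    """Assumed 4-NAND form: three 2-input NANDs into one 3-input NAND."""
    return nand(nand(a, b), nand(b, c), nand(a, c))


print("A B C Q")
for a, b, c in product((0, 1), repeat=3):
    q = majority_sop(a, b, c)
    assert q == majority_nand(a, b, c)  # both forms agree on every row
    print(a, b, c, q)
```

The two forms agree by De Morgan's law: negating the AND of the three negated product terms gives the OR of the product terms.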

In TMR, three identical logic circuits (logic gates) are used to compute the same specified Boolean function. If there are no circuit failures, the outputs of the three circuits are identical. If a circuit fails, however, the outputs of the three circuits may differ.

TMR operation

Suppose the Boolean function computed by the three identical logic circuits has value 1. Then: (a) if no circuit has failed, all three circuits produce an output of 1, and the majority gate output is 1; (b) if one circuit fails and produces an output of 0 while the other two work correctly and produce an output of 1, the majority gate output is still 1, the correct value. The same reasoning applies when the function has value 0. Thus, the majority gate output is guaranteed to be correct as long as no more than one of the three identical logic circuits has failed.[7]
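The masking behaviour can be demonstrated with a small fault-injection sketch. It assumes the protected function is a 2-input AND gate (an arbitrary choice for illustration) and models a failed circuit as a stuck-at fault that forces its output to a fixed value; the names `tmr` and `stuck` are hypothetical.

```python
from itertools import product


def voter(a, b, c):
    """3-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def and_gate(x, y):
    """Example module being protected (any Boolean function would do)."""
    return x & y


def tmr(x, y, stuck=None):
    """Evaluate three copies of and_gate and vote on their outputs.

    stuck maps a module index (0, 1 or 2) to a forced output value,
    modelling a stuck-at fault in that module.
    """
    stuck = stuck or {}
    outputs = [stuck.get(i, and_gate(x, y)) for i in range(3)]
    return voter(*outputs)


# Any single stuck-at fault is masked for every input combination.
for x, y in product((0, 1), repeat=2):
    for module, value in product(range(3), (0, 1)):
        assert tmr(x, y, {module: value}) == and_gate(x, y)

# Two faulty modules outvote the healthy one: the correct output is 1.
print(tmr(1, 1, {0: 0, 1: 0}))  # prints 0
```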

For a TMR system with a single voter of reliability (probability of working) $R_v$ and three components each of reliability $R_m$, the probability of it being correct can be shown to be $R_{\text{TMR}} = R_v (3 R_m^2 - 2 R_m^3)$.[6]
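The expression can be derived in a few steps, under the assumptions that the three components fail independently, that the system is correct exactly when the voter works and at least two components work, and that $R_v$ and $R_m$ denote the voter and component reliabilities:

```latex
\begin{align*}
P(\text{at least two of three components work})
  &= \binom{3}{2} R_m^2 (1 - R_m) + \binom{3}{3} R_m^3 \\
  &= 3 R_m^2 - 3 R_m^3 + R_m^3 \\
  &= 3 R_m^2 - 2 R_m^3, \\
R_{\text{TMR}}
  &= R_v \left( 3 R_m^2 - 2 R_m^3 \right).
\end{align*}
```

For example, with a perfect voter ($R_v = 1$) and $R_m = 0.9$, this gives $3(0.81) - 2(0.729) = 0.972$.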

TMR systems should use data scrubbing, periodically rewriting the flip-flops with the voted value, in order to avoid the accumulation of errors.[8]
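Why scrubbing matters can be shown with a sketch of a triplicated register. The class `TmrRegister` and its methods are hypothetical names for this example; `upset` models a bit flip in one copy, and `scrub` rewrites all three copies with the bitwise majority value.

```python
def vote(a, b, c):
    """Bitwise majority of three register copies."""
    return (a & b) | (b & c) | (a & c)


class TmrRegister:
    """A register stored as three copies and read through a voter."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def read(self):
        return vote(*self.copies)

    def upset(self, copy, bit):
        """Flip one bit in one copy, modelling a single upset."""
        self.copies[copy] ^= 1 << bit

    def scrub(self):
        """Rewrite every copy with the current voted value."""
        self.copies = [self.read()] * 3


# Without scrubbing, two upsets to the same bit in different copies
# accumulate and defeat the voter.
reg = TmrRegister(0b1010)
reg.upset(0, 3)
reg.upset(1, 3)
print(bin(reg.read()))  # prints 0b10 (wrong)

# Scrubbing between the upsets repairs the first one before the second hits.
reg = TmrRegister(0b1010)
reg.upset(0, 3)
reg.scrub()
reg.upset(1, 3)
print(bin(reg.read()))  # prints 0b1010 (correct)
```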

Voter

Triple modular redundancy with one voter (top) and three voters (bottom)

The majority gate itself could fail. This can be protected against by applying triple redundancy to the voters themselves.[9]

In a few TMR systems, such as the Saturn Launch Vehicle Digital Computer and functional triple modular redundancy (FTMR) systems, the voters are also triplicated. Three voters are used – one for each copy of the next stage of TMR logic. In such systems there is no single point of failure.[10][11]
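A triplicated-voter arrangement can be sketched as a chain of stages. This is a simplified model, not a description of any particular system: the function `stage` and its `bad_module` and `bad_voter` parameters are assumptions made for the example, and a fault is modelled as a single inverted output bit.

```python
def majority(a, b, c):
    """3-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def invert(x):
    """Example stage logic (hypothetical; any 1-bit function would do)."""
    return x ^ 1


def stage(inputs, logic, bad_module=None, bad_voter=None):
    """One TMR stage with three voters.

    inputs holds one value per module copy. Each voter sees all three
    module outputs, and voter i feeds copy i of the next stage.
    """
    outputs = [logic(x) for x in inputs]
    if bad_module is not None:
        outputs[bad_module] ^= 1  # inject a module fault
    voted = [majority(*outputs) for _ in range(3)]
    if bad_voter is not None:
        voted[bad_voter] ^= 1  # inject a voter fault
    return voted


# A faulty voter corrupts only one input of the next stage...
first = stage([1, 1, 1], invert, bad_voter=2)
print(first)   # prints [0, 0, 1]

# ...and the next stage's voters mask the resulting bad module output.
second = stage(first, invert)
print(second)  # prints [1, 1, 1]
```

In this model a single faulty voter behaves like a single faulty module in the following stage, which is why no individual voter is a single point of failure.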

Even though using only a single voter introduces a single point of failure (a failed voter brings down the entire system), most TMR systems do not use triplicated voters. This is because the majority gates are much less complex than the systems that they guard, and so are much more reliable.[7] Using the reliability calculations, it is possible to find the minimum voter reliability for which TMR is more reliable than a single unprotected module.[6]
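The break-even point can be explored numerically from the single-voter formula RTMR = Rv (3 Rm^2 - 2 Rm^3). TMR beats a single module when RTMR > Rm, which rearranges to Rv > 1 / (3 Rm - 2 Rm^2). The sketch below, with illustrative function names, scans module reliabilities on a grid to find where this requirement is least demanding.

```python
def tmr_reliability(r_m, r_v=1.0):
    """Reliability of single-voter TMR with independent module failures."""
    return r_v * (3 * r_m**2 - 2 * r_m**3)


def min_voter_reliability(r_m):
    """Voter reliability above which TMR beats one module of reliability r_m.

    A result above 1 means no real voter can make TMR worthwhile.
    """
    return 1.0 / (3 * r_m - 2 * r_m**2)


# Scan module reliabilities 0.001 .. 0.999 for the least demanding case.
grid = [i / 1000 for i in range(1, 1000)]
best_r_m = min(grid, key=min_voter_reliability)
print(best_r_m, min_voter_reliability(best_r_m))  # 0.75 and roughly 0.889

# Modules at or below 0.5 reliability cannot be helped by TMR at all.
print(min_voter_reliability(0.5) >= 1.0)  # prints True
```

Under these assumptions the voter must be at least 8/9 (about 0.889) reliable for TMR to help at any module reliability, and the requirement is loosest at Rm = 0.75.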

Chronometers

To use triple modular redundancy, a ship must have at least three chronometers. Two chronometers provide dual modular redundancy: a backup is available if one should cease to work, but no error correction is possible if the two display different times, since it is impossible to know which one is wrong (the error detection obtained is the same as that of having only one chronometer and checking it periodically). Three chronometers provide triple modular redundancy, allowing error correction if one of the three is wrong: the pilot takes the average of the two with the closest readings (vote for average precision).
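The "average of the two closest readings" rule is a form of voting for analogue values, where exact agreement cannot be expected. A minimal sketch, with a hypothetical function name and readings given as seconds since midnight:

```python
from itertools import combinations


def voted_time(readings):
    """Average the two closest of three chronometer readings."""
    if len(readings) != 3:
        raise ValueError("exactly three readings are required")
    a, b = min(combinations(readings, 2), key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


# Two chronometers agree to within two seconds; the third has drifted.
print(voted_time([43200.0, 43202.0, 43290.0]))  # prints 43201.0
```

With only two readings there is a single pair, so the outlier cannot be identified, which matches the reasoning in the paragraph above.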

There is an old adage to this effect, stating: "Never go to sea with two chronometers; take one or three."[12]

The point of the adage is that if two chronometers disagree, there is no way to know which one is correct. At one time this rule was expensive to follow, as three sufficiently accurate chronometers cost more than many types of smaller merchant vessels.[13] Some vessels carried more than three chronometers; for example, HMS Beagle carried 22.[14] However, such a large number was usually carried only on ships undertaking survey work, as was the case with the Beagle.

In the modern era, ships at sea use GNSS navigation receivers (supporting GPS, GLONASS and other systems), mostly running with WAAS or EGNOS augmentation, to obtain accurate time and location.

References

  1. ^ David Ratter. "FPGAs on Mars" (PDF). Retrieved May 30, 2020.
  2. ^ "Actel engineers use triple-module redundancy in new rad-hard FPGA". Military & Aerospace Electronics. Retrieved 2017-04-09.
  3. ^ ECSS-Q-HB-60-02A : Techniques for radiation effects mitigation in ASICs and FPGAs handbook
  4. ^ "Commercial Microelectronics Technologies for Applications in the Satellite Radiation Environment". radhome.gsfc.nasa.gov. Archived from the original on March 4, 2001. Retrieved May 30, 2020.
  5. ^ "Using StrongArm SA-1110 in the On-Board Computer of Nanosatellite". Tsinghua Space Center, Tsinghua University, Beijing. Archived from the original on 2011-10-02. Retrieved 2009-02-16.
  6. ^ a b c Shooman, Martin L. (2002). "N-Modular Redundancy". Reliability of computer systems and networks: fault tolerance, analysis and design. Wiley-Interscience. pp. 145–201. doi:10.1002/047122460X.ch4. ISBN 9780471293422. Course notes
  7. ^ a b c Dilip V. Sarwate, Lecture Notes for ECE 413 – Probability with Engineering Applications, Department of Electrical and Computer Engineering (ECE), UIUC College of Engineering, University of Illinois at Urbana-Champaign
  8. ^ Zabolotny, Wojciech M.; Kudla, Ignacy M.; Pozniak, Krzysztof T.; Bunkowski, Karol; Kierzkowski, Krzysztof; Wrochna, Grzegorz; Krolikowski, Jan (2005-09-16). "Radiation tolerant design of RLBCS system for RPC detector in LHC experiment". In Romaniuk, Ryszard S.; Simrock, Stefan; Lutkovski, Vladimir M. (eds.). Photonics Applications in Industry and Research IV. Vol. 5948. Warsaw, Poland. pp. 59481E. doi:10.1117/12.622864. S2CID 15987757.
  9. ^ A. W. Krings. "Redundancy". 2007.
  10. ^ Sandi Habinc (2002). "Functional Triple Modular Redundancy (FTMR): VHDL Design Methodology for Redundancy in Combinatorial and Sequential Logic" (PDF). Archived from the original (PDF) on 2012-06-05.
  11. ^ Lyons, R. E.; Vanderkulk, W. (April 1962). "The Use of Triple-Modular Redundancy to Improve Computer Reliability" (PDF). IBM Journal of Research and Development. 6 (2): 200–209. doi:10.1147/rd.62.0200.
  12. ^ Brooks, Frederick P. Jr. (1995) [1975]. The Mythical Man-Month. Addison-Wesley. p. 64. ISBN 978-0-201-83595-3.
  13. ^ "Re: Longitude as a Romance". Irbs.com, Navigation mailing list. 2001-07-12. Archived from the original on 2011-05-20. Retrieved 2009-02-16.
  14. ^ R. Fitzroy. "Volume II: Proceedings of the Second Expedition". p. 18.