Improving Industrial Functional Safety Compliance with High Performance Supervisory Circuits: Safety Critical Features—Part 3

By: Bryan Angelo Borres, Senior Product Applications Engineer, and Christopher Macatangay, Senior Product Applications Engineer, Analog Devices

Abstract

Using functional safety (FS)-compliant components in designing safety-related systems (SRS) provides several advantages, but in most cases, standard ICs (not developed to an FS standard) are employed. Designers can still achieve compliance at the system level when using standard parts in functional safety designs, which matters because only a few existing ICs are rated for FS. For this reason, this third part of the series provides insights into the FS-critical features worth considering when selecting a supervisory circuit among the available standard solutions.

Introduction

Part 1 of this series demonstrated how diagnostics act as the backbone of functional safety (FS) compliance. It highlighted how supervisory circuits acting as diagnostic functions in a safety-related system (SRS) improve FS compliance through three key requirements: systematic capability, reliability prediction, and architectural constraints. Systematic capability evaluates the effectiveness of quality management throughout the entire lifecycle of a product or service, from conceptualization to decommissioning. It emphasizes the goal of using a rigorous development process to avoid the introduction of systematic failures as well as the need for diagnostics to control such failures. Reliability predictions, on the other hand, indicate how likely the SRS is to fail over time. Finally, architectural constraints describe the trade-off between hardware fault tolerance (HFT), the ability to tolerate failures, and safe failure fraction (SFF), the tendency of the safety function to fail in a safe state. Diagnostics are a major contributor here as well.

Part 2 of this series then introduced a way to improve SRS designs using FS-compliant diagnostic functions. Several advantages were noted, including:

  1. Having their own failure mode, effects, and diagnostics analysis (FMEDA).
  2. Having integrated safety features scoping several diagnostic functions.
  3. Having their own diagnostics to detect on-chip random hardware failures.
  4. Being future-proof against IEC 61508’s upcoming revision.
  5. Considering other countries’ safety standards and directives.
  6. Easing FS assessment.

Despite these advantages, system designers may still opt to use non-FS-compliant components such as ADI’s FS-enabled and FS-evaluated parts.1 However, with the numerous supervisory circuit solutions available on the market, it may be difficult for them to select the right one. Thus, it is important to underscore which features of diagnostic functions are critical to FS, regardless of a component’s FS compliance. For this reason, this third part of the series enumerates four safety-critical features that are preferable for diagnostic functions when designing an SRS and explains how such features affect FS compliance.

Power Supply Monitoring

The basic FS standard IEC 61508:2010, specifically in its second part, has two requirements related to power supplies.2 The first requirement pertains to the diagnostic coverage (DC) and SFF as found in Table A.1, specifying the faults or failures to be assumed when quantifying the effect of random hardware failures or to consider when calculating the SFF, and Table A.9, recommending diagnostic test measures alongside the maximum achievable DC. The second requirement recommends techniques and measures for each safety integrity level (SIL) for controlling systematic failures as shown in Table A.15 to Table A.17. For instance, Table A.16 indicates a diagnostic measure pertaining to measures against voltage breakdowns, variations, overvoltage (OV), undervoltage (UV), and others.

According to IEC 61508-2 Table A.1, a 60% (low) diagnostic coverage can be claimed if stuck-at faults are assumed when quantifying the effect of random hardware failures, whereas 99% (high) can be claimed if a DC fault model and/or drift and oscillation are assumed. Stuck-at faults refer to faults that can be described with continuous zeroes (low signal) or ones (high signal) at the pins of an element. A DC fault model includes failure modes such as stuck-at faults, stuck-open, open, or high impedance outputs as well as short circuits between signal lines.
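The relationship between failure rates, DC, and SFF can be illustrated with a short calculation. The following is a minimal sketch, assuming the usual IEC 61508 definitions (DC as the detected share of dangerous failures, SFF as the share of safe plus detected dangerous failures) and the 60%/90%/99% boundaries for low, medium, and high coverage. The function names and the failure rates in the usage example are hypothetical and are not taken from any FMEDA.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """DC = detected dangerous failure rate / total dangerous failure rate."""
    total_dangerous = lambda_dd + lambda_du
    if total_dangerous <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / total_dangerous


def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (safe + dangerous detected) / (safe + all dangerous)."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


def dc_band(dc: float) -> str:
    """Map a DC value to the low/medium/high bands (60%/90%/99%)."""
    if dc >= 0.99:
        return "high"
    if dc >= 0.90:
        return "medium"
    if dc >= 0.60:
        return "low"
    return "none"


if __name__ == "__main__":
    # Hypothetical failure rates in FIT, chosen only for illustration.
    safe, dangerous_detected, dangerous_undetected = 50.0, 45.0, 5.0
    dc = diagnostic_coverage(dangerous_detected, dangerous_undetected)
    sff = safe_failure_fraction(safe, dangerous_detected, dangerous_undetected)
    print(f"DC = {dc:.1%} ({dc_band(dc)}), SFF = {sff:.1%}")
```

Moving failures from the undetected to the detected category, which is what a diagnostic function such as a supervisory circuit does, raises both figures.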

IEC 61508-2’s Table A.9 recommends diagnostic measures to detect or tolerate failures caused by a defect in the power supply. These range from OV protection with safety shut-off or switch-over to a second power unit, for which a low DC can be claimed, to power-down with safety shut-off or switch-over to a second power unit, for which a high DC can be claimed. This is shown in Table 1.

Table 1. Power Supply Recommended Diagnostic Measures According to IEC 61508-2 Table A.9

Diagnostic Measure                     Maximum Diagnostic Coverage Considered Achievable
OV protection with safety shut-off     Low
Voltage control (secondary)            High
Power-down with safety shut-off        High

These recommended diagnostic measures emphasize proper power supply monitoring, which must detect OV and/or UV conditions within the required time and provide a signal that triggers a safety shut-off of the system, whether through a power-down routine or a switch-over to a second unit.

Monitoring Accuracy

The first parameter of power supply monitoring when designing an OV/UV detection mechanism is related to tolerance window and threshold accuracy. A window supervisor’s tolerance, or tolerance window, sets the UV and OV thresholds as a percentage of the nominal value. For instance, for a window voltage supervisor with a nominal voltage value of 1 V and a tolerance window of ±3%, the UV threshold is set at 1 V × 0.97, and the OV threshold is set at 1 V × 1.03. These UV and OV thresholds, however, have their own tolerance specification, known as the threshold accuracy. Threshold accuracy specifies how far the monitor’s actual trip point can deviate from its nominal or ideal threshold. These parameters are shown in Figure 1.3


Figure 1. Power supply monitor accuracy and tolerance example from the MAX16193’s typical application.

Figure 1 shows an example of how a power supply monitor’s threshold accuracy affects an SRS design. Consider a field programmable gate array (FPGA) whose core supply voltage (VCORE) must stay between 1.07 V and 1.13 V to ensure correct operation. With a 1.1 V power supply output feeding the FPGA’s VCORE, a power supply monitor needs to assert the reset before the core supply voltage goes out of specification, that is, before it falls below 1.07 V or rises above 1.13 V. As shown in Figure 1, the OV and UV thresholds of the power supply monitor are set to a ±2.4% tolerance window. With the MAX16193’s threshold accuracy specification of ±0.3%, the monitor can trip as early as 1.1231 V for OV and 1.0769 V for UV. On the other hand, if the monitor accuracy is worse, for example ±1%, the monitor can trip at much earlier levels: 1.1151 V for OV and 1.0843 V for UV. Such a design not only requires more accurate power supplies but can also cause spurious trips. For this reason, a higher monitor threshold accuracy both relaxes the required power supply specifications and minimizes false triggers.
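The arithmetic behind these trip levels can be reproduced with a short script. This is a minimal sketch, not a datasheet calculation: the function names are hypothetical, and whether the accuracy percentage is applied relative to the nominal rail or relative to the threshold itself is an assumption that a given part’s data sheet defines. Both conventions are offered because the ±0.3% figures quoted above correspond to an error taken relative to the 1.1 V nominal, while the ±1% figures correspond to an error taken relative to the threshold.

```python
def trip_band(nominal_v, window_pct, accuracy_pct, relative_to="threshold"):
    """Return ((uv_min, uv_max), (ov_min, ov_max)) trip-point ranges in volts.

    relative_to selects what the accuracy percentage refers to (an
    assumption; check the data sheet of the actual monitor):
      "threshold" -> error is a percentage of each UV/OV threshold
      "nominal"   -> error is a percentage of the nominal rail voltage
    """
    uv_nom = nominal_v * (1 - window_pct / 100)
    ov_nom = nominal_v * (1 + window_pct / 100)
    if relative_to == "threshold":
        uv_err = uv_nom * accuracy_pct / 100
        ov_err = ov_nom * accuracy_pct / 100
    elif relative_to == "nominal":
        uv_err = ov_err = nominal_v * accuracy_pct / 100
    else:
        raise ValueError("relative_to must be 'threshold' or 'nominal'")
    return (uv_nom - uv_err, uv_nom + uv_err), (ov_nom - ov_err, ov_nom + ov_err)


def supply_window_mv(nominal_v, window_pct, accuracy_pct, relative_to="threshold"):
    """Width (mV) of the band in which the supply never trips the monitor."""
    (_, uv_max), (ov_min, _) = trip_band(
        nominal_v, window_pct, accuracy_pct, relative_to
    )
    return (ov_min - uv_max) * 1000


if __name__ == "__main__":
    for accuracy, convention in ((0.3, "nominal"), (1.0, "threshold")):
        uv, ov = trip_band(1.1, 2.4, accuracy, convention)
        width = supply_window_mv(1.1, 2.4, accuracy, convention)
        print(f"±{accuracy}%: earliest UV trip {uv[1]:.4f} V, "
              f"earliest OV trip {ov[0]:.4f} V, usable window {width:.1f} mV")
```

The usable window is the range the power supply output must stay within to avoid any trip. It shrinks as the monitor accuracy worsens, which is why a less accurate monitor demands a more accurate supply.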

Output Mechanism

After detecting OV and UV conditions, the next consideration is the output response of the power supply monitor. As shown in Figure 2, this may take the form of a signal that triggers a safety shut-off of the system, such as asserting a status signal to initiate a power-down routine or a switch-over to the second unit, or driving the gate of a transistor switch to disconnect the succeeding safety-critical circuits. TÜV SÜD advises that a reset of the microcontroller unit (MCU) is not appropriate after an OV event but only after a UV event if it does not persist.4 This is because an OV event can damage parts of an MCU without the damage being obvious, so additional measures are needed to address an OV event, such as a switch-off or a wider range of operation.

Figure 2. An example architecture of power supply monitoring with both fault status and gate drive output using the MAX6399.

Fault Status Signals

The first example of an output mechanism is the fault status signal, which communicates the state of the power supply or signal being monitored. As an example, Figure 2 shows that when an abnormality is detected in the 12 V input, the POK signal is asserted. This assertion tells the MCU that an abnormality has been detected, which calls for an MCU reset.

Gate Drive

Another example of an output mechanism is the gate drive, which controls a transistor switch for protection and isolation purposes. The gate drive output helps bring the system into a safe state by driving the transistor to connect the main power supply input to the downstream circuitry during normal conditions, or to disconnect it and protect the succeeding circuits during an OV event, as shown in Figure 2. For this reason, choosing the right MOSFET5 is as crucial as designing the gate drive parameters6 to achieve optimal performance.

Output Operation

Output operation is also a critical factor when designing for FS because it determines how the system responds once a failure occurs. There are two types: latched and non-latched operation. In latched mode, the output stays asserted until power is cycled, as in the MAX16126, or until a clear-latch signal is triggered, as in other supervisory circuits. In contrast, non-latched mode (that is, auto-retry mode) automatically de-asserts the output when the fault disappears.
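The difference between the two modes can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a model of any specific part: the class name, the update method, and the single clear input (standing in for either a power cycle or a clear-latch signal) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FaultOutput:
    """Behavioral model of a supervisor fault output (hypothetical)."""

    latched: bool           # True: latched mode, False: auto-retry mode
    asserted: bool = False  # current state of the fault output

    def update(self, fault_present: bool, clear: bool = False) -> bool:
        """Advance one sample and return the output state.

        clear stands in for a power cycle or a clear-latch signal.
        """
        if fault_present:
            self.asserted = True
        elif not self.latched or clear:
            # Auto-retry releases as soon as the fault is gone;
            # latched mode releases only on an explicit clear.
            self.asserted = False
        return self.asserted


if __name__ == "__main__":
    fault_sequence = [False, True, True, False, False]
    auto_retry = FaultOutput(latched=False)
    latching = FaultOutput(latched=True)
    print("auto-retry:", [auto_retry.update(f) for f in fault_sequence])
    print("latched:   ", [latching.update(f) for f in fault_sequence])
    print("latched after clear:", latching.update(False, clear=True))
```

In this model the latched output holds the system in its safe state after the fault disappears until something deliberately releases it, whereas the auto-retry output lets the system resume on its own.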

Other Considerations

Output topology and polarity are also important to FS because different options behave differently. For instance, an open-drain topology for the fault status signal can isolate the succeeding circuit from the voltage supervisor’s input power supply. In terms of polarity, using an active low reset signal for safety-critical functions such as brakes and emergency stops ensures that if a control signal fails, the system defaults to a safe state. An active high signal, by contrast, may become faulty during a power supply failure event.

On-Chip Diagnostics

On-chip diagnostics refers to the features of an IC that allow it to detect its own failures. This can be done by an IC’s internal safety mechanisms like the built-in self-test (BIST)7,8,9, which can be used to perform self-testing and periodic testing. For instance, the BIST feature of the MAX16138 can be initiated during power-up or normal operation. In turn, BIST allows the component to check its internal comparators or digital circuits automatically, which reduces the likelihood that random hardware failures go undetected, thus decreasing the device’s probability of dangerous failures.8

The basic FS standard (IEC 61508-2) states that the use of design for testability (DFT) techniques is recommended to prevent the introduction of faults during the design and development of SRS. As a type of DFT, BIST is a hardware structure that produces test data, applies it to the circuit under test (CUT), collects the output response, and verifies that the output is correct.9 This can be seen in Figure 3.7 Other industry-sector standards such as IEC 62566, IEC 60987, IEEE 379, IEEE 7-4.3.2, and NUREG/CR-7006 require similar testability measures as well. The NUREG/CR-7006 particularly suggests the inclusion of BIST to monitor the health of the FPGA system as BIST logic in instrumentation and control systems is used to monitor entire functions such as bus activity, the occurrence of erroneous data, and time-out circuits.9
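The BIST flow described above (produce test data, apply it to the CUT, collect the response, verify it) can be sketched in software. This is only an illustration under stated assumptions: the CUT is a hypothetical window comparator on an 8-bit code, the pattern generator is a simple 8-bit linear feedback shift register, and the response is compacted by counting ones, which is a deliberately simple stand-in for the signature registers used in hardware. None of this represents the BIST implementation of any particular IC.

```python
from typing import Callable, Iterator


def lfsr_patterns(seed: int, count: int) -> Iterator[int]:
    """Yield `count` 8-bit pseudo-random test patterns from a Fibonacci LFSR."""
    if not 0 < seed <= 0xFF:
        raise ValueError("seed must be a nonzero 8-bit value")
    state = seed
    for _ in range(count):
        yield state
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (feedback << 7)


def window_comparator(code: int) -> int:
    """Fault-free CUT (hypothetical): 1 when the code is inside the window."""
    return 1 if 0x40 <= code <= 0xBF else 0


def stuck_at_0(code: int) -> int:
    """Faulty CUT whose output is stuck low."""
    return 0


def stuck_at_1(code: int) -> int:
    """Faulty CUT whose output is stuck high."""
    return 1


def run_bist(cut: Callable[[int], int], seed: int = 0x80, count: int = 64) -> int:
    """Apply the patterns to the CUT and compact the responses (ones count)."""
    return sum(cut(pattern) for pattern in lfsr_patterns(seed, count))


# Reference result, obtained once from the fault-free model.
GOLDEN = run_bist(window_comparator)


def bist_pass(cut: Callable[[int], int]) -> bool:
    """Verify the compacted response against the stored reference."""
    return run_bist(cut) == GOLDEN


if __name__ == "__main__":
    for name, cut in (("fault-free", window_comparator),
                      ("stuck-at-0", stuck_at_0),
                      ("stuck-at-1", stuck_at_1)):
        print(f"{name}: {'PASS' if bist_pass(cut) else 'FAIL'}")
```

With this seed the first patterns are 0x80, 0x40, and 0x20, so the fault-free comparator produces both ones and zeros, and either stuck-at fault changes the compacted result and is flagged.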

Figure 3. A BIST circuitry block diagram.

On-chip diagnostics can also be in the form of a RAM test, Flash CRC check, or output tests. When performed at each start-up of the system, these tests are important as they are used for argumentation on diagnostic measures or for fault detection time in the safety analysis.4 Thus, any information regarding the operation of such diagnostic measures shall be specified in the safety manual.

Therefore, a component having on-chip diagnostics such as BIST improves component- and system-level implementation as well as enhances FS compliance.

Watchdog Timer Architecture

With the prevalence of microcontroller usage in power supplies and SRS, another safety feature worth noting for high performance voltage supervisors is the watchdog timer (WDT) feature. History has shown how WDTs serve as the system’s failsafe or the last line of defense when all else fails, especially when using an MCU in safety-critical applications.

Watchdog timers fall under the basic FS standard IEC 61508’s requirement on program sequence monitoring as a diagnostic measure to control systematic failures caused by hardware design and environmental stress or influences. This is shown in IEC 61508-2:2010 Table A.15 and Table A.16, which also list WDTs as highly recommended regardless of safety integrity level and diagnostic coverage.

As opposed to the built-in WDT feature of a microcontroller, an external or independent WDT, which is typically in the form of a supervisor IC, is desired to eliminate common cause failures and simultaneously ensure that the safety function will still trigger if the MCU fails. An example of a WDT implementation in an SRS is shown in Figure 4. This is also consistent with the view of TÜV SÜD in their discussion on watchdogs and microcontrollers.4

Figure 4. An example WDT implementation in an SRS using the MAX16058.

Table 2 and Table 3 show the IEC 61508-2’s Table A.10 and Table A.11 with the maximum diagnostic coverage considered achievable per type of watchdog—with separate time base and with or without time window—when used for a program sequence and clock, respectively. This means that a simple watchdog timer can claim a maximum DC of 60% to 90% when diagnosing failures of a program sequence as well as a clock. On the other hand, a windowed watchdog can claim 90% to 99% when used for a program sequence and can be at least 99% for a clock. Such claims are useful when doing the FMEDA where a DC is a component for the safety analysis and metrics calculations.

Table 2. According to IEC 61508-2’s Table A.10—Program Sequence (Watchdog)

Diagnostic Technique/Measure                                           Maximum DC Considered Achievable
Watchdog with separate time base without time window                   Low
Watchdog with separate time base and time window                       Medium
Logical monitoring of program sequence                                 Medium
Combination of temporal and logical monitoring of program sequences    High
Temporal monitoring with online check                                  Medium

Table 3. According to IEC 61508-2’s Table A.11—Clock

Diagnostic Technique/Measure                                           Maximum DC Considered Achievable
Watchdog with separate time base without time window                   Low
Watchdog with separate time base and time window                       High
Logical monitoring of program sequence                                 Medium
Temporal and logical monitoring of program sequences                   High
Temporal monitoring with online check                                  Medium

Depending on the watchdog timer architecture10 and application, as shown in Table 2 and Table 3, the maximum claimable diagnostic coverage varies. Notably, windowed watchdog timers cover more faults11 than simple (nonwindowed) ones, resulting in a higher claimable DC for the windowed architecture.
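The extra fault coverage of a windowed watchdog can be seen in a small simulation. This is a minimal sketch with hypothetical names and timing values: a kick that arrives after the timeout is flagged as late by both architectures, while a kick that arrives before the window opens is flagged as early only by the windowed one. As a simplification, the timer simply restarts at every kick instead of modeling the reset that a real watchdog output would trigger.

```python
def watchdog_faults(kick_times_ms, timeout_ms, window_open_ms=0.0):
    """Return a list of (time_ms, reason) faults for a sequence of kicks.

    window_open_ms = 0 models a simple (nonwindowed) watchdog; a positive
    value models a windowed watchdog that rejects kicks arriving sooner
    than window_open_ms after the previous one. Timing values are
    hypothetical and not taken from any data sheet.
    """
    faults = []
    last_kick = 0.0
    for t in kick_times_ms:
        elapsed = t - last_kick
        if elapsed > timeout_ms:
            # The timer expired before this kick arrived.
            faults.append((last_kick + timeout_ms, "late"))
        elif elapsed < window_open_ms:
            # The kick arrived before the window opened.
            faults.append((t, "early"))
        last_kick = t  # simplification: timer restarts at every kick
    return faults


if __name__ == "__main__":
    healthy = [50, 100, 150]   # regular kicks every 50 ms
    runaway = [5, 10, 15]      # program stuck in a loop that keeps kicking
    stalled = [50, 250]        # program hangs for 200 ms

    for name, kicks in (("healthy", healthy), ("runaway", runaway),
                        ("stalled", stalled)):
        simple = watchdog_faults(kicks, timeout_ms=100)
        windowed = watchdog_faults(kicks, timeout_ms=100, window_open_ms=20)
        print(f"{name}: simple={simple} windowed={windowed}")
```

In this example the runaway sequence passes the simple watchdog unnoticed but produces three early faults on the windowed one, while the stalled sequence is caught by both.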

Conclusion

The main goal of this article is to provide insights into several supervisor IC features that are critical when designing safety-related systems regardless of the IC’s FS compliance. This article discussed the importance of power supply monitoring accuracy and output mechanism, on-chip diagnostics, and watchdog architecture in the context of FS standards and an external assessor’s point of view. Furthermore, corresponding examples of architectures as well as design considerations for each diagnostic measure needed in the design of an SRS were provided. Stay tuned for the next part of this series as we discuss how to use watchdog timers as part of program sequence monitoring.

About the Authors

Bryan Angelo Borres is a TÜV SÜD-certified functional safety engineer who currently works on several industrial functional safety product development projects. As a senior power applications engineer, he helps system integrators design functionally safe power architectures that comply with industrial functional safety standards such as the IEC 61508. Recently, he became a member of the IEC National Committee of the Philippines to IEC TC65/SC65A and IEEE Functional Safety Standards Committee. Bryan has a postgraduate diploma in power electronics and around seven years of extensive experience in designing efficient and robust power electronics systems.

Christopher Macatangay is a senior product applications engineer in the Multimarket Power–East Business Unit. He is currently supporting HPS products with functional safety features and new product development. He joined Analog Devices, Inc. as a product applications engineer in 2015. Before his tenure at ADI, Christopher gained six years of experience as a test/product development engineer at a power supply company. He holds a bachelor’s degree in electronics and communications engineering from Adamson University.