The electronics industry has grown tremendously over the last 50 years, and performance and cost advances have enabled capabilities beyond what anyone could have imagined in the early semiconductor days.
The integrated circuit (IC), which recently turned 50 years old, is unquestionably one of the greatest inventions of all time. This technology has transformed our world and is now ubiquitous. However, ICs have vulnerabilities that can affect long-term reliability, leading to premature failure of the systems that employ them.
These vulnerabilities are especially critical to the defense industry, since most equipment used by the military contains ICs. But the military is only one user that requires long-term reliability. Industrial control systems such as programmable logic controllers (PLCs) are expected to operate in harsh environments and run uninterrupted for decades. These systems run our infrastructure and provide energy, clean water, and sanitation, as well as goods and services to millions of people.
The automotive industry has used ICs in cars almost as long as they’ve been available, in engine controllers, anti-lock braking systems, chassis controls, transmissions, and driver comfort and assistance systems. Many of these systems (such as air bags) are critical to occupant safety, so reliability is paramount. Moreover, because automobile manufacturers produce millions of cars a year, the economic impact of a recall due to the premature failure of a component could be devastating.
There are many excellent reasons to pursue reliable ICs. But why do they fail? What can be done to improve reliability, and at what cost?
Why Do ICs Fail?
The process of making ICs sheds some light on the factors that can cause premature failures. The majority of ICs start with a base wafer of pure silicon, which is fabricated through various methods (Czochralski pulling, horizontal gradient freeze, etc.). The purity of the crystal structure is extremely critical to the performance of the circuits built on top, so an epitaxial layer of pure silicon is often deposited on the wafer.
Throughout the process of building layers, impurities are introduced (by doping or ion implantation, for example) to change the electrical properties, and heat is used either to diffuse atoms or to anneal the wafer to fix dislocations in the crystal. Once fabrication is complete, all of these impurities, oxides, and metal traces need to stay in place. However, thermal motion can move them, just as it did during fabrication. For this reason, elevated temperatures degrade ICs over time, and temperature is part of the reliability equation.
Another phenomenon is metal migration, often referred to as electromigration. This is the movement of metal atoms in a conductor due to the electron wind caused by current flow. Devices with high current densities are more susceptible to this problem, and as IC geometries shrink, the current densities in the conductive traces increase. The resulting metal migration can cause short or open connections, depending on where in the circuit the metal moves and how much.
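The scaling argument can be illustrated with a small calculation. This is only a sketch: the trace dimensions and current below are hypothetical values chosen for illustration, and it assumes the trace carries the same current after its cross-section shrinks.

```python
def current_density_a_per_cm2(current_a, width_um, thickness_um):
    """Return current density j = I / (w * t) in A/cm^2.

    Width and thickness are given in micrometers and converted to centimeters.
    """
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return current_a / area_cm2


# Hypothetical trace dimensions, not taken from any real process node.
CURRENT_A = 1e-3  # assumed 1 mA through the trace in both cases

older = current_density_a_per_cm2(CURRENT_A, width_um=1.0, thickness_um=0.5)
newer = current_density_a_per_cm2(CURRENT_A, width_um=0.5, thickness_um=0.25)

print(f"older trace: {older:.3g} A/cm^2")
print(f"newer trace: {newer:.3g} A/cm^2")
print(f"increase:    {newer / older:.1f}x")
```

Halving both the width and the thickness cuts the cross-sectional area to one quarter, so the same current produces four times the current density.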
Modern computer tools used for the design of ICs take this effect into account and can correctly size and space conductors to minimize this type of failure. The effect is real, though, and continues for as long as the circuit is in operation.
To estimate the mean time to failure (MTTF) of an IC due to electromigration, James Black of Motorola developed an empirical model in 1969 now referred to as Black’s equation:
$$\mathrm{MTTF} = \frac{A}{j^{n}} \exp\left(\frac{Q}{kT}\right)$$
where A is a constant, j is the current density, n is the model parameter, Q is the activation energy, k is Boltzmann’s constant, and T is the absolute temperature.
Black’s equation is not a physical model, but an empirical construct for estimating the MTTF due to electromigration: its parameters are fitted to data acquired at elevated temperatures, and the result is then extrapolated to the nominal operating temperature. It’s flexible enough to take into account materials and electrical stress (current density), but doesn’t account for other failure mechanisms. Due to this limitation, high MTTF values are suspect and can’t accurately predict failure rates, but they do provide some indication of electromigration failures.
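As a rough sketch of how such an extrapolation might be carried out, the following Python computes the ratio of MTTF at use conditions to MTTF at stress conditions using Black’s equation, MTTF = A / j^n * exp(Q / (kT)). The constant A cancels in the ratio. The function names are made up for this example, and the exponent, activation energy, current densities, and temperatures are assumed values for illustration only; real values come from characterization data for a specific process.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV/K


def black_mttf(a, j, n, q_ev, temp_k):
    """Black's equation: MTTF = A / j**n * exp(Q / (k * T)).

    temp_k must be an absolute temperature in kelvins.
    """
    return a / j**n * math.exp(q_ev / (BOLTZMANN_EV_PER_K * temp_k))


def acceleration_factor(j_use, j_stress, temp_use_k, temp_stress_k, n, q_ev):
    """Return MTTF(use) / MTTF(stress); the constant A cancels out."""
    current_term = (j_stress / j_use) ** n
    thermal_term = math.exp(
        (q_ev / BOLTZMANN_EV_PER_K) * (1.0 / temp_use_k - 1.0 / temp_stress_k)
    )
    return current_term * thermal_term


# Assumed illustration values, not from the article or any datasheet.
N_EXPONENT = 2.0  # assumed current-density exponent n
Q_EV = 0.7        # assumed activation energy Q in eV

af = acceleration_factor(
    j_use=1.0e6, j_stress=2.0e6,              # current densities in A/cm^2
    temp_use_k=358.15, temp_stress_k=473.15,  # 85 °C use, 200 °C stress
    n=N_EXPONENT, q_ev=Q_EV,
)
print(f"Acceleration factor (use vs. stress): {af:.0f}")
```

Multiplying an MTTF measured under stress by this factor gives the extrapolated MTTF at use conditions, subject to the caveat above that only electromigration is modeled.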
A strange issue that has plagued the electronics industry since the very beginning is tin whisker growth. Tin (chemical symbol Sn) has been used as a solder and interconnect coating for electronic components since the days of electron tubes. With pure (or almost pure) tin, small whisker-like hairs of metal grow perpendicular to the surface and can extend many millimeters, easily reaching an adjacent connection. This phenomenon occurs even in the absence of electric fields and during long-term storage.
Adding a very small percentage of lead (Pb) to the solder significantly slowed down the growth process. But given the push to remove lead from consumer electronics, this issue is back in focus for both the military and companies that produce lead-free solders.
ICs are also susceptible to various types of ionizing radiation, such as gamma rays or X-rays that strip electrons from the atoms of the semiconductor material (ionization), and to high-speed particles (including subatomic particles such as neutrons) with enough energy to displace the atoms themselves. These effects can occur anywhere radiation is present. However, exposure increases with altitude, either because the atmosphere thins (fewer atoms to interact with before the radiation reaches the IC) or because the device moves beyond the magnetosphere, where the Earth’s magnetic field deflects high-energy charged particles flowing from the sun (and other cosmic sources). Radiation can lead to temporary malfunctions or permanent damage to the IC, depending on the energy level and time of exposure.
Lastly, there’s the packaging used to protect and interconnect the IC. Packages can be very complex, varying as widely as the ICs they contain. They’re typically built around a conductive alloy lead frame that holds the IC die (or dies), with some mechanism for connecting the die to the lead frame (flip-chip direct die attach or metal wires). The lead-frame assemblies are encapsulated with various materials, such as epoxy or ceramic, to protect the assembly and provide mechanical stability.
Packages can have many different failure modes, including a loss of hermeticity (allowing contamination of the die), mechanical failures such as delamination or cracking, and many others.
Reliability Options
To ensure uniform quality, the defense industry created various standards in the 1970s. The Joint Army Navy (JAN) task force introduced the military general standard 38510 (MIL-M-38510), along with standard test method 883 (MIL-STD-883). To qualify, manufacturers had to submit device candidates to the Military Parts Control Advisory Group (MPCAG). Once approved, they were listed as a qualified source for defense industry designers and were subject to periodic audits to confirm compliance. The additional testing and screening increased costs substantially, but guaranteed the performance and reliability required for harsh military environments.
In the 1980s, amid an increased number of military system failures, semiconductor manufacturers requested that the government improve the process. In 1986, the U.S. Secretary of Defense announced the Standardized Military Drawing (SMD) program for microcircuits, based on previous work done by the Defense Electronics Supply Center (DESC) in the late 1970s. This new program added various requirements to existing documents and renegotiated with manufacturers for compliance. Today, the SMD program is used throughout the defense industry and provides a uniform and cost-effective way to procure high-reliability devices.
As the automotive industry began using more ICs, quality and reliability became a greater issue. In the early 1990s, Chrysler, Ford, and Delco Electronics (a division of General Motors) formed the Automotive Electronics Council (AEC). The AEC developed several standards, most notably AEC-Q100, which specified stress-test qualification for ICs. The Q100 specification had similarities to its military cousins, but was created with the economics of the automobile industry in mind. Today, Q100-qualified components are available from most IC manufacturers and are an excellent alternative to standard industrial devices for improved reliability at reasonable cost.
Enhanced (Plastic) Products
Early in the 21st century, the military and defense industry began trying to lower the cost of electronic systems. They approached manufacturers about how to provide high-quality and reliable components that could still operate in harsh environments, but at a lower cost.
The table compares various semiconductor device qualifications.
Texas Instruments, along with other suppliers, responded with families of high-reliability plastic devices called enhanced product (EP) or enhanced plastic (see figure). For instance, the TPS7A4001-EP is the enhanced version of the commercial TPS7A4001 high-input-voltage low-dropout (LDO) regulator. The EP version has similar specifications to the commercial device, but provides an extended military temperature range (−55°C to +125°C), gold-bond wires, and no pure tin to mitigate whisker growth.
The EP product family also adds traceability and several other reliability enhancements, such as improved die attach, to mitigate delamination. Devices follow a similar flow to Q100, but use a controlled baseline that ensures more uniform performance between wafer lots. Other suppliers provide EP products that follow similar flows, with the same concept of providing high reliability economically.
Another major issue that the defense industry faces is obsolescence. Because of how the government orders systems, a product may stay in production for more than 20 years. Many consumer products have production life spans of six months to a few years (like cellphones). If a component is no longer available following a production run of several million units, it’s less of an issue. However, a military system can be extremely complex, and redesigning because of obsolescence can be costly.
Because EP products follow a single flow, EP devices have a much longer life span and can remain available even when the commercial device no longer is. So while the performance of EP devices is enhanced, so is the availability.
The push for EP products is driven not only by the military, but by the industrial markets as well. Many industrial systems must work in harsh environments and have reliability measured in tens of years. Applications such as power plants, water-treatment facilities, manufacturing sites, and many others rely on computerized controls that operate continuously and often require a system shutdown to be replaced. These systems can also benefit from the added reliability of EP devices, especially where long service life is required.
The Future of High Reliability
With space becoming more accessible, especially low Earth orbit (LEO), the cost of satellites (specifically smaller payloads such as CubeSats) is a prime concern. Both the defense industry and commercial space systems providers are looking for an EP-like product for short-mission-life applications (such as LEO). So imagine a family similar to EP but with a known radiation tolerance: enter space EP (SEP).
As shown in the figure, SEP lies between the QMLQ (Qualified Manufacturer List, class Q: military) and QMLV (Qualified Manufacturer List, class V: space) quality levels, but uses plastic packaging. It’s intended to be radiation-tested and assured, but removes expensive burn-in and wafer-lot life tests to control cost. With many LEO applications lasting three years or less, SEP is an economical alternative to more costly, fully space-qualified QMLV devices, which are intended for longer missions or higher radiation levels. SEP fills the gap for expendable lower-cost satellite and nonstrategic radiation applications, such as high-altitude unmanned aerial vehicles (UAVs).
Conclusion
Without reliable electronics, our world would look much different. Since the invention of the IC, the semiconductor industry has made great strides in improving the reliability of even the most commonplace devices. As we move further into the 21st century, the demand for additional levels of performance and reliability will continue to expand supplier portfolios with enhanced products that are both economical and highly reliable.
Space will drive the next level of reliability, adding radiation and high-altitude requirements. Semiconductor processes continue to move to the ultra-small, while failure mechanisms must still be mitigated at the atomic scale. It’s a never-ending battle between cost and performance; component families such as EP and SEP will fill the required reliability-versus-cost gaps.