Design for reliability reduces life cycle costs of industrial edge devices

Author : Steve Ward, Emerson

12 May 2022

When sourcing industrial edge devices, cost is usually one of the most significant selection criteria. However, for those involved in the procurement process, it is important to look at costs holistically, including not only the unit purchase price, but also integration and deployment costs, and product support and maintenance costs over the device's life cycle.

In addition, secondary, indirect costs associated with loss of service in the event of a device failure must be carefully considered. Depending on the application, a selection based solely on the lowest unit price and ignoring life cycle costs could result in costs associated with loss of revenue, recalls, early re-qualification and replacement, and legal penalties that are orders of magnitude higher. Understanding what influences the reliability and life cycle of industrial edge devices is critical to ensure the selection criteria are expanded to include reliability and availability over time.
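The holistic cost comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a costing method from the article: the `DeviceOption` fields, the `life_cycle_cost` function and every number are hypothetical assumptions chosen only to show how a lower unit price can be outweighed by expected failure costs.

```python
from dataclasses import dataclass


@dataclass
class DeviceOption:
    """Hypothetical cost inputs for one candidate edge device."""
    unit_price: float            # purchase price per device
    integration_cost: float      # one-off integration and deployment cost
    annual_support_cost: float   # support and maintenance per year
    annual_failure_rate: float   # expected failures per device per year
    cost_per_failure: float      # downtime, scrap and service cost per failure


def life_cycle_cost(option: DeviceOption, years: int) -> float:
    """Sum direct costs and expected failure costs over the service life."""
    direct = option.unit_price + option.integration_cost
    recurring = years * option.annual_support_cost
    expected_failures = years * option.annual_failure_rate
    return direct + recurring + expected_failures * option.cost_per_failure


# Illustrative numbers only; they are assumptions, not data from the article.
low_price = DeviceOption(800.0, 500.0, 100.0, 0.10, 20_000.0)
high_reliability = DeviceOption(1_200.0, 500.0, 100.0, 0.01, 20_000.0)

cost_low = life_cycle_cost(low_price, years=10)
cost_high = life_cycle_cost(high_reliability, years=10)
```

With these assumed inputs, the device with the higher unit price has the lower cost over ten years because the expected failure cost dominates. Real figures depend entirely on the application.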

Design for reliability

Design for reliability is the practice of increasing product reliability by design, focusing on maximising the life of the product, minimising service requirements, increasing uptime and the intervals between re-qualification, and reducing productivity losses. These goals are achieved by increasing mechanical robustness, reducing operating temperatures, overprovisioning components, using long-life circuitry and higher-reliability parts, avoiding components with known shorter life cycles, maximising assembly quality and implementing solid configuration control processes.

At first glance, this concept sounds expensive, but in practice, the cost to produce a very reliable product designed to provide a long service life is relatively small and more than justifies the additional investment when considered over the lifetime of the device.

Operating temperature

It is well known that heat is an enemy of electronics. The life expectancy of most electronic components, such as semiconductors, capacitors, PCBs and solder joints, decreases exponentially as operating temperature increases. Therefore, when designing an edge device, it is critical to reduce the operating temperature of the electronic components as much as possible. This is best achieved by optimising the heat removal paths (increasing heat conductivity) all the way to the heat sources (semiconductor die). For example, Emerson avoids the use of sockets, and all components, including CPUs and memories, are soldered. This helps to create a conductive thermal path, not only to the CPU, but also to memory devices, chipset components and DC/DC converters. As a result, the industrial edge devices typically operate 10°C cooler (Fig 1), which increases component life expectancy.
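The exponential relationship between temperature and component life can be illustrated with a widely quoted rule of thumb: life expectancy roughly doubles for every 10°C reduction in operating temperature. This rule is an assumption introduced here for illustration and does not come from the article. It is most often quoted for electrolytic capacitors, and the function name `life_multiplier` is hypothetical.

```python
def life_multiplier(delta_t_celsius: float, doubling_interval: float = 10.0) -> float:
    """Relative life expectancy after lowering temperature by delta_t_celsius.

    Assumes the rule of thumb that life doubles for every `doubling_interval`
    degrees Celsius of cooling; real components follow their datasheet curves.
    """
    return 2.0 ** (delta_t_celsius / doubling_interval)


multiplier_10c = life_multiplier(10.0)  # 2.0 under the assumed rule
multiplier_5c = life_multiplier(5.0)    # about 1.41 under the assumed rule
```

Under this assumption, a 10°C reduction would correspond to roughly double the component life. Actual gains depend on each component's datasheet.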

Mechanical robustness

Mechanical robustness, including resistance to shock and vibration, is typically associated with applications in harsh environments. However, over the course of 10-20 years, even in benign or controlled environments, mechanical robustness plays a significant role. Consider the medical CT scanner or semiconductor processing machine. First, all devices within these machines are exposed to shock and vibration during transportation. During periodic machine maintenance, the industrial edge device may be removed and reinstalled, with cables and connectors unplugged and plugged. Thermal cycles during normal operation can result in mechanical expansion and contraction. Finally, there are chemical ageing processes, which play a significant role over the long lifetime of an edge device.

Industrial edge devices must be designed with a focus on cost-effective robustness. For example, housings can be made of sturdy aluminium, and internal PCBs can be given extra mounting points to reduce flex and stress on the solder joints. The device should have all components soldered down to minimise the number of internal connectors and cables. Mechanical mounting support for plug-in PCIe cards can add extra stability and robustness.

Long-life circuitry

Electronic equipment life cycle theory shows that in the early part of a product’s life cycle, there is a higher failure rate due to component infancy failures and undetected manufacturing defects. Later in the life cycle, there is a gradual increase in failures due to a variety of ageing factors. Typically, these result from capacitors losing their capacitance, causing power supplies to either fail or drift out of specification, poor connection quality affecting high-speed signalling and semiconductor devices drifting out of specification.
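The failure-rate pattern described above is often called the bathtub curve. The sketch below models it as the sum of a falling early-life term, a constant term and a rising wear-out term, using Weibull hazard functions. All shape and scale values are illustrative assumptions rather than measured data, and the function names are hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t_years: float) -> float:
    """Sum of three hazard terms; every parameter is an illustrative assumption."""
    infant = weibull_hazard(t_years, shape=0.5, scale=50.0)    # falls over time
    constant = 0.01                                            # random failures
    wear_out = weibull_hazard(t_years, shape=4.0, scale=15.0)  # rises over time
    return infant + constant + wear_out


# Failure rate shortly after deployment, in mid-life and late in life.
early, middle, late = (bathtub_hazard(t) for t in (0.1, 5.0, 14.0))
```

With these assumed parameters, the failure rate is higher at both ends of the service life than in the middle.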

Class-leading industrial edge devices are designed to provide 10+ years of active service at their maximum operating temperature. This is achieved by derating and overprovisioning components, such as using capacitors with higher voltage margin, designing in extra capacitance, sizing DC/DC coils for extra current and using higher-performance FET switches. Additionally, higher-quality components are used, such as X7R capacitors, higher-quality crystals, and connectors from proven vendors. Industrial edge devices should be conductively cooled, avoiding the use of fans, which are one of the biggest reliability issues.

Figure 1. Operating temperature comparison

Computer technology choices

The choice of drive will impact the reliability and life of the device. Solid state drives (SSDs) are commonly used in industrial edge devices, benefiting from having no moving parts to wear out over time and withstanding greater levels of shock and vibration than rotating hard disks. Industrial SSDs that cope with extreme operating temperatures increase reliability in demanding applications.

Rotating hard disks can offer longer life where there are many small data writes, such as in some database applications.

A hybrid solution employing an SSD for the operating system and a rotating hard disk for the database may be used. Rotating hard disk reliability can be improved using redundant disks in a RAID array with removable drive cages to allow online replacement of failed disks. Error Correcting Code (ECC) memory provides additional data integrity, using additional storage bits to detect and correct single-bit errors. ECC is used in many demanding applications to guard against corrupted data and prevent random operating system crashes.
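The idea of using additional storage bits to detect and correct single-bit errors can be shown with a toy Hamming(7,4) code, which protects 4 data bits with 3 parity bits. This is a teaching sketch only. ECC memory modules use wider codes implemented in hardware, and the function names here are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position); position 0 means no error found."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3   # 1-based position of the flipped bit
    if error_pos:
        c[error_pos - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], error_pos


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
corrupted = list(stored)
corrupted[4] ^= 1                      # simulate a single bit flip in memory
recovered, position = hamming74_decode(corrupted)
```

Here `recovered` equals the original `word` and `position` identifies the flipped bit. A code of this kind corrects one flipped bit per codeword; two simultaneous flips would be miscorrected, which is why practical schemes add further parity to detect double-bit errors.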

Assembly quality and test

Even with a robust mechanical design, the use of high-quality components, derating, overprovisioning and reducing operating temperature, a long service life would not be achieved if the product assembly quality and production tests were poor. Industrial edge device vendors must ensure every device they manufacture undergoes an extensive functional test, which not only checks if the connection between two components is established, but also if high-speed links are functional and bit error rates are within expected limits. Every unit must undergo a thermal test, which measures thermal conductivity between the processor die and external heatsink. Test results and logs should be stored for future reference.

Configuration control

The design for reliability must encompass all phases of the product’s life cycle, including configuration control. Electronic components frequently change or become obsolete, forcing the use of alternatives and the need for design changes. Component churn can quickly get out of control, resulting in two edge devices that are seemingly the same, having slightly different functional behaviours. For certain industries, such as semiconductor manufacturing, it is critical to prevent this product variation because even seemingly benign design changes can have noticeable effects on the process controlled by the edge device.

For industries that require stricter change control, Copy Exact is a process whereby all delivered edge devices are exact copies of the original units used for qualification. Should there be any unavoidable component changes, the customer is informed and the amended device is first tested and approved by the customer before being implemented, with clear tracking of the IPC version provided.
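One way to picture the check behind such a process is a comparison between the component list of the qualified unit and that of a unit about to be delivered. The sketch below is a hypothetical illustration, not a description of any vendor's tooling. The function `bom_deviations`, the reference designators and the part numbers are all invented for the example.

```python
def bom_deviations(qualified: dict[str, str], delivered: dict[str, str]) -> dict[str, tuple]:
    """Map each differing reference designator to (qualified, delivered) parts."""
    deviations = {}
    for ref in sorted(qualified.keys() | delivered.keys()):
        before, after = qualified.get(ref), delivered.get(ref)
        if before != after:
            deviations[ref] = (before, after)
    return deviations


# Hypothetical bills of materials keyed by reference designator.
qualified_bom = {"C12": "CAP-X7R-10UF-A", "U3": "DCDC-5V-A"}
delivered_bom = {"C12": "CAP-X7R-10UF-B", "U3": "DCDC-5V-A"}

changes = bom_deviations(qualified_bom, delivered_bom)
```

An empty result would indicate an exact copy at the component level. Any entry in `changes` is a deviation that would need customer notification and approval under the process described above.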

Modular

One of the fundamental design decisions for industrial edge devices is whether to use a modular or monolithic approach. Both offer pros and cons. For example, a monolithic design, with the motherboard containing the CPU, chipset, memory, all I/O controllers, interface circuitry and connectors, allows for a lower manufacturing and unit cost. However, by dividing the functionality between the main CPU and I/O modules, a modular approach provides a number of reliability benefits and also supports reduced life cycle costs.

In terms of reliability, a modular design offers the opportunity to decouple high-temperature components from low-power/low-temperature components, resulting in a lower operating temperature. It also enables industrial edge devices to withstand high levels of shock and vibration. Perhaps more critically, modular designs decouple the CPU and I/O life cycles, helping to reduce life cycle management effort and cost.

With rapidly changing consumer demand, modern manufacturing needs to be able to adjust production lines rapidly. That requires industrial edge devices to be cost-effectively updated, preventing the need for complete retrofits to take advantage of advancing technology and functionality that will help meet changing production demands. 

Companies are also striving to improve their maintenance strategies, including spare part management. Wherever possible, companies want to minimise their inventory without affecting the speed at which they can respond to part failures or required system configuration changes.

A monolithic design is based around a specific generation of CPU, so upgrading to a different CPU requires replacing the whole board, and any customisation for a specific customer requirement means a complete design change, along with validation testing of a complete motherboard.

Adopting a modular approach helps to future-proof the industrial edge device by enabling new components to be added to an existing device without having to redesign and approve an entirely new control unit. As technology progresses or greater functionality is required, components can be easily replaced.

Conclusion

There is a wide selection of industrial edge devices available on the market, which seemingly all have the same performance characteristics (CPU, memory, interfaces, temperature range).

However, not all edge devices are created equal, with some designed for greater reliability and long service life. Choosing the lowest-price edge device may save a few euros during procurement, but could result in millions of euros of productivity loss, service costs and material scrap costs, not to mention the lost opportunity costs of an engineering team being continuously distracted by debugging and service tasks.

Selecting an edge device explicitly designed for 10+ years of service life ensures the end user can remain focused on the tasks that really matter.

