
The latest accomplishments of artificial intelligence (AI) powered by gigascale cloud computing systems continue to astonish ordinary consumers and computer scientists alike. AI can write a business report, create a clothing retailer’s fashion shoot, or discover a new molecule. This is now accepted as a simple fact of life, even though as recently as 2020 it would have seemed like science fiction.
The potential for transformative computing ability is as salient for embedded device manufacturers as it is for consumers using cloud AI services today. Imagine devices that can respond in real time to your voice, predict when your machinery needs maintenance before it breaks down, or analyze your environment instantly – all without sending data to the cloud.
The competition for consumer dollars will be won by the manufacturers that adapt fastest, harnessing rapidly advancing AI technology to provide compelling new user experiences in edge AI applications such as portable and wearable devices.
But to realize the ambitious visions for AI that are currently being pursued by the leading lights of the consumer device industry, the embedded world is going to have to master the challenge of scaling the power consumption, cost, and size of AI hardware and software to fit highly constrained resources.
Microcontrollers (MCUs) seem certain to be at the heart of whatever form embedded devices take in this exciting new era of embedded edge AI. But the MCU industry first needs to radically retool the architecture of its products to provide the performance that new AI-enabled devices will need while staying within the tight power budget of tomorrow’s battery-powered products.
Why AI is essential at the edge
The amount of data generated by edge devices is growing exponentially, driving an urgent need for on-device AI processing. The global datasphere, which includes all data created worldwide, is expected to skyrocket, reaching 150 zettabytes (ZB) by 2025 and quadrupling to roughly 600 ZB by 2030. With billions of devices (from wearables to industrial IoT sensors) coming online, the sheer volume of data will overwhelm traditional cloud infrastructures.
The more data is produced, the more essential the role that AI and machine learning (ML) play in interpreting it to produce meaningful insights – and, in turn, the more data is generated on the basis of those insights. A subset of edge data is also used to train improved data-generating models. This data-driving-data feedback loop is a major factor in the datasphere’s exponential growth.
This flood of data produced and analyzed at the edge is now forcing a shift in how embedded systems work: instead of shipping everything to the cloud, devices must process data locally.
Four reasons for performing AI inferencing at the edge
Enterprise and smartphone-based AI systems typically perform many AI processes in the cloud. But for embedded devices that have constrained resources, local processing offers four big advantages:
- Reduced latency for real-time response: applications such as predictive maintenance or health monitoring cannot afford the lag that comes with cloud processing. For example, a health device detecting abnormal heart rhythms needs immediate action, which local processing can enable.
- Bandwidth efficiency: the constant transmission of vast volumes of data to the cloud would strain networks and become unsustainably expensive, especially as the volume of data grows. Local processing reduces bandwidth demands by handling data on-device.
- Privacy and security: certain applications, such as health monitoring or surveillance, require that sensitive data remain on-device. Local AI operation allows data to be processed without ever leaving the device, bolstering privacy.
- Energy efficiency: communicating data between the device and the cloud consumes substantial energy, draining an edge device’s battery. Processing at the edge conserves energy, enabling battery-powered devices to operate substantially longer between charges.
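The bandwidth argument above can be made concrete with a rough back-of-envelope calculation. All of the figures below (sample rate, event size, reporting interval) are illustrative assumptions, not measurements of any real product:

```python
# Illustrative comparison of streaming raw sensor data to the cloud versus
# transmitting only on-device inference results. All figures are assumptions
# chosen for the sake of the arithmetic.

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: an always-on microphone at 16 kHz, 16-bit mono, streamed raw.
raw_bytes_per_day = 16_000 * 2 * SECONDS_PER_DAY

# Assumption: on-device inference sends only one 64-byte event per minute.
event_bytes_per_day = 64 * (SECONDS_PER_DAY // 60)

reduction = raw_bytes_per_day / event_bytes_per_day

print(f"Cloud streaming:     {raw_bytes_per_day / 1e9:.2f} GB/day")
print(f"On-device events:    {event_bytes_per_day / 1e3:.1f} kB/day")
print(f"Bandwidth reduction: {reduction:,.0f}x")
```

Under these assumptions, local inference cuts the radio traffic by several orders of magnitude – which also feeds directly into the energy-efficiency point, since the radio is typically one of the largest power consumers in an edge device.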
The combination of these benefits is why edge AI is seeing rapid adoption across industries. In consumer electronics, it enables real-time language translation on wearable devices. In industrial IoT, it empowers sensors to monitor machinery health, signaling maintenance needs before a breakdown. In healthcare, wearable devices are monitoring vital signs locally, transforming reactive care into proactive insights (see Figure 1).

For certain high-stakes or real-time applications, local AI processing isn’t just an improvement – it’s a necessity. Applications such as health monitoring, predictive maintenance, and interactive consumer devices cannot achieve their full capabilities if they rely on cloud processing alone.
For instance, wearable health devices that detect heart arrhythmias or sudden changes in vital signs need to analyze data instantly to alert users to potential health risks in real time. Any delay from cloud processing could make the difference between early intervention and a missed warning. Similarly, industrial IoT sensors used for predictive maintenance on factory equipment must detect anomalies immediately to prevent costly shutdowns and equipment damage. Sending data to the cloud would add a level of delay that could undermine this proactive approach.
In consumer applications, augmented reality (AR) glasses are another example. For these devices to deliver seamless, on-demand information about the environment – such as identifying landmarks or translating text in real time – they need processing power on-device, without reliance on the cloud, to ensure that interactions are quick and natural.
These examples underscore why local AI is not a nice-to-have but an essential capability for edge devices. Edge processing is the only way to meet the demands of real-time, secure, and energy-efficient data handling, sparking a major shift in how we think about data processing.
What edge AI can do – key applications and model types
The desire to bring these innovations to the edge has led to a surge in the development of ML model optimizations. Techniques such as model pruning, compression, and quantization all reduce the size and computational footprint of ML models while largely preserving their accuracy. Dedicated models are being tailored to the needs of specific edge applications across different industries:
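Of the techniques just mentioned, quantization is the simplest to illustrate. The sketch below shows symmetric per-tensor post-training int8 quantization in its most basic form; production toolchains add per-channel scales, calibration data, and operator fusion on top of this idea:

```python
# Minimal sketch of symmetric post-training int8 quantization: store weights
# as int8 plus one float scale, cutting storage 4x versus float32.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 with a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for comparison."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(128, 64)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"Storage: {w.nbytes} B float32 -> {q.nbytes} B int8")
print(f"Max round-trip error: {np.abs(w - w_hat).max():.6f}")
```

The round-trip error is bounded by half the scale step, which is why accuracy is largely preserved for well-conditioned weight distributions; pruning and compression attack the footprint from other directions.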
- Image classification, segmentation and object detection: Convolutional Neural Networks (CNNs) excel at analyzing visual data, making them ideal for applications such as facial recognition in smart home devices or object detection in AR glasses, as well as gesture and posture recognition.
- Generative AI and Natural Language Processing (NLP): transformers, originally designed for large-scale language tasks, are now being refined into specialized, lightweight Small Language Models (SLMs) that can run directly on edge devices. These SLMs power voice commands, real-time translation, and voice-activated interfaces, giving users more intuitive, seamless interactions with their devices.
- Predictive Maintenance and Anomaly Detection: Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models excel at handling time-series data. Industrial machinery and vehicles are equipped with edge sensors that monitor vibration, temperature, and pressure data. These models can use that data to detect anomalies and alert operators to maintenance needs before failures occur (see Figure 2).
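The anomaly-detection pattern in the last bullet can be sketched without a full RNN/LSTM. The toy detector below is a deliberately simplified stand-in (a rolling z-score over synthetic vibration readings), but it shares the deployment shape that matters at the edge: tiny state, sample-by-sample updates, no cloud round trip:

```python
# Simplified stand-in for edge time-series anomaly detection: flag samples
# that deviate strongly from a sliding-window baseline. A real system would
# use a learned model (e.g. LSTM), but the on-device update loop is similar.
from collections import deque
import math

class RollingAnomalyDetector:
    """Flags samples whose z-score against a sliding window exceeds a threshold."""
    def __init__(self, window: int = 50, threshold: float = 4.0):
        self.buf = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x: float) -> bool:
        anomalous = False
        if len(self.buf) == self.buf.maxlen:  # wait until the window is full
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = math.sqrt(var) or 1e-9
            anomalous = abs(x - mean) / std > self.threshold
        self.buf.append(x)
        return anomalous

det = RollingAnomalyDetector()
# Synthetic steady vibration signal with one injected spike at t = 120.
readings = [math.sin(0.1 * t) for t in range(200)]
readings[120] += 5.0

alarms = [t for t, x in enumerate(readings) if det.update(x)]
print("Anomalies flagged at:", alarms)
```

The detector keeps only a 50-sample buffer and two running statistics per update – the kind of footprint that fits comfortably on an MCU and alerts the operator the moment the spike arrives.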

Some capabilities found in cloud-connected devices are out of reach of embedded devices – but manufacturers are impatient to implement them. This impatience is particularly acute in the case of Large Language Models (LLMs) such as those behind ChatGPT. LLMs have sparked interest in natural language processing and in the potential for new, intuitive voice interfaces at the edge: MCUs in edge devices will sooner or later need to be capable of supporting this technology.
In the consumer world, manufacturers see broad potential for AI in environments previously unthinkable for AI data processing, such as wearable devices. In smart glasses, for instance, AI could be transformative, providing the wearer with ‘killer apps’ such as real-time translation of speech or text, natural-language voice interpretation, enhanced search functions, or context-aware AI assistance related to objects in the field of view.
So the question is, how are embedded device manufacturers to scale up the performance of their next generation of products to meet the needs of future AI technology, without also scaling up power consumption and cost?
What this means for MCU design: adapting to the demands of edge AI
The experience of edge embedded device manufacturers today shows that current MCUs’ capabilities fall short of the requirements of even today’s ML models, let alone their future evolution. The potential to add value, particularly to wearable audio/visual devices, is going unrealized because the systems-on-chip (SoCs) currently on the market are too power-hungry and too large, and lack crucial compute resources.
This is not to say that there is no scope to implement AI on edge MCUs today: MCUs that are optimized for AI functionality, including the Ensemble and Balletto family products from Alif Semiconductor, are enabling valuable breakthroughs in the deployment of voice, image, and motion analysis processing, particularly using CNNs (see Figure 3).
But there is a rich array of new capabilities that transformer-based SLMs and emerging model types, including Graph Neural Networks (GNNs) and Spiking Neural Networks (SNNs), will enable – capabilities that are currently beyond the scope of today’s AI-oriented MCUs.

The next generation of MCUs will need to scale up hardware capabilities in three areas. First, they will need to integrate specialized neural-network compute capability in the form of a neural processing unit (NPU). Future embedded devices will need to offer performance of more than 1 TOPS while consuming so little power that they can be deployed in products such as earbuds with very small batteries while still providing for day-long use between charges.
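Rough arithmetic shows how demanding that first requirement is. The battery capacity, usage hours, and energy share below are illustrative assumptions, not the specifications of any actual device:

```python
# Illustrative power-budget arithmetic for a 1 TOPS NPU in an earbud.
# All figures are assumptions chosen for the sake of the calculation.

battery_mah = 60       # assumed earbud battery capacity
battery_v = 3.7        # nominal Li-ion cell voltage
hours_of_use = 16      # "day-long" use between charges
npu_share = 0.25       # assumed fraction of the energy budget given to the NPU

battery_mwh = battery_mah * battery_v        # total stored energy in mWh
avg_power_mw = battery_mwh / hours_of_use    # total average power budget
npu_power_mw = avg_power_mw * npu_share      # the NPU's slice of that budget

# Efficiency required to sustain 1 TOPS inside that slice:
required_tops_per_w = 1.0 / (npu_power_mw / 1000.0)

print(f"Total average power budget: {avg_power_mw:.1f} mW")
print(f"NPU power budget:           {npu_power_mw:.1f} mW")
print(f"Required NPU efficiency:    {required_tops_per_w:.0f} TOPS/W")
```

In practice inference is heavily duty-cycled, so the sustained requirement is far lower than this worst case – but the calculation makes clear why TOPS-per-watt efficiency, not peak TOPS, is the binding constraint for battery-powered edge AI.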
Second, the new edge AI systems running models such as SLMs will have much bigger memory requirements, both on- and off-chip. This suggests that MCU manufacturers will need to introduce faster and lower-power interfaces between the MCU and external memories. The architecture of tomorrow’s MCU will also need to evolve to provide pipelines fast enough to support the operation of bigger and faster memories.
Third, the familiar trend of integrating more and more functionality into the MCU needs to intensify, to enable the implementation of sensor-rich AI-enabled systems in an edge node’s small footprint.
A new future: compute embedded in the fabric of people’s everyday lives
The opportunity for the MCU world is not only about increasing its footprint in one or two new categories of edge devices: we are potentially seeing a tectonic shift in demand for compute capability, moving from CPU- and GPU-powered hubs to a world in which most people’s interaction with the digital domain is via an MCU.
In this new, more distributed compute environment, AI-driven functions will be implemented seamlessly and with low latency at the edge, in devices such as smart glasses, or earbuds with enhanced audio. They will provide a more natural user experience in which technology is embedded in the fabric of the user’s life, rather than constantly drawing their attention to their smartphone or PC – a world in which embedded devices take center stage.
The potential impact is profound: as more tasks are delegated to smart devices that are embedded in the environment and which can communicate with each other – whether health monitoring, energy optimization, security, or other applications – our reliance on smartphone apps and PC-based internet services will diminish. This diffused, decentralized intelligence challenges the monopoly that centralized devices and services currently hold. It would allow us to interact with technology in a more seamless, personalized, and less intrusive way, improving our personal and professional quality of life.
And at the center of this world will be a new generation of MCUs based on architectures optimized for the new technologies of AI and machine learning.