Alexa Voice Service: New Reference Design for STM32 Embedded Systems

A blog by STMicroelectronics

ST just launched its qualified Alexa Voice Service for AWS IoT reference design. Its single-chip approach makes it a cost-effective and highly integrated platform for natural language applications on embedded systems. ST’s board uses a simple PCB and an STM32H743. It also includes two MP23DB01HP MEMS microphones, a Wi-Fi module, an audio amplifier, and a speaker. The platform thus removes major hardware hurdles for engineers looking to build smart products. And since the Amazon qualification also extends to the software stack, teams that choose the ST reference design also benefit from features and protocol implementations that help them bring their final product to market faster.

Doing the Hard Work For You

Put simply, the ST reference design is a far-field device that meets Amazon’s strict requirements. For instance, the system must recognize and process a user’s voice across different ambient noise levels. Distance is also a critical factor: the product must work even when the user speaks to it from three to four meters away (roughly 10 to 13 feet). We also had to achieve low false rejection rates, or false negatives, and low false acceptance rates, or false positives. An overly strict system rejects too many genuine requests, while excessively loose settings let it trigger too often by mistake. The fact that ST received the Amazon qualification thus means that we solved these significant challenges, and more, so our customers wouldn’t have to deal with them. Let’s look at the difficulties engineers might face if they tried to build such a system from scratch.
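To make the trade-off between false rejections and false acceptances concrete, here is a minimal sketch. The scores, labels, and thresholds are made-up illustration data, not measurements from the ST reference design, and the function name is hypothetical.

```python
def rejection_and_acceptance_rates(scores, is_wake_word, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) for a detector
    that fires whenever score >= threshold."""
    positives = [s for s, w in zip(scores, is_wake_word) if w]
    negatives = [s for s, w in zip(scores, is_wake_word) if not w]
    # Genuine wake words that the detector missed.
    false_rejections = sum(1 for s in positives if s < threshold)
    # Other sounds that the detector wrongly accepted.
    false_acceptances = sum(1 for s in negatives if s >= threshold)
    return (false_rejections / len(positives),
            false_acceptances / len(negatives))


# Made-up detector confidence scores (assumption, for illustration only).
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
is_wake_word = [True, True, True, True, False, False, False, False]

strict = rejection_and_acceptance_rates(scores, is_wake_word, threshold=0.90)
loose = rejection_and_acceptance_rates(scores, is_wake_word, threshold=0.25)
print("strict threshold (FRR, FAR):", strict)
print("loose threshold  (FRR, FAR):", loose)
```

With this toy data, the strict threshold misses most genuine requests but never triggers by mistake, while the loose threshold does the opposite. A qualified design has to keep both rates low at the same time.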

Alexa Voice Service: The Hardware Challenges, from MCU and Wi-Fi to Microphones

When the MCU, Memory, and Wireless Stack All Work For You

The ample computational throughput and extensive memory capabilities of the STM32H743 mean that developers can run the audio front-end processing, the local Alexa wake word detection, the full connectivity stack, and the audio playback layer without needing external memory or a discrete DSP. The PCB is also simpler, and the whole bill of materials is far more cost-effective. We also have a Wi-Fi module that teams can reuse, but we know that companies may choose a component they qualified internally. Hence, to make our platform as flexible as possible, we used our Wi-Fi module in bypass mode. As a result, most of the software governing wireless interactions runs on the STM32. Engineers can thus easily switch Wi-Fi devices, use different drivers, and expect to get the Amazon qualification quickly.
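The idea of keeping the wireless software on the host so that the Wi-Fi device can be swapped can be sketched as a narrow driver interface. This is only an illustration of the general pattern: every class and method name below is hypothetical and does not come from the ST software package.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class WifiDriver(ABC):
    """Hypothetical minimal interface a Wi-Fi device driver would expose."""

    @abstractmethod
    def send_frame(self, frame: bytes) -> None:
        ...

    @abstractmethod
    def receive_frame(self) -> Optional[bytes]:
        ...


class LoopbackDriver(WifiDriver):
    """Stand-in driver that echoes frames back; a real driver would talk
    to an actual Wi-Fi module."""

    def __init__(self) -> None:
        self._queue = deque()

    def send_frame(self, frame: bytes) -> None:
        self._queue.append(frame)

    def receive_frame(self) -> Optional[bytes]:
        return self._queue.popleft() if self._queue else None


class HostNetworkStack:
    """Stand-in for the protocol software that stays on the host MCU.
    It only depends on the WifiDriver interface, not on a specific module."""

    def __init__(self, driver: WifiDriver) -> None:
        self._driver = driver

    def ping(self, payload: bytes) -> bool:
        self._driver.send_frame(payload)
        return self._driver.receive_frame() == payload


stack = HostNetworkStack(LoopbackDriver())
link_ok = stack.ping(b"hello")
print("link ok:", link_ok)
```

Because the host-side stack only sees the interface, replacing the Wi-Fi device means writing a new driver class rather than touching the rest of the software.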

A Small Form Factor in a Modular Approach

A small smart home device with Alexa Voice Service built-in can fit in almost any smart embedded product, such as an appliance. The problem is that a smaller device makes it a lot harder to capture sound accurately. Indeed, when microphones are very close to each other, signal processing becomes far more challenging. The fact that we received the Amazon qualification, despite a space in between mics of only 36 millimeters, means that teams can enjoy a small form factor and benefit from the solutions we came up with to overcome this challenge. Designers can put two mics only 36 millimeters apart and still enjoy the same performance from our audio front end.
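A quick calculation shows why closely spaced microphones are harder to work with. The sketch below estimates the largest arrival-time difference between two mics 36 millimeters apart. The speed of sound and the 16 kHz sample rate are assumptions chosen for illustration; the source does not state the capture rate of the reference design.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at about 20 degrees C
SAMPLE_RATE_HZ = 16_000         # assumption: a common speech-capture rate


def max_inter_mic_delay_samples(spacing_m, sample_rate_hz=SAMPLE_RATE_HZ):
    """Largest arrival-time difference between two microphones, in samples.
    The maximum occurs when sound travels along the axis joining the mics."""
    delay_seconds = spacing_m / SPEED_OF_SOUND_M_PER_S
    return delay_seconds * sample_rate_hz


delay_36mm = max_inter_mic_delay_samples(0.036)
print(f"max delay at 36 mm: {delay_36mm:.2f} samples")
```

Under these assumptions, the two signals never differ by more than about two sample periods, so the audio front end has very little timing difference to exploit when it tries to tell where a voice comes from.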

ST also understands that some teams may simply want to reuse the audio capture hardware they already created. To be practical, a reference design must also be modular so designers can pick what’s most pertinent for them. Hence, we put the microphones and the FDA903D audio amplifier on a separate board. Engineers only interested in using our STM32 and the software implementation can grab the motherboard and replace the rest with their own design. The reference design also includes an extension board with a USB port for easier programming and debugging.

Alexa Voice Service: The Software Challenges, from Audio Processing to Acoustic Considerations

Audio Processing and Wake Word

The software stack of the reference design first implements the Alexa Voice Service for AWS IoT protocol, which ensures that customers rapidly connect to Amazon’s servers. However, before sending the signal from the microphones to the cloud, the platform must capture and clean the audio. To improve the system’s accuracy, we offer noise reduction, echo cancellation, and beam-forming algorithms so the system can adequately recognize the user’s voice, even if there’s quite a lot of ambient noise or if the speaker is far away.
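As a rough illustration of what beam-forming does, here is a minimal delay-and-sum sketch for two microphones. Delay-and-sum is a textbook technique used here as a stand-in; the source does not say which algorithms the ST audio front end uses. The signals, noise level, and one-sample delay are synthetic assumptions.

```python
import math
import random


def delay_and_sum(mic_a, mic_b, delay_samples):
    """Align mic_b to mic_a by an integer sample delay, then average."""
    out = []
    for n in range(len(mic_a)):
        m = n + delay_samples
        b = mic_b[m] if 0 <= m < len(mic_b) else 0.0
        out.append(0.5 * (mic_a[n] + b))
    return out


def mean_power(x):
    return sum(v * v for v in x) / len(x)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 2000
# Synthetic stand-in for the talker's voice.
speech = [math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
# Independent noise on each microphone.
noise_a = [rng.gauss(0.0, 0.5) for _ in range(N)]
noise_b = [rng.gauss(0.0, 0.5) for _ in range(N)]

mic_a = [speech[n] + noise_a[n] for n in range(N)]
# Assumption: the voice reaches mic_b one sample later than mic_a.
mic_b = [(speech[n - 1] if n >= 1 else 0.0) + noise_b[n] for n in range(N)]

beam = delay_and_sum(mic_a, mic_b, delay_samples=1)

# Compare residual noise, skipping the last sample where mic_b has no data.
valid = range(N - 1)
noise_single = mean_power([mic_a[n] - speech[n] for n in valid])
noise_beam = mean_power([beam[n] - speech[n] for n in valid])
print(f"noise power, one mic: {noise_single:.3f}")
print(f"noise power, beam:    {noise_beam:.3f}")
```

Once the two channels are aligned, the voice adds up coherently while the independent noise partly averages out, so the residual noise power of the combined signal is noticeably lower than that of a single microphone.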

The reference design also includes the Alexa wake word detection, which runs on the STM32 MCU under an evaluation license, with production licenses available from Amazon. Additionally, ST is licensing the full software reference design that runs on the STM32 MCUs. Hence, developing an application capable of waking the system when the user calls out “Alexa” is relatively straightforward. Engineers starting their first project have everything they need to develop a prototype and rapidly ship a product.

Not For Everyone

Despite all the tools and solutions we bring with our reference design, we understand that designing a cloud-based platform with Alexa capabilities remains complex. Even if engineers use our design as is, there are still significant acoustic hurdles, such as the placement of the microphones within the appliance or smart home product and the tuning of the speaker to ensure that it doesn’t interfere with the mics. It’s for this precise reason that we decided to limit our reference design to OEMs. This way, we ensure that we can offer them the support they need to get their product out to consumers faster. ST can thus assist them as they put the final acoustic touches that will make a world of difference.

Learn more about the ST reference design for smart home devices with Alexa built-in

For detailed information, visit the ST Blog.