3D Sound for a Virtual World

By Jon Gabay for Mouser Electronics

This blog covers 3D sound (3D audio), immersive audio, surround-sound technology, and the immersive audio experience of Dolby Atmos.

Introduction

The world is constantly changing. This evolving state means the way we interact with the world, equipment, and people is also changing. Other than profit motives, two forces are dictating this change. One is that technology is pushing us forward and enticing us with must-have devices and apps. The other driving force is pandemic-driven concern for health.

With more time on people’s hands to be entertained, immersive technology has had an opportunity to make inroads. Virtual- and augmented-reality headsets are now readily available from several manufacturers. Development tools let hobbyists and professionals render worlds and scenescapes. While the visuals get all the press, the audio is moving forward as well.

Of the five acknowledged senses, hearing is perhaps the most appreciated. Whether it comes from the richness of nature, the fullness of an orchestra, or the stirring of spiritual music, sound profoundly impacts us emotionally and physiologically, as much as any other sensory input.

Getting Here

We exist today in large part because of our focused ability to listen. For example, our ancestors who hunted would use sound to track and find food for a family or village. Using sound to locate potentially lethal sources in a 3D, immersive way made avoiding predators possible. We could sense where something was, how far away it was, and how fast it was approaching.

Immersive audio is the most advanced audio-processing and audio-delivery system yet. Getting it right requires audio processing and an array of well-characterized and well-calibrated woofers, speakers, and tweeters strategically placed to provide a dynamic full-spectrum listening sensation.

Although implementations like this have been beyond the average audiophile of the past, movie theaters and performance venues have taken advantage of this technology for years. Like all pioneered technology, it eventually finds its way to the rest of the world. 

Modern entertainment systems utilize many specialized filters and dynamic processing to create affordable implementations that fit more budgets. More home theaters exist today than ever before, especially in a pandemic world, and immersive audio is sure to be in game and home theater locations everywhere.

Although stereo allows a basic surround-sound capability, the most popular surround sound and 3D audio in use today is 5.1 technology, exemplified by Dolby Digital 5.1. Systems of this kind are sold as Dolby Digital, Dolby Pro Logic II, DTS, SDDS, and THX. They all feature a six-speaker configuration (five full-bandwidth speakers and one subwoofer) surrounding the listener(s) (Figure 1). These surround-sound technologies were first used in movie theaters, which helped advance these systems and make them more cost-effective and available to the masses.

A 5.1 home theatre setup with a subwoofer, centre speaker, 2 front speakers, and 2 back speakers.

Figure 1: Surround-sound 5.1 uses five full-range speakers placed at specific locations so that the audio process engineer can mix down audio that spatially seems to move around the listener. Woofer placement is less critical because the subwoofer can typically be placed anywhere. (Source: Zern Liew/Shutterstock.com)

Multiple speakers are driven with unique individual audio streams so that the perceived location of virtual sound surrounds the listener. Here, rear left and right channels are used for spatial depth. The front left, right, and center channels are used for lateral depth, and a single subwoofer distributes the low-frequency bass for the entire room.
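
One common way to make a sound appear between two adjacent speakers is constant-power (sine/cosine) panning. The sketch below is only an illustration of that idea. The function name and the 0-to-1 position scale are assumptions made for this example, and none of the surround formats named above is claimed to work exactly this way.

```python
import math


def constant_power_pan(position):
    """Return (gain_a, gain_b) for a source between two adjacent speakers.

    position: 0.0 places the sound fully at speaker A, 1.0 fully at speaker B.
    The squared gains always sum to 1, so perceived loudness stays roughly
    constant while the virtual source moves.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


# A source halfway between the two speakers gets equal gain in both.
gain_a, gain_b = constant_power_pan(0.5)
```

Sweeping the position from 0.0 to 1.0 over time moves the virtual source from one speaker to the other, which is the basic building block of audio in motion.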

Although the setup is ideal for a single listener centrally located in the listening zone (or on the couch), listeners seated anywhere else will experience slight differences. Still, the relatively homogeneous sound will let everyone in the listening zone experience audio in motion. What’s more, recording artists are advertising their latest CDs as immersive by providing 5.1 surround-sound tracks.

Interestingly, the center-front channel is optimized for speech-range signals. This helps listeners discern conversations while immersed in 3D sound. As wider bandwidth and fuller sound became popular, distinguishing speech became more challenging, so center-channel filtering and amplification can make conversations easier to understand.

The addition of one more rear-center channel ups the specification to 6.1 surround-sound (Figure 2), and 7.1 standard systems eliminate the rear-center channel but add left and right mid-channels (Figure 3).

A 6.1 home theatre setup with a subwoofer, centre speaker, rear centre speaker, 2 front speakers, and 2 side speakers.

Figure 2: Surround-sound 6.1 provides lateral speakers to enhance the audio in motion as an audio object moves from front to side and back. Again, woofer placement is arbitrary. Here, it’s not about the bass. (Source: Zern Liew/Shutterstock.com)

The 7.1 surround-sound technology adds more speakers and unique channels. The resulting 2.5D cube or polygon can be extended with more speakers, tweeters, and woofers at strategic locations, immersing the listener in 2D and limited 3D audio (Figure 3). Sound coming from directly above or below can be approximated through signal processing, but it will never be perfect unless speakers are actually placed above and below the listener.

A 7.1 home theatre setup with a subwoofer, centre speaker, 2 front speakers, 2 side speakers, and 2 back speakers.

Figure 3: More speakers placed at smaller angles help eliminate audio hotspots that can occur, especially if the tracks aren’t mixed or processed correctly, or if the audio converters don’t process the surround sound properly. (Source: Zern Liew/Shutterstock.com)

We should note that source converters can process a stereo audio signal to create synthesized multi-speaker surround-sound signals. This demonstrates how digital signal processing can, for the most part, separate source locations from a stereo source. The best solution would be to capture sound in a 3D microphone configuration and then play it back in the same 3D speaker configuration. However, this is cumbersome and difficult, and most will not go to these lengths when signal processing makes a good approximation.
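
As a rough illustration of how extra channels can be derived from stereo, the sketch below uses a simple passive sum/difference matrix: content shared by both channels feeds a center channel, and content that differs between them feeds a surround channel. This is a deliberately minimal stand-in, not the method used by any particular converter, which typically adds far more elaborate steering and filtering. The function name and the dictionary layout are assumptions made for this example.

```python
import math


def passive_matrix_upmix(left, right):
    """Derive center and surround feeds from two lists of stereo samples.

    center   = (L + R) / sqrt(2): what the two channels have in common.
    surround = (L - R) / sqrt(2): what differs between the two channels.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    scale = 1.0 / math.sqrt(2.0)
    center = [(l + r) * scale for l, r in zip(left, right)]
    surround = [(l - r) * scale for l, r in zip(left, right)]
    return {
        "left": list(left),
        "right": list(right),
        "center": center,
        "surround": surround,
    }


# A voice mixed identically into both channels lands in the center feed
# and leaves the surround feed silent.
voice = [0.5, -0.25, 0.1]
channels = passive_matrix_upmix(voice, voice)
```

Ambience and reverberation tend to differ between the left and right channels, so in this sketch they end up mostly in the surround feed.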

Is this always the best approach? Can signal processing fool our keenly developed sense of hearing using fewer speakers, or will we continue to create walls and ceilings of sound?

young guitar player and speakers 3d background

Figure 4: More is better. Performers are used to exorbitant numbers of speakers and amplifiers. In large outdoor settings, it might be necessary. But do you really want walls of sound? Or at some point, do you realize that better sound is better than louder sound? (Source: tommistock/Shutterstock.com)

Object-Oriented Audio

The most up-to-date implementation of immersive audio comes from Dolby Atmos, which was designed for theater applications. So far, almost 5,000 theaters have been retrofitted to use 64 speakers to take advantage of this latest listening experience. Atmos supports an extensive array of up to 128 channels and can be fitted with full-bandwidth speakers, low-frequency woofers and subwoofers, and high-frequency tweeters.

Unlike regular audio, Atmos (and the competing Sony 360 standards) uses the concept of audio objects. An Audio-Visual Receiver (AVR) will automatically know the number of speakers, their type, and their location, and it will perform processing on each audio object’s spectral makeup, amplitude, location, speed, and direction. However, it is not just audio. The objects contain metadata that helps an Object Audio Renderer (OAR) put the object in motion. Of the 128 channels, ten are used for ambient stems, and the other 118 are available for audio objects.

Not every channel is a speaker. Channel information corresponds to objects, and object audio can be processed and combined with other object audio to be directed to each speaker at the appropriate level. It is up to the AVR to use the metadata to mix and distribute the sound in real time.
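
To make the idea of object-based rendering more concrete, here is a minimal sketch in which an audio object carries position metadata and a renderer turns that position into one gain per speaker. It uses plain inverse-distance weighting with power normalization, which is an assumption chosen for simplicity and not Dolby's rendering algorithm. The class, the function, the speaker names, and the coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """An audio object with hypothetical position metadata (metres)."""
    name: str
    position: tuple  # (x, y, z), listener at the origin
    level: float = 1.0


def render_gains(obj, speakers, min_distance=0.1):
    """Return {speaker_name: gain} using inverse-distance weighting.

    Speakers closer to the object get more of its signal. Gains are
    normalized so the total power equals obj.level ** 2.
    """
    weights = {}
    for name, speaker_pos in speakers.items():
        distance = max(math.dist(obj.position, speaker_pos), min_distance)
        weights[name] = 1.0 / distance
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {name: obj.level * w / norm for name, w in weights.items()}


# Hypothetical room layout: four floor speakers and one ceiling speaker.
speakers = {
    "front_left": (-2.0, 2.0, 0.0),
    "front_right": (2.0, 2.0, 0.0),
    "rear_left": (-2.0, -2.0, 0.0),
    "rear_right": (2.0, -2.0, 0.0),
    "ceiling": (0.0, 0.0, 2.5),
}
helicopter = AudioObject("helicopter", position=(0.0, 0.0, 2.0))
gains = render_gains(helicopter, speakers)
```

Because the object stores a position and not a fixed speaker assignment, the same content can be rendered on a different speaker layout by passing a different `speakers` dictionary, and recomputing the gains as the position changes puts the object in motion.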

As you can imagine, it is not like stereo, where you simply place a couple of speakers and are ready to listen. With Atmos and many surround-sound and 3D sound systems, speakers must be placed and then calibrated to be an accurate part of the soundscape. The average home will not use all 128 channels. For home theater implementations, the specification tops out at a 34-speaker arrangement.

Atmos is not brand-new. It was first used in 2012 in a theater in Los Angeles for a Disney movie premiere. Since then, large theaters, IMAX, planetariums, musicals, plays, and other sound applications have propelled it into the de facto standard used to capture audio for new movies and events. Atmos also uses ceiling speakers to create a full hemisphere of sound, making it easier to process in real time while providing sound from above.

At one time, Atmos was much too elaborate and expensive for the average audiophile, but it is now moving into the realm of got-to-have for enthusiasts who have the space and budget to wow their friends. It is also rather high on the gee-whiz index.

If you have already bitten the bullet for other surround-sound technologies, you can get a Dolby Atmos converter and still use your existing speakers and amplifiers. However, you will want more, including ceiling speakers. Converters will take Dolby 5.1 and convert it for a 17-speaker surround-sound 7.4.1 implementation.

It is worth noting that an alternative approach to surround speakers is the soundbar. Soundbar technology in various forms is gaining popularity. The obvious benefits of cost reduction, setup simplicity, lower power, fewer cables, and smaller size drive this technology forward, even as we drive forward.

Phased-array vertical soundbars have demonstrated their ability to emulate a full audio spectrum with good clarity and separation. Musicians who use them will tell you that soundbar columns with six-inch speakers produce an 18-inch speaker’s sound clarity for subwoofer applications. That should turn a few heads. As a result, horizontal soundbars and soundbar-based hybrid systems (including remote speakers) are popular for many home theaters and studios.

Up-, down-, and sideways-pointing speakers enhance this effect by reflecting sound off wall and ceiling surfaces so that it appears to come from above or behind the listener. The Tesla Model 3 uses front soundbar technology as part of its 15-speaker audio system to tout surround and immersive audio capability. To see what it can do, turn off a Model 3’s rear speakers and engage immersive audio mode, which uses signal processing and reverb. Those who’ve tried this swear sound is coming from behind. Feedback is mixed, though: reviewers both praise and criticize the technology, many listeners don’t like the effect, and many reviewers note that some types of music work with soundbar-style immersive implementations while others don’t. This makes sense because the reproduced quality depends on the recording engineers’ mix-down techniques. Advances here will mean that accurate above-and-below immersive sound is almost achievable without floor and ceiling speakers.

Capturing vs. Rendering

Immersive video experiences such as gaming, for the most part, use created environments. These are 3D structures with surface renderings and assigned physical properties. Real video swaths can be captured and digitally stitched together to make a panoramic view that includes the above and below imagery.

An immersive experience such as a walk through a national park can integrate rich visuals, and its audio can be synthesized or created through a composite of pre-recorded clips. Alternatively, the sounds can be captured with a 3D sound system and used as part of the immersive experience. Like the video, the audio must be controlled by head tracking. For example, facing a babbling brook will sound much different than facing away from it, and if the sound didn’t track, the immersive experience would be lacking.
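
The core of head-tracked audio is a change of reference frame: the renderer needs the direction of each sound relative to where the listener is currently facing. The sketch below shows that single step for horizontal rotation only. The function name and the angle convention (degrees, positive to the listener's right) are assumptions made for this example, not part of any particular headset SDK.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a source relative to the listener's nose, in degrees.

    Both inputs are measured in the same fixed world frame. The result is
    wrapped into the range (-180, 180], where 0 means straight ahead,
    positive means to the right, and 180 means directly behind.
    """
    relative = (source_azimuth_deg - head_yaw_deg) % 360.0
    if relative > 180.0:
        relative -= 360.0
    return relative


brook_direction = 0.0  # the brook sits straight ahead in the world frame

facing_brook = relative_azimuth(brook_direction, head_yaw_deg=0.0)
turned_right = relative_azimuth(brook_direction, head_yaw_deg=90.0)
facing_away = relative_azimuth(brook_direction, head_yaw_deg=180.0)
```

With this convention, turning the head 90 degrees to the right moves the brook to the listener's left, and turning around places it directly behind. An audio engine would feed the resulting angle into whatever panning or spatialization stage it uses.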

Fortunately, you don’t have to invent your own 3D audio capture for immersive purposes. Audio leaders like Sennheiser make specialty, truly omnidirectional microphones that use segmented axes and digital tools to capture highly directional sound (Figure 5). The AMBEO VR Mic contains several sensitive wideband microphone elements in a surround-sound configuration. The DearVR microprocessing software can render directional audio to feed a standard surround-sound configuration.

Figure 5: Immersive Audio Capture technologies like the Sennheiser AMBEO VR Mic allow digital audio engines to render soundscapes based on polar magnitude and direction orientations. Digital summing can create composite audio combining multiple sources of sound at different distances. (Source: Sennheiser)

For this to work, the audio engine needs to know your head orientation and motion. With a headset, that’s easy by today’s standards. Head tracking is built in for video rendering. But how do you create an immersive head position-based audio system using headphones that are limited to two ears? Tiny speakers can be placed in the headphones around the ears and mimic the surround-sound experience. For most applications, stereo will suffice, but it will not be at the same level as authentic surround sound.
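
One of the cues that headphone processing can exploit with only two ears is the interaural time difference (ITD), the small delay between a sound reaching the nearer ear and the farther ear. The sketch below estimates it with Woodworth's spherical-head approximation. The default head radius and speed of sound are typical textbook values assumed for this example, and the function name is hypothetical. A complete binaural renderer would need level and spectral cues as well.

```python
import math


def interaural_time_difference(azimuth_deg, head_radius_m=0.0875,
                               speed_of_sound_m_s=343.0):
    """Estimate the ITD in seconds for a distant source.

    Uses Woodworth's approximation ITD = (a / c) * (theta + sin(theta)),
    valid for azimuths from -90 (full left) to +90 (full right) degrees.
    A positive result means the sound reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between -90 and 90")
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_m_s) * (theta + math.sin(theta))


itd_front = interaural_time_difference(0.0)
itd_right = interaural_time_difference(90.0)
```

With these assumed values, a source directly in front gives no delay, and a source fully to one side gives a delay of roughly two-thirds of a millisecond. Delaying one headphone channel by the computed amount is one way to shift the perceived direction of a sound.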

Non-Entertainment Applications

While most immersive audio and visual technology will be used for entertainment, there are also professional uses. For example, product design engineering can benefit from immersive technology, both video and audio. From a video perspective, the mechanical design of complex assemblies can be virtually constructed, rendered, and examined. An immersively generated assembly like a jet engine can be constructed, pushed into, and examined to see if gears and turbines align. A repair technician on the other side of the world can be shown what to do by a factory expert immersed in a fabricated environment.

Even immersive audio can be helpful in engineering applications. An engineering team designing a car can listen to a rendered simulation of engine and transmission noise. Internal environmental controls like airflow, vibrations, and oscillations can be extracted from a virtual design. Windows can be designed and tested to eliminate the thumping oscillations that still occur on new cars when we roll the windows down in just the right position at just the right speed.

In all cases, an immersive experience includes audio. However, not every case requires surround sound, and simulated surround may be adequate, at least until someone solves the problem of creating true surround sound in a binaural headset.

The article is republished with the permission of Mouser Electronics.