
Shape-shifting smart speaker can mute specific spaces in your room

25 September 2023

A smart speaker equipped with deep-learning algorithms can selectively mute specific areas within a room, transforming the way we control and isolate audio in crowded environments.


The shape-changing smart speaker uses self-deploying microphones to divide rooms into speech zones and track the positions of individual speakers.

The ability to locate and control sound – isolating one person talking from a specific location in a crowded room, for instance – has challenged researchers, especially without visual cues from cameras.

The microphones, each about an inch in diameter, automatically deploy from, and then return to, a charging station, much like a fleet of Roombas. This allows the system to be moved between environments and set up automatically.

In a conference room meeting, for instance, such a system might be deployed instead of a central microphone, allowing better control of in-room audio.

“If I close my eyes and there are 10 people talking in a room, I have no idea who’s saying what and where they are in the room exactly. That’s extremely hard for the human brain to process. Until now, it’s also been difficult for technology,” says co-lead author Malek Itani, a doctoral student at the University of Washington.

“For the first time, using what we’re calling a robotic ‘acoustic swarm,’ we’re able to track the positions of multiple people talking in a room and separate their speech.”

Previous research on robot swarms has required the use of overhead or on-device cameras, projectors, or special surfaces. The team’s system is the first to accurately distribute a robot swarm using only sound.

The team’s prototype consists of seven small robots that spread themselves across tables of various sizes. As each robot moves away from the charger, it emits a high-frequency sound, much like a navigating bat, and uses this sound together with other sensors to avoid obstacles and move around without falling off the table.
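The article does not explain how the robots interpret their high-frequency sound. The basic principle behind bat-like navigation is echo ranging: a pulse travels to a surface and back, so the one-way distance is the speed of sound times the round-trip time, divided by two. The sketch below illustrates only that principle; the 343 m/s speed of sound, the 5 cm safety margin and the function names are assumptions for illustration, not details of the team's robots.

```python
# Sketch of bat-style echo ranging, assuming a simple pulse-echo scheme.
# This is NOT the team's navigation code; names and thresholds are hypothetical.
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def echo_distance_m(round_trip_s: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance to a reflecting surface, given the echo's round-trip time.

    The pulse travels out and back, so the one-way distance is half the path.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return c * round_trip_s / 2.0


def too_close(round_trip_s: float, margin_m: float = 0.05) -> bool:
    """Hypothetical avoidance rule: stop if a reflector is nearer than the margin."""
    return echo_distance_m(round_trip_s) < margin_m


# An echo that returns after 1 ms puts the reflector roughly 17 cm away.
print(f"{echo_distance_m(0.001):.3f} m")
print(too_close(0.0002))  # about 3.4 cm away, inside the 5 cm margin
```

A real robot also has to tell apart overlapping echoes from several surfaces and detect a table edge, which this single-echo sketch ignores.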

The automatic deployment allows the robots to place themselves for maximum accuracy, permitting greater sound control than if a person placed them by hand. The robots disperse as far from each other as possible, since greater distances make it easier to differentiate and locate people who are speaking.
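The article does not say how the robots decide where to stop. One standard heuristic for spreading points "as far from each other as possible" is greedy farthest-point placement: each new robot goes to the candidate spot whose distance to the nearest already-occupied spot (including the charger) is largest. The sketch below is that generic heuristic, not the team's algorithm; the rectangular obstacle-free tabletop, the grid of candidate spots, the central charger and the function names are all assumptions.

```python
import math

# Generic greedy farthest-point placement on a hypothetical rectangular table.
# Not the team's method; grid, charger position and names are illustrative.


def grid_points(width_m, depth_m, step_m):
    """Candidate positions on a rectangular tabletop, edges included."""
    nx = int(round(width_m / step_m)) + 1
    ny = int(round(depth_m / step_m)) + 1
    return [(i * step_m, j * step_m) for i in range(nx) for j in range(ny)]


def spread_out(candidates, n_robots, charger):
    """Place robots one at a time, each as far as possible from the charger
    and from every robot already placed (maximize the minimum distance)."""
    robots = []
    while len(robots) < n_robots:
        occupied = [charger] + robots
        best = max(candidates, key=lambda p: min(math.dist(p, q) for q in occupied))
        robots.append(best)
    return robots


# A 1 m x 1 m table with the charger in the middle: four robots take the corners.
spots = spread_out(grid_points(1.0, 1.0, 0.5), 4, (0.5, 0.5))
print(sorted(spots))
```

Greedy placement is cheap and easy to reason about, but it is only a heuristic; it does not guarantee the best possible spacing, and a physical system must also account for obstacles and each robot's actual path.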

Today’s consumer smart speakers have multiple microphones, but because those microphones are clustered on a single device, they sit too close together to create this system’s mute and active zones.

“If I have one microphone a foot away from me, and another microphone two feet away, my voice will arrive at the microphone that’s a foot away first. If someone else is closer to the microphone that’s two feet away, their voice will arrive there first,” says co-lead author Tuochao Chen, a doctoral student in the University of Washington’s Paul G. Allen School of Computer Science & Engineering.

“We developed neural networks that use these time-delayed signals to separate what each person is saying and track their positions in a space. So, you can have four people having two conversations and isolate any of the four voices and locate each of the voices in a room.”
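The time-delay idea Chen describes can be illustrated with classical cross-correlation, a much simpler technique than the team's neural networks and swapped in here only to show the principle. The sketch assumes a single talker, two microphones, a 16 kHz sample rate and a 343 m/s speed of sound, and uses random noise as a stand-in for speech. With the second microphone one foot (0.3048 m) farther away, as in Chen's example, the voice should arrive about 0.89 ms, or 14 samples, later. The function and variable names are hypothetical.

```python
import random

SAMPLE_RATE_HZ = 16_000      # assumed sample rate
SPEED_OF_SOUND_M_S = 343.0   # approximate, in air


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) that best aligns `other` with `ref`.

    Positive results mean the sound reached `other`'s microphone later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of ref[n] * other[n + lag].
        score = sum(
            ref[n] * other[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Simulate one talker: the far microphone is one foot (0.3048 m) farther away.
random.seed(0)
voice = [random.gauss(0.0, 1.0) for _ in range(400)]  # noise stands in for speech
true_delay = round(0.3048 / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)  # 14 samples
mic_near = voice + [0.0] * true_delay
mic_far = [0.0] * true_delay + voice

lag = estimate_delay(mic_near, mic_far, max_lag=30)
print(f"estimated delay: {lag} samples = {lag / SAMPLE_RATE_HZ * 1000:.3f} ms")
print(f"extra path length: {lag / SAMPLE_RATE_HZ * SPEED_OF_SOUND_M_S:.3f} m")
```

The recovered path difference (about 0.300 m) is slightly short of 0.3048 m because the delay is rounded to whole samples. With microphones spread across a table, delays from many pairs jointly constrain where each voice came from; microphones clustered on one device produce far smaller delays, leaving much less to work with.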

The team tested the robots in offices, living rooms, and kitchens with groups of three to five people speaking. Across all these environments, the system could discern different voices within 50 cm of each other 90 percent of the time, without prior information about the number of speakers.

The system was able to process three seconds of audio in 1.82 seconds on average – fast enough for live streaming, though a bit too long for real-time communications such as video calls.
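The speed figure can be expressed as a real-time factor, processing time divided by audio duration: 1.82 / 3 ≈ 0.61. A factor below 1 means the system keeps pace with incoming audio, which is what streaming needs. Latency is a separate question. The article does not say how audio is buffered; assuming, for illustration only, that it is handled in 3-second chunks, the earliest sound in a chunk would be output roughly 3 + 1.82 = 4.82 seconds after it was spoken, far longer than a back-and-forth conversation tolerates. The function names below are hypothetical.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio; below 1.0 keeps pace with input."""
    return processing_s / audio_s


def worst_case_latency_s(chunk_s: float, processing_s: float) -> float:
    """Hypothetical chunked pipeline: wait for the chunk to fill, then process it."""
    return chunk_s + processing_s


rtf = real_time_factor(1.82, 3.0)
latency = worst_case_latency_s(3.0, 1.82)
print(f"real-time factor: {rtf:.2f}")        # real-time factor: 0.61
print(f"worst-case latency: {latency:.2f} s")  # worst-case latency: 4.82 s
```

Shorter chunks would cut the buffering delay, but the article does not report how the system performs with them.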

As the technology progresses, researchers say, acoustic swarms might be deployed in smart homes to better differentiate people talking with smart speakers. That could potentially allow only people sitting on a couch, in an ‘active zone’, to control a TV by voice, for example.

Researchers plan to eventually make microphone robots that can move around rooms, instead of being limited to tables. The team is also investigating whether the devices can emit sounds that allow for real-world mute and active zones, so people in different parts of a room can hear different audio.

The current study is another step toward science fiction technologies, such as the 'cone of silence' in Get Smart and Dune, the authors write.

Of course, any technology that evokes comparison to fictional spy tools will raise questions of privacy. Researchers acknowledge the potential for misuse, so they have built in safeguards: the microphones navigate with sound, whereas other similar systems rely on onboard cameras.

The robots are easily visible, and their lights blink when they’re active. Instead of processing the audio in the cloud, as most smart speakers do, the acoustic swarms process all the audio locally, as a privacy constraint. Even though some people’s first thoughts may be about surveillance, the system can be used for the opposite, the team says.

“It has the potential to actually benefit privacy, beyond what current smart speakers allow,” Itani says. “I can say, ‘Don’t record anything around my desk’, and our system will create a bubble three feet around me. 

“Nothing in this bubble would be recorded. Or if two groups are speaking beside each other and one group is having a private conversation, while the other group is recording, one conversation can be in a mute zone, and it will remain private.”
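Itani's "bubble three feet around me" can be pictured as a filter applied after separation and localization: any voice whose estimated position falls inside a mute zone is discarded before anything is recorded. The sketch below assumes that the swarm has already produced a list of separated voices with 2D positions in metres; the `Source` and `MuteZone` classes, the circular zone shape and the example coordinates are hypothetical and do not describe the team's software.

```python
import math
from dataclasses import dataclass

FEET_TO_M = 0.3048  # exact conversion factor


@dataclass
class Source:
    """A separated voice with its estimated position (hypothetical structure)."""
    label: str
    x_m: float
    y_m: float


@dataclass
class MuteZone:
    """A circular region whose voices must never be kept (hypothetical)."""
    center_x_m: float
    center_y_m: float
    radius_m: float

    def contains(self, s: Source) -> bool:
        return math.hypot(s.x_m - self.center_x_m, s.y_m - self.center_y_m) <= self.radius_m


def sources_to_keep(sources, mute_zones):
    """Drop every separated voice whose estimated position falls in a mute zone."""
    return [s for s in sources if not any(z.contains(s) for z in mute_zones)]


desk_bubble = MuteZone(0.0, 0.0, 3 * FEET_TO_M)  # 'a bubble three feet around me'
voices = [Source("at desk", 0.4, 0.2), Source("across room", 3.0, 1.5)]
print([s.label for s in sources_to_keep(voices, [desk_bubble])])  # ['across room']
```

An active zone, such as the couch in the TV example, is the same test inverted: keep only voices that fall inside the zone.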


