05-23-2024

AI noise-canceling headphones grab one voice from a crowd

Earth.com staff writer

Noise-canceling headphones have become very good at creating an auditory blank slate, but letting specific sounds from the wearer’s surroundings pass through that erasure remains a challenge for researchers.

Recent products such as Apple’s AirPods Pro automatically adjust sound levels based on the wearer’s surroundings, but users have little control over whom they hear or when this happens.

Target Speech Hearing (TSH)

A team from the University of Washington, led by senior author Shyam Gollakota, a professor in the Paul G. Allen School of Computer Science & Engineering, has developed an artificial intelligence system called “Target Speech Hearing” (TSH).

The system lets a user wearing noise-canceling headphones “enroll” a speaker simply by looking at them for three to five seconds.

Once enrolled, the system cancels all other sounds in the environment and plays only the enrolled speaker’s voice in real time, even as the listener moves around in noisy places and no longer faces the speaker.

Gollakota emphasizes the potential of AI beyond web-based chatbots, stating, “With our devices, you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”

How TSH works with noise-canceling headphones

To use TSH, a person wearing off-the-shelf headphones fitted with microphones taps a button while facing someone who is talking.

Because the wearer is facing the speaker, the sound waves from that person’s voice should reach the microphones on both sides of the headset at the same time; the system allows a 16-degree margin of error in the direction the wearer is facing.
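The 16-degree tolerance can be made concrete with a little geometry. The sketch below is a classic time-difference-of-arrival check, not the team’s code: it cross-correlates the left and right microphone signals, converts the best-matching delay into an angle, and accepts the speaker if that angle is within 16 degrees. The 18 cm microphone spacing, the 48 kHz sample rate, and the function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
MIC_SPACING = 0.18      # m; assumed distance between left and right microphones
SAMPLE_RATE = 48_000    # Hz; assumed sampling rate
MAX_ANGLE_DEG = 16.0    # tolerance quoted in the article


def best_lag(left, right, max_lag):
    """Return the lag (in samples) that maximizes sum(left[i] * right[i + lag])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(left, right):
    """Estimate how far off-center a distant (far-field) source is, in degrees."""
    # The largest physically possible delay is the spacing divided by the speed of sound.
    max_lag = math.ceil(MIC_SPACING / SPEED_OF_SOUND * SAMPLE_RATE)
    delay = best_lag(left, right, max_lag) / SAMPLE_RATE
    # Far-field model: delay = spacing * sin(angle) / speed of sound.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / MIC_SPACING))
    return math.degrees(math.asin(ratio))


def is_facing_speaker(left, right):
    """True if the estimated source direction is within the assumed 16-degree window."""
    return abs(arrival_angle_deg(left, right)) <= MAX_ANGLE_DEG
```

At these assumed values, a voice 16 degrees off-center reaches one microphone only about 0.14 ms before the other, roughly 7 samples at 48 kHz, so any such check has to resolve very small delays.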

The headphones then send that signal to an on-board embedded computer, where the team’s machine learning software learns the desired speaker’s vocal patterns.

The system latches onto that speaker’s voice and continues to play it back to the listener, even as the pair moves around. As the speaker keeps talking, the system’s ability to focus on the enrolled voice improves, thanks to the additional training data.
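The idea that more speech gives the system a better lock on the target can be illustrated with a toy model. In the sketch below, which is not the team’s neural network, each short frame of enrolled speech is assumed to have already been turned into a fixed-length feature vector; the `SpeakerProfile` class, its `gate` method, and the 0.8 threshold are hypothetical. The profile is a running average of those vectors, so random variation averages out as frames accumulate, and incoming frames are passed or muted according to their cosine similarity to the profile.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerProfile:
    """Hypothetical voice profile: the running mean of per-frame feature vectors."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, features):
        # Incremental mean: each new frame moves the estimate 1/count of the way toward it.
        self.count += 1
        self.mean = [m + (f - m) / self.count for m, f in zip(self.mean, features)]

    def gate(self, features, frame, threshold=0.8):
        """Pass the audio frame through if its features match the profile, else mute it."""
        if cosine_similarity(self.mean, features) >= threshold:
            return frame
        return [0.0] * len(frame)
```

Muting whole frames is far cruder than what a real target-speech system does, since overlapping voices share the same frames; the sketch only shows why an average over more enrolled speech becomes a steadier reference.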

AI meets noise-canceling technology

The team tested the system on 21 subjects, who on average rated the clarity of the enrolled speaker’s voice nearly twice as high as that of the unfiltered audio.

This work builds on the team’s previous “semantic hearing” research, which allowed users to select specific sound classes, such as birds or voices, that they wanted to hear while canceling other sounds in the environment.

Currently, the TSH system can enroll only one speaker at a time, and it works only when no other loud voice is coming from the same direction as the target speaker.

If a user is unsatisfied with the sound quality, they can run another enrollment on the speaker to improve clarity.

The University of Washington team is working to extend the system to earbuds and hearing aids.

Redefining the way we experience sound

The team presented its findings at the ACM CHI Conference on Human Factors in Computing Systems in Honolulu, and the code for the proof-of-concept device is available for others to build upon.

While the system is not yet commercially available, it is a notable step forward for AI-driven noise-canceling technology.

By combining machine learning with a quick look-to-enroll step, TSH lets users single out and focus on one speaker in a noisy environment.

If the team succeeds in refining the system and bringing it to earbuds and hearing aids, this kind of personalized listening could help people follow a conversation even in the most chaotic settings.

The full study has been published.
