New AI model can use human perception to help tune out noisy audio

Conventional measures to limit background noise have used AI algorithms to extract noise from the desired signal.

Update: 2024-02-08 07:28 GMT

NEW DELHI: Researchers have developed a new artificial intelligence (AI) model that may significantly improve audio quality in real-world scenarios by taking advantage of how humans perceive speech.

The team at The Ohio State University, US, found that people's subjective ratings of sound quality can be combined with a speech enhancement model to improve speech quality as measured by objective metrics.

The new model, described in the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing, outperformed other standard approaches at minimising the presence of noisy audio -- unwanted sounds that may disrupt what the listener actually wants to hear.

The predicted quality scores generated by the model were found to correlate strongly with the judgments human listeners would make, the researchers said.


However, such objective, signal-based approaches don't always align with listeners' assessments of what makes speech easy to understand, said study co-author Donald Williamson, an associate professor at The Ohio State University.

"What distinguishes this study from others is that we're trying to use perception to train the model to remove unwanted sounds," Williamson said in a statement.

If some aspect of a signal's quality can be perceived by people, then the model can use that as additional information to learn to remove noise more effectively, the researchers said.

The study focused on improving monaural speech enhancement, or speech that comes from a single audio channel, such as one microphone.

The researchers trained the new model on two datasets from previous research that involved recordings of people talking. In some cases, background noises such as a TV or music could obscure the conversations.

Listeners rated the speech quality of each recording on a scale of 1 to 100.

The model derives its performance from a joint-learning method that pairs a specialised speech enhancement module with a prediction model that anticipates the mean opinion score (MOS) human listeners would give a noisy signal.
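The paper's exact architecture is not reproduced here, but the joint-learning idea can be sketched as a combined training objective: a signal-reconstruction term plus a term that penalises errors in the predicted mean opinion score. All names, the squared-error losses, and the weighting factor `alpha` below are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def joint_loss(enhanced, clean, predicted_mos, human_mos, alpha=0.5):
    """Illustrative joint-learning objective (not the paper's exact loss).

    Combines an enhancement term (mean squared error between the enhanced
    and clean signals) with a perception term (squared error between the
    predicted and human-rated mean opinion scores), weighted by alpha,
    a hypothetical balancing hyperparameter.
    """
    enhancement_loss = np.mean((enhanced - clean) ** 2)
    perception_loss = np.mean((predicted_mos - human_mos) ** 2)
    return enhancement_loss + alpha * perception_loss

# Toy example: a slightly noisy estimate of a clean signal, plus
# a MOS prediction that misses the human rating by 8 points.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)            # 1 s of audio at 16 kHz
enhanced = clean + 0.1 * rng.standard_normal(16000)
loss = joint_loss(enhanced, clean,
                  predicted_mos=np.array([72.0]),
                  human_mos=np.array([80.0]))
```

During training, both terms would be minimised together, so the enhancement network is pushed not only toward signal fidelity but also toward outputs that score well on predicted human perception.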

Results showed that the new approach outperformed other models, producing better speech quality as measured by perceptual quality, intelligibility and human ratings.

However, using human perception of sound quality has its own issues, Williamson said.

"What makes noisy audio so difficult to evaluate is that it's very subjective. It depends on your hearing capabilities and on your hearing experiences," he said.

Factors such as wearing a hearing aid or a cochlear implant also affect how a person perceives their sound environment, the researcher added.
