
Classifying the emotional tone of a conversation in real-time

21 February 2017

Researchers from MIT have created an artificially intelligent wearable system that can predict the mood of a conversation based on a person’s speech patterns and vitals.

The team's system was implemented on a Samsung Simband, a research device that can measure metrics such as movement, heart rate, blood pressure and skin temperature. (Credit: Jason Dorfman/MIT CSAIL)

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” says graduate student Tuka Alhanai, who co-authored a related paper with PhD candidate Mohammad Ghassemi. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

As a conversation unfolds, the AI system analyses speech, text transcriptions and physiological signals to determine its overall tone. Using deep learning techniques, it also provides a ‘sentiment score’ for each five-second interval of the conversation. Overall, the system classifies a conversation’s tone with 83 percent accuracy.
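The team’s model itself is not reproduced here, but the windowed-scoring idea is easy to illustrate. Below is a minimal Python sketch, assuming a fused feature stream and a stand-in linear scorer; the feature layout, sampling rate and weights are all hypothetical placeholders, not the team’s deep-learning model.

```python
import numpy as np

WINDOW_SECONDS = 5   # the system scores five-second intervals
FEATURE_RATE_HZ = 8  # hypothetical rate of the fused feature stream
NUM_FEATURES = 32    # hypothetical fused audio + physiological dimensions

def five_second_windows(stream: np.ndarray) -> list:
    """Split a (time, feature) array into consecutive five-second windows."""
    step = WINDOW_SECONDS * FEATURE_RATE_HZ
    return [stream[i:i + step] for i in range(0, len(stream) - step + 1, step)]

def sentiment_score(window: np.ndarray, weights: np.ndarray) -> float:
    """Toy stand-in for the deep network: mean-pool the window over time,
    apply learned weights, and squash to a score in (0, 1)."""
    pooled = window.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-pooled @ weights)))

# Fabricated one-minute feature stream and random "learned" weights.
rng = np.random.default_rng(0)
stream = rng.normal(size=(60 * FEATURE_RATE_HZ, NUM_FEATURES))
weights = rng.normal(size=NUM_FEATURES)

for idx, window in enumerate(five_second_windows(stream)):
    score = sentiment_score(window, weights)
    label = "positive" if score > 0.5 else "negative"
    print(f"{idx * WINDOW_SECONDS:3d}-{(idx + 1) * WINDOW_SECONDS:3d}s: "
          f"{label} ({score:.2f})")
```

In the real system, the scorer is a deep network trained on labelled conversations rather than the toy linear model used above.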

“As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions,” says Ghassemi. “Our results show that it’s possible to classify the emotional tone of conversations in real-time.”

The researchers say that the system's performance would be further improved by having multiple people in a conversation use it on their smartwatches, creating more data to be analysed by their algorithms. The team is keen to point out that they developed the system with privacy strongly in mind: The algorithm runs locally on a user’s device as a way of protecting personal information. (Alhanai says that a consumer version would obviously need clear protocols for getting consent from the people involved in the conversations.)

An experiment was carried out on 31 different conversations using the Samsung Simband, a research device that captures high-resolution physiological waveforms to measure features such as movement, heart rate, blood pressure, blood flow, and skin temperature. The system also captured audio data and text transcripts to analyse the speaker’s tone, pitch, energy, and vocabulary.
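The article names the audio features but not how they are computed. As one concrete example, here is a minimal numpy-only sketch of frame-level energy and a crude autocorrelation pitch estimate, run on a synthetic tone; the frame size, sample rate and estimator are illustrative choices, not the team’s pipeline.

```python
import numpy as np

SR = 16_000    # hypothetical audio sample rate (Hz)
FRAME = 1_024  # analysis frame length in samples

def frame_energy(frame: np.ndarray) -> float:
    """Root-mean-square energy of one frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def frame_pitch_hz(frame: np.ndarray, fmin=50.0, fmax=400.0) -> float:
    """Crude pitch estimate: peak of the autocorrelation within a plausible
    speech range. Production systems use far more robust estimators."""
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(SR / fmax), int(SR / fmin)
    lag = lo + int(np.argmax(corr[lo:hi]))
    return SR / lag

# Synthetic stand-in for recorded speech: a 220 Hz tone plus noise.
t = np.arange(SR) / SR
signal = np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.default_rng(1).normal(size=SR)

for start in range(0, len(signal) - FRAME, FRAME * 4):
    frame = signal[start:start + FRAME]
    print(f"t={start / SR:4.2f}s  energy={frame_energy(frame):.3f}  "
          f"pitch~{frame_pitch_hz(frame):.0f} Hz")
```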

“The team’s usage of consumer market devices for collecting physiological data and speech data shows how close we are to having such tools in everyday devices,” says Björn Schuller, professor and chair of Complex and Intelligent Systems at the University of Passau in Germany, who was not involved in the research. “Technology could soon feel much more emotionally intelligent, or even ‘emotional’ itself.”

Video courtesy of MIT CSAIL.

For more information on the experiment, visit the MIT website.

