The Ethics of Smart Devices That Analyze How We Speak
As smart assistants and voice interfaces become more common, we’re giving away a new form of personal data — our speech. This goes far beyond just the words we say out loud.
Speech lies at the heart of our social interactions, and we unwittingly reveal much about ourselves when we talk. When someone hears a voice, they immediately start picking up on accent and intonation and make assumptions about the speaker’s age, education, personality, etc. We do this to make a good guess at how best to respond to the person speaking.
But what happens when machines start analyzing how we talk? The big tech firms are coy about exactly what they are planning to detect in our voices and why, but Amazon has a patent that lists a range of traits they might collect, including identity (“gender, age, ethnic origin, etc.”), health (“sore throat, sickness, etc.”), and feelings (“happy, sad, tired, sleepy, excited, etc.”).
This worries me — and it should worry you, too — because algorithms are imperfect. And voice is particularly difficult to analyze because the signals we give off are inconsistent and ambiguous. What’s more, the inferences that even humans make are distorted by stereotypes. Let’s use the example of trying to identify sexual orientation. There is a style of speaking with raised pitch and swooping intonations which some people assume signals a gay man. But confusion often arises because some heterosexuals speak this way, and many homosexuals don’t. Science experiments show that human aural “gaydar” is only right about 60% of the time. Studies of machines attempting to detect sexual orientation from facial images have shown a success rate of about 70%. Sound impressive? Not to me, because that means those machines are wrong 30% of the time. And I would anticipate success rates to be even lower for voices, because how we speak changes depending on who we’re talking to. Our vocal anatomy is very flexible, which allows us to be oral chameleons, subconsciously changing our voices to fit in better with the person we’re speaking with.
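To see what those accuracy percentages mean in practice, here is a quick back-of-the-envelope calculation. The population size and base rate below are assumptions chosen purely for illustration; they are not figures from the studies mentioned above, and the accuracy is treated as if it applied equally to both groups.

```python
# Illustration of why a "70% accurate" classifier can still be wrong
# for most of the people it labels. Population and base rate are
# invented for this sketch, not taken from the cited studies.
population = 1000
base_rate = 0.10          # assumed share of people in the targeted group
accuracy = 0.70           # reported success rate, assumed symmetric

in_group = population * base_rate
not_in_group = population - in_group

true_positives = in_group * accuracy            # 70 correctly labeled
false_positives = not_in_group * (1 - accuracy) # 270 wrongly labeled

print(f"Correctly labeled: {true_positives:.0f}")
print(f"Wrongly labeled:   {false_positives:.0f}")
# With a 10% base rate, the wrong labels outnumber the correct ones
# by almost four to one.
```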
We should also be concerned about companies collecting imperfect information on the other traits mentioned in Amazon’s patent, including gender and ethnic origin. Machine learning applications trained on real-world speech will pick up the societal biases embedded in that data. We have already seen this happen in similar technologies. Type the Turkish “O bir hemşire. O bir doktor.” into Google Translate and you’ll get “She is a nurse” and “He is a doctor.” Even though “o” is a gender-neutral third-person pronoun in Turkish, the presumption that a doctor is male and a nurse is female arises because the data used to train the translation algorithm is skewed by the gender imbalance in medical jobs. Such problems also extend to race: one study showed that in typical data used for machine learning, African American names appeared more often alongside unpleasant words such as “hatred,” “poverty,” and “ugly,” while European American names appeared more often alongside pleasant words such as “love,” “lucky,” and “happy.”
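Studies of this kind typically measure how close word vectors sit to “pleasant” versus “unpleasant” words. The sketch below shows the shape of such an association test; the two-dimensional vectors, the names, and the numbers are all invented for illustration and stand in for real embeddings trained on web text.

```python
import numpy as np

# Toy vectors standing in for embeddings learned from real text.
# Values are made up so that the bias described above is visible.
vectors = {
    "emily":   np.array([0.9, 0.1]),
    "jamal":   np.array([0.1, 0.9]),
    "love":    np.array([0.8, 0.2]),
    "happy":   np.array([0.9, 0.3]),
    "hatred":  np.array([0.2, 0.8]),
    "poverty": np.array([0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word, pleasant, unpleasant):
    """Mean similarity to pleasant words minus mean similarity to unpleasant words."""
    pos = np.mean([cosine(vectors[word], vectors[p]) for p in pleasant])
    neg = np.mean([cosine(vectors[word], vectors[u]) for u in unpleasant])
    return pos - neg

pleasant, unpleasant = ["love", "happy"], ["hatred", "poverty"]
for name in ["emily", "jamal"]:
    print(name, round(association(name, pleasant, unpleasant), 3))
```

A positive score means a name sits closer to the pleasant words than the unpleasant ones; with these toy vectors, the two names get sharply different scores, which is the pattern the cited study reported in real training data.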
The big tech firms want voice devices to work better, and this means understanding how things are being said. After all, the meaning of a simple phrase like “I’m fine” changes completely if you switch your voice from neutral to angry. But where will they draw the line? For example, a smart assistant that detects anger could start to understand a lot about how you get along with your spouse by listening to the tone of your voice. Will Google then start displaying advertisements for marriage counseling when it detects a troubled relationship? I’m not suggesting that anyone will do this deliberately. The trouble with complex machine learning systems is that such issues typically arise in unanticipated and unintended ways. AI might also detect a strong accent and infer that the speaker is less educated, because the training data has been skewed by societal stereotypes; a smart speaker could then dumb down its responses to those with strong accents. Tech firms need to get smarter about how to avoid such prejudices in their systems. There are already worrying examples of voice analysis being used on phone lines for benefit claimants to detect potential false claims: the UK government wasted £2.4 million on a voice lie detection system that was scientifically incapable of working.
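To make the point about tone concrete, here is a minimal sketch of the kind of crude prosodic features (pitch and loudness) that emotion classifiers often start from. It assumes the open-source librosa library and a hypothetical recording named im_fine.wav; it is an illustration of acoustic cues, not a description of how any vendor’s system actually works.

```python
import numpy as np
import librosa

# Load a short utterance (the file name is hypothetical).
y, sr = librosa.load("im_fine.wav", sr=16000)

# Fundamental frequency (pitch) estimated with the YIN algorithm.
f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)

# Short-time loudness (root-mean-square energy).
rms = librosa.feature.rms(y=y)[0]

# A crude prosodic summary: raised, variable pitch and higher energy
# are (imperfect!) cues that classifiers associate with anger or arousal.
features = {
    "mean_pitch_hz": float(np.nanmean(f0)),
    "pitch_variability": float(np.nanstd(f0)),
    "mean_energy": float(rms.mean()),
}
print(features)
```

The same words spoken neutrally and angrily would yield very different numbers here, which is exactly the information a device would need in order to infer tone, and exactly the information that raises the privacy questions above.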
A final issue is that many people seem to be more careless around these devices. Amazon has already noted that many people have real conversations with Alexa, and often tell the device how they’re feeling, even going so far as to profess love for the technology: “Alexa, I love you.” Adding speech to a device suggests agency, making it more likely that we will anthropomorphize the technology and feel safe revealing sensitive information. It’s probably only a matter of time before there is a major security breach of voice data. For that reason, researchers are just starting to develop algorithms that try to filter out sensitive information. For example, you might set a smart speaker to mute its microphone when you mention the name of your bank, to stop you from accidentally revealing access details, or when you use words of a sexual nature.
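As a rough illustration of that idea, here is a minimal sketch of a keyword-triggered mute. It assumes the device exposes a rolling transcript from its speech recognizer; the keyword list, the transcript window, and the mute duration are all invented for the example and are not taken from any real product.

```python
# Sketch of a keyword-triggered microphone mute for a smart speaker.
# Keywords and timing are illustrative assumptions only.
SENSITIVE_KEYWORDS = {"my bank", "password", "account number"}
MUTE_SECONDS = 10

def should_mute(transcript_window: str) -> bool:
    """Return True if any sensitive keyword appears in the recent transcript."""
    text = transcript_window.lower()
    return any(keyword in text for keyword in SENSITIVE_KEYWORDS)

# Example: a rolling transcript produced by the device's speech recognizer.
recent_speech = "Could you remind me to call my bank tomorrow"
if should_mute(recent_speech):
    print(f"Muting microphone for {MUTE_SECONDS} seconds")
```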
What are consumers’ attitudes about privacy when it comes to smart assistants? The only published study I could find on this is from the University of Michigan. It showed that owners of the tech are not that concerned about giving more data to gatekeepers like Google and Amazon. “I find that really concerning,” explained one of the study’s authors, Florian Schaub. “These technologies are slowly chipping away at people’s privacy expectations. Current privacy controls are just not meeting people’s needs.” Most people in the study didn’t even realize that their data was being analyzed to target ads at them, and when they found out, they didn’t like their voice commands being used that way.
But consumers can also subvert the technology for their own aims. In the University of Michigan study, one person reviewed the audio logs on their Amazon Echo to check up on what house sitters were doing with the technology. These devices could also open up new channels of persuasion. If you think your washing machine needs to be replaced, but your partner disagrees, do a voice search for possible models near the smart speaker, and your partner might be bombarded by endless advertisements for new ones.
In business, we’ve gotten used to being careful about what we write in emails, in case information goes astray. We need to develop a similar wary attitude to having sensitive conversations close to connected devices. The only truly safe device to talk in front of is one that is turned off.
Trevor Cox is a professor of acoustical engineering at the University of Salford. He’s the author of Now You’re Talking: Human Conversation from the Neanderthals to Artificial Intelligence. You can follow him on Twitter @trevor_cox.