halfbakery
Using signal characteristics to detect smiles in spoken speech
Have you ever been on the phone with somebody (or otherwise unable to see their face) and still been able to *hear* the smile in their voice? I have.
My half-baked idea is that there is an actual difference in how we modulate our words when wearing a BIG smile (via mouth shape or some other factor - maybe specific muscle usage?), and that with proper modelling of that smiling speech, a detector could be built.
The uses of this? Well, you've got me there. In all normal situations, there would be another human to "hear" the smile. It would need to be something where speech is monitored by a computer. Perhaps it would help with automated transcriptions, or maybe the countries using Echelon to monitor the communications of each other's citizens would use it to know when people are smiling as they say something. (Not that I support anything Echelon-related.)
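As a rough sketch of how such a detector might start (my own illustration, not part of the original idea): smiling shortens the vocal tract slightly, which tends to raise formant frequencies and often pitch, so a toy detector could compare a speaker's pitch against their own neutral baseline. The `estimate_f0` helper and the 15% threshold below are invented for illustration; real smile detection would need far richer features than pitch alone.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=50.0, fmax=500.0):
    """Crude fundamental-frequency estimate via the autocorrelation peak."""
    signal = signal - signal.mean()
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_lo = int(sr / fmax)   # shortest pitch period considered
    lag_hi = int(sr / fmin)   # longest pitch period considered
    best = lag_lo + int(np.argmax(corr[lag_lo:lag_hi]))
    return sr / best

def sounds_like_smile(sample, baseline, sr, threshold=1.15):
    """Toy rule: flag a 'smile' when pitch rises more than 15% above the
    speaker's neutral baseline (the threshold is an invented number)."""
    return estimate_f0(sample, sr) > threshold * estimate_f0(baseline, sr)

# Synthetic demo: a 'neutral' 180 Hz tone vs. a raised 230 Hz tone.
sr = 16000
t = np.arange(sr) / sr
neutral = np.sin(2 * np.pi * 180 * t)
raised = np.sin(2 * np.pi * 230 * t)
```

On real speech one would estimate pitch per voiced frame and average, but the comparison-against-baseline idea is the same.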
AfroAssault's Speak in AOL idea
[snarfyguy, Oct 22 2001, last modified Oct 04 2004]
Techniques for the Phonetic Description of Emotional Speech
System for human transcription of paralinguistic speech features. [pottedstu, Oct 22 2001, last modified Oct 21 2004]
Prosel
Software that extracts prosody (pauses, tone) from a speech sample and applies it to synthesised speech, but works at a low level. [pottedstu, Oct 22 2001, last modified Oct 21 2004]
Emotional speech synthesis
Considers whether it's possible to add emotion to a speech synthesiser like Stephen Hawking's. [pottedstu, Oct 22 2001, last modified Oct 21 2004]
Recognition of emotions
Compares acoustic analysis with listener identification to see how well emotions can be detected. [pottedstu, Oct 22 2001, last modified Oct 21 2004]
Expression of emotions
More technical, but some interesting points. [pottedstu, Oct 22 2001, last modified Oct 21 2004]
||Sounds like AfroAssault's Speak in AOL idea (see link)
||I went and read AfroAssault's Speak in AOL idea. I don't think it is remotely close to my idea. I don't want to detect trendy emoticon-talk - I want to detect *real* smiles in real speech (whether or not the speakers are using AOL-speak).
||AfroAssault's idea is to augment language to describe emoticons. Mine is to detect emotion (not emoticons) in normal spoken language.
||This does seem pretty baked. There's a lot of work going on at the moment in analysing prosody - those features of speech such as intonation, speed and rhythm which communicate emotional states. These characteristics are generated both unconsciously by physiological changes in the speaker (excitement making you speak faster), and consciously by rules learnt and shared between speakers (e.g. putting on a sarcastic tone of voice).
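For a concrete picture of what "analysing prosody" can mean at the signal level, here is a minimal sketch (my own illustration, not anything the annotation describes): frame-level energy traces the loudness and rhythm contour, and zero-crossing rate is a crude proxy for voicing and spectral balance; excitement-driven fast speech would show up as more frequent energy peaks. Frame and hop sizes are conventional defaults, not anything prescribed here.

```python
import numpy as np

def prosodic_features(signal, sr, frame_ms=25, hop_ms=10):
    """Per-frame energy and zero-crossing rate: two of the simplest
    prosodic cues (loudness contour and a rough voicing proxy)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame + 1, hop):
        w = signal[start:start + frame]
        energies.append(float(np.mean(w ** 2)))
        # Each sign change contributes |diff| = 2, so divide by 2.
        zcrs.append(float(np.mean(np.abs(np.diff(np.sign(w)))) / 2))
    return np.array(energies), np.array(zcrs)

# Demo on a steady 100 Hz tone standing in for a voiced sound.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t)
energy, zcr = prosodic_features(tone, sr)
```

Intonation (pitch contour) and speaking rate would be layered on top of frames like these; the point is only that "prosody" bottoms out in quite simple per-frame measurements.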
||There's a fair bit of work going on in this field, not just for intellectual interest, but for very low bit rate speech coding, where it's important not just to carry the phonetic content of speech (the words spoken) but paralinguistic data about the speaker's age, sex, and emotional state.
||I posted a few links: Prosel is a software package for speech generation (text-to-speech) that adds emotion to speech based on analysing a similar speech sample and extracting information on tone of voice, etc. There are also a number of scientific papers, analysing the factors involved in expressing emotion through speech. The last 2 come from the Geneva Emotion Research Group, and give some examples of computer analysis of emotion. They find some emotions are easier to recognise than others, either for listeners, or based on audio processing. However, it's still early days.
||There's also a neurological/mental disorder where patients are unable to recognise the emotional content of speech, called aprosodia. This suggests that a particular area of the brain is responsible for decoding the emotional content of speech, separate from identifying the words used and extracting semantic content.
||stu, that's some pretty cool stuff. thanks!
||How about detecting whether a pianist was smiling when they played a particular passage?