h a l f b a k e r y
Just add oughta.




Sonic raytracing and physics engine

Simulate sound properly

There is a problem with speech synthesis, and also with the synthesis of imaginary alien languages mediated via sound: a popular model is crude and inflexible. It assumes at most two sources of sound modified by the filter of the vocal tract, involving tongue, lips, nasal cavity or the absence thereof and so forth. That is not good enough, for two reasons. Firstly, it assumes the organs producing the sound are the same for everyone - nobody has a cleft palate or hare lip, there are no functional speech impediments, no hoarse voices, no "cri du chat", everyone has the same teeth, a hard palate of the same height, the same lungs, a body of the same build, the same gender and so on. Secondly, humans are not the only source of sound and the voice is not the only source of human sound. We stamp, clap, speak while running or jumping or lying down, and in echoing rooms, open spaces, inside anechoic chambers and the like. Grasshoppers chirp, dolphins do their thing, insects buzz, birds sing and so on. A speech synthesiser just doesn't cut it, even for the human voice.

Therefore, instead of all that, i suggest this. There are already two tools used for visual purposes: raytracing and physics engines. Raytracing deals with light of different colours and intensities passing through a number of processes which involve scattering, attenuation and a lot of the kinds of things which happen to another kind of wave, namely sound. This would be adequate to produce maybe a hundred milliseconds of audible sound if you're lucky. However, for longer periods, a physics engine would help. In the case of the human voice, we have for instance the sound of a trilled R coming out of the mouth of a male with thick lips and a cleft palate. Model the shapes and vibrations of the speech organs over a period of time, sonically raytrace each stage and you have a much more realistic human voice, of course at the cost of considerably more processing. Moreover, you can then model that voice in whatever space you like, with a bee buzzing round the bloke's head, give him a cold, make him heavier or lighter, have him talking while running, introduce a cricket and a sparrow, fly a plane overhead and have a visiting alien over with five mouths each containing a forked tongue and three larynxes each wearing a scarf over two of them and a series of elephant-like trunks.

Thanks to [Vernon] for the inspiration.

nineteenthly, Apr 08 2011
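
A minimal sketch of what one step of "sonic raytracing" might look like, using the image-source trick for a single flat wall. Everything here is an assumption for illustration: the function name `arrival_paths`, the 2-D layout with the wall along x = 0, the `reflection_gain` of 0.7 and the simple 1/distance amplitude falloff. It is not a method taken from the idea itself.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air at about 20 degrees C


def arrival_paths(source, listener, reflection_gain=0.7):
    """Return [(delay_s, gain), ...] for the direct ray and one wall bounce.

    The wall is assumed to lie along x = 0, with source and listener both
    at x > 0 and not at the same point. reflection_gain is a hypothetical
    amplitude factor applied at the bounce.
    """
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    # Mirror the source across the wall; the straight line from that image
    # to the listener has the same length as the bounced ray.
    reflected = math.hypot(lx + sx, ly - sy)
    return [
        (direct / SPEED_OF_SOUND_M_S, 1.0 / direct),
        (reflected / SPEED_OF_SOUND_M_S, reflection_gain / reflected),
    ]


for delay_s, gain in arrival_paths((3.0, 0.0), (3.0, 8.0)):
    print(f"arrives after {delay_s * 1000:.1f} ms at relative amplitude {gain:.3f}")
```

Each extra surface or moving object adds more such paths, which is where the processing cost mentioned above comes from.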

Vernon's idea Random_20Word_20Generator
Random word generator - thanks [nineteenthly, Apr 08 2011]

"Survey of Methods for Modeling Sound in Interactive Virtual Environment Systems" http://www-sop.inri...blis/presence03.pdf
A nice survey paper that covers elements of modelling sound. [Jinbish, Apr 08 2011]

Physical Modelling Synthesis http://en.wikipedia...modelling_synthesis
The same idea applied to musical instruments [iaoth, Apr 09 2011]

Titanian _Smoke_ http://www.peppermi....net/res_andi.html?
Of course, after [19thly] mentioned it, I had to go find it. [mouseposture, Apr 09 2011]

real-time_20rendered_20audio Similar. [spidermother, Apr 16 2011]



       Thanks [Jinbish], i'll take a butcher's.
nineteenthly, Apr 08 2011

       Ray tracing doesn't model the wave characteristics of light directly. When the wavelength approaches or exceeds the scale of the space in which the wave exists, it becomes necessary to model the wave directly, rather than abstracting it as rays.   

       Your voice echoing in a canyon behaves much like rays, but modelling its production in your vocal tract using ray tracing would be unproductive.
spidermother, Apr 08 2011
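
The point about wavelength versus the scale of the space can be put into rough numbers. This is only a sketch: the speed of sound is the usual approximate figure for air, the 0.17 m vocal tract length is a typical adult value, and the `margin` of ten wavelengths is an assumed rule of thumb rather than a hard physical limit.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for dry air at about 20 degrees C


def wavelength_m(frequency_hz):
    """Wavelength in metres of a sound wave in air."""
    return SPEED_OF_SOUND_M_S / frequency_hz


def ray_model_plausible(frequency_hz, scale_m, margin=10.0):
    """True when the space is at least `margin` wavelengths across.

    `margin` is an assumed rule of thumb, not a physical constant.
    """
    return scale_m >= margin * wavelength_m(frequency_hz)


for label, scale_m in (("canyon", 100.0), ("vocal tract", 0.17)):
    for frequency_hz in (100.0, 1000.0, 4000.0):
        verdict = "rays plausible" if ray_model_plausible(frequency_hz, scale_m) else "model the wave"
        print(f"{label:12s} {frequency_hz:6.0f} Hz  wavelength {wavelength_m(frequency_hz):.3f} m  {verdict}")
```

On these assumptions the canyon passes at speech frequencies from about 1 kHz up, while the vocal tract fails at every frequency tried.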

       For a proper ray-tracing, you'd need a canonical first scene - say, Julie Andrews singing "Do-Re-Mi" over an infinite checkered plane of edelweiss and granite?
lurch, Apr 08 2011

       OK, fine, but that can still be modelled. Maybe not raytracing then, but the methods i'm aware of, and yes i'm ignorant, seem not to be up to much at all.
nineteenthly, Apr 08 2011

       Oh yes, forgot about that, [bigsleep]. Also, there are those guys who altered 'Smoke On The Water' to see what it would sound like on Titan.
nineteenthly, Apr 09 2011

       How about:   

       1) Start with existing standard acoustic modeling software as used by e.g. theater architects.   

       2) Merge it with existing finite-element modeling software, so that the simple surfaces (ceiling, balcony, acoustic tiles, auditorium chairs, etc.) are replaced with 3-D finite element volumes with those surfaces. (Computational demands would increase enormously at this point.) Integrating these two software packages might be a pretty interesting project, amounting to, maybe, one doctoral dissertation's worth of work (one DD, in SI units).   

       3) Build a finite element model of the larynx, pharynx, tongue, and lips. This is at least another DD, but it may already have been done (the closest I can find is a finite element model of the soft palate, but that was in 1999).   

       Considerable work would be required, in practice, to achieve this, but, in principle, all the pieces are there, and they just need to be assembled.   

       The only gap I can see is that the mechanical properties of tongue, pharynx & lips may not be well-studied enough for modeling. (I'm guessing the larynx is easier and has been done already.) So:   

       2.5) Quantify mechanical characteristics of pharynx, tongue, etc, in enough detail to permit modeling. This includes active elements (i.e. muscles) controlled by the nervous system, complex geometry, and many, many degrees of freedom (though you might achieve dimensional reduction by abandoning the naive, general approach, and exploiting the existing literature on speech production). Optimistically, 2-3 DDs.   

       In short, a suitable long-term project for a large well funded lab (in ENT or Speech/Communications, say) at a university with a strong bioengineering program.   

       Once you've finally built the model, the sky's the limit: you can do Hausa clicks, throat singing, grasshopper stridulations, finger-snapping ... anything you like or can conceive of.
mouseposture, Apr 09 2011

       Sounds feasible but big. I would expect the properties of the tongue to be somewhat similar to those of muscle, and i'd expect that to be modelled somewhere.   

       Oh, and yes, sorry, i should've found that link myself.
nineteenthly, Apr 09 2011

       The properties of tongue are probably identical to muscle, but modeling muscle isn't straightforward. It's been done for individual sarcomeres, and I think, for geometrically simple arrangements of sarcomeres, like a pinnate muscle. And for assemblages of muscles with well-defined origins & insertions on rigid bones whose motion is constrained by joints. That's as of several years ago; maybe by now someone's done the tongue, but I doubt it: not impossible, just too difficult to be an attractive proposition for anyone with the capability. The tongue's a snarl of muscle fibers going in different directions, following curved paths, with no rigid elements, origins, or joints. Also, it can push by generating internal pressure, which isn't going to be part of a standard muscle model.   

       In short, a model of the tongue would get really hairy.   

       (I think this is a good idea, or at least a cool and feasible one, but feel it's irreducibly "big," i.e. can't be done without lots of time, people, and money.)
mouseposture, Apr 09 2011

       Doesn't actually sound that difficult.   

       The windpipe/vocal cords already have a physical-model algorithm used in PM synthesis for flutes etc.   

       Then it's mostly a matter of reverberation/resonance formulae, with maybe a bit more PM modelling to handle the tongue and lips' [edit: and epiglottis'] ability to constrict the openings.   

       PM modelling generally sounds as realistic as CGI animation looks, and people are "tuned in" to voices, so it will be easily distinguishable from the real thing. Pretty neat though.
FlyingToaster, Apr 09 2011
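
For a flavour of how small a physical-model algorithm can be, here is a sketch of Karplus-Strong plucked-string synthesis. To be clear, this is a swapped-in relative of the technique, not the flute or vocal cord algorithm mentioned above: a burst of noise circulates in a delay line and is gently low-pass filtered on each pass, much as a vibrating string loses its high harmonics. The parameter names and default values are assumptions.

```python
import random
from collections import deque


def karplus_strong(frequency_hz, duration_s, sample_rate=44100, damping=0.996, seed=0):
    """Return a list of samples in [-1, 1] for a plucked-string-like tone."""
    period = int(sample_rate / frequency_hz)
    if period < 2:
        raise ValueError("frequency too high for this sample rate")
    rng = random.Random(seed)
    # The "pluck": fill one period of the delay line with white noise.
    delay_line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    samples = []
    for _ in range(int(duration_s * sample_rate)):
        first = delay_line.popleft()
        # Averaging two neighbouring samples is a crude low-pass filter;
        # damping < 1 makes the tone die away.
        delay_line.append(damping * 0.5 * (first + delay_line[0]))
        samples.append(first)
    return samples


tone = karplus_strong(220.0, 0.5)
print(len(tone), "samples generated")
```

Writing `tone` to a WAV file is left out to keep the sketch short. A voice model in this style would need a different excitation and a tube of varying cross-section in place of the fixed delay line.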

       // feel it's irreducibly "big," //   

       Well, maybe think of this as the gold standard. After all, raytracing isn't the only way scenes are rendered in CGI and there could be ways of simplifying the tongue. For instance, some kind of in-betweening type approach could be taken there - model a tongue, which is indeed going to be difficult, but rather than doing it from moment to moment, identify key moments in its behaviour and simulate those, then just sort of join the dots. The kind of thing i mean is, suppose you're modelling a trilled R. You might not need to generate _every_ tongue position in the lower-frequency vibrations of the tongue, partly because they repeat and other than the first and last are probably effectively identical, and partly because it won't make an audible difference between different parts of the tongue moving in a straight line and the tongue moving in an arc at all times.   

       [FT], no comment at the moment though it will come and thanks.
nineteenthly, Apr 10 2011
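
The "join the dots" suggestion amounts to keyframe interpolation. A minimal sketch, assuming a tongue pose can be boiled down to a short list of numbers (hypothetical control-point heights) and that straight-line blending between key poses is good enough:

```python
from bisect import bisect_right


def inbetween(keyframes, t):
    """Linearly interpolate a pose at time t.

    keyframes: list of (time_s, pose) with strictly increasing times,
    where pose is a list of floats (hypothetical control points).
    Times outside the keyframed range clamp to the first or last pose.
    """
    times = [time_s for time_s, _ in keyframes]
    if t <= times[0]:
        return list(keyframes[0][1])
    if t >= times[-1]:
        return list(keyframes[-1][1])
    i = bisect_right(times, t)  # index of the first keyframe after t
    (t0, pose0), (t1, pose1) = keyframes[i - 1], keyframes[i]
    weight = (t - t0) / (t1 - t0)
    return [a + weight * (b - a) for a, b in zip(pose0, pose1)]


# One tap of a trilled R: tongue tip down, up, down again (made-up numbers).
trill = [(0.00, [0.0, 1.0]), (0.02, [1.0, 3.0]), (0.04, [0.0, 1.0])]
print(inbetween(trill, 0.01))
```

Because the taps of a trill repeat, the same three keyframes could be reused for each cycle by taking `t` modulo the cycle length.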

       OK, [FT], looks like it'd work, thanks. It has occurred to me that the output is in a sense one-dimensional, and i wonder if this simplifies the process.
nineteenthly, Apr 10 2011

       Modern synthesis touched on PM in the '90s, but these days the large manufacturers concentrate on modelling the sound rather than the instrument: figure out what sounds the different parts of the instrument make, e.g. a piano's strings, harp, soundboard, hammer-hits and key-returns, damper noises.... quite silly some of them... I knew a (seriously good) pianist who, if you got up real close, you could hear humming off-key and making "wheee" noises... maybe they should include stuff like that too.   

       I steered you a bit wrong there on the reverb formulae though: state of the art "convolution reverb" starts off with putting speakers and mics into the space-to-be-modelled and seeing what the space does to the sound. So, it too is a "passive" technique rather than actually modelling the space.
FlyingToaster, Apr 10 2011
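
The core of convolution reverb is a single operation: convolve the dry signal with the impulse response measured in the space. A bare-bones sketch follows, using direct convolution for clarity; real implementations use FFT-based methods for speed, and the tiny impulse response here is made up rather than measured.

```python
def convolve(dry, impulse_response):
    """Direct (slow) convolution of two lists of samples."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            # Every input sample launches a scaled copy of the room's response.
            out[i + j] += x * h
    return out


# Made-up "room": the direct sound plus one echo two samples later at half level.
room = [1.0, 0.0, 0.5]
click = [1.0, 0.0, 0.0]
print(convolve(click, room))
```

Feeding in a single click simply plays back the impulse response, which is why recording the room's reaction to a click or sweep is enough to capture what the space does to any sound.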

       [FlyingToaster] Glenn Gould?
mouseposture, Apr 10 2011

       [MP]... some orchestra player whose solos were accompanied by the strings' section trying not to giggle... dunno if it was a relaxation technique, or a Tourette's variant, or if he was simply trying to see if he could crack up everybody else by working his way through an impressive serious classical piece, straightfaced, while making zoom-zoom noises.
FlyingToaster, Apr 10 2011

       That's a little disappointing, but i suppose a sufficient number of samples could be sort of interpolated and you'd get a similar result.
nineteenthly, Apr 10 2011

       That's what they (large synth mfrs) count on. Have you done any research on current methods of vocal emulation?
FlyingToaster, Apr 10 2011

       A little bit, but this is more a spin-off from [Vernon]'s random word idea. By interpolation, i mean between different arrangements of surfaces at different angles and so on. I don't mean bits of waves.
nineteenthly, Apr 10 2011

       Regarding modelling the tongue, isn't there a rather eerie talking robotic head that, rather than simulating speech by calculation and reproduction through a speaker, actually has a complex arrangement of vibrating cords, dynamically alterable resonators modelling the nasal and mouthal cavities, a tongue, teeth and lips? It sounds as though it's had an awful lot to drink.
zen_tom, Apr 12 2011

