Halfbakery: Speech to text

I recently read //low end speech recognition// by jutta (link 1) and thought it was brilliant in its own right, and that it would make an excellent front end for something I was going to post at some point.
The idea was to run the voice recognition software only to the point of recognizing the uttered sounds, then print them out without any higher-level processing, leaving the human recipient to make sense of it. Hippo suggested using the phonetic alphabet, to which I would add an allowance for uncertainty, so a sound might be recorded as 70% d and 30% t.
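
To make the "70% d, 30% t" notion concrete, here is a minimal sketch of how uncertain sounds might be stored and printed. The slot format, the `normalise` and `render` helpers, and the 20% cut-off are all my own assumptions for illustration, not anything taken from jutta's idea or from real recognition software.

```python
from typing import Dict, List

# Hypothetical format: one "slot" per uttered sound, mapping each
# candidate phoneme to the recogniser's confidence in it.
Slot = Dict[str, float]


def normalise(slot: Slot) -> Slot:
    """Rescale the confidences in a slot so that they sum to 1."""
    total = sum(slot.values())
    if total <= 0:
        raise ValueError("slot has no probability mass")
    return {phoneme: value / total for phoneme, value in slot.items()}


def render(slots: List[Slot], threshold: float = 0.2) -> str:
    """Print each slot as its likely phonemes, most probable first.

    Candidates below the (assumed) threshold are dropped.
    """
    parts = []
    for slot in slots:
        kept = [(p, v) for p, v in normalise(slot).items() if v >= threshold]
        kept.sort(key=lambda pv: -pv[1])
        parts.append("/".join(f"{p}:{round(v * 100)}%" for p, v in kept))
    return " ".join(parts)


# A made-up three-sound utterance, roughly "dog" with some doubt.
utterance = [{"d": 0.7, "t": 0.3}, {"o": 1.0}, {"g": 0.9, "k": 0.1}]
print(render(utterance))  # d:70%/t:30% o:100% g:90%
```

The point of keeping the alternatives rather than picking a winner is that a later stage can still choose the "t" reading if the surrounding context favours it.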

Instead of printing this out for a human being to deal with, I intend to use it as the input to a different type of speech-to-text engine. Before someone reads the title and annos that speech to text is well baked and widely known, let me just say: not in this flavour it isn't! And what is out there is only good enough that it is mainly used by those for whom typing is not a practical option.
If this idea works as badly as it might, it will generate texts about as coherent and meaningful as A Whiter Shade of Pale by Procol Harum or a set of David Bowie lyrics, which is a slight improvement on the state of the art of a few years ago : ).
If it works as well as I would hope, the output will be a block of text with perfect spelling and grammar that expresses the meaning of the original speech, though not necessarily using the original words. You may have gathered from this that it would be difficult to do on the fly: first the speech would go in, then the text would come out.

1 ) Blocks of 1 to 7 syllables are translated into a monosyllabic ideographic language, such as old Chinese or Vietnamese (the modern forms of both are not monosyllabic). The resulting logograms have semantic meaning. A number of small words, such as "the", do not have an equivalent in CHINESE, so not all of the syllables will be translated, and many will come through as belonging to several different words.

2 ) Use probability tables of which logograms occur in combination to identify the best fits in the new language. There is a slight problem with this, because the characters are in the wrong order from a Chinese point of view. There are two obvious solutions: a ) create a probability table from scratch, by running a large amount of English text through the full process; or b ), and this is the one I prefer, create a set of extra texts by treating each character as being one or two places either side of its recorded position, and then use a probability table derived from CHINESE literature.

3 ) The three or four best fits are translated back into English, and a human intelligence cuts and pastes from the available choices to produce a block of text that expresses the meaning of the original speech.

4 ) Use information about which bits make it into the finished document to adjust the probabilities: both the CHINESE probability table for all users of English, and the probability of a particular sound being a particular syllable for each individual speaker.
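
Steps 2 b ) and 4 can be sketched roughly as follows. This is only an illustration under my own assumptions: the logograms are stand-in tokens ("A", "B", "C"), the table counts adjacent pairs only, the smoothing and the `vocab_size` of 1000 are arbitrary, and `jittered_orders`, `BigramTable` and `best_fits` are hypothetical names. Trying every permutation is only sensible because the blocks are at most 7 syllables long.

```python
import math
from collections import defaultdict
from itertools import permutations


def jittered_orders(symbols, max_shift=2):
    """Yield every reordering in which no symbol moves more than
    max_shift places from its recorded position (step 2 b)."""
    n = len(symbols)
    for perm in permutations(range(n)):
        if all(abs(new - old) <= max_shift for new, old in enumerate(perm)):
            yield tuple(symbols[i] for i in perm)


class BigramTable:
    """Counts of which symbol follows which: a very crude stand-in
    for a probability table derived from CHINESE literature."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, sequence, weight=1):
        # Also used for step 4: text the human keeps is fed back in.
        for a, b in zip(sequence, sequence[1:]):
            self.counts[a][b] += weight

    def log_score(self, sequence, vocab_size=1000):
        # Add-one smoothing so unseen pairs are unlikely, not impossible.
        score = 0.0
        for a, b in zip(sequence, sequence[1:]):
            row = self.counts.get(a, {})
            total = sum(row.values())
            score += math.log((row.get(b, 0) + 1) / (total + vocab_size))
        return score


def best_fits(symbols, table, k=3, max_shift=2):
    """Return the k highest-scoring reorderings (input to step 3)."""
    candidates = set(jittered_orders(symbols, max_shift))
    return sorted(candidates, key=table.log_score, reverse=True)[:k]


table = BigramTable()
for _ in range(5):
    table.observe(("A", "B", "C"))   # pretend corpus: A B C is common

choices = best_fits(("B", "A", "C"), table)  # recorded in "English" order
chosen = choices[0]                  # pretend the human picked the top fit
table.observe(chosen)                # step 4: reinforce what was kept
```

The per-speaker half of step 4 (how likely a given sound is to be a given syllable) would need its own table, updated the same way, which is left out here.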

By CHINESE (capitals) I mean a suitable language, possibly even ancient cuneiform?
Over 80% of Chinese characters have a phonetic component; this in no way detracts from its use.