I think glossolalia differs from natural language in not
having a log-normal distribution of sounds, syllables et
cetera. Though i don't think it's a language, this made me
wonder if it could be thought of as losslessly compressed
English, Spanish or whatever. Why would the angels speak an uncompressed language?
So, consider the following process, which is not necessarily compression as such:
Write a text in Basic English. This text will probably have a
log-normal vocabulary distribution: relatively few words
will be much more frequent than the majority of words in
the text, and there will be a "long tail". This is, i think, a
normal feature of natural language.
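A quick way to see that long tail is just to count words; a throwaway Python sketch (the sample sentence is a stand-in, not part of the idea):

    from collections import Counter

    text = "the cat sees the dog and the dog sees the cat again"
    counts = Counter(text.split())
    for word, n in counts.most_common():
        print(word, n)
    # "the" towers over everything else, and most words occur only
    # once or twice - the long tail the idea relies on.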
Remove all articles and all occurrences of the present tense of
"be", replace all gendered pronouns with "it", convert all
possessives to "of" forms, express all plural nouns and all
verbs in the continuous aspect by duplication, and always
use "more" and "most" instead of "-er" and "-est" (except
for "more" and "most"!). This may smooth the frequency
distribution a bit.
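A rough sketch of those substitutions in Python, covering only the easy, regular cases; the possessive rewrite is naive, and duplication of plurals and continuous verbs would need a word list, so they're only noted in a comment:

    import re

    def smooth(text):
        # Drop articles and the present tense of "be".
        text = re.sub(r"\b(a|an|the|am|is|are)\b", "", text, flags=re.IGNORECASE)
        # Replace gendered pronouns with "it".
        text = re.sub(r"\b(he|she|him)\b", "it", text, flags=re.IGNORECASE)
        # Naive possessive rewrite: "cat's tail" -> "tail of cat".
        text = re.sub(r"\b(\w+)'s (\w+)", r"\2 of \1", text)
        # "-er"/"-est" -> "more"/"most" and duplication for plurals and
        # the continuous aspect need a lexicon, so they are omitted here.
        return re.sub(r"\s+", " ", text).strip()

    print(smooth("The dog is running and he sees the cat's tail"))
    # -> "dog running and it sees tail of cat"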
Encode the words as ten-bit binary numbers. Make this
into a string of bits and slice it into byte-sized segments.
Count the frequency of the bytes and re-encode the
text again, reversing the byte frequency; so, for example,
if the most frequent byte turns out to be ten and the rarest
eleven, swap them over. Scramble this text with
pseudorandom bytes: intersperse the numbers alternately, so that only odd-
numbered bytes correspond to sense.
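Put together, the bit-slicing, frequency reversal and interleaving might look like this; a sketch only, with an arbitrary toy lexicon:

    import random
    from collections import Counter

    def encode(words, lexicon):
        # Each word becomes a ten-bit number via its lexicon entry.
        bits = "".join(format(lexicon[w], "010b") for w in words)
        bits += "0" * (-len(bits) % 8)   # pad to whole bytes
        data = [int(bits[i:i+8], 2) for i in range(0, len(bits), 8)]

        # Reverse the frequency ranking: the commonest byte value is
        # re-coded as the rarest, and vice versa.
        ranked = [b for b, _ in Counter(data).most_common()]
        swap = dict(zip(ranked, reversed(ranked)))
        data = [swap[b] for b in data]

        # Intersperse pseudorandom bytes so that only the odd-numbered
        # (1st, 3rd, ...) bytes carry sense.
        out = []
        for b in data:
            out.extend([b, random.randrange(256)])
        return out

    lexicon = {"dog": 1, "run": 2, "see": 3}   # toy ten-bit lexicon
    print(encode(["dog", "run", "run", "see"], lexicon))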
Convert the bytes back into words by using a series of two
hundred and fifty-six consonant-vowel or consonant-diphthong
syllables, one per byte value. Read out the result.
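One way to get exactly two hundred and fifty-six open syllables is sixteen consonants crossed with sixteen vowels or diphthongs; the inventory below is an arbitrary choice, not part of the idea:

    # 16 consonants x 16 nuclei = 256 syllables, one per byte value.
    CONSONANTS = list("ptkbdgmnszfvlrwj")
    NUCLEI = "a e i o u ai au ei oi ou ia ie io iu ua ue".split()

    def syllabify(data):
        return " ".join(CONSONANTS[b // 16] + NUCLEI[b % 16] for b in data)

    print(syllabify([0, 17, 255]))   # -> "pa te jue"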
||//Encode the words as ... binary numbers ... re-encode the same text again//
||Can I suggest a Huffman code?
||You can but that would necessitate me trying to
understand that again. This is only contingently
compression - it could be longer instead if preferred.
Therefore, if Huffman coding is essentially
compressive it wouldn't be relevant. I shall Google.
||OK, yes, i was going to go with something like that
when i was mainly concerned with compression, and
in fact i may go back to it. I wonder how efficient it would be.
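For anyone wanting to try it, the textbook word-level Huffman construction goes something like this (nothing here is from the thread; it's just the standard heap-based method):

    import heapq
    from collections import Counter

    def huffman_codes(text):
        counts = Counter(text.split())
        # Heap entries: (total count, tiebreaker, partial codebook).
        heap = [(n, i, {w: ""}) for i, (w, n) in enumerate(counts.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            n1, _, c1 = heapq.heappop(heap)
            n2, _, c2 = heapq.heappop(heap)
            merged = {w: "0" + code for w, code in c1.items()}
            merged.update({w: "1" + code for w, code in c2.items()})
            heapq.heappush(heap, (n1 + n2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    codes = huffman_codes("the cat sees the dog and the dog sees the cat")
    for w, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
        print(w, code)   # frequent words get the short codes

Frequent words get short codes, which is the opposite of the flattening above: Huffman exploits the skew in a text where this idea deliberately blunts it.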
||[MaxwellBuchanan] threw down that gauntlet a
while ago and your humble servant did it in rhyme.
||Thanks. These are things i've heard of but on which i
haven't read. I do think linguistic universals and
deep grammar might be useful in this respect but i
can't currently think of how.
||so the idea is to translate something into Chinese then verbalize it?
||Chinese syllables can end in consonants. You may be
thinking of normal language as opposed to the one
we're using now. Japanese is an example of a normal
language whose syllables are almost all open.
||I was thinking in the manner of ideographs (if that's the right word): symbols that represent an entire idea, rather than nouns, verbs, etc. Your trimming down of the English language sort of heads in that direction. What you've missed out (I think) is word substitution, i.e. "I went to the shop" could be compressed to "I go store", as could "I travelled to the store", "I went to the mall", etc.
||I had more than i posted, [FT]. However, whereas
i could concentrate on merely simplifying the
English language, which might be worthwhile if
not liable to widespread adoption, the reasons for
the modifications i suggest are not to simplify so
much as to smooth out frequency distributions. As it
stands, English uses "he", "she" and "it" a lot, "my",
"your" and "her" a lot, and so on. If every genitive
occurrence of "her" were replaced by "of it", that
would increase the frequency of "it" and "of" while
confining "her" to the objective usage. This is
why i want to repeat words for the continuous
tenses and plurals - it would increase the
frequency of words which are rarer than pronouns,
conjunctions and the like.
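That shift can be seen with a throwaway count (the sentence is a made-up sample, and the genitive "her" is rewritten by hand, since spotting it properly needs parsing):

    from collections import Counter

    before = "it saw her dog and her cat near her"
    after = "it saw dog of it and cat of it near her"   # genitive "her" -> "of it"

    print(Counter(before.split()))   # "her" appears 3 times
    print(Counter(after.split()))    # "it" and "of" rise; "her" drops to 1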
||The problem with the frequency distribution of
this idea as it stands is that it would probably be
bimodal rather than having a flat distribution
graph. Making the verbs and nouns more frequent
and certain other common words rarer blunts the
peaks and pushes them closer.
||Dibs on creating the much more efficient nickname: 'Lossy glossy.'
||[Bigsleep] helpfully suggests a lossless
compression system, but I don't think even that's
required - You could run the text (or sections of
the text) to be glossolalified through the MD5
algorithm, leaving you with similar results - better
in one sense as you've more control over the
output size. Since you're scrambling and
pseudorandomifying it anyway, you're obviously no
longer interested in any embedded meaning - so
to be honest - it doesn't really matter whether
you use a lossless, lossy or completely random
algorithm. Frequency distribution of output
hexadecimal values should be fairly smooth, and if
you wanted to, you can compress or expand into
almost as many individual consonant/diphthong
symbols as you like.
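A sketch of that MD5 route; note that unlike the table-based scheme it is strictly one-way, since a hash can't be translated back (the syllable list is an arbitrary choice, one per hex digit):

    import hashlib

    SYLLABLES = "ba be bi bo da de di do ka ke ki ko ma me mi mo".split()

    def glossolalify(text):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        return " ".join(SYLLABLES[int(h, 16)] for h in digest)

    print(glossolalify("I went to the shop"))   # 32 syllables, fairly flat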
||What effect are you trying to achieve here, or is it
a veiled exercise in suggesting that glossolalists
in general tend towards silliness?
||It's inspired by their silliness but would also be a genuine form of communication because with some kind of table included it could be translated back into a rather odd form of English. The pseudorandom bit can be extracted - it's only either odd or even syllables.
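Extracting the sense from an utterance is indeed mechanical; a sketch reusing the syllable inventory from the earlier sketch (undoing the byte swap and the ten-bit word lookup would then need the included table):

    CONSONANTS = list("ptkbdgmnszfvlrwj")
    NUCLEI = "a e i o u ai au ei oi ou ia ie io iu ua ue".split()
    BYTE_OF = {CONSONANTS[b // 16] + NUCLEI[b % 16]: b for b in range(256)}

    def extract(utterance):
        # Odd-numbered syllables (1st, 3rd, ...) carry sense; the even
        # ones are the interspersed pseudorandom padding.
        return [BYTE_OF[s] for s in utterance.split()[::2]]

    print(extract("pa jue te jue do jue"))   # -> [0, 17, 67]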
||What i'm kind of getting at is that whereas i'm confident glossolalia is not a natural language (though it communicates certain things other than words, and what it communicates would depend on one's religious or sociological beliefs), the fact that its frequency distribution is not like that of any human language, or even birdsong or whalesong i think, doesn't necessarily imply it's not a language. I can easily imagine a species which precompresses its language, says what it needs to say and has it decompressed by a hearer of the same species, and that signal needn't have that kind of frequency distribution at all. So this is more about possible alien languages than real or pretend human or angelic tongues.
||It all sounds very right brain. You sure this is abstraction and not distraction?
||Most things i stick on here are distraction. That's
why i do it.