Lossy glossy

Thanks, [RayfordSteele]

I think glossolalia differs from natural language by not having log-normal distribution of sounds, syllables et caetera. Though i don't think it's language, this made me wonder if it could be thought of as losslessly compressed English, Spanish or whatever. Why would the angels have our linguistic universals?

So, consider the following process, which is not compression:

Write a text in Basic English. This text will probably have log-normal vocabulary distribution: relatively few words will be much more frequent than the majority of words in the text, and there will be a "long tail". This is, i think, a normal feature of natural language.

Remove all articles, occurrences of the present tense of "be", replace all gendered pronouns with "it", convert all possessives to "of" forms, express all plural nouns and all verbs in the continuous aspect by duplication, and always use "more" and "most" instead of "-er" and "-est" (except for "more" and "most"!). This may smooth the frequency distribution a bit.
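A rough Python sketch of a few of these substitutions, assuming a naive word-by-word pass over already-tokenised text. The word lists are illustrative only, and the possessive-to-"of" and duplication rules would need real grammatical analysis, so they are left out here.

ARTICLES = {"a", "an", "the"}
BE_PRESENT = {"am", "is", "are"}
GENDERED = {"he": "it", "she": "it", "him": "it"}

def smooth(words):
    out = []
    for w in words:
        lw = w.lower()
        if lw in ARTICLES or lw in BE_PRESENT:
            continue                      # drop articles and present-tense "be"
        out.append(GENDERED.get(lw, lw))  # neuter the gendered pronouns
    return out

print(smooth("the dog is chasing him".split()))   # ['dog', 'chasing', 'it']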

Encode the words as ten-bit binary numbers. Make this into a string of bits and slice it into byte-sized segments.
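A minimal sketch of the packing step, assuming a vocabulary of at most 1024 words so that each word fits into ten bits; the toy vocabulary is just for illustration.

def pack_ten_bit(codes):
    bits = "".join(format(c, "010b") for c in codes)     # ten bits per word
    bits += "0" * (-len(bits) % 8)                       # pad to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

vocab = ["dog", "chasing", "it"]                         # toy vocabulary
packed = pack_ten_bit([vocab.index(w) for w in ["dog", "chasing", "it"]])
print(list(packed))                                      # [0, 0, 16, 8]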

Count the frequency of the bytes and re-encode the same text again, reversing the byte frequency, so for example if the most frequent byte turns out to be ten and the rarest eleven, swap them over. Scramble this text pseudorandomly.
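One possible reading of this step, sketched in Python: rank the byte values by frequency, swap the most frequent with the least frequent, the second with the second-to-last, and so on, then shuffle with a seeded (and therefore undoable) pseudorandom generator. The seed value is arbitrary, and `packed` comes from the sketch above.

import random
from collections import Counter

def reverse_frequencies(data):
    ranked = [b for b, _ in Counter(data).most_common()]
    swap = dict(zip(ranked, reversed(ranked)))
    return bytes(swap[b] for b in data)

def scramble(data, seed=19):
    rng = random.Random(seed)          # fixed seed so the scramble can be undone
    order = list(range(len(data)))
    rng.shuffle(order)
    return bytes(data[i] for i in order)

decoy = scramble(reverse_frequencies(packed))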

Intersperse the bytes of the two versions alternately, so that only odd-numbered bytes correspond to sense.
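A sketch of the interleaving, reusing the `packed` and `decoy` streams from the sketches above and assuming they are the same length: meaningful bytes take the odd-numbered (1st, 3rd, ...) positions, decoy bytes fill the even ones.

def intersperse(meaningful, decoy):
    out = bytearray()
    for m, d in zip(meaningful, decoy):
        out += bytes([m, d])           # sense, then noise, alternately
    return bytes(out)

mixed = intersperse(packed, decoy)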

Convert the bytes back into words by using a series of two hundred and fifty-six consonant-vowel or consonant-diphthong syllables.
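A sketch of the syllable table, assuming sixteen consonants crossed with sixteen vowels or diphthongs to give exactly 256 consonant-vowel syllables, one per possible byte value; the particular consonant and vowel inventories are made up.

CONSONANTS = list("ptkbdgmnszfvrlwh")                          # 16 consonants
VOWELS = ["a", "e", "i", "o", "u", "ai", "au", "ei",
          "ia", "ie", "io", "oa", "oi", "ou", "ua", "ui"]      # 16 vowel nuclei
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]        # 256 syllables

def speak(data):
    return "".join(SYLLABLES[b] for b in data)

print(speak(bytes([0, 16, 8])))                                # 'patapia'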

Read out the result.
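Putting the sketches above together, under the same toy assumptions:

utterance = speak(mixed)
print(utterance)        # a pronounceable run of consonant-vowel syllables to read out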

nineteenthly, Jan 08 2012





Annotation:







       //Encode the words as ... binary numbers ... re-encode the same text again//   

       Can I suggest a Huffman code?
Wrongfellow, Jan 08 2012
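A minimal sketch of what [Wrongfellow]'s Huffman suggestion might look like over the word stream, as opposed to a flat ten bits per word: frequent words get short bit codes and rare words long ones. The example sentence is arbitrary.

import heapq
from collections import Counter

def huffman_codes(words):
    heap = [(n, i, {w: ""}) for i, (w, n) in enumerate(Counter(words).items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {w: "0" + code for w, code in c1.items()}
        merged.update({w: "1" + code for w, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_codes("it go it it store go shop".split()))
# {'it': '0', 'go': '10', 'store': '110', 'shop': '111'}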
  

       You can but that would necessitate me trying to understand that again. This is only contingently compression - it could be longer instead if preferred. Therefore, if Huffman coding is essentially compressive it wouldn't be relevant. I shall Google.
nineteenthly, Jan 08 2012
  

       OK, yes, i was going to go with something like that when i was mainly concerned with compression, and in fact i may go back to it. I wonder how efficient it would be.
nineteenthly, Jan 08 2012
  

       [MaxwellBuchanan] threw down that gauntlet a while ago and your humble servant did it in rhyme.
mouseposture, Jan 08 2012
  

       Thanks. These are things i've heard of but on which i haven't read. I do think linguistic universals and deep grammar might be useful in this respect but i can't currently think of how.
nineteenthly, Jan 08 2012
  

       so the idea is to translate something into Chinese then verbalize it?
FlyingToaster, Jan 08 2012
  

       Chinese syllables can end in consonants. You may be thinking of normal language as opposed to the one we're using now. Japanese is an example of a normal language.
nineteenthly, Jan 08 2012
  

       I was thinking in the manner of ideographs (if that's the right word): symbols that represent an entire idea, rather than nouns, verbs, etc. Your trimming down of the English language sort of heads in that direction. What you've missed out (I think) is word substitution, i.e. "I went to the shop" could be compressed to "I go store", as could "I travelled to the store", "I went to the mall", etc.
FlyingToaster, Jan 08 2012
  

       I had more than i posted, [FT]. However, whereas i could concentrate on merely simplifying the English language, which might be worthwhile if not liable to widespread adoption, the reasons for the modifications i suggest are not to simplify so much as smooth out frequency distributions. As it stands, English uses "he", "she" and "it" a lot, "my", "your" and "her" a lot, and so on. If every genitive occurrence of "her" was replaced by "of it", that would increase the frequency of "it" and "of" while confining "her" to the objective usage. This is why i want to repeat words for the continuous tenses and plurals - it would increase the frequency of words which are rarer than pronouns, conjunctions and the like.   

       The problem with the frequency distribution of this idea as it stands is that it would probably be bimodal rather than having a flat distribution graph. Making the verbs and nouns more frequent and certain other common words rarer blunts the peaks and pushes them closer.
nineteenthly, Jan 09 2012
  

       Dibs on creating the much more efficient nickname: 'Lossy glossy.'
RayfordSteele, Jan 09 2012
  

       [Bigsleep] helpfully suggests a lossless compression system, but I don't think even that's required - you could run the text (or sections of the text) to be glossolalified through the MD5 algorithm, leaving you with similar results - better in one sense as you've more control over the output size. Since you're scrambling and pseudorandomifying it anyway, you're obviously no longer interested in any embedded meaning - so to be honest - it doesn't really matter whether you use a lossless, lossy or completely random algorithm. Frequency distribution of output hexadecimal values should be fairly smooth, and if you wanted to, you can compress or expand into almost as many individual consonant/diphthong symbols as you like.
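A small sketch of this hashing alternative using Python's standard hashlib: MD5 of each chunk of text yields sixteen bytes with a roughly uniform value distribution, which could be fed straight to a syllable table. Unlike the encoding in the idea, a hash cannot be translated back. The chunk size here is arbitrary.

import hashlib

def glossolalify_by_hash(text, chunk_words=5):
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    # concatenate the 16-byte MD5 digest of each chunk
    return b"".join(hashlib.md5(c.encode("utf-8")).digest() for c in chunks)

digest_bytes = glossolalify_by_hash("dog chasing it store of it go")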

       What effect are you trying to achieve here, or is it a veiled exercise in suggesting that glossolalists in general tend towards silliness?
zen_tom, Jan 09 2012
  

       It's inspired by their silliness but would also be a genuine form of communication because with some kind of table included it could be translated back into a rather odd form of English. The pseudorandom bit can be extracted - it's only either odd or even syllables.   
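A rough sketch of that translation back, assuming the hearer has the toy vocabulary and the interleaved byte stream from the earlier sketches: keep only the odd-numbered bytes, then unpack the ten-bit word codes.

def decode(data, vocab):
    meaningful = data[0::2]                         # odd-numbered (1st, 3rd, ...) bytes
    bits = "".join(format(b, "08b") for b in meaningful)
    usable = len(bits) - len(bits) % 10             # ignore the padding bits at the end
    codes = [int(bits[i:i + 10], 2) for i in range(0, usable, 10)]
    return [vocab[c] for c in codes]

print(decode(mixed, vocab))                         # ['dog', 'chasing', 'it']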

       What i'm kind of getting at is that whereas i'm confident that glossolalia is not a natural language (though it communicates certain things other than words, and what it communicates would depend on one's religious or sociological beliefs), the fact that its frequency distribution is unlike that of any human language, or even birdsong or whalesong i think, doesn't necessarily imply it's not a language. I can easily imagine a species which precompresses its language, says what it needs to say and has it decompressed by a hearer of the same species, and that signal needn't have that kind of frequency distribution at all. So this is more about possible alien languages than real or pretend human or angelic tongues.
nineteenthly, Jan 09 2012
  

       It all sounds very right brain. You sure this is abstraction and not distraction?
reensure, Jan 09 2012
  

       Most things i stick on here are distraction. That's why i do it.
nineteenthly, Jan 10 2012
  


 
