psycholinguistic compression

bringing lossy compression home
Psychoacoustic compression has worked wonders for music storage: just filter out all the bits that the human hearing system doesn't notice anyway, and voila, smaller footprint music.

If we can apply the same to text and the spoken word... just think of the benefits! Trees could be saved, bookstores could carry a lot more stock. Newspapers would be a single sheet. Many forms of paperwork would shrink considerably or disappear altogether. Anywhere where words take up space would be an opportunity to make significant savings.

conskeptical, Oct 15 2008


       I think you have invented texting. The problem with texting is that it permits freeform evolution of the language. Texting needs a Noah Webster like you, cons, to formulate and codify the new way.   

       As my humble contribution I suggest the substitution of all vowels with i, because it is skinnier.
bungston, Oct 15 2008

       "i" is skinnier, but "a" is more versatile because you can turn it upside down and make an "e".
phoenix, Oct 15 2008

       As my hmbl cntrbtn, I sggst th rmvl of all vwls tht arnt frst.
Spacecoyote, Oct 15 2008
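       Spacecoyote's rule (drop every vowel except one that starts a word) is mechanical enough to sketch. This is a minimal Python version that assumes "vowel" means only a, e, i, o, u (not y) and treats an apostrophe as part of a word. On Spacecoyote's sentence it reproduces his abbreviation, except that it keeps the apostrophe in "arn't":

```python
import re

VOWELS = "aeiouAEIOU"


def drop_non_initial_vowels(text: str) -> str:
    """Remove every vowel that is not the first letter of its word."""

    def squeeze(match):
        word = match.group(0)
        # Keep the first character unconditionally; strip vowels from the rest.
        return word[0] + "".join(ch for ch in word[1:] if ch not in VOWELS)

    # Words are runs of letters and apostrophes, so "aren't" stays one word.
    return re.sub(r"[A-Za-z']+", squeeze, text)


print(drop_non_initial_vowels(
    "As my humble contribution, I suggest the removal of all vowels that aren't first."
))
```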

       All you need is one dictionary, numbering all words consecutively; about 10,000 should do it, meaning that each word is a maximum of five digits. Now switch to a base-36 system (digits 0-Z), and you're down to three characters per word.
MaxwellBuchanan, Oct 15 2008
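       The arithmetic holds up: 36^2 = 1,296 codes are too few, but 36^3 = 46,656 comfortably cover a 10,000-word dictionary. Here is a rough Python sketch of the scheme. The six-word dictionary is a hypothetical stand-in for the real list. Every word gets a fixed-width, three-character code, so no separator is needed between words:

```python
import string

# The 36 "digits" of the scheme: 0-9 followed by A-Z.
DIGITS = string.digits + string.ascii_uppercase
WIDTH = 3  # 36**3 = 46,656 codes, enough for a ~10,000-word dictionary

# Hypothetical toy dictionary; a real one would list ~10,000 words.
DICTIONARY = ["a", "cat", "mat", "on", "sat", "the"]
INDEX = {word: i for i, word in enumerate(DICTIONARY)}


def to_base36(n: int, width: int = WIDTH) -> str:
    """Write a non-negative word number as a fixed-width base-36 code."""
    if not 0 <= n < 36 ** width:
        raise ValueError(f"{n} does not fit in {width} base-36 characters")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 36)
        chars.append(DIGITS[rem])
    return "".join(reversed(chars))


def encode(sentence: str) -> str:
    # Fixed-width codes mean no separator is needed between words.
    return "".join(to_base36(INDEX[w]) for w in sentence.lower().split())


def decode(codes: str) -> str:
    # int(..., 36) accepts the digits 0-9 and the letters A-Z.
    return " ".join(DICTIONARY[int(codes[i:i + WIDTH], 36)]
                    for i in range(0, len(codes), WIDTH))


print(encode("the cat sat on the mat"))
```

       This toy version discards capitalization and punctuation, and any word missing from the dictionary raises a KeyError.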

       Redundancy plays an important part in language.
Voice, Oct 16 2008

       Are we talking about the Reader's Digest method of compression, the CliffsNotes method of compression or the simple illiteracy of the dark ages? I think we can hit a healthy balance somewhere between "A Tale of Two Cities" and "Newsweek International Section" without destroying the richness of language.
WcW, Oct 16 2008

       sp. "viola, smaller footprint music". Check out Lojban, which has similar goals - low redundancy and ambiguity but less unstrongful than Newspeak.
spidermother, Oct 16 2008

       [Spccyt] For vowel removal I suggest Teeline shorthand.
hattiel, Oct 16 2008

       wonder if anybody's done speedreading comparisons in different written language types.
FlyingToaster, Oct 16 2008

       Redundancy plays an important part in language.
lostdog, Oct 16 2008

       Did somebody say that redundancy plays a significant role in language?
neelandan, Oct 16 2008

       Yes. You're fired.
8th of 7, Oct 16 2008

       [hattiel] - I read [Spccyt] as 'Specky twat'. (No offence intended, [Spccyt]!)
Jinbish, Oct 16 2008

       What important part does redundancy play in language? I buy this for spoken language, where you get one shot to grok what comes at you and so it helps to have late reiteration of what came first. But written language? You can redund that yourself by reading it again.
bungston, Oct 17 2008

       In reading, one does not read words letter by letter. The words get guessed at by the shapes they form and the context they appear in. Minor spelling errors which do not change the shapes of the words will often escape detection.   

       Which is why there are proofreaders.   

       And angry letters to the editors when one of them goofs up in a newspaper. There was an amusing incident in one I suscribe to when an article purporting to illustrate certain common mis-spellings in English ... didn't.
neelandan, Oct 17 2008

       //one I suscribe // Oh, the ferrousness.
coprocephalous, Oct 17 2008

       You are hired. As my proofreader, I mean.   

       (sp. subscribe)
neelandan, Oct 17 2008

       E-paper and cheap memory will eliminate the need for trees soon enough. I certainly don't want to re-learn written language when I haven't really learned it in the first place. I hate texting and urge everyone to get a phone with a QWERTY keyboard. Abbreviations and acronyms are doing what they can to compress language, but one thing no one is mentioning is that written language bleeds into spoken language. Have you heard people say "LOL", "BTW" or "XOXO"? It is beyond annoying, and I'm American, one of a people who casually butcher the English language. I would think this would drive the Brits insane.
MisterQED, Oct 17 2008

       Maybe you should start. Don't you want that empire back?
bungston, Oct 17 2008

       A way of cutting down word length without affecting the content would be to add more letters to our alphabet, like having an English version of the katakana alphabet. Having single characters for sounds that include both a vowel and a consonant would make reading and writing much harder to learn, but a lot more compact.
mitxela, Oct 20 2008

       Following that idea to its logical conclusion, it's a well-known phenomenon that many "real" programmers can "read" hexadecimal code as ASCII text. Which suggests that there's a case for encoding phonemes into 8-bit values. You'd end up with a 256-character "alphabet" expressed as nibble pairs ... only 16 actual symbols. That could be workable.   

       // niche oriented high skilled networking of the people of globalisation 2.0. // [Ian] ... you're spending too much time with the marketing department.
8th of 7, Oct 20 2008
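       The nibble-pair idea can be sketched in a few lines. Each phoneme gets an 8-bit value, and each value is written as two hexadecimal digits, so an "alphabet" of up to 256 codes needs only 16 distinct symbols on the page. The phoneme inventory and the byte assigned to each phoneme below are hypothetical placeholders, not any real phonetic standard:

```python
# Hypothetical phoneme inventory; the names and the byte assigned to each
# (its position in this list) are invented purely for illustration.
PHONEMES = ["k", "ae", "t", "s", "ih", "p"]
TO_BYTE = {ph: i for i, ph in enumerate(PHONEMES)}  # must stay within 0..255


def encode(phonemes):
    # One byte per phoneme, rendered as two hex nibbles (symbols 0-9, A-F).
    return bytes(TO_BYTE[p] for p in phonemes).hex().upper()


def decode(hex_text):
    # bytes.fromhex reads each pair of hex digits back into one byte.
    return [PHONEMES[b] for b in bytes.fromhex(hex_text)]


print(encode(["k", "ae", "t"]))
```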

       just talk faster; works for auctioneers.
FlyingToaster, Oct 20 2008

       Isn't this also called inattention?
reensure, Oct 21 2008

       Sorry, what was that?
Jinbish, Oct 21 2008

       //just talk faster; works for auctioneers//   

       And horse race announcers.
theleopard, Oct 21 2008

       Bibliographical compression - If we put all of our books into a digital library, and extracted the text so that it was one long string, the likelihood would be that any combination of words that you might be intending to use exists at point x in THE TEXT - all you'd have to do is specify where in THE TEXT you wanted to lift words, and how much to lift - like this:
Mid(THE TEXT,12345,646210)
Which would lift out 646210 characters from location 12345. The actual compression ratios could be massive, assuming that you're saying something that someone else has said before.
zen_tom, Oct 21 2008
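       A rough Python sketch of "bibliographical compression" follows. A message shrinks to a (start, length) reference into a shared corpus, which is similar in spirit to the back-references used by LZ77-family compressors. It assumes that Mid behaves like the BASIC function of that name (1-based start position, then a character count). The short corpus string is a hypothetical stand-in for THE TEXT:

```python
def compress(message: str, corpus: str):
    """Return a (start, length) reference into corpus, or None if absent."""
    pos = corpus.find(message)
    if pos == -1:
        return None  # nobody has said this before: no reference possible
    # 1-based start, matching the Mid(text, start, length) convention.
    return (pos + 1, len(message))


def mid(text: str, start: int, length: int) -> str:
    """BASIC-style Mid: 1-based start position, fixed number of characters."""
    return text[start - 1:start - 1 + length]


# Hypothetical stand-in for THE TEXT.
CORPUS = "It was the best of times, it was the worst of times."

ref = compress("the worst of times", CORPUS)
print(ref, mid(CORPUS, *ref))
```

       This only pays off when writing the two numbers takes fewer characters than the phrase itself, and any message that appears nowhere in the corpus cannot be encoded at all.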

       Mid(THE TEXT,12345,646210)
*** malloc[9351]: error for text 0x12345: Incorrect check sum for freed text.

       "Hmm, that's odd. I'll just look up error number 9351..." <flicks through manual> "Ah. Here we are: malloc error 9351... 'returned phrase was utter bollocks' "
Jinbish, Oct 21 2008


