
Why do neurons adapt? Or, the art of making acronyms

Sep 13, 2024

6 min read





To be verbally descriptive is to be informationally inefficient. All that space, time, and cognitive effort wasted on words! Most books only contain a few pages' worth of pure information. What a futile attempt to dress up words ever more extravagantly to cover up their inherent emptiness! This whole paragraph already takes up so much space, but informationally, I just want to say three words: words are redundant. And as we shall see, those three words really are too long as well.


Minimizing information seems unnecessary when the cost of transmission is trivial. Efficiency only becomes a problem when transmission is expensive. For example, medical students are already short on time; they don’t want to write out “gastrointestinal tract” or “electrocardiography” every time, so instead they just write “GIT” or “ECG”. To learn medicine is to learn a whole new language: Hx is "history", Sx is "symptom", Dx is "diagnosis", Tx is "treatment", Fx is "fracture", etc. The reason these words can be shortened is that they were redundant to begin with. If you use a phrase frequently enough, it can be encoded with a much shorter sequence of letters. Theoretically, a one-letter acronym can only specify 26 words, and to maximize their utility, those 26 words had better be the most frequently used ones. In reality, though, one-letter acronyms are not used much, because many of them are already taken by chemistry (elements, units, e.g. M = molar), so to avoid overlapping meanings and the resulting confusion, two- or three-letter acronyms are the most common. Two-letter acronyms can theoretically specify 26 × 26 = 676 words, and three-letter acronyms can specify 26 × 26 × 26 = 17,576 words. This in fact covers a good majority of the phrases used daily in the clinic. Using more letters is, again, redundant. For example, COPD (chronic obstructive pulmonary disease) is four letters by convention, but CPD would probably have done the job just as well. Then again, this is always a cost-benefit analysis: writing one extra letter is usually such minimal effort that COPD is forgiven.
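If you like seeing the counting made concrete, here is a quick Python sketch; the clinical shorthand table is purely illustrative, and real usage varies between hospitals.

```python
# A quick sanity check on how many distinct acronyms each length allows,
# assuming a plain 26-letter alphabet (no digits, no case distinction).
for n in (1, 2, 3):
    print(f"{n}-letter acronyms: {26 ** n}")
# 1-letter acronyms: 26
# 2-letter acronyms: 676
# 3-letter acronyms: 17576

# A few of the clinical shorthands mentioned above, as a simple lookup table.
shorthand = {"Hx": "history", "Sx": "symptom", "Dx": "diagnosis",
             "Tx": "treatment", "Fx": "fracture", "ECG": "electrocardiography"}
print(shorthand["Dx"])  # diagnosis
```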


In a word-guessing game, the difficult part is usually the first few letters. Once you get “app__”, it is much easier to guess “le” for the last two positions. This means the specifying power of a word is not equally distributed among its letters, but is concentrated at the beginning of the word, or at the syllable-defining letters (usually the consonants) in the case of English. So d Eng lang rlly cn b spcfd w/ mch fwr lttrs. Commonly used phrases can similarly be specified with far fewer letters, e.g. TBD = to be determined. Also, the more context you have, the easier it is to predict the correct ___. In short, information doesn’t really reside in the whole word, but only at very specific places. These specific places decrease the likelihood of the word being something else. If I ask you to “guess the three-letter word that starts with an A that I’m thinking of right now”, it’s impossible because there are too many candidates. In other words, being given “a__” is not informationally very helpful. But what about a three-letter word that starts with a Z? You’re probably thinking of zoo, zip, or perhaps zen. “z__” is informationally much more helpful. The reason consonants are more useful in specifying a word is that a particular sequence of consonants is usually less likely than a particular combination of vowels. The rarer it is, the better it specifies. Information is fundamentally connected to the probability of things: if a letter eliminates a good chunk of the probable words, that letter provides more information; if it leaves many potential words standing, it carries little information.
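To put a number on the “Z is more helpful than A” intuition, here is a small Python sketch; the letter frequencies are rough, corpus-dependent approximations, not exact values.

```python
import math

# Rough English letter frequencies (approximate; exact numbers vary by corpus).
letter_freq = {"e": 0.127, "t": 0.091, "a": 0.082, "z": 0.0007}

def surprisal_bits(p):
    """Information content of an event with probability p, in bits: log2(1/p)."""
    return math.log2(1 / p)

for letter in ("e", "a", "z"):
    print(f"'{letter}': {surprisal_bits(letter_freq[letter]):.1f} bits")
# 'e': ~3.0 bits, 'a': ~3.6 bits, 'z': ~10.5 bits
# The rare Z pins the word down far more than the common A or E.
```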


When Morse code was invented, transmission was still quite expensive (back then electricity was harder to come by), so being informationally redundant meant losing money. In Morse code, you can only talk in two symbols: a short signal (the dot) and a long one (the dash). So to specify 26 letters, one needs to use combinations of dots and dashes. If you try to talk real English this way, you will soon start noticing some patterns. For example, E and T show up very frequently in English, while X, Y, and Z show up rarely. So if you code every letter with the same number of symbols, the patterns for X, Y, and Z are barely used while those for E and T are used constantly. When each transmitted symbol costs money, you start feeling that Es and Ts, being so frequent, are not informationally “worth” a long pattern; you would prefer an “acronym” to keep things cheap. The situation is similar to a busy pulmonologist feeling lazy about writing “chronic obstructive pulmonary disease” every time. E and T are therefore specified with a single symbol each (one dot and one dash, respectively), while X, Y, and Z each take four. The generalization is that the more common a symbol is, the fewer letters it should take to represent it. In the sense that it saves letters, the more probable the symbol, the cheaper it is informationally. (One of the most efficient coding schemes we know today, Huffman coding, is built on very similar ideas.)
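For the curious, here is a minimal sketch of Huffman coding in Python, using a handful of approximate English letter frequencies; the exact numbers don’t matter, only their ranking.

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code from symbol frequencies: common symbols get short codes."""
    # Each heap entry: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Approximate English letter frequencies (per 1000 letters) for a few letters.
freqs = {"e": 127, "t": 91, "a": 82, "o": 75, "n": 67, "x": 2, "z": 1}
codes = huffman_codes(freqs)
for sym in ("e", "t", "z"):
    print(sym, codes[sym])
# 'e' and 't' get the shortest codewords, 'z' one of the longest,
# mirroring the one-symbol E and T versus the four-symbol X, Y, Z of Morse code.
```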

So information is closely tied to probability: if something is common, it tells us very little about the phenomenon, and its informational value is low; if something is rare, it says more about the phenomenon, and its informational value is high. This inverse relationship is critical, and to make it simpler to talk about, “low probability” is sometimes referred to as “surprisal”, so high information means high surprisal.


Sensory neurons adapt: prolonged stimulation leads to the neurons firing less and less, and the subjective experience of the sensation is attenuated. But why should this take place? The physics does not change, and the stimulus intensity remains constant. What changes is its informational value. Sensory information constantly updates the brain about our environment, and encoding the physics that’s going on takes metabolic energy. Initially, the intensity of the stimulus needs to be encoded, but if the stimulus is frequent or continuing, the informational value of encoding it becomes less and less, making neurons want to use “acronyms”. The neuronal firing rate drops to reflect the drop in informational value. Imagine someone screaming “Fire!” at the beginning of a fire, versus that person continually screaming “Fire!” after the firefighters have already arrived: the informational value unavoidably drops. We can also put it this way: at the beginning, few people are thinking about a fire, so screaming “Fire!” has high surprisal and therefore high information; later, when most people are already aware, “Fire!” is very much an expected reaction and thereby generates little surprisal. (Neuroscientists explain it this way: neural adaptation allows neurons to efficiently encode input statistics that vary in time (Wark et al., 2007). I guess/hope we are talking about the same thing.)
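To make the “neurons using acronyms” picture concrete, here is a toy sketch, not a biophysical model: the running probability estimate is an assumption made purely for illustration, and the “firing rate” simply tracks the surprisal of a stimulus that keeps repeating.

```python
import math

# Toy sketch: treat "firing rate" as proportional to the surprisal of the stimulus
# under a running estimate of how often that stimulus has occurred.
# Purely illustrative -- not a model of real neural adaptation dynamics.
counts = {"stimulus": 1, "other": 1}  # flat starting prior (Laplace smoothing)
for t in range(1, 11):
    counts["stimulus"] += 1                        # the same stimulus arrives again
    p = counts["stimulus"] / sum(counts.values())  # estimated probability so far
    rate = math.log2(1 / p)                        # surprisal in bits
    print(f"t={t:2d}  p={p:.2f}  'firing rate'={rate:.2f} bits")
# The rate falls as the stimulus becomes expected: the "Fire!" scream losing its news value.
```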


If you are more mathematically minded, here’s a more rigorous definition of information as described by Claude Shannon. Shannon’s information is inversely related to probability p(x), but to make it additive, Shannon took the log of it, i.e. log(1/p(x)). This describes each particular outcome x, e.g. one letter being A; when thinking about a whole source of outcomes, e.g. the distribution of letters in English, it is more useful to consider the average expected information. The average here is a weighted one: each outcome’s information is weighted by how often it occurs, i.e. multiplied by p(x), and then summed. This gives the aptly named Shannon entropy (which simply means the average expected information): H = ∑ p(x) log(1/p(x)). Notice one thing: Shannon’s information is simply a re-statement of probability! In fact, even the thermodynamic entropy can be rewritten in the same form, S = k ∑ p(x) ln(1/p(x)) (where k is Boltzmann’s constant and the log is natural), also revealing it to be a re-statement of probability. Both information and thermodynamic entropy, at their core, are manifestations of probabilities; if that doesn’t blow your mind, I don’t know what will.
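A minimal numerical sketch of that entropy formula, just to see the weighted average in action:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = sum over x of p(x) * log2(1/p(x)), in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A fair 26-letter alphabet: every letter equally likely.
uniform = [1 / 26] * 26
print(entropy_bits(uniform))   # log2(26) ≈ 4.70 bits per letter

# A skewed distribution (most of the mass on one symbol) carries far less per symbol.
skewed = [0.9] + [0.1 / 25] * 25
print(entropy_bits(skewed))    # ≈ 0.93 bits per letter
```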



Wark, B., Lundstrom, B. N., & Fairhall, A. (2007). Sensory adaptation. Current Opinion in Neurobiology, 17(4), 423–429.



