We’ve seen not only that there is a potent evolutionary rationale for norepinephrine (NE) becoming the neuromodulator regulating attention, but that our hypothesis about its role gives us remarkable insight into the behavior of our earliest vertebrate ancestors. But you probably don’t have any friends with file drawers full of unfinished projects who are also ray-finned fishes, let alone sharks. So let’s see what sense our hypothesis makes of human behavior.
And let’s forget our NE hypothesis for a moment and just start with DA. We’ve proposed that it turns on phenomenal consciousness, especially the experience of emotion. This means that high-DA people are passionate and low-DA people are dispassionate. (This explains, incidentally, why highly opinionated people tend to talk loudly and gesture with their hands: remember that DA also controls motor activity through a second pathway.) In terms of cognition, high-DA people tend to be highly emotionally invested in the things they’re thinking about, while low-DA people tend to be less invested—more, well, dispassionate.
Is there a cognitive trait that would make sense to correlate with these two different emotional relationships to the contents of our thoughts? Well, sure: the more emotionally invested you are in what you’re thinking about, the longer you’d want to stay thinking about it. The more dispassionate you are, the more likely you’d be to move on and start thinking about something else. So it makes perfectly good sense to correlate passion with length of attention span—and that means DA with NE.
And which of the four combinations of the two traits might be a particularly bad idea? Someone with high DA and low NE will be passionate about the contents of their thoughts, but flighty and prone to attentional shifts. Since attentional shifts can lead to creativity, that doesn’t sound like a bad combination. But low DA and high NE would give you someone dispassionate about the contents of their thoughts, but prone to linger on them. That doesn’t sound good: that sounds like a recipe for boredom.
Note that we’re not saying that there aren’t dispassionate people with healthy attention spans; we are talking here about excluding the combination of the extremes of the traits. What you won’t see—what, in fact, we don’t see, except perhaps in rare variants of AD(H)D—is a dispassionate person who has trouble tearing themselves away from what they’re thinking about, as passionate people sometimes do.
You would think that that would be built into the brain: the dispassionate person would never have trouble tearing themselves away, because at some point they’d just get bored. But that’s begging the question or thinking circularly. What does it mean to be bored, cognitively? What controls boredom, chemically?
What we’ve found here is that the level of passion is controlled by one thing and the ability to tear oneself away is controlled by another completely independent thing. If in fact we observe that dispassionate people almost never have the problem of being unable to tear themselves away, we need to explain why that combination almost never exists. And our hypothesis about the roles of DA and NE explains that perfectly. Dispassionate people are that way because they have relatively inactive DA-producing enzymes, and that guarantees that they will never have levels of NE so high that they are prone to hyperfocusing. (It’s conceivable that someone dispassionate might have a problem with hyperfocusing because of other defects in the attentional hardware, which is why I don’t rule out this being a rare form of ADD.)
So this correlation works really well for humans, and in fact explains why the vast majority of people who are prone to hyperfocus are passionate about thinking.
In the last installment of the model proper (there’ll be a second set of posts on implications), we’ll look at the history of the understanding of the role of NE in attention.
So, by a process of elimination we’ve decided that norepinephrine (NE) controls attention by sending a signal that multiplies the salience tags in active memory, thus controlling the salience gradient. The more NE, the more likely we are to attend to the most salient potential attendums.
How much sense does this make? We’ll look at this two ways: chemically and historically (personally, even). And the chemical argument will divide into two parts: one about evolution, and one looking at traits in humans.
The key chemical fact about NE is that it’s very close to dopamine (DA) structurally. In fact, in the synthesis of NE from the amino acid tyrosine, DA is an intermediate step; the brain actually makes DA, for a moment, in the process of making NE.
Note that I’m not saying that the brain “makes NE out of DA,” although that’s technically true (and you may read that elsewhere). But that implies that some of an existing, usable cache of DA is being converted to NE, and that’s not at all true. In fact, if that were true, the levels of DA and NE would be inversely correlated; if you had a lot of one, you would have only a little of the other. But in fact the levels of the two neuromodulators are positively correlated: if you have a lot of one, you tend to have a lot of the other. The DA-producing cells all have a pair of enzymes which make DA out of tyrosine. The NE-producing cells add a third enzyme, dopamine β-hydroxylase, which turns the DA into NE. If you have alleles (“genes”) for especially active or inactive versions of either of the first two enzymes, you will thus tend to have high or low levels of both DA and NE.
Furthermore, if you think about this chemical chain, you will see that there’s nothing to prevent someone from having very high levels of DA production but very low levels of NE—you’d just need very active variants of the DA-making genes and a very weakly productive version of the NE-making one. But the opposite would be impossible. If you have very low levels of DA production, that sets an upper limit on how much NE you can produce; the NE-making cells just don’t have enough DA to convert to NE even if the NE-making enzyme is very active. So folks who make very little DA are forced to make relatively little NE as well.
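Here’s a minimal sketch of that asymmetry in code (the function, the enzyme activities, and the units are all invented for illustration; this is not real enzyme kinetics):

```python
# Toy model of the tyrosine -> DA -> NE synthesis chain.
# All numbers are arbitrary illustrative units, not real kinetics.

def neuromodulator_levels(da_enzyme_activity, dbh_activity, tyrosine=100.0):
    """da_enzyme_activity: combined activity of the two DA-making enzymes (0 to 1).
    dbh_activity: activity of dopamine beta-hydroxylase in the NE cells (0 to 1)."""
    da = tyrosine * da_enzyme_activity  # DA produced from tyrosine
    ne = da * dbh_activity              # NE is capped by the DA available to DBH
    return da, ne

# Very active DA genes plus weak DBH: high DA, low NE -- a possible combination.
print(neuromodulator_levels(da_enzyme_activity=0.9, dbh_activity=0.1))  # (90.0, 9.0)

# Weak DA genes plus maximally active DBH: NE can never exceed the scarce DA.
print(neuromodulator_levels(da_enzyme_activity=0.1, dbh_activity=1.0))  # (10.0, 10.0)
```

However the real kinetics work, the structure of the pathway guarantees the asymmetry: DBH can be as active as you like, but it can only convert the DA it is given.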
So, evolution has selected for these two relationships: in general, DA and NE levels are correlated, and specifically, low DA and high NE is a forbidden combination. Does this make sense in terms of our hypothesized roles for each?
The first thing you might want to know is at what point in evolution this relationship was established. And it so happens that four of the five neuromodulators of the control brain go way, way down the evolutionary ladder, and are found in invertebrates. NE is the exception. Invertebrates don’t have NE; they instead have octopamine (OA) serving an apparently analogous role. OA is in the same family of chemicals but does not have DA as a precursor.
So in the original neuromodulatory paradigm, which is incredibly ancient, the five chemicals had unrelated manufacturing pathways. But at some more recent evolutionary point (not necessarily when the vertebrates evolved; I’m not sure anyone has ever examined the neurochemistry of hagfish and lampreys, which sit on either side of the invertebrate / vertebrate divide), this substitution happened:
Tyrosine -> [one enzyme] -> tyramine -> [dopamine β-hydroxylase] -> OA
Tyrosine -> [two enzymes] -> DA -> [dopamine β-hydroxylase] -> NE
It’s important to note the conservation of the dopamine β-hydroxylase enzyme. The neuromodulator filling our hypothesized attention-controlling role went from OA to NE because a different substrate was provided for this enzyme.
We can infer a surprising amount about the behavior of our early vertebrate ancestors from this knowledge. Let’s begin by reminding ourselves of the original purpose for varying the salience gradient: to adapt the strength of attention to the current environment, on the fly. It’s highly adaptive if you have the ability to keep attention focused once you have identified a predator threat, and nearly as adaptive if you can keep it focused after having identified a food source or potential mate. And it’s highly adaptive if you can instead keep attention volatile when there is no potential threat, food, or mate in sight, and the environment needs to be scanned and searched for same. So it’s no wonder that some sort of attentional control goes back essentially to the earliest animals.
What can we infer from the substitution of DA for tyramine as the substrate for the enzyme that produced the attention-controlling neuromodulator? The apparent evolutionary purpose of this substitution was to correlate the levels of DA and that chemical, so that organisms with a high or low supply of one chemical would tend to have the same sort of supply of the other. That immediately tells us something crucial: that there were already different functional alleles for the enzymes involved in the synthesis of DA and OA. Because if every organism had the same allele for each of the four synthesizing enzymes, the levels would already be correlated: they’d be the same for every individual. And a little thought reveals that we would in fact need variation in both the levels of DA and of OA in order to make correlating them meaningful.
Now, it’s not necessarily true that evolution would accommodate different alleles for these enzymes, and hence different levels of the neuromodulators. If there were a single best level of OA to have, any mutation of the enzymes involved in its synthesis would have rendered the organism less able to compete. The mutation would have been selected against, weeded out. But we know from the shift from OA to NE that more than one allele was present in the population: there were (at the least) high-OA and low-OA organisms, and even though their behavior would be different as a result, neither had an evolutionary advantage.
And what can we make of that? If we’re right about the role of OA / NE, we are talking about organisms with different innate attention spans. And if different attention spans and hence different behaviors were equally adaptive, then we are talking about the organisms filling different behavioral niches. And in fact it’s not hard to imagine that an organism with a different attention span than its conspecifics would have an evolutionary advantage when hunting or being hunted, which could even mean evolutionary pressure to select for a wide variety of OA levels. Each distinct level of OA would correspond to a different behavioral niche in the great predator / prey dance.
So this tells us a remarkable amount about the sophistication of the behavior of the earliest vertebrates: their environments and predator / prey interactions were complex enough to accommodate multiple behavioral niches. There were organisms with short attention spans and ones with long attention spans. And there were organisms with low DA levels and with high DA levels. And whatever was mediated by that trait, it was evolutionarily advantageous for the low DA organisms to have a short attention span (and, to a lesser extent, for the high-DA organisms to have a longer one).
So what was the DA trait involved in this adaptation? We’ve hypothesized that DA turns on phenomenal consciousness, especially the experience of emotion, and hence the intensity of pleasure responses (and almost certainly pain responses as well). But that’s not the only thing DA does. DA initiates movement, and its current relative level represents the organism’s energy reserve or capacity for action. And as we saw quite a while ago, DA holds information in working and active memory, since that provides the simplest way of setting the correct salience tag.
This last use of DA would seem to be the best candidate for the trait needing correlation, since it’s the one involved in attention. Individuals with high levels of DA would have larger stores of working and active memory. Let’s imagine four types of hunting behavior, derived from the four combinations of DA level and attention span (when hunted, behavioral differences melt away, as every organism gets its attention span driven up to transient high levels).
High-DA, long attention: Their ability to keep attention focused on a potential prey situation is rewarded by their large capacity to store potentially relevant details of that environment.
High-DA, short attention: Their propensity to shift attention is rewarded by their ability to store potentially relevant details of multiple different environments.
Low-DA, short attention: Their propensity to shift attention is compatible with their relatively limited ability to store information about the environment. Once they’ve observed as much as they can absorb, if there’s no prey found, they move on. (The reason why low DA does not put them at an evolutionary disadvantage is that it confers advantages unrelated to attention, such as a diminished conscious experience of pain.)
Low-DA, long attention: OK, this one doesn’t work. They’d be attending to their environment past the point where they could extract and store additional information about it. They’d all get outcompeted by the other three types, and they’d starve.
And that explains why OA was replaced by NE. By making the size of working and active memory a prerequisite to the strength of the attention span, you eliminate individuals who have the ineffective combination of a low memory capacity but long attention span. And that would confer an evolutionary advantage on the mutation that caused the correlation. If we start with two equally sized populations of sharks, one of which still uses OA to control attention and one which has the mutation that substitutes NE, in the next generation the NE sharks will be more prevalent, since some of the OA sharks will have starved. With each passing generation the population imbalance will increase, and eventually, the OA sharks would become extinct.
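Here’s a minimal sketch of that population argument (the starting sizes, the birth rate, and the assumption that exactly a quarter of OA newborns inherit the doomed combination are all invented for illustration):

```python
# Two shark populations. OA sharks draw DA level and attention span
# independently, so a quarter of their newborns (low DA x long attention)
# starve. NE sharks can't produce that combination, because low DA caps NE.
# All numbers are illustrative assumptions, not data.

oa, ne = 1000.0, 1000.0  # equal starting populations
births_per_shark = 1.0   # assumed per-generation reproduction rate

for generation in range(1, 11):
    oa += oa * births_per_shark * 0.75  # a quarter of OA newborns starve
    ne += ne * births_per_shark         # all NE newborns are viable
    print(f"gen {generation:2d}: OA share of total = {oa / (oa + ne):.1%}")
```

The OA share shrinks every generation; extend the loop and the OA line dwindles toward extinction, exactly as described above.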
Those who have heard me talk about the brain know that I have a complete paradigm for the role of the fundamental neuromodulators, one I developed about a decade ago. A minimally thorough version of that theory filled a 62-page term paper for Personality Psychology, and a short version of the argument for just one of the six chemicals (acetylcholine) was a 39-page paper for another course. Nevertheless, I’ve explained it all in fifty minutes on several different occasions … so let’s see if there’s a 1000-word blog post version.
Five of the six neuromodulators constitute a system I call the “control brain” (the sixth, adenosine, is probably the most important of them all and is way outside the scope of this summary). The control brain doesn’t do any thinking or feeling; its job is to control the rest of the brain (the “main brain”) so that our style of thinking and feeling can be varied to match our circumstances. You can think of each and every neuron in the main brain as having a “volume knob” and a “tone control” which determine the degree and style of their activity, respectively. The control brain is a system for globally turning these knobs, in a coordinated fashion.
Physically, the control brain is located in the brainstem, just above the mechanisms that control the body. Here there are “factories” for the five chemicals (this is a good time to point out that a summary this brief is full of oversimplifications; for instance, one of the five chemicals is actually made in the hypothalamus, which is just above the brainstem, and another has a second factory even higher up in the brain). All of the brain’s serotonin, for instance, is made in a tiny cluster of cells in the brainstem called the raphe nuclei. The neurons of a chemical factory do not project to their neighbors, like the majority of neurons in the brain. Instead, their axons (business ends, where the chemical is released) project up into the main brain, branching and subdividing endlessly, so that each serotonin-releasing neuron synapses on (“innervates”) thousands of target neurons. Serotonin is the signal that turns the volume knob on these cells, and the relative handful of serotonin-producing cells in the raphe nuclei are thus able to control the entire brain.
What’s initially puzzling is that there are five parallel control systems. So there must be five fundamental parameters of the brain’s information processing system that are being controlled. We’ve already mentioned dopamine (DA) and its hypothesized role in turning on phenomenal consciousness (subjective experience), especially feelings of pleasure and suffering. So one basic parameter is the degree to which we are consciously aware of the information being processed by the target neurons. That leaves us four parameters to explain, and that’s a good number, because you could come up with four parameters with a 2 x 2 design, for instance, two different ways of controlling two different things.
Let’s start by observing that the brain stores information, and that the chief feature of the storage system is that information is connected to other information (that song reminds me of you). So there are two fundamental types of neural circuitry in the brain’s storage system: circuitry which encodes information, and circuitry which connects information to other information. You would want to control these two types of circuits independently. Adjusting the volume on the encoding circuits would have the effect of controlling the level of general brain activity, which is probably the most obvious parameter of them all. Adjusting the volume on the connecting circuits would control the degree of associative spread, the degree to which things remind you of other things. And that is a very good thing to control; there are times you want to be empirical and think just about the facts, and times you want to be interpretive and think about all the implications of the facts. (The savvy among you may have realized that I have just described the Jungian “sensing / intuition” dichotomy which constitutes one of the traits of the MBTI.)
So all we need to complete the paradigm is two different ways of controlling these two types of circuits. And we have that already: it was in the last part, where we hypothesized a multiplicative signal to increase the salience gradient. A multiplicative signal is of course contrasted with a simple additive signal. An additive signal “pays no attention” to the level of activity in its target; a plus 5 additive signal turns 0 to 5, 5 to 10, and 10 to 15. A multiplicative signal does “pay attention to” the level of activity in its target, by getting feedback from it; a times 5 multiplicative signal leaves 0 as 0, but turns 5 to 25 and 10 to 50.
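A two-line sketch makes the contrast concrete (using the +5 and ×5 signals from the paragraph above):

```python
activities = [0, 5, 10]  # current activity levels of three target neurons

additive = [a + 5 for a in activities]        # +5: ignores current activity
multiplicative = [a * 5 for a in activities]  # x5: scales with current activity

print(additive)        # [5, 10, 15] -- even silent neurons get pushed to 5
print(multiplicative)  # [0, 25, 50] -- silent stays silent; differences widen
```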
So, one chemical controls the encoding circuits additively, and another controls them multiplicatively. And another pair of chemicals controls the connecting circuits, again, one additively and one multiplicatively.
Histamine is known to be the brain’s chief mediator of cortical arousal, the primary determinant of whether you’re asleep or awake (this is why antihistamines cause drowsiness), and is thus the obvious candidate for the chemical that additively controls the encoding circuits. Acetylcholine (ACh) additively controls the connecting circuits and hence associative spread (the brain defaults to maximum associativeness and ACh inhibits the spread); this hypothesis explains a wealth of observations, from the nature of cognition during REM sleep (when ACh levels are higher than in waking) to the cognitive style of those at risk for Alzheimer’s (a disease which preferentially targets ACh neurons).
That leaves us with two chemicals: norepinephrine (NE) and serotonin (5-HT), which are made right next to each other and have profoundly similar patterns of innervation (each of the other three has a pattern all its own). And they must be the two multiplicative signals.
I’m not known for false modesty, so let me un-humbly suggest that the idea that serotonin is a multiplicative inhibiting signal for the brain’s connecting circuits is a serious contender for Best. Brain. Idea. Ever. By being multiplicative, it has no effect on the ordinary weak connections like “that song reminds me of you.” But it is the only way that strong connections, like deeply felt beliefs and lifelong emotional responses, can be broken and hence rearranged. I’ve been proposing that serotonin fundamentally controls cognitive and emotional flexibility for well over a decade, and the world is slowly coming around to share that idea. But I’m fairly certain I’m the only person who can explain how serotonin does this at a low level of neural circuitry.
And that leaves us with norepinephrine as the multiplicative signal for information encoding, and hence the chemical that controls the salience gradient. In the next post, we’ll explore how much sense this makes.
So, how are attendums selected for attention?
If you were designing a system from scratch, your first thought might be to make this a top-down process. You would build a module in the executive control center in the prefrontal cortex that continually monitored the salience level of every attendum in active memory, and selected the one that was most salient.
The fundamental problem with such a design is that it’s a lot of circuitry. Is there something simpler that could do at least as good a job?
Well, here’s an idea. Why not make it a bottom-up process? Let the attendums compete for entry into consciousness. Without worrying about the details of the actual neural mechanism, we can think of all the attendums in a sort of rugby scrum at the gateway to consciousness. Each attendum has strength and vigor proportional to its salience, so the most salient attendum is likeliest to win the scrum and gain a foothold in consciousness. It will, however, be continually subject to being overmastered and supplanted by another attendum of comparable or newly superior strength. (Those familiar with Gerald M. Edelman’s concept of “neural Darwinism” will recognize the inspiration for designing a brain based on Darwinian principles of selection.)
There would appear to be one major bug in such a design: the most salient attendum is not certain to be the one selected. This is inherent in any advantageously efficient bottom-up design. Any bottom-up mechanism that always resulted in the most salient attendum being selected would be functionally equivalent to our top-down system, and would require as much circuitry, or even more (instead of the cortex monitoring the salience of every attendum, each attendum would have to monitor the salience of every other). The fundamental tradeoff here is to simplify the selection mechanism by allowing a probabilistic selection among attendums based on their relative strength.
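Here is a minimal sketch of such a competition (the noise model and the numbers are illustrative assumptions, not a claim about the actual circuitry):

```python
import random

def compete(saliences, noise=4.0):
    """Each attendum competes with strength = salience + random noise;
    the highest score wins entry into consciousness."""
    scores = [s + random.uniform(-noise, noise) for s in saliences]
    return scores.index(max(scores))

saliences = [2, 5, 9]  # three attendums; the last is by far the most salient
wins = [0, 0, 0]
for _ in range(10_000):
    wins[compete(saliences)] += 1
print(wins)  # the salience-9 attendum wins most of the scrums -- but not all
```

The most salient attendum dominates without any module having to monitor the whole field, and the noise guarantees occasional upsets. Which brings us to why those upsets are worth having.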
Why do I believe that the brain has made such a tradeoff? Because it’s not really a tradeoff at all. The “bug” is in fact a feature. It is a good idea to sometimes let an apparently less salient attendum win the battle for consciousness. And that is because the salience programs are all essentially out of date. As we saw in Part IV, they can only be created or revised when the attendum is in consciousness, because the creation of the salience program requires the good and bad feelings that are only present in consciousness. The salience program of every attendum is thus based on the state of affairs the last time we attended to it. And things may well have changed since then.
This of course reflects our subjective experience of attentional switching. We sometimes (I originally said “often,” but that’s true only at one end of a personality spectrum we’re about to discuss) find ourselves thinking about something we hadn’t thought of in a while, and when we do, we sometimes discover that events or insights that have happened in the interim have changed its relevance or importance. A truly efficient brain would do a lot of “checking in” on apparently less-salient attendums to see if their salience programs needed updating. And a terrifically simple way to accomplish that is to simply let them sometimes win the battle for consciousness.
Now, there is one more thing to consider, and then we’ll have a complete model. And that is that there are good times and bad times to let a less salient attendum win the battle for consciousness. When we are sitting and doing nothing is a very good time for “letting our mind wander,” to unexpectedly find ourselves thinking about something that we’ve been regarding as relatively less important, perhaps even trivial. When we are fleeing from a bear, on the other hand, is an extraordinarily bad time to attend to anything other than fleeing from the bear.
What we need, then, is a way of controlling the contrast among the salience tags. When we’re sitting daydreaming, we can imagine the contrast turned all the way down, so that the strongest salience tags are not much stronger than the weakest, with the difference in fact being not significantly greater than the built-in “margin of error” that is inherent in the competitive mechanism. The playing field is thus essentially leveled. When we’re attending to a project for work (or a blog post), we can imagine the contrast turned up about halfway, so that only reasonably strong attendums have any chance of seizing control from it (one that always has a good chance is “I need to pee”). When we’re running from a bear, the contrast would be all the way up, so that the difference between running from the bear and the next most salient attendum would be greater than any margin of error in the selection process. Hence the thought of stopping to urinate would not cross our minds regardless of the urgency of our need.
We can model this mathematically. Imagine that the salience tags range from 1 to 10 in strength, and that the “margin of error” in the competition for consciousness is ±8, which is to say that a salience-1 tag can compete with an effective strength as high as 9, while a salience-10 tag can compete with one as low as 2. There is thus ordinarily only a very slight advantage for the most salient attendums over the least. This is the baseline daydreaming state, with the contrast turned all the way down.
Now imagine that we had a way of effectively multiplying all the salience tags. Let’s multiply them all by 2. Now they range from 2 to 20: the salience-1 tags can compete no higher than 10, while the salience-10 tags never drop below 12. So there is no chance of a salience-1 attendum ever winning the consciousness battle. This is a reasonable minimal level of concentration.
With a multiplying signal of 10, a salience-9 tag competes somewhere between 82 and 98, while the salience-10 tag never drops below 92; anything of salience 8 or below, topping out at 88, cannot win at all. This is running from the bear.
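Extending the scrum sketch from earlier with a multiplicative gain shows all three regimes at once (the ±8 margin comes from the worked example; everything else is an illustrative assumption):

```python
import random

def winner(saliences, gain, noise=8.0):
    """Competition with a multiplicative 'contrast' signal applied first."""
    scores = [s * gain + random.uniform(-noise, noise) for s in saliences]
    return scores.index(max(scores))

saliences = list(range(1, 11))  # one tag at each salience level, 1 through 10
for gain in (1, 2, 10):
    top_wins = sum(winner(saliences, gain) == 9 for _ in range(10_000))
    print(f"gain x{gain}: salience-10 tag wins {top_wins / 10_000:.0%} of the time")
```

At ×1 the playing field is nearly level; at ×2 the weakest tags are shut out entirely; at ×10 essentially nothing but the bear ever wins.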
I like to think in terms of a salience gradient: the steepness of the line you would draw if you plotted the effective salience of every attendum from weakest to strongest and connected the points. With no multiplying signal, the slope of this line, the salience gradient, is very shallow. What the multiplying signal does is steepen the line, and hence increase the likelihood of the most salient attendums winning the battle for consciousness.
So, what might this multiplicative signal be? We’ll answer that next.
(As an aside, I am hopeful that the concept of active memory will be crucial in figuring out the mechanisms by which memories are consolidated and pruned during sleep. We have much evidence that this happens, but it won’t be possible to understand what’s going on unless we correctly understand the memory systems involved.)
Next: How Active Memory Works, Part 1