See Upstream Color, If You Can

As you may know, Upstream Color is the long-awaited second film from Shane Carruth, the autodidact auteur behind 2004's extraordinary Primer. To say I'd been looking forward hugely to this film would be an understatement.

To say that I was not let down would be one, too. It's better than I dared dream.

I adore narratives that demand repeat exposure and reveal more of themselves with every iteration. That of course describes the work of both Gene Wolfe and Christopher Nolan, but it's also Primer (some would say almost to a fault). That's all that I hoped for from Carruth's new film: an emotionally resonant text that would, above all, set those "oh my god I think I understand this" bombs going off in my head, and do so in different ways each time I saw it. (At some point I'm going to propose that the acronymic omgitiut should be recognized as a full-fledged film genre; if you recognize the source of the phrase you know that such films can be domestic dramas as well as sf puzzle-boxes.)

What I got was something much more. Imagine that Terrence Malick made one of these films, and you've got something like Upstream Color. Indeed, critics who are indifferent to narrative challenge for its own sake are swooning over it, and asserting that they love it despite feeling no need to solve the problems it presents.

There is, I think, almost a precise parallel here in the career of Darren Aronofsky. Pi was a terrific debut, though nowhere near as good as Primer. After making the even better and much more conventional Requiem for a Dream, Aronofsky then spent years trying to make a film that would tell a challenging sf story with glorious visuals, using SFX to achieve the sort of aesthetic rapture you get from a Malick film, rather than the sense-of-wonder that you'd get from Kubrick or the opening shot of Star Wars. When the funding fell through, he made the film anyway, after re-writing it as a small budget film. And that was The Fountain.

After Primer, Carruth spent years trying to make an sf film called A Topiary, and when the funding fell through, he made Upstream Color instead. In both its artistic aims and thematic concerns it seems to me as close as any film could be to The Fountain (and vice versa). I liked The Fountain quite a bit and I'm looking forward to seeing it again some day. After one exposure to each, though, I'd make the following comparison: the science in Upstream Color is much more interesting and feels more like it has complexity, internal logic and consistency; and in every way, Upstream Color is the more accomplished film. Yes, I'm asserting that as good as Darren Aronofsky is, he's not in Carruth's class as a writer or director. And Carruth is his own cinematographer, composer, co-editor, and first camera operator, and excels in each role, and he's more than solid as the male lead.

What Carruth has done here is almost unprecedented in film history. And I'm not talking about doing everything but the catering -- that goes without saying. It's this: spectacular debuts are almost never followed by a significantly better film. Have you even heard of Gran Casino, In This Our Life, Stage Struck, A Woman is a Woman, There's Always Vanilla, Alex in Wonderland, The Last Movie, or Crimewave? They are, respectively, the second films by Bunuel (following L'Age d'Or), Huston (The Maltese Falcon), Lumet (12 Angry Men), Godard (Breathless), Romero (Night of the Living Dead), Mazursky (Bob & Carol & Ted & Alice), Hopper (Easy Rider), and Raimi (The Evil Dead). (Two more recent examples of the principle: Andrew Niccol's Gattaca and S1m0ne, and Florian Henckel von Donnersmarck's The Lives of Others and The Tourist.) Even when a director hits home runs his first two times out, the first film is usually superior: Citizen Kane and The Magnificent Ambersons, Pather Panchali and Aparajito, The 400 Blows and Shoot the Piano Player; or regarded as more or less its equal: Badlands and Days of Heaven, or Being John Malkovich and Adaptation.

I can come up with only four instances where a director or directors topped a classic first film with an even better second effort, and two of those have asterisks. Gene Kelly followed his first collaboration with Stanley Donen, On the Town, with Singin' in the Rain -- but Donen made several films in the interim (his first being Royal Wedding). The Coen Bros. followed Blood Simple with Raising Arizona, but that is regarded, I think, as an incremental improvement rather than a dramatic one -- still impressive, but not eye-opening. And that leaves us with Mike Nichols following Who's Afraid of Virginia Woolf with The Graduate, and the comparison that I think is closest to the bone: Reservoir Dogs and Pulp Fiction. That's the one time in film history where a director's first two films give me the same sense of "we knew this guy was great, but, really, we had no idea."

Upstream Color comes out as a Blu-Ray / DVD combo pack on May 7, but it deserves to be seen on a big screen. Proceeds go to financing Carruth's next film. Anita and I will be back to see it a second time next weekend, dragging friends along. See it if you can.

Trailer #2!

The list of blog posts I intend to write keeps growing ... but I'm managing to keep myself too busy to get to any of them.

A fellow named Peter Keating is doing a story for ESPN Magazine on, more or less, "life after sabermetrics" -- what happens to folks who get laid off by professional baseball clubs? He's raising the question of whether pro teams really understand how to get the most benefit from folks like myself. The article's about three former Red Sox consultants: Mike Gimbel, Voros McCracken, and (largely, it seems) me. I've talked to Peter for hours on the phone, and had the pleasure of not only meeting him in NYC (he lives in Jersey) but taking him to his first Mission of Burma show (my 267th). He's a great guy; our conversations have taken long detours as he volunteered the opinion that Buffy the Vampire Slayer's "Once More With Feeling" is the greatest thing in the history of television (I concur) and quizzed me on my favorite Star Trek: TOS episodes -- because, of course, he needed to know. Last Tuesday a three-person photo crew spent the afternoon here with two SUVs full of gear and shot 342 photos of me, which may match my previous lifetime total. I hope at least one or two came out well.

I'm guessing the article will appear in April, and it should provide an interesting counterpoint to pages 160-2 of Francona: the Red Sox Years, which talks about my role with the club (and gets the typical amount of facts wrong, but not in any kind of damaging or insulting way).

In the meantime, after a six-month hiatus I've finally resumed working on "the book," which is to say A Nature of Consciousness, which is to say the scientific paper "A Testable Theory of Phenomenal Consciousness and Causal Free Will" from which it will be adapted. I couldn't be more pleased with how the work is going, or more terrified that it will make me more famous than I care to be. Which, I'm learning, may not be a high bar: after being the relentless center of attention for the photo crew for four hours, I told the photographer that I may be sending future requests for photos in his direction.

And in the other meantime, I added 38 movies to the list of 2011 movies I wanted to see, to bring the total to 166; I've got twenty left to see, and when I'm done I'll do a massive data analysis in an attempt to build a model predicting my own rating from Netflix's guess and a slew of other numbers. (In case anyone questions the sanity of so thorough an approach, my current Top 10 includes two movies that I had initially decided not to bother with, and the very last batch added to the queue has already produced a Top 35 film.) I hope to get the full 2011 rundown online in late March or early April, together with whatever I can glean about the relationships among critical and audience tastes after crunching all those numbers.

And that's why none of the following has been written yet:

  • A review of The Hobbit: An Unexpected Journey for, and the promised 4th part of my series for them

  • An essay for film buffs on the nature and meaning of Slipstream as a genre (one of 2012's best films, Holy Motors, is quintessential slipstream, but no one in the film world knows that concept)

  • A solution of a major psychopharmacological riddle: the mode of action of the super-stimulant Provigil (modafinil)

  • The final attention theory post

  • Most importantly, a series of posts entitled This is Your Brain at the Movies, including the results of a survey I constructed where three of my proposed fundamental personality traits can be shown to explain about 50% of how much someone likes Cloud Atlas. Bits and pieces of this have been scattered all over the Web in the form of comments to reviews written long ago.

I'll be accepting bets on whether my next entry will be a) one of those, b) something else entirely, or c) another meta-entry. But in the meantime, for those thirsting for actual content, I'll leave you with this quick list of favorite 2012 films, so far, in order:

Top 10: Cloud Atlas, The Dark Knight Rises, Lincoln, Moonrise Kingdom, Seven Psychopaths, Zero Dark Thirty, Holy Motors, Amour, The Master, Silver Linings Playbook. HM: The Avengers, End of Watch, Beasts of the Southern Wild, Monsieur Lazhar, The Turin Horse, Barbara, The Cabin in the Woods.

(Significant films not yet seen: Once Upon a Time in Anatolia, Cosmopolis, Magic Mike, Killer Joe, Searching for Sugar Man, Wuthering Heights, Killing Them Softly, How to Survive a Plague, Compliance, Kahaani, Headhunters, Sound of Noise.)

Present and Coming Attractions!

Over at TheOneRing.Net, I'm a guest writer. You don't want to know how many hours I put into this piece (and its sequels), but I'm very proud of the result. Thanks to TORn for running it!

In the meantime, this space should feature, in mid-January, an epic "2011: The Film Year in Review." Yes, 2011, because it takes a full year to catch up to the previous year's obscure movies as they come out on DVD. Last night, for instance, I watched a very satisfying Bollywood road trip epic, Zindagi Na Milegi Dobara, which ranked #171 at the U.S. box office (my original assertion that it was never released in the US is a function of BoxOfficeMojo's broken search function) and didn't crack the top 235 in critics' Top 10 list mentions. I've tentatively ranked it as my 41st favorite movie of the year--out of 111 that I've seen so far.

The full recap will rank 128 movies, from least good to best, with full information such as Rotten Tomatoes, Metacritic, and IMDB and Netflix user ratings ... and a pithy spoiler-free review of each.

Oh, yes, and the long-delayed Part IX of the Attention-Switching Model post. And hopefully, much else.

A Model for Attention-Switching, Part VIII: Norepinephrine in Humans

We’ve seen that there is not only a potent evolutionary rationale for the evolution of norepinephrine (NE) as the neuromodulator regulating attention, but that our hypothesis about its role gives us remarkable insight into the behavior of our earliest vertebrate ancestors.  But you probably don’t have any friends with file drawers full of unfinished projects who are also ray-finned fishes, let alone sharks.  So let’s see what sense our hypothesis makes of human behavior.

And let’s forget our NE hypothesis for a moment and just start with DA. We’ve proposed that it turns on phenomenal consciousness, especially the experience of emotion. This means that high-DA people are passionate and low-DA people are dispassionate. (This explains, incidentally, why highly opinionated people tend to talk loudly and gesture with their hands: remember that DA also controls motor activity through a second pathway.) In terms of cognition, high-DA people tend to be highly emotionally invested in the things they’re thinking about, while low-DA people tend to be less invested—more, well, dispassionate.

Is there a cognitive trait that would make sense to correlate with these two different emotional relationships to the contents of our thoughts? Well, sure: the more emotionally invested you are in what you’re thinking about, the longer you’d want to stay thinking about it. The more dispassionate you were, the more likely you’d be to move on and start thinking about something else. So it makes perfect good sense to correlate passion with length of attention span—and that means DA with NE.

And which of the four combinations of the two traits might be a particularly bad idea? Someone with high DA and low NE will be passionate about the contents of their thoughts, but flighty and prone to attentional shifts. Since attentional shifts can lead to creativity, that doesn’t sound like a bad combination. But low DA and high NE would give you someone dispassionate about the contents of their thoughts, but prone to linger on them. That doesn’t sound good: that sounds like a recipe for boredom.

Note that we’re not saying that there aren’t dispassionate people with healthy attention spans; we are talking here about excluding the combination of the extremes of the traits. What you won’t see—what, in fact, we don’t see, except perhaps in rare variants of AD(H)D—is a dispassionate person who has trouble tearing themselves away from what they’re thinking about, as passionate people sometimes do.

You would think that that would be built into the brain: the dispassionate person would never have trouble tearing themselves away, because at some point they’d just get bored. But that’s begging the question or thinking circularly. What does it mean to be bored, cognitively? What controls boredom, chemically?

What we’ve found here is that the level of passion is controlled by one thing and the ability to tear oneself away is controlled by another completely independent thing. If in fact we observe that dispassionate people almost never have the problem of being unable to tear themselves away, we need to explain why that combination almost never exists. And our hypothesis about the roles of DA and NE explains that perfectly. Dispassionate people are that way because they have relatively inactive DA-producing enzymes, and that guarantees that they will never have levels of NE so high that they are prone to hyperfocusing. (It’s conceivable that someone dispassionate might have a problem with hyperfocusing because of other defects in the attentional hardware, which is why I don’t rule out this being a rare form of ADD.)

So this correlation works really well for humans, and in fact has some explanatory power: it accounts for why the vast majority of people who are prone to hyperfocus are passionate about thinking.

In the last installment of the model proper (there’ll be a second set of posts on implications), we’ll look at the history of the understanding of the role of NE in attention.

A Model for Attention-Switching, Part VII: The Evolution of Norepinephrine

So, by a process of elimination we’ve decided that norepinephrine (NE) controls attention by sending a signal that multiplies the salience tags in active memory, thus controlling the salience gradient. The more NE, the more likely we are to attend to the most salient potential attendums.

How much sense does this make? We’ll look at this two ways: chemically and historically (personally, even). And the chemical argument will divide into two parts: one about evolution, and one looking at traits in humans.

The key chemical fact about NE is that it’s very close to dopamine (DA) structurally. In fact, in the synthesis of NE from the amino acid tyrosine, DA is an intermediate step; the brain actually makes DA, for a moment, in the process of making NE.

Note that I’m not saying that the brain “makes NE out of DA,” although that’s technically true (and you may read that elsewhere). But that implies that some of an existing, usable cache of DA is being converted to NE, and that’s not at all true. In fact, if that were true, the levels of DA and NE would be inversely correlated; if you had a lot of one, you would have only a little of the other. But in fact the levels of the two neuromodulators are positively correlated: if you have a lot of one, you tend to have a lot of the other. The DA-producing cells all have a pair of enzymes which make DA out of tyrosine. The NE-producing cells add a third enzyme, dopamine β-hydroxylase, which turns the DA into NE. If you have alleles (“genes”) for especially active or inactive versions of either of the first two enzymes, you will thus tend to have high or low levels of both DA and NE.

Furthermore, if you think about this chemical chain, you will see that there’s nothing to prevent someone from having very high levels of DA production but very low levels of NE—you’d just need very active variants of the DA-making genes and a very weakly productive version of the NE-making one. But the opposite would be impossible. If you have very low levels of DA production, that sets an upper limit on how much NE you can produce; the NE-making cells just don’t have enough DA to convert to NE even if the NE-making enzyme is very active. So folks who make very little DA are forced to make relatively little NE as well.
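Here’s a toy simulation of that chain, purely as an illustration. The numbers are made up: I’m assuming each individual gets a random activity level for the shared DA-making enzymes and, independently, a random conversion efficiency for dopamine β-hydroxylase, and that NE output is simply the product of the two. None of these names or ranges come from real measurements.

```python
import random


def simulate_individual(rng):
    """Return (DA production, NE production) for one hypothetical individual."""
    # Assumed: combined activity of the two shared enzymes sets how much DA is made.
    da_production = rng.uniform(0.1, 1.0)
    # Assumed: activity of dopamine beta-hydroxylase, as the fraction of DA converted.
    conversion = rng.uniform(0.1, 1.0)
    # NE can never exceed the DA available to convert.
    return da_production, da_production * conversion


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
people = [simulate_individual(rng) for _ in range(10_000)]
da_levels = [da for da, _ in people]
ne_levels = [ne for _, ne in people]

correlation = pearson(da_levels, ne_levels)
# "Low" and "high" thresholds (0.3 and 0.7) are arbitrary cutoffs for illustration.
low_da_high_ne = sum(1 for da, ne in people if da < 0.3 and ne > 0.3)
high_da_low_ne = sum(1 for da, ne in people if da > 0.7 and ne < 0.3)

print(f"DA/NE correlation: {correlation:.2f}")
print(f"low DA, high NE individuals: {low_da_high_ne}")
print(f"high DA, low NE individuals: {high_da_low_ne}")
```

Even in this crude version the two levels come out positively correlated, the high-DA / low-NE corner is populated, and the low-DA / high-NE corner is empty by construction, because the product can never exceed the DA supply.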

So, evolution has selected for these two relationships: in general, DA and NE levels are correlated, and specifically, low DA and high NE is a forbidden combination. Does this make sense in terms of our hypothesized roles for each?

The first thing you might want to know is at what point in evolution this relationship was established. And it so happens that four of the five neuromodulators of the control brain go way, way down the evolutionary ladder, and are found in invertebrates. NE is the exception. Invertebrates don’t have NE; they instead have octopamine (OA) serving an apparently analogous role. OA is in the same family of chemicals but does not have DA as a precursor.

So in the original neuromodulatory paradigm, which is incredibly ancient, the five chemicals had unrelated manufacturing pathways. But at some more recent evolutionary point (not necessarily when the vertebrates evolved; I’m not sure anyone has ever examined the neurochemistry of hagfish and lampreys, which are on neighboring sides of the invertebrate / vertebrate division), this substitution happened:

Tyrosine -> [one enzyme] -> tyramine -> [dopamine β-hydroxylase] -> OA


Tyrosine -> [two enzymes] -> DA -> [dopamine β-hydroxylase] -> NE

It’s important to note the conservation of the dopamine β-hydroxylase enzyme. The neuromodulator filling our hypothesized attention-controlling role went from OA to NE because a different substrate was provided for this enzyme.

We can infer a surprising amount about the behavior of our early vertebrate ancestors from this knowledge. Let’s begin by reminding ourselves of the original purpose for varying the salience gradient: to adapt the strength of attention to the current environment, on the fly. It’s highly adaptive if you have the ability to keep attention focused once you have identified a predator threat, and nearly as adaptive if you can keep it focused after having identified a food source or potential mate. And it’s highly adaptive if you can instead keep attention volatile when there is no potential threat, food, or mate in sight, and the environment needs to be scanned and searched for same. So it’s no wonder that some sort of attentional control goes back essentially to the earliest animals.

What can we infer from the substitution of DA for tyramine as the substrate for the enzyme that produced the attention-controlling neuromodulator? The apparent evolutionary purpose of this substitution was to correlate the levels of DA and that chemical, so that organisms with a high or low supply of one chemical would tend to have the same sort of supply of the other. That immediately tells us something crucial: that there were already different functional alleles for the enzymes involved in the synthesis of DA and OA. Because if every organism had the same allele for each of the four synthesizing enzymes, the levels would already be correlated: they’d be the same for every individual. And a little thought reveals that we would in fact need variation in both the levels of DA and of OA in order to make correlating them meaningful.

Now, it’s not necessarily true that evolution would accommodate different alleles for these enzymes, and hence different levels of the neuromodulators. If there were a single best level of OA to have, any mutation of the enzymes involved in its synthesis would have rendered the organism less able to compete. The mutation would have been selected against, weeded out. But we know from the shift from OA to NE that more than one allele was present in the population: there were (at the least) high-OA and low-OA organisms, and even though their behavior would be different as a result, neither had an evolutionary advantage.

And what can we make of that? If we’re right about the role of OA / NE, we are talking about organisms with different innate attention spans. And if different attention spans and hence different behaviors were equally adaptive, then we are talking about the organisms filling different behavioral niches. And in fact it’s not hard to imagine that an organism with a different attention span than its conspecifics would have an evolutionary advantage when hunting or being hunted, which could even mean evolutionary pressure to select for a wide variety of OA levels. Each distinct level of OA would correspond to a different behavioral niche in the great predator / prey dance.

So this tells us a remarkable amount about the sophistication of the behavior of the earliest vertebrates: their environments and predator / prey interactions were complex enough to accommodate multiple behavioral niches. There were organisms with short attention spans and ones with long attention spans. And there were organisms with low DA levels and with high DA levels. And whatever was mediated by that trait, it was evolutionarily advantageous for the low DA organisms to have a short attention span (and, to a lesser extent, for the high-DA organisms to have a longer one).

So what was the DA trait involved in this adaptation? We’ve hypothesized that DA turns on phenomenal consciousness, especially the experience of emotion, and hence the intensity of pleasure responses (and almost certainly pain responses as well). But that’s not the only thing DA does. DA initiates movement, and its current relative level represents the organism’s energy reserve or capacity for action. And as we saw quite a while ago, DA holds information in working and active memory, since that provides the simplest way of setting the correct salience tag.

This last use of DA would seem to be the best candidate for the trait needing correlation, since it’s the one involved in attention. Individuals with high levels of DA would have larger stores of working and active memory. Let’s imagine four types of hunting behavior, derived from the four combinations of DA level and attention span (when hunted, behavioral differences melt away, as every organism gets its attention span driven up to transient high levels).

High-DA, long attention: Their ability to keep attention focused on a potential prey situation is rewarded by their large capacity to store potentially relevant details of that environment.

High-DA, short attention: Their propensity to shift attention is rewarded by their ability to store potentially relevant details of multiple different environments.

Low-DA, short attention: Their propensity to shift attention is compatible with their relatively limited ability to store information about the environment. Once they’ve observed as much as they can absorb, if there’s no prey found, they move on. (The reason why low DA does not put them at an evolutionary disadvantage is that it confers advantages unrelated to attention, such as a diminished conscious experience of pain.)

Low-DA, long attention: OK, this one doesn’t work. They’d be attending to their environment past the point where they could extract and store additional information about it. They’d all get outcompeted by the other three types, and they’d starve.

And that explains why OA was replaced by NE. By making the size of working and active memory a prerequisite to the strength of the attention span, you eliminate individuals who have the ineffective combination of a low memory capacity but long attention span. And that would confer an evolutionary advantage on the mutation that caused the correlation. If we start with two equally sized populations of sharks, one of which still uses OA to control attention and one which has the mutation that substitutes NE, in the next generation the NE sharks will be more prevalent, since some of the OA sharks will have starved. With each passing generation the population imbalance will increase, and eventually, the OA sharks would become extinct.
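The shark thought experiment is easy to put into numbers. This is only a sketch under assumptions I’m inventing for the purpose: both lineages reproduce equally well, except that in each generation a fixed fraction of the OA sharks (here 10%, an arbitrary figure) land in the low-DA, long-attention corner and starve.

```python
def ne_share_over_generations(generations, starving_fraction=0.1, start_share=0.5):
    """Return the NE sharks' share of the population after each generation.

    starving_fraction is a hypothetical per-generation fraction of OA sharks
    that have the ineffective low-memory, long-attention combination and starve.
    """
    oa_share = 1.0 - start_share
    ne_share = start_share
    shares = []
    for _ in range(generations):
        # Only the OA lineage can produce the starving combination.
        oa_share *= 1.0 - starving_fraction
        # Both lineages otherwise reproduce equally, so just renormalize.
        total = oa_share + ne_share
        oa_share, ne_share = oa_share / total, ne_share / total
        shares.append(ne_share)
    return shares


shares = ne_share_over_generations(100)
for generation in (1, 10, 50, 100):
    print(f"generation {generation:3d}: NE share = {shares[generation - 1]:.4f}")
```

In this idealized version the OA share shrinks every generation but only approaches zero; actual extinction would come from the finite size of a real population, which the sketch doesn’t model.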

And at this point you probably wonder what this means for people. That’s the next post.

A Model for Attention-Switching, Part VI: The Chemical Paradigm

Those who have heard me talk about the brain know that I have a complete paradigm for the role of the fundamental neuromodulators, one I developed about a decade ago. A minimally thorough version of that theory filled a 62-page term paper for Personality Psychology, and a short version of the argument for just one of the six chemicals (acetylcholine) was a 39-page paper for another course. Nevertheless, I’ve explained it all in fifty minutes on several different occasions … so let’s see if there’s a 1000-word blog post version.

Five of the six neuromodulators constitute a system I call the “control brain” (the sixth, adenosine, is probably the most important of them all and is way outside the scope of this summary). The control brain doesn’t do any thinking or feeling; its job is to control the rest of the brain (the “main brain”) so that our style of thinking and feeling can be varied to match our circumstances. You can think of each and every neuron in the main brain as having a “volume knob” and a “tone control” which determine the degree and style of their activity, respectively. The control brain is a system for globally turning these knobs, in a coordinated fashion.

Physically, the control brain is located in the brainstem, just above the mechanisms that control the body. Here there are “factories” for the five chemicals (this is a good time to point out that a summary this brief is full of oversimplifications; for instance, one of the five chemicals is actually made in the hypothalamus, which is just above the brainstem, and another has a second factory even higher up in the brain). All of the brain’s serotonin, for instance, is made in a tiny cluster of cells in the brainstem called the raphe nuclei. The neurons of a chemical factory do not project to their neighbors, like the majority of neurons in the brain. Instead, their axons (business ends, where the chemical is released) project up into the main brain, branching and subdividing endlessly, so that each serotonin-releasing neuron synapses on (“innervates”) thousands of target neurons. Serotonin is the signal that turns the volume knob on these cells, and the relative handful of serotonin-producing cells in the raphe nuclei are thus able to control the entire brain.

What’s initially puzzling is that there are five parallel control systems. But the actual situation is even more complex, because (as already implied) the five chemicals target two different parameters in the target neurons (the "volume" and "tone" controls). We’ve already mentioned dopamine (DA) and its hypothesized roles in holding information in active memory, and in turning on aspects of phenomenal consciousness (subjective experience), such as feelings of desire. DA is the sole chemical that turns up only the tone control (the ubiquitous chemical cAMP), which tells us that it's part of a system for priming information for conscious access and actually placing it into consciousness. That half of the chemical paradigm, like the role of adenosine, is outside the scope of this summary, but the short version is that DA primes information in active memory, serotonin (5-HT) primes information in long-term memory, and norepinephrine (NE) adds its cAMP boost to push information over the threshold of consciousness. 5-HT thus essentially handles the past, NE the present, and DA the future.

Four of the chemicals turn up the "volume control," a chemical called PLC that triggers a cascade that boosts intracellular calcium level and hence makes cells more likely to fire (whereas cAMP ultimately alters the shape and hence behavior of proteins, such as receptors). That leaves the second half of the chemical paradigm with four parameters to discover, and that’s a good number. You could, for instance, explain four parameters with a 2 x 2 design: two different ways of controlling two different things.

Let’s start by observing that the brain stores information, and that the chief feature of the storage system is that information is connected to other information (that song reminds me of you). So there are two fundamental types of neural circuitry in the brain’s storage system: circuitry which encodes information, and circuitry which connects information to other information. You would want to control these two types of circuits independently. Adjusting the volume (PLC activity, calcium level) on the encoding circuits would have the effect of controlling the level of general brain activity, which is probably the most obvious parameter of them all. Adjusting the volume on the connecting circuits would control the degree of associative spread, the degree to which things remind you of other things. And that is a very good thing to control; there are times you want to be empirical or characterizing and think just about the facts, and times you want to be interpretive and think about all the implications of the facts. (The savvy among you may have realized that I have just described the Jungian “sensing / intuition” dichotomy which constitutes one of the traits of the MBTI.)

So all we need to complete the paradigm is two different ways of controlling these two types of circuits. And we have that already: it was in the last part, where we hypothesized a multiplicative signal to increase the salience gradient. A multiplicative signal is of course contrasted with a simple additive signal. An additive signal “pays no attention” to the level of activity in its target; a plus 5 additive signal turns 0 to 5, 5 to 10, and 10 to 15. A multiplicative signal does “pay attention to” the level of activity in its target, by getting feedback from it; a times 5 multiplicative signal leaves 0 as 0, but turns 5 to 25 and 10 to 50.
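The arithmetic in that paragraph can be written out directly. This is just the plus-5 and times-5 example from the text; the `gradient` helper is my own shorthand (the spread between the most and least active targets), not a term from the model itself.

```python
def additive(levels, boost):
    """Add the same boost to every target, ignoring its current activity."""
    return [level + boost for level in levels]


def multiplicative(levels, gain):
    """Scale every target by the same gain, so the effect depends on its activity."""
    return [level * gain for level in levels]


def gradient(levels):
    """Spread between the most and least active targets (illustrative helper)."""
    return max(levels) - min(levels)


levels = [0, 5, 10]
print(additive(levels, 5))        # [5, 10, 15]
print(multiplicative(levels, 5))  # [0, 25, 50]
print(gradient(levels), gradient(additive(levels, 5)), gradient(multiplicative(levels, 5)))  # 10 10 50
```

The additive signal shifts everything and leaves the spread at 10, while the multiplicative signal stretches it to 50, which is why only a multiplicative signal can steepen a gradient.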

So, one chemical controls the encoding circuits additively, and another controls them multiplicatively. And another pair of chemicals controls the connecting circuits, again, one additively and one multiplicatively.

Histamine is known to be the brain’s chief mediator of cortical arousal, the primary determinant of whether you’re asleep or awake (this is why antihistamines cause drowsiness), and is thus the obvious candidate for the chemical that additively controls the encoding circuits. Acetylcholine (ACh) additively controls the connecting circuits and hence associative spread (the brain defaults to maximum associativeness and ACh inhibits the spread); this hypothesis explains a wealth of observations, from the nature of cognition during REM sleep (when ACh levels are higher than in waking) to the cognitive style of those at risk for Alzheimer’s (a disease which targets ACh neurons exclusively).

That leaves us with two chemicals: norepinephrine (NE) and serotonin (5-HT), which are made right next to each other and have profoundly similar patterns of innervation (the other three are unique). And they must be the two multiplicative signals.

The idea that serotonin is a multiplicative inhibiting signal for the brain’s connecting circuits turns out to have huge explanatory power. By being multiplicative, it has no effect on the ordinary weak connections like “that song reminds me of you.” But it is the only way that strong connections, like deeply felt beliefs and lifelong emotional responses, can be broken and hence rearranged. I’ve been proposing that serotonin fundamentally controls cognitive and emotional flexibility for well over a decade, and the world is slowly coming around to share that idea. But I’m fairly certain I’m the only person who can explain how serotonin does this at a low level of neural circuitry.

And that leaves us with norepinephrine as the multiplicative signal for information encoding, and hence the chemical that controls the salience gradient. In the next post, we’ll explore how much sense this makes.

A Model for Attention-Switching, Part V: The Selection Mechanism

So, how are attendums selected for attention?

If you were designing a system from scratch, your first thought might be to make this a top-down process.  You would build a module in the executive control center in the prefrontal cortex that continually monitored the salience level of every attendum in active memory, and selected the one that was most salient.

The fundamental problem with such a design is that it’s a lot of circuitry.  Is there something simpler that could do at least as good a job?

Well, here’s an idea.  Why not make it a bottom-up process?  Let the attendums compete for entry into consciousness.  Without worrying about the details of the actual neural mechanism, we can think of all the attendums in a sort of rugby scrum at the gateway to consciousness.  Each attendum has strength and vigor proportional to its salience, so the most salient attendum is likeliest to win the scrum and gain a foothold in consciousness.  It will, however, be continually subject to being overmastered and supplanted by another attendum of comparable or newly superior strength.  (Those familiar with Gerald M. Edelman’s concept of “neural Darwinism” will recognize the inspiration for designing a brain based on Darwinian principles of selection.)

There would appear to be one major bug in such a design: the most salient attendum is not certain to be the one selected.  This is inherent in any advantageously efficient bottom-up design.  Any bottom-up mechanism that always resulted in the most salient attendum being selected would be functionally equivalent to our top-down system, and would require as much circuitry, or even more (instead of the cortex monitoring the salience of every attendum, each attendum would have to monitor the salience of every other).  The fundamental tradeoff here is to simplify the selection mechanism by allowing a probabilistic selection among attendums based on their relative strength.
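
The contrast between the two designs can be illustrated with a toy sketch. Everything specific in it is an assumption made for illustration: the attendum names, the salience values, and the use of uniform random noise as a stand-in for the scrum.

```python
import random


def top_down_select(salience):
    """Monitor every attendum and always pick the most salient one."""
    return max(salience, key=salience.get)


def bottom_up_select(salience, noise=8.0, rng=random):
    """Let attendums compete: on each round, an attendum's strength is its
    salience plus a random amount up to `noise` (an assumed stand-in for
    the scrum), and the strongest on that round wins."""
    strength = {name: s + rng.uniform(0, noise) for name, s in salience.items()}
    return max(strength, key=strength.get)


# Hypothetical attendums with hypothetical salience values.
attendums = {"finish report": 9, "call friend": 6, "Red Sox lineup": 3}

rng = random.Random(0)  # seeded so the run is repeatable
wins = {name: 0 for name in attendums}
for _ in range(10_000):
    wins[bottom_up_select(attendums, rng=rng)] += 1

print(top_down_select(attendums))  # finish report
print(wins)  # the most salient attendum wins most rounds, but not all
```

The bottom-up version needs no monitor at all; the price is that the winner is only probably the most salient attendum.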

Why do I believe that the brain has made such a tradeoff?  Because it’s not really a tradeoff at all.  The “bug” is in fact a feature.  It is a good idea to sometimes let an apparently less salient attendum win the battle for consciousness.  And that is because the salience programs are all essentially out of date.  As we saw in Part IV, they can only be created or revised when the attendum is in consciousness, because the creation of the salience program requires the good and bad feelings that are only present in consciousness.  The salience program of every attendum is thus based on the state of affairs the last time we attended to it.  And things may well have changed since then.

This of course reflects our subjective experience of attentional switching.  We sometimes (I originally said “often,” but that’s true only at one end of a personality spectrum we’re about to discuss) find ourselves thinking about something we hadn’t thought of in a while, and when we do, we sometimes discover that events or insights that have happened in the interim have changed its relevance or importance.  A truly efficient brain would do a lot of “checking in” on apparently less-salient attendums to see if their salience programs needed updating.  And a terrifically simple way to accomplish that is to simply let them sometimes win the battle for consciousness.

Now, there is one more thing to consider, and then we’ll have a complete model.  And that is that there are good times and bad times to let a less salient attendum win the battle for consciousness.  When we are sitting and doing nothing is a very good time for “letting our mind wander,” to unexpectedly find ourselves thinking about something that we’ve been regarding as relatively less important, perhaps even trivial.  When we are fleeing from a bear, on the other hand, is an extraordinarily bad time to attend to anything other than fleeing from the bear.

What we need then, is a way of controlling the contrast among the salience tags.  When we’re sitting daydreaming, we can imagine the contrast turned all the way down, so that the strongest salience tags are not much stronger than the weakest, with the difference in fact being not significantly greater than the built-in “margin of error” that is inherent in the competitive mechanism.  The playing field is thus essentially leveled.  When we’re attending to a project for work (or a blog post), we can imagine the contrast turned up about half way, so that only reasonably strong attendums have any chance of seizing control from it (one that always has a good chance is “I need to pee.”)  When we’re running from a bear, the contrast would be all the way up, so that the difference between running from the bear and the next most salient attendum would be greater than any margin of error in the selection process.  Hence the thought of stopping to urinate would not cross our minds regardless of the urgency of our need.

We can model this mathematically.  Imagine that the salience tags range from 1 to 10 in strength, and that the “margin of error” in the competition for consciousness is 8, which is to say that the salience-1 tags compete anywhere from 1 to 9 and the salience-10 tags compete anywhere from 2 to 10.  There is thus ordinarily only a very slight advantage for the most salient attendums over the least.  This is the baseline daydreaming state, with the contrast turned all the way down.

Now imagine that we had a way of effectively multiplying all the salience tags.  Let’s multiply them all by 2.  Now they range from 2 to 20, and the salience-1 tags have an effective strength of 2 to 10, and the salience-10 tags have an effective strength of 12 to 20.  So there is no chance of a salience-1 attendum ever winning the consciousness battle.  This is a reasonable minimal level of concentration.

With a multiplying signal of 10, the salience-9 tags would be something like 83-91 and the salience-10 tags would be 92-100.  This is running from the bear.
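
One way to reproduce these numbers is offered here as a reading of the example, not as the model itself. Assume the competition window is always 8 wide, and that its lower edge slides linearly from the weakest multiplied tag up to the strongest multiplied tag minus 8. The function names and the interpolation rule are assumptions; the text gives only the endpoints and the approximate salience-9 figures.

```python
MARGIN = 8            # width of the "margin of error" window, from the text
S_MIN, S_MAX = 1, 10  # range of raw salience tags, from the text


def effective_range(salience, multiplier=1):
    """Return the (low, high) window in which a tag competes.

    Assumed rule: the window's lower edge is interpolated linearly between
    S_MIN * multiplier and S_MAX * multiplier - MARGIN.
    """
    bottom = S_MIN * multiplier
    top = S_MAX * multiplier - MARGIN
    frac = (salience - S_MIN) / (S_MAX - S_MIN)
    low = bottom + frac * (top - bottom)
    return low, low + MARGIN


def can_win(weak, strong, multiplier):
    """True if the weaker tag's window overlaps the stronger tag's window."""
    weak_high = effective_range(weak, multiplier)[1]
    strong_low = effective_range(strong, multiplier)[0]
    return weak_high > strong_low


for m in (1, 2, 10):
    print(m, effective_range(1, m), effective_range(10, m))

print(can_win(1, 10, 1))   # True: daydreaming, almost anything can win
print(can_win(1, 10, 2))   # False: the two windows no longer overlap
print(can_win(9, 10, 10))  # False: even salience 9 cannot displace the bear
```

Under this rule a salience-9 tag with a multiplier of 10 competes between roughly 82.9 and 90.9, which matches the "something like 83-91" above.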

I like to think in terms of a salience gradient, the steepness of the line you would draw if you plotted the effective salience of every attendum from weakest to strongest and connected them.  With no multiplying signal, the slope of this line, the salience gradient, is very shallow.  What the multiplying signal does is increase the steepness of this line, the salience gradient, and hence the likelihood of the most salient attendums winning the battle for consciousness.

So, what might this multiplicative signal be?  We’ll answer that next.

A Model for Attention-Switching, Part IV: Salience and Dopamine

So, let’s recap.

I’ve proposed that what neuroscience recognizes as prospective memory is actually part of a third memory system, active memory, which is intermediate between short-term and long-term memory. Indeed, the need for an intermediate memory system seems obvious once you contrast the capacities (tiny versus vast) and durations (seconds versus lifetime) of short-term and long-term memory.

So, how is active memory organized? It contains the key information about everything that’s “on our mind”: everything we are thinking of doing at some point in the future, and everything we are thinking of merely thinking about. Each one of these is a potential “attendum”: something we might attend to. You can think of each attendum as having a marker or token, which corresponds to the “hot spot” of the neural representation of the information. And here’s the key feature: each marker has associated with it a salience tag which measures the importance or relevance of the attendum at the present time. It is the salience tag which determines the likelihood of an attendum actually being attended to: the higher the salience, the likelier we are to attend.

It’s quickly obvious that these salience tags are in fact small programs (in the computer sense) which run continually and unconsciously and calculate the salience at the present time as a function of the current environment (physical and emotional / cognitive). For instance, the salience of “stop and get milk” is very low until we sight the supermarket, at which point it becomes very high. That could be done with one line of computer code, along the lines of

IF in-sight(supermarket) THEN high-value ELSE low-value.
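
For readers who want to see that line run, here is a minimal sketch of the same rule. The `environment` dictionary, the function name, and the values 9 and 1 are hypothetical stand-ins, chosen only to make the example concrete.

```python
def milk_salience(environment, high_value=9, low_value=1):
    """Salience of "stop and get milk", given what is currently in sight.

    `environment` is a hypothetical dict describing the surroundings;
    the high and low values are arbitrary illustrative numbers.
    """
    in_sight = environment.get("in_sight", set())
    return high_value if "supermarket" in in_sight else low_value


print(milk_salience({"in_sight": {"office", "desk"}}))       # 1
print(milk_salience({"in_sight": {"road", "supermarket"}}))  # 9
```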

The salience program for thinking about a troubled personal relationship would be much more complex. There are times when you would want to suppress all thoughts of the absent friend, and times when thinking through the history of the friendship (all of which would be loaded in active memory, for fast access) would be desirable.

Where do these salience tags come from? It’s one of the great recent insights of modern neuroscience that all of our decisions are mediated by conscious emotion. We are often not conscious of the information we are evaluating emotionally (hence the trusted “gut feeling” at one end of a continuum and apparent pure irrationality at the other), but without conscious emotion we are unable to choose, unable to evaluate. Everything we do (and, according to me, everything we think about) is driven by the carrot of reward and the stick of punishment, by the balance of good feelings versus bad feelings. The brain continually attempts to optimize the difference between them. Our feelings are the currency of our motivation.

It thus becomes obvious that you could not create a salience tag or write a salience program without conscious emotion. In all but the most extraordinary cases, there would be conscious thought as well. The first time we think of doing something, or have a new thought that we might want to continue later—that’s when our feelings evaluate the importance of the action or thought, and that’s when a salience program is written, and stored in active memory along with the information itself.

As you probably know, the brain chemical that mediates reward is dopamine (DA). Of all the principal neuromodulators (brain chemicals that control the state of the brain), it’s the best understood. When we perceive a potential reward, that perception is accompanied by the release of DA, and the subjective feeling of desire appears to be caused by DA release as well. So, as we evaluate the salience of a new attendum, that salience is in fact measured by the level of DA it is evoking.

(Strictly speaking, this is known to be true only of attendums that relate to potential rewards, like “stop and get milk” or “apply for that job.” Little is known about the neurochemical correlate of punishment, but I’m betting it’s DA as well. DA serves a secondary role as the mediator of movement (damage to this DA system causes Parkinson’s), and that makes sense because action is necessary to secure a potential reward. But action is also necessary to flee a potential punishment. I believe that DA is in fact responsible for turning on aspects of subjective awareness (“phenomenal consciousness” in the terminology of philosophy of mind), and subjective awareness is all about feelings, of all kinds.)

Now, here’s what’s really interesting. It’s well established that DA holds information in working memory. Without adequate DA, you can’t remember a phone number. Drugs which boost DA are the long-time preferred treatment for AD(H)D, which until relatively recently was presumed to be caused strictly by DA deficits. Yet as we have conceptualized it, the problem with ADD isn’t just in working memory, it’s in active memory. The absent-mindedness that is a core ADD trait is a problem holding information in active memory.

This ties together very neatly. The current DA level evoked by a newly thought-of attendum is a measure of its salience. We need some chemical to hold the attendum in active memory, and the more salient the attendum, the more strongly we want to hold it there—if active memory capacity is limited, we want the least salient attendums to be bumped out first. It’s simply economical to use the already-evoked DA level to hold the information in memory. Holding it there with some other chemical (virtually all signals in the brain are chemical) would just require an extra step of translation.
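
As a toy illustration of "the least salient attendums get bumped first," here is a sketch of a fixed-capacity store keyed by DA level. The class, its capacity, and the example attendums and numbers are all hypothetical; the only idea taken from the text is that the stored DA level doubles as the retention priority.

```python
class ActiveMemory:
    """Toy store in which the least salient attendum is bumped first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # attendum name -> DA level, standing in for salience

    def hold(self, name, da_level):
        """Store an attendum; if over capacity, evict the weakest one.

        Returns the name of the evicted attendum, or None if nothing
        had to be bumped.
        """
        self.items[name] = da_level
        evicted = None
        if len(self.items) > self.capacity:
            evicted = min(self.items, key=self.items.get)
            del self.items[evicted]
        return evicted


memory = ActiveMemory(capacity=2)
memory.hold("apply for that job", 8)
memory.hold("reorganize the bookshelf", 3)
print(memory.hold("call an old friend", 6))  # reorganize the bookshelf
print(sorted(memory.items))
```

No separate priority field is needed, which is the economy the paragraph above describes: the same number that measures salience also decides what stays.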

Again, we have given a simplified picture by talking of salience tags rather than programs. When we create a simple salience program, we are contemplating the importance of the attendum under various foreseeable circumstances. We imagine the very intense reward we expect to get from the bedtime milk and cookies, and that creates a high DA level, but we link the expectation of that reward to the sight of the supermarket, and the sight of the supermarket only. We are equally conscious of the fact that we don’t want to think about buying milk until we see the supermarket. For classic prospective memory tasks like this, there is probably a built-in program that codes for a low salience level when the evoking stimulus is absent—low enough so that it has no chance of being attended to, but not so low that it gets bumped out of active memory altogether.

What are more complex salience programs like? Let’s consider two things I’ve been thinking about a lot and that are more or less equally important to me: the Red Sox, and this series of posts. I usually wake up in the morning and think about the possibility of writing another post in the late evening, but that thought is very unlikely to cross my mind after about 5:00 PM, because the Red Sox salience program is driven by time of day and the salience starts to ramp up to very high levels a few hours before game time. I’ll usually mess around with baseball stats for an hour or two after the game is over, and then at some point, as the salience of the Red Sox drops through sheer acclimation (more about that if I ever get around to considering boredom), the thought occurs to me that it might be time to write another post. And that thought is compelling (salient) in exact proportion to how many days it’s been since the last one. With enough introspection, it should be possible to identify a great many standard salience program modules like these. The most complex salience programs, such as those for thinking about weighty emotional issues, are probably subject to frequent revision and elaboration.

So, we have active memory filled with attendums, and each has a salience program that monitors the environment. The programs set the level of DA for each attendum at that moment, and one of those attendums is selected for attention.

How? That’s the subject of the next installment.

A Model for Attention-Switching, Part III: The Contents of Active Memory

Those of you who have heard my talks on the neurochemical basis of personality, or on the cognitive structure of feelings, are familiar with the big idea that contributes significantly to the former and forms the foundation of the latter. It’s probably the best single insight I’ve had about the brain.

The insight is this: the brain treats thinking as one of the things we do. For the brain, there is no fundamental difference between thinking and doing; thinking is a type of doing. And this means that any mechanism in the brain that regulates action also regulates thought, as a form of action.

All actions are motivated by rewards and punishments; the idea tells us that the thoughts we think are also motivated the same way. The size of the reward we get merely for thinking turns out to be a key personality trait. In my theory about the nature of feelings, I propose that feelings are not just about what we expect to happen to us in the future, but about what we expect to feel in the future.

Applying this insight to active memory immediately tells us what’s missing. Active memory has to be filled with everything we intend to think about, not just everything we intend to do. In other words, it contains everything that’s “on our minds.”

When we’re sitting, waiting for the bus, and we’ve got nothing to read and no music player to listen to, where do our minds go? We start thinking about things. Things that are on our minds. We might start thinking about a vexing personal problem, or a scientific hypothesis about the brain, or the ideal Red Sox batting order. All of these things would be things we were thinking about recently, and things we were expecting to think about again. That’s why we say they’re “on our minds.” Active memory is a place to store these potential thoughts in a way that makes them much quicker and easier to access than if they were stored in long-term memory, even if there were a system for tagging information in long-term memory as “active” (which I believe is the current conception of the way information is primed for preferential access).

But there’s one more aspect of active memory that we need to complete the picture, and it’s actually the first aspect I was aware of.

In the spring of 1978 I took my first ever psychology course, Robert Coles’ course in Defense Mechanisms. A little Googling makes me fairly certain that the text he used was the recently-published Adaptation to Life, by his Harvard colleague George E. Vaillant. The course met before noon, which meant (I am not proud of this) that I attended only one or two lectures. My roommate, Thomas M. Keane, Jr., was also enrolled in the course, so I was able to keep tabs on the required work through him. (The term paper assignment was to read a biography and analyze the life within in terms of defense mechanisms; I did Humphrey Carpenter’s Tolkien: A Biography and knew I was being an idiot to not make a photocopy for myself before handing it in to my section leader. In my defense, the paper was not only days late, grades were closing tomorrow; I remember running to hand it in.)

At Harvard, there is a two week break called “reading period” between the end of classes and the start of exams. It’s a godsend to people like me; by my senior year I had taken to filling it with twelve subjective days of more or less 28 hours each, with the exact sleep schedule timed to put my morning exams in my circadian afternoon or, better yet, evening. I borrowed Tom’s copy of the text and read the entire thing in two days, then aced the exam, and got an A- in the course in the days before grade inflation (there were lots of little assignments I had never completed).

When my circadian clock shattered into shards in the early 1990s, I got interested in the brain, and I did a lot of introspection about my own. I have often mentioned that what really hooked me on neuroscience was discovering that I personally falsified Robert Cloninger’s theory mapping neurotransmitters to personality traits, as presented in Listening to Prozac—I was in the 99th percentile for half the descriptors of his proposed dopamine trait, “novelty seeking,” and in the 1st percentile for the other half.

Naturally, I also did a lot of thinking about what made me smart. The Coles course was a big clue. Clearly the course and book had been designed so that each week’s lectures constituted a manageable chunk of information. I figured that other students must be absorbing that information each week and writing it to their hard drives. The next week, they’d read another chapter, and to see where it fit in the big picture, they would search their hard drives and fetch this or that relevant bit of information from it, and then they’d write it all back when they were done. And gradually, over the course of a semester, they would build up this edifice of connected knowledge. What I had done instead, it seems, is absorb all of the information at once; I could handle much more of it than my peers. And that allowed me to see all the connections effortlessly; I never had to search my hard drive to fetch missing information that was needed to make full sense of what I was reading, since that information was already in my brain. Wow, I thought: I’ve got, like, twenty megabytes of RAM (gigabytes, now).

So active memory contains not just a list of things we hope to do and a list of things we expect to think about, but large quantities of information belonging to the latter: as much relevant information, in fact, as will fit.

As I said, there has always been an awareness that there must be a mechanism for priming some of the information in long-term memory for preferential access. What I am proposing is that this mechanism in fact consists of a third, intermediate memory system, active memory, which also handles prospective memory tasks like “stop and get milk.” The chief feature that distinguishes active memory from long-term is speed of access by the executive processes of the prefrontal cortex and by working memory. It can thus be thought of as a version of working memory, but of vastly greater capacity. Again, the computer analogy seems applicable: the memory cache on the CPU chip and RAM are very similar, and quite different from the hard drive.

(As an aside, I am hopeful that the concept of active memory will be crucial in figuring out the mechanisms by which memories are consolidated and pruned during sleep.  We have much evidence that this happens, but it won’t be possible to understand what’s going on unless we correctly understand the memory systems involved.)

Next: How Active Memory Works, Part 1

A Model for Attention-Switching, Part II: From Prospective to Active Memory

Memory is traditionally divided into two types by duration of retention: short-term memory and long-term memory. Short-term memory encompasses various working memory systems such as the phonological loop (where you rehearse phone numbers) and the visuospatial sketchpad (where you store the image of one step of Lego instructions while looking at the image of the next step in order to compare them. I can store about half of a picture at a time, which is why my Lego-building age appears to be approximately eight). Long-term memory includes episodic memory (that day at the beach you’ll never forget), semantic memory (knowledge), and procedural memory (how to ride a bike).

There is, however, also a well-established division by temporal direction: retrospective memory, memory for the past, versus prospective memory, memory for things we plan to do in the (usually near) future. As mentioned at the end of Part 1, the classic example of prospective memory is telling yourself to “stop and get milk.” As you finish the last drop of milk with your Cheerios and contemplate the delicious chocolate chip cookies you expect to have for dessert that evening, you tell yourself, do not forget to stop and get milk on the way home. The thought may well not cross your mind again all day at work. But if your prospective memory functions well, when you approach the supermarket on your homeward commute, its sight will trigger it, the thought popping miraculously into consciousness, even if you are deep in thought about, say, evening plans even more alluring than cookies and milk.

Prospective memory has been fairly well-studied. For instance, it has been established that holding a task in prospective memory takes a very small but measurable toll on performance of other tasks.

A question that, as far as I know, remains unasked by memory researchers: is prospective memory a type of short-term memory, or long-term? I believe the answer to both is “pretty obviously not.” Short-term memory lasts for seconds, not hours. Long-term memory lasts for lifetimes, not hours. These answers thus lead to an even more provocative unasked question: is there a third fundamental memory system intermediate between short- and long-term memory, which includes but is almost certainly not limited to prospective memory?

(The reason why such a hypothesized memory system is likely to store more than just prospective memory is, I think, obvious: prospective memory just isn’t important enough. Even if we realize that it stores not just “stop and get milk” but “look for a new job some time before the end of this month” and a host of other such important intentions, it still falls short. Prospective memory researchers believe it to be the repository of elaborate career plans (even if these are hard to study in the lab), yet no one has suggested that it’s a major component of memory on a par with short-term and long-term.)

Is there any other reason to believe in a third, intermediate, memory system? While it’s often perilous or even unwise to draw parallels between the brain and computers, I think this is one of the times when it yields insight. And computers, of course, have three levels of memory: the memory cache on the CPU chip, RAM, and the hard drive. The memory cache quite obviously corresponds to working memory—it’s the information being processed right now, directly, by the CPU or by the executive engine in the prefrontal cortex. The hard drive obviously corresponds to long-term memory; they both have very large capacities and contain information which is potentially permanent. So we ought to be asking: is there something in the brain that corresponds to RAM? The question becomes even more intriguing when we realize that the chief distinguishing feature of the three memory levels in a computer is not the duration of storage, but the speed of access by the CPU. If it’s true that permanence of storage and speed of access represent a fundamental trade-off in information storage, then the need for an intermediate third memory system in the brain becomes even more apparent.

We could call this hypothesized intermediate memory system “medium-term memory,” but that would tell us nothing we didn’t already know. So let’s emphasize the possible importance of speed of access and call it active memory. Prospective memory refers to a type of information that can be stored in active memory.

And as just argued, there must be other types of information stored there as well. These other types must bear some relationship to prospective memory as currently defined, and yet no one has expanded the concept of prospective memory to embrace them, and hence discovered the existence of active memory. So what insight is everyone missing?

Next: What’s the Big Idea?