Sunday, June 28, 2009

So What Does Dan Actually Do?

"What do you study, Dan?" This is a surprisingly difficult question with several answers, depending on who is asking.

To another scientist of the right kind, I can give the reasonably detailed answer, "I study protein folding kinetics by molecular dynamics simulations, followed by Bayesian analysis of states and rates. The general idea is to criticize and improve master equation models of microscopic protein dynamics."

For some other scientists, "I study protein folding kinetics using computer simulation." Short, sweet.

For non-scientists? This is trickier, and I'm a bit ashamed not to have a short answer prepared. However, the short answer is difficult; look at the "other scientists" response, in particular. Parts of that sentence parse for reasonably intelligent laity, like "computer simulation." I calculate stuff (generally stuff that's either too hard to measure or too small or fast to see, but that's part of a more detailed answer). Much harder is explaining what "protein folding" is. "Kinetics" is also probably tricky, but it's just a technical word for the measurement of how and how fast something happens. "Protein folding kinetics" is studying how and how fast protein folding happens.

I imagine people think of folding sliced ham in half when they hear "protein folding," but the protein in protein folding doesn't refer to dietary protein like a slice of ham, but to biochemical protein. This is much like "water": when most people (including scientists) refer to water, they mean the liquid. But "water" sometimes means the molecule, H2O: two hydrogen atoms attached by chemical bonds to an oxygen atom.

Proteins are molecules, too, of a class called macromolecules because they're huge compared to ordinary molecules like water. Other macromolecules include lipids--that is, fats, which actually aren't that big--and nucleic acids like RNA, which is about as big as a protein, and DNA which is huge compared to proteins and RNA.

In technical language, proteins are linear polymers of amino acids. This means that proteins are bigger molecules made from sticking small amino acid molecules together, end to end, by chemical bonds. The amino acids are a class of molecules, of which twenty or so are used in biology, with one end (the "amino" end, made of nitrogen, with a name not coincidentally reminiscent of the solvent, ammonia) which can stick, chemically, to the other end (the "acid" end). With dozens or hundreds of these molecules stuck together, end on end, one gets a protein macromolecule, which topologically speaking is a long chain of amino acids, like a string.

Proteins perform lots of different tasks for the cell, with one type of protein doing about one specific task. For instance, hemoglobin is a protein that carries oxygen in the blood; IDH is a less famous protein that cleaves one molecule, isocitrate, into two, carbon dioxide and alpha-ketoglutarate.

However, a protein in the form of a string of amino acids cannot do its job. Hemoglobin can't carry oxygen in string form, and IDH can't cleave isocitrate. In order to perform its function, the protein string needs to fold into a specific shape. Luckily, the information for folding into the right shape is encoded in the string of amino acids, some of which are oily, and avoid water, and some of which are hydrophilic, or "water-loving," but better called water soluble. However, remember that the amino acids are stuck together into a chain; the information of the proper fold is encoded in the different arrangements of the chain so as to hide oily amino acids from water, and to get water soluble amino acids into contact with water. The proper folded conformation is the most likely conformation that does the best job getting all of the right amino acids away from or in water, according to which amino acids are in the chain.

The particular chain for a protein is unique; hemoglobin is different from myoglobin which is different from IDH. Therefore, the fold is different. One way to study protein folding, known as structure prediction, is to try to guess from the sequence of amino acids what the final structure will be. On the other hand, I study protein folding kinetics, which is how the protein gets to the folded state in the first place.

Protein folding is impossible, with state-of-the-art technology, to observe directly in the lab, so I and my coworkers (and others around the world) use computers to model how the atoms in the protein macromolecule move around. The atoms have velocities, and they exert forces on one another. For instance, positively charged atoms attract negatively charged atoms, according to the ordinary rules of electrostatics. Since they exert forces, the atoms accelerate or change velocity, and they zoom around. However, because the atoms are chemically bound in certain arrangements, they can't zoom around too much, so really all the protein chain can do is flop around in different arrangements.

Spaghetti illustrates this quite nicely. A pot of noodles thrown on the floor shows lots of pasta (protein chains) which flop in different ways. The forces that a protein chain exerts on itself makes the protein move between different conformations. Some of these conformations are (or are very close to) the proper, functional conformation of a protein.

With molecular simulation, we calculate the next conformation, in time, that a protein molecule will adopt, from its characteristics at the current conformation. Thus, the protein chain moves from conformation to conformation, and eventually folds into the correct, functional shape.
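For the curious, here's roughly what one step of that calculation looks like, boiled down to a single "atom" on a spring. This is a toy stand-in, not our actual simulation code: real protein simulations do this same force-then-move bookkeeping for thousands of atoms with far more elaborate force rules.

```python
# Toy molecular-dynamics step: one "atom" held by a harmonic bond (a spring).
# Units, masses, and the spring constant are all made up for illustration.

def force(x, k=1.0):
    """Force from a harmonic bond with spring constant k (F = -k x)."""
    return -k * x

def velocity_verlet_step(x, v, dt=0.01, m=1.0):
    """Advance position x and velocity v by one time step dt."""
    a = force(x) / m          # acceleration from the current conformation
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = force(x_new) / m  # force at the new conformation
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new

# March the system forward: each step gives "the next conformation in time."
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = velocity_verlet_step(x, v)
```

The loop is the whole idea: compute forces at the current conformation, nudge positions and velocities accordingly, repeat. Stack up enough of those tiny steps and the chain flops from conformation to conformation.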

I study how a protein moves between the different shapes it can adopt, and, more interestingly, how quickly it can change from shape to shape, using a computer. The final piece is to state why a person would want to know about this: as might be guessed, knowledge about how proteins fold is one of the keys to understanding their many functions.

Science Is Entertainment, Part 1: The Mozart Effect? Nope

In the 90's, an odd bit of scientific research suggested some temporary improvement in spatial reasoning due to listening to classical music, particularly Mozart, and particularly in infants. I have a funny picture in my mind, a pregnant woman with headphones over her abdomen, gestating the next Carl Gauss or Thomas Edison. I don't know if anyone ever actually did that (we'll see if we have more Gausses in a decade or so), but you have to admit it's funny, not least because the research never showed that listening to Mozart makes you smarter. Nor did the researchers make such a claim.

However, this so-called "Mozart effect" did lead some classical music lovers to declare haughtily this final evidence of the superiority of their beloved genre. What's more, it became a justification for listening to classical music.

The problem is: a person doesn't need an excuse to listen to Mozart or Beethoven or anyone else. The reason for listening to classical music? Because it's entertaining. Whether it has some mysterious other benefits, I don't know, but to use "increased intelligence" as an excuse--especially when there are so many other factors that are definitely known to increase intelligence, like breakfast--is at the top of the silly list.

Now, I'm not a huge fan of classical music, but I've always liked Chopin (I remember my mom playing it on the piano from time to time after we were in bed), Bach if the right player is playing, and I can definitely dig Wolfgang. I find Vivaldi boring, Bach wrong with the wrong player playing, and, sorry Claude, La Mer has been much too inspirational to Danny Elfman and John Williams to be taken seriously. (I mean, I'd rather watch Big Fish, Star Wars, oh yes!) Fantasia sucks, yes, I said it, except that creepy one, Night on Bald Mountain, is awesome, and Ave Maria is very pretty (if a bit overdone).

Notice how in the above stompy rant about classical music I didn't mention how much smarter or dumber the music makes me. Those kinds of considerations just don't matter.

The reason for listening to classical music, or any other kind: enjoyment. There does not need to be any other reason, although there may be (cultural identity, relaxation, whatever). The enjoyment pays for the cost of listening.

To right-angle the discussion, and to apocryphy, consider the Superconducting Supercollider (SSC). I don't know if this is a true story, and I would appreciate someone pointing out clarification on this matter, but it serves quite well allegorically. A prominent physicist is testifying before a congressional panel about the SSC. A congresscritter asks, "Does it have any consequences for Defense?" meaning, if we're spending all this money, we ought to be able to shoot it at the Soviets or something. The physicist replies, "No, but it gives us something worth defending."

I require a couple of comments (as if this weren't a blog). First, America is worth defending anyway, because, um, people live here, and in any case we had all kinds of nifty stuff to shoot at Soviets already. Second, I'm skeptical that Congress should have a role in funding projects like this one anyway. For one thing, arguably the SSC didn't have a commercial function, either, or else private industry would have funded one. Leaving that for another time, and just accepting that Congress funds science, we get to the point of the SSC story.

The reason for having a Superconducting Supercollider is because it's cool. If the US is going to spend that kind of money on basic physics, the reply should have been not that it gives us something worth defending (what the heck, I mean, it's like if we didn't have the world's coolest basic physics, should we just throw in the towel? I am in mighty need of a break, if anyone has one) but that, and here's my thesis, finally, we should do it because it's fun.

One can and should question whether the government should be spending money to keep physicists and their hangers-on (me, for one) entertained. But that's not the point here. Science is extremely fun, which is why scientists do it. Okay, fine, scientists crave recognition as well, and they want to be well-thought-of by their peers (which is another reason not to spend government money on it ...) but if research and science weren't fun nobody would spend years studying to get to the point required to make new contributions.

When I tell non-scientists that I'm a scientist, people perhaps pity me that I had to go through all that suffering just to, uh, suffer more. But that's not the way it was. I enjoyed every single science course I took, and I enjoy new, clever research now. The reason is that it's fun to figure things out. I think the same probably happens to mathematicians--what, you like doing math?--but it's really just a manifestation of what leads people to do Sudoku, crosswords, and the like. It's fun to figure stuff out.

This is the primary reason for starting Free Range Science at all. Once I realized that science is fun, I saw every reason to tell non-scientists about cool science. Even if a non-scientist couldn't solve the puzzles, it's intellectually satisfying to understand a problem, and I believe that the non-scientist can get a great sense of enjoyment from hearing about the solutions.

Here is an example that the reader is sure to enjoy. The relevant experiments were performed by Avery, MacLeod, and McCarty in 1944, building on the work of Griffith, who in 1928 experimented with two kinds of bacteria that cause pneumonia, one of which kills laboratory mice, and one of which doesn't. I describe, fairly carelessly, the results of their experiments.

When just the DNA (not the whole, live, bacterium) from the virulent kind of bacterium is injected into mice, nothing happens; live bacteria are needed to produce the lethal infection. When the DNA from the virulent strain is injected along with the non-virulent bacteria, the mice die; the virulent DNA changes the non-virulent bacteria to virulent. However, when just protein from the virulent bacteria is injected with the non-virulent bacteria, the mice don't die.

So we have these observations:

1. virulent bacteria + mice = dead
2. non-virulent bacteria + mice = live
3. virulent DNA + mice = live (DNA by itself is not enough; live bacteria are needed, observation 1)
4. virulent DNA + non-virulent bacteria = dead (different result from just non-virulent bacteria, observation 2)
5. virulent protein + non-virulent bacteria = live (same result as just non-virulent bacteria, observation 2)

Figure this out, and reap the rewards of biological logic: what molecule (DNA or protein) carries the information required to have a virulent strain?
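If you'd like to check your answer mechanically (spoiler below!), the numbered observations are simple enough to encode directly; the dictionary here is just the list above, restated, and the little function asks the deciding question: does adding the candidate molecule to non-virulent bacteria kill the mice?

```python
# The observations, encoded as (ingredients injected) -> mouse outcome.
observations = {
    ("virulent bacteria",): "dead",
    ("non-virulent bacteria",): "live",
    ("virulent DNA",): "live",
    ("virulent DNA", "non-virulent bacteria"): "dead",
    ("virulent protein", "non-virulent bacteria"): "live",
}

def transforms(candidate):
    """Does adding this molecule to non-virulent bacteria make them virulent?"""
    key = ("virulent " + candidate, "non-virulent bacteria")
    return observations[key] == "dead"

# Whichever candidate transforms the bacteria carries the information.
carrier = [m for m in ("DNA", "protein") if transforms(m)]
# carrier == ["DNA"]
```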

This actually isn't the lynchpin for why DNA is the genetic material, because nothing in science is ever that conclusive, but it's still pretty convincing. An experiment like this generates more questions, which generate more experiments; the more such experiments are performed, the more certain we get that DNA is, in fact, the genetic material. It holds the information required for virulence.

Actually that brings up yet another fun thing about science: arguing what results mean. Done well, this leads to more experiments, which are the only things that can be used to resolve scientific disputes. For instance, at that point in time (remember, nobody knew what DNA did) one might argue that the DNA of the virulent strain provides some kind of food for the non-virulent strain, which super powers it and makes it virulent. This can be tested by looking at how much food energy the non-virulent DNA has compared to the virulent DNA. An experiment would show that the amount of energy in each type of DNA is more or less the same, discrediting the DNA-as-Popeye-spinach hypothesis.

The joy of figuring out problems translates just as well to math as to science. A recurring complaint in algebra classes is, "When are we ever going to use this?" The proper answer is not, all the time, lots of jobs require a knowledge of algebra. This answer is misleading; while some jobs (like mine) require the ability to "solve for x," almost everyone gets by, after high school, without ever solving an equation. Not that the things people suffer through, like algebra and geometry, can't be used by everyone every day--my wife reports using geometry when sewing and algebra when running a register. But that doesn't answer the question: useful as it may be, sewing does not require geometry.

No, the real answer to why people should learn algebra is the satisfaction of figuring things out. Because it's fun to figure out puzzles. The best high school math teachers, I'm sure, already know this.

Why science? Because of its entertainment value. It's the same answer to the questions, Why Mozart? Why Superconducting Supercollider? Why learn algebra? The advantage of human animals lies not only in the ability to reason and figure things out, but in its enjoyment. Not too different from sex, in fact. Mainly, we do it for fun. Just as we have fun having sex, we have fun while solving scientific problems, and something amazing and wonderful happens. That we get Mozart, an understanding of DNA, and babies is a bonus, but the benefits are not enough. The fun is enough.

Thursday, June 25, 2009

Why Things Are Different Colors

A few years ago, my mother, fascinated by all of the colorful and weird effects of pottery glazes, posited that a chemistry course or two could answer her questions about why things are different colors. In reality, it wasn't until three years into an undergraduate degree in chemistry that this question was fully answered in inorganic and physical chemistry courses.

Oh, don't get me wrong, freshman "general" chemistry covered it, too, but then it was hidden in a thicket of electronic states too abstract for the freshman to fully appreciate. And then we moved on to the concept of equilibrium, which is more important than color but not as colorful.

As sophomores, students take organic chemistry, the chemistry of carbon, and there color leaches back in again, although it's difficult to discern while fighting through the headaches of interpreting magnetic resonance spectra (don't ask (unless you're really interested), it's just a way to figure out what atoms are connected to what other atoms, among other things, and hence what kind of chemical transformations a molecule can undergo). Most organic chemicals, in my experience, are white crystalline powders, white oily powders, or colorless oils. Not necessarily boring, especially for how they smell (mint, bananas, rotting corpses) and how they fly in mass spectrometers (also a way to figure out how atoms are connected with each other, among other things). Some organics are colored, but there are more fun ways to make molecules be colored, and hence to find color in everyday objects.

And even in physical chemistry--that branch that approaches questions about how molecules move, and hence react and interrelate, and describes why things are the way they are--color is buried under mountains of stationary states, Hermitian operators and their eigenspectra, and semiclassical field approximations.

Okay, so you do get "color" in some sense in physical chemistry, usually right at the beginning. It's a problem known as the ultraviolet catastrophe, which baffled the physicists of the day, and whose solution contains one of the important clues that led scientists to paint over Newton's old white picket fence with the more shocking colors of quantum mechanics. In short: things that are hot emit light. Camp fires, incandescent (not an accidental epithet!) lightbulbs, stars. In the greatest spirit of physics, these kinds of things are idealized as "black bodies," with no inherent color of their own, which is why they're black. According to classical physics, hot things should emit a little bit of infrared and a lot of color; the hotter the black body, the less infrared and the more color. The problem is that classical physics also predicts that black bodies should give off an infinite amount of ultraviolet light, higher in energy than visible light (color), which is higher than infrared. This contradicts actual observations, both casual (I don't immediately get a bad sunburn sitting next to a lamp) and quantitative: black bodies give out a little bit of infrared (low energy), a lot of "color" in the form of visible light (medium energy), and a little bit of ultraviolet (high energy).

It turns out that quantum mechanics was the right way to solve the black body problem, and that classical mechanics was the wrong way. This is exactly the kind of thing that led to the formulation and acceptance of quantum mechanics: it predicted and explained things that classical mechanics could not, and, what's more, it reproduced the predictions of classical mechanics in the regimes where classical mechanics does work.
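You can see the catastrophe in numbers with a quick sketch comparing the classical (Rayleigh-Jeans) and quantum (Planck) formulas for black-body radiation. The temperature below is roughly that of the sun's surface, picked just for illustration:

```python
import math

# Spectral radiance of a black body at temperature T (kelvin) and
# wavelength lam (meters). Physical constants in SI units.
h = 6.626e-34   # Planck constant
c = 2.998e8     # speed of light
k = 1.381e-23   # Boltzmann constant

def rayleigh_jeans(lam, T):
    """Classical prediction: blows up as the wavelength shrinks (the catastrophe)."""
    return 2.0 * c * k * T / lam**4

def planck(lam, T):
    """Quantum prediction: rises, peaks, and falls off in the ultraviolet."""
    return (2.0 * h * c**2 / lam**5) / math.expm1(h * c / (lam * k * T))

T = 5800.0            # roughly the sun's surface temperature
uv, ir = 100e-9, 100e-6  # a far-ultraviolet and a far-infrared wavelength

ratio_uv = rayleigh_jeans(uv, T) / planck(uv, T)  # classical overshoots wildly
ratio_ir = rayleigh_jeans(ir, T) / planck(ir, T)  # the two agree in the infrared
```

In the infrared the two formulas agree to within a percent or so; in the ultraviolet the classical formula overshoots by many orders of magnitude, which is exactly the disagreement with observation described above.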

This should give the reader the impression that color is due to quantum mechanics, which is the correct impression. To be more specific, color happens because of the funny ways that atoms behave and consequently how they interact with light. But this is physical chemistry, and physical chemistry does not really explain why things are colored; it just says how they could be colored if they were.

No, the best course to find out about color (other than this article, of course) is inorganic chemistry. (Bad news for the dabbler, but I'm the scientist here: dabblers can just ask me. Or continue reading.) Inorganic chemistry can officially be thought of as the chemistry of everything that isn't mostly carbon (inorganic means not organic which means not made of carbon), but unofficially and more realistically it is the study of metals.

"Metals" are the elements of the periodic table that, in their pure forms, tend to be shiny, solid (at ordinary temperatures, with mercury being the main exception), and conductive, allowing heat and electricity to flow. Combined with other elements--oxygen, nitrogen, carbon, hydrogen, and their ilk--metal atoms often cause color.

The reason that metals in compounds cause color is due to the way they interact with light. Ordinary atoms are made up of a positively charged nucleus, which is as small as a point, and a collection of negatively charged electrons whizzing here and there around the nucleus, attracted to it because of the opposite charge, but never getting there, as far as anyone can tell, because no one, least of all the electrons, is sure how much of an electron's energy is divided between "I'm whizzing around" (AKA kinetic energy), and "I want to go towards the nucleus" (AKA potential energy). Funny little things, electrons.

In fact, it's this uncertainty that leads, indirectly, to color: atoms can only absorb certain wavelengths of light. (This is very important, and if you take nothing else from this article, take the fact that atoms can only absorb certain wavelengths of light.) The closer to the nuclei that electrons are, the higher energy the light has to be in order to be absorbed. This explains why organic liquids tend to be clear: their electrons are all very close to the nuclei, so they absorb high-energy light, ultraviolet, that you can't see anyway, and let all the light you can see pass through.

The difference with metal atoms is that they have electrons that are further from the nucleus than in many nonmetals. These far-from-the-nucleus electrons interact with lower-energy light--visible light, the light that makes things colored--absorbing some portions of it. What light gets absorbed depends on the chemical details: dichromate (chromium), for instance, absorbs blue light, so it looks yellow; hemoglobin (iron) absorbs blue and yellow, so it looks red; cupric ions (copper) absorb red and yellow, so solutions of cupric ions look blue.

It all has to do with which electrons are where, in the particular compound the atoms are in, which depends on chemical details like what other atoms are connected to the metal atoms. Other atoms tend to push and pull electrons from the metal atom, and the metal atoms tend to push and pull electrons from the other ("ligand") atoms. The particular balance of how that pushing and pulling is done determines where the electrons tend to be--near the metal nucleus, where they interact with ultraviolet light, or far where they interact with visible light.

Organic compounds (that is, not containing metal atoms, but the distinction becomes fuzzy in lots of places) can have these kinds of effects as well, but getting electrons that are "further from the nuclei" requires tricks like conjugation, which spreads electrons around, making them easier to move and hence to interact with visible light. This is why rhodopsin (in mammalian eyes) is colored--biology has designed a molecule to interact with visible light, which enables seeing, and it does this by making conjugated compounds. This is related to the concept of "unsaturation" that everybody is familiar with in conjunction with dietary fat. Some fats are very unsaturated; most colored organic compounds are much more unsaturated (=conjugated) than that.

In order to understand the details of all the pushing and pulling that makes things different colors, quantum mechanical calculations are required. These are not too hard, but not too easy, either. Four years of a chemistry degree should do the trick. The simpler answer, if you wonder why something is colored, is that electrons interact with light, and the further from atomic nuclei the electrons are--the squishier they are, the more they move around--the less ultraviolet they absorb, and the more visible light. No degree required, just imagination: which better serves the potter and the artist, anyway.
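A "particle in a box" toy model, a staple of those quantum mechanical calculations, captures the squishier-electrons-absorb-redder-light trend: confine an electron to a bigger box (electrons spread further out, as in conjugated or metal compounds) and its absorption shifts from the ultraviolet toward the visible. The box sizes below are picked purely for illustration, not computed for any real compound.

```python
import math

# Particle-in-a-box cartoon of electron confinement and absorbed color.
h = 6.626e-34    # Planck constant, J s
m_e = 9.109e-31  # electron mass, kg
c = 2.998e8      # speed of light, m/s

def absorbed_wavelength_nm(box_nm):
    """Wavelength (nm) absorbed in the n=1 -> n=2 jump for a box of given size."""
    L = box_nm * 1e-9
    # Energy levels: E_n = n^2 h^2 / (8 m L^2); the jump costs E_2 - E_1.
    delta_E = (2**2 - 1**2) * h**2 / (8 * m_e * L**2)
    return h * c / delta_E * 1e9  # convert the photon energy to a wavelength

tight = absorbed_wavelength_nm(0.5)   # small box: absorbs in the ultraviolet
spread = absorbed_wavelength_nm(0.7)  # bigger box: absorbs visible light
```

The tightly confined electron absorbs below 400 nm (ultraviolet, so the compound looks colorless), while the more spread-out electron absorbs in the visible band, so its compound is colored. Same pushing-and-pulling story, in cartoon form.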

Monday, June 22, 2009

Science Is Entertainment, Part 0: So, What Do You Want to Know?

In part because of the intellectual pleasure associated with figuring stuff out, science should be of interest to anyone and everyone.

In this spirit--what do you want to know about?

Del and I are superbly prepared to understand and communicate a multitude of issues in the fields of chemistry, biology, and physics. We can probably do applied math, too. I would guess that we're both equally abysmal with regard to ecology, but other than that, I think we've got it covered. Del is better at biology than me, and I know more chemistry. The physics we can both handle.

If you need geology, we'll have to get a geologist.

With that said, let us know--what do you want to know?

Thursday, June 18, 2009

Inferring Something You Can't See

In his book Information Theory, Inference, and Learning Algorithms, David MacKay describes an interesting problem in Bayesian statistics (that school of statistics that purports to describe, quantitatively, rational thought itself). I paraphrase this here, and comment from my comfy couch in front of the TV what this might mean for customer service at Target.

Here is the statement of MacKay's problem:
Unstable particles are emitted from a source and decay at a distance x, a real number that has an exponential probability distribution with characteristic length L. Decay events can be observed only if they occur in a window extending from x = 1 cm to x = 20 cm. N decays are observed at locations {x1,..., xN}. What is L?
This problem interests me because of the little hitch it throws: the window from x = 1 cm to x = 20 cm. This is the part of the problem that turns a trivially easy problem into a much harder one.

No panic is required or desired on the part of the reader. This is not an exercise to be graded, and one's worth thereby judged. I am a scientist, trained in fact to be able to solve this kind of problem (or, admittedly, to fully understand MacKay's solution); someday, perhaps you, gentle reader, will pay me for this kind of junk, but for now it's free. At the moment, my goal is to describe the solution to an easier problem than MacKay's, then describe the solution to the more difficult problem actually presented. The rhetorical part of the key to solving the difficult problem--incorporation of all relevant information--is something that I can assure the reader will be able to understand, criticize, recognize when applied to everyday situations, and argue about with others over agreeable beverages of choice.

To understand the problem requires understanding an exponential distribution. Here is what an exponential distribution looks like:

[figure: an exponential probability distribution, highest near the source and decaying with distance]

This graph shows the essential feature of the exponential distribution: as one moves further to the right, away from the particle source, the probability of a particle decaying decreases (this may seem counterintuitive, but it's true; for the initiated, this is formally a probability density). This means that if we were to watch some number, say 100 particles, decay, we would get data that looks like this:


[figure: a histogram of 100 simulated decay distances]

The smooth-looking exponential distribution is an idealized experiment corresponding to the rough-looking histogram above. If an infinite number of particles had decayed, then the histogram would be as smooth as the distribution. Instead, with a finite number of decays we see that the data is a bit rough. In fact, this is the reason for treating this problem statistically.

If it weren't for the window, the solution to this problem would be very easy. In such a case, a good estimate of the "characteristic length" L is the average of all the decay distances. Notice that the good estimate is the average of all decay distances, not just the ones (because of the window) that our experimental setup can observe.
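A quick simulation shows how well simple averaging works when every decay is visible. The true L of 10 cm here is made up for illustration:

```python
import random

# Without the window, estimating L is just averaging: the mean of an
# exponential distribution with characteristic length L is L itself.
random.seed(0)
L_true = 10.0
decays = [random.expovariate(1.0 / L_true) for _ in range(100_000)]
L_estimate = sum(decays) / len(decays)  # close to the true 10.0
```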

However, when we include the effects of the window, we can't see all the particles decay. Instead, we only detect a subset of them, those that decay between 1 cm and 20 cm from the source, illustrated here:

[figure: the histogram of decays truncated to the 1 cm to 20 cm window]

Because of the window, some of the particle decays that occurred are invisible to us; the histogram is truncated before 1 cm and after 20 cm. For this reason, we are left with the terrible result that we can't just average the particle decay distances we see to estimate the value for L, because we'd be neglecting particle decays that we don't see.

However, the problem can be solved cleanly using Bayesian statistics. The essential feature of this method (that is to say, the non-mathematical manifestation of same) is to incorporate all of the information available to us to estimate L. The key to doing this is to use not the distribution that governs particle decays, but the distribution that governs the particle decays we can observe. This is an exponential distribution, just as before, except that the probability of observing a decay closer than 1 cm to the source or further than 20 cm from the source is now zero. In other words, the probability distribution of the decays that we can see (as opposed to the probability distribution of the decays) looks like this:

[figure: the exponential distribution truncated to the 1 cm to 20 cm window]

As can be seen, the distribution is truncated before 1 cm and after 20 cm; we can't see particles decay in those regions, so the probability of observing those things is zero. Hence, the probability distribution is zero in those places.

So you can see that this distribution encodes the information that we're observing particle decays in a window. This is the distribution we feed to the Bayesian beast; its job is to take a distribution and spit out statistics about it, in this case the characteristic decay distance L. If one goes through the math, one finds that the best estimate of L is not the average of the observed distances (as it would have been if we could observe all particle decays), but a much more complicated formula. Still, using immense arcane powers derived mainly from computer algebra systems, this problem can be solved.
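Here's a sketch of the windowed estimate. It uses maximum likelihood with the truncated distribution rather than the full Bayesian treatment MacKay works out, but the crucial move is the same: the renormalization factor Z(L) is where the window information enters. The numbers (true L = 10 cm) are invented for illustration.

```python
import math
import random

# The visible-decay density is the exponential restricted to the window,
# renormalized by Z(L) = exp(-A/L) - exp(-B/L), the probability that a
# decay lands where we can see it at all.
A, B = 1.0, 20.0  # window edges, in cm

def log_likelihood(L, n, sum_x):
    """Log-likelihood of n windowed decays with total distance sum_x."""
    Z = math.exp(-A / L) - math.exp(-B / L)  # fraction of decays visible
    return -sum_x / L - n * (math.log(L) + math.log(Z))

def estimate_L(data):
    """Grid-search the likelihood; a real analysis would optimize properly."""
    n, sum_x = len(data), sum(data)
    grid = [0.05 * i for i in range(20, 2000)]  # candidate L from 1 to 99.95
    return max(grid, key=lambda L: log_likelihood(L, n, sum_x))

# Simulate the experiment: true L = 10 cm, but only windowed decays are seen.
random.seed(1)
seen = [x for x in (random.expovariate(0.1) for _ in range(50_000))
        if A <= x <= B]

naive = sum(seen) / len(seen)  # biased low: the window hides the long tail
corrected = estimate_L(seen)   # accounts for the window; close to 10 cm
```

The naive average of the visible decays comes out well under the true 10 cm, because the window lops off the long tail; the windowed likelihood recovers the right answer from exactly the same data.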

So, what does this have to do with customer service?

At Target, they have a computer you can use to complain about the crappy customer service. Conversely, if you had a really great time, you can use the computer to let the company know. However, the company can't use this data--people complaining of crappy service and rejoicing in excellent service--to make inferences about the experiences of the whole collection of customers--since the data thus collected is biased--or can they?

The answer, if you are thinking about the particles-decaying-in-a-window problem, is heck yes, that data can be used to infer the experiences of all customers. The essential problem is that the responses at the computer--the "data" in the customer service problem--mostly come from people with extreme views about the customer service. Thus the sample of their views tends to reflect extreme viewpoints. Even when Target offers incentives to fill out surveys, for instance, they get a sample from the kind of person willing to fill out a survey, rather than what they would really like, which is the customer service experience of all customers, not just the weird customers who fill out surveys and make complaints. They miss all the people who had an okay experience, and those who can't be bothered to fill out surveys, which I expect is most people. These kinds of responses, like particles decaying outside of our window, are invisible to us, but they are still important.

"Biased" here really just means that the company can't "average" the responses that they get in order to determine the overall quality of customer service. The average would be over samples that don't reflect all of the customers, so it would not reflect the experiences of all of the customers.

Back to the particle decay problem: it is possible, using knowledge of the window, to solve the problem, even when we can't just take a simple average. In the customer service case, the solution should be the same: to infer customer service experiences from biased samples, we just have to know what the "window" is that determines what it is we can see. Once I figure out the shape of the window, I'll let you know.

Free Range Science

Please allow me to explain.

Not "free range" in the sense of made without chemicals. I'm a chemist of sorts; my kind of science, and that of my friends, often is quite chemical. It is often, however, organic.

And not "free to roam," either. This science has fences; and I try to take pains to put it in the right pastures at the right time. No sense grazing where there's no grass. The pastures are small, but as each is grazed down, I'll move the science, so over time it visits all of them. And then visits all of them again, eventually, after each has had time to recover to lushness.

But I think it's going to make some swell statistical mechanics steaks, biophysics burgers, and probably the most thermo-dynamic Rocky Mountain oysters you've ever tasted.