Friday, April 30, 2010

Here is a picture of an interesting molecule, methyl-imidodiphosphate. It's related to a molecule we use in the Webb lab at The University of Texas at Austin, GDPNP, and I'm using it to get molecular mechanics parameters so we can run calculations on the proteins we study, which have GDPNP bound to them. GDPNP is related to a biologically important molecule called GTP which is involved in almost every signalling process in every living cell.

Notice the two phosphorus atoms (bronze) bound to the nitrogen atom (blue)? The hydrogen (white) seems to lie almost in the same plane as the phosphoruses and the nitrogen. Naive molecular mechanics force fields (and chemical bond theories like VSEPR) can get this geometry wrong.

Thursday, August 6, 2009

Another Stupid Physics Thing

One of my sophomore physics instructors was kind of a goofball, and he told a snarky anecdote which I will now paraphrase.

First, the physics. There's this thing called gravitational potential energy--and when an object can decrease its gravitational potential energy, it will. This is almost an overly complicated way of saying "things fall towards the earth," but in fact it is a quantitative rule one can use to calculate not only that an object will fall, but how hard it will splat when it hits. This is accomplished by writing the gravitational potential energy as a mathematical expression:
potential energy = mass × gravitational acceleration × height
or for the symbolophiles among us,
U = mgh
As an object falls, its gravitational potential energy gets converted into kinetic energy (motion), or we say, "The gravitational field does work on the object, to increase its speed in a downward direction." Potential energy stores work, so when that initial mgh of potential energy decreases by a certain amount, the falling object's kinetic energy gets that certain amount back because of the work done. As the kinetic energy increases in this way, so does the falling object's speed. In fact, the total energy--gravitational potential energy plus kinetic energy--is a constant throughout (until the object hits the ground and, with any luck, goes splat).
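
To see this bookkeeping in numbers, here is a minimal sketch in Python (the 1 kg mass and 10 m drop height are made-up values, and g is rounded to 9.8 m/s^2): at every instant the potential and kinetic energies trade off, but their sum stays put.

```python
# A minimal sketch (made-up numbers): drop a 1 kg object from 10 m and
# check that potential + kinetic energy stays constant as it falls.
m = 1.0      # mass in kg (hypothetical)
g = 9.8      # gravitational acceleration in m/s^2
h0 = 10.0    # starting height in m (hypothetical)

for t in [0.0, 0.5, 1.0, 1.4]:       # a few times before it hits the ground
    v = g * t                        # speed after falling for time t
    h = h0 - 0.5 * g * t**2          # height at time t
    U = m * g * h                    # potential energy, U = mgh
    K = 0.5 * m * v**2               # kinetic energy
    print(f"t={t:3.1f} s  U={U:6.2f} J  K={K:6.2f} J  U+K={U+K:6.2f} J")
```

The last column never changes: whatever potential energy the object loses shows up as kinetic energy, until the splat.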

This is sort of why we say "energy is conserved"--energy is neither created nor destroyed for a falling object. In fact, a force field like that of the earth's gravitational field is called a conservative force field.

Similarly, if we increase the potential energy of an object during some process, then we have to have "done work" some time in the process. We have to supply energy by lifting with our muscles or burning fuel to run an engine (which is actually what our muscles are anyway) or make some other object decrease in potential energy (which is really what the engine is doing, too) in order to increase the object's potential energy.

Another property of a conservative force field, related to the total (kinetic + potential) energy being a constant, can be seen from the expression U=mgh for the gravitational field. This formula makes reference to an object of mass m at a height h above some reference point (such as the ground). It makes no reference to how the object of mass m got to height h. Those details don't matter: all that matters is the height h.

So the physics slapstick goes like this. A weightlifter is bench-pressing a barbell. After his workout, he puts the barbell back onto its rack, where he found it. At this point, a fruity physicist walks by and remarks, unkindly, "Since the barbell is at its original height h, its potential energy is mgh, exactly the same as before you started your workout, when its energy was also mgh. Therefore the energy change of the barbell is zero. This means that you did no work on the barbell! If you had done work on the barbell, its height would have changed during the process (your workout)."

Now, I don't know that "you did no work on the barbell" is exactly what my instructor said. If he had it would be totally wrong, so wrong, in fact, as to smell.

It is true that the potential energy of the barbell is the same before and after the workout. But certainly the weightlifter feels like he has done work! This is because the weightlifter has done work. He has expended chemical energy pushing on the barbell in order to make it go up and down at a controlled rate. In order to push the barbell up, let's say he has expended an amount of energy Δe.

Energy, however, cannot be created or destroyed, so that energy Δe must have gone somewhere. It can't have ended up in the barbell--remember, its energy is the same before and after the workout, since its height is the same before and after and all of its energy is potential energy, which is proportional to the height.

Where did the energy Δe go, then? We can figure this out by breaking down the process of pushing a barbell up from its original height h then letting it fall back down to the same height h.

1. As the barbell goes up, there are two forces acting on it: the force of gravity, which pulls it down, and the force from the weightlifter, which pushes it up. The weightlifter expends energy, and, since the barbell increases in height, this energy goes into increasing the barbell's potential energy.

2. As the barbell goes down, the same two forces are acting. Gravity is pulling down, and, since the weightlifter is probably not letting the barbell fall freely but is controlling its speed, he is still pushing up (just not as hard as gravity is pulling down). In any case, gravity is now doing work on the barbell to pull it back down to its original height.

So two things happened during this stroke (and as it is repeated). The weightlifter did work on the barbell. Then, gravity did work on the barbell. And because the barbell starts and ends at the same height h, its energy is the same before and after, so the work that the weightlifter did and the work that gravity did have exactly the same magnitude and opposite sign.

That is to say, all of the following statements are correct:
During the workout, the weightlifter did work Δe.
During the workout, gravity did work -Δe.
The net work done on the barbell is the work done by the weightlifter plus the work done by gravity.
*The net work done on the barbell is zero, as can be seen since its energy is the same before and after the workout.*
But the statement marked with the asterisks (*) is not the same as the following completely untrue statement:
The work done on the barbell is zero.
It's like so: if the taco truck guy gives me $5, and then I give him $5, the net change in my wealth is $0, but that doesn't mean that no money was exchanged. Don't trust physicists who are trying to snark at weightlifters!
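
To put hypothetical numbers on a single press (a 100 kg bar pushed up half a meter, starting and ending at rest; all values made up for illustration), a quick sketch:

```python
# A toy version of the accounting above, with made-up numbers: one press of a
# 100 kg barbell through 0.5 m, starting and ending at rest.
m, g, h = 100.0, 9.8, 0.5        # kg, m/s^2, m (hypothetical values)

work_by_lifter  = m * g * h      # the lifter's Δe: +490 J pushed into the bar
work_by_gravity = -m * g * h     # gravity's work during the same lift: -490 J

net_work = work_by_lifter + work_by_gravity
print(work_by_lifter, work_by_gravity, net_work)
# 490.0 -490.0 0.0 -> zero *net* work, even though 490 J certainly changed hands
```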

Update: this morning, I got a call from a telemarketer offering me a free one-month trial of life insurance (so I get to try it out!). Except that it wasn't free, per se. I would be charged about $60, then if I decided the life insurance didn't fit or was the wrong color, I could return it for a full refund. They'd give me my $60 back. Except that this isn't a free offer. My net change in dollars would be zero, but that doesn't mean there was no exchange of money, just like zero net work doesn't mean that nothing happened ...

Monday, August 3, 2009

A Bad Homework Problem about Entropy

Some disciplines in physics have been around for so long that it seems like all of the possible homework and exam problems have been tried out. There is a theory that these problems are stored away in the libraries of fraternities and sororities, ripe for the study of any student.

When this theory is brought up, it's always taken for granted that learning how to do the work on the exam is necessarily a bad thing. I'll just say that this isn't the only possible reaction to learning that students know how to do the assigned problems.

Thermodynamics is one of those disciplines. All of thermodynamics has basically been figured out since the 19th century. Thermodynamics is extremely important to understand in many fields, especially chemistry and biochemistry, so students still have to take this old, dusty subject. So the exam problems get repeated, and books probably copy each other. On occasion, however, one comes across something different. The "something different" is usually stupid. For instance,
[The instructor's] desk typically gets to the point where there are a lot of papers randomly distributed through out (sic) it. Every month or so I clean and organize the papers. ... treat the papers on my desk as a [two-dimensional] ideal gas. Each paper has an area a and the whole desk has an area A. We'll say that the papers are ideal particles ... this ideal gas [is] at constant temperature and thus assume [that the energy is constant] ... If I can clean my desk in a reversible manner, how much work will it take?
To someone not in a thermodynamics course (or someone who hasn't taken one in the past), this rightly seems like gibberish. To parse it, let's restate it with the trickier parts in bold:
[The instructor's] desk typically gets to the point where there are a lot of papers randomly distributed through out (sic) it. Every month or so I clean and organize the papers. ... treat the papers on my desk as a *[two-dimensional] ideal gas*. Each paper has an area a and the whole desk has an area A. We'll say that the papers are *ideal particles* ... this ideal gas [is] at *constant temperature* and thus assume [that *the energy is constant*] ... If I can clean my desk in a *reversible manner*, how much *work* will it take?
The first concept to understand is an ideal gas. This is a hypothetical material that is a very good model for real gases (like air) at moderate temperatures and pressures (room temperature and atmospheric pressure are not bad). The way the problem is stated, this ideal gas is composed of particles (the paper) that are like the atoms and molecules in a real gas, in that they don't interact much--this is what is meant by ideal particles. (I say "don't interact much" rather than the more common "don't interact" because particles in an ideal gas can crash into one another, like billiard balls, and still have the properties of an ideal gas.)

But this ideal gas is a little funny, since it exists in two dimensions instead of the usual three. One can imagine a model for such a gas, for instance the gas trapped between two panes of glass separated by a tiny gap. Such a gas might behave like a two-dimensional ideal gas. However, it's not important that this system even be realizable: it's a model, and using the equations taught in the course the students should be able to compute the properties of such a system, even if it is weird.

What do the areas have to do with this problem? Well, the trick (in the course) to solving this problem was to recognize this system as a lattice model and to write down its entropy. This is a little equation, due to Boltzmann (who killed himself, probably because he was tired of stupid homework problems):
S = k ln W
which is an equation that relates a microscopic property--in particular, the combinatoric factor W--to a macroscopic property, the entropy S. This is a little like calculating the macroscopic energy of a system by adding up the microscopic energies of all the constituent particles, but a little trickier, since it involves weirder stuff, like a logarithm and a number k (Boltzmann's constant), which is really just a conversion factor that guarantees that we measure temperature on a certain scale, like Fahrenheit (although no one does that), and energy on a certain scale, like kilowatt-hours (although no one does that, either, at least not in science classes. Ask an engineer). W is often called "the number of ways of arranging the system."

To understand this, imagine the lattice model as a muffin tin with 12 holes for muffins. (The analogy will work for cupcakes, if one prefers.) There are 12 different ways to "arrange the system" if there is one muffin. If there are 2 muffins, the problem is a little trickier, so we make it a little easier by saying the muffins are "ideal particles" and can occupy the same hole in the muffin tin (since ideal particles don't interact). Working through the math, there are 12-to-the-2nd power, or 144, different ways of arranging 2 muffins in a 12-hole muffin tin. This is W for muffins.

Generalizing this, if there are m holes in the muffin tin, and n muffins, there are m-raised-to-the-n-power different arrangements.
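
Here is a minimal sketch of that counting, together with Boltzmann's formula, using the muffin-tin numbers from the example above:

```python
import math

# Muffin-tin counting: n "ideal" muffins in a tin with m holes, where muffins
# may share a hole, gives W = m**n arrangements; Boltzmann's formula then
# gives the entropy S = k ln W.
k_B = 1.380649e-23   # Boltzmann's constant in J/K

def entropy(m_holes, n_muffins):
    W = m_holes ** n_muffins        # number of ways of arranging the system
    return W, k_B * math.log(W)     # (W, S = k ln W)

print(entropy(12, 1))   # (12, ...)  one muffin: 12 arrangements
print(entropy(12, 2))   # (144, ...) two ideal muffins: 12**2 = 144
```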

Don't worry if the math gets too nutty here. The important part is to know that, given a certain interpretation of the problem, it's solvable. (But since I'm a sicko I'm going to outline the math anyway.)

For the papers on a desk, the number of "holes" is A/a since one can fit exactly this many sheets of paper, each of area a, on a desk with area A. There are N papers, so for the papers on a desk,
W = (A/a)-to-the-N-power
so the entropy can be calculated:
S = N k ln (A/a)
But why calculate the entropy? Well, for this system, the work turns out to be the temperature T times the change in entropy (up to a sign, which we'll sort out below) as we take all the randomly distributed papers and put them in a stack in a specific spot. Stacking the papers up and putting them in one specific spot means that, in the final, clean state, there is only 1 arrangement, so the final entropy is zero (since the logarithm of 1 is zero). So the change in entropy as we take all the papers and stack them in one, specific spot is just
change in S = 0 - N k ln (A/a) = - N k ln (A/a) (it's a negative number)
and as I said before, the work required is just the temperature times this number, with the sign flipped (it doesn't look like a number, but if one plugs in values for things like the area, it is!).

But what do we mean by work? This is what the question asks for, and the key (given in the statement of the problem) is that the energy is constant. Energy cannot be created or destroyed (it's "conserved"), so if there were an energy change it would have to be because the system had heat removed or work done on it; those are the only two ways that energy can change. In our case, the energy change is zero, so whatever heat is taken out of the two-dimensional gas must be equal to the negative of the work. (That way the heat and work add up to zero, and the energy change is zero.)

This is not the first place the problem starts to get silly. There is no heat added to the papers. Their temperature does not change. Someone just made up a stupid homework problem as if there were heat being removed from the system. If this were a two-dimensional ideal gas, and not stacks of papers, we would have to remove heat from the system in order to get all the particles to stick together. In fact, this is just like what happens when a gas gets cooled enough: it starts to condense into a liquid (usually). In our case, we're pretending to suck heat out of the randomly arranged papers. If we can calculate this pretend heat, we can calculate the work that one would have to do to clean the desk, at least if the heat is removed reversibly, which for our purposes just means that the cleaning is done as carefully as possible, without wasting any energy or motion.

The heat sucked out turns out to be the entropy change times the temperature. Therefore the reversible (no wasting) work would be the negative of this, which is
work = negative of heat = - T x (change in S) = N k T ln (A/a)
The important part to recognize here is that the work done depends on the number of papers, N, which makes sense since one should have to do more work to clean up more papers, but other than that this answer is nonsense for cleaning up a desk (although it's correct for taking all the particles in a two-dimensional gas and squeezing them into one specific spot, as long as the squeezing is done reversibly).
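
To see how this plays out numerically, here is a sketch with made-up numbers (a 2 m^2 desk, letter-ish sheets of about 0.06 m^2, 50 papers, room temperature):

```python
import math

# The "thermodynamics" answer worked out with hypothetical numbers: a desk of
# area A holding N sheets of paper of area a, treated as a 2-D ideal gas at
# temperature T.
k_B = 1.380649e-23   # Boltzmann's constant, J/K
A = 2.0              # desk area in m^2 (made up)
a = 0.06             # area of one sheet in m^2 (roughly letter paper)
N = 50               # number of papers (made up)
T = 298.0            # temperature in K (room temperature)

delta_S = -N * k_B * math.log(A / a)   # entropy change on stacking (negative)
work = -T * delta_S                    # "reversible work" = N k T ln(A/a)
print(delta_S, work)
# delta_S is about -2.4e-21 J/K and the work about 7e-19 J -- an absurdly
# tiny amount of work for any household chore, which is a hint that something
# is off with this way of framing the problem.
```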

For one thing, it depends on the temperature, which is silly. The papers do have a temperature in the everyday sense; one could measure it with, among other things, an infrared thermometer. But it seems wrong that it would take more work to clean the papers at 50 °F than at 40 °F, and that's because it is wrong. Temperature is the average kinetic energy of particles, in this case the particles that constitute the papers, not the papers themselves! Papers that are just sitting on a desk, in order or not, don't have kinetic energy and so they don't have a temperature of their own: rather, their constituent particles have kinetic energy and so have a temperature.

What this does is just confuse people. Some students will get it. But there is no reason for instructors to confuse people on purpose. Physics is already hard enough without people running around making it more confusing. The above discussion about "temperature" of the particles in a piece of paper and "temperature of the papers" should seem confusing to the reader. That's because it is. The papers don't have a temperature because they're just sitting there on a desk. The particles that make up a paper, on the other hand, do have a temperature (as can be verified with a thermometer) since, on a microscopic scale, they're jostling around a little bit (but not a lot--if they jostled around too much the paper would fly apart or burn. In fact, this is one of the reasons why paper burns when its temperature is raised. The other reason is the presence of oxygen).

And for another thing, the approach to calculating the work is just plain wrong. An everyday reading of the problem would be something like, "How much do I have to exert myself to clean up some papers?" This could be measured in terms of the number of food calories burned (how much food energy was used) by the person who cleaned up the papers. Rather easier would be the question of the minimum amount of work needed to pick up N papers. This means fighting against gravity since, in principle, the papers have mass, and they need to be picked up and moved.

To do this much better calculation, we break down the steps to pick up one piece of paper. First, we have to pick up the piece of paper off of the table. To pick it up, we have to exert a force to overcome gravity and make the paper move up. It turns out that the work required to do this is just the mass of the paper, times how far we lifted up the paper, times the gravitational acceleration (which is about 10 meters per second per second, but the important thing is that it's a number that we can look up). For instance, if the paper had mass 1 gram, and we picked it up and moved it 1 centimeter above the desk, then we would have to do
1 g times 1 cm times 10 meters per second per second = 0.0001 Joules
A Joule is just a unit of energy; it's the same as about one-four-thousandth (1/4000) of a food calorie. A grape has about 4 food calories, which is roughly 16,000 Joules, so using the energy of one grape, one could (in principle) pick well over a hundred million one-gram papers 1 cm off of the table.

The next step to cleaning up the papers, after picking them up, is moving them over to the stack. This actually takes very little work, because the only force we have to fight against is gravity, and gravity works downward, not sideways. That is, there is no force that resists motion of a paper sideways other than its own inertia, which just means that we have to give it a little push to get it to move. Then it moves to our stack, then we have to give it another little push to make it stop moving. This will depend on the mass, too, and the distance the paper moves. The total amount of work will depend on how hard we push, and if we push just a tiny bit, then we can pretend that the work to move the paper sideways is negligible (zero).

The last step, after moving the paper, is dropping it onto the stack. This costs the cleaner no energy, since gravity takes care of moving the paper down.

The work to clean up N papers, then, depends on just a few things, among them the mass of each paper, and the height to which we pick up each paper off of the table. If the papers all have the same mass, m, and we pick them all up the same distance, h, from the table, then the work really done is
work = N m g h
which isn't that great of an answer, either, but at least it makes sense! Notice there's no temperature or k or area of the table A in this answer. Why should those values matter to the amount of work done? The fact is that they don't! But it makes intuitive sense that if the papers are heavier (m is bigger), then it would take more work. The "thermodynamics" answer doesn't depend on how massive the papers are, which doesn't make sense.
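
With the same made-up numbers as before (50 sheets, now each given a mass of about 5 g and lifted 1 cm), the sensible answer looks like this:

```python
# The mechanical answer: N sheets of mass m, each lifted a height h off the
# desk, needs work N*m*g*h. All values are hypothetical.
N = 50       # number of papers
m = 0.005    # mass of one sheet in kg (about 5 g)
g = 9.8      # gravitational acceleration, m/s^2
h = 0.01     # lift each sheet 1 cm

work = N * m * g * h
print(work)  # about 0.025 J -- it depends on m, g, and h, as it should,
             # and not at all on temperature or the area of the desk
```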

Furthermore, the real answer depends on how high we pick up the papers, h. It makes sense that if we pick up each paper 1 mile above the desk that the work will be more. It's much more efficient to pick them up a little bit (like .5 or 1 cm) than a lot. Again, the fake thermodynamics answer doesn't take this action into consideration. (Unless you pretend that we can slide the papers, h = 0, in which case the fake-thermo answer is still horribly wrong.)

What's more, if we went to a planet with stronger gravity, it should take more work to clean a desk, too. Stronger gravity means g is higher, and you can see that if g is higher the real answer predicts that more work will be required to clean.

The point of all this is that answers should make sense. The two-dimensional ideal gas answer for cleaning desks doesn't make sense, since it does not depend on obvious things like the mass of the papers and gravity, which are intuitively what should matter when we're doing something like cleaning. But things like collections of papers sitting on a desk don't have a "temperature" since they're not moving around. So it makes no sense, intuitively, for the answer to have anything to do with temperature.

Maybe this is why Boltzmann killed himself--he was being driven crazy by stupid homework problems in thermodynamics! Or maybe people weren't carefully checking their answers with their intuition.

Friday, July 24, 2009

The Federal Government Should Not Fund Research

The growth rate of the Chinese economy is astounding: something like 9 % per year. This compares to a growth of more like 2 % in industrialized Western countries. What, if anything, are we doing wrong?

Terence Kealey's The Economic Laws of Scientific Research analyzes how government involvement affects scientific research. This topic is of immense importance: science in the United States is in great part done by academic scientists, who must take time away from research, and from advising the junior scientists in their care, to write grant proposals. These proposals are usually submitted to the National Institutes of Health or the National Science Foundation, but sometimes to the Department of Energy, the Department of Defense, or other agencies. The proposals are judged by a panel of scientists (taking time out of their research and advising) to be worthy or not of funding; some of the worthy proposals are funded. The funded scientist can then carry out further research by purchasing equipment, recruiting graduate students, and hiring research scientists.

Kealey claims that this model is the philosophical descendant of Baconism, where the flow of government money fuels basic research, which then fuels applied science, which then generates technological innovation that makes everyone wealthier and our lives better. He points out that the Baconian idea is not the only possible model; in fact, another possibility is that the free market can drive scientific research. In this model, companies and individuals see possible profit in applied science; for instance, if some innovation can make the company more efficient, then the company can make more money. At times, learning the applied science requires funding basic research, so capitalists will fund that as well.

As a good scientist should, Kealey proposes these two models, in the form of two hypotheses: which system is better at producing innovation, and thus creating wealth? He then examines the available evidence to decide between the two hypotheses. There are several issues implicit in this test, however.
  1. Usually, in science, when we want to test between two alternatives, we need to do an experiment. In the case of deciding between government-funded and industry-funded science, the experiment would be setting up two countries which are the same in all ways, except the way that science is funded. Since this is impossible, we are left with historical evidence only: the data available is the progress of science in the past couple of centuries. This might seem like a drawback, but geologists do this kind of thing all the time! (In a few days, I'll be posting a description of how evidence--derived from experiments or from historical sources--is rationally used to test hypotheses.)
  2. The evidence available is usually only economic. The easiest things to look at are correlations between how much government funds science (in total dollars or as a fraction of total funding), and things like wealth and economic growth (for example, gross domestic product per capita). We really want to know which system makes people's lives better, but "better" is subjective. I'll get back to this point further along.
  3. "Government funding" and "industrial funding" may not be the only possibilities. For instance, wealthy people might give money to scientists for the fun of it, or because they might find the results interesting. For instance, Mary Herbert (nee Sidney) did this kind of thing. But since government and industry are the biggest, it's a good approximation that there are only two possibilities.
However, in examining historical economic evidence, Kealey finds several trends that he describes as the economic laws of scientific research:
  1. "[T]he percentage of national GDP spend on science increases with national GDP per capita" (italics original) regardless of the source of funding, government or industry.
  2. "[P]ublic and private funds displace each other." This means that if the government increases money spent on research, then industry will decrease its spending.
  3. "[P]ublic and private displacements are not equal: public funds displace more than they do themselves provide."
Number 3 is the kicker. Kealey has found evidence that the more government spends on science, the disproportionately less that industry spends on it. Taken to its logical conclusion, if government spent nothing on science, then industry would spend the maximum amount it ever would on research--after all, it's how they're going to make more money! And it would mean more money for scientists, too. Furthermore, since individuals and industry are taxed so that the government can fund science, taxes would be lower, meaning more immediate wealth, and, for industry, the ability to hire more people.

There is actually an analogy to welfare that makes me extremely angry. Some people (possibly my hero, Walter E. Williams) believe that a rule like Kealey's #3 happens with welfare. This would read: the more that government spends on welfare, the disproportionately less that individuals spend on it. You can believe that supporting government-sponsored social programs helps poor people, but I doubt it would help as much as helping them your own damn self.

Now, Kealey has assumed that wealth is a measure of scientific success, but there may be other reasons beyond wealth for doing science that justify government intervention. Like, science is cool, it can be fun, it can teach kids to think critically in school. It can generate culture by allowing cultural achievements (perhaps, maybe, possibly NASA is a good example, but doesn't it really seem more like comparing penis size with the Soviets?). The government may have better control over things like environmental policy. However, as Kealey argues, economic arguments cannot support government funding of science. Libertarian philosophy would go further, saying that government funding should not be involved in culture or environmental policy, and what's more actually diminishes the ability of individuals (who are really what make up culture) to contribute to these kinds of achievements.

Oh, and China? Kealey figured that one out, too, although his book was published (1996) before the Chinese economy achieved those huge growth rates. The explanation is the same one that accounts for the "slow" growth of first the British economy, then the US economy, starting in the 19th century. These were the economic and scientific powerhouses. Their scientists were doing basic (industry-sponsored) research. When you have to invent brand new things, of course your economy will grow slowly, compared to those that can copy your old innovations! This explains the fast growth of the French, Swedish, and Japanese economies in the middle of the 20th century, until they saturated at the same old "slow" growth rate of 2 %, the same as the British and American economies. Kealey: "the countries that emerge into capitalism late enjoy higher rates of growth than do the pioneers [Britain and the United States]," until they saturate at the natural growth rates of innovative countries.

So it's not that Western countries are doing something wrong, it's that China is doing something right: copying, and adopting more free market approaches to technology. If they continue this trend, and peel away the central planning, then the Chinese will benefit, as will the rest of the world. Go China!

Sunday, July 19, 2009

What is Temperature? The Zeroth Law of Thermodynamics

Physical chemistry can be complicated and hard to understand. Fully grasping canonical partition functions, Gibbs samplers, or quantum mechanical anharmonic oscillators, can take years. These are very detailed, interesting subjects, but their direct application to the understanding of all, my ultimate goal for Free Range Science, can unfortunately be extremely difficult.

However, reaching back into the 19th century, chemists and physicists developed a set of theories for understanding some aspects of the everyday: thermodynamics. From a 21st-century perspective, the early thermodynamicists seemed obsessed with steam engines--understandably, because most scientific progress reflects the desire to improve the technological innovations of the day. But what they learned about engines, which are devices that turn heat flow into work, has consequences for a myriad of seemingly different topics. Today, we use thermodynamics less to understand engines, and more to understand modern issues such as how tightly a drug molecule binds to a protein molecule, how much light energy is required to drive a photocatalyst, and many other questions. (Although these days the discussion focuses more upon the microscopic manifestation of thermodynamics, called statistical mechanics.)

Thermodynamics deals with everyday objects as well as extraordinary ones, and it can answer a question that has to do with almost everything there is: what is temperature?

Everyone is familiar with the use of a thermometer to test the temperature of an object. Simply, the thermometer is placed in suitable contact with the object, then we wait a while, and after some time the thermometer registers what we call the "temperature" of the object.

In order to understand what measuring a temperature really means, we will make an analogy. At a very deep level, the analogy is exact, since the way we understand, mathematically, what happens when we try to measure temperature with a thermometer is exactly the same way we understand the process I'm about to introduce: the stretching of rubber bands.

Imagine two rubber bands, each hooked over its own nail. Both nails are pounded partway into a board. Then we stretch the rubber bands until they meet, and we tie them together or to some connector.

We should be careful to state some conditions that we think might matter in the situation at hand, since if they don't hold the analogy fails. Let's assume that the rubber bands are strong enough so they won't break, we've attached the ends together so that they won't slip apart, and that the nails are firmly implanted in the board. If there is breakage or slippage then the way we understand this situation will be much less helpful to our ultimate goal, to make an analogy to temperature and heat flow.

The next important concept is that of equilibrium, the state in which nothing changes. In the rubber band situation, this is called mechanical equilibrium, where all of the forces balance out. Because they're stretched out and tied to one another, the rubber bands exert forces on one another. In the situation where the knot is not moving, the system is at mechanical equilibrium; each rubber band pulls on the other one, but the force each rubber band exerts on the other is the same.

If the forces didn't balance out, such as the situation where we pull the knot towards one nail or the other and let it go, then there would be motion. If we let the motion of the system evolve, then eventually it would find the state of mechanical equilibrium. While not in equilibrium, one rubber band is pulling harder than the other one. This leads to motion in the form of the overstretched band shrinking some, whereas the understretched rubber band extends a little. Since the distance between the nails is fixed, this can be described as the rubber bands exchanging lengths with one another: if the length of one rubber band decreases by 1 cm, then the length of the other increases by 1 cm.

It should be obvious that the exchange of length doesn't necessarily occur until both rubber bands are the same length; what if one were just shorter than the other? On the contrary, the exchange of length occurs until the forces balance out--until mechanical equilibrium is reached.
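
One can see this with a toy calculation. The sketch below idealizes each rubber band as a linear spring (an assumption the argument above doesn't actually need; the stiffnesses and lengths are made up): the knot settles where the forces balance, not where the lengths are equal.

```python
# Two "rubber bands" (idealized as linear springs) tied at a knot between two
# nails a distance D apart. Band 2 is stiffer and naturally shorter.
k1, L1 = 100.0, 0.10   # stiffness (N/m) and relaxed length (m) of band 1 (made up)
k2, L2 = 300.0, 0.05   # stiffness and relaxed length of band 2 (made up)
D = 0.50               # distance between the nails, in m

# Mechanical equilibrium: k1*(x - L1) = k2*((D - x) - L2), solved for the
# knot position x measured from nail 1.
x = (k1 * L1 + k2 * (D - L2)) / (k1 + k2)
print(x, D - x)                            # the two stretched lengths are NOT equal...
print(k1 * (x - L1), k2 * (D - x - L2))    # ...but the two pulling forces are
```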

A similar process occurs when we bring a thermometer into contact with an object--a little kid, the outside air, or a pot of boiling sugar--in order to say something about its temperature.

Temperature is like the forces in the rubber band situation: something will be exchanged between two objects of different temperatures just as length will be exchanged between two rubber bands. This "something" will be exchanged until the temperatures of the two objects are the same, just as in the rubber band case length is exchanged until the forces are equal. Once the exchange is complete, for temperatures, the objects will have reached thermal equilibrium. The thing exchanged is often called "entropy," but it is useful to think of probability being exchanged instead. Probability and entropy are deeply related, but probability is much simpler to understand, physical scientist or not.

The probabilities in question are those for the different possible arrangements of particles and energy in the two objects. There are huge numbers of states for particles to arrange themselves in. A good analogy is a bag of rice; there are thousands of rice grains, and each one can be oriented in any direction. The rice grains also "interact" with each other; since they're solid, they can't overlap, which limits the number of orientations of any rice grain, much like microscopic particles in a very dense system. Opening a bag of rice reveals one of the possible states that the system can be in. Not all arrangements of rice grains are equally probable: for instance, a jumbled-up arrangement is much more likely than an arrangement where all of the rice grains are pointing in the same direction.

The analogy between the flow of probability and the exchange of length in the rubber band situation is even deeper. Like the total length of the rubber bands, which is a constant, the probability has a fixed total value for two objects brought into contact--it has to add up to 1, or 100 %. Each of the possible arrangements of particles and energy in each system has some probability; the probability that the particles and energy are arranged in some way is 1. How the probability is partitioned among all of the immense number of arrangements--how some states are more probable than others--depends upon the details of how the particles interact, which can get very complicated, and upon the temperature.

When two objects at two different temperatures are not in contact, they tend to spontaneously arrange the matter and energy that constitute them into the most probable ways. This spontaneous arrangement is summarized by the Second Law of Thermodynamics, which says that entropy cannot decrease in an isolated object, but it really just means, quite trivially, that one is more likely to find matter and energy arranged in probable ways than in improbable ways.

However, when two objects of different temperatures, themselves having attained their most probable arrangements, are brought into contact, the two objects considered together are no longer in a probable arrangement, even though they were in their most probable states before being brought into contact. There is more energy per particle in the hotter object than in the colder object, so the particles of the hotter object will tend to transfer their energy to the particles of the colder object. (This happens because the objects are in contact, so it happens where the objects are in contact. Surface area matters--the energy transfer will be faster if the contact surface is larger.) As the two objects exchange energy, the arrangement spontaneously evolves until it's in a more probable state. In consequence, the originally hotter object assumes a less probable state, from its point of view, in order that the originally cooler object can assume, from its point of view, a more probable state. This is all done in such a way that, considering the probabilities of different arrangements of both objects together, the ultimate arrangement--both objects the same temperature--is more probable than the initial state with different temperatures.

So temperature is the force that causes probability to flow, until thermal equilibrium is reached, when the temperature-forces balance. But this really only explains thermal equilibrium, rather than temperature. To finally understand temperature, however, we consider what is called the Zeroth Law of thermodynamics, which states that if object A is in thermal equilibrium with object B, and object C is in thermal equilibrium with B, then A and C are also in thermal equilibrium with each other. This might sound trivially silly, almost as silly as calling something "zeroth," but the reason for the law is logical consistency; thermodynamicists realized that the mathematical structure of thermodynamic theory, already well-developed, required an assumption that is logically prior to the other three laws. This is also why it's called the "zeroth" law: the first, second, and third laws had already been invented.

(We have now seen the zeroth, first, and second laws of thermodynamics. The third law is not very interesting.)

We call an object like object B, above, a thermometer. We can look at the behavior of the thermometer as it interacts with objects of different temperature, and use our results to define a numerical scale. Put into contact with A, until thermal equilibrium is reached, we obtain a numerical value of the temperature of A. If we repeat with object C and we find that the numerical value of the temperature is the same, then we say A and C are the same temperature (obvious, right?), without actually having to put them into contact to see if any heat flows. This is effectively a prediction that if we put A and C into contact, that no heat would be exchanged--they would already be in their most probable states, without having to exchange any probability.

Wednesday, July 15, 2009

Accepted paper: Bayesian single-exponential rates

Happy news for me today: a (very, very long) paper of mine was accepted for publication today in the Journal of Physical Chemistry B. The impact on science is small, as important as I think the topic is, but the impact on my career could be huge.

The paper describes a method for estimating the rate of some process essentially by counting the number of times the process occurs. The analogy I like to use is of a road: one plants oneself by the side of the road (lawn chair and cooler are mandatory, just as for those computing rates in molecular simulation) and counts the number of cars that pass in some time period. One estimate of the rate is to take the number of cars that pass and divide by the time period. For instance, if the number of cars was 120 and the time period was 12 hours (this is not a busy road), then one might say the rate is 10 cars per hour.

This could be the same with molecular simulation. My real job is to direct computers to run simulations of protein folding. We make models of the proteins in unfolded states, then let them evolve according to Newton's laws of motion; with the luck of statistical mechanics, some of them reach folded states. Say that I observed 10 folding events in 100 microseconds; then an estimate of the rate would be 1 folding event every 10 microseconds, same as with the cars.

Interestingly, the division method (in this case also known as a maximum likelihood estimate) is not necessarily the best method for estimating a rate. For one thing, this and similar methods provide only point estimates of the rate and do not reflect our uncertainty as to how good the estimate is. To illustrate, imagine observing a road for 1 second and seeing no cars pass; clearly a rate of 0 cars per second (minute, hour) is not a good estimate of the rate. We would prefer a way to know how good our point estimate is.

For this purpose, we can compute a probability distribution of the rate, which describes our beliefs about the rate. That is, we assign a probability to each possible value of the rate. These probabilities in turn describe how surprised we would be, after making an observation, if the true rate turned out to be any particular number. If we had made lots of observations (many cars in some long time period), then the probability distribution would be very sharp: we would and should be very surprised to find that the true value is far from the maximum likelihood estimate (number of observations divided by time period) if we have lots of data.

In contrast, if we have a little bit of data--no cars in 1 second--then our probability ends up being not sharp but broad. With silly data like this, we would not be surprised to find that the true rate is anything. This is because the maximum likelihood estimate should not be taken seriously given such a small amount of data.

The manner in which probability distributions of the rate are built is Bayesian inference, that method of statistics that allows things with "true" values like protein folding rates to take on probabilities which reflect our belief that the true value is within some range. As I show in the paper, these methods quite naturally show what is intuitive above: that lots of data gives sharp, reliable estimates and that a tiny bit of data gives poor estimates. Intuition can be made systematic.
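
For the car-counting example, here is a sketch of what such a probability distribution looks like, using the standard conjugate result for a Poisson counting process with a flat prior (one common way to do this kind of calculation; I'm not claiming it is exactly the machinery in the paper): observing n events in time T gives a Gamma(n + 1, rate = T) distribution for the rate.

```python
from scipy.stats import gamma

# Posterior distribution for a rate, given n events counted in a time T,
# under a flat prior for a Poisson counting process.
def rate_posterior(n_events, time):
    return gamma(a=n_events + 1, scale=1.0 / time)

lots   = rate_posterior(120, 12.0)         # 120 cars in 12 hours
little = rate_posterior(0, 1.0 / 3600.0)   # no cars in 1 second (in hours)

print(lots.mean(), lots.interval(0.95))      # sharply peaked near 10 cars/hour
print(little.mean(), little.interval(0.95))  # enormously broad: we know almost nothing
```

With lots of data the distribution is narrow; with one second of data it is so broad as to be nearly useless, which is exactly the intuition described above.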

Most fun, I can use the methods in the paper to calculate my future performance. I have three papers this year (so far, anyway). If everything stays the same, the probability that I publish between 2 and 5 papers next year is about 59 %. (I call this state "quantitative professional scientific happiness," or QPSH, come up with your own pronunciation) What's more, the probability that I publish less than 2 papers next year is only 14 %. The probability that I publish more than 5 papers is a little more than 26 %.

So you should be more surprised, if everything stays the same, if I publish more than 5 papers next year than if I publish between 2 and 5. You should be even more surprised if I publish less than 2. But who knows if everything will stay the same? Perhaps we should always be surprised?

Sunday, June 28, 2009

So What Does Dan Actually Do?

"What do you study, Dan?" This is a surprisingly difficult question with several answers, depending on who is asking.

To another scientist of the right kind, I can give the reasonably detailed answer, "I study protein folding kinetics by molecular dynamics simulations, followed by Bayesian analysis of states and rates. The general idea is to criticize and improve master equation models of microscopic protein dynamics."

For some other scientists, "I study protein folding kinetics using computer simulation." Short, sweet.

For non-scientists? This is trickier, and I'm a bit ashamed not to have a short answer prepared. However, the short answer is difficult; look at the "other scientists" response, in particular. Parts of that sentence parse for the reasonably intelligent laity, like "computer simulation." I calculate stuff (generally stuff that's either too hard to measure or too small or fast to see, but that's part of a more detailed answer). Much harder is explaining what "protein folding" is. "Kinetics" is also probably tricky, but it's just a technical word for the measurement of how and how fast something happens. "Protein folding kinetics" is studying how and how fast protein folding happens.

I imagine people think of folding sliced ham in half when they hear "protein folding," but the protein in protein folding doesn't refer to dietary protein like a slice of ham, but to biochemical protein. This is much like "water": when most people (including scientists) refer to water, they mean the liquid. "Water" sometimes means the molecule, water, H2O, meaning two hydrogen atoms attached by chemical bonds to an oxygen atom.

Proteins are molecules, too, of a class called macromolecules because they're huge compared to ordinary molecules like water. Other macromolecules include lipids--that is, fats, which actually aren't that big--and nucleic acids like RNA, which is about as big as a protein, and DNA which is huge compared to proteins and RNA.

In technical language, proteins are linear polymers of amino acids. This means that proteins are bigger molecules made from sticking small amino acid molecules together, end to end, by chemical bonds. The amino acids are a class of molecules, of which twenty or so are used in biology, with one end (the "amino" end, made of nitrogen, with a name not coincidentally reminiscent of the solvent, ammonia) which can stick, chemically, to the other end (the "acid" end). With dozens or hundreds of these molecules stuck together, end on end, one gets a protein macromolecule, which topologically speaking is a long chain of amino acids, like a string.

Proteins perform lots of different tasks for the cell, with one type of protein doing about one specific task. For instance, hemoglobin is a protein that carries oxygen in the blood; IDH is a less famous protein that cleaves one molecule, isocitrate, into two, carbon dioxide and alpha-ketoglutarate.

However, a protein in the form of a string of amino acids cannot do its job. Hemoglobin can't carry oxygen in string form, and IDH can't cleave isocitrate. In order to perform its function, the protein string needs to fold into a specific shape. Luckily, the information for folding into the right shape is encoded in the string of amino acids, some of which are oily, and avoid water, and some of which are hydrophilic, or "water-loving," but better called water soluble. However, remember that the amino acids are stuck together into a chain; the information of the proper fold is encoded in the different arrangements of the chain so as to hide oily amino acids from water, and to get water soluble amino acids into contact with water. The proper folded conformation is the most likely conformation that does the best job getting all of the right amino acids away from or in water, according to which amino acids are in the chain.

The particular chain for a protein is unique; hemoglobin is different from myoglobin which is different from IDH. Therefore, the fold is different. One way to study protein folding, known as structure prediction, is to try to guess from the sequence of amino acids what the final structure will be. On the other hand, I study protein folding kinetics, which is how the protein gets to the folded state in the first place.

Protein folding is impossible, with state-of-the-art technology, to observe directly in the lab, so I and my coworkers (and others around the world) use computers to model how the atoms in the protein macromolecule move around. The atoms have velocities, and they exert forces on one another. For instance, positively charged atoms attract negatively charged atoms, according to the ordinary rules of electrostatics. Since they exert forces, the atoms accelerate or change velocity, and they zoom around. However, because the atoms are chemically bound in certain arrangements, they can't zoom around too much, so really all the protein chain can do is flop around in different arrangements.

Spaghetti illustrates this quite nicely. A pot of noodles thrown on the floor shows lots of pasta strands (protein chains) which flop in different ways. The forces that a protein chain exerts on itself make the protein move between different conformations. Some of these conformations are (or are very close to) the proper, functional conformation of a protein.

With molecular simulation, we calculate the next conformation, in time, that a protein molecule will adopt, from its characteristics at the current conformation. Thus, the protein chain moves from conformation to conformation, and eventually folds into the correct, functional shape.
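
As a cartoon of "calculate the next conformation from the current one," here is a toy sketch: two atoms joined by a single harmonic "bond," stepped forward in time with the velocity-Verlet integrator. Real simulations use full force fields (bonds, angles, charges, and so on) and many thousands of atoms, but the loop is the same idea; all the numbers here are made up.

```python
import numpy as np

k_bond, r0 = 500.0, 1.0    # bond stiffness and rest length (made-up units)
mass = 1.0                 # mass of each atom (made up)
dt = 0.001                 # time step

x = np.array([0.0, 1.2])   # starting positions: the bond is slightly stretched
v = np.zeros(2)            # starting velocities

def forces(x):
    stretch = (x[1] - x[0]) - r0      # how far the bond is from its rest length
    f = k_bond * stretch              # Hooke's law
    return np.array([f, -f])          # equal and opposite forces on the two atoms

f = forces(x)
for step in range(1000):
    v += 0.5 * dt * f / mass          # half-kick from the current forces
    x += dt * v                       # drift to the new positions
    f = forces(x)                     # recompute forces at the new positions
    v += 0.5 * dt * f / mass          # second half-kick

print(x[1] - x[0])   # the bond length oscillates around r0 = 1.0
```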

I study how a protein moves between the different shapes it can adopt, and, more interestingly, how quickly it can change from shape to shape, using a computer. The final piece is to state why a person would want to know about this: as might be guessed, knowledge about how proteins fold is one of the keys to understanding their many functions.