Ed. Addison Wesley
3-8 Information theory. Relation between information and entropy.
Recent developments in electronics and communications have led to the realization that general properties related to what might be called the "assembly of a pattern" exist. The pattern may consist of the display on a cathode-ray tube, which has to be assembled from currents developed in a series of vacuum tubes by the received signal, or of a set of sound vibrations of variable frequency and amplitude impressed on an ear. Problems such as these seem to have a common general feature, and this observation has been formulated by Shannon as a new theory called "information theory."
Information theory is recognized as having application far beyond the telegraph wire. A collection of essays on the topic of information theory and biology has been made by Quastler, and before we proceed to the usual definitions and formalism, it is worthwhile to see what new approaches are made available by information theory. Perhaps one of the most interesting of these is the use of known diversity to give a numerical description. The mere fact that there are, say, 600 kinds of enzyme enables us to make a numerical estimate of the specificity of an enzyme. This can then be compared with, say, antigen-antibody specificity on the same numerical basis. When we are faced with extremely complex systems, information theory can also be used to gain some idea of how complex the system really is. In this sense information theory is salutary, for it can be a starting point for a numerical analysis that may well be picked up and finished in a quite different fashion. Quastler points out that in biology information theory, which uses only one dimension, can be applied to a model in which steric restraints are three dimensional and hard to visualize. In this way it is a convenient tool for excluding hypotheses. It does not tell how to do something, but rather how difficult it is to do.
With this brief introduction we can give a skeleton account of information theory and show a few applications to biology.
Definition of information.
If an event has a probability P of occurring before a "message" is received and a probability P' of occurring after the message is received, the information in the message is H, where
$$H = \log_2 \frac{P'}{P}.$$
To see how this operates, suppose that we have to pick out one letter from a total of 16 letters. Then, before picking, P = 1/16, while after the selection is made, P' = 1, so P'/P = 16. Now $2^4 = 16$, and $\log_2 16 = 4$, or there are 4 bits of information. Thus H as defined above is in what we call "bits," an attractive name and one made respectable by being the contraction of "binary units."
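The 16-letter example can be checked numerically. The following is a minimal sketch, assuming the definition is read as H = log2(P'/P), which is what the worked numbers imply; the function name `message_information_bits` is ours and purely illustrative.

```python
import math


def message_information_bits(p_before: float, p_after: float) -> float:
    """Bits carried by a message that changes an event's probability
    from p_before to p_after (assumed form: H = log2(P'/P))."""
    if not (0 < p_before <= 1 and 0 < p_after <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log2(p_after / p_before)


# One letter picked out of 16: P = 1/16 before, P' = 1 after.
print(message_information_bits(1 / 16, 1.0))  # 4.0
```

A message that merely halves the uncertainty (P = 1/2 before, P' = 1 after) carries one bit on the same definition.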
This definition of information bears an obvious similarity in concept to the definition of entropy as k log W, for W is a measure of probability.
One very interesting aspect of information theory becomes evident when this similarity is examined further. The first case in which such an examination was made concerns "Maxwell's demon." This perceptive and intelligent creation of Maxwell's mind was supposed to be able to let fast molecules through and shut out slow molecules, thus raising the temperature in one place and lowering it elsewhere, without doing any work. Such a process contradicts the second law of thermodynamics. Szilard, in 1929, pointed out that the process of reducing the entropy required the use of information, and in 1951 Brillouin made the definite suggestion that the amount of negative entropy supplied had a numerical correspondence to the amount of information used to produce the entropy diminution. Brillouin, incidentally, also pointed out that if the information is obtained physically (e.g., from a flashlight illuminating the molecules), then no entropy decrease occurs.
The correspondence between entropy and information can be obtained in several ways. Perhaps the simplest, due to Linschitz, is as follows. If the total number of possible configurations available to a molecule is P, then this is a measure of the probability, and hence of the entropy, by the relation $S = k \ln P$. But to determine which configuration exists, we have to make H binary choices, where
$$P = 2^{H},$$
so that
$$S = kH \ln 2 \quad \text{or} \quad H = \frac{S}{k \ln 2}.$$
The values of H and S refer to one molecule. Normally k is in ergs per degree. If calories per mole per degree is used for S, we have
$$H = \frac{S}{R \ln 2},$$
where R = 1.98 cal/mole/deg.
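The conversion between entropy and bits can be sketched as follows. This is an illustration under stated assumptions: the per-molecule form is H = S/(k ln 2) as derived above, and the molar form is taken to be H = S/(R ln 2) with R = 1.98 cal/mole/deg. The function names and the numerical value of k in ergs per degree are ours, not the text's.

```python
import math

BOLTZMANN_ERG_PER_DEG = 1.380649e-16  # k in erg/deg (assumed modern value)
GAS_CONSTANT_CAL = 1.98               # R in cal/mole/deg, as quoted in the text


def bits_from_molecular_entropy(s_erg_per_deg: float) -> float:
    """H = S / (k ln 2), with S the entropy of one molecule in erg/deg."""
    return s_erg_per_deg / (BOLTZMANN_ERG_PER_DEG * math.log(2))


def bits_from_molar_entropy(s_cal_per_mole_deg: float) -> float:
    """H per molecule = S / (R ln 2), with S in cal/mole/deg (assumed form)."""
    return s_cal_per_mole_deg / (GAS_CONSTANT_CAL * math.log(2))


# An entropy of R ln 2 (about 1.37 cal/mole/deg) corresponds to one bit per molecule.
one_bit_entropy = GAS_CONSTANT_CAL * math.log(2)
print(round(one_bit_entropy, 2), bits_from_molar_entropy(one_bit_entropy))
```

On these figures one bit per molecule is equivalent to roughly 1.37 cal/mole/deg, which gives a convenient rule of thumb for reading thermodynamic tables in informational terms.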
So far we have used the simplest kind of definition of information. If the possible states are not all equally probable, but the expectation of the ith state is $p_i$, then the definition of information becomes¹
$$H = -\sum_i p_i \log_2 p_i.$$
We can show directly that the change in entropy occasioned by the selection
and removal of one particular state is the negative of the information contained
in that state.
Thus we have two ways of determining information: (1) directly in terms of binary choices or the more elaborate relation $H = -\sum_i p_i \log_2 p_i$, and (2) in terms of physical entropy changes.
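The more elaborate relation can be illustrated with a short sketch, assuming it is the usual sum H = -Σ p_i log2 p_i over the states; the function name and the skewed example distribution are hypothetical. For equal probabilities it reduces to the count of binary choices used earlier.

```python
import math


def shannon_information_bits(probabilities):
    """H = -sum(p_i * log2(p_i)) over states with nonzero probability."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Sixteen equally likely letters: the sum reduces to log2(16).
print(shannon_information_bits([1 / 16] * 16))  # 4.0

# Four states with unequal probabilities need fewer than log2(4) = 2 bits.
print(shannon_information_bits([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

The second case shows why a non-uniform frequency of symbols lowers the information per symbol below the value obtained by simply counting the kinds of symbol.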
3-9 Information content of some biological systems.
We first employ the method of binary choices. As an example, consider a protein
molecule, which by its very nature has one or more polypeptide chains. The forms
of these chains are not very diverse, numbering perhaps eight. To select one
kind of chain out of eight requires three binary choices, or three bits. Suppose
there are 1000 residues selected from among 20 amino acids. Each selection involves
very nearly four bits. (Actually one out of 16 is four bits, but the frequency
of amino acids is not uniform, so that by asking for the more likely ones first,
the actual amino acid can be specified in less than four bits, but not much
less.) Each choice has to be made 1000 times, so there are $4 \times 10^{3}$ bits in a particular
protein of a particular form. To choose the form requires three bits, so the
total information content of a protein is still effectively $4 \times 10^{3}$ bits, although
admittedly the computation is rough.
In a nucleic acid molecule, the four bases require that any one base can be specified by two bits. Thus, apart from any differences in type of molecule, a nucleic acid molecule contains twice as many bits as it has nucleotides. Since a nucleotide has a molecular weight of about 250, a nucleic acid molecule of molecular weight $10^{6}$ has 4000 nucleotides, making 8000 bits altogether. It is interesting that such a nucleic acid molecule has about the same information content as a protein molecule of less than a fifth its molecular weight. Quastler has pointed out that if a correspondence between nucleic acid and protein exists, as is essential if nucleic acid is involved in protein synthesis, then to get an equivalent amount of information the nucleic acid molecule has to be bigger than a protein molecule. If there is a purine-pyrimidine correlation, as proposed in some DNA models, then there is only one bit per nucleotide, and it would be expected that for equal information content a DNA molecule would have to be 10 times as large as a protein molecule. This is, roughly speaking, what is observed.
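The binary-choice estimates for a protein and a nucleic acid can be reproduced in a few lines. This is a rough sketch that assumes every symbol is equally likely, so it gives log2 20, about 4.3 bits per residue, as an upper bound where the text rounds to four; the function name is illustrative only.

```python
import math


def sequence_bits(n_units: int, alphabet_size: int) -> float:
    """Bits needed to specify a chain of n_units, each drawn with equal
    probability from alphabet_size kinds (an upper bound)."""
    return n_units * math.log2(alphabet_size)


# Protein: 1000 residues from 20 amino acids, plus one of 8 chain forms.
protein_bits = sequence_bits(1000, 20) + math.log2(8)

# Nucleic acid: molecular weight 10^6, about 250 per nucleotide, 4 bases.
n_nucleotides = 10**6 // 250
nucleic_bits = sequence_bits(n_nucleotides, 4)

print(n_nucleotides, nucleic_bits)  # 4000 8000.0
print(round(protein_bits))          # a little over 4000, as in the text
```

The three bits for the chain form are negligible beside the sequence term, which is why the total stays effectively at a few thousand bits.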
We can now turn to a very different approach to the information content of a protein and ask about the number of bits of information involved in, say, enzyme-substrate combination or in an antigen-antibody combination. This can be answered directly in one or two individual cases. If we take the action of urease on urea, and assume that the urea molecule must lie in a definite orientation and have the correct molecular dimensions, we can argue thus. To specify any substrate, say urea, involves the specification that it contains 10 atoms (about three bits) and that each atom be specifically identified (about one bit per atom), making 13 bits. About four more bits may be needed, because the 10 atoms can combine in more than one way, and the right way (urea) must be chosen. Therefore the selection of the substrate involves, roughly, 17 bits.
This amount of information is more than is needed for the enzyme-substrate combination and subsequent reaction. Thus we have excluded all sorts of 10-atom combinations, and indeed configurations of more than 10 atoms, and in so doing have required information. However, the enzyme may actually react with many more combinations of atoms that are chemically much too hard to test. For example, thiourea has been totally excluded as a substrate, whereas, in fact, it is not excluded.
The problem, then, is to see how many of these 17 bits are needed if it is supposed that the important point in the interaction between urea and urease is the existence of complementary structures corresponding to the oxygen, carbon, and nitrogen atoms, as indicated in Fig. 3-9.
If we consider the pattern alone, without regard to any chemical similarity or attraction possibilities, then we can treat the carbon as the origin of a grid of squares, say 1/5 Å in size. If the requirement is that three particular squares be occupied by an atom and there are about 25 squares to choose from, then, since $25 = 2^{4.6}$, 4.6 bits are required for each square, or 13.8 bits altogether. If in place of a requirement of 1/5 Å and precision arrangement of all four major atoms we substitute 1/3 Å and only three major atoms (only two if the carbon is chosen as a starting point), then the number of bits shrinks to 6.5. The reader can clearly see that in this direct approach the actual process critically determines the amount of information deduced.
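The dependence of the estimate on the grid can be made explicit. The sketch below assumes that the 25 squares form a 5 x 5 grid of 1/5 Å squares, and that the coarser 1/3 Å grid covers the same area with 3 x 3 squares; that second assumption is ours, and it gives a figure slightly below the 6.5 bits quoted.

```python
import math


def placement_bits(n_atoms: int, n_squares: int) -> float:
    """Bits needed to say which of n_squares each of n_atoms occupies."""
    return n_atoms * math.log2(n_squares)


fine_squares = 5 * 5    # 1/5 Å squares: about 25 to choose from
coarse_squares = 3 * 3  # 1/3 Å squares over the same area (our assumption)

fine_bits = placement_bits(3, fine_squares)      # three atoms besides carbon
coarse_bits = placement_bits(2, coarse_squares)  # two atoms besides carbon

print(round(fine_bits, 1), round(coarse_bits, 1))
```

Halving the number of atoms to be placed and coarsening the grid each cut the figure substantially, which is the point of the comparison.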
FIG. 3-9. Representation of one way to estimate the information necessary in an enzyme-substrate combination. The substrate is urea, and the specific surface of urease is divided into a grid. The fineness of division of the grid needed to specify the exact way in which the urea molecule must be placed will affect the number of bits of information required.
FIG. 3-10. Diagrammatic representations of the features which enter into (a) enzyme-substrate, (b) antigen-antibody, and (c) genotype-phenotype relations due to Quastler. In each case the actual specific relation requires the common possession of a fraction of the total information present in the two related molecules.
A totally different viewpoint is possible. This is again due to Quastler. While
the above treatment gives a figure for the information needed for some one method
of operation, it is possible to ask how much information is biologically required
regardless of any operational method. One way to answer this question is to
ask how many enzymes there are, and then to say that any particular enzyme has
to be chosen from among this number. Another method is to determine the number
of possible substrates, and then to say that an enzyme must have the information
necessary to pick out one of these. Taking the first approach, there are about
600 enzymes that can readily be conceived. To pick one out of these takes nine
bits. The second approach requires some estimate of relative concentration.
Quastler points out that 90% of the dry weight includes only 61 classes of substrate,
and that 99% includes 300 classes of substrate. The selection of one substrate
thus corresponds to seven bits, so that from the point of view of biological
necessity, only seven to nine bits are involved in the enzyme-substrate process.
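The figures of seven to nine bits follow from taking logarithms of the counts quoted above. A minimal check, using only the numbers given in the text (the variable names are ours):

```python
import math

# Choosing one enzyme out of about 600.
enzyme_bits = math.log2(600)

# Choosing one substrate class out of 61 (90% of dry weight)
# or out of 300 (99% of dry weight).
substrate_bits_90 = math.log2(61)
substrate_bits_99 = math.log2(300)

print(round(enzyme_bits, 1))        # a little over nine bits
print(round(substrate_bits_90, 1))  # just under six bits
print(round(substrate_bits_99, 1))  # a little over eight bits
```

The two substrate counts bracket the figure of about seven bits, depending on how much of the dry weight is required to be covered.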
Of course, we cannot say that this last figure tells us the form of enzyme action. But we can say that if the evolutionary process has led to the simplest method that meets necessity, then highly elaborate and precision configurational matching is not necessary, and so may not be found. This is taken up again from a different point of view in Chapter 8.
A somewhat similar approach leads Quastler to the same kind of figure for antigen-antibody relationship and for genic control of a phenotype. The three diagrams in Fig. 3-10 illustrate how the information may be conceived as being operative for the enzyme-substrate, antigen-antibody, and gene-phenotype relationships. It will be seen that in each case there are both irrelevant features and features that might be concerned with specificity, plus the seven to nine bits concerned with specific relations.
3-10 Information content of a bacterial cell.
Since a bacterial cell is actually the smallest object in which all the major
functions of life are found (viruses being essentially parasites so far as this
line of thought is concerned), an estimate of the information content of such
a cell is of interest. Two such estimates have been made, one by Morowitz in
terms of the direct approach of information theory and one by Linschitz based
on physical entropy and its relation to information.
The direct approach has been likened to preparing the instructions for constructing a building from its component parts. If the building is to be made of bricks, then a three-dimensional grid can be imagined, forming a three-dimensional honeycomb of cells. To decide whether a brick is or is not in a cell is, then, the way in which the amount of information is reduced to a number. Obviously it makes a big difference whether the cellular framework set up in one's imagination is coarse or fine, and one decision that has to be made is whether the framework is too coarsely designed to describe the building, or whether it is so fine as to include slight cracks in the bricks, which are not relevant. This is one of the weaknesses of an estimate of information, but in spite of it, quite interesting figures can be developed.
A bacterial cell is made of water and solid material. A bacterial spore has nearly all its water content missing, and yet it can become vegetative and develop into a bacterium. So it is reasonable, as Morowitz points out, to consider the information content in the dry part. The problem then is to choose the right atoms and put them in the right places. The instructions for doing so, in binary form, are the information content. We give below a modification of Morowitz' method, which is very direct but not as rigorous as his.
To determine which atom of the 60 kinds present (not all elements are found in living cells) should be chosen would appear to take nearly six bits. In fact, however, the elements are far from evenly distributed, so that the average number of binary choices to identify an atom is 1.5. We next need to know the number of atoms. If the average atomic weight is taken as six, the average atomic mass is $6 \times 1.67 \times 10^{-24}$ gm, and for a dry weight of $6 \times 10^{-13}$ gm this means there are $6 \times 10^{10}$ atoms to locate. Now the question of the fineness of the cellular structure has to be decided. If it is set at $2 \times 10^{-10}$ cm, the vibrational amplitude of a nucleus, and we remember that the average spacing of atoms is about 2 Å and we choose a cube of 8 Å$^3$ as the region of uncertainty for an atom, the result would seem to fit with knowledge of the nature of atomic and molecular formations. This volume, in cm$^3$, is $8 \times 10^{-24}$, while that of vibrational amplitudes is $8 \times 10^{-30}$. The number of cells in our mental honeycomb is then $(8 \times 10^{-24})/(8 \times 10^{-30})$ or $10^{6}$, an uncertainty which takes 20 binary choices. The total per atom is then 21.5, and since the total number of atoms is $6 \times 10^{10}$ the total information content is then $1.3 \times 10^{12}$ or, in round numbers, $10^{12}$ bits.
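The arithmetic of this estimate is easy to retrace. The sketch below uses only the quantities given above, reading the exponents as $10^{-24}$, $10^{-13}$, $10^{-10}$, and so on; the variable names are ours.

```python
import math

identity_bits = 1.5               # average binary choices to name the atom
average_atomic_weight = 6
mass_unit_gm = 1.67e-24           # mass of unit atomic weight, in gm
dry_weight_gm = 6e-13             # dry weight of the cell, in gm

n_atoms = dry_weight_gm / (average_atomic_weight * mass_unit_gm)

uncertainty_volume_cm3 = 8e-24    # cube of 8 cubic angstroms
grid_cell_volume_cm3 = (2e-10) ** 3  # cube of side one vibrational amplitude

n_cells = uncertainty_volume_cm3 / grid_cell_volume_cm3
position_bits = math.log2(n_cells)   # close to 20 binary choices

total_bits = n_atoms * (identity_bits + position_bits)

print(f"{n_atoms:.1e} atoms, {position_bits:.1f} bits to locate each")
print(f"{total_bits:.1e} bits in all")  # of the order of 10^12
```

Nearly all of the total comes from locating the atoms rather than naming them, so the result is sensitive mainly to the fineness chosen for the grid.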
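Linschitz gave the figure $9.3 \times 10^{-12}$ cal/deg or $9.3 \times 10^{-12} \times 4.2$ joules/deg for the entropy of a bacterial cell. Using the relation $H = S/(k \ln 2)$, with $k = 1.38 \times 10^{-23}$ joule/deg, we find that the information content is
$$H = \frac{9.3 \times 10^{-12} \times 4.2}{1.38 \times 10^{-23} \times 0.693} \approx 4 \times 10^{12} \text{ bits}.$$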
Morowitz' deduction from the work of Bayne-Jones and Rhees gives the lower
value of $5.6 \times 10^{11}$ bits, which is still in the neighborhood of $10^{12}$
bits. Thus two quite different approaches give rather concordant figures.
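The conversion of Linschitz' entropy figure into bits can be checked as follows. This is a sketch assuming the relation H = S/(k ln 2) and a Boltzmann constant of 1.38 x 10^-23 joule/deg, a value we supply; the entropy and the factor 4.2 joules per calorie are those quoted in the text.

```python
import math

entropy_cal_per_deg = 9.3e-12     # Linschitz' figure for a bacterial cell
joules_per_cal = 4.2              # conversion factor used in the text
boltzmann_j_per_deg = 1.38e-23    # k in joule/deg (assumed value)

entropy_j_per_deg = entropy_cal_per_deg * joules_per_cal
information_bits = entropy_j_per_deg / (boltzmann_j_per_deg * math.log(2))

print(f"{information_bits:.1e} bits")  # a few times 10^12
```

The result lies within an order of magnitude of both the direct estimate and Morowitz' value, which is the sense in which the figures are concordant.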
It must be pointed out that both methods tend to give high values. If the nutrient process whose caloric values were used to calculate the entropy is actually very wasteful, then the amount of entropy that should be used to estimate information is lower. In the direct approach, it is quite possible that many of the actual locations of atoms are not critical to life. Indeed, Holter has shown that a centrifuged amoeba that has actually developed stratification of its components can still live.
Even with these reservations, the value of $10^{12}$ bits is very high. Morowitz has pointed out that the random concatenation of a bacterium from its component atoms is very unlikely indeed. Those who wish to speculate on the origin of life can speculate about these values and see what impact they have on their prejudices.
We return to the method of estimating information. It is clear that the process chosen for assembly is most important. Referring again to the analogy of constructing a building, we can say that if walls are built by a process of pouring concrete into molds instead of the brick-by-brick method, the amount of information needed decreases. In the cell there is the possibility that certain growth patterns are required to develop from others. Thus if a string of nucleoprotein is placed in a mixture of a few enzyme molecules, salts, and amino acids and then a cell has to result, the information content is far less. For a bacterium, an estimate based on reasoning due to Dancoff and Quastler gives $10^{4}$ bits. If such relative simplicity is at the base of living systems, then by all means the inevitable synthetic processes should be searched for vigorously.
Even if such processes are found, it is likely that they conform to a general principle first stated by Dancoff. Every sequential process is subject to error. We can either imagine such sequential processes as highly precise, so that no error at all can be tolerated, or we can say that the over-all demands on the sequential processes are such that the maximum possible error is allowed. Put in more specific terms, if a certain protein has to be synthesized up to 100 in number for the cell to work, then we can imagine either a process by which a precision template produces all 100, exposing the system to errors 100 times, or a process by which each protein divides, with a specific inhibitor being present to prevent the division of any incorrectly formed protein. Then any failure is stopped and the other proteins take over. This second process has a form of checking included. Dancoff suggested that a guide to setting up hypotheses about cellular action could be made in terms of choosing that hypothesis which, while still fitting the facts, allowed for the greatest error that could possibly result from its application.
One can apply this principle qualitatively to the question of whether protein or other large molecules undergo the rapid turnover of the smaller amino acids. If it is conceded that the process of exchange of one amino acid for another can involve error, then each protein must be exposed to malformation many times per second. Either this is unreasonable or the cell has a checking process of tremendous scope and efficiency. So potent would this checking process be that it should readily be observed in biochemical experiments. Until it is observed, it is more plausible to cast doubt on rapid amino-acid turnover. The work of Monod and Spiegelman on adaptive enzyme formation would seem to strengthen this viewpoint.
We conclude this chapter with a few information figures of interest, pointed out to us by Quastler.
Let us assume that analytical chemistry can distinguish one substance from $10^{7}$ others, which involves 23 bits of information. Since the biological information in a DNA molecule is of the order of $10^{4}$ bits, the biological operation of DNA is not likely to be seen in terms of analytical chemistry. On the other hand, x-ray structural analysis can readily extract $10^{4}$ bits of information from a crystal 1 mm across. Consequently, the entry of structural analysis into biophysics is very rational.
If the figure of $10^{12}$ bits per bacterium is used, then $10^{9}$ bits per second are produced as the bacterium grows and develops. This figure is so high that it makes plausible some kind of analysis in terms of a controlling set of nucleoprotein molecules. Under no circumstances can the growth of a bacterium be looked on as producing less than 1000 bits per second. It is interesting to compare this with the conscious handling of information by a human being, which Quastler sets at 23 bits per second. A printed page is about $10^{4}$ bits, although often it contains much redundancy - a small comfort to authors.
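These closing figures can be related to one another with a little arithmetic. The sketch below reads the quantities as 10^7 substances, 10^12 bits per bacterium, and 10^9 bits per second; the generation time it prints is our inference from the last two figures and is not stated in the text.

```python
import math

# Distinguishing one substance from 10^7 others.
analytical_bits = math.log2(10**7)

# Bits per bacterium divided by bits per second gives the implied time
# taken to assemble one cell (an inference, not a quoted figure).
bits_per_bacterium = 10**12
bits_per_second = 10**9
implied_seconds = bits_per_bacterium / bits_per_second

# Ratio of the bacterial rate to Quastler's figure for conscious handling.
human_bits_per_second = 23
rate_ratio = bits_per_second / human_bits_per_second

print(round(analytical_bits, 1))   # about 23 bits
print(implied_seconds)             # 1000.0 seconds, roughly 17 minutes
print(f"{rate_ratio:.1e}")         # tens of millions of times faster
```

An implied assembly time of about a quarter of an hour is of the same order as the division time of a rapidly growing bacterium, which suggests how the rate was arrived at.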
1. Note that for the previously considered case, where equal expectation existed, $p_i = 1/P$ in all cases and
$$H = -\sum_{i=1}^{P} \frac{1}{P} \log_2 \frac{1}{P} = \log_2 P.$$