*Setlow-Pollard, Addison Wesley*

*pp. 66-74*

**3-8 Information theory. Relation between information and entropy.**

Recent developments in electronics and communications have led to the realization
that general properties related to what might be called the "assembly of
a pattern" exist. The pattern may consist of the display on a cathode-ray
tube, which has to be assembled from currents developed in a series of vacuum
tubes by the received signal, or of a set of sound vibrations of variable frequency
and amplitude impressed on an ear. Problems such as these seem to have a common
general feature, and this observation has been formulated by Shannon as a new
theory called "information theory."

Information theory is recognized as having application far beyond the telegraph
wire. A collection of essays on the topic of information theory and biology
has been made by Quastler, and before we proceed to the usual definitions and
formalism, it is worth while to see what new approaches are made available by
information theory. Perhaps one of the most interesting of these is the use
of known *diversity* to give a numerical description. The mere fact that
there are, say, 600 kinds of enzyme, enables us to make a numerical estimate
of the specificity of an enzyme. This can then be compared with, say, antigen-antibody
specificity on the same numerical basis. When we are faced with extremely complex
systems, information theory can also be used to gain some idea of *how complex*
the system really is. In this sense information theory is salutary, for it can
be a starting point for a numerical analysis that may well be picked up and
finished in a quite different fashion. Quastler points out that in biology information
theory, which uses only one dimension, can be applied to a model in which steric
restraints are three dimensional and hard to visualize. In this way it is a
convenient tool for excluding hypotheses. It does not tell how to do something,
but rather *how difficult it is to do*.

With this brief introduction we can give a skeleton account of information theory
and show a few applications to biology.

**Definition of information.**

If an event has a probability P of occurring before a "message" is received and a probability P' of occurring after the message is received, the information in the message is **H**, where

**H** = log_{2} (P'/P)

To see how this operates, suppose that we have to pick out one letter from a
total of 16 letters. Then, before picking, P = 1/16, while after the selection
is made, P' = 1, so P'/P = 16. Now 2^{4} = 16, and log_{2} 16 = 4, so there are 4
*bits* of information. Thus **H** as defined above is in what we call
"bits," an attractive name and one made respectable by being the
contraction of "binary units."
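The letter-picking arithmetic can be checked directly. The following is a minimal sketch of the definition above; the function name `information_bits` is ours, not the text's.

```python
import math

def information_bits(p_before: float, p_after: float) -> float:
    """Bits of information gained when the probability of an event
    moves from p_before to p_after on receipt of a message."""
    return math.log2(p_after / p_before)

# Picking one letter out of 16: P = 1/16 before, P' = 1 after the choice.
h = information_bits(1 / 16, 1.0)
print(h)  # 4.0
```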

This definition of information bears an obvious similarity in concept to the
definition of entropy as *k* log *W*, for *W* is a measure of probability.

One very interesting aspect of information theory becomes evident when this
similarity is examined further. The first case in which such an examination
was made concerns "Maxwell's demon." This perceptive and intelligent
creation of Maxwell's mind was supposed to be able to let fast molecules through
and shut out slow molecules, thus raising the temperature in one place and lowering
it elsewhere, without doing any work. Such a process contradicts the second
law of thermodynamics. Szilard, in 1929, pointed out that the process of reducing
the entropy required the use of information, and in 1951 Brillouin made the
definite suggestion that the amount of negative entropy supplied had a numerical
correspondence to the amount of information used to produce the entropy diminution.
Brillouin, incidentally, also pointed out that if the information is obtained
physically (e.g., from a flashlight illuminating the molecules), then no entropy
decrease occurs.

The correspondence between entropy and information can be obtained in several
ways. Perhaps the simplest, due to Linschitz, is as follows. If the total number
of possible configurations available to a molecule is P, then this is a measure
of the probability, and hence of the entropy, by the relation S = *k* ln P. But
to determine which configuration exists, we have to make **H** binary choices, where

2^{H} = P

or

**H** = log_{2} P = ln P/ln 2.

Therefore

S = *k***H** ln 2 or **H** = S/(*k* ln 2)

The values of **H** and S refer to one molecule. Normally *k* is in ergs per degree.
If S is expressed in calories per mole per degree, we have

**H** = S/(R ln 2)

where R = 1.98 cal/mole/deg.

So far we have used the simplest kind of definition of information. If all the
possible states are not equally probable, but the expected value of the ith
state is p_{i}, then the definition of information becomes^{1}

**H** = -Σ_{i} p_{i} log_{2} p_{i}

We can show directly that the change in entropy occasioned by the selection
and removal of one particular state is the negative of the information contained
in that state.

Thus we have two ways of determining information: (1) directly in terms of binary
choices or the more elaborate relation **H** = -Σ p_{i} log_{2} p_{i}, and (2) in terms of physical
entropy changes.
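The generalized formula can be sketched in a few lines; for equally likely states it reduces to the simple binary-choice count, as the footnote notes. The function name `shannon_bits` is ours.

```python
import math

def shannon_bits(probabilities):
    """H = -sum(p_i * log2 p_i), the generalized definition of information."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 16 equally likely states reduce to the simple case: log2(16) = 4 bits.
print(shannon_bits([1 / 16] * 16))

# A biased two-state source carries less than one bit per selection.
print(shannon_bits([0.9, 0.1]))
```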

**3-9 Information content of some biological systems.**

We first employ the method of binary choices. As an example, consider a protein
molecule, which by its very nature has one or more polypeptide chains. The forms
of these chains are not very diverse, numbering perhaps eight. To select one
kind of chain out of eight requires three binary choices, or three bits. Suppose
there are 1000 residues selected from among 20 amino acids. Each selection involves
very nearly four bits. (Actually one out of 16 is four bits, but the frequency
of amino acids is not uniform, so that by asking for the more likely ones first,
the actual amino acid can be specified in less than four bits, but not much
less.) Each choice has to be made 1000 times, so there are 4x10^{3} bits in a particular
protein of a particular form. To choose the form requires three bits, so the
total information content of a protein is still effectively 4x10^{3} bits, although
admittedly the computation is rough.
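The protein estimate above is only bookkeeping, and it can be reproduced as such. A sketch under the text's assumptions (1000 residues, 20 amino acids, eight chain forms):

```python
import math

residues = 1000      # residues in the polypeptide chain
amino_acids = 20     # choices per residue
chain_forms = 8      # forms of chain: log2(8) = 3 bits to select one

bits_per_residue = math.log2(amino_acids)  # ~4.32; the text rounds to ~4
total_bits = residues * bits_per_residue + math.log2(chain_forms)
print(total_bits)  # ~4.3e3, effectively 4 x 10^3 bits
```

The exact log₂ 20 is slightly above four bits; the text's remark about uneven amino-acid frequencies pulls the practical figure back below it.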

In a nucleic acid molecule, the four bases require that any one base can be
specified by two bits. Thus, apart from any differences in type of molecule,
a nucleic acid molecule contains, in bits, twice its number of nucleotides. Since a nucleotide
has a molecular weight of about 250, a nucleic acid molecule of molecular weight
10^{6} has 4000 nucleotides, making 8000 bits altogether. It is interesting that
such a nucleic acid molecule has about the same information content as a protein
molecule of less than a fifth its molecular weight. Quastler has pointed out
that if a correspondence between nucleic acid and protein exists, as is essential
if nucleic acid is involved in protein synthesis, then to get an equivalent
amount of information the nucleic acid molecule has to be bigger than a protein
molecule. If there is a purine-pyrimidine correlation, as proposed in some DNA
models, then there is only one bit per nucleotide, and it would be expected
that for equal information content a DNA molecule would have to be 10 times
as large as a protein molecule. This is, roughly speaking, what is observed.
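The nucleic acid figure follows the same pattern; a minimal sketch of the text's numbers (molecular weight 10^6, roughly 250 per nucleotide):

```python
import math

bits_per_base = math.log2(4)        # four bases: 2 bits per nucleotide
nucleotides = 1e6 / 250             # ~4000 nucleotides in the molecule
print(nucleotides * bits_per_base)  # 8000 bits
```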

We can now turn to a very different approach to the information content of a
protein and ask about the number of bits of information involved in, say, enzyme-substrate
combination or in an antigen-antibody combination. This can be answered directly
in one or two individual cases. If we take the action of urease on urea, and
assume that the urea molecule must lie in a definite orientation and have the
correct molecular dimensions, we can argue thus. To specify any substrate, say
urea, involves the specification that it contains 10 atoms (about three bits)
and that each atom be specifically identified (about one bit per atom), making
13 bits. About four more bits may be needed, because the 10 atoms can combine
in more than one way, and the right way (urea) must be chosen. Therefore the
selection of the substrate involves, roughly, 17 bits.
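The 17-bit tally for urea can be laid out explicitly; the breakdown below follows the text's rough accounting and nothing more.

```python
import math

atoms = 10
bits_atom_count = round(math.log2(atoms))  # "about three bits" to state the count
bits_identity = atoms * 1                  # about one bit to identify each atom
bits_arrangement = 4                       # picking urea among the possible combinations
total = bits_atom_count + bits_identity + bits_arrangement
print(total)  # 17
```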

This amount of information is more than is needed for the enzyme-substrate combination
and subsequent reaction. Thus we have excluded all sorts of 10-atom combinations,
and indeed configurations of more than 10 atoms, and in so doing have required
information. However, the enzyme may actually react with many more combinations
of atoms than are chemically practical to test. For example, this specification
totally excludes thiourea as a substrate, whereas, in fact, it is not excluded.

The problem, then, is to see how many of these 17 bits are needed if it is supposed
that the important point in the interaction between urea and urease is the existence
of complementary structures corresponding to the oxygen, carbon, and nitrogen
atoms, as indicated in Fig. 3-9.

If we consider the pattern alone, without regard to any chemical similarity
or attraction possibilities, then we can treat the carbon as the origin of a
grid of squares, say 1/5 Å in size. If the requirement is that three particular
squares be occupied by an atom and there are about 25 squares to choose from,
then, since 25 = 2^{4.6}, 4.6 bits are required for each atom placed, or 13.8 bits altogether.
If in place of a requirement of 1/5 Å and precision arrangement of all four major
atoms we substitute 1/3 Å and only three major atoms (only two if the carbon
is chosen as a starting point), then the number of bits shrinks to 6.5. The
reader can clearly see that in this direct approach the actual process critically
determines the amount of information deduced.
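The grid estimate is a product of two small numbers, which makes the sensitivity to the assumed process easy to see. A sketch of the fine-grid case:

```python
import math

squares = 25      # grid positions on the urease surface (1/5-Angstrom grid)
atoms_placed = 3  # atoms to be located relative to the carbon origin
bits = atoms_placed * math.log2(squares)
print(bits)  # ~13.9; the text rounds log2(25) to 4.6, giving 13.8
```

Halving the placements and coarsening the grid, as in the 1/3-Å case, cuts the total roughly in half again, which is the point of the paragraph.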

FIG. 3-9. Representation
of one way to estimate the information necessary in an enzyme-substrate
combination. The substrate is urea, and the specific surface of urease
is divided into a grid. The fineness of division of the grid needed to
specify the exact way in which the urea molecule must be placed will affect
the number of bits of information required.

FIG. 3-10. Diagrammatic representations
of the features which enter into (a) enzyme-substrate, (b) antigen-antibody,
and (c) genotype-phenotype relations due to Quastler. In each case the
actual specific relation requires the common possession of a fraction
of the total information present in the two related molecules.

A totally different viewpoint is possible. This is again due to Quastler. While
the above treatment gives a figure for the information needed for some one method
of operation, it is possible to ask how much information is biologically required
regardless of any operational method. One way to answer this question is to
ask how many enzymes there are, and then to say that any particular enzyme has
to be chosen from among this number. Another method is to determine the number
of possible substrates, and then to say that an enzyme must have the information
necessary to pick out one of these. Taking the first approach, there are about
600 enzymes that can readily be conceived. To pick one out of these takes nine
bits. The second approach requires some estimate of relative concentration.
Quastler points out that 90% of the dry weight includes only 61 classes of substrate,
and that 99% includes 300 classes of substrate. The selection of one substrate
thus corresponds to seven bits, so that from the point of view of biological
necessity, only seven to nine bits are involved in the enzyme-substrate process.
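The biological-necessity figures are plain logarithms, and they can be bracketed numerically. The seven-bit substrate figure is a concentration-weighted estimate and so falls between the two class counts quoted.

```python
import math

print(math.log2(600))  # ~9.2: nine bits to pick one enzyme among ~600
print(math.log2(61))   # ~5.9: one of the 61 classes making up 90% of dry weight
print(math.log2(300))  # ~8.2: one of the 300 classes covering 99%
```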

Of course, we cannot say that this last figure tells us the form of enzyme action.
But we can say that if the evolutionary process has led to the simplest method
that meets necessity, then highly elaborate and precision configurational matching
is not necessary, and so may not be found. This is taken up again from a different
point of view in Chapter 8.

A somewhat similar approach leads Quastler to the same kind of figure for antigen-antibody
relationship and for genic control of a phenotype. The three diagrams in Fig.
3-10 illustrate how the information may be conceived as being operative for
the enzyme-substrate, antigen-antibody, and gene-phenotype relationships. It
will be seen that in each case there are both irrelevant features and features
that might be concerned with specificity, plus the seven to nine bits concerned
with specific relations.

**3-10 Information content of a bacterial cell**.

Since a bacterial cell is actually the smallest object in which all the major
functions of life are found (viruses being essentially parasites so far as this
line of thought is concerned), an estimate of the information content of such
a cell is of interest. Two such estimates have been made, one by Morowitz in
terms of the direct approach of information theory and one by Linschitz based
on physical entropy and its relation to information.

The direct approach has been likened to preparing the instructions for constructing
a building from its component parts. If the building is to be made of bricks,
then a three-dimensional grid can be imagined, forming a three-dimensional honeycomb
of cells. To decide whether a brick is or is not in a cell is, then, the way
in which the amount of information is reduced to a number. Obviously it makes
a big difference whether the cellular framework set up in one's imagination
is coarse or fine, and one decision that has to be made is whether the framework
is too coarsely designed to describe the building, or whether it is so fine
as to include slight cracks in the bricks, which are not relevant. This is one
of the weaknesses of an estimate of information, but in spite of it, quite interesting
figures can be developed.

A bacterial cell is made of water and solid material. A bacterial spore has
nearly all its water content missing, and yet it can become vegetative and develop
into a bacterium. So it is reasonable, as Morowitz points out, to consider the
information content in the dry part. The problem then is to choose the right
atoms and put them in the right places. The instructions for doing so, in binary
form, are the information content. We give below a modification of Morowitz'
method, which is very direct but not as rigorous as his.

To determine which atom of the 60 kinds present (not all the elements are found in
living cells) should be chosen would appear to take nearly six bits. In fact,
however, the elements are far from evenly distributed, so that the average number
of binary choices to identify an atom is 1.5. We next need to know the number
of atoms. If the average atomic weight is taken as six, the average atomic mass
is 6x1.67x10^{-24} gm, and for a dry weight of 6x10^{-13} gm
this means there are 6x10^{10} atoms to locate. Now the question of
the fineness of the cellular structure has to be decided. If it is set at 2x10^{-10}
cm, the vibrational amplitude of a nucleus, and we remember that the average
spacing of atoms is about 2 Å, then a cube of 8 Å^{3} is a reasonable
region of uncertainty for an atom, a choice that fits with knowledge
of the nature of atomic and molecular formations. This volume, in cm^{3}, is 8x10^{-24},
while the volume corresponding to the vibrational amplitude is 8x10^{-30}. The number of cells in our
mental honeycomb is then 8x10^{-24}/8x10^{-30}, or 10^{6},
an uncertainty which takes 20 binary choices. The total per atom is then 21.5,
and since the total number of atoms is 6x10^{10} the total information
content is then 1.3x10^{12} or, in round numbers, 10^{12} bits.
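This modification of Morowitz' count is a single multiplication, sketched below with the text's rounded figures:

```python
import math

atoms = 6e10          # atoms in 6x10^-13 g dry weight at mean atomic weight 6
bits_identity = 1.5   # average binary choices to name the element
bits_location = round(math.log2(1e6))  # 10^6 candidate cells: ~20 bits
total = atoms * (bits_identity + bits_location)
print(total)  # ~1.3e12 bits
```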

Linschitz gave the figure 9.3x10^{-12} cal/deg, or 9.3x10^{-12}x4.2 joules/deg, for
the entropy of a bacterial cell. Using the relation **H** = S/(*k* ln 2), we find
that the information content is

**H** = 9.3x10^{-12}x4.2/(1.38x10^{-23}x0.69), or about 4x10^{12} bits.

Morowitz' deduction from the work of Bayne-Jones and Rhees gives the lower
value of 5.6x10^{11} bits, which is still in the neighborhood of 10^{12}
bits. Thus two quite different approaches give rather concordant figures.

It must be pointed out that both methods tend to give high values. If the nutrient
process, from whose caloric values the entropy is calculated, is actually very
wasteful, then the amount of entropy that should be used to estimate information is lower.
In the second approach, it is quite possible that many of the actual locations
of atoms are not critical to life. Indeed, Holter has shown that a centrifuged
amoeba that has actually developed stratification of its components can still
live.

Even with these reservations, the value of 10^{12} bits is very high.
Morowitz has pointed out that the random concatenation of a bacterium from its
component atoms is very unlikely indeed. Those who wish to speculate on the
origin of life can speculate about these values and see what impact they have
on their prejudices.

We return to the method of estimating information. It is clear that the process
chosen for assembly is most important. Referring again to the analogy of constructing
a building, we can say that if walls are built by a process of pouring concrete
into molds instead of the brick-by-brick method, the amount of information needed
decreases. In the cell there is the possibility that certain growth patterns
are required to develop from others. Thus if a string of nucleoprotein is placed
in a mixture of a few enzyme molecules, salts, and amino acids, and a cell necessarily
results, then the information content needed is far less. For a bacterium, an estimate
based on reasoning due to Dancoff and Quastler gives 10^{4} bits. If
such relative simplicity is at the base of living systems, then by all means
the inevitable synthetic processes should be searched for vigorously.

Even if such processes are found, it is likely that they conform to a general
principle first stated by Dancoff. Every sequential process is subject to error.
We can either imagine such sequential processes as highly precise, so that no
error at all can be tolerated, or we can say that the over-all demands on the
sequential processes are such that the maximum possible error is allowed. Put
in more specific terms, if 100 copies of a certain protein have to be synthesized
for the cell to work, then we can imagine either a process by which
a precision template produces all 100, exposing the system to errors 100 times,
or a process by which each protein divides, with a specific inhibitor being
present to prevent the division of any incorrectly formed protein. Then any
failure is stopped and the other proteins take over. This second process has
a form of checking included. Dancoff suggested that a guide to setting up hypotheses
about cellular action could be made in terms of choosing that hypothesis which,
while still fitting the facts, allowed for the greatest error that could possibly
result from its application.

One can apply this principle qualitatively to the question of whether protein
or other large molecules undergo the rapid turnover of the smaller amino acids.
If it is conceded that the process of exchange of one amino acid for another
can involve error, then each protein must be exposed to malformation many times
per second. Either this is unreasonable or the cell has a checking process of
tremendous scope and efficiency. So potent would this checking process be that
it should readily be observed in biochemical experiments. Until it is observed,
it is more plausible to cast doubt on rapid amino-acid turnover. The work of
Monod and Spiegelman on adaptive enzyme formation would seem to strengthen this
viewpoint.

We conclude this chapter with a few information figures of interest, pointed
out to us by Quastler.

Let us assume that analytical chemistry can distinguish one substance from 10^{7}
others, which involves 23 bits of information. Since the biological information
in a DNA molecule is of the order of 10^{4} bits, the biological operation of DNA
is not likely to be seen in terms of analytical chemistry. On the other hand,
x-ray structural analysis can readily extract 10^{4} bits of information
from a crystal 1 mm across. Consequently, the entry of structural analysis into
biophysics is very rational.

If the figure of 10^{12} bits per bacterium is used, then 10^{9}
bits per second are produced as the bacterium grows and develops. This figure
is so high that it makes plausible some kind of analysis in terms of a controlling
set of nucleoprotein molecules. Under no circumstances can the growth of a bacterium
be looked on as producing less than 1000 bits per second. It is interesting
to compare this with the conscious handling of information by a human being,
which Quastler sets at 23 bits per second. A printed page is about 10^{4}
bits, although often it contains much redundancy - a small comfort to authors.
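These closing figures are again simple logarithms and ratios. In the sketch below, a generation time near 10^3 s is assumed, since that is what the quoted 10^9 bits per second implies.

```python
import math

# Growth rate implied by ~10^12 bits per bacterium over an assumed
# generation time of ~10^3 seconds:
print(1e12 / 1e3)  # 1e9 bits per second

# Bits needed to single out one substance among 10^7 in analytical chemistry:
print(math.log2(1e7))  # ~23.3
```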

1. Note that for the previously considered case, where equal
expectation existed, p_{i} = 1/P in all cases, and **H** = -Σ p_{i} log_{2} p_{i} = log_{2} P,
in agreement with the earlier definition.