Sc. Am Jan. 1991
|
In
theory, all one needs to know in order to fold a protein into its biologically
active shape is the sequence of its constituent amino acids. Why has
nobody been able to put theory into practice?
by Frederic M. Richards |
|
FOLDING
of the protein thioredoxin is generally representative of how other
small proteins fold: the initially open, unstable chain (a) becomes
increasingly compact (b and c), ultimately adopting a spherical shape
(d). The intermediate stages shown are hypothetical, because their shapes-and
those of the intermediates of most proteins-are not known fully. White
represents carbon; red, oxygen; blue, nitrogen; and yellow, sulfur.
|
In the late 1950s Christian B. Anfinsen and his colleagues at the National
Institutes of Health made a remarkable discovery. They were exploring a long-standing
puzzle in biology: What causes newly made proteins, which resemble loosely
coiled strings and are inactive, to wind into specifically shaped balls able
to perform crucial tasks in a living cell? In the process the team found the
answer was simpler than anyone had imagined.
It seemed the amino acid sequence of a protein, a one-dimensional trait, was
fully sufficient to specify the molecule's ultimate three-dimensional shape
and biological activity. (Proteins are built from a set of just 20 amino acids,
which are assembled into a chain according to directions embedded in the genes.)
Outside factors, such as enzymes that might catalyze folding, did not have
to be invoked as mandatory participants.
The discovery, which has since been confirmed many times-at least for relatively
small proteins-suggested that the forces most responsible for proper folding
in the cell could, in theory, be derived from the basic principles of chemistry
and physics. That is, if one knew the amino acid sequence of a protein, all
that would have to be considered would be the properties of the individual
amino acids and their behavior in aqueous solution. (The interior of most
cells is 70 to 90 percent water.)
In actuality, predicting the conformation of a protein on the basis of its
amino acid sequence is far from simple. More than 30 years after Anfinsen
made his breakthrough, hundreds of investigators are still at work on that
challenge, which has come to be widely known as the protein folding problem.
The solution is of more than academic interest. Many major hoped-for products
of the developing biotechnology industry are novel proteins. It is already
possible to design genes to direct the synthesis of such proteins. Yet failure
to fold properly is a common production concern.
For a time, those of us working on the folding problem despaired of ever finding
an answer. More recently, however, advances in theory and experiment, combined
with growing interest on the part of industry, have kindled new optimism.
Most of the detailed information available so far comes from studies on small,
water-soluble, globular proteins containing fewer than 300 or so amino acids.
The relative importance of various rules of folding and assembly may be somewhat
different for those proteins than for others-notably long fibrous proteins
and varieties residing in cellular membranes. Indeed, some large proteins
have recently been shown to need folding help from other proteins known as
chaperonins. The balance of the article will not consider such complexities
but will focus entirely on the unassisted folding reaction undergone by a
great many proteins.
It would be wonderful if researchers had an atomic-level microscope that could
take a movie of individual protein molecules folding up from their extended,
unstable state to their final, or native, state, which is more stable. From
a collection of movies, all aspects of the reaction pathways could be seen
directly. Unfortunately, no such instrument exists; investigators must fall back
on much less direct measurements and very careful reasoning.
One can gather helpful clues to the rules of folding by examining the three-dimensional
structures of unfolded and fully folded proteins and by analyzing
the properties of individual amino acids and small peptides (linear chains
of amino acids). Fortunately, the architecture of hundreds of native proteins
has been determined by such imaging techniques as X-ray crystallography and,
more recently, nuclear magnetic resonance (NMR). Both techniques have advanced
dramatically in the past decade, as has theoretical work attempting to predict
folding mathematically by computer.
Isolated amino acids consist of a central carbon atom (called the alpha carbon) bound
to an amino group (NH2), a carboxyl group (COOH) and a side chain. The differences
among amino acids, then, stem from differences in their side chains, namely,
in shape, size and polarity. Shape and size affect the packing together of
amino acids in the final molecule. Polarity (or lack of it) determines the
nature and strength of interactions between amino acids in a protein and between
the protein and water.
For instance, polar amino acids interact strongly with one another in what
are called electrostatic interactions. The molecules are considered polar
if they carry a formal charge (owing to the loss or gain of one or more electrons)
or if they are electrically neutral overall but have localized regions where
positive or negative charges predominate. (Positive charges are contributed
by protons in atomic nuclei, negative charges by electrons surrounding the
nuclei.) Molecules are attracted when their oppositely charged regions are
close; they are repelled when like charged regions are close.
Nonpolar amino acids can also attract or repel one another, albeit more weakly,
because of what are called van der Waals forces. Electrons and protons vibrate
constantly, and the vibrations result in attractions between substances that
are near one another. The attraction turns into repulsion when the substances
are about to touch.
In aqueous solution, polar amino acids tend to be hydrophilic; they attract
water molecules, which are quite polar. In contrast, nonpolar amino acids,
which generally include hydrocarbon side chains, tend to be hydrophobic: they
mix poorly with water and "prefer" to associate with one another.
Alternatively, one can think of them as being squeezed out of the water as
a consequence of the strong attraction between polar substances.
The peptide bond linking one amino acid to the next in a sequence influences
folding as well; it markedly constrains the universe of possible conformations
that can be taken by the protein backbone (the repeating series of alpha carbons,
carboxyl carbons and amino nitrogens in a peptide chain). A peptide bond forms
when the carboxyl carbon of one amino acid binds with the amino nitrogen of
the next, releasing a molecule of water. The resulting strong linkage between
the connected amino acids-called residues once they are joined-is quite rigid.
Consequently, rotation about the peptide bond is severely limited. Indeed,
the atoms lying between alpha carbons are held in a single plane, so that
they essentially form a stiff plate. Folding of the peptide backbone is therefore
accomplished mainly by rotation of the plates around other bonds, namely, those
connecting the plates to the alpha carbons.
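The rotations about the two bonds flanking each alpha carbon are conventionally described by dihedral (torsion) angles, usually named phi and psi; those names are standard nomenclature rather than terms used in this article. A minimal sketch, assuming NumPy and invented coordinates, of how such an angle might be computed from four consecutive backbone atoms:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, for rotation about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1            # bond pointing back from the central bond
    b1 = p2 - p1            # the central bond (rotation axis)
    b2 = p3 - p2            # bond pointing forward from the central bond
    b1 = b1 / np.linalg.norm(b1)
    # Remove the components along the axis, leaving the parts that rotate.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Hypothetical coordinates: four atoms forming a quarter turn about the axis.
angle = dihedral([1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 1])
print(f"dihedral angle: {angle:.1f} degrees")
```

Because the peptide plates themselves are rigid, a list of such angle pairs, one pair per residue, is nearly enough to specify the course of the whole backbone.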
Examination of the peculiarities of denatured, or unfolded, proteins has added
still other hints to how folding is accomplished. Unfolded or newly formed
proteins are often called random coils, implying that no region of the backbone
looks significantly different from any other region. In fact, the chains are
probably never truly without some regions that are twisted, associated or
otherwise different from the rest of the molecule. Certain of these substructures,
which are probably unstable and fluctuating, might well serve as "seeds,"
around which stably sculpted regions eventually form.
Significantly more is known about the folded than the unfolded state. For
instance, most of the backbone of the compact, native molecule can be divided
into regions of secondary structure, which are distinct segments having characteristic
shapes. (The amino acid sequence is the primary structure.) The secondary
elements fall into three main categories: helices (mainly the so-called alpha
helix), beta strands or beta sheets, and turns connecting the helices and
strands [see illustration on opposite page]. In beta strands the backbone
is extended, or stretched out; in beta sheets two or more parallel or antiparallel
strands are arranged in rows.
|
AMINO ACIDS (a) are linked together in a protein (b) by a strong bond that forms between the carboxyl carbon of one amino acid and the amino nitrogen of the next. Because the resulting linkage, which is known as a peptide bond, holds the joined atoms rigidly in a plane, the bond limits the conformational options of the protein. Folding is accomplished mainly by rotation about the axes of the bonds connecting the central alpha carbon with the amino nitrogen and carboxyl carbon. |
Secondary
elements can combine with one another to form motifs, or supersecondary structures,
and the final assembly of all secondary elements is the tertiary structure.
Several tertiary classes have been identified, such as the all alpha-helix
class, the all beta-strand class and particular arrangements of combinations
of helices and beta strands.
The presence of different secondary elements raises the possibility that certain
amino acids favor development of specific secondary arrangements. For example,
some amino acid residues are found more often in helices than elsewhere, whereas
others tend to be found in beta sheets. On the other hand, none of these or
other similar statistical correlations are strong.
Several other discoveries show that, as might be expected from the hydrophobic
and hydrophilic properties of the amino acids, the tendency of water and nonpolar
residues to avoid one another has a profound effect on the final shape of
a protein. The interior of native proteins is largely free of water and contains
mainly nonpolar, hydrophobic amino acids. Conversely, residues with formal
charges almost invariably reside at the surface, in contact with water. Polar
residues are found on both the outside and the inside, but in the interior
they are invariably joined to other polar groups by hydrogen bonds. Such bonding,
in which two atoms (usually nitrogen and oxygen) are joined through a shared
hydrogen atom, apparently enables the residues to remain comfortable in the
interior, away from water.
Hence, one rule of folding seems to be that contact between water and hydrophobic
amino acids must be limited as much as possible, although this general rule
is not sufficient to predict which specific residues will appear where. For
example, it is not possible to identify which nonpolar residues will remain
at the surface, as some fraction of them invariably do.
|
DIFFERENCES
in the shape, size and polarity of amino acids derive from differences
in their side chains. In phenylalanine, for example, the side chain
is nonpolar and cyclic, whereas the side chain of arginine is both strongly
polar and linear.
|
Another general
rule, based on other analyses, posits important steric constraints. The final
product has to be packed efficiently, that is, the space must be filled without
having neighboring atoms overlap. Structural studies show that the amino acids
in folded proteins are generally packed about as tightly as other small organic
molecules pack together. Computer modelers can safely assume that in the final
protein (with a few rare exceptions), the lengths of bonds between atoms and
the angles between consecutive bonds will be identical to those that have
been found in smaller organic molecules.
Researchers agree on details of the structure of folded proteins, but they
diverge on many other points. There is, for instance, little agreement on
the nature and number of folding pathways.
At one extreme is the doubtful suggestion that a newly made protein tries
out all possible conformations until it finds the unique, stable structure
of the native protein. This proposal assumes that all conformations are equally
likely to be tested; yet they are not. Also, as was pointed out years ago
by Cyrus Levinthal, then at the Massachusetts Institute of Technology, no
molecule would have the opportunity to test anywhere near all the possible
conformations in the time it takes for proteins to fold-a few seconds at most.
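Levinthal's objection can be made concrete with back-of-the-envelope arithmetic. The figures below (a 100-residue chain, three conformations per residue and 10^13 conformations tried per second) are illustrative assumptions, not numbers taken from this article:

```python
def exhaustive_search_years(n_residues=100, states_per_residue=3,
                            samples_per_second=1e13):
    """Time needed to try every backbone conformation once.

    All three default values are assumptions chosen for illustration.
    """
    conformations = states_per_residue ** n_residues
    seconds = conformations / samples_per_second
    years = seconds / (365.25 * 24 * 3600)
    return conformations, years

conformations, years = exhaustive_search_years()
print(f"{conformations:.2e} conformations, about {years:.1e} years to try them all")
```

Even with these generous assumptions the search would take vastly longer than the age of the universe, whereas real proteins fold in seconds or less.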
At the other extreme is the notion that proteins follow a single, defined
pathway: every molecule of a given protein becomes compacted by following
one defined sequence of steps. Considering the great number of conformations
an unfolded molecule can adopt, that idea seems improbable as well. This hypothesis
is akin to the proposition that everyone will enter New York City via Interstate
95, regardless of where they start.
A third suggestion, which admits of one or more pathways, assumes the hydrophobic
effect is all important initially, much more so than electrostatic interactions
or space-filling concerns. This idea holds that the chain collapses rapidly
to approximately its final density in order to remove hydrophobic amino acids
from water. Then, in this much reduced space, it rapidly reorganizes itself
into the correct secondary and tertiary structures. From a mechanical point
of view, this scenario seems unlikely because the chain would have to open
up somewhat to permit the required movements. Nevertheless, the model has
some experimental support.
The best guess today is that secondary structure forms before most proteins
are able to compact extensively. Molecules of the same protein can follow
different pathways to the same end, but the choice of pathway is limited.
Various models along these lines have been proposed, including what is called
the framework model of Robert L. Baldwin of Stanford University and Peter
S. Kim, now at the Whitehead Institute for Biomedical Research.
In general, such models suggest that the unfolded chain rapidly forms marginally
stable bits of secondary structure. Some of these segments interact. If they
pack together particularly well, or form bonds readily, they stabilize one
another, at least for a time. The stabilized units, or microdomains, lead
the molecule toward greater structural organization by associating with other
segments or helping to bring distant segments into contact, or both.
Inherent in this kind of model is the assumption that the hydrophobic effect
is large but can be spent incrementally. Some fraction of its energy is expended
to influence the formation of secondary elements, and the rest promotes the
association of those elements into the tertiary conformation.
Knowing something about the structures that repeatedly appear en route to
the native state would help to clarify the rules of folding. Regrettably,
trapping intermediates is difficult, in part because folding is a highly cooperative
process. Interactions that promote folding by one part of the protein also
promote folding elsewhere in the molecule; hence, intermediate shapes do not
persist for long. Nevertheless, clever techniques have captured or identified
some characteristics of a number of intermediates.
There is now firm evidence, for example, that certain proteins form an intermediate
that is larger than the native form of the protein and has its secondary structure
intact. Oleg Ptitsyn of the Institute for Protein Chemistry at Pushchino in
the Soviet Union calls this structure the "molten globule." The
existence of such a structure is puzzling, however.

Because the globule is larger in volume than the native molecule, it must
contain a considerable amount of water, and many of the side chains in the
globule do seem to be in contact with water. Yet the force of the hydrophobic
effect should be squeezing this water out. How can one have a stable, observable
intermediate under these conditions? What can its structure actually be? These
intriguing questions cannot yet be answered.
In other experiments, Thomas E. Creighton and his colleagues at the Medical
Research Council Laboratory of Molecular Biology in Cambridge, England, have
studied the folding of the protein pancreatic trypsin inhibitor (PTI), which,
like many proteins, forms internal disulfide bonds as it folds. A disulfide
bond is a sulfur-sulfur (S-S) linkage between the side chains of two cysteine
amino acid residues. Creighton and his co-workers unfolded the native product
and then started the folding reaction, interrupting it at different intervals.
They thereby captured intermediates that could be identified by a particular
disulfide bond. In this way, a folding pathway was tracked for the first time.
The complete structures of the intermediates are not yet known in detail,
but the work has revealed that folding does not necessarily proceed along
a single, direct track. As PTI folds, intermediates having disulfide bonds
that do not exist in the final molecule appear and then disappear. In other
words, parts of the molecule apparently act something like a party host who
brings two well-matched strangers together and then, when the two are engrossed
in conversation, leaves them and mixes with other guests.

Studying a major proposed intermediate of the same PTI protein, Kim and Terrence
G. Oas, also at the Whitehead Institute, have found evidence that even though
some parts of a molecule associate only transiently, other parts probably
do form structures that remain stable. From Creighton's work, they knew that
two specific stretches of the molecule become connected by a disulfide bond
early in the folding process and that the bond persists. They wondered whether
the supersecondary structure in the region around the bond also formed early
and persisted.
To answer the question, they chemically synthesized two separate fragments
of the protein, each including one of the two cysteines that participate in
the stable disulfide bond. The small peptides had no discernible structure
of their own, but when they joined in solution, they adopted a conformation
closely resembling that seen in the native chain.
This finding confirms that nativelike structures can indeed form early, and
it suggests that certain parts of the molecule may be more important than
other parts in initiating folding. The result also indicates that interactions
between apparently unstructured segments of a protein may facilitate the development
of secondary structure.
Intermediates are being studied by another ingenious method that capitalizes
on the many internal hydrogen bonds found in all native proteins. First, normal
hydrogen atoms bound to the nitrogen involved in peptide bonds are exchanged
with a related atom, the hydrogen isotope deuterium (D), by placing the chains
in heavy water, D2O. Then folding is initiated.
As folding proceeds, what would have been hydrogen bonds become "deuterium"
bonds (N-D-O) instead. At some chosen time, normal water (H2O) is substituted
for the heavy water. When that happens, any deuterium atoms not protected
by being in deuterium bonds trade places with hydrogen from the water. Folding
then continues to completion.
By identifying the regions of the compacted molecule that contain protected
deuterium, one can determine which parts folded before the others. Moreover,
a series of tests that progressively lengthen the time of transfer to water
can potentially reveal the order in which several different intermediates
form.
With this technique, Heinrich Roder of the University of Pennsylvania was
able to show that a first step in the folding of the protein cytochrome c
is the association of two helices at opposite ends of its chain. In a study
of ribonuclease, Baldwin and his Stanford colleague Jayant B. Udgaonkar showed
that the beta-sheet part of that enzyme, found in the middle of the molecule, forms
early. Such findings by themselves have yet to yield general rules of folding,
but they do highlight the power of the method for identifying folding pathways.
|
PLAUSIBLE
MODEL of how proteins fold allows for several energetically favorable
pathways, although only two possibilities are shown. First the chain
forms regions of unstable structure (uncolored cylinders). By associating,
certain regions become stabilized (color). These stabilized microdomains
then facilitate the association of other regions and thus lead the molecule
toward increasing structural organization. Eventually, all pathways
lead to one or more "rate-limiting" intermediates, which all
give rise to the same final conformation for the protein.
|
Experimentation is not by any means limited to studies of intermediates. A
number of scientists are approaching the folding problem by wielding genetic
engineering technology as a tool to examine the effects of amino acid substitutions,
deletions and insertions, both on protein structure and on the folding process.
So far the experimental data, as well as computer simulations of amino acid
substitutions, indicate that replacement of one or even several residues usually
does not interfere with the development of proper architecture. In other words,
the answer to the folding problem lies not with a few key amino acids but
with some more global aspect of the amino acid sequence. In contrast, only
a small part of the entire molecule may be responsible for a protein's activity.
Single amino acid substitutions in the active region can dramatically affect
biological behavior even when the overall structure of the protein seems unaffected.
In a different kind of experiment, Siew Peng Ho and William F. DeGrado of
the du Pont Company and, separately, Jane S. and David C. Richardson of Duke
University are trying to design fully original proteins that will fold into
selected conformations. In this way, they are testing various hypotheses,
such as the proposal that certain sequences of hydrophobic and hydrophilic
amino acids are likely to form an alpha helix and then a cluster of interdigitating
helices (a helix bundle).
They have succeeded in making proteins of a specified architecture. Nevertheless,
researchers are still far from being able to predict the tertiary structure
of any given protein on the basis of its amino acid sequence if they lack
other information about the substance.
Theoretical endeavors complement the
experimental work. For example, the shape of a folded protein might in principle
be determined by a mathematical formula known as the potential-energy function.
One feeds a computer a host of numerical values that describe the strength
and other aspects of the attractions between all pairs of atoms in the protein
chain. Then the computer adjusts the coordinates of the atoms so that the
overall energy is lowered until a minimum is found-that is, until all further
changes result in an increase in energy. (The final structures of proteins
are generally assumed to represent the minimum-energy state.)
The function takes into account such factors as the influence on energy of
the length, stretching and twisting of bonds and the strength of electrostatic
interactions, hydrogen bonds and van der Waals forces. The approach has been
valuable for confirming or improving models of structures that were determined
experimentally.
For molecules whose final structures are a complete mystery, however, problems
arise. Certain of the numbers plugged into the equations may have large margins
of error. Furthermore, there is no way to know whether the reported energy
minimum represents the absolute minimum or simply an intermediate low-energy
state. At the moment, theory does not provide any way of ascertaining what
the absolute minimum value ought to be.
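The difficulty with local minima can be seen even in one dimension. The sketch below uses an invented energy function with two wells in place of a real potential-energy function, and plain gradient descent in place of whatever minimizer a modeling program would actually use; both are assumptions made purely for illustration:

```python
def energy(x):
    """Made-up one-dimensional 'energy surface' with two minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Slope of the made-up energy surface."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def minimize(x, step=0.01, n_steps=5000):
    """Plain gradient descent: keep moving downhill until the slope vanishes."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

for start in (-2.0, 2.0):
    x_min = minimize(start)
    print(f"start {start:+.1f} -> minimum at x = {x_min:+.3f}, "
          f"energy = {energy(x_min):+.3f}")
```

The two starting points settle into different wells, and nothing in either run reveals that a deeper well exists elsewhere. A protein presents the same difficulty in thousands of dimensions.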
In a related approach that might eventually yield motion pictures of proteins
in the act of folding, Martin Karplus of Harvard University applies Newton's
laws of motion to the atoms in a protein. The forces on the atoms of
a molecule in a given state are derived from the potential-energy function.
Then the computer calculates the acceleration of each atom and its displacement
at the end of an extremely short interval.
|
CONCEPT
OF WATER-ACCESSIBLE SURFACE enables one to estimate the force exerted
by water on a molecule in aqueous solution. The accessible surface (outer
line)-shown for a thin slice through a protein-is determined by tracing
the path of the center of an imaginary water molecule as it rolls along
the protein's external atoms. The surface is conceptually divided into
water-loving (colored line) and water-hating (black line) parts, depending,
respectively, on whether the atoms are polar or nonpolar. A large water-hating
surface corresponds to a strong compressive force, and a large water-loving
surface corresponds to a strong expansive force. The compressive force
dominates when protein molecules are folding.
|
By repeating
the process over a period controlled by the available computing power, the program
can reveal movements of the individual atoms. Consequently, it is now becoming
possible to identify the effects of small mutations on protein stability and
dynamic behavior. Yet limits on computing power make it impossible to track
more than a few nanoseconds in a molecule's life, a span too short to directly
reveal much about protein folding.
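The cycle of force, acceleration and displacement can be sketched for the simplest possible case: one particle on a spring. The article does not say which integration scheme is used in practice; the velocity Verlet scheme below is an assumption, as are the harmonic force and the unit mass and time step:

```python
def force(x, k=1.0):
    """Force derived from a hypothetical harmonic potential E = 0.5 * k * x**2."""
    return -k * x

def simulate(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000):
    """Advance one particle through n_steps short intervals (velocity Verlet)."""
    trajectory = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # displacement over one interval
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
        trajectory.append(x)
    return trajectory, v

trajectory, v_end = simulate()
total_energy = 0.5 * v_end ** 2 + 0.5 * trajectory[-1] ** 2
print(f"final position {trajectory[-1]:+.4f}, total energy {total_energy:.6f}")
```

Each pass through the loop covers a single short interval, which is why the number of steps, and hence the computing power available, sets the length of molecular time that can be followed.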
In spite of its limitations, theoretical work based on the potential-energy
function holds much promise. Studies involving the function should make it possible
to discern the relative importance of various forces acting on a protein, such
as electrostatic interactions and van der Waals repulsions. Teasing apart the
influences is critical because a folded protein is only marginally more stable
than an unfolded one. Hence, the factors that make the difference are likely
to be subtle. (The slight energy differential between the stable and unstable
state might reflect the need for a cell to inactivate proteins rapidly as its
needs change.)
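What marginal stability means for the balance between folded and unfolded molecules can be illustrated with a simple two-state equilibrium. The two-state model and the free-energy values tried below are assumptions for illustration, not measurements reported in this article:

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_folded(delta_g_unfolding, temperature=298.0):
    """Fraction of molecules folded at equilibrium in a two-state model.

    delta_g_unfolding is the free energy of unfolding in kcal/mol;
    a positive value means the folded state is the more stable one.
    """
    k_unfold = math.exp(-delta_g_unfolding / (R_KCAL * temperature))
    return 1.0 / (1.0 + k_unfold)

# Hypothetical stabilities, in kcal/mol.
for delta_g in (0.0, 1.0, 3.0, 10.0):
    print(f"{delta_g:5.1f} kcal/mol -> fraction folded {fraction_folded(delta_g):.6f}")
```

Because the dependence is exponential, a change of only a few kilocalories per mole, comparable to the contribution of a handful of weak interactions, is enough to shift the population between mostly unfolded and almost entirely folded.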
In the long run, calculations involving the potential-energy function may well
succeed in predicting the tertiary structure of any protein from its amino acid
sequence. In the meantime, other less fundamental but still useful ways of thinking
about the folding problem have emerged.
Any resolution of that puzzle will have to include a way of defining the force
exerted on a protein molecule by water. In principle, estimates of the hydrophobic
effect are, or can be, embedded in the potential-energy function, but exactly
how best to accomplish that step is far from clear.
One method for analyzing the effect of water has emerged from work done by Byungkook
Lee at Yale University in 1971. Lee developed an algorithm to calculate the
solvent-accessible area of a protein of known structure-that part of the complex
surface in direct contact with water. On the basis of preliminary findings,
he and I suggested the algorithm would be useful in studying protein folding.
We divide the accessible area of an extended protein chain (or any selected
molecule) according to the nature of the atoms that contribute to the area.
Are they nonpolar and therefore hydrophobic (mainly carbon and sulfur atoms),
or are they polar and therefore hydrophilic (mainly nitrogen and oxygen atoms)?
The surface tension of water in contact with such atoms is known. This tension
is, as Cyrus Chothia of the Medical Research Council has pointed out, a direct
measure of the force exerted on the molecule by the solvent. Surface tension
is high when nonpolar molecules and water are in contact, just as it is when
oil is mixed with water-that is, a strong force tends to reduce the area of
contact between the water and the oil, and to squeeze a protein chain into a
ball. Tension is low when polar atoms and water are in contact, and the hydrophobic
effect is not seen.
Summation of the nonpolar accessible areas of an unfolded chain yields a measure
of the potential hydrophobic effect. In general, as might be expected from structural
analyses, the net force acting on most protein chains is large and positive,
tending to reduce contact with the solvent and thus to compact the chain.
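The accessible-area bookkeeping described above can be sketched in a few lines. The code below is not Lee's 1971 algorithm; it substitutes a simpler point-sampling approximation (in the style of the Shrake-Rupley method), in which test points are scattered over each atom's probe-expanded sphere and those falling inside a neighboring atom's expanded sphere are discarded. The atom coordinates, radii and polarity labels are invented for illustration:

```python
import numpy as np

PROBE_RADIUS = 1.4  # angstroms; a commonly used radius for a water-sized probe

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((np.cos(azimuth) * np.sin(polar),
                            np.sin(azimuth) * np.sin(polar),
                            np.cos(polar)))

def accessible_areas(coords, radii, n_points=960):
    """Approximate solvent-accessible area of each atom, in square angstroms."""
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    unit = sphere_points(n_points)
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        r_i = radii[i] + PROBE_RADIUS
        points = coords[i] + r_i * unit          # where a probe center could sit
        exposed = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j == i:
                continue
            r_j = radii[j] + PROBE_RADIUS
            # A point is buried if it lies inside a neighbor's expanded sphere.
            exposed &= np.linalg.norm(points - coords[j], axis=1) >= r_j
        areas[i] = 4.0 * np.pi * r_i ** 2 * exposed.mean()
    return areas

def split_by_polarity(areas, is_polar):
    """Return (nonpolar area, polar area) summed over all atoms."""
    is_polar = np.asarray(is_polar, dtype=bool)
    return areas[~is_polar].sum(), areas[is_polar].sum()

# Hypothetical three-atom fragment: carbon, nitrogen, oxygen.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.4, 0.0]]
radii = [1.7, 1.55, 1.52]          # assumed van der Waals radii, angstroms
is_polar = [False, True, True]     # carbon nonpolar; nitrogen and oxygen polar
nonpolar, polar = split_by_polarity(accessible_areas(coords, radii), is_polar)
print(f"nonpolar area {nonpolar:.1f}, polar area {polar:.1f} (square angstroms)")
```

Summing the nonpolar term over an extended chain and again over the folded structure gives a rough measure of how much water-hating surface is buried by folding.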
Various investigators are also examining the extent to which packing considerations
direct folding. In one approach, lists have been made of the amino acid sequences
of molecules that adopt essentially the same three-dimensional conformation.
On the basis of the steric properties of the amino acids in the molecules-such
as shape and volume-Jay W. Ponder of Yale has generated other lists of amino
acid sequences that theoretically should adopt the same conformations.
Just how well those sequences actually fit their assigned classes is still being
determined experimentally, but many do seem to fit. This finding, together with
the profound influence of water, makes it conceivable that the hydrophobic effect
and steric considerations by themselves determine how a protein folds.
If that is the case, what is the role of long- and short-range electrostatic
interactions in protein folding? Undoubtedly, the contribution of such interactions
varies from protein to protein. For many proteins, large changes in the formal
charges can be made without significantly affecting the final overall structure.
Hence, it may be that electrostatic interactions are often more important for
stabilizing the final conformation than for forming it in the first place.
| CONFORMATION of a fold in the interior of the protein crambin (left), depicted mainly as a chain of alpha carbons (orange), derives from the tight packing of five nonpolar amino acids (blue spheres). That conformation is maintained in a computer-generated "mutant" (right) even when four of the five amino acids are replaced with others. Indeed, many combinations of amino acids can be accommodated if the substitutes resemble the originals in shape and volume. Knowledge of how amino acids pack may go a long way toward predicting the shape of a protein. |
Determining whether this possibility is correct requires an ability to gauge
the strength of electrostatic interactions. Yet the mathematics is complicated
by the fact that atoms in a folding protein are often separated by water, which
can mute the long-distance attractions or repulsions in ways difficult to estimate
in the absence of detailed structural information. Moreover, as the protein
folds, the distances between the atoms constantly change, which adds further
complexity.
The precise effects of hydrophobic, steric and electrostatic interactions, then,
remain a matter of conjecture. Research into protein folding, however, is proceeding
enormously faster today than in the past. Those of us involved in the effort
still cannot "play the music," but we are rapidly learning certain
of the notes. That progress alone is heartening, as is knowing that a solution
to the folding problem will resolve a question of deep scientific interest and,
at the same time, have immediate application in biotechnology.
|
FREDERIC
M. RICHARDS is Sterling Professor of Molecular Biophysics and Biochemistry
at Yale University. Richards joined the Yale faculty in 1955, three years
after earning his doctorate from Harvard University.
|