Sc. Am Jan. 1991
In theory, all one needs to know in order to fold a protein into its biologically active shape is the sequence of its constituent amino acids. Why has nobody been able to put theory into practice?
by Frederic M. Richards
FOLDING of the protein thioredoxin is generally representative of how other small proteins fold: the initially open, unstable chain (a) becomes increasingly compact (b and c), ultimately adopting a spherical shape (d). The intermediate stages shown are hypothetical, because their shapes, and those of the intermediates of most proteins, are not known fully. White represents carbon; red, oxygen; blue, nitrogen; and yellow, sulfur.
In the late 1950s Christian B. Anfinsen and his colleagues at the National Institutes of Health made a remarkable discovery. They were exploring a long-standing puzzle in biology: What causes newly made proteins, which resemble loosely coiled strings and are inactive, to wind into specifically shaped balls able to perform crucial tasks in a living cell? In the process the team found the answer was simpler than anyone had imagined.
It seemed the amino acid sequence of a protein, a one-dimensional trait, was fully sufficient to specify the molecule's ultimate three-dimensional shape and biological activity. (Proteins are built from a set of just 20 amino acids, which are assembled into a chain according to directions embedded in the genes.) Outside factors, such as enzymes that might catalyze folding, did not have to be invoked as mandatory participants.
The discovery, which has since been confirmed many times (at least for relatively small proteins), suggested that the forces most responsible for proper folding in the cell could, in theory, be derived from the basic principles of chemistry and physics. That is, if one knew the amino acid sequence of a protein, all that would have to be considered would be the properties of the individual amino acids and their behavior in aqueous solution. (The interior of most cells is 70 to 90 percent water.)
In actuality, predicting the conformation of a protein on the basis of its amino acid sequence is far from simple. More than 30 years after Anfinsen made his breakthrough, hundreds of investigators are still at work on that challenge, which has come to be widely known as the protein folding problem. The solution is of more than academic interest. Many major hoped-for products of the developing biotechnology industry are novel proteins. It is already possible to design genes to direct the synthesis of such proteins. Yet failure to fold properly is a common production concern.
For a time, those of us working on the folding problem despaired of ever finding an answer. More recently, however, advances in theory and experiment, combined with growing interest on the part of industry, have kindled new optimism.
Most of the detailed information available so far comes from studies on small, water-soluble, globular proteins containing fewer than 300 or so amino acids. The relative importance of various rules of folding and assembly may be somewhat different for those proteins than for others, notably long fibrous proteins and varieties residing in cellular membranes. Indeed, some large proteins have recently been shown to need folding help from other proteins known as chaperonins. The balance of the article will not consider such complexities but will focus entirely on the unassisted folding reaction undergone by a great many proteins.
It would be wonderful if researchers had an atomic-level microscope that could take a movie of individual protein molecules folding up from their extended, unstable state to their final, or native, state, which is more stable. From a collection of movies, all aspects of the reaction pathways could be seen directly. Unfortunately, no such instrument exists; investigators must fall back on much less direct measurements and very careful reasoning.
One can gather helpful clues to the rules of folding by examining the three-dimensional structures of unfolded and fully folded proteins and by analyzing the properties of individual amino acids and small peptides (linear chains of amino acids). Fortunately, the architecture of hundreds of native proteins has been determined by such imaging techniques as X-ray crystallography and, more recently, nuclear magnetic resonance (NMR). Both techniques have advanced dramatically in the past decade, as has theoretical work attempting to predict folding mathematically by computer.
Isolated amino acids consist of a central carbon atom (called the alpha carbon) bound to an amino group (NH2), a carboxyl group (COOH) and a side chain. The differences among amino acids, then, stem from differences in their side chains, namely, in shape, size and polarity. Shape and size affect the packing together of amino acids in the final molecule. Polarity (or lack of it) determines the nature and strength of interactions between amino acids in a protein and between the protein and water.
For instance, polar amino acids interact strongly with one another in what are called electrostatic interactions. The molecules are considered polar if they carry a formal charge (owing to the loss or gain of one or more electrons) or if they are electrically neutral overall but have localized regions where positive or negative charges predominate. (Positive charges are contributed by protons in atomic nuclei, negative charges by electrons surrounding the nuclei.) Molecules are attracted when their oppositely charged regions are close; they are repelled when like charged regions are close.
Nonpolar amino acids can also attract or repel one another, albeit more weakly, because of what are called van der Waals forces. Electrons and protons vibrate constantly, and the vibrations result in attractions between substances that are near one another. The attraction turns into repulsion when the substances are about to touch.
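The two kinds of interaction just described can be given a rough numerical form. The sketch below is an illustration only, not a method taken from this article: it uses Coulomb's law for the electrostatic term and the Lennard-Jones 12-6 expression, a common stand-in for van der Waals forces. The function names and the parameter values (epsilon, sigma, the dielectric constant) are assumptions chosen for the example.

```python
# Conversion factor so that charges in units of e and distances in
# angstroms give energies in kcal/mol.
COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy of two point charges r angstroms apart.

    Negative values mean attraction, positive values mean repulsion.
    """
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def lennard_jones_energy(r, epsilon=0.1, sigma=3.4):
    """12-6 energy: weakly attractive at moderate r, steeply repulsive
    once the atoms are about to touch (r below sigma).

    epsilon (well depth) and sigma (contact distance) are illustrative
    values, not measured ones.
    """
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


if __name__ == "__main__":
    print("opposite charges:", coulomb_energy(+1, -1, 4.0))
    print("like charges:    ", coulomb_energy(+1, +1, 4.0))
    print("same pair in a water-like medium:",
          coulomb_energy(+1, -1, 4.0, dielectric=80.0))
    for r in (3.0, 3.4, 3.8, 5.0):
        print(f"r = {r:.1f}  van der Waals energy = "
              f"{lennard_jones_energy(r):+.4f}")
```

With these assumed parameters the van der Waals well is far shallower than the charge-charge energy at the same distance, which is the sense in which nonpolar attractions are the weaker of the two.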
In aqueous solution, polar amino acids tend to be hydrophilic; they attract water molecules, which are quite polar. In contrast, nonpolar amino acids, which generally include hydrocarbon side chains, tend to be hydrophobic: they mix poorly with water and "prefer" to associate with one another. Alternatively, one can think of them as being squeezed out of the water as a consequence of the strong attraction between polar substances.
The peptide bond linking one amino acid to the next in a sequence influences folding as well; it markedly constrains the universe of possible conformations that can be taken by the protein backbone (the repeating series of alpha carbons, carboxyl carbons and amino nitrogens in a peptide chain). A peptide bond forms when the carboxyl carbon of one amino acid binds with the amino nitrogen of the next, releasing a molecule of water. The resulting strong linkage between the connected amino acids (called residues once they are joined) is quite rigid.
Consequently, rotation about the peptide bond is severely limited. Indeed, the atoms lying between alpha carbons are held in a single plane, so that they essentially form a stiff plate. Folding of the peptide backbone is therefore accomplished mainly by rotation of the plates around other bonds, namely, those connecting the plates to the alpha carbons.
Examination of the peculiarities of denatured, or unfolded, proteins has added still other hints to how folding is accomplished. Unfolded or newly formed proteins are often called random coils, implying that no region of the backbone looks significantly different from any other region. In fact, the chains are probably never truly without some regions that are twisted, associated or otherwise different from the rest of the molecule. Certain of these substructures, which are probably unstable and fluctuating, might well serve as "seeds," around which stably sculpted regions eventually form.
Significantly more is known about the folded than the unfolded state. For instance, most of the backbone of the compact, native molecule can be divided into regions of secondary structure, which are distinct segments having characteristic shapes. (The amino acid sequence is the primary structure.) The secondary elements fall into three main categories: helices (mainly the so-called alpha helix), beta strands or beta sheets, and turns connecting the helices and strands [see illustration on opposite page]. In beta strands the backbone is extended, or stretched out; in beta sheets two or more parallel or antiparallel strands are arranged in rows.
AMINO ACIDS (a) are linked together in a protein (b) by a strong bond that forms between the carboxyl carbon of one amino acid and the amino nitrogen of the next. Because the resulting linkage, which is known as a peptide bond, holds the joined atoms rigidly in a plane, the bond limits the conformational options of the protein. Folding is accomplished mainly by rotation about the axes of the bonds connecting the central alpha carbon with the amino nitrogen and carboxyl carbon.
Secondary elements can combine with one another to form motifs, or supersecondary structures, and the final assembly of all secondary elements is the tertiary structure. Several tertiary classes have been identified, such as the all alpha-helix class, the all beta-strand class and particular arrangements of combinations of helices and beta strands.
The presence of different secondary elements raises the possibility that certain amino acids favor development of specific secondary arrangements. For example, some amino acid residues are found more often in helices than elsewhere, whereas others tend to be found in beta sheets. On the other hand, none of these or other similar statistical correlations are strong.
Several other discoveries show that, as might be expected from the hydrophobic and hydrophilic properties of the amino acids, the tendency of water and nonpolar residues to avoid one another has a profound effect on the final shape of a protein. The interior of native proteins is largely free of water and contains mainly nonpolar, hydrophobic amino acids. Conversely, residues with formal charges almost invariably reside at the surface, in contact with water. Polar residues are found on both the outside and the inside, but in the interior they are invariably joined to other polar groups by hydrogen bonds. Such bonding, in which two atoms (usually nitrogen and oxygen) are joined through a shared hydrogen atom, apparently enables the residues to remain comfortable in the interior, away from water.
Hence, one rule of folding seems to be that contact between water and hydrophobic amino acids must be limited as much as possible, although this general rule is not sufficient to predict which specific residues will appear where. For example, it is not possible to identify which nonpolar residues will remain at the surface, as some fraction of them invariably do.
DIFFERENCES in the shape, size and polarity of amino acids derive from differences in their side chains. In phenylalanine, for example, the side chain is nonpolar and cyclic, whereas the side chain of arginine is both strongly polar and linear.
Another rule, based on other analyses, posits important steric constraints. The final product has to be packed efficiently, that is, the space must be filled without having neighboring atoms overlap. Structural studies show that the amino acids in folded proteins are generally packed about as tightly as other small organic molecules pack together. Computer modelers can safely assume that in the final protein (with a few rare exceptions), the lengths of bonds between atoms and the angles between consecutive bonds will be identical to those that have been found in smaller organic molecules.
Researchers agree on details of the structure of folded proteins, but they diverge on many other points. There is, for instance, little agreement on the nature and number of folding pathways.
At one extreme is the doubtful suggestion that a newly made protein tries out all possible conformations until it finds the unique, stable structure of the native protein. This proposal assumes that all conformations are equally likely to be tested; yet they are not. Also, as was pointed out years ago by Cyrus Levinthal, then at the Massachusetts Institute of Technology, no molecule would have the opportunity to test anywhere near all the possible conformations in the time it takes for proteins to fold: a few seconds at most.
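Levinthal's objection comes down to simple arithmetic, which can be sketched as follows. The specific numbers are assumptions chosen for illustration and do not come from this article: three backbone conformations per residue, a chain of 100 residues, and a generous sampling rate of ten trillion conformations per second.

```python
SECONDS_PER_YEAR = 3.156e7


def exhaustive_search_seconds(n_residues,
                              states_per_residue=3,
                              samples_per_second=1e13):
    """Time needed to visit every conformation exactly once.

    All three parameters are illustrative assumptions; changing them
    shifts the answer by many orders of magnitude but not the conclusion.
    """
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second


if __name__ == "__main__":
    for n in (20, 50, 100):
        years = exhaustive_search_seconds(n) / SECONDS_PER_YEAR
        print(f"{n:3d} residues: about {years:.1e} years")
```

Under these assumptions a 100-residue chain would need on the order of 10^27 years to try every conformation, against the few seconds in which real proteins fold. The search cannot be random.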
At the other extreme is the notion that proteins follow a single, defined pathway: every molecule of a given protein becomes compacted by following one defined sequence of steps. Considering the great number of conformations an unfolded molecule can adopt, that idea seems improbable as well. This hypothesis is akin to the proposition that everyone will enter New York City via Interstate 95, regardless of where they start.
A third suggestion, which admits of one or more pathways, assumes the hydrophobic effect is all-important initially, much more so than electrostatic interactions or space-filling concerns. This idea holds that the chain collapses rapidly to approximately its final density in order to remove hydrophobic amino acids from water. Then, in this much reduced space, it rapidly reorganizes itself into the correct secondary and tertiary structures. From a mechanical point of view, this scenario seems unlikely because the chain would have to open up somewhat to permit the required movements. Nevertheless, the model has some experimental support.
The best guess today is that secondary structure forms before most proteins are able to compact extensively. Molecules of the same protein can follow different pathways to the same end, but the choice of pathway is limited. Various models along these lines have been proposed, including what is called the framework model of Robert L. Baldwin of Stanford University and Peter S. Kim, now at the Whitehead Institute for Biomedical Research.
In general, such models suggest that the unfolded chain rapidly forms marginally stable bits of secondary structure. Some of these segments interact. If they pack together particularly well, or form bonds readily, they stabilize one another, at least for a time. The stabilized units, or microdomains, lead the molecule toward greater structural organization by associating with other segments or helping to bring distant segments into contact, or both.
Inherent in this kind of model is the assumption that the hydrophobic effect is large but can be spent incrementally. Some fraction of its energy is expended to influence the formation of secondary elements, and the rest promotes the association of those elements into the tertiary conformation.
Knowing something about the structures that repeatedly appear en route to the native state would help to clarify the rules of folding. Regrettably, trapping intermediates is difficult, in part because folding is a highly cooperative process. Interactions that promote folding by one part of the protein also promote folding elsewhere in the molecule; hence, intermediate shapes do not persist for long. Nevertheless, clever techniques have captured or identified some characteristics of a number of intermediates.
There is now firm evidence, for example, that certain proteins form an intermediate that is larger than the native form of the protein and has its secondary structure intact. Oleg Ptitsyn of the Institute for Protein Chemistry at Pushchino in the Soviet Union calls this structure the "molten globule." The existence of such a structure is puzzling, however.
Because the globule is larger in volume than the native molecule, it must contain a considerable amount of water, and many of the side chains in the globule do seem to be in contact with water. Yet the force of the hydrophobic effect should be squeezing this water out. How can one have a stable, observable intermediate under these conditions? What can its structure actually be? These intriguing questions cannot yet be answered.
In other experiments, Thomas E. Creighton and his colleagues at the Medical Research Council Laboratory of Molecular Biology in Cambridge, England, have studied the folding of the protein pancreatic trypsin inhibitor (PTI), which, like many proteins, forms internal disulfide bonds as it folds. A disulfide bond is a sulfur-sulfur (S-S) linkage between the side chains of two cysteine amino acid residues. Creighton and his co-workers unfolded the native product and then started the folding reaction, interrupting it at different intervals. They thereby captured intermediates that could be identified by a particular disulfide bond. In this way, a folding pathway was tracked for the first time.
The complete structures of the intermediates are not yet known in detail, but the work has revealed that folding does not necessarily proceed along a single, direct track. As PTI folds, intermediates having disulfide bonds that do not exist in the final molecule appear and then disappear. In other words, parts of the molecule apparently act something like a party host who brings two well-matched strangers together and then, when the two are engrossed in conversation, leaves them and mixes with other guests.
Studying a major proposed intermediate of the same PTI protein, Kim and Terrence G. Oas, also at the Whitehead Institute, have found evidence that even though some parts of a molecule associate only transiently, other parts probably do form structures that remain stable. From Creighton's work, they knew that two specific stretches of the molecule become connected by a disulfide bond early in the folding process and that the bond persists. They wondered whether the supersecondary structure in the region around the bond also formed early and persisted.
To answer the question, they chemically synthesized two separate fragments of the protein, each including one of the two cysteines that participate in the stable disulfide bond. The small peptides had no discernible structure of their own, but when they joined in solution, they adopted a conformation closely resembling that seen in the native chain.
This finding confirms that nativelike structures can indeed form early, and it suggests that certain parts of the molecule may be more important than other parts in initiating folding. The result also indicates that interactions between apparently unstructured segments of a protein may facilitate the development of secondary structure.
Intermediates are being studied by another ingenious method that capitalizes on the many internal hydrogen bonds found in all native proteins. First, normal hydrogen atoms bound to the nitrogen involved in peptide bonds are exchanged with a related atom, the hydrogen isotope deuterium (D), by placing the chains in heavy water, D2O. Then folding is initiated.
As folding proceeds, what would have been hydrogen bonds become "deuterium" bonds (N-D-O) instead. At some chosen time, normal water (H2O) is substituted for the heavy water. When that happens, any deuterium atoms not protected by being in deuterium bonds trade places with hydrogen from the water. Folding then continues to completion.
By identifying the regions of the compacted molecule that contain protected deuterium, one can determine which parts folded before the others. Moreover, a series of tests that progressively lengthen the time of transfer to water can potentially reveal the order in which several different intermediates form.
With this technique, Heinrich Roder of the University of Pennsylvania was able to show that a first step in the folding of the protein cytochrome c is the association of two helices at opposite ends of its chain. In a study of ribonuclease, Baldwin and his Stanford colleague Jayant B. Udgaonkar showed that the beta-sheet part of that enzyme, found in the middle of the molecule, forms early. Such findings by themselves have yet to yield general rules of folding, but they do highlight the power of the method for identifying folding pathways.
PLAUSIBLE MODEL of how proteins fold allows for several energetically favorable pathways, although only two possibilities are shown. First the chain forms regions of unstable structure (uncolored cylinders). By associating, certain regions become stabilized (color). These stabilized microdomains then facilitate the association of other regions and thus lead the molecule toward increasing structural organization. Eventually, all pathways lead to one or more "rate-limiting" intermediates, which all give rise to the same final conformation for the protein.
Experimentation is not by any means limited to studies of intermediates. A number of scientists are approaching the folding problem by wielding genetic engineering technology as a tool to examine the effects of amino acid substitutions, deletions and insertions, both on protein structure and on the folding process.
So far the experimental data, as well as computer simulations of amino acid substitutions, indicate that replacement of one or even several residues usually does not interfere with the development of proper architecture. In other words, the answer to the folding problem lies not with a few key amino acids but with some more global aspect of the amino acid sequence. In contrast, only a small part of the entire molecule may be responsible for a protein's activity. Single amino acid substitutions in the active region can dramatically affect biological behavior even when the overall structure of the protein seems unaffected.
In a different kind of experiment, Siew Peng Ho and William F. DeGrado of the du Pont Company and, separately, Jane S. and David C. Richardson of Duke University are trying to design fully original proteins that will fold into selected conformations. In this way, they are testing various hypotheses, such as the proposal that certain sequences of hydrophobic and hydrophilic amino acids are likely to form an alpha helix and then a cluster of interdigitating helices (a helix bundle).
They have succeeded in making proteins of a specified architecture. Nevertheless, researchers are still far from being able to predict the tertiary structure of any given protein on the basis of its amino acid sequence if they lack other information about the substance.
Theoretical endeavors complement the experimental work. For example, the shape of a folded protein might in principle be determined by a mathematical formula known as the potential-energy function. One feeds a computer a host of numerical values that describe the strength and other aspects of the attractions between all pairs of atoms in the protein chain. Then the computer adjusts the coordinates of the atoms so that the overall energy is lowered until a minimum is found, that is, until all further changes result in an increase in energy. (The final structures of proteins are generally assumed to represent the minimum-energy state.)
The function takes into account such factors as the influence on energy of the length, stretching and twisting of bonds and the strength of electrostatic interactions, hydrogen bonds and van der Waals forces. The approach has been valuable for confirming or improving models of structures that were determined experimentally.
For molecules whose final structures are a complete mystery, however, problems arise. Certain of the numbers plugged into the equations may have large margins of error. Furthermore, there is no way to know whether the reported energy minimum represents the absolute minimum or simply an intermediate low-energy state. At the moment, theory does not provide any way of ascertaining what the absolute minimum value ought to be.
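The difficulty of telling an absolute minimum from an intermediate low-energy state can be seen in a toy calculation. The sketch below is not a protein energy function: it replaces the many atomic coordinates with a single coordinate x and uses a made-up polynomial "energy" with two valleys. The functions and the step size are assumptions chosen only to show how downhill adjustment stops at whichever valley is nearest the starting point.

```python
def energy(x):
    """Made-up one-dimensional energy surface with two valleys."""
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    """Slope of the energy surface at x."""
    return 4 * x ** 3 - 6 * x + 1


def minimize(x, step=0.01, tolerance=1e-10, max_iterations=100_000):
    """Move x downhill in small steps until the slope is essentially zero."""
    for _ in range(max_iterations):
        slope = gradient(x)
        if abs(slope) < tolerance:
            break
        x -= step * slope
    return x


if __name__ == "__main__":
    for start in (-2.0, 2.0):
        x_min = minimize(start)
        print(f"start {start:+.1f} -> minimum at x = {x_min:+.4f}, "
              f"energy = {energy(x_min):+.4f}")
```

Both runs report a minimum, yet only the run started on the left reaches the deeper valley. Nothing in the downhill procedure itself signals that the other answer is merely a local minimum, which is the predicament described above.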
In a related approach that might eventually yield motion pictures of proteins in the act of folding, Martin Karplus of Harvard University applies Newton's laws of motion to the atoms in a protein. The forces on the atoms of a molecule in a given state are derived from the potential-energy function. Then the computer calculates the acceleration of each atom and its displacement at the end of an extremely short interval.
CONCEPT OF WATER-ACCESSIBLE SURFACE enables one to estimate the force exerted by water on a molecule in aqueous solution. The accessible surface (outer line), shown for a thin slice through a protein, is determined by tracing the path of the center of an imaginary water molecule as it rolls along the protein's external atoms. The surface is conceptually divided into water-loving (colored line) and water-hating (black line) parts, depending, respectively, on whether the atoms are polar or nonpolar. A large water-hating surface corresponds to a strong compressive force, and a large water-loving surface corresponds to a strong expansive force. The compressive force dominates when protein molecules are folding.
By repeating the process over a period controlled by the available computing power, the program can reveal movements of the individual atoms. Consequently, it is now becoming possible to identify the effects of small mutations on protein stability and dynamic behavior. Yet limits on computing power make it impossible to track more than a few nanoseconds in a molecule's life, a span too short to directly reveal much about protein folding.
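The step-by-step calculation described here can be sketched for the simplest possible case, a single particle on a spring. The integration scheme used below (velocity Verlet) is a common choice for this kind of simulation and is an assumption on my part; the article does not say which scheme the actual programs use. The one-femtosecond time step in `steps_needed` is likewise an assumed, typical value.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v through n_steps intervals of dt.

    At each interval the force gives the acceleration, the acceleration
    gives the displacement, and the force is then recomputed at the
    new position.
    """
    f = force(x)
    for _ in range(n_steps):
        x += v * dt + 0.5 * (f / mass) * dt * dt
        f_new = force(x)
        v += 0.5 * (f + f_new) / mass * dt
        f = f_new
    return x, v


def spring_force(x, k=1.0):
    """Restoring force of an ideal spring (a stand-in for a real
    potential-energy function)."""
    return -k * x


def steps_needed(duration_seconds, step_seconds=1e-15):
    """Number of intervals required to cover a given stretch of time."""
    return duration_seconds / step_seconds


if __name__ == "__main__":
    x, v = velocity_verlet(1.0, 0.0, spring_force,
                           mass=1.0, dt=0.01, n_steps=1000)
    print("simulated position:", x, " exact:", math.cos(10.0))
    print("steps for one nanosecond:", steps_needed(1e-9))
    print("steps for one second:    ", steps_needed(1.0))
```

The last two lines show why the method falls short of folding: with the assumed step, a nanosecond takes about a million force evaluations for every atom, and a full second would take a billion times more.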
In spite of its limitations, theoretical work based on the potential-energy function holds much promise. Studies involving the function should make it possible to discern the relative importance of various forces acting on a protein, such as electrostatic interactions and van der Waals repulsions. Teasing apart the influences is critical because a folded protein is only marginally more stable than an unfolded one. Hence, the factors that make the difference are likely to be subtle. (The slight energy differential between the stable and unstable state might reflect the need for a cell to inactivate proteins rapidly as its needs change.)
In the long run, calculations involving the potential-energy function may well succeed in predicting the tertiary structure of any protein from its amino acid sequence. In the meantime, other less fundamental but still useful ways of thinking about the folding problem have emerged.
Any resolution of that puzzle will have to include a way of defining the force exerted on a protein molecule by water. In principle, estimates of the hydrophobic effect are, or can be, embedded in the potential-energy function, but exactly how best to accomplish that step is far from clear.
One method for analyzing the effect of water has emerged from work done by Byungkook Lee at Yale University in 1971. Lee developed an algorithm to calculate the solvent-accessible area of a protein of known structure, that is, the part of the complex surface in direct contact with water. On the basis of preliminary findings, he and I suggested the algorithm would be useful in studying protein folding.
We divide the accessible area of an extended protein chain (or any selected molecule) according to the nature of the atoms that contribute to the area. Are they nonpolar and therefore hydrophobic (mainly carbon and sulfur atoms), or are they polar and therefore hydrophilic (mainly nitrogen and oxygen atoms)?
The surface tension of water in contact with such atoms is known. This tension is, as Cyrus Chothia of the Medical Research Council has pointed out, a direct measure of the force exerted on the molecule by the solvent. Surface tension is high when nonpolar molecules and water are in contact, just as it is when oil is mixed with water; that is, a strong force tends to reduce the area of contact between the water and the oil, and to squeeze a protein chain into a ball. Tension is low when polar atoms and water are in contact, and the hydrophobic effect is not seen.
Summation of the nonpolar accessible areas of an unfolded chain yields a measure of the potential hydrophobic effect. In general, as might be expected from structural analyses, the net force acting on most protein chains is large and positive, tending to reduce contact with the solvent and thus to compact the chain.
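The bookkeeping of polar and nonpolar accessible area can be illustrated with a small numerical sketch. It is not Lee's original algorithm: for brevity it uses a point-sampling approximation in which test points are scattered over each atom's probe-expanded sphere and those falling inside a neighbor's expanded sphere are discarded. The atomic radii, the 1.4-angstrom probe radius and the atom layouts are assumptions chosen for the example.

```python
import math

PROBE_RADIUS = 1.4  # angstroms; assumed radius of the rolling water probe


def sphere_points(n):
    """Return n roughly evenly spaced points on a unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def accessible_areas(atoms, n_points=960):
    """Sum accessible area (square angstroms) by atom type.

    atoms is a list of (x, y, z, radius, is_polar) tuples.
    """
    unit = sphere_points(n_points)
    totals = {"polar": 0.0, "nonpolar": 0.0}
    for i, (xi, yi, zi, ri, polar_i) in enumerate(atoms):
        big_r = ri + PROBE_RADIUS
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is buried if the probe centre placed there
            # would overlap any other atom.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                < (rj + PROBE_RADIUS) ** 2
                for j, (xj, yj, zj, rj, _) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        area = 4.0 * math.pi * big_r ** 2 * exposed / n_points
        totals["polar" if polar_i else "nonpolar"] += area
    return totals


if __name__ == "__main__":
    # Three hypothetical nonpolar atoms: far apart, then packed together.
    spread = [(0, 0, 0, 1.7, False), (20, 0, 0, 1.7, False),
              (40, 0, 0, 1.7, False)]
    packed = [(0, 0, 0, 1.7, False), (3.4, 0, 0, 1.7, False),
              (1.7, 2.9, 0, 1.7, False)]
    print("spread out:", accessible_areas(spread))
    print("packed:    ", accessible_areas(packed))
```

Bringing the nonpolar atoms together lowers the summed nonpolar area, and that reduction is the quantity the surface-tension argument above converts into a compacting force.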
Various investigators are also examining the extent to which packing considerations direct folding. In one approach, lists have been made of the amino acid sequences of molecules that adopt essentially the same three-dimensional conformation. On the basis of the steric properties of the amino acids in the molecules, such as shape and volume, Jay W. Ponder of Yale has generated other lists of amino acid sequences that theoretically should adopt the same conformations.
Just how well those sequences actually fit their assigned classes is still being determined experimentally, but many do seem to fit. This finding, together with the profound influence of water, makes it conceivable that the hydrophobic effect and steric considerations by themselves determine how a protein folds.
If that is the case, what is the role of long- and short-range electrostatic interactions in protein folding? Undoubtedly, the contribution of such interactions varies from protein to protein. For many proteins, large changes in the formal charges can be made without significantly affecting the final overall structure. Hence, it may be that electrostatic interactions are often more important for stabilizing the final conformation than for forming it in the first place.
CONFORMATION of a fold in the interior of the protein crambin (left), depicted mainly as a chain of alpha carbons (orange), derives from the tight packing of five nonpolar amino acids (blue spheres). That conformation is maintained in a computer-generated "mutant" (right) even when four of the five amino acids are replaced with others. Indeed, many combinations of amino acids can be accommodated if the substitutes resemble the originals in shape and volume. Knowledge of how amino acids pack may go a long way toward predicting the shape of a protein.
Determining whether this possibility is correct requires an ability to gauge the strength of electrostatic interactions. Yet the mathematics is complicated by the fact that atoms in a folding protein are often separated by water, which can mute the long-distance attractions or repulsions in ways difficult to estimate in the absence of detailed structural information. Moreover, as the protein folds, the distances between the atoms constantly change, which adds further complexity.
The precise effects of hydrophobic, steric and electrostatic interactions, then, remain a matter of conjecture. Research into protein folding, however, is proceeding enormously faster today than in the past. Those of us involved in the effort still cannot "play the music," but we are rapidly learning certain of the notes. That progress alone is heartening, as is knowing that a solution to the folding problem will resolve a question of deep scientific interest and, at the same time, have immediate application in biotechnology.
FREDERIC M. RICHARDS is Sterling Professor of Molecular Biophysics and Biochemistry at Yale University. Richards joined the Yale faculty in 1955, three years after earning his doctorate from Harvard University.