Sequence Space:
The Final Frontier
Boldly going where not even nature has gone before to explain why Earth’s biochemistry is so well suited for life.
by Graeme Stemp-Morlock & Zeeya Merali
October 24, 2008
Imagine, for a moment, a parallel universe with an alternative version of Earth, where biological molecules evolved slightly differently: a tweak to a protein here, a nudge to an amino acid there. Even tiny changes could have had profound implications for the development of biochemistry and for the appearance of life.
Lucky for us, that wasn’t the path taken on our planet. But why wasn’t it? Just how did nature manage to create all the right building blocks for life-as-we-know-it, instead of taking one of the myriad of alternate turns?
You may think that biologists have that question covered. After all, Charles Darwin’s theory of natural selection describes how the mutations and adaptations that are best suited to the environment survive over time. But why biological molecules—such as proteins, enzymes, and even DNA—have their particular structures is a mystery that can’t simply be explained by looking at the conditions in their environment.
To pin down the origins of new protein structures, Thom LaBean and Erik Schultes at Duke University, along with Peter Hraber of the Los Alamos National Laboratory, are voyaging into uncharted territory. They are investigating the vast set of protein sequences that have never been tested by biology, that is, the paths that nature didn’t take.
TWIST OR FOLD? What determines the shape of the triose phosphate isomerase protein?
Just how different protein structures arise is an important question for Darwinian evolution to answer, says Johnjoe McFadden, a biologist at the University of Surrey, UK. Different protein folds are very distinct, and it’s tough to see how one configuration would mutate into another. "According to Darwinian evolution, each step you take, each mutation, has to be useful," he explains. But at first sight, the intermediate mutations needed to take you between two different protein structures do not seem to serve any function.
"These intermediate configurations are not seen in nature—they are missing links," says McFadden.
If we want to understand why large molecules have the structure that they do today, we need to move away from our "biocentric" view of proteins, says LaBean. Biologists spend a lot of time looking at every individual protein that has been found, he explains. "In the end you put all that information together and you think you know something about proteins," he says.
People have been working blindly, taking the few sequences that nature has given us.
- Erik Schultes
LaBean and his colleagues, however, want to look at the bigger picture, which includes the options that nature has yet to examine: "We think about sequence space, the space of all possible sequences, not just the ones sampled by biological evolution."
Sequence space encompasses every possible arrangement of units in a molecule, for example, every arrangement of amino acids in a protein, or every sequence of nucleotide bases that could be strung together to make up a DNA molecule. Some sequences are used in DNA, RNA, or other biological molecules in the natural world, and some aren’t. The team is interested in them all. Determining what fraction of the possible sequences are functional will help biologists understand how evolution works so efficiently.
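For very short molecules, the idea of surveying all of sequence space can be tried directly on a computer. The sketch below lists every RNA sequence of a given length and counts the fraction that passes a test for "function." The test used here (`is_functional`, which only asks whether the two end bases can pair) is a made-up stand-in chosen for illustration, not the team's criterion; a real study would rely on folding calculations or lab measurements.

```python
from itertools import product

BASES = "ACGU"
# Watson-Crick pairs only; this pairing rule is part of the toy assumption.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def sequence_space(length):
    """Yield every RNA sequence of the given length."""
    for letters in product(BASES, repeat=length):
        yield "".join(letters)


def is_functional(seq):
    """Hypothetical stand-in for function: the first and last bases can pair."""
    return (seq[0], seq[-1]) in PAIRS


def functional_fraction(length):
    """Fraction of all sequences of this length that pass the toy test."""
    total = 0
    hits = 0
    for seq in sequence_space(length):
        total += 1
        hits += is_functional(seq)
    return hits / total


print(functional_fraction(4))  # 64 of the 256 sequences pass: 0.25
```

Exhaustive listing like this only works for toy lengths, because the number of sequences grows fourfold with every added base.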
THE TELEPORTER: Three neutral networks (yellow, green, and blue) linked by red "portals." The highest network has the greatest functionality. Credit: Inman Harvey
For this research team, sequence space is the final frontier, and it’s very, very big. DNA and RNA are made up of four nucleotide bases. If you want to string these bases together to create an RNA sequence that is just 100 nucleotides long, there are a daunting 4^100, or around 10^60, possible sequences, far more than the number of stars in the observable universe. "That’s a very large number, and that’s usually where people stop thinking about sequence space," says Schultes.
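The arithmetic behind that figure is easy to check: with four possible bases at each of 100 positions, the count is 4 multiplied by itself 100 times. The short sketch below does the calculation exactly; the function names are illustrative, and the protein comparison assumes the 20 standard amino acids, a figure the passage above does not state.

```python
import math


def sequence_space_size(alphabet_size, length):
    """Number of distinct sequences of `length` units drawn from an alphabet."""
    return alphabet_size ** length


def orders_of_magnitude(alphabet_size, length):
    """Base-10 logarithm of the sequence-space size."""
    return math.log10(sequence_space_size(alphabet_size, length))


# RNA: 4 bases, 100 positions.
print(round(orders_of_magnitude(4, 100), 1))   # 60.2, i.e. a little over 10^60

# Assumption: a protein built from the 20 standard amino acids, 100 residues long.
print(round(orders_of_magnitude(20, 100), 1))  # 130.1
```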
But LaBean and Schultes didn’t stop there. They are using sequence space to explain how the various folds or shapes of macromolecules are connected, and how, why, and when proteins snap between different structures.
"People have been working blindly or in the dark, taking the few sequences that nature has given us, and trying to understand the general principles of folding, structure and function," said Schultes. However, the random sequences that nature hasn’t used have been neglected and could provide us with important clues, he says. "We feel it’s essential that we understand the global structure of sequence space, before we can say anything general about the way these molecules fold and take on structure."
Evolution hasn’t come to a stop. Everything is still evolving.
- Thom LaBean
The idea hinges on the fact that a molecule can undergo mutation without changing shape. For example, the sequence of amino acids in a protein can change significantly, while maintaining the overall shape of the protein. Such mutated versions of molecules that retain the same shape are said to belong to the same neutral network within sequence space. LaBean and Schultes believe that neutral networks could also be the key to explaining the diversity of different protein shapes. They explain that neighboring neutral networks—each representing different protein shapes—can exist relatively close together in sequence space and sometimes intersect.
What this basically means is that a sequence can keep mutating for a while without changing the overall shape of the protein. This corresponds to the protein stepping along its neutral network, one mutation at a time. But at certain points along its path, the protein is sufficiently close to a neighboring network that just a few mutations are enough for it to hop across. The protein’s shape dramatically shifts (see diagram).
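The stepping-and-hopping picture can be mimicked with a toy model. In the sketch below, the rule that assigns a shape to a sequence (`toy_shape`, which depends only on whether G and C bases are in the majority) is invented purely for illustration and is not a real folding model. The search walks outward along the starting sequence's neutral network, one point mutation at a time, until it finds a sequence sitting next to a different shape.

```python
from collections import deque

BASES = "ACGU"


def toy_shape(seq):
    """Hypothetical sequence-to-shape rule, NOT a real folding model."""
    gc = sum(base in "GC" for base in seq)
    return "fold_B" if 2 * gc > len(seq) else "fold_A"


def neighbors(seq):
    """All sequences exactly one point mutation away from `seq`."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def steps_to_portal(start):
    """Breadth-first search along the neutral network containing `start`.

    Returns (neutral_steps, edge_sequence, hopped_sequence), where
    edge_sequence keeps the starting shape and hopped_sequence is a
    single mutation away with a different shape. Returns None if the
    network has no such neighbor.
    """
    shape = toy_shape(start)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        seq, dist = queue.popleft()
        for nxt in neighbors(seq):
            if toy_shape(nxt) != shape:
                return dist, seq, nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


dist, edge, hopped = steps_to_portal("AAAAAA")
print(dist, toy_shape(edge), "->", toy_shape(hopped))  # 3 fold_A -> fold_B
```

In this toy, three mutations leave the shape untouched, and a fourth switches it in a single step, with no shapeless intermediate along the way.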
"It’s as though the protein teleports between shapes, without having to take on any non-functional shapes in between," says McFadden.
The team has been awarded an FQXi grant of over $134,000 to help them map sequence space. Along with Schultes and Hraber, LaBean plans to take the first detailed look at how a neutral network connects with its various neighbors (see "Mapping Mutations", right).
And to extend the map further, all they need to do is to repeat the process for other sequences on the neutral network, and keep voyaging outwards...
Inman Harvey, an expert on neutral networks at the University of Sussex, UK, welcomes the experiments: "There are still some big 'ifs' in the scenario, so it’s important to test if these neutral networks are interconnected, as required, or separated."
With his colleague Adrian Thompson, also at Sussex, Harvey has already investigated neutral networks in a decidedly non-biological context: building silicon chips. They use "artificial evolution" to come up with the best layout for silicon chips, optimizing the chips’ function. In this synthesized environment, they see evolution progressing along neutral networks.
McFadden thinks that neutral networks are an interesting possible answer to a tough question. "It’s certainly important to investigate this, if we want to understand the origin of life," he says.
LaBean’s team believes that this work opens the door to an entirely new kind of experimental approach to understanding evolution.
"Historically there has been a strange viewpoint that the structures and sequences that we observe in biology are somehow optimal," says LaBean. "But it’s really just where we are now. Evolution hasn’t come to a stop. Everything is still evolving."