Kurt Wüthrich (2012) - Structural Genomics - Exploring the Protein Universe

Good morning. I'm using the word ‘universe’, which is a dangerous thing to do within the overall program of this meeting. So, in a couple of minutes I will try to explain to you what I mean by the word ‘universe’. But first I will give in to the encouragement we were given to provide our young colleagues with some glimpse of the sort of experiences that have influenced our careers. In my case it was the fact that I studied sports at the university. And the transition from sports to science meant that I was immediately giving up the feeling of instant gratification. When you are in the discipline of high-jumping, you know instantly whether or not you have been successful. If you are in scientific research, it may take twenty years until your peers have acknowledged that you have made an advance. Still, research is a great experience; it has been a great experience for me for many decades. Every day there is something new, but if I need instant gratification I go out and play soccer. For this purpose I actually bought a house next to the soccer stadium near Zürich, and so I can still have my instant gratification without investing too much time. Let me try to explain to you what I mean by the term ‘protein universe’. To make this understandable I have to remind you of some high school knowledge about biology. There are two main classes of biological macromolecules: on the one hand nucleic acids, on the other hand proteins. Now, I only need to talk about DNA, deoxyribonucleic acid, and proteins today. And there is this simple drawing which goes back to Francis Crick in the late 1950s. We refer to it as the central dogma of molecular biology. And it simply says that we have three types of biological macromolecules. DNA contains the information. Proteins express the information in terms of function.
And RNA has now joined proteins in being functional, but it was long thought to be just an intermediary stage in the transfer of information from the DNA to the functional proteome. Now, about fifteen years ago a major revolution happened. And this revolution is due to the fact that it has become possible to sequence DNA very efficiently, which now means that the information on which our functioning is based consists of about 1.8 metres of DNA. That’s it. We know the sequence of the DNA. And it is simply a linear chain. That’s the chemistry of DNA. And this chemistry of DNA is translated into the chemistry of proteins. And proteins again are linear chains. So, here you now have a very simple picture of what I referred to as the protein universe. This red circle represents the number of protein sequences that are known today. A year ago it was about 14 million, today it is about 17 million. Now what does this mean? This means that we know the genomic sequences. We know the complete sequence of the DNA in humans, in the mouse, in the cow and in about 1,400 additional higher and lower organisms. But knowledge on the level of the DNA has its limitations. And what happens? We have these 1.8 metres of DNA from human beings. Some bioinformatics specialists will now go and identify pieces of that DNA which, by past experience, we suppose encode a protein. But as long as we are working with these genomic sequences, genomic meaning sequence on the level of the DNA, we don’t know whether these proteins can exist. We don’t know whether they are ever expressed. We are not even always sure that the identification of the genes by so-called annotation is correct. Very often it is not very precise. So, in order to really work with the protein universe we should be able to cover this space of now about 17 million sequences with additional information. This additional information is the three-dimensional structures of the proteins.
Because only when we know about the three-dimensional structures can we understand what is happening on the level of the proteome, proteome meaning the ensemble of all proteins in a living being. And that’s what we do in structural genomics. We try to fill this rather large sequence space with three-dimensional structures. So, from genomic sequences we want to go to three-dimensional structures. That is, we want to find out how these linear chains of the proteins are arranged in space. And why do we want to know? Why do we need to know about this? I give you just two examples. You have here a few selected protein functions. Now remember, all of these proteins are linear chains built of the same building blocks, differing only in the arrangement of some chemical groups that hang onto the linear chain. Nonetheless, functions performed by proteins extend from protective functions to catalysis, to regulation, to transport and so on and so forth. Now just think of the following. Our hair is protein, our skin is protein. In our stomach we have enzymes that help to digest our food. Enzymes must be soluble in water. Otherwise they don’t function. Now, if you only have a linear sequence, it’s difficult to understand why the protein on our head should be resistant to being dissolved in water whereas enzymes must be dissolved in water. Just imagine: if we didn’t have different forms of these chains, you would not walk into this room after the next rain, you would be gone. Let me be a little more specific in showing you why we want to have three-dimensional structures. I take an example from some biomedical research that we performed two decades ago on the drug cyclosporin A. This was the drug that opened the door to transplantation in human medicine because it suppresses the immune response to the foreign tissue. The receptor for this drug... Here you have the drug molecule in functional colours. In light blue you have the receptor.
Fortunately for us, it’s a relatively small protein, so the structure could be solved. Now once you have such a three-dimensional structure you can take the drug out of its binding site, study the binding site, go back to the chemistry and start thinking rationally about how you might modify... This is the chemical structure of that drug. So when you inspect the binding site for the drug, you might decide: I had better cut this part of the molecule off. Now I get a tighter fit; I may be able to reduce the dosage of the medication. I may reduce side effects that are unwanted. And that’s the sort of thinking that can be based on knowledge of the three-dimensional structure. And if your knowledge is reduced to linear knowledge on the level of the genome, then you have no possibility to think in such rational terms about correlations between the structure of the protein and its function, and possibly influencing its function in drug development or in other applications, in agriculture or to improve the quality of soaps to wash clothes and so on and so forth. Next topic. How do we obtain three-dimensional structures? There are two key methods that provide atomic resolution structures of large molecules. One is X-ray diffraction with protein crystals; the other is NMR with protein solutions. NMR does not stand for No Meaningful Results. It stands for Nuclear Magnetic Resonance. The important thing is that this technique can be used with protein solutions. Just remember, many proteins are found in body fluids, and body fluids are solutions of proteins. So we can look at these proteins under conditions that can be very close to the conditions in body fluids. Now, how did structural biologists historically approach the protein universe? I'm now talking about the situation around 1935 to 1940. There was no idea what the protein universe was. So Max Perutz started to work with haemoglobin in 1936.
You see Max Perutz, an Austrian scientist who was awarded the Nobel Prize in Chemistry in 1962, jointly with John Kendrew, for having solved the first three-dimensional protein structure. Now how did Max Perutz choose the object for his studies? He chose haemoglobin because haemoglobin was plentiful and it is red. There were no great methods to purify proteins at that time, so it was important that the protein was red. Because if you lost it, the colour was gone and you knew you were in trouble. You could also get one litre of blood from a horse and make a gram of haemoglobin relatively easily. And that’s how the early targets for structural biology were chosen. Prior to haemoglobin, hair, especially wool from sheep, was studied by X-ray crystallography. It actually gave the first information on the structure of polypeptide chains. Then came haemoglobin, and this is the state of the art of X-ray crystallography at the time the first Nobel Prize was awarded in 1962. And you see, even my career started with haemoglobin. This is a very old slide. I have not had it remade, to impress on you that it’s very old, almost as old as I am. And it’s my own haemoglobin. And I recorded this spectrum in 1968 and it meant immediate career success for me. It made my scientific career. And then you see, you need an artist to make drawings of haemoglobin and to show a larger community what is behind such strange peaks that you can see in this case by NMR. Now, today the selection of the targets is done very differently. You see, over the years, starting with the work by Max Perutz in 1936, one would take a few proteins because they were of interest, without knowing about this big red circle. For example, one day we had mad cow disease, so we solved the structure of the protein that’s involved in the onset of mad cow disease.
Or, you know, there is trypsin in your stomach, all kinds of trypsin; these were available in relatively large quantities, so these structures were solved. And by about 1990 two hundred such structures had been chosen at random. Now today, all of a sudden we know what should be there. We know that the human genome contains approximately 25,000 genes that encode proteins. But that of course would never make up these 17 million protein sequences that we have in the database today. Where this comes from, the incredibly rapid growth of this protein universe in the last few years, is that the sequencing apparatus, the sequencing methods, have been applied to microorganisms. If we start to study the genomic information that is contained in the microbes that live in our gut, then we have a million times as much genetic information there as we have in the human genome. So if people start to talk about personalised medicine based on knowledge of the human genome and dismiss all the genetic information that’s contained in the microbes that live in our body, in the mouth, in all human body openings, they are just kidding themselves, and us if we believe them. And this is the sequencing of these microbiomes, not only the microbiome in the human body but also the microbiome in the sea. Craig Venter went out with his boat, caught microbes in the sea, in the Pacific, and sequenced them, and that added another 2 million proteins to this genomic data bank. And then you take one kilogram of soil and you have another 2 million sequences of microbes if you go at these two pounds of earth. That’s how the universe of protein sequences has for our sake... To talk about explosion and large numbers here is not so easy, but for us this is a real explosion. And so today we now select our proteins very differently. They have never been seen; we can just take... Let’s say it’s a human proteome: cut it up into genes, get these genes into E. coli, express those proteins and get the structures.
And then all of a sudden we have hundreds, actually thousands, of structures now in a data bank, of proteins about which we know nothing. We just have the structure and we are now searching for new functions. That means we are now moving out into the red circle. And instead of being limited, for example, to looking for targets for drug design in a very limited selection of proteins that are for one reason or another the obvious choices in structural biology, we now all of a sudden move somewhere into that red circle and find new targets for drug design. These may be targets that are not around in the adult human body. They may be targets that are around only before birth, or only in a particular state of disease or in very old age: targets for which we do not have mouse models to simulate them. And so we are very much expanding the scope of the landscape within which we can go to lots of applications in biomedical research, in agriculture and so on and so forth. So we are now working out here; we are making excursions into that red territory and finding new possibilities to apply the knowledge about the proteome. And it is not a static feature. It is expanding very fast. I give you one example, which is possibly the biggest success at this point of the whole Structural Genomics Initiative, and that is the fact that G protein-coupled receptors are now here. Until five years ago there was no structure around of a G protein-coupled receptor. On the other hand, it was known that at least 40% of all approved drugs target GPCRs. Now, thanks to the efforts of this Structural Genomics Initiative, we know how GPCRs look. And for example it is a very exciting part of the story. This is the structure of a GPCR. It’s about 40 angstroms in length.
You bind any one of these drugs on the extracellular end, and over a distance of about 35 angstroms a signal is transmitted through the protein and given up to additional downstream proteins on the intracellular side. So it’s highly exciting that as of today we know 14 different structures of GPCRs. And that is one of these very important, far-reaching excursions into the red space of genomic protein sequences. Well, the last topic I want to address is: How do we solve structures? I see I'm very limited in time, therefore I go very fast. Okay, we need a magnet, we need a glass tube to put the protein in, we get a spectrum. Because proteins are big there are lots of spins, so there’s a lot of overlap. So you have to develop two-dimensional NMR. Then you spread out the lines into two dimensions. If you make a plot you see that there’s a lot of information. Today we use seven-dimensional experiments. That means we artificially generate six time dimensions in addition to the ongoing time dimension. And then we have to develop some, well, distance geometry algorithms and things. All that work was done by physicists, although it has to do with biology. And then you calculate three-dimensional structures. Now, to be respectable here one has to mention Einstein. I would look very bad if I didn’t mention Einstein. Now Einstein is very important for solution NMR. The one thing I can say is that the important work that Einstein has done was done 30 km from the place where I grew up. That’s one thing. Then physicists always talk about relativity theory and such things. But Einstein also did important work. You see, in 1905, when he was working in the patent office in Bern, 30 km from my birthplace, he published four papers. One is on relativity, one is on the photoelectric effect, but the two really important papers are on Brownian motion, at the start of statistical mechanics.
There is one paper on translational Brownian motion, that was in May 1905, and there is another paper on rotational Brownian motion that was published in December 1905. What does this have to do with our work of using NMR with biological macromolecules? Well, if you have a relatively large particle that’s subjected to the thermal motion of the solvent, the water, it has a large inertia and it responds at low frequency to the onslaught of the thermal motions of the solvent. And you get low-frequency stochastic motions. If you have a smaller protein, then the inertia is smaller and you have higher-frequency stochastic motion. Depending on the quotient of the radio frequency that we use in our NMR experiments and the frequency of those stochastic motions, we are in completely different ranges of spin physics. And so Einstein treated the Brownian motion. If we take his theory and put it into the description of single-transition basis operators that describe the behaviour of multi-spin systems in these moving particles, then you get TROSY - Transverse Relaxation Optimised Spectroscopy. And this enabled us to go from smallish proteins... That is one regime of spin physics of rotating and translating objects in solution, which goes up to molecular weights of about 20,000. Here we are approaching a molecular size of one million, and you see this is quite a fantastic spectrum that we can now obtain using these... I mean, what we really do is that we uncouple the Nuclear Magnetic Resonance spectrum from the Brownian motion. This costs a bit of money because we need a magnet that is at least 900 or 800 megahertz in the proton frequency. But this is what happens when you read the old papers by Einstein and then apply them to the daily work. Now why is this so important? We have solved some 75,000 structures as of today, trying to cover that red circle, that protein sequence universe, with three-dimensional structures. But today we have to go one step further.
We have to understand the interactions between two or more of these macromolecules. I show you here an example, a piece of DNA and a protein that bind into a so-called complex. And all these complexes tend to be big, and therefore, if we didn’t use this TROSY experiment based on Einstein’s work on the statistical mechanics of the Brownian motion, we wouldn't be there. Thank you for your attention.
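
Editorial illustration (not part of the talk): the link the speaker draws between particle size, the frequency of the stochastic rotational motion, and the NMR radio frequency can be sketched numerically. The minimal Python sketch below uses the standard Stokes-Einstein-Debye relation for a rigid sphere, tau_c = eta * V / (k_B * T). Everything specific in it is an assumption introduced here rather than taken from the lecture: the function names, a partial specific volume of 0.73 cm³/g, a water viscosity of 1.0e-3 Pa·s, a temperature of 293.15 K, and the neglect of any hydration shell (which makes the estimate a rough lower bound).

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def rotational_correlation_time(mass_da, temp_k=293.15, viscosity_pa_s=1.0e-3,
                                partial_specific_volume_cm3_g=0.73):
    """Stokes-Einstein-Debye estimate tau_c = eta * V / (k_B * T), in seconds.

    Assumes a rigid, anhydrous sphere; all default values are illustrative
    assumptions, not numbers from the lecture.
    """
    # Volume of one molecule: (g/mol * cm^3/g) / (1/mol) gives cm^3; 1e-6 converts to m^3.
    volume_m3 = mass_da * partial_specific_volume_cm3_g / N_A * 1e-6
    return viscosity_pa_s * volume_m3 / (K_B * temp_k)


def omega_tau(mass_da, proton_mhz=900.0, **kwargs):
    """Dimensionless product of the proton Larmor frequency (rad/s) and tau_c."""
    omega = 2.0 * math.pi * proton_mhz * 1e6
    return omega * rotational_correlation_time(mass_da, **kwargs)


if __name__ == "__main__":
    # The two molecular sizes mentioned in the talk: about 20,000 and about one million.
    for mass in (20_000, 1_000_000):
        tau_ns = rotational_correlation_time(mass) * 1e9
        print(f"{mass:>9,d} Da: tau_c ~ {tau_ns:.1f} ns, "
              f"omega*tau_c ~ {omega_tau(mass):.0f}")
```

In this simple model tau_c grows in proportion to molecular mass, so a particle of one million tumbles about fifty times more slowly than one of 20,000. That slow-tumbling regime is the one in which, according to the talk, conventional spectra degrade and TROSY becomes necessary.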

Abstract

STRUCTURAL GENOMICS – EXPLORING THE PROTEIN UNIVERSE

Kurt Wüthrich
The Scripps Research Institute, La Jolla, CA, USA, and ETH Zürich, Zürich, Switzerland

The determination of the human genome and the genomes of a large number of other species has awakened big expectations in many different fields, including agriculture, nutrition and healthcare. However, much of the realization of these anticipated advances will have to be based on detailed knowledge of the proteome and other gene products of the organisms of interest, in addition to the rapidly expanding protein sequence universe derived by annotation of the genomic DNA sequences. More specifically one aims for coverage of the protein universe derived from the DNA sequences with three-dimensional structures, which can then provide a platform for rational drug discovery and similar applications.

My research team makes use of solution nuclear magnetic resonance (NMR) spectroscopy for protein structure determination and for collecting supplementary function-related data. NMR has for many years shared its role as a principal technique in the structural biology of proteins and nucleic acids with X-ray diffraction in single crystals. In today’s post-genomic era, structural biologists using these techniques are faced with new opportunities and challenges, following novel strategies of “structural genomics”. In this venture the NMR method is unique, when compared to structure determination by X-ray crystallography, in that atomic resolution structures and other function-related data can be obtained under solution conditions close to the physiological milieu in body fluids. By generating information on protein structure, stability, dynamics and intermolecular interactions in solution, NMR has an exciting role in the longer-term challenge leading from the expanding protein universe to new insights into protein functions and chemical biology.
