Bruce Beutler (2015) - Finding Mutations that Affect Immunity

Good morning. I'd like to talk to you today about our work with immunity, and about how the mouse has improved quite dramatically as a model organism for forward genetics. My work began with a rather old premise. More than 100 years ago, when microbes had only recently been recognized as the causative agents of infectious disease, people had already begun to wonder about how they harmed the host. And Richard Pfeiffer, shown here in the background behind Robert Koch, made an interesting observation in that respect. In 1891, he coined the term "endotoxin" to describe something intrinsically toxic associated with Gram-negative bacteria. He noticed that even heat-killed organisms could, if injected into guinea pigs, cause shock and death, reminiscent of an authentic infection, although he couldn't recover any viable organisms from the peritoneal cavity after he administered the microbes. Richard Pfeiffer became very famous for this observation, and although he's a rather obscure character today, in his lifetime he was nominated 33 times for the Nobel Prize in Physiology or Medicine. The reason was that, then as now, hundreds and maybe even thousands of people die every day of endotoxin-induced shock, and this is what a typical patient with Gram-negative septicaemia might look like: a child with meningococcal sepsis. This is a severe systemic form of inflammation, and it has to be remembered that all inflammation had obscure origins in those days; here Pfeiffer had perhaps identified a single molecular species that could cause inflammation. Over the years that followed his initial report, it was found that endotoxin, which we now call lipopolysaccharide or LPS, was associated with all Gram-negative bacteria. It is a structural component of the outer leaflet of the outer membrane. It has lipid and polysaccharide moieties, and the lipid A moiety is the toxic part of the molecule. Eventually, lipid A molecules were synthesized entirely artificially, and they can do most of what Pfeiffer recognized long ago. We might draw a typical LPS structure like this, and from our work and the work of others, it emerged more recently that LPS activates macrophages to make cytokines. These cytokines, and especially tumour necrosis factor, bind to receptors on many other cells and trigger the release of terminal mediators that cause vasodilation, shock, aggregation of platelets, and all of the other things that contribute to the clinical syndrome. A major question, obviously, was, "What's the receptor for lipopolysaccharide?" And this was a tough problem that resisted direct attempts to crack it biochemically. By 1990, from the work of Ulevitch and Wright, it was known that antibodies against the surface antigen CD14, present on macrophages, would inhibit the LPS response very strongly. But CD14 was a GPI-anchored protein. It had no cytoplasmic domain, and it was guessed that it could only work by associating with some kind of mysterious co-receptor that did have a cytoplasmic domain and could transduce the signal across the membrane. The nature of that receptor was unknown. Where TNF was concerned, however, it was known by that time that NF kappaB was the critical transcription factor that leads to activation of the TNF gene.
Once the TNF mRNA was made and processed, it was sequestered in a locked form (today we would say most likely in P-bodies), and that had to be unlocked in order to allow translation of the mRNA and production and processing of the protein. The big question still loomed: What was the true membrane-spanning, activating receptor for LPS? I felt that that question, if solved, would offer the key to understanding how the host becomes aware of infections within the first minutes after inoculation with bacteria. In short, how innate immune sensing works. Finding the LPS receptor ultimately depended on two unrelated sub-strains of mice that couldn't respond to LPS because of spontaneous mutations. The first of these, the C3H/HeJ strain, had been identified in 1965 by Heppner and Weiss. It was shown to be refractory to any amount of LPS, yet the animals were highly susceptible to Salmonella typhimurium, as was later shown, and to other Gram-negative microbes. In 1978 a completely unrelated strain of mouse was found to be refractory to LPS. This was the C57BL/10ScCr mouse, and by crossing the two strains, both of which had recessive defects, one found that the F1 hybrid offspring were refractory to LPS as well. So it was guessed that the two animals had allelic defects, and in both cases there were closely related control strains that showed a normal LPS response. This set the stage for positional cloning of what, by then, was called the Lps locus. Positional cloning in the classical sense isn't practiced anymore, and not all of you know exactly what it is, but essentially, it's cloning by phenotype. It's possible, taking a phenotype alone, to find the location of a gene: first one establishes a critical region, and that's the phase of genetic mapping; then one clones all of the DNA across the critical region in a physical mapping effort; and finally, having identified all of the candidate genes within the critical region, one goes and looks for the causative mutation. Typically there would be only one in this sort of circumstance. This was a very difficult kind of cloning to accomplish. Back in 1993, we set about to do genetic mapping using only 11 markers on mouse chromosome 4, where we knew the Lps locus was, and that covered most of the chromosome. We expanded the marker set and, on 2,093 meioses, we established a critical region between two new markers, B and 83.3, and we felt that this was about 2.6 million base pairs, though today we know it was about twice as large. We then did physical mapping, cloning a large number of bacterial artificial chromosomes in an overlapping format to span this region, and then we began to sequence them, starting at the middle of the interval and working outward bidirectionally. That was how our life went for about three years. We would fragment BACs, sequence them, BLAST the results against libraries of expressed sequence tags maintained at NCBI, and look for genes. And over all that time we found only a collection of pseudogenes, though of course they didn't come with a label; we had to check whether each was an authentic gene and whether it had any mutation that would distinguish one strain from the other. We were getting rather scared by the summer of 1998, because we had covered more than 90% of the region and we were running out of material to sequence. And still we hadn't found the gene. Then, of course, it's always in the last place you look.
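The arithmetic behind the genetic-mapping phase can be sketched very simply. In the Python fragment below, the recombinant counts at the two flanking markers and the roughly 2 Mb-per-cM conversion are illustrative assumptions; only the figure of 2,093 meioses comes from the talk.

```python
# Sketch of how recombinants among scored meioses define a critical region.
# Hypothetical recombinant counts; only the meiosis count is from the talk.

MEIOSES = 2093            # informative meioses scored against the Lps phenotype
MB_PER_CM = 2.0           # rough mouse genome average, about 2 Mb per centimorgan

def map_distance_cm(recombinants, meioses=MEIOSES):
    """Recombination fraction expressed in centimorgans (1 cM = 1% recombination)."""
    return 100.0 * recombinants / meioses

# Hypothetical: 13 and 14 recombinants between the phenotype and the two flanking markers.
interval_cm = map_distance_cm(13) + map_distance_cm(14)
interval_mb = interval_cm * MB_PER_CM

print(f"Critical region ~{interval_cm:.2f} cM, roughly {interval_mb:.1f} Mb")
# With counts of this order the interval comes out near 2.6 Mb, the size
# originally estimated for the Lps critical region.
```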
Far toward the proximal end of the interval we found a gene encoding an orphan receptor, called Toll-like receptor 4 (TLR4). Now, this was very exciting right from the start. First of all, the gene we had found in our critical region had leucine-rich repeats in its ectodomain, just like CD14 did, and we could imagine that perhaps by proximity or by transfer, LPS would go from CD14 to TLR4 and then trigger a response. And this was, indeed, a single-spanning transmembrane protein. Second, on the cytoplasmic side, there was strong homology between the TLR4 receptor and the interleukin-1 receptor. The interleukin-1 receptor was known to have inflammatory effects when triggered by a protein ligand, interleukin-1, and it could activate NF kappaB. So again, we thought this motif would probably work to drive the activation of the TNF gene and other inflammatory cytokine genes when stimulated. Third, there was an observation, then two years old, by Jules Hoffmann and his colleagues, who had been looking for mutations that would cause susceptibility to fungal infection in the fly. And in a beautiful paper in Cell in 1996, Jules and his colleagues showed that flies with mutations in Toll, the namesake of this superfamily of proteins, would be susceptible to infection by fungi, specifically Aspergillus fumigatus. Here you see a dead fly with hyphae sprouting from its thorax, because the fly couldn't make a critical antimicrobial peptide, drosomycin. This seemed a parallel story to the case that we were concerned with, where mutation made mice highly susceptible to Gram-negative bacterial infection. Of course, all of those ideas would amount to nothing if we didn't find a mutation, but in due course, we did. We found that in the C3H/HeJ strain there was a single base pair change that altered the cytoplasmic domain of TLR4, making it unable to signal. And in the C57BL/10ScCr strain there was a deletion encompassing all of the exons of the Toll-like receptor 4 gene, a 74 kb deletion, and we ultimately defined its exact margins. These two allelic defects, which weren't present in the control strains, convinced us completely that this was the gene we were after. There was still some question about whether TLR4 was actually a receptor for LPS. And over a period of a year it was determined by Kensuke Miyake and colleagues that there was another subunit of the complex, called MD-2, shown in magenta here, a basket-shaped protein that interacts strongly with TLR4, which is shown in cyan. TLR4 has all of these leucine-rich repeats, which make a kind of slinky-shaped molecule, and here you see that the complex is dimeric. LPS, as was shown in 2009 by Jie-Oh Lee and colleagues, who finally crystallized the complex, fits into the pocket of MD-2 but makes some contact with the backbone of TLR4 as well. This, when it occurs, creates a conformational change that's sensed across the membrane. And this is where all of the inflammatory effects of lipopolysaccharide begin, with this one molecule. The next questions we wanted to address had to do with signalling by TLR4 and how that worked. We were enamoured of the forward genetic approach by that time, but there were no other spontaneous mutations in mice that could tell us anything about how LPS signalling worked. So we decided that we'd have to create new phenotypes using a mutagen, and we began to do so in 2000.
Over the next 11 years or so, we identified many phenotypes that were related to TLR signalling, and we also focused on other aspects of immunity, keeping them under surveillance with different screens at the same time. In those days, ENU mutagenesis was a blind process. ENU, or N-ethyl-N-nitrosourea, is the only mutagen that one can really use effectively in the mouse, the only chemical mutagen. It's given to male mice, it mutates the spermatogonia, and mutations are transmitted to the sons of these mice, the G1 generation. A single G1 defines a pedigree, and the G1 mice were bred to B6 (C57BL/6) females to produce daughters. They were then back-crossed to their own daughters, and that brought some of the mutations to homozygosity in every G3 offspring born to that cross. We typically made very small pedigrees because we didn't want to be in the position of screening the same mutations over and over, and our thinking on that has totally changed, as you'll see in a moment. We didn't know it at that time, but we know today that the average sperm derived from a G0 animal has 60 to 70 mutations that change coding sense. And it's known from long experience that if you see a phenotype, it's almost always from a coding change, rather than an intergenic change of some kind. Now, this was a blind process, as I've said. The only way we knew the ENU was working was by seeing phenotypes or detecting them in our screens. And it was encouraging that we saw a lot of peculiar mice, and of course we would track down the mutations behind anything we happened to see visibly, as well. Over 11 years we found 34 mutations that fell into 20 genes, and they informed us quite a bit about how TLR signalling worked. We had mutations in the Toll-like receptors themselves, of which there are 12 in the mouse, and I'm only illustrating some of those here. We also found mutations in co-receptors, in addition to those that I've mentioned already; mutations in chaperones like UNC93B, which bring the TLRs to where they need to be. Some channel proteins are required for signalling from the endosome by TLRs as well. Then there are adaptor proteins that are recruited to the receptor in order to signal further. There's a layer of kinases that become activated, then a ubiquitin ligase, TRAF6, becomes activated as a result. It ubiquitinates itself and other proteins, and TAB2 brings these all together to allow signalling to proceed as it should. And finally, one has another layer of kinases that ultimately phosphorylate I kappaB, leading to its degradation and to NF kappaB translocation, and there are still other proteins that are needed for TNF to be processed and released from the cell. We still took TNF production as the endpoint of our screen. In the beginning, this was very hard, just like it had been before. But it got easier when the mouse genome was sequenced and annotated: one didn't have to make contigs anymore, and you knew what all the genes were; it wasn't terra incognita, like in the old days. It also got faster when better sequencing technologies came online, first capillary sequencing, then after that, massively parallel sequencing platforms. But by 2011, it was clear that the rate-limiting step in mutation finding was genetic mapping. The usual paradigm of outcrossing an established stock to a marker strain, then backcrossing, making a critical region assignment, and then looking for the mutation there, had begun to slow us down. And we could declare many more phenotypes than we could actually solve.
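To make the logic of that breeding scheme concrete, here is a minimal Python sketch of the probabilities involved. The per-pedigree G3 counts are hypothetical, and the sketch treats G3 pups as independent draws, which ignores the fact that littermates share a G2 dam; only the 60 to 70 coding mutations per G1 figure is from the talk.

```python
# Sketch of why the G1 x G2-daughter backcross drives ENU mutations to homozygosity.
# Assumption: G3 pups treated as independent; real pedigrees are more correlated,
# because pups from the same G2 daughter share her genotype.

from math import comb

P_G2_CARRIER = 0.5               # a G2 daughter inherits a given heterozygous G1 mutation
P_G3_HOM_GIVEN_CARRIER = 0.25    # het x het cross: 1/4 of pups are homozygous mutant
P_G3_HOM = P_G2_CARRIER * P_G3_HOM_GIVEN_CARRIER   # = 1/8 per G3 pup overall

MUTATIONS_PER_G1 = 65            # roughly the 60-70 coding changes per G1 quoted in the talk

def prob_at_least(k, n, p=P_G3_HOM):
    """Probability that at least k of n G3 mice are homozygous for one given mutation."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

for n_g3 in (10, 20, 40):
    p3 = prob_at_least(3, n_g3)
    print(f"{n_g3} G3 mice: P(>=3 homozygotes for a given mutation) = {p3:.2f}; "
          f"expected mutations per pedigree seen 3+ times homozygous ~ {MUTATIONS_PER_G1 * p3:.0f}")
```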
Sometimes a year or more was required to track down a mutation. So a new approach was needed. I began to think of what the most magical approach would be, and I thought in terms of Google Glass in those days. I wanted to have a magical pair of glasses with which I could look at a family of mice, like this one here, and even if the mutation were not obvious, as I've shown it, it would tell you which mice were affected by a mutation. And not only that, but in the blink of an eye, it would tell you: this is a mutation in SOX10; these are the coordinates of the mutation, the amino acid change, the motif, the human homologue. And if there were structural data, it would even show that to you as well. Well, actually, this is all a reality now, and we are able to find mutations in real time, and I'll tell you precisely how it's done. First, we make G1 mice, just as we always did, but then we whole-exome sequence every G1 mouse that's produced, up front, to find every mutation it might transmit into the pedigree. We've been at this for quite a while, and we found that the mean number of mutations that change coding sense was 63 per G1 mouse, and the modal number was 70. There are ways to make the number higher than that, but we'd prefer not to, because we wind up with too much G3 lethality. If the number is greater than 30 mutations, then we move forward with that pedigree; otherwise it's discarded. Moving forward means we order an Ampliseq panel, which is a collection of PCR primers designed not to interfere with one another, that target every one of the mutation sites and allow us to genotype them. Then the G2 mice and G3 mice are all genotyped at all the mutation sites that we've created with ENU. Only then are the mice released for phenotypic screening. In this case it involves visual inspection, weighing the mice, and giving them a glucose tolerance test; then subjecting them to a battery of tests of innate immune performance by macrophages; then immunizing them and doing flow cytometry to assess adaptive immune development and performance. We then do a DSS challenge, we infect them with mouse cytomegalovirus, and then they are passed on for other screens in the area of neurobehavioural responses. As of June 28, 2015, we had created nearly 64,000 mutations in this way, and now it's no longer a blind process. We know what every mutation is, and we know what genes they affect. These mutations fell into 17,204 genes, or roughly two-thirds of the 24,981 genes that the mouse has in total. Now, this is an enormous number of mutations. If they were all present, even in the heterozygous state, in one G1 mouse, they would almost certainly be lethal. But, of course, they're distributed among more than 1,000 pedigrees. We are able to calculate that we've mutated 17% of all genes to a state of phenovariance (I'll come back to what I mean by that) and tested them in the homozygous state three times or more, in at least one of our screens. In all, we have about 135 screens, and that's what most of the mice were subjected to. Where adaptive immune performance alone is concerned, we came across 60 genes that were already known to be required for immune development or function, and we detected them by phenotype. But along with those, we found hundreds of genes that were previously not known to be involved in immunity. And all this suggests that a large fraction of our genome is needed for immune defence, as I would've guessed.
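One can also get a feel for how such numbers relate to saturation with a deliberately crude model. The sketch below assumes every coding mutation lands in one of the 24,981 genes with equal probability, which is not true (genes differ greatly in coding length), so it overestimates the number of distinct genes hit; the gap between its output and the 17,204 genes actually observed illustrates exactly that.

```python
# Crude gene-saturation estimate: every coding mutation assumed equally likely
# to land in any of G genes. Real genes differ in coding length, so this
# overestimates how many distinct genes a given number of mutations will hit.

G_TOTAL = 24_981          # protein-coding genes in the mouse, per the talk
N_MUTATIONS = 64_000      # coding mutations created as of June 2015, per the talk

def expected_genes_hit(n_mutations, g_total=G_TOTAL):
    """Expected number of distinct genes hit at least once (uniform-target model)."""
    return g_total * (1.0 - (1.0 - 1.0 / g_total) ** n_mutations)

print(f"Uniform model: ~{expected_genes_hit(N_MUTATIONS):,.0f} genes hit at least once")
# Prints roughly 23,000, versus the 17,204 actually observed; the difference
# mostly reflects the unequal coding lengths (and mutability) of real genes.
```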
But now we're in a position, maybe, to make more precise estimates about just how large that fraction is. To look through these data, one needs software that lets the observer explore all of the mutations. And we wrote a programme called Linkage Analyzer, and a browsing programme called Linkage Explorer, that make that possible. One may focus on any particular gene, on any screen of interest, on a subset of the mice, or on a trivial phenotype name. One can restrict the search to different types of mutations, and one can insist on looking only at large pedigrees if he or she wants to. The number of observations in the homozygous state can be controlled, and the observer also chooses the p-value of association between the phenotype and the genotype of interest, by altering this value here. And there are other ways, as well, to restrict the quality of the observations. To give you an example, we might say that we're interested just in assays having to do with CD8 cells, with their number or activation state. So you key in CD8 under the screen name, we insist on seeing the mutation three or more times in the homozygous state, we insist on a relatively strong p-value of association, 0.0005, and we check these other items as well, I won't go through all of them, and we click "submit." Then, in short order, you get back a list of genes, in this case a list of 102 variant alleles of 100 implicated genes that come from 70 different pedigrees. From this fact alone you can see that we don't always resolve to a single mutation; sometimes we have linkage of two mutations that fall within one linkage peak, as you might guess. But usually we get down to a single implicated mutation. In the first column you see gene names, and some of these, if any of you are immunologists, will be familiar to you. Themis is known to be involved in positive selection of T cells, and it shows up in a screen for CD8 cells or the CD4:CD8 ratio. Some are unknown. I doubt any of you are aware that SNRNP40, which is a component of the U5 spliceosome, has a selective role in immunity, but it does. In the next column you see the coordinates of the mutation and estimates of what the mutation does, and you also see the screens in which scoring was observed. Then, farther over, you see the number of observations of homozygous reference-allele mice, heterozygotes, and homozygotes for the mutation. In these three columns, you see the score for linkage in either an additive, a recessive, or a dominant model of inheritance. And if you want to actually look at the plot of inheritance, you click on one of those numbers, and you see a Manhattan plot. This is a log-scale plot (-log10 p) of the probability of linkage, and you might mouse over any of the mutations that you see here; all of these are the mutations in the pedigree, and only one of them shows strong linkage above the Bonferroni correction line. If you mouse over it you find that this is SNRNP40. You might not know what SNRNP40 is, so you can click on that and you get some information about the gene, which has been precalculated. You see that our mutation makes a shortened version of the protein, and this would be interactive in the real programme, where you could mouse over and see the domain structure. You can click on the gene model, and you find that the mutation is near exon 5 and is believed to remove exon 5, which creates an in-frame product. And you find much other information as well; it's all been pre-calculated.
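As an illustration of the kind of computation such a tool performs, here is a minimal Python sketch of a genotype-phenotype association scan over one pedigree. It is not the actual Linkage Analyzer code: the data are simulated, the statistic is a plain linear regression under additive, recessive, and dominant codings, and a Bonferroni threshold is applied across the mutations in the pedigree, as described above.

```python
# Hedged sketch of a per-pedigree genotype-phenotype association scan.
# Not the actual Linkage Analyzer: simulated data, ordinary linear regression.

import numpy as np
from scipy import stats

# Genotype at one mutation site: 0 = homozygous reference, 1 = heterozygous, 2 = homozygous mutant.
CODINGS = {
    "additive":  lambda g: g.astype(float),
    "recessive": lambda g: (g == 2).astype(float),
    "dominant":  lambda g: (g >= 1).astype(float),
}

def association_pvalues(genotypes, phenotype):
    """P-values of linear association between one mutation's genotypes and a phenotype."""
    out = {}
    for name, code in CODINGS.items():
        x = code(genotypes)
        # Skip codings with no variation in this pedigree (e.g. no homozygotes observed).
        out[name] = stats.linregress(x, phenotype).pvalue if np.ptp(x) > 0 else 1.0
    return out

def scan_pedigree(genotype_matrix, phenotype, alpha=0.05):
    """Test every mutation; genotype_matrix has shape (n_mice, n_mutations)."""
    n_mut = genotype_matrix.shape[1]
    bonferroni = alpha / n_mut                 # the "Bonferroni correction line"
    hits = []
    for j in range(n_mut):
        pvals = association_pvalues(genotype_matrix[:, j], phenotype)
        if min(pvals.values()) < bonferroni:
            hits.append((j, pvals))
    return bonferroni, hits

# Toy usage: 40 G3 mice, 60 mutation sites; mutation 3 depresses the phenotype recessively.
rng = np.random.default_rng(0)
G = rng.choice([0, 1, 2], size=(40, 60), p=[0.25, 0.5, 0.25])
pheno = rng.normal(20.0, 2.0, 40) - 8.0 * (G[:, 3] == 2)
threshold, hits = scan_pedigree(G, pheno)
print(f"Bonferroni threshold = {threshold:.1e}")
for j, pvals in hits:
    print(f"mutation {j}: " + ", ".join(f"{m} p={p:.2e}" for m, p in pvals.items()))
```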
If you'd like to see the authentic data, the raw data, you can right-click on the peak value, and there you see the phenotypic performance of the homozygous mutants, the heterozygotes, and the reference-allele homozygotes. Now, there's overlap between the heterozygotes and homozygotes. This would've been a terrible thing to try to map using a qualitative approach, as we always did in the old days, and you might not even completely believe these data because, after all, we have only a limited number of mice here. You might think it's some kind of a fluke. But you have to keep in mind that gradually, as we approach saturation, we hit the same genes over and over again, and the computer detects this and generates "superpedigrees" whenever it occurs, either with identical alleles transmitted from the same G0 or with different alleles that hit the same gene. They get combined into a single large artificial pedigree. Eventually all mutations will be incorporated that way. At present, more than half of all genes fall into superpedigrees, and the number is climbing quickly. With multiple alleles, confidence about an association between phenotype and genotype increases. The same kind of browsing programme is used to examine superpedigrees. In the case of SNRNP40, which I've looked up for you here, we have a total of 16 pedigrees, 16 G1 mice, but only four different alleles; there is the one I showed you and three others. The one I showed you is a probably null allele, by our estimation. And again, you have the same kinds of models of additive, recessive and dominant inheritance. If we click on one of these, we see something rather different. We find that now we've assayed 376 G3 mice from all of these pedigrees; here are all the mutations contained in all of the pedigrees, those in dark blue have multiple alleles of their own, and if you mouse over the top value, that of course is SNRNP40, shown in red. You see now there's really no ambiguity: you have a clear shift in the phenotypic performance of homozygotes compared to heterozygotes or reference-allele homozygotes. And that's what gives this spectacular p-value. I would say, in this case, you don't even really need further confirmation, but our standard procedure is to make a CRISPR-targeted allele in every case. Now the great question that we can address nowadays, which we couldn't before, is: how much damage have we done to the genome with our 64,000 mutations? If we just concentrate on one screen, the CD8 screen, we see that many genes were mutated in the pedigrees that were examined, but not all of these mutations were transmitted even once to homozygosity. Mutations in 45.9% of genes, however, were transmitted to homozygosity three times or more. That says nothing about how much damage those mutations caused, but we can look at that as well. If we're talking about probably null alleles, premature stop codons or critical splice junction errors, then nearly 6% of all genes have been affected and examined three times or more in the homozygous state. This would be a very conservative estimate of how much damage was done. If we consider probably null or probably damaging alleles, where "probably damaging" is established by a programme called PolyPhen-2, which guesses how much damage a particular amino acid change does, then we find that 24.91% of all genes were mutated and examined three or more times in the homozygous state. This would be a very liberal estimate of how much damage was done to the genome.
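The bracketing itself is straightforward bookkeeping. Below is a minimal Python sketch of how such a tally might be made; the record layout, the effect categories, and the PolyPhen-2 cutoff are invented for illustration and are not the lab's actual schema, but this kind of count is what produces the conservative and liberal percentages quoted above.

```python
# Sketch of tallying conservative vs. liberal estimates of genomic "damage".
# Record fields, effect labels, and the PolyPhen-2 cutoff are illustrative assumptions.

from dataclasses import dataclass

TOTAL_GENES = 24_981          # mouse protein-coding genes, per the talk

@dataclass
class Allele:
    gene: str
    effect: str               # e.g. "stop_gained", "splice_site", "frameshift", "missense"
    polyphen: float           # PolyPhen-2 score for missense changes (0 benign .. 1 damaging)
    times_homozygous: int     # times observed homozygous in at least one screen

def probably_null(a: Allele) -> bool:
    return a.effect in {"stop_gained", "splice_site", "frameshift"}

def probably_damaging(a: Allele) -> bool:
    # The 0.95 cutoff is an assumed stand-in for PolyPhen-2's "probably damaging" call.
    return probably_null(a) or (a.effect == "missense" and a.polyphen >= 0.95)

def percent_of_genes(alleles, qualifies):
    genes = {a.gene for a in alleles if qualifies(a) and a.times_homozygous >= 3}
    return 100.0 * len(genes) / TOTAL_GENES

def damage_bracket(alleles):
    """(conservative, liberal) percentages of genes hit and tested 3+ times homozygous."""
    return percent_of_genes(alleles, probably_null), percent_of_genes(alleles, probably_damaging)

# Toy usage: each qualifying gene is counted once, whichever allele qualifies it.
example = [
    Allele("Snrnp40", "splice_site", 0.0, 4),
    Allele("Themis",  "missense",    0.99, 3),
]
print(damage_bracket(example))
```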
And the true value, we know, is bracketed by these two estimates. We don't know exactly where it lies between them, but we have a sense that it's somewhere near the middle, and so we're inclined to say that about 16 to 17% of all of the protein-coding genes have been mutated to phenovariance. We're still quite near the beginning of the process, of course, and we could draw a red line and say that these are the conservative and liberal estimates of damage. As time goes on, these two curves will converge, though they'll never quite touch, and we'll always be somewhat in doubt about the exact amount of damage we've done to the genome. But what have we accomplished? In the old days it took us five years to positionally clone just one gene; now it takes about one hour. Then, one phenotype was solved in five years; now, one or two phenotypes are solved every day in our lab. That means we proceed about 3,000 times as quickly as before. We're limited now only by the rate at which mutations can be produced and screened, and this means that we can interrogate about 1,400 mutations every week. Many of them, about 0.5% or so, cause a phenotype in at least one of our screens. We can project that we'll destroy the majority of genes and analyse their phenotypic consequences within about three years, and then we'll know what most of the genes are that are needed for robust immunity, as we define it. This story was a very long one in terms of time, and I especially have to credit Alexander Poltorak for the positional cloning of the Lps locus. I now have a much larger group than I did back in those days, and it's mainly the present group, the computational people Chun Hui Bu, Stephen Lyon, Sara Hildebrand, David Pratt and Xiaowei Zhan, who deserve the credit for the automated positional cloning that I showed you. And we had help as well from Tao Wang and Yang Xie in the Center for Computational Biology. Thanks very much to all of you.