Panel Discussion (2014) - Large Data and Hypothesis-Driven Science in the Era of Post-Genomic Biology; Panelists Hoffmann, Bishop, Beutler, Schmidt

Moderator: Thank you for coming, and of course thanks also to the panellists, but I will introduce them in a second anyway. So we will talk about - or better, discuss - the issue of Big Data and hypothesis-driven research. Is it only hypothesis-driven research that counts? Or is it hypothesis-generating Big Data, after the whole genome project and many other of these omics projects? Omics, Big Data, systems biology, quantitative biology, computational biology - these are the buzzwords of modern biology. And we want to know whether this reflects a paradigm shift from hypothesis-driven to data-driven biology, or whether it is just hype about something that has come up but is not really helpful. We'll see. Are the two mutually exclusive, or are they complementary? We have 4 panellists here. They are all Nobel Laureates, 3 in Physiology or Medicine and 1 in Physics. And I will briefly introduce the 4 laureates who have volunteered to discuss these issues with us. Jules Hoffmann from Strasbourg, on my very left and your very right, received the Nobel Prize in 2011 for the discoveries concerning the activation of innate immunity. And on my very right and your very left, Bruce Beutler, who jointly received the prize in the same year on the same topic; he is from the Center for the Genetics of Host Defense at the University of Texas Southwestern Medical Center. On my right here, J. Michael Bishop, still at the University of California; he received the Nobel Prize in 1989 with Harold Varmus for the discovery of the cellular origin of retroviral oncogenes. We heard all 3 of them on Monday. And Brian Schmidt from the Australian National University, Weston Creek, 2011 Nobel Prize in Physics, for the discovery of the accelerating expansion of the universe through observations of distant supernovae. Sounds interesting to me, and I can't say much about it. So Bruce, can I start with you? If I recall your presentation correctly - and also the work you did, which I followed before you got the Nobel Prize, on TLR4 as the LPS receptor, the Toll-like receptor for LPS - it took you years, and it was a kind of genetic approach. One could argue it is an unbiased approach. And the data you presented to us were more or less: you made mutations, you found mutations, and then you looked at whether they are involved in inflammatory bowel disease and other diseases. I recall that IBD was on your slides. So how do you envisage Big Data? Is it hype? I would like to offer a citation from Sean Carroll, a physicist at Caltech who actually did his PhD in the same lab as Brian Schmidt. He once said, "A hypothesis is not just a simple tool, it's needed to understand the system." Now, where do you stop? You're generating data, you're generating more data. Do you want to understand the mechanisms? Are you going into that, or are you satisfied if you see that a certain gene is playing a role in a certain disease? Bruce Beutler: Well, of course we always want to understand the mechanism. I think of myself as a reductionist. I start with a phenotype really - not with a mutation but with a phenotype. Something very complicated, a mouse that runs in circles, let's say. And what I would say is that it's not necessary to have a hypothesis to make quite a bit of progress: to come down at least to the level of the gene and the mutation, and to say that this mutation is required to get this kind of behaviour or this kind of immunologic effect, whatever the variant you're looking at might be. There's nothing wrong with hypotheses, of course.
But my point would be that one can go very far without them. And this doesn't make any comment on whether it's Big Data or small data or intermediate data; there really is no dividing line there that any one of us could define. It's a question of whether work is driven by hypothesis or whether it's hypothesis-free. And both definitely have their place, in my view. Moderator: So with all the data you're accumulating currently - I guess, at least, because we do similar projects and I'm often almost suffocated by all the data that come up - are you going through them step by step? Do you have a whole team of computational biologists? Bruce Beutler: No. Once we have a collection of mutations, the hope is that they are mutually reinforcing in some way, and that we can make better sense out of, let's say, 20 mutations that cause the same phenotype than out of one. And beyond that, if we have multiple alleles of many genes, that's more helpful too. One can still carry on with the genetic approach. For example, one can do a suppressor screen, one can look for epistatic effects between mutations - you can go very far that way. But certainly we don't shy away from looking at things with biochemical approaches or cell biological approaches too. Those definitely have their place. Moderator: Jules, you got the Nobel Prize in the same year on the same topic. You benefitted clearly from insect genetics. But somehow I always had the feeling that hypothesis-driven research is the one you pursue very eagerly. And let me just cite Chris Anderson, in Nature Methods 2 or 3 years ago: "Biology is too complex for hypotheses and modelling. The classical approach is dead." Would you agree? Jules Hoffmann: No (laugh). What I would like to say is that most of the young people here to whom we're talking have heard the story which I gave the other day. In the beginning the question was clearly: what explains the resistance of the insects? So it was a question, it was not a hypothesis. We didn't know what the answer would be. And at that time you couldn't collect Big Data; it was not yet the Big Data era. So we had to work along. And as I mentioned, we were lucky that someone had approached this in an unbiased way, and that was Nüsslein-Volhard, who had worked out the dorsoventral pathway. And then, through the NF-κB story, we got into that. The first part of our work was not totally unbiased, because we were asking, 'Is this cascade reused in immunity?' That's not totally unbiased. But then, when we did not understand how the antifungal and anti-gram-positive bacterial defences worked, we resorted to an unbiased mutagenesis screen ourselves. So obviously our laboratory has always been very strongly orientated toward unbiased mutagenesis. That's still not Big Data. But then we ran into problems - Big Data, or small Big Data, problems - when we were looking at the interactions of these molecules, using the appropriate techniques for identifying the proteins that interact with the ones we had found by genetics. And there we ended up finding 400 proteins, most of which are required for the activation of the system. And now we're in a situation where - you know, when you have 400 candidates, and you know the sequences, thanks to the Big Data of Drosophila genome sequencing - we know the identity of the molecules but we do not know their function.
So here we are now in a situation where we have to go, one by one, through experiments. I'll conclude on this now. For me it's a little bit artificial at this stage, at least in the work which we are doing, to make a distinction between the one and the other. In the beginning there's always a question, with other questions. You can say it's a hypothesis, but you can have a question without having a hypothesis about how to solve it. And then, after that, when you come to have many, many candidates, you have to really painstakingly, labour-intensively go through each of the candidates. Moderator: Well, I think nowadays you could also, just by correlation, find a relationship between, let's say, a mutation and a phenotype. And that would not necessarily be in need of any hypothesis or model. Jules Hoffmann: No, that can speed things up. But it's not a demonstration, you see, and it will not be accepted - eLife would maybe accept it, but not the other 3 (laugh). I'm kidding, because I know he has left. I'm on the editorial board of eLife, so I'm not speaking against it; I was just making a joke. Moderator: Good. So you started your Nobel Prize winning career clearly with a hypothesis: there is a healthy cell, it contains a proto-oncogene, and when that is triggered, cancer may develop, or will develop. The model you used was very carefully selected, Rous sarcoma virus. And still, cancer research has had its ups and downs. Many promises were made. Therapies were proposed. Cures were proposed. And still we're at a very early stage. More recently the US has initiated The Cancer Genome Atlas. And I view that as a really unbiased approach. There is a mutation; you may associate it with a cancer. You don't necessarily need to know the underlying mechanism, but you could still use it for predictive diagnostics. You could perhaps use it as a target. So is that the way to go? J. Michael Bishop: It's not the sole way to go, no. I'd like to make a preparatory remark here. I think that these 2 categories don't fully define how a scientific discovery is made. For example, if you read in the history of science, as I do, you're going to encounter repeatedly not only examples of, but repeated expressions of, the aphorism that serendipity is extremely important in scientific discovery. And that can arise in the context of looking at large amounts of data. But more commonly it arises in the experimental laboratory, where you do an experiment with an unexpected result and it leads you off on a new path. So it can be hypothesis-driven in the sense that you start out with a hypothesis, but it leads you somewhere that you didn't at all anticipate or expect. And so I think these categories are a little restrictive. Now, about how Harold and I began. We began with a much simpler hypothesis than what you outlined. Our sole hypothesis was that maybe the viral src oncogene was acquired from the cell. I never remotely dreamed that if that experiment was positive it would lead, along with subsequent events, to where we now stand in cancer research. The initial hypothesis was very limited and eminently testable. And it turned out to be correct. But it wasn't as elaborate, it wasn't as anticipatory, as you described it. Now, that brings us to the current day. The first inventory of cancer genes was compiled in 1989 for my Nobel lecture, and now we have hundreds of them because of genome sequencing. And the objective is to get an absolutely complete inventory.
And people who are better at math than I am - other participants in The Cancer Genome Atlas project, or the international projects, which I prefer to think about - have projected that to really saturate the list we're going to need a million cancer genomes. And I've alluded in another context to the fact that we already have some rough idea about this, because saturation mutagenesis of the sort that reveals cancer genes in the mouse had, by the time they quit, yielded 2,000 potential cancer genes. So we're not there yet with humans. So that's Big Data. A million - I wrote this down - a million cancer genomes is 10^18 bytes. That's a lot of data to store, manage and manipulate. The estimated price is a hundred million dollars a year, even before we start using these data in some way. And I have to say that the analysis is way behind the data collection in this project. We don't have enough bioinformaticists at the moment.
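A quick back-of-envelope check of Bishop's figure (a minimal sketch in Python; the roughly 1 terabyte of raw sequence per tumour genome is an illustrative assumption, not a number given in the discussion):

```python
# Back-of-envelope: does a million cancer genomes come to ~10^18 bytes?
# Assume ~1 TB of raw reads per tumour genome (high coverage plus a
# matched normal) - an assumed figure, for illustration only.
bytes_per_genome = 1e12   # ~1 terabyte, assumed
n_genomes = 1e6           # the projected million cancer genomes

total_bytes = bytes_per_genome * n_genomes
print(f"{total_bytes:.0e} bytes")  # 1e+18 bytes, i.e. about an exabyte
```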
J. Michael Bishop: But the point is that, yes, it's Big Data. And you could say the bias is that there are cancer genes there to be found. But what it does is fuel experiment. Because once you see an association - and association is not proof - once you see an association between a gene and a particular cancer, you've got to go back to the laboratory and ask, by one means or another, whether this gene has the capability to participate in tumourigenesis. Now, will that be done for all 2,000? I rather doubt it. But Big Data can fuel hypotheses and lead to more experiments. And in the case of cancer it's a virtuous cycle. You start with Big Data, say the cancer genome, and you identify a candidate oncogene in breast cancer by virtue of association - the high frequency with which this particular gene is mutated, in a way that activates it, in breast cancer. Then you go to the lab. You verify it. Or, if you're really brave, you go ahead and start developing a drug against it. And you take the drug into the clinic, and it works for a few months, and then resistance emerges. And what's the best way to figure out the resistance? Back to the genome. That's what we hope to be able to do eventually: to use the full genome sequence of patients' tumours, not only to identify the targets but to anticipate the kinds of resistance to the drug that are already built into the cell, which we simply couldn't anticipate from our current knowledge of the signalling pathways. So the 2 are complementary. One can fuel the other. But you can also be badly fooled by Big Data, and there are a couple of good examples. My 2 favourites are, first, the famous Google Flu Trends. Maybe many of you are aware of this. Google thought they had outdone our so-called centre for communicable diseases in the United States, because they were screening for what they considered signs of flu and were predicting the size of epidemics. And for a year or 2 it looked like they had it. And then all of a sudden it just fell apart. What happened was that when people who knew what they were talking about looked at the metrics being used, they were way too squishy. They didn't discriminate the common cold from flu, etc. Another, more rigorous example concerns statins - I presume you all know about statins, cholesterol-reducing drugs. What was noticed in a Big Data assessment was an association between the use of statins and 2 unanticipated medical outcomes: relief of the reaction to acute sepsis, and relief of a very refractory disease known as chronic obstructive pulmonary disease. The data were really quite suggestive. Large clinical trials were mounted to test this. The results were reported just within the last month, and both were a big flop. The association was completely spurious, for whatever reason. So it just dramatises the fact that you can get ideas or hypotheses from Big Data, but you have to be pretty careful about the idea that mere association proves cause. It simply does not. Moderator: I fully agree, particularly on the flu example with Google. At the same time, it shows you that by mere algorithms you can come to a conclusion. The question then is whether it is the right one - but of course there are also algorithms that predicted the right conclusion. Perhaps that's the time to go to Brian Schmidt and astrophysics. I understand that you look at the universe and draw conclusions about its expansion. That must be Big Data. Actually, in the context of looking into this - because you just mentioned Google - I found out that a googol means 10^100. Is that the dimension you are talking about? Brian Schmidt: No, 10^100 is more than the number of electrons in the visible universe. So that's a big number. But you know, in astronomy we've been working with Big Data, out of necessity, for as long as we could. As people who saw my talk know, in 1994 I had to deal with 20 to 30 gigabytes a night. And that was Big Data back then. Of course, I take that in about 3 or 4 minutes with our telescopes now. But we're dealing with petabytes, and 10s of petabytes, and even, you know, exabytes, which is 10^18 bytes. And we can go bigger than that. But we're limited ultimately by power, and computers, and money. Really it becomes money. And the reason we do this is that it's useful. Big Data is a tool, and it's one of the tools in our arsenal. When you have an experiment, either in physics or astronomy, and you want to, let's just say, propose a hypothesis, sometimes you need to shine a light out so you can even make a hypothesis. So I'll give you an example. Right now we have a complete theory of particle physics. It was completed with the Higgs boson. We can make hypotheses beyond it, but they're completely unfounded, or almost unfounded. What we need to do is go out and search the universe for new particles, because we have a hint they might exist. So our hypothesis is: if we take literally petabytes of data a second and sift through it all, we might find a hint of a new particle. And once we see that hint, we can start building a model, which we can then further test. A very similar analogy to what we're talking about here with, for example, associations. Now, I'm going to take a slightly different point of view. If your statistics are strong enough, then you have proven association. If I show an association with a significance of, you know, twenty-two 9s, then I'm sorry, you have proven it. We may not know what it means, so it may not be that interesting, but it gives you the ability to go out and essentially say: we need to look at that further. In physics, you'd put such a thing in a footnote to a table and say, well, maybe this is going to be interesting, but we don't care yet. In biology you very rarely have four 9s, and when you do, you think you really have something. We think we might merely have something with four 9s. And the reason is that people ask questions of data all the time. There are 600 people in this room right now. If you all ask the data 10 questions each, then we're going to get a 3-9s result right here in this room just by chance. And so that's the danger of Big Data: you have to go through and really think through the statistics, through what we would call a Bayesian framework, where you take into account the fact that we're asking questions of the data. How many of them are we asking? And if you screw up, you find false things that don't bear themselves out. So you have to raise the standard of the statistics. And that's the hard part; it's a different framework. But I would say that it does give you that searchlight. And the other thing it gives you is a very powerful diagnostic to test your hypothesis. Rather than a black and white yes or no, which would be very convenient - I'm afraid most of the time we don't get that. We have tiny amounts of signal in the data. But when I add it up a billion times, then I have 99.999999% surety of my answer. And that's the other value of Big Data: you can squeeze out information where otherwise you just can't see it by eye. So it's different.
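Schmidt's arithmetic here is easy to reproduce with a minimal simulation (a sketch in Python, assuming each question is an independent test of a true null hypothesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# 600 people each ask the data 10 questions, and nothing real is going
# on: every null hypothesis is true, so p-values are uniform on [0, 1].
n_tests = 600 * 10
p_values = rng.uniform(0.0, 1.0, n_tests)

# "Three 9s" significance corresponds to p < 0.001.
hits = int(np.sum(p_values < 1e-3))
print(f"expected false positives: {n_tests * 1e-3:.0f}")  # 6
print(f"observed in this simulated room: {hits}")

# Raising the bar to p < 0.001 / n_tests (a Bonferroni-style correction)
# is the simplest version of "raising the standard of the statistics".
```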
Moderator: So can I ask you actually...? J. Michael Bishop: Can I just put a wrinkle on what he said about statistics? I absolutely agree. And obviously that's the way the genome data are being used to call potential oncogenes. However, the ugly truth about biology is that the gene which you call on very sound statistical grounds may have been selected for as a secondary consequence of the neoplastic phenotype. Granted, the searchlight picked it out and you're obliged to pursue it. But it doesn't at that point prove a direct driver role in the cancer. Brian Schmidt: I agree, it's not direct. J. Michael Bishop: It could be a resistance gene, it could be any of several other sorts of genes. That was... Brian Schmidt: It certainly could be indirect, absolutely. But it gives you that searchlight, and then you have to go and work out what it actually means. J. Michael Bishop: If I didn't think it gave a searchlight, I wouldn't be an advocate for the genome project. And I'm not a genomicist at all. But I'm an advocate for the project. Moderator: Can I then ask you a question which goes more, perhaps, towards Big Data in medicine? As you said, there are 600 people here; let's say we have 20 parameters for each individual. That already makes it almost impossible to find statistics easily in a small group. So how do we solve that? Is there a way? I mean, do we need clinical trials with 50,000 individuals? And who pays the bill? Brian Schmidt: So one of the things you can get away with, with Big Data, is that you avoid the clinical trial. You put everyone's records in - this isn't very popular with privacy advocates, but I think some countries will do this - and you do it on a billion people. And I think you'll be able to circumvent some of the clinical trials, which are amazingly expensive, until the VERY end. So you've really gone in and you say: I've done this, it's not controlled, but you have so much data that you can actually take out the other factors. And so I would think there's a lot of scope for doing that. You're still probably going to need a clinical trial at the end. But I think you can have those be 99.9% sure. Moderator: So you would tell us to take observational studies on data that are already around? Brian Schmidt: Yeah, but you want billions of samples. Not hundreds. Moderator: Do you know how many people live on this earth? Brian Schmidt: Yeah, it would be really good to have all 7 billion.
And I predict, you know, that in 30 years we're all going to have our genomes there. And that will be part of the deal. Moderator: You may have to exclude Germany for certain reasons; they are very strict on this. J. Michael Bishop: And I'd like to add to that. One thing that's already becoming apparent is that Big Data on health care records has led to the repurposing of a number of existing drugs in very productive ways - unexpected associations of the sort I mentioned. Those were flops, but there have also been successes in repurposing drugs that came right out of very large population surveys. And they're just going to get bigger, with the caveat about the privacy issues, which we have to solve. Brian Schmidt: And I was talking on the first day about how you have all these phase 2, phase 3 studies where you have information. And when you add all that information up downstream, you may well, as you say, figure out another use for a drug, where you suddenly realise that actually people with this genome sequence can use it. We just didn't have enough information to begin with. So I think adding all that information together allows you - it doesn't fix things, you still have to ask the right questions - but it gives you a great tool. Bruce Beutler: I just want to make the comment that we shouldn't lose sight of the fact that the quality of the data matters greatly also. And it corresponds to the resolution of the experiments you do. Furthermore, if we're talking about needing Big Data to test a drug, for example, then it must have a very weak effect. A good drug, a drug with a strong effect, can be detected with a rather small number of individuals. Brian Schmidt: That's true, but let's say it's almost 100% effective on people who have a particular sequence in their DNA, 1 in a 1,000. Then Big Data allows you to pick that out, and that's the whole idea of personalised medicine, where you might be able to save some drugs. Who knows how well it will work; it really depends on how the medicine works.
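Beutler's point about effect size and sample size corresponds to a standard power calculation; a minimal sketch using a normal approximation (the effect sizes are invented for illustration):

```python
# Patients per arm for a two-sided test at alpha = 0.05 with 90% power,
# as a function of the standardized effect size d of the drug.
from scipy.stats import norm

def n_per_arm(d, alpha=0.05, power=0.90):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(round(2 * (z / d) ** 2))

print(n_per_arm(1.0))   # strong drug effect: ~21 patients per arm
print(n_per_arm(0.05))  # very weak effect: ~8,400 per arm - Big Data territory
```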
Moderator: This is probably, as you said, the next step: personalised medicine. And you were talking about the kind of wonder drug that works in everybody. So this was a kind of first round - an appetiser. I wonder whether there are already questions or comments from the audience, which I would very much encourage. Do you want to make a comment first? J. Michael Bishop: No. I express my concern that maybe we've given them indigestion instead of an appetite. Moderator: Okay, good. There are many questions, and there are 2 microphones, left and right. So you would have to go to the microphone, please. Question: My question is to Professor Bishop. You said that Big Data is often considered synonymous with systems biology. But systems biology has another component, which is the bottom-up approach: taking a few genes or proteins, studying their interactions, building a gene regulatory network, and then studying the dynamics and moving up. In that way we are not blindly following Big Data. Because if a car has broken down, Big Data will tell us that the steering is not working, the wheels are not working, this is not working, that is not working - it doesn't tell us what exactly is broken; it says nothing is working. The bottom-up approach, on the other side, tries to identify what exactly the dynamics of the system are. Maybe it starts with a smaller case and then tries to build up. So how do you think this systems biology term, which has been misused in order to promote Big Data politically, scientifically, etc., is actually a hindrance to the actual progression of understanding? Moderator: If I may just add: I wanted to have systems biology as a full round next, but still, if you want to give a short answer to that. J. Michael Bishop: There is no short answer to that. If I heard you correctly, you equated Big Data to systems biology. If so, that's incorrect. The second problem is that I want to know how you define systems biology, because everybody I ask gives me a different definition - including a good friend of mine who is the chair of a systems biology department at a leading university. He can't define it for me yet. So I just don't know how to answer your question, frankly, because it seems to me there was some confusion of definition in it. Could you put it more succinctly? Moderator: Can I ask that we do that in the next round, when we go into more detail about that - sorry, is that ok? Thank you, next question. Question: I have a question for Doctor Bishop also. Cancer genomics, or Big Data, helps people to find new oncogenes or new tumour suppressor genes. But on Monday, in your discussion session, we discussed that for the new cancer therapies, maybe synthetic lethality genes could be the potential targets. Do you think we can find these synthetic lethality genes - for example for the myc oncogene, or for p53 tumour suppressor mutations - from this Big Data? Because to me it's not at all obvious that we can find synthetic lethality genes that way. J. Michael Bishop: Sincerely, this is a somewhat specialised question, and we might want to discuss it in private - you and I have already had discussions in private about this. Synthetic lethality is a phenomenon discovered in microbes. You can have 2 innocuous mutations, A and B, that have a mild phenotype or no phenotype. If you combine them in the same organism, the same microbe, they're lethal. In a therapeutic setting, the oncogene is mutation A and the therapeutic is mutation B. And the therapeutic, in the synthetic lethality approach, is not directed at the target. It has been either randomly selected, by techniques I can talk about, or deliberately chosen. And it hits something else in the cell which, in combination with the oncogene, is lethal. That's synthetic lethality. And its virtue as a therapeutic approach is that you can hugely increase the number of drugs available to treat a certain genetic lesion. Because you're not targeting that lesion, you're targeting many other things in the cell, some of which will have a synthetic lethal interaction with the overexpression of the gene or the deficiency of the gene. And there is one drug in clinical use that is used for BRCA deficiencies in the synthetic lethal manner. So there is a Big Data of sorts involved in this. You can do genome-wide screens. You have a cell line in the lab that has a mutation in ras, for example. And then you use that line, and you take a library of RNAi that represents the entire genome, probably redundantly, and you screen for RNAi's that will kill the cell that has the mutant ras gene in it. And you will come up with - Steve Elledge has done this - you will come up with dozens of hits, none of which are affecting ras; they're affecting other genes in the genome. They become candidates for synthetic lethal targets.
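As a toy illustration of the hit-calling logic Bishop describes - not his or Elledge's actual pipeline - one might compare viability after each knockdown in the ras-mutant line against its wild-type parent (gene names and numbers are invented):

```python
# Viability (fraction of cells surviving) after RNAi knockdown of each
# gene, in a ras-mutant cell line and in its wild-type parent line.
mutant_line   = {"gene_A": 0.15, "gene_B": 0.90, "gene_C": 0.20}
wildtype_line = {"gene_A": 0.85, "gene_B": 0.88, "gene_C": 0.25}

def synthetic_lethal_hits(mutant, wildtype, kill=0.3, spare=0.7):
    """Genes whose knockdown kills the mutant line but spares the wild type."""
    return [gene for gene in mutant
            if mutant[gene] < kill and wildtype[gene] > spare]

print(synthetic_lethal_hits(mutant_line, wildtype_line))
# ['gene_A'] - gene_B does nothing, and gene_C drops out because it is
# toxic to both lines (an essential gene, not a synthetic lethal partner).
```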
J. Michael Bishop: But then you've got to go through it all over again, find the off-target effects that are quite common with RNAi, and then you have to decide, if you know anything about the function of the genes, how you're going to target them. But in principle - and this is being done widely; big pharma has adopted this - you can do genome-wide screens for genes that have a synthetic lethal interaction with your preferred target gene: ras, myc, you name it. Question: So my question is: we still need to do the screening experiment to find the genes, right? As opposed to getting the information directly from Big Data or cancer genomics and directly coming up with candidates for synthetic lethality. J. Michael Bishop: I don't think we know enough about the signalling pathways to do that. We don't know nearly enough about the signalling pathways to make a prediction like that. That's what systems biology is about, actually. Question: You know, we have some of the most brilliant minds from around the world here in this room. But particularly... J. Michael Bishop: Out there in the audience, yes. Question: And on stage as well. But particularly for us Americans, you can't speak about Big Data and not have us think about the movie Gattaca. Some of you have seen it, and if you haven't seen it, even if you aren't into the science, it's a really good movie. But my question to you is: I feel that in the next couple of years we will not only have sequenced many of the genes involved in a lot of genetic diseases, but we will probably also have sequenced the genomes of much of our population. What do you think of the morality of being able to look at someone's genome and predict certain genetic diseases? Do you think there's a potential for discrimination based on the particular genetic diseases that you have? Will insurance companies deny coverage for someone who is predisposed to heart disease, some kind of defect causing stroke, or atherosclerosis, or all sorts of things? What do you feel about that? Moderator: Who wants to take that up? J. Michael Bishop: I'll start. That's not a problem of the future. We've had that problem. I mean, we had that problem with Huntington's disease families, etc. And it is a difficult problem, and I think our society is still working it out. But certainly genetic counselling is done routinely for Huntington's disease families. You're just expanding it to the somewhat less likely possibility that we can look at polygenic determinants of susceptibility and actually make any sense out of them. But right now we already have the problem, and I don't think we have a simple, straightforward solution. Frankly, we leave it up to the patients: do you want to know whether you're carrying Huntington's disease or not? And many say, "No, I don't want to know, because there's nothing to be done about it." And some say, "Yes, I want to know", for a variety of reasons. Moderator: So the patient is the owner of his or her ... J. Michael Bishop: Yes, exactly. Moderator: Anybody else? Jules Hoffmann: I was going to say it's not only a negative. Huntington's is a special case. But knowing can tell you that you have to change your diet, for instance, that you have to be careful with this and that. So it can help you - I wouldn't mind knowing, well, ok. The point I wanted to make is that we should not look at this purely negatively, as purely dangerous and purely frightening. I think it could help us. Do you get my point?
What's your opinion about it? Question: I personally think the human race has a history of discriminating for any reason. I think that if we find a reason to differentiate between people - based on their genes, based on their culture, based on their race - I feel that we will. We may be progressing in a lot of ways, but I feel we have the tendency to categorise people into schemas. And if my employer has the ability to take my DNA off a coffee cup, see that my life expectancy is 20 years less than someone else's, and I lose a job because of it - I feel that could be very negative. J. Michael Bishop: But that's correctable by legislation, and that's already happening. Bruce Beutler: I'd agree with that. It's an interesting point. I think that much good could come of knowing everyone's sequence, including possibly even understanding polygenic disease, if we knew the exact relationship of everyone in the world to everyone else in the world. It's not very elegantly said, but in any case we could follow phenotype. We could draw a lot of inferences that we can't now. But obviously there are potentials for abuse, and they would have to be guarded against. And I agree that it could be done legislatively. Brian Schmidt: I was just going to say it's clearly a role for government. We already use government to legislate against discrimination. And that's how it's going to have to be done. Because you're right: if you're an insurance company, you want to make money. Question: I'd rather trust you guys than our governments. Brian Schmidt: You're going to need to trust your government. Moderator: Thank you, next one from this microphone. Question: My first question was in a similar direction, about ethics and what we actually do if we have the whole genome. I think we answered that already. But I have a second question: what about doing personalised medicine? We have small subpopulations of patients, and we are trying that right now in industrialised countries. And we have a problem of cost. So what about the economics? How do we pay for that? And is it possible to do it if we have smaller and smaller groups? Moderator: Anybody want to comment on that? Economic issues. Well, it started out very expensive, the whole genome. We are now down to $500 or whatever per genome, but it's still a lot of money. Is that the way you wanted to phrase your question? Question: Yes. And also, if we are looking at oncology, sometimes we are talking about populations of 5% or 10% that might benefit from a treatment. So if we have a small group, we have a huge cost, because, well ... Moderator: Let me perhaps simplify it: if you look at developing countries, $500 is still a lot. Where is personalised medicine going, notably from the financial point of view? Anybody want to comment on it? Brian Schmidt: I guess I'd note that Big Data is becoming quite cheap. So pretty soon it's going to be $10 to sequence the DNA of, you know, anyone. It's going to be very cheap. It's going to be an adjunct to the information you use to help diagnose a patient. But that being said, just like now, you have to rationalise the cost. And so there will be economic forces at work about who can pay, just like there are now. There will be an economic rationale about which particular research is done.
If you've got 10 people across the world with a disease, and we know that if we spend $1 billion we can fix it, we're not going to do it, because it's just not cost-effective; we have other things that are higher priority. There will be a prioritisation. But the whole notion of Big Data, I think, has the opportunity to make things much, much cheaper than now. And so in the end everyone will be a winner. Moderator: So let me suggest that the queue is now closed. We have 3 here and 1 here, and then we go on to a second round. Actually, I'm very delighted that we get all these comments and questions from the audience. Next one. Question: I have a very basic question about the nature of hypothesis, and I hope it's not a trivial question. In this discussion about hypothesis-driven research and Big Data, it seems to me that even these screening approaches have a very simple hypothesis behind them: that you'll be able to see in the Big Data some associations, or maybe even a causative gene that comes up, associated with a phenotype. And my question is: are we talking about hypothesis? Can you define what you all see as a hypothesis? Or are we talking about causality, or maybe establishing causality, or is it something else? Bruce Beutler: I can give an example from the kind of screening that one does in forward genetics. Definitely there's a hypothesis, and the hypothesis is that the phenotypes we see have nothing to do with mutations. That's the null hypothesis. And you test it on every mutation in every animal you have, and you then look for things that appear to contradict the null hypothesis. We call that linkage. And this is just one example. But I think you could quote from R. A. Fisher, if I remember his words. He said, "Every experiment exists to give the facts a chance of disproving the null hypothesis." And that runs through all Big Data essentially.
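A minimal sketch of the null-hypothesis test Beutler describes, for a single candidate mutation (the counts are invented, and scipy's Fisher exact test stands in for a real linkage analysis):

```python
# Null hypothesis: the phenotype has nothing to do with the mutation.
# Cross-tabulate animals by genotype and phenotype and test that null.
from scipy.stats import fisher_exact

#                affected  unaffected
carriers     = [       18,          2]
non_carriers = [        3,         27]

odds_ratio, p_value = fisher_exact([carriers, non_carriers])
print(f"p = {p_value:.2e}")  # a tiny p-value contradicts the null: linkage
```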
Moderator: We can go into that later. I think it's getting into philosophy, and there are still some differences between a hypothesis and a null hypothesis. Let's go to the next question. Question: Well, I have a very similar question actually, about the definition of hypothesis. According to me, there are 2 ways you can make a hypothesis. One is to make a hypothesis based on, say, previous observations. The other way is to make a hypothesis based on developing a physical or a mathematical model. If you have a physical or a mathematical model, you can make powerful predictions, test your predictions, and have complete confidence about what kind of results you can obtain. But on the other hand, if you don't have, or cannot make, a physical or a mathematical model, then you are more restricted to making a hypothesis based on previous observation. For example, when you do a screen you make a null hypothesis, and then you ask whether your gene is going to affect whatever phenotype you're looking at or not. Can you elaborate on the hypothesis that is not based on a physical or mathematical model? How do you go about it? Brian Schmidt: I would argue that both are mathematical; one is just more complicated than the other. One is binary: that thing causes something to happen, a yes/no - it's a very simple model. And the other one is: I have a set of equations, and so that causes this function to react like that. They're both mathematical models. It's just the level of complexity; you can range anywhere from very simple to very complex. J. Michael Bishop: I want to point out that the overwhelming majority of progress in biology over the generations has been made with hypotheses that were not mathematically informed. And you've left an important variable out of the equation: imagination. Most hypotheses in my field come out of imagination - often, but not always, fuelled by pre-existing data. The hypothesis that Harold and I worked on, I tell you, was not based on any preceding data other than knowing that the viral oncogene existed. And Darwin. Those were the foundations of our hypothesis, nothing else. Brian Schmidt: But those are still observations. J. Michael Bishop: Yeah, sure. Question: I want to know: what about data that come in very slowly? For example, some kind of syndrome with a very low incidence, that comes up once every 100 or 150 years. And we cannot wait - I mean, it's a question of life for an individual. Doctor Schmidt said 30 years before we all have our genomes, but we know this other thing is not coming very soon. Is it ok to go to a hypothesis very quickly, or should we wait? And what should we do about small data that comes slowly, hidden within Big Data? J. Michael Bishop: It sounds like you're assuming that Big Data can be used to address any problem in biology. I'm not sure that's correct. Am I right? Is that what you're suggesting? Question: No, sir. I would like to know about the things that come in very slowly, on which we actually do not have Big Data even if we look for it. I mean, we would have to look through a large amount of data to find that small data. But for the collection of the small data - should we wait for those rare incidences and diseases? Should we jump to a hypothesis, or should we take the slower path? J. Michael Bishop: I'm in favour of jumping in to the hypothesis. That's what we did, and it paid off. And I can't think of any sort of Big Data that would have helped us with that particular hypothesis. Brian Schmidt: I would just say: don't presume you know how your hypothesis is going to be tested. You may have a very long vision and say it's going to take 75 years to test this hypothesis. And someone 6 weeks later will say: oh, actually, I have a short cut - and bam. So a good hypothesis, if it's well motivated: get it out there as fast as you can. J. Michael Bishop: Absolutely. Moderator: The queue is growing longer there. Let's take very brief questions. Question: It was mentioned, I think by Doctor Bishop, that the analysis lags behind the data production. So I was wondering if you have any solutions to that. Is it data sharing that needs to be more effective? Or publication of the data before publication of the papers? What are the solutions to speed up the analysis of the data? Bruce Beutler: I think sharing always helps. Making things widely available always helps. At some point, as I think I emphasised, people do have to make hypotheses to go beyond the initial discoveries or the initial observations. And there's nothing wrong with that. The only problem with hypotheses that I can see is that they run a little bit against human nature. People like to be right rather than wrong. And they should be testing hypotheses very stringently, and that isn't always the case.
I'm sure it doesn't apply to anyone in this room, but it does apply to some people, and it's the basis of a lot of error in science. Question: My question is: what are we going to do with all those thousands and thousands of variants with very small effect sizes that we detect in those studies? Each of them will have a very minor effect, but taken together they might actually be useful. What are we going to do with them? I mean, we cannot just go back to the lab and validate them and find their function. It's a hard task, right? Moderator: It's a question for a modelling, systems biology - forgive me - approach, where you look at it in its complexity rather than in a single pathway. But I actually did not want to go into answering questions - anybody? Jules Hoffmann: I think, Bruce, you're the man to answer that question. Bruce Beutler: I would just say that when you are doing some kind of unbiased screen to detect phenotype, it's you who are in the driver's seat. And you decide what is important enough - what is of a magnitude sufficient to capture your attention, and what is not. And according to your resources, you have to draw the line somewhere. Looking for additive effects between weak phenotypes, for example, can have its place. You have to be pretty brave to go out and do that, I would say at this point, but it's possible to do it. Question: Hello, good afternoon. My question is: we're all a product of 3.5 billion years of evolution, and for the first time in the history of life on planet earth, our generation has the toolbox to directly influence this evolution. Our consciousness has gained this feedback loop to influence our own evolution. And we consider some genes good or bad, some variants good or bad; some cause disease at the level of individuals - we don't get cancer as a species. And I guess my question is: now that we have the chance to actually eradicate these variants of these genes, might eradicating them hurt us as a species? Removing this diversity - might it hurt us as a species? Bruce Beutler: I think we're very far from actually doing that. Might it hurt us as a species? If you removed all chance of genetic disease - well, perhaps yes. I think if you reduce diversity, eventually that's not a good thing, assuming that one could do it. But we are far from being able to actually do that. Jules, do you want to comment? Jules Hoffmann: I would also say this is very dangerous. I mean, there must be a limit. There is a certain number of diseases where, when you give genetic counselling to families, this can be put forward. But if you really go to the level of susceptibility to heart disease or to cancer and so on, you would eliminate people who are able to do fantastic things up to the age of 40 or 50 or so. Just think of Mozart and people like that, if you had eliminated them because they were susceptible. I think he died of tuberculosis, didn't he? So we have to be extremely careful there. And I think all of society should be careful about that - not only the governments but the various religions which are around in the world. There must be a limit. We have to set it. Would you agree? Not totally? Question: I would say so, but I don't feel confident in setting it. I mean, there is a continuity to everything, right? Jules Hoffmann: That's true. That's true. Question: And I can't really see where the border lies. Jules Hoffmann: Yeah, absolutely. I agree with you.
But again, we agree that we have to be careful as a society. We cannot just go ahead, otherwise it's .. Question: Again, it's a powerful toolbox, and it has never existed before in the history of life on this planet. Jules Hoffmann: Exactly, you're right. And this is an aspect which occasionally makes society a little bit suspicious of science. We could actually do that, probably over the next 50 years: eradicate all the susceptibility genes and so on in all newborn children. But I think it would be dramatic. Bruce Beutler: They'd be back in a few more generations anyway. Question: You never know ... Moderator: Actually, we're running short of time. But it's very nice to see all these questions coming up, so let's go on. Your question. Question: Very sorry, I am here again. I want to ask: how do you treat unexpected results? For example, you have a question. You ask the data the question. Then you do the statistics, and you find some other positive result that you didn't expect. I think this happens to us very commonly. But on the other hand, if you ask your data 10 questions, there must be something positive. So how do you treat these unexpected results? Moderator: Do you have a particular laureate to ask? Question: No. Moderator: Anybody volunteering to answer? Jules Hoffmann: You got unexpected results occasionally, didn't you? We DID. Bruce Beutler: We all love unexpected results, I think, all of us. Jules Hoffmann: Absolutely, it's fantastic. Bruce Beutler: All of us love unexpected results, and 9/10 of the time it's something that we've done wrong methodologically and it has a trivial explanation. But occasionally it really is a big discovery. And also, of course, we and others plan for exceptions. We try to surprise ourselves actively by perturbing the system - with mutations, for example, or by screening for drugs to find things that will perturb the system. And those can be of interest in themselves too. Brian Schmidt: I will just say, though, that when something unexpected turns up, you have to raise the bar on what the data are telling you, because the statistics become, I would say, complicated. It's not just the questions you're asking; it's what everyone else in the world is asking. And you know, one of the problems we have is that the bar is probably too low, and we have a lot of things which are spurious. And that's very expensive for the field and lowers the quality of the science. So it's a very challenging question. And the easiest way is to have a higher bar for what we think is right. Moderator: Next one. Question: I have a question about the personal genome. Technology is moving forward so fast now that biology is not able to catch up with it. So how do we tell patients, especially in genetic counselling, that they have a mutation but we don't know what it does? What do you think about that? Bruce Beutler: You're asking how you tell a patient that he or she has a mutation, or that there's one in utero, when one doesn't really know what the gene does? Question: Yes. Bruce Beutler: That comes up quite commonly, and it's a difficult problem. Of course, there's a larger and larger catalogue of experience on what happens to every gene when it's mutated. But sometimes we really don't know. Question: So what I meant is: are we ready for personal genome sequencing? J. Michael Bishop: We're having trouble understanding your question.
Moderator: Are we ready for personal genome sequencing and personalised medicine, more or less, I guess. Bruce Beutler: I would say technically we're ready. And we're ready, except in those cases where we don't really know what happens. And then one simply has to admit that we don't know. Question: Wouldn't that create a lot of anxiety for patients, if they do not know what will happen to them? Brian Schmidt: I think that's already the case. I don't know how long you're going to live - you don't expect me to know. We don't know everything. We get more information. You will be able to say: we have a 35% probability that you will get Alzheimer's if you have this particular genetic sequence. Some people here say they don't want to know, because what can they do about it? I want to know, because I want to plan my life around it. So there already is uncertainty in everything. There always will be uncertainty in everything. I don't think it changes the equation at all. Question: Good afternoon. I would like to ask what kind of Big Data you think will be more helpful for curing disease: wider data or deeper data? I feel that in the past there's been a lot of emphasis on the number of samples, the number of patients, as in genome studies for example. But what about having fewer patients and trying to collect deeper data: sequencing whole genomes, adding epigenetics and expression and proteomics, and deeper phenotypes instead of single phenotypes? J. Michael Bishop: That's all being done. It's all being done. But numbers are really important, as you've heard, for statistics. So you can't just drop the need for numbers. Proteomics is a growth industry, epigenomics is a vibrant field. But in any event, you just can't abandon the need for numbers. Doctor Schmidt made that very clear a while ago. Brian Schmidt: I will say that's a question of quality of data versus quantity. And it depends on the question you're asking: for some things that are very common, you need a lot of quality; and for things that are very rare, you definitely want to go very broad. So you really want both. But it really depends on the question you're asking. Moderator: Last question for this round. Question: I guess I'm the lucky one. I have a question about another form of Big Data: our scientific publications. There is an increasing number of people studying biology, and the mechanisms are getting more detailed in different disease settings, especially in the fields of cancer biology and immunology. And I was wondering how I can keep up with it. Maybe this is a question to address to the editor of eLife: is there anything else that we could do to make this process easier? Jules Hoffmann: Yes, I fully agree with you. I'm in the same situation as you are. I cannot keep up, and so I am desperate occasionally. And just think, coming back to the joke about eLife, of the 3 other journals which were mentioned by Randy Schekman. If you tried to keep up reading those 3 journals every week or 2 weeks, you'd spend most of your time reading articles and you wouldn't do research. So you're fully right. Let me tell you a joke, just an anecdote, for a second, if you allow, Mr. Chairman. When I first met the father-in-law of my son, who was a pilot, he said to me: what are you doing? I said: we're doing research. He said: research? There's already so much known, no one can keep it all in. Why would you want to add to the research, to the data?
You're right, fully right. I sympathise with you, and I suffer from that as you do. But that's not an answer. Moderator: Ok, do you want to comment on it as well? Brian Schmidt: So the literature in all of our fields is exponentially growing, literally exponentially growing. And it is a Big Data problem as well. But the fortunate thing is that there are tools helping us on that as well. When you want to query the web, you don't go through each page; you've got Google to help you, or Bing or Yahoo or whatever. So we're going to need tools like that, and they're being developed. You're going to be able to say: these are the types of things I'm interested in. It's going to learn from you, and whenever a new journal issue comes out, it's going to throw out to you the articles in the areas you're interested in. So I think we can do that. The hard part is the things that you don't know you're interested in. And that's why you go to talks and listen to what other people are saying. And hopefully follow Twitter. Do you follow me on Twitter? Probably not. Then you'd know what I'm talking about. Jules Hoffmann: If I may add just one point there. We must also agree that not all the papers in the literature are good papers, and there are a lot of things which send us in the wrong directions. Bruce, would you agree with that? Bruce Beutler: I would more than agree. Brian Schmidt: But Google can help out on that as well. Moderator: Yes, clearly. Let's go on with one more round - I'm a little bit nervous about this one. Anyway, in 1968 Mesarovic actually coined another buzzword. Until 2000 only 19 publications had used that word, as determined by Google; in 2005 it was 500, and now it's thousands. The word is 'systems biology'. And I'm a little bit worried that I might be corrected immediately, but I would like to bring up systems biology; the very first question from the audience was on that, actually. So let me just say very simply what I understand by it, knowing that others may have another view and that we probably won't come to a conclusion. You want to understand the role of a part, let's say an enzyme, in the context of the whole, say a cell. You don't just want to see what the enzyme produces as a product; rather, you want to see the consequences of that whole mechanism in the whole cell. And often, technically - we mentioned this already - you need a perturbation, let's say a mutation or a drug, and you do not just want to see how it affects the product but how the whole cell is affected. Can we somehow agree on that? You don't look as if you want to agree on that. Anyway, the Nobel laureate Sydney Brenner once described systems biology as an incorrect philosophy. He said: low input, high throughput, no output. I don't know whether you agree - this is not my position. Anyway, I just want to ignite discussion on it. And I wonder how Big Data can enable a systems biology approach. Is it the time, or is it still too early? J. Michael Bishop: Brian, you said you had a counter to Sydney's remark. Brian Schmidt: Well, I was just going to say: having a physical model of how life works would be really useful, but it is maybe completely impossible. One of the great powers of Big Data, though, is that if you have low signal-to-noise per unit of information, you need a lot of it. And then, through this great thing that the noise goes down as the root of N, the number of pieces of information, you gain signal. So he's saying that never happens. Well, it does happen. And it happens all the time with Big Data, if you're careful and you don't have major problems and you don't do foolish things. So it's not always true. It can be a problem if you have complete garbage information that doesn't actually have fidelity; then you're right, garbage in, garbage out. But I think that quip misses one of the big powers of Big Data, which is to actually answer quite subtle questions. But there are many ways to go wrong.
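A minimal numerical illustration of the root-N effect Schmidt invokes (the signal size and sample counts are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# A signal of 0.01 buried in unit-variance noise: invisible in any single
# measurement, but the noise on the mean falls as 1/sqrt(N).
signal = 0.01
for n in (1, 10_000, 1_000_000):
    measurements = signal + rng.normal(0.0, 1.0, n)
    sem = 1.0 / np.sqrt(n)              # standard error of the mean
    sigmas = measurements.mean() / sem  # detection significance
    print(f"N={n:>9,}  mean={measurements.mean():+.5f}  ~{sigmas:.1f} sigma")
```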
Bruce Beutler: My main objection to systems biology, however we define it, is that many of the advocates of systems biology pose it as a counterpoint to reductionism, and say reductionism has run its course and now we have to look at things in a very different way. To me, most knowledge in the end is a matter of interpretation, and it must come down to reductionism. You can take a high-resolution movie of a gazelle running across a plain, or an amoeba crossing a microscope field, and you can look at every pixel in it, and you'll have at least trillions of data points. And yet you really won't understand very much about how it works. And I think that often is a problem with what people call systems biology, whether it's the definition that you used or something a bit different. J. Michael Bishop: No comment. Moderator: Jules? Jules Hoffmann: I concur with Michael Bishop. Moderator: So then I ask the audience whether they can help me and bring up some comments. J. Michael Bishop: I want to make the point that the tenor of this discussion is blurring an important point. We're talking about various tools for doing science. And to the best of one's ability, a scientist uses whatever tool is going to help them address the problem they're trying to address. And to privilege one form of enquiry over another, I think, is a mistake. And frankly, I think what fuels it is competition for research funds. Brian Schmidt: Can I say one thing? We in physics are the ultimate reductionists. Everything comes down to 16 particles and 16 anti-particles, plus an extra one that we just discovered. And that's where everything works. It all comes down to those 4 forces of nature. We know everything boils down to that. But it's not very useful. J. Michael Bishop: You don't have systems physics, do you? Brian Schmidt: Well, we do; that's our system. And I know it's exactly what runs us and everything you're talking about. The problem is, it's so complicated that the reductionist view, which we know is at the centre, doesn't help. So I think all of us would really like to have a complete systems description of a cell, where I could say: I can perturb any aspect of it, and I'm going to know exactly what's going to happen. That would be great. But it isn't going to happen any time soon. I would also like to know whether I'm going to get hit by a raindrop when I go outside. I've got all the physics; I just don't actually have enough data. It's too complicated. So I think that's the problem at the heart of something like systems biology at this point: it's asking too much. It's a great, noble goal, but you need to break things into smaller pieces so you actually see something in a human lifetime. Moderator: Do you have questions or comments on this part? Yes, one. Question: My comment is that it actually depends on at what level you want to understand the system.
You can define any system at different levels. For example, you just cannot describe how water is flowing in a waterfall by looking at how water molecules vibrate, right? You cannot simulate how every single water molecule vibrates and from that try to understand how the water flows. So it entirely depends on what level you want to understand the system at. And in my view you need to understand it at all levels, and ultimately it depends on the person and the level at which they want to interpret the data. That's my personal comment. Brian Schmidt: So I think you and I agree, which is: you want to understand the problem at hand. And, you know, it's great to have the theory of everything, but we don't have it. Question: I just wanted to comment, because I think that one of the ways of making the most of all these vast amounts of data is to go through a systems microscopy approach, for example. In the sense that most of us use microscopes all the time, but we can handle very few things at a time. Once you start, for example, doing whole genome screenings, then maybe you can handle more variables. The problem is that many times we don't have the tools, from a mathematics or physics point of view, to put those things together. So maybe it's not just for getting funding that a systems biology approach is interesting. It could be really useful, and we could get more information, for example, as you were saying, about just how a cell works. The problem as well is that we don't have reasonable tools. Now I'm thinking of - I don't know - during my PhD I was using a lot of colloidal physics to try to put that into the description of a cell: how organelles move and things like that. Tools that were developed in physics 70, 80 years ago. But just try to apply that to, say, terabytes and terabytes of images. Just really a comment, but I'm a defender of systems biology. Moderator: No real comment to that? It was a comment, actually. J. Michael Bishop: I'm just surprised so many of you are philosophers. Question: So as I said previously, systems biology is often wrongly confused with Big Data. Big Data is only one part of systems biology. The other part is the physics approach: taking some finite number of proteins, studying their interactions and how they work, at a level that we are trying to understand. And I think that approach, the bottom-up approach of systems biology, is completely obscured by the high number of publications coming out of big-data systems biology. So what's your view, as a biologist and as a physicist, on that? Because I've seen constant tension throughout the whole panel discussion. And I think this is the intermediate position which I'm trying to take here. J. Michael Bishop: What you describe sounded like biochemistry to me. That's what biochemists do: work at the, you know, molecular level, trying to see how enzymes work and how myosin contracts and so forth. If I understood you correctly, that's what you were describing. That's what biochemists do. And the truth of the matter is, through half of my career at least, there has been lamentation that biochemistry was falling out of style because molecular biology was so sexy and so dramatic. But it's coming back, and it's coming back in part in my own field because we're going to have the genomic data, we're going to have identified the genes and the gene products. But targeting them is going to require that we understand what they do.
And that's biochemistry. If you want to go down to the atomic level, I refer you to Doctor Schmidt. Brian Schmidt: Biochemistry, to my mind, looks just like physics. So it's really... Moderator: It's all physics anyway. J. Michael Bishop: I know from talking to this guy that everything is physics. Brian Schmidt: But it looks like physics. Question: So in this day of Big Data and systems biology, is there no space left for a traditional biologist anymore? Does everybody have to first do some sort of Big Data experiments, or can people still run labs with traditional biological approaches and answer important mechanistic questions? Moderator: Well, the question is: is there any room for a mechanistic approach to biology, or do we all have to do Big Data biology? That's easy, I guess. J. Michael Bishop: Well, in my field most people are still doing small science. The genomics is being done in a limited number of centres, especially the really big genomics - you know, the Broad, Wash U, Seattle. UCSF is an outstanding place, and we're not doing Big Data genomics. Question: It just sounded, from how the panel was discussing it, like Big Data was important and everybody had to learn how to incorporate it into the way they do experiments or think about them. J. Michael Bishop: I think you're going to be informed by it, but I don't think you're even going to have to understand how it's done, if you can rely upon the literature and the people who are doing it to be trustworthy. Jules Hoffmann: Now, we have not yet come to the conclusion that we all think it should be Big Data and nothing else, no? That will come, I hope, Mister Chairman. But we'll conclude at the end. And you will feel reassured. Question: Thank you. J. Michael Bishop: I want to refer you to an editorial that Bruce Alberts - the former editor of Science magazine, the former president of the US National Academy of Sciences, and a good friend of mine - wrote in Cell at least 10 years ago. It's called 'Small science is good science'. I reread it in preparation for this panel, and it's as right today as it was then. Go read it. Question: I'm also wondering whether systems biology is even anything new, because we have always tried to find out, by doing research, how things are connected to each other and how we can interact with the system. And we are just now at a position where we can get much more data, where we can look at all the different research that there is and combine it. But is it really anything new compared to what we had before, apart from the fact that we just have more information? Moderator: Well, there is not a real defender of systems biology here. Jules Hoffmann: I think we have to redefine what we understand as systems biology, because we are conflating everything from biochemistry to the accumulation of a certain amount of data. Maybe, Mister Chairman, you want to redefine systems biology. Moderator: I would actually go back to the general discussion very soon, if you don't mind, because I see time is running out. And I would be happy to discuss that more with the panellists here. Next question. Question: It's a short question. We have now a lot of data: we have genomics, we have proteomics, we have metabolomics. But sometimes when you try to put everything together, you actually find that the data do not fit with one another. So what's your impression of that? What do you think? And more importantly, where should we go?
Moderator: In modern terms: different platforms, bring them together. It's not as easy as some simple-minded people like me thought - I had to learn that. Brian Schmidt: But there's huge scientific opportunity where things don't fit together. Those are the places you want to be sorting out, right? Because that's where there are problems. And, you know, doing the stuff where everything works, that's not very interesting. Working on the stuff that doesn't work makes sense right now. The interfaces - that's where the opportunity is to figure out which one is right. Or maybe they're both wrong. J. Michael Bishop: And just to put another spin on it: you work in a particular field, ok. And as that field progresses, if and when it has relevance to another field, that will become apparent. And let me give you a personal example. I thought I was finished with intermediary metabolism when I finished my second-year course in biochemistry in medical school. Lo and behold, guess what happened? About 5 years ago, metabolism suddenly became the hottest field in cancer research. The metabolism people were working away, the cancer people were working away. A few smart, imaginative and daring people began to see the connections. And all of a sudden metabolism research is right there at the core of cancer research. That's how it happens: each field develops and then somebody, to echo what Brian just said, sees a connection, and then the world is changed. My world has changed completely. I have now relearned the TCA cycle. Moderator: So we have 3 more questions and then we come to concluding remarks. Is that ok? Question: My question is this: nowadays there are already several examples where Big Data software is basically outsmarting the researchers asking the questions. One specific example is an algorithm from Google, for instance, that was designed to discover whether the advertisements people were placing on Google were fake or not. And it kept coming back flagging that some car advertisers were fake, and the programmers could not figure out why - until they tried to buy one of the cars and discovered a whole group of people who were actually stealing cars; all the cars were stolen. Somehow the algorithm was figuring out that these ads were fake, even though the researchers who wrote the many lines of code could not actually understand where exactly the software was figuring this out. There are several other examples. And this seems to be something that will happen more and more in the future as we create more complicated software: software that gives us answers, and we do not understand where they are coming from. So what's the role of the scientist in a future that seems to be going in this direction? Brian Schmidt: This is an area of machine learning, and in machine learning there are a number of approaches. This is, I think, probably what you would call a deep learning algorithm - one that has many, many layers and nodes, a sort of mechanised algorithm loosely modelled on the human brain. The problem is that it's almost impossible to figure out why it works the way it does; it just is very good at pattern recognition. So pattern recognition is great for prediction, at some level. But again it comes back to my office mate Sean Carroll: ok, that's one thing, but that's not knowledge. You need to have a model underneath. So it's a very useful tool, but it's a tool.
And it allows you to go and look and find out the mechanism - which was that they were stolen cars. It figured it out, and yes, it's a complicated pattern. But it is just pattern recognition. Question: There is an experience that I want to share with you. I am working on reverse genetics. That means we make hypotheses that certain genes are involved in a pathway, and then we mutate each of these genes to see if we get the expected phenotype. So we use this strategy in our laboratory. And when we finished one article and submitted it to - I won't mention the name of the journal - we were refused. And I remember the reviewer's comment was that in the era of 'omics' - which I think means Big Data - your experiments seem too small; you should change your way of solving problems. So I got confused. I think we do benefit a lot from Big Data, but it should not be the golden rule. What do you think, Bruce? Bruce Beutler: Well, first of all, I didn't review your paper. And second of all, I agree with you. It's obvious that there are many tools that we should use, and we should use the most appropriate one for the task at hand. I don't know exactly what your experiment was, what hypothesis you were testing. But that's a legitimate way to go in general, and it's not a fair criticism. Moderator: Last question - sorry, we are really running short. Question: Oh yes, I just want to tell you the result: my article was in the end published in eLife. Jules Hoffmann: Great. That will please Randy Schekman. Moderator: We'll send him an email tonight. Last question. Question: As we are approaching the end of the discussion, I'm just wondering about the bigger picture. So we discussed the differences in methodologies, whether it be physics, systems biology or biochemistry. I'm just wondering if there are differences between the approach, the methodology, in medicine or science and the methodology used in other disciplines. And what are the differences? Or are we essentially using the same methodology overall, whether it be data-driven or hypothesis-driven? Thank you. Moderator: Anybody want to respond? Brian Schmidt: We have different nuances, but ultimately we're scientists trying to answer questions. And so when answering an astronomy question, the bag of tricks is usually fairly different from when you're trying to look at how, you know, some gene expresses itself. But when you get to Big Data, what we do is kind of similar, and we are able to reuse the bag of tricks between both disciplines. So to my mind we all try to do the same thing; it is very similar. Science is pretty universal. But it's how the question is framed and how we're going to try to answer it that dictates the approach. And where those are similar we use the same techniques, but often they're quite different. J. Michael Bishop: The values are shared: rigour, reproducibility, good experimental design, controls. The values are the same across the whole spectrum. Moderator: That brings us to the last round. Very brief last comments. The Lindau meeting motto is Educate, Inspire, Connect. So would you like to give a short message to the young researchers? Jules. Jules Hoffmann: I think it was certainly very helpful that we had this exchange. But basically, regarding the question which was asked, I would plead in favour of keeping everything open and asking our questions. When we have to do the omics or whatever, we do it. When we need Big Data, we try to get it, and so on.
But we shouldn't orient our minds in a very specific way - saying this is modern, this is what everyone does, and so on. We should feel very free and go ahead. The essential thing is to ask good questions and then to try to get answers. It's banal, but it's my conclusion. Brian Schmidt: The secret of science, just to echo what Jules said, is that you need to ask the right questions - ones that you can answer. Big Data is one tool to do that; it's not the only tool. You don't have to learn it. You need to know what it's capable of doing. It's useful to know people who can do it. And if you're interested in it, then learn it. But it's one tool in your quiver; it's not the only one. J. Michael Bishop: A general comment. I'm often asked what was the most important thing that I did early in my career. And there are 2 answers. And 2) the willingness to take a chance on an idea. And I think those are 2 crucial ingredients for a young scientist to forge a successful career. Bruce Beutler: I would just add that knowing what to work on, seeing something as an important problem - it's a very personal matter - but that's what I would advise everyone to do. Find something that's very puzzling, that's an exception to the norm, but that in some way or another is very important. And then what tools you use to address that problem - that's entirely up to you, and something you should let the problem guide you in. Moderator: Thank you. So there is no real discrepancy between Big Data and hypothesis-driven research. I have another citation from a physicist, David Goodstein - I don't know whether you know him. And I think that is the message that you also conveyed in the last minutes. J. Michael Bishop: I think there's a limit to that, but... Moderator: Next round, not for today. Thank you so much. Thanks really to everyone. It was a lively discussion, not only on the podium here but also from the audience. End.
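A brief aside on the square-root-of-N point Brian Schmidt makes above, since it is the one quantitative claim in the discussion. It rests on the standard statistical result that averaging N independent measurements shrinks the noise on the mean by a factor of sqrt(N), so the signal-to-noise ratio grows as SNR_N = S / (sigma / sqrt(N)) = sqrt(N) * S / sigma. The minimal Python sketch below illustrates this, assuming independent Gaussian noise; the specific numbers (a true signal of 0.1 buried in noise of standard deviation 1.0) are illustrative only and do not come from the panel.

    import numpy as np

    rng = np.random.default_rng(0)
    signal = 0.1   # true effect, well below the per-measurement noise (illustrative)
    noise_sd = 1.0

    for n in [1, 100, 10_000]:
        # n independent noisy measurements of the same underlying signal
        samples = signal + rng.normal(0.0, noise_sd, size=n)
        sem = noise_sd / np.sqrt(n)  # noise on the mean falls as 1/sqrt(n)
        print(f"N={n:6d}: estimate={samples.mean():+.3f}, noise on mean={sem:.3f}")

With N = 10,000 the noise on the mean is about 0.01, so the 0.1 signal stands out clearly, whereas a single measurement cannot see it. Schmidt's caveat also holds here: the sqrt(N) gain assumes independent errors, and correlated or systematic error ("garbage in") does not average away.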

Abstract

Canonically, biology, including medicine, considers hypothesis-driven research its ultimate goal. In biomedicine, experimental proof of a hypothesis is sometimes translated into a clinical intervention, a process termed translational medicine. With recent achievements in genomics and other biomics, increasingly large datasets are being generated which, e.g., allow assessment of the genetic variability of humans and their predisposition to certain diseases. Some scientists take the position that this approach lacks any hypothesis and sometimes disqualify it as a fishing expedition. Others argue instead that new hypotheses can be generated through the analysis of large datasets, which can subsequently be tested in specific analytical systems. However, large datasets and hypothesis-driven research are not mutually exclusive. Rather, they are complementary, and when applied in an iterative way they can provide deep insights into biological and medical phenomena, leading to a systems-biological view of life.
