Herbert Hauptman (1989) - A New Minimal Principle in X-ray Crystallography

In contrast to Dr. Deisenhofer’s beautiful lecture, mine, concerned as it is with methods of crystal and molecular structure determination, is of necessity highly theoretical. However, I hope to show by my lecture today that it doesn’t follow that it must be incomprehensible as well. The first slide shows in a schematic way the fundamental experiment which was done by Friedrich and Knipping in the year 1912 at the suggestion of Max von Laue. It shows very briefly that x-rays are scattered by crystals, and the scattered x-rays, if caused to strike a photographic plate, will darken the photographic plate at the points where the scattered rays strike the plate. And the amount of blackening on the photographic plate depends upon the intensity of the corresponding scattered x-ray. Because of the consequences of this experiment, because this experiment was the key which unlocked, during the course of the next seventy-five years, the mystery of molecular structures, this experiment must be regarded as a fundamental landmark experiment of this century. The slide on the right shows a typical molecular structure. It’s the structure of decaborane, which consists of ten boron atoms and fourteen hydrogen atoms. The boron atoms are located at the vertices of a regular icosahedron. I’ve shown these two slides together because I wish to stress the mathematical equivalence between the diffraction pattern, which is to say the arrangement and the intensities of the x-rays scattered by a crystal, and the molecular structure on the right. The information content of this diffraction pattern and the information content of the molecular structure, which is to say the arrangement of the atoms in the molecule, the information content of these two slides, is precisely the same. If one knows the molecular structure shown on the right one can calculate completely and unambiguously the nature of the diffraction pattern shown on the left, which is to say the directions and intensities of the x-rays scattered by the crystal which consists of the molecules shown on the right. And conversely, if one has done the scattering experiment and has measured the directions and the intensities of the x-rays scattered by the crystal, then the molecular structure shown on the right is in fact uniquely determined. What I would like to describe next is precisely what the relationship is between the structure shown on the right, as an example, and the diffraction pattern shown on the left. Here we have an equation which I hope doesn’t frighten you. On the left hand side is simply the electron density function, which is a function of the position vector r and gives us the number of electrons per unit volume. And on the right hand side is the formula which enables us to calculate the electron density function Rho(r). If we knew all these quantities on the right, then, as those of you who are familiar with the elements of x-ray crystallography, or even with the most elementary mathematics, know, this function on the right is simply a Fourier series, a triple Fourier series. The scaling parameter V is not important for our present purpose. This expression on the right is a sum taken over all triples of integers, the so-called reciprocal lattice vectors. And on the right hand side we have simply a Fourier series expressed in pure exponential form. We have the magnitudes, the non-negative numbers which are the coefficients of the exponential function. We have the reciprocal lattice vector H, a triple of integers.
We have an arbitrary position vector r, which has also three components; this is simply the scalar product. And here we have the phases of the structure factors, the magnitudes of which are shown here as the coefficients of the exponential function. If we knew everything that we need to know on the right, which is to say these magnitudes and these phases, then we could calculate this function, this triple Fourier series, as a function of the position vector r. And therefore we could calculate the electron density function Rho(r), read off the positions of the maxima of the electron density function, and that would give us the positions of the atoms, or in other words the crystal structure. The problem, which was alluded to just a few minutes ago, is that although these magnitudes are obtainable directly from the diffraction experiment, from the measured intensities, the intensity of the x-ray scattered in the direction labelled by the reciprocal lattice vector H, these phases are lost in the diffraction experiment. And so, although from the very earliest years, because of the known relationship between diffraction patterns and crystal structures, it was felt that the diffraction experiment did in fact hold the key to the determination of crystal and molecular structures, because these phases were missing, because they were lost in the diffraction experiment, it was thought that after all what could be observed in the diffraction experiment was in fact not sufficient to determine unique crystal structures. The argument that was used was a very simple one and a very compelling one. It was simply that we could use for these coefficients, for these magnitudes, the quantities which were directly obtainable from the experiment, which is to say the intensities of the scattered x-rays, in calculating this function. And we could put in for the lost phases, the missing phases, arbitrary values. And depending upon which values we put in for these phases we would get different electron density functions, and therefore different crystal and molecular structures, all however consistent with what could be measured, which is to say the intensities of the x-rays scattered by the crystal. And it was therefore believed, for some forty years after this experiment was done, that the diffraction experiment could not even in principle lead to unique crystal and molecular structures. Now there was a flaw in this argument. As simple as it appears to be and as overwhelming as the logic appears to be, there was a fatal flaw in it. And that was that one could not use arbitrary values for these phases, for the simple reason that if one were to do that, one would obtain electron density functions which were not consistent with what was known about crystal structures. For example, one of the properties of the electron density function which must be satisfied by every crystal is that the electron density function must be non-negative everywhere. After all, the electron density function Rho(r) gives us the number of electrons per unit volume, and from its very definition therefore it must be non-negative everywhere. On the other hand, for a given set of known magnitudes F sub H, if one used arbitrary values for these phases, in general one would obtain electron density functions which were negative somewhere, for some values of the position vector r, and which therefore would not be permitted.
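Since the slide itself is not reproduced in this transcript, it may help to write the Fourier series being described out explicitly; the following is a reconstruction in standard crystallographic notation, not a copy of the slide:

    \rho(r) = (1/V) \sum_H |F_H| \, e^{i\varphi(H)} \, e^{-2\pi i H \cdot r}

Here V is the volume of the unit cell (the scaling parameter mentioned above), the sum runs over all reciprocal lattice vectors H, each an ordered triple of integers, the |F_H| are the magnitudes obtainable from the measured intensities, and the Phi(H) are the phases lost in the diffraction experiment.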
So that the known non-negativity of the electron density function restricts the possible values which the phases may have; in fact it restricts rather severely the possible values which the phases may have. And the non-negativity condition alone, the non-negativity restriction on the electron density function, is in fact sufficient to enable one to solve some rather simple crystal structures. However, the restrictions on the phases which are obtainable in this way are of a rather complicated nature, and therefore the non-negativity condition alone has proven to be not very useful in actual applications. A much more useful restriction may be summarised in one word: atomicity. Since molecules consist of atoms, it follows that the electron density function is not only non-negative everywhere but must take on rather large positive values at the positions of the atoms, and must drop down to very small values at positions in between the atoms. And this requirement of atomicity, this property of the electron density function, turns out to be a severely restrictive one. And in general, at least for small molecules, say molecules consisting of a hundred or a hundred and fifty non-hydrogen atoms, this requirement is sufficiently restrictive that the measured intensities in the x-ray diffraction experiment are in general enough, in fact in general far more than enough, to determine unique crystal structures. I should also mention before I leave this slide that we should carry with us the fact that if we know these magnitudes, which as I said are obtainable directly from the measured intensities in the diffraction experiment, and if somehow or other we can find these phases, then by calculating this Fourier series on the right we can calculate the electron density function Rho(r) and therefore determine the crystal and molecular structure. In the next slide I want to show that not only do the crystal structure factors, which is to say the magnitudes and phases of the crystal structure factors, determine crystal structures, but that the converse is also true. In order to exploit the atomicity property of real crystal structures, it turns out we have to make a small change in these F’s. We replace these structure factors by what are called the normalised structure factors E, shown on this slide, and defined in this way. Again we have a magnitude E sub H which is directly obtainable from the measured intensities in the diffraction experiment. We have the missing phase Phi(H), and this complex number may be represented in polar form in this way: the product of the magnitude times the pure exponential function e^(i Phi(H)), where this Phi(H) is of course the phase of the normalised structure factor E(H). And what this equation involves is the atomic position vectors r(J), where r(J) represents the atomic position vector of the atom labelled by the index J. We have here a linear combination of exponential functions, a sum taken over all the N atoms in the unit cell of the crystal. On the right hand side we have the atomic number of the atom labelled J. We have the atomic position vector r(J) of the atom labelled J. H is a fixed reciprocal lattice vector, an ordered triple of integers. Sigma sub 2 is not very important; for our present purpose it is simply the sum of the squares of the atomic numbers of all the atoms in the unit cell of the crystal.
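Again the slide is not reproduced here; the defining equation being described has, in its usual form for point atoms, approximately this shape (a reconstruction, not the slide itself):

    E(H) = |E(H)| \, e^{i\varphi(H)} = \sigma_2^{-1/2} \sum_{j=1}^{N} Z_j \, e^{2\pi i H \cdot r_j}, \qquad \sigma_2 = \sum_{j=1}^{N} Z_j^2

where Z_j is the atomic number and r_j the position vector of the atom labelled j, and the sum runs over the N atoms in the unit cell of the crystal.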
What this equation tells us then is that if we know the atomic position vectors we can calculate the magnitudes and phases of the normalised structure factors E(H). This slide tells us that the converse is true: if we know magnitudes and phases, by calculating the Fourier series we can get the electron density function and therefore the crystal structure. This tells us that the converse of that statement is also true: if we know atomic position vectors we can calculate essentially the coefficients of this Fourier series. However, I’ve already suggested that because of the requirement of atomicity the measured magnitudes alone provide a very strong restriction on the values of the phases and in fact require that the phases have unique values. But what that means of course is that if we have measured a large number of intensities, and therefore magnitudes E(H), somehow or other these phases are determined. And now our problem is, in fact the solution of the phase problem requires, that using only the known magnitudes E(H) one calculate the unknown phases Phi(H). Now, this equation tells us right away, if we examine it closely, that we have a complication. And the complication comes from the fact that the position vectors r(J) are not uniquely determined by the crystal structure. Because if we have a given crystal, then the atomic position vectors r(J) depend not only on the crystal structure but depend also on the choice of origin. If we move the origin around in the unit cell of the crystal, and in this way do not change the crystal structure, we change the value of this function and therefore we change the value of the normalised structure factor E(H) on the left hand side. What this suggests then is that these normalised structure factors, which is to say these magnitudes and these phases, depend not only on the crystal structure but also on the choice of origin. And this of course causes a complication. As it turns out, the crystal structure does determine unique values for these magnitudes no matter where the origin may be chosen. But the values of the individual phases do in fact depend not only on the crystal structure but also on the choice of origin. As you can see, that complicates our problem. Because if the phases are not uniquely determined by the crystal structure, then certainly they are not uniquely determined by measured intensities alone, or by the known values of these magnitudes. Because we have somehow or other to find unique values for the individual phases, we have to have a mechanism for specifying the origin. So what's called for, before we can even hope to solve the phase problem, to calculate the values of the phases for given values of these magnitudes, is that in the process which leads from known magnitudes to unknown phases we incorporate a recipe or a mechanism for origin fixing. Now that, as I say, introduces a complication which is not too difficult to resolve. The way to resolve it is to separate out the contributions to the value of a given phase. There are, as I indicated, two kinds of contributions to the value of an individual phase: the contribution which comes from the crystal structure, and the contribution which comes from the choice of origin. And the first thing that has to be done is to separate out these two contributions.
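The way the choice of origin enters can be made explicit, a standard observation filled in here because the slide is not available. If the origin is shifted by a vector r_0, every atomic position vector becomes r_j - r_0, and so, from the defining equation above,

    E(H) \to E(H) \, e^{-2\pi i H \cdot r_0}, \quad \text{so that } |E(H)| \text{ is unchanged and } \varphi(H) \to \varphi(H) - 2\pi H \cdot r_0

So the magnitude depends only on the crystal structure, while the phase picks up an additional term that depends only on the choice of origin; these are the two contributions to be separated.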
So we can decide once and for all what part of the value of the phase depends upon the crystal structure and what part comes from the choice of origin. And the best way to do that is to observe something which it is not possible for me to show here without causing a lot of confusion. The best way to do that is to introduce the idea of what is called the structure invariant, which is to say certain special linear combinations of the phases which have the remarkable property that their values are in fact uniquely determined by the crystal structure, no matter what the origin may be. So the first thing to do then is of course to identify these very special linear combinations of the phases, the so-called structure invariants, and I would like to show on the next slide a typical example of such a special linear combination of the phases. The three-phase structure invariant, the so-called triplet, is simply a linear combination of three phases, Phi(H) + Phi(K) + Phi(L), where H + K + L = 0. If this condition is satisfied, this linear combination of three phases is a structure invariant, and it has the property that its value is uniquely determined by the crystal structure no matter where the origin may be chosen. Now, you can see the fundamental importance of these structure invariants, because it’s only linear combinations of this kind whose values we can hope to estimate in terms of measured intensities alone. We’ve already seen that measured intensities alone do not determine unique values for the individual phases, because the values of the phases depend also on the choice of origin. But measured intensities alone do determine the values of these special linear combinations of the phases. So the phase problem then is really broken down into two parts: first, to use the measured intensities to provide estimates of these structure invariants, these special linear combinations of the phases; and then, once the values of a sufficiently large number of these structure invariants are known, we can hope to calculate the values of the individual phases, provided that in the process leading from the estimated values of a large number of these structure invariants to the values of the individual phases we incorporate a mechanism for origin fixing. So these structure invariants therefore play a fundamental role in the solution of the phase problem. They serve to link the observed magnitudes, these quantities here, with the desired values of the individual phases, because we can hope to estimate these linear combinations of the phases in terms of these measured magnitudes, and once we have estimated a sufficiently large number of these we can hope to calculate the values of the individual phases. Now, I have to indicate briefly how one estimates the values of these, not only this structure invariant but others as well. In order to do this, the method which was introduced is a probabilistic one. Because of the large number of intensities which are available from experiment, a probabilistic approach to this problem, to the solution of the phase problem, is strongly suggested. And the strategy, the device which is used, is simply to replace these position vectors, the atomic position vectors r(J), by random variables which are assumed to be uniformly and independently distributed. This is using the language of mathematical probability.
In everyday terms, what we are doing is assuming that all positions of the atoms in the crystal are equally likely, that no positions are preferred over any others. And that amounts to the same thing as assuming that the atomic position vectors r(J) are primitive random variables, uniformly and independently distributed. Now, once we do that then the... (could I have the previous slide on the right hand side please). If we assume these atomic position vectors are random variables, uniformly and independently distributed, then the right hand side becomes a function of random variables. The left hand side is also a function of random variables and is therefore itself a random variable, and we can calculate by standard techniques its probability distribution, if we choose to do that. However, its probability distribution will not be useful to us; what will be more useful to us is the probability distribution of the structure invariants, these linear combinations of the phases. (The other direction, you are going in the wrong direction, the one before this.) Okay, this is a structure invariant. What we are asking for now is the probability distribution of this structure invariant, because we know from the discussion that I’ve already given that it’s only the values of these special linear combinations of the phases which we can hope to estimate in terms of measured magnitudes alone. Therefore, what we are looking for is the probability distribution of a structure invariant, in the hope that the probability distribution will give us some information about its value. In particular, we are looking not only for the probability distribution of this structure invariant but for the conditional probability distribution of this structure invariant, assuming as known a certain set of magnitudes. Because after all the magnitudes, or intensities, are known; this is what is given to us from the diffraction experiment, and we want to use that information in order to estimate the values of these structure invariants. What that calls for then is the conditional probability distribution of a structure invariant given a certain set of magnitudes. And on this slide, if I can have the next slide, we’ll see the formula which tells us what the probability distribution of a structure invariant is. Here we have a three-phase structure invariant, Phi(H) + Phi(K) + Phi(-H-K). I’ve written it in this form rather than the previous form in order to show explicitly that the sum of the three indices, H + K - H - K, adds up to zero, so that this condition is satisfied. This triplet then is a structure invariant, and we can ask for its conditional probability distribution assuming as known these three magnitudes. And these three magnitudes are of course known from the diffraction experiment. I have written down the formula only in the case that all the atoms are identical and that we have N of them in the unit cell. It isn’t necessary to specialise it in this way, but I’ve done so in order to simplify the formulas. This gives us the conditional probability distribution, then, of the three-phase structure invariant, the triplet, assuming as known three magnitudes. And this is the analytic formula, and in a few seconds I’ll show you what it looks like by means of the next slide. Right now what I would like to emphasise is that we can calculate any parameters of this distribution that we choose, and in particular we can calculate the expected value, or the average value, of the cosine of this triplet.
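The analytic formula on the slide is not reproduced in the transcript; for N identical atoms the standard result it refers to is, to a good approximation, the following (a reconstruction, not the slide itself). Note first that under an origin shift by r_0 each phase changes by -2 pi H·r_0, and since H + K + (-H-K) = 0 the triplet is unchanged, as claimed above. Its conditional distribution, given the three magnitudes, is approximately

    P(\Phi) \approx \frac{1}{2\pi I_0(A)} \, e^{A \cos\Phi}, \qquad A = \frac{2}{\sqrt{N}} \, |E(H) \, E(K) \, E(-H-K)|

where Phi = Phi(H) + Phi(K) + Phi(-H-K) and I_0 is a modified Bessel function. The expected value of the cosine of the triplet computed from this distribution is I_1(A)/I_0(A), the ratio of modified Bessel functions referred to next.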
This is the formula for it; it turns out to be a ratio of these two Bessel functions. It’s not important for us to know what they look like at the moment; these functions are known functions, and I’ve abbreviated it by writing T(H). And the important thing, the only thing we should carry away with us, is that the average value of the cosine of the triplet can be calculated from the distribution. This is what it’s equal to; it depends only on known quantities, measured magnitudes and the number of atoms N in the unit cell. And it turns out always to be greater than zero. The next slide, on this side, shows us a picture of what that distribution looks like. And we can clearly see, when the parameter A shown on the previous slide is about seven tenths, the distribution looks like this. It goes from -180° to +180°, and what the distribution tells us is that the values of the triplet, of the three-phase structure invariant, tend to cluster around zero. There are more values of this triplet in the neighbourhood of zero than there are, let’s say, in the neighbourhood of 180°. So the distribution then, the known distribution which we can calculate, carries information about the possible values of these triplets. And in fact it enables us to estimate the triplet; the estimate in this simple case would be that this triplet is probably approximately equal to zero. But in this case, when the parameter A is only about seven tenths, the estimate is not a very good one, because values near 180°, while not very frequent, are still possible. It’s still possible to get a substantial number of values of the triplet in the neighbourhood of 180° when the parameter A is only about seven tenths. However, when the parameter A is larger, as shown on this next slide, when the parameter A is 2.3 or so, the distribution looks like this. Again values of the structure invariant in the neighbourhood of zero are much more common now than in the neighbourhood of 180°. So in this favourable case, when the parameter A is about 2.3, the zero estimate of the triplet is a particularly good one. When the parameter A is large, bigger than two or so, then we get a very reliable estimate of the triplet. And if we can estimate a sufficiently large number of them, as I’ve already indicated, we can then hope to calculate the values of the individual phases, provided once again that in the process leading from estimated values of the structure invariants to the values of the individual phases we incorporate a mechanism for origin fixing. What I would like to do next is show another class of structure invariants, the so-called quartets, which are linear combinations of four phases now, Phi(H) + Phi(K) + Phi(L) + Phi(M), where H + K + L + M is equal to zero. This is very analogous to the triplet that I showed on an earlier slide; it’s a linear combination of four phases now, instead of just three phases. Just as we did with the triplets, so we can do with the quartets: we can find the conditional probability distribution of the quartet assuming as known certain magnitudes. But there is an important difference between the quartet and the triplet which I showed earlier. The distribution actually has a very similar functional form; it’s exactly the same as for the triplet, but the parameter B(LMN) is an abbreviation for this... well, I see I didn’t write the quartet on this slide, I suppose because there wasn’t enough room. But B(LMN) is simply an abbreviation... no, it’s not an abbreviation.
B(LMN) is given by this; Phi represents the quartet. What this shows us is that here too we can calculate the conditional probability distribution, of the quartet now, assuming as known not three magnitudes, as we had in the case of the triplet, but seven magnitudes: E(L), E(M), E(N) and this one, the magnitudes corresponding to these indices, and three other magnitudes, the so-called cross terms. It’s not important to know what these magnitudes are; it’s sufficient to know that the single parameter on which the distribution depends can be calculated from seven known magnitudes, magnitudes obtained from the diffraction experiment. The important difference though between the quartet distribution and the triplet distribution is that the parameter B, on which the distribution now depends, may be positive or negative, depending upon the sign of this expression in braces. If these three cross terms are large then this term in braces will be positive and the parameter B will be positive, and the distribution will have a maximum around zero, as we had in the case of the triplets. But if these three cross terms are small, the expression in braces is negative, and this distribution, instead of having a maximum at zero, will have a maximum at 180°, so that the estimate of the quartet in that case, and it’s a case which can be calculated in advance, becomes not zero but 180°. However, just as in the case of the triplet, we can calculate again the expected value of the cosine of the quartet; again it turns out to be a ratio of Bessel functions, because it has the same functional form as the distribution for the triplet. And we call it for abbreviation T(LMN), but now T may be positive or may be negative, depending upon whether this parameter B is positive or negative. And we know in advance which it will be. So the next slide, on this side, will show us what the distribution looks like in the case that the parameter B is negative. I’ve shown it for the case of about minus seven tenths. Now, in sharp contrast to the situation for the triplets, the distribution has a maximum at 180°, so that the estimate for the quartet, instead of zero, will now be 180°. But it will not be a very reliable estimate in the case that B has such a small value, because as you can see values of the quartet in the neighbourhood of zero, while less likely than values in the neighbourhood of 180°, will still occur. What's needed then is a distribution which is sharper than the one shown here, and that will happen when the value of the parameter B is, say, -1.2. In that case we again have a peak at 180°, so that the estimate of 180° is rather reliable, but certainly not as reliable as we would like it to be. Now, the traditional techniques of direct methods have proven to be useful in the case that we are determining the structures of so-called small molecules, molecules of less than 100 or 150 non-hydrogen atoms; those can be solved in a rather routine way using estimated values of the structure invariants. The reason that the methods eventually fail when the structure becomes very large is that we can no longer obtain distributions which give us reliable estimates of the structure invariants. As the structures become more and more complex there are very few distributions which have a sharp peak, whether at zero or 180°, and therefore there are very few structure invariants, whether they are triplets or quartets, whose values we can reliably estimate. And therefore eventually the methods fail.
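As a small numerical illustration of the triplet and quartet estimates just described, here is a minimal sketch assuming the exponential-cosine form of the conditional distribution given earlier; the lecture states that the quartet distribution has the same functional form, with a parameter B of either sign, so negative parameters stand in for the negative quartets. The helper names are my own; the Bessel functions come from scipy:

```python
import numpy as np
from scipy.special import i0, i1  # modified Bessel functions of the first kind, orders 0 and 1

def expected_cosine(x):
    """Expected value of cos(Phi) when P(Phi) is proportional to exp(x*cos(Phi)).
    A negative parameter (a negative quartet) gives a negative expected cosine."""
    return i1(x) / i0(x)

def invariant_density(phi, x):
    """Density of the invariant Phi on (-pi, pi] for the exponential-cosine form."""
    return np.exp(x * np.cos(phi)) / (2.0 * np.pi * i0(x))

# The four cases mentioned in the lecture: weak and strong triplets (A > 0),
# weak and strong negative quartets (B < 0).
for x in (0.7, 2.3, -0.7, -1.2):
    print(f"parameter = {x:+.1f}: <cos Phi> = {expected_cosine(x):+.2f}, "
          f"density at 0 = {invariant_density(0.0, x):.2f}, "
          f"density at 180 deg = {invariant_density(np.pi, x):.2f}")
```

The numbers reproduce the qualitative behaviour described: for a parameter of about 0.7 in size the peak, whether at 0° or 180°, is weak, while for 2.3, or for a negative quartet parameter of larger size, the estimate becomes more reliable.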
The one point which should be emphasised, however, and which I have emphasised on the next slide on the right hand side, is what I’ve called the fundamental principle of direct methods. And this simply states that the structure invariants link the observed magnitudes E with the desired phases Phi. By this I mean, and this is what the traditional direct methods for solving the phase problem tell us, that if we can estimate from measured intensities alone a sufficiently large number of these structure invariants, whether they are triplets or quartets or whatever, then we can hope to use those estimates, which are after all determined by the measured magnitudes, to calculate the values of the individual phases, provided that in the process leading from estimates of the structure invariants to the values of the individual phases we incorporate a mechanism for origin fixing. For this reason the structure invariants serve to link measured magnitudes, known magnitudes, with unknown phases. But they require that we estimate fairly reliably the values of a large number of structure invariants. Well, we can’t do that for very complex structures; for very complex structures we don’t get a sufficiently large number of probability distributions which yield reliable estimates for the structure invariants. So we have to do something else when we try to strengthen the traditional direct methods to be useful for much more complicated structures, say structures in the neighbourhood of three or four or five hundred or even more non-hydrogen atoms in the molecule. We have to do better than we have done in the past. But again we use the fundamental principle of direct methods. We use again the fact that it is the structure invariants which link these measured magnitudes with the unknown phases, even though we can no longer estimate reliably the values of a large number of these structure invariants in the case of very complex molecular structures. We can always calculate reliably these conditional probability distributions. So, just as for the traditional direct methods, the structure invariants link known magnitudes E with unknown phases Phi. They again link these magnitudes with these phases, but the property of these structure invariants which we surely know is their conditional probability distributions. That we surely know. And so we can try to solve the following problem. We can try to estimate the values of a large number of individual phases, say several hundred, three hundred, four hundred or five hundred individual phases, in one block, at one stroke, by requiring that the values have the property that when we construct from those several hundred phases all the structure invariants which we can construct, let’s say all the triplets and all the quartets, those structure invariants have a distribution of values which agrees with the theoretical distributions. We know their theoretical distributions, and we require that the individual phases have such values that when we generate all the triplets and all the quartets which we can, their distributions, their conditional distributions assuming as known certain magnitudes, agree with the known theoretical distributions. The one thing we know for sure is that even for complex structures we know the probability distributions of the structure invariants.
We may not be able to use these distributions to give us reliable estimates of the structure invariants, but we know their distributions. And we have, from this point of view, a tremendous amount of over-determination, because from a set of say three hundred phases or so we can generate in any given case some tens of thousands of triplets and hundreds of thousands of quartets. And we know of course the distributions of all these triplets and all these quartets. And we can ask the question, whether we can answer it or not is another question, but we can certainly ask the question: what must be the values of the individual phases so that when we generate these enormous numbers of structure invariants, perhaps millions of them in any given case, they have distributions of values which agree with their known theoretical distributions, if I may use that term? So that’s the problem that we try to answer now, and I hope in the next few minutes to tell you what the answer to that question is. On this slide I have just a brief summary of what I’ve already shown. I’ve already shown that for the triplets, Phi(HK), and for the quartets, Phi(LMN), we can calculate these parameters of the distributions. For example the expected value of the cosine for the triplet; I already showed you the formula for that. We can also calculate what I’ve called the weight, which is the reciprocal of the variance of the cosine. I haven’t shown you the formula for that, but it’s easily calculated once we know the distribution. And we can do exactly the same thing for the quartet: we can calculate, as I’ve already shown you, the expected value of the cosine of the quartet, and we can also calculate the variance of the cosine of the quartet. So we can assume that these are known parameters of the distributions that we are concerned with. I should mention one other thing that I haven’t stressed. That is, because from a set of phases, let’s say three or four hundred phases, we can generate hundreds of thousands of invariants, it follows that there must exist a very large number of identities which the invariants must satisfy. The very fact of the redundancy here, the fact that we can generate hundreds of thousands of invariants from just a few hundred phases, means that the invariants must of necessity satisfy a very large number of identities. We shall make important use of that over-determination property in this method. On this slide I’ve shown you the mathematical formulation of the requirement that the structure invariants, these hundreds of thousands of them which are generated by a set of several hundred phases, obey their known theoretical probability distributions. The requirement is very simple. Here we have the triplets; here we have the quartets. Incidentally, in this work it’s absolutely essential that we use the quartets in addition to the triplets, although the traditional direct methods depend mostly on the triplets and very little on the quartets, if at all. For the present formulation we need to have both triplets and quartets, because with the triplets the only estimates that we can obtain are the zero estimates, where the cosines are positive, but for the quartets, whose most probable values may be 180°, the cosines are negative, and we need to use those quartets. The fact is that we have one or two orders of magnitude more of these so-called negative quartets.
Quartets whose expected cosines are negative; we need to make very strong use of those. Well, I’ve already told you that these parameters, this T, are determined from the known distributions. This is simply the expected value of the cosine of the triplet; this is the expected value of the cosine of the quartet. These are simply the weights which I already described before, related to the variances of the cosines of the quartets and triplets. So all these parameters are known. Phi(HK) is an abbreviation for this triplet; Phi(LMN) is an abbreviation for this quartet. The condition which has to be satisfied, if we are to find an answer to the question that I raised a few minutes ago, is that the value of this function of the invariants Phi(HK) and Phi(LMN), of which there are maybe hundreds of thousands, so that this is a sum over several hundred thousand terms, must be a minimum. When this function is a minimum, then we can be sure that we have answered the question which I raised before, that is to say: what must be the values of the individual phases so that when we generate triplets and quartets we get distributions of values for these which agree with their known theoretical distributions? The answer to the question is to minimise this function of the invariants Phi(HK) and Phi(LMN), subject to the constraint that all the identities which the invariants must satisfy are in fact satisfied. Now, the requirement that the identities which must exist among the invariants, simply because there are so many of them and there are relatively few phases, be satisfied is of course a tremendously restrictive requirement. So our problem then is formulated in a very simple way: here is a known function of several hundred thousand invariants; we have to find the values of the phases which minimise that function, subject to the condition that all identities which must hold among the invariants are in fact satisfied. The answer is very simple. However, we still have a major problem: how do we find the answer? How do we determine the phases which will make this function a minimum, considered as a function of these invariants? And the first step to the answer to that question is shown on the next slide, on the right hand side, which looks very similar to this. Except now, and I’ve called this the minimal principle, it’s the minimal principle for the individual phases. This is a function of the invariants Phi(HK) and Phi(LMN), but the invariants themselves are explicitly expressed in terms of the individual phases. So this defines implicitly a function of the phases, of which there may be only a few hundred. Here we have several hundred thousand invariants; here on the right hand side, when we consider this function to be a function of phases, we have only three or four or five hundred phases. So this is a function of a relatively small number of phases. And the minimal principle says that that set of phases is correct which minimises this function of the phases. So the answer to the question that I previously raised is in fact formulated in a very simple way; it’s formulated as this minimal principle. But there still remains a major problem.
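The function being minimised is described here only verbally; since the slide is not reproduced, the following reconstruction of its usual form is offered, using the T and w of the lecture (an assumption as to the exact normalisation, not a copy of the slide):

    R(\varphi) = \frac{\sum_{H,K} w_{HK} (\cos\Phi_{HK} - T_{HK})^2 + \sum_{L,M,N} w_{LMN} (\cos\Phi_{LMN} - T_{LMN})^2}{\sum_{H,K} w_{HK} + \sum_{L,M,N} w_{LMN}}

where the sums run over the generated triplets Phi(HK) and quartets Phi(LMN), T is the expected cosine of each invariant and w the corresponding weight, the reciprocal of the variance of the cosine. With the invariants written out in terms of the individual phases, the minimal principle states that the correct phases are those which minimise R.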
Even a function of three or four or five hundred phases is a function for which it is very difficult to find the global minimum, especially if, as in this case, there are many local minima. In a case like this, with several hundred phases, there may be something of the order of ten to the one-hundredth power local minima. From this enormous number how are we to select the one global minimum which is the answer to our question? Well, it would be very nice of course if this function were very well behaved, in the sense that we could start with a random set of values for the phases, just choose phases at random, and then use standard techniques to find the minimum nearby. There are several ways of doing that. One is the least squares technique, which however has the disadvantage that it will find the local minimum which is near to the starting point; it will be trapped in a local minimum far away from the global minimum that we are looking for. So that’s a method that in general will not give us the answer. Or we could use a different method, the so-called parameter shift method, in which we vary the phases one at a time, look for the minimum as a function of a single phase, and in that way escape the trap of being caught in the nearest local minimum. We may get an answer, a minimum far removed from the starting set, but in general still a local minimum, as it turns out, not the global minimum that we are looking for. So it looks as if we have traded one very difficult problem for another problem just as difficult. But I would like to describe, in the remaining few minutes that I have, what we have done in order to try to solve this problem, and to show that, at least for a small molecule, we have been able to resolve it. We have in fact found the unique global minimum; chosen from this set of maybe ten to the one-hundredth power local minima, we have in fact gotten the global minimum. I would like to describe in the next few minutes how we have done this. We have taken a small molecule, a molecule consisting of twenty-nine non-hydrogen atoms. And we’ve constructed this function, this R(Phi) function, and we calculated that function. First we put in, since we know the answer beforehand, the true values of the phases, and when we put in those values, the value of this function turns out to be approximately four tenths. And then we also put in seven other randomly chosen sets of values for the phases, and in each case, as you can see, the value of the function is bigger than when we put in the true values of the phases. Which of course is in agreement with the property that I’ve already stated, that it is for the true phases that this function has a minimum. And it has the minimum of approximately four tenths, compared to random phases which give values running around .67 or .68 or so. Incidentally, in this case we have calculated the values of the function not merely for seven randomly chosen sets of phases but for thousands of them, and in all cases the value of the function is much larger than four tenths; it runs from about .66 to .69 or so. So there is no doubt that we have in fact confirmation of the theoretical result that the function has a minimum when the phases are equal to their true values. Well, starting with the true values, we went through two methods for getting the local minimum near to the starting set.
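Before the two methods are taken up in detail, here is a schematic sketch of the parameter-shift idea just described: vary one phase at a time over a grid, keep the value that lowers the objective, and sweep repeatedly. It is an illustration only, not the speaker's program; `minimal_function` is a stand-in for the R(Phi) function discussed above, to be supplied by the reader.

```python
import numpy as np

def parameter_shift(phases, minimal_function, step_deg=15.0, n_sweeps=10):
    """Schematic parameter-shift minimisation of a function of many phases.

    phases           -- initial phase values in radians (array-like)
    minimal_function -- callable taking the full phase array and returning a scalar
    """
    phases = np.asarray(phases, dtype=float).copy()
    grid = np.deg2rad(np.arange(-180.0, 180.0, step_deg))
    for _ in range(n_sweeps):
        for j in range(phases.size):
            best_val, best_phi = minimal_function(phases), phases[j]
            for trial in grid:          # scan this single phase over a coarse grid
                phases[j] = trial
                val = minimal_function(phases)
                if val < best_val:
                    best_val, best_phi = val, trial
            phases[j] = best_phi        # keep the best value found for this phase
    return phases
```

Unlike a least-squares step, which heads for the nearest stationary point, changing one phase at a time over its full range lets the process escape some nearby traps, though, as noted above, in general it still ends in a local rather than the global minimum.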
One method was the least squares method; we went through a number of cycles of least squares and we ended up with values for the phases near to the starting set, not exactly the same, and it gives us a minimum of .366. The set of phases corresponding to this global minimum, incidentally, now gives us by means of the Fourier synthesis essentially the whole structure: all 29 atoms appear in the Fourier map when the phases which are put in are the phases which correspond to the global minimum of this function, which is .366. If we use the parameter shift method for getting the minimum near to the starting set we get the same minimum, which is not too surprising. But what happens when we put in a random set of phases? When we go through both processes we get a local minimum, .44 here and .46 here. It’s not the global minimum, clearly; this is the global minimum, so we get a local minimum. And the same thing happens with each of these other random starts: we get local minima which, however, are not the global minimum. Well, of all these minima we have chosen two to be of particular interest: .4125, which is the smallest one in this column, and the other, .43, which is the smallest one here except for the true global minimum. And we have made the assumption that because .41 and .43 are both less than the other local minima, which run about .45 or .46, the phases which give us these local minima somehow or other carry some structural information in them. Certainly they are not the correct phases, we know that; the correct phases give us the global minimum. But the assumption is made that they carry some structural information. If they are to carry structural information, the question is how do we find what that structural information is? And the answer of course is very simple: all we do is use the phases that we get, let’s say from this local minimum, calculate the Fourier series and have a look at it, see if in fact the structure is in there. Well, we’ve done that, and the next slide shows what happens. We’ve done that for that minimum; this was the random start, and after minimisation we get .4125. We construct the Fourier series with coefficients using these phases and the known magnitudes, and we take a look at it. Well, it doesn’t look very good; it doesn’t seem to have any structural information in it. But we expect there will be some structural information in it, and the way that we have chosen to extract that structural information is to assume that the information is contained in the largest peaks of that Fourier series. So we’ve taken the top six peaks of that Fourier series; that gives us what we hope is a fragment of the structure. Using those presumed atomic position vectors we can now calculate normalised structure factors E, which is to say both magnitudes and phases. In this way we get a new set of phases, different from the random set we started with and certainly different from the set which gave us that local minimum. We use the known magnitudes of the normalised structure factors with this new set of phases in our minimal function again. Well, it turns out that the value of the function is now less than what we had at the random start but more than the local minimum which we got before. And that's not surprising, because we are using only six peaks among a total of maybe several hundred peaks. We are using the six strongest peaks.
But when we go through the minimisation process again we find that we get a smaller minimum than we had before, another local minimum, .39, smaller than before, and so we expect that the phases which give rise to this local minimum carry still more structural information than this set of phases. Well, it turns out, although we might have difficulty doing this if we didn’t know the structure, that the full structure, all 29 atoms, does in fact appear among the strongest 135 peaks. That may not seem like a very useful result, of course, because it might be difficult, in the case that we didn’t know the structure, to see it, to see the 29 atoms in the 135 strongest peaks. Well, we don’t assume that we’ve done that. Instead, from this Fourier series, the Fourier series calculated with the phases which give us this local minimum, we take the top twelve peaks now, again under the presumption that most or all of these peaks do in fact correspond to true atomic positions. We go through the process once more: we calculate the value of this function for the set of phases calculated on the basis of these 12 peaks, and we now find the value of this minimal function to be .439, smaller than each of these but bigger than what we got before. Again we are not surprised at that, because we are using here only 12 peaks among maybe 135 peaks. But we go through the minimisation process again, and now the local minimum turns out to be .37. By this process then, among these enormous numbers of local minima, we have been able to find the unique global minimum, or something very close to it, sufficiently close that it’s trivial to pick out the structure. Now, I see that my time is up, so I can’t describe the second application, which however is very similar to this one. Instead of using the local minimum of .41, as the next slide shows, we used the next local minimum, which was .43. We go through a rather similar process and we end up with essentially the same results: after two cycles, 28 of the 29 atoms appear among the strongest 31 peaks, and the 29th atom appears as peak number 44. For this starting point, as well as the starting point shown on the previous slide, we are able to find essentially the global minimum, or something very close to the global minimum, and in both cases to solve the structure. What remains to be seen is whether we can do the same thing for a much more complicated structure, say a structure with several hundred atoms, where the calculations then become much greater than they are now. Because instead of using only 300 phases, as we’ve done in this case, we may need to use for a much more complicated structure maybe 1,000 phases, and instead of a couple of hundred thousand invariants we may need to use a couple of million. So the calculations become much greater. But if the only problem is complexity of calculation, then we have made a big advance, because even existing computers are capable of handling that kind of calculation. Thank you.

Comment

The mathematician Herbert Hauptman took part in 5 consecutive Lindau Chemistry Meetings, but only gave lectures at the first four. These lectures all concern the so-called phase problem of X-ray crystallography, the problem on which Hauptman had worked since around 1950, partly together with the physical chemist Jerome Karle. Together they had published a set of texts describing a way of handling this problem practically, and it was for this work that they shared the 1985 Nobel Prize in Chemistry. The phase problem of X-ray crystallography was thought to imply that the direct inversion of experimental data into a crystal structure is strictly impossible from a fundamental mathematical viewpoint. Therefore scientists historically used different methods to try to overcome this difficulty. For small simple crystals, it has often been enough to extract certain crystal parameters out of the experimental data. For larger and more complex crystals, methods of changing the crystal structure by insertion of heavy atoms and comparing diffraction data with and without insertions have been (and are still) used. What Hauptman and Karle showed is that the knowledge that crystals are made up of atoms is enough to overcome the phase problem. Using this knowledge, they developed a probabilistic method which is particularly suited to moderately complex crystals and which relies heavily on the use of computer calculations. In all his four Lindau lectures, Hauptman gives clear and pedagogical presentations, and it is really a pity that we don’t have his equations. But Hauptman’s Nobel Lecture, given in Stockholm in 1985, concerns the same phase problem and can be found on the web site of Nobelprize.org. If you are seriously interested in Hauptman’s lecture, I recommend that you look it up!

Anders Bárány
