Optical Design with Automatic Computers

Donald P. Feder

Introduction

From 1950 to 1960, the art of optical design underwent a drastic change owing to the introduction of the automatic computer. First to change was the analysis of completed or partially completed designs by means of ray tracing. By 1950, ray-tracing programs had been written for a number of computers including the IBM Card Programmed Calculator, the Standards Eastern Automatic Computer (SEAC), and the Harvard Mark I. The problem of synthesizing an optical design was known to be far more difficult, but people had already begun to consider ways in which this might be done. Early work by the Harvard Study Group indicated the direction of possible progress. By 1954, work on the automatic design of lenses was under way at Harvard, at the University of Manchester in England, and at the National Bureau of Standards. Work at the Bureau was made possible when the SEAC, until then used on higher priority military work, became available for optical computations. From 1954 to 1956, I explored the possibilities of optimizing lenses using various gradient methods. By the end of 1956, progress was sufficiently encouraging that I was able to persuade R. Kingslake to hire me on the promise that I would develop a practical automatic-design program for the Eastman Kodak Company. Eventually, this led to the writing of the LEAD program, which has been used by Kodak since 1962.

In April 1956, the Optical Design Department at Kodak, under the directorship of Kingslake, purchased a Bendix G-15 computer for use by the department. Prior to that purchase, I had a long talk with Kingslake about the virtues of the various computers that were appearing on the market. I was afraid that he had made a mistake in buying an untried machine from a company new to the computing business. There was plenty of reason for hesitation; the early computers were not very reliable, and new companies were entering (and leaving) the new field with alacrity. The first Bendix had a penchant for making numerical mistakes and was gradually replaced by a better version.1 Kingslake's judgment was vindicated as the Bendix became the workhorse of the department until it was finally superseded by the IBM 1130 in 1966. A record of using a single computer successfully for 10 years was unheard of at that time. During this period, a second Bendix was obtained to meet the needs of the fifteen or so members of the department.

Analysis of Optical Designs

The Bendix was promptly programmed to trace meridian rays and to compute Coddington fields. Much of this work was done by Kingslake, assisted by William Price, who had joined the department in 1954. Later, programs for calculating Seidel aberrations and tracing skew rays were added. The Bendix made the tracing of rays easy and accurate but did not yield an over-all impression of the quality of the image. It was the practice to trace a fan of meridian rays and to calculate the Coddington fields along the chief ray. Occasionally a few skew rays were traced, but it was not easy to interpret the results. It was possible to tell from this type of analysis how much coma, astigmatism, field curvature, etc. were present in the image, but it was hard to know how to balance these various image errors to produce good over-all image quality. This need was satisfied by the development of the radial energy-distribution analysis.

I had written a spot-diagram program at the Bureau of Standards for the analysis of Air Force designs. However, owing to the slowness and unreliability of the early computing equipment, it was a formidable operation, and it cost about $2000 to analyze a lens by this method. In particular, the early plotters were both expensive and inconvenient to use, so, upon coming to Kodak, I resolved to write a program that would not require that the spot diagrams be plotted. This implied that they should be analyzed mathematically. At the Bureau, we had found a rudimentary energy distribution based upon counting the spots. This was done in two directions, the sagittal and the tangential, and the results were somewhat hard to understand. I decided to try a radial energy distribution.

For this purpose, a fan of rays, uniformly distributed in the entrance pupil, is traced from a fixed object point, and their intersection points are determined in the image plane. The resulting pattern of spots is then analyzed in the following way. Suppose that 100 rays have been traced. The spot diagram is searched for the location and radius of the smallest circle that contains ten rays. Then, it is searched again for the circle that contains twenty rays. The process is continued until the smallest circle that contains all the rays has been found. This procedure can be readily carried out by the computer. Assuming now that the energy goes where the rays go, the distribution of energy in the image can be displayed by plotting the radius of the circle against the percent energy contained within it. This is then done for several fields. The program was written in 1957 and has been revised several times since. The first program cost less than $100 per lens, which was a considerable improvement over the Bureau program. Our present version was written by Phillip Creighton. By its use we can obtain a complete set of heterochromatic energy distributions at eight focal planes, five wavelengths, and five field angles for a twelve-surface lens for under $5.00. The cost could be brought even lower if someone finds the time to optimize the ray-tracing subroutine.
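
The circle search described above is easy to express in code. The sketch below (Python with NumPy; my own illustration, not the original Bendix or LEAD routine) computes an approximate radial energy distribution from a set of spot coordinates. As a simplification, only circles centered on one of the spots are tried as candidate centers, so the reported radii may be slightly larger than the true minima.

import numpy as np

def radial_energy_distribution(spots, fractions=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Approximate encircled-energy curve of a spot diagram.

    spots     -- (n, 2) array of ray intersection points in the image plane
    fractions -- fractions of the rays for which an enclosing radius is reported
    """
    spots = np.asarray(spots, dtype=float)
    n = len(spots)
    # Distance from every spot (taken as a candidate circle center) to every other spot.
    diff = spots[:, None, :] - spots[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)   # row i: distances from candidate center i, in ascending order
    curve = []
    for f in fractions:
        k = max(1, int(np.ceil(f * n)))
        # Radius needed to enclose k spots when centered on spot i is dist[i, k - 1];
        # keep the candidate center that makes this radius smallest.
        curve.append((f, dist[:, k - 1].min()))
    return curve

# Example: 100 rays traced from one object point, spot coordinates in millimeters.
rng = np.random.default_rng(0)
spots = rng.normal(scale=0.01, size=(100, 2))
for f, r in radial_energy_distribution(spots):
    print(f"{int(100 * f):3d}% of the rays fall within a radius of {r:.4f} mm")

Plotting the radius against the percent of the energy contained, as described above, then gives the radial energy-distribution curve for that field point.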

Two objections have been brought against the use of radial energy distributions for analyzing optical designs. The first is perfectly valid and is based upon the wave nature of light. The energy does not really go where the rays go. However, it is known that if the aberrations are of the order of several wavelengths or larger, the approximation is good enough for practical purposes. A second objection is sometimes raised, but in my opinion it has no merit. This is that the radial energy distribution discards information about the asymmetry of the image, particularly with respect to astigmatism. This is partly true, and it is a good thing. Unless the lens is intended for the examination of picket fences whose orientations are known in advance, the information discarded is not only of no value, but it actually serves to confuse the issue. The fact that a lens forms an image that is sharp when it is scanned in one direction does not in the least compensate for a bad image when it is scanned in the other. The plot of the radial energy distribution looks different when astigmatism is present than when spherical aberration is the predominant defect. Such an asymmetric image tends to be heavily penalized by the radial energy-distribution analysis, and I think this is in accord with practical experience. In my opinion, the introduction of the use of radial energy-distribution analyses, which was made possible entirely through the development of the high speed computer, has contributed in large measure to the improvement of lenses designed at the Eastman Kodak Company.

The use of the computer to analyze an optical design was a great help to the lens designer, but a more dramatic change has been brought about by the widespread use of the automatic design programs that appeared in the last decade. From a mathematical standpoint, the design of a lens can be considered to be an exercise in the solution of a set of nonlinear equations with prescribed boundary conditions. Augustin Cauchy had written a paper in 1847 describing a gradient procedure, which he called "a general method" for solving such equations. I began work on the Kodak program in 1957 after finishing the energy-distribution program. Before discussing this work, however, I had better introduce some concepts that are pretty widely used by those working in the field of automatic design.

In optics, the equations to be solved are those that give the image errors as functions of the construction parameters of the lens. The construction parameters are the independent variables: the indices of refraction, dispersions, curvatures, aspheric coefficients, and vertex thicknesses, which, along with the clear apertures of the surfaces, permit the design to be fabricated. We can imagine an N-dimensional space, where N is the number of independent variables, such that every point represents an optical design. The problem is to search this space to find a point where the image errors are zero, or at least, satisfactorily small. Such a solution must also satisfy auxiliary boundary conditions, which restrict the range of solutions to those designs that have appropriate values for edge thicknesses, diameters, vignetting, glass indices, back focal distance, etc. These boundary values, as well as the image errors, are functions of the construction parameters and can be readily computed at every point in the space.

To search this N-dimensional space for the best design, it is necessary to be able to evaluate any possible design. This is equivalent to saying that, when alternative designs are presented, a decision rule must be available to select the preferred one. To be sensible, such a decision rule must depend upon the quality of the design, that is, upon the values of the image errors and also upon the use to which the lens is to be put. Once such a decision rule has been formulated, it is possible to order any set of designs in the order of preference. This ordering must be a function of the image errors and therefore of the construction parameters. An ordering of this type is called a merit function, and it may be formally defined as a mapping of the variable space upon the real numbers. It has been traditional to select a merit function that assigns lower numbers to the more preferred designs. Because of its mathematical convenience, the sum of the squares of the image errors has been widely used as a merit function. This function maps the parameter space upon the positive real axis. A value of zero is found if, and only if, all the errors are zero and hence represents a perfect lens. The sum of the squares is a reasonably good choice for a merit function although it may not accurately represent the designer's preferences when the imagery is critically examined. In such a case, the lower value of this function may not correspond with the better design. Merit functions, intended for use in automatic design programs, ought to be continuous functions of the independent variables, with continuous derivatives.

Occasionally, I have met people working in this field who have stated that they did not employ a merit function. In every case, such a statement appeared to be based upon a misunderstanding; they were using an implicit function rather than an explicit one. Before the advent of automatic programs, of course, everyone used an implicit function. In fact, the function was apt to vary from time to time depending upon the mood of the designer and other factors not easy to calculate numerically.
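
As a concrete illustration of the sum-of-squares idea, the small Python sketch below (my own illustration, not LEAD code) maps a list of image errors onto a single nonnegative number; the error values and weights are hypothetical.

def merit(errors, weights=None):
    """Weighted sum of squares: zero if and only if every image error is zero."""
    if weights is None:
        weights = [1.0] * len(errors)
    return sum(w * e * e for w, e in zip(weights, errors))

# Hypothetical image errors for two alternative designs (e.g., spherical
# aberration, coma, astigmatism, distortion); the lower merit value is preferred.
design_a = [0.02, -0.01, 0.03, 0.005]
design_b = [0.01, 0.02, -0.02, 0.010]
print(merit(design_a), merit(design_b))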

Synthesis of Designs

All the known ways of solving sets of nonlinear equations share certain basic similarities with Cauchy's general method. They all involve the reduction of a merit function by an iterative procedure. Starting at an arbitrary initial point, information is gathered about the local behavior of the functions. Based upon this information, some algorithm is employed to calculate a new point that is expected to have a lower value of the merit function. The methods differ as to the information gathered and the algorithm used to calculate the new point. Cauchy simply computed the value of the merit function and its derivatives. Then he calculated a sequence of values of the merit function along the direction of the negative gradient and used this information to locate the point where the function stopped improving. At this point a new gradient was computed, and the process was repeated. It is possible to show that, under conditions of continuity, a sequence of points computed in this way will converge to a limiting point and that the value of the gradient at that point will be zero. In a practical case, this means that the limiting point corresponds to a local minimum.

Unfortunately, this method does not converge rapidly enough to be of practical use, even in the simplest case in which the error functions are linear in the variables. In this case, if we take a merit function consisting of the sum of the squares of the individual error functions, we obtain a quadratic function in the independent variables. The level lines of this quadratic merit function are a set of concentric ellipsoids. These ellipsoids tend to be extremely eccentric. Hence, the direction of the gradient is not toward the center where the minimum is located. The result is that the successive points of the iteration bounce back and forth across the set of ellipsoids and approach the center very slowly.
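
The behavior described in the last two paragraphs is easy to reproduce. The sketch below (Python with NumPy and SciPy; a minimal illustration, not a reconstruction of any historical program) implements Cauchy's procedure, using a one-dimensional search to find the point along the negative gradient where the merit function stops improving, and applies it to a deliberately eccentric quadratic so that the slow, zig-zag approach to the center shows up in the iteration count.

import numpy as np
from scipy.optimize import minimize_scalar

def cauchy_descent(f, grad, x0, max_iterations=5000, tol=1e-8):
    """Steepest descent in the spirit of Cauchy's 'general method': move along
    the negative gradient to the point where the merit function stops improving,
    recompute the gradient there, and repeat."""
    x = np.asarray(x0, dtype=float)
    for iteration in range(max_iterations):
        g = grad(x)
        if np.linalg.norm(g) < tol:     # gradient (nearly) zero: stationary point
            return x, iteration
        d = -g                          # downhill direction
        t = minimize_scalar(lambda t: f(x + t * d)).x   # 1-D search along d
        x = x + t * d
    return x, max_iterations

# A deliberately eccentric quadratic merit function: the level lines are ellipses
# with a 100:1 axis ratio, so successive steps zig-zag and progress is slow.
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
x_min, n_iter = cauchy_descent(f, grad, np.array([10.0, 1.0]))
print("minimum found near", x_min, "after", n_iter, "iterations")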

The slow convergence of the gradient method was known at least as early as 1952, and I was well aware of the problem when I began work at Kodak. I made the supposition that any method that could be used to solve nonlinear equations would also have to be efficient at solving linear ones. The most efficient way of solving linear equations is by Gaussian elimination. A set of twenty linear equations that would require hours to solve by the gradient method could be solved in seconds by the elimination method. However, when the equations are nonlinear, a linear solution is only an approximation, and the method must be converted into an iterative one. Here again, we get into trouble. The procedure is to linearize the problem by expanding the error functions in Taylor's series and keeping only the linear terms. The resulting set of linear equations is then solved. This yields a new starting point where the process is repeated. The difficulty arises because the approximate solution often turns out to be poorer than the original. What I found was that the method would work for one or two iterations, but as it began to get in the neighborhood of the solution, it would fail and give an enormous value of the merit function.
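
The linearize-and-solve cycle just described is essentially what is now called the Gauss-Newton iteration. Below is a compact Python sketch (my own illustration, assuming the error functions and their Jacobian are available as callables); no damping or step control is applied, so a poor linearization is free to make the merit function worse rather than better.

import numpy as np

def gauss_newton(errors, jacobian, x0, iterations=20):
    """Undamped Gauss-Newton: linearize the error functions with the first-order
    Taylor terms, solve the resulting linear least-squares problem, and repeat."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        e = errors(x)          # current image errors
        J = jacobian(x)        # derivatives of the errors with respect to the parameters
        # Solve J dx = -e in the least-squares sense (Gaussian elimination on the
        # normal equations would do the same job).
        dx, *_ = np.linalg.lstsq(J, -e, rcond=None)
        x = x + dx
        print("merit =", float(e @ e))
    return x

# Hypothetical nonlinear error functions of two construction parameters.
errors = lambda x: np.array([x[0] ** 2 + x[1] - 3.0, x[0] + np.exp(x[1]) - 2.0])
jacobian = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, np.exp(x[1])]])
gauss_newton(errors, jacobian, np.array([2.0, 2.0]))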

This phenomenon was also known. In fact, Levenberg had written a paper in 1944 describing a damping procedure for overcoming exactly this difficulty.2 I was aware of this paper but became convinced by an analysis that the method would not be satisfactory. This was rather sad, because the damped least-squares method turned out to be one of the most useful methods for solving sets of nonlinear equations. It was not until talking to Charles Wynne when he visited Kodak several years later that I became sufficiently convinced to try the method. Wynne, it turned out, had developed it independently and had given it the name SLAMS.3 It was invented by Baker and Girard, also independently, and has been widely and successfully used in optical design.
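
The damping idea amounts to a one-line change in the step computation: a multiple of the identity matrix is added to the normal equations, which shortens the step and turns it toward the gradient direction when the linear approximation is not to be trusted. The fragment below is a minimal illustration of such a damped least-squares step (the damping factor lam is left to the caller; practical programs adjust it automatically).

import numpy as np

def damped_step(J, e, lam):
    """One damped least-squares (Levenberg-type) step: solve
    (J'J + lam * I) dx = -J'e instead of the undamped normal equations."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ e)

# With lam = 0 this is the ordinary Gauss-Newton step; as lam grows,
# the step shrinks and turns toward the negative gradient direction -J'e.
J = np.array([[2.0, 1.0], [1.0, 3.0]])
e = np.array([0.5, -0.2])
for lam in (0.0, 0.1, 1.0, 10.0):
    print(lam, damped_step(J, e, lam))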

Having set aside the method of least squares, I tried various numerical experiments intended to improve the convergence of the gradient method. In 1952, a paper by Hestenes and Stiefel described a procedure for solving linear equations. Called the conjugate-gradient method, it converted the ordinary gradient method into one that always converged to a solution in N steps or less, where N was the number of equations in the set. This met my criterion that the method be efficient for solving linear equations. Furthermore, the proof of convergence was very elegant and satisfied a personal need for mathematical clarity. Conversion to the case of nonlinear equations was trivial, although of course, convergence could no longer be expected in a finite number of steps. The information gathered at each step was the same as that needed for the gradient method, namely, the value of the merit function and its gradient. The direction taken at any point was a linear combination of the gradient computed at that point and the direction vector from the previous point. The formula depended in a very simple way upon the magnitudes of the gradients and could be directly applied even though the equations were nonlinear. In the ordinary gradient method, each new step is at right angles to the previous one. In the new method, this was no longer the case; instead, the successive steps tended to be more nearly parallel to each other. It turned out that, even in the nonlinear case, the convergence was much improved. Application to optical design proved to be practical although its implementation took considerable time because of the complexity of the equations used for the derivatives of the merit function and the difficulties involved in programming ways for treating the boundary conditions.
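
A nonlinear conjugate-gradient iteration of the kind described can be sketched in a few lines. In the Python version below (my own illustration), the mixing coefficient beta is the ratio of the squared gradient magnitudes, one common way of forming a direction that is a linear combination of the current gradient and the previous direction; the article does not give the exact formula used in 1957, so this particular choice is an assumption.

import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient_descent(f, grad, x0, max_iterations=500, tol=1e-8):
    """Nonlinear conjugate-gradient minimization: each new search direction is a
    linear combination of the current gradient and the previous direction."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for iteration in range(max_iterations):
        if np.linalg.norm(g) < tol:
            return x, iteration
        t = minimize_scalar(lambda t: f(x + t * d)).x   # 1-D search along d
        x = x + t * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)   # ratio of squared gradient magnitudes
        d = -g_new + beta * d              # successive directions are no longer orthogonal
        g = g_new
    return x, max_iterations

# The same eccentric quadratic used above: the conjugate-gradient iteration reaches
# the center in far fewer steps than plain steepest descent needs on this problem.
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
print(conjugate_gradient_descent(f, grad, np.array([10.0, 1.0])))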

The most obvious effect of the new programs was a great reduction in the time required to design a lens. This was rather dramatically illustrated in the summer of 1962 when we designed a four-element lens in a single evening on the IBM 7080. The occasion was a symposium on optics at the University of Rochester, and the lens became known as the Symposium lens. The design required 2½ hours of machine time and could be done in a couple of minutes on today's faster computers.

The improvement of quality became apparent when we completed the first major design that used the LEAD program. This was a lens intended for use with Microfile filming, and it needed exceptionally good resolution. The starting point was an excellent inverted telephoto lens, hand designed by Max Reiss, which contained seven elements. Although under the Rayleigh limit, it did have a little spherical aberration, which affected the contrast at resolutions of over 100 lines/mm. Kingslake suggested that perhaps the LEAD program could be used to reduce this spherical aberration. After a great deal of work and many changes in the program, an improved version was finished.4 The spherical aberration was removed, and the over-all quality was substantially improved. The lens was put into production and turned out to be both cheaper and better than the lens it replaced (which was not Reiss's design) in addition to having a flat field, which eliminated the use of a curved platen at the image.

This success encouraged us to improve the program further and to make it available to the entire department for routine use. This work was greatly aided by Ralph Guenther, who had joined me in 1962. Guenther developed into an excellent programmer and optical designer and has made many additions and improvements to the program including facilities for zoom lenses, aspherics, and literally hundreds of small programming changes, which seem to be constantly needed in large programs. In addition, he has been an invaluable help to the other designers in the proper use of the program. Since 1963, nearly all optical design work at Kodak has been done with the aid of the LEAD program.

Although everyone knows that computers are continually getting faster, the following little table will remind us of just how fast they really are. For a yardstick, let us take the time in seconds required to trace a skew ray through a single spherical surface. The machines mentioned were among those used at Kodak for optical design.

These times are not intended to be precise or to reflect in any way upon the merits of the various machines. They do indicate, however, the fantastic and steady gain of computing speed during the last two decades. Accompanying this increase in speed, there has also been a corresponding increase of memory capacity. There is no reason to suppose that the next 10 years will not show a comparable gain. On the contrary, new devices are constantly being proposed, including, I was interested to note, holographic optical memories providing rapid access to 10^10 bits of storage.5

The development of remote data processing has made it possible, now, for everyone to have access to the very fastest computers. Through the use of the so-called Cybernet, an optical designer at Kodak is able to transmit his problem to be run on a computer in Toronto or Boston and to receive the answers on our IBM 1130 printer, often within minutes of the time that the problem was sent. Eventually, it will be possible to communicate in this way with computers all over the country. The advantages, in case a particular machine is down or overloaded, are obvious. Furthermore, he can take advantage of other people's programs, which are available to him through the net. These include programs by Grey and by Spencer. Such programs are maintained on disk files at the central installation and can be used on a rental basis. It is, however, true that there are significant advantages in being able to use an in-house program such as LEAD. Optical design programs are neither foolproof nor error free, and if a designer gets into trouble, it is very helpful to have a Ralph Guenther available to bail him out. Whether this problem can be overcome by better programming and improved documentation remains to be seen. On the other hand, it seems very likely that optical-design programs are going to become ever larger and more sophisticated, so that only the very large optical companies will be able to afford the expense of developing their own programs.

Future Directions

There are several directions in which present automatic-design programs might be improved. Merit functions have tended to be somewhat simplistic, both because the evaluation of a design during an optimization run must not take more than a second or so and also because the central problem of convergence has not been entirely solved. With further increases in computing speed, however, I look to see an increasing use of physical optics in defining the merit function. Already the use of wave-optical transfer methods is widespread for the analysis of completed designs. (Clarence Bray of this department has been active in the development of such methods and tells the Kodak story in the next section.) In 1962, I described a merit function for optical design in a paper at the Optical Society meeting. I had intended to publish an expanded version but decided to wait until I had some practical experience with the new method. At long last, we have it partially programmed for LEAD and hope to have it running in a few months. I think now that it could not have been done in 1962, because the computers then were simply too slow. In concept, this merit function is similar to the variance of the wavefront and yields the variance for a particular choice of the weighting parameters.

Suppose we calculate the wave aberrations H_j at each point of a grid uniformly spaced over the entrance pupil. Then the merit function for that wavefront is defined as φ, where

φ = Σ_{j=1}^{n} μ_j H_j² − ( Σ_{j=1}^{n} μ_j H_j )².

In this equation, n is the total number of rays in the grid, and the μ_j are weights. If the μ_j are all the same, this defines the variance. It has been said that the variance is not a good measure of quality if the aberrations are too large, but I think the proper use of weighting will overcome this difficulty. The above equation can be transformed slightly and written in matrix form as

φ = H′μH,

where H is the column vector of the grid aberrations,

H = (H_1, H_2, …, H_n)′,

and μ is the weight matrix

μ = μ_d − mm′,   with μ_d = diag(μ_1, …, μ_n) and m = (μ_1, …, μ_n)′.

It is assumed that the weights are normalized so that

Σ_{j=1}^{n} μ_j = 1.

In an optimization routine, it is not necessary to trace a ray at every point of the grid. We can instead trace a sample set of rays over the pupil and use an interpolation formula. This greatly reduces the amount of time required to evaluate φ. Suppose we trace just a few rays at widely spaced points distributed over the field and aperture of the lens. Call the wave aberrations of these sample rays E_i. Then we can write H ≡ GS⁻¹E, where G is a matrix whose elements depend only upon the coordinates of the grid points, and S⁻¹ is the inverse of a matrix whose elements depend only upon the coordinates of the sample set of rays that are actually traced. Therefore, we can compute G and S⁻¹ before any rays are actually traced. We can substitute this equation into that for φ to obtain

φ = E′(GS⁻¹)′μ(GS⁻¹)E.

Finally, we can define a transformation matrix T such that

T′T = (GS⁻¹)′μ(GS⁻¹).

[This is analogous to taking the square root of a positive number and works in this case because (GS⁻¹)′μ(GS⁻¹) is a positive semidefinite matrix.] Then we can define a new set of image errors that are linear combinations of the E_i,

F = TE,

so that φ = F′F. In practice, the method is a little more complicated than is indicated here, because we need to compute the merit function over the entire field of the lens and not just at one field point. This means that we have to compute a different grid matrix, G, at each field point and that vignetting must be taken into account. All this makes the computation of T more complicated but does not change the fact that it is independent of the results of the ray tracing and hence can be computed in advance. The entire procedure can be looked at as a way of converting the original wave errors, E, into a new set of errors, F, that take better account of the variance of the wavefront. How effective this will prove in practice remains to be seen.
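
For readers who want to see the bookkeeping laid out, here is a small numerical sketch of the scheme just described (Python with NumPy; the polynomial interpolation basis, the grid, the sample-ray positions, and the aberration values are all my own illustrative assumptions, not the LEAD implementation). Everything up to the computation of T depends only on the geometry of the grid and of the sample rays, and so can be computed in advance; only the last few lines use traced-ray data.

import numpy as np

def basis(x, y):
    # A six-term polynomial interpolation basis over the pupil (an assumption).
    return np.stack([np.ones_like(x), x, y, x * x, x * y, y * y], axis=-1)

# Grid of points uniformly spaced over the (unit) entrance pupil.
u = np.linspace(-1.0, 1.0, 11)
gx, gy = np.meshgrid(u, u)
keep = gx ** 2 + gy ** 2 <= 1.0
gx, gy = gx[keep], gy[keep]
n = gx.size

mu = np.full(n, 1.0 / n)                 # normalized weights, sum = 1
wmat = np.diag(mu) - np.outer(mu, mu)    # weight matrix that forms the variance

# Six sample rays, one per basis term, at widely spaced pupil points.
sx = np.array([0.0, 0.9, -0.9, 0.0, 0.0, 0.6])
sy = np.array([0.0, 0.0, 0.0, 0.9, -0.9, 0.6])

G = basis(gx, gy)                        # grid-point basis values
S = basis(sx, sy)                        # sample-point basis values (square, invertible)
GS = G @ np.linalg.inv(S)                # H = GS @ E interpolates grid aberrations from E

M = GS.T @ wmat @ GS                     # (GS^-1)' mu (GS^-1): positive semidefinite
w, V = np.linalg.eigh(M)
T = np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T    # so that T'T = M

# During an optimization run: trace the six sample rays, collect their wave
# aberrations E (in waves), and evaluate phi = F'F with F = T @ E.
E = np.array([0.00, 0.25, 0.25, 0.10, 0.10, 0.18])   # illustrative aberration values
F = T @ E
print("variance-type merit function:", F @ F)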

In my judgment, more work is needed to improve the convergence during the course of an optimization run. In many cases, we are sure that the LEAD program does not reach an exact minimum, because the gradient is not exactly zero. This is very apparent in our program, because we compute the derivatives of the merit function with respect to the system parameters by an exact method instead of by using finite differences. It may be true that further improvement of the design will be negligible, but I would feel better if the gradient were zero. Then, at least, we could be sure that a local minimum had been reached. (Strictly speaking, if the gradient is zero, we can only be sure that we have a stationary point. However, the process of convergence appears to make it impossible to reach a saddle point.)

The problem of reaching a global minimum has not been seriously attacked, probably because no one has any idea how to do it. At present, the only way seems to be to take a large number of starting points. This means that we must have a very fast machine and very good convergence procedures. A great deal of effort has gone into the development of better convergence algorithms; some of these methods are being incorporated into the LEAD program. I hope that we will be able to report the success of these algorithms at a later date.

Of course, it would be a waste to use a great deal of computer time to reach the smallest possible value of the merit function if it proved impossible to fabricate the resulting design. Presumably, different designs exhibit different sensitivities to manufacturing defects. It is possible that this can be taken into account in the optimization of a design. This introduces a new factor into design—that of tilts and decentering. In traditional design, we have always assumed that the lenses were exactly rotationally symmetric. In any real lens this is not true, and we can calculate the deterioration caused by a certain degree of sloppiness in manufacture and add this deterioration to the value of the merit function arising from the ordinary aberrations. An optimization of this modified merit function would then yield a design that might be actually better, when built, than a design having better theoretical quality but greater sensitivity to errors in manufacturing.

References

1. It was replaced component by component until it was finally removed and a G-15D machine installed. This incorporated a number of engineering improvements and proved to be a much more reliable machine.
2. K. Levenberg, Quart. Appl. Math. 2, 164 (1944).
3. For reference to this and other papers see D. P. Feder, Appl. Opt. 2, 1209 (1963).
4. U.S. Patent 3,466,117.
5. J. A. Rajchman, Appl. Opt. 9, 2269 (1970).