building proteins in a day: efficient 3d molecular reconstruction · 2015-05-24 · [2]william t...

1
Building Proteins in a Day: Efficient 3D Molecular Reconstruction Marcus A. Brubaker, Ali Punjani, David J. Fleet Department of Computer Science, University of Toronto Discovering the 3D atomic structure of molecules such as proteins and viruses is a fundamental research problem in biology and medicine. Electron Cry- omicroscopy (Cryo-EM) is a promising vision-based technique for struc- ture estimation which attempts to reconstruct 3D structures from 2D images. This paper addresses the challenging problem of 3D reconstruction from 2D Cryo-EM images. A new framework for estimation is introduced which re- lies on modern stochastic optimization techniques to scale to large datasets. We also introduce a novel technique which reduces the cost of evaluating the objective function during optimization by a factor of over 100,000. The net result is an approach capable of estimating 3D molecular structure from large scale datasets in about a day on a single workstation. Biological macromolecules are composed of chains of simpler monomers, and the conformation or "folding" of these chains into a 3D structure is what gives each macromolecule its specific function and properties. The ability to routinely determine the 3D structure of such molecules would potentially revolutionize the process of drug development and accelerate research into fundamental biological processes. The Cryo-EM reconstruction technique is applicable to medium to large-sized molecules in their native state. This is in contrast to X-ray crystallography which requires a crystal of the target molecule, which are often impossible to grow or nuclear magnetic resonance (NMR) spectroscopy which is limited to smaller molecules. The Cryo-EM reconstruction task is to estimate the 3D density of a tar- get molecule from a large set of images of the molecule (called particle im- ages). The problem is similar in spirit to multi-view scene carving [3, 6] and to large-scale, uncalibrated multi-view reconstruction [1]. Like multi-view scene carving, the goal is to estimate a dense 3D occupancy representation of shape from a set of different views, but unlike many approaches to scene carving, calibrated cameras cannot be assume, since the relative 3D poses of the molecules in different images are unknown. Like uncalibrated, multi- view reconstruction, we aim to use very large numbers of uncalibrated views to obtain high fidelity 3D reconstructions, but the low signal-to-noise levels in Cryo-EM, often as low as 0.05 [2] (see Fig. 1), and the different image formation model prevent the use of common feature matching techniques to establish correspondences. Computed Tomography (CT) [4, 5] uses a simi- lar imaging model (orthographic integral projection) as Cryo-EM, however in CT the projection direction of each image is known. Existing Cryo-EM techniques suffer from two key problems. First, without good initialization, they converge to poor or incorrect solutions, often with little indication that something is wrong. Second, they are extremely slow, in some cases requir- ing hundreds of thousands of CPU hours for a single reconstruction. We introduce a framework for Cryo-EM density estimation, formulat- ing the problem as one of stochastic optimization to perform maximum- a-posteriori (MAP) estimation in a probabilistic model. The approach is remarkably efficient, providing useful low resolution density estimates in an hour. We also show that our stochastic optimization technique is insensitive to initialization, allowing the use of random initializations. We further in- troduce a novel importance sampling scheme that dramatically reduces the computational costs associated with high resolution reconstruction. This leads to speedups of 100,000-fold or more, allowing structures to be deter- mined in a day on a modern workstation. In addition, the proposed frame- work is flexible, allowing parts of the model to be changed and improved without impacting the estimation; e.g., we compare the use of three differ- ent priors. To demonstrate our method, we perform reconstructions on two real datasets and one synthetic dataset. We believe that Cryo-EM is an im- portant and challenging problem which, with a few exceptions (e.g.,[7]), has seen little attention in the computer vision community. The approach pro- posed here represents a significant step forward, allowing high resolution reconstructions to be computed quickly and efficiently and without specific initialization. This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage. Figure 1: The goal is to reconstruct the 3D structure of a molecule (right), at nanometer scales, from a large number of noisy, uncalibrated 2D projections obtained from cryogenically frozen samples in an electron microscope (left). Initial 1 hour 6 hours 12 hours 24 hours Figure 2: The random initialization (left) used in all experiments and re- construction of the thermus dataset after various amounts of computation. Within an hour of computation, the gross structure is already well deter- mined, after which fine details emerge gradually. [1] Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, and Richard Szeliski. Building rome in a day. In ICCV, 2009. [2] William T Baxter, Robert A Grassucci, Haixiao Gao, and Joachim Frank. Determination of signal-to-noise ratios and spectral snrs in cryo- em low-dose imaging of molecules. J Struct Biol, 166(2):126–32, May 2009. [3] R. Bhotika, D.J. Fleet, and K. Kutulakos. A probabilistic theory of occupancy and emptiness. In ECCV, 2002. [4] James Gregson, Michael Krimerman, Matthias B. Hullin, and Wolfgang Heidrich. Stochastic tomography and its applications in 3d imaging of mixing fluids. ACM Trans. Graph. (Proc. SIGGRAPH 2012), 31(4): 52:1–52:10, 2012. [5] Jiang Hsieh. Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, 2003. [6] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. IJCV, 38(3):199–218, 2000. [7] Satya P. Mallick, Sameer Agarwal, David J. Kriegman, Serge J. Be- longie, Bridget Carragher, and Clinton S. Potter. Structure and view estimation for tomographic reconstruction: A bayesian approach. In CVPR, 2006.

Upload: others

Post on 16-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Building Proteins in a Day: Efficient 3D Molecular Reconstruction · 2015-05-24 · [2]William T Baxter, Robert A Grassucci, Haixiao Gao, and Joachim Frank. Determination of signal-to-noise

Building Proteins in a Day: Efficient 3D Molecular Reconstruction

Marcus A. Brubaker, Ali Punjani, David J. FleetDepartment of Computer Science, University of Toronto

Discovering the 3D atomic structure of molecules such as proteins and virusesis a fundamental research problem in biology and medicine. Electron Cry-omicroscopy (Cryo-EM) is a promising vision-based technique for struc-ture estimation which attempts to reconstruct 3D structures from 2D images.This paper addresses the challenging problem of 3D reconstruction from 2DCryo-EM images. A new framework for estimation is introduced which re-lies on modern stochastic optimization techniques to scale to large datasets.We also introduce a novel technique which reduces the cost of evaluatingthe objective function during optimization by a factor of over 100,000. Thenet result is an approach capable of estimating 3D molecular structure fromlarge scale datasets in about a day on a single workstation.

Biological macromolecules are composed of chains of simpler monomers,and the conformation or "folding" of these chains into a 3D structure is whatgives each macromolecule its specific function and properties. The abilityto routinely determine the 3D structure of such molecules would potentiallyrevolutionize the process of drug development and accelerate research intofundamental biological processes. The Cryo-EM reconstruction techniqueis applicable to medium to large-sized molecules in their native state. Thisis in contrast to X-ray crystallography which requires a crystal of the targetmolecule, which are often impossible to grow or nuclear magnetic resonance(NMR) spectroscopy which is limited to smaller molecules.

The Cryo-EM reconstruction task is to estimate the 3D density of a tar-get molecule from a large set of images of the molecule (called particle im-ages). The problem is similar in spirit to multi-view scene carving [3, 6] andto large-scale, uncalibrated multi-view reconstruction [1]. Like multi-viewscene carving, the goal is to estimate a dense 3D occupancy representationof shape from a set of different views, but unlike many approaches to scenecarving, calibrated cameras cannot be assume, since the relative 3D posesof the molecules in different images are unknown. Like uncalibrated, multi-view reconstruction, we aim to use very large numbers of uncalibrated viewsto obtain high fidelity 3D reconstructions, but the low signal-to-noise levelsin Cryo-EM, often as low as 0.05 [2] (see Fig. 1), and the different imageformation model prevent the use of common feature matching techniques toestablish correspondences. Computed Tomography (CT) [4, 5] uses a simi-lar imaging model (orthographic integral projection) as Cryo-EM, howeverin CT the projection direction of each image is known. Existing Cryo-EMtechniques suffer from two key problems. First, without good initialization,they converge to poor or incorrect solutions, often with little indication thatsomething is wrong. Second, they are extremely slow, in some cases requir-ing hundreds of thousands of CPU hours for a single reconstruction.

We introduce a framework for Cryo-EM density estimation, formulat-ing the problem as one of stochastic optimization to perform maximum-a-posteriori (MAP) estimation in a probabilistic model. The approach isremarkably efficient, providing useful low resolution density estimates in anhour. We also show that our stochastic optimization technique is insensitiveto initialization, allowing the use of random initializations. We further in-troduce a novel importance sampling scheme that dramatically reduces thecomputational costs associated with high resolution reconstruction. Thisleads to speedups of 100,000-fold or more, allowing structures to be deter-mined in a day on a modern workstation. In addition, the proposed frame-work is flexible, allowing parts of the model to be changed and improvedwithout impacting the estimation; e.g., we compare the use of three differ-ent priors. To demonstrate our method, we perform reconstructions on tworeal datasets and one synthetic dataset. We believe that Cryo-EM is an im-portant and challenging problem which, with a few exceptions (e.g., [7]), hasseen little attention in the computer vision community. The approach pro-posed here represents a significant step forward, allowing high resolutionreconstructions to be computed quickly and efficiently and without specificinitialization.

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

Figure 1: The goal is to reconstruct the 3D structure of a molecule (right), atnanometer scales, from a large number of noisy, uncalibrated 2D projectionsobtained from cryogenically frozen samples in an electron microscope (left).

Initial 1 hour 6 hours 12 hours 24 hoursFigure 2: The random initialization (left) used in all experiments and re-construction of the thermus dataset after various amounts of computation.Within an hour of computation, the gross structure is already well deter-mined, after which fine details emerge gradually.

[1] Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, andRichard Szeliski. Building rome in a day. In ICCV, 2009.

[2] William T Baxter, Robert A Grassucci, Haixiao Gao, and JoachimFrank. Determination of signal-to-noise ratios and spectral snrs in cryo-em low-dose imaging of molecules. J Struct Biol, 166(2):126–32, May2009.

[3] R. Bhotika, D.J. Fleet, and K. Kutulakos. A probabilistic theory ofoccupancy and emptiness. In ECCV, 2002.

[4] James Gregson, Michael Krimerman, Matthias B. Hullin, and WolfgangHeidrich. Stochastic tomography and its applications in 3d imaging ofmixing fluids. ACM Trans. Graph. (Proc. SIGGRAPH 2012), 31(4):52:1–52:10, 2012.

[5] Jiang Hsieh. Computed Tomography: Principles, Design, Artifacts, andRecent Advances. SPIE, 2003.

[6] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving.IJCV, 38(3):199–218, 2000.

[7] Satya P. Mallick, Sameer Agarwal, David J. Kriegman, Serge J. Be-longie, Bridget Carragher, and Clinton S. Potter. Structure and viewestimation for tomographic reconstruction: A bayesian approach. InCVPR, 2006.