
Page 1: Dr. Rosemary Renaut,  renaut@asu  Director, Computational Biosciences  asu/compbiosci

Dr. Rosemary Renaut, [email protected] Director, Computational Biosciences

www.asu.edu/compbiosci

11/28/2005 USDA

Page 2

OUTLINE

– THE CBS PROGRAM AT ASU: OVERVIEW
– CBS CURRICULUM
– REQUIREMENTS
– SOME HISTORY
– FUTURE
– PROJECTS – WHAT DO THEY INVOLVE
– OUR CASE STUDIES COURSE(S)
– INTRODUCING BASIC MATHEMATICS TO LS/CSE STUDENTS

Page 3

A Professional Science Masters Program

Mathematics and Statistics

The School of Life Sciences

Computer Science Engineering

W. P. Carey School of Business

Page 5

CORE REQUIREMENTS

– Scientific Computing for Biosciences (4)
– Case Studies/Projects in Biosciences (4)
– Structural and Molecular Biology (4)
– Statistics and Experimental Design (6)
– Business Practice and Ethics (6)
– Internship and Applied Project (6)

Page 6

ELECTIVE TRACKS

– Genomics/Proteomics
– Data Mining, Databases
– Medical Imaging
– Molecular/Functional Genomics
– Microarray Analysis
– Individualized

Page 7

PRE-REQUISITES

– Calculus and Differential Equations
– Basic Statistics (junior)
– Discrete Algorithms and Data Structures
– Programming skills (C++/Java)
– Cell biology, genetics (junior level)
– Organic and Bio Chemistry (junior)
– Motivation, creativity, determination!

Page 8

– Interdisciplinary Training/Team Work
– Internship/Applied Project Report
– Business, Management and Ethics
– (Health Services Administration MBA)
– Small Groups/Close Faculty Involvement
– Computer Laboratory
– Extensive Project work/Consulting

Page 9

DATA

• Year 4: total 70 students, currently 27
• Graduates: 32 (11 left without graduating)
• Internships: NIH, ASU, Tgen, AZ Game and Fish, US Water Conservation Lab, AZ Biodesign
• Jobs: Tgen, ASU, Codon Solutions, Medical record keeping, Matlab, St Judes Memphis, Walt Disney!, Cisco
• PhD programs (10): Biology, Computer Science, Biochemistry, Mathematics

Page 10

OTHER DEVELOPMENTS

– Undergraduate: NIH MARC
  o Quantitative Skills (sophomore), spring 05
  o Modeling Comp Bio (junior), spring 05
– Doctoral Program Computational Biosciences
– Molecular Cellular Biology / Mathematics

Page 11

WHAT DO WE DO: SPRING 2004

– Database Construction/Mining of Pathology Specimens (Tgen)
– Gegenbauer high resolution reconstruction for MRI, ASU
– TLS-SVM for Feature Extraction of Microarray Data, ASU
– Automated video analysis for cell behavior, Tgen
– EST DB for Marine Dinoflagellate Crypthecodinium cohnii, ASU
– Data mining for microsatellites in ESTs from Arabidopsis thaliana and Brassica species (US Water Conservation Laboratory)
– The Genome Assembler, Tgen
– A user interface to support navigation for scientific discovery, ASU
– Cell Migration Software Tool, Tgen

Page 12

WHAT DO WE DO: SPRING 2005

• EVALUATION OF BIOINFORMATICS RESOURCES (Tgen/NIH/ASU)
• Pattern recognition Automated Cytoskeleton Reconstruction (ASU)
• Develop workable database on crop Lesquerella using Integrated Crop Information Systems (ICIS) (US Water Conservation Laboratory)
• Investigation of Xylella fastidiosa Within an Almond Tree Population: A Model System for Golden Death (ASU: Mathecology, AZ)
• Search for Epigenetic Properties of DNA and RNA involved in X Chromosome Inactivation (Codon Solutions LLC)

Page 13

WHAT DID WE NEED FOR THESE PROJECTS

Image Analysis
Data Mining
Fourier Analysis
Modeling Differential Equations
Sequence Comparisons

Mathematics for Genetic Analysis Statistics
Database development: for BIOLOGICAL APPLICATIONS
Geographic Information Systems

PERL/BIOPERL/MATLAB/MYSQL

Page 14

Bioinformatics: Managing Scientific Data tackles this challenge head-on by discussing the current approaches and variety of systems available to help bioinformaticians with this increasingly complex issue. The heart of the book lies in the collaboration efforts of eight distinct bioinformatics teams that describe their own unique approaches to data integration and interoperability. Each system receives its own chapter where the lead contributors provide precious insight into the specific problems being addressed by the system, why the particular architecture was chosen, and details on the system's strengths and weaknesses. In closing, the editors provide important criteria for evaluating these systems that bioinformatics professionals will find valuable.


Page 15

Computational Modeling Skills Motivated by Case Studies

Phylogenetics and Tree Building (for the data make the tree)

                Human(A)      Chimp(B)      Gorilla(C)    Orang-Utan(D)  Gibbon(E)
Human(A)        -             .0919         .1082         .1790          .2057
Chimp(B)        .0919/.0821   -             .1134         .1940          .2168
Gorilla(C)      .1057/.1083   .1161/.1330   -             .1882          .2170
Orang-Utan(D)   .1806/.1838   .1910/.1838   .1895/.1838   -              .2172
Gibbon(E)       .2067/.2142   .2171/.2142   .2156/.2142   .2172/.2142    -

Page 16

All additive trees with 5 branches: which is the correct one?

Page 17

For this tree we can calculate the patristic distances between sequences:

$p_{BD} = e_2 + e_6 + e_7 + e_4$

This should match the distance from the measured data.
We do a goodness of fit for all distances:

$\|p - d\|^2 = \|Ae - d\|^2$

What is A, what is e? Any conditions on e?

Repeat for all trees:
Use matlab
Understand
Least Squares
Nonnegative constraints
Constrained LS
Exhaustive search
Genetic Algorithms
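One way to make "what is A, what is e?" concrete is sketched below. Several things here are assumptions rather than content from the slides: the tree shape ((A,B),C,(D,E)), chosen only because it reproduces the path for B to D given above; the distance values, which are read from the upper triangle of the garbled distance table and should be treated as illustrative; and the solver, a simple projected gradient descent in Python standing in for the Matlab routines the course uses.

```python
# Branch indices on each leaf-to-leaf path (e1..e5 are leaf branches, e6 and e7 internal).
# ASSUMPTION: topology ((A,B),C,(D,E)); it reproduces p_BD = e2 + e6 + e7 + e4.
PATHS = {
    "AB": (1, 2),
    "AC": (1, 6, 3),
    "AD": (1, 6, 7, 4),
    "AE": (1, 6, 7, 5),
    "BC": (2, 6, 3),
    "BD": (2, 6, 7, 4),
    "BE": (2, 6, 7, 5),
    "CD": (3, 7, 4),
    "CE": (3, 7, 5),
    "DE": (4, 5),
}

# Measured distances d (illustrative values read from the transcript's table).
MEASURED = {
    "AB": 0.0919, "AC": 0.1082, "AD": 0.1790, "AE": 0.2057, "BC": 0.1134,
    "BD": 0.1940, "BE": 0.2168, "CD": 0.1882, "CE": 0.2170, "DE": 0.2172,
}

PAIRS = sorted(PATHS)
N_BRANCHES = 7
# Row i of A has a 1 in column j when branch e_(j+1) lies on the path for pair i.
A = [[1.0 if j + 1 in PATHS[pair] else 0.0 for j in range(N_BRANCHES)]
     for pair in PAIRS]
d = [MEASURED[pair] for pair in PAIRS]


def patristic(e):
    """p = A e: the tree distances implied by branch lengths e."""
    return [sum(a * x for a, x in zip(row, e)) for row in A]


def misfit(e):
    """Goodness of fit ||A e - d||^2."""
    return sum((p - di) ** 2 for p, di in zip(patristic(e), d))


def fit_nonnegative(iterations=5000):
    """Projected gradient descent for min ||A e - d||^2 subject to e >= 0."""
    # 1 / (sum of squared entries of A) is a safe step for the gradient A^T (A e - d).
    step = 1.0 / sum(a * a for row in A for a in row)
    e = [0.0] * N_BRANCHES
    for _ in range(iterations):
        r = [p - di for p, di in zip(patristic(e), d)]
        grad = [sum(A[i][j] * r[i] for i in range(len(A)))
                for j in range(N_BRANCHES)]
        # Clipping at zero keeps every branch length nonnegative.
        e = [max(0.0, x - step * g) for x, g in zip(e, grad)]
    return e
```

Calling `fit_nonnegative()` returns seven branch lengths and `misfit(...)` scores them; repeating the fit with a different `PATHS` table for each candidate tree and keeping the smallest misfit is the exhaustive search the slide refers to.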

Page 18

Computational Modeling Skills Motivated by Case Studies

Phylogenetics and Tree Building (for the data make the tree)

                Human(A)   Chimp(B)   Gorilla(C)   Orang-Utan(D)   Gibbon(E)
Human(A)        -          79         92           144             162
Chimp(B)        79         -          95           154             169
Gorilla(C)      92         96.7       -            150             169
Orang-Utan(D)   149.3      154        152.1        -               169
Gibbon(E)       166.2      170.9      169          169             -

Page 19

An ultrametric tree: what are the distances $e_i$?
Solve the linear programming problem

$\min L(e) = \min \sum_i e_i$,

where $L(e)$ is the total length of the tree. Moreover, each length is positive and the total lengths are preserved, e.g. $e_1 = e_2$ and $e_4 + e_8 = e_1 + e_6 + e_7$.

LP problem with constraints: $\max c^T x$ with $Ax \le b$, $x \ge 0$.

Students identify x, c, b, A? Use matlab: linprog
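One possible way to line the tree problem up with the standard form is sketched below. It assumes the tree has eight branches $e_1, \dots, e_8$ (eight is only the largest index quoted on the slide, not a stated count) and uses just the two equalities given there. How the measured distances enter the constraints is not stated on the slide, so those rows are left out.

```latex
\begin{align*}
% Unknowns: the branch lengths (assumed to be eight of them).
x &= e = (e_1, \dots, e_8)^T, \qquad x \ge 0 \\
% Minimising the total length is the same as maximising its negative.
\min_e \sum_{i=1}^{8} e_i &= -\max_x \, c^T x, \qquad c = -(1, \dots, 1)^T \\
% Each equality becomes a pair of inequalities: two rows of A with zeros in b.
e_1 = e_2 &\iff e_1 - e_2 \le 0 \ \text{ and } \ e_2 - e_1 \le 0 \\
e_4 + e_8 = e_1 + e_6 + e_7 &\iff e_4 + e_8 - e_1 - e_6 - e_7 \le 0
  \ \text{ and } \ e_1 + e_6 + e_7 - e_4 - e_8 \le 0
\end{align*}
```

With only these rows the optimum is trivially $e = 0$, so the rows that tie the branch lengths to the measured distances are what make the problem meaningful.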

Page 20

BUT THERE ARE MANY DIFFERENT TREE SHAPES
AND WHICH IS CORRECT?
WE NEED

EXHAUSTIVE SEARCH

GENETIC ALGORITHMS?
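To see why exhaustive search stops being practical, it helps to count the candidate shapes. The sketch below uses the standard count of unrooted binary trees on n labelled leaves, (2n-5)!!. The function name is illustrative, and Python stands in for the Matlab used in the course.

```python
def count_unrooted_binary_trees(n_leaves):
    """Number of distinct unrooted binary trees on n labelled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least three leaves")
    count = 1
    # A tree with k-1 leaves has 2k-5 branches, and leaf number k
    # can be attached in the middle of any one of them.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count
```

For five species this gives 15 shapes, each needing its own fit. For ten leaves it is already 2,027,025, which is where heuristics such as genetic algorithms become attractive.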

Page 21

HOW WAS THIS USEFUL?

Introduction to
• data fitting,
• optimization, genetic algorithms, exhaustive search
• matlab routines,
• Realistic solutions (positive branch lengths)
• Start on some multivariable calculus to derive normal equations

OTHER APPLICATIONS USING SIMILAR TECHNIQUES

Neural networks for classification – how do they learn?
Data mining – k-means clustering – minimize energy
Gradient Descent
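The multivariable calculus step can be written in a few lines. This is the standard derivation for the unconstrained least-squares fit, with A, e and d as in the tree-fitting problem; it ignores the nonnegativity condition on e.

```latex
\begin{align*}
f(e) &= \|Ae - d\|^2 = e^T A^T A e - 2\, d^T A e + d^T d \\
\nabla f(e) &= 2 A^T A e - 2 A^T d = 2 A^T (Ae - d) \\
\nabla f(e) = 0 &\iff A^T A e = A^T d
\end{align*}
```

Gradient descent uses the same gradient without solving the system: it repeats $e \leftarrow e - \eta \, \nabla f(e)$ for a small step size $\eta$.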

Page 22

A Novel Genome Assembler: Using K-mers to Indirectly Perform N^2 Comparisons in O(N)

Ho-Joon Lee, Stephanie Rogers and Maulik Shah
Advisor: Jeffrey Touchman, Arizona State University and Translational Genomics Research Institute, Phoenix

The novel, exhaustive approach to genome assembly aims to eliminate traditional heuristics and indirectly compare each sequence fragment to all the others in less than O(N^2) running time. The first step in the algorithm involves building a k-mer library, accomplished by scanning through each sequence fragment using a sliding window of size k and cataloging each of these k-mers and the sequence fragment in which it occurs. It is assumed that neighboring fragments from the genome will share k-mers. From this k-mer table, an adjacency table is built, cataloging each sequence fragment and its neighbors. This adjacency table is generated in O(N) and represents all N^2 comparisons. Finally, with the information in the adjacency table, multiple breadth-first searches to collect and separate the connected fragments are performed. It is assumed that there exist disjoint graphs in the adjacency table and that each such graph represents an entire contig. This process was performed on two datasets, a simulated set and a real set, both having >40,000 sequence fragments of ~1,000 kb and 9-fold coverage. In both instances, the majority of the fragments were assigned to one contig.
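The three steps in the abstract (k-mer table, adjacency table, breadth-first search) can be sketched as follows. This is a toy illustration and not the students' assembler: the function names are made up, exact k-mer matching is assumed, and no attempt is made to handle reverse complements, sequencing errors or repeats.

```python
from collections import defaultdict, deque


def build_kmer_table(fragments, k):
    """Map each k-mer to the set of fragment indices in which it occurs."""
    table = defaultdict(set)
    for index, sequence in enumerate(fragments):
        # Slide a window of size k along the fragment.
        for start in range(len(sequence) - k + 1):
            table[sequence[start:start + k]].add(index)
    return table


def build_adjacency(kmer_table, n_fragments):
    """Two fragments are neighbours when they share at least one k-mer."""
    adjacency = [set() for _ in range(n_fragments)]
    for members in kmer_table.values():
        for index in members:
            adjacency[index].update(members - {index})
    return adjacency


def connected_groups(adjacency):
    """Breadth-first search from every unvisited fragment; one group per component."""
    seen = set()
    groups = []
    for start in range(len(adjacency)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups
```

For example, with fragments `["ACGTACGGA", "CGGATTCA", "TTTTGGGG", "GGGGCCCC"]` and `k = 4`, the first two share `CGGA` and the last two share `GGGG`, so the search returns two groups. In this sketch the cost of building the adjacency table depends on how many fragments share each k-mer, so it is only close to linear when k-mers are rarely shared.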

Page 23

Database Construction and Mining of Pathology Specimens

Charu Gaur, Jennifer Szeto and Sotiris Mitropanopoulos
Advisor: Dominique Hoelzinger, Translational Genomics Research Institute, Phoenix

New technology allows hundreds of pathology specimens from human diseases to be sampled as .6mm punches of tissues that are arrayed into new “TMA” paraffin blocks; these blocks are then sectioned with microtomes to produce hundreds of slides containing hundreds of human tissue specimens (tissue microarrays, TMAs). Databases to support analysis of these high throughput TMAs will include information on diagnosis, treatment, disease response, and multiple images from follow-on studies linked to the coordinates of each of the hundreds of punches on the TMA. Data mining from the results of TMA experiments will allow text mining and image feature extraction. In this project, we present the requirements, design, and a prototype of a web based TMA database application.

Page 24

Simulating gene expression patterns during zebrafish embryo development

Ei-Ei Gaw
Advisor: Ajay Chitnis, National Institutes of Health, Maryland

During embryo development, it is essential that relatively homogenous groups of cells undergo differentiation to form spatially different patterns and eventually take on many different functions. Intercellular communication and morphogen gradients are two aspects that have been shown to play roles in determining cell fate. To understand better how these activities result in pattern formation, we utilized the NetLogo programming environment to simulate these processes. We were able to observe visually the possible pattern formation of gene expression from activation and inhibition of genes, intercellular interactions (top figure), and the exposure to morphogen gradients (middle and bottom figures). The three developmental phenomena of zebrafish embryos we studied were neurogenesis, somitogenesis, and morphogenesis during anterior/posterior patterning. The model for neurogenesis examines how autocatalysis and lateral inhibition (notch signaling) are required to form stable patterns. In addition, we investigated to what extent the geometry of the domains and the initial noise in the ‘her’ gene expression help determine a cell’s fate. To study somitogenesis, we explored how transcription and translation delays coupled with notch pathway and independent moving wave-front activity of fgf may contribute to the oscillation and synchronization of gene expression during formation of somites. Lastly, we used our models to examine how time and concentration of a morphogen gradient may signal gene activation and eventually form patterns of cells with stable and differing gene expression. Although, there is much room to study in more detail the systems of equations and the numerical analysis we used, overall this method is a good addition to the traditional methods of studying how gene-expression patterns may develop.

Page 25

Gegenbauer High Resolution Reconstruction of Magnetic Resonance Imaging

Jim Estipona and Prasanna Velamuru
Advisors: Rick Archibald and Rosemary Renaut, Arizona State University

A variety of image artifacts are routinely observed on MRI images. We concentrate on the Gibbs Ringing that manifests itself as bright or dark rings seen at the borders of abrupt intensity change on the images. Gegenbauer High Resolution Reconstruction method has been previously shown to eliminate the undesirable ringing at the jump discontinuities in MRI. Prior work concentrated on applying this reconstruction method on frequency data obtained from reconstructed images and not on raw K-space data obtained from the MRI scanning machine. Our project work concentrates on using the raw k-space data for reconstruction and aims at comparing the reconstructed images obtained to those reconstructed from other commonly used reconstruction methods.
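The Gibbs ringing described above can be reproduced with a one-dimensional toy example. The sketch below does not implement the Gegenbauer reconstruction; it only shows the artifact that the method targets, using the truncated Fourier series of a square wave that jumps from -1 to +1. The function names are illustrative.

```python
import math


def square_wave_partial_sum(x, n_terms):
    """Sum of the first n_terms odd harmonics of a square wave (values -1 and +1)."""
    return (4.0 / math.pi) * sum(
        math.sin((2 * j - 1) * x) / (2 * j - 1) for j in range(1, n_terms + 1)
    )


def first_peak(n_terms):
    """Partial sum at its first maximum to the right of the jump, x = pi / (2 n_terms)."""
    return square_wave_partial_sum(math.pi / (2 * n_terms), n_terms)
```

Evaluating `first_peak` for 10, 100 and 1000 terms gives values that stay close to 1.18 instead of approaching 1. Adding more frequencies moves the overshoot closer to the jump but does not shrink it, which is why an image built from finitely many k-space samples rings at sharp edges.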

Page 26

Robust Clustering of PET Data

Prasanna Velamuru
Advisors: Rosemary Renaut and Hongbin Guo, Arizona State University

Clustering has recently been demonstrated to be an important preprocessing step prior to parametric estimation from dynamic PET images. Clustering, as a form of segmentation, is useful in improving the accuracy of voxel level quantification in PET images. Classical clustering algorithms such as hierarchical clustering and K-means clustering can be applied to dynamic PET data using an appropriate weighting technique. New variants of hierarchical clustering with different preprocessing criteria were developed by Dr. Guo recently. Our research focus is to validate these different algorithms with respect to their efficiency and accuracy. Different inter and intra cluster measures and statistical tests are considered to assess the quality of the different cluster results.
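The K-means step mentioned above can be sketched in a few lines. This is plain, unweighted K-means on made-up time-activity curves. It does not include the weighting technique or the hierarchical variants the abstract refers to, and the names `kmeans` and `curves` and the choice of starting centres are illustrative assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centers, iterations=20):
    """Plain K-means: alternate between assigning points and moving centres."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(mean_point(members) if members else old)
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, labels


# Hypothetical time-activity curves: one tuple per voxel, one value per time frame.
curves = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1),
          (5.0, 4.0, 1.0), (5.2, 3.9, 1.1), (4.8, 4.1, 0.9)]
centers, labels = kmeans(curves, centers=[curves[0], curves[3]])
```

Here the first three curves end up in one cluster and the last three in the other. Each pass can only lower the total squared distance from points to their centres, which is the "energy" being minimized.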

Page 27

Regularized Total Least Squares in a Support Vector Machine

Sting Chen, Beryl Liu and Carol Barner
Advisor: Rosemary Renaut, Arizona State University

The goal was to learn about Support Vector Machines, and to explore the use of the Regularized Total Least Squares statistical method in Support Vector Machine classification of microarray data. SVM is a special case of a Neural Net algorithm. It eliminates the rows (patient cases) of data items least valuable in determining the hyperplane until only key data items are left. Then these points are weighted, with heavier weights being given to those points that are close to the hyperplane boundary. These points are called Support Vectors. The new reduced, weighted space is called Feature Space. Finally, the trained program is given test data to see how well it can classify new patients as sick or normal.

Page 28

Software Development for Alzheimer's Disease Diagnosis and Research

Guadalupe Ayala
Advisor: Rosemary Renaut, Arizona State University

We are interested in using PET to image brain activity in patients with Alzheimer’s disease (AD). In AD studies, one way to measure disease progression is by measuring uptake of Fluoro-Deoxy-Glucose (FDG), an analog of glucose, in the brain. Studies which determine a local cerebral metabolic rate (LCMR) of FDG uptake in a region of interest have proved successful in understanding AD progression. More specific information may be obtained by estimating the individual kinetic parameters which describe FDG metabolism. In particular, it is believed that the individual FDG kinetic parameters may be used for early detection of AD. We have developed an application that estimates the kinetic parameters, with the aim of understanding the spatial distribution of kinetic parameters in AD and of developing a precise measure for use in the early detection of AD. It uses dynamic PET data obtained from one-dimensional, two-dimensional or three-dimensional measurements. It also allows the user to compare results with respect to the computational and estimation methods, filters, constraints, and input sources chosen by the user. Comparing the results could help determine which estimation methods, constraints and filtering techniques give the best results. Results could also be compared with theoretically expected results, so that an educated decision can be made about which computational methods to use in each situation.

Figures:

(top) sample input images

(bottom) user interface

Page 29

BioNavigation: Selecting Resources to Evaluate Scientific Queries

Kaushal Parekh
Advisor: Dr. Zoé Lacroix, Arizona State University

Answering biological queries involves the navigation of numerous richly interconnected scientific data sources. The BioNavigation system supports the scientist in exploring these sources and paths. Scientific queries can be posed at the conceptual level rather than being restricted to particular data sources. The BioNavigation interface provides a scientist with information about the available data sources and the scientific classes they represent. The user can graphically create a navigational query. BioNavigation will evaluate the query and will return a list of suitable paths through the data sources identified by the ESearch algorithm. ESearch searches a graph for paths satisfying a regular expression and ranks them using benefit and cost metrics. BioNavigation thus complements the traditional mediation approach and provides scientists with much needed guidance in selecting data sources and navigational paths.

Page 30

Evolution of Reaction Center in Photosynthetic Organisms: Conserved sequences in Photosystems

Sumedha Gholba
Advisor: Robert Blankenship, Arizona State University

The study of reaction center proteins from both photosystems and of the primordial reaction centers from bacteria reveals the conservation of certain amino acids. The multiple sequence alignment and phylogenetic trees created from the proteins show a high degree of conservation in photosystem-II and bacterial reaction center-II, implying common genealogy. Also, the similarity between photosystem-I heterodimers and reaction center I homodimer proteins indicates that they have a single precursor. It is seen that even though L-M and D1-D2 show similar evolution with gene duplication, L-M proteins show step-by-step diversification whereas the other branch bifurcates into D1s and D2s just at the end. The reaction center I homodimer is placed nearly at the center between the photosystem I and II portions of the tree, suggesting it to be an ancestral type of reaction center. The structural alignment of these proteins depicts five well aligned α-carbon helices. Their sequences show a good amount of similarity in the hydrophobic domains forming the transmembrane helices, which are the main functional regions.

Figures:

(top) phylogenetic tree

(bottom left) structural alignment

(bottom right) hydropathy plots

Page 31

Otolith Aging and Analysis

William T. Stewart
Advisors: Dr. Rosemary Renaut and Dr. Paul Marsh, Arizona State University; Scott Byan, Kirk Young and Marianne Meding, Arizona Game and Fish Department

Otoliths, also known as earstones, are paired calcified structures used for balance and hearing in teleost fish. An otolith is acellular and metabolically inert, providing biologists with a record of exposure to both the temperature and composition of the ambient water. Otoliths provide an abundance of information ranging from temperature history, detection of anadromy, determination of migration pathways, stock identification, use as a natural tag, and most importantly age validation. Growth rings (annuli) on the otolith record the age and growth of a fish from birth to death. The goal of this project is to design a Matlab program that uses digital otolith images to semi-automate the aging process. There are three main components to this task.

Page 32

Sequencing a Microbial Genome

Maulik Shah
Advisors: Dr. Jeffrey Touchman, Dr. Rosemary Renaut, Dr. Phillip Stafford

Although many genomes are available for download today, the underlying technologies should not be taken for granted. By using shotgun sequencing techniques and a gauntlet of informatics, we are able to produce high-quality DNA sequence. We will first look at some of the robotics and chemistries of preparing DNA as samples for the sequencing instruments. Then we will look at the series of applications used in taking raw data signals, converting them to sequence and then finally assembling the data into a single genome. Highlighted will be some of the techniques used to speed the informatics processes as well as some of the challenges that informatics faces in processing data and assembling the genome.

Page 33

Supertree Analysis of the Plant Family Fabaceae

Tiffany J. Morris
Advisor: Martin F. Wojciechowski, School of Life Sciences, Arizona State University

The “Tree-of-Life” is a national and international project to collect information about the origin, evolution, and diversity of organisms, with the goal of producing a tree of all life on Earth (Pennisi, 2003). The obstacles to achieving this goal are many, ranging from questions about the kinds and number of data to be used, to building that phylogeny, to the methodological and computational resources required to analyze the massive amounts of data expected to be necessary to bring this to fruition. The goal of this project is to obtain a Supertree for the plant family Fabaceae utilizing phylogenetic trees found in previously published studies.

Page 34

PROTEIN INTERACTION MAPPING: USE OF OSPREY TO MAP SURVIVAL OF MOTOR NEURON PROTEIN INTERACTIONS

by Margaret Barnhart
Advisor: Dr. Ron Nieman

Spinal Muscular Atrophy is one of the leading genetic causes of death in infants. In humans, the disease state is characterized by homozygous deletion of the telomeric copy of the survival of motor neuron gene (SMN1). The centromeric copy, SMN2, rescues lethality by producing a small amount of full-length SMN protein as its minor product. The SMN gene was first characterized in 1995, and research efforts to describe the molecular mechanisms of SMN protein in the cell have since revealed a highly complex set of functions and interactions for SMN. The large amount of protein-protein interaction data collected for SMN exceeds the limitations imposed by current methods of interaction visualization. Osprey allows a network representation of protein-protein interactions and has been used to describe the recorded sets of interactions of SMN. This method of interaction visualization allows relationships to be drawn between the functions of SMN and analogous proteins, clustering of interactions based on level of interaction or function, and ultimately, the derivation of clues to the critical function of SMN.

Page 35

Internships: Requirements

1. Internship is at least 400 hours, possibly full time in summer
2. Student must write a project report with required format
3. Student presents report to committee in oral exam
4. International students can work off campus using EIP program
5. Encourage students to seek projects outside AZ

Page 36

More Information: please contact [email protected]

More information on projects: www.asu.edu/compbiosci

Do you have projects?
Internships?
Jobs?