
Pisa, 20 October 2014 – Area della Ricerca del CNR

Bioinformatiha 3

Third Tuscan Day of Bioinformatics and Systems Biology

Book of Abstracts

Session 1: ALGORITHMS, DATABASES AND TOOLS (9.30-11.10)

9.30-9.50 Speaker: Fabio Marroni

Title: X-scan. A tool for the identification of mosaic structural variants.

Authors: Fabio Marroni (1,2), Davide Scaglione (3,4), Sara Pinosio (2,5), Mara Miculan (1,2), Gabriele di Gaspero (1,2), and Michele Morgante (1,2).

Affiliations: (1) Dipartimento di Scienze Agrarie e Ambientali, Università di Udine. (2) Istituto di Genomica Applicata (IGA), Udine. (3) IGA Technology Services, Udine. (4) Parco Tecnologico Padano. (5) Institute of Biosciences and Bioresources, National Research Council, Firenze.

Abstract: With the increasing use of next-generation sequencing (NGS) to investigate genome structure, considerable attention has focused on the detection of structural variants (SVs). Several algorithms have been developed to identify SVs. However, almost all available tools assume that the sample is a diploid individual with only three possible genotypic states for an SV: non-carrier, heterozygous carrier, or homozygous carrier. This makes it impractical to detect mosaic SVs, such as those that may be present in tumor tissue or in multi-layer plant tissues. We present χ-scan, an approach and software package specifically designed to identify mosaic variants by comparing two populations of cells derived from the same meiotic event. Mosaic SVs cause imbalances in allele frequencies in the mutated sample compared to the wild-type sample. χ-scan uses SNP genotypes obtained by NGS to detect the unbalanced presence of one allele in one sample compared to the other, and thus the presence of structural variations.

We used χ-scan to confirm one deletion in chromosome 2 of the Vitis vinifera cultivar Pinot gris, and one in the same chromosome of Pinot blanc. We further tested the performance of χ-scan on simulated reads. χ-scan outperformed software packages commonly used for the identification of germline SVs. Finally, we used χ-scan to detect SVs between tumor and normal tissue by using exome sequencing. The software is freely available at https://bitbucket.org/dscaglione/xscan.
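The idea of detecting an allele imbalance between two samples at a SNP can be illustrated with a Pearson chi-square test on a 2x2 table of read counts. This is only a minimal sketch of that general idea, not the χ-scan implementation: the function name `allele_imbalance_test`, its arguments, and the choice of a per-SNP 2x2 test are assumptions made for illustration.

```python
import math


def allele_imbalance_test(ref_a, alt_a, ref_b, alt_b):
    """Pearson chi-square test (1 degree of freedom) on allele read counts.

    ref_a/alt_a: reads supporting the reference/alternative allele in sample A.
    ref_b/alt_b: the same counts in sample B.
    Returns (chi2, p_value). Hypothetical helper, not part of chi-scan.
    """
    table = [[ref_a, alt_a], [ref_b, alt_b]]
    row_sums = [ref_a + alt_a, ref_b + alt_b]
    col_sums = [ref_a + ref_b, alt_a + alt_b]
    total = sum(row_sums)
    # An empty row or column carries no information about imbalance.
    if 0 in row_sums or 0 in col_sums:
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Balanced SNP in sample A, strongly unbalanced in sample B.
chi2, p = allele_imbalance_test(50, 50, 90, 10)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

A run of consecutive SNPs with small p-values along a chromosome would then hint at a region where the two samples differ in copy number; how such runs are aggregated is left open here.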

9.50-10.10 Speaker: Paolo Frasconi

Title: Large-Scale Automated Identification of Mouse Brain Cells in Confocal Light Sheet Microscopy Images

Authors: Paolo Frasconi, Ludovico Silvestri, Paolo Soda, Roberto Cortini, Francesco S. Pavone, and Giulio Iannello

Affiliations: University of Firenze

Abstract:

Motivation: Recently, confocal light sheet microscopy has enabled high-throughput acquisition of whole mouse brain 3D images at micron-scale resolution. This poses the unprecedented challenge of creating accurate digital maps of the whole set of cells in a brain.

Results: We introduce a fast and scalable algorithm for fully automated cell identification. We obtained the whole digital map of Purkinje cells in the mouse cerebellum, consisting of a set of 3D cell center coordinates. The method is very accurate: we estimated an F1 measure of 0.96 using 56 representative volumes, totaling 1.09 GVoxel and containing 4,138 manually annotated soma centers.

Availability and implementation: Source code and its documentation are available at http://bcfind.dinfo.unifi.it/. The whole pipeline of methods is implemented in Python and makes use of Pylearn2 (Goodfellow et al., 2013) and modified parts of Scikit-learn (Pedregosa et al., 2011). Brain images are available on request.

Contact: [email protected]

Supplementary information: Coordinates of predicted soma centers of a whole mouse cerebellum and additional figures.

10.10-10.30 Speaker: Sergiy Ancherbak

Title: Time series analysis of gene expression data.

Authors: Sergiy Ancherbak (1), Ercan E Kuruoglu (1,2), Martin Vingron (2)

Affiliations: (1) ISTI-CNR, Pisa. (2) Max Planck Institute for Molecular Genetics, Berlin, Germany

Abstract: Most current methods for gene regulatory network identification are dedicated to inferring steady-state networks that are assumed to hold at all time instants. However, gene interactions evolve over time. Information about gene interactions at different stages of a life cycle is of high importance for biology. A large amount of gene expression data measured at a single time instant can be found in the literature. Some sources present temporal gene expression datasets covering the yeast cell cycle and the life cycle of Drosophila melanogaster. However, for most of them only one to ten time series were measured for each gene. This lack of experimental information limits the success of inference on network topology.

The statistical graphical models literature offers a number of methods for studying network structure, whereas the study of time-varying networks is rather recent. A recently proposed sequential Monte Carlo method based on particle filtering provides a powerful tool for dynamic network inference. In this work, the particle filter (PF) technique was applied to track gene expression data and to estimate the relationships among genes in a network. The data used are synthetic time series proposed by the DREAM4 challenge, generated from known network topologies obtained from transcriptional regulatory networks of E. coli and S. cerevisiae. The particle filter allowed us to follow with high accuracy all the changes that the gene expression data underwent in response to different external perturbations. Moreover, gene expression temporal sequence data were used to learn (approximate) an "average" gene network structure. The number of reconstructed connections is higher than in the reference network; nonetheless, the main connections are compatible. Our future goal is to investigate nonstationary dynamic networks with the aim of tracking topological changes in genetic networks.
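To make the tracking step concrete, here is a minimal bootstrap particle filter for a single noisy expression trajectory. It is a sketch under simplifying assumptions that are not taken from the abstract: a scalar random-walk state, Gaussian observation noise, and the hypothetical names `bootstrap_particle_filter`, `process_sd` and `obs_sd`. Network inference, as described above, would need a multivariate state that includes interaction parameters.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=0.3, obs_sd=0.5, seed=0):
    """Track a scalar latent level from noisy observations.

    Assumed model (illustrative only):
        x_t = x_{t-1} + N(0, process_sd^2)
        y_t = x_t     + N(0, obs_sd^2)
    Returns the posterior-mean estimate of x_t at every time step.
    """
    rng = random.Random(seed)
    # Initialise the particle cloud around the first observation.
    particles = [rng.gauss(observations[0], obs_sd) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1) Propagate every particle through the random-walk dynamics.
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # 2) Weight particles by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - p) / obs_sd) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # 3) Estimate the state as the weighted mean of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # 4) Resample to concentrate particles in high-probability regions.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# A slowly rising expression level observed without noise.
observed = [0.1 * t for t in range(30)]
tracked = bootstrap_particle_filter(observed)
print(round(tracked[-1], 2), observed[-1])
```

Resampling at every step keeps the sketch short; practical filters often resample only when the effective sample size drops below a threshold.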

10.30-10.50 Speaker: Alessandro Cellerino

Title: RNA sequencing provides new insights in the mechanisms controlling brain aging

Authors: Mario Baumgart, Marco Groth, Steffen Priebe, Roberto Ripa, Aurora Savino, Luca Dolfi, Giovanna Testa, Michela Ori, Reinhard Guthke, Eva Terzibasi Tozzini, Matthias Platzer, Alessandro Cellerino (1).

Affiliations: (1) SNS-Pisa

Abstract:

We study aging in the fish Nothobranchius furzeri, the vertebrate with the shortest captive lifespan (6-7 months). N. furzeri shows reduced learning performances, gliosis and impaired adult neurogenesis, making it a convenient model for brain aging. We used RNA-seq to quantify whole-genome transcript regulation during brain aging (in total over 200 samples). Several life-extension interventions are known and it is hypothesized that these treatments act by inducing an adaptive stress response (hormesis): we tested the effects of very low doses of ROS (induced by rotenone) on the brain transcriptome. Our main results are:

1) Protein-coding RNAs are regulated according to different profiles that correspond to different biological functions: e.g. rapid decay is associated with neurogenesis genes, gradual decay with axonal and synaptic genes, and linear increase with ribosomal, lysosome and complement activation genes.

2) A substantial overlap exists between the patterns of gene regulation we detected in N. furzeri and published datasets on human brain aging.

3) Aging of N. furzeri brain is associated with prominent regulation of chromatin remodeling genes and increased activity of the polycomb repressive complex.

4) Treatment of old fish with low-dose rotenone partially reverts the gene expression changes induced by aging, thereby providing a direct proof of the hormesis hypothesis.

We are currently using transgenesis in zebrafish and N. furzeri to test the in vivo function of those genes that were identified as putative central nodes in gene regulatory networks.

10.50-11.10 Speaker: Giacomo Ceccarelli

Title: Robustness criterion for choosing metrics for networks.

Authors: Giacomo Ceccarelli (1), Sandro Cellerino (1), Angelo Di Garbo (1)

Affiliations: (1) IBF-CNR, Pisa. (2) SNS-Pisa

Abstract: Starting from RNA-seq data obtained from zebrafish, we construct a gene co-expression network following the standard analysis based on Pearson correlation and hard-thresholding. We then produce technical pseudo-replicas through Poissonian noise and we study the impact of these fluctuations on typical properties of the network structure, such as correlation and order distributions and hub genes.

Session 2: MODELING (11.40-13.00)

11.40-12.00 Speaker: Ercan Kuruoglu

Title: Power-law Renewal Processes for Modelling Cancer Mutations

Authors: Jose M Muino (2), Ercan E Kuruoglu (1,2), Peter F Arndt (2)

Affiliations: (1) ISTI-CNR, Pisa. (2) Max Planck Institute for Molecular Genetics, Berlin, Germany.

Abstract: A common assumption in work on modelling mutation dynamics is that mutations follow Poisson dynamics; that is, the number of mutations in a given portion of the genome follows a Poisson law. Equivalently, the distance between two mutations follows an exponential distribution. This can actually be verified when the human and chimpanzee genomes are compared. It is of interest to see whether this law also generalizes to the somatic mutations that cause cancer. We have analysed data on various cancer genomes to find the interoccurrence time (space) distributions for different types of cancer. We found that specific cancer types show a power law in interoccurrence distances, instead of the exponential distribution expected under the Poisson assumption. Cancer genomes exhibiting power-law interoccurrence distances were enriched in cancer types where the main mutational process is described as the activity of the APOBEC protein family, which produces a particular pattern of mutations called kataegis. Therefore, the observation of a power law in interoccurrence distances could be used to identify cancer genomes with kataegis. We present our analytical approaches to obtain a unifying model for such dynamics.
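The contrast between exponential and power-law interoccurrence distances can be sketched as a comparison of maximum-likelihood fits. The code below is an illustration only, not the authors' analysis: it assumes a continuous Pareto form with the lower cutoff fixed at the smallest observed distance, and the function names are hypothetical. A rigorous comparison would also estimate the cutoff and test significance.

```python
import math


def exponential_loglik(distances):
    """Log-likelihood of the distances under the best-fitting exponential."""
    n = len(distances)
    rate = n / sum(distances)  # maximum-likelihood rate
    return n * math.log(rate) - rate * sum(distances)


def power_law_loglik(distances):
    """Log-likelihood under a continuous power law p(x) ~ x^(-alpha), x >= xmin.

    xmin is fixed to the smallest distance (a simplifying assumption).
    """
    n = len(distances)
    xmin = min(distances)
    log_ratios = sum(math.log(x / xmin) for x in distances)
    if log_ratios == 0.0:
        raise ValueError("all distances are identical; the fit is undefined")
    alpha = 1.0 + n / log_ratios  # maximum-likelihood exponent
    return n * math.log((alpha - 1.0) / xmin) - alpha * log_ratios


def preferred_model(distances):
    """Name of the model with the higher maximised log-likelihood."""
    if power_law_loglik(distances) > exponential_loglik(distances):
        return "power law"
    return "exponential"


# Deterministic quantiles of a unit-rate exponential as stand-in distances.
exp_like = [-math.log(1.0 - (i + 0.5) / 1000) for i in range(1000)]
print(preferred_model(exp_like))
```

Both models have one fitted parameter here, so comparing raw log-likelihoods is a fair first look.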

12.00-12.20 Speaker: Pasquale Bove

Title: Using P Systems for Modelling the Dynamics of Hybridogenetic Water Frog Populations

Authors: Pasquale Bove, Roberto Barbuti

Affiliation: Dipartimento di Informatica, Università di Pisa

Abstract: Some species of European water frogs originated from hybridization between different species. Such hybrid populations have a particular reproduction system called hybridogenesis. In this paper we consider the two species Pelophylax ridibundus and Pelophylax lessonae, and their hybrid Pelophylax esculentus. P. lessonae and P. esculentus form stable complexes (L-E complexes) in which P. esculentus are hemiclonal. In L-E complexes all the genomes transmitted by P. esculentus carry deleterious mutations which are lethal in homozygosity. We analyze L-E complexes by means of a P System model. The results of simulations show that, when deleterious mutations are eliminated, L-E complexes collapse. In addition, simulations show that particular female preferences can contribute to the diffusion of deleterious mutations among all P. esculentus frogs. Finally, simulations show how L-E complexes react to the introduction of translocated P. ridibundus. The model allows us to conclude that: (i) deleterious mutations (combined with sexual preferences) strongly contribute to the stability of L-E complexes; (ii) female sexual choice can contribute to the diffusion of deleterious mutations; and (iii) the introduction of P. ridibundus can destabilize L-E complexes.

12.20-12.40 Speaker: Marco Fondi

Title: Microbial metabolism at the system level: network modelling and multi-omics integration of the Antarctic bacterium Pseudoalteromonas haloplanktis TAC125.

Authors: Marco Fondi (1,2), Isabel Maida (1), Elena Perrin (1), Alessandra Mellera (1,2), Stefano Mocali (3), Ermenegilda Parrilli (4), Maria Luisa Tutino (4), Pietro Liò (5), and Renato Fani (1,2).

Affiliations: (1) Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence. (2) ComBo, Florence Computational Biology Group, University of Florence. (3) Consiglio per la Ricerca e la Sperimentazione in Agricoltura, Centro di Ricerca per l'Agrobiologia e la Pedologia (CRA-ABP), Firenze, Italy. (4) Department of Chemical Sciences, University of Naples Federico II. (5) Computer Laboratory, Cambridge University, Cambridge, UK.

Abstract: Metabolic modelling refers to a wide range of in silico approaches that can be adopted to quantitatively simulate chemical reaction fluxes within the cell, including metabolic adjustments in response to external perturbations. In recent years the application of such computational techniques to the in-depth investigation of microbial metabolism has spread tremendously in microbiological research. Indeed, genome-scale models have proved to be powerful tools to study a vast array of biological systems and applications in industrial and medical biotechnology, including biofuel generation, food production, and drug development. Well-designed metabolic models can help predict the system-wide effect of genetic and environmental perturbations on an organism, and hence drive metabolic engineering experiments.

An even more realistic picture of the metabolic traits of a given organism can be obtained by exploiting high-throughput data from innovative technologies such as transcriptomics, fluxomics, and proteomics. Such diverse data types can be mapped onto metabolic models and, in this way, specific functional states can be derived. By exploiting gene expression data, for example, genome-scale metabolic networks can be turned into condition-specific models in which only those reactions corresponding to expressed genes are present and active. Such an approach has been shown to provide a realistic picture of the actual metabolic state of a microbial cell and to lead to a deeper understanding of its basic functioning, including the consequences of perturbations such as gene knock-outs and/or growth medium manipulation.
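The step of turning a genome-scale network into a condition-specific model can be sketched as a filter over reactions. This is a deliberately simplified illustration, not the procedure used in the study: the names `expressed_genes` and `condition_specific_model`, the fixed expression threshold, and the rule that a reaction is kept when any one of its associated genes is expressed are all assumptions. Real gene-protein-reaction rules also contain AND relationships for enzyme complexes.

```python
def expressed_genes(expression, threshold=1.0):
    """Genes whose measured expression level reaches the threshold."""
    return {gene for gene, level in expression.items() if level >= threshold}


def condition_specific_model(reactions, expressed):
    """Keep the reactions that are supported by at least one expressed gene.

    reactions: dict mapping a reaction id to the set of genes encoding
               enzymes for it (an OR rule, assumed for simplicity).
    Reactions with no associated gene (e.g. spontaneous ones) are kept.
    """
    active = {}
    for reaction_id, genes in reactions.items():
        if not genes or genes & expressed:
            active[reaction_id] = genes
    return active


# Toy network with made-up identifiers.
toy_reactions = {
    "R_glycolysis_step": {"geneA", "geneB"},
    "R_cold_shock_only": {"geneC"},
    "R_spontaneous": set(),
}
toy_expression = {"geneA": 5.2, "geneB": 0.1, "geneC": 0.0}

active = condition_specific_model(toy_reactions, expressed_genes(toy_expression))
print(sorted(active))
```

Flux simulations would then be run on the reduced reaction set, so that predicted fluxes reflect only the enzymes available in that condition.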

The Antarctic strain Pseudoalteromonas haloplanktis TAC125 is one of the model organisms of cold-adapted bacteria and is currently exploited as a new alternative expression host for numerous biotechnological applications.

Here, we investigated several metabolic features of this strain through in silico modelling and functional integration of -omics data. A genome-scale metabolic model of P. haloplanktis TAC125 was reconstructed, encompassing information on 721 genes, 1133 metabolites and 1322 reactions. The predictive potential of this model was validated against a set of experimentally determined growth rates and a large dataset of growth phenotypic data. Furthermore, evidence synthesis from proteomics, phenomics, physiology and metabolic modelling data revealed possible drawbacks of cold-dependent changes in gene expression on the overall metabolic network of P. haloplanktis TAC125. These included, for example, variations in its central metabolism, amino acid degradation and fatty acid biosynthesis.

The genome scale metabolic model described here is the first one reconstructed so far for an Antarctic microbial strain. It allowed a system-level investigation of variations in cellular metabolic fluxes following a temperature downshift. It represents a valuable platform for further investigations on P. haloplanktis TAC125 cellular functional states and for the design of more focused strategies for its possible biotechnological exploitation.

12.40-13.00 Speaker: Ettore Luzi

Title: The Gene Regulatory Network (GRN) Menin-microRNA-26a-SMAD1 is involved in the osteogenic differentiation of human adipose tissue-derived stem cells (hADSCs).

Authors: Ettore Luzi, Maria Luisa Brandi

Affiliation: Laboratory of Neuroendocrine Complex Diseases, Center on Endocrine Hereditary Tumors, AOUC, Department of Surgery and Translational Medicine, University of Firenze

Abstract:

A remarkable feature of developmental and physiological processes is that they are highly reproducible, even under conditions of genetic and environmental variability. Biological robustness (Waddington's canalization) is a property inherent to all living organisms that enables stability. The concept of biological robustness refers to the ability of a biological system to maintain its functions despite endogenous or exogenous perturbations. It should be stressed that robustness not only implies the capacity of a system to buffer perturbations to maintain phenotypic stability, but also contributes to achieving predictable and reproducible responses, allowing phenotypic switches to take place efficiently regardless of perturbations. When considering biological systems, noise or variation can have different sources, including stochastic changes in gene expression (i.e., in transcription, translation, and RNA or protein degradation), as well as environmental fluctuations. Among the mechanisms that explain the robustness of biological networks is the presence of feedback and feedforward loops.

Menin is the product of the MEN1 oncosuppressor gene, responsible for multiple endocrine neoplasia type 1 syndrome. Menin functions as a general regulator of transcription. Transcription factors (TFs) and microRNAs (miRNAs) are essential regulators of gene expression. Since the expression of a miRNA may be regulated by a TF, a TF and a miRNA may regulate each other to form feedback loops or, alternatively, both the TF and the miRNA may regulate their target genes and form feed-forward loops (FFLs). Menin expression modulates mesenchymal cell commitment to the myogenic or osteogenic lineages. The microRNA 26a (miR-26a) modulates the expression of SMAD1 protein during the osteoblastic differentiation of human adipose tissue-derived stem cells (hADSCs). Our study shows that Menin, microRNA 26a and SMAD1 participate in an "incoherent feedforward loop", thus generating a Gene Regulatory Network that plays a pivotal role during hADSC osteogenesis.

Session 3: GENOMICS (15.00-15.40)

15.00-15.20 Speaker: Claudia Caudai

Title: A multiscale approach for 3D chromatin structure estimation from Chromosome Conformation Capture (3C) data

Authors: Claudia Caudai (1), Emanuele Salerno (1), Monica Zoppè (2), Anna Tonazzini (1)

Affiliation: (1) CNR-ISTI, Pisa, Italy. (2) CNR-IFC, Pisa, Italy

Abstract: We present a method to reconstruct a set of plausible chromatin configurations from contact data obtained through Chromosome Conformation Capture techniques. We do not look for a unique configuration because the experimental data are not derived from a single cell, but from millions of cells. As opposed to most popular methods, we do not translate contact frequencies deterministically into distances, since this often produces structures that are not consistent with Euclidean geometry. We build a data-fit function directly from the pairs of loci with the largest contact frequencies, assuming that they are likely to be in contact, and neglecting the pairs with very low or zero contact frequencies, as we cannot infer anything about their mutual distances. To obtain configurations consistent with both the data and the available biological knowledge, we introduce a chromatin model that can be suitably constrained. Our algorithm samples the solution space generated by the data-fit function through a Monte Carlo method. At each step, the subchains are perturbed by using quaternions.

To validate the new method, we applied it to real Hi-C data available online (Lieberman-Aiden et al., 2009). In particular, we analyzed the contact frequency data from the long arm of human chromosome 1 with a maximum resolution of 100 kb, obtaining a number of output configurations. For each configuration, the first division of the overall fiber included 25 topological domains (Dixon et al., 2012). The reconstructed structures were then treated as single elements of a new chain (with nonuniform resolution), whose mutual interactions were estimated by the same algorithm. Highly expressed domains are known to be much less packed than domains that are poor in genes or have low transcriptional activity (Versteeg et al., 2003). We checked this property in our reconstructions by computing the mean square distances (MSD) between pairs of loci in each selected stretch, as functions of the corresponding genomic distances (Mateos-Langerak et al., 2009).
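The final check, mean square distance as a function of genomic separation, is simple to state in code. The sketch below assumes the reconstructed stretch is given as an ordered list of 3D coordinates at uniform genomic spacing; the function name `msd_by_genomic_distance` and that input format are illustrative assumptions, not the authors' software.

```python
def msd_by_genomic_distance(coords):
    """Mean square spatial distance between loci, per genomic separation.

    coords: list of (x, y, z) tuples ordered along the chromatin chain,
            one per locus, assumed to be equally spaced along the genome.
    Returns a dict mapping separation s (in loci) to the mean of
    |r_i - r_{i+s}|^2 over all pairs that far apart.
    """
    n = len(coords)
    msd = {}
    for s in range(1, n):
        squared = [
            sum((a - b) ** 2 for a, b in zip(coords[i], coords[i + s]))
            for i in range(n - s)
        ]
        msd[s] = sum(squared) / len(squared)
    return msd


# A fully stretched chain with unit spacing: MSD grows as s squared.
straight = [(float(i), 0.0, 0.0) for i in range(5)]
print(msd_by_genomic_distance(straight))
```

For a given separation, a loosely packed stretch shows a larger MSD than a compact one, which is the property being compared across domains.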

15.20-15.40 Speaker: Giulia Menconi

Title: Not all yeasts are created equal

Authors: Giulia Menconi (1), Mariafrancesca Zini (1), Nadia Pisanti (1), Roberto Grossi (1), Gianni Liti (2), Roberto Marangoni (3,4)

Affiliations: (1) Dipartimento di Informatica, Università di Pisa, Italy. (2) Institute for Research on Cancer and Aging, Nice, France. (3) Dipartimento di Biologia, Università di Pisa, Italy. (4) Istituto di Biofisica, CNR, Pisa, Italy

Abstract: This work concerns mobilomics, the branch of genomics devoted to the identification of mobile genetic elements in an organism, as well as the study of the interaction of such elements with the host organism and their evolutionary fate. The approach is not consensus-based, but alignment-based. We propose a multiple genome comparison which is not organism-oriented. First, we briefly describe the algorithm REGENDER and the pipeline for the multiple genome comparison. Then, we review the results of the intraspecies investigation on strains of baker's yeast (S. cerevisiae) and report on a new collection of strains of another yeast species (Saccharomyces paradoxus). Finally, we show some preliminary experiments in interspecies detection of mobile genomic elements: a direct comparison of the predicted mobile genetic elements of S. cerevisiae and S. paradoxus, as well as first steps in mobile genome identification on a collection of 6 recently sequenced yeast species.

Session 4: PROTEINS (15.40-16.20)

15.40-16.00 Speaker: Marco Pellegrini

Title: Protein complex prediction for large protein protein interaction networks

Authors: Marco Pellegrini, Miriam Baglioni, Filippo Geraci

Affiliation: IIT-CNR

Abstract: The study of protein interactions is at the core of many attempts to understand the inner workings of healthy cell functions, as well as pathological malfunctioning due to diseases, at the system-wide level. High-throughput experiments aimed at detecting protein interactions produce vast amounts of data, which are used to build protein-protein interaction (PPI) networks. PPI networks are then used for "in silico" data mining and knowledge discovery activities. Complexes of physically interacting proteins are one of the basic mechanisms through which proteins cooperate to perform their function, so the prediction of protein complexes (PC) within PPI networks has been a focus of attention in recent years. However, as PPI networks evolve over time through the accumulation of high-quality data, PC prediction algorithms need to evolve and adapt to the newly emerging features of PPI networks that, by merging data from several experiments, are much larger than those available just a few years ago. We have conducted a study on the emerging features that characterize the signature of protein complexes within the PPI networks available at present, which are often between one and two orders of magnitude larger (in terms of number of validated interactions) than those available just a few years ago. We have found that in these new PPI networks, protein complexes are very dense (for yeast almost all PC are more than 50% dense). Moreover, they behave similarly to ego-networks, that is, one node in the PC is a direct neighbor of a large fraction of the nodes of the PC. These two features are exploited in the new algorithm Core&Peel to predict PC in large PPI networks. Experiments with five large PPI networks (for yeast and Homo sapiens) indicate that Core&Peel is competitive in quality with state-of-the-art complex prediction methods. Core&Peel is also quite scalable and thus likely to perform well even with larger multi-species PPI networks.

16.00-16.20 Speaker: Marco Bruttini

Title: Antigenic fingerprinting of Bexsero vaccine components by Phage Display and Protein Microarray technologies

Authors: Marco Bruttini (1,2), Erika Bartolini (2), Erica Borgogni (2), Manuele Martinelli (2), Maria Giuliani (2), Cecilia Brettoni (2), Stefano Bonacci (2), Sara Iozzi (2), Roberto Petracca (2), Alessia Biolchi (2), Laura Santini (2), Barbara Galli (2), Alessandro Muzzi (2), Sara Marchi (2), Claudio Donati (4), Giulia Torricelli (2), Silvia Guidotti (2), Stefano Censini (2), John Telford (2), Giuseppe Del Giudice (2), Giuseppe Teti (3), Franco Felici (3), Marzia Giuliani (2), Vega Masignani (2), Mariagrazia Pizza (2), Flora Castellino (2) and Domenico Maione (2).

Affiliation: (1) Università degli Studi di Siena. (2) Novartis V&D. (3) Università degli Studi di Messina. (4) Fondazione Edmund Mach, San Michele all'Adige (TN)

Abstract: Serogroup B meningococcus (MenB) is a leading cause of meningitis and sepsis in developed countries. Protection from invasive disease is mediated by antibodies inducing complement-mediated bacterial killing or phagocytosis. Novartis Vaccines and Diagnostics (NVD) developed Bexsero®, a multi-component protein-based vaccine against MenB. Although antibodies are a major component of the protective immune response against serogroup B meningococcus in humans, little is known about the specific epitopes that contribute to bacterial clearance.

In this study, Phage Display and Protein Microarray technologies were used to identify immunogenic regions of the three major vaccine protein components recognized by sera of subjects from different age groups vaccinated with Bexsero. Bioinformatic tools were essential to analyze phage sequences and to design the protein chip layout. Moreover, the Protein Microarray allowed us to screen individual instead of pooled sera and then to try to correlate the epitope recognition profiles with bactericidal titers. Different algorithms were used for cluster investigation. In particular, for NHBA we observed recognition of the N-terminal protein domain in some subjects, and this seems to correlate with functional data. With this approach, it is possible to follow the response in samples derived from clinical studies and to test how different formulations, vaccine schedules and ages of recipients can influence the pattern of recognition of the MenB antigens in different vaccine formulations. This work describes a template procedure that can also be followed to characterize the immune response against other protein-based vaccines.

Session 5: SYNTHETIC BIOLOGY (16.20-16.40) 16.20-16.40 Speaker: Leandro Gammuto Title: Extruding minimal cells. Authors: Alessio Fanti (1), Leandro Gammuto (1), Fabio Mavelli (2), Pier Luigi Luisi (3), Pasquale Stano (3), Roberto Marangoni (1, 4). Affiliations: (1) Dip. Biologia, UniPI, (2) Dip. Chimica, UniBA, (3) Dip. Biologia, UniRoma3, (4) IBF-CNR, Pisa. Abstract: This talk focuses on the extrusion process, by which a giant liposome (diameter = 1500-3000 nm) is transformed into several "small" liposomes (diameter = 400-500 nm). In particular, we aim to study how solutes are partitioned from the giant vesicle to the extruded ones. This is important because anomalous entrapment phenomena have been reported for spontaneously formed small liposomes, and these phenomena have been tentatively explained by complex interactions between membrane and solutes during membrane formation. Here we investigate whether the redistribution of solutes during extrusion gives rise to anomalous entrapment. We employed both experimental and theoretical approaches. Experimentally, we generated populations of GUVs (Giant Unilamellar Vesicles) with four different fluorescent solutes entrapped, and then extruded these populations to obtain four populations of VETs (Vesicles by Extrusion). The fluorescent signal allows us to infer the solute concentration inside each single vesicle. On the theoretical side, we first generated simulated populations of empty GUVs from the experimental size distributions. We then filled these virtual vesicles through a Poisson process whose mean is given by the concentration of the chemical species in the bulk solution. In the next step, we simulated the extrusion process by partitioning the solutes of each giant vesicle among several VETs, whose size distribution is derived from the experimentally measured one, while the solute filling is again simulated via a Poisson process. In this way we simulated the whole experimental protocol under the hypothesis that all steps are governed by standard stochastic processes with no anomalies. Finally, we compared the expected fluorescence signal obtained from the simulations with that recorded in the experimental measurements. On the basis of this comparison, we cannot reject the hypothesis that the whole extrusion process is explained by standard stochastic mechanisms. In other words, this study revealed no anomalous mechanisms in solute partitioning during extrusion.
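The null model described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `extrude_one`, the use of membrane-area conservation to decide how many VETs one giant vesicle yields, and the volume-proportional partition of solutes are all assumptions made for illustration. Diameters are in nm and `conc` is a hypothetical solute density in molecules per nm³.

```python
import numpy as np

def extrude_one(guv_diameter, vet_diameters, conc, rng):
    """Return (sizes, counts) for the VETs obtained from one giant vesicle."""
    guv_volume = np.pi * guv_diameter**3 / 6.0
    # Poisson filling of the giant vesicle: mean = concentration * volume.
    n_solutes = rng.poisson(conc * guv_volume)

    # Assumption: draw VET sizes from the measured diameters until the
    # membrane area of the giant vesicle is used up (pi cancels out).
    sizes, area_left = [], guv_diameter**2
    while True:
        d = rng.choice(vet_diameters)
        if d**2 > area_left:
            break
        sizes.append(d)
        area_left -= d**2
    sizes = np.array(sizes)

    # Assumption: each solute ends up in VET i with probability V_i / V_GUV;
    # the remaining solutes are released to the bulk (last category).
    p = (np.pi * sizes**3 / 6.0) / guv_volume
    counts = rng.multinomial(n_solutes, np.append(p, 1.0 - p.sum()))
    return sizes, counts[:-1]

rng = np.random.default_rng(0)
measured = np.array([400.0, 450.0, 500.0])  # hypothetical VET diameters
sizes, counts = extrude_one(2000.0, measured, 1e-6, rng)
```

Under these assumptions the count in each VET is itself Poisson distributed with mean `conc * V_i`, which is the "no anomaly" expectation that the measured fluorescence would be compared against.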


Posters

Authors: Fabrizio Pucci, Marianne Rooman. Affiliation: Department of BioModeling, BioInformatics and BioProcesses, Université Libre de Bruxelles, Brussels, Belgium. Title: "Full protein stability curve prediction by temperature-dependent statistical potentials". Abstract: The prediction and control of protein stability at different temperatures is a key goal in protein science. Unfortunately, it is still out of reach, since little is known about the temperature dependence of amino acid interactions. In this work we take the investigation of protein stability further by building a method that predicts the full Gibbs-Helmholtz stability curve of a given protein, and thus how its folding free energy depends on temperature. This mathematical function encodes all the thermodynamic parameters that characterize the folding transition, and knowing it is therefore fundamental to protein stability analysis. In summary, we used the formalism of temperature-dependent statistical potentials to estimate the folding free energy of a given protein of known structure at different temperatures. The stability curve was extracted from these energy values using a simple extrapolation procedure followed by the optimization of some parameters. The method performs well when applied to a reference set of about fifty proteins with known stability curves. The standard deviations between the predicted and the experimental values, computed in cross validation, are about 13 °C, 1 kcal/(mol °C) and 4 kcal/mol for the melting temperature, the change in heat capacity and the folding free energy at room temperature, respectively. As far as we know, this is the first method able to predict both the thermodynamic and the thermal stability of proteins quickly and accurately on a large scale.
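For reference, the Gibbs-Helmholtz stability curve mentioned in this abstract can be written as a short Python function. This is a sketch of the textbook form of the curve, assuming a temperature-independent heat capacity change; it is not the prediction method of the abstract. The parameter names `Tm`, `dHm` and `dCp` (melting temperature in kelvin, enthalpy change at `Tm`, heat capacity change) and the example values are illustrative choices.

```python
import math

def stability_curve(T, Tm, dHm, dCp):
    """Free energy change at temperature T (kelvin) from the Gibbs-Helmholtz equation.

    dG(T) = dHm * (1 - T/Tm) - dCp * ((Tm - T) + T * ln(T/Tm)),
    assuming dCp does not depend on temperature.
    """
    return dHm * (1.0 - T / Tm) - dCp * ((Tm - T) + T * math.log(T / Tm))

# Hypothetical protein: Tm = 330 K, dHm = 100 kcal/mol, dCp = 2 kcal/(mol K),
# all expressed for the unfolding direction.
dG_room = stability_curve(298.0, 330.0, 100.0, 2.0)
```

The three quantities for which the abstract reports errors (melting temperature, heat capacity change, free energy at room temperature) correspond to `Tm`, `dCp` and the value of the curve near 298 K.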

Authors: Emanuele Bosi (1;2), Beatrice Donati (3;4;5), Marco Galardini (6), Sara Brunetti (7), Marie-France Sagot (3;4), Pietro Liò (8), Pierluigi Crescenzi (5), Renato Fani (1;2), Marco Fondi (1;2)

Affiliations: (1) ComBo, Florence Computational Biology group, Dep. of Biology, University of Florence, (2) LEMM, Lab. of Microbial and Molecular Evolution Florence, Dep. of Biology, University of Florence, (3) INRIA Rhone-Alpes, Villeurbanne cedex, France, (4) Université de Lyon2, Villeurbanne, France, (5) Università degli Studi di Firenze, Dipartimento di Ingegneria dell’Informazione, (6) EMBL-EBI - European Bioinformatics Institute, Cambridge, United Kingdom, (7) Dipartimento di Ingegneria dell’Informazione e Scienze Matematiche, University of Siena, (8) Computer Laboratory, University of Cambridge, CB3 0FD Cambridge, United Kingdom

Title: MeDuSa: a multi-draft based scaffolder

Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. This however remains a challenging issue from both a computational and an experimental point of view. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines.


We present MEDUSA (Multi-Draft based Scaffolder), a software tool for ordering and orientating the contigs of de novo assembled genomes. MEDUSA exploits information obtained from a set of draft genomes of closely related organisms, which is modelled as a graph for analysis. As such, it requires neither a closed reference genome nor paired-end reads for gap closure, which makes ease of use a notable feature of the software. Moreover, our experiments show that MEDUSA is highly accurate and outperforms traditional scaffolders based on paired-end reads.

Author: Jacopo Acquarelli, Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche, Università degli Studi di Siena

Title: Netherlands Brain Bank Data Analysis with NonNegative Matrix Factorization

The current level of digitization allows us to work with a large quantity of documents and therefore makes it possible to apply automatic classification tools to collect the significant information they contain. In the medical field, there are complex diseases, consisting of several subdiseases, which are not easily identified by the physician and which can sometimes be detected only after the patient’s death. Therefore, by applying an automatic classification method to a set of medical records (both clinical and pathological), we can expect to identify regularities related to such subdiseases that can guide early medical diagnosis. In particular, pathological data may help in classifying clinical data by highlighting their structural characteristics. To this aim, a clustering algorithm such as “nonnegative matrix factorization” (NMF) can be used. NMF produces two matrices whose product approximates the matrix of the document features. The two matrices describe, respectively, the degree of membership of each document in each cluster and the importance of each feature for each cluster. We are therefore able to identify the features that best characterize each cluster and to speculate about their meaning. Finally, adding constraints to the NMF method allows us to gain extra information from the pathological data, which can be transferred to the clinical data so as to better define the clustering coherence.
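The factorization described in this abstract can be sketched in a few lines of Python. The example below uses the standard multiplicative update rules for the Frobenius norm on a toy matrix; it is an unconstrained illustration, not the constrained variant mentioned in the abstract, and the function name `nmf`, the toy data and the iteration count are assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (documents x features) as W @ H.

    W (documents x k) holds the membership of each document in each cluster;
    H (k x features) holds the importance of each feature for each cluster.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_feats = V.shape
    W = rng.random((n_docs, k)) + eps
    H = rng.random((k, n_feats)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy document-feature matrix with two groups of documents.
V = np.array([[5.0, 4.0, 0.0, 0.0],
              [4.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 4.0],
              [0.0, 0.0, 4.0, 3.0]])
W, H = nmf(V, k=2)
clusters = W.argmax(axis=1)  # cluster with the largest membership per document
```

Reading off the largest entries in each row of `H` gives the features that characterize the corresponding cluster.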

Author: Niccolò Fontanelli, Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche, Università degli Studi di Siena

Title: Machine learning tools for the optimization of PSICOV, a contact maps’ prediction software

Proteins are biological molecules that perform a large number of functions within a living organism. They consist of an amino acid chain, the primary structure, whose organization uniquely defines the protein itself. When the protein assumes its native structure, amino acids that are distant in the chain form contacts that play a fundamental role in defining its 3D structure. Research has shown a strong correlation between the shape of a protein and its function. Therefore, exact knowledge of all the contacts established among amino acids is equivalent to complete knowledge of the shape, and therefore of the functions, of a protein. At present, however, there are no tools capable of predicting the entire contact map of a protein. The PSICOV software is one of the most efficient in this area and, in the present work, we illustrate how its performance can be improved by considering spatial correlations among contacts. Indeed, given a pair of residues that are in contact, additional contacts are quite likely to be found in their vicinity. By combining this idea with a machine learning classification approach (SVMs) and knowledge of the 3D structures of similar known proteins, information can be inferred about the neighborhood of the contacts predicted by PSICOV, significantly increasing its prediction accuracy.

Author: Roberto Livi, Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Università di Firenze

Title: Spectral methods for the analysis of DNA promoters Abstract: We present results on the study of nucleotide sequences in DNA promoters. Spectral clustering and localization analysis allow us to group all promoters of the genome (human and of other species) into specific subgroups, characterized by different kinds and spatial distributions of (quasi-)regular motifs (substrings) separated by disordered ones. Many of these motifs have been found to be related to functional roles in the control of gene expression.

Authors: Paolo Milazzo and Giovanni Pardini, Dipartimento di Informatica, Università di Pisa

Title: Identification of Components and Modular Verification of Biochemical Pathways

Abstract: Biochemical pathways are abstract descriptions of the interactions among the molecular species involved in a cellular process. Different molecular species mentioned in a pathway often represent different states of the same biological entity, such as the unbound and bound states of a certain molecule. Hence, a pathway can be seen as a network of interactions between entities that change state synchronously by means of reactions. We consider such biological entities as pathway components. We defined a semi-automatic algorithm based on the mass-conservation principle that allows the components of a pathway to be inferred from their interactions, namely from the chemical reactions constituting the pathway itself. The identification of molecular components allows formal descriptions of the pathway, by means of automata or process algebra, to be generated automatically. This enables the application of formal methods (such as model checking and bisimulation) to analyse and compare the behaviours of pathways. Moreover, once the molecular components are identified, it is rather easy to perform syntactic transformations (projections) aimed at simplifying the visualization of the pathway by focusing on a subset of its components. Reduced models obtained through projection can also be used to perform formal verification of properties in an abstract or modular way. The component identification algorithm has been tested extensively on all of the curated SBML models available in the BioModels database. Moreover, component identification and modular verification based on projection have been applied to a model of the EGF signaling pathway.

Authors: P. Paradisi (1), D. Chiarugi (2), P. Allegrini (3)

Affiliations: (1) ISTI-CNR, (2) Max Planck Institute of Colloids and Interfaces, Potsdam (Germany), (3) IFC-CNR

Title: A renewal model for the Super-Concentration effect

A less investigated issue in origin-of-life research concerns the crucial evolutionary steps leading to the formation of membrane compartments, which came into play as hosts for the first forms of cellular metabolism. The behaviour of solutes in an aqueous solution of proteins and lipids was studied experimentally. Owing to hydrophobic forces, the lipid molecules organize spontaneously into quasi-spherical structures called liposomes or vesicles. While these structures are forming but still open, they allow the random passage of molecules. Surprisingly, when lipid surfaces close up in a protein-containing solution to form vesicles, the entrapment frequency does not follow the expected Poisson distribution, but tends to assume a power-law behaviour, with many quasi-empty vesicles and a long decreasing tail of extremely crowded vesicles. This is referred to as the “Superconcentration Effect”, and it shows that liposomes can capture a high number of macromolecules (e.g., proteins) even in dilute solutions. This observation overcomes one of the major problems in prebiotic chemistry, i.e., how intravesicular solutes can spontaneously reach the relatively high concentrations needed for metabolic processes to occur. Here we propose a stochastic model based on Cox's renewal theory, describing independent critical events occurring randomly in time. Waiting Times (WTs) between events are then mutually independent and are characterized only by the WT distribution. We show that simple assumptions about the liposome-protein interactions can explain the emergence of a power-law decay in the distribution of ferritin molecules trapped inside the liposomes, thus shedding light on the role of renewal theory in the emergence of self-organized macro-structures in prebiotic systems.
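The notion of a renewal process used in this abstract can be illustrated with a minimal Python sketch that counts events whose waiting times are independent draws from a chosen distribution. This is a generic illustration, not the authors' model: the function name `count_events`, the observation window and both waiting-time distributions are assumptions. With exponential waiting times the counts are Poisson distributed; heavy-tailed waiting times typically give a broader count distribution.

```python
import numpy as np

def count_events(sample_wt, window, n_runs, rng):
    """Count renewal events in [0, window] for n_runs independent runs.

    sample_wt(rng) must return one waiting time; successive waiting
    times are drawn independently, as in a renewal process.
    """
    counts = np.empty(n_runs, dtype=int)
    for i in range(n_runs):
        t, n = 0.0, 0
        while True:
            t += sample_wt(rng)
            if t > window:
                break
            n += 1
        counts[i] = n
    return counts

rng = np.random.default_rng(0)
# Exponential waiting times (mean 1): the Poisson reference case.
poisson_like = count_events(lambda r: r.exponential(1.0), 10.0, 2000, rng)
# Heavy-tailed (Lomax/Pareto II) waiting times: a hypothetical alternative.
heavy_tailed = count_events(lambda r: r.pareto(1.5), 10.0, 2000, rng)
```

Comparing histograms of `poisson_like` and `heavy_tailed` shows how the waiting-time distribution alone shapes the statistics of the counts.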




Interomics Tutorial Day

Algorithms in Bioinformatics and in Omics applications

Pisa, 21 October 2014

Area della Ricerca del CNR di Pisa

09:50 Opening and welcome remarks
10:00-11:30 Introduction to Biological Networks. Speakers: Marco Pellegrini, Francesco Russo, Miriam Baglioni (IIT-CNR)
11:30-12:00 Coffee Break
12:00-13:30 Bayesian Network Analysis for Biology. Speaker: Ercan Engin Kuruoglu (ISTI-CNR)
13:30-14:30 Lunch
14:30-16:00 Databases in Bioinformatics. Speakers: Filippo Geraci, Loredana Genovese, Romina D'Aurizio (IIT-CNR)
16:00-16:30 Coffee Break
16:30-18:00 Reconstructing chromatin geometrical structure from Chromosome Conformation Capture (3C) data. Speakers: Claudia Caudai (ISTI-CNR) and Monica Zoppè (IFC-CNR)
18:10 Closing remarks



Steering Committee:

Alberto Magi

Roberto Marangoni

Neri Niccolai

Marco Pellegrini

Local Organization:

Marco Pellegrini

Adriana Lazzaroni

Miriam Baglioni

Romina D’Aurizio

Loredana M. Genovese

Filippo Geraci

Francesco Russo

Organizing Secretariat:

Patrizia Andronico

Raffaella Casarosa