
Book of Abstracts

Belgrade BioInformatics Conference 2016

20-24 June 2016, Belgrade, Serbia

UNIVERSITY OF BELGRADE

FACULTY OF MATHEMATICS


Nenad Mitic, editor

Belgrade BioInformatics Conference 2016

Book of abstracts

Belgrade, June 20th-24th

The conference is organized by the Bioinformatics Research Group, University of Belgrade - Faculty of Mathematics (http://bioinfo.matf.bg.ac.rs).

Co-organizers of the conference are: Faculty of Agriculture, Faculty of Biology, Faculty of Chemistry, Faculty of Physical Chemistry, Institute for Biological Research “Sinisa Stankovic”, Institute for General and Physical Chemistry, Institute for Medical Research, Institute of Molecular Genetics and Genetic Engineering, Vinca Institute of Nuclear Sciences, Mathematical Institute of SASA, Belgrade, and COST - European Cooperation in Science and Technology.

The conference is financially supported by

– Ministry of Education, Science and Technological Development of the Republic of Serbia

– Central European Initiative (CEI)

– Telekom Srbija

– SevenBridges Genomic

– RNIDS - Register of National Internet Domain Names of Serbia

– Genomix4Life

Publication of this Book of abstracts is financed by the Ministry of Education, Science and Technological Development of the Republic of Serbia.

Publisher: Faculty of Mathematics, University of Belgrade

Printed in Serbia, by DonatGraf, Belgrade

Serbian National Library Cataloguing in Publication Data

Faculty of Mathematics, Belgrade

Book of Abstracts: Belgrade BioInformatics Conference 2016, 20-24 June 2016. – Book of abstracts; Nenad Mitic, editor. XIX+151 pages, 24 cm.

Copyright © 2016 by Faculty of Mathematics, University of Belgrade

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher.

ISBN: 978-86-7589-108-6

Number of copies printed: 200


International Advisory Committee

Vladik Avetisov The Semenov Institute of Chemical Physics, RAS, Moscow, Russia

Vladimir Brusic School of Medicine and Bioinformatics Center, Nazarbayev University, Kazakhstan and Department of Computer Science, Metropolitan College, Boston University, USA

Michele Caselle Department of Physics, Torino University, Torino,Italy

Radu Constantinescu Department of Physics, University of Craiova,Craiova, Romania

Oxana Galzitskaya Group of bioinformatics, Institute of Protein Research of the RAS, Russia

Madhavi Ganapathiraju Department of Biomedical Informatics, University of Pittsburgh, USA

Mikhail Gelfand A.A. Kharkevich Institute for Information Transmission Problems, RAS, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia

Ernst Walter Knapp Fachbereich Biologie, Chemie, Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin, Germany

Sergey Kozyrev Steklov Mathematical Institute, Moscow, Russia

Zoran Obradovic Center for Data Analytics and Biomedical Informatics, Temple University, USA

Yuriy L. Orlov Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Russia

George Patrinos Department of Pharmacy, University of Patras, Greece

Natasa Przulj Department of Computing, Imperial College London, UK

Paul Sorba Laboratory of Theoretical Physics and CNRS, Annecy, France

Bosiljka Tadic Department of Theoretical Physics, Jozef Stefan Institute, Ljubljana, Slovenia

Peter Tompa VIB Structural Biology Research Center, Flanders Institute for Biotechnology (VIB), Belgium

Silvio Tosatto Department of Biomedical Sciences, University of Padova, Italy

Edward Trifonov Weizmann Institute of Science, University of Haifa, Haifa, Israel

Matthias Ullmann Structural Biology/Bioinformatics, Universität Bayreuth, Germany

Bane Vasic The University of Arizona, Department of Electrical and Computer Engineering, Bios Institute for Collaborative Bioresearch, USA

Sergey Volkov Bogolyubov Institute for Theoretical Physics,Kiev, Ukraine

Ioannis Xenarios SIB Swiss Institute of Bioinformatics, Switzerland

BelBI2016, Belgrade, June 2016.


International Programme Committee

Milos Beljanski Institute for General and Physical Chemistry,University of Belgrade, Serbia

Erik Bongcam-Rudloff Division of Molecular Genetics, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Sweden

Antonio Cappuccio Immunity and Cancer, Institut Curie, France

Oliviero Carugo Faculty of Science, University of Pavia, Italy

Boris Delibasic Faculty of Organizational Sciences, University of Belgrade, Serbia

Zsuzsanna Dosztanyi Department of Biochemistry, Eotvos Lorand University, Budapest, Hungary

Branko Dragovich Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia

Marko Djordjevic Faculty of Biology, University of Belgrade, Serbia

Olgica Djurkovic-Djakovic Institute for Medical Research, University of Belgrade, Serbia

Lajos Kalmar Department of Veterinary Medicine, Cambridge Veterinary School, Cambridge, UK

Eija Korpelainen CSC IT Center for Science, Finland

Ilija Lalovic Faculty of Natural Sciences and Mathematics, Banja Luka, Bosnia and Herzegovina

Nenad Mitic Faculty of Mathematics, University of Belgrade, Serbia

Mihajlo Mudrinic Vinca Institute of Nuclear Sciences, University of Belgrade, Serbia

Zoran Ognjanovic Mathematical Institute SANU, Serbia

Gordana Pavlovic-Lazetic Faculty of Mathematics, University of Belgrade, Serbia

Marco Punta Pierre and Marie Curie University, France

Predrag Radivojac Department of Computer Science and Informatics, Indiana University, USA

Ana Simonovic Institute for Biological Research Sinisa Stankovic, Belgrade, Serbia

Jerzy Tiuryn Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland

Andrew Torda Center for Bioinformatics, University of Hamburg, Germany

Alessandro Treves SISSA-Cognitive Neuroscience, Trieste, Italy

Nevena Veljkovic Institute for Nuclear Sciences VINCA, University of Belgrade, Serbia

Igor V. Volovich Department of Mathematical Physics, Steklov Mathematical Institute, RAS, Moscow, Russia

Snezana Zaric Faculty of Chemistry, University of Belgrade, Serbia


Local Organizing Committee

Bojana Banovic Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

Milos Beljanski Institute for General and Physical Chemistry,University of Belgrade, Serbia

Branko Dragovich Co-Chair, Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia

Marko Djordjevic Faculty of Biology, University of Belgrade, Serbia

Olgica Djurkovic-Djakovic Institute for Medical Research, University of Belgrade, Serbia

Jelena Guzina Faculty of Biology, University of Belgrade, Serbia

Jovana Kovacevic Faculty of Mathematics, University of Belgrade, Serbia

Sasa Malkov Faculty of Mathematics, University of Belgrade, Serbia

Mirjana Maljkovic Faculty of Mathematics, University of Belgrade, Serbia

Vesna Medakovic Faculty of Chemistry, University of Belgrade, Serbia

Nenad Mitic Co-Chair, Faculty of Mathematics, University of Belgrade, Serbia

Ivana Moric Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

Mihajlo Mudrinic Vinca Institute of Nuclear Sciences, University of Belgrade, Serbia

Vesna Pajic Faculty of Agriculture, University of Belgrade, Serbia

Mirjana Pavlovic Institute for General and Physical Chemistry, University of Belgrade, Serbia

Gordana Pavlovic-Lazetic Co-Chair, Faculty of Mathematics, University of Belgrade, Serbia

Jelena Samardzic Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

Ana Simonovic Institute for Biological Research Sinisa Stankovic, Belgrade, Serbia

Miomir Stankovic Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia

Biljana Stojanovic Faculty of Mathematics, University of Belgrade, Serbia

Aleksandra Uzelac Institute for Medical Research, University of Belgrade, Serbia


Preface

The first International Belgrade BioInformatics Conference (BelBI 2016) takes place in Belgrade, Serbia, 20-24 June 2016. It grew out of the communities of previous conferences held in Belgrade: Data Mining in Bioinformatics (DMBI, 2012) and Theoretical Approaches to Bioinformation Systems (TABIS, 2010, 2013). It is organized by the Bioinformatics group of the University of Belgrade, Faculty of Mathematics, in cooperation with several other institutions from Belgrade (Faculty of Agriculture, Faculty of Biology, Faculty of Chemistry, Faculty of Physical Chemistry, Institute for Biological Research “Sinisa Stankovic”, Institute for General and Physical Chemistry, Institute for Medical Research, Institute of Molecular Genetics and Genetic Engineering, Vinca Institute of Nuclear Sciences, and Mathematical Institute of the Serbian Academy of Sciences and Arts) and COST (European Cooperation in Science and Technology) Action BM1405.

The main purpose of the BelBI 2016 conference is to illuminate different aspects of bioinformation systems: from theoretical approaches to modeling various phenomena in the life sciences, to the information technologies needed to analyze and understand the huge amounts of data being generated, to applications of computer science and informatics in precision medicine, in the search for new remedies against debilitating diseases, and in drug development.

The conference focuses on three main research fields, including (but not limited to) the following topics:

1. Theoretical Approaches to BioInformation Systems:
– Structure and function of DNA, RNA and proteins
– Gene expression and the genetic code
– Neurons and cognition
– Biological networks

2. Bioinformatics and Data Mining for OMICs Data:
– Data mining methods, algorithms, and applications in life sciences and precision medicine
– Big data and data science
– Data analytics, pattern recognition and machine learning in data analysis
– Software and tools in genomics, proteomics, metabolomics, transcriptomics, epigenomics, etc.
– Sequence analysis
– Predictive Models for OMICs data
– Bioinformatics databases and algorithms

3. Biomedical Informatics will focus on information applied to or studied in the context of biomedicine:
– Translational Bioinformatics
– Disease Models & Epidemiology
– Predictive Modeling and Analytics in Healthcare
– Biomedical Imaging and Data Visualization
– Biomedical/Health database integration and management
– Biomedical data/text mining

The conference program contains keynote lectures, invited talks, and selected oral and poster presentations.

We did our best to bring together scientists from Europe and beyond, and we hope to provide a pleasant and stimulating place for gathering and exchanging ideas in bioinformatics and related fields. We thank all the colleagues who accepted our invitation to serve on the International Advisory, Program and Organizing Committees. We also thank all the colleagues who accepted our invitation to present their research. The book of abstracts of all the presentations is now in your hands. We thank the Ministry of Education, Science and Technological Development of the Republic of Serbia for financially supporting the publication of this book of abstracts. We also thank our sponsors (Ministry of Education, Science and Technological Development of the Republic of Serbia, Central European Initiative (CEI), Telekom Srbija, SevenBridges Genomic, RNIDS - Register of National Internet Domain Names of Serbia, and Genomix4Life) and all others who helped us make this event happen.

June 2016

Program co-chairs:
Branko Dragovich
Gordana Pavlovic-Lazetic
Nenad Mitic


BelBI2016 Conference program

Monday, June 20th

Morning session

Location: Rector's hall, Rectorate of the University of Belgrade

9:00-10:00 Registration

10:00-10:15 Opening ceremony

Chair: Branko Dragovich

10:15-11:00 Keynote speaker – Mikhail Gelfand (Moscow State University, Russia)
Epigenetic state and spatial structure of chromatin

11:00-11:45 Welcome cocktail

Chair: Gordana Pavlovic-Lazetic

11:45-12:30 Keynote speaker – Vladimir Brusic (Nazarbayev University, Kazakhstan)
Elemental metabolomics for improving human health

12:30-13:15 Keynote speaker – Vladimir Uversky (University of South Florida, USA)
Intrinsically disordered proteins in salted water and in the thick soup

13:20-15:00 Lunch (Hotel “Palace”)

Afternoon session: Location - Hotel “Palace”, Conference hall

Chair: Mikhail Gelfand

15:00-15:35 Invited Speaker: Alexandre Morozov (Rutgers University, USA)
Biophysical models of protein evolutionary dynamics


15:35-16:10 Invited Speaker: Vladik Avetisov (The Semenov Institute of Chemical Physics, RAS, Russia)
Complex landscapes and ultrametricity in a biological context

16:10-16:40 coffee break

TABIS Session: Hotel “Palace”, Conference hall

Chair: Alexandre Morozov

16:40-17:00 Aleksandr Bugay (Joint Institute for Nuclear Research, Moscow, Russia)
Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling

17:00-17:20 Hanen Masmoudi (Higher institute of Biotechnology of Sfax, Tunisia)
Model selection in biomolecular pathways

17:20-17:40 Silvia Grigolon (The Francis Crick Institute, United Kingdom)
Identifying relevant positions in proteins by Critical Variable Selection

17:40-18:00 Jelena Guzina (University of Belgrade, Faculty of Biology, Serbia)
Transcription initiation by alternative sigma factors

18:00-18:20 Bojana Blagojevic (Institute of Physics, Belgrade, Serbia)
Achieving a rapid expression of toxic (but useful) molecules within cell

18:20-18:40 Andjela Rodic (University of Belgrade, Faculty of Biology, Serbia)
Examining regulation of restriction-modification systems by quantitative modeling


DMBI Session - Hotel “Palace”, Banquet hall

Chair: Jovana Kovacevic

16:40-17:00 Urs Lahrmann (Fraunhofer Institute for Toxicology and Experimental Medicine, Regensburg, Germany)
Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells

17:00-17:20 Miroslava Cuperlovic-Culf (National Research Council of Canada, Ottawa, Canada)
Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection

17:20-17:40 Milos Busarcevic (United World College of the Adriatic, Duino, Italy)
Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis

17:40-18:00 Jovana Kovacevic (University of Belgrade, Faculty of Mathematics, Serbia)
One structured output learning method for protein function prediction

18:00-18:20 Davorka Jandrlic (Faculty of Mechanical Engineering, University of Belgrade, Serbia)
The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands

18:20-18:40 Vladimir Babenko (Institute of Cytology and Genetics, Novosibirsk, Russia)
Clustering of CpG-rich elements in gene dense regions


Tuesday, June 21st

COST session

Morning session: Location - Hotel “Palace”, Conference hall

Chair: Nevena Veljkovic

9:00-9:35 Invited Speaker: Peter Tompa (Flanders Institute for Biotechnology (VIB), Belgium)
The role of structural disorder in protein degradation

9:35-10:10 Invited Speaker: Silvio Tosatto (University of Padova, Italy)
Non-globular proteins: Towards an understanding of the “dark matter” in the protein universe

10:10-10:45 Invited Speaker: Oxana Galzitskaya (Institute of Protein Research of the Russian Academy of Sciences, Russia)
Molecular mechanism of Aβ amyloid formation

10:45-11:15 coffee break

Chair: Oliviero Carugo

11:15-11:50 Invited Speaker: Marco Punta (Pierre and Marie Curie University, France)
Intrinsically disordered protein families

11:50-12:25 Invited Speaker: Alexandre de Brevern (University Paris Diderot, France)
On flexibility, deformability and mobility of protein structures in the light of a structural alphabet

12:25-13:00 Invited Speaker: Ioannis Xenarios (SIB Swiss Institute of Bioinformatics, Switzerland)
From biocuration to model predictions and back

13:00-14:00 Sponsors presentation (SevenBridges)

14:00-15:00 Lunch (Hotel “Palace”)


Afternoon session: TABIS Session: Hotel “Palace”, Banquet hall

Chair: Bosiljka Tadic

15:00-15:35 Invited Speaker: Antonio Celani (The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy)
Infomax strategies for an optimal balance of exploration and exploitation

15:35-16:10 Invited Speaker: Aleksandar Stojmirovic (Johnson & Johnson comp., USA)
Networks of Co-expression Modules

16:15-16:35 Asja Jelic (The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy)
Networks of interaction in moving animal groups and collective changes of direction

16:35-16:55 Anastasia Anashkina (Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Russia)
Bioinformatics Basis for the “Molecular Tweezers” Construction

16:55-17:25 coffee break

17:25-19:00 Poster Session (Hotel “Palace”, Banquet hall)

HI Session - Hotel “Palace”, Conference hall

Chair: Olgica Djurkovic-Djakovic

15:00-15:35 Invited Speaker: Ralf Bundschuh (The Ohio State University, USA)
Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer


15:35-15:55 Paolo Paradisi (Institute of Information Science and Technologies, Pisa, Italy)
Complexity measures based on intermittent events in brain EEG data

15:55-16:15 Ivan Jovanovic (VINCA Institute, University of Belgrade, Serbia)
Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes?

16:15-16:35 Nikola Milosevic (University of Manchester, UK)
Hybrid methodology for information extraction from tables in clinical literature

16:35-16:55 Petar Velickovic (University of Cambridge, UK)
Viral: Real-world competing process simulations on multiplex networks

16:55-17:25 coffee break

17:25-19:00 Poster Session (Hotel “Palace”, Banquet hall)

Wednesday, June 22nd

Morning session: Location - Hotel “Palace”, Conference hall

Chair: Vladimir Brusic

8:30-9:05 Invited Speaker: Zoran Obradovic (Temple University, USA)
Effectiveness of Multiple Blood Cleansing Interventions in Sepsis

9:05-9:40 Invited Speaker: Natasa Przulj (Imperial College London, UK)
Network Data Integration Enables Precision Medicine

9:40-10:15 Invited Speaker: Nitesh Chawla (University of Notre Dame, USA)
Leveraging Electronic Medical Records for Personalized and Population Healthcare

11:00- EXCURSION


Thursday, June 23rd

Morning session: Location - Hotel “Palace”, Conference hall

Chair: Marko Djordjevic

9:00-9:35 Invited Speaker: Yuriy L. Orlov (Novosibirsk State University, Russia)
Comparative analysis of plant genome structure and antisense transcripts

9:35-10:10 Invited Speaker: Paul Sorba (Laboratory of Theoretical Physics and CNRS, France)
Symmetry and Minimum Principle: a basis for the Genetic Code

10:10-10:45 Invited Speaker: Konstantin Severinov (Rutgers University, USA)
The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense

10:45-11:15 coffee break

Chair: Vladimir Uversky

11:15-11:50 Invited Speaker: Bosiljka Tadic (Jozef Stefan Institute, Slovenia)
Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications

11:50-12:25 Invited Speaker: Erik Bongcam-Rudloff (Swedish University of Agricultural Sciences, Sweden)
Next Generation Biotechnologies, the bad and the good: a look into the future

12:25-13:00 Invited Speaker: Andrea Ciliberto (IFOM-IEO, Italy)
Adapt or die. Investigating the molecular basis of cell variability

13:00-13:30 Sponsors presentation (Genomix4Life, Pearson)

13:30-15:00 Lunch (Hotel “Palace”)


Afternoon session

TABIS Session: Hotel “Palace”, Banquet hall

Chair: Paul Sorba

15:00-15:35 Invited Speaker: Branko Dragovich (Mathematical Institute SASA, Serbia)
Ultrametric Approach to Bioinformation Systems

15:35-15:55 Natasa Djurdjevac Konrad (Zuse Institute Berlin, Germany)
A new random-walk-based approach for finding co-expression modules in biological networks

15:55-16:15 Natasa Misic (Lola Institute, Belgrade, Serbia)
Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances

16:15-16:35 Ozal Mutlu (Marmara University, Istanbul, Turkey)
Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking

16:35-17:10 coffee break

Chair: Yuriy L. Orlov

17:10-17:30 Tamara Dimitrova (Macedonian Academy of Sciences and Arts, Macedonia)
Analysis of network structural characteristics through vertex characteristics in directed networks

17:30-17:50 Balazs Szalkai (Eotvos Lorand University, Budapest, Hungary)
Graph Theoretical Analysis Reveals: Women's Brains Are Better Connected than Men's

17:50-18:10 Balint Varga (Eotvos Lorand University, Budapest, Hungary)
Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

18:10-18:30 Yair Lakretz (Tel Aviv University, Israel)
The perceptual structure of the phoneme manifold

20:00- Conference Dinner


DMBI/HI Session - Hotel “Palace”, Conference hall

Chair: Noel Malod-Dognin

15:00-15:35 Invited Speaker: Jan Baumbach (University of Southern Denmark, Denmark)
Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

15:35-15:55 Ana Simonovic (Institute for Biological Research, University of Belgrade, Serbia)
Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism

15:55-16:15 Richard Roettger (University of Southern Denmark, Odense, Denmark)
On the clustering of biomedical datasets - a data-driven perspective

16:15-16:35 Milan Vukicevic (University of Belgrade, Faculty of Organizational Sciences, Serbia)
White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data From Component Based Design to Meta Learning

16:35-17:10 coffee break

Chair: Dragan Matic

17:10-17:30 Dragana Dudic (University of Belgrade, Faculty of Agriculture, Serbia)
Mining PMMoV genotype-pathotype association rules from public databases

17:30-17:50 Ana Jelovic (University of Belgrade, Faculty of Transport and Traffic Engineering, Serbia)
Filtering of repeat sequences in genomes

17:50-18:10 Milana Grbic (University of Banja Luka, Faculty of Science and Mathematics, Bosnia and Herzegovina)
Improving 1NN strategy for classification of some prokaryotic organisms

18:10-18:30 Sanja Brdar (Institute for research and development of information technology in biosystem, University of Novi Sad, Serbia)
Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data

20:00- Conference Dinner


Friday, June 24th

Morning session

TABIS Session: Hotel “Palace”, Banquet hall

Chair: Vladik Avetisov

9:00-9:35 Invited Speaker: Sergei Kozyrev (Steklov Mathematical Institute, Russia)
Dark states in quantum photosynthesis

9:35-10:10 Invited Speaker: Argyris Nicolaidis (Aristotle University of Thessaloniki, Greece)
A Quantum Approach to the DNA Structure

10:10-10:30 coffee break

Chair: Robert Waterhouse

10:30-11:05 Invited Speaker: Sergey Volkov (Bogolyubov Institute for Theoretical Physics, Ukraine)
DNA polymorphism as a tool for genetic information implementation

11:05-11:25 Polina Kanevska (Bogolyubov Institute for Theoretical Physics, Ukraine)
DNA polymorphism as a tool for genetic information implementation

11:25-11:45 Ana Stanojevic (University of Belgrade, Faculty of Physical Chemistry, Serbia)
Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats

11:45-12:05 Alina-Maria Streche (University of Craiova, Department of Physics, Romania)
Chaos and symmetry in mathematical neural flow models

13:00-15:00 Lunch (Hotel “Palace”)

16:00-17:30 Round Table Discussion: perspectives and cooperation

17:30 Closing ceremony


DMBI/HI Session - Hotel “Palace”, Conference hall

Chair: Alexandre de Brevern

9:00-9:35 Invited Speaker: Robert Waterhouse (University of Geneva, Switzerland)
OrthoDB: an evolutionary perspective to interpreting genomics data

9:35-10:10 Invited Speaker: Goran Nenadic (University of Manchester, UK)
What is bioinformatics made from: understanding database and software usage through literature mining

10:10-10:30 coffee break

Chair: Jan Baumbach

10:30-11:05 Invited Speaker: Noel Malod-Dognin (Imperial College London, UK)
Patient Specific Network Data Integration Enables Precision Medicine in Cancer

11:05-11:25 Invited Speaker: Nevena Veljkovic (VINCA Institute, University of Belgrade, Serbia)
Transcription factors interaction inference based on sequence feature representations

11:25-11:45 Zeljko Popovic (University of Novi Sad, Faculty of Sciences, Serbia)
DORMANCYbase developing a bioinformatics database on molecular regulation of animal dormancy

11:45-12:05 Milena Banjevic (Natera Inc., San Carlos, USA)
SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities

13:00-15:00 Lunch (Hotel “Palace”)

16:00-17:30 Round Table Discussion: perspectives and cooperation (Hotel “Palace”, Conference hall)

17:30 Closing ceremony (Hotel “Palace”, Conference hall)


Table of Contents

A. Invited speakers

Complex landscapes and ultrametricity in a biological context

Vladik A. Avetisov

Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

Jan Baumbach

Next Generation Biotechnologies, the bad and the good: a look into the future

Erik Bongcam-Rudloff

On flexibility, deformability and mobility of protein structures in the light of a structural alphabet

Tarun Narwani, Pierrick Craveur, Nicolas Shinada, Hubert Santuz, Joseph Rebehmed, Catherine Etchebest, and Alexandre G. de Brevern

Elemental metabolomics for improving human health

Ping Zhang, Constantinos Georgiou, and Vladimir Brusic

Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer

Ralf Bundschuh

Infomax strategies for an optimal balance of exploration and exploitation

Antonio Celani

Leveraging Electronic Medical Records for Personalized and Population Healthcare

Nitesh V. Chawla

Adapt or Die. Investigating the molecular basis of cell variability

Andrea Ciliberto

Ultrametric Approach to Bioinformation Systems

Branko Dragovich

Molecular mechanism of Aβ amyloid formation

Oxana V. Galzitskaya, Olga M. Selivanova, Alexey K. Surin, Victor V. Marchenkov, Ulyana F. Dzhus, Elizaveta I. Grigorashvili, Mariya Yu. Suvorina, Anna V. Glyakina, and Nikita V. Dovidchenko

Epigenetic state and spatial structure of chromatin . . . . . . . . . . . . . . . . . . . . 15Mikhail Gelfand


Dark states in quantum photosynthesis . . . 16

Sergei Kozyrev

Patient Specific Network Data Integration Enables Precision Medicine in Cancer . . . 17

Noel Malod-Dognin

Biophysical models of protein evolutionary dynamics . . . 18

Alexandre Morozov

What is bioinformatics made from: understanding database and software usage through literature mining . . . 19

Goran Nenadic

A Quantum Approach to the DNA Functioning . . . 20

Argyris Nicolaidis

Effectiveness of Multiple Blood Cleansing Interventions in Sepsis . . . 21

Zoran Obradovic

Comparative analysis of plant genome structure and antisense transcripts . . . 22

Salwa E.S. Mohamed, Oxana B. Dobrovolskaya, Vladimir N. Babenko, Khaled Salem, Ming Chen, and Yuriy L. Orlov

Network Data Integration Enables Precision Medicine . . . 23

Natasa Przulj

Intrinsically disordered protein families . . . 24

Marco Punta

The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense . . . 25

Konstantin Severinov, Iaroslav Ispolatov, and Ekaterina Semenova

Symmetry and minimum principle: a basis for the genetic code? . . . 26

Paul Sorba

Networks of Co-expression Modules . . . 27

Aleksandar Stojmirovic

Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications . . . 28

Bosiljka Tadic and Miroslav Andjelkovic

The role of structural disorder in protein degradation . . . 29

Peter Tompa

Non-globular proteins: Towards an understanding of the "dark matter" in the protein universe . . . 30

Silvio C.E. Tosatto

xiv BelBI2016, Belgrade, June 2016.


Intrinsically disordered proteins in salted water and in the thick soup . . . 31

Vladimir N. Uversky

Transcription factors interaction inference based on sequence feature representations . . . 32

Nevena Veljkovic

DNA polymorphism as a tool for genetic information implementation . . . 33

Sergey N. Volkov

OrthoDB: an evolutionary perspective to interpreting genomics data . . . 35

Robert M. Waterhouse, Evgenia V. Kriventseva, and Evgeny M. Zdobnov

From biocuration to model predictions and back . . . 36

Ioannis Xenarios

B. Speakers in sessions

Bioinformatics Basis for the "Molecular Tweezers" Construction . . . 39

Anastasia Anashkina and Alexei Nekrasov

Clustering of CpG-rich elements in gene dense regions . . . 43

Vladimir Babenko, Irina Chadaeva, and Yuriy Orlov

SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities . . . 45

Milena Banjevic, Allison Ryan, and Styrmir Sigurjonsson

Achieving a rapid expression of toxic (but useful) molecules within cell . . . 46

Bojana Blagojevic, Magdalena Djordjevic, and Marko Djordjevic

Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data . . . 47

Sanja Brdar

Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling . . . 48

Aleksandr Bugay

Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis . . . 49

Milos Busarcevic, Aleksandar Trbovich, Ivan Milovanovic, Aleksandra Uzelac, and Olgica Djurkovic-Djakovic

Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection . . . 52

Miroslava Cuperlovic-Culf


Analysis of network structural characteristics through vertex characteristics in directed networks . . . 53

Tamara Dimitrova

A new random-walk-based approach for finding co-expression modules in biological networks . . . 54

Natasa Djurdjevac Conrad

Improving 1NN strategy for classification of some prokaryotic organisms . . . 55

Milana Grbic, Aleksandar Kartelj, Dragan Matic, and Vladimir Filipovic

Identifying relevant positions in proteins by Critical Variable Selection . . . 57

Silvia Grigolon

Transcription initiation by alternative sigma factors . . . 58

Jelena Guzina and Marko Djordjevic

The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands . . . 59

Davorka R. Jandrlic, Nenad S. Mitic, and Mirjana D. Pavlovic

Networks of interaction in moving animal groups and collective changes of direction . . . 61

Asja Jelic

Filtering of repeat sequences in genomes . . . 62

Ana Jelovic, Milos Beljanski, and Nenad Mitic

Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes? . . . 63

Ivan Jovanovic, Maja Zivkovic, Jasmina Jovanovic, Tamara Djuric, and Aleksandra Stankovic

Mechanism of unusual flexibility of DNA TATA-box . . . 68

Polina Kanevska and Sergey Volkov

One structured output learning method for protein function prediction . . . 69

Jovana Kovacevic, Predrag Radivojac, and Gordana Pavlovic-Lazetic

Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells . . . 70

Stefan Kirsch, Urs Lahrmann, Miodrag Guzvic, Zbigniew T. Czyz, Giancarlo Feliciello, Bernhard Polzer, and Christoph A. Klein

The perceptual structure of the phoneme manifold . . . 72

Yair Lakretz, Evan-Gary Cohen, Naama Friedmann, Gal Chechik, and Alessandro Treves

Model selection in biomolecular pathways . . . 73

Hanen Masmoudi


Hybrid methodology for information extraction from tables in the biomedical literature . . . 74

Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic

Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances . . . 79

Natasa Z. Misic

Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking . . . 80

Ozal Mutlu

Mining PMMoV genotype-pathotype association rules from public databases . . . 81

Vesna Pajic, Bojana Banovic, Milos Beljanski, and Dragana Dudic

Complexity measures based on intermittent events in brain EEG data . . . 87

Paolo Paradisi, Marco Righi, Massimo Magrini, Maria Chiara Carboncini, Alessandra Virgillito, and Ovidio Salvetti

DORMANCYbase developing a bioinformatics database on molecular regulation of animal dormancy . . . 92

Popovic Zeljko D., Kadlecsik Tamas, Fazekas David, Ari Eszter, Korcsmaros Tamas, Uzelac Iva, Avramov Milos, Krivokuca Nikola, Kitanovic Nevena, and Kokai Dunja

Examining regulation of restriction-modification systems by quantitative modeling . . . 94

Andjela Rodic and Marko Djordjevic

On the clustering of biomedical datasets - a data-driven perspective . . . 95

Richard Roettger

Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism . . . 96

Ana Simonovic, Milan Dragicevic, Giorgio Giurato, Biljana Filipovic, Sladjana Todorovic, Milica Bogdanovic, Katarina Cukovic, and Angelina Subotic

Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats . . . 98

Ana Stanojevic, Vladimir Markovic, Zeljko Cupic, Stevan Macesic, Vladana Vukojevic, and Ljiljana Kolar-Anic

Chaos and symmetry in mathematical neural flow models . . . 99

Rodica Cimpoiasu, Radu Constantinescu, and Alina Streche


Graph theoretical analysis reveals: Women's brains are better connected than men's . . . 100

Balazs Szalkai, Balint Varga, and Vince Grolmusz

Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain . . . 101

Balint Varga

Viral: Real-world competing process simulations on multiplex networks . . . 102

Petar Velickovic, Andrej Ivaskovic, Stella Lau, and Milos Stanojevic

White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data From Component Based Design to Meta Learning . . . 107

Milan Vukicevic, Sandro Radovanovic, Boris Delibasic, and Milija Suknovic

C. Poster session

Machine learning-based approach to help diagnosing Alzheimer's disease through spontaneous speech analysis . . . 111

Jelena Graovac, Jovana Kovacevic, and Gordana Pavlovic Lazetic

Targeted resequencing in diagnostics of inherited genetic disorders . . . 112

Jelena Kusic-Tisma, Nikola Ptakova, A. Divac, M. Ljujic, Lj. Rakicevic, M. Tesic, N. Antonijevic, S. Kojic, Milan Macek Jr., and D. Radojkovic

A biologically-inspired model of visual word recognition . . . 114

Yair Lakretz, Naama Friedmann, and Alessandro Treves

Crystallographic study on CH/O interactions of aromatic CH donors within proteins . . . 119

J. Lj. Dragelj, Ivana M. Stankovic, D. M. Bozinovski, T. Meyer, Dusan Z. Veljkovic, Vesna B. Medakovic, Ernst Walter Knapp, and Snezana D. Zaric

Dynamics of Escherichia coli type I-E CRISPR spacers over 42,000 years . . . 120

Ekaterina Savitskaya, Anna Lopatina, Sofia Medvedeva, Mikhail Kapustin, Sergey Shmakov, Alexey Tikhonov, Irena I. Artamonova, and Konstantin Severinov

De Novo Transcriptome Sequencing of Verbascum thapsus L. to Identify Genes Involved in Metal Tolerance . . . 121

Filis Morina, Marija Vidovic, Ana Sedlarevic, Ana Simonovic, and Sonja Veljovic-Jovanovic

De Novo Transcriptome Sequencing of Pelargonium zonale L. to Identify Genes Involved in UV-B and High Light Response . . . 123

Marija Vidovic, Filis Morina, Ana Sedlarevic, Ana Simonovic, and Sonja Veljovic-Jovanovic


Protein Interaction Network Construction and Analysis Using the Quantitative Proteomics Data . . . 125

Ozal Mutlu and Nagihan Gulsoy

An optimal promoter description for bacterial transcription start site detection . . . 127

Milos Nikolic, Tamara Stankovic, and Marko Djordjevic

Chronic Treatment with Fluoxetine Led to Alterations in the Rat Hippocampal Proteome . . . 128

Ivana Peric, Dragana Filipovic, Victor Costina, and Peter Findeisen

A web-based tool for prediction of effects of single amino acid substitutions outside conserved functional protein domains . . . 130

Vladimir Perovic, Ljubica Mihaljevic, Branislava Gemovic, and Nevena Veljkovic

Protein-protein interaction prediction method based on principle component analysis of amino acid physicochemical properties . . . 131

Neven Sumonja, Nevena Veljkovic, Sanja Glisic, and Vladimir Perovic

Basic Sequence Alignment Based Screening for Alternative Mannanase Producing Bacteria . . . 132

Bojan D. Petrovic and Zorica D. Knezevic-Jugovic

Theoretical study on the role of aromatic amino acids in stability of amyloids . . . 138

Dragan B. Ninkovic, Dusan P. Malenov, Predrag V. Petrovic, Edward N. Brothers, Shuqiang Niu, Michael B. Hall, Milivoj Belic, and Snezana D. Zaric

Construction of Amyloid PDB Files Database . . . 139

Ivana Stankovic and Snezana Zaric

Search for small RNAs associated with CRISPR/Cas . . . 144

Tamara Stankovic, Jelena Guzina, Magdalena Djordjevic, and Marko Djordjevic

A novel approach for dealing with spatial/temporal edges within molecular interaction networks . . . 145

Ruth A Stoney, Ryan Ames, Goran Nenadic, David L Robertson∗, and Jean-Marc Schwartz∗ (∗shared last/corresponding authors)

Gene expression in schizophrenia patients and non-schizophrenic individuals infected with Toxoplasma gondii . . . 147

Aleksandra Uzelac, Tijana Stajner, Milos Busarcevic, Ana Munjiza, Milutin Kostic, Cedo Miljevic, Dusica Lecic-Tosevski, Nenad Mitic, Sasa Malkov, and Olgica Djurkovic-Djakovic


Propensities of amino acid toward certain secondary protein structure types: comparison of different statistical methods . . . 149

Dusan Z. Veljkovic, Sasa Malkov, Vesna B. Medakovic, and Snezana D. Zaric

Botryosphaeriaceae on Aesculus hippocastanum in Serbia . . . 150

Milica Zlatkovic, Nenad Keca, Michael Wingfield, Fahimeh Jami, and Bernard Slippers

Botryosphaeriaceae on Sequoia sempervirens in Serbia . . . 151

Milica Zlatkovic, Nenad Keca, Michael Wingfield, Fahimeh Jami, and Bernard Slippers

Author Index . . . 153

List of participants . . . 153

Sponsors


A. INVITED SPEAKERS


Complex landscapes and ultrametricity in a biological context

Vladik A. Avetisov

The Semenov Institute of Chemical Physics of the Russian Academy of Sciences, Kosygina 4, Moscow, Russia

[email protected]

Abstract

In general, the control functions relevant to the cooperative behavior of systems of many interacting units can be perceived as landscapes. In biology, landscapes can specify the energy (i.e., they are to be minimized), or they can afford a measure of fitness (i.e., they are to be maximized). For some problems, the landscape has few dimensions (e.g., when it represents the potential of a specific unit). For other problems, the landscape has many dimensions, such as when the behavior of a representative point describes the positions of all of the individual units of a many-body system. As in geography, landscapes may be flat or rugged. Flat landscapes are the simplest to consider, but rugged landscapes are of the greatest interest due to their complexity and rich behavior. Here, I discuss rugged landscapes, focusing on proteins and evolutionary systems. These examples are illustrative of cases in which one attempts to describe the time behavior of complex systems with inherently conflicting interactions over a wide range of time scales. In fact, the protein energy landscape is far too complicated to reconstruct in detail the entire process by which local excitations at the protein active site are transmitted into the directed movements of large protein fragments. The same problems appear when we consider a community of species whose evolution in high-dimensional genomic space is specified by a rugged fitness landscape.

In any case, one needs to make some simplifications in order to describe multi-scale dynamic behavior on extremely complex landscapes. In this respect, it seems that ultrametric random processes, which are inherently multi-scale, open up new perspectives [1], [2].
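The abstract does not spell out what makes a distance "ultrametric". As a minimal illustration only (not the models of [1], [2]), the sketch below uses the p-adic distance on integers, d(x, y) = p^(-v_p(x - y)), where v_p(n) is the exponent of the largest power of p dividing n. Reading the integers 0 to p^n - 1 as leaves of a p-ary tree grouped by their trailing base-p digits, two leaves are closer the deeper their common ancestor. The check confirms the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)) that defines an ultrametric; the function names are hypothetical.

```python
from itertools import permutations


def p_adic_valuation(n: int, p: int) -> int:
    """Exponent of the largest power of the prime p dividing n (n != 0)."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def p_adic_distance(x: int, y: int, p: int) -> float:
    """p-adic distance d(x, y) = p ** (-v_p(x - y)), with d(x, x) = 0."""
    if x == y:
        return 0.0
    return float(p) ** -p_adic_valuation(x - y, p)


def is_ultrametric(points, p: int) -> bool:
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every ordered triple of distinct points."""
    for x, y, z in permutations(points, 3):
        if p_adic_distance(x, z, p) > max(p_adic_distance(x, y, p),
                                          p_adic_distance(y, z, p)):
            return False
    return True


# 0 and 8 share their last three binary digits, so they are 2-adically close.
print(p_adic_distance(0, 8, 2))       # 0.125
print(p_adic_distance(0, 1, 2))       # 1.0
print(is_ultrametric(range(16), 2))   # True
```

The nested, tree-like geometry checked here is the hierarchical structure that ultrametric random processes exploit; how it maps onto protein conformations or genomic space is the subject of the cited papers.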

Keywords: complex landscapes, ultrametric diffusion, protein dynamics, evolution

References

1. Avetisov, V. A., Bikulov, A. Kh., Zubarev, A. P.: Ultrametric random walk and dynamics of protein molecules, Proceedings of the Steklov Institute of Mathematics, 285, 3-25. (2014)

2. Avetisov, V. A., Zhuravlev, Yu. N.: An evolutionary interpretation of the p-adic ultrametric diffusion equation, Doklady Mathematics, 75 (3), 453-455. (2007)


Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

Jan Baumbach

Dept. of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, Odense, Southern Denmark 5230, Denmark

[email protected]

Abstract

Volatile organic compounds are emitted by all living cells and tissues. We seek to non-invasively 'sniff' biomarker molecules that are predictive of the biomedical fate of individual patients or cell cultures. This holds great promise for moving therapeutic windows to earlier stages of disease progression. While portable devices for measuring exhaled volatile metabolites exist, we face the traditional barrier of biomarker research: a lack of robustness hinders translation to the world outside laboratories. To move from biomarker discovery to validation, and from separability to predictability, we have developed several bioinformatics methods for computational breath analysis. These methods have the potential to redefine non-invasive biomedical decision making through rapid and cheap matching of decisive medical patterns in exhaled air. We aim to provide a supplementary diagnostic tool complementing classic urine, blood and tissue samples. In the presentation, we will review the state of the art, study some clinical application examples, highlight existing challenges, and introduce new data mining methods for identifying exhaled biomarkers.


Next Generation Biotechnologies, the bad and the good: a look into the future

Erik Bongcam-Rudloff

SLU Global Bioinformatics Centre, SLU, Uppsala, [email protected]

Abstract

Life sciences have undergone an immense transformation in recent years: advances in genomics, epigenetics, proteomics and other high-throughput techniques produce floods of raw data that need to be stored, analysed and interpreted in various ways.

Next-generation sequencing (NGS) technology massively parallelises nucleotide sequencing procedures, making the sequencing of genomes and transcriptomes much faster and cheaper than ever before. The new technologies are, however, posing massive (bio)informatics challenges that require new ways of thinking and novel solutions.

During my talk I will present new technologies and briefly discuss the pros and cons of this development. I will also discuss some aspects of our scientific publishing culture that hinder modern, efficient data analysis. Finally, I will briefly present some exciting new initiatives that are creating research solutions to face the challenges generated by the new technologies.


On flexibility, deformability and mobility of protein structures in the light of a structural alphabet

Tarun Narwani1, Pierrick Craveur1, Nicolas Shinada1,2, Hubert Santuz1, Joseph Rebehmed3, Catherine Etchebest1, and Alexandre G. de Brevern2

1 INSERM UMR S 1134, DSIMB, Univ Paris Diderot, Sorbonne Paris Cité, INTS, lab of excellence GR-Ex, 6 rue Alexandre Cabanel, 75013 Paris, France

{arun.narwani, pierrick.craveur, nicolas.shinada, hubert.santuz, catherine.etchebest, alexandre.de-brevern}@inserm.fr

2 Discngine, 79 Avenue Ledru Rollin, 75011 Paris, France

[email protected]

3 Department of Computer Science and Mathematics, Lebanese American University, Byblos 1h401 2010, [email protected]

Abstract

The function of a protein is directly dependent on its 3-dimensional structure. Visualization tools have oversimplified our views of protein structures: proteins are often regarded as macromolecules whose repetitive structures are rigid while the connecting loops are flexible or even disordered. In silico approaches are useful tools for tackling this critical question of protein flexibility. Moreover, they make it possible to define flexibility using criteria other than B-factors [1].

We have previously developed different structural alphabets (SAs) [2], [3]. They are libraries of small protein fragments that can approximate every part of a protein structure, giving a more precise description than classical secondary structures. Using SAs for structural analysis yields a more precise and complete description of protein backbone conformation, with applications ranging from the definition of ligand binding sites to the superimposition of protein structures [4]. SAs are also well suited to predicting protein flexibility from sequence [5]. We have also used them to analyse the dynamics of protein structures in the case of a transmembrane protein [6] and of integrins implicated in pathologies [7], [8].
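Protein Blocks [2] are usually assigned by describing each 5-residue backbone fragment with its 8 consecutive dihedral angles and choosing the reference block whose angle vector is closest under a root mean square deviation on angular values (RMSDA). The sketch below shows only that nearest-prototype step, assuming this description; the two reference vectors (named hypothetically `helix_like` and `strand_like`) are rough placeholder values, not the published reference angles of the 16 PBs, which a tool such as PBxplore [12] ships with.

```python
import math

# Hypothetical reference profiles: 8 dihedral angles (degrees) per letter, ordered
# psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2).
# Placeholder values only; the published alphabet has 16 letters (a to p).
REFERENCE = {
    "helix_like": [-47.0, -57.0] * 4,
    "strand_like": [135.0, -120.0] * 4,
}


def angular_difference(a: float, b: float) -> float:
    """Signed difference a - b wrapped into (-180, 180] degrees."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def rmsda(angles, reference) -> float:
    """Root mean square deviation on angular values between two dihedral vectors."""
    diffs = [angular_difference(a, r) for a, r in zip(angles, reference)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def assign_block(angles, reference=REFERENCE) -> str:
    """Return the reference letter with the smallest RMSDA to the fragment."""
    return min(reference, key=lambda name: rmsda(angles, reference[name]))


fragment = [-50.0, -60.0] * 4               # a made-up near-helical fragment
print(assign_block(fragment))               # helix_like
print(angular_difference(170.0, -170.0))    # -20.0
```

Wrapping the angular difference matters: 170° and -170° are only 20° apart, and a plain subtraction would report 340°.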

Here, we selected a representative set of 169 protein structures covering the four main SCOP classes equally [9]. Three independent molecular dynamics (MD) simulations of 50 ns were performed for each system using the GROMACS software [10] with classical parameters of the AMBER99sb force field. All the simulations quickly reached a stable state, i.e. a plateau.

Each simulation was analysed through classical secondary structure assignment (with DSSP [11]) and Protein Block (PB [2]) assignment (with the PBxplore tool [12]). Notably, from the PB assignment an entropy-derived value, Neq, can be computed; it quantitatively reflects local conformational changes [2]. Classical MD analyses, e.g. root mean square fluctuation (RMSF), were also performed. From the original protein structures, normalized B-factors and relative solvent accessibility were computed with in-house tools and DSSP [11].
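Neq is the "equivalent number" of PBs observed at a residue position: the exponential of the Shannon entropy of the PB frequencies across MD frames, Neq = exp(-Σ f_x ln f_x). It equals 1 when a single PB is always observed and 16 when all 16 PBs are equally frequent. A minimal sketch, assuming the PB assignment has already been done (for example with PBxplore) and is given as one PB string per frame; the function names and the trajectory are invented for illustration.

```python
import math
from collections import Counter


def neq(pb_column) -> float:
    """Equivalent number of PBs at one position.

    pb_column: PB letters observed at that position, one per MD frame.
    Neq = exp(-sum_x f_x * ln f_x), where f_x is the frequency of letter x.
    """
    counts = Counter(pb_column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


def neq_profile(frames):
    """Neq for every position, given equal-length PB strings (one per frame)."""
    return [neq(column) for column in zip(*frames)]


# Invented 4-frame PB trajectory of a 5-residue stretch.
frames = ["mmmgp", "mmmpp", "mmmgd", "mmmmp"]
print([round(v, 2) for v in neq_profile(frames)])  # [1.0, 1.0, 1.0, 2.83, 1.75]
```

Positions that keep the same PB in every frame score 1.0, while the fourth position, which visits g, p and m, has an Neq close to 3.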

This large and diverse set of protein dynamics enables us to analyse the evolution of local protein structures efficiently. For instance, at a global level, the correlation between normalized B-factors and RMSF is 0.43. In contrast, when the average value is computed per PB (16 in total), the correlation rises to 0.98, underlining that PBs are of great interest for analysing protein flexibility.
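One way to see how averaging per PB can raise the correlation so sharply: averaging all residues assigned to the same PB removes residue-level scatter and leaves one point per PB. The sketch below compares the two Pearson correlations on a tiny invented data set with three groups; the numbers are illustrative only and are not the study's data, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def per_group_pearson(x, y, labels) -> float:
    """Pearson r between per-label means of x and y (one point per label)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for a, b, lab in zip(x, y, labels):
        acc[lab][0] += a
        acc[lab][1] += b
        acc[lab][2] += 1
    mean_x = [sx / n for sx, _, n in acc.values()]
    mean_y = [sy / n for _, sy, n in acc.values()]
    return pearson(mean_x, mean_y)


# Invented per-residue values (e.g. normalized B-factor and RMSF) and PB labels.
b_factor = [1, 3, 4, 6, 7, 9]
rmsf = [3, 1, 6, 4, 9, 7]
pb = ["a", "a", "b", "b", "c", "c"]

print(round(pearson(b_factor, rmsf), 2))                # 0.71
print(round(per_group_pearson(b_factor, rmsf, pb), 2))  # 1.0
```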

Interestingly, the correlation of Neq with normalized B-factors and with RMSF is merely 0.41 and 0.46, respectively. This behaviour stems from the fact that Neq captures local conformational variations (at the residue level), whereas RMSF is computed with respect to the overall structure at every time step. Therefore, we encounter cases where RMSF values are very high (flexibility) while Neq remains low (rigidity). Such cases may look paradoxical, since flexibility corresponds to large movements and should therefore be associated with high Neq values. However, they can arise from confusing flexibility, deformability and mobility. Biologically, such cases could correspond to rigid regions (mobility) enclosed between two flexible regions (deformability).

Protein Blocks are thus of great interest for analysing MD simulations, and coupling them to various experimental data (e.g. B-factors) helps in understanding the general and specific behaviours of local protein conformations found in various proteins.

Many complementary analyses were performed using PBs; some of the primary findings follow. PBs d and m (the cores of repetitive structures) show a higher tendency to resist change, whereas many others are less stable; e.g. PB g (for "coil") changes to another PB 40% of the time during the dynamics. Clustering the different behaviours of the PBs reveals that some changes, although unexpected, are frequent. In a significant number of cases, PB g changes to PB p for the majority of the simulation time, and sometimes even to PB m. Notably, PB p is associated with loops connecting an α-helix to a β-strand, while PB m denotes stable α-helical regions.
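The abstract does not state exactly how the 40% change rate of PB g is counted. One plausible way, sketched below under that assumption, is to take each position's PB in a reference frame (here the first frame), count how often a different PB is observed in the other frames, and tally which PBs it changes to. The helper name `change_statistics` and the toy trajectory are hypothetical.

```python
from collections import Counter, defaultdict


def change_statistics(frames, reference_index=0):
    """Per reference PB letter: (fraction of observations that differ, Counter of new letters).

    frames: equal-length PB strings, one per MD frame. Each position's starting PB
    is taken from the reference frame (an assumption of this sketch).
    """
    reference = frames[reference_index]
    observed = Counter()            # observations per reference letter
    changed = Counter()             # observations that differ from the reference
    targets = defaultdict(Counter)  # reference letter -> letters it changed to
    for i, frame in enumerate(frames):
        if i == reference_index:
            continue
        for ref_pb, pb in zip(reference, frame):
            observed[ref_pb] += 1
            if pb != ref_pb:
                changed[ref_pb] += 1
                targets[ref_pb][pb] += 1
    return {pb: (changed[pb] / observed[pb], targets[pb]) for pb in observed}


# Invented 5-frame trajectory of a 4-residue stretch.
frames = ["mmgd", "mmpd", "mmpd", "mmgd", "mmmd"]
stats = change_statistics(frames)
print(stats["g"])     # (0.75, Counter({'p': 2, 'm': 1}))
print(stats["m"][0])  # 0.0
```

Using a fixed reference frame is only one choice; counting frame-to-frame transitions instead would give a different, equally defensible rate.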

We will present a complete summary of our results and the perspectives of this work.

Keywords: bioinformatics, protein structures, secondary structures, flexibility, deformability, statistics, Protein Blocks

Acknowledgments

This work was supported by grants from the French Ministry of Research, University of Paris Diderot Paris 7, French National Institute for Blood Transfusion (INTS), and French Institute for Health and Medical Research (INSERM). AdB also acknowledges the Indo-French Centre for the Promotion of Advanced Research / CEFIPRA for collaborative grants (number 5302-2). This study was supported by grants from the Laboratory of Excellence GR-Ex, reference ANR-11-LABX-0051. The labex GR-Ex is funded by the program Investissements d'avenir of the French National Research Agency, reference ANR-11-IDEX-0005-02.

The authors were granted access to high performance computing (HPC) resources at the French National Computing Center CINES under grant no. c2013-037147 funded by GENCI (Grand Equipement National de Calcul Intensif).

References

1. Craveur, P., Joseph, A.P., Esque, J., Narwani, T.J., Noël, F., Shinada, N., Goguet, M., Leonard, S., Poulain, P., Bertrand, O., Faure, G., Rebehmed, J., Ghozlane, A., Swapna, L.S., Bhaskara, R.M., Barnoud, J., Téletchéa, S., Jallu, V., Cerny, J., Schneider, B., Etchebest, C., Srinivasan, N., Gelly, J.-C., de Brevern, A.G.: Protein flexibility in the light of structural alphabets, Frontiers in Molecular Biosciences - Structural Biology, 2:20. (2015)

2. de Brevern, A.G., Etchebest, C., Hazout, S.: Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, Proteins, 41:271-287. (2000)

3. Bornot, A., Etchebest, C., de Brevern, A.G.: A new prediction strategy for long local protein structures using an original description, Proteins, 76:570-587. (2009)

4. Joseph, A.P., Agarwal, G., Mahajan, S., Gelly, J.-C., Swapna, L.S., Offmann, B., Cadet, F., Bornot, A., Tyagi, M., Valadi, H., Schneider, B., Etchebest, C., Srinivasan, N., de Brevern, A.G.: A short survey on Protein Blocks, Biophysical Reviews, 2:137-145. (2010)

5. de Brevern, A.G., Bornot, A., Craveur, P., Etchebest, C., Gelly, J.-C.: PredyFlexy: Flexibility and Local Structure prediction from sequence, Nucleic Acids Res, 40:W317-322. (2012)

6. de Brevern, A.G., Wong, H., Tournamille, C., Cartron, J.-P., Colin, Y., Le Van Kim, C., Etchebest, C.: A structural model of seven transmembrane helices receptor, Duffy Antigen/Receptor for Chemokines (DARC), Biochim Biophys Acta, 1724:288-306. (2005)

7. Jallu, V., Poulain, P., Fuchs, P.F., Kaplan, C., de Brevern, A.G.: Modeling and Molecular Dynamics of HPA-1a and -1b Polymorphisms: Effects on the Structure of the β3 Subunit of the αIIbβ3 Integrin, PLoS One, 7:e47304. (2012)

8. Jallu, V., Poulain, P., Fuchs, P.F., Kaplan, C., de Brevern, A.G.: Modeling and molecular dynamics simulations of the V33 variant of the integrin subunit β3: structural comparison with the L33 (HPA-1a) and P33 (HPA-1b) variants, Biochimie, 105:84-90. (2014)

9. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures, J Mol Biol, 247:536-540. (1995)

10. Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics, 29:845-854. (2013)

11. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 22(12):2577-2637. (1983)

12. Poulain, P. and collaborators: A program to explore protein structures with Protein Blocks, https://github.com/pierrepo/PBxplore. (2015)


Elemental metabolomics for improving human health

Ping Zhang1, Constantinos Georgiou2, and Vladimir Brusic3

1 Menzies Health Institute Queensland, Griffith University

2 Department of Chemistry, Agricultural University of Athens

3 School of Medicine and Bioinformatics Center, Nazarbayev University

Abstract

Bulk organic elements (C, H, N, and O) make up 96% of the human body. Other elements essential for life include macroelements (Ca, Cl, K, Mg, Na, P, and S), which make up 3% of the human body, and microelements (Co, Cu, Cr, Fe, I, Mn, Mo, Se, Zn), which are present in lower quantities. A number of elements (Al, Ba, Rb, and Ti) are present in the human body in significant quantities but have no known biological function. Some elements (As, Br, Ni, Si, Sn, Sr, V, B, Cd, Li, Pb) are thought to be necessary for optimal functioning and good health of the organism, since they modulate the function of essential elements. Other elements are present in human bodies at extremely low (ultratrace) concentrations. Some elements (such as As, Be, Cr, Cd, Hg, Pb) are potent toxins if they exceed homeostatic levels or if they are present in the form of toxic compounds, such as hexavalent chromium, Cr(VI). Elements are bioavailable to humans through food, water, environmental and occupational exposure, and medical treatment. They circulate, accumulate, or disperse throughout the environment and the food chain.

The development of mass spectrometry and measurement standards in recent years has enabled us to precisely measure the quantities of more than 70 trace and ultratrace elements and their isotopes in a variety of inorganic and organic samples. Elemental profiling has applications in food science (assessment of food quality and safety), nutrition (healthy diet, deficiencies), medicine (screening, diagnostics, and toxicology), hydrology (safety and health properties of drinking water), geology, ecology, environmental science, forensics, and even anthropology. Elemental profiling has mainly focused on identifying the levels of individual elements in biological samples, but elemental profiles can also be used to classify biological samples. Examples include distinguishing wild from domestic rabbit meat, distinguishing cherry tomatoes that originate from different geographic regions, distinguishing organically from conventionally produced eggs, and identifying profiles characteristic of blood samples from normal, obese, metabolic syndrome, and type 2 diabetic subjects. Elemental profiling can be used to identify the nutritional needs of individuals and to define personalized diets. Changes in elemental profiles can also be monitored to follow the progression of disease, even in pre-clinical stages.


Rather than observing the behavior of one element at a time, elements must be observed in a systemic way, conceptually similar to systems biology, so that the effects of variation in multiple elements can be correlated with observed outcomes, and interventions can be correlated with desired outcomes. Advanced statistical techniques and machine learning methods are needed to advance this field. The combination of databases and advanced algorithms will enable predictive modeling that interprets complex effects based on combinations of elements and their interactions, rather than on the behavior of a single element. Elemental metabolomics is the emerging field that focuses on the study of elemental profiles and their use for advancing applications in health and in the development of living organisms.
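As a rough illustration of multi-element classification (for example, telling apart the sample groups mentioned above), the sketch below standardizes each element across samples and assigns a new sample to the nearest class centroid. Nearest-centroid classification is an assumption chosen for simplicity, not a method described in the abstract, and the training numbers are hypothetical toy values.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each element (column) across samples so that elements present at
    high absolute concentrations do not dominate the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    scaled = [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]
    return scaled, mus, sds


def fit_centroids(scaled_rows, labels):
    """Mean standardized profile of each class."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(scaled_rows, labels) if l == lab]
        centroids[lab] = [mean(c) for c in zip(*members)]
    return centroids


def predict(profile, centroids, mus, sds):
    """Assign a raw multi-element profile to the class with the nearest centroid."""
    z = [(x - m) / s for x, m, s in zip(profile, mus, sds)]
    return min(centroids, key=lambda lab: math.dist(z, centroids[lab]))


# Toy training data (hypothetical values; columns could be e.g. Zn, Cu, Se, Fe).
train = [[90, 10, 1.2, 40], [95, 11, 1.1, 42], [60, 14, 0.6, 30], [58, 15, 0.7, 28]]
labels = ["group_A", "group_A", "group_B", "group_B"]
scaled, mus, sds = standardize(train)
centroids = fit_centroids(scaled, labels)
print(predict([92, 10.5, 1.15, 41], centroids, mus, sds))  # group_A
```

Standardizing first matters because raw concentrations of different elements can differ by orders of magnitude; with real data, a cross-validated model with many more samples per class would be needed.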


Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer

Ralf Bundschuh

The Ohio State University, Department of Physics, Chemistry & Biochemistry, Division ofHematology, USA


Abstract

DNA methylation is an epigenetic mark with direct impact on gene regulation that is known to be aberrant in many cancers. Several techniques allow DNA methylation to be interrogated on a genome-wide basis. One such technique, which offers an especially good tradeoff between diversity of covered genomic regions, resolution, and a cost compatible with large patient cohorts, is the preferential capture of methylated DNA using the MBD2 domain followed by high-throughput sequencing (MethylCap-Seq). However, the readout of this technique is the relative coverage of different genomic regions and is thus quite indirect, requiring computational analysis tools to interpret the data. In this talk I will present our computational approaches to extracting DNA methylation from MethylCap-Seq data, which include quality control, methylation calling on individual CpGs and on predefined genomic regions, global feature methylation summaries, and identification of significantly differentially methylated genomic regions. I will also present a use case of our approach in a large study of patients with acute myeloid leukemia (AML).
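Because MethylCap-Seq reads out relative coverage, one basic building block is summarizing how many captured fragments fall into each predefined region. The sketch below is a generic overlap count with RPKM-style normalization, assuming fragment coordinates on a single chromosome are already known; it does not attempt the per-CpG methylation calling or quality control described in the abstract, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def region_coverage(fragments, regions):
    """Count captured fragments overlapping each predefined region and scale the
    count to fragments per kilobase of region per million fragments (RPKM-style).

    fragments: list of (start, end) half-open intervals on one chromosome.
    regions:   list of (name, start, end) half-open intervals on the same chromosome.
    Returns {name: (raw_count, rpkm)}.
    """
    total = len(fragments)
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    result = {}
    for name, r_start, r_end in regions:
        # A fragment overlaps [r_start, r_end) iff start < r_end and end > r_start.
        # Fragments ending at or before r_start all start before r_end, so the
        # overlap count is (#starts < r_end) - (#ends <= r_start).
        n = bisect_left(starts, r_end) - bisect_right(ends, r_start)
        rpkm = n / ((r_end - r_start) / 1e3) / (total / 1e6)
        result[name] = (n, rpkm)
    return result


frags = [(0, 100), (50, 150), (200, 300), (1000, 1100)]  # toy fragments
print(region_coverage(frags, [("r1", 0, 200), ("r2", 250, 1050)]))
```

Sorting starts and ends once makes each region query two binary searches instead of a scan over all fragments.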


Infomax strategies for an optimal balance of exploration and exploitation

Antonio Celani

The Abdus Salam International Centre for Theoretical Physics, Department of Quantitative Life Sciences, Trieste, Italy


Abstract

Information theory branched out from the science of reliable communication to diverse domains that comprise decision theory and neural and cellular biology. Infomax postulates that acquisition and transmission of information is a general functional principle, which has been applied to the visual system, transduction pathways, evolution, biological adaptation and regulation, training of neural networks, and decision and search processes. While specific applications are successful, it remains generally unclear under what conditions information constitutes a valid functional proxy. Here, we consider the classical multi-armed bandit decision problem, which features arms (slot machines) with unknown probabilities of success and a player trying to maximize cumulative reward by choosing the sequence of arms to play. The model captures the crux of the dilemma between exploitation and exploration, and optimal bounds and strategies are known. We introduce two novel Infomax strategies, Info-id and Info-p, which optimally gather information on the unknown identity of the best arm and on the highest mean reward among the arms, respectively. We investigate the two strategies analytically and numerically and compare their performance to optimal bounds. Strikingly, we find that Info-p performs optimally, whilst Info-id is vastly suboptimal, even though it gathers more information on the identity of the best arm. The cost and value of information are quantified via rate-distortion arguments. The results demonstrate the crucial role of the nature of the information acquired by Infomax, which suggests new general approaches.
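To make the bandit setting concrete, the following sketch simulates Bernoulli arms played by UCB1 (Auer, Cesa-Bianchi and Fischer, 2002), a standard index strategy whose regret grows logarithmically, the same order as the optimal Lai-Robbins bound. It is a swapped-in baseline, not Info-p or Info-id, whose definitions are not given in the abstract; the arm probabilities and number of rounds are arbitrary.

```python
import math
import random


def ucb1(success_probs, rounds, seed=0):
    """Play a Bernoulli multi-armed bandit with the UCB1 index policy.

    Returns (pull counts per arm, cumulative reward).
    """
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k
    wins = [0] * k
    total = 0
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialize the estimates
        else:
            # Exploitation term (empirical mean) plus an exploration bonus that
            # shrinks as an arm accumulates plays.
            arm = max(range(k), key=lambda a: wins[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return pulls, total


pulls, total = ucb1([0.2, 0.5, 0.7], rounds=5000)
print(pulls, total)
```

Most plays concentrate on the best arm, while the suboptimal arms keep receiving occasional exploratory plays; this is the exploitation-exploration tradeoff the abstract refers to.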


Leveraging Electronic Medical Records for Personalized and Population Healthcare

Nitesh V. Chawla

University of Notre Dame, IN

Abstract

Personalized healthcare and precision medicine are introducing novel opportunities to leverage data about an individual to deliver personalized health and wellness profiles to that individual. These data include electronic medical records, genomics, lifestyle, and environmental data. However, there are fundamental challenges at every step, from collecting such data at an individual level, to integrating it, to developing algorithms and tools that deliver the promise and outcome of personalized healthcare. Our research program is focused on these challenges. I will present our research on developing personalized disease risk profiles from electronic medical records (EMR), on leveraging the phenotypes from EMR to guide genetic association discovery among diseases, and, finally, on bringing together the spectrum from EMR to lifestyle data to guide a patient-centered population health management framework.


Adapt or Die. Investigating the molecular basis of cell variability

Andrea Ciliberto

Quantitative Biology of Cell Division Unit, IFOM, Via Adamello 16, Milan

Abstract

Cancer cells are highly proliferative. Drugs that impair cell proliferation (antimitotic drugs) are indeed quite effective in treating cancer, and their mechanism of action is quite well understood. Several antimitotic drugs impair microtubule polymerization and thus do not allow cells to segregate their chromosomes. Cells arrested in their division cycle, however, will not remain arrested forever. Some will die and some will adapt, resuming proliferation. The choice between these two fates is stochastic, with virtually identical cells following different paths. With a combination of mathematical models and live-cell imaging, we have analyzed the phenomenon of adaptation, and we have investigated possible sources of variability that contribute to determining cell fate under constant drug treatment.
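A minimal way to see how identical cells can take different fates is a competing-risks toy model: each arrested cell carries two independent exponential clocks, one for death and one for adaptation, and whichever fires first decides its fate. The rates below are hypothetical and this is not the authors' model; under these assumptions the adapted fraction should approach adapt_rate / (adapt_rate + death_rate) and the mean arrest time 1 / (adapt_rate + death_rate).

```python
import random


def simulate_fates(n_cells, death_rate, adapt_rate, seed=0):
    """Toy competing-risks model of mitotically arrested cells.

    Every cell is identical; only the randomness of the two exponential clocks
    (death, adaptation) distinguishes them. Returns (fraction adapted, mean arrest time).
    """
    rng = random.Random(seed)
    adapted = 0
    total_time = 0.0
    for _ in range(n_cells):
        t_death = rng.expovariate(death_rate)
        t_adapt = rng.expovariate(adapt_rate)
        if t_adapt < t_death:
            adapted += 1  # adaptation happened first: the cell resumes proliferation
        total_time += min(t_death, t_adapt)
    return adapted / n_cells, total_time / n_cells


# Hypothetical rates per hour: expect about 0.375 adapted, mean arrest about 12.5 h.
print(simulate_fates(20000, death_rate=0.05, adapt_rate=0.03))
```

In this toy, variability is purely intrinsic noise; the abstract's point is precisely to investigate which real sources of variability contribute, which a model like this cannot distinguish on its own.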


Ultrametric Approach to Bioinformation Systems

Branko Dragovich

1 Institute of Physics, University of Belgrade, Belgrade

2 Mathematical Institute SANU, Kneza Mihaila 36, Belgrade, Serbia

Abstract

Ultrametricity is related to metric spaces with an ultrametric distance, which is characterized by the strong triangle inequality

$$ d(x, y) \le \max\{d(x, z),\ d(z, y)\} \quad \text{for all } x, y, z. $$

Ultrametrics are appropriate for describing similarity (nearness) between elements of some information sets and, in particular, between information in biological systems.

In this talk I will present basic properties of ultrametric distance, in particular the p-adic one. Then I will show that the set of 64 codons in the genetic code has an ultrametric structure which is suitably described by p-adic distance, where p = 5 and p = 2 [1, 2]; see also a similar approach [3]. Some other properties of the genetic code can also be expressed in terms of p-adic distance. I will also discuss some other examples of ultrametric biological information systems and point out their importance towards the foundation of an ultrametric information theory.
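The 5-adic codon distance can be sketched in a few lines. The digit assignment C=1, A=2, U=3, G=4 and the weighting n = n0 + 5·n1 + 25·n2 (with n0 the first nucleotide) are assumptions made for illustration; with them, codons that differ only in the third position are the closest pairs (distance 1/25), and the strong triangle inequality can be checked exhaustively over all 64 codons.

```python
from fractions import Fraction
from itertools import product

# Illustrative digit assignment (assumption): nucleotides as base-5 digits 1..4.
DIGIT = {"C": 1, "A": 2, "U": 3, "G": 4}


def codon_to_int(codon: str) -> int:
    """Map a codon to n0 + n1*5 + n2*25, with n0 the first nucleotide."""
    n0, n1, n2 = (DIGIT[b] for b in codon)
    return n0 + n1 * 5 + n2 * 25


def p_adic_norm(x: int, p: int) -> Fraction:
    """|x|_p = p**(-v), where p**v is the largest power of p dividing x; |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return Fraction(1, p ** v)


def codon_distance(a: str, b: str, p: int = 5) -> Fraction:
    """p-adic distance between two codons under the encoding above."""
    return p_adic_norm(codon_to_int(a) - codon_to_int(b), p)


codons = ["".join(t) for t in product("CAUG", repeat=3)]
print(codon_distance("UUU", "UUC"))  # 1/25: differ only in the third position
print(codon_distance("UUU", "UCU"))  # 1/5: differ in the second position
print(codon_distance("UUU", "CUU"))  # 1: differ in the first position
```

Because every digit lies in 1..4, two different digits never differ by a multiple of 5, so the distance depends only on the first position at which two codons differ; that is what makes the distance ultrametric.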

Keywords: ultrametrics, p-adic distance, bioinformation, genetic code

References

1. Dragovich, B., Dragovich, A.: A p-Adic Model of DNA Sequence and Genetic Code. p-Adic Numbers, Ultrametric Analysis and Applications, 1 (1), (2009). arXiv:q-bio.GN/0607018.

2. Dragovich, B., Dragovich, A.: p-Adic Modelling of the Genome and the Genetic Code. The Computer Journal, 53(4), 432–442 (2010). arXiv:0707.3043 [q-bio.OT].

3. Khrennikov, A., Kozyrev, S.: Genetic Code on a Diadic Plane. Physica A: Stat. Mech. Appl.,381, 265–272 (2007).


Molecular mechanism of Aβ amyloid formation

Oxana V. Galzitskaya, Olga M. Selivanova, Alexey K. Surin, Victor V. Marchenkov, Ulyana F. Dzhus, Elizaveta I. Grigorashvili, Mariya Yu. Suvorina, Anna V. Glyakina, and Nikita V. Dovidchenko

Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia


Abstract

It has been demonstrated, using recombinant and synthetic Aβ40 and Aβ42 peptides, that their fibrils are formed of complete oligomer ring structures. Such ring structures have a diameter of about 8-9 nm, an oligomer height of about 2-4 nm, and an internal ring diameter of about 3-4 nm. Oligomers associate in a fibril in such a way that they interact with each other, overlapping slightly. There are differences in the packing of oligomers in fibrils of recombinant and synthetic Aβ peptides. The principal difference is in the degree of orderliness of the ring-like oligomers, which leads to the generation of morphologically different fibrils. The most ordered association of ring-like oligomers is observed for the recombinant Aβ40 peptide, and less ordered fibrils are observed for the synthetic Aβ42 peptide. The fibril fragments most protected from the action of proteases have been determined by tandem mass spectrometry. It was shown that, unlike Aβ40 fibrils, Aβ42 fibrils are more protected, showing less ordered organization than Aβ40 fibrils. Thus, the mass spectrometry data agree with the electron microscopy data and the structural models presented in our work.


Epigenetic state and spatial structure of chromatin

Mikhail Gelfand1,2

1 A.A. Kharkevich Institute for Information Transmission Problems

2 Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia

Abstract

Recent advances in large-scale experimental techniques, such as RNA-Seq, ChIP-Seq, Hi-C and others, provide data for integrated analysis of chromatin 3D state, epigenetic markers, and gene expression. Not surprisingly, these turned out to be highly interlinked. Contacting chromatin regions tend to carry similar histone modifications, and gene expression in such regions tends to be correlated. On a finer scale, topologically associating domains (TADs) also seem to depend on histone modifications and transcription. Indeed, TADs are enriched in repressive chromatin markers, whereas inter-TAD regions are enriched in active markers and highly transcribed genes. Moreover, differences in TAD structure between cell lines are accompanied by corresponding differences in transcription. These observations seem to indicate that active gene expression is the driving force behind formation of the TAD structure. Finally, there are preliminary indications that regions forming many distant contacts are also enriched in active markers and actively transcribed genes.
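The TAD versus inter-TAD comparison described above amounts to comparing mark densities in two sets of genomic bins. The sketch below assumes that bins have already been called as marked or unmarked and that TAD boundaries are given as bin intervals; the function name, inputs, and toy example are hypothetical.

```python
import math


def tad_enrichment(marked_bins, tads, n_bins):
    """Density of a chromatin mark inside TADs versus inter-TAD bins.

    marked_bins: set of bin indices (0..n_bins-1) carrying the mark.
    tads: non-overlapping half-open bin intervals (start, end).
    Returns (density_in, density_out, log2(density_in / density_out) or None).
    """
    inside = set()
    for start, end in tads:
        inside.update(range(start, end))
    n_in = len(inside)
    n_out = n_bins - n_in
    marked_in = len(marked_bins & inside)
    marked_out = len(marked_bins) - marked_in
    d_in = marked_in / n_in
    d_out = marked_out / n_out
    # A positive log ratio means the mark is enriched inside TADs.
    ratio = math.log2(d_in / d_out) if d_in > 0 and d_out > 0 else None
    return d_in, d_out, ratio


# Toy genome of 20 bins with two TADs and a hypothetical repressive mark.
print(tad_enrichment({1, 2, 3, 6, 11, 12}, [(0, 5), (10, 15)], 20))
```

With real data, the observed ratio would need to be compared against a null, for example by shuffling TAD positions, before calling a mark enriched.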


Dark states in quantum photosynthesis

Sergei Kozyrev

Steklov Mathematical Institute, Department of Mathematical Physics, Moscow

Abstract

A model of quantum photosynthesis will be discussed. The photosynthetic system in the model is described by a three-level quantum system (which describes excitons) interacting with three different quantum fields, or reservoirs (light, phonons, and the sink field corresponding to absorption of excitons); moreover, one of the levels of the system is degenerate. The degeneracy leads to excitation of the so-called dark states (dark-state polaritons). Since the interactions of the degenerate state of the system with two different reservoirs are different, the spaces of dark states for these fields are also different. This makes it possible to manipulate these dark states, in particular using spectroscopy. We conjecture that this model describes the phenomenon, known from spectroscopic experiments, of quantum coherences observed in photosynthetic systems.
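A toy linear-algebra picture of why different reservoirs see different dark states: if a reservoir couples to a doubly degenerate level through real amplitudes (g1, g2), the orthogonal combination of the two levels is not excited by it. This ignores the full open-system dynamics of the model in the abstract, and the coupling values are hypothetical.

```python
import math


def dark_state(g):
    """Return the normalized combination of the two degenerate levels that a
    reservoir with real coupling amplitudes g = (g1, g2) cannot excite."""
    g1, g2 = g
    norm = math.hypot(g1, g2)
    return (g2 / norm, -g1 / norm)


def coupling(g, state):
    """Transition amplitude <g|state> for real amplitudes; 0 means dark for g."""
    return g[0] * state[0] + g[1] * state[1]


g_light = (1.0, 0.5)    # hypothetical coupling of the degenerate pair to light
g_phonon = (0.3, 1.0)   # hypothetical coupling to the phonon reservoir
d_light = dark_state(g_light)
print(coupling(g_light, d_light))   # ~0: dark for the light field
print(coupling(g_phonon, d_light))  # nonzero: bright for the phonon reservoir
```

Whenever the two coupling vectors are not parallel, the state that is dark for one reservoir is bright for the other, which is the asymmetry the abstract relies on.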


Patient Specific Network Data Integration Enables Precision Medicine in Cancer

Noel Malod-Dognin

Department of Computing, Imperial College London, London, United Kingdom

Abstract

Motivation. We are faced with a flood of molecular and clinical networked data. Recent advances in experimental technologies have resulted in the accumulation of large amounts of patient-specific "omics" and clinical datasets, which provide complementary information on the same disease. The challenge is how to mine these complex data systems to gain new insight into diseases and to improve therapeutics, in particular in the context of highly heterogeneous diseases such as cancer.

Method. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interaction networks and drug chemical data to address three key challenges in cancer research: (1) stratification of patients into groups having different clinical outcomes, (2) prediction of driver genes whose mutations trigger the onset and development of cancers, and (3) repurposing of drugs for treating particular cancer patient groups. Our new framework is based on graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets. We apply our framework to ovarian cancer data to simultaneously cluster patients, genes and drugs by utilising all datasets.
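The core of graph-regularised NMTF can be sketched for a single relation matrix R (for instance, patients by genes) factorized as R ≈ G1 S G2^T with non-negative factors, plus a penalty that pulls rows connected in a network toward similar cluster memberships. The multiplicative updates below are the standard ones for this Frobenius-norm objective; the data fusion framework in the abstract couples several such matrices, which this sketch does not attempt, and the penalty weight lam, the ranks, the function names and the toy data are assumptions.

```python
import random


def mm(A, B):
    """Product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def tr(A):
    """Transpose."""
    return [list(col) for col in zip(*A)]


def add(A, B, c=1.0):
    """A + c * B, elementwise."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mult_update(X, num, den, eps=1e-12):
    """Multiplicative rule X <- X * num / den; stays non-negative if X, num, den are."""
    return [[x * n / (d + eps) for x, n, d in zip(rx, rn, rd)]
            for rx, rn, rd in zip(X, num, den)]


def objective(R, G1, S, G2, A1, lam):
    """||R - G1 S G2^T||_F^2 + lam * tr(G1^T L G1), with L = D - A1."""
    approx = mm(mm(G1, S), tr(G2))
    err = sum((r - a) ** 2 for rr, ra in zip(R, approx) for r, a in zip(rr, ra))
    n = len(G1)
    smooth = 0.5 * sum(A1[i][j] * sum((x - y) ** 2 for x, y in zip(G1[i], G1[j]))
                       for i in range(n) for j in range(n))
    return err + lam * smooth


def nmtf(R, A1, k1, k2, lam=0.1, iters=100, seed=0):
    """Graph-regularized NMTF: R (n x m) ~ G1 (n x k1) S (k1 x k2) G2^T (k2 x m).

    A1 is a symmetric non-negative adjacency matrix over the rows of R.
    Returns G1, S, G2 and the objective value before and after each iteration.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    G1 = [[rng.random() for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    G2 = [[rng.random() for _ in range(k2)] for _ in range(m)]
    D1 = [[float(sum(A1[i])) if i == j else 0.0 for j in range(n)] for i in range(n)]
    history = [objective(R, G1, S, G2, A1, lam)]
    for _ in range(iters):
        S = mult_update(S, mm(mm(tr(G1), R), G2),
                        mm(mm(mm(tr(G1), G1), S), mm(tr(G2), G2)))
        H = mm(S, tr(G2))  # k1 x m, held fixed while updating G1
        # Graph term: +lam*A1*G1 in the numerator, +lam*D1*G1 in the denominator.
        G1 = mult_update(G1, add(mm(R, tr(H)), mm(A1, G1), lam),
                         add(mm(G1, mm(H, tr(H))), mm(D1, G1), lam))
        W = mm(G1, S)  # n x k2, held fixed while updating G2
        G2 = mult_update(G2, mm(tr(R), W), mm(G2, mm(tr(W), W)))
        history.append(objective(R, G1, S, G2, A1, lam))
    return G1, S, G2, history
```

Rows of G1 (and of G2) can then be assigned to clusters by taking the largest entry in each row, which is how co-clustering results are commonly read off such factorizations.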

Results. We demonstrate superior performance of our method over the state-of-the-art method, Network-based Stratification, in identifying three patient subgroups that have significant differences in survival outcomes and that are in good agreement with other clinical data. We also identify potential new driver genes by analysing the gene clusters enriched in known drivers of ovarian cancer progression. We validated the top-scoring genes identified as new drivers through database search and biomedical literature curation. Finally, we identify potential candidate drugs for repurposing that could be used in treatment of the identified patient subgroups by targeting their mutated gene products. We validated a large percentage of our drug-target predictions by using other databases and through literature curation.


Biophysical models of protein evolutionary dynamics

Alexandre Morozov

Rutgers University

Abstract

High-throughput sequencing and other modern molecular biology tools have made it possible to track organismal evolution in unprecedented detail. As a result, we are now closer to understanding the fundamental principles involved in genome-scale evolution of proteins and protein interaction networks. Protein-protein interactions mediate numerous cellular processes, including metabolism, immune response, signaling, replication, and gene regulation. These interactions can rapidly evolve in response to perturbations in the protein physico-chemical environment, such as changes in the concentration or chemical composition of the protein's binding targets. Several recent studies have underscored the pivotal role of folding stability in protein evolutionary dynamics. Here I will focus on how structural coupling between folding and binding gives rise to evolutionary coupling between the traits of folding stability and binding strength. Using evolutionary models inspired by protein biophysics, I will show how these protein traits can emerge as evolutionary spandrels, that is, features that are by-products of selection on some other trait rather than direct targets of adaptation. In particular, proteins can evolve strong binding interactions that have no functional role but merely serve to stabilize the protein if its misfolding is deleterious. Furthermore, such proteins may have divergent fates, evolving to bind or not bind their targets depending on random mutational events. These observations may explain the abundance of apparently non-functional interactions among proteins assayed using high-throughput protein-protein binding screens. For the common class of proteins with both functional binding and deleterious misfolding, evolution appears to be predictable at the level of biophysical traits: adaptive paths are constrained to first gain extra folding stability and then partially lose it as the novel binding function emerges, as frequently observed in protein engineering experiments. Overall, our findings lead to an improved understanding of the evolution of proteins and protein interaction networks in both cellular and in vitro contexts.
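The structural coupling between folding and binding can be illustrated with a textbook linked-equilibrium model: if only the folded state binds a target, binding adds statistical weight to the folded state and raises the folded fraction. This sketch assumes three states (unfolded U, folded F, folded and bound FL) with hypothetical free energies and concentrations; it is not the evolutionary model from the talk.

```python
import math

KT = 0.5925  # approximately RT in kcal/mol at 25 degrees C


def fraction_folded(dG_fold, ligand_conc=0.0, kd=1.0):
    """Equilibrium folded fraction (free + bound) in a U / F / FL model.

    dG_fold = G_F - G_U in kcal/mol (negative means intrinsically stable).
    Only the folded state binds, with dissociation constant kd (same units as
    ligand_conc).
    """
    w_folded = math.exp(-dG_fold / KT)         # Boltzmann weight of F relative to U
    w_bound = w_folded * ligand_conc / kd      # FL weight: folded weight times binding
    return (w_folded + w_bound) / (1.0 + w_folded + w_bound)


# A marginally unstable protein (dG_fold = +1 kcal/mol), without and with a target.
print(fraction_folded(1.0))                          # about 0.16
print(fraction_folded(1.0, ligand_conc=10.0, kd=1.0))  # about 0.67
```

In this picture, tighter binding (smaller kd) can compensate for poor folding stability, which is one way to see how binding could be selected as a stabilizer rather than for a function of its own.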


What is bioinformatics made from: understanding database and software usage through literature mining

Goran Nenadic

1 School of Computer Science, University of Manchester, Institute of Biotechnology & Health eResearch Centre, Manchester, UK

2 Mathematical Institute of SASA, Belgrade, Serbia

Abstract

Computational resources such as databases and software are central to bioinformatics research. They are often described within the biomedical literature, either when introduced to the community or when used as part of the methods. Using text mining to process the entire available literature could help reveal the patterns of database and software usage. Our group has developed a methodology to identify such resource mentions in full-text articles and to construct networks of resources that can indicate their links and relative usage, both over time and within the sub-disciplines of bioinformatics, biology and medicine. For example, the bioinformatics literature shows high variability of new resources as novel resource development takes place, while database and software usage within biology and medicine is more stable and conservative. Half of all mentions refer to only 133 resources (the top 5%), which seem to represent the core of current bioinformatics. In some sub-disciplines, the top 100 resources account for 96% of all mentions in the literature. While such resources could be interpreted as a proxy definition of a particular area, it is apparent that many long-established resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), while others are seeing rapid growth (e.g., GO, R). We will illustrate the changes in the bioinformatics "resourceome" by looking into specific journals and examining the 'long tail' of resources that are infrequently mentioned.
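The "half of all mentions" statistic corresponds to finding the smallest set of most-mentioned resources that together cover a given share of mentions. A sketch, assuming resource mentions have already been extracted and normalized to canonical names; the function name and the toy mention list are hypothetical.

```python
from collections import Counter


def core_resources(mentions, share=0.5):
    """Smallest set of most-mentioned resources that together account for at
    least `share` of all mentions.

    Returns (core resource names, core size as a fraction of distinct resources).
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    covered = 0
    core = []
    for name, c in counts.most_common():
        if covered >= share * total:
            break
        core.append(name)
        covered += c
    return core, len(core) / len(counts)


# Hypothetical mention counts for five resources.
mentions = ["res_A"] * 50 + ["res_B"] * 30 + ["res_C"] * 10 + ["res_D"] * 5 + ["res_E"] * 5
print(core_resources(mentions))  # (['res_A'], 0.2)
```

Greedily taking resources in decreasing order of mentions is optimal here, since no other set of the same size can cover more mentions.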


A Quantum Approach to the DNA Functioning

Argyris Nicolaidis

Aristotle University of Thessaloniki, Theoretical Physics Department

Abstract

We put forward the notion that DNA is an information processing system, receiving, registering and transferring information. In the pursuit of an inherent logic in DNA functioning, we explore the possibility that quantum logic might serve this purpose. We use the quantum formalism to describe the DNA dynamics, and as a byproduct we obtain the DNA vacuum. The DNA vacuum, in clear analogy to the quantum vacuum, is a collection of virtual DNA bases. An essential aspect of DNA functioning is the complementarity relation R, which binds the pairs A-T and G-C and generates the replication process. Further, in an effort to codify DNA, we introduce a numbering that assigns a specific natural number to each individual DNA strand. This numbering allows a quantitative measure of the difference among the various DNA strands. Considering also that the four DNA bases constitute an "alphabet", we may undertake the task of examining whether DNA is a "language".
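One concrete way to assign a distinct natural number to every strand, together with the complementarity relation R, is bijective base-4 numbering. The digit assignment and function names are illustrative assumptions and not necessarily the authors' scheme; the complement is taken base by base, ignoring strand orientation.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DIGIT = {"A": 1, "C": 2, "G": 3, "T": 4}  # illustrative choice of digits 1..4
LETTER = {v: k for k, v in DIGIT.items()}


def complement(strand):
    """Apply the pairing relation A-T, G-C base by base."""
    return "".join(COMPLEMENT[b] for b in strand)


def strand_number(strand):
    """Bijective base-4 numbering: every finite strand gets a distinct natural
    number (the empty strand maps to 0), and every natural number decodes back."""
    n = 0
    for b in strand:
        n = 4 * n + DIGIT[b]
    return n


def number_to_strand(n):
    """Inverse of strand_number."""
    out = []
    while n > 0:
        d = n % 4 or 4  # bijective digits run 1..4, not 0..3
        out.append(LETTER[d])
        n = (n - d) // 4
    return "".join(reversed(out))


print(strand_number("AA"), number_to_strand(5))  # 5 AA
print(complement("ATGC"))                        # TACG
```

Using digits 1..4 instead of 0..3 is what keeps strands of different lengths (for example "A" and "AA") from colliding, so the map is a true one-to-one numbering of all strands.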


Effectiveness of Multiple Blood Cleansing Interventions in Sepsis

Zoran Obradovic

Laura H. Carnell Professor of Data Analytics, Data Analytics and Biomedical Informatics Center, Computer and Information Sciences Department, Statistics Department, Temple University, PA

Abstract

Sepsis is a serious, life-threatening condition that presents a growing problem in medicine, but there is still no satisfying solution for treating it. Several blood cleansing approaches have recently gained attention as promising interventions that target the main site of problem development: the blood. The focus of this study is an evaluation of the theoretical effectiveness of hemoadsorption therapy and pathogen reduction therapy. This is evaluated using a mathematical model of murine sepsis, and the results of over 2,200 configurations of single and multiple intervention therapies simulated on 5,000 virtual subjects suggest the advantage of pathogen reduction over hemoadsorption therapy. However, a combination of the two approaches is found to take advantage of their complementary effects and to outperform either therapy alone. The conducted computational experiments provide unprecedented evidence that the combination of the two therapies synergistically enhances the positive effects beyond the simple superposition of the benefits of the two approaches. Such a characteristic could have a profound influence on the way sepsis treatment is conducted.

Results reported in this talk were published in the April 21 issue of Scientific Reports by Nature Publishing Group and were obtained in collaboration with Ivan Stojkovic, Mohamed Ghalwash and Xi Hang Cao. The study is funded by the DARPA Dialysis-Like Therapeutics program.


Comparative analysis of plant genome structure and antisense transcripts

Salwa E.S. Mohamed1, Oxana B. Dobrovolskaya2,3, Vladimir N. Babenko2, Khaled Salem1, Ming Chen4, and Yuriy L. Orlov2,3

1 Genetic Engineering and Biotechnology Research Institute, Sadat City, Egypt

2 Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia

3 Novosibirsk State University, Novosibirsk

4 Zhejiang University, Hangzhou, China

Abstract

Analysis of next-generation sequencing data on plant genomes in specialized databases is a challenging problem of computational genomics. Pairs of RNA molecules transcribed from partially or entirely complementary loci are called cis-natural antisense transcripts (cis-NATs), and they play key roles in the regulation of gene expression in many organisms, including plants. A promising experimental tool for profiling sense and antisense transcription is strand-specific RNA sequencing. Earlier, identification of the chromatin signature of cis-NATs in Arabidopsis indicated a connection between cis-NAT transcription and chromatin modification in plants. An analysis of small-RNA sequencing data showed that 4% of cis-NAT pairs produce putative cis-NAT-induced siRNAs. To address the statistical analysis of plant genome sequencing data, we developed a set of computer programs to define antisense transcripts and miRNA genes based on available sequencing data. Text complexity, as a measure of context dependencies, was applied to nucleotide sequences containing antisense transcripts in plants, as we previously did for monomer analysis. We searched for homologous regulatory regions in model plant genomes. We have analyzed data from PlantNATsDB (Plant Natural Antisense Transcripts DataBase), a platform for annotating and discovering NATs by integrating various data sources (Chen et al., 2012). It covers about 70 plant species. The database provides an integrative, interactive and information-rich graphical web interface to display multidimensional data and to facilitate research and the discovery of functional NATs. Available information on the transcription factors for each species was retrieved from the Plant Transcription Factor Database. The phenomena of antisense transcription and miRNA interference need further annotation in newly sequenced genomes. We have compared gene structure in natural cis-antisense transcripts for wheat and related plant genomes, taking into account genes responsible for stress tolerance.
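Candidate cis-NAT pairs can be found from a gene annotation by looking for loci on opposite strands of the same chromosome whose intervals overlap. This is a simplified sketch, assuming half-open coordinates and hypothetical gene records; it does not reproduce the authors' programs or the PlantNATsDB pipeline.

```python
def cis_nat_pairs(genes):
    """Find candidate cis-NAT pairs: genes on opposite strands of the same
    chromosome whose intervals overlap.

    genes: list of (name, chrom, start, end, strand) with half-open [start, end).
    Returns (name_plus, name_minus, overlap_length) tuples.
    """
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g[1], []).append(g)
    pairs = []
    for chrom_genes in by_chrom.values():
        plus = [g for g in chrom_genes if g[4] == "+"]
        minus = [g for g in chrom_genes if g[4] == "-"]
        for p in plus:
            for q in minus:
                # Overlap length of two half-open intervals; positive means they overlap.
                overlap = min(p[3], q[3]) - max(p[2], q[2])
                if overlap > 0:
                    pairs.append((p[0], q[0], overlap))
    return pairs


genes = [  # hypothetical annotation records
    ("g1", "chr1", 100, 500, "+"),
    ("g2", "chr1", 400, 900, "-"),
    ("g3", "chr1", 1000, 1200, "-"),
    ("g4", "chr2", 100, 500, "-"),
]
print(cis_nat_pairs(genes))  # [('g1', 'g2', 100)]
```

The pairwise loop is quadratic per chromosome, which is acceptable for a sketch; at genome scale a sort-and-sweep over interval endpoints would be the usual choice.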

Keywords: plant genomes, transcription, sequencing, databases, wheat, genomics

Acknowledgements

The work is supported in part by RFBR (15-04-05371; 16-54-53064).


Network Data Integration Enables Precision Medicine

Natasa Przulj

Computer Science Department, University College London, United Kingdom

Abstract

We are faced with a flood of molecular and clinical data. Various biomolecules interact in a cell to perform biological functions, forming large, complex systems. Large amounts of patient-specific datasets are available, providing complementary information on the same disease type. The challenge is how to mine these complex data systems to answer fundamental questions, gain new insight into diseases and improve therapeutics. Just as computational approaches for analyzing genetic sequence data have revolutionized biological understanding, the expectation is that analyses of networked "omics" and clinical data will have similar ground-breaking impacts. However, dealing with these data is nontrivial, since many questions we ask about them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions.

We develop methods for extracting new biomedical knowledge from the wiring patterns of large networked biomedical data, linking network wiring patterns with function and translating the information hidden in the wiring patterns into everyday language. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and re-purposing of drugs for treating particular cancer patient groups. Our new methods stem from network science approaches coupled with graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets.
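Network wiring patterns are often summarized per node by counting graphlet orbits. The sketch computes the four orbits of 2- and 3-node graphlets (degree, end or middle of an induced path, triangle membership), which are the first entries of a graphlet degree vector. Full graphlet degree vectors usually go up to 5-node graphlets (73 orbits), and whether the talk relies on exactly these statistics is an assumption; the function name and example graph are hypothetical.

```python
from itertools import combinations


def small_orbit_counts(edges):
    """Per-node counts for the 2- and 3-node graphlet orbits:
    orbit 0: degree; orbit 1: end of an induced 3-node path; orbit 2: middle of
    an induced 3-node path; orbit 3: member of a triangle.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Neighbor pairs that are not themselves connected: v is a path middle.
        middle = deg * (deg - 1) // 2 - tri
        # Walks v-u-w (w != v) minus those closing a triangle (each counted twice).
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        counts[v] = (deg, end, middle, tri)
    return counts


# A triangle a-b-c with a pendant node d attached to c.
print(small_orbit_counts([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```

Comparing such vectors between nodes gives a similarity of local wiring that does not depend on the nodes being adjacent, which is why orbit counts are a natural bridge between network topology and function.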


Intrinsically disordered protein families

Marco Punta

Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom


Abstract

Intrinsically disordered regions have been reported to be, on average, less conserved in sequence than the structured regions of the proteome. As a consequence, much effort has focused on identifying short (often fewer than 10 residues) conserved linear motifs that carry diverse functional traits, such as post-translational modification sites and fold-upon-binding interaction regions [1]. Such short linear motifs can arise independently in non-homologous sequences. At the same time, a number of longer, evolutionarily related, conserved disordered regions are known and have been referred to as 'disordered domains' [2]. Some of these have already been integrated into protein family databases such as Pfam. In this work, we annotate a set of as yet unclassified long homologous intrinsically disordered regions (disordered families) within the UniProtKB database. We generate multiple sequence alignments for each family and look for evidence of their functions in the literature.

References

1. Dinkel et al. Nucleic Acids Res. 44:D294-300 (2016)

2. Tompa et al. Bioessays. 31:328-35 (2009)


The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense

Konstantin Severinov1,2, Iaroslav Ispolatov3, and Ekaterina Semenova1

1 Waksman Institute of Microbiology, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA

2 Skolkovo Institute of Science and Technology, Skolkovo 143025, Russia

3 Department of Physics, University of Santiago de Chile, Santiago, Chile

Abstract

Prokaryotic type I CRISPR-Cas systems respond to the presence of mobile genetic elements such as plasmids and phages in two different ways. CRISPR interference efficiently destroys foreign DNA harbouring protospacers that fully match CRISPR RNA spacers. In contrast, even a single mismatch between a spacer and a protospacer can render CRISPR interference ineffective but causes primed adaptation: efficient and specific acquisition of additional spacers from foreign DNA into the CRISPR array of the host. It has been proposed that the interference and primed adaptation pathways are mediated by structurally different complexes formed by the effector Cascade complex on matching and mismatched protospacers. We will review experimental evidence and present a simple mathematical model showing that, when plasmid copy-number maintenance or phage genome replication is taken into account, the two apparently different outcomes of the CRISPR-Cas response can be accounted for by just one kind of effector complex on both targets. The results underscore the importance of taking the biology of the targeted genome into account when assessing the consequences of CRISPR-Cas system action.
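The model itself is not given in the abstract. As a purely hypothetical toy (not the authors' model), the qualitative argument can be illustrated with a single effector activity that destroys each plasmid copy at rate k, while copy-number control replicates copies at rate r toward a target number N: dn/dt = r n (1 - n/N) - k n. The same mechanism then yields two outcomes depending only on k relative to r: clearance when k >= r, and persistence at a reduced copy number N(1 - k/r) when k < r, which leaves the target available, for example for further spacer acquisition. All parameter values are illustrative.

```python
def simulate_plasmid(r, k, N=20.0, n0=20.0, dt=0.01, t_end=200.0):
    """Euler integration of dn/dt = r*n*(1 - n/N) - k*n (hypothetical toy).

    n : plasmid copies per cell, treated as a continuous quantity
    r : replication rate under copy-number control toward target N
    k : per-copy destruction rate by the CRISPR effector complex
    """
    n = n0
    for _ in range(int(t_end / dt)):
        n = max(n + dt * (r * n * (1.0 - n / N) - k * n), 0.0)
    return n


def steady_state(r, k, N=20.0):
    """Stable fixed point of the toy model."""
    return N * (1.0 - k / r) if k < r else 0.0


for k in (0.3, 2.0):  # destruction slower vs faster than replication (r = 1)
    print(k, round(simulate_plasmid(1.0, k), 3), steady_state(1.0, k))
```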


Symmetry and minimum principle: a basis for the genetic code?

Paul Sorba

Laboratory of Theoretical Physics and CNRS, Annecy, France

Abstract

The importance of the notion of symmetry in physics is well established: could it also be the case for the genetic code? In this spirit, a model for the genetic code based on continuous symmetries, entitled the "Crystal Basis Model", was proposed a few years ago and applied to different problems, such as the elaboration and verification of sum rules for codon usage probabilities, relations between physico-chemical properties of amino acids, and some predictions [1]. Defining in this context a "bio-spin" structure for nucleotides and codons, the interaction between a codon-anticodon pair can simply be represented by a (bio) spin-spin potential. Then, by imposing the minimum energy principle, the evolution of the genetic code can be analyzed, in good agreement with the generally accepted scheme. A more precise study of this interaction model provides information on codon bias, consistent with data [2].

This work was done in collaboration with A. Sciarrino, Università di Napoli, Italy.

References

1. See, for example, L. Frappat, A. Sciarrino and P. Sorba, J. Biol. Phys. 27, 1-34 (2001); ibid. 28, 17-26 (2002); Phys. Lett. A 311, 264-269 (2003).

2. A. Sciarrino and P. Sorba, BioSystems 107, 113-117 (2012); ibid. 111, 175-180 (2013); ibid. 141, 20-30 (2016).


Networks of Co-expression Modules

Aleksandar Stojmirovic

Janssen R & D, LLC, Systems Pharmacology & Biomarkers, Immunology TA, Pennsylvania, United States of America


Abstract

We present an approach to reduce the complexity of human tissue transcriptomic datasets by constructing networks of co-expression modules. We illustrate the proposed method using public datasets: three large datasets generated from liver, omental adipose and subcutaneous adipose samples collected from morbidly obese subjects, and a dataset of terminal ileum biopsies taken from pediatric subjects. By providing scaffolds for projecting data from other sources, module networks facilitate integrative analyses and provide insight into the biological functions and cell compositions of the profiled tissues.
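The abstract does not specify how modules or inter-module links are defined. A deliberately simple sketch, assuming modules are connected components of a gene graph whose edges join profiles with Pearson r at or above a threshold, and that two modules are linked when their mean z-scored profiles correlate strongly; established pipelines typically use more elaborate clustering, so the thresholds and function names here are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(x):
    m, s = mean(x), pstdev(x)
    return [(v - m) / s if s > 0 else 0.0 for v in x]


def pearson(x, y):
    return sum(a * b for a, b in zip(zscore(x), zscore(y))) / len(x)


def coexpression_modules(expr, threshold=0.8):
    """Union genes whose profiles correlate with r >= threshold; return the groups."""
    parent = {g: g for g in expr}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(expr, 2):
        if pearson(expr[a], expr[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for g in expr:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


def module_network(expr, modules, edge_threshold=0.5):
    """Connect modules whose mean z-scored profiles correlate with |r| >= edge_threshold."""
    profiles = [[mean(col) for col in zip(*(zscore(expr[g]) for g in mod))]
                for mod in modules]
    edges = []
    for i, j in combinations(range(len(modules)), 2):
        r = pearson(profiles[i], profiles[j])
        if abs(r) >= edge_threshold:
            edges.append((i, j, round(r, 3)))
    return edges
```

Using the signed correlation for module membership keeps anti-correlated genes in separate modules, so that averaging their z-scores does not cancel the signal; anti-correlation can still appear as a negative edge between modules.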


Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications

Bosiljka Tadic and Miroslav Andjelkovic

Jozef Stefan Institute, Department of Theoretical Physics, Ljubljana, Slovenia

Abstract

In recent years, mapping brain imaging data onto brain networks and analyzing them objectively with graph theory methods has provided a new framework for better understanding functional brain connections, e.g., those related to information processing, cognitive control, the perception of space, time, numbers and language, or the presence of disease [1–3]. On the other hand, the architecture of brain connections underlying human social behavior remains largely unexplored. Here, we use the algebraic topology of graphs to analyze higher-order structures occurring in functional brain networks during spoken communication. In particular, we consider the correlations among sets of EEG signals recorded during speaker-listener communication [4, 5]. The analysis reveals the organization of the active areas in the speakers' and listeners' brains as well as the composition of the cross-brain correlations. The higher-order structures are recognized by the presence of simplexes (cliques of potentially high dimension, i.e., topological level) and their complexes. The structural complexity of these brain networks is quantified by the number of simplexes and shared faces at each topological level and by the entropy associated with the node population at each level. We show how these topology measures shift with the quality of the speaker-listener communication, which depends on the communicated content.
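As an illustration of the simplex bookkeeping described above, a minimal sketch under assumptions of our own: the simplexes are taken to be the maximal cliques of an already thresholded correlation graph, found with the Bron-Kerbosch algorithm with pivoting; a clique with s nodes sits at topological level q = s - 1; and the "node population" entropy at level q is taken to be the Shannon entropy of how often each node appears in the level-q maximal cliques. Shared-face counts and the authors' exact measures are not reproduced.

```python
from collections import Counter, defaultdict
from math import log


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(frozenset(R))
            return
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques


def simplex_counts(cliques):
    """Number of maximal simplexes at each topological level q = size - 1."""
    return dict(sorted(Counter(len(c) - 1 for c in cliques).items()))


def participation_entropy(cliques):
    """Shannon entropy of node participation in the maximal simplexes of each level."""
    by_level = defaultdict(Counter)
    for c in cliques:
        for node in c:
            by_level[len(c) - 1][node] += 1
    entropy = {}
    for q, counts in sorted(by_level.items()):
        total = sum(counts.values())
        entropy[q] = -sum((n / total) * log(n / total) for n in counts.values())
    return entropy
```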

Keywords: functional brain networks, EEG data, algebraic topology of graphs

References

1. Sporns, O.: Structure and function of complex brain networks. Dialogues Clin. Neurosci., Vol. 15, No. 3, 247-262 (2013)

2. De Vico Fallani, F., Richiardi, J., Chavez, M., Achard, S.: Graph analysis of functional brain networks: practical issues in translational neuroscience. arXiv:1406.7931

3. Zeng, L.-L. et al.: Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis. Brain, Vol.? (2012)

4. Kuhlen, A.K. et al.: Content-specific coordination of listeners' to speakers' EEG during communication. Frontiers Human Neurosci., Vol. 6, 266 (2012)

5. Andjelkovic, M. et al.: Towards understanding the social impact to functional brain connections and the formation of super-structures. Preprint (2016)


The role of structural disorder in protein degradation

Peter Tompa

VIB Structural Biology Research Center, Vrije Universiteit Brussel, Belgium

Abstract

Intrinsically (structurally) disordered proteins (IDPs) are prevalent in the proteome and often function by partner recognition and induced folding. Frequently, their recognition elements consist of a short sequence of residues, termed Eukaryotic Linear Motifs (ELMs). ELMs represent an underappreciated functional element of the proteome, because their study lags far behind that of domains. Here I will give an overview of the motif field, including an assessment of the total number of motifs in the human proteome [1], followed by an analysis of the role of motifs in protein degradation. Protein turnover is regulated by specific signals (degrons), which we suggest have a "tripartite" nature [2, 3]. Tripartite degrons comprise: (1) a primary degron that specifies substrate recognition by cognate E3 ubiquitin ligases, (2) secondary site(s) comprising a single, or multiple neighboring, poly-ubiquitinated lysine(s), and (3) a segment that initiates substrate unfolding at the 26S proteasome. By collecting and analyzing all relevant cases (124 instances of 18 degron types), we show that primary and secondary degrons are short motifs that tend to fall into locally disordered regions, whereas the tertiary degron is a disordered segment in the vicinity of the secondary one that is responsible for effective proteasomal engagement. The importance of degron motifs in disordered regions is shown by the high incidence of their disease-causing mutations and by their involvement in protein degradation mediated by mono-ubiquitination [4].
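The tripartite architecture suggests a simple computational filter. The sketch below is hypothetical and is not the method of [2, 3]: it scans a sequence for a primary motif given as a regular expression (a D-box-like RxxL pattern is used only as a placeholder), requires the motif to lie in a region predicted as disordered, looks for a nearby lysine that is also disordered, and finally demands a long disordered stretch around that lysine as a stand-in for the unfolding-initiation segment. Disorder scores are assumed to come from any per-residue predictor, and every threshold is an illustrative default.

```python
import re


def mean_score(scores, start, end):
    window = scores[start:end]
    return sum(window) / len(window) if window else 0.0


def longest_run(scores, lo, hi, cutoff):
    """Longest stretch of consecutive residues with score >= cutoff in [lo, hi)."""
    best = run = 0
    for s in scores[max(lo, 0):min(hi, len(scores))]:
        run = run + 1 if s >= cutoff else 0
        best = max(best, run)
    return best


def tripartite_candidates(seq, disorder, motif=r"R..L", max_lys_dist=20,
                          min_run=20, cutoff=0.5):
    """Return (motif_start, motif_text, lysine_index) triples.

    disorder: per-residue disorder scores in [0, 1] from any predictor.
    All thresholds are illustrative defaults, not published values.
    """
    hits = []
    for m in re.finditer(f"(?=({motif}))", seq):  # lookahead allows overlaps
        start, text = m.start(), m.group(1)
        # (1) primary site: the motif must lie in a locally disordered region
        if mean_score(disorder, start, start + len(text)) < cutoff:
            continue
        for k in range(max(0, start - max_lys_dist),
                       min(len(seq), start + max_lys_dist + 1)):
            # (2) secondary site: a nearby lysine that is itself disordered
            if seq[k] != "K" or disorder[k] < cutoff:
                continue
            # (3) tertiary site: a long disordered stretch around that lysine
            if longest_run(disorder, k - max_lys_dist,
                           k + max_lys_dist + 1, cutoff) >= min_run:
                hits.append((start, text, k))
                break
    return hits
```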

References

1. Tompa, P., Davey, N. E., Gibson, T. J., and Babu, M. M. (2014) A million peptide motifs for the molecular biologist. Mol. Cell 55: 161-169.

2. Guharoy, M., Bhowmick, P., Sallam, M. and Tompa, P. (2016) Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system. Nature Comm. 7: 10239.

3. Guharoy, M., Bhowmick, P. and Tompa, P. (2016) Design principles involving protein disorder facilitate specific substrate selection and degradation by the ubiquitin-proteasome system. J. Biol. Chem. [Epub]

4. Braten, O. et al. (2016) Numerous proteins with unique characteristics are degraded by the 26S proteasome following monoubiquitination. Cell [submitted]


Non-globular proteins: Towards an understanding of the "dark matter" in the protein universe

Silvio C.E. Tosatto

Dept. of Biomedical Sciences, University of Padova, Italy

Abstract

Non-globular proteins (NGPs) encompass different molecular phenomena that defy the traditional sequence-structure-function paradigm. NGPs include intrinsically disordered regions, tandem repeats, aggregating domains, low-complexity sequences and transmembrane domains. Although growing evidence suggests that NGPs are central to many human diseases, their functional annotation is very limited. It was recently estimated that close to 40% of all residues in the human proteome lack functional annotation, and many of these belong to NGPs.

Several computational developments in the field of NGPs will be discussed. The MobiDB (Potenza et al., NAR database issue 2015; http://mobidb.bio.unipd.it/) and RepeatsDB (Di Domenico et al., NAR database issue 2014; http://repeatsdb.bio.unipd.it/) databases have recently been established to annotate intrinsically disordered and structurally repeated proteins, respectively. Both can be easily accessed through web services. A large-scale analysis of intrinsic disorder data has shown interesting differences both among predictors and among experimental sources. Preliminary data on tandem repeat structures also help explain their lack of annotation in protein domain databases. Last but not least, a newly established European research network focusing on NGPs aims to bring light into this (dark) corner of the protein universe.


Intrinsically disordered proteins in salted water and in the thick soup

Vladimir N. Uversky1,2,3

1 Department of Molecular Medicine, University of South Florida, Tampa, USA

2 Department of Biology, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia

3 Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation

Abstract

Intrinsically disordered proteins (IDPs) lack stable tertiary and/or secondary structure under physiological conditions in vitro. Computational studies have revealed that they are highly abundant in nature: 25-30% of eukaryotic proteins are mostly disordered, and more than 50% of eukaryotic proteins and more than 70% of signaling proteins have long disordered regions. These IDPs are often involved in regulation, signaling and control pathways, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role. It has been suggested that the functions of IDPs may arise from a specific disordered form, from the inter-conversion of disordered forms, or from transitions between disordered and ordered conformations. The choice between these conformations is determined by the peculiarities of the protein environment, and many IDPs possess an exceptional ability to fold in a template-dependent manner. IDPs are highly abundant among hub proteins. They are associated with alternative splicing; this association helps proteins avoid folding difficulties and provides a novel mechanism for developing tissue-specific protein interaction networks. Numerous IDPs are associated with devastating maladies such as cancer, cardiovascular disease, amyloidoses, neurodegenerative diseases and diabetes. Novel strategies for drug discovery are based on these proteins. The vast majority of in vitro experiments with IDPs have traditionally been performed under the relatively ideal thermodynamic conditions of low protein and moderate salt concentrations. However, the concentration of macromolecules, including proteins, nucleic acids and carbohydrates, within a cell can be as high as 400 g/L, creating a crowded medium with considerably restricted amounts of free water. The volume occupied by the macromolecular co-solutes is unavailable to other molecules, giving rise to the so-called excluded volume effects. Although excluded volume is believed to affect the behavior of biological macromolecules and protein-protein interactions, the accumulated data support the notion that many IDPs preserve their mostly disordered state in a crowded environment.
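To make the crowding figure concrete: assuming a partial specific volume of about 0.73 mL/g (a typical value for proteins, applied here to all macromolecules as a simplification), 400 g/L of macromolecules physically occupies about 400 x 0.73 = 292 mL per litre, i.e. roughly 29% of the volume. The volume excluded to another macromolecule is larger still, because molecular centres cannot approach closer than the sum of their radii. The function name below is illustrative.

```python
def occupied_fraction(conc_g_per_l, specific_volume_ml_per_g=0.73):
    """Fraction of a litre physically occupied by macromolecules.

    0.73 mL/g is a typical protein partial specific volume, assumed here for
    all macromolecules; nucleic acids and carbohydrates differ.
    """
    return conc_g_per_l * specific_volume_ml_per_g / 1000.0  # 1 L = 1000 mL


phi = occupied_fraction(400.0)
print(f"occupied: {phi:.0%}, remaining for other molecules: {1 - phi:.0%}")
```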

Acknowledgment: This work was supported in part by a grant from the Russian Science Foundation, RSCF No. 14-24-00131.


Transcription factor interaction inference based on sequence feature representations

Nevena Veljkovic

Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University ofBelgrade, Mihaila Petrovica Alasa 14, 11001, Belgrade, Serbia


Abstract

Being central to most biological processes, protein-protein interactions (PPIs) represent an important class of targets for human therapeutics. Transcriptional regulation, which occurs mostly via PPIs and which is often deregulated in cancer and complex diseases, is at the forefront of this type of drug discovery. Understanding how biomolecules recognize each other complements information on protein binding in a way that brings us necessary insights into how both high affinity and high specificity are achieved. Long-range intermolecular interactions play an important role in recognition between protein macromolecules, and between drugs and their therapeutic targets. In my talk, I will describe computational methods for PPI prediction that consider long-range interaction properties of the sequence. Predictors based on the spectral representation of a sequence and on pseudo amino acid composition that efficiently decipher PPIs involved in transcriptional regulation will be presented.
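Pseudo amino acid composition augments the 20 residue frequencies with sequence-order correlation factors. A minimal sketch of the commonly used type-1 form, assuming a single physicochemical property (Kyte-Doolittle hydropathy, standardized over the 20 residues) where the original formulation averages several properties; the weight `w`, the order `lam` and the function names are illustrative choices, and this is not the speaker's predictor. Sequences are assumed to contain only the 20 standard residues.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here as a stand-in physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Shift and scale a property to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    m, s = mean(values), pstdev(values)
    return {aa: (v - m) / s for aa, v in scale.items()}


def pseaac(seq, lam=3, w=0.05, scale=KYTE_DOOLITTLE):
    """20 composition terms followed by lam sequence-order correlation terms."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    thetas = []
    for j in range(1, lam + 1):
        # average squared property difference between residues j positions apart
        diffs = [(h[seq[i + j]] - h[seq[i]]) ** 2 for i in range(len(seq) - j)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The resulting 20 + lam vector sums to 1 and can be fed, per protein or concatenated per protein pair, to any classifier.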


DNA polymorphism as a tool for genetic information implementation

Sergey N. Volkov

Bogolyubov Institute for Theoretical Physics, NAS of Ukraine, Kiev, 03680 Ukraine

Abstract

The accuracy of genetic information implementation in living cells is largely due to the peculiarities of the structure and variability of the DNA double helix. The regulation of genetic activity, the stability and security of genetic texts, and the reading and translation of genetic information all take place because of the unique properties of the DNA double helix, which distinguish it from other cellular molecules. One of the key properties of the DNA molecule is the polymorphism of the double helix, through which the molecule is able to change its secondary structure under the influence of external factors or depending on the nucleotide sequence. The localized deformations that arise in this case provide a broad palette of tools for the processes of genetic information realization. Moreover, the restructuring of the double helix under threshold deformation of DNA allows genetic texts to be preserved and protected from possible emergencies in the cell.

The role of localized and threshold deformations caused by the polymorphic properties of the double helix has recently been under thorough investigation [1, 2]. These deformations involve sufficiently large deviations of structural elements from their equilibrium positions in the double helix and therefore cannot be understood in terms of the elastic rod model, which is valid for studying DNA mechanics in the harmonic approximation. On the other hand, all-atom modeling often cannot explain the mechanism of complex DNA deformation processes.
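Why a harmonic (elastic rod) description misses threshold behaviour can be seen in a one-dimensional toy, which is our assumption and not the two-component model of the report: an internal coordinate chi with a double-well energy V(chi) = (chi^2 - 1)^2 is pulled by a force F. Starting in the left well, chi stays there until F exceeds F_c = 8/(3√3) ≈ 1.54 and then jumps to the right well, whereas the harmonic approximation around chi = -1 predicts a smooth linear response that never jumps. All names and parameters are illustrative.

```python
def dV(chi):
    """Derivative of the hypothetical double-well energy V(chi) = (chi**2 - 1)**2."""
    return 4.0 * chi * (chi * chi - 1.0)


def relax(force, chi=-1.0, step=0.01, iters=20000):
    """Gradient descent on E(chi) = V(chi) - force * chi, starting in the left well."""
    for _ in range(iters):
        chi -= step * (dV(chi) - force)
    return chi


def harmonic_response(force):
    """Same system in the harmonic approximation around chi = -1 (V'' = 8)."""
    return -1.0 + force / 8.0


# Local maximum of dV on the left branch, reached at chi = -1/sqrt(3).
F_CRIT = 8.0 / (3.0 * 3.0 ** 0.5)
```

Below F_CRIT both descriptions keep chi in the left well; above it, `relax` ends near the right-hand minimum while `harmonic_response` still returns a value close to -1.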

In this report, an approach for treating conformation-dependent deformations of the DNA macromolecule is presented. The transformation of DNA structure is considered in the framework of a two-component model. One model component (external) describes the macromolecule deformation as in the elastic rod model; the other component (internal) describes the conformational changes of the macromolecule's monomer units. Both components are considered as interconnected along the paths of a certain conformational transformation [3, 4].

The developed approach makes it possible to study the physical mechanisms of localized restructuring of the double helix caused by the action of small molecules, regulatory proteins and external forces on DNA structure. The obtained results give a consistent interpretation of the observed effects of the deformability of the TATA-box and A-tracts, of allosteric proximal and distinct effects, and also of the threshold character of DNA unzipping and overstretching. The approach makes it possible to predict the sizes and energies of local deformations of the double helix at the location of definite nucleotide sequences from their conformational states.

Theoretical study of threshold deformations in DNA unzipping and subsequent molecular dynamics simulations [5] confirm the view that double-stranded DNA in solution is a highly organized system with definite degrees of protection of its structure against extreme situations. The results obtained could also be useful for the development of modern technologies in the field of molecular medicine, as well as DNA-based engineering.

References

1. Ch. Prevost, M. Takahashi, R. Lavery, ChemPhysChem, 10 (2009) 1399.

2. P.D. Dans et al., Nucleic Acids Research, 40 (2012) 10668.

3. S.N. Volkov, Bioph. Bulletin, 7 (2000) 7; ibid., 12 (2003) 5; J. Biol. Phys., 31 (2005) 323.

4. P.P. Kanevska, S.N. Volkov, Ukr. J. Phys., 51 (2006) 1001.

5. S.N. Volkov, A.V. Solovyov, Eur. Phys. J. D, 54 (2009) 657; S.N. Volkov et al., J. Phys.: Condens. Mat. 24 (2012) 0351043.


OrthoDB: an evolutionary perspective to interpreting genomics data

Robert M. Waterhouse, Evgenia V. Kriventseva, and Evgeny M. Zdobnov

University of Geneva Medical School & Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland

{Robert.Waterhouse,Evgenia.Kriventseva,Evgeny.Zdobnov}@unige.ch

Abstract

The OrthoDB [1] hierarchical catalog of orthologs represents a comprehensive resource of comparative genomics data that delineates the evolutionary histories of millions of genes from thousands of bacteria (3669) and hundreds of eukaryotes: plants (33), fungi (227) and animals (331). Users may browse the catalog at www.orthodb.org to view extensive mapped gene functional annotations and quantified evolutionary traits. To facilitate large-scale evolutionary and/or functional genomics research projects, dynamic data queries may be performed through the dedicated application programming interface, or the OrthoDB software may be employed to compute tailored orthology datasets. Additionally, OrthoDB's sets of Benchmarking Universal Single-Copy Orthologs, BUSCOs [2], provide a rich source of data for assessing the quality and completeness of genome assemblies and their gene annotations.
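BUSCO reports completeness in a compact notation: C (complete, split into S, single-copy, and D, duplicated), F (fragmented), M (missing) and n (number of BUSCO groups searched). A minimal sketch of how such a summary line is tallied, assuming the per-group classification (the hypothetical `hits` mapping) has already been produced; the real BUSCO tool derives it from profile searches and length cut-offs, which are not reproduced here.

```python
def busco_summary(hits, n_total):
    """Tally per-BUSCO hit labels into the C/S/D/F/M/n summary string.

    hits maps a BUSCO group id to the list of labels ("complete" or
    "fragmented") of its matches in the assembly; groups absent from hits,
    or with no matches, count as missing.
    """
    single = duplicated = fragmented = 0
    for labels in hits.values():
        n_complete = labels.count("complete")
        if n_complete == 1:
            single += 1
        elif n_complete > 1:
            duplicated += 1
        elif "fragmented" in labels:
            fragmented += 1
    missing = n_total - single - duplicated - fragmented

    def pct(x):
        return 100.0 * x / n_total

    return (f"C:{pct(single + duplicated):.1f}%[S:{pct(single):.1f}%,"
            f"D:{pct(duplicated):.1f}%],F:{pct(fragmented):.1f}%,"
            f"M:{pct(missing):.1f}%,n:{n_total}")
```

A high D value, for example, can flag uncollapsed haplotypes or assembly duplication, while a high F or M value points to fragmented or incomplete assemblies.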

OrthoDB resources and tools enable extensive orthology-based genome annotation and interpretation in a comparative genomics framework that incorporates the growing numbers of sequenced genomes. Orthology is a cornerstone of comparative genomics, and such approaches are well established as immensely valuable for gene discovery and characterization, offering evolutionarily qualified hypotheses on gene function by identifying "equivalent" genes in different species. Orthology-based approaches therefore provide an important evolutionary perspective for interpreting the increasing quantities of genomics data, and OrthoDB offers both the ability to run custom analyses and to query extremely comprehensive sets of orthology classifications.
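A common pairwise heuristic for identifying such "equivalent" genes is the reciprocal best hit: genes a and b are paired when each is the other's highest-scoring match. The sketch below assumes precomputed similarity scores (e.g. from an all-against-all sequence search) supplied as plain dictionaries; OrthoDB's hierarchical, multi-species procedure is considerably more involved, and this is not its implementation.

```python
def best_hits(scores):
    """scores maps (query, target) -> similarity; return query -> best-scoring target."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)  # ties keep the first pair seen
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Gene pairs that are each other's best hit in both search directions."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```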

Keywords: OrthoDB, BUSCO, orthology, gene function, gene evolution, genome

References

1. Kriventseva, EV., Tegenfeldt, F., Petty, TJ., Waterhouse, RM., Simão, FA., Pozdnyakov, IA., Ioannidis, P., Zdobnov, EM.: OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res. 43(Database issue):D250-6 (2015)

2. Simão, FA., Waterhouse, RM., Ioannidis, P., Kriventseva, EV., Zdobnov, EM.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210-2 (2015)


From biocuration to model predictions and back

Ioannis Xenarios

SIB Swiss Institute of Bioinformatics, Center for Integrative Genomics, University of Lausanne, Switzerland


Abstract

We live in a time when sequencing is paramount in biological and medical research (at least, that is what the press says), but piecing together genomic information, from regulation to functional impact, is a major challenge for biology. My presentation will describe the work of the unsung heroes who painstakingly curate the scientific literature and create resources such as UniProtKB/Swiss-Prot and others that make life easier for hundreds of thousands of scientists. I will then present some applications that use these resources in the systems modeling arena, demonstrating that these models can be predictive and useful for guiding certain types of experimental design and for discovering novel treatments.

The presentation will also stress the importance of the proper infrastructure and expertise needed to enable this type of research, as well as the clear necessity of continued international collaboration to achieve these goals.


SPEAKERS IN SESSIONS


Bioinformatics Basis for the "Molecular Tweezers" Construction

Anastasia Anashkina1 and Alexei Nekrasov2

1 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov str., 32, 119991 Moscow, Russia

2 M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya str., 16/10, 117997 Moscow, Russia

Abstract. The aim of this work is to create a potential field that describes the interaction between amino acid residues. We used the basic unit of the protein sequence previously proposed in the ANIS method, a block of five adjacent amino acid residues. We introduced a new classification of amino acid residues (information type) depending on the residue position with respect to the local extrema of the occupancy profile of the protein sequences. We calculated 20×20 contact matrices for each pair of informational types of residues and for distances between the Cα atoms of contacting residues from 3 Å up to 15 Å. The resulting set of matrices has a 37-fold excess of "information importance" over the previously known contact matrix. The proposed approach makes it possible to design "molecular tweezers", which can be attached to a variety of molecular identification systems, such as green fluorescent protein.

Keywords: ANIS method, informational structure, contact matrices

1. Introduction

Potential fields that effectively describe the interactions between protein molecules are very important both for basic science and for applications. The aim of this work is to create a potential field that describes the interaction between amino acid residues and has more "informative value" than the potential field obtained by the Voronoi-Delaunay method. To solve this problem, we used the basic unit of the protein sequence previously proposed in the ANIS method, a block of five adjacent amino acid residues [1]. This block has been named the information unit of protein, and it has been shown [1, 2] that information units can determine the structural organization of the local polypeptide chain. The density of information units in a protein sequence determines the effectiveness of interactions within the polypeptide chain [3].

2. Classification of Amino Acid Residues

In this paper, we introduced a new, additional classification of amino acid residues (information type) depending on the residue position with respect to the local extrema of the occupancy profile of the protein sequences [1]. Figure 1 shows the amino acid sequence of the protein and the corresponding "occupancy profile" of information units (see [4] for a description of the ANIS method). We considered residues at the positions of local maxima and minima of the occupancy profile, the two residues on either side of these positions, and residues outside the local extrema of the occupancy profile (eleven informational types of residue in total).

Fig. 1. Classification of amino acids according to the local extrema of the occupancy profile. M is the amino acid residue at the local minimum position; ML1, MR1 are the positions of residues shifted one position to the left or right of the minimum; ML2, MR2 are residue positions shifted two positions to the left or right of the minimum; P is the amino acid residue position at the local maximum of the occupancy profile; PL1, PR1 are residue positions shifted one position to the left or right of the maximum; PL2, PR2 are residue positions shifted two positions to the left or right of the maximum; U denotes all other positions of the occupancy profile.
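Given an occupancy profile, the eleven labels of Fig. 1 can be assigned mechanically. The sketch assumes the profile values are already computed (the ANIS occupancy calculation itself is not reproduced, and the numbers in the usage example are made up), treats strict local minima and maxima as extrema, and resolves a position that lies within two residues of two extrema by giving it to the nearer one (the left one on ties); the text does not state how such overlaps are handled, so this rule is our assumption.

```python
def local_extrema(profile):
    """Indices of strict local minima and maxima (end points are never extrema)."""
    minima, maxima = [], []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if mid < left and mid < right:
            minima.append(i)
        elif mid > left and mid > right:
            maxima.append(i)
    return minima, maxima


def information_types(profile):
    """Assign one of the 11 labels of Fig. 1 to every position of the profile."""
    minima, maxima = local_extrema(profile)
    extrema = [(c, "M") for c in minima] + [(c, "P") for c in maxima]
    labels = []
    for pos in range(len(profile)):
        near = [(abs(pos - c), c, tag) for c, tag in extrema if abs(pos - c) <= 2]
        if not near:
            labels.append("U")
            continue
        d, centre, tag = min(near)  # nearest extremum wins; ties go to the left one
        if d == 0:
            labels.append(tag)
        else:
            labels.append(f"{tag}{'L' if pos < centre else 'R'}{d}")
    return labels


# Made-up profile with minima at positions 3 and 11 and a maximum at 8.
print(information_types([5, 4, 3, 2, 3, 4, 5, 6, 7, 6, 5, 4, 5, 5, 5, 5]))
```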

3. Contact Matrices

We calculated the information structure (occupancy profile) of each protein chain sequence in a set of 11,000 protein-protein complexes. All residues were classified by information type according to their position in the occupancy profile. In this work, two amino acids are considered to be in contact if the distance between their atoms (excluding hydrogen) lies in the interval from 2.0 to 3.4 Å. Such a "hard" condition was used to avoid accidental contacts. Contact matrices were calculated for all possible pairs of residue information types, 66 matrices of 20x20 in total. The contact maps (Figure 2) show that changing the information type changes the surface describing the frequency of contacts between residues. Thus, residue information type is an important factor determining the specificity of the interaction between amino acid residues in protein-protein complexes. The "information value" of the contact matrices was compared with that of a matrix obtained by Voronoi-Delaunay tessellation [4, 5]: the average information entropy is 0.2 for the contact matrices and 7.5 for the Voronoi-Delaunay matrix. This means that adding information types for residues increases the specificity of contacts in the matrices describing interactions in protein-protein complexes.

Fig. 2. Contact matrices for pairs of residues with different information types: residues at position ML1 (one position to the left of the local minimum) paired with residues in positions ML2, ML1, M, MR1, MR2 (top row) and PL2, PL1, P, PR1, PR2 (bottom row). Contact matrices are shown as contour lines on a logarithmic scale.
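The contact definition and the count of 66 matrices can be made concrete. The sketch below is only an illustration under stated assumptions: the 11 information types are taken from the Fig. 1 caption (so 11 x 12 / 2 = 66 unordered type pairs), each residue is a hypothetical tuple of amino acid, information type and heavy-atom coordinates (hydrogens assumed already removed), and the names `in_contact` and `count_contacts` are ours, not the authors'.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

# The 11 information types of Fig. 1; unordered pairs (with repetition) give 66 matrices.
INFO_TYPES = ["ML2", "ML1", "M", "MR1", "MR2", "PL2", "PL1", "P", "PR1", "PR2", "U"]
TYPE_PAIRS = list(combinations_with_replacement(INFO_TYPES, 2))


def in_contact(atoms_a, atoms_b, lo=2.0, hi=3.4):
    """True if any pair of heavy atoms lies within [lo, hi] Angstrom."""
    return any(lo <= math.dist(p, q) <= hi for p in atoms_a for q in atoms_b)


def count_contacts(chain_a, chain_b):
    """Accumulate contact counts per information-type pair and amino-acid pair.

    Each residue is a hypothetical tuple (amino_acid, info_type, heavy_atom_coords);
    each Counter in the result plays the role of one 20x20 contact matrix.
    """
    counts = {pair: Counter() for pair in TYPE_PAIRS}
    order = {t: i for i, t in enumerate(INFO_TYPES)}
    for aa1, t1, atoms1 in chain_a:
        for aa2, t2, atoms2 in chain_b:
            if in_contact(atoms1, atoms2):
                # Put the pair in canonical order so it matches a key of TYPE_PAIRS.
                (ta, aa_a), (tb, aa_b) = sorted(
                    [(t1, aa1), (t2, aa2)], key=lambda x: order[x[0]]
                )
                counts[(ta, tb)][(aa_a, aa_b)] += 1
    return counts
```

For example, a lysine at a profile maximum (P) whose atom lies 3.0 Å from an aspartate atom at a minimum (M) is counted once in the ("M", "P") matrix, while a glycine 5.0 Å away is not counted at all.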


4. Conclusions

Monoclonal antibodies are the best tool for specific protein binding. However, the binding site on the target protein cannot be chosen by the investigator; it is selected by the immune system. We offer a tool for designing a specific binding site that recognizes a researcher-defined area on the surface of the target protein. The proposed approach makes it possible to design "molecular tweezers", which can be attached to a variety of molecular identification systems, such as green fluorescent protein. Thus, it is possible to identify individual molecules in biological systems. In addition, by choosing the binding site on the target protein, it is possible to block certain of its functions. The approach may also be an important tool in biomedical research and medical practice. The data obtained in these studies open up new opportunities to create fundamentally new artificial proteins for binding and regulation in protein engineering and various biotechnological applications.

Acknowledgments

This work was supported by the Russian Foundation for Basic Research, grant 15-04-99605a, and by a grant of the RAS Presidium program of fundamental research in strategic directions of science development "Fundamental problems of mathematical modelling" (Program code: II.4), project "Mathematical model of natural polypeptide chains spatial organization, based on information content of protein sequence".

References

1. Nekrasov AN, Anashkina AA, Zinchenko AA. A new paradigm of protein structural organization. Institute of Physics, Belgrade. (2014)

2. Nekrasov AN. Entropy of Protein Sequences: An Integral Approach. J. Biomol. Struct. Dyn. Vol. 20, 87-92. (2002)

3. Nekrasov AN, Zinchenko AA. Structural Features of the Interfaces in Enzyme-Inhibitor Complexes. J. Biomol. Struct. Dyn. Vol. 28, 85-96. (2010)

4. Anashkina A, Kuznetsov E, Esipova N, Tumanyan V. Comprehensive statistical analysis of residues interaction specificity at protein-protein interfaces. Proteins Struct. Funct. Bioinforma. Vol. 67, 1060-1077. (2007)

5. Medvedev N. The algorithm for three-dimensional Voronoi polyhedra. J. Comput. Phys. Vol. 67, 223-229. (1986)


Clustering of CpG-rich elements in gene dense regions

Vladimir Babenko, Irina Chadaeva, and Yuriy Orlov

Institute of Cytology and Genetics, Lavrentieva str 10, Novosibirsk, 630090, [email protected]

Abstract

Because CG dinucleotides are depleted fourfold in the human genome as a result of targeted methylation, they represent a highly specific marker of open chromatin. We sought to elucidate their functional relevance by considering the location specifics of CG-rich clusters, namely CpG islands (CGIs) and Alu retrotransposons.

An association between genes and CGIs at chromosome-wide resolution was reported previously. Here we report a strong domain-wide association of genes and CGIs across 30,000 non-overlapping 100 kb bins. Nearly half of the genome (43%) is devoid of both genes and CGIs, while 33% of bins contain both elements, so discordant bins comprise only 24% of the genome, implying a highly significant non-random association.
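The bin-level bookkeeping behind these percentages can be sketched as follows. This is a minimal illustration, assuming each gene or CGI is represented by a single 0-based start coordinate; the function names and toy coordinates are hypothetical, and comparing observed concordance with the concordance expected under independence is a simple descriptive check, not necessarily the test the authors used.

```python
from collections import Counter


def classify_bins(genome_length, gene_starts, cgi_starts, bin_size=100_000):
    """Count non-overlapping bins by the feature types they contain.

    Simplification: each feature is assigned to the bin holding its start coordinate.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    gene_bins = {pos // bin_size for pos in gene_starts}
    cgi_bins = {pos // bin_size for pos in cgi_starts}
    counts = Counter({"both": 0, "genes_only": 0, "cgis_only": 0, "neither": 0})
    for b in range(n_bins):
        g, c = b in gene_bins, b in cgi_bins
        key = "both" if g and c else "genes_only" if g else "cgis_only" if c else "neither"
        counts[key] += 1
    return counts


def concordance(counts):
    """Observed fraction of concordant bins and the fraction expected if independent."""
    n = sum(counts.values())
    observed = (counts["both"] + counts["neither"]) / n
    p_gene = (counts["both"] + counts["genes_only"]) / n
    p_cgi = (counts["both"] + counts["cgis_only"]) / n
    expected = p_gene * p_cgi + (1 - p_gene) * (1 - p_cgi)
    return observed, expected
```

With the genome-wide figures quoted above, the concordant bins (both or neither element) make up 43% + 33% = 76% of the genome.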

We ascertained that the major cause of this correlation is the joint affinity of both elements to accessible chromatin, assessed as DNase hypersensitive site (DHS) density and chromatin state. Both genes and CGIs are highly correlated with DHS and open chromatin distribution genome-wide.

Alu clusters also demonstrated a distinct affinity to open chromatin. Using chromatin signatures inferred from the topologically associated domains of the 3D map of the human genome [1], we found that genes, CGIs and CG-rich AluY elements are preferentially clustered in gene-dense chromatin of the A1 type.

Besides non-methylated, mostly promoter-linked CpG islands, which are inherently associated with DHS, we found that methylated CGIs also maintain a strong affinity to accessible chromatin and DHS hotspots, implying that the vast majority of them remain functional.

Chromosome 19 offers a striking example of massive clustering of CG-rich elements: its densities of genes, CGIs and Alus are 2.5-fold higher than those of the next densest chromosome. Notably, of the two open chromatin types, only the gene-dense A1 type is present on chromosome 19, while A2 is absent, even though A2 is 1.5 times more abundant than A1 genome-wide [1].

While the phenomenon of GC-rich gene-dense regions has long been recognized, we approached it using specific distribution patterns and large-scale chromatin analysis.

The elevated density of genes, CGIs and Alus in open chromatin implies complex interactions among them in the process of gene functioning. As one elaboration of this large-scale viewpoint, we speculate on the observation that both the 5' and 3' gene regions are encompassed by CG-rich stretches specifically in gene-dense regions. We discuss how this may accommodate their expression in a range of ways.

References

1. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665-1680


SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities

Milena Banjevic, Allison Ryan, and Styrmir Sigurjonsson

Natera Inc., 201 Industrial Rd, San Carlos, CA 94070 [email protected]

Abstract

A single-nucleotide polymorphism (SNP)-based noninvasive prenatal test (Panorama™, Natera, San Carlos, CA) detects aneuploidy in cell-free DNA from maternal blood as early as nine weeks of gestation. The fetal fraction limit for aneuploidy detection is 2.8% (measured post-PCR in the sequencing data). Over 1000 tests per day are performed internationally, with an overall accuracy of aneuploidy detection > 99.5% and a fetal sex determination accuracy of 100%. In this test, SNPs are amplified using a targeted massively multiplexed PCR (mmPCR) approach and then sequenced using next-generation sequencing (NGS). Natera's proprietary model of allelic data takes into account sample, amplification, sequencing and SNP target set characteristics (process characteristics). Using this model, fetal chromosomal copy numbers are inferred from the allelic sequencing data using iterative variational Bayesian and maximum likelihood methods. Here we will demonstrate the adherence of the model to real-world data, based on a data set with known copy numbers and associated mother and child genotypes. We present the performance of the algorithm as a function of key sample characteristics for a large real sample data set with known chromosomal copy numbers. We will also show the performance of the algorithm in a simulated environment in which we can perform stress tests on a wide range of process characteristics.
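The abstract does not disclose Natera's proprietary model, so the following is only a generic illustration of maximum-likelihood copy-number calling from allelic read counts, not the Panorama algorithm. It assumes a known fetal fraction, SNPs at which the mother is AA and the father is BB, independent binomial read sampling, and no amplification or sequencing bias; all names and numbers are hypothetical.

```python
import math


def mixed_b_fraction(fetal_fraction, maternal_b, fetal_b, fetal_copies):
    """Expected fraction of B-allele reads in maternal plasma at one SNP.

    fetal_fraction is defined for a disomic region, so each maternal copy
    contributes (1 - f) / 2 of the DNA and each fetal copy f / 2.
    """
    f = fetal_fraction
    b_amount = (1 - f) / 2 * maternal_b + f / 2 * fetal_b
    total = (1 - f) + f / 2 * fetal_copies
    return b_amount / total


# Hypotheses at SNPs where the mother is AA and the father is BB, so every
# paternally inherited fetal copy carries B: (name, fetal B copies, fetal copies).
HYPOTHESES = [
    ("monosomy (maternal copy only)", 0, 1),
    ("monosomy (paternal copy only)", 1, 1),
    ("disomy", 1, 2),
    ("trisomy (extra maternal copy)", 1, 3),
    ("trisomy (extra paternal copy)", 2, 3),
]


def log_likelihood(reads, p, eps=1e-9):
    """Binomial log-likelihood (constant dropped) of (b_reads, depth) pairs."""
    p = min(max(p, eps), 1 - eps)  # avoid log(0) for hypotheses predicting p = 0
    return sum(b * math.log(p) + (d - b) * math.log(1 - p) for b, d in reads)


def most_likely_copy_number(reads, fetal_fraction):
    """Return the hypothesis whose expected B fraction best explains the reads."""
    scores = {
        name: log_likelihood(reads, mixed_b_fraction(fetal_fraction, 0, fb, n))
        for name, fb, n in HYPOTHESES
    }
    return max(scores, key=scores.get)
```

With a 10% fetal fraction, disomy predicts a B fraction of 0.05, an extra maternal copy about 0.048 and an extra paternal copy about 0.095; real data also require modelling the process characteristics listed above, which this sketch ignores.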

Keywords: non-invasive prenatal testing, variational Bayesian methods, NIPT



Achieving a rapid expression of toxic (but useful) molecules within cell

Bojana Blagojevic1, Magdalena Djordjevic1 and Marko Djordjevic2

1 Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia

{bojanab,magda}@ipb.ac.rs

2 Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, [email protected]

Abstract

The restriction-modification (RM) system is a rudimentary bacterial immune system whose main components are the restriction enzyme (R), which cuts specific DNA sequences, and the methyltransferase (M), which methylates and consequently protects the same DNA sequences from cleavage. While R is useful because it can cut viral DNA, it is also potentially toxic because it can cut the unprotected host genome; R and M expression is therefore tightly controlled by a control (C) protein.

We developed a biophysical model of gene expression regulation in RM systems and applied it to EcoRV, which has divergent RC and M promoters [1]; to our knowledge, this is the first quantitative model of a divergent system architecture. The main feature of EcoRV is that the RC and M promoters overlap; this overlap, in addition to C protein binding, controls transcription of the system. We show that EcoRV features satisfy three design principles that we propose: time-delayed expression of R with respect to M, a fast transition of R from the OFF to the ON state, and increased steady-state stability of R. We show that perturbing EcoRV features weakens these design principles and, moreover, consistently increases the M to R ratio, preventing a balance between the toxic molecule and its antidote. Based on the analysis of R-M system control, we propose a novel synthetic gene circuit [2] that combines the transcription control of R-M systems with the transcript processing exhibited by CRISPR/Cas systems [2]. Our goal is to propose an optimal strategy for rapidly generating toxic molecules within a cell [2].
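The first design principle, delayed expression of R relative to M, can be illustrated with a deliberately simple toy model. This is not the authors' biophysical model of EcoRV: it assumes constitutive M expression, a C protein that activates its own promoter, R transcription activated by C through a Hill function, and equal degradation rates, with all parameter values chosen arbitrarily. Because R can only be made once C has accumulated, R reaches half of its final level later than M does.

```python
def hill(x, k, n=2):
    """Activating Hill function x^n / (k^n + x^n)."""
    return x ** n / (k ** n + x ** n)


def simulate(t_end=20.0, dt=0.001, g=1.0, a_m=1.0, b0=0.2, b1=1.0, k_c=0.5,
             a_r=1.0, k_r=0.5):
    """Forward-Euler integration of the toy M / C / R model, starting from zero."""
    m = c = r = 0.0
    ts, ms, rs = [], [], []
    for step in range(1, int(round(t_end / dt)) + 1):
        dm = a_m - g * m                       # constitutive methyltransferase
        dc = b0 + b1 * hill(c, k_c) - g * c    # C activates its own promoter
        dr = a_r * hill(c, k_r) - g * r        # R needs C to be transcribed
        m, c, r = m + dt * dm, c + dt * dc, r + dt * dr
        ts.append(step * dt)
        ms.append(m)
        rs.append(r)
    return ts, ms, rs


def half_time(ts, xs):
    """First time at which xs reaches half of its final (near steady-state) value."""
    target = xs[-1] / 2
    return next(t for t, x in zip(ts, xs) if x >= target)
```

Calling `ts, ms, rs = simulate()` gives a half-time close to ln 2 for M and a clearly later one for R; the overlapping divergent promoters of EcoRV would require additional terms that this sketch omits.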

Keywords: R-M systems, divergent promoters, transcript processing

References

1. Rodic A., Blagojevic B., Zdobnov E., Djordjevic M., Djordjevic M.: Design principles of restriction-modification systems: ensuring safe and efficient host establishment, submitted (2016).

2. Rodic A., Blagojevic B., Zdobnov E., Djordjevic M., Djordjevic M.: to be submitted (2016).

3. Djordjevic M., Djordjevic M., Severinov K.: Biology Direct, 7:24 (2012).


Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data

Sanja Brdar

BioSense - Institute for research and development of information technology in biosystem, University of Novi Sad, Serbia

[email protected]

Abstract

In bioinformatics, integrative approaches are motivated by the desire to improve robustness, stability and accuracy. Clustering, the prevailing technique for preliminary and exploratory analysis of experimental data in genomics, may benefit from integration across multiple partitions. Different partitions can be inferred from different initializations, algorithms, parameters, feature subsamples, item subsamples, similarity/distance functions or heterogeneous data sources. To resolve the users' dilemma of selecting one data partition among many possible ones, we developed a technique that infers separate clusters from diverse inputs and then fuses them by means of non-negative matrix factorization (NMF). The proposed fusion technique is evaluated within the scope of functional genomics, where it increases the quality of clusters with respect to the enrichment of their associated gene functions. The landscape of integrative clustering algorithms is further explored through a comprehensive comparison of the partitions generated by NMF and 5 alternative ensemble algorithms on 30 cancer genomics microarray datasets. Here, on high-dimensional microarray data, integrative clustering enhances the stability of the final clusters, which correspond to different types or subtypes of cancer. Finally, current research on regularized NMF for integrative clustering will be presented, as well as possible applications to the analysis of metagenomic data, where microbial diversity assessment may also benefit from ensemble clustering.
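One common way to fuse several partitions with NMF is to stack their one-hot cluster memberships into a non-negative item-by-cluster matrix, factorize it, and read off a consensus cluster for each item. The sketch below follows that idea with Lee-Seung multiplicative updates in plain Python; it is an assumption about the general approach, not the exact algorithm of this work, and it omits the regularization mentioned above. The function names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python product of an (n x p) and a (p x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def membership_matrix(partitions):
    """Items x clusters 0/1 matrix stacking the one-hot memberships of every partition."""
    n = len(partitions[0])
    columns = [[1.0 if labels[i] == c else 0.0 for i in range(n)]
               for labels in partitions for c in sorted(set(labels))]
    return transpose(columns)


def nmf(a, k, iters=1000, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||A - WH||_F with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(k)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)] for i in range(n)]
    return w, h


def consensus_clusters(partitions, k, **nmf_options):
    """Assign each item to the NMF component with the largest weight in W."""
    w, _ = nmf(membership_matrix(partitions), k, **nmf_options)
    return [max(range(k), key=lambda r: row[r]) for row in w]
```

Because only co-membership matters, partitions that use different label names for the same groups (for example `[0, 0, 0, 1, 1, 1]` and `[1, 1, 1, 0, 0, 0]`) contribute the same structure to the consensus.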


Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling

Aleksandr Bugay

Joint Institute for Nuclear Research, Laboratory of Radiation Biology, Moscow, Russia

bugay [email protected]

Abstract

The synchronization of neuronal activity within a specific network is required for cognitive performance. The normal performance of a neural network may be disturbed by various external factors. Among them, galactic cosmic radiation remains one of the least studied, while posing a potential risk to the central nervous system during long-term space travel. In ground-based experiments, exposure to heavy-ion radiation induces pronounced deficits in cognitive functions [1].

Biological neural network simulations have recently been applied to quantify related phenomena in the hippocampus [2]. We study neural activity in the prefrontal cortex, which is responsible for the short-term retention of information about an object (object working memory). The model neural network contains two principal types of cells, pyramidal neurons (excitatory population) and interneurons (inhibitory population), connected to each other by synapses with GABA, AMPA and NMDA receptors. We further apply a phenomenological approach, using interpolated values of dose-dependent changes in the basic structural elements of neurons (synaptic receptors, ion channels, etc.) according to known experimental data. The spatiotemporal dynamics of the network were simulated for simple cognitive tasks. We demonstrate that radiation-induced alterations in the properties of synaptic receptors cause a loss of stability of specific activity patterns. This instability arises when the radiation dose exceeds a threshold.
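The phenomenological step, interpolating dose-dependent changes of synaptic parameters and asking whether a remembered activity pattern survives, can be sketched with a single self-exciting firing-rate population instead of the spiking pyramidal/interneuron network described above. The dose table below is made up for illustration and is not experimental data; the sigmoid gain, the cue duration and the 0.5 persistence criterion are likewise arbitrary assumptions.

```python
import math
from bisect import bisect_right

# Hypothetical dose-response table (NOT experimental data): relative strength
# of recurrent excitatory synapses after a given dose in Gy.
DOSES = [0.0, 0.5, 1.0, 2.0]
SCALING = [1.0, 0.9, 0.7, 0.4]


def interpolate(dose, xs=DOSES, ys=SCALING):
    """Piecewise-linear interpolation, clamped to the ends of the table."""
    if dose <= xs[0]:
        return ys[0]
    if dose >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, dose)
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (dose - x0) / (x1 - x0)


def gain(x, theta=0.5, slope=0.05):
    """Sigmoidal population transfer function."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / slope))


def retains_memory(dose, cue_time=5.0, total_time=20.0, dt=0.01):
    """Euler-integrate dr/dt = -r + F(w r + cue); True if activity persists after the cue."""
    w = interpolate(dose)  # recurrent excitation scaled by the dose-dependent factor
    r = 0.0
    for step in range(int(round(total_time / dt))):
        cue = 1.0 if step * dt < cue_time else 0.0
        r += dt * (-r + gain(w * r + cue))
    return r > 0.5
```

In this toy model the persistent (memory) state survives up to roughly 1 Gy and disappears at 2 Gy, which mimics, but does not reproduce, the threshold behaviour reported above.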

The proposed theoretical approach provides a tool for estimating cognitive impairments caused by ionizing radiation.

Keywords: biological neural networks, radiation biology

References

1. Greene-Schloesser, D., et al.: Radiation-induced brain injury: a review, Frontiers in Oncology, Vol. 2, 118 (2012).

2. Sokolova, I.V., et al.: Proton radiation alters intrinsic and synaptic properties of CA1 pyramidal neurons of the mouse hippocampus, Radiation Research, Vol. 183, 208-218 (2015).


Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis

Milos Busarcevic1,2, Aleksandar Trbovich1, Ivan Milovanovic1, Aleksandra Uzelac1 and Olgica Djurkovic-Djakovic1

1 Center of Excellence for Food- and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Dr. Subotića 4, 11129 Belgrade, Serbia

2 United World College of the Adriatic, via Trieste 29, 34011 Duino, [email protected]

Abstract

Toxoplasma gondii is considered one of the most successful parasites on Earth owing to its omnipresence and the widest array of hosts, including all mammals. The genus comprises a single species infective for all hosts, with limited genetic diversity in Europe and North America, where all isolates belong to three clonal genotypes (types I, II and III). However, a wider genetic diversity, characterized by non-clonal, atypical strains, is found in South America and Africa and is thought to be related to the presence of diverse Felidae as the only definitive hosts, in which sexual reproduction, and consequently genetic recombination, occurs. In intermediate hosts, T. gondii occurs in two forms: the metabolically active, rapidly proliferating tachyzoite, which characterizes acute infection, and the (so-called) metabolically inert encysted bradyzoite, characteristic of chronic infection. The parasite readily converts between the two in response to the hospitality or hostility of the host environment (mostly depending on the immune response) but is never eliminated from the infected host.

Human infection is widespread but not clinically significant (mild and self-limiting) except in populations with an incompetent immune system, such as unborn babies and immunosuppressed individuals (for example, those infected with HIV, or organ and tissue transplant recipients), in whom it may cause life-threatening disease. Treatment options have not advanced much for decades, and there is still no drug able to eliminate encysted parasites; thus, there is an urgent need for new drugs.

Interestingly, T. gondii is not capable of synthesizing cholesterol (Chl) and thus depends on the uptake of host Chl for its own development (1). We therefore aimed to investigate Chl metabolism during T. gondii infection in the hope of finding prospective new drug targets. The aim of this study was to examine the effects of T. gondii on Chl metabolism in murine models of acute and chronic toxoplasmosis at the biological and molecular levels. For this purpose, we mined seven published microarray datasets of murine brain homogenates and lymphocytes (from peripheral blood or peritoneum) during acute infection with T. gondii type I, II and III strains (2, 3) for the expression levels of genes relevant for Chl metabolism, including its biosynthetic pathway and export (KEGG pathways). Experimental validation of these findings was performed by assessing the serum lipid status during acute and chronic murine toxoplasmosis, as well as by analyzing the transcript levels of relevant genes in brain and liver homogenates.

In the brain of mice infected with type II parasites, the data (day 8 post infection, p.i.) revealed down-regulation of most of the 13 genes of the Chl biosynthetic pathway starting from farnesyl-PP, and of two transcriptional activators of this pathway, Srebf1 and Srebf2. On the other hand, Soat1 and Soat2, as well as Cyp7b1 and Abcg5, were up-regulated. While up-regulation of Soat1 and Soat2 reflects an increase in Chl esterification, up-regulation of Cyp7b1, which catalyzes the first reaction in the Chl catabolic pathway of extrahepatic tissues towards bile acids, and of the bile acid export protein Abcg5 reflects removal of Chl from the metabolically active pool. The expression profile of these genes in murine brain during infection with type I parasites (d5 p.i.) was similar but less pronounced than in type II infection. Peripheral lymphocytes appeared to show a similar expression profile, since down-regulation of several genes of the Chl biosynthesis pathway and up-regulation of genes involved in its esterification were observed during infection with both type I and type II parasites. Furthermore, infection with type II parasites revealed up-regulation of Abcg4, a Chl export protein, and of liver X receptor alpha (Nr1h3). Finally, in peritoneal cells of mice infected with T. gondii type I, II and III parasites, mining for differentially expressed genes involved in Chl biosynthesis and transport revealed an over two-fold increase in several genes, including Hmgcr, Fdft1, Sqle and Ldlr, while the expression of ApoE was reduced more than six-fold. These results may be interpreted as decreased expression of genes involved in Chl biosynthesis and increased expression of genes involved in Chl esterification and transport outside of the brain. Together, the described changes may result in a drop in cellular Chl content.
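The fold-change criteria used above ("over two-fold increase", "reduced more than six-fold") amount to filtering the genes of a pathway gene set by their log2 ratio of infected to control expression. The sketch below shows that filtering step only; it assumes pre-normalized mean expression values per gene, and the gene identifiers and values in the usage example are hypothetical placeholders, not data from this study.

```python
import math


def split_by_fold_change(expression, gene_set, min_fold=2.0):
    """Return (up, down) lists of gene_set members changed at least min_fold.

    expression maps gene -> (mean infected, mean control) on a linear scale.
    """
    cutoff = math.log2(min_fold)
    up, down = [], []
    for gene in sorted(gene_set):
        if gene not in expression:
            continue  # gene not measured on this array
        infected, control = expression[gene]
        log_fc = math.log2(infected / control)
        if log_fc >= cutoff:
            up.append(gene)
        elif log_fc <= -cutoff:
            down.append(gene)
    return up, down
```

For example, with hypothetical values `{"gene_a": (240.0, 100.0), "gene_b": (15.0, 100.0)}` and the default two-fold cutoff, `gene_a` is reported as up-regulated and `gene_b` as down-regulated; passing `min_fold=6.0` reproduces the stricter six-fold criterion.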

Experimental validation of the data-mining results showed that in a murine model of toxoplasmosis induced by type II parasites, acute infection (d14 p.i.) was associated with a decrease in the transcription of genes involved in Chl biosynthesis in both the brain and the liver. In contrast, in chronic infection (d42 p.i.), an increase in Chl metabolism was observed in both tissues. These changes were also reflected at the biological level: we observed a decrease in total serum Chl and HDL levels in acute infection, while both were unchanged in chronic infection.

In summary, the decrease in Chl content in both the brain and the periphery (liver, peritoneal lymphocytes) and the decrease in Chl reverse transport that we observed in acute T. gondii infection correspond to the gene expression data obtained via data mining. We propose that the observed changes in Chl metabolism are part of the host defense response. In acute infection, the host responds by attempting to deprive the parasite of Chl, which is necessary for tachyzoite proliferation and development. In contrast, in chronic infection, when the parasite has converted into its metabolically less active cyst form, these metabolic changes are not prominent because of the established balance between the parasite and its host. It thus seems that Chl metabolism during T. gondii infection may be a novel target for therapeutic agents and should be further investigated.

Keywords: cholesterol, data mining, murine infection, Toxoplasma gondii, transcriptome

References

1. Coppens, I., Sinai, AP., Joiner, KA: Toxoplasma gondii Exploits Host Low-Density Lipoprotein Receptor-Mediated Endocytosis for Cholesterol Acquisition. J. Cell Biol. 149,1:167-180 (2000)

2. Hill, RD., Gouffon, JS., Saxton, AM., Su, C: Differential Gene Expression in Mice Infected with Distinct Toxoplasma Strains. Infect. Immun. 80,3:968-974 (2012)

3. Jia, B., Lu, H., Liu, Q., Yin, J., Jiang, N., Chen, Q: Genome-wide comparative analysis revealed significant transcriptome changes in mice after Toxoplasma gondii infection. Parasit. Vectors. 4,6:161 (2013)


Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection

Miroslava Cuperlovic-Culf

National Research Council of Canada, Department for Information Communication Technologies, Ottawa, Canada

[email protected]

Abstract

Fusarium head blight (FHB), also known as scab or tombstone, is a devastating disease of wheat, barley, oats and other small-grain cereals, as well as corn, caused primarily by Fusarium graminearum. Several wheat cultivars have developed some level of resistance to FHB. Resistance to this fungal pathogen includes specific metabolic responses to inoculation. A number of published metabolomics studies have determined the major metabolic changes induced by the pathogen in resistant and susceptible plants. The function of most of these metabolites in resistance, however, remains unknown. In this work we compiled all metabolites determined to accumulate selectively following FHB inoculation in resistant plants. The characteristics, as well as the possible functions and targets, of these plant metabolites are investigated using cheminformatics approaches. A particular focus has been on the likelihood of these metabolites targeting specific proteins and acting as drug-like molecules. Interesting targets in Fusarium graminearum have been determined using COBRA analysis of a genome-scale model of growth and toxin production in Fusarium graminearum. Results of computational analyses of the binding properties of several representative metabolites to homology models of these target proteins are presented. The activity of several of these compounds has been experimentally confirmed in fungal growth inhibition assays.
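The abstract does not say which drug-likeness criteria were applied, so as a hedged illustration the sketch below uses one widely known screen, Lipinski's rule of five (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors, with at most one violation tolerated). It assumes the descriptors have already been computed by a cheminformatics toolkit; the compound names and values in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float    # molecular weight in Da
    log_p: float         # octanol-water partition coefficient
    h_donors: int        # hydrogen-bond donors
    h_acceptors: int     # hydrogen-bond acceptors


def lipinski_violations(c):
    """Number of rule-of-five criteria the compound violates."""
    return sum([c.mol_weight > 500, c.log_p > 5, c.h_donors > 5, c.h_acceptors > 10])


def drug_like(compounds, max_violations=1):
    """Names of compounds with at most `max_violations` rule-of-five violations."""
    return [c.name for c in compounds if lipinski_violations(c) <= max_violations]
```

A filter like this only flags candidates; the target-specific step described above (docking against homology models of targets found by COBRA analysis) is a separate computation.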


Analysis of network structural characteristics through vertex characteristics in directed networks

Tamara Dimitrova

Macedonian Academy of Sciences and Arts, Research Center for Computer Science and Information Technologies, Skopje, Macedonia

[email protected]

Abstract

We propose a unified framework for introducing novel similarity characteristics of networks and for describing some well-known ones. These characteristics are computed for effective brain networks, and the correlations found between them are used to draw conclusions about the brain networks.
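The abstract does not name the characteristics used, so as a hedged illustration the sketch below computes two standard vertex characteristics of a directed network (in-degree and out-degree), one standard vertex similarity (Jaccard similarity of out-neighbourhoods), and the Pearson correlation between two characteristics across vertices. The function names and the toy edge list are hypothetical.

```python
def degrees(n, edges):
    """In- and out-degree of every vertex of a directed graph on vertices 0..n-1."""
    indeg, outdeg = [0] * n, [0] * n
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def out_neighbour_jaccard(edges, u, v):
    """Jaccard similarity of the out-neighbourhoods of vertices u and v."""
    out_u = {b for a, b in edges if a == u}
    out_v = {b for a, b in edges if a == v}
    union = out_u | out_v
    return len(out_u & out_v) / len(union) if union else 0.0


def pearson(xs, ys):
    """Pearson correlation between two vertex characteristics."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

For the toy network `[(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 2)]` the in-degrees are `[2, 1, 3, 0]`, the out-degrees `[2, 1, 1, 2]`, and vertices 0 and 3 share one of their three distinct out-neighbours.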


A new random-walk-based approach for finding co-expression modules in biological networks

Natasa Djurdjevac Conrad

Zuse Institute Berlin, [email protected]

Abstract

Co-expression modules are sets of biological entities (such as genes or proteins) that interact with each other and have highly correlated expression patterns. Changes of activity in these modules can be used as robust biomarkers for early diagnosis or subtype classification of major diseases.

We consider the problem of finding co-expression modules in undirected networks, i.e., identifying connected groups of nodes with similar and relatively high (w.r.t. the rest of the network) node weights. Compared to classical module identification, this task is more complex in that it combines network topology with the imposed node information.

In this talk, we will present our novel method for analyzing such networks, based on a new type of time-continuous random walk (RW) process [1] with transition rules that take into account both node weights and a node's neighborhood. We will show that for such a process, co-expression modules correspond to metastable sets, a correspondence that, in contrast to standard spectral clustering approaches, leads to much more prominent gaps in the spectrum of the adapted process. This enables better identification of metastable sets. We will discuss the dynamical properties of the new RW process and show how they contribute to co-expression module identification, improving upon previous methods [2]. Finally, we will present recent biological results achieved with our method in the context of cancer analysis based on NGS data on the STRING PPI network.
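Why weight-aware transition rules make high-weight modules metastable can be illustrated with a hypothetical rule that is not necessarily the one used in [1]: jump from node i to neighbour j at rate A_ij * w_j. For a symmetric adjacency matrix these rates satisfy detailed balance with a stationary distribution proportional to the node weights, so the rate at which the stationary process leaves a node set S can be computed directly; a low exit rate means S is metastable. The graph and weights in the usage example are made up.

```python
def jump_rates(adj, weights):
    """Hypothetical weight-biased rule: rate of jumping i -> j is A_ij * w_j."""
    n = len(adj)
    return [[adj[i][j] * weights[j] if i != j else 0.0 for j in range(n)] for i in range(n)]


def exit_rate(adj, weights, subset):
    """Stationary probability flux leaving `subset`, per unit of stationary mass.

    For symmetric adj these rates satisfy detailed balance with pi_i proportional
    to w_i, so the unnormalized weights can stand in for pi (the scale cancels).
    """
    q = jump_rates(adj, weights)
    inside = set(subset)
    outside = [j for j in range(len(adj)) if j not in inside]
    flux = sum(weights[i] * q[i][j] for i in inside for j in outside)
    return flux / sum(weights[i] for i in inside)
```

For two triangles joined by a single edge, with weight 5 on the nodes of one triangle and weight 1 on the other, the process leaves the high-weight triangle at rate 1/3 and the low-weight one at rate 5/3, so the high-weight triangle is the metastable set; the spectral analysis described above generalizes this idea.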

Keywords: co-expression modules, network analysis, time-continuous random walk, metastable sets

References

1. Sarich, M. and Conrad Djurdjevac, N. et al.: Modularity revisited: A novel dynamics-based concept for decomposing complex networks. Journal of Computational Dynamics, 1(1):191-212, (2014).

2. Komurov, K. et al.: Use of data-biased random walks on graphs for the retrieval of context-specific networks from genomic data. PLoS computational biology, 6(8):e1000889, (2010).


Improving 1NN strategy for classification of some prokaryotic organisms

Milana Grbic1, Aleksandar Kartelj2, Dragan Matic1 and Vladimir Filipovic2

1 Faculty of Science and Mathematics, University of Banja Luka, Mladena Stojanovica 2,78000 Banja Luka, Republic of Srpska, Bosnia and Herzegovina

[email protected], [email protected]

2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia

[email protected], [email protected]

Abstract

Classification algorithms are used intensively to discover new information in large sets of biological data. During classification, the classifier uses a set of training instances with known classes to learn how to predict the class of an instance whose class is unknown. A number of commonly used classification tools exist for classifying biological data. However, in classification tasks that involve nominal attributes, these tools often do not obtain results of satisfactory quality, since mathematical operations and relations cannot be directly applied to symbolic values. As a consequence, the classifiers ignore nominal attributes and form the classification model based solely on numerical attributes, which leads to inaccurate and unreliable results.

This problem often appears in K-nearest neighbors (KNN) classification. KNN is based on a distance function that measures the difference or similarity between two instances. KNN assumes that the class of a test instance is equal to the most frequent class among the nearby instances with respect to a distance function, e.g., the Euclidean distance. When the problem includes many nominal attributes, the standard Euclidean distance can become burdened by the large number of irrelevant attributes, consequently producing inaccurate classification results. In these cases, if a KNN classifier is to be applied, a new distance function between attributes needs to be defined.

The dataset of prokaryotic organisms analysed in this paper contains a total of 30 attributes, 11 of which are nominal. Earlier experiments indicated that commonly used classification tools based on the NN strategy mostly ignore nominal attributes and base the classification only on numerical ones. The classification therefore becomes inaccurate.

In this paper we examine several metrics that can be applied to the nominal attributes of the analysed dataset, and for each metric we apply the appropriate 1NN strategy. Additionally, we perform attribute selection by formulating it as an optimization problem and solving it with the Electromagnetism (EM) metaheuristic algorithm. The proposed EM uses 1NN as the underlying classifier and implements precisely adjusted operators for the optimization process.
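Because the abstract does not list the metrics examined, the sketch below uses one standard choice for mixed attributes, the Heterogeneous Euclidean-Overlap Metric (HEOM): range-normalized differences for numeric attributes and a 0/1 overlap distance for nominal ones. An optional boolean mask models attribute selection, and leave-one-out 1NN accuracy is shown as one possible fitness that a metaheuristic such as EM could maximize; the EM search itself is not shown, and the toy attributes in the usage example are hypothetical.

```python
import math


def attribute_ranges(train, nominal):
    """Range (max - min) of every numeric attribute; 0.0 for nominal ones."""
    ranges = []
    for k in range(len(train[0])):
        values = [row[k] for row in train if row[k] is not None]
        if k in nominal or not values:
            ranges.append(0.0)
        else:
            ranges.append(max(values) - min(values))
    return ranges


def heom(a, b, nominal, ranges, mask=None):
    """Heterogeneous Euclidean-Overlap Metric between two instances."""
    total = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        if mask is not None and not mask[k]:
            continue  # attribute switched off by attribute selection
        if x is None or y is None:
            d = 1.0  # a missing value counts as maximally different
        elif k in nominal:
            d = 0.0 if x == y else 1.0  # overlap distance for symbolic values
        else:
            d = abs(x - y) / ranges[k] if ranges[k] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def predict_1nn(train, labels, query, nominal, mask=None):
    """Class of the training instance nearest to `query` under HEOM."""
    ranges = attribute_ranges(train, nominal)
    nearest = min(range(len(train)),
                  key=lambda i: heom(train[i], query, nominal, ranges, mask))
    return labels[nearest]


def loo_accuracy(train, labels, nominal, mask=None):
    """Leave-one-out 1NN accuracy, one possible fitness for an attribute mask."""
    hits = 0
    for i in range(len(train)):
        rest, rest_labels = train[:i] + train[i + 1:], labels[:i] + labels[i + 1:]
        hits += predict_1nn(rest, rest_labels, train[i], nominal, mask) == labels[i]
    return hits / len(train)
```

Unlike a purely numeric Euclidean distance, HEOM lets symbolic attributes such as a cell shape of "rod" or "coccus" influence which neighbour is nearest, and a mask of `False` entries removes attributes from the comparison entirely.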


To justify the proposed approach, comprehensive experiments are performed on the dataset of prokaryotic organisms. The obtained results are compared with the results of other classification methods from the literature.

Keywords: bioinformatics; classification; nearest neighbor; data mining


Identifying relevant positions in proteins by Critical Variable Selection

Silvia Grigolon

Lincoln’s Inn Fields Laboratory, The Francis Crick Institute, London, United Kingdom

Abstract

Over the course of evolution, a variety of solutions to the same optimisation problem have been found. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely.

In this talk, I will present a new method aimed at extracting sites encoding structural and functional properties from a set of protein primary sequences, namely a Multiple Sequence Alignment [1]. The method, called Critical Variable Selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the Response Regulator Receiver domain and to the Voltage Sensor Domain of ion channels, I will show that this procedure recovers not only information encoded in single-site statistics and pairwise correlations, but also dependencies that go beyond pairwise correlations. The method proposed here is complementary to Statistical Coupling Analysis [2], in that the most relevant sites predicted by the two methods differ markedly. We find robust and consistent results for datasets as small as a few hundred sequences, revealing a hidden hierarchy of sites that is consistent with present knowledge of biologically relevant sites and evolutionary dynamics. This suggests that Critical Variable Selection is able to identify, in a Multiple Sequence Alignment, a core of sites encoding functional and structural information.
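The abstract states only the principle: relevant site subsets yield subsequences with a particularly broad frequency distribution. As a hedged illustration, the sketch below scores a set of alignment columns by the entropy of the "frequency of frequencies" distribution, an assumed broadness measure loosely modelled on the notion of relevance used in the sampling literature, and grows the set greedily. Neither the score nor the greedy search is claimed to be the exact procedure of [1]; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def broadness(msa, columns):
    """Entropy of the frequency-of-frequencies distribution of the sub-sequences
    read off the chosen alignment columns (an assumed broadness score)."""
    M = len(msa)
    # how often each sub-sequence (restricted to `columns`) occurs in the alignment
    counts = Counter("".join(seq[c] for c in columns) for seq in msa)
    # m_k = number of distinct sub-sequences that occur exactly k times
    m = Counter(counts.values())
    # k * m_k / M is the fraction of sequences whose sub-sequence occurs k times
    return -sum((k * mk / M) * math.log(k * mk / M) for k, mk in m.items())


def greedy_select(msa, n_sites):
    """Greedily add the column that most increases the broadness score."""
    chosen, remaining = [], list(range(len(msa[0])))
    for _ in range(n_sites):
        best = max(remaining, key=lambda c: broadness(msa, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)
```

A fully conserved column and a column in which every residue is unique both score zero (all frequencies are equal), whereas a column mixing frequent and rare residues scores higher; this is the sense of "broad" the sketch assumes.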

References

1. S. Grigolon, S. Franz, M. Marsili, Mol. BioSyst., 2016, DOI: 10.1039/C6MB00047A.

2. N. Halabi, O. Rivoire, S. Leibler, R. Ranganathan, Cell, 138(4):774-86, 2009.


Transcription initiation by alternative sigma factors

Jelena Guzina and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia
{jelenag,dmarko}@bio.bg.ac.rs

Abstract

The ECF (ExtraCytoplasmic Function) subfamily is the largest and most diverse group of alternative σ factors within the σ70 protein family [1]. Although physiologically highly important, the mechanisms of transcription initiation by ECF σ factors are poorly examined. Namely, the current paradigm of ECF σ functioning, which assumes promoter rigidness (i.e. the absence of mix-and-matching), is based on very limited data, centered around a subset of (canonical) σ factors with experimentally established promoter recognition specificity.

To gain more comprehensive insight into ECF σ functioning, we investigate, besides the canonical group, much less studied ECF σ subgroups and the group outliers obtained from recently sequenced bacteriophages [2]. More precisely, by employing an extensive computational comparison of diverse ECF σs and their corresponding promoters, we aim to infer the DNA and protein recognition motifs involved in transcription initiation.
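The abstract does not say how the DNA motifs are inferred. A common, generic building block for this kind of promoter comparison is a position weight matrix (PWM); the sketch below builds a log-odds PWM from aligned sites (a uniform background and a pseudocount of 0.5 are assumptions) and scans a promoter for its best-scoring window. The example sites are invented and are not ECF σ promoter motifs.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds (base 2) position weight matrix from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds for an uppercase ACGT window."""
    return sum(col[b] for col, b in zip(pwm, window))


def best_hit(pwm, promoter):
    """Highest-scoring window in a promoter, as (score, start position)."""
    w = len(pwm)
    return max((score(pwm, promoter[i:i + w]), i) for i in range(len(promoter) - w + 1))
```

For example, a PWM built from the toy sites `["GTCA", "GTCA", "GTCT"]` finds its best hit in `"TTTTGTCATTTT"` at position 4.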

The analysis identifies an extension of the -10 element in phage ECF σ promoters, where a comparison with bacterial σ factors points to a putative 6-aa motif, just C-terminal of domain σ2, responsible for this interaction. Interestingly, a similar protein motif is found C-terminal of domain σ2 in canonical ECF σ factors, at a position suitable for interaction with a conserved DNA motif further upstream of the -10 element. Moreover, the phiEco32 ECF σ lacks a recognizable -35 element and σ4 domain, identified in a homologous phage 7-11, indicating that -35 element interactions can be compensated by the extended -10 element [3].

Overall, our results reveal greater flexibility in ECF σ promoter recognition than previously recognized. The putative non-canonical σ-promoter interactions, along with promoter element complementation, imply that the mix-and-matching mechanism, a hallmark of the σ70 housekeeping factors, may also apply to the ECF group.

Keywords: ECF sigma, bacterial promoters, transcription initiation, σ70 family

References

1. Staroń A, Sofia HJ, Dietrich S, Ulrich LE, Liesegang H, Mascher T. 2009. Molecular Microbiology 74:557-581.

2. Guzina J, Djordjevic M., BMC Evolutionary Biology 15: S(1) (2015).

3. Guzina J, Djordjevic M., submitted for publication (2016).


The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands

Davorka R. Jandrlic1, Nenad S. Mitic2, and Mirjana D. Pavlovic3

1 Faculty of Mechanical Engineering, Department for Mathematics, University of Belgrade, Serbia

2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia

3 Institute of General and Physical Chemistry, Studentski trg 12/V, 11000 Belgrade, Serbia

Abstract

Binding of proteolyzed protein fragments to MHC molecules is the essential and most selective step that determines T cell epitopes. Therefore, prediction of MHC-peptide binding is central to anticipating potential T cell epitopes and is of immense relevance in vaccine design. The large quantity of protein fragments experimentally tested as potential epitopes, together with MHC allele polymorphism, has prompted the development of many computational methods for epitope identification, thus reducing laboratory work and costs. Although some available methods have reasonable accuracy, there is no guarantee that all models produce good-quality predictions [1]. Here, we present new models for quantitatively and qualitatively predicting MHC-binding ligands that use different amino acid properties. The models were built in two steps. In the first step, a new approach is presented for identifying the physicochemical properties most relevant for classifying peptides into MHC-binding and non-binding ligands. For that purpose, classification models that take into account the physicochemical properties of amino acids and their frequencies are developed. The developed classification models are rule based and use the k-means clustering technique to extract the most important physicochemical properties. The obtained results indicate that the physicochemical properties of amino acids contribute significantly to peptide-binding affinity and that different alleles are characterized by different sets of physicochemical properties. In the second step, the results of these models are used as input features for two machine learning models, based on the support vector machine technique, for the classification and regression problems. The resulting models show performance comparable to, and in some cases better than, two of the best currently available predictors: NetMHCpan and SMMPMBEC [2]. The new models could be used as a complement to the best existing methods.
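As an illustration of how "the physicochemical properties of amino acids and their frequencies" can be turned into features, the sketch below encodes a peptide as its 20 amino acid frequencies followed by the frequency-weighted average of each property scale. The Kyte-Doolittle hydropathy scale is used only as an example property; the authors' actual property set, their k-means step and their SVM models are not reproduced, and the function name is hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, used here only as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def encode_peptide(peptide, properties):
    """20 amino acid frequencies, then one frequency-weighted average per property scale."""
    counts = Counter(peptide)
    n = len(peptide)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    averages = [sum(scale[aa] * c for aa, c in counts.items()) / n for scale in properties]
    return freqs + averages
```

A vector of this shape could then be passed to any classifier or regressor; adding further property scales simply appends more averaged columns.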


Keywords: MHC binding prediction, encoding schemas, k-means clustering, SVM classification and regression

References

1. Brusic V., Bajic V.B., Petrovsky N., Computational methods for prediction of T-cell epitopes: a framework for modelling, testing, and applications, Methods, 34(4), 436-443, (2004)

2. Nielsen M., Zhang H., Lundegaard C., Pan-specific MHC class I predictors: a benchmark of HLA class I pan-specific prediction methods, Bioinformatics, (2009)


Networks of interaction in moving animal groups and collective changes of direction

Asja Jelic

The Abdus Salam International Centre for Theoretical Physics (ICTP), Department for Quantitative Life Sciences, Trieste, Italy


Abstract

Animal groups on the move are a paradigmatic example of collective behaviour in social species. The most striking features of this collective motion are sudden, coherent changes in the travel direction of the whole group.

Such coordinated collective behaviour requires fast and robust transfer of information among individuals in order to prevent loss of cohesion. However, little is known about the mechanism by which natural groups achieve this robustness. Furthermore, collective directional switching often emerges not as a response to an external alarm cue, but spontaneously from intrinsic fluctuations in individual behaviour. In particular, the role of the underlying structure of the communication network in these events is not yet clear.

In this talk, I will present an experimental and theoretical study of spontaneous collective turns in natural flocks of starlings [1, 2]. We automatically track the 3D positions and velocities of all individuals in flocks of up to 600 birds for the whole duration of a turning event [3]. This enables us to analyse the changes in the individual behaviour of every group member and to reveal the emergent dynamics of turning. We show that spontaneous turns start from the individuals located at the elongated tips of the flocks and then propagate across the whole group. We find that birds on the tips deviate from the mean direction much more persistently than other individuals, indicating that persistent localized fluctuations trigger collective directional switching. Moreover, our analysis reveals two crucial ingredients that enhance the effect of such noise, leading to collective changes of state: the non-symmetric nature of the interaction between individuals and the presence of heterogeneities in the topology of the network.
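The quantities discussed above, deviation from the mean direction and how persistent it is, can be illustrated with a short sketch. It computes the group polarization and each bird's angular deviation from the mean heading from 3D velocities, plus a hypothetical persistence measure (the fraction of frames in which a bird's deviation exceeds a threshold angle). This is an assumed, simplified definition for illustration, not necessarily the one used in [1, 2].

```python
import math


def unit(v):
    """Normalise a 3D vector (assumes a non-zero velocity)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def deviations_from_mean(velocities):
    """Return (polarization, angles): the length of the mean unit heading and each
    individual's angle in radians to the mean heading (assumes polarization > 0)."""
    headings = [unit(v) for v in velocities]
    mean = [sum(h[k] for h in headings) / len(headings) for k in range(3)]
    polarization = math.sqrt(sum(c * c for c in mean))
    mean_dir = tuple(c / polarization for c in mean)
    angles = []
    for h in headings:
        cos = sum(a * b for a, b in zip(h, mean_dir))
        angles.append(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error
    return polarization, angles


def persistence(angle_series, threshold):
    """Hypothetical persistence: fraction of frames with deviation above threshold."""
    return sum(a > threshold for a in angle_series) / len(angle_series)
```

Two birds flying along perpendicular axes, for instance, give a polarization of about 0.707 and each deviates by π/4 from the mean heading.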

References

1. Attanasi, A. et al.: Information transfer and behavioural inertia in starling flocks. Nature Physics 10, 691–696 (2014)

2. Attanasi, A. et al.: Emergence of collective changes in travel direction of starling flocks from individual birds’ fluctuations. J. Royal Soc. Interface 12 (108), 20150319

3. Attanasi, A. et al.: GReTA – a novel Global and Recursive Tracking Algorithm in three dimensions. IEEE Trans. Pattern Anal. Mach. Intell., vol. 37 (2015)


Filtering of repeat sequences in genomes

Ana Jelovic1, Milos Beljanski2, and Nenad Mitic3

1 Faculty of Transport and Traffic Engineering, University of Belgrade, 305 Vojvode Stepe, 11000 Belgrade, Serbia

2 Institute of General and Physical Chemistry, Studentski trg 12, 11000 Belgrade, Serbia

3 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia

Abstract

Finding repeat sequences in nucleic acids and proteins is of great importance in biology. A number of tools are able to extract these sequences efficiently. However, if we search for repeated sequences in a completely random, computer-generated sequence of any meaningful length, we will still find a large number of matches. Extracting all repeated sequences from a genome will therefore yield a mixture of sequences that are important for its function and organization and randomly occurring sequences that are effectively noise.

We developed a method for efficiently estimating the probability that a group of found repeated sequences occurs randomly, and an accompanying program that finds the repeated sequences and then filters them based on a given probability threshold. What makes our method different from existing ones is that we group the results not only by repeat length but also by number of occurrences. Short repeated sequences that occur many times may be statistically significant, while longer repeated sequences occurring only a few times may not be. When the minimal sequence length is relatively low and a genome yields a large number of repeated sequences, our method provides a significant gain in performance and quality of results compared to outputting all the found sequences.
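The abstract does not give the probability model. A minimal sketch, assuming a uniform i.i.d. random sequence and a Poisson approximation that ignores overlaps between occurrences, groups the repeated words of one length by their occurrence count k and compares each group's size with the number of distinct words expected to occur exactly k times by chance. The function names, the `alphabet_size` parameter (4 for nucleic acids, 20 for proteins) and the 0.05 threshold are hypothetical, and this is not the authors' method.

```python
import math
from collections import Counter


def repeat_groups(seq, length, alphabet_size=4):
    """Map occurrence count k -> (sorted repeated words, fraction of the group
    that a uniform random sequence would explain, capped at 1)."""
    n_windows = len(seq) - length + 1
    counts = Counter(seq[i:i + length] for i in range(n_windows))
    log_words = length * math.log(alphabet_size)  # log of the number of possible words
    log_lam = math.log(n_windows) - log_words     # log mean occurrences of one word
    lam = math.exp(log_lam)
    groups = {}
    for word, k in counts.items():
        if k >= 2:  # a repeat occurs at least twice
            groups.setdefault(k, []).append(word)
    result = {}
    for k, words in groups.items():
        # expected number of distinct words seen exactly k times (Poisson, log space)
        log_expected = log_words + k * log_lam - lam - math.lgamma(k + 1)
        expected = math.exp(log_expected)
        result[k] = (sorted(words), min(1.0, expected / len(words)))
    return result


def filter_groups(groups, threshold=0.05):
    """Keep only groups whose chance-explained fraction is at most the threshold."""
    return {k: v for k, v in groups.items() if v[1] <= threshold}
```

Grouping by (length, count) rather than by length alone is what lets a group of short but very frequent words pass the filter while a group of longer but rarely repeated words is discarded.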

The method can be applied to both nucleic acid and protein sequences. We found that, as expected, longer repeated sequences mostly have a higher probability of being statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than longer ones.


Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes?

Ivan Jovanovic1, Maja Zivkovic1, Jasmina Jovanovic2, Tamara Djuric1, and Aleksandra Stankovic1

1 VINCA Institute of Nuclear Sciences, University of Belgrade, Laboratory for Radiobiology and Molecular Genetics, Mike Petrovica Alasa 12-14, 11001 Belgrade, Serbia
{ivanj,majaz,tamariska,alexas}@vin.bg.ac.rs

2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia

Abstract. The action of microRNAs (miRs) as post-transcriptional regulators of gene expression is being recognized as one of the critical processes that affect type 2 diabetes (T2D) progression. Cell-cell signaling via paracrine or even endocrine routes is mediated by miRs released from human tissue. Therefore, the aim of our study was to bioinformatically predict, from microarray gene expression analysis of whole blood, the miRs that play a role in pancreatic β cell function in human T2D. We have demonstrated that gene expression signatures identified in whole blood correspond to the miR expression changes specific to pancreatic tissue during insulin resistance. Further experimental studies should follow in order to characterize the described effects as early prognostic biomarkers of insulin resistance and T2D.

Keywords: type 2 diabetes, microRNA, microarray gene expression, bioinformatic integrative approach

1. Introduction

Type 2 diabetes (T2D) is a complex disease generally characterized by insulin resistance and increased hepatic glucose production. The rapidly increasing prevalence of T2D is motivating an intensive search for biomarkers of the disease as well as for novel therapeutic targets. The action of microRNAs (miRs) as post-transcriptional regulators of gene expression is being recognized as one of the critical processes that affect T2D progression. Therefore, these small non-coding RNAs, which regulate gene expression predominantly by promoting mRNA degradation, exhibit great biomarker and therapeutic potential. It has also been described that all human cells can release miRs, which mediate cell-cell signaling via paracrine or even endocrine routes.

Recently, microarray whole-genome expression data and miR target predictions from multiple prediction algorithms were linked using a multivariate statistical technique called Co-Inertia Analysis (CIA) in order to predict miR activity and to associate specific miRs with different diseases. These studies have shown that the CIA method provides good-quality predictions of miR activity. It was suggested that CIA is complementary to other previously described prediction approaches and could thus predict miRs unidentified by others. So far, this integrative approach has not been used for the analysis of circulating miRs that may originate from pancreatic tissue in T2D. Therefore, the aim of our study was to bioinformatically predict, from microarray gene expression analysis of whole blood, the miRs that play a role in pancreatic β cell function in human T2D.

2. Materials and Methods

2.1. Gene expression data

The gene expression data set used in our study was downloaded from the Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo/), accession number GSE26168. The total mRNA expression of whole blood from T2D patients and healthy controls was profiled on the Illumina HumanRef-8 v3.0 expression BeadChip. The data were background-subtracted, and quantile normalization was performed prior to the analysis.
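Quantile normalization, mentioned above, forces all arrays to share the same distribution of values. A minimal pure-Python sketch of the standard procedure follows; it is an illustration rather than the preprocessing code actually used for GSE26168, and ties are broken by position instead of being averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """samples: list of equal-length value lists, one per array (sample)."""
    n = len(samples[0])
    sorted_cols = [sorted(s) for s in samples]
    # mean of the r-th smallest value across all samples
    rank_means = [sum(col[r] for col in sorted_cols) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = rank_means[r]  # the r-th smallest value gets the r-th rank mean
        normalized.append(out)
    return normalized
```

After normalization every sample contains exactly the same set of values (the rank means); only their assignment to genes differs between samples.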

2.2. Co-Inertia analysis

CIA was used to link microarray gene expression data (8 T2D patients and 8 controls) and miR target predictions from multiple prediction algorithms in order to associate specific miR activity with T2D. This multivariate statistical technique simultaneously analyzes two linked data tables. The tables are treated as two sets of measurements on the same objects, genes. One of the tables is the mRNA gene expression table of g genes from n samples, and the other contains the predicted target counts of all miRs for the same g genes. Non-symmetric correspondence analysis (NSC) was used as the ordination method of CIA; it summarizes each data table in a low-dimensional space by projecting the samples onto axes that maximize the variances of the coordinates of the projected points. CIA performs two simultaneous NSCs on the two linked tables and identifies pairs of axes from the two datasets that are maximally covariant. This unsupervised method was used for visual inspection of the data. By further use of Between Group Analysis (BGA), which forces an ordination to be carried out on groups of samples rather than on individual samples, CIA was directed to find the maximum covariance between the gene expression difference between groups of samples and the miR-gene target frequency tables. For the specified split in the data that contrasts T2D and control samples, we obtained a ranked list of miR motifs. CIA generates as many miR rank lists as there are target prediction algorithms. The most extreme values of the ranking lists (top 20 and last 20) were used to predict upregulated and downregulated miRs in T2D. The lists were combined using consistency among the methods, according to a previous study. The complete analysis was performed with the MADE4 R package.
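The core co-inertia step can be sketched in pure Python: with genes as the shared rows, centre the columns of both tables, form the samples × miRs cross-covariance matrix, and take its leading singular pair by power iteration. This deliberately simplifies the analysis described above: it uses plain column centring with uniform gene weights instead of NSC and omits BGA. The authors used the MADE4 R package, whose interface is not reproduced here, and all function names are hypothetical.

```python
import math
import random


def center_columns(table):
    """Subtract each column's mean (genes are rows, variables are columns)."""
    n_rows = len(table)
    means = [sum(col) / n_rows for col in zip(*table)]
    return [[x - m for x, m in zip(row, means)] for row in table]


def cross_covariance(X, Y):
    """Samples x miRs covariance computed across the shared genes."""
    Xc, Yc = center_columns(X), center_columns(Y)
    g = len(X)
    return [[sum(Xc[i][j] * Yc[i][l] for i in range(g)) / g for l in range(len(Y[0]))]
            for j in range(len(X[0]))]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def first_coinertia_axes(X, Y, iterations=200, seed=0):
    """Power iteration for the leading singular pair (u, v) of the cross-covariance;
    the returned value is the covariance of the two projections (u^T C v)."""
    C = cross_covariance(X, Y)
    n, m = len(C), len(C[0])
    rng = random.Random(seed)
    u = normalize([rng.random() for _ in range(n)])
    v = [0.0] * m
    for _ in range(iterations):
        v = normalize([sum(C[j][l] * u[j] for j in range(n)) for l in range(m)])
        u = normalize([sum(C[j][l] * v[l] for l in range(m)) for j in range(n)])
    covariance = sum(u[j] * C[j][l] * v[l] for j in range(n) for l in range(m))
    return u, v, covariance
```

Here `X` would be the g × n expression table and `Y` the g × m target-count table; `u` is the sample-space axis and `v` the miR-space axis, so miRs with extreme coordinates on `v` are the ones most associated with the main expression contrast.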


2.3. miR target prediction

Five sequence-based miR target prediction algorithms were used for CIA, following Madden et al.: TargetScan and TargetScanS, PicTar4way and PicTar5way, and miRanda. Each of these algorithms uses complementarity with the miR seed and cross-species conservation in its predictions. The miR target prediction data used as CIA input, extracted from these databases, were organized into gene/miR frequency tables of counts of predicted targets per gene for each algorithm. The gene/miR frequency tables for the sequence-based predictions originated from the TargetScan website http://www.targetscan.org/ (version 4.1), the UCSC Genome Browser track for PicTar4way and PicTar5way (http://genome.ucsc.edu/), and miRBase for miRanda (http://microrna.sanger.ac.uk/sequences/).
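A gene/miR frequency table of the kind described above can be sketched as follows, assuming each prediction record is a (gene, miR) pair representing one predicted target site. The gene and miR names in the usage note are placeholders, the function names are hypothetical, and parsing of the actual TargetScan, PicTar and miRBase files is not shown.

```python
from collections import Counter


def frequency_table(predictions, genes, mirs):
    """Rows = genes, columns = miRs, entries = number of predicted target records."""
    counts = Counter(predictions)  # predictions: iterable of (gene, miR) pairs
    return [[counts[(gene, mir)] for mir in mirs] for gene in genes]


def tables_per_algorithm(predictions_by_algorithm, genes, mirs):
    """One frequency table per prediction algorithm, over the same genes and miRs."""
    return {name: frequency_table(preds, genes, mirs)
            for name, preds in predictions_by_algorithm.items()}
```

Using the same gene list as the expression table keeps the rows of both CIA inputs aligned; for example, records `("GENE_A", "miR-x")` twice and `("GENE_B", "miR-y")` once give a row of counts `[2, 0]` for GENE_A and `[0, 1]` for GENE_B.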

3. Results

CIA was first used in an unsupervised manner for data exploration. Figure 1 shows an example of unsupervised CIA using the PicTar5way target prediction program. The plot has two parts and depicts a correspondence analysis of T2D patient and control samples, and of the miRs associated with the gene expression pattern characteristic of the two groups of samples. The observed split in the data shows a clear difference between the gene expression profiles of the analyzed groups (Figure 1). CIA performed in conjunction with correspondence analysis and between group analysis produced five ranked lists of miRs associated with the specific gene expression profile in T2D. Using consistency among methods, we characterized potentially upregulated and downregulated circulating miRs responsible for the whole-blood gene expression signature in T2D (data not shown).
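The exact consistency rule of the earlier study is not given here. One simple reading, assumed for illustration only, is to call a miR upregulated (or downregulated) when it falls within the top (or last) n positions of at least a minimum number of the ranked lists. The mapping of the top of a list to upregulation, the function name and the `min_lists` default are hypothetical.

```python
from collections import Counter


def consistent_mirs(ranked_lists, n_extreme=20, min_lists=3):
    """Return (up, down): miRs in the top / last n_extreme of at least min_lists lists."""
    up, down = Counter(), Counter()
    for ranking in ranked_lists:
        up.update(ranking[:n_extreme])
        down.update(ranking[-n_extreme:])
    return ({m for m, c in up.items() if c >= min_lists},
            {m for m, c in down.items() if c >= min_lists})
```

With five algorithms, `min_lists=3` corresponds to a simple majority vote; a stricter rule would require agreement across all lists.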

The clear clustering of T2D samples and controls shown in Figure 1 reflects homogeneous whole-blood genome expression (from the microarray experiment) in T2D patients that differs from that of control samples. This makes the data suitable for further, supervised CIA. Our preliminary results indicate successful prediction of miRs from blood and the applicability of our approach for selecting T2D-associated miRs as potential new molecular biomarkers for this disease.

By inspecting the results, along with literature mining, we found that two of the highly ranked miRs (Table 1) are important factors in pancreatic β cell proliferation in response to hyperglycemia and insulin resistance, which is the hallmark of T2D.

Table 1. Ranking of the selected miRs according to CIA performed with 5 prediction algorithms, covering the top and last 20 miRs. P4W, PicTar4way; P5W, PicTar5way; TS, TargetScan; TSS, TargetScanS.

| miR | P4W | P5W | TS | TSS | miRanda | Predicted regulation in T2D |
|---|---|---|---|---|---|---|
| miR-375 | 6 | - | 11 | 2 | - | UP |
| miR-184 | 8 | 1 | 17 | 7 | - | DOWN |


Fig. 1. Axes of the unsupervised CIA performed on the whole-genome gene expression data of T2D patients and controls. The gene/miR frequency table generated with PicTar5way was used to make this figure.

4. Discussion and conclusion

Blood miR expression patterns with disease-specific signatures have been reported for various human diseases. One of the first studies showed by sequencing that patients with T2D have a significantly altered expression profile of serum miRs. This approach has also been favored for detecting miRs in the blood and other body fluids of T2D patients.

Using a bioinformatic approach that combines microarray gene expression and miR target predictions from multiple prediction algorithms, we have associated specific circulating miRs with T2D. In this study, we focused on two of the most noteworthy miRs, functionally associated in a network within the miR pathway that coordinately regulates the compensatory proliferation of pancreatic β cells in T2D.

miR-184 is unique in pancreatic islets as the most downregulated miR during insulin resistance. miR-184 has been described as an inhibitor of Ago2. Increased expression of Ago2 facilitates the function of the already upregulated miR-375 in suppressing genes, including the growth suppressor Cadm1, in vivo, thus inducing β cell proliferation and accommodating the elevated demand for insulin. Therefore, the miR-184/miR-375 network is an essential component of the compensatory response that regulates β cell proliferation in relation to insulin sensitivity and metabolic stress.

The most important finding of our study is that whole-blood gene expression signatures reflect the miR expression changes specific to pancreatic tissue during insulin resistance. This is the first bioinformatic study showing that tissue-released miRs affect whole-blood gene expression in T2D. Although the hormone-like effect of extracellular miRs in the blood is still debated, the results of our study suggest that certain circulating miRs could be systemic biomarkers of pancreatic tissue changes in T2D. Our predictions are also in agreement with microarray expression results for circulating miRs in T2D.

The obtained results are important for understanding the complex nature of miRs. We also demonstrate here the crucial need for integrative bioinformatic concepts in further research on the molecular processes of T2D. Finally, further experimental studies should follow in order to characterize the described effects as early prognostic biomarkers of insulin resistance and T2D.

References

1. Guo, H., Ingolia, NT., Weissman, JS., Bartel, DP.: Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature, Vol. 466, No. 7308, 835-40. (2010)

2. Chen, K., Rajewsky, N.: The evolution of gene regulation by transcription factors and microRNAs. Nature Reviews Genetics, Vol. 8, No. 2, 93-103. (2007)

3. Turchinovich, A., Samatov, TR., Tonevitsky, AG., Burwinkel, B.: Circulating miRNAs: cell-cell communication function? Frontiers in Genetics, Jun 28;4:119. (2013)

4. Madden, SF., Carpenter, SB., Jeffery, IB., Björkbacka, H., Fitzgerald, KA., O’Neill, LA., Higgins, DG.: Detecting microRNA activity from gene expression data. BMC Bioinformatics, 11: 257. (2010)

5. Mulrane, L., Madden, SF., Brennan, DJ., Greme, G., McGee, SF., McNally, S., et al.: miR-187 is an independent prognostic factor in breast cancer and confers increased invasive potential in vitro. Clinical Cancer Research, Vol. 18, No. 24, 6702-13. (2012)

6. Jovanović, I., Živković, M., Jovanović, J., Djurić, T., Stanković, A.: The co-inertia approach in identification of specific microRNA in early and advanced atherosclerosis plaque. Medical Hypotheses, Vol. 83, No. 1, 11-5. (2014)

7. Arora, A., Simpson, DA.: Individual mRNA expression profiles reveal the effects of specific microRNAs. Genome Biology, Vol. 9, No. 5, R82. (2008)

8. Karolina, DS., Armugam, A., Tavintharan, S., Wong, MT., Lim, SC., Sum, CF., Jeyaseelan, K.: MicroRNA 144 impairs insulin signaling by inhibiting the expression of insulin receptor substrate 1 in type 2 diabetes mellitus. PLoS One, Vol. 6, No. 8, e22839. (2011)

9. Culhane, AC., Perrière, G., Considine, EC., Cotter, TG., Higgins, DG.: Between-group analysis of microarray data. Bioinformatics, Vol. 18, No. 12, 1600-8. (2002)

10. Culhane, AC., Thioulouse, J., Perrière, G., Higgins, DG.: MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics, Vol. 21, No. 11, 2789-90. (2005)

11. Keller, A., Leidinger, P., Vogel, B., Backes, C., ElSharawy, A., Galata, V., et al.: miRNAs can be generally associated with human pathologies as exemplified for miR-144. BMC Medicine, 12:224. (2014)

12. Chen, X., Ba, Y., Ma, L., Cai, X., Yin, Y., Wang, K., et al.: Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Research, Vol. 18, No. 10, 997-1006. (2008)

13. Collares, CV., Evangelista, AF., Xavier, DJ., Rassi, DM., Arns, T., Foss-Freitas, MC., et al.: Identifying common and specific microRNAs expressed in peripheral blood mononuclear cell of type 1, type 2, and gestational diabetes mellitus patients. BMC Research Notes, 6:491. (2013)

14. Zampetaki, A., Kiechl, S., Drozdov, I., Willeit, P., Mayr, U., Prokopi, M., et al.: Plasma microRNA profiling reveals loss of endothelial miR-126 and other microRNAs in type 2 diabetes. Circulation Research, Vol. 107, No. 6, 810-7. (2010)

15. Kong, L., Zhu, J., Han, W., Jiang, X., Xu, M., Zhao, Y., et al.: Significance of serum microRNAs in pre-diabetes and newly diagnosed type 2 diabetes: a clinical study. Acta Diabetologica, Vol. 48, No. 1, 61-9. (2011)

16. Tattikota, SG., Rathjen, T., McAnulty, SJ., Wessels, HH., Akerman, I., van de Bunt, M., et al.: Argonaute2 Mediates Compensatory Expansion of the Pancreatic β Cell. Cell Metabolism, Vol. 19, No. 1, 122-134. (2014)


Mechanism of unusual flexibility of DNA TATA-box

Polina Kanevska and Sergey Volkov

Bogolyubov Institute for Theoretical Physics, 14-b, Metrolohichna str. Kyiv, 03680,Ukraine


Abstract

DNA is a macromolecule with a variety of secondary structures, which vary depending on nucleotide content and sequence, solution conditions, and interactions with proteins or ligands [1]. The variations in helix geometry act as conformational information for regulatory proteins and enzymes. That is why studying conformational transformations of the DNA double helix is an approach to understanding the mechanisms of many genetic processes.

In our previous works we revealed that a polymorphic macromolecule can have a specific deformation mechanism due to the appearance of localized conformational excitations [2]. In this case, macromolecular deformation occurs because of the interrelation between internal (conformational) and external (elastic) components, and its energy cost is smaller than that of the same deformation in an elastic approach such as the worm-like chain (WLC).

The internally induced mechanism of macromolecular deformation is used to describe the anomalously large bending of a DNA fragment with a specific base-pair sequence, the TATA-box, which is a functionally important part of the gene. It is argued that some decrease in bending stiffness leads to a drastic increase in the bend magnitude, while the energy drops to a value lower than the elastic energy of an equal bend. The bending stiffness parameter can regulate the bend magnitude more strongly than the WLC predicts. Our approach agrees with a recent exploration of DNA polymorphism that demonstrates the bimodality of the regulatory fragment [3].
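For reference, the elastic (WLC) benchmark against which the abstract compares is the textbook energy of a uniform bend of angle θ over contour length L for a rod of bending stiffness κ, with κ expressed through the persistence length P. The numbers below are purely illustrative assumptions (P ≈ 50 nm, a commonly quoted value for DNA; an 8-bp segment of 8 × 0.34 nm; a 90° bend) and are not taken from the authors' model.

```latex
E_{\mathrm{WLC}}(\theta) = \frac{\kappa\,\theta^{2}}{2L}, \qquad \kappa = k_{B}T\,P
\quad\Longrightarrow\quad
\frac{E_{\mathrm{WLC}}}{k_{B}T} = \frac{P\,\theta^{2}}{2L}
\approx \frac{50\,\mathrm{nm}\times(\pi/2)^{2}}{2\times 2.72\,\mathrm{nm}} \approx 22.7
```

Because this energy is linear in κ, the WLC predicts that halving the stiffness only halves the cost of a given bend; the abstract argues that, with internally induced deformation, the bend magnitude responds to stiffness much more strongly than this linear dependence suggests.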

Keywords: DNA TATA-box, conformational information, flexibility

References

1. Saenger, W.: Principles of Nucleic Acid Structure, 200–241. Springer, New York (1984)

2. Kanevska, P. P. and Volkov, S. N.: Intrinsically induced deformation of a DNA macromolecule, Ukr. J. Phys., 51, 1003–1009. (2006)

3. Dans, P. D. and Perez, A. and Faustino, I. and Lavery, R. and Orozco, M.: Exploring polymorphisms in B-DNA helical conformations, Nucleic Acids Res, 40(21), 10668–10678 (2012)


One structured output learning method for protein function prediction

Jovana Kovacevic1, Predrag Radivojac2, Gordana Pavlovic-Lazetic1

1 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade,Serbia

{jovana,gordana}@matf.bg.ac.rs

2 Department of Computer Science and Informatics, Indiana University, 150 South Woodlawn Avenue, Bloomington, Indiana, USA

Abstract

The task of structured output learning is to learn a function that enables prediction of complex objects, such as sequences, trees or graphs, for a given input. One of the problems where such methods can be applied is protein function prediction, where the aim is to find one or more functions that a protein performs in a cell according to its characteristics, such as its primary sequence, phylogenetic information, protein-protein interactions, etc. The space of all known protein functions is defined by a directed acyclic graph known as the Gene Ontology (GO), where each node represents one function and each edge encodes a relationship such as is-a, part-of, etc. Each output, on the other hand, represents a subgraph of GO that is consistent in the sense that it contains a protein's functions propagated to the root of the ontology.
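Consistency of an output subgraph, as defined above, can be checked or enforced by closing a set of terms under the parent relation. A minimal sketch with a toy ontology follows; the term names and function names are hypothetical, and real GO parsing is not shown.

```python
def propagate_to_root(terms, parents):
    """Close a set of terms under the parent relation (is-a, part-of, ...)."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def is_consistent(terms, parents):
    """True if every parent of every term in the set is also in the set."""
    return all(p in terms for t in terms for p in parents.get(t, ()))
```

With a toy ontology `{"c": ["b"], "b": ["root"], "d": ["root"]}`, propagating `{"c"}` yields `{"c", "b", "root"}`, which is consistent, whereas `{"c", "b"}` alone is not.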

In this research, we developed a structured output predictor that determines protein function from the histogram of 4-grams appearing in the protein's sequence. The predictor is based on structural support vector machines (SSVM), a machine learning method that generalizes the well-known SVM optimizers to structured outputs. Adapting SSVM to this specific problem required the development of an optimization algorithm that maximizes an objective function over the vast set of all possible consistent subgraphs of protein functional terms, as well as a careful choice of loss functions. We tested the proposed method on sets of proteins from five different organisms and investigated the influence of a protein's origin on the quality of function prediction.
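The input representation, a histogram of 4-grams, can be sketched as a sparse count of overlapping 4-letter windows; a dense vector would have 20^4 = 160,000 entries for the standard amino acid alphabet. Whether the original counts were normalized is not stated, so raw counts are an assumption here.

```python
from collections import Counter

def kmer_histogram(sequence, k=4):
    """Count overlapping k-grams (default: 4-grams) in a protein sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

hist = kmer_histogram("MKKLLKKLL")
print(hist.most_common(2))
```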

Keywords: Protein function prediction, structured output learning


Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells

Stefan Kirsch1, Urs Lahrmann1, Miodrag Guzvic2, Zbigniew T. Czyz1, Giancarlo Feliciello1, Bernhard Polzer1 and Christoph A. Klein1,2

1 Fraunhofer ITEM - Project Group Personalized Tumor Therapy, Am Biopark 9, 93053 Regensburg, Germany

2 University of Regensburg - Experimental Medicine and Therapy Research, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany

Abstract

The most frequent cause of death among prostate cancer patients is the manifestation of bone metastases. It is therefore of high therapeutic relevance to identify how and when metastatic growth is induced. Disseminated cancer cells (DCCs) may provide an opportunity to dissect the early stages of systemic cancer and to detect critical therapeutic targets. To achieve a comprehensive characterization of these cells, we developed a method for combined whole genome and whole transcriptome analysis [1].

We analyzed 36 samples in total (24 EPCAM-positive DCCs; 1 single cell and 5 cell pools of the VCaP prostate cancer cell line; 6 cells from healthy donors as controls). After isolation and quality control with PCR-based QC assays, the DCCs were subjected to combined whole genome and whole transcriptome amplification using the Ampli1 WGA and Ampli1 WTA approach [1]. WGA products were then hybridized on high-resolution SurePrint aCGH arrays and analyzed with the Agilent Genomic Workbench software. WTA products were sequenced on the Roche 454 GS FLX+ system and analyzed with an in-house bioinformatics pipeline that included quality control, read mapping, differential expression analysis, pathway analysis, fusion gene prediction and variant calling.

Comprehensive transcript expression and cluster analysis revealed different groups within our cell collection. Expression of classical cancer marker genes such as KLK3 (PSA) or AR was identified in only one of these groups, while the others showed a strongly different expression signature. A further comparison of expression data, mutational profiles and aCGH data uncovered group-specific features and allowed us to link copy number alterations to corresponding changes in gene expression dosage.

Keywords: bioinformatics, single cell, whole genome, whole transcriptome


References

1. Klein, C.A., Seidl, S., Petat-Dutter, K., Offner, S., Geigl, J.B., Schmidt-Kittler, O., Wendler, N., Passlick, B., Huber, R.M., Schlimok, G., Baeuerle, P.A., Riethmüller, G.: Combined transcriptome and genome analysis of single micrometastatic cells. Nat. Biotechnol. 20(4), 387–392 (2002)


The perceptual structure of the phoneme manifold

Yair Lakretz1,2,3, Evan-Gary Cohen1, Naama Friedmann1, Gal Chechik2, and Alessandro Treves3

1 Tel-Aviv University, Tel-Aviv

2 Bar Ilan University (Israel)

3 SISSA, 265 Via Bonomea, Trieste, Italy

Abstract

Theories of phoneme representation have been based on the notion of "subphonemic features", i.e. variables such as place of articulation, voicing and nasalization, some binary and some multi-valued, that can be taken to characterize the production, and with some modifications also the perception, of different phonemes. However, perceptual confusion rates between phonemes cannot be explained simply by the number of subphonemic features whose values differ between them. Moreover, assuming a discrete nature for these variables is incongruent with the continuous, analog neural processes that underlie the production and perception of phonemes, and with the remarkable cross-linguistic differences observed, which make the notion of a universal phonemic space rather implausible. As a first step towards a plausible neuronal theory of how phoneme representations may self-organize in each individual during language learning, we describe methods to derive distinct "weights" for different features from behavioral or neural data. Such weights provide a data-driven metric for the perceptual or motor phoneme manifold. We find that the weights differ by more than an order of magnitude, and differ across languages, pointing to the need to go beyond the classical digital description of phonemes.
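A feature-weighted metric of this kind can be sketched as a weighted Hamming distance: each feature on which two phonemes differ contributes its weight instead of 1, so the classical count of differing features is the special case with all weights equal. The feature values and weights below are hypothetical placeholders, and how the weights are estimated from data is not shown.

```python
def weighted_feature_distance(p, q, weights):
    """Sum the weights of the features on which phonemes p and q differ.

    p, q: dicts mapping feature name -> value (binary or multi-valued).
    weights: dict mapping feature name -> non-negative weight.
    """
    return sum(w for f, w in weights.items() if p.get(f) != q.get(f))

# Hypothetical feature values for three consonants.
phonemes = {
    "p": {"place": "labial", "voicing": "voiceless", "nasal": "no"},
    "b": {"place": "labial", "voicing": "voiced", "nasal": "no"},
    "m": {"place": "labial", "voicing": "voiced", "nasal": "yes"},
}
uniform = {"place": 1.0, "voicing": 1.0, "nasal": 1.0}
# Hypothetical data-driven weights that differ by more than an order of magnitude.
learned = {"place": 1.0, "voicing": 0.1, "nasal": 1.5}

for a, b in [("p", "b"), ("b", "m")]:
    print(a, b,
          weighted_feature_distance(phonemes[a], phonemes[b], uniform),
          weighted_feature_distance(phonemes[a], phonemes[b], learned))
```

Under uniform weights both pairs are equally distant; with the learned weights, the voicing contrast becomes far smaller than the nasality contrast.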


Model selection in biomolecular pathways

Hanen Masmoudi

Higher Institute of Biotechnology of Sfax

Abstract

We present a generalized Expectation-Maximization (GEM) algorithm [1] for learning the parameters of finite Gaussian mixture distributions within a graphical modeling framework. The GEM algorithm iterates over three steps. As in the EM algorithm, an expectation (E) step evaluates the expected log-likelihood using the current parameter estimates, and a maximization (M) step then maximizes it with respect to the parameters. We added a third (G) step, in which the M-step estimates are updated using the Lauritzen formula [2], given a graph that indicates the relationships between nodes. We apply the GEM algorithm to biomolecular interaction networks, in which arcs represent probabilistic relationships (regulation, interaction) between nodes or variables (proteins, genes, molecules, ...). We conducted a simulation study of the signal transduction network of a simple biomolecular pathway of the epidermal growth factor receptor (EGFR) protein [3], and demonstrate that the GEM algorithm allows the data to be classified into the Gaussian mixture components and permits the selection of the network that best fits the data.
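The E and M steps that GEM shares with EM can be sketched for a one-dimensional Gaussian mixture as follows. This is plain EM under simplifying assumptions (univariate data, a fixed number of components, a small variance floor); the graph-based G step using the Lauritzen formula is not reproduced here.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=100, var_floor=1e-6):
    """Plain EM for a one-dimensional finite Gaussian mixture (no G step)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_iter):
        # E step: posterior probability (responsibility) of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: maximize the expected complete-data log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor,  # keeps a component from collapsing onto a single point
            )
    # Hard classification: assign each point to its most responsible component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels

data = [0.1, -0.2, 0.0, 0.3, 5.1, 4.8, 5.3, 5.0]
means, variances, weights, labels = em_gmm_1d(data, [0.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print(labels)
print([round(m, 2) for m in means])
```

The final hard assignment corresponds to "classification of the data to each Gaussian distribution cluster"; a GEM variant would insert its graph-constrained update after the M step inside the loop.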

Keywords: EM algorithm, EGFR, Bayesian Network, Selection

References

1. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. B 39, 1–38.

2. Lauritzen, S.L.: Graphical Models. Oxford University Press (1996)

3. Ben Hassen, H., Masmoudi, A., Rebai, A.: Causal inference in Biomolecular Pathways using a Bayesian network approach and an Implicit method. J. Theor. Biol. 4, 717–724 (2008)


Hybrid methodology for information extraction from tables in the biomedical literature

Nikola Milosevic1, Cassie Gregson2, Robert Hernandez2, and Goran Nenadic1

1 School of Computer Science, University of Manchester, Oxford Road, Manchester, UK
{nikola.milosevic,gnenandic}@manchester.ac.uk

2 AstraZeneca plc, Cambridge, UK
{cassie.gergson,rob.hernandez}@astrazeneca.com

Abstract. Scientific literature, especially in the biomedical domain, is growing exponentially. Text mining can provide methods and tools that help professionals handle large amounts of literature. However, most current approaches focus on the textual body of the article, usually ignoring tables and figures. In this paper, we present a hybrid methodology that uses machine learning and a set of heuristic rules for information extraction from tables in the literature. In a case study, the method achieved an F1-score of 83.94% for extracting the number of patients, together with the names of the participant groups, from clinical trial publications.

Keywords: health informatics, text mining, table mining, clinical trials

1. Introduction

The literature in the biomedical domain is growing exponentially; currently, there are over 25 million articles indexed in MEDLINE. The fields of natural language processing and text mining have developed methods and tools that help with processing large amounts of literature and retrieving information of interest. However, most current approaches are limited to the textual body of articles, usually ignoring figures and tables. Yet authors of scientific literature use tables to present detailed information about the settings and the results of their experiments. Tables are also used for other purposes, whenever authors need to present a relatively large amount of multidimensional information in a compact manner [13].

Tables may have various structures, and the information can be presented in a vast variety of formats. The structure of a table defines the relationships between its cells. In order to "read" a table correctly, these relationships and the roles of the cells need to be recognized. Current representational models of tables mainly focus on visualization, which makes automated table processing a complex task: the visual structure has to be disentangled before the presented data can be analyzed.

Previous work has addressed the detection of tables in documents (PDF, text and HTML) using optical character recognition [6], machine learning algorithms such as decision trees [10] and Support Vector Machines [12], and heuristics [16]. Recognizing the functional areas of a table (headers and data areas) has been done mainly using machine learning methods such as decision trees [2] or conditional random fields [15]. Some work has also been done in the areas of information retrieval [5], information extraction [9, 13] and question answering [15]. However, most of these approaches examined the general domain and were limited to visually simple tables.

Fig. 1. Example of a baseline characteristic table reporting the number of participants (PMC2147028)

In this paper, we introduce a methodology for information extraction from tables in the biomedical domain and present a case study on extracting the number of patients and the participant groups from clinical trial literature.

2. Method

Our approach consists of six steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, and (6) syntactic processing and information extraction. As a data set to test our method, we used clinical documents stored as open access in PubMed Central³. As these documents are in XML format, it is trivial to detect tables by searching for a particular XML tag. We therefore focus on the other steps.
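Table detection in XML articles can be sketched with the Python standard library. The sketch assumes JATS-style markup, where each table sits in a <table-wrap> element with a <label>, a <caption> and a <table>, and that these elements carry no namespace; the sample article is a hypothetical fragment.

```python
import xml.etree.ElementTree as ET

def find_tables(xml_text):
    """Yield (label, caption, table_element) for every <table-wrap> in the article."""
    root = ET.fromstring(xml_text)
    for wrap in root.iter("table-wrap"):
        label = wrap.findtext("label", default="").strip()
        caption_el = wrap.find("caption")
        caption = "".join(caption_el.itertext()).strip() if caption_el is not None else ""
        yield label, caption, wrap.find(".//table")

sample = """<article><body><sec>
  <table-wrap id="T1"><label>Table 1</label>
    <caption><p>Baseline characteristics</p></caption>
    <table><thead><tr><td>Characteristic</td><td>Placebo (n = 120)</td></tr></thead>
      <tbody><tr><td>Age, years</td><td>54.2</td></tr></tbody></table>
  </table-wrap>
</sec></body></article>"""

for label, caption, table in find_tables(sample):
    print(label, "|", caption, "|", len(table.findall(".//tr")), "rows")
```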

2.1. Functional processing

The aim of functional processing is to detect the basic roles of cells. A cell can be a column header, a row header, a super-row (a row, or part of the row header, that further categorizes the row headers below it) or a data cell. To detect the functional roles of the cells, we used a set of heuristics about the cells' positions, their neighbors, content type, surrounding XML tags and XML attributes (such as span).
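A minimal sketch of such positional heuristics is shown below. The three rules are hypothetical simplifications rather than the authors' rule set: rows inside <thead> are column headers, a body row whose only non-empty cell is the first one is a super-row, and otherwise the first cell is the row header (stub) and the rest are data cells. The span attribute is not used here.

```python
import xml.etree.ElementTree as ET

def _texts(tr):
    """Text of each <td>/<th> in a row, in document order."""
    return ["".join(c.itertext()).strip() for c in tr if c.tag in ("td", "th")]

def cell_roles(table):
    """Label non-empty cells; returns (row_index, col_index, text, role) tuples."""
    thead = table.find("thead")
    # Remember header rows by object identity.
    head_ids = {id(tr) for tr in thead.iter("tr")} if thead is not None else set()
    roles = []
    for r, tr in enumerate(table.iter("tr")):
        texts = _texts(tr)
        filled = [c for c, t in enumerate(texts) if t]
        for c in filled:
            if id(tr) in head_ids:
                role = "column header"
            elif filled == [0]:
                role = "super-row"
            elif c == 0:
                role = "row header"
            else:
                role = "data"
            roles.append((r, c, texts[c], role))
    return roles

table = ET.fromstring(
    "<table><thead><tr><td></td><td>Placebo (n = 120)</td><td>Drug (n = 118)</td></tr></thead>"
    "<tbody><tr><td colspan='3'>Demographics</td></tr>"
    "<tr><td>Age, years</td><td>54.2</td><td>53.8</td></tr></tbody></table>"
)
for row in cell_roles(table):
    print(row)
```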

2.2. Structural processing

During structural processing, the relationships between cells are recognized, including relationships to navigational cells such as headers, stubs and super-rows. We used a set of heuristics about each cell's function, structure, content and position, and about the table's structure, to disentangle the table's structure and the inter-cell relationships [8]. Information about each cell's content, position, function and relationships is stored in a database.
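Linking cells to their navigational cells and storing the result can be sketched with the standard library's sqlite3 module. The input is a hand-written list of cells with already assigned roles, and the table schema (column names such as col_header and super_row) is hypothetical.

```python
import sqlite3

def link_cells(cells):
    """Attach column header, row header (stub) and current super-row to each data cell.

    `cells` is a list of (row, col, text, role) tuples in reading order.
    """
    col_header, stub, super_row = {}, {}, None
    links = []
    for row, col, text, role in cells:
        if role == "column header":
            col_header[col] = text  # a lower header row overrides a higher one
        elif role == "super-row":
            super_row = text        # applies to all following rows until replaced
        elif role == "row header":
            stub[row] = text
        elif role == "data":
            links.append((row, col, text, col_header.get(col), stub.get(row), super_row))
    return links

cells = [
    (0, 1, "Placebo (n = 120)", "column header"),
    (0, 2, "Drug (n = 118)", "column header"),
    (1, 0, "Demographics", "super-row"),
    (2, 0, "Age, years", "row header"),
    (2, 1, "54.2", "data"),
    (2, 2, "53.8", "data"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cell (row_idx INTEGER, col_idx INTEGER, content TEXT, "
            "col_header TEXT, row_header TEXT, super_row TEXT)")
con.executemany("INSERT INTO cell VALUES (?, ?, ?, ?, ?, ?)", link_cells(cells))
query = ("SELECT content, col_header, row_header, super_row FROM cell "
         "ORDER BY row_idx, col_idx")
for rec in con.execute(query):
    print(rec)
```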

3 http://www.ncbi.nlm.nih.gov/pmc/


2.3. Semantic tagging

Once the data are stored in the database, we enrich them by annotating each cell's content with concepts and semantic types from the UMLS [1]. We developed a dictionary-based concept tagging method that annotates text using UMLS, WordNet, DBpedia and vocabularies represented in the Simple Knowledge Organization System (SKOS) model [7].
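Dictionary-based tagging is often implemented as a greedy longest-match lookup over tokens; the sketch below assumes that strategy, which the abstract does not specify. The dictionary entries and identifiers are hypothetical placeholders, not real UMLS concepts.

```python
import re

def tag_concepts(text, dictionary, max_len=5):
    """Greedy longest-match dictionary lookup over lower-cased word tokens.

    `dictionary` maps a lower-cased phrase to a (concept_id, semantic_type) pair.
    Returns a list of (matched_phrase, concept_id, semantic_type).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    i, found = 0, []
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # try the longest span first
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                found.append((phrase, *dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found

# Hypothetical dictionary entries.
toy_dictionary = {
    "body mass index": ("CONCEPT_BMI", "Diagnostic Procedure"),
    "mass": ("CONCEPT_MASS", "Finding"),
    "age": ("CONCEPT_AGE", "Organism Attribute"),
}
print(tag_concepts("Age and body mass index (BMI)", toy_dictionary))
```

Longest match first means that "body mass index" is tagged as one concept instead of also producing a spurious match for "mass".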

2.4. Pragmatic processing

The main purpose of pragmatic processing is to identify the table in which the information of interest is located and to reduce the number of false positives. We propose a machine learning classification method that determines the purpose of a given table and what kinds of information are stored in it. We classified tables into four classes: tables reporting baseline characteristics, adverse events, inclusion/exclusion criteria, and others. We tested a number of machine learning algorithms, including Naive Bayes, SVM, decision trees, random tree and random forest, in the Weka toolkit [4]. As features, we used words and semantic annotations from the caption and from the column and row headings, as well as the number of rows and the number of columns.
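The feature set and one of the listed classifiers can be sketched in pure Python. This is not the Weka setup used in the study: it is a toy multinomial Naive Bayes with add-one smoothing, semantic annotations are omitted, the size thresholds are hypothetical, and the training tables are invented examples.

```python
import math
import re
from collections import Counter, defaultdict

def table_features(caption, headings, n_rows, n_cols):
    """Bag of words from caption and headings plus coarse size features."""
    words = re.findall(r"[a-z]+", " ".join([caption, *headings]).lower())
    feats = Counter("w=" + w for w in words)
    feats["rows=" + ("many" if n_rows > 10 else "few")] += 1  # hypothetical binning
    feats["cols=" + ("many" if n_cols > 4 else "few")] += 1
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
        self.vocab = {f for feats in X for f in feats}
        return self

    def predict(self, feats):
        n = sum(self.class_counts.values())

        def log_posterior(label):
            counts = self.feat_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / n)
            for f, k in feats.items():
                if f in self.vocab:  # ignore features never seen in training
                    score += k * math.log((counts[f] + 1) / total)
            return score

        return max(self.class_counts, key=log_posterior)

train = [
    (table_features("Baseline characteristics of participants", ["Age", "Sex", "BMI"], 12, 3), "baseline"),
    (table_features("Demographic and baseline data", ["Characteristic", "Placebo", "Drug"], 15, 3), "baseline"),
    (table_features("Adverse events reported during treatment", ["Event", "Placebo", "Drug"], 8, 3), "adverse events"),
    (table_features("Serious adverse events", ["Nausea", "Headache"], 6, 3), "adverse events"),
]
model = NaiveBayes().fit([f for f, _ in train], [c for _, c in train])
query = table_features("Baseline demographic characteristics", ["Age", "Weight"], 11, 3)
print(model.predict(query))
```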

2.5. Syntactic processing and information extraction

In this step, we designed rules that find lexical cues in cells and extract information from related cells. The cells are processed syntactically, and a template-based information extraction approach was implemented. Value patterns were examined using regular expressions, and the appropriate part of the value was extracted.
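Value-pattern matching of this kind can be sketched with a few regular expressions tried in order. The patterns below (a count with a percentage, a mean with a standard deviation, a bare number) are hypothetical examples of common clinical table formats, not the authors' rule set.

```python
import re

# Hypothetical value patterns commonly seen in clinical tables, tried in order.
PATTERNS = [
    ("count_percent", re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")),
    ("mean_sd", re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:±|\+/-)\s*(\d+(?:\.\d+)?)\s*$")),
    ("number", re.compile(r"^\s*(\d+(?:\.\d+)?)\s*$")),
]

def parse_value(cell_text):
    """Return (pattern_name, captured_groups) for the first matching pattern, else None."""
    for name, pattern in PATTERNS:
        m = pattern.match(cell_text)
        if m:
            return name, m.groups()
    return None

print(parse_value("120 (45.2%)"))  # count with percentage
print(parse_value("54.2 ± 10.1"))  # mean ± SD
print(parse_value("n/a"))
```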

Since CONSORT recommends reporting trial baseline characteristics and demographic information in tables [11], we experimented with extracting the number of patients per participant group. This number may be given as the total number of patients in the table caption or, alternatively, per participant group in headings or data cells. We created a data set of 200 articles, each containing at least one baseline characteristic table, and split it into training and testing sets of 100 articles each. The rules were crafted iteratively by examining and testing on the training set. Each output record contained a reference to the article from which the number was extracted, a label (in this case "Number of patients"), the participant group name (extracted from the header) and the extracted number.
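The output record described above can be sketched for the simplest case, group sizes written in column headers such as "Placebo (n = 120)". The header pattern and the article identifier are hypothetical; captions and data cells would need further rules.

```python
import re

# Hypothetical header pattern: "<group name> (n = <number>)" or "<group name>, n=<number>".
N_IN_HEADER = re.compile(r"^(?P<group>.+?)\s*\(?\s*[nN]\s*=\s*(?P<n>\d+)\s*\)?\s*$")

def extract_group_sizes(article_id, headers):
    """Build output records (article, label, group, number) from column headers."""
    records = []
    for header in headers:
        m = N_IN_HEADER.match(header)
        if m:
            records.append({
                "article": article_id,
                "label": "Number of patients",
                "group": m.group("group").strip(" ,;"),
                "number": int(m.group("n")),
            })
    return records

for rec in extract_group_sizes("example-article",
                               ["Characteristic", "Placebo (n = 120)", "Drug A, n=118"]):
    print(rec)
```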

3. Results

On the evaluation set, our method performs functional analysis with a precision of 0.9425, a recall of 0.9428 and an F1-score of 0.9426. Relationships between cells were recognized with a precision of 0.9238, a recall of 0.9744 and an F1-score of 0.9484. The results of the pragmatic classification experiments are given in Table 1.
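The F1-score is the harmonic mean of precision and recall. Recomputing it from the reported values reproduces the reported scores, including the 83.94% quoted in the abstract for the testing set, and is consistent with a relationship recall of 0.9744.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.9238, 0.9744), 4))  # cell relationships
print(round(f1_score(0.894, 0.791), 4))    # number of patients, testing set
```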


Table 1. Results of the pragmatic four class classification experiments

Algorithm            Precision  Recall  F-Score
Naive Bayes          0.943      0.943   0.943
Bayesian Networks    0.938      0.939   0.938
C4.5 decision trees  0.944      0.945   0.944
Random tree          0.905      0.903   0.904
Random Forests       0.948      0.948   0.948
SVM                  0.967      0.966   0.966

The results of the manual evaluation of extracting the number of patients are given in Table 2.

Table 2. Results of information extraction for number of patients

          Precision  Recall  F-Score
Training  0.900      0.839   0.868
Testing   0.894      0.791   0.839

Errors accumulated over the processing steps affect the final performance, as do presentation types that are hard to generalize. Nevertheless, the performance of information extraction from tables is promising and reliable across a range of complex tables.

4. Discussion

We presented a hybrid approach for information extraction from tables that is composed of six steps and uses machine learning and heuristic rules. Even though some previous approaches reported slightly better accuracy [3, 14], they were limited to standardized tables with a predefined structure. Our approach makes no assumptions about a table's structure and can be applied to any kind of table. The first three steps of our approach are domain independent, whereas semantic tagging, pragmatic processing and the information extraction rules are domain and task dependent. The results of the case study indicate that information can be reliably extracted from complex tables, in particular if such information is combined with data mined from the main text.

References

1. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(suppl 1), D267–D270 (2004)


2. Chavan, M.M., Shirgave, S.: A methodology for extracting head contents from meaningful tables in web pages. In: Communication Systems and Network Technologies (CSNT), 2011 International Conference on. pp. 272–277. IEEE (2011)

3. Embley, D.W., Tao, C., Liddle, S.W.: Automating the extraction of data from HTML tables with unknown structure. Data & Knowledge Engineering 54(1), 3–28 (2005)

4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11(1), 10–18 (2009)

5. Hearst, M.A., Divoli, A., Guturu, H., Ksikes, A., Nakov, P., Wooldridge, M.A., Ye, J.: BioText search engine: beyond abstract search. Bioinformatics 23(16), 2196–2197 (2007)

6. Kieninger, T.G., Strieder, B.: T-Recs table recognition and validation approach. In: AAAI Fall Symposium on Using Layout for the Generation, Understanding and Retrieval of Documents (1999)

7. Milosevic, N.: Marvin: Semantic annotation using multiple knowledge sources. arXiv preprint arXiv:1602.00515 (2016)

8. Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.: Extracting patient data from tables in clinical literature: Case study on extraction of BMI, weight and number of patients. In: Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016). vol. 5, pp. 223–228 (2016)

9. Mulwad, V., Finin, T., Syed, Z., Joshi, A.: Using linked data to interpret tables. COLD 665 (2010)

10. Ng, H.T., Lim, C.Y., Koo, J.L.T.: Learning to recognize tables in free text. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. pp. 443–450. Association for Computational Linguistics (1999)

11. Schulz, K.F., Altman, D.G., Moher, D.: CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMC Medicine 8(1), 1 (2010)

12. Son, J.W., Lee, J.A., Park, S.B., Song, H.J., Lee, S.J., Park, S.Y.: Discriminating meaningful web tables from decorative tables using a composite kernel. In: Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT'08. IEEE/WIC/ACM International Conference on. vol. 1, pp. 368–371. IEEE (2008)

13. Tengli, A., Yang, Y., Ma, N.L.: Learning table extraction from examples. In: Proceedings of the 20th International Conference on Computational Linguistics. p. 987. Association for Computational Linguistics (2004)

14. Wang, X.F.: Research on information extraction based on web table structure and ontology. Applied Mechanics and Materials 321, 2254–2259 (2013)

15. Wei, X., Croft, B., McCallum, A.: Table extraction for answer retrieval. Information Retrieval 9(5), 589–611 (2006)

16. Yildiz, B., Kaiser, K., Miksch, S.: pdf2table: A method to extract table information from PDF files. In: IICAI. pp. 1773–1785 (2005)


Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances

Natasa Z. Misic

Lola Institute, Kneza Viseslava 70a, Belgrade, Republic of Serbia

Abstract

The standard genetic code (SGC) is crucial to our understanding not only of the origin of life, but also of the link between the physical and biological realms, with information as the ultimate unifying concept [1]. The nature, origin and evolution of the genetic code are an enigma in themselves. Their disclosure has been approached through three main theories, not necessarily mutually exclusive, which suggests that we still have not found an adequate set of physicochemical and/or biological factors ensuring the emergence of the SGC. Our approach includes some nonstandard concepts: Shcherbak's arithmetical regularities of the nucleon numbers of SGC constituents [2] and Dragovich's p-adic modeling of the vertebrate mitochondrial code (VMC) [3]. In addition to the well-known nucleon balances [1, 2], we introduced a new type based on the aggregate nucleon number of an amino acid and its corresponding codon. By giving a Euclidean representation of the 5-adic model of the SGC and the VMC, as well as of the nucleotide 2-adic distances, we visualized their inherent symmetries. A comparison of both types of nucleon balances for these two genetic codes shows that the regularities are more pronounced in the SGC than in the VMC, despite the fact that the latter is more symmetrical. This result is all the more significant because the VMC is the simplest genetic code system among all extant organisms examined so far. We also show that the mean value of the aggregate nucleon numbers of the SGC agrees more closely with the previously defined self-similarity constant [1] than that of the VMC. Altogether, this indicates that the optimization of the SGC under primordial conditions and that of the VMC in highly developed organisms were driven by partially different mechanisms.
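The 5-adic distance between codons can be sketched as follows. Following our reading of [3], nucleotides are encoded as the digits C=1, A=2, T(U)=3, G=4 and a codon n1n2n3 as the number n1 + n2·5 + n3·5²; treat this digit assignment as an assumption to check against [3]. Because any two digits differ by 1 to 3, never by a multiple of 5, the distance is 1, 1/5 or 1/25 according to whether the codons first differ at the first, second or third position, whatever the exact assignment.

```python
from fractions import Fraction

# Assumed digit assignment (check against [3]); U is treated like T.
DIGIT = {"C": 1, "A": 2, "T": 3, "U": 3, "G": 4}

def codon_to_int(codon):
    """Read codon n1 n2 n3 as n1 + n2*5 + n3*5**2."""
    return sum(DIGIT[base] * 5 ** i for i, base in enumerate(codon.upper()))

def p_adic_norm(x, p):
    """|x|_p = p**(-v), where p**v is the largest power of p dividing x; |0|_p = 0."""
    if x == 0:
        return Fraction(0)
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return Fraction(1, p ** v)

def codon_distance(a, b, p=5):
    """p-adic distance |a - b|_p between two codons."""
    return p_adic_norm(codon_to_int(a) - codon_to_int(b), p)

print(codon_distance("CAG", "CAT"))  # differ only in the third position
print(codon_distance("CAG", "CTG"))  # differ in the second position
print(codon_distance("CAG", "AAG"))  # differ in the first position
```

Small distances therefore group codons that share their first two nucleotides, which is where synonymous codons typically cluster.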

Keywords: genetic code, origin, codon degeneracy, amino acid nucleon balances, p-adic distances.

References

1. Misic, N. Z.: Nested Numeric/Geometric/Arithmetic Properties of shCherbak's Prime Quantum 037 as a Base of (Biological) Coding/Computing. NeuroQuantology 9(4), 702–715 (2011)

2. shCherbak, V.I.: The Arithmetical Origin of the Genetic Code. In: Barbieri, M. (ed.): The Codes of Life: The Rules of Macroevolution, pp. 153–188. Springer (2008)

3. Dragovich, B., Dragovich, A.: p-Adic Modelling of the Genome and the Genetic Code. Computer Journal 53(4), 432–442 (2010)


Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking

Ozal Mutlu

Marmara University, Faculty of Arts and Sciences, Department of Biology, 34722, Goztepe, Istanbul, Turkey


Abstract

Post-translational modifications of the histone tails by various mechanisms, including phosphorylation, methylation, acetylation and ubiquitination, have important effects on gene regulation, development and disease [1]. One of these modifications is acetylation, which is modulated by histone deacetylases (HDACs); targeting HDACs is a popular approach in the treatment of cancer and of some neurological and parasitic diseases [2]. In this work, we characterized the interaction of the Trypanosoma brucei gambiense class I histone deacetylases (HDAC1 and HDAC2) with the casein kinase 2 alpha 1 (CK2A1) catalytic domain by protein-protein docking. Because no crystal structures of these enzymes are available, their 3D structures were first built by homology modeling using MODELLER v9.16. Protein-protein docking and optimization were then carried out with three different servers (HADDOCK, ZDOCK and PyDock). We finally decided to use only the HADDOCK server because of its higher prediction value. The proteins were docked again, the interaction interface was predicted with HADDOCK, and the best complexes were selected based on the total server score, the orientation of the desired region, and the solvation free energy (ΔGcomplex) from the PDBePISA server (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). In conclusion, understanding the binding mode and interaction interface of HDAC-CK2A1 could be a potent option for inactivating histone deacetylation by dissecting the protein-protein interaction, for the treatment of parasitic diseases and for selective drug design.
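The final selection of complexes can be sketched as a filter followed by a ranking. How the three criteria were combined is not stated, so the rule below is an assumption: complexes whose interface does not involve the desired region are discarded, and the rest are ranked by HADDOCK score (lower is better), with ΔG (more negative is more favorable) as a tie-breaker. The complex names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DockedComplex:
    name: str
    haddock_score: float          # lower (more negative) is better
    contacts_target_region: bool  # does the interface involve the region of interest?
    delta_g: float                # interface solvation free energy, kcal/mol

def select_best(complexes, top=1):
    """Keep complexes that contact the region of interest, then rank by score and ΔG."""
    candidates = [c for c in complexes if c.contacts_target_region]
    return sorted(candidates, key=lambda c: (c.haddock_score, c.delta_g))[:top]

# Hypothetical docking results.
results = [
    DockedComplex("cluster1", -95.2, False, -12.1),
    DockedComplex("cluster2", -88.7, True, -15.3),
    DockedComplex("cluster3", -90.4, True, -9.8),
]
print([c.name for c in select_best(results, top=2)])
```

A weighted combined score would be an equally plausible design; the lexicographic ranking simply keeps the server score as the primary criterion.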

Keywords: histone deacetylase, protein-protein docking, interaction interface, Trypanosoma brucei gambiense

References

1. Mersfelder, E.L., Parthun, M.R.: The Tale Beyond the Tail: Histone Core Domain Modifications and the Regulation of Chromatin Structure. Nucleic Acids Research 34(9), 2653–2662 (2006)

2. Thomas, E.A.: Involvement of HDAC1 and HDAC3 in the Pathology of Polyglutamine Disorders: Therapeutic Implications for Selective HDAC1/HDAC3 Inhibitors. Pharmaceuticals 7(6), 634–661 (2014)


Mining PMMoV genotype-pathotype association rules from public databases

Vesna Pajic1, Bojana Banovic2, Milos Beljanski3 and Dragana Dudic1

1 Center for Data Mining and Bioinformatics, Faculty of Agriculture, University of Belgrade, Nemanjina 6, 11080 Zemun, Serbia
{svesna, ddragana}@agrif.bg.ac.rs

2 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Vojvode Stepe 444a, 11000 Belgrade, Serbia

3 Institute for General and Physical Chemistry, University of Belgrade, Studentski Trg 16, 11080 Belgrade, Serbia

Abstract. In order to utilize knowledge hidden in public databases, we applied several data mining techniques to PMMoV sequences from the NCBI nucleotide database, with the aim of characterizing this virus at the molecular level. The dataset consists of 231 collected nucleotide sequences. We identified three distinct genotype variants (namely TG, GA and GG) based on the nucleotide combinations at significant positions within subgroups of sequences. These positions were further confirmed using the EM algorithm. Pathotype information was known for only 40% of the studied sequences, and the distribution of pathotypes was very imbalanced. Nevertheless, using an Apriori-type algorithm, two strong rules were mined (confidence 0.96 and 0.93). The analysis showed that hidden knowledge can be disclosed and put to use through data mining approaches such as class association analysis and cluster analysis.
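The support and confidence of class association rules of the form genotype → pathotype can be sketched directly from counts. This is not the Apriori-type algorithm itself: with a single-attribute antecedent, no itemset search is needed. The records and thresholds below are hypothetical toy values, not the D2 dataset.

```python
from collections import Counter

def class_association_rules(records, min_support=0.1, min_confidence=0.9):
    """Rules 'genotype -> pathotype' that pass both thresholds.

    records: list of (genotype, pathotype) pairs.
    support    = count(genotype and pathotype) / number of records
    confidence = count(genotype and pathotype) / count(genotype)
    """
    n = len(records)
    pair_counts = Counter(records)
    genotype_counts = Counter(g for g, _ in records)
    rules = []
    for (g, p), k in pair_counts.items():
        support, confidence = k / n, k / genotype_counts[g]
        if support >= min_support and confidence >= min_confidence:
            rules.append((g, p, round(support, 3), round(confidence, 3)))
    return sorted(rules, key=lambda r: -r[3])  # most confident rules first

# Hypothetical toy data.
toy = ([("GG", "P12")] * 9 + [("GG", "P0")] + [("TG", "P123")] * 5
       + [("GA", "P1")] * 2 + [("GA", "P12")] * 2)
print(class_association_rules(toy))
```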

Keywords: clustering, class association rules, PMMoV

1. Introduction

With new sequencing technologies, the field of genomics is growing fast, and so is the amount of data behind it. Most of that data is publicly available through different data sources: a recent review of molecular biology databases [1] reports as many as 1685 relevant resources in molecular biology, each containing a large amount of data. The NCBI nucleotide database contains sequences from multiple sources, including GenBank with 190,250,235 sequences⁴, RefSeq with 92,936,289 sequences⁵, and PDB with 117,240 sequences⁶. Although a vast amount of sequence data is available, there is a huge and mostly unrealized potential in analyzing it.

In this research, we chose Pepper Mild Mottle Virus (PMMoV) as an in silico plant virus model to test what kind of information can be extracted from publicly available nucleotide sequences using bioinformatics tools and a data mining approach. PMMoV is a tobamovirus responsible for diminishing pepper yields. Until 2005, soil treatment against PMMoV consisted of the application of methyl bromide, an ozone-depleting chemical. By using publicly available PMMoV sequence data, one could explore the life cycle, pathogenicity, virulence potential and plant resistance mechanisms of the virus in order to develop more eco-friendly alternatives for suppressing the virus in the field. We analyzed the nucleotide content and single nucleotide variations of the available sequences with several data mining techniques, and compared the results with information on virus pathogenicity found in the literature [2–4]. The overall aim was to detect existing relations between nucleotide content and pathotype that could potentially be used for future monitoring of the virus and its pathogenicity.

4 http://www.ncbi.nlm.nih.gov/genbank/statistics/

5 http://www.ncbi.nlm.nih.gov/refseq/statistics/

6 http://www.rcsb.org/pdb/statistics/holdings.do

2. Data

At the time of the analysis, 231 PMMoV nucleotide sequences in total were available in the NCBI database: 13 of them were complete genomes, 150 corresponded to the coat protein, 62 to the 126K replicase small subunit, 6 to the 183K replicase large subunit and 7 to the 30K cell-to-cell movement protein. They constituted dataset D1 and were aligned using Clustal X 2.1⁷, with later manual correction in MEGA 6⁸. We used the R package seqinr to determine the profile sequence, relative to which all other analyses were conducted.
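The abstract does not say how the profile sequence was computed. Assuming it is a column-wise majority-rule consensus of the alignment, it can be sketched as below; the tie-breaking rule and the gap handling are our own choices.

```python
from collections import Counter

def majority_consensus(aligned, gap="-"):
    """Column-wise majority consensus of equal-length aligned sequences.

    Ties are broken alphabetically; columns with only gaps yield the gap symbol.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(b.upper() for b in column if b != gap)
        if not counts:
            consensus.append(gap)
            continue
        best = max(counts.values())
        consensus.append(min(b for b, c in counts.items() if c == best))
    return "".join(consensus)

print(majority_consensus(["ATGCA", "ATGTA", "ACGTA", "AT-TA"]))
```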

For 94 sequences (40%) in dataset D1, information on the pathotype (one of the five pathotypes P0, P1, P12, P123 and P1234 described in the literature [5, 6]) was available either in papers or in the NCBI database. For the purpose of mining genotype-pathotype association rules, these sequences, together with information about their genotype (determined in this research) and pathotype, were extracted into another dataset, D2.

Dataset D1 was additionally split into groups and subgroups based on the part of the genome covered by the sequences (Table 1). The whole-genome sequences were then divided into subsequences corresponding to the same nucleotide positions (np) covered by the subgroups, so each subgroup gained 13 additional sequences obtained from the whole-genome sequences.
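Cutting the complete genomes into subgroup-sized pieces can be sketched as 1-based, inclusive slicing. The ranges are copied from Table 1 for a few subgroups, and the sketch assumes each complete genome is numbered in the same genome coordinates as those ranges.

```python
# First and last np of some subgroups, copied from Table 1 (1-based, inclusive).
SUBGROUP_RANGES = {"1.2": (481, 1248), "2": (1622, 1811), "4.3": (5841, 6047)}

def slice_genome(genome, first_np, last_np):
    """Return the subsequence covering nucleotide positions first_np..last_np (1-based, inclusive)."""
    if not 1 <= first_np <= last_np <= len(genome):
        raise ValueError("range outside the genome")
    return genome[first_np - 1:last_np]

def add_genome_slices(subgroups, genomes):
    """Append the matching slice of every complete genome to each subgroup."""
    for name, (first, last) in SUBGROUP_RANGES.items():
        subgroups.setdefault(name, []).extend(slice_genome(g, first, last) for g in genomes)
    return subgroups

print(slice_genome("ACGTACGT", 2, 4))
```

With 13 complete genomes, every subgroup receives 13 extra subsequences.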

3. Tools, Methods and Algorithms

For the data mining tasks we used the algorithm implementations in WEKA 3.6.10⁹. Bioinformatics analyses were performed using Bioconductor packages in R. After aligning the sequences of each subgroup, single nucleotide variations (SNVs) were determined with an in-house script. A comprehensive analysis of the SNVs revealed several informative np in each subgroup. Based on the combination of nucleotides at these np, the sequences could be divided into disjoint sets. Information about the sets and the significant positions is shown in Table 2.

7 http://www.clustal.org/clustal2/
8 http://www.megasoftware.net/
9 http://www.cs.waikato.ac.nz/ml/weka/

Table 1. Number of sequences and np for each subgroup

Group | Subgroup | Number of sequences | Number of np covered | First np in the genome | Last np in the genome
1 | 1.1 | 67 | 200 | 612 | 810
1 | 1.2 | 50 | 768 | 481 | 1248
2 | 2 | 22 | 190 | 1622 | 1811
3 | 3 | 18 | 790 | 4015 | 4805
4 | 4.1 | 15 | 779 | 4909 | 5682
4 | 4.2 | 110 | 476 | 5685 | 6157
4 | 4.3 | 160 | 209 | 5841 | 6047
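The in-house SNV script is not described, so the following Python sketch only shows one plausible way to find informative columns and split sequences into disjoint sets by their nucleotide combination. The `min_minor` threshold (how many sequences must carry a non-majority nucleotide for a column to count as informative) is a hypothetical parameter.

```python
from collections import Counter, defaultdict

def informative_columns(alignment, min_minor=2):
    """0-based columns whose non-majority characters occur in >= min_minor sequences."""
    cols = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        if len(alignment) - max(counts.values()) >= min_minor:
            cols.append(i)
    return cols

def partition_by_sites(alignment, sites):
    """Group sequence indices into disjoint sets by their nucleotide combination at the sites."""
    sets = defaultdict(list)
    for idx, seq in enumerate(alignment):
        sets["".join(seq[i] for i in sites)].append(idx)
    return dict(sets)
```

Columns carried by a single sequence are skipped by design, since a singleton variant cannot separate groups of sequences.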

Table 2. Nucleotide positions that divide sequences into disjoint sets in each group

Group | np in the genome | Set label | Nucleotide combination (short mark) | Number of sequences
1.1 | 639; 669 | G1.1-1 | GT (G) | 52
 | | G1.1-2 | AC (A) | 15
1.2 | 565; 566; 708; 1125 | G1.2-1 | TGGA (T) | 34
 | | G1.2-2 | GTTG (G) | 16
2 | 1638; 1647 | G2-1 | TA (T) | 12
 | | G2-2 | CG (C) | 9
3 | 4107; 4131; 4392; 4395; 4516; 4560; 4650; 4698 | G3-1 | GACGTCCA (G) | 10
 | | G3-2 | AGTACTTG (A) | 8
4.1 | 4929; 4963; 5085; 5151; 5244; 5487; 5557; 5611 | G4.1-1 | CGACACGG (C) | 10
 | | G4.1-2 | TAGGGTAA (T) | 5
4.2 | 5763; 5819; 5837; 5996; 6002; 6011; 6038; 6062; 6100; 6101; 6127 | G4.2-1 | CTTACTGATGC (C) | 86
 | | G4.2-2 | TCCTTCTTATT (T) | 24
4.3 | 5996; 6002; 6038 | G4.3-1 | ACG (A) | 127
 | | G4.3-2 | TTT (T) | 35

Analysis of the relationships among these sets, for the sequences spanning more than one group, was used to detect three distinct genotype variants, which were additionally confirmed with cluster analysis.

In order to confirm the determined genotype variants and to disclose other existing similarities between sequences, we performed cluster analysis on dataset D1 using the Expectation Maximization (EM) algorithm [7]. The optimum number of clusters was estimated via 10-fold cross-validation.
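The authors used WEKA's EM clusterer. The Python sketch below is not WEKA's implementation: it is a minimal EM for a mixture of position-wise categorical distributions over one-hot encoded aligned nucleotides. The number of clusters grows while the 10-fold cross-validated held-out log-likelihood keeps improving, a procedure similar in spirit to the one WEKA documents for EM. The deterministic farthest-point initialisation, the smoothing constant `alpha` and the `min_gain` tolerance are assumptions made to keep the example reproducible.

```python
import numpy as np

NUC = "ACGT"

def one_hot(seqs):
    """Encode equal-length aligned sequences (A/C/G/T only) as an (n, L, 4) array."""
    return np.eye(4)[np.array([[NUC.index(ch) for ch in s] for s in seqs])]

def hamming_to(X, s):
    """Number of positions at which every sequence differs from sequence s."""
    return X.shape[1] - np.einsum("nla,la->n", X, X[s])

def init_resp(X, k):
    """Deterministic farthest-point seeds, then hard assignment to the nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        nearest = np.min([hamming_to(X, s) for s in seeds], axis=0)
        seeds.append(int(np.argmax(nearest)))
    dist = np.stack([hamming_to(X, s) for s in seeds], axis=1)
    return np.eye(k)[np.argmin(dist, axis=1)]

def log_joint(X, weights, theta):
    """log weight_k + log P(sequence | cluster k), shape (n, k)."""
    return np.log(weights) + np.einsum("nla,kla->nk", X, np.log(theta))

def fit_em(X, k, n_iter=50, alpha=1e-2):
    """EM for a mixture of position-wise independent categorical distributions."""
    resp = init_resp(X, k)
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed per-cluster nucleotide frequencies
        weights = resp.sum(axis=0) + 1e-9
        weights /= weights.sum()
        theta = np.einsum("nk,nla->kla", resp, X) + alpha
        theta /= theta.sum(axis=2, keepdims=True)
        # E-step: posterior membership probabilities, normalised in log space
        lp = log_joint(X, weights, theta)
        resp = np.exp(lp - lp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, theta, resp

def log_likelihood(X, weights, theta):
    """Total log-likelihood of X under a fitted mixture (log-sum-exp over clusters)."""
    lp = log_joint(X, weights, theta)
    m = lp.max(axis=1)
    return float(np.sum(m + np.log(np.exp(lp - m[:, None]).sum(axis=1))))

def cv_score(X, k, folds=10, seed=0):
    """Held-out log-likelihood summed over a k-fold cross-validation split."""
    parts = np.array_split(np.random.default_rng(seed).permutation(len(X)), folds)
    total = 0.0
    for f in range(folds):
        train = np.concatenate([parts[g] for g in range(folds) if g != f])
        weights, theta, _ = fit_em(X[train], k)
        total += log_likelihood(X[parts[f]], weights, theta)
    return total

def choose_k(X, k_max=8, min_gain=1e-6):
    """Increase the number of clusters while the cross-validated score improves."""
    scores = {1: cv_score(X, 1)}
    best = 1
    for k in range(2, k_max + 1):
        scores[k] = cv_score(X, k)
        if scores[k] - scores[best] <= min_gain:
            break
        best = k
    return best, scores
```

On three well-separated groups of identical toy sequences, `choose_k` stops at three clusters, because a fourth cluster no longer raises the held-out likelihood.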

Rules associating virus genotypes with pathotypes were learned with the Apriori algorithm [8] from dataset D2, with a minimum support of 0.1 and a minimum confidence of 0.9. We used a modification of the original algorithm that combines the association rule technique with the classification rule technique [9], allowing the algorithm to focus on association rules useful for determining predefined classes.
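A minimal Python sketch of class association rule filtering with the thresholds quoted above (minimum support 0.1, minimum confidence 0.9). Unlike Apriori, it only enumerates single-attribute premises, which is enough for rules of the form Genotype variant ⇒ Pathotype. The record layout (one dict per sequence) and the toy data in the check are hypothetical.

```python
from collections import Counter

def class_rules(records, attr, cls, min_support=0.1, min_conf=0.9):
    """Rules {attr=value} => {cls=label} that meet both thresholds.

    Returns (value, label, premise count, rule count, confidence) tuples,
    sorted by decreasing confidence."""
    n = len(records)
    premise = Counter(r[attr] for r in records)
    joint = Counter((r[attr], r[cls]) for r in records)
    rules = []
    for (value, label), both in joint.items():
        support = both / n
        confidence = both / premise[value]
        if support >= min_support and confidence >= min_conf:
            rules.append((value, label, premise[value], both, round(confidence, 2)))
    return sorted(rules, key=lambda rule: -rule[4])
```

Each tuple carries the same quantities WEKA prints in lines such as `Genotype variant=TG 45 ==> Pathotype=2 43 conf:(0.96)`: premise count, rule count and confidence.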

4. Results

4.1. Determination of Genotypes

We analyzed the relationships among sets G1.1-1, G1.1-2, G1.2-1 and G1.2-2 for the 50 sequences in Group 1 covering subgroups 1.1 and 1.2. Three distinct genotype variants were determined based on the nucleotide content at sites 565, 566, 639, 669, 708 and 1125: Genotype variant GA, Genotype variant TG and Genotype variant GG.

Evaluation using the EM algorithm resulted in three clusters in subgroup 1.1, based on the previously emphasized positions 639 and 708, corresponding to the determined genotype variants. In subgroup 1.2, four clusters were mined, one of which contained only a single sequence with Genotype variant GG. The distinction among the three remaining clusters was based on np 565 and np 552. The separation at the previously stressed position 565 extracts Genotype variant TG. For the clusters formed on the newly obtained position 552, all sequences in one cluster had Genotype variant GG, while all sequences with Genotype variant GA were in the other cluster, which also contained sequences with Genotype variant GG.

Assuming that the information revealed for Group 1 can be transferred to whole genomes (and therefore to isolates), we classified the whole genome sequences based on the determined genotype variants (Table 3).

Cluster analysis of the sequences from Groups 2 and 3 and subgroup 4.1 did not reveal any new similarities among sequences, but in subgroup 4.2 it revealed three clusters that corresponded to the defined genotype variants. The three clusters obtained using the EM algorithm were separated at np 6002, which matches Genotype variant GA, and at the newly observed np 5975, which can be used to distinguish Genotype variant GG from Genotype variant TG.

4.2. Genotype pathotype associations

Applying class association analysis, we found two strong rules:


Table 3. Distribution of whole genome sequences into the disjoint sets determined by the analysis of subgroups 1.1 to 4.3, and the determined genotype variants (for simplified representation, short marks are used instead of set labels)

Sequence | Accession | 1.1 | 1.2 | 2 | 3 | 4.1 | 4.2 | 4.3 | Genotype
Spain-1989-P12 | NC_003630.1 | G | T | T | G | C | C | A | TG
Spain-1989-P12 | M81413.1 | G | T | T | G | C | C | A | TG
Japan-2005-P0 | AB113117.1 | G | T | T | G | C | C | A | TG
Japan-2005-P0 | AB113116.1 | G | T | T | G | C | C | A | TG
Japan-2002-P0 | AB069853.1 | G | T | T | G | C | C | A | TG
Japan-1997-P1234 | AB000709.2 | G | T | T | G | C | C | A | TG
China-2006 | AY859497 | G | T | T | G | C | C | A | TG
Brasil-2010 | AB550911.1 | G | T | T | G | C | C | A | TG
India-2014-P12 | KJ631123.1 | G | T | T | G | C | C | A | TG
Japan-2003-P1234 | AB276030.1 | G | G | G | A | T | C | A | GG
SouthKorea-2005 | AB126003.1 | G | G | G | A | T | C | A | GG
Japan-2007-P12 | AB254821.1 | G | G | G | A | T | C | A | GG
Spain-2002-P123 | AJ308228 | A | G | G | A | T | T | T | GA
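Read together, Tables 2 and 3 suggest a simple labeling convention: in every row of Table 3, the genotype variant equals the short mark of subgroup 1.2 followed by the short mark of subgroup 1.1. The Python sketch below encodes that observed convention for the six sites named in Section 4.1; the convention is an inference from the tables, not a rule stated by the authors.

```python
# Nucleotide combinations and short marks from Table 2 (genome coordinates).
SITES_11 = (639, 669)
SITES_12 = (565, 566, 708, 1125)
MARKS_11 = {"GT": "G", "AC": "A"}       # sets G1.1-1, G1.1-2
MARKS_12 = {"TGGA": "T", "GTTG": "G"}   # sets G1.2-1, G1.2-2
VARIANTS = {"TG", "GG", "GA"}

def genotype_variant(nt):
    """nt maps genome position -> nucleotide. Returns 'TG', 'GG', 'GA' or None.

    Assumed convention (matches every row of Table 3): mark(1.2) + mark(1.1)."""
    mark_11 = MARKS_11.get("".join(nt[p] for p in SITES_11))
    mark_12 = MARKS_12.get("".join(nt[p] for p in SITES_12))
    if mark_11 is None or mark_12 is None:
        return None                      # combination outside the Table 2 sets
    label = mark_12 + mark_11
    return label if label in VARIANTS else None
```

A combination such as G1.1-2 together with G1.2-1 would give the label "TA", which was not reported, so the function returns None rather than inventing a fourth variant.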

Genotype variant=TG 45 ==> Pathotype=2 43 conf:(0.96)
Genotype variant=GA 29 ==> Pathotype=3 27 conf:(0.93)

The rules indicate that almost all sequences with Genotype variant TG (43 out of 45) have pathotype P12, and that 27 out of 29 sequences with Genotype variant GA have pathotype P123.
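As a worked check of the reported values: the confidence of a rule is the fraction of premise-matching sequences that also satisfy the consequent. The support values below assume that support is counted over all 94 records of D2.

```latex
\begin{aligned}
\mathrm{conf}(\mathrm{TG} \Rightarrow \mathrm{P12}) &= \frac{43}{45} \approx 0.956, &
\mathrm{supp}(\mathrm{TG} \Rightarrow \mathrm{P12}) &= \frac{43}{94} \approx 0.457,\\
\mathrm{conf}(\mathrm{GA} \Rightarrow \mathrm{P123}) &= \frac{27}{29} \approx 0.931, &
\mathrm{supp}(\mathrm{GA} \Rightarrow \mathrm{P123}) &= \frac{27}{94} \approx 0.287.
\end{aligned}
```

Both confidences round to the reported 0.96 and 0.93 and exceed the 0.9 minimum. Both supports exceed the 0.1 minimum.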

For sequences with Genotype variant GG, np 5921 (found with a classification method) may be discriminative for pathotype prediction: if a sequence has T or C at this position it is of pathotype P12, while if it has A it is of pathotype P123.
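Combining the two rules with the np 5921 observation gives a simple decision list. The Python sketch below only restates the reported findings. It is not a classifier the authors built or validated, and inputs outside the reported cases return None.

```python
def predict_pathotype(genotype, nt_5921=None):
    """Illustrative decision list built from the reported rules (not validated)."""
    if genotype == "TG":
        return "P12"                 # Genotype variant=TG ==> P12 (conf. 0.96)
    if genotype == "GA":
        return "P123"                # Genotype variant=GA ==> P123 (conf. 0.93)
    if genotype == "GG":
        if nt_5921 in ("T", "C"):
            return "P12"             # np 5921 carries T or C
        if nt_5921 == "A":
            return "P123"            # np 5921 carries A
    return None                      # no reported rule applies
```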

5. Conclusion

The clustering and class association analysis of 231 PMMoV sequences available at NCBI revealed some regularities that could potentially be used for molecular monitoring of the virus genotype-pathotype association.

References

1. Rigden D. J., Fernández-Suárez X. M., Galperin M. Y.: The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic Acids Research 44, D1–D6 (2016)

2. Gilardi P., Wicke B., Castillo S., de la Cruz A., Serra M. T., García Luque I.: Resistance in Capsicum spp. against the tobamoviruses. In: Pandalai S. G. (ed.) Recent Research Developments in Virology, Vol. 1, 547–558. Transworld Research Network, India (1999)

3. Genda Y., Kanda A., Hamada H., Sato K., Ohnishi J., Tsuda S.: Two amino acid substitutions in the coat protein of Pepper mild mottle virus are responsible for overcoming the L4 gene-mediated resistance in Capsicum spp. Phytopathology 97, 787–793 (2007)

4. Antignus Y., Lachman O., Pearlsman M., Maslenin L., Rosner A.: A new pathotype of Pepper mild mottle virus (PMMoV) overcomes the L4 resistance genotype of pepper cultivars. Plant Dis. 92, 1033–1037 (2008)

5. Boukema I. W.: Resistance to TMV in Capsicum chacoense Hunz. is governed by an allele of the L-locus. Capsicum Newsl. 3, 47–48 (1984)

6. Sawada H., Takeuchi S., Hamada H., Kiba A., Matsumoto M., Hikichi Y.: A new tobamovirus-resistance gene, L1a, of sweet pepper (Capsicum annuum L.). J. Jpn. Soc. Hortic. Sci. 73, 552–557 (2004)

7. Dempster A. P., Laird N. M., Rubin D. B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1–38 (1977)

8. Atluri G., Gupta R., Fang G., Pandey G., Steinbach M., Kumar V.: Association analysis techniques for bioinformatics problems. Bioinformatics and Computational Biology, 1–13 (2009)

9. Agrawal R., Srikant R.: Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases, 487–499 (1994)


Complexity measures based on intermittent events in brain EEG data

Paolo Paradisi1,2, Marco Righi1, Massimo Magrini1, Maria Chiara Carboncini3, Alessandra Virgillito3, and Ovidio Salvetti1

1 Institute of Information Science and Technologies (ISTI-CNR), Via G. Moruzzi 1, I-56124 Pisa, Italy

2 Basque Center for Applied Mathematics (BCAM), Alameda de Mazarredo 14, E-48009 Bilbao, Basque Country, Spain

3 Department of Neuroscience, University of Pisa, via Paradisa 2, I-56126 Pisa, Italy

Abstract. In this work we discuss the application of the complexity approach to the study of physiological signals. In particular, we introduce a theoretical framework based on the ubiquitous emergence of fractal intermittency in complex signals. This approach is based on the ability of the cooperative micro-dynamics of complex systems to trigger meta-stable self-organized states. Meta-stability is strictly connected with the emergence of an intermittent point process that displays anomalous non-Poisson statistics and drives the fast transition events between successive meta-stable states. As a consequence, features related to intermittent events can be used to characterize the ability of the complex system to trigger self-organized structures.

We introduce an algorithm for the processing of complex signals that is based on the fractal intermittency paradigm, focusing on the detection and scaling analysis of intermittent events in human ElectroEncephaloGrams (EEGs). Finally, we discuss the application of this approach to real EEG recordings and present preliminary findings.

Keywords: signal processing, complexity, fractal intermittency, brain, electroencephalogram (EEG), disorders of consciousness

1. Introduction

Human physiology is a prototypical example of complexity, and the brain is surely its most important instance. The brain is composed of elementary units, the neurons, each strongly connected with many other neurons through highly nonlinear interactions, given by chemically activated electrical signals traveling along the inter-neuron links (axons and dendrites). The nonlinear dynamics at the level of single neurons (i.e., the threshold mechanism for the electrical discharges generating spikes and bursts) are greatly enhanced by the complex link topology, but at the same time some ordering, or self-organizing, principle triggers the formation of global cooperativity. The overall picture is that of a complex network with a huge number of nodes (the neurons) and links with a very complicated topology. It is therefore not surprising that brain dynamics display a very rich landscape of different behaviors and a very efficient plasticity, characterized by a rapid and efficient capability of responding to rapid changes in the external environment. Due to this variety, characterizing brain functioning with a relatively low number of parameters is a fascinating problem and a very active topic in brain research. This topic involves different disciplines, spanning from biology and medicine to non-equilibrium statistical physics of complex systems, network analysis and information science.

2. The complexity paradigm

The complexity paradigm involves a modeling approach that is complementary to the microscopic approach, which extends the micro-dynamics of single units to the network level via properly modeled node-node interactions. Following the paradigm of emergent properties, the complexity approach focuses on modeling the self-organized large-scale structures that emerge from the cooperative dynamics of the complex network. The main idea is that self-organized structures are the essential actors in the global dynamics of complex systems and play a crucial role in the response of the system to external stimuli. As a consequence, the statistical indicators extracted from data analysis also usually refer to some global property associated with the large-scale dynamical evolution of coherent or self-organized structures.

2.1. Complexity and fractal intermittency

Even if a universally accepted definition of complexity does not yet exist, complex systems often display the following features:

(1) a complex system is multi-component, with a large number of degrees of freedom, i.e., many functional units or nodes. As said above, these units interact with each other and their dynamics are strongly nonlinear;
(2) nonlinearity and a multi-component structure are not enough to define complexity: the dynamics must be cooperative and trigger the emergence of self-organized structures;
(3) self-organized states display long-range space-time correlations (slow power-law decay);
(4) self-organized states are meta-stable, with relatively long lifetimes and fast transition events between two successive states, denoted in the following as crucial events.

Crucial events determine a fast memory drop, while the self-organized structures remain strongly correlated until their decay. The sequence of crucial events, marking the transitions among self-organized states, is an emergent dynamics described as a birth-death point process of self-organization. Feature (4) in the above list is therefore the basic property that allows complexity to be described in terms of intermittent signals. Due to the fast memory drop occurring during the fast transitions, successive self-organized states are often independent of each other, as are the crucial transition events. This is denoted as the renewal condition. In this case, the sequence of crucial events is described by a renewal point process. A very general observation is that a complex (cooperative) system is characterized by long lifetimes that are statistically distributed according to an inverse power law. The lifetimes correspond to the time between two successive crucial events and are also denoted as inter-event times or Waiting Times (WTs).


In this work we discuss an approach to complexity based on modeling the time intermittency that emerges from the underlying cooperative dynamics. In particular, the emergence of a renewal point process whose WT distribution is an inverse power law, $\psi(\tau) \sim 1/\tau^{\mu}$, is denoted as fractal intermittency [9, 14, 15, 8, 6]. The distribution ψ(τ) and the exponent µ are emergent properties and, thus, a signature of complex behavior. Fig. 1 reports a synthetic scheme qualitatively explaining the connection between self-organization, cooperation and non-Poisson renewal processes. Poisson renewal processes always emerge in the case of independent systems, whatever the micro-dynamics of the single nodes. As a consequence, a departure from Poisson statistics reveals some kind of cooperation among the nodes of the network. For power-law distributed WTs, µ is then used as an indicator of complexity, essentially being a measure of the ability of the system's dynamics to trigger global self-organized structures. In particular, complexity is identified with a condition of very slow decay in ψ(τ), corresponding to the range µ < 3.

Fig. 1. Comparison of Poisson (non-complex) and non-Poisson (complex) processes.

Conversely, feature (3) is the starting point for a description of complexity in terms of spatial and topological indicators (e.g., the degree distribution of a complex network, avalanche size distribution).
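To illustrate how µ is read off waiting times, the Python sketch below draws synthetic WTs from a pure power-law density $\psi(\tau) = (\mu-1)\tau_0^{\mu-1}\tau^{-\mu}$ for $\tau \ge \tau_0$ and recovers µ with the standard maximum-likelihood (Hill-type) estimator. The lower cutoff τ0 and the synthetic data are assumptions. This is not the EDDiS method, which reaches µ indirectly through diffusion scaling.

```python
import math
import random

def sample_waiting_times(mu, n, tau0=1.0, seed=0):
    """Draw n WTs from psi(tau) = (mu - 1) * tau0**(mu - 1) * tau**(-mu), tau >= tau0.

    Inverse-transform sampling of the survival function P(T > tau) = (tau0 / tau)**(mu - 1)."""
    rng = random.Random(seed)
    return [tau0 * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]

def estimate_mu(wts, tau0=1.0):
    """Maximum-likelihood (Hill-type) estimate of mu from the WTs at or above tau0."""
    tail = [t for t in wts if t >= tau0]
    return 1.0 + len(tail) / sum(math.log(t / tau0) for t in tail)
```

With µ = 2.2, an estimate close to 2.2 falls in the complex range µ < 3 discussed above. The estimator presupposes a power-law tail, so applied to Poisson (exponential) WTs its output depends on τ0 and is not a meaningful µ. In practice, the tail model would be checked first.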

3. Crucial events and fractal intermittency in the brain

Meta-stability is a basic feature of information processing in the brain neural network. Fingelkurts and Fingelkurts recognized that rapid changes in the ElectroEncephaloGram (EEG), called Rapid Transition Processes (RTPs), mark passages between two quasi-stationary periods, each corresponding to different neural assemblies [1, 2], and are the signature of brain self-organization. RTPs and neural assemblies are thus prototypes of crucial events and meta-stable self-organized states, respectively. The algorithm for the automatic detection of RTP events in EEG data was developed in Ref. [2] and exploited by the authors of Refs. [3–5, 7, 9, 10, 16] to characterize the complexity of the intermittent events. Using a scaling detection method, the EDDiS method ([14] and references therein), these authors found that brain dynamics display fractal intermittency. In particular, it was shown that the fractal intermittency approach is able to reveal the integrated (Rapid Eye Movement, REM) and segregated (Non-REM) stages during sleep, in agreement with the consciousness state of the subjects [9, 10, 16].


In the intermittency-based analysis proposed here, a key aspect is the definition of events, which needs further study in order to extend the above analysis to different experimental and clinical conditions.

4. Signal processing for intermittent complex systems

The results obtained by applying the algorithms cited above, in particular:
(i) the RTP event detection algorithm [2];
(ii) the EDDiS method for the evaluation of the diffusion scaling H, whose relationship with the index µ is known when the renewal condition is positively validated [3, 6, 8, 14];
are very promising for potential applications in clinical practice for neurological disorders. However, RTP events are defined only for some experimental conditions.

In this work we investigate the key aspect of event definition. We propose an algorithm that involves a more general definition of event and is able to detect and discriminate events with different neuro-physiological origins. The proposed method essentially extends the technique introduced and applied in Refs. [11–13]. This method allows the extraction of different kinds of crucial events marking sudden increases of activity in given frequency bands. This makes it possible to derive different definitions of events and to build a very flexible algorithm that can be exploited in different experimental conditions.

We assume that the signals have already been pre-processed for artifact cleaning. The software tool is then divided into the following modules:

(1) splitting of the single EEG channel into different frequency bands;
(2) detection of crucial events and high-activity epochs in the different frequency bands by using a thresholding method;
(3) building of a spatio-temporal map of events;
(4) extraction of specific kinds of events from the event map;
(5) estimation of the complexity of these events of interest, both for single EEG channels and for global events.

Despite its apparent simplicity, this algorithm is flexible and powerful. Because it is based on the classical Fourier approach and on splitting the EEG signal into standard frequency bands, this approach allows a clearer link between the event detection algorithm and its neuro-physiological interpretation. In this sense, a particular kind of brain event can be recognized as a neural correlate of some increased neuro-physiological activity.
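Modules (1) and (2) can be sketched in Python as follows. This is not the authors' software: the band edges, the ideal FFT band-pass, the 0.25 s moving-average envelope, the mean + k·std threshold and the 0.5 s refractory gap are all assumptions chosen for illustration. The waiting times between detected onsets are the input that a complexity estimate (module 5) would work on.

```python
import numpy as np

# Assumed band edges in Hz (common EEG conventions, not taken from the abstract).
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 12.0), "beta": (12.0, 30.0)}

def band_split(signal, fs, bands=BANDS):
    """Module (1): ideal FFT band-pass that zeroes every bin outside each band."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return {name: np.fft.irfft(spectrum * ((freqs >= lo) & (freqs < hi)), n=len(signal))
            for name, (lo, hi) in bands.items()}

def envelope(x, fs, win_s=0.25):
    """Crude amplitude envelope: moving average of the rectified band signal."""
    win = max(1, int(win_s * fs))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def detect_events(env, fs, k=2.0, min_gap_s=0.5):
    """Module (2): onset times (s) where the envelope rises above mean + k * std.

    Onsets closer than min_gap_s to the previously kept onset are discarded."""
    above = env > env.mean() + k * env.std()
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    kept = []
    for i in onsets:
        if not kept or (i - kept[-1]) / fs >= min_gap_s:
            kept.append(i)
    return np.array(kept) / fs

def waiting_times(onset_times):
    """Inter-event times between successive detected events."""
    return np.diff(onset_times)
```

On a synthetic trace with three 1 s bursts of 10 Hz activity, the alpha-band detector returns one onset near the start of each burst and two waiting times.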

Finally, we will discuss some applications to real EEG data in different conditions (wake, sleep). Some preliminary results on subjects with disorders of consciousness will be presented.

References

1. A. A. Fingelkurts, A. A. Fingelkurts, Brain-Mind Operational Architectonics Imaging: Technical and Methodological Aspects. Open Neuroimag. J. 2 (2008) 73–93.

2. A. Y. Kaplan, A. A. Fingelkurts, A. A. Fingelkurts, B. S. Borisov, B. S. Darkhovsky, Nonstationary nature of the brain activity as revealed by EEG/EMG: methodological, practical and conceptual challenges. Signal Process. 85 (2005) 2190–2212.

3. P. Allegrini, D. Menicucci, R. Bedini, L. Fronzoni, A. Gemignani, P. Grigolini, B. J. West, P. Paradisi, Spontaneous brain activity as a source of ideal 1/f noise. Phys. Rev. E 80 (2009) 061914.

4. P. Allegrini, P. Paradisi, D. Menicucci, A. Gemignani, Fractal complexity in spontaneous EEG metastable state transitions: new vistas on integrated neural activity. Frontiers in Physiology 1 (2010) 128.

5. P. Allegrini, D. Menicucci, R. Bedini, A. Gemignani, P. Paradisi, Complex intermittency blurred by noise: theory and application to neural dynamics. Phys. Rev. E 82 (2010) 015103.

6. P. Paradisi, R. Cesari, A. Donateo, D. Contini, P. Allegrini, Diffusion scaling in event-driven random walks: an application to turbulence. Rep. Math. Phys. 70 (2012) 205–220.

7. P. Allegrini, P. Paradisi, D. Menicucci, R. Bedini, A. Gemignani, L. Fronzoni, Noisy cooperative intermittent processes: from blinking quantum dots to human consciousness. J. Phys.: Conf. Series 306 (2011) 012027.

8. P. Paradisi, R. Cesari, A. Donateo, D. Contini, P. Allegrini, Scaling laws of diffusion and time intermittency generated by coherent structures in atmospheric turbulence. Nonlinear Processes in Geophysics 19 (2012) 113–126; P. Paradisi et al., Corrigendum, Nonlinear Processes in Geophysics 19 (2012) 685.

9. P. Paradisi, P. Allegrini, A. Gemignani, M. Laurino, D. Menicucci, A. Piarulli, Scaling and intermittency of brain events as a manifestation of consciousness. AIP Conf. Proc. 1510 (2013) 151–161.

10. P. Allegrini, P. Paradisi, D. Menicucci, M. Laurino, R. Bedini, A. Piarulli, A. Gemignani, Sleep unconsciousness and breakdown of serial critical intermittency: new vistas on the global workspace. Chaos, Solitons and Fractals 55 (2013) 32–43.

11. C. Navona, U. Barcaro, E. Bonanni, F. Di Martino, M. Maestri, L. Murri, An automatic method for the recognition and classification of the A-phases of the cyclic alternating pattern. Clin. Neurophysiol. 113 (2002) 1826–1833.

12. U. Barcaro, E. Bonanni, M. Maestri, L. Murri, L. Parrino, M. G. Terzano, A general automatic method for the analysis of NREM sleep microstructure. Sleep Med. 5 (2004) 567–576.

13. M. Magrini, A. Virgillito, U. Barcaro, L. Bonfiglio, G. Pieri, O. Salvetti, M. C. Carboncini, An automatic method for the study of REM sleep microstructure. Int. Workshop on Computational Intelligence for Multimedia Understanding (IWCIM 2015), Prague, 29–30 October 2015, DOI: 10.1109/IWCIM.2015.7347066 [IEEE Xplore Digital Library].

14. P. Paradisi, P. Allegrini, Scaling law of diffusivity generated by a noisy telegraph signal with fractal intermittency. Chaos, Solitons and Fractals 81 (2015) 451–462.

15. P. Paradisi, G. Kaniadakis, A. M. Scarfone, The emergence of self-organization in complex systems: Preface. Chaos, Solitons and Fractals 81 (2015) 407–411.

16. P. Allegrini, P. Paradisi, D. Menicucci, M. Laurino, A. Piarulli, A. Gemignani, Self-organized dynamical complexity in human wakefulness and sleep: different critical brain-activity feedback for conscious and unconscious states. Phys. Rev. E 92 (2015) 032808.


DORMANCYbase - developing a bioinformatics database on molecular regulation of animal dormancy

Popovic Zeljko D.1,2, Kadlecsik Tamas2, Fazekas David2, Ari Eszter2, Korcsmaros Tamas3, Uzelac Iva1, Avramov Milos1, Krivokuca Nikola1, Kitanovic Nevena1, and Kokai Dunja1

1 University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Trg Dositeja Obradovica 3, 21000 Novi Sad, Serbia

2 Eotvos Lorand University, Department of Genetics, Pazmany Peter stny. 1/C, H-1117 Budapest, Hungary

3 TGAC, The Genome Analysis Centre, Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, Norfolk, NR4 7UH, UK

Abstract

Dormancy is a period in an organism's life cycle when growth, development and physical activity are temporarily arrested. It involves changes at the behavioral, morphological, physiological, biochemical and molecular levels that, taken together, increase the stress tolerance of organisms and help them survive harsh environmental conditions. In the last two decades, the application of -omic and other modern technologies has led to an exponential expansion of scientific data on the molecular background of dormancy. However, the use of these data is difficult and limited due to the lack of an organized system for storing them. In that light, developing a unique database of gene and protein expression during animal dormancy, named DORMANCYbase, will provide an unparalleled source of functional expression data derived from the scientific literature. DORMANCYbase will be freely available at www.dormancybase.org and linked to other relevant databases such as NCBI, UniProt, DDBJ, etc. Not only will the database allow scientists to browse information, but they will also be able to submit their own research data. Excluding data from massively parallel -omic platforms, the database currently contains nearly 1000 RNA and protein sequences from 63 different animal species, and includes a wide range of information regarding the type of dormancy, life stage, organ/tissue/cell type, methodology used for analysis and expression level, as well as DOI numbers and URLs of the entered publications. By analyzing all these data, we expect to define common groups of genes that participate in the regulation of certain dormant states and also in the response to diverse kinds of stress. The results will enable scientists to compare gene and protein sets expressed in various dormancies and organisms, both useful and harmful from man's point of view. Furthermore, molecular data from DORMANCYbase will allow researchers to identify both conserved and specific molecular processes in different resting phases, as well as to explore functional networks of genes and their products for a given type of dormancy.

Keywords: dormancy, database, gene, protein, expression, bioinformatics


Examining regulation of restriction-modification systems by quantitative modeling

Andjela Rodic and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia
{andjela.rodic,dmarko}@bio.bg.ac.rs

Abstract

Bacterial restriction-modification (RM) systems encode a restriction enzyme (R), which cuts specific DNA sequences, and a methyltransferase (M), which methylates the same sequences, thus protecting them from cutting. Expression of these enzymes has to be tightly coordinated to provide defense against foreign DNA without damaging host DNA, which is often accomplished by control (C) protein-driven regulation.

The main technical difficulty in directly observing R and M expression is synchronizing plasmid entry into the bacterial cells. To resolve this difficulty, our collaborators performed the first single-cell measurements of the in vivo dynamics of R and M expression, for the Esp1396I RM system. We developed a quantitative model of the system dynamics, in which statistical thermodynamics was used to model transcription regulation of the system promoters; this was then used as input for dynamical modeling that predicts the change of enzyme amounts in a cell. The model successfully reproduces the main experimentally observed features of the expression dynamics: the significant delay of R with respect to M expression, including a high peak in M expression at early times [1].
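To make the modeling chain concrete (a thermodynamic promoter activity feeding a dynamical model), here is a deliberately simplified Python sketch. It is not the authors' Esp1396I model: the two-state promoter, the squared C dependence standing in for dimer binding, all rate constants, and the assumption that R shares the C-controlled promoter while M is constitutive are illustrative choices. It reproduces only the qualitative delay of R relative to M. The early M peak would require C-dependent repression of M, which is left out.

```python
def promoter_activity(c, k=1.0, basal=0.01):
    """Thermodynamic two-state promoter: Z = 1 + x, with x = (c / k)**2 the
    statistical weight of the C-bound, active state (squared to mimic C acting
    as a dimer). Activity = (basal * 1 + 1 * x) / Z. Parameters are hypothetical."""
    x = (c / k) ** 2
    return (basal + x) / (1.0 + x)

def simulate(t_end=10.0, dt=1e-3, beta_c=10.0):
    """Forward-Euler integration with all species removed at rate 1.

    C and R are transcribed from the C-controlled promoter (C auto-activates);
    M comes from a constitutive promoter. Rows are (t, C, R, M)."""
    c = r = m = 0.0
    traj = []
    for step in range(int(round(t_end / dt)) + 1):
        traj.append((step * dt, c, r, m))
        f = promoter_activity(c)
        c, r, m = c + dt * (beta_c * f - c), r + dt * (f - r), m + dt * (1.0 - m)
    return traj

def half_time(traj, column):
    """First time at which a species reaches half of its final value."""
    final = traj[-1][column]
    return next(row[0] for row in traj if row[column] >= 0.5 * final)
```

With these settings, M reaches half its final level at about t = 0.69 (ln 2), while R waits for C to build up through the weak basal activity and reaches half its level several times later.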

We use a similar modeling approach to perturb characteristic features of the AhdI RM system, and show that its design may be explained by the following principles: delayed R expression, a fast transition from the "OFF" to the "ON" state, and a stable steady state. We use these design principles to propose an explanation for the extremely high binding cooperativity and dimerization constant observed in AhdI [1], and propose that these principles should be of general applicability to RM systems.

Keywords: restriction-modification system, regulation, control protein

References

1. Morozova, N., Sabantsev, A., Bogdanova, E., Fedorova, Y., Majkova, A., Vediajkin, A.,Rodic, A., Djordjevic, M., Khodorkovskiy, M., Severinov, K.: Nucleic Acids Research, 44,790-800. (2015)

2. Rodic, A., Blagojevic, B., Zdobnov, E., Djordjevic M., Djordjevic M., submitted. (2016)


On the clustering of biomedical datasets - a data-driven perspective

Richard Roettger

University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark

Abstract

Nowadays, scientists in virtually all disciplines are confronted with an increasing supply of information; this is especially true for biomedical research, where recent advances in wet-lab technologies have led to an explosion in the wealth, quality and amount of available data. A typical first step in analyzing these large datasets is cluster analysis, which unravels the inherent structure of the data by grouping similar objects together.

Despite being a long-standing problem, conducting a cluster analysis is anything but straightforward; on the contrary, a high-quality cluster analysis very often overwhelms the practitioner. A multitude of decisions have to be made, all requiring a deep understanding of the underlying methods: feature extraction, similarity calculation, clustering tool selection, parameter optimization and so on. Well-structured and objective guidelines are largely missing, especially at a larger scale.

To address these challenges, we have developed ClustEval, a fully integrated and automated cluster evaluation framework. The power of this framework allowed us to conduct a massive, objective and fully reproducible clustering comparison consisting of several million evaluations. This large, data-driven body of structured clustering results allowed us to provide a much-needed overview of the field and to carefully derive guidelines for the clustering of biomedical datasets, which we recently published in Nature Methods. Based on this effort, we present ClustEval and our most recent findings, and furthermore aim to evaluate future perspectives for improving the overall quality and usability of cluster analyses.
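ClustEval's own interface is not described here, so the following is only a generic Python sketch of the kind of parameter optimization such a framework automates. One clustering method (single linkage with a distance cutoff, chosen for illustration) is run over a grid of parameter values, and each result is scored against a gold-standard partition with a pair-counting F1 measure. Both the method and the quality measure are assumptions; real evaluations compare many tools and measures.

```python
from itertools import combinations

def threshold_clusters(points, eps):
    """Single-linkage clustering of 1-D points: link every pair closer than eps
    and return connected-component labels (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) < eps:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def pair_f1(pred, gold):
    """Pair-counting F1: a pair is a true positive if both labelings co-cluster it."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred, same_gold = pred[i] == pred[j], gold[i] == gold[j]
        tp += same_pred and same_gold
        fp += same_pred and not same_gold
        fn += same_gold and not same_pred
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def sweep(points, gold, eps_values):
    """Score each parameter value against the gold standard and return the best one."""
    scores = {eps: pair_f1(threshold_clusters(points, eps), gold) for eps in eps_values}
    return max(scores, key=scores.get), scores
```

A cutoff that is too small leaves every point alone (F1 = 0), and one that is too large merges everything. The sweep makes that trade-off explicit, which is the kind of decision the abstract says overwhelms practitioners.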

All results and the framework are freely available: http://clusteval.sdu.dk/


Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism

Ana Simonovic1, Milan Dragicevic1, Giorgio Giurato2, Biljana Filipovic1, Sladjana Todorovic1, Milica Bogdanovic1, Katarina Cukovic1, and Angelina Subotic1

1 Institute for Biological Research "Sinisa Stankovic", University of Belgrade, Bul. Despota Stefana 142, 11000 Belgrade, Serbia

{ana.simonovic, mdragicevic, biljana.nikolic, slatod, milica.bogdanovic, heroina}@ibiss.bg.ac.rs

2 Genomix4Life Srl, Spin-Off of the Laboratory of Molecular Medicine and Genomics, University of Salerno, Baronissi (SA), [email protected]

Abstract

Centaurium erythraea is an endangered medicinal plant with great regeneration potential and developmental plasticity in vitro [1]. Identifying the genes involved in organogenesis and somatic embryogenesis (SE) is the first step towards elucidating the molecular mechanisms underlying centaury's morphogenic plasticity. RNA from leaves (L), roots (R), embryogenic calli (EC), globular somatic embryos (GSE), cotyledonary somatic embryos (CSE) and adventitious buds (AB) was sequenced, yielding 29-37 million reads per sample. Sequencing, de novo transcriptome assembly using Trinity, and annotation were performed by Genomix4Life. The reference transcriptome (142 Mbp) contained 160,839 Trinity transcripts comprising 105,726 "genes". Of the 160,839 transcripts, 44,288 had BLAST hits, 26,435 had GO Slim annotation, and 9,552 had GO mapping. The top-hit species was Coffea canephora. Relative expression was computed by aligning high-quality reads to the Trinity transcripts and is presented as TMM-normalized FPKM. In each sample, ≥30,000 transcripts were expressed. Transcripts involved in different morphogenetic pathways were filtered using R. Potential SE markers (FPKM ≥1 in EC or GSE, and ≥8x higher FPKM in EC or GSE than in L, R and AB) included 1989 sequences. Among them were LRR receptor-like protein kinases, germin-like proteins, and the transcription factors WRKY and AINTEGUMENTA. There were 1203 transcripts important for later SE development, including seed storage proteins and expansins. Finally, 727 transcripts with at least 8x higher FPKM in AB than in the other samples were considered important for organogenesis.
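The marker criteria above can be expressed as simple row filters on an FPKM table. The authors did this in R, and the Python sketch below is only a hypothetical re-expression of the stated thresholds. It makes two assumptions. First, a transcript qualifies as an SE marker when a single sample, EC or GSE, has FPKM ≥ 1 and is at least 8 times the FPKM of each of L, R and AB. Second, an organogenesis marker needs nonzero AB expression that is at least 8 times that of every other sample. Sample keys follow the abbreviations in the text, and the transcript IDs and values are made up.

```python
from typing import Dict, List

# Sample abbreviations as used in the abstract.
SE_SAMPLES = ("EC", "GSE")
NON_SE_SAMPLES = ("L", "R", "AB")


def is_se_marker(fpkm: Dict[str, float], min_fpkm: float = 1.0, fold: float = 8.0) -> bool:
    """Assumed reading: some SE sample is expressed (>= min_fpkm) and is at
    least `fold` times higher than each of L, R and AB."""
    for s in SE_SAMPLES:
        v = fpkm[s]
        # Compare v >= fold * other instead of dividing, so zeros are safe.
        if v >= min_fpkm and all(v >= fold * fpkm[o] for o in NON_SE_SAMPLES):
            return True
    return False


def is_organogenesis_marker(fpkm: Dict[str, float], fold: float = 8.0) -> bool:
    """Assumed reading: AB expression is nonzero and at least `fold` times
    that of every other sample."""
    ab = fpkm["AB"]
    return ab > 0 and all(ab >= fold * v for s, v in fpkm.items() if s != "AB")


def select(table: Dict[str, Dict[str, float]], predicate) -> List[str]:
    """Return the transcript IDs whose expression row satisfies `predicate`."""
    return sorted(tid for tid, row in table.items() if predicate(row))


# Hypothetical expression rows (FPKM per sample).
table = {
    "tx_se":  {"L": 0.1, "R": 0.0, "EC": 5.0, "GSE": 0.5, "CSE": 1.0, "AB": 0.2},
    "tx_ab":  {"L": 0.1, "R": 0.1, "EC": 0.2, "GSE": 0.0, "CSE": 0.3, "AB": 4.0},
    "tx_all": {"L": 2.0, "R": 2.0, "EC": 5.0, "GSE": 0.5, "CSE": 1.0, "AB": 2.0},
}
print(select(table, is_se_marker))             # ['tx_se']
print(select(table, is_organogenesis_marker))  # ['tx_ab']
```

Writing the fold-change test as a multiplication rather than a ratio avoids division by zero for transcripts that are not detected in some samples.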

This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project TR-31019.

Keywords: organogenesis, somatic embryogenesis, RNA sequencing, transcriptome


References

1. Filipovic, B.K., Simonovic, A.D., Trifunovic, M.M., Dmitrovic, S.S., Savic, J.M., Jevremovic, S.B., Subotic, A.R.: Plant regeneration in leaf culture of Centaurium erythraea Rafn. Part 1: The role of antioxidant enzymes. Plant Cell Tiss Organ Cult, 121(3), 703-719. (2015)


Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats

Ana Stanojevic1, Vladimir Markovic1, Zeljko Cupic2, Stevan Macesic1, Vladana Vukojevic3, and Ljiljana Kolar-Anic1,2

1 University of Belgrade, Faculty of Physical Chemistry, Studentski trg 12-16, 11158 Belgrade, Serbia

{ana.stanojevic, vladimir.markovic, stevan.macesic, lkolar}@ffh.bg.ac.rs

2 University of Belgrade, Institute of Chemistry, Technology and Metallurgy, Department of Catalysis and Chemical Engineering, Njegoseva 12, 11000 Belgrade, [email protected]

3 Karolinska Institute, Department of Clinical Neuroscience, Center for Molecular Medicine CMM L8:01, 17176 Stockholm, Sweden

[email protected]

Abstract

The hypothalamic-pituitary-adrenal (HPA) axis is a dynamic regulatory network of biochemical reactions that integrates and synchronizes the functions of the nervous and endocrine systems at the organism level. To describe how this vast network of biochemical interactions operates, we have developed a nonlinear, eleven-dimensional stoichiometric model that concisely describes the key biochemical transformations comprising the HPA axis in rats. In a stoichiometric model of a biochemical system, the outcomes of complex biochemical pathways are succinctly described by stoichiometric relations. In this representation, substances that initiate (i.e., enter) a pathway are treated as reactants, and substances generated in a pathway are treated as products. The rates at which the products of a pathway appear are jointly proportional to the concentrations of the reactants. To derive rate constants for specific biochemical reaction pathways, we drew on our recently developed nonlinear reaction model that concisely describes biochemical transformations in the HPA axis in humans. In this way, we develop a mathematical framework in the form of a system of ordinary differential equations (ODEs). This framework describes, on a chemical kinetics basis, how the biochemical pathways constituting the HPA axis are integrated. This, in turn, allows us to use numerical simulations to investigate how the underlying biochemical pathways are intertwined. Together they give an integral HPA axis response at the organism level to a variety of external or internal perturbations of HPA dynamics. The HPA axis is a nonlinear dynamical network whose response is complex and often cannot be predicted intuitively. Stoichiometric modeling can therefore be harnessed to gain additional insight into the dynamical functioning of this complex neuroendocrine system.
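The stoichiometric rule stated above translates mechanically into a system of ODEs. Under this rule, each pathway consumes reactants and yields products at a rate jointly proportional to reactant concentrations, which is mass-action kinetics. The Python sketch below shows that translation for an arbitrary list of reactions and integrates the resulting system with a classical fourth-order Runge-Kutta step. It is a generic illustration, not the authors' eleven-dimensional HPA model. The reaction list, species names and rate constants in the usage example are hypothetical. Raising each concentration to its stoichiometric coefficient is the standard mass-action assumption.

```python
import math
from typing import Dict, List, Tuple

# A reaction is (rate constant, {reactant: coefficient}, {product: coefficient}).
Reaction = Tuple[float, Dict[str, int], Dict[str, int]]
State = Dict[str, float]


def derivatives(reactions: List[Reaction], conc: State) -> State:
    """Mass-action right-hand side: rate = k * prod(conc[s] ** nu) over reactants."""
    d = {s: 0.0 for s in conc}
    for k, reactants, products in reactions:
        rate = k
        for s, nu in reactants.items():
            rate *= conc[s] ** nu
        for s, nu in reactants.items():
            d[s] -= nu * rate   # reactants are consumed
        for s, nu in products.items():
            d[s] += nu * rate   # products are formed
    return d


def rk4_step(reactions: List[Reaction], conc: State, dt: float) -> State:
    """One classical Runge-Kutta (RK4) step."""
    def shifted(base: State, slope: State, h: float) -> State:
        return {s: base[s] + h * slope[s] for s in base}

    k1 = derivatives(reactions, conc)
    k2 = derivatives(reactions, shifted(conc, k1, dt / 2))
    k3 = derivatives(reactions, shifted(conc, k2, dt / 2))
    k4 = derivatives(reactions, shifted(conc, k3, dt))
    return {s: conc[s] + dt / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in conc}


def simulate(reactions: List[Reaction], conc0: State, t_end: float, dt: float) -> State:
    """Integrate from t = 0 to t_end with a fixed step dt."""
    conc = dict(conc0)
    for _ in range(int(round(t_end / dt))):
        conc = rk4_step(reactions, conc, dt)
    return conc


# Hypothetical single pathway A -> B with k = 1.
result = simulate([(1.0, {"A": 1}, {"B": 1})], {"A": 1.0, "B": 0.0}, t_end=1.0, dt=0.01)
print(round(result["A"], 4), round(math.exp(-1), 4))  # 0.3679 0.3679
```

For this first-order toy pathway the numerical result matches the analytic decay e^(-t). A realistic HPA model would use many coupled reactions with feedback, but it would be assembled the same way from its stoichiometric relations.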

Keywords: Hypothalamic-pituitary-adrenal (HPA) axis, rats, nonlinear dynamical network, system of ordinary differential equations


Chaos and symmetry in mathematical neural flow models

Rodica Cimpoiasu, Radu Constantinescu, and Alina Streche

University of Craiova, 13 A.I.Cuza, 200585 Craiova, Romania

{rodicimp, rconsta, maria.alina2009}@yahoo.com

Abstract

There are many mathematical models that try to explain the propagation of the neural flow in terms of nonlinear dynamical systems. Our paper presents some results related to two such models. The first model consists of a system of 3 nonlinear ODEs of Lorenz type, and the second is described by a generalized nonlinear Boussinesq equation. In the first case, the neural flow is modeled by an electronic circuit generating chaotic signals. In the second case, the propagation of nerve pulses through neurons is likened to sound propagation in cylindrical biomembranes. The symmetry group approach yields important information about each of the two systems. We will focus on the chaotic behavior and control techniques for the first example, and on specific solitary wave solutions for the evolution described by the second system.
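The abstract does not give the specific Lorenz-type system the authors study, so the Python sketch below uses the classical Lorenz system (σ = 10, ρ = 28, β = 8/3) as a stand-in. It integrates two trajectories whose initial conditions differ by 10^-9, using fourth-order Runge-Kutta, and reports their maximum separation. This separation is a simple numerical signature of the sensitive dependence on initial conditions that characterizes chaos. The parameter values, step size and function names are assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float, float]


def lorenz(state: Vec, sigma: float = 10.0, rho: float = 28.0, beta: float = 8.0 / 3.0) -> Vec:
    """Classical Lorenz vector field (stand-in for the authors' Lorenz-type system)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4(f: Callable[[Vec], Vec], state: Vec, dt: float) -> Vec:
    """One fourth-order Runge-Kutta step for a 3-dimensional autonomous ODE."""
    def shift(s: Vec, k: Vec, h: float) -> Vec:
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shift(state, k1, dt / 2))
    k3 = f(shift(state, k2, dt / 2))
    k4 = f(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trajectory(start: Vec, steps: int, dt: float = 0.01) -> List[Vec]:
    """Sequence of states from `start`, one per RK4 step."""
    points = [start]
    for _ in range(steps):
        points.append(rk4(lorenz, points[-1], dt))
    return points


a = trajectory((1.0, 1.0, 1.0), 4000)
b = trajectory((1.0 + 1e-9, 1.0, 1.0), 4000)
max_sep = max(math.dist(p, q) for p, q in zip(a, b))
print(f"max separation over t in [0, 40]: {max_sep:.2f}")
```

The exact separation printed depends on the step size and on floating-point details. What matters is that it grows from 10^-9 to values comparable to the size of the attractor, while each trajectory itself stays bounded.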

Keywords: nerve propagation models, Lie symmetry, chaos, solitary wave



Graph theoretical analysis reveals: Women's brains are better connected than men's

Balazs Szalkai, Balint Varga, and Vince Grolmusz

Eotvos Lorand University, Protein Information Technology Group, Pazmany Peter setany 1/C, Budapest XI, Hungary

{szalkai,balorkany,grolmusz}@pitgroup.org

Abstract

Deep graph-theoretic ideas applied to the graph of the World Wide Web led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph-theoretic concepts will lead to discoveries illuminating both the structural and the functional details of animal and human brains, as happened in the case of the World Wide Web. In the present work we have examined brain graphs computed from the data of the Human Connectome Project, recorded from male and female subjects between the ages of 22 and 35.

Significant differences were found between the male and female structural brain graphs. We show that the average female connectome has more edges, is a better expander graph, has a larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the average male brain, these properties show that the female brain has, in a sense, better graph-theoretical properties than the male brain [1].
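One of the compared quantities, the number of spanning trees, has a classical closed form. By Kirchhoff's matrix-tree theorem it equals any cofactor of the graph Laplacian L = D - A, for instance the determinant of L with its first row and column deleted. The Python sketch below applies the theorem to small unweighted, undirected graphs using exact rational arithmetic. It illustrates the theorem only and is not the authors' pipeline. For graphs with around a thousand vertices, as in connectome studies, the counts become astronomically large, so one would instead compute a log-determinant in floating point.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def laplacian(n: int, edges: Sequence[Edge]) -> List[List[int]]:
    """Laplacian D - A of a simple undirected graph on vertices 0..n-1."""
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1
        lap[v][u] -= 1
        lap[u][u] += 1
        lap[v][v] += 1
    return lap


def determinant(matrix: List[List[int]]) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def spanning_tree_count(n: int, edges: Sequence[Edge]) -> int:
    """Matrix-tree theorem: delete row 0 and column 0 of the Laplacian."""
    lap = laplacian(n, edges)
    minor = [row[1:] for row in lap[1:]]
    return int(determinant(minor))


k4 = list(combinations(range(4), 2))
print(spanning_tree_count(4, k4))                                # 16 (Cayley: 4**2)
print(spanning_tree_count(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 4 (a 4-cycle)
```

A disconnected graph has zero spanning trees, so comparing this count between two graphs is sensitive to overall connectivity as well as to redundancy of paths.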

Keywords: sex, brain, graph, graph theory, bioinformatics, MRI

References

1. Szalkai, B., Varga, B., Grolmusz, V.: Graph Theoretical Analysis Reveals: Women's Brains Are Better Connected than Men's. PLOS ONE, DOI: 10.1371/journal.pone.0130045 (2015)


Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

Balint Varga

Eotvos Lorand University, Budapest, [email protected]

Abstract

The human braingraph, or connectome, is a description of the connections of the brain. The nodes of the graph correspond to small areas of the gray matter, and two nodes are connected by an edge if a diffusion MRI-based workflow finds fibers between those brain areas. We have constructed 1015-vertex graphs from the diffusion MRI brain images of 395 human subjects. We compared the individual graphs with respect to several different areas of the brain and described the inter-individual variability of the graphs within different brain regions. We have found that the frontal and limbic lobes are more conservative, while the edges in the temporal and occipital lobes are more diverse. Interestingly, a "hybrid" conservative and diverse distribution was found in the paracentral lobule and the fusiform gyrus. Smaller cortical areas were also evaluated: the precentral gyri were found to be more conservative, while the postcentral and superior temporal gyri were very diverse.
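The abstract does not specify how conservativeness is quantified. One simple, hypothetical way to make the idea concrete is to compute, for every possible edge of a region, the fraction p of subjects whose graph contains it. The Bernoulli variance p(1 - p) is then averaged over the region. Edges present in nearly all or nearly no subjects contribute little and mark a conservative region. Edges present in about half the subjects contribute the most and mark a diverse region. The Python sketch below implements this assumed score. It is not the authors' measure, and the function names and toy graphs are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[int]


def edge(u: int, v: int) -> Edge:
    """Undirected edge as an unordered pair of vertex IDs."""
    return frozenset((u, v))


def edge_frequencies(graphs: List[Set[Edge]], region_edges: Iterable[Edge]) -> Dict[Edge, float]:
    """Fraction of subjects whose braingraph contains each candidate edge."""
    n = len(graphs)
    return {e: sum(e in g for g in graphs) / n for e in region_edges}


def variability(graphs: List[Set[Edge]], region_edges: Iterable[Edge]) -> float:
    """Hypothetical score: mean Bernoulli variance p * (1 - p) over the region's edges.
    0 means every edge is present in all subjects or in none; 0.25 is the maximum."""
    freqs = edge_frequencies(graphs, region_edges)
    return sum(p * (1 - p) for p in freqs.values()) / len(freqs)


# Four toy subjects: edge (0, 1) is always present, edge (1, 2) in half of them.
subjects = [
    {edge(0, 1), edge(1, 2)},
    {edge(0, 1), edge(1, 2)},
    {edge(0, 1)},
    {edge(0, 1)},
]
print(variability(subjects, [edge(0, 1)]))  # 0.0  (conservative)
print(variability(subjects, [edge(1, 2)]))  # 0.25 (diverse)
```

A single averaged score cannot by itself express a "hybrid" region. Looking at the whole distribution of per-edge frequencies, rather than its mean, would be one way to capture a mixture of conservative and diverse edges.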

References

1. The Human Connectome Project and beyond: initial applications of 300 mT/m gradi-ents. Neuroimage, 80:234245, Oct 2013. doi: 10.1016/j.neuroimage.2013.05.074. URLhttp://dx.doi.org/10.1016/j.neuroimage.2013.05.074.

2. Sex differences in the structural connectome of the human brain. Proc NatlAcad Sci U S A, 111(2):823828, Jan 2014. doi: 10.1073/pnas.1316909110. URLhttp://dx.doi.org/10.1073/pnas.1316909110.

3. Graph theoretical analysis reveals: Womens brains are better connected than mens. PLOSOne, July 2015a. http://dx.plos.org/10.1371/journal.pone.0130045.

4. [Szalkai et al.(2015b)] The Budapest Reference Connectome Server v2. 0. Neuroscienceletters, 595:6062, 2015b.

5. [Hirsch(1997)] Differential Topology. Springer-Verlag, 1997. ISBN 978-0-387-90148-0.6. [Feller(2008)] An introduction to probability theory and its applications. John Wiley & Sons,

2008.7. [Daducci et al.(2012)] The connectome mapper: an open-source processing pipeline

to map con- nectomes with MRI. PLoS One, 7(12):e48121, 2012. doi: 10.1371/jour-nal.pone.0048121. URL http://dx.doi.org/10.1371/journal.pone.0048121.

8. [Tournier et al.(2012)] Mrtrix: diffusion tractography in crossing fiber regions. InternationalJournal of Imaging Systems and Technology, 22(1):5366, 2012.


Viral: Real-world competing process simulations on multiplex networks

Petar Velickovic, Andrej Ivaskovic, Stella Lau, and Milos Stanojevic

Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK

{pv273,ai294,sl715,ms2239}@cam.ac.uk

Abstract. Accurate modelling of spreading processes represents a crucial challenge of modern bioinformatics, particularly in the context of predicting the consequences of epidemics (e.g. the proportion of the population infected at the critical point). A wide variety of frameworks have been established. In particular, recent developments in multiplex networks allow for integrating several competing spreading processes and modelling their interactions more directly. However, research developments so far have primarily been evaluated on randomly generated networks, under assumptions on network dynamics that are unlikely to correspond to actual human psychology. As a decisive step towards controlled real-world experiments, we present Viral, a multiplex-network-guided system for real-world simulations of the competing processes of epidemics and awareness in modern society. It is based around a lightweight distributed Android application and a centralised simulation server, both of which are simple to set up and configure. Extensive logging facilities are provided for analysing the simulation results.

Keywords: multiplex networks, competing processes, epidemics, awareness, real-world simulations, Android

1. Introduction

Traditionally, epidemics modelling has been performed using single-layered networks. These represent humans as nodes with a set of possible states they can be in (susceptible-infected-recovered (SIR) and its variants being a popular choice). Disease spreads along the links of the network, which represent pairs of people that come into physical contact. However, incorporating several competing processes into the model via multiplex networks [1] has been the topic of much research in recent years [2–6], revealing important, previously unseen phenomena in epidemics-related networks. Informally, a multiplex network is a multi-layered graph in which each layer is built over the same set of nodes, and there may exist edges between nodes in different layers. Here the nodes usually represent individuals in a population, while the layers usually correspond to the different processes under study.

This framework has thus far been applied almost exclusively to generated networks (common choices include Erdos-Renyi random graphs [7] and Barabasi-Albert scale-free networks [8]). These studies often make assumptions on the network dynamics (such as the Markov property) that may not correspond to human psychology. Our primary aim was to provide a complementary tool that allows researchers to further verify their predictions in real-world controlled experiments. To this end, we developed Viral during the 24-hour Hack Cambridge hackathon (https://www.hackcambridge.com/), where it was commended as one of the top seven projects (out of ∼100 participating teams from top-tier universities).

2. Multiplex network model

We consider a multiplex network setup with two layers (for the epidemics and awareness processes, respectively) over the same set of nodes, corresponding to individuals in the population. An SIS (susceptible-infected-susceptible) process is assumed for the epidemics layer, while a UAU (unaware-aware-unaware) process is assumed for the awareness layer (akin to the model used in [2]).

Along the awareness layer, knowledge of an epidemic can spread between individuals that exchange information. This is modelled implicitly: the individuals are allowed to communicate verbally and via social networks, so no additional state needs to be maintained to support it.

The layers influence one another in two critical ways: 1) a susceptible individual that is aware of an epidemic can get vaccinated, thus diminishing their probability of infection; 2) an individual that becomes infected will, with a fixed probability, become aware. The full network dynamics are illustrated by Fig. 1.

In order to discourage “pack behaviour”, in which awareness immediately spreads fully and everyone gets immunised early on, a novel component of our system encourages a proportion of the population to behave carelessly. It does so by assigning them the negative role of infector, whose purpose is to get as much of the population infected as possible before the round ends. All other (human) nodes are simply tasked with staying healthy until the end of the round.
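The dynamics above can be made concrete as one synchronous update of a two-layer model. The Python sketch below is a hypothetical discrete-time simulation in the spirit of Fig. 1, not Viral's server code. In Viral the awareness layer is implicit (people talk) and infection probabilities come from geolocation, whereas here both layers are explicit edge lists. The parameter names (beta, mu, lam, delta, kappa, nu) are assumptions made for illustration, as is the rule that only aware, susceptible human nodes may vaccinate.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class Node:
    physical: str = "S"      # epidemics layer: "S", "I" or "V" (vaccinated)
    aware: bool = False      # awareness layer: U (False) or A (True)
    infector: bool = False   # infectors never vaccinate


@dataclass
class Params:                # all names are illustrative assumptions
    beta: float   # infection probability per S-I edge per step
    mu: float     # recovery probability I -> S
    lam: float    # awareness transmission probability per A-U edge
    delta: float  # forgetting probability A -> U
    kappa: float  # probability that a newly infected node becomes aware
    nu: float     # vaccination probability for an aware, susceptible human


def step(nodes: List[Node], epi_edges: List[Tuple[int, int]],
         aware_edges: List[Tuple[int, int]], p: Params, rng: random.Random) -> List[Node]:
    """One synchronous update: all transitions are computed from the old states."""
    old = nodes
    new = [replace(n) for n in nodes]  # copies, so `old` stays untouched
    # Awareness layer (UAU): forgetting, then transmission along awareness edges.
    for i, n in enumerate(old):
        if n.aware and rng.random() < p.delta:
            new[i].aware = False
    for i, j in aware_edges:
        for a, b in ((i, j), (j, i)):
            if old[a].aware and not old[b].aware and rng.random() < p.lam:
                new[b].aware = True
    # Epidemics layer (SIS + vaccination).
    for i, n in enumerate(old):
        if n.physical == "I" and rng.random() < p.mu:
            new[i].physical = "S"
        elif n.physical == "S" and n.aware and not n.infector and rng.random() < p.nu:
            new[i].physical = "V"
    for i, j in epi_edges:
        for a, b in ((i, j), (j, i)):
            if (old[a].physical == "I" and old[b].physical == "S"
                    and new[b].physical == "S" and rng.random() < p.beta):
                new[b].physical = "I"
                if rng.random() < p.kappa:   # self-awareness on infection
                    new[b].aware = True
    return new


# Deterministic demo: beta = 1 and kappa = 1 make infection and awareness certain.
params = Params(beta=1.0, mu=0.0, lam=0.0, delta=0.0, kappa=1.0, nu=0.0)
state = step([Node("I"), Node("S"), Node("S")], [(0, 1), (1, 2)], [], params, random.Random(0))
print([n.physical for n in state])  # ['I', 'I', 'S']
```

Computing every transition from the previous state keeps the update order-independent. A per-event (asynchronous) scheme would be closer to a live system, where state changes arrive one message at a time.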

[Fig. 1 diagram: epidemics layer with node states S, I and V; awareness layer with node states U and A; transitions labelled "self-awareness" and "immunisation"]

Fig. 1. Illustration of the underlying network dynamics assumed by Viral. Nodes take part in both the epidemics layer (SIS + vaccinated) and the awareness layer (UAU).


3. Implementation

Viral consists of two core components: the server and the Android application. The Android application represents a node in the network and broadcasts its current geolocation to the server. The server uses this geolocation to compute distances between nodes and to obtain transmission probabilities for modelling the epidemics layer. The server simulates the state transitions (such as a change in awareness or physical state) and sends the node's updated state to the Android application.

3.1. Server

The server communicates with the Android applications, and it simulates and maintains the network state. It also periodically appends the network state to a log file for the current session. In addition, it provides a visualisation tool that displays the most recent state in publication-ready TikZ format; examples can be seen in the outputs of the synthetic experiments in Section 4.2.

Simulating the epidemics layer is achieved by maintaining a matrix M of inverse-exponential distances between all pairs of nodes, with

$$M_{ij} = k\,e^{-\lambda d_{ij}} \qquad (1)$$

where k > 0 and λ are server parameters, and d_{ij} is the great-circle distance between the locations of node i and node j. The probability of activation for edge i ↔ j is given by normalising:

$$P_{ij} = \frac{M_{ij}}{\sum_{i',j'} M_{i'j'}} \qquad (2)$$

This means that, for λ > 0, the likelihood of infection increases as the distance between nodes decreases, corresponding to an assumption of airborne transmission. An edge activated between a susceptible and an infected node leads to the susceptible node becoming infected with a specified probability (also a server parameter).
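Equations (1) and (2) can be sketched directly in Python. The sketch makes two assumptions that the text does not state. First, the great-circle distance is computed with the haversine formula on a sphere of radius 6,371 km, so λ is per metre. Second, the normalising sum runs over unordered pairs i < j, so the probabilities of all edges sum to 1. One consequence worth noting is that k cancels in Eq. (2), so only λ affects P. The function names and coordinates are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]          # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0          # mean Earth radius (spherical assumption)


def great_circle_m(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance in metres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def activation_probabilities(positions: List[LatLon], k: float,
                             lam: float) -> Dict[Tuple[int, int], float]:
    """Eq. (1): M_ij = k * exp(-lam * d_ij); Eq. (2): normalise over all pairs i < j."""
    m = {(i, j): k * math.exp(-lam * great_circle_m(positions[i], positions[j]))
         for i, j in combinations(range(len(positions)), 2)}
    total = sum(m.values())
    return {edge: value / total for edge, value in m.items()}


# Three hypothetical phones: two a few metres apart, one several hundred metres away.
phones = [(52.2053, 0.1218), (52.2054, 0.1219), (52.2100, 0.1300)]
probs = activation_probabilities(phones, k=1.0, lam=0.01)
for edge, prob in sorted(probs.items()):
    print(edge, round(prob, 4))
```

With these numbers almost all of the activation mass falls on the close pair. For very large λ·d every M_ij can underflow to zero, and the normalisation would then divide by zero, so a production implementation would need to guard against that case.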

3.2. Android application

The Android application consists of two main graphical components (Fig. 2):

– Initial screen: the first prompt, which becomes visible to the user once the application is started; it allows the user to provide the hostname and port of a Viral server;

– Main screen: the screen responsible for showing all the necessary information received from the server, as well as allowing user input where necessary.

Once the hostname and port are provided via the initial screen, all the necessary components of the application are initialised. Thereafter, messages from the server can trigger updates to the main screen. Concurrently, whenever the position of the device changes, its new geolocation is submitted to the server. In addition, the user can enter (and potentially be shown) a round-unique code in order to initiate vaccination. This code can be shared among users, implicitly simulating the awareness layer.


Fig. 2. A variety of screenshots of the Viral Android application. Left-to-right: the initial screen, followed by three different states of the main screen.

4. Usage

4.1. Installation

The full source code of Viral is hosted on the corresponding author's GitHub profile, at https://github.com/PetarV-/viral, and is licensed under the MIT license. The source may be downloaded as an archive from GitHub, or the repository may be directly cloned by running the following command within a terminal:

$ git clone https://github.com/PetarV-/viral.git

Detailed instructions for compiling and configuring the server, as well as setting up the Android application and configuring the synthetic clients used for the runs below, are provided in the README file of the repository.

4.2. Synthetic experiments

While the primary purpose of Viral is creating data from a controlled, real environment, it also supports the addition of bots (virtual participants), which the server does not distinguish from human users. In the current model, the bots perform random walks and periodically send position updates to the server. Apart from this, their only behaviour is to vaccinate themselves if they have access to the valid vaccine code and have the human role.
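The bot behaviour just described is small enough to sketch. The Python class below is a hypothetical stand-in for Viral's synthetic clients; the real ones are described in the repository's README. It walks on a local planar grid in metres rather than in latitude/longitude, and it takes fixed-length steps in uniformly random directions. It vaccinates only when it holds the valid code and has the human role. Network communication with the server is omitted, and the code string is made up.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Bot:
    x: float                            # local planar coordinates in metres (assumption)
    y: float
    role: str = "human"                 # "human" or "infector"
    vaccine_code: Optional[str] = None  # code the bot has "heard", if any
    vaccinated: bool = False

    def walk(self, step_m: float, rng: random.Random) -> Tuple[float, float]:
        """Random-walk step of fixed length in a uniformly random direction;
        the returned position is what would be sent to the server."""
        theta = rng.uniform(0.0, 2.0 * math.pi)
        self.x += step_m * math.cos(theta)
        self.y += step_m * math.sin(theta)
        return (self.x, self.y)

    def maybe_vaccinate(self, valid_code: str) -> bool:
        """Only human-role bots holding the valid round code vaccinate."""
        if (self.role == "human" and self.vaccine_code is not None
                and self.vaccine_code == valid_code):
            self.vaccinated = True
        return self.vaccinated


rng = random.Random(42)
bot = Bot(0.0, 0.0, vaccine_code="R1-CODE")
for _ in range(3):
    print(bot.walk(5.0, rng))
print(bot.maybe_vaccinate("R1-CODE"))   # True
```

Passing in a seeded random.Random instance makes bot runs reproducible, which helps when comparing synthetic rounds under different server parameters.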

We have run our application on purely synthetic data for preliminary measurements. Some interesting cases of network behaviour (with different network parameters) can be seen in Fig. 3.

[Fig. 3 diagrams: three networks at the end of a round, with nodes labelled U (unaware) or A (aware)]

Fig. 3. Examples of round endings. The first diagram shows a situation with parameters corresponding to a typical flu-like epidemic. The second diagram corresponds to a pandemic-like scenario, in which everybody who is not vaccinated becomes infected. The third diagram corresponds to a severe epidemic with ineffective vaccines. The node colours correspond to the colour coding from Fig. 1, and the link intensities correspond to proximities in the epidemics layer.

5. Conclusions

In this applications note we have presented Viral, a utility for performing real-world controlled experiments on epidemic spreading with configurable parameters, taking advantage of the Android platform and multiplex networks. To the best of our knowledge, it is the first of its kind. It should serve both as a valuable tool for bioinformaticians and as a potential reference implementation for future advancements in the area of real-world simultaneous spreading process simulation. In particular, the choice and number of processes being considered should be extendable to other cases, such as simultaneously considering multiple transmission paths of a single disease [5] or multiple diseases [6]. We believe that the awareness component is also vital, and the framework provided by Viral for implicitly simulating it should prove highly valuable in all future extensions. Furthermore, we hope that the human/infector model considered in Section 2 will be a valuable first step towards accurately simulating the fact that a large proportion of the population acts fairly carelessly in the presence of an epidemic.

References

1. Kivela, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203-271.

2. Granell, C., Gomez, S., & Arenas, A. (2014). Competing spreading processes on multiplex networks: awareness and epidemics. Physical Review E, 90(1), 012808.

3. Buono, C., Alvarez-Zuzek, L. G., Macri, P. A., & Braunstein, L. A. (2014). Epidemics in partially overlapped multiplex networks. PLoS ONE, 9(3), e92200.

4. Zhao, D., Wang, L., Li, S., Wang, Z., Wang, L., & Gao, B. (2014). Immunization of epidemics in multiplex networks. PLoS ONE, 9(11), e112018.

5. Zhao, D., Li, L., Peng, H., Luo, Q., & Yang, Y. (2014). Multiple routes transmitted epidemics on multiplex networks. Physics Letters A, 378(10), 770-776.

6. Azimi-Tafreshi, N. (2015). Cooperative epidemics on multiplex networks. arXiv preprint arXiv:1511.03235.

7. Erdos, P., & Renyi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen, 6, 290-297.

8. Barabasi, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512.


White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data: From Component Based Design to Meta Learning

Milan Vukicevic, Sandro Radovanovic, Boris Delibasic, and Milija Suknovic

University of Belgrade, Faculty of Organizational Sciences, Jove Ilica 154, Belgrade, Serbia

milan.vukicevic, sandro.radovanovic, boris.delibasic, [email protected]

Abstract

The white-box, or reusable-component-based, approach to the design and application of predictive algorithms was recently proposed. It offers numerous advantages over traditional (black-box) design: development of algorithms on a common basis, seamless design of a large number of hybrid algorithms, fair performance comparison, increased interpretability of the results, easier adoption in practice, etc.

In this paper we will showcase the possibilities of white-box clustering algorithm design for predicting disease states from gene expression microarray data in two ways. First, we will design a large number of hybrid clustering algorithms in order to better adapt the models to the data at hand. Second, we will exploit the models and their evaluations to build a meta-learning system that allows efficient estimation of model performance on new gene expression microarray data.
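The core idea of component-based design is that an algorithm is assembled from interchangeable parts. Every combination of parts is then a new "hybrid" algorithm that can be evaluated on a common basis. The Python sketch below illustrates this with a k-means-style clusterer whose initialization and distance components can be swapped. The two component sets, the fixed mean-based centroid update and all names are hypothetical choices, not the authors' component library.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]
Init = Callable[[List[Point], int], List[List[float]]]


# Component set 1: distance measures.
def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


# Component set 2: centroid initialization strategies.
def init_first_k(data: List[Point], k: int) -> List[List[float]]:
    return [list(p) for p in data[:k]]


def init_farthest_first(data: List[Point], k: int) -> List[List[float]]:
    centers = [list(data[0])]
    while len(centers) < k:
        far = max(data, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(far))
    return centers


def make_kmeans(init: Init, dist: Distance, iters: int = 20):
    """Assemble a k-means-style algorithm from the chosen components."""
    def cluster(data: List[Point], k: int) -> List[int]:
        centers = init(data, k)
        labels: List[int] = []
        for _ in range(iters):
            labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in data]
            for c in range(k):
                members = [p for p, lab in zip(data, labels) if lab == c]
                if members:  # keep the old center if a cluster empties
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        return labels
    return cluster


INITS: Dict[str, Init] = {"first_k": init_first_k, "farthest": init_farthest_first}
DISTANCES: Dict[str, Distance] = {"euclidean": euclidean, "manhattan": manhattan}

# Every combination of components is a distinct hybrid algorithm.
hybrids = {f"{i}+{d}": make_kmeans(INITS[i], DISTANCES[d]) for i, d in product(INITS, DISTANCES)}

toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
for name, algorithm in hybrids.items():
    print(name, algorithm(toy, 2))
```

In a study like the one described, each hybrid would be scored on each microarray dataset. The resulting (dataset, algorithm, score) records are the kind of raw material a meta-learning system could learn from.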

Keywords: white-box, clustering, gene expression, microarray, disease state prediction


POSTER SESSION


Machine learning-based approach to help diagnose Alzheimer's disease through spontaneous speech analysis

Jelena Graovac, Jovana Kovacevic, and Gordana Pavlovic Lazetic

Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia

{jgraovac,jovana,gordana}@matf.bg.ac.rs

Abstract

Alzheimer's disease and other dementias have been recognized as a major public health problem among the elderly in developing countries. We address this issue by exploring automatic, noninvasive techniques for diagnosing patients through analysis of spontaneous, conversational speech. The techniques we propose are variants of the n-gram-based kNN and SVM machine learning techniques. Since we use byte-level n-grams, we do not use any language-dependent information, such as word boundaries, character case, white-space characters or punctuation [1].
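As a rough illustration of the byte-level idea, the Python sketch below builds byte n-gram frequency profiles straight from UTF-8 bytes. There is no tokenization, case folding or punctuation handling, so the same code works unchanged on Serbian Cyrillic or Latin text. A new transcript is classified by majority vote among its k most similar training transcripts. Cosine similarity between profiles is an assumption, since the abstract does not say which similarity or dissimilarity measure the authors' kNN variant uses. The labels and texts in the example are invented.

```python
import math
from collections import Counter
from typing import List, Tuple


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Frequency profile of overlapping byte n-grams of the UTF-8 encoding."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(v * b[g] for g, v in a.items() if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train: List[Tuple[str, str]], text: str, k: int = 3, n: int = 3) -> str:
    """Majority label among the k training transcripts most similar to `text`."""
    profile = byte_ngrams(text, n)
    scored = sorted(((cosine(profile, byte_ngrams(t, n)), label) for t, label in train),
                    reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented toy "transcripts" with invented labels.
train = [
    ("the cat sat on the mat", "group_a"),
    ("the cat sat on a mat", "group_a"),
    ("zzz qqq xxx", "group_b"),
]
print(knn_predict(train, "the cat sat", k=1))   # group_a
print(knn_predict(train, "qqq xxx zzz", k=1))   # group_b
```

Because the features are raw bytes, a multi-byte Cyrillic character simply contributes several overlapping n-grams, and no language-specific preprocessing is needed.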

Twelve adults diagnosed with dementia of Alzheimer type (DAT) participated in the study. All DAT participants were interviewed at an adult day care center for people with Alzheimer's disease or dementia in Novi Sad, the only institution of its kind in Serbia. All interviews were audio-taped, transcribed verbatim by a trained researcher, and checked for accuracy by the authors. Mean Mini-Mental Status Exam scores distinguished two groups: moderate and mild.

Our plan is to compile a control dataset based on interviews with healthy elderly people who do not differ significantly in age, sex or education level from the DAT participants. We plan to compare DAT and healthy elderly participants to test how well our techniques discriminate between these groups. We also plan to distinguish between the two groups of DAT participants. We have already performed some preliminary experiments along these lines, with promising results.

We hope that our techniques will prove useful as additional diagnostic and prognostic tools that may help with earlier diagnosis of DAT and with determining its degree of severity.

Keywords: dementia of Alzheimer type, automatic diagnostics, natural language processing, machine learning

References

1. Thomas, Calvin, et al.: Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In: 2005 IEEE International Conference on Mechatronics and Automation, Vol. 3. IEEE (2005)


Targeted resequencing in diagnostics of inherited genetic disorders

Jelena Kusic-Tisma1, Nikola Ptakova2, A. Divac1, M. Ljujic1, Lj. Rakicevic1, M. Tesic3,4, N. Antonijevic3,4, S. Kojic1, Milan Macek Jr.2, and D. Radojkovic1

1 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Vojvode Stepe 444a, 11000 Belgrade, Serbia

[email protected]

2 Department of Biology and Medical Genetics, University Hospital Motol and 2nd School of Medicine, Charles University, Prague, Czech Republic

3 Clinic for Cardiology, Clinical Center of Serbia, Belgrade, Serbia

4 Faculty of Medicine, University of Belgrade, Belgrade, Serbia

Abstract

Next-generation sequencing technologies have made genetic testing a powerful and cost-effective new tool in the diagnosis of inherited diseases with locus and allelic heterogeneity. In this study we performed targeted resequencing in two groups of patients with different disorders: cystic fibrosis (CF, ORPHA586) and hypertrophic cardiomyopathy (HCM, ORPHA217569).

Patients with CF were analyzed using the CFTR MASTR kit (Multiplicom). The technology is based on multiplex PCR amplification of the coding regions of the CFTR gene and selected intronic variants, resulting in 48 amplicons with an average size of 460 bp. The generated amplicon library was paired-end sequenced on a MiSeq system (Illumina Inc., San Diego, CA) using MiSeq Reagent Kit v2 (2x250 cycles). FASTQ files produced by library sequencing were processed with Sequencing Pilot software v3.5.0 (JSI Medical Systems). Data analysis included trimming of the PCR primer sequences from the reads. Variants were evaluated for disease relevance based on the CFTR databases http://www.genet.sickkids.on.ca/app and http://www.cftr2.org.
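The primer-trimming step mentioned above was carried out inside Sequencing Pilot, and the Python sketch below only illustrates the underlying idea. It assumes that each read begins with one of the known amplicon primer sequences. It tolerates a small number of mismatches (Hamming distance) and removes the primer from the 5' end. The primer sequence shown is made up. Real tools also handle indels, 3'-end primers and quality trimming, all of which this sketch ignores.

```python
from typing import Iterable, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_primer(read: str, primers: Iterable[str],
                max_mismatch: int = 1) -> Tuple[str, Optional[str]]:
    """Strip the first primer that matches the read's 5' end within
    `max_mismatch` substitutions; return (trimmed read, matched primer or None)."""
    for primer in primers:
        if len(read) >= len(primer) and hamming(read[:len(primer)], primer) <= max_mismatch:
            return read[len(primer):], primer
    return read, None


PRIMERS = ["ACGTTTGCA"]  # hypothetical primer, not a real CFTR MASTR primer

print(trim_primer("ACGTTTGCAGGGATTC", PRIMERS))  # ('GGGATTC', 'ACGTTTGCA')
print(trim_primer("ACCTTTGCAGGGATTC", PRIMERS))  # one mismatch: ('GGGATTC', 'ACGTTTGCA')
print(trim_primer("TTTTTTTTTGGGATTC", PRIMERS))  # no match: ('TTTTTTTTTGGGATTC', None)
```

Trimming primers matters in amplicon sequencing because primer-derived bases reflect the primer sequence rather than the patient's DNA. Leaving them in could mask or fake variants at the amplicon ends.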

Patients with HCM were analyzed with the TruSight Cardiomyopathy Sequencing Panel (Illumina). Library preparation was based on target enrichment by hybridization. The target region covered 46 genes of interest (246 kb in total). The resulting library was paired-end sequenced on a MiSeq system (Illumina Inc., San Diego, CA) using the MiSeq Reagent Kit v2 (2x250 cycles). Alignment of FASTQ files and variant calling were performed on-instrument with MiSeq Reporter software v2.5. The generated VCF files were annotated with VariantStudio (v2.2). Detected variants were assessed for pathogenicity using the guidelines of the American College of Medical Genetics and Genomics [1].

NGS technology, combined with a well-characterized database of clinically relevant gene variants, is a good alternative to time-consuming stepwise testing of genes with large allelic heterogeneity, such as CFTR. The absence of such databases for HCM renders variants in insufficiently studied genes difficult to interpret, increasing the likelihood that they are classified as variants of unknown significance (VUS) and that the genetic basis of the disease remains undetermined.

References

1. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine (2015) 17, 405–423. doi:10.1038/gim.2015.30


A biologically-inspired model of visual word recognition

Yair Lakretz1, Naama Friedmann1, and Alessandro Treves2

1 Tel-Aviv University, Tel-Aviv, [email protected]

2 SISSA, 265 Via Bonomea, Trieste, Italy

Abstract. We present a computational model of visual word recognition. The model is biologically inspired, incorporating plausible cortical dynamics, and thus adds to previous studies, which have used connectionist or 'box-and-arrow' type models. We begin by exploring several methods to represent letter identities in an artificial neural network, in order to identify the method that best agrees with experimental findings and computational constraints. In the self-organization process of a multilayer neural network, letter-identity and letter-position representations are further processed to create word representations. These correspond to word memories in an orthographic lexicon, as described in neuropsychological models, and function as attractors of the neural network. Simulations show normal reading by the network in the absence of noise or deficits. When noise or deficits are introduced, the network makes errors such as letter transposition or letter substitution, which are similar to those made by dyslexics with letter-position dyslexia and letter-identity dyslexia, respectively.

Keywords: Reading, attractor neural networks, dyslexia

1. Introduction

Reading is a complex skill. It requires the brain to perform multiple processes, such as graphical pattern recognition, extraction of meaning and word production, all in parallel and in a strikingly short time. The first stages of reading include the encoding of letter identities, the processing of letter positions, and the composition of letters into words. Neuropsychological studies have shown that these functions can be selectively impaired, giving rise to specific dyslexias [1]. Most importantly for the current study, a dyslexia has been identified in which letter-position encoding is impaired [2–4]. Several computational models of visual word recognition (VWR) have been proposed in the literature (see [5] for a review). Although insightful and comprehensive, these models shed little light on how the brain performs these tasks. Here we present a model that brings cognitive models together with plausible brain dynamics. These are modeled in an attractor network of graded-response neurons with a threshold-linear activation function [6]. The model addresses the question of how these processes are executed at the neuronal level, including possible processing failures due to noise or deficits.


2. The model

2.1. Letter representations

This section investigates the very first stage of reading, from the printed word to the level of letter representation. The activation created by a printed-letter input in an early visual stage is modeled by ascribing to each letter a list of factors drawn from all possible graphical features of letters (Figure 1A). The factors in turn create activation in a higher layer, which we name the letter layer. Letter representations are then used to compose written words in the orthographic lexicon.

Fig. 1. (A) Example of visual factors in the representation of a Hebrew letter. (B) Letter similarity among all letter pairs in Hebrew as judged by Hebrew readers; scores are between 1 and 10. (C) Multidimensional scaling of all 27 Hebrew letters.

To reduce interference in memory retrieval between words in the lexicon, letter representations should be as weakly correlated as possible. We examine and compare several methods for generating letter representations at this early stage of reading. All methods are intended as simple abstractions of possible neuronal processes in the brain:

Constituting factors. This method assumes a two-layer architecture: a factor layer and a letter-representation layer. Each feature in the graphical form of a letter is represented in the model as a unit in the factor layer. Letters that share features will hence share the corresponding active units. Each unit in the factor layer in turn creates activation in a predefined random subset of units in the letter layer.

Renormalization. As in the first method, each printed-letter input creates activation in a factor layer. In this case, however, the contribution of each factor to the final representation is scaled by a coefficient that is inversely proportional to how often the factor appears in other letters. Salient features will therefore have a higher weight in the final letter representation. In addition, neighboring features compete, leaving in each neighborhood only the most salient features.

Intermediate sub-network layer. A third layer is added between the factor and letter layers. This layer is composed of several sub-networks; each sub-network corresponds to a receptive field (RF), that is, a neighborhood of adjacent cells. The size of the RF is a parameter of the model, whose optimal value is investigated below. Each factor is connected to a random subset of units in the sub-network layer. The size of this subset (UPF, units per factor) is another parameter of the model, determined later. The connections between the sub-network layer and the letter-representation layer are set as follows: each weight between a unit in the sub-network layer and the representation layer is inversely proportional to the accumulated activation of that sub-network unit across all letters. That is, popular units in the sub-network are less dominant in the final pattern of activation. Therefore, as in the second method, salient features have a higher weight in the letter representation than features with high occurrence.
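The first two methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: letters are described by hypothetical feature names, each factor projects onto a fixed random subset of letter-layer units, and the renormalization weight is taken as one over the number of letters containing the factor (the neighbor competition is omitted). The Pearson correlation between two letter vectors is the kind of entry shown in the correlation matrices of Figure 2.

```python
import math
import random


def letter_representations(letter_factors, n_units=200, units_per_factor=20,
                           inverse_frequency=False, seed=0):
    """Map each letter to a letter-layer activity vector.

    letter_factors: dict letter -> set of graphical-feature (factor) names.
    Each factor drives a fixed, randomly chosen subset of letter-layer units
    ('constituting factors'). With inverse_frequency=True, a factor's
    contribution is divided by the number of letters containing it, a simple
    reading of the 'renormalization' weighting (neighbor competition omitted).
    """
    rng = random.Random(seed)
    factors = sorted({f for fs in letter_factors.values() for f in fs})
    projection = {f: rng.sample(range(n_units), units_per_factor) for f in factors}
    freq = {f: sum(f in fs for fs in letter_factors.values()) for f in factors}
    reps = {}
    for letter, fs in letter_factors.items():
        v = [0.0] * n_units
        for f in fs:
            weight = 1.0 / freq[f] if inverse_frequency else 1.0
            for unit in projection[f]:
                v[unit] += weight
        reps[letter] = v
    return reps


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy alphabet with hypothetical features: A and B share a vertical bar.
letters = {"A": {"vbar", "hbar"}, "B": {"vbar", "curve"}, "C": {"dot"}}
plain = letter_representations(letters)
renorm = letter_representations(letters, inverse_frequency=True)
print(round(pearson(plain["A"], plain["B"]), 3),
      round(pearson(renorm["A"], renorm["B"]), 3))
```

In the toy alphabet, A and B share a feature and are therefore correlated; down-weighting the shared feature lowers that correlation, which is the intended effect of the renormalization step.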

Figure 2 presents correlation matrices for the 27 letters of the Hebrew alphabet for the three methods. Below each correlation matrix (top), the corresponding full-cue retrieval test is shown (bottom). A full-cue test is done by presenting the network with the full cue of a printed letter and counting the number of successful retrievals. The results show that low values in the correlation matrix correspond to high full-cue retrieval performance, and that the intermediate sub-network layer method achieves the best performance. We therefore focus on this method in what follows.

Fig. 2. Correlation matrices and full-cue retrieval test results for the three methods. (A) Constituting factors. (B) Renormalization. (C) Sub-network layer (RF = 1, UPF = 100).

Note, however, that in addition to low correlations between letter representations, we require that the similarity between these representations correspond to letter similarity as judged by readers. Taking both constraints into consideration, letter representations therefore cannot be completely orthogonal. Figure 1B presents average similarities between letters as judged by 30 subjects, who were asked to judge the similarity of every letter pair in the Hebrew alphabet. We use these data to determine the optimal model parameters by choosing the values that: (a) maximize the correlation between letter similarities according to the model and the experimental data; and (b) minimize the mean correlation between the model's letter representations.
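Criteria (a) and (b) can be computed for one candidate parameter setting as sketched below in Python. The representation vectors, the rating dictionary and the helper names are hypothetical; the abstract does not state how the two criteria are traded off against each other, so the sketch simply returns both.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def selection_criteria(reps, judged):
    """Score one parameter setting against the two constraints.

    reps:   dict letter -> model letter representation (activity vector)
    judged: dict (letter1, letter2) -> mean similarity rating from readers
    Returns (a) the correlation between model and judged similarity (to be
    maximized) and (b) the mean correlation between representations over the
    rated pairs (to be minimized; equal to the all-pairs mean if every pair
    is rated).
    """
    model_sims, ratings = [], []
    for (l1, l2), rating in judged.items():
        model_sims.append(pearson(reps[l1], reps[l2]))
        ratings.append(rating)
    return pearson(model_sims, ratings), sum(model_sims) / len(model_sims)


# Toy data: x and y are identical, z differs; readers rate x-y as most similar.
reps = {"x": [1, 0, 0], "y": [1, 0, 0], "z": [0, 1, 0]}
judged = {("x", "y"): 9, ("x", "z"): 2, ("y", "z"): 2}
crit_a, crit_b = selection_criteria(reps, judged)
print(round(crit_a, 6), round(crit_b, 6))
```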

2.2. Word representations and the learning stage of the network

A full description of how words are composed from letters is beyond the scope of this report. Composition is done in two steps. First, for each letter in the string, letter identity and position are composed together through a competition process between letter-identity and letter-position activations. Next, all resulting letter-in-position representations are composed together by a similar competitive process, eventually creating the desired word representation. Importantly, the units that encode letter position are the same units that encode letter identity, which is presumably the case in the brain.

The final word representations undergo a follow-up self-organization process, which reduces redundant correlations between the representations. In this process, a multilayer neural network endowed with Hebbian learning and synaptic scaling is repeatedly presented with word patterns in random order. The resulting word representations are finally stored in the final layer of the network, which we name the word layer, and function as its attractor states.

2.3. Architecture and dynamics

The complete architecture of the model is a multilayer network, starting at the factor layer and ending at the word layer. Units in the network are graded-response neurons; that is, every unit is assigned a non-negative continuous variable $V_i$ proportional to the activity of the neuron. This is consistent with interpreting $V_i$ as a mean firing rate.

The updating of the network assumes a threshold-linear activation function:

$$V(t) = g\,\bigl(h(t)-\theta\bigr)\,\Theta\bigl(h(t)-\theta\bigr),$$

where $h(t)$ is the local field, which in the word layer amounts to a sum over all excitatory inputs, $h_i = V_i^{\mathrm{input}} + \sum_j W_{ij} V_j$; $\theta$ is the threshold below which there is no output; $\Theta$ is the Heaviside step function; $g$ is a gain factor; and $W_{ij}$ are the synaptic weights defined below. Each update step is followed by a competitive process that brings the sparseness of the network to a constant value. The sparseness $a$ of a network of $N$ units is defined as

$$a = \frac{\left(\sum_i V_i / N\right)^2}{\sum_i V_i^2 / N},$$

which in the binary case is equivalent to the fraction of active units. We set this value to $a = 0.25$, which is within the range of plausible cortical values. This competitive process represents inhibitory feedback regulation of network activity, and it operates by adjusting the threshold and gain parameters of the threshold-linear activation function.
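A minimal Python sketch of the activation function and the sparseness measure follows. The competitive step is a hypothetical stand-in: it finds the threshold $\theta$ by bisection so that $a = 0.25$, then rescales the gain $g$ so that the mean activity equals an assumed target of 1. The abstract says only that both parameters are adjusted, not how.

```python
import random


def threshold_linear(h, theta, g):
    """V_i = g (h_i - theta) for h_i > theta, else 0."""
    return [g * (x - theta) if x > theta else 0.0 for x in h]


def sparseness(v):
    """a = (sum_i V_i / N)^2 / (sum_i V_i^2 / N); the active fraction if V is binary."""
    n = len(v)
    mean = sum(v) / n
    mean_sq = sum(x * x for x in v) / n
    return mean * mean / mean_sq


def competitive_step(h, target_a=0.25, target_mean=1.0, iters=100):
    """Hypothetical stand-in for the inhibitory competition: bisect the
    threshold until the output sparseness equals target_a, then set the gain
    so that the mean activity equals target_mean."""
    lo, hi = min(h) - 1.0, max(h)  # at lo all units fire; just below hi only one
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sparseness(threshold_linear(h, mid, 1.0)) > target_a:
            lo = mid  # activity still too dense: raise the threshold
        else:
            hi = mid
    theta = lo  # lo < max(h), so at least one unit stays active
    v = threshold_linear(h, theta, 1.0)
    g = target_mean * len(v) / sum(v)
    return theta, g, [g * x for x in v]


rng = random.Random(1)
fields = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy local fields h_i
theta, g, rates = competitive_step(fields)
print(round(sparseness(rates), 4), round(sum(rates) / len(rates), 4))
```

Because $a$ is unchanged when all $V_i$ are multiplied by the same factor, the threshold alone sets the sparseness, and the gain can then be chosen independently.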

Connections between neurons in the word layer follow a covariance Hebbian rule:

$$W_{ij} = \frac{1}{a} \sum_{\mu} \xi_i^{\mu} \left(\xi_j^{\mu} - \bar{\xi}\right),$$

where $\xi^{\mu}$ is the $\mu$-th word pattern and $\bar{\xi}$ is the mean across all words.

After the learning stage described above is complete, new words can be presented to the network. Activations created by the printed word flow from the factor layer, in a feed-forward manner, to the word layer, where they finally converge according to the above dynamics. The resulting pattern can then be compared with the memory patterns stored in the lexicon.
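The storage rule and a retrieval test can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not from the abstract: patterns are binary with exactly 25% active units, $\bar{\xi}$ is the mean activity over all stored patterns and units, self-connections are set to zero, and the graded, sparseness-fixing competition is replaced by keeping the $k$ most strongly driven units active. Retrieval quality is measured as the fraction of a stored word's active units that are active in the final state.

```python
import random


def covariance_weights(patterns, a):
    """W_ij = (1/a) * sum_mu xi_i^mu (xi_j^mu - xi_bar), with W_ii = 0.

    xi_bar is taken as the mean activity over all stored patterns and units,
    one reading of 'the mean across all words'."""
    n = len(patterns[0])
    xi_bar = sum(sum(p) for p in patterns) / (len(patterns) * n)
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            if p[i] == 0:
                continue  # xi_i^mu = 0 contributes nothing to row i
            row = w[i]
            for j in range(n):
                row[j] += p[i] * (p[j] - xi_bar) / a
    for i in range(n):
        w[i][i] = 0.0
    return w


def retrieve(w, cue, k, steps=5):
    """Iterate h_i = sum_j W_ij V_j from the cue, keeping the k most strongly
    driven units active (binary k-winners-take-all in place of the graded,
    sparseness-fixing competition; no persistent external input)."""
    n = len(cue)
    v = list(cue)
    for _ in range(steps):
        h = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        winners = set(sorted(range(n), key=lambda i: h[i], reverse=True)[:k])
        v = [1.0 if i in winners else 0.0 for i in range(n)]
    return v


# Toy lexicon: P binary "word" patterns of N units with exactly K active (a = 0.25).
rng = random.Random(0)
N, K, P = 200, 50, 5
patterns = []
for _ in range(P):
    active = set(rng.sample(range(N), K))
    patterns.append([1.0 if i in active else 0.0 for i in range(N)])
w = covariance_weights(patterns, K / N)

# Degraded cue for word 0: silence 10 of its units and activate 10 foreign ones.
target = patterns[0]
on = [i for i in range(N) if target[i]]
off = [i for i in range(N) if not target[i]]
cue = list(target)
for i in rng.sample(on, 10):
    cue[i] = 0.0
for i in rng.sample(off, 10):
    cue[i] = 1.0

out = retrieve(w, cue, K)
overlaps = [sum(x * y for x, y in zip(out, p)) / K for p in patterns]
print([round(o, 2) for o in overlaps])
```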


Since similarities among letter identities and among letter positions are incorporated in the model, the network exhibits several phenomena: under noisy conditions, a printed-word input can cause the network to converge to an incorrect attractor, corresponding either to the printed word with transposed letter positions or to a word in which one letter is replaced by a similar one. These phenomena depend on the amount of noise and deficits introduced into the network, and correspond to errors described in dyslexia [2, 7, 8]. Further work is required to compare the error statistics of the network with those found in reading test results.

3. Summary

We have presented a biologically inspired computational model of visual word recognition and explored several methods for representing letter identity. The method that achieved the best performance adds an intermediate layer of sub-networks, which correspond to visual receptive fields. The optimal parameter values of this method were determined by two constraints: (a) low correlations between letter representations (to improve the memory capacity of the neural network); and (b) high correlation between the similarity relations among letter representations in the model and those found in behavioral tests. Simulations in the absence of noise or deficits show almost perfect retrieval of word memories. When noise or deficits are introduced, the network exhibits reading errors such as letter transposition or substitution, similar to those of dyslexics. A full report of the simulation results will be presented elsewhere.

References

1. Friedmann, Naama and Coltheart, Max: Types of developmental dyslexia, in: Handbook of communication disorders: Theoretical, empirical, and applied linguistics perspectives, eds. A. Bar-On and D. Ravid (De Gruyter Mouton)

2. Friedmann, Naama and Gvion, Aviah: Letter position dyslexia, Cognitive Neuropsychology, Vol. 18, No. 8, pp. 673–696, Taylor & Francis, 2001

3. Friedmann, Naama and Rahamim, Einav: Developmental letter position dyslexia, Journal of Neuropsychology, Vol. 1, No. 2, pp. 201–236, Wiley Online Library, 2007

4. Friedmann, Naama and Rahamim, Einav: What can reduce letter migrations in letter position dyslexia?, Journal of Research in Reading, Vol. 37, No. 3, pp. 297–315, Wiley Online Library, 2014

5. Norris, Dennis: Models of visual word recognition, Trends in Cognitive Sciences, Vol. 17, No. 10, pp. 517–524, Elsevier, 2013

6. Treves, Alessandro: Graded-response neurons and information encodings in autoassociative memories, Physical Review A, Vol. 42, No. 4, p. 2418, APS, 1990

7. Brunsdon, Ruth and Coltheart, Max and Nickels, Lyndsey: Severe developmental letter-processing impairment: A treatment case study, Cognitive Neuropsychology, Vol. 23, No. 6, pp. 795–821, Taylor & Francis, 2006

8. Friedmann, Naama and Biran, Michal and Gvion, Aviah: Patterns of visual dyslexia, Journal of Neuropsychology, Vol. 6, No. 1, pp. 1–30, Wiley Online Library, 2012


Crystallographic study on CH/O interactions of aromatic CH donors within proteins

J. Lj. Dragelj1, Ivana M. Stankovic2, D. M. Bozinovski3, T. Meyer1, Dusan Z. Veljkovic3, Vesna B. Medakovic3, Ernst Walter Knapp1, and Snezana D. Zaric3

1 Fachbereich Biologie, Chemie, Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin, Fabeckstrasse 36A, Berlin, Germany

2 ICTM, University of Belgrade, Njegoseva 12, Belgrade, Serbia

3 Department of Chemistry, University of Belgrade, Studentski trg 16, Belgrade, Serbia

[email protected]

Abstract

CH/O interactions are weak hydrogen bonds that stabilize protein structures, where they account for up to 25% of the total number of detected hydrogen bonds. Previously, we showed that CH/O interactions do not show a strong preference for linear contacts and that the energy of CH/O interactions of aromatic CH donors depends on the type of atom or group in the ortho-position to the interacting CH group [1, 2]. In this work, CH/O interactions of aromatic CH donors within proteins have been studied by analyzing data from the Protein Data Bank (PDB) and by quantum chemical calculations of electrostatic potentials. The CH/O interactions were studied between three aromatic amino acids (phenylalanine, tyrosine and tryptophan) and several acceptors.

The distribution of the C–H···O angle in crystal structures from the PDB indicates no preference for linear CH/O interactions between aromatic donors and acceptors in protein structures. Although there is no tendency toward linear CH/O interactions, there is no significant number of bifurcated CH/O interactions. The analyses also indicate an influence of simultaneous classical hydrogen bonds, observed particularly in the case of tyrosine: the hydroxyl group on the aromatic ring of tyrosine plays an important role by forming a simultaneous classical hydrogen bond along with a CH/O interaction at the position ortho to the OH substituent. These investigations could help future studies of CH/O interactions in proteins and other protein systems.

Keywords: Aromatic amino acids; CH/O interactions; Hydrogen bond; PDB
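The C–H···O angle analyzed above is computed from atomic coordinates as sketched below in Python. The helper names and toy coordinates are hypothetical. X-ray structures in the PDB usually do not contain hydrogen atoms, so H positions generally have to be added before such angles can be measured, and the abstract does not state which distance or angle cut-offs were used.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def ch_o_geometry(c, h, o):
    """Return (H...O distance, C-H...O angle in degrees) from 3D coordinates.

    The angle is measured at H between the H->C and H->O vectors, so a
    perfectly linear C-H...O contact gives 180 degrees."""
    hc, ho = _sub(c, h), _sub(o, h)
    cos_angle = sum(x * y for x, y in zip(hc, ho)) / (_norm(hc) * _norm(ho))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return _norm(ho), math.degrees(math.acos(cos_angle))


# Toy coordinates in angstroms (hypothetical, not taken from a PDB entry).
linear = ch_o_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.5, 0.0, 0.0))
bent = ch_o_geometry((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0))
print(linear, bent)
```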

References

1. Veljkovic, D. Z., Janjic, G. V., Zaric, S. D.: Are C–H···O interactions linear? The case of aromatic CH donors. CrystEngComm 13, 5005–5010 (2011)

2. Dragelj, J. Lj., Janjic, G. V., Veljkovic, D. Z., Zaric, S. D.: Crystallographic and ab initio study of pyridine C–H···O interactions: linearity of the interactions and influence of pyridine classical hydrogen bonds. CrystEngComm 15, 10481–10489 (2013)


Dynamics of Escherichia coli type I-E CRISPR spacers over 42,000 years

Ekaterina Savitskaya1,2, Anna Lopatina2,3, Sofia Medvedeva1,3, Mikhail Kapustin1, Sergey Shmakov1, Alexey Tikhonov6, Irena I. Artamonova7,8,9, and Konstantin Severinov1,2,3,4,5

1 Skolkovo Institute of Science and Technology, Skolkovo, [email protected]

2 Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia

3 Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia

4 Waksman Institute of Microbiology, Rutgers, the State University of New Jersey, USA

5 Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia

6 Zoological Institute, Russian Academy of Sciences, St. Petersburg, Russia

7 N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia

8 A.A. Kharkevich Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia

9 M.V. Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Moscow, Russia

Abstract

CRISPR-Cas systems defend prokaryotes against mobile genetic elements such as plasmids and phages. During the adaptation stage of CRISPR-Cas immunity, new invader-derived sequences are integrated into genomic CRISPR arrays as spacers between CRISPR repeats. We compared spacers associated with type I-E E. coli CRISPR repeats from a baby Asiatic elephant from the Moscow Zoo and from the baby mammoth Lyuba, which died about 42,000 years ago [1]. We developed a PCR-based method with partially overlapping primers complementary to the type I-E E. coli CRISPR repeat. High-density Illumina sequencing of the amplicons revealed tens of thousands of unique spacers. To reduce the diversity of spacers, k-means hierarchical clustering was applied. Surprisingly, most of the resulting spacer clusters were common to the ancient and modern samples, indicating a lack of spacer turnover during the time separating the Lyuba mammoth from the present. Partial reconstruction of CRISPR arrays was performed using known reference E. coli strains. Most of the reconstructed arrays are unchanged between the mammoth and elephant samples. Thus, despite its adaptive potential, the immune repertoire of E. coli CRISPR-Cas spacers did not change significantly over the course of 42,000 years.

Keywords: bioinformatics, CRISPR-Cas systems
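A basic step in this kind of analysis, pulling spacers out of amplicon reads, can be sketched in Python as below. This is a simplified illustration, not the authors' pipeline: it accepts only exact repeat matches, keeps only spacers flanked by repeats on both sides, and assumes the 29-nt E. coli K-12 type I-E repeat, which should be replaced with the repeat actually targeted by the primers. Clustering of similar spacers is not shown.

```python
from collections import Counter

# Assumed repeat: the 29-nt type I-E repeat of E. coli K-12. Replace it with
# the repeat sequence actually targeted by the primers.
REPEAT = "GAGTTCCCCGCGCCAGCGGGGATAAACCG"


def extract_spacers(read, repeat=REPEAT, min_len=20, max_len=50):
    """Return the sequences lying between consecutive exact copies of the repeat."""
    parts = read.split(repeat)
    # parts[0] and parts[-1] flank the first and last repeat copy and are partial
    return [s for s in parts[1:-1] if min_len <= len(s) <= max_len]


def count_unique_spacers(reads):
    """Count how often each distinct spacer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(extract_spacers(read))
    return counts


# Synthetic reads with hypothetical 32-nt spacers.
s1, s2 = "A" * 32, "C" * 32
reads = ["GG" + REPEAT + s1 + REPEAT + s2 + REPEAT + "TT",
         REPEAT + s1 + REPEAT]
counts = count_unique_spacers(reads)
print(counts.most_common())
```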

References

1. Mueller, T. and Latreille, F.: Ice baby. Nat. Geogr. 215, 30–51 (2009).


De Novo Transcriptome Sequencing of Verbascum thapsus L. to Identify Genes Involved in Metal Tolerance

Filis Morina1, Marija Vidovic1, Ana Sedlarevic1, Ana Simonovic2, and Sonja Veljovic-Jovanovic1

1 Institute for Multidisciplinary Research (IMSI), University of Belgrade, Serbia
{filis,marija,ana.sedlarevic,sonjavel}@imsi.rs

2 Institute for Biological Research ’Sinisa Stankovic’, University of Belgrade, [email protected]

Abstract

Verbascum thapsus is a pioneer species and a successful colonizer of metal-polluted soils. Recently, we observed different degrees of metal tolerance at the physiological level in V. thapsus populations originating from metal-polluted and unpolluted soils [1, 2]. The aim of our work was de novo transcriptome assembly and annotation of V. thapsus leaf tissue using data collected from an RNA sequencing (RNA-Seq) experiment. These results should enable the identification of genes crucial for metal tolerance and redox homeostasis in this species. Sequencing, transcriptome assembly and annotation were done by Genomix4Life. Ultra-high-throughput RNA paired-end sequencing on the Illumina platform yielded 45 million reads. The high-quality reads were used as input for transcriptome assembly with Trinity. The assembled transcriptome of 69 Mbp had 41.37% GC content, and its 73,520 transcripts were grouped into 52,204 "genes". The average and median contig lengths were 938 bp and 598 bp, respectively, and the N50 was 1160 bp. At least 41,084 genes were expressed with FPKM ≥ 2. Of over 70,000 transcripts, 2,722 were BLASTed without hits, 13,033 had BLAST hits, 95 had Gene Ontology (GO) Slim annotation, 481 sequences had GO mapping, and 38,760 were matched in InterProScan but not BLASTed. The top-hit species was Sesamum indicum, which, like V. thapsus, belongs to the order Lamiales. The sequences of the assembled transcripts were translated into proteins with TransDecoder, using a minimum open reading frame length of 100 aa. Transcripts were described in terms of their associated cellular component, biological process and molecular function. Functional annotation of 2524 expressed sequence tags was obtained using Blast2GO (B2GO). This exhaustive annotation may offer a suitable platform for functional genomics, particularly useful for V. thapsus as a non-model species.

Keywords: Blast2GO, de novo transcript sequence annotation, Verbascum thapsus
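The contig statistics reported above (mean, median and N50) can be computed as sketched in Python below. The function name and toy lengths are hypothetical; assemblers and their reporting scripts may apply slightly different conventions (for example, computing N50 over all transcripts or only over the longest isoform per gene).

```python
import statistics


def assembly_stats(lengths):
    """Return (mean, median, N50) of contig lengths.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    n50 = None
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return statistics.mean(lengths), statistics.median(lengths), n50


# Toy contig lengths: 54 bases in total; 10 + 9 + 8 = 27 reaches half.
print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```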


References

1. Morina, F. and Jovanovic, L. and Prokic, L. and Veljovic-Jovanovic, S.: Environmental Science and Pollution Research, doi:10.1007/s11356-016-6177-4 (2016)

2. Morina, F. and Vidovic, M. and Kukavica, B. and Veljovic-Jovanovic, S.: Botanica Serbica, 39(2) (2015)


De Novo Transcriptome Sequencing of Pelargonium zonale L. to Identify Genes Involved in UV-B and High Light Response

Marija Vidovic1, Filis Morina1, Ana Sedlarevic1, Ana Simonovic2, and Sonja Veljovic-Jovanovic1

1 Institute for Multidisciplinary Research (IMSI), University of Belgrade, Serbia
{marija,filis,ana.sedlarevic,sonjavel}@imsi.rs

2 Institute for Biological Research ’Sinisa Stankovic’, University of Belgrade, [email protected]

Abstract

The variegated Pelargonium zonale cv. "Frank Headley" is a periclinal chimera with white leaf margins, caused by the lack of functional chloroplasts in the mesophyll. Our previous ultrastructural, biochemical and physiological characterizations of the photosynthetic and non-photosynthetic leaf tissues revealed significant differences related to sugar, phenolic and antioxidative metabolism and to stomatal regulation [1, 2]. High light intensity and UV-B radiation induced different antioxidative and phenolic responses in these two tissues. The aim of our study was de novo transcriptome assembly and annotation of the green leaf tissue of P. zonale. Ultra-high-throughput RNA paired-end sequencing on the Illumina HiSeq 2500 platform yielded 43 million reads. The high-quality reads were joined and then used as input for transcriptome assembly with Trinity. The preliminary assembled transcriptome comprised about 73 Mbp in 83,012 transcripts grouped into 60,087 "genes". The mean GC content was 43.18%, the average contig length was 879 bp, and the N50 was 1167 bp. At least 41,084 genes were expressed at FPKM ≥ 2. The sequences of the assembled transcripts were translated into proteins with TransDecoder, using a minimum open reading frame length of 100 aa. The Blast2GO software was used to associate functions with the set of identified transcripts. Of the roughly 83,000 transcripts, 7903 had BLAST hits and 5223 had Gene Ontology annotations (biological process, molecular function and cellular component). The majority of genes encoding enzymes involved in the response to UV-B radiation and high light intensity were identified. Further work should extend the characterization to the non-photosynthetic tissue, in order to explore the tissue-specific regulation of antioxidative and phenolic metabolism.

Keywords: Blast2GO, de novo transcript sequence annotation, Pelargonium zonale
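Both transcriptome abstracts report genes expressed at FPKM ≥ 2. Under the standard definition, FPKM scales the number of fragments mapped to a transcript by its length in kilobases and by the total number of mapped fragments in millions; a minimal Python sketch with hypothetical counts follows. The abstract does not name the abundance-estimation tool, which may use effective rather than raw transcript lengths.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def expressed(counts, total_fragments, min_fpkm=2.0):
    """Names of genes whose FPKM is at least min_fpkm.

    counts: dict gene -> (mapped fragments, length in bp)."""
    return sorted(g for g, (n, length) in counts.items()
                  if fpkm(n, length, total_fragments) >= min_fpkm)


# Hypothetical counts for two "genes" out of 10 million mapped fragments.
toy = {"gene_a": (100, 2000), "gene_b": (10, 2000)}
print(fpkm(100, 2000, 10_000_000), expressed(toy, 10_000_000))
```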

References

1. Vidovic, M. and Morina, F. and Milic, S. and Albert, A. and Zechmann, B. and Tosti, T. and Winkler, J. B. and Veljovic Jovanovic, S.: Plant Physiology & Biochemistry, 93, 44–55 (2015)


2. Vidovic, M. and Morina, F. and Milic, S. and Vuleta, A. and Zechmann, B. and Prokic, Lj. et al.: Plant Biology, doi:10.1111/plb.12429 (2015)


Protein Interaction Network Construction and Analysis Using the Quantitative Proteomics Data

Ozal Mutlu and Nagihan Gulsoy

Marmara University, Faculty of Arts and Sciences, Department of Biology, 34722, Goztepe, Istanbul, Turkey

[email protected]

Abstract

Protein-protein interaction networks comprise complex molecular interactions among proteins involved in many diseases and biological processes, including signaling, cellular growth, differentiation, cell death, and responses to environmental and pathogenic stimuli. Understanding this complex system of protein associations could help identify new protein-protein networks and biomarker proteins under disease and external stimulation conditions. In this study, we constructed and analyzed protein interaction networks from quantitative proteomics data of ZnO nanoparticle-exposed dermal fibroblasts in order to understand cellular responses and toxicity mechanisms. In the first step of the computational analysis, the protein list with UniProt IDs was searched in STRING v10.0 [1], and the network was then visualized and analyzed with Cytoscape v3.2.1 [2] for general topological features. Based on Gene Ontology, five biological processes were found: nonsense-mediated mRNA decay, protein localization to organelle, mitotic cell cycle phase transition, response to oxidative stress, and response to unfolded proteins. Cytoscape analysis showed that the top three proteins by betweenness centrality were HSP90, RPL3 (60S ribosomal protein L3) and glutathione reductase. The network of proteins with decreased expression contained fewer proteins, with one central module related to cell growth, reproduction, cell cycle control and death. Taken together, the network analyses indicate that fibroblasts respond to ZnO nanoparticles by activating endoplasmic reticulum stress and oxidative stress mechanisms. These results will help to understand the toxicity mechanisms of nanoparticles at the proteome level.

Keywords: protein-protein interaction network, STRING database, Cytoscape software, toxicoproteomics

Acknowledgements: This work was supported by Marmara University (BAPFEN-C-DRP-110412-0102).
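Betweenness centrality, used above to rank HSP90, RPL3 and glutathione reductase, sums for each node, over all pairs of other nodes, the fraction of shortest paths between them that pass through that node. A minimal Python sketch of Brandes' algorithm for an unweighted, undirected network is shown below; the toy graphs are hypothetical. Cytoscape may report normalized values, which for a given network differ from these by a constant factor, so the ranking of proteins is unaffected.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) of an
    unweighted, undirected graph given as dict node -> list of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate pair dependencies in order of decreasing distance
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was counted once from each endpoint
    return {v: c / 2.0 for v, c in bc.items()}


# Toy networks (hypothetical nodes): a hub "h" with four leaves, and a path.
star = {"h": ["p1", "p2", "p3", "p4"], "p1": ["h"], "p2": ["h"],
        "p3": ["h"], "p4": ["h"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(star)["h"], betweenness(path))
```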

References

1. Szklarczyk, D. and Franceschini, A. and Wyder, S. et al.: STRING v10: Protein–Protein Interaction Networks, Integrated Over the Tree of Life. Nucleic Acids Research, 43(Database issue), D447–D452 (2015)


2. Shannon, P. and Markiel, A. and Ozier et al.: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research, 13(11), 2498–2504 (2003)


An optimal promoter description for bacterial transcription start site detection

Milos Nikolic, Tamara Stankovic, and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia
{milos.nikolic,dmarko}@bio.bg.ac.rs

Abstract

Accurately detecting transcription start sites (TSS) in bacteria is a starting point for understanding transcription regulation. It is an essential component of many bioinformatics applications, such as gene and operon prediction. Improving TSS prediction, a classical bioinformatics problem, is therefore necessary, since currently available methods show poor accuracy.

Different TSS prediction approaches use very different descriptions of bacterial promoter structure. It is therefore unclear which promoter features should be included in TSS recognition, and how including them affects detection accuracy. We address these questions using σ70 and σE (an alternative sigma factor) in E. coli as examples.

We find that the -35 element, which is considered exchangeable, contributes as much to search accuracy as the ubiquitous -10 element (σ70), or more (σE). Furthermore, sequences upstream of the canonical -10 element notably contribute to search accuracy, despite their relatively low conservation. The sequence of the spacer between the -35 and -10 promoter elements, which is commonly included in TSS detection, notably decreases search accuracy for σ70 promoters but improves it for σE promoters. Overall, optimally implemented promoter features reduce false positives by as much as ~50% for σ70, compared with the standard promoter structure implemented in TSS searches [1].

Keywords: σ70 promoters, σE promoters, transcription start site detection, transcription initiation, promoter specificity

References

1. Nikolic, M. and Stankovic, T. and Djordjevic, M.: submitted (2016).

Chronic Treatment with Fluoxetine Led to Alterations in the Rat Hippocampal Proteome

Ivana Peric1, Dragana Filipovic1, Victor Costina2, and Peter Findeisen2

1 Vinca Institute of Nuclear Sciences, University of Belgrade, Serbia
{ivanap,dragana}@vinca.rs

2 Institute for Clinical Chemistry, Medical Faculty Mannheim of the University of Heidelberg, University Hospital Mannheim, Germany

{victor.costina,peter.findeisen}@medma.uni-heidelberg.de

Abstract

Fluoxetine (Flx) is the first-line treatment for depression and anxiety [1]; however, the precise mechanism of its action remains elusive. We therefore aimed to identify protein expression changes regulated by Flx in the rat hippocampus using proteomics. Fluoxetine hydrochloride (15 mg/kg/day) was administered to adult male Wistar rats for 3 weeks, and proteins from rat hippocampal cytosolic, nuclear and mitochondrial fractions were identified by one-dimensional gel electrophoresis followed by nano LC-MS/MS. All differentially expressed proteins were functionally annotated by biological process and molecular function using UniProt and Blast2GO. Using this approach, we compared Flx-treated rats with vehicle-treated control rats.

The comparative study revealed that 67, 61 and 4 proteins were down-regulated and 168, 32 and 79 proteins were up-regulated in the cytosolic, nuclear and mitosol fractions, respectively. As expected, the prevalent biological processes among down-regulated proteins were cellular and single-organism processes in all three fractions. Among up-regulated proteins, besides cellular and single-organism processes, they also included biological regulation in the cytosolic fraction and metabolic processes in the nuclear and mitosol fractions. The molecular functions of both down- and up-regulated proteins were binding and catalytic activity in all hippocampal fractions. Pathway analysis of these differentially expressed proteins using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database showed that down-regulated proteins were mainly involved in amino acid biosynthesis and nucleotide metabolism, while up-regulated proteins participated mainly in amino acid biosynthesis, fatty acid metabolism, glycolysis/gluconeogenesis and signaling pathways.

The observed differences in protein expression patterns between cellular compartments indicate that Flx alters the hippocampal proteome. This approach has provided new insight into the effects of Flx treatment on protein expression in a key brain region associated with stress response and memory.

Keywords: proteomics, fluoxetine, rat brain, bioinformatics

References

1. Tacke, U.: Fluoxetine: an alternative to the tricyclics in the treatment of major depression? Am J Med Sci, 298: 126–129. (1989)

A web-based tool for prediction of effects of single amino acid substitutions outside conserved functional protein domains

Vladimir Perovic, Ljubica Mihaljevic, Branislava Gemovic, and Nevena Veljkovic

Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia

[email protected]

Abstract

Single nucleotide polymorphisms (SNPs) are recognized as the main source of human genetic variability. A non-synonymous SNP (nsSNP) is a single base change in the coding region of a gene that results in a single amino acid substitution (SAP) in the corresponding protein product. nsSNPs can significantly alter protein function and thus the cellular and organismal phenotype. The main challenge ahead is to differentiate between "neutral" and "pathogenic" SNPs that confer susceptibility to Mendelian disorders, common complex diseases and cancers. Tools for predicting these functional effects are mostly phylogeny-based and, as such, are highly accurate in predicting disease-associated mutations at conserved positions in protein sequences. However, we have recently shown that their accuracy is significantly lower when classifying variations outside conserved functional domains (CFDs).

We developed a new tool for predicting the effects of single amino acid substitutions outside conserved functional domains, based on a model that relies on the informational spectrum method for sequence analysis. It was implemented in the Java programming language and is available as a user-friendly web service. As input, it takes the position of the variation and the substituted amino acid; as output, it provides a binary classification together with a graphical representation of the variation. Our tool was trained and tested on datasets of gene variations in the epigenetic regulators ASXL1, DNMT3A, EZH2 and TET2, a set of key biomarkers in myeloid malignancies. It significantly outperformed the state-of-the-art tools PolyPhen-2 and SIFT.
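The informational spectrum method (ISM) is only named above. Here is a minimal sketch, assuming the electron-ion interaction potential (EIIP) scale as commonly tabulated in the ISM literature; the values should be verified against the original ISM publications before use. The sequence is converted to a numeric signal, and its discrete Fourier amplitude spectrum is computed. A substitution is then summarized by how much it shifts the normalized spectrum. The `spectral_shift` score is a hypothetical illustration, not the tool's trained classification model.

```python
import numpy as np

# EIIP value per amino acid, as commonly tabulated for ISM (assumed; verify).
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}

def informational_spectrum(seq):
    """Peak-normalized DFT amplitude spectrum of the EIIP-encoded sequence.

    The signal is mean-centered and the zero-frequency term is dropped,
    leaving len(seq) // 2 frequency components.
    """
    signal = np.array([EIIP[aa] for aa in seq], dtype=float)
    signal -= signal.mean()
    amplitude = np.abs(np.fft.rfft(signal))[1:]
    peak = amplitude.max()
    return amplitude / peak if peak > 0 else amplitude

def substitute(seq, position, new_aa):
    """Apply a single amino acid substitution at a 1-based position."""
    return seq[:position - 1] + new_aa + seq[position:]

def spectral_shift(seq, position, new_aa):
    """Hypothetical score: L1 distance between wild-type and mutant spectra."""
    wild = informational_spectrum(seq)
    mutant = informational_spectrum(substitute(seq, position, new_aa))
    return float(np.abs(wild - mutant).sum())

if __name__ == "__main__":
    wild_type = "MKTAYIAKQRQISFVKSHFSRQ"   # hypothetical toy sequence
    print(spectral_shift(wild_type, 6, "D"))
```

The tool takes the same inputs as this sketch (a position and a substituted residue). In a real classifier, a score like this would be one feature among several, with thresholds learned from labeled variants.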

Protein-protein interaction prediction method based on principal component analysis of amino acid physicochemical properties

Neven Sumonja, Nevena Veljkovic, Sanja Glisic, and Vladimir Perovic

Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia

[email protected]

Abstract

Protein-protein interactions (PPIs) are of utmost importance for cellular processes and key to understanding protein function. Here, we propose a method for solving the PPI binary classification problem based on sequence information only. Numerous general sequence-based methods have proven efficient in identifying novel PPIs, despite using only a few amino acid (AA) physicochemical descriptors for sequence feature representation. To this end, we propose a group of entirely novel AA descriptors, defined from 531 amino acid properties in the AAindex database by means of principal component analysis (PCA). As a first step in sequence analysis, each sequence was transformed into a vector of numbers using the new AA feature representation. Then, the autocovariance function (ACF) was applied to the obtained vectors. Finally, amino acid composition (AAC) was combined with the ACF vector. Random forest models were trained and tested on independent sets of yeast and human PPIs extracted from the Protein Interaction Network Analysis platform (PINA). In terms of computational efficiency and predictive performance, our approach outperformed similar state-of-the-art sequence-based methods. Being robust and fast, the model presented here can deepen our insight into interactions in different model organisms.
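The feature pipeline described above (PCA-derived descriptors, then ACF, then AAC) can be sketched as follows. Several parts are assumptions. The property matrix is a random 20 × 531 placeholder standing in for the AAindex table. The number of retained components (3) and the maximum lag (5) are hypothetical. The ACF uses the autocovariance formulation common in sequence-based PPI prediction, and a pair is encoded by concatenating the two protein vectors. The authors' exact descriptors and settings may differ.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pca_descriptors(properties, n_components):
    """Project a (20 x P) amino acid property matrix onto its top principal components.

    Each property column is standardized first; rows follow AMINO_ACIDS order.
    """
    z = (properties - properties.mean(axis=0)) / properties.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)  # rows of vt: principal axes
    return z @ vt[:n_components].T                      # (20 x n_components) scores

def encode(seq, descriptors):
    """Map a sequence to an (L x d) matrix of per-residue descriptor values."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return descriptors[[index[aa] for aa in seq]]

def autocovariance(x, max_lag):
    """ACF: for each lag and descriptor, mean product of mean-centered values."""
    n = x.shape[0]
    centered = x - x.mean(axis=0)
    feats = [(centered[:n - lag] * centered[lag:]).sum(axis=0) / (n - lag)
             for lag in range(1, max_lag + 1)]
    return np.concatenate(feats)                         # length max_lag * d

def composition(seq):
    """Amino acid composition as 20 relative frequencies."""
    return np.array([seq.count(aa) / len(seq) for aa in AMINO_ACIDS])

def protein_features(seq, descriptors, max_lag=5):
    """ACF vector followed by AAC for one protein."""
    return np.concatenate([autocovariance(encode(seq, descriptors), max_lag),
                           composition(seq)])

def pair_features(seq_a, seq_b, descriptors, max_lag=5):
    """Concatenate both proteins' vectors (one common way to encode a pair)."""
    return np.concatenate([protein_features(seq_a, descriptors, max_lag),
                           protein_features(seq_b, descriptors, max_lag)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_aaindex = rng.normal(size=(20, 531))           # placeholder for AAindex
    desc = pca_descriptors(fake_aaindex, n_components=3)
    v = pair_features("MKTAYIAKQRQISFVKSHFSRQ", "GSHMLEDPVRLLRQL", desc)
    print(v.shape)                                      # (70,)
```

Vectors like these would then be fed to a random forest classifier, such as scikit-learn's `RandomForestClassifier`. Each protein's sequence must be longer than `max_lag` for every lag to be defined.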

Basic Sequence Alignment Based Screening for Alternative Mannanase Producing Bacteria

Bojan D. Petrovic and Zorica D. Knezevic-Jugovic

Faculty of Technology and Metallurgy, University of Belgrade, Karnegijeva 4, 11000 Belgrade, Serbia

[email protected]

Abstract. The idea behind this work was to explore the genetic potential within publicly available data and to collect the output for future research on novel mannanase-producing organisms. Mannanolytic enzymes can be applied in multiple industrial settings, but the present interest was narrowed down exclusively to applications relevant for improved detergent formulations. Based on the patent data to date, protein sequences of two Bacillus sp. enzymes were probed for similarity against the non-redundant protein database available via the National Center for Biotechnology Information (NCBI). The sequences were then realigned and analyzed to compare the sequences of particular interest and to assess conserved regions and variations between species. Our results suggest that the bacterial strains of interest should be tested for mannanase activity and, if applicable, optimized for commercial enzyme production, as this would not conflict with the currently relevant patents.

Keywords: industrial enzymes, mannanase, detergents, sequence alignment

1. Introduction

Mannanase (mannan endo-1,4-beta-mannosidase, EC 3.2.1.78) is an enzyme that catalyses the random hydrolysis of (1→4)-beta-D-mannosidic linkages in mannans, galactomannans and glucomannans. The broad substrate specificity of β-mannanases enables a plethora of applications: as a hydrolytic agent in the detergent industry, in biobleaching of pulp and paper, in improving animal feeds and in slime control, as well as in some emerging pharmaceutical applications [1]. The use of commercial enzyme preparations in detergents is a growing market, but when it comes to mannanase, there is only one commercially available product: Mannaway®, produced by Novozymes (patent rights held by Novo Nordisk A/S, Denmark). This work uses the data from the patent datasheets and the fast-growing collection of sequences annotated in NCBI databases to propose alternative organisms for mannanase production that are not yet covered by patent protection.

2. Materials and Methods

Protein sequence data were extracted from the relevant patent [2], and independent Basic Local Alignment Search Tool (BLAST) searches were performed using those sequences as queries. Search results that met the criteria given below (Section 2.1) were examined further, and the corresponding protein coding sequences (CDS) were retrieved. Unless specified otherwise, all data manipulation was done as in our previous work, which used a similar methodology for multiple sequence alignment and comparison [3]. From the above-mentioned patent documentation, two protein sequences were chosen based on the producing organism. We chose to focus on Bacillus species (excluding well-known and commercially exploited producers such as B. subtilis, B. circulans, B. agaradhaerens, B. halodurans, B. licheniformis and B. cereus), because bacteria from the Bacilli group are already characterized as suitable producing organisms for the mannanase enzyme, providing high activity yields. The relevant patents claim mannanase sequences and/or segments of mannanase sequences derived from the given strain (Bacillus sp. I633), as well as other wild-type and recombinant sequences with a homology of 60% or more to the sequences claimed in those patents. Table 1 shows detailed data about the query sequences. The first query has a glycosyl hydrolase family 5 (GH5) cellulase domain, while the second has a GH5 cellulase domain and a cellulose binding module (CBM).

Table 1. Mannanolytic enzymes from Bacillus sp. extracted from the patent datasheets.

Nr | Organism | Length [AA] | GenBank Accession | Reference
1 | Bacillus sp. I633 | 490 | AAQ31834.1 | Seq. #2 in [2]
2 | Bacillus sp. I633 | 476 | AAQ31835.1 | Seq. #4 in [2]

2.1. BLAST

A standard protein-protein BLAST search [4] was performed using the queries given above, with the following parameters. The database was set to "non-redundant protein sequences (nr)", the organism was set to exclude the Bacillus/Staphylococcus group (taxid:1385), and the "blastp" algorithm was chosen with these threshold values: max. target sequences: 100, expect threshold: 10, word size: 6, max. matches in a query range: 0. From each search result (available in the supplementary material), subjects were picked according to the following criteria: "query cover" ≥ 70% and "ident" within the [45%, 60%) range. If multiple loci from the same source organism encoding the same protein product met these criteria, the sequence with the highest total score was chosen. The final list of sequences designated for downstream analyses is given in Table 2.
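The subject-selection rules above can be expressed as a small filter over BLAST hit rows. This is a sketch: the `Hit` fields are illustrative names, not an NCBI export schema. Grouping "the same source organism" and "the same protein product" by an (organism, product) key is an assumption about how duplicates were identified. The first three example rows come from Table 2. The rows marked hypothetical only demonstrate the exclusions.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One BLAST hit row; field names are illustrative, not an NCBI schema."""
    organism: str
    product: str
    accession: str
    total_score: float
    query_cover: float   # percent
    ident: float         # percent

def select_hits(hits, min_cover=70.0, ident_range=(45.0, 60.0)):
    """Keep hits with cover >= min_cover and low <= ident < high, then keep the
    highest total score per (organism, product); sort by total score, descending."""
    low, high = ident_range
    best = {}
    for hit in hits:
        if hit.query_cover < min_cover or not (low <= hit.ident < high):
            continue
        key = (hit.organism, hit.product)
        if key not in best or hit.total_score > best[key].total_score:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h.total_score, reverse=True)

if __name__ == "__main__":
    hits = [
        Hit("Clostridium butyricum", "beta-mannanase", "KHD15024.1", 444, 79, 56),
        Hit("Pseudobacteroides cellulosolvens", "endoglucanase", "WP_036945506.1", 400, 70, 55),
        Hit("Deinococcus misasensis", "hypothetical protein", "WP_051963127.1", 388, 93, 45),
        # Hypothetical rows illustrating the exclusions:
        Hit("Example sp. A", "mannanase", "HYPOTHETICAL_1", 500, 90, 62),   # ident >= 60%
        Hit("Example sp. B", "mannanase", "HYPOTHETICAL_2", 450, 65, 50),   # cover < 70%
        Hit("Clostridium butyricum", "beta-mannanase", "HYPOTHETICAL_3", 300, 75, 50),  # weaker duplicate
    ]
    for h in select_hits(hits):
        print(h.accession, h.total_score)
```

The upper bound of 60% is exclusive because the patents claim sequences with 60% or more homology. This sketch uses BLAST identity as a stand-in for that homology figure.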

Table 2. Sequences filtered for further analyses, with BLAST data. The first three entries were obtained using query 1, while the remaining three were obtained using query 2.

Description | Max score | Total score | Query cover | E value | Ident | NCBI Accession
mannan endo-1,4-beta-mannosidase (beta-mannanase) (1,4-beta-D-mannan mannanohydrolase) [Clostridium butyricum] | 444 | 444 | 79% | 7e-148 | 56% | KHD15024.1
endoglucanase [Pseudobacteroides cellulosolvens] | 400 | 400 | 70% | 1e-131 | 55% | WP_036945506.1
hypothetical protein [Deinococcus misasensis] | 388 | 388 | 93% | 7e-126 | 45% | WP_051963127.1
1,4-beta-glucanase [Thermoanaerobacter cellulolyticus] | 342 | 481 | 94% | 2e-101 | 45% | WP_045165560.1
endo-1,4-beta-glucanase [Caldicellulosiruptor kronotskyensis] | 340 | 476 | 94% | 7e-101 | 45% | WP_013429869.1
endo-1,4-beta-glucanase [Caldicellulosiruptor bescii] | 338 | 473 | 94% | 6e-100 | 45% | WP_015908242.1

2.2. Sequence Alignment and Comparative Analyses

All sequences were aligned using the Clustal W program, implemented as an accessory application in BioEdit (version 7.2.5). Alignment files are available in the supplementary material. The aligned sequences underwent the following comparative analyses: a) amino acid composition and b) position entropy, both computed within the BioEdit platform; c) conserved regions analysis, also done within BioEdit, with these parameters: minimum segment length (actual for each sequence): 10, maximum average entropy: 0.5, gaps limited to 1 per segment, and contiguous gaps limited to 1 in any segment; d) a phylogenetic tree of the species, based on the average number of amino acid substitutions per site between species, constructed using the unweighted pair group method with arithmetic mean (UPGMA) implemented in MEGA 3.1. The difference count matrix used to generate the tree is available in the supplementary material.
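Position entropy and the conserved-region parameters listed above can be approximated as follows. This is a sketch, not BioEdit's algorithm, which is not described here. It assumes Shannon entropy with the natural logarithm, ignores gaps when computing column frequencies, and enforces the gap limit as at most one gap per sequence within a segment. With that limit, no segment can contain two contiguous gaps in the same sequence. Segments are grown greedily from left to right to the longest length that satisfies the limits.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of residue frequencies in a column, gaps ignored."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log(k / n) for k in Counter(residues).values())

def position_entropy(alignment):
    """Entropy of each column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]

def conserved_segments(alignment, min_len=10, max_avg_entropy=0.5, max_gaps=1):
    """Non-overlapping (start, end) segments, 0-based and end-exclusive, with
    average entropy <= max_avg_entropy, length >= min_len and at most
    max_gaps gap characters per sequence."""
    entropy = position_entropy(alignment)
    n = len(entropy)
    segments, start = [], 0
    while start < n:
        best_end, total = None, 0.0
        gaps = [0] * len(alignment)
        for end in range(start, n):
            total += entropy[end]
            for i, seq in enumerate(alignment):
                gaps[i] += seq[end] == "-"
            if max(gaps) > max_gaps:       # gap counts only grow, so stop here
                break
            length = end - start + 1
            if length >= min_len and total / length <= max_avg_entropy:
                best_end = end + 1         # remember the longest valid segment
        if best_end is None:
            start += 1
        else:
            segments.append((start, best_end))
            start = best_end
    return segments

if __name__ == "__main__":
    toy = ["MKVLAAGHTWYE", "MKVLAAGHTWYE", "MKVLSAGHTWYE"]   # hypothetical alignment
    print(conserved_segments(toy))
```

Because the criterion limits the average rather than the per-column entropy, a segment can absorb a few variable columns as long as the mean stays at or below 0.5.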

3. Results and Discussion

The amino acid composition is shown in Table 3. Even though the two groups of three sequences (with and without the CBM unit) differed in length by more than a factor of two, the interspecies variation fits within naturally occurring differences, some of which may be important for enzyme stability and efficacy. As moderately suggested by the positional entropy distribution (not shown; Figure S3 in the supplementary material), most of the putative conserved sites were clustered on the N-terminal side of the sequence alignment, up to the 600th residue.

Conserved regions were found only within the GH5 domain. It is indicative that these six sites (Figure 1) are all clustered within the GH5 cellulase domain, and they may be a key basis for catalytic activity. Some data suggest that a Glu residue within the conserved region might be involved in the catalytic mechanism [5]. Our data, shown in Figure 1, also pinpoint strongly conserved residues in each of the six conserved regions:
- first: Asn, Val and Trp;
- second: an EVHD motif and a strongly conserved Thr;
- third: Asn, Ile, Glu, Gly and Trp;
- fourth: Asp, Trp and Gly;
- fifth: Asp, Pro, Asn, Phe and an HMY motif;
- sixth: an IGEF motif, as well as strongly conserved Leu and His residues.
All of these spots are possibly heavily implicated in the catalytic mechanism, substrate specificity and native structure stability of the GH5 cellulase domain. Unlike the GH5 domain, the CBM units do not have any conserved spots, even though these modules generally share high homology among species.

Table 3. Amino acid frequency analysis. All frequencies are given in molar percent. EMW - estimated molecular weight of the protein (in kDa).

Amino acid | C. butyricum (gi723448912) | P. cellulosolvens (gi739074211) | D. misasensis (gi917356415) | T. cellulolyticus (gi771511740) | C. kronotskyensis (gi503195208) | C. bescii (gi506388523)
Ala | 7.02 | 8.31 | 11.07 | 7.00 | 7.48 | 7.42
Cys | 1.06 | 1.36 | 0.60 | 0.55 | 0.55 | 0.54
Asp | 7.66 | 6.36 | 3.42 | 5.51 | 5.46 | 5.49
Glu | 4.04 | 2.93 | 3.02 | 4.01 | 3.98 | 3.86
Phe | 2.34 | 2.93 | 3.82 | 2.68 | 2.42 | 2.47
Gly | 10.21 | 10.02 | 8.85 | 7.40 | 7.56 | 7.65
His | 1.28 | 1.47 | 1.81 | 1.34 | 1.33 | 1.31
Ile | 7.66 | 8.07 | 4.83 | 6.45 | 6.55 | 6.41
Lys | 8.09 | 8.07 | 4.23 | 5.59 | 5.69 | 5.64
Leu | 5.32 | 6.11 | 5.23 | 5.74 | 5.38 | 5.41
Met | 2.34 | 2.93 | 2.21 | 1.49 | 1.71 | 1.62
Asn | 8.94 | 7.82 | 9.26 | 7.32 | 7.25 | 7.11
Pro | 1.28 | 2.69 | 3.22 | 5.98 | 6.24 | 6.41
Gln | 2.98 | 2.44 | 3.02 | 2.91 | 2.65 | 2.70
Arg | 0.85 | 0.98 | 2.62 | 2.91 | 2.81 | 2.86
Ser | 7.45 | 7.82 | 10.26 | 8.42 | 8.81 | 8.89
Thr | 6.81 | 6.85 | 9.86 | 8.89 | 9.12 | 9.20
Val | 5.74 | 7.09 | 7.04 | 7.47 | 6.63 | 6.72
Trp | 2.77 | 3.18 | 3.02 | 3.15 | 3.12 | 3.09
Tyr | 6.17 | 1.96 | 3.02 | 5.19 | 5.30 | 5.18
EMW | 51.8 | 43.9 | 53.0 | 140.4 | 141.2 | 142.2
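Both quantities in Table 3 are simple functions of the sequence. Molar percent is a residue count divided by sequence length, so each column sums to about 100. The estimated molecular weight is the sum of residue masses plus one water molecule. Here is a minimal sketch, assuming standard average residue masses (amino acid minus water, in Da). BioEdit's own mass table may differ slightly, so the last decimal of EMW need not match.

```python
from collections import Counter

# Standard average residue masses in Da (assumed values; BioEdit may differ).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.015  # Da, added once for the free N- and C-termini

def molar_percent(seq):
    """Frequency of each standard amino acid as a percentage of all residues."""
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in sorted(RESIDUE_MASS)}

def estimated_mw_kda(seq):
    """Estimated molecular weight in kDa: residue masses plus one water."""
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0

if __name__ == "__main__":
    toy = "MKTAYIAKQRQISFVKSHFSRQ"   # hypothetical sequence, not one from Table 3
    print(round(estimated_mw_kda(toy), 1), molar_percent(toy)["K"])
```

Multiplying the EMW of about 140 kDa by roughly 9 residues per kDa (about 110 Da per residue) gives on the order of 1,300 residues for the three CBM-containing sequences, versus about 400-500 for the other three. This is consistent with the "more than a factor of two" length difference noted above.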

Fig. 1. Conserved regions within the GH5 domain of mannanolytic enzymes from selected taxa.

This observed diversity provides a phylogenetic overview of the selected taxa based on the mannanolytic CDS (Figure 2). However, the mannanolytic CDS does not make a good evolutionary marker, since this phylogeny does not fully correspond to the NCBI Taxonomy data. This result may be explained by the strong impact of environmental factors on the evolution of cellulolytic enzymes in general [6].
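UPGMA, used here through MEGA 3.1, repeatedly merges the two closest clusters and places each merge at half their distance. It then recomputes distances to the new cluster as size-weighted averages. Here is a generic textbook sketch over a symmetric distance matrix, such as a difference-count matrix divided by alignment length. MEGA's tie-breaking and gap handling may differ, and the toy matrix is hypothetical.

```python
def upgma(names, dist):
    """UPGMA clustering of a symmetric distance matrix; returns a Newick string."""
    # Each cluster id maps to (newick label, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        label_i, size_i, h_i = clusters.pop(i)
        label_j, size_j, h_j = clusters.pop(j)
        height = dij / 2.0
        label = f"({label_i}:{height - h_i:.4g},{label_j}:{height - h_j:.4g})"
        # Size-weighted average distance from the merged cluster to the rest.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        del d[(i, j)]
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

if __name__ == "__main__":
    toy = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]   # hypothetical distances
    print(upgma(["A", "B", "C"], toy))
```

UPGMA assumes a constant substitution rate across lineages and always yields an ultrametric tree. Rate differences between lineages can therefore be another reason for a mismatch between a UPGMA tree and the accepted taxonomy.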

Fig. 2. UPGMA phylogenetic tree. This phylogeny does not fully represent the current generally accepted classification of the selected taxa.

4. Conclusion

Using simple and free bioinformatics tools, this work has identified a total of six candidate microorganisms for mannanase production that would not compromise the patent protection currently active for Bacillus sp. mannanase sequences. All of these species have putative or proven mannanolytic protein sequences publicly annotated, and we found no reports in the literature that these bacterial strains could be pathogenic. This makes them solid candidates for mannanase production as an alternative to the well-known producers.

5. Supplementary Material

Raw sequence files, the original full BLAST results scraped from the NCBI server, proprietary alignment files, the supplementary figure and the relevant patent booklet can be found online at http://db.tt/XNSDuXmy.

References

1. Chauhan, P. S. et al: Mannanases: microbial sources, production, properties and potential biotechnological applications. Appl Microbiol Biotechnol, 93: 1817–1830. (2012)

2. Kauppinen, M. S. et al: Novel mannanases. WO1999064619A2. (1999). [Online]. Available: http://www.google.co.ug/patents/WO1999064619A2?cl=en

3. Prekovic, S. et al: Bioinformatical and mathematical comparative analysis of ClpP exons and protein sequence. In: Zakrzewska, J. and Zivic, M. and Andjus, P. (eds.): Regional Biophysics Conference 2012, Book of Abstracts, Serbian Biophysical Society, Belgrade, 110. (2012)

4. Altschul, S. F. et al: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25: 3389–3402. (1997)

5. Py, B. et al: Cellulase EGZ of Erwinia chrysanthemi: structural organization and importance of His98 and Glu133 residues for catalysis. Protein Eng, 4(3): 325–333. (1991)

6. Aspeborg, H. et al: Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology, 12: 186. (2012)

Theoretical study on the role of aromatic amino acids in stability of amyloids

Dragan B. Ninkovic1,2, Dusan P. Malenov4, Predrag V. Petrovic1,2, Edward N. Brothers2, Shuqiang Niu3, Michael B. Hall3, Milivoj Belic2 and Snezana D. Zaric2,4

1 Innovation Center, Department of Chemistry, University of Belgrade, Studentski trg 12–16, Belgrade, Serbia

2 Science Program, Texas A&M University at Qatar, Texas A&M Engineering Building, Education City, Doha, Qatar

3 Department of Chemistry, Texas A&M University, College Station, TX 77843-3255, USA

4 Department of Chemistry, University of Belgrade, Studentski trg 12–16, Belgrade, Serbia

[email protected]

Abstract

Various neurodegenerative disorders, such as Alzheimer's and Parkinson's diseases, have been associated with amyloid fibril plaques. A widespread belief is that aromatic amino acid residues are crucial for plaque formation, since they frequently occur in natural amyloids. However, it has been shown that amyloids can also be formed from aliphatic peptides, and the issue is still under study. In the last few years, numerous studies have used various experimental and computational methods to investigate the role of aromatic amino acid residues in amyloid plaque formation.

We studied the influence of aromatic amino acids on amyloid formation, using DFT methods to calculate the interaction energies of peptide model systems with and without aromatic residues. We also analyzed the contributions of aliphatic-aliphatic, aromatic-aliphatic and aromatic-aromatic interactions to the total interaction energy and to the stability of the structure. The studied peptides were based on crystal structures of amyloids available in the Cambridge Structural Database.

In model systems with aromatic amino acids, the calculations showed that the aliphatic-aliphatic contribution is the weakest, followed by aromatic-aromatic, while aromatic-aliphatic interactions make the strongest contribution to the total interaction energies. In model systems without aromatic amino acids, containing only peptides made of aliphatic amino acids, the interactions are as strong as in systems with aromatic amino acids.

The results of the calculations indicate similar stability of amyloids with and without aromatic amino acids, which supports the findings that aromatic amino acids are not essential for amyloid formation.
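The abstract does not spell out how the interaction energies were obtained. A common choice is the supermolecular approach, ΔE = E(complex) - ΣE(fragments), often combined with a counterpoise correction for basis set superposition error. Here is a minimal sketch, assuming that approach. It takes total energies in hartree, converts the result to kcal/mol, and sums pairwise contributions by interaction type, as in the aliphatic/aromatic decomposition above. All numbers in the example are hypothetical.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

def interaction_energy(e_complex, *e_fragments):
    """Supermolecular interaction energy in kcal/mol from total energies in hartree."""
    return (e_complex - sum(e_fragments)) * HARTREE_TO_KCAL

def decompose(pair_energies):
    """Sum pairwise interaction energies (kcal/mol) by interaction type."""
    totals = {}
    for kind, energy in pair_energies:
        totals[kind] = totals.get(kind, 0.0) + energy
    return totals

if __name__ == "__main__":
    # Hypothetical total energies (hartree) of a dimer and its two monomers.
    print(round(interaction_energy(-2.010, -1.000, -1.000), 3))
    # Hypothetical pairwise contributions (kcal/mol) grouped by type.
    print(decompose([("aliphatic-aliphatic", -1.2),
                     ("aromatic-aliphatic", -3.4),
                     ("aliphatic-aliphatic", -0.8)]))
```

Negative values indicate attraction. Comparing type totals such as these, between model systems with and without aromatic residues, is the kind of comparison the conclusions rest on.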

Keywords: amyloids, DFT, noncovalent interactions

Construction of Amyloid PDB Files Database

Ivana Stankovic and Snezana Zaric

1 ICTM, University of Belgrade, Njegoseva 12, Belgrade, Serbia
ivana [email protected]

2 Department of Chemistry, University of Belgrade, Studentski trg 12-16, Belgrade, Serbia

[email protected]

Abstract. Amyloids are insoluble proteins with a cross-β structure, found as deposits in many diseases. They have been extensively examined structurally, but there is no dedicated structural database of amyloid proteins resolved at atomic resolution. Here, we present an amyloid database constructed using a keyword criterion as well as the structural features of amyloids described in the literature. The search filter was implemented in Python. The database contains 109 structures in total. It can support further general structural and statistical analyses of amyloids, since knowledge of their molecular basis can lead to an understanding of the disease mechanisms related to amyloid proteins.

Keywords: database, protein structure, amyloid

1. Introduction

Amyloids are insoluble proteins with a cross-β structure, found as deposits in many diseases such as Alzheimer's, Parkinson's, Creutzfeldt-Jakob disease and type II diabetes. Because of their strong fibrillar nature, they are also found in normal tissues and natural materials (nails, spider webs, silk). Among functional nanostructured materials with a significant impact on nanotechnology and biological environments, amyloid fibrils have attracted great attention because of their unique architectures and exceptional physical properties.

Short polypeptides of at least 4 amino acids [1] self-assemble into β-sheets via backbone hydrogen bonds. Several β-sheets then interact with each other in a parallel fashion via the polypeptide side chains, forming long, linear, unbranched protofilaments whose axis is nearly perpendicular to the polypeptide strands. Several protofilaments, their number being specific to the particular amyloid protein, form fibrils. All amyloid proteins, independently of their sequence, form a very similar structure, the cross-β structure, made of parallel arrays of β-strands. These structures differ only in the inter-sheet spacing, which depends on side-chain size, and in fibril morphology [2].

Amyloids have been examined structurally on an individual basis [3–5], but no systematic structural analysis of all resolved structures has been reported in the literature so far. There is no dedicated structural database of amyloid proteins resolved at atomic resolution. The Protein Data Bank (PDB) contains nearly 120 000 3D structures of proteins, nucleic acids and complex assemblies [6]. The PDB contains amyloid structures, but they are hard to find with a simple single-criterion search, because PDB files are not uniform with respect to the amyloid keyword. The molecules in .pdb files are often labeled by another name, referring to an amyloid precursor or a disease, while the word amyloid may be mentioned only in descriptive fields such as the publication title, publication keywords or the title section. Another difficulty in constructing an amyloid database is that amyloid proteins exist in different conformations depending on conditions. In solution, they might adopt a non-amyloid conformation with helical or random-coil secondary structure and no parallel fragments forming fibrils [7]. Here, we present an amyloid database constructed using a keyword criterion as well as structural criteria.

2. Methodology

Amyloid protein 3D structures were searched for in the Protein Data Bank (PDB) and in the Cambridge Structural Database (CSD). The search criterion for the CSD was any 4-residue-long acyclic polypeptide with a nearly β-sheet structure. Eight structures were found, but the published papers gave no proof of self-assembly. The amyloid PDB subdatabase was made by searching the PDB for the keyword amyloid and for precursor names, and only β or extended secondary structures were retained. In total, 109 structures were found in the PDB, resolved by X-ray crystallography or by solid-state or solution NMR.

2.1. Online Search

The online search on the website http://www.rcsb.org/pdb/home/home.do gave us a list of PDB IDs of potential amyloid structures according to the name keyword. The search picked every structure in which the desired keyword appears. The keywords were simply amyloid and 38 amyloid precursor names. The precursor names were published recently in the editorial of Amyloid, The Journal of Protein Folding Disorders (Tables 1, 2 and 3 in [8]); these are all known naturally occurring amyloids. By searching for files that contain the keyword amyloid, we also include all synthetic amyloids, described by keywords such as amyloid-like, amyloid-related or amyloidogenic. We obtained 1218 structures in total. It is difficult to select all the amyloid structures without also picking non-amyloid ones, because not every structure mentioning the amyloid keyword is in fact an amyloid. The filtering steps in the following sections therefore deal with the structural features of amyloids.
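The keyword step can be pictured as a case-insensitive word-prefix match over the searchable text of each entry. A prefix match on "amyloid" also catches amyloid-like, amyloid-related and amyloidogenic. Here is a sketch over hypothetical records. The IDs, texts and the two-item keyword list are placeholders; in the actual search, the list would contain "amyloid" plus the 38 precursor names. The last record shows why a keyword hit alone is not enough.

```python
import re

def matches_keywords(record_text, keywords):
    """True if any keyword occurs at the start of a word (case-insensitive)."""
    text = record_text.lower()
    return any(re.search(r"\b" + re.escape(k.lower()), text) for k in keywords)

def keyword_hits(records, keywords):
    """Sorted IDs whose searchable text (titles, keywords, molecule names) matches."""
    return sorted(pdb_id for pdb_id, text in records.items()
                  if matches_keywords(text, keywords))

if __name__ == "__main__":
    records = {   # hypothetical entries
        "ID01": "Amyloidogenic fragment forming steric zipper",
        "ID02": "Crystal structure of a DNA polymerase",
        "ID03": "Transthyretin mutant bound to a ligand",
        "ID04": "Fab fragment of an anti-amyloid antibody",
    }
    print(keyword_hits(records, ["amyloid", "transthyretin"]))
```

Here ID04 matches even though an antibody fragment is not an amyloid. The structural filters that follow are meant to remove hits like this one.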

2.2. Excluding Helical Structures

It appears that amyloids are not exclusively β structures: there are also coil and extended peptides that pack in a parallel manner, forming long fibrils perpendicular to the peptide axis. This is why, in the first filtering step, we excluded only the helical structures and kept the β-sheets and coils. The filtering was done in the Tcl scripting language [9], using the get structure command available in the VMD software [10]. This yielded a total of 241 structures. This is still not the final database, as it contains non-parallel, globular protein arrangements.
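In VMD, `get structure` on an atom selection returns one-letter secondary-structure codes: H, G and I for α, 3-10 and π helices; E and B for extended strand and bridge; T for turn; and C for coil. A Python sketch of the helix-exclusion step might look like the code below. It assumes the codes have already been exported as one string per entry. The text does not say how much helix disqualifies a structure, so `max_helical_fraction` is a hypothetical parameter (0.0 means any helical residue excludes the entry), and the IDs are placeholders.

```python
HELIX_CODES = set("HGI")  # alpha, 3-10 and pi helix codes

def helical_fraction(ss_codes):
    """Fraction of residues with a helical secondary-structure code."""
    if not ss_codes:
        return 0.0
    return sum(c in HELIX_CODES for c in ss_codes) / len(ss_codes)

def non_helical(structures, max_helical_fraction=0.0):
    """IDs of entries whose helical fraction does not exceed the cutoff."""
    return sorted(pdb_id for pdb_id, ss in structures.items()
                  if helical_fraction(ss) <= max_helical_fraction)

if __name__ == "__main__":
    toy = {"ID01": "CEEEEEC", "ID02": "CHHHHHHC", "ID03": "CCTTCC"}  # hypothetical
    print(non_helical(toy))
```

Coil-only entries such as ID03 survive this step on purpose, matching the decision to keep coil as well as β-sheet structures.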

2.3. Excluding Non-parallel Structures

We defined an amyloid structure as one in which no fragment has more than 1 non-parallel peptide fragment. This allows for highly ordered structures with alternating parallel and tilted fragments, as in PDB ID 4UBZ [11], so amyloids can contain non-parallel fragments. On the other hand, non-amyloidogenic structures may also contain parallel fragments, since they may include β-sheets made of parallel β-strands; these, however, are mostly globular proteins. We distinguished them from amyloids as structures in which some fragment has more than 1 non-parallel fragment (Fig. 1).

Fig. 1. Criterion for distinguishing amyloid structures from other non-helical structures: an amyloid possesses at most 1 non-parallel fragment for each fragment in the whole structure.
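The Fig. 1 criterion reduces to counting non-parallel partners per fragment. The text does not say whether every pair of fragments is compared or only neighboring ones. This sketch therefore assumes the caller supplies the compared pairs, each with a parallel/non-parallel flag, for example from a test like the Cα-distance criterion described next. The fragment indices are hypothetical.

```python
from collections import Counter

def max_nonparallel_partners(pairs):
    """pairs: iterable of (i, j, is_parallel) for the fragment pairs compared."""
    counts = Counter()
    for i, j, is_parallel in pairs:
        if not is_parallel:
            counts[i] += 1
            counts[j] += 1
    return max(counts.values(), default=0)

def passes_criterion(pairs, limit=1):
    """Amyloid-like if no fragment has more than `limit` non-parallel partners."""
    return max_nonparallel_partners(pairs) <= limit

if __name__ == "__main__":
    stack = [(0, 1, True), (1, 2, True), (2, 3, True)]                      # parallel stack
    globular = [(0, 1, False), (0, 2, False), (0, 3, False), (1, 2, True)]  # fragment 0 crosses three others
    print(passes_criterion(stack), passes_criterion(globular))
```

An arrangement in which each fragment has a single tilted neighbor still passes. This is the situation the text describes for structures such as 4UBZ.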

Flat fragments were defined according to the Ramachandran backbone torsion angles found in the structures of 8 amyloid-β fragments published in [12]. These structures contain β-sheets as well as curved coil fragments, with overall torsion angle ranges of (-156°, -103°) for the ϕ angle and (104°, 154°) for the ψ angle. We expanded these ranges to include the fully extended peptide conformation, (ϕ, ψ) = (-180°, 180°), so the final ranges were ϕ = (-180°, -103°) and ψ = (104°, 180°). Furthermore, a fragment must be at least 4 amino acids long. The criterion for the parallelity of fragments was also taken from the 8 structures in [12]: in these structures, the distances between Cα atoms belonging to two parallel fragments differ by at most 1.5 Å (Fig. 2).
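The two geometric tests can be sketched directly from the ranges above. Flat fragments are runs of at least 4 consecutive residues with (ϕ, ψ) inside the flat region. Two fragments are parallel if the Cα-Cα distances between them vary by no more than 1.5 Å. Several details are assumptions: the range bounds are treated as inclusive; the i-th Cα of one fragment is paired with the i-th Cα of the other (in-register pairing); and angle wrap-around at ±180° is not handled. For example, a ψ of -179° is not treated as flat.

```python
import numpy as np

PHI_RANGE = (-180.0, -103.0)
PSI_RANGE = (104.0, 180.0)

def is_flat(phi, psi):
    """True if (phi, psi) in degrees lies in the flat-fragment region."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])

def flat_fragments(phis, psis, min_len=4):
    """Runs of consecutive flat residues as (start, end) pairs, end exclusive."""
    fragments, start = [], None
    for k, (phi, psi) in enumerate(zip(phis, psis)):
        if is_flat(phi, psi):
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_len:
                fragments.append((start, k))
            start = None
    if start is not None and len(phis) - start >= min_len:
        fragments.append((start, len(phis)))
    return fragments

def are_parallel(ca_a, ca_b, tolerance=1.5):
    """Pair the i-th CA of each fragment; parallel if the paired CA-CA
    distances differ by at most `tolerance` angstroms (max minus min)."""
    n = min(len(ca_a), len(ca_b))
    d = np.linalg.norm(np.asarray(ca_a[:n], float) - np.asarray(ca_b[:n], float), axis=1)
    return float(d.max() - d.min()) <= tolerance

if __name__ == "__main__":
    strand = [[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0]]          # hypothetical CA trace
    stacked = [[x, y + 4.8, z] for x, y, z in strand]       # same strand, shifted 4.8 A
    print(are_parallel(strand, stacked))
```

In practice, backbone torsions and Cα coordinates would come from the parsed PDB files. The parsing step itself is not shown here.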

For the purpose of this final structural filtration, the .pdb files were down-loaded from http://files.rcsb.org/pub/pdb/data/structures/divided/pdb, and the.pdb1 files containing information in biological assembly were downloaded fromhttp://files.rcsb.org/pub/pdb/data/biounit/coordinates/divided. This is impor-tant because both translating a crystallographic unit cell in all the three direc-tions, and completing the biological assembly structure must be done in order

BelBI2016, Belgrade, June 2016. 141


Ivana Stankovic et al.

Fig. 2. Criterion for parallel fragments: the distance between two Cα atoms belonging to two parallel fragments may vary by at most 1.5 Å, as found in the amyloid-β structures resolved in [12].

to complete the amyloid structure and find all the parallel fragments. Homemade scripts for downloading and structural filtration were written in the Python programming language [13], and the MDAnalysis Python library [14] was used for PDB file parsing.
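The parallelity criterion of Fig. 2 can be read in more than one way. The following sketch uses this reading: the i-th Cα of one fragment is paired with the i-th Cα of the other, and two fragments count as parallel when the spread of these paired distances is at most 1.5 Å. The pairing scheme and the function name `are_parallel` are assumptions. Plain coordinate tuples stand in for the MDAnalysis atom selections the authors used.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]
MAX_DISTANCE_SPREAD = 1.5  # Å, value reported for the amyloid-β structures in [12]


def are_parallel(ca_a: Sequence[Coord], ca_b: Sequence[Coord],
                 tolerance: float = MAX_DISTANCE_SPREAD) -> bool:
    """Assumed reading of the criterion: pair Cα atoms by position along each
    fragment and require the paired distances to differ by at most `tolerance`."""
    n = min(len(ca_a), len(ca_b))
    if n < 2:
        raise ValueError("need at least two paired Cα atoms")
    distances = [math.dist(ca_a[i], ca_b[i]) for i in range(n)]
    return max(distances) - min(distances) <= tolerance


# Toy coordinates (hypothetical): a straight strand, a copy stacked 4.8 Å away,
# and a copy that drifts 1.2 Å further away at each residue.
strand_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
strand_b = [(x, 4.8, z) for x, _, z in strand_a]
tilted = [(x, 4.8 + 1.2 * i, z) for i, (x, _, z) in enumerate(strand_a)]
print(are_parallel(strand_a, strand_b))  # True: all distances are 4.8 Å
print(are_parallel(strand_a, tilted))    # False: distances spread by about 3.6 Å
```

`math.dist` requires Python 3.8 or later.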

3. Results and Discussion

The resulting database consists of 109 structures. The database was confirmed by visual inspection of the 241 non-helical structures found by the TCL scripting search. According to the geometric parameters we considered (flat fragments, whether β-sheets or coils, and the number of non-parallel fragments for each fragment), there are 5 classes of amyloid PDB structures:

* U-shaped structures with β-sheets connected by unstructured coils;
* β-sheets packed in a flat fashion;
* β-sheets packed in a tilted fashion;
* coil structures packed in a flat fashion;
* coil structures packed in a tilted fashion.

All of these arrangements are found in the review on amyloid states [15], according to the facial and directional alignment of the interacting β-sheets.

4. Conclusion

An atomic-resolution structural data bank of amyloids was built by searching the Protein Data Bank. The criteria were based on both the amyloid name keyword and the structural features of amyloids described in the literature. The data bank contained 109 structures as of 25 March 2016. This number will grow as new amyloid structures are resolved by crystallography and NMR spectroscopy. The database can support further general structural and statistical analyses of amyloids, since knowing their molecular basis can lead to an understanding of the disease mechanisms related to amyloid proteins.


Construction of Amyloid PDB ...

References

1. Lakshmanan, A. and Cheong, D. W. and Accardo, A. and Di Fabrizio, E. and Riekel, C. and Hauser, C. A.: Aliphatic peptides show similar self-assembly to amyloid core sequences, challenging the importance of aromatic interactions in amyloidosis. Proc. Natl. Acad. Sci. U.S.A., 110, 519–524. (2013)

2. Harrison, R. S. and Sharpe, P. C. and Singh, Y. and Fairlie, D. P.: Amyloid peptides and proteins in review. Physiol. Biochem. Pharmacol., 159, 1–77. (2007)

3. Nielsen, J. T. and Bjerring, M. and Jeppesen, M. D. and Pedersen, R. O. and Pedersen, J. M. and Hein, K. L. and Vosegaard, T. and Skrydstrup, T. and Otzen, D. E. and Nielsen, N. C.: Unique Identification of Supramolecular Structures in Amyloid Fibrils by Solid-State NMR Spectroscopy. Angew. Chem. Int. Ed., 48, 2118–2121. (2009)

4. Davis, C. H. and Berkowitz, M. L.: Interaction Between Amyloid-β (1–42) Peptide and Phospholipid Bilayers: A Molecular Dynamics Study. Biophysical Journal, 96, 785–797. (2009)

5. Das, P. and Kang, S-g. and Temple, S. and Belfort, G.: Interaction of Amyloid Inhibitor Proteins with Amyloid Beta Peptides: Insight from Molecular Dynamics Simulations. PLoS ONE, 9(11), e113041. (2014)

6. Berman, H. M. and Henrick, K. and Nakamura, H.: Announcing the worldwide Protein Data Bank. Nature Structural Biology, 10(12), 980. (2003)

7. Calamai, M. and Chiti, F. and Dobson, C. M.: Amyloid Fibril Formation Can Proceed from Different Conformations of a Partially Unfolded Protein. Biophysical Journal, 89, 4201–4210. (2005)

8. Nomenclature 2014: Amyloid fibril proteins and clinical classification of the amyloidosis. Amyloid, 21(4), 221–224, Editorial. (2014)

9. http://www.tcl.tk/

10. Humphrey, W. and Dalke, A. and Schulten, K.: VMD - Visual Molecular Dynamics. J. Molec. Graphics, 14, 33–38. (1996)

11. Yu, L. and Lee, S.-J. and Yee, V. C.: Crystal Structures of Polymorphic Prion Protein β1 Peptides Reveal Variable Steric Zipper Conformations. Biochemistry, 54, 3640–3648. (2015)

12. Colletier, J.-P. and Laganowsky, A. and Landau, M. and Zhao, M. and Soriaga, A. B. and Goldschmidt, L. and Flot, D. and Cascio, D. and Sawaya, M. R. and Eisenberg, D.: Molecular basis for amyloid-β polymorphism. PNAS, 108, 16938–16943. (2011)

13. http://www.python.org/

14. Michaud-Agrawal, N. and Denning, E. J. and Woolf, T. B. and Beckstein, O.: MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem., 32, 2319–2327. (2011)

15. Eisenberg, D. and Jucker, M.: The Amyloid State of Proteins in Human Diseases. Cell, 148. (2012)


Search for small RNAs associated with CRISPR/Cas

Tamara Stankovic1, Jelena Guzina1, Magdalena Djordjevic2, and Marko Djordjevic1

1 Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia
{tamaras,jelenag,dmarko}@bio.bg.ac.rs

2 Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade,
[email protected]

Abstract

CRISPR/Cas is an advanced heritable defense system against viruses and plasmids that was recently found in bacteria and archaea [1]. These systems consist of clusters of regularly interspaced palindromic repeats (the CRISPR array) and CRISPR-associated (Cas) proteins. In this research we focus on Type II CRISPR/Cas systems and the small RNAs associated with them. These small RNAs have a crucial role in CRISPR/Cas functioning, for example in processing CRISPR transcripts. Detecting them is, however, hard, as they are poorly conserved even in closely related bacterial strains. Moreover, they are typically expressed under non-standard and ill-characterized conditions, which obscures their identification from dRNA-seq experiments (still limited in bacteria). Here we use state-of-the-art transcription start site detection methods that we developed, together with an optimized implementation of transcription terminator detection, to detect CRISPR/Cas-associated small RNAs in Type II systems [2].

References

1. Horvath, P. and Barrangou, R.: Science 327: 167 (2010)

2. Guzina, J. and Stankovic, T. and Zdobnov, E. and Djordjevic, M.: Detection of CRISPR/Cas associated small RNAs, in preparation (2016).


A novel approach for dealing with spatial/temporal edges within molecular interaction networks.

Ruth A Stoney1,2, Ryan Ames3, Goran Nenadic2, David L Robertson∗1, and Jean-Marc Schwartz1

∗Shared last/corresponding authors

1 Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK

2 School of Computer Science, University of Manchester, Manchester, M13 9PT, UK

3 Wellcome Trust Centre for Biomedical Modelling and Analysis, University of Exeter, RILD Level 3, Exeter, EX2 5DW

Abstract

Functional networks are biological models often used to explore the function of molecules such as proteins or transcription factors. Their aim is to use topological clustering to link molecules with shared cellular functions (e.g. metabolising sugar, building proteins), with applications such as identifying potential drug targets and understanding disease pathologies. Within these networks, biological molecules are represented as nodes and molecular interactions as edges. Clusters in the network represent functionally similar molecules and are referred to as functional modules [1, 2].

A great deal of research has focused on computational methods for forming clusters based on topological features [3–5]. However, such networks assume that edges are constant (non-dynamic), and therefore that it is always appropriate to use all of a node's edges rather than a subset. This assumption is often inaccurate in biological systems. The fact that two proteins can interact does not mean that they will interact in every context [1, 6]; some interactions may only take place under specific cellular conditions. Combining sets of edges that may not co-occur in the cell (due to spatial/temporal separation) may result in incorrect clustering. Evidence for this comes from discrepancies in community detection between networks created from different data types [7].

To deal with the issue of spatial/temporal edges, we developed a new method using yeast pathways [8]. Pathways represent small, experimentally validated sets of protein-protein interactions that have been observed under particular cellular conditions. Information passed into a pathway is assumed to affect all nodes and to be shared simultaneously (the whole pathway responds to external stimuli as a single unit). The nodes within a pathway may therefore be considered as a single pathway object.

We have created a novel model in which pathways are used as nodes in the network, representing units of cellular activity. Our model shares the aim of bringing together functionally related molecules and interactions. However, it is impractical to reproduce the molecular models' approach of linking pathway nodes by physical interactions.


Pathways are employed in one or more functions within the cell, and these functions can be identified using gene enrichment analysis. Functions may be assigned to pathways with greater confidence than to functional modules, because the risk of spatial/temporal edges is removed. Cellular function is not divided discretely into pathways; rather, pathways work together and share functions. We used shared functionality between pathways to create the edges in our network. The resulting network contains clusters of functionally related pathways, where each pathway represents a set of interacting proteins. This method achieves the goal of clustering interacting proteins while avoiding the issues faced by previous molecular methods. Since the publication of this paper, work on human systems has begun, with the goal of exploring the link between function and disease.

Keywords: bioinformatics, functional network, cluster analysis
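The idea of using pathways as nodes and shared functionality as edges can be sketched as follows. The abstract does not state how shared functionality is scored. This minimal Python example therefore assumes that each pathway carries a set of enriched function terms, and it weights an edge by the Jaccard similarity of those sets. The scoring scheme, the `min_jaccard` threshold and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pathway_network(functions: Dict[str, Set[str]],
                    min_jaccard: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Build weighted edges between pathways that share enriched functions.

    `functions` maps a pathway name to its set of functional annotations.
    An edge is kept when two pathways share at least one function and their
    Jaccard similarity exceeds `min_jaccard` (a hypothetical threshold)."""
    edges: Dict[Tuple[str, str], float] = {}
    for p, q in combinations(sorted(functions), 2):
        shared = functions[p] & functions[q]
        if not shared:
            continue  # no common function, so no edge
        jaccard = len(shared) / len(functions[p] | functions[q])
        if jaccard > min_jaccard:
            edges[(p, q)] = jaccard
    return edges


# Toy input with placeholder pathway and function names.
toy = {
    "pathway_1": {"f_a", "f_b"},
    "pathway_2": {"f_b", "f_c"},
    "pathway_3": {"f_d"},
}
print(pathway_network(toy))  # {('pathway_1', 'pathway_2'): 0.3333333333333333}
```

The resulting edge dictionary can then be passed to any community-detection routine to obtain clusters of functionally related pathways.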

References

1. Chen, J. and Yuan, B.: Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics [Internet]. 2006 [cited 2015 Mar 22];22:2283–90. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16837529

2. Vidal, M. and Cusick, M. E. and Barabási, A.-L.: Interactome networks and human disease. Cell [Internet]. 2011 [cited 2013 Nov 7];144:986–98. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3102045&tool=pmcentrez&rendertype=abstract

3. Blondel, V. and Guillaume, J.: Fast unfolding of communities in large networks. J. Stat. ... [Internet]. 2008 [cited 2014 Jul 13];1–12. Available from: http://iopscience.iop.org/1742-5468/2008/10/P10008

4. Song, J. and Singh, M.: How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics [Internet]. 2009 [cited 2014 Jun 26];25:3143–50. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3167697&tool=pmcentrez&rendertype=abstract

5. Wang, J. and Li, M. and Deng, Y. and Pan, Y.: Recent advances in clustering methods for protein interaction networks. BMC Genomics [Internet]. 2010 [cited 2014 Oct 26];11 Suppl 3:S10. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2999340&tool=pmcentrez&rendertype=abstract

6. Hyduke, D. R. and Palsson, B. Ø.: Towards genome-scale signalling network reconstructions. Nat. Rev. Genet. [Internet]. Nature Publishing Group; 2010;11:297–307. Available from: http://dx.doi.org/10.1038/nrg2750

7. Ames, R. M. and Macpherson, J. I. and Pinney, J. W. and Lovell, S. C. and Robertson, D. L.: Modular biological function is most effectively captured by combining molecular interaction data types. PLoS One [Internet]. 2013 [cited 2015 Jan 9];8:e62670. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3643936&tool=pmcentrez&rendertype=abstract

8. Stoney, R. and Ames, R. and Nenadic, G. and Robertson, D. and Schwartz, J.: Disentangling the multigenic and pleiotropic nature of molecular function. BMC Syst. Biol. 2015;9.


Gene expression in schizophrenia patients and non-schizophrenic individuals infected with Toxoplasma gondii

Aleksandra Uzelac1, Tijana Stajner1, Milos Busarcevic1, Ana Munjiza2, Milutin Kostic2, Cedo Miljevic2, Dusica Lecic-Tosevski2, Nenad Mitic3, Sasa Malkov3, and Olgica Djurkovic-Djakovic1

1 Center of Excellence for Food- and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Dr. Subotica 4, 11129 Belgrade, Serbia
[email protected]

2 Institute of Mental Health, School of Medicine, University of Belgrade, Palmoticeva 37, 11000 Belgrade, Serbia

3 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia
{nenad,smalkov}@matf.bg.ac.rs

Abstract

There is an increasing body of data suggesting an association between infection with the protozoan parasite Toxoplasma gondii and schizophrenia. In this study, we employed a combination of data mining and bioinformatics to investigate whether any genes from loci harboring schizophrenia-associated SNPs, as determined by the GWAS study of Ripke et al. [1], are associated with the immune response to Toxoplasma gondii infection. After extracting a list of genes from these loci, we examined the expression of their murine homologs in response to acute infection with T. gondii in brain homogenates and lymphocytes of experimentally infected animals, by mining microarray data published by Jia et al. [2]. We were able to cross-reference 208 unique protein-coding genes in schizophrenia-associated loci with both sets of microarray data. Of these, 108 differed in expression by at least 30% with respect to controls. Functional annotation clustering was performed with the algorithm included in the DAVID bioinformatics resources 6.7 database. It confirmed that the statistically most significant annotation cluster was indeed enriched in genes coding for proteins with immune functions. Based on these results, we selected the following genes for validation by real-time PCR: HLA-DQA1, TAP1, TAP2, PSMB8, EGFL8, LY6G6C, C4A and CFB, all of which are located in the MHC region on chromosome 6. Their expression is being assayed in the peripheral blood of schizophrenia patients infected and not infected with T. gondii, and of the corresponding non-schizophrenic controls. Preliminary results suggest that the expression of HLA-DQA1 and TAP2 in response to T. gondii infection is indeed altered in schizophrenia patients. We are also investigating whether the infection itself, or an altered immune response to it, can be correlated with the patients' Positive and Negative Syndrome Scale (PANSS) scores, and thereby with the clinical presentation of schizophrenia.
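The 30% expression filter can be expressed as a relative change against the control value. In this sketch, the cross-referencing step keeps only genes present in both microarray datasets. A gene is then retained if it changes by at least 30% in at least one of them. The abstract does not say whether one or both datasets had to pass, so that choice is an assumption, as are the (infected, control) data layout and the function names. The gene names and values in the usage are placeholders, not data from the study.

```python
from typing import Iterable, Mapping, Set, Tuple


def relative_change(infected: float, control: float) -> float:
    """Fractional change of expression relative to the control value."""
    if control <= 0:
        raise ValueError("control expression must be positive")
    return abs(infected - control) / control


def changed_genes(datasets: Iterable[Mapping[str, Tuple[float, float]]],
                  threshold: float = 0.30) -> Set[str]:
    """Each dataset maps a gene symbol to (infected, control) expression."""
    data = list(datasets)
    # Cross-reference: keep only genes measured in every dataset.
    common = set.intersection(*(set(d) for d in data))
    # Assumption: a change >= threshold in at least one dataset is enough.
    return {gene for gene in common
            if any(relative_change(*d[gene]) >= threshold for d in data)}


brain = {"gene_a": (2.0, 1.0), "gene_b": (1.1, 1.0),
         "gene_c": (1.1, 1.0), "gene_x": (9.0, 1.0)}
lymph = {"gene_a": (1.5, 1.0), "gene_b": (1.4, 1.0), "gene_c": (0.9, 1.0)}
print(sorted(changed_genes([brain, lymph])))  # ['gene_a', 'gene_b']
```

Replacing `any` with `all` gives the stricter reading, in which the change must be seen in both tissues.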

Keywords: bioinformatics, data mining, gene expression, Toxoplasma gondii


References

1. Ripke, S. et al.: Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45(10):1150–9. (2013)

2. Jia, B. and Lu, H. and Liu, Q. and Yin, J. and Jiang, N. and Chen, Q.: Genome-wide comparative analysis revealed significant transcriptome changes in mice after Toxoplasma gondii infection. Parasit. Vectors. 4,6:161 (2013)


Propensities of amino acids toward certain secondary protein structure types: comparison of different statistical methods

Dusan Z. Veljkovic1, Sasa Malkov2, Vesna B. Medakovic1, and Snezana D. Zaric1

1 Department of Chemistry, University of Belgrade, Studentski trg 16, Belgrade, [email protected]

2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, Belgrade, Serbia

Abstract

The conformational preferences of amino acids are of great importance for understanding conformational interactions in proteins. When used as propensities, these preferences can help in predicting the secondary and tertiary structures of proteins. Several statistical studies have been performed to calculate amino acid propensities [1, 2]. In our previous work, we studied amino acid propensities using a statistical method [3, 4]. In that study, the preferences of amino acids for particular secondary structures divided them into four groups:

* α-helix preferrers;
* strand preferrers;
* turn and bend preferrers;
* His and Cys, which do not show a clear preference for any secondary structure.

Amino acids in the same group have similar structural characteristics at their Cβ and Cγ atoms, and these characteristics predict their preference for a particular secondary structure.
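A classic ratio-type propensity, in the spirit of [1], compares how often an amino acid occurs in a given secondary structure with how often it occurs overall:

P(a, s) = (n_as / n_s) / (n_a / N)

Here n_as is the number of residues of amino acid a in structure s, n_s is the number of residues in s, n_a is the number of residues of a, and N is the total number of residues. This is not the authors' own statistical method [3, 4]. The following sketch only illustrates the general idea. Its function name and its input format, one (amino acid, secondary structure) pair per residue, are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def propensities(residues: Sequence[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Ratio-type propensity P(a, s) = (n_as / n_s) / (n_a / N).

    P > 1 means amino acid `a` occurs in structure `s` more often than
    expected from its overall frequency; P < 1 means less often."""
    total = len(residues)
    pair_counts = Counter(residues)               # n_as
    aa_counts = Counter(a for a, _ in residues)   # n_a
    ss_counts = Counter(s for _, s in residues)   # n_s
    return {
        (a, s): (n_as / ss_counts[s]) / (aa_counts[a] / total)
        for (a, s), n_as in pair_counts.items()
    }


# Toy data: A = Ala, V = Val; H = helix, E = strand.
toy = [("A", "H"), ("A", "H"), ("A", "E"), ("V", "E"), ("V", "E"), ("V", "H")]
p = propensities(toy)
print(round(p[("A", "H")], 3), round(p[("A", "E")], 3))  # 1.333 0.667
```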

In this work, other statistical methods for calculating amino acid propensities were compared with the statistical method used in our previous work. The comparison was based on correlation coefficients (ρ(s,p)). The results show that although the methods are similar, there are some significant differences. With our method, these differences result in a more explicit connection between our classification and the chemical structure of the amino acids. Applying our statistical approach allows stricter conclusions, without misjudging the amino acids' preferences.
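The abstract does not define ρ(s,p). One possible reading is the Pearson correlation between two binary indicators taken over all residues: "the residue is amino acid p" and "the residue is in secondary structure s". Under that reading, ρ reduces to the phi coefficient of a 2×2 contingency table:

ρ = (n11·n00 − n10·n01) / sqrt((n11+n10)(n01+n00)(n11+n01)(n10+n00))

The counts are defined as follows:

* n11: residues that are p and in s;
* n10: residues that are p but not in s;
* n01: residues that are in s but not p;
* n00: all remaining residues.

The following sketch computes that quantity. Both the interpretation and the function name are assumptions.

```python
import math
from typing import Sequence, Tuple


def phi_coefficient(residues: Sequence[Tuple[str, str]], aa: str, ss: str) -> float:
    """Pearson correlation between two 0/1 indicators over residues:
    x = 1 if the residue is amino acid `aa`, y = 1 if it is in structure `ss`."""
    n11 = n10 = n01 = n00 = 0
    for a, s in residues:
        x, y = a == aa, s == ss
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("an indicator is constant; correlation is undefined")
    return (n11 * n00 - n10 * n01) / denom


toy = [("A", "H"), ("A", "H"), ("A", "E"), ("V", "E"), ("V", "E"), ("V", "H")]
print(round(phi_coefficient(toy, "A", "H"), 3))  # 0.333
```

Positive values indicate that the amino acid is over-represented in the structure, and negative values indicate that it is under-represented.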

Keywords: amino acids, preferences, correlations, classification

References

1. Chou P. Y., Fasman G. D.: Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry, 211–222. (1974)

2. Levitt M.: Conformational preferences of amino acids in globular proteins. Biochemistry, 4277–4285. (1978)

3. Malkov S. N., Živković M. V., Beljanski M. V., Hall M. B., Zarić S. D.: A reexamination of the propensities of amino acids towards a particular secondary structure: classification of amino acids based on their chemical structure. J. Mol. Model., 769–775. (2008)

4. Malkov S. N., Živković M. V., Beljanski M. V., Stojanović S. Đ., Zarić S. D.: A reexamination of correlations of amino acids with particular secondary structures. The Protein Journal, 74–86. (2009)


Botryosphaeriaceae on Aesculus hippocastanum in Serbia

Milica Zlatkovic1, Nenad Keca1, Michael Wingfield2, Fahimeh Jami2, and Bernard Slippers2

1 University of Belgrade-Faculty of Forestry, Kneza Viseslava 1, Belgrade, [email protected]

2 Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Cnr Lynwood and University roads, Pretoria, South Africa

Abstract

Horse chestnut (Aesculus hippocastanum L.) is a large, long-lived, deciduous tree endemic to the southern part of the Balkan Peninsula, in South Eastern Europe [1]. The seeds of this tree are widely used in the medicinal and pharmaceutical industries. Because of its large hand-shaped leaves and attractive white flowers, A. hippocastanum is a highly valued street and shade tree, commonly planted across Europe. In recent years, A. hippocastanum trees in Serbia have exhibited shoot die-back, shoot cankers and necrotic lesions in the lower parts of the stems. Samples were collected from symptomatic tissues in Belgrade, Serbia, from 2009 to 2015. The consistently isolated fungal colonies were grey and Botryosphaeriaceae-like [2], and the aim of this study was to identify them. The isolates were identified as Botryosphaeria dothidea, Neofusicoccum parvum, Diplodia mutila and Dothiorella sarmentorum. The identification was based on the morphology of the asexual morph and on phylogenetic analysis of DNA sequence data for the internal transcribed spacer (ITS), translation elongation factor 1α (TEF 1-α), β-tubulin-2 (BT2) and large subunit (LSU) gene regions. A. hippocastanum is in danger of extinction due to population decline caused by the invasive leaf miner moth Cameraria ohridella, and the species has been listed on the IUCN Red List of threatened plants [3]. This study adds to the knowledge of the identity of Botryosphaeriaceae as potential pathogens of this important and threatened tree.

Keywords: Botryosphaeriaceae, multigene phylogeny, identification, Aesculus hippocastanum

References

1. Jovanovic, B.: Dendrologija [Dendrology], 4th revised edition. University of Belgrade, Belgrade. (1985)

2. Zlatkovic, M. and Keca, N. and Wingfield, M. J. and Jami, F. and Slippers, B.: Botryosphaeriaceae associated with the die-back of ornamental trees in Serbia. Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology, 109: 543–564. (2016)

3. Khela, S.: Aesculus hippocastanum. The IUCN Red List of Threatened Species. (2013). Accessed on 28 April 2016.


Botryosphaeriaceae on Sequoia sempervirens in Serbia

Milica Zlatkovic1, Nenad Keca1, Michael Wingfield2, Fahimeh Jami2, and Bernard Slippers2

1 University of Belgrade-Faculty of Forestry, Kneza Viseslava 1, Belgrade, [email protected]

2 Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Cnr Lynwood and University roads, Pretoria, South Africa

Abstract

Coastal redwood (Sequoia sempervirens) is a large, long-lived evergreen tree native to western North America. It is the only species in the genus Sequoia. It is an important timber tree, valued for its beauty and for its light-weight timber, which is resistant to decay and fire damage. S. sempervirens is in danger of extinction due to population decline, and the species has been listed on the IUCN Red List of threatened plants [1]. In Serbia, the only known S. sempervirens tree is planted in the botanical garden "Jevremovac" in Belgrade. In autumn 2011, the tree exhibited branch flagging associated with branch and shoot cankers, with the leaves remaining attached. Tissue samples associated with these symptoms were plated on Malt Extract Agar (MEA). One week later, fast-growing, grey fungal colonies resembling those of Botryosphaeriaceae spp. [2] were obtained, and the aim of this study was to identify them. The isolates represented Diplodia mutila, Neofusicoccum parvum and Botryosphaeria dothidea. This was shown by the morphology of the asexual morph and by phylogenetic inference based on DNA sequence data for the internal transcribed spacer (ITS), translation elongation factor 1α (TEF 1-α), β-tubulin-2 (BT2) and large subunit (LSU) gene regions. In its natural range, S. sempervirens grows in coastal areas with a moist climate. The newly emerging die-back of this tree in Serbia might be connected with recent periods of drought. These could have created stressful conditions of the kind typically associated with opportunistic infections by Botryosphaeriaceae.

Keywords: Botryosphaeriaceae, multigene phylogeny, identification, Sequoia sempervirens

References

1. Farjon, A. and Schmid, R.: Sequoia sempervirens. The IUCN Red List of Threatened Species. (2013) Accessed on 26 April 2016.

2. Zlatkovic, M. and Keca, N. and Wingfield, M. J. and Jami, F. and Slippers, B.: Botryosphaeriaceae associated with the die-back of ornamental trees in Serbia. Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology, 109: 543–564. (2016)


Author Index

Ames, Ryan, 145
Anashkina, Anastasia, 39
Andjelkovic, Miroslav, 28
Antonijevic, N., 112
Ari, Eszter, 92
Artamonova, Irena, 120
Avetisov, Vladik, 1
Avramov, Milos, 92

Babenko, Vladimir, 22, 43
Banjevic, Milena, 44
Banovic, Bojana, 81
Baumbach, Jan, 2
Belic, Milivoj, 138
Beljanski, Milos, 62, 81
Blagojevic, Bojana, 46
Bozinovski, D., 119
Bogdanovic, Milica, 96
Bongcam-Rudloff, Erik, 3
Brdar, Sanja, 47
Brothers, Edward, 138
Brusic, Vladimir, 7
Bugay, Aleksandr, 48
Bundschuh, Ralf, 9
Busarcevic, Milos, 49, 147

Carboncini, Maria Chiara, 87
Celani, Antonio, 10
Chadaeva, Irina, 43
Chawla, Nitesh, 11
Chen, Ming, 22
Ciliberto, Andrea, 12
Cimpoiasu, Rodica, 99
Cohen, Evan-Gary, 72
Constantinescu, Radu, 99
Costina, Victor, 128
Craveur, Pierrick, 4
Cuperlovic-Culf, Miroslava, 52
Czyz, Zbigniew, 70

Cupic, Zeljko, 98

Cukovic, Katarina, 96

de Brevern, Alexandre G., 4
Delibasic, Boris, 107
Dimitrova, Tamara, 53

Divac, A., 112
Djordjevic, Magdalena, 46, 144
Djordjevic, Marko, 46, 58, 94, 127, 144
Djurdjevac Conrad, Natasa, 54
Djuric, Tamara, 63
Djurkovic-Djakovic, Olgica, 49, 147
Dobrovolskaya, Oxana, 22
Dovidchenko, Nikita V., 14
Dragelj, J., 119
Dragicevic, Milan, 96
Dragovich, Branko, 13
Dudic, Dragana, 81
Dzhus, Ulyana F., 14

Etchebest, Catherine, 4

Fazekas, David, 92
Feliciello, Giancarlo, 70
Filipovic, Biljana, 96
Filipovic, Dragana, 128
Filipovic, Vladimir, 55
Findeisen, Peter, 128
Friedmann, Naama, 72, 114

Gal Chechik, 72
Galzitskaya, Oxana V., 14
Gelfand, Mikhail, 15
Gemovic, Branislava, 130
Georgiou, Constantinos, 7
Giurato, Giorgio, 96
Glisic, Sanja, 131
Glyakina, Anna V., 14
Graovac, Jelena, 111
Grbic, Milana, 55
Gregson, Cassie, 74
Grigolon, Silvia, 57
Grigorashvili, Elizaveta I., 14
Grolmusz, Vince, 100
Gulsoy, Nagihan, 125
Guzina, Jelena, 58, 144
Guzvic, Miodrag, 70

Hall, Michael, 138
Hernandez, Robert, 74

Ispolatov, Iaroslav, 25
Ivaskovic, Andrej, 102


Jami, Fahimeh, 150, 151
Jandrlic, Davorka, 59
Jelic, Asja, 61
Jelovic, Ana, 62
Jovanovic, Ivan, 63
Jovanovic, Jasmina, 63

Kadlecsik, Tamas, 92
Kanevska, Polina, 68
Kapustin, Mikhail, 120
Kartelj, Aleksandar, 55
Keca, Nenad, 150, 151
Kirsch, Stefan, 70
Kitanovic, Nevena, 92
Klein, Christoph, 70
Knapp, Ernst Walter, 119
Knezevic-Jugovic, Zorica, 132
Kojic, S., 112
Kokai, Dunja, 92
Kolar-Anic, Ljiljana, 98
Korcsmaros, Tamas, 92
Kostic, Milutin, 147
Kovacevic, Jovana, 69, 111
Kozyrev, Sergei, 16
Kriventseva, Evgenia, 35
Krivokuca, Nikola, 92
Kusic-Tisma, Jelena, 112

Lahrmann, Urs, 70
Lakretz, Yair, 72, 114
Lau, Stella, 102
Lecic-Tosevski, Dusica, 147
Ljujic, M., 112
Lopatina, Anna, 120

Macesic, Stevan, 98
Macek, Milan, 112
Magrini, Massimo, 87
Malenov, Dusan, 138
Malkov, Sasa, 147, 149
Malod-Dognin, Noel, 17
Marchenkov, Victor V., 14
Markovic, Vladimir, 98
Masmoudi, Hanen, 73
Matic, Dragan, 55
Medakovic, Vesna, 119, 149
Medvedeva, Sofia, 120
Meyer, T., 119
Misic, Natasa, 79
Mihaljevic, Ljubica, 130
Miljevic, Cedo, 147
Milosevic, Nikola, 74

Milovanovic, Ivan, 49
Mitic, Nenad, 59, 62, 147
Mohamed, Salwa, 22
Morina, Filis, 121, 123
Morozov, Alexandre, 18
Munjiza, Ana, 147
Mutlu, Ozal, 80, 125

Narwani, Tarun, 4
Nekrasov, Alexei, 39
Nenadic, Goran, 19, 74, 145
Nicolaidis, Argyris, 20
Nikolic, Milos, 127
Ninkovic, Dragan, 138
Niu, Shuqiang, 138

Obradovic, Zoran, 21
Orlov, Yuriy, 22, 43

Pajic, Vesna, 81
Paradisi, Paolo, 87
Pavlovic Lazetic, Gordana, 69, 111
Pavlovic, Mirjana, 59
Peric, Ivana, 128
Perovic, Vladimir, 130, 131
Petrovic, Bojan, 132
Petrovic, Predrag, 138
Polzer, Bernhard, 70
Popovic, Zeljko, 92
Przulj, Natasa, 23
Ptakova, Nikola, 112
Punta, Marco, 24

Radivojac, Predrag, 69
Radojkovic, D., 112
Radovanovic, Sandro, 107
Rakicevic, Lj., 112
Rebehmed, Joseph, 4
Righi, Marco, 87
Robertson, David, 145
Rodic, Andjela, 94
Roettger, Richard, 95
Ryan, Allison, 44

Salem, Khaled, 22
Salvetti, Ovidio, 87
Santuz, Hubert, 4
Savitskaya, Ekaterina, 120
Schwartz, Jean-Marc, 145
Sedlarevic, Ana, 121, 123
Selivanova, Olga M., 14
Semenova, Ekaterina, 25


Severinov, Konstantin, 25, 120
Shinada, Nicolas, 4
Shmakov, Sergey, 120
Sigurjonsson, Styrmir, 44
Simonovic, Ana, 96, 121, 123
Slippers, Bernard, 150, 151
Sorba, Paul, 26
Stankovic, Aleksandra, 63
Stankovic, Ivana, 119, 139
Stankovic, Tamara, 127, 144
Stanojevic, Ana, 98
Stanojevic, Milos, 102
Stojmirovic, Aleksandar, 27
Stoney, Ruth, 145
Streche, Alina, 99
Subotic, Angelina, 96
Suknovic, Milija, 107
Sumonja, Neven, 131
Surin, Alexey K., 14
Suvorina, Mariya Yu., 14
Szalkai, Balazs, 100

Stajner, Tijana, 147

Tadic, Bosiljka, 28
Tesic, M., 112
Tikhonov, Alexey, 120
Todorovic, Sladjana, 96
Tompa, Peter, 29
Tosatto, Silvio, 30

Trbovich, Aleksandar, 49
Treves, Alessandro, 72, 114

Uversky, Vladimir, 31
Uzelac, Aleksandra, 147
Uzelac, Iva, 92
Uzelac, Aleksandra, 49

Varga, Balint, 100, 101
Velickovic, Petar, 102
Veljkovic, Dusan, 119, 149
Veljkovic, Nevena, 32, 130, 131
Veljovic-Jovanovic, Sonja, 121, 123
Vidovic, Marija, 121, 123
Virgillito, Alessandra, 87
Volkov, Sergey, 33, 68
Vukicevic, Milan, 107
Vukojevic, Vladana, 98

Waterhouse, Robert, 35
Wingfield, Michael, 150, 151

Xenarios, Ioannis, 36

Zaric, Snezana, 119, 138, 139, 149
Zdobnov, Evgeny, 35
Zhang, Ping, 7
Zlatkovic, Milica, 150, 151

Zivkovic, Maja, 63


List of participants

1. Anastasia Anashkina, Engelhardt Institute of Molecular Biology, Laboratory of computational methods for system biology, Russian Academy of Sciences, Moscow, Russia

2. Vladik Avetisov, The Semenov Institute of Chemical Physics, RAS, Moscow, Russia

3. Milos Avramov, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Serbia

4. Vladimir Babenko, Institute of Cytology and Genetics, Novosibirsk, Russia

5. Milena Banjevic, Natera, Department of Statistical Research, San Carlos, California, United States of America

6. Bojana Banovic, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

7. Jan Baumbach, Head of the Computational Biology Group, Dept. of Mathematics and Computer Science (IMADA), University of Southern Denmark (SDU), Denmark

8. Milos Beljanski, Institute for General and Physical Chemistry, University of Belgrade, Serbia

9. Bojana Blagojevic, Institute of Physics Belgrade, Serbia

10. Erik Bongcam-Rudloff, Division of Molecular Genetics, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Sweden

11. Sanja Brdar, BioSense Institute for research and development of information technology in biosystems, University of Novi Sad, Serbia

12. Alexandre de Brevern, University Paris Diderot, Sorbonne Paris Cite, Paris, France

13. Vladimir Brusic, School of Medicine and Bioinformatics Center, Nazarbayev University, Kazakhstan and Department of Computer Science, Metropolitan College, Boston University, USA

14. Aleksandr Bugay, Joint Institute for Nuclear Research, Laboratory of Radiation Biology, Moscow, Russia

15. Ralf Bundschuh, The Ohio State University, Department of Physics, Chemistry & Biochemistry, Division of Hematology, USA

16. Milos Busarcevic, Center of Excellence for Food and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Serbia; United World College of the Adriatic, Duino, Italy

17. Oliviero Carugo, Faculty of Science, University of Pavia, Italy

18. Antonio Celani, The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy

19. Nitesh Chawla, Frank M. Freimann Professor of Computer Science and Engineering, University of Notre Dame, USA

20. Andrea Ciliberto, IFOM-IEO, Italy

21. Miroslava Cuperlovic-Culf, National Research Council of Canada, Department for Information Communication Technologies, Ottawa, Canada

22. Tamara Dimitrova, Macedonian Academy of Sciences and Arts, Research Center for Computer Science and Information Technologies, Skopje, Macedonia


23. Marko Djordjevic, Faculty of Biology, University of Belgrade, Serbia

24. Branko Dragovich, Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia

25. Dragana Dudic, Faculty of Agriculture, Belgrade, Serbia

26. Natasa Djurdjevac Conrad, Zuse Institute Berlin, Germany

27. Olgica Djurkovic-Djakovic, Institute for Medical Research, University of Belgrade, Serbia

28. Oxana Galzitskaya, Group of bioinformatics, Institute of Protein Research of the RAS, Russia

29. Vladimir Gasic, Institute of Molecular Genetics and Genetic Engineering, Belgrade, Serbia

30. Mikhail Gelfand, A.A. Kharkevich Institute for Information Transmission Problems, RAS, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia

31. Giorgio Giurato, Genomix4Life, Italy

32. Sanja Glisic, Institute of Nuclear Sciences VINCA, Center for Multidisciplinary Research, Belgrade, Serbia

33. Jelena Graovac, Faculty of Mathematics, Department of Computer Science, University of Belgrade, Serbia

34. Milana Grbic, Faculty of Science and Mathematics, University of Banja Luka, Bosnia and Herzegovina

35. Silvia Grigolon, Lincoln's Inn Fields Laboratory, The Francis Crick Institute, London, United Kingdom

36. Jelena Guzina, Faculty of Biology, University of Belgrade, Serbia

37. Maja Gvozdenov, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, University of Belgrade, Serbia

38. Andrej Ivaskovic, Faculty of Computer Science and Technology, Cambridge, United Kingdom

39. Davorka Jandrlic, Faculty of Mechanical Engineering, Department for Mathematics, University of Belgrade, Serbia

40. Asja Jelic, The Abdus Salam International Centre for Theoretical Physics (ICTP), Department for Quantitative Life Sciences, Trieste, Italy

41. Ana Jelovic, Faculty of Transport and Traffic Engineering, Department of General and Applied Mathematics, University of Belgrade, Serbia

42. Tihomir Jovanic, School of Electrical Engineering, Belgrade, Serbia

43. Ivan Jovanovic, Vinca Institute of Nuclear Sciences, University of Belgrade, Department for Radiobiology and Molecular Genetics, Serbia

44. Jasmina Jovanovic, Faculty of Mathematics, Belgrade University, Serbia

45. Polina Kanevska, Bogolyubov Institute for Theoretical Physics, Kyiv, Ukraine

46. Jelena Kostic, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, Belgrade, Serbia

47. Jovana Kovacevic, Faculty of Mathematics, Belgrade University, Department for Computer Science, Serbia

48. Sergei Kozyrev, Steklov Mathematical Institute, Moscow, Russia

49. Jelena Kusic-Tisma, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, Belgrade, Serbia

50. Urs Lahrmann, Fraunhofer Institute for Toxicology and Experimental Medicine, Project Group Personalized Tumor Therapy, Regensburg, Germany


51. Yair Lakretz, Tel-Aviv University, Israel

52. Ilija Lalovic, Faculty of Natural Sciences and Mathematics, Banja Luka, Bosnia and Herzegovina

53. Mladen Lazarevic, Seven Bridges Genomics, Serbia

54. Mirjana Maljkovic, Faculty of Mathematics, University of Belgrade, Serbia

55. Sasa Malkov, Faculty of Mathematics, University of Belgrade, Serbia

56. Noel Malod-Dognin, Imperial College London, Department of Computing, UK

57. Mina Mandic, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Microbial Molecular Genetics and Ecology, Belgrade, Serbia

58. Hanen Masmoudi, Higher Institute of Biotechnology of Sfax, Tunisia

59. Dragan Matic, Faculty of Science and Mathematics, University of Banja Luka, Bosnia and Herzegovina

60. Vesna Medakovic, Faculty of Chemistry, University of Belgrade, Serbia

61. Sofia Medvedeva, Skolkovo Institute of Science and Technology, Skolkovo, Russia

62. Sanja Mijalkovic, Seven Bridges Genomics, Serbia

63. Nikola Milosevic, University of Manchester, School of Computer Science, United Kingdom

64. Natasa Misic, Lola Institute, Belgrade, Serbia

65. Nenad Mitic, Faculty of Mathematics, University of Belgrade, Serbia

66. Ivana Moric, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

67. Filis Morina, Institute for Multidisciplinary Research, Department of Life Sciences, University of Belgrade, Serbia

68. Alexandre Morozov, Rutgers University, USA

69. Ozal Mutlu, Marmara University, Faculty of Arts and Sciences, Department for Biology, Istanbul, Turkey

70. Giovanni Nassa, Genomix4Life, Italy

71. Goran Nenadic, School of Computer Science, University of Manchester, Institute of Biotechnology & Health eResearch Centre, Manchester, UK; Mathematical Institute of SASA, Belgrade, Serbia

72. Argyris Nicolaidis, Aristotle University of Thessaloniki, Greece

73. Milos Nikolic, Faculty of Biology, University of Belgrade, Serbia

74. Zoran Obradovic, Center for Data Analytics and Biomedical Informatics, Temple University, USA

75. Zoran Ognjanovic, Mathematical Institute SASA, Serbia

76. Yuriy L. Orlov, Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Russia

77. Vesna Pajic, University of Belgrade, Faculty of Agriculture, Department for Mathematics and Physics, Center for Data Mining and Bioinformatics, Belgrade, Serbia

78. Paolo Paradisi, Institute of Information Science and Technologies, National Research Council (ISTI CNR), Department for Signals and Images Laboratory (SI-Lab), Pisa, Italy

79. Mirjana Pavlovic, Institute for General and Physical Chemistry, University of Belgrade, Serbia


80. Gordana Pavlovic-Lazetic, Faculty of Mathematics, University of Belgrade, Serbia

81. Ivana Peric, Vinca Institute of Nuclear Sciences, Laboratory of Molecular Biology and Endocrinology, Belgrade, Serbia

82. Vladimir Perovic, Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia

83. Jelena Petkovic, Institute of Molecular Genetics and Genetic Engineering, Belgrade, Serbia

84. Marko Petkovic, Seven Bridges Genomics, Serbia

85. Bojan Petrovic, Faculty of Technology and Metallurgy, Department for Biochemical Engineering and Biotechnology, University of Belgrade, Serbia

86. Zeljko Popovic, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Serbia

87. Natasa Przulj, Department of Computing, Imperial College London, UK

88. Marco Punta, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK

89. Krsto Radanovic, University of Banja Luka, Faculty of Sciences, Department for Biology, Bosnia and Herzegovina

90. Miloje Rakocevic, Mathematical Institute SASA, Serbia

91. Andjela Rodic, University of Belgrade, Faculty of Biology, Serbia

92. Richard Roettger, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

93. Jelena Samardzic, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

94. Milica Selakovic, University of Belgrade, Faculty of Mathematics, Serbia

95. Konstantin Severinov, Rutgers University, Department of Molecular Biology and Biochemistry, Waksman Institute of Microbiology, USA

96. Ana Simonovic, Institute for Biological Research Sinisa Stankovic, Belgrade, Serbia

97. Paul Sorba, Laboratory of Theoretical Physics and CNRS, Annecy, France

98. Ivana Stankovic, Institute of Chemistry, Technology and Metallurgy, University of Belgrade, Serbia

99. Tamara Stankovic, Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Serbia

100. Ana Stanojevic, University of Belgrade, Faculty of Physical Chemistry, Serbia

101. Biljana Stojanovic, Faculty of Mathematics, University of Belgrade, Serbia

102. Aleksandar Stojmirovic, Janssen R & D, LLC, Systems Pharmacology & Biomarkers, Immunology TA, USA

103. Ruth Stoney, University of Manchester, United Kingdom

104. Alina-Maria Streche, University of Craiova, Department of Physics, Romania

105. Neven Sumonja, Vinca Institute of Nuclear Sciences, Centre for Multidisciplinary Research and Engineering, Belgrade, Serbia

106. Balazs Szalkai, Eotvos Lorand University, Budapest, Hungary

107. Bosiljka Tadic, Department of Theoretical Physics, Jozef Stefan Institute, Ljubljana, Slovenia


108. Peter Tompa, VIB Structural Biology Research Center, Flanders Institute for Biotechnology (VIB), Belgium

109. Vladanka Topalovic, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Human Molecular Genetics, Belgrade, Serbia

110. Silvio Tosatto, Department of Biomedical Sciences, University of Padova, Italy

111. Vladimir Uversky, Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, USA

112. Iva Uzelac, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Novi Sad, Serbia

113. Aleksandra Uzelac, Institute for Medical Research, Center of Excellence for Food- and Vector-borne Zoonoses, Department for Parasitology, Belgrade, Serbia

114. Balint Varga, Eotvos Lorand University, Budapest, Hungary

115. Petar Velickovic, University of Cambridge, Faculty of Computer Science and Technology, Cambridge, United Kingdom

116. Aleksandar Veljkovic, Faculty of Mathematics, University of Belgrade, Serbia

117. Nevena Veljkovic, Institute for Nuclear Sciences VINCA, University of Belgrade, Serbia

118. Sergey Volkov, Bogolyubov Institute for Theoretical Physics, Kiev, Ukraine

119. Milan Vukicevic, University of Belgrade, Faculty of Organizational Sciences, Serbia

120. Robert Waterhouse, Department of Genetic Medicine and Development, Medical School, University of Geneva, Swiss Institute of Bioinformatics, Switzerland

121. Ioannis Xenarios, SIB Swiss Institute of Bioinformatics, Switzerland

122. Ping Zhang, Griffith University, Southport, Australia

123. Milica Zlatkovic, University of Belgrade, Faculty of Forestry, Belgrade, Serbia


SPONSORS


Ministry of Education, Science and Technological Development of Republic of Serbia
