spas e-scibioenergy: program and presentation abstracts

Upload: brazilian-bioethanol-science-and-tech-laboratory

Post on 04-Apr-2018


  • 7/31/2019 SPAS e-SciBioenergy: Program and Presentation Abstracts

    1/20


    Program

    Days: October 22 (Monday), October 23 (Tuesday), October 24 (Wednesday),
    October 25 (Thursday), October 26 (Friday)

    Time   Sessions
    8:30   Registration
    9:00   Opening
    9:30   Presentation FAPESP; Talk C. Ambroise; Talk T. Dunning; Talk J. E. Ferreira; Talk Y. Xu
    10:30  Break; Break; Break; Break; Break
    11:00  Talk M. Mattoso; Talk C. Ambroise; Talk T. Dunning; Talk S. Sansone; Talk Y. Xu
    12:00  Talk M. Mattoso; Talk T. Dunning; Talk S. Sansone
    13:00  Lunch; Lunch; Lunch; Lunch; Lunch
    14:30  Talk C. B. Medeiros; Talk M. Mattoso; Talk B. S. Manjunath; Talk C. B. Medeiros
    15:30  Talk C. Ambroise; Talk B. S. Manjunath; Posters Students; Talk Graduate Progs
    16:30  Break; Break; Break; Break
    17:00  Talk C. Ambroise; Talk B. S. Manjunath; Posters Students; -; -


    B. S. Manjunath, Centre for Bio-image Informatics, University of
    California (UCSB), USA

    Introduction to Bio-Image Informatics. Introduction to the topic;
    fundamental issues in image and video segmentation and tracking, with
    examples drawn from recent research. (Lecture time: 2 hours)

    Introduction to Bisque Cyber Infrastructure for Bio-image Informatics. A
    high-level introduction to the open source Bisque image database platform
    for managing, processing, indexing and searching bio-images. (Lecture
    time: 1 hour)


    Christophe Ambroise, Laboratoire Statistique et Génome, Centre
    National de la Recherche Scientifique (CNRS), France

    Statistical Models for Biological Network Inference. Gaussian Graphical
    Models provide a convenient framework for representing dependencies
    between variables. In this framework, a set of variables is represented by
    an undirected graph, where vertices correspond to variables, and an edge
    connects two vertices if the corresponding pair of variables is dependent,
    conditional on the remaining ones. Recently, this tool has attracted
    considerable interest for the discovery of biological networks by
    l1-penalization of the model likelihood. In this lecture, we introduce
    various ways of inferring sparse co-expression networks from either
    steady-state or time-course transcriptomic data. We will focus on inference
    from samples collected in different experimental conditions and therefore
    not identically distributed. (Lecture time: 2 x 2 hours)
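    The graph definition in this abstract can be illustrated with a minimal
    Python sketch. It assumes the precision (inverse covariance) matrix is
    already known, so it is not the l1-penalized estimator discussed in the
    lecture. The three-variable chain, the matrices THETA and SIGMA, and the
    function names are hypothetical choices made for illustration. In a
    Gaussian model, a zero entry in the precision matrix corresponds to
    conditional independence given the remaining variables, so the edges can be
    read off its non-zero off-diagonal entries.

```python
import math

# Hypothetical precision matrix of a chain X0 - X1 - X2 (assumed known here;
# in practice it would have to be estimated from data).
THETA = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]

# Its inverse, the covariance matrix: X0 and X2 are marginally correlated.
SIGMA = [[0.75, 0.5, 0.25],
         [0.5, 1.0, 0.5],
         [0.25, 0.5, 0.75]]


def partial_correlation(theta, i, j):
    """Correlation of variables i and j given all remaining variables."""
    return -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])


def graph_edges(theta, tol=1e-12):
    """Edges (i, j), i < j, whose partial correlation is non-zero."""
    p = len(theta)
    return [(i, j)
            for i in range(p)
            for j in range(i + 1, p)
            if abs(partial_correlation(theta, i, j)) > tol]


if __name__ == "__main__":
    print(graph_edges(THETA))   # [(0, 1), (1, 2)]
    print(SIGMA[0][2])          # 0.25: correlated, yet no direct edge
```

    Here X0 and X2 have covariance 0.25 but no edge, because their dependence
    is fully explained by X1. Sparse estimation methods aim to recover such
    zeros when the precision matrix must be estimated from a limited number of
    samples.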


    Cláudia Bauzer Medeiros, Institute of Computing, University of
    Campinas (UNICAMP), SP, Brazil

    The Era of eScience: building the ark during the data deluge. Scientists
    from all domains (from the mathematical to the social sciences) are
    collecting enormous amounts of data. These data are captured from a variety
    of devices (from those aboard satellites to microsensors in embedded
    systems), but are also provided by experiments, or even social networks.
    This has given rise to the so-called "data deluge", sometimes referred to
    as the "data tsunami", in recognition that a large share of these data will
    never be seen or directly managed by humans. eScience has emerged as a
    branch of science characterized by joint research between computer
    scientists and scientists from other domains to leverage and accelerate
    research in those domains, helping scientists to analyze, filter,
    manipulate, visualize and interpret their data, while at the same time
    supporting cooperative work. This talk discusses a few major trends in
    eScience research, from a data-centric perspective, with examples from
    several scientific domains. (Lecture time: 1 hour)

    Coping with Digital Preservation: preserving the present to help the
    future. Every day we generate an enormous amount of data, for instance
    during bank transactions, phone calls, credit card operations and others.
    Moreover, there are countless kinds of data linked to us: X-ray images,
    security videos in stores and banks, radar-triggered photos in streets,
    and so on. All this information is stored, often for several years, and
    maintained by third parties, given its economic and/or social value. What
    are we doing, however, with other very valuable kinds of data sets, namely
    the data generated by our research? Our work involves complex models and
    computational simulations whose intermediate and final results need to be
    stored. We may archive the most relevant files, but many more data sets are
    lost, sometimes for lack of adequate procedures, or time, or even
    appropriate hardware to record the data. This phenomenon is repeated in any
    context that involves experimental activities, e.g., in biology, chemistry,
    physics, sociology, anthropology, and so on. Even when all data and models
    involved in an experiment are recorded, there are other challenges to meet.
    For instance, how can we ensure that we will be able to retrieve the
    desired information in the future? And how can we share and disseminate the
    results of our work? These and other issues are at the origin of digital
    preservation concerns. Digital preservation research investigates new
    methods, models, algorithms and mechanisms to support data organization,
    archival and retrieval, for long-term accessibility, while at the same time
    considering the issues of quality, reliability and durability. Preservation
    research can also be applied to corporate or business data, but the
    problems involved (and their solutions) are not the same. This talk will
    discuss some of the challenges faced by research on the preservation of
    experimental research data. (Lecture time: 1 hour)


    João Eduardo Ferreira, Computer Science Department, Institute of
    Mathematics and Statistics (IME), University of São Paulo, Brazil

    Transaction Processing for e-Science Applications. The management of
    molecular and clinical data in e-Science applications has introduced new
    requirements for database storage and transaction processing systems. Two
    well-known phrases summarize the e-Science scenario. The first is "Science
    is becoming data-intensive and collaborative", and the second is
    "Researchers from numerous disciplines need to work together to attack
    complex problems; openly sharing data will pave the way for researchers to
    communicate and collaborate more effectively". These phrases were written
    by Ed Seidel, acting assistant director of the NSF Mathematical and
    Physical Sciences directorate. This e-Science scenario shows that we are in
    a data deluge age in which transaction processing systems, viewed from a
    collaborative research perspective, are an important computer science
    challenge. More concretely, in typical e-Science laboratory routines,
    transaction processing is used in many tests that are performed
    concurrently and supervised by researchers. New tests are defined
    frequently, so researchers have to be guided to execute the right task at
    the appropriate time. Incompatibilities between previous processes and new
    data requirements make the integration and analysis of available knowledge
    very difficult. This problem is compounded by the process of scientific
    knowledge discovery, which requires frequent process updates, collaborative
    interactions among researchers, and refinement of scientific hypotheses.
    This scenario requires appropriate transaction processing in order to avoid
    manual data-handling approaches, which quickly become very expensive or
    simply infeasible. In this talk, we provide a historical perspective on
    transaction processing for e-Science applications, together with the main
    recent challenges and solutions. (Lecture time: 1 hour)


    Marta L. Queirós Mattoso (jointly with Jonas Dias and Kary Ocana),
    Alberto Luiz Coimbra Institute for Graduate Studies and Research in
    Engineering (COPPE), Federal University of Rio de Janeiro (UFRJ), Brazil

    Exploring Provenance Data in High Performance Scientific Computing.
    Large-scale scientific computations are often organized as a composition of
    many computational tasks linked through data flow. After the completion of
    a computational scientific experiment, a scientist has to analyze its
    outcome, for instance, by checking inputs and outputs along the
    computational tasks that are part of the experiment. This analysis can be
    automated using provenance management systems that describe, for instance,
    the production and consumption relationships between data artifacts, such
    as files, and the computational tasks that compose the scientific
    application. Due to their exploratory nature, large-scale experiments often
    involve iterations that evaluate a large space of parameter combinations.
    In this case, scientists need to analyze partial results during execution
    and dynamically intervene in the next steps of the simulation. Features
    such as user steering of workflows, to track, evaluate and adapt the
    execution, need to be designed to support iterative methods. In this course
    we define basic concepts of scientific workflows and provenance data. We
    will show examples of scientific workflows in the bioinformatics domain. We
    briefly describe how the provenance of many-task scientific computations is
    specified and coordinated by current workflow systems on large clusters and
    clouds. We discuss challenges in gathering, storing and querying provenance
    in high performance computing environments. We also show how provenance can
    enable useful runtime queries that correlate computational resource usage,
    scientific parameters, and data set derivation. (Lecture time: 2 x 2 hours)
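    The production and consumption relationships mentioned in this abstract can
    be sketched in a few lines of Python. This is only an illustration under
    stated assumptions, not the design of any particular provenance management
    system: the class ProvenanceStore, its methods, and the task and file names
    are all hypothetical.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical in-memory record of which task used and produced what."""

    def __init__(self):
        self.produced_by = {}            # artifact -> task that produced it
        self.used = defaultdict(list)    # task -> artifacts it consumed

    def record(self, task, inputs, outputs):
        """Register one executed task with its input and output artifacts."""
        self.used[task].extend(inputs)
        for artifact in outputs:
            self.produced_by[artifact] = task

    def lineage(self, artifact):
        """Return every upstream task and artifact the given artifact derives from."""
        seen = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            task = self.produced_by.get(current)
            if task is None or task in seen:
                continue  # a raw input, or a task that was already traversed
            seen.add(task)
            for used_artifact in self.used[task]:
                if used_artifact not in seen:
                    seen.add(used_artifact)
                    stack.append(used_artifact)
        return seen


if __name__ == "__main__":
    store = ProvenanceStore()
    # Hypothetical two-step bioinformatics workflow.
    store.record("align", inputs=["reads.fasta"], outputs=["alignment.aln"])
    store.record("build_tree", inputs=["alignment.aln"], outputs=["tree.nwk"])
    print(sorted(store.lineage("tree.nwk")))
    # ['align', 'alignment.aln', 'build_tree', 'reads.fasta']
```

    A derivation query like lineage() is the kind of question a scientist
    might ask when checking which inputs and tasks led to a given result. A
    real system would also need to capture parameters, timing and resource
    usage, and to store the records persistently.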


    Susanna-Assunta Sansone, PhD. Principal Investigator, Team Leader,
    University of Oxford, Oxford e-Research Center, Oxford, UK

    The Buzz Around Reproducible Bioscience Data: the policies, the communities
    and the standards. Increased availability of the bioscience data generated
    is fuelling increased consumption, and a cascade of derived datasets that
    accelerate the cycle of discovery. But the successful integration of
    heterogeneous data from multiple providers and scientific domains is
    already a major challenge within academia and industry. Even when datasets
    are publicly available, published results are often not reusable due to
    incomplete description of the experimental details. In the last decade,
    several data preservation, management and sharing policies and plans have
    emerged in response to increased funding for high-throughput approaches in
    genomics and functional genomics bioscience [1]. A growing number of
    community-based initiatives have developed minimum reporting guidelines,
    terminologies and formats (referred to generally as community standards)
    [2] to structure and curate datasets, enabling data annotation to varying
    degrees; other efforts work to maximize the interoperability among these
    standards [e.g. 3, 4]. Researchers and bioinformaticians in both academic
    and commercial bioscience, along with funding agencies and publishers,
    embrace the concept that standards are pivotal to enriching the annotation
    of the entities of interest (e.g., genes, metabolites) and the experimental
    steps (e.g., provenance of study materials, technology and measurement
    types), to ensure that shared investigations are comprehensible and (in
    principle) reproducible. But despite all these efforts, in practice data
    sharing is challenging [5]. Vast swathes of bioscience data still remain
    locked in esoteric formats, are described using ad hoc or proprietary
    terminology [e.g. 6], or lack sufficient contextual information; many tools
    do not implement standards even where these exist; and the current wealth
    of domain-specific reporting standards in some areas, together with their
    incompleteness or absence in others, poses further major challenges. My
    presentation will provide a snapshot of the current situation. I will
    highlight a number of stories, the social engineering side and also key
    challenges, drawing on my experience over the last decade of working with a
    variety of stakeholders, including bioscience researchers,
    bioinformaticians, developers in public and private sectors, standards
    developing communities, as well as funders and publishers. (Lecture time:
    1 hour)

    References

    1. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K,
    Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE,
    Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J:
    Megascience. 'Omics data sharing. Science 326(5950):234-236 (2009)

    2. List of standards at BioSharing: www.biosharing.org

    3. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ,
    Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P,
    Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The
    OBO Foundry: coordinated evolution of ontologies to support biomedical data
    integration. Nat Biotechnol 25(11):1251-1255 (2007)

    4. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA,
    Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch
    EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy
    NW, Hermjakob H, Julian RK Jr, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper
    M, Le Novère N, et al.: Promoting coherent minimum reporting guidelines for
    biological and biomedical investigations: the MIBBI project. Nat Biotechnol
    26(8):889-896 (2008)

    5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community-
    standards and data sharing policies: turning challenges into new opportunities.
    GigaScience 1:10 (2012)

    6. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M,
    Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T,
    Wilson J, Lynch N, Wise J, Dix I: Empowering industrial research with shared
    biomedical vocabularies. Drug Discov Today 16(21-22):940-947 (2011)

    The Reality From the Buzz: how to deliver reproducible bioscience data. In
    this unsettled status quo, presented in my first talk, how can we enable
    bioscience researchers to make use of existing community standards and
    maximize data sharing and the subsequent reuse of richly annotated
    experimental information?

    A successful example is provided by the Investigation/Study/Assay (ISA) [1]
    open source, metadata-tracking framework developed and supported by the
    growing ISA Commons community [2]. The ISA framework includes both a
    general-purpose file format and a software suite to tackle the
    harmonization of the structure of bioscience experimental metadata (e.g.,
    provenance of study materials, technology and measurement types,
    sample-to-data relationships) by enabling compliance with the community
    standards. This example illustrates how the synergy between research and
    service groups in academia (e.g. at Harvard [3] and at The European
    Bioinformatics Institute [4]) and in industry (e.g. at The Novartis
    Institutes for BioMedical Research and at Janssen Pharmaceuticals, a
    company of Johnson & Johnson), across a variety of life science domains, is
    pivotal to building a network of data collection, curation, and sharing
    solutions that progressively enable the invisible use of standards. I will
    present the rationale behind the collaborative development and the
    evolution of this exemplar ecosystem of data curation and sharing
    solutions, built on the common ISA framework. I will also provide
    high-level examples of how it is used to collect, curate and manage
    heterogeneous experimental metadata in an increasingly diverse set of
    domains including environmental health, environmental genomics,
    metabolomics, (meta)genomics, proteomics, stem cell discovery, systems
    biology, transcriptomics, toxicogenomics, etc. I will also discuss the
    lessons learned by my team, our collaborators and the growing user
    community about the usability of the community standards, and provide an
    update on the next steps to develop user-friendly visualization
    functionalities and use semantic web approaches to make existing knowledge
    available for linking, querying, and reasoning. (Lecture time: 1 hour)

    References

    1. Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D,
    Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone SA: ISA
    software suite: supporting standards-compliant experimental annotation and
    enabling curation at the community level. Bioinformatics 26(18):2354-6
    (2010); isa-tools.org

    2. Sansone SA*, Rocca-Serra P*, Field D, Maguire E, Taylor C, Hofmann O, Fang
    H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L,
    Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de
    Matos P, Dix I, Edmunds S, Evelo CT, Forster MJ, Gaudet P, Gilbert J, Goble C,
    Griffin JL, Jacob D et al.: Toward interoperable bioscience data. Nat Genet
    44(2):121-126 (2012); isacommons.org

    3. Ho Sui SJ, Begley K, Reilly D, Chapman B, McGovern R, Rocca-Serra P, Maguire
    E, Altschuler GM, Hansen TA, Sompallae R, Krivtsov A, Shivdasani RA, Armstrong
    SA, Culhane AC, Correll M, Sansone SA, Hofmann O, Hide W: The Stem Cell
    Discovery Engine: an integrated repository and analysis system for cancer stem
    cell comparisons. Nucleic Acids Res 40(Database issue):D984-91 (2012);
    discovery.hsci.harvard.edu

    4. Haug K, Salek R, Conesa P, Hasting J, de Matos P, Rijnbeek M, Mahendraker T,
    Williams M, Neumann S, Rocca-Serra P, Maguire E, Gonzalez Beltran A, Sansone
    SA, Griffin J, Steinbeck C: MetaboLights: an open-access general-purpose
    repository for Metabolomics studies and associated meta-data. Nucleic Acids Res
    (in review); www.ebi.ac.uk/metabolights


    Thom H. Dunning, Jr., National Center for Supercomputing
    Applications, Institute for Advanced Computing Applications and
    Technologies, and Department of Chemistry, University of Illinois at
    Urbana-Champaign

    Scientific Computing in Science and Engineering. Computational modeling and
    simulation is among the most significant developments in the practice of
    scientific inquiry in the 20th Century. Modeling and simulation now
    contribute to essentially all scientific and engineering research programs
    and are finding increasing use in a broad range of industrial applications.
    The use of computing technology is now spreading to the observational
    sciences, which are being revolutionized by the advent of powerful new
    sensors that can detect and measure a wide range of physical, chemical and
    biological phenomena. Massive digital detectors in a new generation of
    telescopes have turned astronomy into a digital science. Sensor arrays for
    characterizing ecologies and new sequencing instruments for genomics
    research are revolutionizing the biological sciences. This lecture will
    discuss the elements of computational modeling and simulation as well as
    the emerging area of data-driven science, and will examine the impact of
    these new approaches in a few fields, while also drawing on the lecturer's
    experiences in chemistry. (Lecture time: 1 hour)

    Technology Trends and Future of High Performance Computing. Computing
    technologies are undergoing a dramatic transition. Because of physical
    limitations, the computational power of a single microprocessor core, the
    basis of all computing systems from laptops to supercomputers, has stopped
    increasing. Dual-core systems were introduced in 2005, quad-core chips in
    2007, and eight-core chips are now available from many vendors. This trend
    will continue into the future, with the number of cores on a chip
    continuing to increase. In fact, the use of innovative computing
    technologies based on many-core chips, e.g., NVIDIA GPUs, is now being
    seriously explored in many areas of scientific computing. This technology
    shift presents a challenge for computational science and engineering: the
    only significant performance increases in the future will be through the
    increased exploitation of parallelism. Although these technologies promise
    to bring petascale computers into researchers' institutions, and even their
    laboratories, computers built on these technologies have significant
    implications for the design of the next generation of science and
    engineering applications. This lecture will provide an overview of the
    directions in computing technologies as well as describe the challenges
    associated with exploiting these new technologies in computational science
    and engineering. (Lecture time: 1 hour)

    Blue Waters: overview of a sustained petascale computing system. A new
    generation of supercomputers, petascale computers, is providing scientists
    and engineers with the ability to simulate a broad range of natural and
    engineered systems with unprecedented fidelity. Just as important in this
    increasingly data-rich world, these new computers allow researchers to
    manage and analyze unprecedented quantities of data, seeking connections,
    patterns and knowledge. The impact of this new computing capability will be
    profound, affecting science, engineering and society.

    The National Center for Supercomputing Applications at the University of
    Illinois at Urbana-Champaign is deploying a computing system that can
    sustain one quadrillion calculations per second on a broad range of science
    and engineering applications as well as manage and analyze petabytes of
    data. This computer, Blue Waters, has been configured to solve the most
    compute-, memory- and data-intensive problems in science and engineering.
    It will have tens of thousands of chips (CPUs & GPUs), petabytes of memory,
    tens of petabytes of disk storage, and hundreds of petabytes of archival
    storage. The presentation will describe Blue Waters and illustrate the role
    that it will play in a few representative areas of research. (Lecture
    time: 1 hour)


    Yan Xu, Microsoft Research, USA

    Open Data for Open Science. Part 1. Tools for data scientists. An
    introduction to some of the most cutting-edge Microsoft technologies that
    help scientists discover, access, consume, and share scientific data.
    Part 2. Demos of data tools from Microsoft. Demos of how to create
    solutions using the tools presented in Part 1, with real-world scenarios
    and data. Attendees may bring a Windows PC to follow the demos and create
    data visualization samples with their own environmental research data in
    WorldWide Telescope (http://www.worldwidetelescope.org) and share the
    results on Layerscape (http://www.layerscape.org).
