Bioinformatics Resources - Rostlab · 2016-04-22 · BioinfRes SoSe 16 Organization Lecture: Friday...

BioinfRes SoSe 16 Bioinformatics Resources Lecture & Exercises Prof. B. Rost, Dr. L. Richter, J. Reeb Institut für Informatik I12


BioinfRes SoSe 16

Bioinformatics Resources

Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb

Institut für Informatik I12

Bioinformatics Resources

●  Organization
●  Schedule
●  Overview

Organization

●  Lecture: Friday 9-12, i.e. 9:30-11:45, with a 10-15 min break in between, Room 00.13.009A
●  Exercise:
-  Monday 14-16, room 00.08.038, starting Mon, May 2nd
-  Friday 13-15, room 01.09.014, starting Fri, Apr. 29th

Team Behind the Course

Putative Schedule

Apr. 22nd   Intro, General Overview (1. sh.)
Apr. 29th   Sequence Databases (2. sh.)
May 6th     No lecture
May 13th    Sequence Databases (3. sh.)
May 20th    Structure Databases (4. sh.)*
May 27th    SQL (5. sh.)
Jun 3rd     SQL (6. sh.)
Jun 10th    No-SQL (7. sh.)
Jun 17th    No-SQL (8. sh.)*
Jun 24th    JavaScript / UI (9. sh.)
Jul 1st     Web Services (10. sh.)
Jul 8th     Bioinformatics Suites / Forums
Jul 15th    Wrap Up, Q&A

* These exercises can earn you a bonus

Schedule Details

●  No lecture on May 6th
●  No exercise on Fri, May 13th and Mon, May 16th
●  Exercise sheets are published on Fridays and discussed Fri/Mon the week after
●  Last sheet/exercise: Jul 4th, Fri/Mon 8th/11th
●  Exam (working date): August 5th, to be discussed with the audience

Overview

●  lecture is new and considered beta 1
●  second iteration
●  no prior syllabus available and subject to change
●  depending on the progress of the lecture, single topics could be added or dropped
●  the sequence of topics might be shuffled
●  hybrid nature: presentation of existing resources is blended with back- and front-end technology

Exercises

●  Exercises help to convert knowledge into a skill
●  practical application of topics covered in the lecture
●  active exploration of bioinformatics resources
●  implementing various parts of a bioinformatics resource
●  use Python/Biopython as common platform

Meaning

●  What does "resource" actually mean?
●  a Google query for "Bioinformatics Resource" yields about 20 million hits
●  the hits fall roughly into three categories:
-  databases
-  tools
-  service centers

Working on a Definition

●  a collection of information which is useful to do research in the area of life sciences / computational biology
●  contains the information itself
●  provides appropriate interfaces to access the information
●  may provide tools for interactive data analysis

Genbank / NCBI

●  NIH genetic sequence database
●  annotated collection of all publicly available DNA sequences
●  part of the International Nucleotide Sequence Database Collaboration together with the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL)

Genbank II

●  new release every 2 months
●  retrievable via FTP from the NCBI website
●  current release is 213.0, April 15, 2016
●  211,423,912,047 bases from 191,739,511 reported sequences
●  (187,893,826,750 bases from 181,336,445 reported sequences, Feb 2015)
●  Genbank flat file format

Genbank III

●  three main divisions: CoreNucleotide, dbEST, dbGSS
●  querying via Entrez Nucleotide
●  interactive BLAST analysis with user sequences
●  programmatic access via NCBI E-utilities
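Programmatic access via the NCBI E-utilities comes down to HTTP requests against a small set of endpoints. As a minimal sketch (Python 3, standard library only), the following builds an `efetch` URL for one nucleotide record in Genbank flat file format without sending the request. The helper name `build_efetch_url` and the example accession are illustrative assumptions and not taken from the lecture.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Return the efetch URL for one record (hypothetical helper, no network access)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)


# Example accession chosen purely for illustration.
url = build_efetch_url("NM_000546")
print(url)
```

Opening this URL, for example with `urllib.request.urlopen`, would return the record as plain text. Keeping URL construction separate from the download makes it easy to inspect a request before any traffic reaches NCBI.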

Swissprot

●  official name: UniProtKB/Swiss-Prot
●  history
●  current release: 2016_04
●  548208 sequence entries
●  (550960 sequence entries, 195282524 amino acids abstracted from 235893 references last year)
●  manually annotated

Swissprot/Uniprot II

●  manual annotation process
●  standard operating procedure
●  controlled vocabularies
●  guidelines
●  offered services: BLAST, Align, ID mapping
●  associated services

Other Uniprot Services

●  TrEMBL
●  Proteomes
●  UniRef
●  UniParc
●  programmatic access

PDB

●  History
●  118,087 structures, incl. 115,169 proteins
●  (108124 structures, incl. 100450 proteins last year)
●  PDB formats
●  data upload / validation
●  data dictionaries

PDB II

●  retrieval
●  programmatic access
●  visualization with the different views
●  file format transitions: pdb and mmcif
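The legacy pdb format is a fixed-column text format, which is one reason the move to mmCIF matters. As a rough illustration (Python 3, standard library only), the sketch below slices a single ATOM record by column position. The record is assembled field by field so the column widths stay visible. Its values are made up, and `parse_atom` is a hypothetical helper, not a library function.

```python
# One ATOM record, assembled field by field so the fixed column widths are explicit.
line = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and padding
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: temperature factor
    "          "  # columns 67-76: padding
    " N"          # columns 77-78: element symbol
)


def parse_atom(record):
    """Extract a few fields from one ATOM record by column position."""
    return {
        "serial": int(record[6:11]),
        "name": record[12:16].strip(),
        "res_name": record[17:20],
        "chain": record[21],
        "res_seq": int(record[22:26]),
        "xyz": (float(record[30:38]), float(record[38:46]), float(record[46:54])),
        "element": record[76:78].strip(),
    }


atom = parse_atom(line)
print(atom)
```

Because every field lives at a fixed offset, values that outgrow their columns (very large structures, long residue names) cannot be represented, whereas mmCIF stores the same information as named key/value items.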

SCOP/e

●  Structural Classification of Proteins
●  history, current version is SCOPe 2.05
●  changes in SCOPe
●  access
●  needed/recommended additional software

PFAM

●  PFAM
-  current version is 29.0, December 2015
-  what it is about
-  categories
-  interactive use
-  programmatic access

Prosite

●  Prosite
-  current version 20.125, Apr 5th, 2016
-  UniRule format and ProRule
-  access
-  typical use and interfaces

PubMed and discussion forums

●  What is it for
●  Search opportunities
●  Linking to other information sources
●  Search strategies
●  A tour through various discussion forums

File Formats*

●  High throughput data:
-  BAM, SAM
-  VCF
●  Newick tree file format
●  Genbank/EMBL
●  PDB: mmCIF

* mostly integrated
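To give a feel for one of these formats: a Newick string encodes a tree with nested parentheses, and leaf labels always follow an opening parenthesis or a comma. The sketch below (Python 3) extracts the leaf labels with a regular expression. It assumes simple unquoted labels without bracketed comments, and `newick_leaves` is a hypothetical helper rather than a full parser.

```python
import re


def newick_leaves(tree):
    """Return the leaf labels of a Newick string (simple, unquoted labels only)."""
    # A leaf label directly follows "(" or ","; internal node labels follow ")".
    return [name.strip() for name in re.findall(r"[(,]\s*([^(),:;]+)", tree)]


# Four leaves A-D, internal node E, root F, with branch lengths after ":".
tree = "(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"
leaves = newick_leaves(tree)
print(leaves)
```

For real data a dedicated parser is the safer choice, since Newick also allows quoted labels and comments that this pattern does not handle.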

File Formats

●  Equivalence and transformations between different formats
●  XML formats
●  RDF formats

SQL

●  SQL basics
●  data types
●  table creation and manipulation
●  join
●  select
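The basics listed here (data types, table creation, join, select) fit into a few statements. The sketch below uses SQLite through Python 3's built-in `sqlite3` module as a stand-in for a database server, so it runs without any setup. The table layout and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table creation with typed columns and a key relationship.
conn.executescript("""
CREATE TABLE protein (
    id        INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE,
    length    INTEGER
);
CREATE TABLE annotation (
    protein_id INTEGER REFERENCES protein(id),
    keyword    TEXT
);
""")

# Manipulation: insert a few made-up rows using parameter placeholders.
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                 [(1, "P1", 120), (2, "P2", 350)])
conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                 [(1, "kinase"), (2, "membrane"), (2, "transport")])

# Select with a join: keywords of all proteins longer than 200 residues.
rows = conn.execute("""
    SELECT p.accession, a.keyword
    FROM protein AS p
    JOIN annotation AS a ON a.protein_id = p.id
    WHERE p.length > 200
    ORDER BY a.keyword
""").fetchall()
print(rows)
```

The `?` placeholders keep values out of the SQL text, which matters as soon as queries are built from user input.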

SQL II

●  keys
●  indexes
●  performance influence of indexes
●  similarity search vs substrings
●  permissions
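A short sketch of two of these points, again using SQLite via Python 3's `sqlite3` module as an assumed stand-in for a server database: an index supports exact lookups on a column, while a substring query with `LIKE` only finds literal containment and says nothing about biological similarity. Table, index name and sequences are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (accession TEXT, residues TEXT)")
conn.executemany("INSERT INTO seq VALUES (?, ?)",
                 [("S1", "MKTAYIAKQR"), ("S2", "GAVLIMKTWW"), ("S3", "PPPGGG")])

# An index on accession speeds up exact lookups on that column.
conn.execute("CREATE INDEX idx_seq_accession ON seq (accession)")

# Substring search: literal containment of "MKT", not a similarity search.
hits = [row[0] for row in conn.execute(
    "SELECT accession FROM seq WHERE residues LIKE ? ORDER BY accession",
    ("%MKT%",))]

# The catalog table sqlite_master lists the indexes that exist.
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(hits, indexes)
```

A pattern with a leading wildcard such as `%MKT%` typically cannot use an ordinary index and scans every row, and a sequence differing in a single residue is missed entirely, which is why similarity search is left to tools like BLAST.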

SQL III

●  transactions
●  setup, administration, backup
●  programmatic access
●  MySQL, PostgreSQL
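Transactions and programmatic access can be shown together in a few lines. The sketch below uses Python 3's `sqlite3` module rather than MySQL or PostgreSQL (an assumption made so the example runs without a server). It groups two inserts into one transaction and shows that a failure in the second one undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (accession TEXT PRIMARY KEY)")
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("INSERT INTO entry VALUES ('P1')")
        conn.execute("INSERT INTO entry VALUES ('P1')")  # violates the primary key
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the first insert, was rolled back

count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(count)
```

Python drivers for MySQL and PostgreSQL follow the same DB-API pattern of connections, cursors, `commit()` and `rollback()`, although details such as placeholder syntax differ between drivers.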

SQL IV

●  general hints for database design
●  do's and don'ts
●  normalization ultralight

NoSQL

●  definitions of NoSQL
●  advantages/disadvantages
●  underlying theory
●  typical use cases
●  types of NoSQL databases
●  query (languages)

NoSQL Systems

●  MongoDB
●  CouchDB
●  Neo4J
●  programmatic access

(Storing Facts)*

●  triple stores
●  data model
●  RDF refresher
●  query language: SPARQL
●  examples

* optional, might be dropped
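The data model behind a triple store is small enough to sketch in plain Python 3: every fact is a (subject, predicate, object) triple, and a query is a pattern in which some positions are left open. The identifiers below are made up, and `match` is a hypothetical helper that only mimics the simplest kind of SPARQL pattern. It is not a real RDF library.

```python
# Each fact is a (subject, predicate, object) triple; all identifiers are made up.
triples = {
    ("P1", "is_a", "protein"),
    ("P1", "encoded_by", "G1"),
    ("P2", "is_a", "protein"),
    ("G1", "is_a", "gene"),
}


def match(pattern, store):
    """Return all triples that fit the pattern, sorted; None acts as a wildcard."""
    return sorted(
        triple for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    )


# Roughly "which subjects are proteins?", i.e. the pattern (?s, is_a, protein).
proteins = [subject for subject, _, _ in match((None, "is_a", "protein"), triples)]
print(proteins)
```

A real triple store adds globally unique identifiers (IRIs), indexing and a full query language on top of this idea.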

Programming Libraries

●  roadshow of programming libraries dedicated to bioinformatics:
●  bioperl
●  biopython
●  bioJS
●  visualization

Graphical User Interfaces

●  principles
●  interaction modes
●  modelling

Graphical User Interfaces*

●  interactive user interfaces with JavaScript
●  language basics
●  programming model
●  client/server communication with JSON

* to be confirmed
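Client/server communication with JSON means that both sides agree on a text representation of nested lists and key/value objects. The sketch below shows the round trip in Python 3 (the server side of such a setup), with made-up field names. A JavaScript client would parse the same string with `JSON.parse`.

```python
import json

# Server side: serialize a result (hypothetical fields) into a JSON string.
result = {"accession": "P1", "length": 120, "keywords": ["kinase"]}
payload = json.dumps(result, sort_keys=True)

# Client side: parse the string back into native data structures.
received = json.loads(payload)
print(payload)
```

Only basic types survive the trip (strings, numbers, booleans, null, lists, objects), so richer objects such as sequences with annotations have to be flattened into these before sending.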

JavaScript

●  libraries for data visualization / bioinformatics
●  bioJS
●  D3

Client/Server Models

●  CGI
●  Web services
●  Remote Procedure Calls / CORBA
●  security considerations

Authentication/Encryption

●  authentication models
●  communication encryption
●  data/result encryption
●  legal privacy issues
●  data access models

Web Services I

●  types of web services
●  web service components
●  integration of web services in software
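As one possible illustration of a web service component, the sketch below defines a tiny service as a WSGI application in Python 3 and calls it directly with a fake request, so no socket or web server is needed. The URL path and the response fields are invented, and WSGI is just one way to build such a service, not necessarily the one used in the lecture.

```python
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI application: answer every request with a JSON document."""
    body = json.dumps({"path": environ["PATH_INFO"], "status": "ok"}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Call the application directly with a fake request instead of opening a socket.
environ = {"PATH_INFO": "/sequence/P1"}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
captured = {}


def start_response(status, headers):
    captured["status"] = status


response = json.loads(b"".join(app(environ, start_response)).decode("utf-8"))
print(captured["status"], response)
```

The same `app` callable could be served over HTTP with `wsgiref.simple_server.make_server` for local experiments, or placed behind a production web server.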

Web Services II

●  client-side interfaces to web services
●  server-side interfaces to web services
●  Apache configuration for web services
●  required modules
●  configuration
●  performance

Bioinformatics Suites

●  where to find
●  installation/configuration
●  workflow systems: e.g. Taverna, ....
●  EMBOSS, STADEN
●  bio-.....
●  .....

Selected Bioinformatics Suites

●  Aquaria
●  PredictProtein
●  ....

Summary I

●  aim of this module:
-  shape the concept of a bioinformatics resource
-  become familiar with some of the most prominent examples out there
-  get in touch with the underlying technology
-  gather ideas and experience on how to realize a new bioinformatics resource

Summary II

●  hands-on (interaction) experience with existing resources
●  back-end technology, i.e. various database models
●  front-end technology to realize the UI / design rationales
●  communication models

Grading:

●  graded by a written exam, 90/100 min
●  scheduled day xxx depends on:
-  available room
-  number of participants
●  exam admission: no admission limit
●  with sufficient performance in the two marked exercises you can earn a bonus
●  the bonus applies only if you pass the exam

Exercises

●  Exploration of available resources
●  simple to intermediate programming tasks
●  publication/presentation of the task in week x
●  solutions in week x+1

Exercises II

●  10 exercise sheets
●  work in groups of 2 for the bonus
●  discussion with the audience

Exercises III

●  groups fixed for the bonus
●  new sheets are published on Friday
●  submission is due on Friday morning for all groups
●  two slots for exercises

Questions & Answers

Programming Exercises

●  we will use Python for our programming exercises
●  scripting language
●  basic understanding of Python should be sufficient to understand the presented code snippets
●  active community for support and development

Programming Exercises II

●  object oriented
●  good integration with database systems and web access
●  good integration with sophisticated data analysis tools like NumPy, SciPy, matplotlib
●  BioPython

Structure your research work

●  computational biology is data driven
●  results matter -> more results matter more
●  unlike e.g. software development, there is no final release version after which all prior bugs/versions are abandoned
●  appropriate documentation of the experiments to reconstruct the intermediate steps is important, otherwise you may end up with result01-result1000 files

Our preferred Software Setup

●  Anaconda
●  iPython notebooks

Anaconda

●  Python distribution (https://www.continuum.io)
●  clever package manager: conda
●  allows complete installations, including various configurations, next to each other in user space
●  no privileges needed
●  your host system is not modified
●  works with Windows, OS X, Linux

Some snippets from the conda cheat sheet

●  http://conda.pydata.org/docs/_downloads/conda-cheatsheet.pdf
●  use "conda create -n xxx biopython" to create a new environment xxx and install biopython
●  use "(source) activate xxx" to activate this environment in your shell
●  allows different versions of Python to be installed at the same time

iPython/Jupyter

●  http://jupyter.org
●  supports many different languages, we use it for Python
●  use conda to install the package: conda install jupyter
●  easy start of a notebook: jupyter notebook

Advantages of a Notebook

●  allows you a seamless integration of:
-  (rich) text
-  (live) code
-  visualizations
●  tie together your analysis script, the results and an interpretation/discussion
●  you can archive and share the notebooks easily

Biopython

●  http://biopython.org
●  if installed: "import Bio" loads it in your scripts

taken from http://biopython.org/wiki/Getting_Started:

    from Bio.Seq import Seq

    # create a sequence object
    my_seq = Seq('CATGTAGACTAG')

    # print out some details about it
    print 'seq %s is %i bases long' % (my_seq, len(my_seq))
    print 'reverse complement is %s' % my_seq.reverse_complement()
    print 'protein translation is %s' % my_seq.translate()

Biopython

    seq CATGTAGACTAG is 12 bases long
    reverse complement is CTAGTCTACATG
    protein translation is HVD*

taken from http://biopython.org/wiki/SeqIO:

    from Bio import SeqIO
    handle = open("example.fasta", "rU")
    for record in SeqIO.parse(handle, "fasta"):
        print record.id
    handle.close()

    from Bio import SeqIO
    record = SeqIO.read(open("single.fasta"), "fasta")
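To make clear what a call like `SeqIO.parse(handle, "fasta")` does conceptually, here is a minimal FASTA reader in plain Python 3 (unlike the Python 2 snippets on the slides, and without Biopython). It is a simplified sketch: `parse_fasta` is a hypothetical helper that yields plain tuples instead of record objects and ignores everything Biopython handles beyond identifier and sequence.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first word after ">".
            identifier, chunks = (line[1:].split() or [""])[0], []
        elif line and identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# An in-memory file stands in for example.fasta.
example = io.StringIO(">seq1 first record\nCATG\nTAGA\n>seq2\nCTAG\n")
records = list(parse_fasta(example))
print(records)
```

Writing the reader as a generator mirrors the iterator style of `SeqIO.parse`: records are produced one at a time, so large files never have to fit into memory at once.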

Biopython

●  advantage of an easier/clearer syntax than Perl
●  modeled on BioPerl
●  supports a lot of common bioinformatics file formats
●  supports access to online services like NCBI, ExPASy...
●  more interfaces for bioinformatics software
●  http://biopython.org/DIST/docs/tutorial/Tutorial.html

Dedicated Data Structures

●  sequence (Seq): besides the sequence of residues, it also accepts an Alphabet object -> a kind of type safety for DNA and protein sequences
●  typical functions like complement(), reverse_complement()
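What such a sequence object does internally can be approximated in a few lines of plain Python 3. The sketch below is not Biopython code: `check_dna` is a crude, hypothetical stand-in for the alphabet check, and `reverse_complement` handles only the four unambiguous DNA letters.

```python
# Complement table for the four unambiguous DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def check_dna(dna):
    """Crude stand-in for an alphabet check: reject letters outside ACGT."""
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError("not a DNA sequence: %s" % sorted(unexpected))
    return dna


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


rc = reverse_complement(check_dna("CATGTAGACTAG"))
print(rc)
```

The result agrees with the reverse complement shown for the same sequence in the Biopython example, and passing a protein string to `check_dna` raises an error instead of silently producing nonsense.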

Dedicated Data Structures

●  parsing functions for different sequence formats
●  parsing functions for alignment formats know about the different components
●  as well as the respective output functions
●  different translation tables
●  various predefined alphabets

Python Basics

●  https://docs.python.org/2/tutorial/index.html
●  good interactive handling, i.e. you can evolve and evaluate your code directly in the Python shell
●  later you can include it in your script
●  basic data types:
-  numerical types comparable to Perl, C, Java
-  strings
-  boolean

Sequence Types

●  support an easy check for an element (membership test)
●  mutable types: List, Bytearray
●  immutable: String, Tuple
●  slicing: act on subsets, not only on single elements
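The points above in a few lines of Python 3 (the variable names are only for illustration): a membership test, slicing, and the difference between a mutable list and an immutable string.

```python
dna = "CATGTAGACTAG"       # str: immutable sequence type
bases = list(dna)          # list: mutable sequence type

has_stop = "TAG" in dna    # membership test
first_codon = dna[0:3]     # slice: the first three letters
every_third = dna[::3]     # slice with a step: letters 0, 3, 6, 9

bases[0] = "G"             # lists can be changed in place
mutated = "".join(bases)

try:
    dna[0] = "G"           # strings cannot be changed in place
    str_is_mutable = True
except TypeError:
    str_is_mutable = False

print(has_stop, first_codon, every_third, mutated, str_is_mutable)
```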

Other Collection Types

●  Set: every element exists only once
●  Dictionary:
-  can store key/value pairs
-  key has to be immutable (hashable)
●  all collection types support iterators
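A short Python 3 sketch of these collection types, using a made-up peptide string: a set removes duplicates, a dictionary counts residues, and only hashable objects such as tuples work as dictionary keys.

```python
residues = "MKTAYIAKQR"

# Set: every element exists only once.
unique = set(residues)

# Dictionary: count how often each residue occurs.
counts = {}
for aa in residues:
    counts[aa] = counts.get(aa, 0) + 1

# Keys must be hashable: a tuple works, a list does not.
codon_table = {("A", "T", "G"): "M"}
try:
    {["A", "T", "G"]: "M"}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# All collection types can be iterated over.
for aa in sorted(counts):
    print(aa, counts[aa])
```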

Important Syntax

●  whitespace (tabs, spaces) and ":" are used to structure the code in blocks, similar to {} in other languages
●  same indentation == same block
●  usual control structures available

    # example data
    words = ['cat', 'window', 'defenestrate']
    a = ['Mary', 'had', 'a', 'little', 'lamb']

    for w in words:
        print w, len(w)

    # if you want to iterate by numbers you
    # have to use range()
    for i in range(len(a)):
        print i, a[i]

Important Syntax

●  Definition of functions:

    def fib(n):    # write Fibonacci series up to n
        """Print a Fibonacci series up to n."""
        a, b = 0, 1
        while a < n:
            print a,
            a, b = b, a+b

●  Arguments can be passed by:
-  name
-  position
●  Arguments can have default values -> optional in the call
●  Packages are loaded with the import statement
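The argument-passing rules can be seen in one small function. The sketch below is written for Python 3 (unlike the Python 2 snippets on the slides). `gc_content` and its parameters are made up for illustration.

```python
def gc_content(sequence, ignore_case=True, digits=2):
    """Return the GC fraction of a sequence, rounded to `digits` decimals."""
    if ignore_case:
        sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    return round(gc / len(sequence), digits)


# Passed by position; both defaults apply.
by_position = gc_content("catgtagactag")

# Passed by name; ignore_case keeps its default value.
by_name = gc_content(sequence="catgtagactag", digits=1)

# Overriding a default: lower-case letters are no longer counted.
strict = gc_content("catgtagactag", ignore_case=False)

print(by_position, by_name, strict)
```

Arguments with default values can be left out of the call, and naming them makes calls with several optional settings easier to read.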
