spaghetti code, soupy logic jim kent - university of california santa cruz steaming fresh modules in...

Post on 21-Dec-2015

TRANSCRIPT

  • Slide 1
  • Spaghetti Code, Soupy Logic. Jim Kent, University of California Santa Cruz. Steaming fresh modules in sourceforge.net. Combinatorial assembly of transcription factors in cell.
  • Slide 2
  • A Challenge Every Speaker Faces: Who is the audience? Bioinformaticians: Biologists with bigger, better databases? Geeks trading bits for bases? Leading edge interdisciplinary super scientists?
  • Slide 7
  • Top 5 Reasons Biologists Go Into Bioinformatics: 5 - Microscopes and biochemistry are so 20th century. 4 - Got started purifying proteins, but it turns out the cold room is really COLD. 3 - After 23 years of school wanted to make MORE than $23,000/year as a postdoc. 2 - Like to swear, @ttracted to $_ Perl #!! 1 - Getting carpal tunnel from pipetting.
  • Slide 12
  • Top 5 Reasons Computer People Go Into Bioinformatics: 5 - Bio courses actually have some females. 4 - Human genome more stable than Windows XP. 3 - Having mastered binary trees, quad trees, and parse trees, ready for phylogenetic trees. 2 - Missing heady froth of the internet bubble. 1 - Must augment humanity to defeat evil artificially intelligent robots.
  • Slide 13
  • The Paradox of Genomics: How does a long, static, one-dimensional string of DNA turn into the remarkably complex, dynamic, and three-dimensional human body? GTTTGCCATCTTTTG CTGCTCTAGGGAATC CAGCAGCTGTCACCA TGTAAACAAGCCCAG GCTAGACCAGTTACC CTCATCATCTTAGCT GATAGCCAGCCAGCC ACCACAGGCATGAGT
  • Slide 14
  • The Analogy of the Code of Life: DNA is popularly considered the code of life. Computer programs are complex systems that ultimately are built up of 0s and 1s; perhaps they are a model for a genome built of A, C, G, and T? BUT the human genome lacks documentation, has accumulated 3 billion years of cruft, and does not believe in local variables. Therefore we must look to less-than-straightforward software programs as guides.
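One way to make the 0s-and-1s analogy concrete: with only four letters, each base fits in two bits. The following is a minimal sketch under stated assumptions. The bit assignment in `BASE_TO_BITS` and the function names `pack_dna` and `unpack_dna` are hypothetical, chosen for illustration, and do not correspond to any particular file format or to code from the talk.

```python
# Hypothetical 2-bit mapping, chosen only for illustration.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_dna(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq.upper():
        # Shift the bases seen so far left and append the new base's two bits.
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack_dna(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        # The lowest two bits hold the last base of the sequence.
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


if __name__ == "__main__":
    fragment = "GTTTGCCATCTTTTG"  # first 15 bases shown on the Paradox of Genomics slide
    packed = pack_dna(fragment)
    print(f"{len(fragment)} bases fit in {2 * len(fragment)} bits")
    print("round trip ok:", unpack_dna(packed, len(fragment)) == fragment)
```

The length has to be passed to `unpack_dna` because leading `A` bases pack to zero bits and would otherwise be lost. The sketch only handles the four unambiguous letters; anything else raises a `KeyError`.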
  • Slide 15
  • Bioperl CORBA module
        sub new {
            my ( $class, @args) = @_;
            my $self = $class->SUPER::new(@args);
            my ( $idl, $ior, $orbname ) = $self->_rearrange( [ qw(IDL IOR ORBNAME)], @args);
            $self->{'_ior'} = $ior || 'biocorba.ior';
            $self->{'_idl'} = $idl || $ENV{BIOCORBAIDL} || 'biocorba.idl';
            $self->{'_orbname'} = $orbname || 'orbit-local-orb';
            $CORBA::ORBit::IDL_PATH = $self->{'_idl'};
            my $orb = CORBA::ORB_init($orbname);
            my $root_poa = $orb->resolve_initial_references("RootPOA");
            $self->{'_orb'} = $orb;
            $self->{'_rootpoa'} = $root_poa;
            return $self;
        }
  • Slide 16
  • 3)+1);setitimer(0,&t,0);f&&printf("\e[10;%u]",g+24);}f&&putchar(7);s+=(9-w[21] )*((g>>3)+1);o=p;m(x);m(w);(n=rand())&255||--*w||++*w;if(!(**P&&P++||n&7936)){ while(abs((X=rand()%76)-*x+2)-*w …
  • Slide 57
  • A Big Bioinformatics Web Site: genome.ucsc.edu gets more than 100,000 hits from more than 5,000 scientists each day. It involves 570,000 lines of C code, plus bits of awk, Perl, bash, tcsh, Java, R, and Tcl. 1200 CPUs and 12 terabytes of disk. 12 full-time staff; 18 part-time staff, grad students, and postdocs.
  • Slide 58
  • Site Architecture: 8 web servers running Apache and MySQL. CGIs written in C access genome data and user interface settings in MySQL. The genome database is the bottleneck, and is replicated on each server. A cluster of 1000 CPUs and smaller clusters of faster CPUs create annotation files, which are loaded into the database.
  • Slide 59
  • Site Sociology: 1/3 of the group telecommutes. Thursdays are devoted to reading and testing each other's code and, if necessary, a one- or two-hour meeting. We develop very incrementally, and do a new release once a week. 1/4 of the group is dedicated to quality assurance; I'm wanting to increase this to 1/3. User support is shared by everyone.
  • Slide 60
  • Parasol and Kilo Cluster: The UCSC cluster has 1000 CPUs running Linux. It ran 1,000,000 BLASTZ jobs in 25 hours for the mouse/human alignment. We wrote the Parasol job scheduler to keep up. Very fast and free. Jobs are organized into batches. Error checking at the job level and at the batch level.
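The idea of grouping jobs into batches and checking for errors at both levels can be sketched as follows. This is a minimal illustration and not Parasol's code or interface. The names `JobResult`, `BatchResult`, `run_job`, `run_batch`, and `python_job` are hypothetical. The sketch runs jobs one after another on the local machine, whereas a real scheduler dispatches them across cluster nodes. The job-level check assumed here is the exit status plus an optional expected-output comparison; the batch-level check assumed here is that at least one job ran and none failed.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class JobResult:
    command: list
    returncode: int
    ok: bool


@dataclass
class BatchResult:
    results: list = field(default_factory=list)

    @property
    def failed(self):
        """Jobs whose job-level check did not pass."""
        return [r for r in self.results if not r.ok]

    @property
    def ok(self):
        # Batch-level check: the batch ran at least one job and none failed.
        return bool(self.results) and not self.failed


def run_job(command, expected_stdout=None):
    """Run one job; pass if it exits 0 and (optionally) prints the expected text."""
    proc = subprocess.run(command, capture_output=True, text=True)
    ok = proc.returncode == 0
    if ok and expected_stdout is not None:
        ok = proc.stdout.strip() == expected_stdout
    return JobResult(command=command, returncode=proc.returncode, ok=ok)


def run_batch(jobs):
    """Run every (command, expected_stdout) pair and collect the results."""
    batch = BatchResult()
    for command, expected in jobs:
        batch.results.append(run_job(command, expected))
    return batch


def python_job(code):
    """Build a portable stand-in job: a one-line Python program."""
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    batch = run_batch([
        (python_job("print(2 + 2)"), "4"),              # passes both checks
        (python_job("import sys; sys.exit(3)"), None),  # fails on exit status
    ])
    for result in batch.failed:
        print("failed job, exit status", result.returncode)
    print("batch ok:", batch.ok)
```

Taking the slide's figures at face value, 1,000,000 jobs in 25 hours is about 11 jobs finishing per second. If all 1000 CPUs were busy for the whole run (an assumption), that is roughly 90 CPU-seconds per job, which gives a sense of how little per-job overhead a scheduler can afford.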
  • Slide 61
  • Conclusions: Spaghetti code is not so helpful in understanding the genome. The human genome suggests that trial-and-error development is likely to yield a robust version of Windows within 3 billion years. Understanding the flow of control in the genome is a problem that fascinates biologists and computer scientists alike.
  • Slide 62
  • Further Acknowledgements. Institutions: NHGRI, The Wellcome Trust, HHMI, NCI, taxpayers in the US and worldwide. Baylor, Sanger, Wash U, Whitehead, Stanford, JGI/DOE, Vancouver GSC, UW and the international sequencing centers. UCSC, NCBI, EBI, Ensembl, Genoscope, MGC, Intel, TIGR, Jackson Labs, Affymetrix, SwissProt. Individuals: Chuck Sugnet, Angie Hinrichs, Fan Hsu, Terry Furey, Heather Trumbower, Kate Rosenbloom, Hiram Clawson, Brian Raney, Rachel Harte, Bob Kuhn, Mathieu Blanchette, Donna Karolchik, David Haussler. John Sulston, Richard Gibbs, Eric Lander, Francis Collins, Roderic Guigo, Michael Brent, Olivier Jaillon, David Kulp, Victor Solovyev, Ewan Birney, Greg Schuler, Deanna Church, Scott Schwartz, Ross Hardison, and everyone else!
  • Slide 63
  • THE END