The (new) Table Browser

Upload: marcus-elwood

Post on 31-Mar-2015

Page 1: The (new) Table Browser. Talk Outline Table Browser History New Table Browser Features New Table Browser Implementation –all.joiner &.as files –Overall

The (new) Table Browser

Talk Outline

• Table Browser History
• New Table Browser Features
• New Table Browser Implementation
  – all.joiner & .as files
  – Overall control and data flow
  – Joining and intersection modules
• Limits and future directions

Table Browser History

• Goal - annotations over a particular region of the genome in text rather than graphic format.
• Krish - did the first successful implementation - separated tables into positional and non-positional, merged chrN_ tables, split off hgFind.
• Angie - added sequence output, filters, intersections, and many help pages.
• These versions of the table browser were called hgText.

Why a New Table Browser

• hgText is powerful, but much of the power is not obvious on the first page.

• In hgText the association between tracks and tables was not clear.

• No way to join fields across related tables.

New Table Browser

• Flip to demoing new table browser online.
  – Show overall controls
  – Demo getting genome position, common name, and review status for refSeq on ENCODE.
  – Demo getting alt-splice variants with knownCanonical and knownIsoforms
  – Demo custom track created from filtered cpgIslands (>= 500 bases, >= 0.9 Exp/Obs)
  – Intersect custom fat cpg track with most conserved, requiring 75% overlap, output as custom track
  – Intersect conserved fat cpg with exoniphy, requiring <= 5% overlap, output as hyperlink (custom track output crashes!)

New Table Browser Implementation

• Built using:
  – AutoSql .as files to describe table fields
  – all.joiner file to describe table relationships
  – .bed based intersection and sequence output code from old table browser
  – About 8000 lines of new C code in 19 .c files in src/hg/hgTables

Data Flow

• Each region (piece of a chromosome) is processed separately.
• Filter is turned into a SQL where clause.
• Field-oriented output, especially from selected tables, is handled by one branch of code.
  – SQL rows -> joining routines -> output
• GFF, Custom Track, Sequence, Hyperlink, and Summary Stats outputs are handled by a branch of code that turns things into BED format internally:
  – SQL rows -> BED -> intersecting -> output

• Need to merge fields & BEDs to get joining and intersecting to happen at the same time ultimately.
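
As a rough sketch of how a region plus a user filter might become a SQL where clause: the function name regionWhere and the column names chrom, chromStart and chromEnd are assumptions for illustration, not taken from hgTables. It also assumes the chromosome name and filter text have already been validated.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the hgTables code: build the WHERE clause for one
 * region of a positional table. Column names follow the usual BED-style
 * layout, which is an assumption here. Coordinates are half-open
 * [start, end), so a row overlaps the region when it starts before the
 * region ends and ends after the region starts. A non-empty filter is
 * ANDed on. Returns the number of characters written, or -1 if the buffer
 * was too small. */
int regionWhere(char *buf, size_t bufSize, const char *chrom,
                int start, int end, const char *filter)
{
    int n;
    assert(buf != NULL && chrom != NULL);
    if (filter != NULL && filter[0] != '\0')
        n = snprintf(buf, bufSize,
            "chrom = '%s' and chromStart < %d and chromEnd > %d and (%s)",
            chrom, end, start, filter);
    else
        n = snprintf(buf, bufSize,
            "chrom = '%s' and chromStart < %d and chromEnd > %d",
            chrom, end, start);
    return (n < 0 || (size_t)n >= bufSize) ? -1 : n;
}
```

Calling regionWhere(buf, sizeof buf, "chr7", 1000, 2000, "score >= 500") fills buf with a clause that can be appended after "where" in the per-region query.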

Joining Code

• Use all.joiner to find the route from the primary table to other tables in the join.
• Construct a SQL query for each table that applies table filters and region and includes key fields even if they are not part of the final output.

• Construct a row object (array of lists) for each row returned on primary table.

• Construct a hash keyed by joining field of primary table, with row objects as values.

• Execute SQL query for next table, and when keys match add info to row object.

• Repeat with third and subsequent tables if any.
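
The hash-and-match steps above can be sketched roughly as follows. This is a minimal illustration, not the hgTables implementation: struct joinedRow, hashPrimary, addSecondary, and the fixed sizes are all invented names and simplifications, and a fixed array stands in for the list held in each field.

```c
#include <assert.h>
#include <string.h>

#define BUCKETS   64   /* hash table size, arbitrary for this sketch */
#define MAX_EXTRA 8    /* cap on values collected per row, also arbitrary */

/* One output row: the primary table's joining field plus the list of
 * values picked up from a secondary table. */
struct joinedRow {
    struct joinedRow *next;        /* next row in the same hash bucket */
    const char *key;               /* joining field from the primary table */
    const char *extra[MAX_EXTRA];  /* values added from the secondary table */
    int extraCount;
};

static unsigned hashString(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Hash every primary row by its joining field. */
void hashPrimary(struct joinedRow *table[BUCKETS],
                 struct joinedRow *rows, int rowCount)
{
    int i;
    for (i = 0; i < BUCKETS; i++)
        table[i] = NULL;
    for (i = 0; i < rowCount; i++) {
        unsigned b = hashString(rows[i].key);
        rows[i].extraCount = 0;
        rows[i].next = table[b];
        table[b] = &rows[i];
    }
}

/* Handle one (key, value) row from a secondary table: append value to every
 * primary row whose key matches. Returns the number of rows updated. */
int addSecondary(struct joinedRow *table[BUCKETS],
                 const char *key, const char *value)
{
    int matches = 0;
    struct joinedRow *row;
    assert(key != NULL);
    for (row = table[hashString(key)]; row != NULL; row = row->next) {
        if (strcmp(row->key, key) == 0 && row->extraCount < MAX_EXTRA) {
            row->extra[row->extraCount++] = value;
            matches++;
        }
    }
    return matches;
}
```

Because repeated matches are appended to the same row rather than producing a new row each time, the output keeps one row per primary-table row.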

Limits/Features of Joining Code

• Unless a filter is applied, non-positional tables will be scanned completely. This takes 3 minutes for gbCdnaInfo. (Hint: add filter type=mRNA.)
• Joining code is only applied to field-oriented output.
• Will handle joins across split tables.
• Can chop off prefixes and suffixes on a key field before joining if specified in all.joiner. (Needed for chopping off the version number in some Ensembl tables, for instance.)
• Avoids combinatorial explosion of output rows by allowing fields to contain lists.
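
To illustrate the key chopping, here is a small sketch that strips an optional prefix and a trailing version suffix before a key is used in a join. The function chopKey and its parameters are hypothetical; the way all.joiner actually specifies what to chop is not shown here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy key into out, dropping prefix if the key starts
 * with it, and dropping everything from the last occurrence of suffixStart
 * onward. Pass NULL for prefix or '\0' for suffixStart to skip that step.
 * The result is truncated if it does not fit in outSize bytes. */
void chopKey(const char *key, const char *prefix, char suffixStart,
             char *out, size_t outSize)
{
    size_t prefixLen, len;
    const char *cut;
    assert(key != NULL && out != NULL && outSize > 0);
    prefixLen = (prefix != NULL) ? strlen(prefix) : 0;
    if (prefixLen > 0 && strncmp(key, prefix, prefixLen) == 0)
        key += prefixLen;
    cut = (suffixStart != '\0') ? strrchr(key, suffixStart) : NULL;
    len = (cut != NULL) ? (size_t)(cut - key) : strlen(key);
    if (len >= outSize)
        len = outSize - 1;
    memcpy(out, key, len);
    out[len] = '\0';
}
```

With a made-up versioned identifier such as "ENST0001.2" and suffixStart '.', the chopped key is "ENST0001", which can then match an unversioned key in another table.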

Intersecting Code

• Primarily inherited from hgText.
• Uses hTableInfo (call in hg/lib/hdb.c) which reports which fields in database store chromosome, start, end, etc.

• Analyses hTableInfo to figure out how many fields in corresponding BED structure, and how to query database and massage output to get a BED.

• Converts second table in intersection into a bitmap.

• Counts up number of bases in bitmap that intersect each bed item in first table.

• (For pure bitwise operations converts first table to bitmap too.)
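
The bitmap approach can be sketched as below. This is an illustration under simplifying assumptions, not the inherited hgText code: the names struct range, rangesToBitmap, overlapBases and passesOverlap are invented, coordinates are relative to the region being processed, and one byte per base is used for clarity where a real bitmap would pack eight bases per byte.

```c
#include <assert.h>
#include <stdlib.h>

struct range { int start, end; };   /* half-open [start, end), as in BED */

/* Mark every base covered by any item of the second table. Returns a
 * calloc'd array of regionSize flags (caller frees), or NULL on failure. */
unsigned char *rangesToBitmap(const struct range *items, int count,
                              int regionSize)
{
    unsigned char *bits;
    int i, pos;
    assert(regionSize > 0);
    bits = calloc((size_t)regionSize, 1);
    if (bits == NULL)
        return NULL;
    for (i = 0; i < count; i++) {
        int s = items[i].start < 0 ? 0 : items[i].start;
        int e = items[i].end > regionSize ? regionSize : items[i].end;
        for (pos = s; pos < e; pos++)
            bits[pos] = 1;
    }
    return bits;
}

/* Count how many bases of one first-table item are set in the bitmap. */
int overlapBases(const unsigned char *bits, int regionSize, struct range item)
{
    int s = item.start < 0 ? 0 : item.start;
    int e = item.end > regionSize ? regionSize : item.end;
    int pos, n = 0;
    for (pos = s; pos < e; pos++)
        n += bits[pos];
    return n;
}

/* Nonzero when at least minPercent of the item's bases are covered. */
int passesOverlap(const unsigned char *bits, int regionSize,
                  struct range item, double minPercent)
{
    int size = item.end - item.start;
    assert(size > 0);
    return 100.0 * overlapBases(bits, regionSize, item) / size >= minPercent;
}
```

Overlapping items in the second table are only counted once, since each base is either set or not; that is what makes the percentage-overlap thresholds straightforward to apply.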

Limits and Features of Intersections

• Not applied to field or MAF output.

• Information is lost in converting to BED.

• Does allow intersection code for sequence, GFF, custom track, BED, statistics, and hyperlinks output to go through same path.

Future Directions

• Make a combined BED/Row structure to bring together intersections and joining.

• Polish sequence output in some places.
• Get .as file info for all tables.
• Encourage people to pay a little more attention to database concerns as well as genome browser concerns when designing tables.
• See if we can phase out split tables by tuning MySQL aggressively.