BITS training - UCSC Genome Browser - part 2
This is the second part of the lecture slides from the BITS bioinformatics training session on the UCSC Genome Browser. See http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203990:orange-genome-browsers-ucsc-training&catid=81:training-pages&Itemid=190
Paco Hulpiau
UCSC genome browsing
http://www.bits.vib.be
TABLE BROWSER
GET DNA
CLICK LINE
CURRENT BROWSER GRAPHIC IN PDF
TO GET OTHER DATA
CLICK LINE
TO GET OTHER DATA
Databases & accession numbers
GenBank exchanges data daily with its two partners in the International
Nucleotide Sequence Database Collaboration (INSDC):
European Bioinformatics Institute (EBI, part of EMBL)
DNA Data Bank of Japan (DDBJ)
Characteristics of GenBank and RefSeq @ NCBI:
The Ensembl automatic gene annotation system (Curwen et al., 2004):
The gene-building system enables fast automated annotation of
eukaryotic genomes. It annotates genes based on evidence derived from
known protein, cDNA, and EST sequences,
incl. GenBank sequences shared by INSDC, UniProtKB and NCBI
RefSeq.
CLICK LINE
TO GET OTHER DATA
zoom in on exon 1 + upstream
Exercises (II)
1) Are there any diseases related to your gene of interest?
(OMIM)
2) Which interaction partners are known? (Entrez Gene)
3) Are there any important SNPs that change the amino acid sequence?
4) Get the multiple sequence alignment (MSA, multiz46way)
showing the nucleotide sequences of the human, mouse, chicken, Xenopus
and zebrafish genes (CDS FASTA alignment, exons not separate).
Save your results (e.g. exercises2_1.doc).
TO GET OTHER DATA
GET DNA
http://www.visibone.com/colorlab/
Exercises (II)
1) Get the DNA sequence for your gene of interest
including 2000 base pairs upstream and
use the following extended case/color options:
» RefSeq and Ensembl genes in bold
» SNPs (132) underlined
» Regulatory information, e.g. from ORegAnno and miRNA sites,
in different colors
» Save your results (e.g. exercises2_2a.doc).
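The "2000 base pairs upstream" step depends on the strand of the gene. The Python sketch below shows one way to compute the extended interval and build a request for it. It assumes 0-based, half-open coordinates and the public UCSC REST endpoint `getData/sequence`; the function names and the example coordinates are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def region_with_upstream(tx_start, tx_end, strand, upstream=2000):
    """Extend a 0-based, half-open gene interval on its 5' (upstream) side."""
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return max(0, tx_start - upstream), tx_end
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        # The chromosome length is not checked here.
        return tx_start, tx_end + upstream
    raise ValueError("strand must be '+' or '-'")


def sequence_url(genome, chrom, start, end):
    """Build a URL for the (assumed) UCSC REST sequence endpoint."""
    query = urlencode({"genome": genome, "chrom": chrom, "start": start, "end": end})
    return "https://api.genome.ucsc.edu/getData/sequence?" + query


# Made-up coordinates for a minus-strand gene, for illustration only.
start, end = region_with_upstream(10_000, 15_000, "-")
print(sequence_url("hg19", "chr1", start, end))
```

The case/color options of the exercise (bold, underline, colors per track) are only available in the browser's Get DNA dialog; this sketch covers the coordinate arithmetic alone.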
Exercises (II)
1) Try to get the DNA sequence for your gene of interest
in chicken or zebrafish and
use the following extended case/color options:
» UCSC, RefSeq and Ensembl genes in bold
» Other RefSeq genes underlined
» Human proteins in a specific color
» Save your results (e.g. exercises2_2b.doc).
TABLE BROWSER
TO GET OTHER DATA
COPY (Ctrl+C)
= Accession Number (RefSeq) e.g. NM_001229
= Gene Name (Entrez) e.g. CASP1
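When copying identifiers between databases it helps to recognise what kind of record an accession number points to. The sketch below is a rough Python classifier based on the common RefSeq and Ensembl prefixes; the pattern list is deliberately incomplete and the function name is hypothetical.

```python
import re

# Common accession prefixes; this list is illustrative, not exhaustive.
PATTERNS = [
    (re.compile(r"^NM_\d+(\.\d+)?$"), "RefSeq mRNA"),
    (re.compile(r"^NR_\d+(\.\d+)?$"), "RefSeq non-coding RNA"),
    (re.compile(r"^NP_\d+(\.\d+)?$"), "RefSeq protein"),
    (re.compile(r"^X[MRP]_\d+(\.\d+)?$"), "RefSeq predicted (model) record"),
    (re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$"), "Ensembl gene"),
    (re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$"), "Ensembl transcript"),
    (re.compile(r"^ENS[A-Z]*P\d{11}(\.\d+)?$"), "Ensembl protein"),
]


def classify_accession(accession):
    """Return a short label for the accession, or 'unknown'."""
    accession = accession.strip()
    for pattern, label in PATTERNS:
        if pattern.match(accession):
            return label
    return "unknown"


print(classify_accession("NM_001229"))  # accession taken from the slide
```

A gene symbol such as CASP1 is not an accession number, so the sketch reports it as `unknown`.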
Exercises (II)
1) Get a list of the RefSeq and Ensembl transcripts using the table
browser with the following selected fields:
» name, chromosome, exon count, name2
» Save the results (exercises2_3a.xls)
2) Also get the sequences and save them as genename_transcripts.fasta
3) Search the mouse genome using the filter in the table browser
to get all family members of a protein family (research interest)
and save the results in a list (exercises2_3b.xls) containing name,
chromosome, cds start and end, exon count and name2
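Once the table browser output is saved as tab-separated text, the same family filter can be repeated offline. The Python sketch below assumes a header line starting with `#` and the field names `name`, `chrom`, `cdsStart`, `cdsEnd`, `exonCount` and `name2`; the rows and the family name `FamA` are made up for illustration and the function names are hypothetical.

```python
import csv
from fnmatch import fnmatchcase

# Made-up rows in the assumed table browser layout (not real annotations).
EXAMPLE = (
    "#name\tchrom\tcdsStart\tcdsEnd\texonCount\tname2\n"
    "TX_A\tchr1\t100\t900\t5\tFamA1\n"
    "TX_B\tchr2\t300\t700\t3\tFamA2\n"
    "TX_C\tchr3\t50\t450\t2\tOther1\n"
)


def read_table(text):
    """Parse tab-separated output whose header line starts with '#'."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip("#").split("\t")
    return list(csv.DictReader(lines[1:], fieldnames=header, delimiter="\t"))


def family_members(rows, pattern):
    """Keep rows whose name2 matches a wildcard pattern such as 'FamA*'."""
    return [row for row in rows if fnmatchcase(row["name2"], pattern)]


for row in family_members(read_table(EXAMPLE), "FamA*"):
    print(row["name"], row["chrom"], row["exonCount"], row["name2"])
```

All values are kept as strings; convert `cdsStart`, `cdsEnd` and `exonCount` with `int()` if you need to do arithmetic on them.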
TO GET OTHER DATA
TO GET OTHER DATA
BLAT = BLAST-Like Alignment Tool
Searches for high-similarity matches by indexing the entire genome.
DNA limit = 25000 bases, for multiple seqs 50000 bases
protein limit = 10000 aa, for multiple seqs 25000 aa
total sequences = 25
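The submission limits listed above can be checked before pasting sequences into the BLAT form. The Python sketch below reads the limits as a per-sequence maximum plus a combined maximum for multiple sequences, which is an interpretation of the slide; the function and constant names are hypothetical.

```python
# Limits as stated on the slide.
DNA_MAX_SINGLE = 25_000
DNA_MAX_TOTAL = 50_000
PROTEIN_MAX_SINGLE = 10_000
PROTEIN_MAX_TOTAL = 25_000
MAX_SEQUENCES = 25


def check_blat_input(sequences, protein=False):
    """Return a list of problems; an empty list means the input fits."""
    single = PROTEIN_MAX_SINGLE if protein else DNA_MAX_SINGLE
    total = PROTEIN_MAX_TOTAL if protein else DNA_MAX_TOTAL
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"{len(sequences)} sequences (limit {MAX_SEQUENCES})")
    for index, seq in enumerate(sequences, start=1):
        if len(seq) > single:
            problems.append(f"sequence {index}: {len(seq)} residues (limit {single})")
    combined = sum(len(seq) for seq in sequences)
    if len(sequences) > 1 and combined > total:
        problems.append(f"combined length {combined} (limit {total})")
    return problems


print(check_blat_input(["ACGT" * 10]))
```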
PASTE (Ctrl+V)
TTTAGCCAACGAACAGTCGCT TTCTCTTTGCATCTGTCCCAG
The Utilities page contains links to some tools
created by the UCSC Genome Bioinformatics
Group.
DNA Duster & Protein Duster remove non-sequence
related characters from an input sequence.
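The idea behind DNA Duster can be sketched in a few lines of Python. This is not the UCSC implementation: it assumes that only the letters A, C, G, T and N count as sequence and that FASTA header lines should be dropped; the function name is hypothetical.

```python
import re


def dust_dna(raw):
    """Drop FASTA headers, then remove everything that is not A, C, G, T or N."""
    body = "\n".join(line for line in raw.splitlines() if not line.startswith(">"))
    return re.sub(r"[^ACGTNacgtn]", "", body)


# Position numbers and spaces, as found in many sequence listings, are removed.
print(dust_dna("1 acgtacgtac 10\n11 ggttnnaacc 20\n"))
```

A protein version would follow the same pattern with the amino acid alphabet in place of the nucleotide letters.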
Exercises (II)
1) Use BLAT to find orthologs of your gene in chicken, zebrafish
and fruit fly. What is the genomic location?
Are the flanking genes the same?
2) Perform an in silico PCR to see what happens when more than one
PCR product may arise, and determine product size and Tm:
species: human
forward primer: TTC AAG GAG GCC TTC TCC CT
reverse primer: CTG GGG GAG AAG CTG A (+ tick "Flip Reverse Primer")
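The logic of an in silico PCR can be sketched in Python to make the terms product size, Tm and "flip reverse" concrete. The Tm here uses the simple Wallace rule, 2 x (A+T) + 4 x (G+C), which is a rough approximation swapped in for illustration, so the values reported by the UCSC tool may differ. The search is exact-match on one strand only, and the template is a made-up toy sequence; all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Remove spaces and upper-case a primer typed in triplets."""
    return "".join(primer.split()).upper()


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in degrees C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def occurrences(template, site):
    """All start positions of site in template, overlaps included."""
    hits, pos = [], template.find(site)
    while pos != -1:
        hits.append(pos)
        pos = template.find(site, pos + 1)
    return hits


def find_products(template, forward, reverse, flip_reverse=False):
    """Return (start, size) for every forward/reverse site pair on the plus strand."""
    # Without flipping, the reverse primer matches the minus strand, so its
    # plus-strand site is its reverse complement. With flipping, the primer
    # as typed already reads like the plus strand.
    site = reverse if flip_reverse else reverse_complement(reverse)
    return [
        (f, r + len(site) - f)
        for f in occurrences(template, forward)
        for r in occurrences(template, site)
        if r >= f
    ]


fwd = clean("TTC AAG GAG GCC TTC TCC CT")
rev = clean("CTG GGG GAG AAG CTG A")
print(wallace_tm(fwd), wallace_tm(rev))  # 62 52

# Toy template built around the primers, for illustration only.
toy = "AAAA" + fwd + "CCCCCCCCCC" + rev + "TTTT"
print(find_products(toy, fwd, rev, flip_reverse=True))  # [(4, 46)]
```

If a primer site occurs more than once in the template, `find_products` returns one entry per valid pair, which is the "more than one PCR product" situation the exercise asks about.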