The Magic Fit

Goals:

1. Introduction to NCBI's resources and educational tools.

2. Use NCBI Entrez to search for a gene sequence or a protein sequence.

3. Use NCBI BLAST to search for homologous sequences.

4. Use PDB to download protein coordinates.

5. Use Swiss-PDB viewer to analyze protein structures.

6. Use Swiss-PDB viewer to overlay related structures.

California Grade Seven Science Content Standards

Genetics

2d. Students know plant and animal cells contain many thousands of different genes.

2e. Students know DNA is the genetic material of living organisms.

Evolution

3a. Students know genetic variation [is a] cause of evolution and diversity of organisms.

Investigation and Experimentation

7a. Select and use appropriate tools and technology (computers).

7b. Use a variety of print and electronic resources (including the World Wide Web) to collect information and evidence.

7c. Communicate the logical connection among science concepts, data collected and conclusions drawn.

7d. Construct appropriately labeled diagrams to communicate scientific knowledge.

7e. Communicate the steps and results from an investigation in written reports and oral presentations.

California Grades 9 to 12 Biology and Life Sciences Content Standards

Cell Biology

1d. Students know the central dogma of molecular biology.

1h. Students know most macromolecules are synthesized from precursors.

4. Genes are a set of instructions encoded in the DNA sequence of each organism that specify the sequence of amino acids in proteins.

Evolution

8f. Students know how to use DNA or protein sequence comparisons to show probable evolutionary relationships.

Note: Make a folder on your desktop to save your search results and files during this activity.

Search for a MAP kinase gene sequence, FUS3 from Saccharomyces cerevisiae, using Entrez.

1. Go to http://www.ncbi.nlm.nih.gov.

2. Click on the All Databases link at the top.

3. Enter Fus3 in the search box and click on Go.

You will see a number on the left side of each database indicating the number of hits for the search. We will use the Nucleotide database to search for the FUS3 gene sequence.

4. Click on Nucleotide.

You will see a selection of 20 hits/results listed on the page from a total of 78 items. More results are listed on adjacent pages.

You will also see a list of organisms related to the hits on the top right side of the page.

At the top of the page, before the hits, there is a list of three genes related to our search from NCBI Gene. Our gene for FUS3 from Saccharomyces cerevisiae is the second one on the list. However, let's try to refine our results to reduce the number of hits.

5. Click on the Limits tab. Select Title under Fields and Genomic DNA/RNA under Molecule. Click on Go. The number of hits should go down to 7.

6. Click on the History tab. You will see a list of the two searches we have done so far under Most Recent Queries.

7. In the search box, clear any text and enter the following text (do not hit Go at this point): "Saccharomyces cerevisiae"[Organism]

Saccharomyces cerevisiae is the name of our organism (baker's yeast). The quotation marks tell NCBI to look for the words together as one unit. The word Organism in brackets tells NCBI to limit the query to only that organism.

8. Left-click the number for the latest search result (Search Fus3 Field:Title Limits:Genomic DNA/RNA) at the top of the list in Most Recent Queries.

9. In the menu that appears, click on AND. You will see AND (#2) added to the search box after your text.

Selecting AND tells NCBI to combine the search results for the new text query in the search box and the query you select from the history and only return hits that meet both criteria.

10. Click on Go. You should see only a single result.

We have now refined our query to give us exactly what we wanted. The result is the coding sequence for FUS3 without upstream/downstream regions or introns.

11. Click on M31132 (accession number) for the result to browse for further information.

Try the options in the Display and Show drop down menus to see the possibilities.

Further down on the page, you will see information such as gi number, some links, the locus for the gene, number of base pairs, organism, authors and publication reference including link to PubMed, related protein sequence and finally, the coding sequence (CDS) for the gene.

An important step in the analysis of genome information is deciphering the complete coding potential or protein coding sequence (CDS) region of each gene. CDS is a sequence of nucleotides that corresponds with the sequence of amino acids in a protein. A typical CDS starts with ATG and ends with a stop codon. CDS can be a subset of an open reading frame (ORF) [1].
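As a rough illustration of that definition, the sketch below checks whether a nucleotide string looks like a typical CDS: it starts with ATG, its length is a multiple of three, it ends with one of the standard stop codons (TAA, TAG, TGA), and no stop codon appears earlier in frame. The function name and the toy sequences are made up for this example; they are not taken from the FUS3 record, and real annotation involves much more than this check.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def looks_like_cds(seq):
    """Return True if seq starts with ATG, ends with a stop codon,
    and has no in-frame stop codon before the last codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into consecutive three-base codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Toy sequences (hypothetical, for illustration only).
print(looks_like_cds("ATGGCTAAATAA"))  # codons: ATG GCT AAA TAA
print(looks_like_cds("ATGTAAGCTTGA"))  # has an in-frame TAA right after ATG
```

The first toy sequence passes the check and the second does not, because its second codon is already a stop codon.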

12. Click back on your browser to return to the results page. Copy the sequence identification number in the first line (gi|171532). We will use this for the BLAST search.

Note: You can also get the GI number from the gene information page but the format is not correct to use in a BLAST search.
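The refined search built in steps 5 to 10 can also be written as a single Entrez query string. The sketch below only assembles that string and a search URL; it does not contact NCBI. It assumes NCBI's E-utilities esearch address and the database name nuccore, neither of which is part of this activity, and it leaves out the Genomic DNA/RNA molecule limit, so the number of hits may differ from what the web page shows.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint (not used in the activity itself).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(gene, organism):
    """Combine a title search and an organism limit with AND,
    mirroring what the History tab does in the steps above."""
    return f'{gene}[Title] AND "{organism}"[Organism]'


def build_search_url(gene, organism, db="nuccore"):
    """Return a search URL for the combined query (no request is sent).
    Assumption: 'nuccore' is the E-utilities name of the Nucleotide database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": build_term(gene, organism)})


term = build_term("Fus3", "Saccharomyces cerevisiae")
url = build_search_url("Fus3", "Saccharomyces cerevisiae")
print(term)
print(url)
```

The quotation marks and the bracketed field names in the printed term play the same roles as in the search box: the quotes keep the organism name together, and the brackets restrict where each word is searched.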

Performing a BLAST search

BLAST = Basic Local Alignment Search Tool

1. Go to NCBI homepage. Click on BLAST at the top.

2. Under Basic BLAST, select nucleotide blast.

3. In the box labeled Enter Query Sequence/ Enter accession number, gi, or FASTA sequence, paste the GI number you copied for the gene or enter it manually (See figure below).

4. Enter a name for the job. You can keep the default name or assign your own.

5. Choose Human Genomic + Transcript for the database.

We want to find related genes/transcripts in the Human Genome/Transcriptome.

6. Under Program Selection, select More dissimilar sequences (discontiguous megablast).

Food for thought: Selecting Highly similar sequences for program selection will give you zero results for the search. The message on screen reads "No significant similarity found." Why? (The answer is at the end of this guided tour on the last page.)

7. Click on BLAST.

8. You will see a page similar to below during the search.

9. And finally results with sequence alignments. MAPK1 is the gene we are interested in.

Search for Fus3 protein sequence using Entrez and BLAST human proteome for similar proteins.

1. Use the steps above to search for the Fus3 protein sequence.

2. BLAST it (GI|536007) against the human protein database (protein blast) to search for similar proteins. Enter Homo sapiens under Organism to restrict the search to human proteins. You should find MAPK1 (also known as Erk2) as the top hit.

Take a look at the sequence alignment for Fus3 and Erk2 on the results page. From the BLAST results and sequence alignment we know that the two proteins share 50% sequence identity and 68% sequence similarity. Let's download the protein coordinates for Erk2 and Fus3 from PDB and look at the structures.
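To make the two percentages concrete, here is a minimal sketch of how identity and similarity can be counted from a pair of aligned sequences. Identity counts positions with the same amino acid; similarity also counts positions where the two amino acids are chemically alike. BLAST decides "alike" with a substitution matrix; the small groups below are a simplified stand-in invented for this example, and the two short sequences are made up, not taken from Fus3 or Erk2.

```python
# Assumption: simplified groups of chemically similar amino acids.
# BLAST itself scores similarity with a substitution matrix instead.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) over the alignment length.
    Both inputs must already be aligned, with '-' marking gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # a gap is neither identical nor similar
        if x == y:
            identical += 1
            similar += 1
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(seq_a)
    return 100 * identical / length, 100 * similar / length


# Toy alignment (hypothetical): 4 identical and 4 more similar positions out of 10.
identity, similarity = identity_and_similarity("MKV-LDEFSA", "MRVALEEYTG")
print(identity, similarity)
```

Similarity is never lower than identity, because every identical position also counts as similar.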

Downloading PDB files for Fus3 and Erk2

You can do this by searching for Fus3 under Structure on Entrez or directly from PDB website. We will use the RCSB Protein Data Bank (PDB) database.

RCSB = Research Collaboratory for Structural Bioinformatics; more information at http://home.rcsb.org/.

1. Go to http://www.pdb.org.

2. Enter Fus3 in the search box and click on Site Search.

You will see 7 structure hits for your search. We will look at Crystal structure of non-phosphorylated Fus3 at the bottom of the page (PDB ID: 2b9f).

Each structure in the PDB is represented by a 4-character identifier of the form [0-9][a-z,0-9][a-z,0-9][a-z,0-9]. For example, 4HHB and 9INS are the identification codes for the PDB entries for hemoglobin and insulin. Many of the PDB web pages, including the PDB home page, allow you to enter a PDB ID and retrieve information for the corresponding structure. Historically, 30% of queries to the PDB sites are of this type [2].
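The identifier pattern described above can be written as a regular expression. This is a small sketch, assuming that upper and lower case are both accepted (the text uses both 2b9f and 4HHB); the function name is made up for this example.

```python
import re

# One digit followed by three letters or digits, for example 2b9f or 4HHB.
PDB_ID = re.compile(r"[0-9][a-z0-9]{3}", re.IGNORECASE)


def is_pdb_id(text):
    """Return True if text has the shape of a 4-character PDB identifier."""
    return PDB_ID.fullmatch(text) is not None


for candidate in ["2b9f", "1erk", "4HHB", "9INS", "fus3", "12345"]:
    print(candidate, is_pdb_id(candidate))
```

Note that "fus3" fails the check because it does not start with a digit. The check only tests the shape of the identifier; it does not tell you whether such an entry exists in the PDB.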

3. Click on 2b9f.

Browse the page to look at the available information such as title, author information, date the structure information was deposited in the database, experimental method, molecule, source and related structures.

4. Click on Display Files in the left panel and then click on PDB File to open the file.

Take a look at the contents of a typical PDB file. The left column identifies the type of information in the right column. PDB files contain a Header, Title, Compound information (protein name, source, experimental data gathering technique), author, journal, remarks, etc. The last part is the list of 3-D coordinates for each atom in the protein and for related heteroatoms, such as those from water.

5. Save the file in your folder on your desktop. You can save the file from this page by using Save As or you can go back and use Download Files feature on the page for 2b9f. Note the location of the file on your desktop.

Download the PDB file for Erk2 using the steps above. You will find only one structure that is not in a complex: Structure of Signal-Regulated Kinase (1erk) from Rat. This structure is fine for our purpose.
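The coordinate section of a PDB file, described above, is laid out in fixed columns, so it can be read with simple string slicing. The sketch below pulls out the alpha-carbon (CA) atoms, which are the atoms used later for the CA-only fit. The column positions follow the standard PDB format; the three sample lines and their coordinates are made up for illustration and are not copied from 2b9f or 1erk.

```python
def ca_coordinates(pdb_lines):
    """Return [(residue name, residue number, (x, y, z)), ...] for CA atoms."""
    atoms = []
    for line in pdb_lines:
        # Columns 1-6 hold the record type, columns 13-16 the atom name.
        if line[0:6] == "ATOM  " and line[12:16].strip() == "CA":
            res_name = line[17:20]
            res_num = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_name, res_num, xyz))
    return atoms


# Hypothetical sample records: two protein atoms and one water oxygen.
SAMPLE = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "HETATM 2700  O   HOH A 401      10.000  12.000  14.000  1.00 20.00           O",
]
print(ca_coordinates(SAMPLE))

# To try it on a downloaded file (assuming 2b9f.pdb is in the current folder):
# with open("2b9f.pdb") as handle:
#     print(len(ca_coordinates(handle)))
```

Water and other heteroatoms are stored in HETATM records, so they are skipped by the ATOM test.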

Looking at the 3-D structures of Fus3 and Erk2

Download and extract Swiss-PDB Viewer DeepView into your folder on the desktop from http://spdbv.vital-it.ch/. The download link is in the left panel. Once extracted, the viewer is ready for use.

1. Double-click on spdbv application to open the viewer.

2. Select the File menu and Open PDB File to open the pdb file for Fus3 (2b9f.pdb) from your folder on the desktop.

Take a look at the 3-D structure of Fus3. Select the Display menu, then Render in 3D and Render in Solid 3D. Try different color options under the Color menu (suggestion: Color -> act on Ribbon and Color -> Secondary Structure). Try the features in the Control Panel (under the Wind menu): compare left-clicks vs. right-clicks under different columns. A left-click selects individual amino acid residues; a right-click selects all. Right-click under the columns labeled show, side and label to see what happens. I find it easier to look at the backbone structure without the side chains.

3. Select the File menu and Open PDB File to open the pdb file for Erk2 (1erk.pdb) from your folder on the desktop.

Now that you have two structures open, the Control Panel needs to know which structure you are working with at the moment. Check or uncheck the box for Visible at the top of the Control Panel to see or hide a structure.

Left-click the name 1erk to select the proper structure.

4. Remove sidechains/labels from the view. Render in 3D and Solid 3D.

5. Select Fit menu -> Magic Fit -> CA only -> Layers 2b9f and 1erk -> OK.

Watch the structures align in space. Center the structures on screen using the button under the File menu. Compare the two structures for similarity.
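How close two superimposed structures are is commonly expressed as the root-mean-square deviation (RMSD) between matched CA atoms. The sketch below shows only one ingredient of a fit: moving both sets of CA coordinates onto a common center and measuring the RMSD. A full fit also searches for the best rotation, which is left out here, and this guide does not say how Swiss-PDB Viewer computes Magic Fit internally, so this is not a description of that program. The coordinates are invented.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centered(points):
    """Shift the points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def rmsd(points_a, points_b):
    """Root-mean-square deviation between matched points."""
    if len(points_a) != len(points_b):
        raise ValueError("both structures need the same number of matched atoms")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(points_a, points_b) for i in range(3))
    return math.sqrt(total / len(points_a))


# Hypothetical CA traces: the same shape, placed at two different spots in space.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(10.0, 5.0, 1.0), (13.8, 5.0, 1.0), (17.6, 5.0, 1.0)]

print(rmsd(ca_a, ca_b))                      # large: the copies are far apart
print(rmsd(centered(ca_a), centered(ca_b)))  # close to zero once centered
```

A small RMSD after fitting means the two backbones follow nearly the same path in space.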

Answer for BLAST search:

The Highly similar sequences option does not work for our BLAST search because yeast genes have relatively few introns compared with human genes, and the introns make the sequences dissimilar. Selecting this option makes the search highly stringent. Selecting the More dissimilar sequences option, on the other hand, allows for discontinuity between the sequences (discontiguous sequences).
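The difference between a stringent search and one that tolerates discontinuity can be pictured with a toy example. A search that needs a long run of exactly matching bases finds nothing when two sequences differ at regular intervals, while a pattern that skips some positions can still match. The sketch below is a simplified illustration and not BLAST's actual implementation; the run length of 11, the pattern string, and the sequences are all assumptions made for this example.

```python
def longest_exact_run(seq_a, seq_b):
    """Length of the longest stretch of identical bases at the same positions."""
    best = run = 0
    for x, y in zip(seq_a, seq_b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def template_seed_found(seq_a, seq_b, template):
    """True if some window matches at every position marked '1' in the template.
    Positions marked '0' are allowed to differ."""
    width = len(template)
    for start in range(min(len(seq_a), len(seq_b)) - width + 1):
        if all(seq_a[start + k] == seq_b[start + k]
               for k, flag in enumerate(template) if flag == "1"):
            return True
    return False


SWAP = {"A": "G", "G": "A", "C": "T", "T": "C"}
seq_a = "ATGGCTAAAGGTCTGGAA"
# Change every third base of seq_a to make a related but diverged sequence.
seq_b = "".join(SWAP[c] if i % 3 == 2 else c for i, c in enumerate(seq_a))

print(longest_exact_run(seq_a, seq_b))                     # 2
print(template_seed_found(seq_a, seq_b, "1" * 11))         # False
print(template_seed_found(seq_a, seq_b, "110110110110"))   # True
```

The two toy sequences agree at two out of every three positions, yet they never share more than two matching bases in a row, so only the pattern with gaps finds a starting point.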

References:

1. Furuno M et al. CDS annotation in full-length cDNA sequence. Genome Res. June 2003. [PMID: 12819146]

2. http://www.rcsb.org/robohelp_f/#site_navigation/introduction_to_site_navigation.htm (or the Help link on the PDB website)

Kandarp Shah (UCI GK-12)Page 33/5/2009