CSC411 - Machine Learning and Data Mining
Tutorial 10 - March 23rd, 2007
University of Toronto (Mississauga Campus)
Case 1: To improve its business, a national supermarket chain starts a project to keep track of its customers. Regular customers can collect points or receive discounts by using their store card on each purchase. Temporary customers who are not store members are all assigned the same temporary store card. The supermarket is now hiring a data mining analyst to help with this project.
Question: If you were the data mining analyst, how would you design the project, and what data would you need for it?
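One possible starting point for Case 1, offered only as a sketch: record each purchase against a card ID and aggregate per card. Because all temporary customers share one card, their purchases cannot be attributed to individuals, so the sketch sets them aside. The `Purchase` fields, the `TEMP_CARD_ID` value, and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical ID for the single card shared by all temporary customers.
TEMP_CARD_ID = "TEMP"


@dataclass
class Purchase:
    card_id: str   # store card used for the purchase
    item: str      # product bought
    amount: float  # price paid


def spend_per_member(purchases):
    """Return total spend per regular card, skipping the shared temporary card."""
    totals = defaultdict(float)
    for p in purchases:
        if p.card_id == TEMP_CARD_ID:
            continue  # shared card: purchases cannot be tied to one person
        totals[p.card_id] += p.amount
    return dict(totals)


# Made-up sample records, for illustration only.
purchases = [
    Purchase("C001", "milk", 3.5),
    Purchase("C002", "bread", 2.0),
    Purchase("TEMP", "milk", 3.5),
    Purchase("C001", "eggs", 4.0),
]
print(spend_per_member(purchases))  # {'C001': 7.5, 'C002': 2.0}
```

Purchases on the temporary card are still usable for store-wide questions, such as which items sell together, even though they say nothing about individual customers.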
Data Mining and Machine Learning Applications
Case 2: Researchers have found that individuals respond or react differently to the same drug treatment. For example, two smokers have the same smoking history, yet one is diagnosed with lung cancer and the other is not. Single Nucleotide Polymorphisms (SNPs) are an important resource for explaining these phenomena. One possible project is to study the association between the SNPs and the DNA sequences.
Question: If you were the researcher, how would you design this project?
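One way to approach Case 2, sketched under an assumption: read the project as testing whether carrying a given SNP variant is associated with an outcome such as lung cancer. A simple first check is a chi-square statistic on a 2x2 table of carriers and non-carriers against cases and controls. The function name and the counts below are made up for illustration and are not real study data.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts.

    table[i][j] is the count for row i (carrier / non-carrier)
    and column j (case / control).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if variant and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


# Toy counts, invented for illustration only.
#            cases  controls
observed = [[30,    10],   # carriers of the variant
            [20,    40]]   # non-carriers
print(round(chi_square_2x2(observed), 3))  # 16.667
```

With one degree of freedom, a statistic above about 3.84 corresponds to significance at the 0.05 level. A real study would also have to correct for testing many SNPs at once.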
Data Mining and Machine Learning Applications
Cancer – Different Fates
This slide and those that follow are copied from the National Cancer Institute, Understanding Cancer series: Genetic Variation (SNPs): http://www.nci.nih.gov/cancertopics/understandingcancer/geneticvariation
Figure labels: SNPs A, SNPs B, SNPs C, SNPs D
SNPs May Be the Solution
What Is Variation in the Genome?
Common Sequence Variations
Polymorphism
Deletions
Translocations
Insertions
Chromosome
SNPs Are the Most Common Type of Variation
Common sequence: most of the population
Variant sequence: at least 1 percent of the population
SNP site: G to C
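The idea on this slide can be sketched in a few lines: a SNP site is a position where the variant sequence differs from the common sequence by a single base, and the variant counts as a SNP when it appears in at least 1 percent of the population. The function names, example sequences, and counts are hypothetical and serve only as illustration.

```python
def snp_sites(common, variant):
    """Return (position, common_base, variant_base) for each differing base."""
    if len(common) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(common, variant))
        if a != b
    ]


def is_snp(variant_count, population_size, threshold=0.01):
    """True if the variant occurs in at least `threshold` of the population."""
    return variant_count / population_size >= threshold


# Made-up aligned sequences with one G-to-C substitution.
common_seq = "AAGGTTA"
variant_seq = "AAGCTTA"
print(snp_sites(common_seq, variant_seq))  # [(3, 'G', 'C')]

# Made-up counts: 12 carriers in 1000 people passes the 1 percent cutoff.
print(is_snp(12, 1000))  # True
print(is_snp(5, 1000))   # False
```

The position index is zero-based, so the G to C change above sits at the fourth base.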
The Genome Contains Genes
Gene 1: coding region, Protein 1
Gene 2: coding region, Protein 2
Noncoding regions
Variation in the Human Genome
Person 1 Person 2
= Variations in DNA
Variations Causing No Changes
= Variations in DNA that cause no changes
Variations Causing Harmless Changes
= Variations in DNA that cause harmless changes
Variations Causing Harmful Changes
= Variation in DNA that causes harmful change
No Disease
No Disease Hemophilia
Variations Causing Latent Changes
Many years later
= Variations in DNA that cause latent effects