assignment 11 identify expression-modulating variants

16
Assignment 11 Identify Expression-Modulating Variants Bio5488 4/7/17

Upload: others

Post on 16-Feb-2022

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Assignment 11 Identify Expression-Modulating Variants

Assignment 11 Identify Expression-Modulating

Variants

Bio5488

4/7/17

Page 2: Assignment 11 Identify Expression-Modulating Variants

Link genotype to phenotype?

QTL: DDNA -?-> Dphenotype

eQTL: DDNA -?-> DRNA

MacLellan W. R. Nat. Rev. Card. (2012)

Page 3: Assignment 11 Identify Expression-Modulating Variants

Expression quantitative trait loci (eQTLs)

• Definition: region w/ DNA sequence variantsinfluence expression level of 1/more genes

• Data collection for eQTL analysis:• Genotype: SNP microarray / Whole genome sequencing

• Expression: Expression microarray / RNA sequencing

• Statistical test• 1 Group individual by allele they carry

• 2 Compare difference in expression between groups

Page 4: Assignment 11 Identify Expression-Modulating Variants

Expression quantitative trait loci (eQTLs)

Albert F. W. Nat. Rev. Gen. (2015)

- The logarithm of the odds (LOD) score: strength of mRNA-genotype association.

- When mRNA[Allele1] >> mRNA[Allele2] (Or <<, by significance)The LOD score is high and the region is called an eQTL

Page 5: Assignment 11 Identify Expression-Modulating Variants

Classification & Interpretation

• eQTLs can have local / distant effect

• Possible causal relationships:

MacLellan W. R. Nat. Rev. Card. (2012)

Page 6: Assignment 11 Identify Expression-Modulating Variants

From loci to variant(s)

http://www-personal.umich.edu/~xwen/geuvadis/fm_rst_html

Page 7: Assignment 11 Identify Expression-Modulating Variants

MPRA: high throughput direct identification

Tewhey. R. Cell. (2016)

variant_to_barcode.txt

pDNA.fq

cDNA.fq

Page 8: Assignment 11 Identify Expression-Modulating Variants

Assignment 11: What to do?

• Filter out variants with insufficient barcode labeling

• Count # barcodes in both pDNA and cDNA (only count

barcodes matched perfectly)

• Assign barcode counts back to variants

• Calculate expression level

• Compare the difference between reference and

alternative alleles

• Pick out the potential causal variant(s)

Page 9: Assignment 11 Identify Expression-Modulating Variants

Requirements

• Due Friday (4/7/17) before class

• Your submission folder should contain:• Scripts: filter_variants.py, count_barcodes.py,

analyze_MPRA.py

• Output files: filtered_variant_to_barcode.txt, pDNA_count.txt, cDNA_count.txt, variant_fold_change.txt

4/14/17

Page 10: Assignment 11 Identify Expression-Modulating Variants

Unix Basics II

Page 11: Assignment 11 Identify Expression-Modulating Variants

Count newline, word, and byte(wc)

• wc counts line, byte, character, word… in a file$ cat B.txt

Apple Red

Banana Yellow

Kiwi Green

$ wc -l B.txt

3 Count by line

Page 12: Assignment 11 Identify Expression-Modulating Variants

Combine two files with one shared column(join)

• When you have two files each with one columnstoring same set of values, e.g.:

$ cat A.txt $ cat B.txtApple Red Apple Crispy

Banana Yellow Banana Sweet

Kiwi Green Kiwi Sour

Orange Orange

• join will combine all columns from 2 files together$ join -1 1 -2 1 -t $’\t’ A.txt B.txtApple Red Crispy

Banana Yellow Sweet

Kiwi Green Sour Tells join to use tabas delimiter

Page 13: Assignment 11 Identify Expression-Modulating Variants

Replace or remove specific characters(tr)

• By default, output from join is delimited by space:$ join -1 1 -2 1 A.txt B.txtApple Red Crispy

Banana Yellow Sweet

Kiwi Green Sour

• tr replaces the 1st set in input with the 2nd set$ join -1 1 -2 1 A.txt B.txt | tr ‘ ’ ‘\t’

Apple Red Crispy

Banana Yellow Sweet

Kiwi Green Sour

This looks ugly 😢

Replace spacewith tab

Pipe

Page 14: Assignment 11 Identify Expression-Modulating Variants

Pass output for another command as input(“|”)

• Pipe: Pass the output of one command to another for further processing• What will the code below provide?

$ join -1 1 -2 1 A.txt B.txt | head -n1 | tr ‘ ’ ’\n’ | wc -l

3

(number of columns in your file)

Page 15: Assignment 11 Identify Expression-Modulating Variants

Print the lines in sorted order(sort)

• Sorting all lines based on sort key extracted from each line• -k: by column of • -n: rank by number• -r: in reverse order

• Say if we want to sort the joined file by the last column

$ join -1 1 -2 1 -t $‘\t’ A.txt B.txt | sort -k3,3Apple Red CrispyKiwi Green SourBanana Yellow Sweet

Hint: join requires the two files to be combined sorted by the column with values to joinBUT HOW???

Page 16: Assignment 11 Identify Expression-Modulating Variants