Joanna Szyda, Magdalena Frąszczak
Post on 28-Feb-2019
INTRODUCTION
1. Why statistical methods ???
2. The Biostatistic Group – current projects
3. Course contents
4. Contact
Copyright ©2018 Joanna Szyda
WHY STATISTICAL METHODS ???
“…science is not data. Data are the raw material of science. It is what you do with the data that is science – the interpretation you make, the story you tell.”
ASHG 2011 Writing Workshop; Albertine ©2011
STATISTICAL METHODS – SNP
N = 19 778 743 834
[Header]
BSGT Version 3.2.32
Processing Date 11/24/2008 10:14 AM
Content BovineSNP50_A.bpm
Num SNPs 54001
Total SNPs 54001
Num Samples 32
Total Samples 2636
[Data]
SNP Name            Sample ID     GC Score  SNP Index  Allele1 - AB  Allele2 - AB  Chr  Position  GT Score
ARS-BFGL-BAC-10172  4408169492_K  0.883     1          B             B             14   4736993   0.849
ARS-BFGL-BAC-1020   4408169492_K  0.899     2          B             B             14   6339014   0.8626
ARS-BFGL-BAC-10245  4408169492_K  0.6582    3          B             B             14   30073020  0.71
ARS-BFGL-BAC-10345  4408169492_K  0.9092    4          A             B             14   4497877   0.8721
ARS-BFGL-BAC-10365  4408169492_K  0.8021    5          B             B             14   25140301  0.833
ARS-BFGL-BAC-10375  4408169492_K  0.8858    6          A             B             14   4983527   0.8513
ARS-BFGL-BAC-10591  4408169492_K  0.867     7          A             B             14   15446975  0.8363
ARS-BFGL-BAC-10793  4408169492_K  0.8722    8          B             B             14   27452258  0.8403
ARS-BFGL-BAC-10867  4408169492_K  0.9316    9          A             B             14   32700054  0.8949
ARS-BFGL-BAC-10919  4408169492_K  0.7805    10         A             B             14   29520816  0.778
ARS-BFGL-BAC-10952  4408169492_K  0.9314    11         B             B             10   19315327  0.8947
ARS-BFGL-BAC-10960  4408169492_K  0.6543    12         B             B             10   21056606  0.7079
ARS-BFGL-BAC-10975  4408169492_K  0.8622    13         A             B             10   21682679  0.8358
ARS-BFGL-BAC-10986  4408169492_K  0.8687    14         A             B             10   25897020  0.8376
ARS-BFGL-BAC-10993  4408169492_K  0.8146    15         A             B             10   80403647  0.7993
ARS-BFGL-BAC-11000  4408169492_K  0.9135    16         A             A             10   81191638  0.8762
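A genotype report like the one above is usually the starting point for simple summaries such as genotype counts. The following is a minimal Python sketch, not the course's own code: it assumes whitespace-separated data rows (the real file's delimiter is not shown on the slide), and the names `COLUMNS`, `SAMPLE_ROWS`, `parse_rows` and `genotype_counts` are hypothetical. Only the first four data rows from the slide are used.

```python
from collections import Counter

# Column names as printed in the [Data] section of the report.
COLUMNS = ["SNP Name", "Sample ID", "GC Score", "SNP Index",
           "Allele1 - AB", "Allele2 - AB", "Chr", "Position", "GT Score"]

# First four data rows copied from the slide. Whitespace separation is an
# assumption; the original file may use tabs or another delimiter.
SAMPLE_ROWS = """\
ARS-BFGL-BAC-10172 4408169492_K 0.883 1 B B 14 4736993 0.849
ARS-BFGL-BAC-1020 4408169492_K 0.899 2 B B 14 6339014 0.8626
ARS-BFGL-BAC-10245 4408169492_K 0.6582 3 B B 14 30073020 0.71
ARS-BFGL-BAC-10345 4408169492_K 0.9092 4 A B 14 4497877 0.8721
"""


def parse_rows(text):
    """Turn each well-formed data row into a dict keyed by column name."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        records.append(dict(zip(COLUMNS, fields)))
    return records


def genotype_counts(records):
    """Count AB-coded genotypes (AA, AB, BB) across the records."""
    return Counter(r["Allele1 - AB"] + r["Allele2 - AB"] for r in records)


records = parse_rows(SAMPLE_ROWS)
counts = genotype_counts(records)
print(dict(counts))
```

With the four rows above, three SNPs are BB and one is AB; the same functions apply unchanged to a full report once its data section has been read into a string.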
STATISTICAL METHODS – SNP
…
7 198 552 polymorphic variants for 1 individual
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BSWCHEM120014887571
Chr1 182 C T 30.8 DP=2;VDB=7.200000e-02;AF1=1;AC1=2;DP4=0,0,2,0;MQ=34;FQ=-33 GT:PL:GQ 1/1:62,6,0:10
Chr1 300 A G 87 DP=6;VDB=8.330040e-02;RPB=0.000000e+00;AF1=0.5;AC1=1;DP4 GT:PL:GQ 0/1:117,0,52:5
Chr1 324 A G 34 DP=9;VDB=6.733101e-02;RPB=-1.711553e+00;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:64,0,160:6
Chr1 340 G A 90 DP=14;VDB=8.462522e-02;RPB=-1.333333e-01;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:120,0,209:9
Chr1 353 T A 136 DP=14;VDB=1.121465e-01;RPB=1.219070e+00;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:166,0,49:5
Chr1 355 T A 141 DP=14;VDB=9.310645e-02;RPB=1.219070e+00;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:171,0,50:53
Chr1 380 G T 103 DP=18;VDB=1.049857e-01;RPB=8.897565e-01;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:133,0,241:9
Chr1 420 T A 211 DP=19;VDB=1.566941e-01;RPB=-7.964914e-01;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:241,0,81:84
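The last two columns of each variant line pair a FORMAT string (GT:PL:GQ) with the values for one animal. As a hedged illustration of how such a field can be read, the Python sketch below splits the two strings, classifies the genotype, and converts the Phred-scaled likelihoods (PL) back to relative likelihoods via 10^(-PL/10). The function names are hypothetical, and the sketch deliberately ignores the other columns, which are truncated on the slide.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with the sample's values, e.g. GT:PL:GQ with 0/1:117,0,52:5."""
    call = dict(zip(format_field.split(":"), sample_field.split(":")))
    if "PL" in call:
        call["PL"] = [int(x) for x in call["PL"].split(",")]
    if "GQ" in call:
        call["GQ"] = int(call["GQ"])
    return call


def zygosity(gt):
    """Classify a diploid GT string such as 0/1 or 1|1."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternative"


def pl_to_relative_likelihoods(pl):
    """Undo the Phred scaling: PL = -10 * log10(likelihood), best genotype has PL 0."""
    return [10 ** (-p / 10) for p in pl]


# Values taken from the Chr1:300 line on the slide.
call = parse_sample("GT:PL:GQ", "0/1:117,0,52:5")
print(call["GT"], zygosity(call["GT"]), call["PL"], call["GQ"])
```

For real data a dedicated VCF reader is the safer choice; this sketch only shows what the colon-separated fields mean.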
STATISTICAL METHODS IN BIOLOGY – CNV
…
• 13 149 – 22 496 deletions
• 1 694 – 5 187 duplications
for 1 individual
duplication chr1:4001-16300 12300 2.10151 0 1.76985e-38 0 2.2605e-49 1
deletion chr1:16301-20400 4100 0.535091 3.73056e-06 15163.9 3.33459 5.68939e+06 1
duplication chr1:20401-24500 4100 1.81889 0.000438811 9.48454e+08 10.3995 1.62802e+09 1
duplication chr1:43501-62600 19100 2.19581 0 2.21431e+09 0 2.27536e+09 1
duplication chr1:64901-68800 3900 2.29307 0 1.84995e-32 0.000141891 2.69203e-13 1
deletion chr1:215901-217800 1900 0 8.38803e-11 9.73635e-46 1 1 1
deletion chr1:319601-320500 900 0.0267015 0.000351021 9.17077e-09 1 1 1
deletion chr1:518401-519200 800 0.171095 0.188201 1.02726e-05 1 1 1
deletion chr1:531101-537700 6600 0.553887 1.11078e-09 3577.68 5.86046e-05 1901.93 1
deletion chr1:541901-542900 1000 0.0569842 0.00162457 4.12208e-07 1 1 1
deletion chr1:626501-627200 700 0.0206353 0.00758167 2.79548e-07 1 1 1
deletion chr1:665101-671700 6600 0.707933 1.16982e-06 399149 0.000237804 4.29859e+06 1
deletion chr1:761501-762300 800 0.0375489 0.0396925 3.13366e-05 1 1 1
deletion chr1:1044501-1045500 1000 0.0142217 1.22398e-07 3.16665e-14 1 1 1
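In the calls above, the first three columns give the CNV type, the region, and its length in base pairs; the third column equals end - start + 1 of the region. The Python sketch below, written under that reading, parses those three columns and sums the affected length per CNV type. The meaning of the remaining columns is not stated on the slide, so they are ignored, and all names (`REGION`, `parse_call`, `total_length_by_type`) are hypothetical.

```python
import re
from collections import defaultdict

# Region strings look like chr1:4001-16300.
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")

# First three calls copied from the slide.
SAMPLE_CALLS = """\
duplication chr1:4001-16300 12300 2.10151 0 1.76985e-38 0 2.2605e-49 1
deletion chr1:16301-20400 4100 0.535091 3.73056e-06 15163.9 3.33459 5.68939e+06 1
duplication chr1:20401-24500 4100 1.81889 0.000438811 9.48454e+08 10.3995 1.62802e+09 1
"""


def parse_call(line):
    """Read type, region and size; check that size matches the coordinates."""
    fields = line.split()
    cnv_type, region, size = fields[0], fields[1], int(fields[2])
    match = REGION.match(region)
    if match is None:
        raise ValueError(f"unrecognised region: {region}")
    start, end = int(match["start"]), int(match["end"])
    if end - start + 1 != size:
        raise ValueError(f"size {size} does not match region {region}")
    return {"type": cnv_type, "chrom": match["chrom"],
            "start": start, "end": end, "size": size}


def total_length_by_type(text):
    """Sum the number of base pairs covered by each CNV type."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if line.strip():
            call = parse_call(line)
            totals[call["type"]] += call["size"]
    return dict(totals)


totals = total_length_by_type(SAMPLE_CALLS)
print(totals)
```

The consistency check in `parse_call` is a cheap guard against misaligned columns when a large call file is read.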
STATISTICAL METHODS – GENE EXPRESSION
…
• 55 488 "genes"
• 28 individuals, 14 comparisons
Row 016704_082506 016711_082506 016728_082506 016735_082506 016742_082506 016759_082506 016766_082506
1 -0.379969712105877 0.403899786582667 -0.0104257720702610 -0.252970787977042 0.327140777455535 -0.861182591351216 -0.418276255293902
2 -0.50199428553046 0.288115135624757 -0.290012002570961 -0.286254699271135 0.336603008034914 -0.953015213030984 -0.472757076735075
3 -1.54452075790905 -1.16841250501632 -2.26017997435925 -2.70243709475759 0.699574039274513 -1.67947803658176 -1.10291758468840
4 -4.09629943987777 -2.43170432963352 -1.98478562881723 -2.72302386217010 1.18316854518407 -3.85397448302713 -0.249393607318890
5 -0.111098651892599 0.347292518152577 0.331439096860182 -0.010271905540433 1.43641051135051 0.346796102673222 0.336072438921737
6 -0.134729455270304 0.372643580526304 0.341247555322519 0.0595729133568211 1.35465156919272 0.160746561142349 0.41983899862486
7 -0.0517075104474681 0.515841300595418 0.238693349173603 0.121545575614600 0.44754152247933 -0.173893212633558 0.607449797085044
8 0.00907483185174598 0.452372485853459 0.162678295501523 0.0116542025434294 0.454864977564281 -0.104183663554561 0.632652223876392
9 0.571883776225215 0.904874387070875 0.7151566631217 0.361208461495468 0.196875815319073 0.025325024774007 0.462987648085157
10 0.495052517283484 1.00961838793444 0.0185276096030743 0.410618296994563 0.156786130902701 0.436302356814827 0.599210687996748
11 0.0232575805162449 0.585138149087892 0.0793272014350288 -0.0880757258764968 -0.356087576025068 -0.237106015010118 -0.280425773562447
12 1.25820722351758 0.63282308775407 -0.367716448933869 0.597439461465197 0.00389484066424839 0.349837535932186 0.362530089156421
13 -0.385570802182663 -0.385808962840273 -3.63031315090913 -0.41187510253694 0.695487631312712 -0.737690939876218 -0.397344092359565
14 -0.254563184593792 -0.220929995962450 -2.94907420301003 -0.816929027969383 3.14460371526462 -2.29196428605653 0.58986720495445
15 1.43339698705363 -3.22298657886497 1.89900870851926 -0.781070579763326 -1.56342292995965 -2.22526479241737 -4.68211173194506
16 -1.05622199191238 -0.206711926130888 -2.19081669917052 -1.60889203046095 -0.940066213614575 -2.21680720505194 -1.33155495933538
17 -0.616360052234347 -0.113974969944924 -0.49754977042167 0.40335771824433 -1.66573906138418 -0.89379246137783 -0.0220819242616700
18 -1.11441032827542 -0.0121586088199914 -0.169381132834277 -0.259949749665342 -1.69547100878847 -0.532474149398847 -0.218494485981870
19 1.67603439717634 1.41549585761362 2.01532077573397 1.02444425365727 1.18053786915195 1.42482870601831 2.19714912440384
20 1.58221157017317 1.58597171111711 0.79534753411091 1.47264591748487 1.06832032683278 1.52720704132747 2.32917253211925
21 -1.38664147891939 -0.759377915993809 -1.10757370154727 -1.29208329760012 -1.23917090519556 -1.30128806704239 -0.447940860454261
22 -1.49072070093148 -0.631167611179447 -1.22894878016313 -1.25638656073317 -1.49320316146543 -1.25326569115893 -0.504870024348167
23 1.95998352826971 2.30735302307128 1.83415294148455 1.74252437471081 1.98265166288651 2.00894963430074 2.78722511063934
24 2.05602617745716 2.41171224030426 1.85531172025114 1.74726239681063 1.97018225108370 1.96295549117018 2.71147801348367
25 0.161682569370010 -0.479052221058937 -0.893649016169367 -0.99387843224199 -2.36883710446832 -1.55017155978841 -0.728467644523237
26 -1.74280128105311 -0.359451612557868 -0.748710921283307 -1.01479583218312 -2.58859363745148 0.0401337986802179 -0.716334120535449
THE INSTITUTE OF GENETICS
Institute of Genetics: http://gen.edu.pl
The Biostatistic Group: http://theta.edu.pl
THE BIOSTATISTIC GROUP
MAJOR FIELDS OF RESEARCH
1. Gene detection
• Genome-Wide Association Studies (GWAS)
• Epistatic effects
• Gene regulatory networks
2. Modelling the phenotypic variability
• Prediction of genomic breeding values
• Impact of rare variants on trait variability
3. Bioinformatics
• Analyses of whole genome sequence data
THE BIOSTATISTIC GROUP – A PROJECT
Copy number variations analysis among diverse cattle breeds
• Magda Mielczarek
• Joanna Szyda
• Magdalena Frąszczak
• Giulietta Minozzi
• Ezequiel L. Nicolazzi
• John Williams
• Katarzyna Wojdak-Maksymiec
THE BIOSTATISTIC GROUP – A PROJECT
• CNV in whole genome sequence of 155 bulls
• Various breeds:
− Brown Swiss 48
− Guernsey 20
− Fleckvieh 31
− Simmental 16
− Norwegian Red 26
− Parda de la Montaña 4
− Pezzata Rossa Italiana 3
− Bruna Italiana 1
− Avileña 2
− Albera 1
− Rubia Gallega 1
− Toro de Lidia 1
− Pirenaica 1
THE BIOSTATISTIC GROUP – A PROJECT
Genomic distribution of CNV duplications
THE BIOSTATISTIC GROUP – A PROJECT
Genomic distribution of CNV deletions
LECTURE CONTENTS
1. Ability to use biological data of various structures
2. Principles of statistical data analysis
3. Interpretation of results
4. Attendance
5. Questions
LECTURE CONTENTS
Principles of statistical data analysis
1. Introductory lecture
2. Random variables and probability theory
3. Populations and samples
4. Hypothesis testing and parameter estimation
5. Most widely used statistical tests I
6. Most widely used statistical tests II
LECTURE CONTENTS
Elements of statistical modelling of data
7. Linear regression
8. Nonlinear regression
9. Regression model fit
10. Correlation
11. Elements of statistical data modelling
12. Model comparison
13. Analysis of variance
14. Analysis of covariance
15. Summary of the material, analysis of examples, discussion
LAB CONTENTS
1. Attendance
2. Final grade – average of the individual grades
3. Grading:
• Written exams (lectures + labs)
• Presentations
4. Computer labs
LAB CONTENTS
Principles of statistical data analysis
1. Introductory lab
2. Probability theory
3. Random variables
4. Populations and samples
5. Hypothesis testing and parameter estimation
6. Exam I
LAB CONTENTS
Statistical tests
7. t-tests
8. χ² tests
9. F-tests
10. Exam II
LAB CONTENTS
Elements of statistical modelling of data
11. Correlation
12. Linear and nonlinear regression
13. Interpreting results from various models
14. Analysis of variance
15. Exam III
CONTACT
Consultation: by individual appointment
Address: Institute of Genetics, Kożuchowska 7