STATISTICAL METHODS

JOANNA SZYDA

MAGDALENA FRĄSZCZAK

INTRODUCTION

1. Why statistical methods?

2. The Biostatistic Group – current projects

3. Course contents

4. Contact

Copyright ©2018 Joanna Szyda

WHY STATISTICAL METHODS?

“…science is not data. Data are the raw material of science. It is what you do with the data that is science – the interpretation you make, the story you tell.”

ASHG 2011 Writing Workshop; Albertine ©2011

STATISTICAL METHODS – SNP

N = 19 778 743 834

[Header]
BSGT Version     3.2.32
Processing Date  11/24/2008 10:14 AM
Content          BovineSNP50_A.bpm
Num SNPs         54001
Total SNPs       54001
Num Samples      32
Total Samples    2636
[Data]
SNP Name            Sample ID     GC Score  SNP Index  Allele1 - AB  Allele2 - AB  Chr  Position  GT Score
ARS-BFGL-BAC-10172  4408169492_K  0.883     1          B             B             14   4736993   0.849
ARS-BFGL-BAC-1020   4408169492_K  0.899     2          B             B             14   6339014   0.8626
ARS-BFGL-BAC-10245  4408169492_K  0.6582    3          B             B             14   30073020  0.71
ARS-BFGL-BAC-10345  4408169492_K  0.9092    4          A             B             14   4497877   0.8721
ARS-BFGL-BAC-10365  4408169492_K  0.8021    5          B             B             14   25140301  0.833
ARS-BFGL-BAC-10375  4408169492_K  0.8858    6          A             B             14   4983527   0.8513
ARS-BFGL-BAC-10591  4408169492_K  0.867     7          A             B             14   15446975  0.8363
ARS-BFGL-BAC-10793  4408169492_K  0.8722    8          B             B             14   27452258  0.8403
ARS-BFGL-BAC-10867  4408169492_K  0.9316    9          A             B             14   32700054  0.8949
ARS-BFGL-BAC-10919  4408169492_K  0.7805    10         A             B             14   29520816  0.778
ARS-BFGL-BAC-10952  4408169492_K  0.9314    11         B             B             10   19315327  0.8947
ARS-BFGL-BAC-10960  4408169492_K  0.6543    12         B             B             10   21056606  0.7079
ARS-BFGL-BAC-10975  4408169492_K  0.8622    13         A             B             10   21682679  0.8358
ARS-BFGL-BAC-10986  4408169492_K  0.8687    14         A             B             10   25897020  0.8376
ARS-BFGL-BAC-10993  4408169492_K  0.8146    15         A             B             10   80403647  0.7993
ARS-BFGL-BAC-11000  4408169492_K  0.9135    16         A             A             10   81191638  0.8762
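A minimal sketch of how records from a genotype report like the one above could be read and summarised. The column order follows the report header shown; the field names in COLUMNS and the quality threshold MIN_GC are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter

# Column order as it appears in the [Data] header of the report excerpt.
# The short field names are illustrative, not part of the report format.
COLUMNS = ["snp_name", "sample_id", "gc_score", "snp_index",
           "allele1", "allele2", "chr", "position", "gt_score"]

# Four records copied from the excerpt.
ROWS = """\
ARS-BFGL-BAC-10172 4408169492_K 0.883 1 B B 14 4736993 0.849
ARS-BFGL-BAC-10245 4408169492_K 0.6582 3 B B 14 30073020 0.71
ARS-BFGL-BAC-10345 4408169492_K 0.9092 4 A B 14 4497877 0.8721
ARS-BFGL-BAC-11000 4408169492_K 0.9135 16 A A 10 81191638 0.8762
"""


def parse_row(line):
    """Split one whitespace-separated record and convert the numeric fields used below."""
    fields = dict(zip(COLUMNS, line.split()))
    fields["gc_score"] = float(fields["gc_score"])
    fields["position"] = int(fields["position"])
    fields["genotype"] = fields["allele1"] + fields["allele2"]
    return fields


records = [parse_row(line) for line in ROWS.splitlines()]

MIN_GC = 0.7  # hypothetical quality threshold, chosen only for this example
passed = [r for r in records if r["gc_score"] >= MIN_GC]

# Tally AA / AB / BB genotypes among the records that pass the threshold.
genotype_counts = Counter(r["genotype"] for r in passed)
print(genotype_counts)
```

With 54001 SNPs per sample and thousands of samples, the same tally has to be automated, which is the point the slide makes with N.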

STATISTICAL METHODS – SNP

7 198 552 polymorphic variants for 1 individual

##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">

##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BSWCHEM120014887571

Chr1 182 C T 30.8 DP=2;VDB=7.200000e-02;AF1=1;AC1=2;DP4=0,0,2,0;MQ=34;FQ=-33 GT:PL:GQ 1/1:62,6,0:10

Chr1 300 A G 87 DP=6;VDB=8.330040e-02;RPB=0.000000e+00;AF1=0.5;AC1=1;DP4 GT:PL:GQ 0/1:117,0,52:5

Chr1 324 A G 34 DP=9;VDB=6.733101e-02;RPB=-1.711553e+00;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:64,0,160:6

Chr1 340 G A 90 DP=14;VDB=8.462522e-02;RPB=-1.333333e-01;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:120,0,209:9

Chr1 353 T A 136 DP=14;VDB=1.121465e-01;RPB=1.219070e+00;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:166,0,49:5

Chr1 355 T A 141 DP=14;VDB=9.310645e-02;RPB=1.219070e+00;AF1=0.5;AC1=1;DP4= GT:PL:GQ 0/1:171,0,50:53

Chr1 380 G T 103 DP=18;VDB=1.049857e-01;RPB=8.897565e-01;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:133,0,241:9

Chr1 420 T A 211 DP=19;VDB=1.566941e-01;RPB=-7.964914e-01;AF1=0.5;AC1=1;DP GT:PL:GQ 0/1:241,0,81:84
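The genotype column of the records above pairs the keys in FORMAT (GT:PL:GQ) with colon-separated values. The sketch below shows one way to decode it; the function names are hypothetical, and the ordering of the three PL values (0/0, 0/1, 1/1) is the VCF convention for a site with one alternative allele.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values; convert PL and GQ to integers."""
    call = dict(zip(format_field.split(":"), sample_field.split(":")))
    call["PL"] = [int(x) for x in call["PL"].split(",")]
    call["GQ"] = int(call["GQ"])
    return call


def zygosity(gt):
    """Classify a diploid GT string such as 0/1 or 1/1."""
    alleles = gt.replace("|", "/").split("/")
    if len(set(alleles)) > 1:
        return "heterozygous"
    if alleles[0] == "0":
        return "homozygous reference"
    return "homozygous alternative"


# FORMAT and sample columns copied from the first two records in the excerpt.
for sample in ("1/1:62,6,0:10", "0/1:117,0,52:5"):
    call = parse_sample("GT:PL:GQ", sample)
    # PL is Phred-scaled, so the smallest value marks the most likely genotype.
    best = call["PL"].index(min(call["PL"]))
    print(call["GT"], zygosity(call["GT"]), "most likely PL index:", best)
```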

STATISTICAL METHODS IN BIOLOGY – CNV

• 13 149 – 22 496 deletions

• 1 694 – 5 187 duplications

for 1 individual

duplication chr1:4001-16300 12300 2.10151 0 1.76985e-38 0 2.2605e-49 1

deletion chr1:16301-20400 4100 0.535091 3.73056e-06 15163.9 3.33459 5.68939e+06 1

duplication chr1:20401-24500 4100 1.81889 0.000438811 9.48454e+08 10.3995 1.62802e+09 1

duplication chr1:43501-62600 19100 2.19581 0 2.21431e+09 0 2.27536e+09 1

duplication chr1:64901-68800 3900 2.29307 0 1.84995e-32 0.000141891 2.69203e-13 1

deletion chr1:215901-217800 1900 0 8.38803e-11 9.73635e-46 1 1 1

deletion chr1:319601-320500 900 0.0267015 0.000351021 9.17077e-09 1 1 1

deletion chr1:518401-519200 800 0.171095 0.188201 1.02726e-05 1 1 1

deletion chr1:531101-537700 6600 0.553887 1.11078e-09 3577.68 5.86046e-05 1901.93 1

deletion chr1:541901-542900 1000 0.0569842 0.00162457 4.12208e-07 1 1 1

deletion chr1:626501-627200 700 0.0206353 0.00758167 2.79548e-07 1 1 1

deletion chr1:665101-671700 6600 0.707933 1.16982e-06 399149 0.000237804 4.29859e+06 1

deletion chr1:761501-762300 800 0.0375489 0.0396925 3.13366e-05 1 1 1

deletion chr1:1044501-1045500 1000 0.0142217 1.22398e-07 3.16665e-14 1 1 1
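A short sketch of how a CNV call list like the one above could be summarised per type. The slides do not label the columns; the code assumes the first three are CNV type, region and length, and treats the fourth as a normalised read depth, which is an assumption. The remaining columns are ignored.

```python
from collections import defaultdict

# Four records copied from the excerpt.
ROWS = """\
duplication chr1:4001-16300 12300 2.10151 0 1.76985e-38 0 2.2605e-49 1
deletion chr1:16301-20400 4100 0.535091 3.73056e-06 15163.9 3.33459 5.68939e+06 1
duplication chr1:64901-68800 3900 2.29307 0 1.84995e-32 0.000141891 2.69203e-13 1
deletion chr1:215901-217800 1900 0 8.38803e-11 9.73635e-46 1 1 1
"""


def parse_cnv(line):
    """Read the first four columns of one CNV record."""
    cnv_type, region, size, depth = line.split()[:4]
    chrom, span = region.split(":")
    start, end = (int(x) for x in span.split("-"))
    return {"type": cnv_type, "chrom": chrom, "start": start, "end": end,
            "size": int(size), "depth": float(depth)}  # depth: assumed meaning


cnvs = [parse_cnv(line) for line in ROWS.splitlines()]

summary = defaultdict(lambda: {"count": 0, "bp": 0})
for cnv in cnvs:
    # The reported length should equal the inclusive span of the region.
    assert cnv["end"] - cnv["start"] + 1 == cnv["size"]
    summary[cnv["type"]]["count"] += 1
    summary[cnv["type"]]["bp"] += cnv["size"]

for cnv_type, stats in summary.items():
    print(cnv_type, stats["count"], "calls,", stats["bp"], "bp")
```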

STATISTICAL METHODS – GENE EXPRESSION

• 55 488 "genes"

• 28 individuals, 14 comparisons

Row 016704_082506 016711_082506 016728_082506 016735_082506 016742_082506 016759_082506 016766_082506

1 -0.379969712105877 0.403899786582667 -0.0104257720702610 -0.252970787977042 0.327140777455535 -0.861182591351216 -0.418276255293902

2 -0.50199428553046 0.288115135624757 -0.290012002570961 -0.286254699271135 0.336603008034914 -0.953015213030984 -0.472757076735075

3 -1.54452075790905 -1.16841250501632 -2.26017997435925 -2.70243709475759 0.699574039274513 -1.67947803658176 -1.10291758468840

4 -4.09629943987777 -2.43170432963352 -1.98478562881723 -2.72302386217010 1.18316854518407 -3.85397448302713 -0.249393607318890

5 -0.111098651892599 0.347292518152577 0.331439096860182 -0.010271905540433 1.43641051135051 0.346796102673222 0.336072438921737

6 -0.134729455270304 0.372643580526304 0.341247555322519 0.0595729133568211 1.35465156919272 0.160746561142349 0.41983899862486

7 -0.0517075104474681 0.515841300595418 0.238693349173603 0.121545575614600 0.44754152247933 -0.173893212633558 0.607449797085044

8 0.00907483185174598 0.452372485853459 0.162678295501523 0.0116542025434294 0.454864977564281 -0.104183663554561 0.632652223876392

9 0.571883776225215 0.904874387070875 0.7151566631217 0.361208461495468 0.196875815319073 0.025325024774007 0.462987648085157

10 0.495052517283484 1.00961838793444 0.0185276096030743 0.410618296994563 0.156786130902701 0.436302356814827 0.599210687996748

11 0.0232575805162449 0.585138149087892 0.0793272014350288 -0.0880757258764968 -0.356087576025068 -0.237106015010118 -0.280425773562447

12 1.25820722351758 0.63282308775407 -0.367716448933869 0.597439461465197 0.00389484066424839 0.349837535932186 0.362530089156421

13 -0.385570802182663 -0.385808962840273 -3.63031315090913 -0.41187510253694 0.695487631312712 -0.737690939876218 -0.397344092359565

14 -0.254563184593792 -0.220929995962450 -2.94907420301003 -0.816929027969383 3.14460371526462 -2.29196428605653 0.58986720495445

15 1.43339698705363 -3.22298657886497 1.89900870851926 -0.781070579763326 -1.56342292995965 -2.22526479241737 -4.68211173194506

16 -1.05622199191238 -0.206711926130888 -2.19081669917052 -1.60889203046095 -0.940066213614575 -2.21680720505194 -1.33155495933538

17 -0.616360052234347 -0.113974969944924 -0.49754977042167 0.40335771824433 -1.66573906138418 -0.89379246137783 -0.0220819242616700

18 -1.11441032827542 -0.0121586088199914 -0.169381132834277 -0.259949749665342 -1.69547100878847 -0.532474149398847 -0.218494485981870

19 1.67603439717634 1.41549585761362 2.01532077573397 1.02444425365727 1.18053786915195 1.42482870601831 2.19714912440384

20 1.58221157017317 1.58597171111711 0.79534753411091 1.47264591748487 1.06832032683278 1.52720704132747 2.32917253211925

21 -1.38664147891939 -0.759377915993809 -1.10757370154727 -1.29208329760012 -1.23917090519556 -1.30128806704239 -0.447940860454261

22 -1.49072070093148 -0.631167611179447 -1.22894878016313 -1.25638656073317 -1.49320316146543 -1.25326569115893 -0.504870024348167

23 1.95998352826971 2.30735302307128 1.83415294148455 1.74252437471081 1.98265166288651 2.00894963430074 2.78722511063934

24 2.05602617745716 2.41171224030426 1.85531172025114 1.74726239681063 1.97018225108370 1.96295549117018 2.71147801348367

25 0.161682569370010 -0.479052221058937 -0.893649016169367 -0.99387843224199 -2.36883710446832 -1.55017155978841 -0.728467644523237

26 -1.74280128105311 -0.359451612557868 -0.748710921283307 -1.01479583218312 -2.58859363745148 0.0401337986802179 -0.716334120535449
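As an illustration of the kind of question such a matrix raises, the sketch below computes a row mean and a one-sample t statistic, t = (mean − μ0) / (s / √n), for two rows of the excerpt. The slides do not say which analysis was applied to these data; the choice of test, the null value μ0 = 0 and the function name are assumptions, and only the seven columns shown are used.

```python
import math
import statistics

# Rows 21 and 23 copied from the excerpt (seven samples each).
EXPRESSION = {
    21: [-1.38664147891939, -0.759377915993809, -1.10757370154727,
         -1.29208329760012, -1.23917090519556, -1.30128806704239,
         -0.447940860454261],
    23: [1.95998352826971, 2.30735302307128, 1.83415294148455,
         1.74252437471081, 1.98265166288651, 2.00894963430074,
         2.78722511063934],
}


def one_sample_t(values, mu0=0.0):
    """t statistic for H0: population mean equals mu0 (illustrative choice of test)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))


for row, values in EXPRESSION.items():
    print(row, round(statistics.mean(values), 3), round(one_sample_t(values), 2))
```

Repeating such a test for 55 488 rows is where multiple-testing considerations become unavoidable.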

STATISTICAL METHODS

?

THE INSTITUTE OF GENETICS

Institute of Genetics: http://gen.edu.pl

The Biostatistic Group: http://theta.edu.pl

THE BIOSTATISTIC GROUP

MAJOR FIELDS OF RESEARCH

1. Gene detection
• Genome-Wide Association Studies (GWAS)
• Epistatic effects
• Gene regulatory networks

2. Modelling the phenotypic variability
• Prediction of genomic breeding values
• Impact of rare variants on trait variability

3. Bioinformatics
• Analyses of whole genome sequence data

THE BIOSTATISTIC GROUP – A PROJECT

Copy number variations analysis among diverse cattle breeds

• Magda Mielczarek

• Joanna Szyda

• Magdalena Frąszczak

• Giulietta Minozzi

• Ezequiel L. Nicolazzi

• John Williams

• Katarzyna Wojdak-Maksymiec

THE BIOSTATISTIC GROUP – A PROJECT

• CNV in whole genome sequence of 155 bulls

• Various breeds:

− Brown Swiss 48
− Guernsey 20
− Fleckvieh 31
− Simmental 16
− Norwegian Red 26
− Parda de la Montaña 4
− Pezzata Rossa Italiana 3
− Bruna Italiana 1
− Avileña 2
− Albera 1
− Rubia Gallega 1
− Toro de Lidia 1
− Pirenaica 1

THE BIOSTATISTIC GROUP – A PROJECT

[Figure: Bioinformatics pipeline]

[Figure: Number of CNVs]

[Figure: CNV length]

[Figure: Genomic distribution of CNV duplications]

[Figure: Genomic distribution of CNV deletions]

[Figure: Number of breed-specific CNVs]

LECTURE CONTENTS

1. Ability to use biological data of various structures

2. Principles of statistical data analysis

3. Interpretation of results

4. Attendance

5. Questions

LECTURE CONTENTS

Principles of statistical data analysis

1. Introductory lecture

2. Random variables and probability theory

3. Populations and samples

4. Hypothesis testing and parameter estimation

5. Most widely used statistical tests I

6. Most widely used statistical tests II

LECTURE CONTENTS

Elements of statistical modelling of data

7. Linear regression

8. Nonlinear regression

9. Regression model fit

10. Correlation

11. Elements of statistical data modelling

12. Model comparison

13. Analysis of variance (ANOVA)

14. Analysis of covariance (ANCOVA)

15. Summary of the material, analysis of examples, discussion

LAB CONTENTS

1. Attendance

2. Final grade: average of the individual grades

3. Grading:

• Written exams (lectures + labs)
• Presentations

4. Computer labs

LAB CONTENTS

Principles of statistical data analysis

1. Introductory lab

2. Probability theory

3. Random variables

4. Populations and samples

5. Hypothesis testing and parameter estimation

6. Exam I

LAB CONTENTS

Statistical tests

7. t-tests

8. χ² tests

9. F-tests

10. Exam II

LAB CONTENTS

Elements of statistical modelling of data

11. Correlation

12. Linear and nonlinear regression

13. Interpreting results from various models

14. Analysis of variance (ANOVA)

15. Exam III

CONTACT

http://theta.edu.pl/teaching/ (Statistical methods)

consultation: time scheduled individually

address: Institute of Genetics, Kożuchowska 7

STATISTICAL METHODS

course