NGS data format and General Quality Control
Data format “Flowchart”
Sequencer raw data → Fastq → SAM/BAM
Fastq file
• Used to record raw reads coming off the sequencers
• Each record contains four lines
• Parameters, such as read length, are usually set by the sequencer
• Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line).
• Line 2 is the raw sequence letters. The read length is the length of the string.
• Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again.
• Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
http://en.wikipedia.org/wiki/FASTQ_format
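The four-line layout described above can be sketched as a small reader. This is a minimal illustration, not a reference parser: the names `FastqRecord` and `read_fastq` are hypothetical, the example record is made up, and it assumes each record occupies exactly four lines (no wrapped sequences).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading '@'
    sequence: str    # line 2, raw sequence letters
    separator: str   # line 3, without the leading '+' (often empty)
    quality: str     # line 4, one symbol per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("line 1 must begin with '@'")
        if not plus.startswith("+"):
            raise ValueError("line 3 must begin with '+'")
        if len(sequence) != len(quality):
            raise ValueError("lines 2 and 4 must have the same length")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# Made-up record for illustration only.
example = "@read1 sample description\nACGTN\n+\nIIII#\n"
for record in read_fastq(io.StringIO(example)):
    print(record.identifier, "read length:", len(record.sequence))
```

The read length is taken directly from the length of line 2, as stated in the bullet list.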
General quality control of raw reads
• Using FASTQC
– A tool that implements some general rules
– Basic Statistics
– Per base sequence quality
– Per sequence quality scores
– Per base sequence content
– Per base GC content
– Per sequence GC content
– Per base N content
– Sequence Length Distribution
– Sequence Duplication Levels
– Overrepresented sequences
– Kmer Content
Quality scores
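As a rough sketch of how the quality symbols on line 4 of a Fastq record turn into numbers: the code below assumes the Phred+33 encoding (an assumption, since the slides do not state which encoding is used; older Illumina pipelines used an offset of 64). The function names are hypothetical, and the averaging only mimics the idea behind a per-base quality plot, not FASTQC's actual implementation.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to integer scores (assumes Phred+33 by default)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


def mean_quality_per_position(quality_strings: Iterable[str]) -> List[float]:
    """Average score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in quality_strings:
        for position, score in enumerate(phred_scores(quality)):
            if position == len(totals):   # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[position] += score
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]


# Made-up quality strings for illustration only.
print(phred_scores("I5#!"))
print(mean_quality_per_position(["II", "I!"]))
```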
Per base “N” percentage
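A minimal sketch of the per-base “N” statistic: for each read position, the share of reads whose base call at that position is N. The function name `n_percentage_per_position` is hypothetical and the reads are made up; this illustrates the idea only and is not FASTQC's code.

```python
from typing import Iterable, List


def n_percentage_per_position(sequences: Iterable[str]) -> List[float]:
    """Percentage of 'N' calls at each position; reads may differ in length."""
    n_counts: List[int] = []
    totals: List[int] = []
    for sequence in sequences:
        for position, base in enumerate(sequence):
            if position == len(totals):   # first read that reaches this position
                n_counts.append(0)
                totals.append(0)
            totals[position] += 1
            if base in "Nn":
                n_counts[position] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


# Made-up reads for illustration only.
reads = ["ACGN", "NCGT", "ACN"]
for position, percent in enumerate(n_percentage_per_position(reads), start=1):
    print(f"position {position}: {percent:.1f}% N")
```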
Sample FASTQC reports
Good quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc/fastqc_report.html
Bad quality: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc/fastqc_report.html
Data format “Flowchart”
Sequencer → Fastq → SAM/BAM
SAM/BAM
• SAM stands for Sequence Alignment/Map
• BAM is the binary form of SAM
• Used for mapped/aligned reads
• Generated by NGS mappers/aligners
SAM
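SAM is tab-delimited text: header lines begin with '@', and each alignment line has 11 mandatory columns followed by optional tags. The sketch below splits one alignment line into those columns. The names `SamAlignment`, `parse_sam_line` and `is_mapped` are hypothetical, and the example line is invented rather than taken from the slides.

```python
from typing import List, NamedTuple


class SamAlignment(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities, as in Fastq line 4
    tags: List[str]  # optional TAG:TYPE:VALUE fields


def parse_sam_line(line: str) -> SamAlignment:
    """Parse one alignment line; header lines ('@...') must be skipped by the caller."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamAlignment(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )


def is_mapped(alignment: SamAlignment) -> bool:
    """Flag bit 0x4 marks an unmapped read."""
    return not (alignment.flag & 0x4)


# Invented alignment line for illustration only.
line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
alignment = parse_sam_line(line)
print(alignment.rname, alignment.pos, alignment.cigar, is_mapped(alignment))
```

BAM stores the same records in compressed binary form, so it is normally read through a library or converted to SAM rather than parsed as text.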
BAM