© CDISC 2014
CDISC Pharmacogenomics Standards
Joyce Hernandez (Joyce Hernandez Consulting)
November 20, 2014
Agenda
• Project background
• Domains, Relationships & Molecular Concepts
• Variables
• Specimen Genealogy
• Specimen Hierarchy
• Pharmacogenomics (PGx) Examples:
  – Biospecimen events and findings
  – Genetic Variation
  – Gene Expression
• Next steps and team
Background
• Initial Data Focus for Version 1.0
  – Specimen Collection and Handling
  – Specimen Hierarchy
  – Genetic Variation utilizing well-known standards (HGVS)
  – Genotyping data (common formats currently used)
  – Viral Genetics (includes some viral classification variables)
• Special sections to enhance understanding
  – Glossary of genetic and genomic terms
  – Nomenclatures (HGVS, HLA)
  – CMAPs (concept maps) to document common processes
New Domains to support PGx
Domain Relationships within a STUDYID
• BS - Specimen: BSREFID, USUBJID
• BE - Specimen Event: BEREFID, USUBJID
• PG - Setup and QC: PGREFID, USUBJID
• PF - Findings: PGREFID, USUBJID
• SB - Subject Biological State: USUBJID, SBREFID, SBMRKRID
• PB - PGx Biological State: PBMRKRID, PBREFID
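Because the specimen domains share USUBJID and a specimen reference ID, events and findings for one specimen can be gathered together. The Python sketch below assumes that BE and BS records for the same specimen carry the same USUBJID and the same --REFID value (as in the biospecimen example later in this deck, where both use 1148.267); the function name `specimen_history` is hypothetical and only a few columns are kept.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Minimal extracts of the BE and BS example records (values from the deck).
BE = [
    {"USUBJID": "43871", "BEREFID": "1148.267", "BESEQ": 1, "BEDECOD": "EXCISION"},
    {"USUBJID": "43871", "BEREFID": "1148.267", "BESEQ": 2, "BEDECOD": "FLASH FROZEN"},
]
BS = [
    {"USUBJID": "43871", "BSREFID": "1148.267", "BSTESTCD": "VOLUME", "BSORRES": "2"},
]


def specimen_history(be_rows: List[dict],
                     bs_rows: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Group BE events and BS findings under (USUBJID, specimen REFID)."""
    history: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"events": [], "findings": []})
    for row in be_rows:
        history[(row["USUBJID"], row["BEREFID"])]["events"].append(row["BEDECOD"])
    for row in bs_rows:
        history[(row["USUBJID"], row["BSREFID"])]["findings"].append(row["BSTESTCD"])
    return dict(history)


print(specimen_history(BE, BS))
```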
Molecular concepts represented in the domains
[Figure: diagram of the molecular concepts, with numbered callouts 1, 2, 3, 4 (4a, 4b, 4c) and 5]
PGx Specific Variables - Specimen
• --REFID (Reference ID): Specimen identifier.
• --PARENT (Specimen Parent): When the specimen in question has been obtained from another specimen (e.g., via resectioning or aliquoting), --PARENT holds the --REFID of the “parent specimen”, that is, the specimen from which the current specimen was obtained.
• --LEVEL (Specimen Level): Any specimen obtained directly from the subject has a specimen level of 1. Specimens obtained from a level 1 specimen have a specimen level of 2; those obtained from a level 2 specimen have a specimen level of 3; and so on. A level 4 specimen, therefore, is a specimen (4) obtained from a specimen (3) obtained from a specimen (2) obtained from a specimen (1) obtained from the subject.
• --DTC (Date/Time Collected): Date/time of specimen collection. For specimens with a specimen level greater than 1, --DTC refers specifically to the date/time of collection of the originating specimen, i.e., the specimen obtained directly from the subject.
A specimen is a sample of the subject which undergoes a test in place of the subject when the test cannot be performed
on the subject directly, with the understanding that any results obtained thereby may be treated as pertaining to the
subject. However, once the specimen has been separated from the subject, any changes in the subject’s state will not be
reflected by the specimen. Therefore, when a test is performed on a specimen, the results cannot be guaranteed to
pertain to the subject as they are at the time of the test, only to the subject as they were at the time of specimen
collection.
RELSPEC
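The --PARENT, --LEVEL and --DTC rules above can be sketched in Python. This is a minimal illustration, not part of the standard: it assumes specimens are held in a plain dictionary keyed by --REFID, the function names `specimen_level` and `collection_dtc` are hypothetical, and the collection date is an illustrative value (the REFIDs follow the genealogy example in this deck).

```python
from typing import Dict, Optional

# Hypothetical in-memory view of specimen records, keyed by --REFID.
# "parent" holds --PARENT (None for a specimen taken directly from the
# subject); "dtc" holds --DTC and is only needed on level 1 specimens.
Specimen = Dict[str, Optional[str]]


def specimen_level(refid: str, specimens: Dict[str, Specimen]) -> int:
    """Derive --LEVEL by counting steps up the --PARENT chain."""
    level = 1
    seen = {refid}
    parent = specimens[refid]["parent"]
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle in --PARENT chain at {parent}")
        seen.add(parent)
        level += 1
        parent = specimens[parent]["parent"]
    return level


def collection_dtc(refid: str, specimens: Dict[str, Specimen]) -> Optional[str]:
    """Return the --DTC of the originating (level 1) specimen.

    Assumes the --PARENT chain is acyclic (specimen_level checks this).
    """
    current = refid
    parent = specimens[current]["parent"]
    while parent is not None:
        current = parent
        parent = specimens[current]["parent"]
    return specimens[current]["dtc"]


# REFIDs follow the genealogy example; the date is illustrative only.
specimens: Dict[str, Specimen] = {
    "SPC-001": {"parent": None, "dtc": "2005-03-20T15:07"},
    "SPC-001-B": {"parent": "SPC-001", "dtc": None},
    "SPC-001-B-1": {"parent": "SPC-001-B", "dtc": None},
}
print(specimen_level("SPC-001-B-1", specimens))  # 3
print(collection_dtc("SPC-001-B-1", specimens))  # 2005-03-20T15:07
```

Deriving --LEVEL from the parent chain, rather than trusting a stored value, makes it easy to cross-check the two variables against each other.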
List of IG Use Cases
• BE/BS – Biospecimen Domains
  – Specimen handling such as freeze/thaw cycles and transportation.
  – Steps in obtaining cell-free RNA from blood plasma.
  – Types of quality evaluation.
• RELSPEC – Related Specimens
  – Specimen genealogy and hierarchy.
• PF – Pharmacogenomics Findings
  – Protein variation in viral genetics.
  – Protein and nucleic acid variation in viral genetics.
  – Frameshifts, both viral and subject.
  – Nucleotide reads.
  – Zygosity.
  – Single-nucleotide polymorphism (SNP) reads.
  – HLA allelic records.
  – Observed somatic vs. germline variations.
  – Observed levels of somatic variations in a biopsy sample.
  – Gene expression measured via qRT-PCR.
  – Gene expression measured via microarray.
• PG – PGx Methods and Supporting Information
  – Run parameters for PCR.
  – Details of SNP probe assays.
• PB/SB – PGx Marker Domains
  – Simple and complex genetic markers for drug resistance.
• Relating PGx Domains
  – A somatic variation and its related medical diagnosis.
  – Germline variations and related inherited risk of cancer.
  – Genetic variations relating to drug metabolism.
Specimen Genealogy
Row | STUDYID | USUBJID | REFID | SPEC | PARENT | LEVEL
1 | ABC-123 | 001-01 | SPC-001 | TISSUE | | 1
2 | ABC-123 | 001-01 | SPC-001-A | TISSUE | SPC-001 | 2
3 | ABC-123 | 001-01 | SPC-001-B | TISSUE | SPC-001 | 2
4 | ABC-123 | 001-01 | SPC-001-B-1 | DNA | SPC-001-B | 3
5 | ABC-123 | 001-01 | SPC-003 | BRAIN | | 1
6 | ABC-123 | 001-01 | SPC-003-A | RNA | SPC-003 | 2
RELSPEC
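A genealogy table like this one can be checked mechanically: a level 1 specimen has no parent, every PARENT must exist for the same subject, and a child's LEVEL must be its parent's LEVEL plus 1. The Python sketch below is a hypothetical consistency check (`RelspecRow` and `check_genealogy` are illustrative names, not CDISC-defined) applied to the rows above.

```python
from typing import List, NamedTuple, Optional


class RelspecRow(NamedTuple):
    """One row of the genealogy table (a subset of its columns)."""
    usubjid: str
    refid: str
    spec: str
    parent: Optional[str]
    level: int


def check_genealogy(rows: List[RelspecRow]) -> List[str]:
    """Return a list of consistency problems; an empty list means none."""
    by_key = {(r.usubjid, r.refid): r for r in rows}
    problems = []
    for r in rows:
        if r.parent is None:
            # Specimens taken directly from the subject must be level 1.
            if r.level != 1:
                problems.append(f"{r.refid}: no parent but level {r.level}")
            continue
        parent = by_key.get((r.usubjid, r.parent))
        if parent is None:
            problems.append(f"{r.refid}: parent {r.parent} not found")
        elif r.level != parent.level + 1:
            problems.append(
                f"{r.refid}: level {r.level}, expected {parent.level + 1}")
    return problems


rows = [
    RelspecRow("001-01", "SPC-001", "TISSUE", None, 1),
    RelspecRow("001-01", "SPC-001-A", "TISSUE", "SPC-001", 2),
    RelspecRow("001-01", "SPC-001-B", "TISSUE", "SPC-001", 2),
    RelspecRow("001-01", "SPC-001-B-1", "DNA", "SPC-001-B", 3),
    RelspecRow("001-01", "SPC-003", "BRAIN", None, 1),
    RelspecRow("001-01", "SPC-003-A", "RNA", "SPC-003", 2),
]
print(check_genealogy(rows))  # []
```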
Biospecimen Events and Findings
Row | STUDYID | DOMAIN | USUBJID | SPDEVID | BESEQ | BEREFID | BETERM | BEDECOD | BEPARTY | BEPRTYID | BECAT | BESCAT
1 | ABC134 | BE | 43871 | TS409871 | 1 | 1148.267 | Excision | EXCISION | | | COLLECTION | SOFT TISSUE
2 | ABC134 | BE | 43871 | | 2 | 1148.267 | Flash Frozen | FLASH FROZEN | | | PREP |
3 | ABC134 | BE | 43871 | 309827 | 3 | 1148.267 | Stored in Freezer | STORED | | | STORING |
4 | ABC134 | BE | 43871 | | 4 | 1148.267 | Thaw | THAW | | | PREP |
5 | ABC134 | BE | 43871 | LN43871 | 5 | 1148.267 | Shipped | SHIPPED | ABC LAB | 01 | TRANSPORT |
Row | BEBODSYS | BELOC | VISITNUM | VISIT | BEDTC | BESTDTC | BEENDTC
1 (cont) | Nervous System [A08] | BRAIN | 1 | BASELINE | 2005-03-20 | 2005-03-20T15:07 |
2 (cont) | Nervous System [A08] | BRAIN | 1 | BASELINE | 2005-03-20 | 2005-03-20T15:07 | 2005-03-20T13:22
3 (cont) | Nervous System [A08] | BRAIN | 1 | BASELINE | 2005-03-20 | 2005-03-20T13:22 | 2005-03-21T10:29
4 (cont) | Nervous System [A08] | BRAIN | 1 | BASELINE | 2005-03-20 | 2005-03-21T10:29 | 2005-03-21T10:36
5 (cont) | Nervous System [A08] | BRAIN | 1 | BASELINE | 2005-03-20 | 2005-03-21T11:00 | 2005-03-21T15:00

Row | STUDYID | DOMAIN | USUBJID | BSSEQ | BSREFID | BSTESTCD | BSTEST | BSCAT | BSORRES | BSORRESU | BSSTRESC | BSSTRESN
1 | ABC134 | BS | 43871 | 1 | 1148.267 | VOLUME | Volume | SPECIMEN MEASUREMENT | 2 | cm3 | 2 | 2
2 | ABC134 | BS | 43871 | 2 | 1148.267 | FFRZTMP | Flash Frozen Temp | SPECIMEN HANDLING | -80 | C | -80 |
Row | BSSTRESU | BSSPEC | BSANTREG | BSBLFL | VISITNUM | BSDTC
1 (cont) | cm3 | BRAIN | CEREBRAL AQUEDUCT | | |
2 (cont) | C | BRAIN | CEREBRAL AQUEDUCT | Y | 1 | 2005-03-20
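Because each BE record carries a start (BESTDTC) and end (BEENDTC), the time a specimen spent in each handling step can be derived. The Python sketch below is an illustrative check, not a CDISC rule: it computes each event's duration in minutes from the BE example above and flags any record whose end precedes its start. `duration_minutes` is a hypothetical helper, and it assumes all timestamps share the same time zone.

```python
from datetime import datetime
from typing import Optional

# (BESEQ, BETERM, BESTDTC, BEENDTC) taken from the BE example above.
EVENTS = [
    (1, "Excision", "2005-03-20T15:07", None),
    (2, "Flash Frozen", "2005-03-20T15:07", "2005-03-20T13:22"),
    (3, "Stored in Freezer", "2005-03-20T13:22", "2005-03-21T10:29"),
    (4, "Thaw", "2005-03-21T10:29", "2005-03-21T10:36"),
    (5, "Shipped", "2005-03-21T11:00", "2005-03-21T15:00"),
]


def duration_minutes(start: str, end: Optional[str]) -> Optional[float]:
    """Minutes between ISO 8601 start and end; None if end is missing."""
    if end is None:
        return None
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


for seq, term, start, end in EVENTS:
    minutes = duration_minutes(start, end)
    if minutes is None:
        status = "no BEENDTC"
    elif minutes < 0:
        status = "BEENDTC precedes BESTDTC"
    else:
        status = f"{minutes:.0f} min"
    print(f"BESEQ {seq} ({term}): {status}")
```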
Variables – (Pathogens)
• --NSPCES (Biological Classification)***: In findings domains, --NSPCES holds the species of the pathogen to which the subject is host when the pathogen is the focus of the test. When both the subject and the pathogen are tested, records for the pathogen are distinguished from records for the subject by the use of the --NSPCES variable. Not to be confused with DMSPCIES, which holds the species of the subject.
• --NSTRN (Type of Strain): Analogous to --NSPCES, --NSTRN holds the strain of the pathogen to which the subject is host when the pathogen is the focus of the test.
*** The SDTMIG omits --NSPCES because the subjects of most human clinical trials are, by definition, Homo sapiens; the nature of the study obviates the need to include this information in SDTM datasets. The exception is virology, where a viral species must be identified.
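The rule that pathogen records are distinguished from subject records by --NSPCES can be illustrated with a small filter. In the Python sketch below, `split_subject_pathogen` is a hypothetical helper, and the species and strain values ("VIRUS-X", "STRAIN-1") are placeholders rather than controlled terminology.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_subject_pathogen(records: List[Record],
                           prefix: str) -> Tuple[List[Record], List[Record]]:
    """Separate subject records from pathogen records using --NSPCES.

    A record with a non-empty <prefix>NSPCES value is treated as
    describing the pathogen; all other records describe the subject.
    """
    key = f"{prefix}NSPCES"
    subject: List[Record] = []
    pathogen: List[Record] = []
    for rec in records:
        (pathogen if rec.get(key) else subject).append(rec)
    return subject, pathogen


# Hypothetical PF records: one about the subject, one about a pathogen.
records = [
    {"PFTESTCD": "NUC", "PFORRES": "T"},
    {"PFTESTCD": "NUC", "PFORRES": "A",
     "PFNSPCES": "VIRUS-X", "PFNSTRN": "STRAIN-1"},
]
subject, pathogen = split_subject_pathogen(records, "PF")
print(len(subject), len(pathogen))  # 1 1
```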
Variables – (Genetics/Genomics Test related)
• --TEST / --TESTCD (Test Name / Test Code): For genetic variation, usually the level of granularity and/or molecular component of interest. Examples: Nucleotide, Amino Acid, Allele.
• --REFSEQ (Reference Sequence): Depending on the type of test method, the reference sequence is most likely to be either the rsID from dbSNP (for targeted tests) or a GenBank accession number (for non-targeted tests).
• --GENTYP (Type of Genetic Region of Interest): The type of the portion of the genome serving as the locus for the experiment/test. Examples: GENE, SECTOR, PROTEIN.
• --GENRI (Genetic Region of Interest): The portion of the genome serving as the locus for the experiment/test; often the name of a gene. Examples: EGFR, KRAS, CYP2D6.
• --GENLI (Genetic Location of Interest): The numeric position within the sequence for the targeted read. Compare with --GENLOC. --GENLI and --GENTGT should be used only when the test specifies a single genetic read to the exclusion of all other possibilities and the result is a matter of occurrence, expressed either as a percentage or as a boolean observation.
• --GENTGT (Genetic Target): The genetic read targeted by the probe at the position specified by --GENLI.
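For a targeted test, the result is a matter of occurrence: was the targeted read found at --GENLI or not? The Python sketch below shows one way a boolean --ORRES could be derived. `targeted_read_orres` is a hypothetical helper, the reads are modelled as a simple position-to-base mapping, and the probe values (position 2369, target T) are illustrative.

```python
from typing import Dict, Optional


def targeted_read_orres(genli: int, gentgt: str,
                        reads: Dict[int, str]) -> Optional[str]:
    """Return 'Y' if the base read at --GENLI matches --GENTGT,
    'N' if it does not, or None if the position was not covered."""
    base = reads.get(genli)
    if base is None:
        return None
    return "Y" if base.upper() == gentgt.upper() else "N"


# Hypothetical probe: is T present at position 2369?
print(targeted_read_orres(2369, "T", {2369: "T"}))  # Y
print(targeted_read_orres(2369, "T", {2369: "C"}))  # N
```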
Variables – (Genetics/Genomics Result related)
• --GENSR (Genetic Sub-Region): The sub-region (for example, an exon) within the genetic region of interest in which the observed variation at the position given in --GENLOC is located, if relevant. Because exon numbering can vary and is not regulated, caution should be exercised when populating this variable.
• --GENLOC (Genetic Location): One of the three variables used to define a genetic read. --GENLOC holds the numeric position within the sequence for the observed result.
• --ORRES (Result or Finding in Original Units): One of the three variables used to define a genetic read. --ORRES holds the observed result at the position specified by --GENLOC. When --GENLI is populated, --ORRES follows the standard rules.
• --ORREF (Reference Result): One of the three variables used to define a genetic read. --ORREF holds the expected result at the position specified by --GENLOC according to the reference sequence specified by --REFSEQ.
• --STRESC (Result or Finding in Standard Format): When --GENLOC is populated, --STRESC holds the observed variation in HGVS nomenclature. When --GENLI is populated and --ORRES=Y, --STRESC holds the targeted variation in HGVS nomenclature. Otherwise, --STRESC is copied or derived from --ORRES.
• --RSNUM (Reference SNP Cluster ID Number): Reference identifier for previously identified instances of the variation, such as the rs# in dbSNP.
• --MUTYP (Mutation Type): The type of mutation, usually either GERMLINE (inherited) or SOMATIC (arising only in some cells of the individual, as in cancer).
• --ALLELC (Allele (Chromosome) Identifier): Humans are diploid: they have two homologous copies of each chromosome. However, the two copies are not necessarily identical, since one chromosome is inherited from each parent. Therefore, in tests that compare chromosomes or parts of chromosomes (alleles), --ALLELC is used to denote results for one or the other of the two alleles (chromosomes).
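When --GENLOC is populated, --STRESC holds the variation in HGVS form, which for a single-nucleotide substitution combines the three read-defining variables: position (--GENLOC), reference base (--ORREF) and observed base (--ORRES). The Python sketch below is a hypothetical helper that handles only coding-DNA ("c.") substitutions; it assumes --GENLOC is a coding-sequence position relative to the --REFSEQ transcript, and it does not cover deletions, insertions or other HGVS forms. The input values come from the EGFR example records in this deck.

```python
from typing import Optional

NUCLEOTIDES = {"A", "C", "G", "T"}


def hgvs_substitution(genloc: int, orref: str, orres: str,
                      refseq: Optional[str] = None) -> str:
    """Build an HGVS coding-DNA substitution (e.g. c.2156G>C) from
    --GENLOC, --ORREF and --ORRES; optionally prefix the --REFSEQ."""
    ref, alt = orref.upper(), orres.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError("only single-nucleotide substitutions are handled")
    if ref == alt:
        raise ValueError("--ORRES equals --ORREF: no variation observed")
    change = f"c.{genloc}{ref}>{alt}"
    return f"{refseq}:{change}" if refseq else change


# (--GENLOC, --ORREF, --ORRES) from the EGFR example records
for loc, ref, obs in [(2156, "G", "C"), (2369, "C", "T"), (2073, "A", "T")]:
    print(hgvs_substitution(loc, ref, obs))
```

Passing the --REFSEQ value (for example NM_005228.3) yields the fully qualified form NM_005228.3:c.2156G>C.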
Genetic Variation Example
Row | STUDYID | DOMAIN | USUBJID | PGSEQ | PGTESTCD | PGTEST | PGGENTYP | PGGENRI | PGCAT | PGORRES | PGSTRESC
1 | ABC-01234 | PG | 17C0154 | 1 | EXON | Exons Sequenced | GENE | EGFR | GENETIC VARIATION | 13-21 | 13-21
2 | ABC-01234 | PG | 17C0154 | 2 | SEQSTART | Sequence Start | GENE | EGFR | GENETIC VARIATION | 1499 | 1499
3 | ABC-01234 | PG | 17C0154 | 3 | SEQLONG | Sequence Length | GENE | EGFR | GENETIC VARIATION | 1127 | 1127

Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFTESTCD | PFTEST | PFGENRI | PFGENTYP | PFREFSEQ | PFORRES
1 | ABC-01234 | PF | 17C0154 | 1 | 5493283 | NUC | Nucleotide | EGFR | GENE | NM_005228.3 | C
2 | ABC-01234 | PF | 17C0212 | 1 | 8970343 | NUC | Nucleotide | EGFR | GENE | NM_005228.3 | T
3 | ABC-01234 | PF | 17C0220 | 1 | 7629230 | NUC | Nucleotide | EGFR | GENE | NM_005228.3 | T
Row | PFORREF | PFGENLOC | PFGENSR | PFSTRESC | PFXNAM | PFNAM | PFMETHOD | PFRUNID | VISITNUM | PFDTC
1 (cont) | G | 2156 | Exon 18 | c.2156G>C | 5.23.445.1.4.165008.1.8:86175 | Biotech ABC | Massively Parallel Sequencing | 8970723 | 1 | 2012-10-23T10:06
2 (cont) | C | 2369 | Exon 20 | c.2369C>T | 5.23.445.1.4.165008.1.8:87952 | Biotech ABC | Massively Parallel Sequencing | 8925000 | 1 | 2012-10-23T12:50
3 (cont) | A | 2073 | Exon 16 | c.2073A>T | 5.23.445.1.4.165008.1.8:87970 | Biotech ABC | Massively Parallel Sequencing | 8925018 | 1 | 2012-10-23T13:03

Row | STUDYID | DOMAIN | PBSEQ | PBMRKRID | PBMRKR | PBGENRI | PBGENTYP | PBDRUG | PBDIAG | PBSTMT
1 | ABC-01234 | PB | 1 | 2073A>T | 2073A>T | EGFR | GENE | | Astrocytoma | Decreased risk of diffusely infiltrating astrocytoma
2 | ABC-01234 | PB | 2 | G719A | G719A | EGFR | GENE | EGFR TKIs | | Increased sensitivity
3 | ABC-01234 | PB | 3 | T790M | T790M | EGFR | GENE | EGFR TKIs | | Decreased sensitivity

Row | STUDYID | DOMAIN | USUBJID | SBSEQ | SBREFID | SBMRKRID | SBGENRI | SBGENTYP | SBGENRI | SBNAM | VISITNUM | SBDTC
1 | ABC-01234 | SB | 17C0154 | 1 | 5493283 | G719A | EGFR | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
2 | ABC-01234 | SB | 17C0212 | 1 | 8970343 | T790M | EGFR | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
3 | ABC-01234 | SB | 17C0220 | 1 | 7629230 | 2073A>T | EGFR | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
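In this example, PB records hold study-level statements about each marker, while SB records state which subject carries which marker; the two meet on the marker ID (SBMRKRID = PBMRKRID). The Python sketch below illustrates that link with values copied from the tables above. `subject_statements` is a hypothetical name, and the join logic is an illustration of the relationship, not a prescribed algorithm.

```python
from typing import Dict, List, Tuple

# PBMRKRID -> (PBDRUG, PBDIAG, PBSTMT), from the PB records above
PB: Dict[str, Tuple[str, str, str]] = {
    "2073A>T": ("", "Astrocytoma",
                "Decreased risk of diffusely infiltrating astrocytoma"),
    "G719A": ("EGFR TKIs", "", "Increased sensitivity"),
    "T790M": ("EGFR TKIs", "", "Decreased sensitivity"),
}
# (USUBJID, SBMRKRID), from the SB records above
SB: List[Tuple[str, str]] = [
    ("17C0154", "G719A"),
    ("17C0212", "T790M"),
    ("17C0220", "2073A>T"),
]


def subject_statements(sb_rows: List[Tuple[str, str]],
                       pb_rows: Dict[str, Tuple[str, str, str]]
                       ) -> Dict[str, List[str]]:
    """Attach each PB statement to the subjects whose SB record
    carries the same marker ID (SBMRKRID = PBMRKRID)."""
    result: Dict[str, List[str]] = {}
    for usubjid, marker in sb_rows:
        if marker not in pb_rows:
            result.setdefault(usubjid, []).append(f"{marker}: no PB record")
            continue
        drug, diag, stmt = pb_rows[marker]
        # In this example each PB record populates either a drug or a diagnosis.
        context = drug or diag
        result.setdefault(usubjid, []).append(f"{marker}: {stmt} ({context})")
    return result


for usubjid, lines in subject_statements(SB, PB).items():
    print(usubjid, lines)
```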
Gene Expression Example – Arrays
Row | STUDYID | DOMAIN | USUBJID | SPDEVID | PFSEQ | PFGRPID | PFREFID | PFTESTCD | PFTEST | PFCAT | PFORRES
1 | A12345 | PF | 43871 | AGS-G4900DA | 2 | 1 | 2287.09443 | NINT1VAL | Normalized Intensity 1 Value | Analytic | 1.16279
2 | A12345 | PF | 43871 | AGS-G4900DA | 3 | 1 | 2287.09443 | NINT2VAL | Normalized Intensity 2 Value | Analytic | 0.96469
3 | A12345 | PF | 43871 | MANAN03 | 4 | 1 | 2287.09443 | PVAL | P Value | Post-Analytic | 0.05391
4 | A12345 | PF | 43871 | MANAN03 | 5 | 1 | 2287.09443 | FOLDCHG | Fold Change | Post-Analytic | 1.8
Row | PFSTRESC | PFSTRESN | PFXFN | PFNAM | PFSPEC | PFMETHOD | PFRUNID | PFANMETH | PFBLFL | VISITNUM | PFDTC
1 (cont) | 1.16279 | 1.16279 | 2.16.090.1.135764.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS | | 2 | 2005-03-21T11:28:17
2 (cont) | 0.96469 | 0.96469 | 2.16.090.1.135764.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS | | 2 | 2005-03-21T11:28:17
3 (cont) | 0.05391 | 0.05391 | 2.16.090.1.135764.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | | | 2 | 2005-03-21T11:28:17
4 (cont) | 1.8 | 1.8 | 2.16.090.1.135764.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | | | 2 | 2005-03-21T11:28:17

Row | STUDYID | DOMAIN | SPDEVID | DISEQ | DIPARMCD | DIPARM | DIVAL
1 | A12345 | DI | AGM-G4851B | 1 | TYPE | Device Type | Microarray Kit
2 | A12345 | DI | AGM-G4851B | 2 | MANUF | Manufacturer | Agilent
3 | A12345 | DI | AGM-G4851B | 3 | MODEL | Model | G4851B
4 | A12345 | DI | AGS-G4900DA | 1 | TYPE | Device Type | Microarray Scanner
5 | A12345 | DI | AGS-G4900DA | 2 | MANUF | Manufacturer | Agilent
6 | A12345 | DI | AGS-G4900DA | 3 | MODEL | Model | G4900DA
7 | A12345 | DI | MANAN03 | 1 | TYPE | Device Type | Workstation
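In this example, the PF records identify the device that produced each result through SPDEVID, and DI describes each device as one row per parameter. The Python sketch below pivots the DI rows into one property map per device and then labels each PF result with its device type. The names `device_properties` and `devices` are hypothetical; the data values are copied from the tables above.

```python
from typing import Dict, List, Tuple

# (SPDEVID, DIPARMCD, DIVAL) from the DI example
DI: List[Tuple[str, str, str]] = [
    ("AGM-G4851B", "TYPE", "Microarray Kit"),
    ("AGM-G4851B", "MANUF", "Agilent"),
    ("AGM-G4851B", "MODEL", "G4851B"),
    ("AGS-G4900DA", "TYPE", "Microarray Scanner"),
    ("AGS-G4900DA", "MANUF", "Agilent"),
    ("AGS-G4900DA", "MODEL", "G4900DA"),
    ("MANAN03", "TYPE", "Workstation"),
]
# (PFSEQ, SPDEVID, PFTESTCD, PFORRES) from the PF example
PF: List[Tuple[int, str, str, str]] = [
    (2, "AGS-G4900DA", "NINT1VAL", "1.16279"),
    (3, "AGS-G4900DA", "NINT2VAL", "0.96469"),
    (4, "MANAN03", "PVAL", "0.05391"),
    (5, "MANAN03", "FOLDCHG", "1.8"),
]


def device_properties(di_rows: List[Tuple[str, str, str]]
                      ) -> Dict[str, Dict[str, str]]:
    """Pivot DI rows into {SPDEVID: {DIPARMCD: DIVAL}}."""
    props: Dict[str, Dict[str, str]] = {}
    for devid, parmcd, val in di_rows:
        props.setdefault(devid, {})[parmcd] = val
    return props


devices = device_properties(DI)
for seq, devid, testcd, orres in PF:
    dtype = devices.get(devid, {}).get("TYPE", "unknown device")
    print(f"PFSEQ {seq} {testcd}={orres} produced on {devid} ({dtype})")
```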
Next Steps
• Public Review Posting – ends December 2nd
Link: http://cdisc.org/sdtm
• Final Posting – 1st Quarter, 2015
• Next Project – 1st Quarter, 2015 - Cytogenetics
Contact Information and Team
Name Company
Joyce Hernandez, Team Leader Joyce Hernandez Consulting
Mohtaram Bahmanian Eli Lilly
Sally Cassals Next Step Clinical Systems
Rhonda Facile CDISC
Doris Li Eli Lilly
Cliona Molony Merck
Mona Oakes Eli Lilly
Phil Pochon Covance
Janet Reich Amgen
Ellen Schatz Eli Lilly
James Sullivan Vertex
Richard Tyhach Eli Lilly
Patricia Wesolowski Vertex
Diane Wold GSK
Darcy Wold Independent Consultant
Fred Wood Accenture
Anyone who wishes to join the team, please contact Joyce: