8/11/2019 Phylogenetic Structure of Q-M378 Subclade Based On Full Y-Chromosome Sequencing
The Russian Journal of Genetic Genealogy ( ): 5, 1, 2013 ISSN: 1920-2997 http://ru.rjgg.org RJGG
Received: December 14, 2013; accepted: December 16, 2013; published: January 8, 2014
Correspondence: [email protected] [email protected]@yfull.com
Phylogenetic Structure of Q-M378 Subclade
Based On Full Y-Chromosome Sequencing
Vladimir Gurianov¹, Leon Kull², Roman Sychev³, Vladimir Tagankin³, Vadim Urasin³
¹ The Q-L275 Research Project, Russia; ² Full Genomes Corporation, USA; ³ YFull research group, Russia.
Abstract
The Q-M378 subclade, downstream of haplogroup Q-L275, is marked by a wide distribution area and a minor share in modern populations of Eurasia. The phylogenetic structure of the subclade known so far did not allow Y-chromosome SNPs to be matched to specific populations, nor the possible directions of their past migrations to be reconstructed.
The conducted research enabled us to form a consistent phylogenetic structure of the Q-M378 subclade, validated by analysis of SNP and STR markers and based on full Y-chromosome sequencing data obtained with next-generation sequencers. As part of the research, new phylogenetic levels were defined: Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245) and Q-Y2200 (downstream of Q-Y2220).
We also defined SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200.
The research also confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language speakers from Central Asia via Afghanistan and Iran to the West.
Introduction
The Q-M378 subclade¹, downstream of haplogroup Q-L275, is present in a number of populations in Europe, Southwest (Western)² and Southern Asia³, and also in Central Asia all the way to Northwest China⁴.
¹ Y-DNA Haplogroup Q and its Subclades - 2013. http://www.isogg.org/tree/ISOGG_HapgrpQ.html. Hereinafter, subclades are referenced in line with ISOGG (International Society of Genetic Genealogy) notation, specifying the single nucleotide polymorphism (SNP) typical of the respective subclade.
² Cinnioglu et al., Excavating Y-chromosome haplotype strata in Anatolia, 2003. Haplotypes 337-339 are positive for SNP M378 according to the predictor by Urasin (http://predictor.ydna.ru/). All samples come from the Central Anatolian and East Anatolian regions of Turkey.
³ Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/ (among the tested inhabitants of Pakistan, 2 out of 176, or 1.14%, were positive for SNP M378; SNP M378 was not identified among the sample groups from India and East Asia).
⁴ Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, first published online: September 13, 2010, doi:10.1093/molbev/msq247 (among four populations of Uighurs from Xinjiang, one such person was found in each of two populations: 1 out of 71 and 1 out of 18).
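The percentages in footnotes 3 and 4 are plain carrier counts divided by sample size. The Python sketch below reproduces that arithmetic from the counts stated there. The Wilson score interval is our own addition, not something used in the cited studies; it is included only to illustrate how uncertain a frequency based on one or two carriers is. The labels for the two Uighur samples are ours.

```python
import math
from typing import Dict, Tuple


def frequency(carriers: int, sample_size: int) -> float:
    """Share of a sample carrying the marker, e.g. 2 of 176 is about 0.0114."""
    if sample_size <= 0 or not 0 <= carriers <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= carriers <= sample_size")
    return carriers / sample_size


def wilson_interval(carriers: int, sample_size: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = frequency(carriers, sample_size)
    n = sample_size
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts as reported in footnotes 3 and 4 (labels for the Uighur samples are ours).
COUNTS: Dict[str, Tuple[int, int]] = {
    "Pakistan (Sengupta et al.)": (2, 176),
    "Uighurs, first sample (Zhong et al.)": (1, 71),
    "Uighurs, second sample (Zhong et al.)": (1, 18),
}

for label, (k, n) in COUNTS.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {frequency(k, n):.2%} (95% CI {low:.1%} to {high:.1%})")
```

The first line reproduces the 1.14% quoted for Pakistan; the intervals show that samples with a single carrier are compatible with a fairly wide range of underlying frequencies.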
One peculiar feature of the Q-M378 subclade is the combination of a relatively wide distribution area (connected with migrations of ancestral populations of the Indo-European language family) with an extremely low frequency in almost all populations (modern ethnic groups) where it has been reported so far. The exception is the Jewish Diaspora (primarily Ashkenazi Jews), where the share of the Q-M378 subclade reaches 5.2 to 7 percent (Behar 2004⁵, Hammer 2009⁶). Q-M378 is therefore often associated with the Middle East. However, a more comprehensive analysis of research data and publicly available commercial test data leads us to conclude that there are more complex and rather non-obvious correlations between carriers of this Y-chromosome mutation over the last millennium.

⁵ Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM, Quintana-Murci L, Ostrer H, Skorecki K, Hammer MF (2004). "Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations". Hum Genet 114(4): 354-365. doi:10.1007/s00439-003-1073-7. PMID 14740294.
⁶ Hammer MF, Behar DM, Karafet TM, et al. (November 2009). "Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood". Human Genetics 126(5): 707-717. doi:10.1007/s00439-009-0727-5. PMC2771134. PMID 19669163.
The aim of this article is to use data from open sources and from our own research to specify the phylogenetic structure of the Q-M378 subclade and to classify its major clusters (haplotypes grouped by the following criteria: sharing a single SNP (single nucleotide polymorphism), phylogenetic similarity and geographical distribution).
Source data and methodology
Data sets for comparison
Data from the Personal Genome Project⁷ and the 1000 Genomes Project⁸ were used in this research. Samples taken from these projects (Table 1) have the PGP and HG prefixes, respectively.
⁷ http://www.personalgenomes.org/. See also: Ball, M.P., et al., A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences, 2012. 109(30): p. 11920-11927.
⁸ http://www.1000genomes.org/. See also: 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature, 2012. 491(7422): p. 56-65.
Table 1. Information based on the data from The Personal Genome Project and 1000 Genomes Project.
Sample code Population Verified origin
HG03914 Bengali (BEB) Bangladesh
HG03652 Punjabi (PJL) Pakistan (Lahore)
HG03864 Telugu (ITU) India
PGP130 N/A Northern Africa (Morocco)
Samples HG03914, HG03652 and HG03864, which do not belong to the Q-M378 subclade, were used for comparison.
Additionally, data from targeted Y-chromosome sequencing of five individuals tested at Full Genomes Corporation (FGC)⁹ were analyzed.
⁹ https://www.fullgenomes.com/
Table 2. Information based on test participants' data at Full Genomes Corporation.
Sample code Population Verified origin
AJ1 Ashkenazi Jews Eastern Europe
AJ2 Ashkenazi Jews Eastern Europe
Ar1 Armenians Eastern Turkey
Ir1 Iranians Iran, Khuzestan province
Kz1 Kazakhs Kazakhstan, kozha lineage
Genotyping
Data sets in BAM format (BAM/SAM Specification¹⁰) and, in the case of PGP130, in TSV¹¹ format were used for the research.
Next-generation sequencing¹², performed by Full Genomes Corporation at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, had the following parameters: 50x coverage with 100-base-pair paired-end reads. Mapped coverage was obtained for about 23 million of the approximately 59 million base pairs present in the human Y chromosome.
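The coverage figures above can be put in proportion with a little arithmetic. The sketch below uses only the approximate numbers quoted in the text; the read-count estimate relies on the idealized relation depth = reads × read length / covered length, which is our simplification (each mate of a pair counted as one read) rather than a figure reported by FGC.

```python
def mapped_fraction(mapped_bp: float, total_bp: float) -> float:
    """Fraction of the chromosome that received mapped reads."""
    return mapped_bp / total_bp


def reads_for_depth(depth: float, region_bp: float, read_length: int) -> float:
    """Idealized read count needed for a mean depth: depth = reads * read_length / region_bp."""
    return depth * region_bp / read_length


# Approximate figures quoted in the text.
MAPPED_BP = 23_000_000    # bases with mapped coverage
Y_LENGTH_BP = 59_000_000  # approximate length of the human Y chromosome (hg19)
DEPTH = 50                # nominal "50x" coverage
READ_LENGTH = 100         # bases per read; each mate of a pair counted separately

print(f"mapped fraction: {mapped_fraction(MAPPED_BP, Y_LENGTH_BP):.0%}")
print(f"idealized reads over the mapped region: "
      f"{reads_for_depth(DEPTH, MAPPED_BP, READ_LENGTH):,.0f}")
```

Roughly two fifths of the chromosome is covered, which is consistent with large parts of the Y chromosome being repetitive and therefore hard to map uniquely with short reads.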
Data processing and analysis
Clustering of Q-M378 subclade haplotypes (including haplotypes belonging to the upstream Q-L275 level and to downstream levels) was based on 222 haplotypes (67 STR markers¹³) obtained from public sources¹⁴. The phylogenetic tree was constructed with the MURKA software¹⁵.
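The article does not describe MURKA's algorithm. As a generic illustration only, and not as MURKA's method, the sketch below computes the simplest STR distance used in genetic genealogy: the sum of absolute repeat-count differences across markers, i.e. the number of single-step mutations separating two haplotypes under a stepwise mutation model. Multi-copy markers such as DYF395S1 are treated as separate columns; the kits and values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Haplotype = List[int]  # repeat counts in a fixed marker order


def step_distance(h1: Haplotype, h2: Haplotype) -> int:
    """Sum of absolute repeat differences, assuming single-step mutations."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same markers in the same order")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def distance_table(haplotypes: Dict[str, Haplotype]) -> Dict[Tuple[str, str], int]:
    """Pairwise step distances for every unordered pair of kits."""
    return {
        (a, b): step_distance(haplotypes[a], haplotypes[b])
        for a, b in combinations(sorted(haplotypes), 2)
    }


# Hypothetical 5-marker haplotypes (last two columns: the two DYF395S1 copies).
# Real input would list all 67 markers.
STR_EXAMPLE = {
    "kitA": [13, 24, 14, 15, 17],
    "kitB": [13, 23, 14, 15, 17],
    "kitC": [14, 24, 16, 14, 17],
}
print(distance_table(STR_EXAMPLE))
```

A tree-building program would start from such a distance matrix (or from the raw repeat counts) and join the closest haplotypes first; the choice of joining method is where real tools differ.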
Full Y-chromosome sequencing data were processed and analyzed with FGC software, along with software developed by the YFull research group¹⁶.
Samples belonging to the Q-L275 subclade but lacking the M378 mutation were used as references, along with samples from upstream and parallel subclades on a case-by-case basis. Each sample was genotyped both for the SNPs discovered during the research and for the SNPs included in the ISOGG list under the Q-L275 subclade and its downstream subclades.
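At its core, genotyping a sample for a known SNP means comparing the bases observed at that position with the SNP's ancestral and derived alleles. The minimal Python sketch below assumes the read bases covering a position have already been extracted from the BAM file (for example with a pileup tool). The depth and allele-fraction thresholds are hypothetical, not the values used by FGC or YFull, and the sketch handles only single-base substitutions, not indels such as Y2268.

```python
from collections import Counter
from typing import Iterable


def call_snp(bases: Iterable[str], ancestral: str, derived: str,
             min_depth: int = 3, min_fraction: float = 0.9) -> str:
    """Classify one sample at one SNP position from the read bases covering it.

    Returns "derived", "ancestral", "ambiguous" or "no call" (too few reads).
    The thresholds are illustrative assumptions.
    """
    counts = Counter(base.upper() for base in bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return "no call"
    for state, allele in (("derived", derived), ("ancestral", ancestral)):
        if counts[allele] / depth >= min_fraction:
            return state
    return "ambiguous"


# Y2250 (hg19 position 4852955, G -> A in Table 3) with made-up read bases.
print(call_snp("AAAAAAAAAA", ancestral="G", derived="A"))
print(call_snp("GGGGGGGG", ancestral="G", derived="A"))
print(call_snp("AAAGGG", ancestral="G", derived="A"))
```

Because the Y chromosome is haploid, a well-covered position should show essentially one allele; a mixed pileup usually points to mapping problems rather than a true mixture.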
The criteria for recognizing a new SNP were the presence of the mutation in more than two samples and the consistency of the new SNPs with one another and with previously known information on the phylogenetic structure of the respective subclade.

¹⁰ An up-to-date version of the specification can be found at https://github.com/samtools/hts-specs
¹¹ TSV (Tab Separated Values): a text format for storing and viewing tabular data.
¹² Behjati & Tarpey, What is next generation sequencing?, Arch Dis Child Educ Pract Ed 2013;98:236-238. doi:10.1136/archdischild-2013-304340. http://ep.bmj.com/content/98/6/236.full
¹³ STR markers (short tandem repeats).
¹⁴ Public project data from the Family Tree DNA website: http://www.familytreedna.com/projects.aspx. Hereinafter, haplotypes from this source are identified by FTDNA kit and haplotype number.
¹⁵ MURKA by Valery Zaporozhchenko (Research Center of Medical Genetics of the Russian Academy of Medical Sciences, Moscow, Russia). http://sourceforge.net/projects/phylomurka/
¹⁶ http://www.yfull.com/
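The article does not spell out its consistency test. One common formalization, sketched below in Python under the assumption that each SNP arose once and never reverted, is pairwise compatibility: the carrier sets of any two SNPs must be either nested or disjoint, otherwise no single tree can explain both. The carrier sets follow the Results section for the five FGC samples; SNP "X" is hypothetical. Recurrent or unstable SNPs (compare the note on L315 in footnote 24) are exactly the cases such a check flags.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Carrier sets of two once-only, never-reverting SNPs must be nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def conflicts(carriers: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Every pair of SNPs whose carrier sets only partially overlap."""
    return [
        (x, y)
        for x, y in combinations(sorted(carriers), 2)
        if not compatible(carriers[x], carriers[y])
    ]


# Carriers among the five FGC samples, following the Results section.
OBSERVED: Dict[str, FrozenSet[str]] = {
    "M378": frozenset({"AJ1", "AJ2", "Ar1", "Ir1", "Kz1"}),
    "L245": frozenset({"AJ1", "AJ2", "Ar1"}),
    "Y2220": frozenset({"AJ1", "AJ2", "Ar1"}),
    "Y2200": frozenset({"AJ1", "AJ2"}),
    "Y2250": frozenset({"Ir1", "Kz1"}),
    "L301": frozenset({"Ir1"}),
}
print(conflicts(OBSERVED))

# A hypothetical candidate SNP seen in Ir1 and AJ1 clashes with several branches.
print(conflicts({**OBSERVED, "X": frozenset({"Ir1", "AJ1"})}))
```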
Results
Clustering of the Q-M378 subclade based on analysis of SNP and STR markers
Since SNPs assign haplotypes to clusters more specifically, the primary clustering took into account the known SNPs that define sublevels of the Q-M378 subclade.
Three downstream subclades are currently known¹⁷: Q-L245, Q-L301 and Q-L327. The SNPs with the L prefix that define these subclades were identified at the Family Tree DNA lab led by Dr. Thomas Krahn.
The geographic distribution of Q-L245 essentially repeats that of M378 (except for Central and Southern Asia).
The Q-L301 subclade is found exclusively in Iran¹⁸. The simultaneous presence of the two subclades Q-L301 and Q-L245 among the autochthonous populations of Iran and Iraq indicates that carriers of the M378 mutation have lived in this region for a long time¹⁹,²⁰.
L327 is a private SNP, represented by a single haplotype of a Portuguese man from the Azores²¹.
Another private SNP²² is P306, found in one Indian. It was not found among the tested representatives of the Q-M378 subclades (including Q-L301)²³.
Until recently, only two SNPs were acknowledged as downstream of L245²⁴: L272.1, detected in Europe (Sicily), and L315 (discovered in an East European Ashkenazi). SNP L619.2, discovered in two representatives of the Armenian Diaspora²⁵, is also located below L245. Its relatively recent emergence is confirmed by the existence of Armenian Diaspora representatives who show no sign of this polymorphism²⁶.

¹⁷ Y-DNA Haplogroup Q and its Subclades - 2013. http://www.isogg.org/tree/ISOGG_HapgrpQ.html
¹⁸ FTDNA kits 178026, M7540, M7949.
¹⁹ Nadia Al-Zahery et al., In search of the genetic footprints of Sumerians: a survey of Y-chromosome and mtDNA variation in the Marsh Arabs of Iraq (2011). http://www.biomedcentral.com/1471-2148/11/288. This work has some data on Q haplotypes present in the Marsh Arabs (n=143) and Iraqis (n=154). Q-M378 has a frequency of 2.1% in the first group and 1.9% in the second.
²⁰ Grugni et al., Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians (2012). DOI: 10.1371/journal.pone.0041252. Among those positive for SNP M378, the following ethnic groups stand out: Khorasan Persians - 3 out of 59 (5.1%), Esfahan Persians - 1 out of 11 (9.1%), Lurs - 2 out of 50 (3.9%), Assyrians - 1 out of 39 (2.6%), Azerbaijani - 1 out of 63 (1.6%).
²¹ FTDNA kit 13254.
²² FTDNA kit N78873.
²³ FTDNA kits 178026, M7540, 193005, 95307, respectively.
²⁴ Both are private SNPs, i.e., so far found in a single carrier each: L315 (FTDNA kit 51) and L272.1 (FTDNA kit 95307). L315 may not be stable, as it was positive in sample HG02291.
Consequently, until very recently the Q-L245 subclade could not be divided into clusters using SNPs, so phylogenetic analysis of STR markers was used instead. The low-variability²⁷ STR marker DYF395S1 was used for clustering (the approach was first proposed by Q yDNA Project²⁸ administrator Rebekah A. Canada), which allowed stable clusters with clear geographical and ethnic attribution to be formed.
For example, the following clusters were identified with this approach:
DYF395S1=14-17
It includes four haplotypes: two Dagestanis (identifiers according to the cited publication²⁹: Avar Dag 511 and Kaitag Dag06 894), a Turk³⁰ and an Iraqi Arab³¹. The latter belongs to the legendary Quraysh tribe (Adnan-Modar tribal self-definition).
This cluster is located closer to the tree root L245 than any other and is apparently the nearest to the ancestral haplotype.
DYF395S1=15-17
It includes a whole group of haplotypes of people of various origins. The following subclusters can be distinguished within it:
- Central European (most ancestral lineages localized in Switzerland³², some of them linked to a Mennonite community);
²⁵ FTDNA kits E5340, 191379.
²⁶ FTDNA kits 173902, 178717.
²⁷ Vladislav Ryzhkov, Calculating time to the most recent common ancestor by separate panels of Y-STR markers, sorted by increasing mutation rate constants, The Russian Journal of Genetic Genealogy (Russian version): Vol. 3, No. 2, 2011, ISSN: 1920-2997, http://ru.rjgg.org
²⁸ Q yDNA Project: http://www.familytreedna.com/public/yDNA_Q/
²⁹ Balanovsky et al., Parallel Evolution of Genes and Languages in the Caucasus Region. Molecular Biology and Evolution, 13 May 2011.
³⁰ FTDNA kit 303617.
³¹ FTDNA kit 197506.
³² The SCHACKE surname appeared in Germany at least as early as the 1600s and perhaps earlier. The JAGGI surname in Switzerland goes back much further. With this DNA Project we hope to learn more about our early ancestors and where our ancestors originated. Johann Christoffel SCHACKE, the paternal ancestor of most who carry the SHOCKEY surname, was born in Kirchheimbolanden, Pfalz, Germany in 1720 to Swiss parents. He arrived in Philadelphia PA in 1737. The Anglicized version of his name became John Christopher Shockey. He and his wife Barbara had nine children between 1739
- North European (most ancestral lineages localized in the Netherlands³³);
- Italian (including haplotypes with the private SNP L272.1);
- Armenian;
- Southwest Asian.
It should be noted that, by the DYF395S1=15-17 attribute alone, a number of haplotypes without the L245 mutation would also belong to this cluster, in particular haplotypes of the level described below as Q-Y2250, as well as haplotypes of levels Q-L327 and Q-P306. However, in line with our principle of giving SNPs priority in clustering, we do not assign them to it. This also suggests that the DYF395S1=14-17 and/or 15-17 clusters had already formed at the Q-M378 level. This hypothesis can be refined only as the number of tested representatives of the cluster grows.
DYF395S1=15-18
DYF395S1=15-19
These two clusters are represented exclusively by people of Jewish origin.
Individual haplotypes showing RecLOH (the so-called Recombinational Loss of Heterozygosity) in this part of the Y chromosome were not considered in this clustering.
We expect further research to identify SNPs corresponding to each of the above-mentioned STR-based clusters.
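The DYF395S1-based grouping described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure the authors ran: kits are keyed by their two DYF395S1 copy values, kits already known to be L245-negative are excluded first (reflecting the priority given to SNPs), and a pair whose two copies show the same value is skipped as a crude stand-in for RecLOH. All kit names and values are hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, List, Tuple


def dyf395s1_clusters(haplotypes: Dict[str, Tuple[int, int]],
                      snp_negative: AbstractSet[str] = frozenset()) -> Dict[str, List[str]]:
    """Group kits by their DYF395S1 value pair, e.g. "15-17".

    Kits in snp_negative (known to lack L245) are left out. Kits whose two
    copies show the same value are skipped as possible RecLOH.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for kit, (a, b) in sorted(haplotypes.items()):
        if kit in snp_negative or a == b:
            continue
        low, high = sorted((a, b))
        clusters[f"{low}-{high}"].append(kit)
    return dict(clusters)


# Hypothetical kits for illustration only.
DYF_EXAMPLE = {
    "kit1": (15, 17),
    "kit2": (17, 15),
    "kit3": (14, 17),
    "kit4": (17, 17),  # collapsed pair, treated as possible RecLOH
    "kit5": (15, 19),
    "kit6": (15, 17),  # suppose this kit tested L245-negative
}
print(dyf395s1_clusters(DYF_EXAMPLE, snp_negative={"kit6"}))
```

Sorting the pair makes "17-15" and "15-17" land in the same cluster, since testing labs do not report the two copies in a fixed order.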
and 1756, six sons and three daughters. After Barbara died, John Christopher married Anna Marie COMPTON. John Christopher and Anna Marie had one son born in 1774 or 1775. This project hopes to help identify the descendants of the seven sons of John Christopher SHOCKEY as well as learn more about his Swiss ancestors and their related families from Germany and/or Switzerland. http://www.familytreedna.com/public/shockey-schacke/default.aspx
³³ Huff/Hough Surname Project - http://www.familytreedna.com/public/HOUGH/default.aspx. A Dutchman named Derrick Pauluszen Hoff (1649-1730), who arrived in New Amsterdam (New York) no later than 1660, is considered to be the common ancestor of the family.
New phylogenetic structure of the Q-M378 subclade, upstream and parallel subclades
Processing and analysis of the full Y-chromosome sequencing data led to the discovery of a number of new single nucleotide polymorphisms. We determined their positions on the Y chromosome (according to the hg19 reference sequence of the human genome³⁴) as well as their placement on the SNP tree.
The data on the new SNPs are summarized in Tables 3-5 and Diagram 1, which give SNP names in both the Y notation³⁵ and the Full Genomes Corporation notation³⁶.
³⁴ hg19 reference sequence, or GRCh37. See also: Human Genome Overview. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
³⁵ Y: SNP prefix used by YFull.
³⁶ FGC: SNP prefix used by Full Genomes Corporation.
As can be seen from the above, the following previously undescribed levels were discovered below the L275 SNP level:
1) Q-Y1150 level, downstream of Q-L275 and parallel to Q-M378. SNPs of this level were found in only three natives of the Indian subcontinent (HG03914, HG03652, HG03864)³⁷.
2) Q-Y2250 level, downstream of Q-M378 and parallel to Q-L245. SNPs of this level (Table 3) were found in samples Ir1 and Kz1. Since Ir1 is positive for SNP L301 and Kz1 is negative for it, the Q-L301 level must lie downstream of Q-Y2250. Private SNPs of sample Kz1 are listed in Appendix 3, and those of sample Ir1 in Appendix 7.
3) Q-Y2220 level, downstream of Q-L245. This level unites the haplotypes of the Jewish and Armenian clusters of Q-L245. All tested samples from these clusters (AJ1, AJ2, Ar1) were positive for the SNPs of this level (see Table 4), with the exception of sample PGP130 (of Moroccan origin).
³⁷ G.R. Magoon, R.H. Banks, C. Rottensteiner, B.E. Schrack, V.O. Tilroe, T. Robb, A.J. Grierson, Generation of high-resolution a priori Y-chromosome phylogenies using next-generation sequencing data, 2013, doi:10.1101/00802 (in preparation; preprint on bioRxiv.org).
4) There is also a Q-Y2220 level parallel to Q-Y2200 (xQ-Y2200), which contains SNPs defining the Armenian segment of the DYF395S1=15-17 cluster. Because these SNPs were found in only one sample (Ar1), they currently have private status. However, the following can be assumed with high probability:
- some of these SNPs will turn out to be shared by a rather wide range of haplotypes of the DYF395S1=15-17 cluster;
- the Q-L619.2 level will be downstream of Q-Y2220 (xQ-Y2200), since only some of the L245-positive Armenians belong to it. Sample Ar1, tested by us, showed no sign of the L619.2 mutation.
5) Q-Y2200 level, downstream of Q-Y2220. SNPs of this level define the Jewish cluster of Q-L245 (see Table 5). Private SNPs of samples AJ1 and AJ2 are listed in Appendices 5 and 6. In addition, neither tested sample carried the L315 mutation.
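The placements in items 1 to 5 follow a simple rule: a SNP sits directly below the branch with the smallest carrier set that strictly contains its own (for example, L301 lies below Q-Y2250 because Ir1 carries both while Kz1 carries only Y2250). The Python sketch below applies that rule, assuming each SNP arose once and never reverted, with carrier sets taken from the text. Treating PGP130 as L245-positive but Y2220-negative is our reading of item 3; SNPs with identical carrier sets would simply share a branch.

```python
from typing import Dict, FrozenSet, Optional


def place_snps(carriers: Dict[str, FrozenSet[str]]) -> Dict[str, Optional[str]]:
    """Attach each SNP below the SNP with the smallest carrier set strictly containing its own."""
    parents: Dict[str, Optional[str]] = {}
    for snp, own in carriers.items():
        supersets = [other for other, theirs in carriers.items() if own < theirs]
        parents[snp] = min(supersets, key=lambda o: len(carriers[o])) if supersets else None
    return parents


HG = frozenset({"HG03914", "HG03652", "HG03864"})
CARRIERS: Dict[str, FrozenSet[str]] = {
    "L275": HG | {"AJ1", "AJ2", "Ar1", "Ir1", "Kz1", "PGP130"},
    "Y1150": HG,
    "M378": frozenset({"AJ1", "AJ2", "Ar1", "Ir1", "Kz1", "PGP130"}),
    "Y2250": frozenset({"Ir1", "Kz1"}),
    "L301": frozenset({"Ir1"}),
    "L245": frozenset({"AJ1", "AJ2", "Ar1", "PGP130"}),  # PGP130: our reading of item 3
    "Y2220": frozenset({"AJ1", "AJ2", "Ar1"}),
    "Y2200": frozenset({"AJ1", "AJ2"}),
}

for snp, parent in place_snps(CARRIERS).items():
    print(f"{snp} -> {parent}")
```

Under the no-recurrence assumption, all strict supersets of a carrier set are nested, so the smallest one is unique and the rule yields a well-defined tree.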
Table 3. Q-Y2250 level. New SNPs, downstream of positive SNP M378.
Position (hg19)   Ancestral value   Derived value   SNP name (Y) or synonym   SNP name (FGC)
7115834 C T Y2244 FGC4626
6894323 C T Y2245 PR683
3544336 C G Y2246 FGC4613
2765038 T G Y2247 FGC4607
4070598 G A Y2248 FGC4618
4242831 A G Y2249 FGC4619
4852955 G A Y2250 FGC4620
6537988 A G Y2251 FGC4624
6724553 C T Y2252
8671530 A G Y2255 FGC4631
10077457 T C Y2256 FGC4635
15766997 A C Y2263 FGC4646
18169503 A C Y2264 FGC4656
18803364 C T Y2265 FGC4657
18990293 A G Y2266 FGC4659
22525954 AT A Y2268
23956540 A T Y2269 FGC4675
24452225 G C Y2270 FGC4676
15684681 A T CTS4507
13643442 T C FGC4638
___________________________
Note: Y2268 is a deletion.
Table 4. Q-Y2220 level. New SNPs, downstream of positive SNP L245.
Position (hg19)   Ancestral value   Derived value   SNP name (Y)   SNP name (FGC)
9408770 G T Y2220 FGC1904
18051798 A C Y2209 FGC1917
22017904 G T Y2202 FGC1925
4914530 A G Y2229
Table 5. Q-Y2200 level. New SNPs, downstream of positive SNP L245.
Position (hg19)   Ancestral value   Value positive to SNP   SNP name (Y)   SNP name (FGC)
23646920 C T Y2196 FGC1934
22953894 A G Y2197 FGC1933
22825080 A G Y2198 FGC1932
22588598 C T Y2200 FGC1929
22471554 A T Y2201 FGC1928
21277083 G A Y2203 FGC1923
19425984 G A Y2206
19053060 C T Y2207 FGC1919
18207170 A G Y2208 FGC1918
18046486 T C Y2210 FGC1916
18043999 G A Y2211 FGC1915
16994660 T A Y2212 FGC1914
15834557 G A Y2213 FGC1912
14385853 T G Y2215 FGC1911
14353022 A C Y2216 FGC1910
14184253 C A Y2218 FGC1909
9892635 C T Y2219 FGC1906
9401947 C A Y2221 FGC1903
8662585 C A Y2224 FGC1899
6949449 C T Y2225 FGC1897
4606181 C T Y2231 FGC1890
3995524 G A Y2232 FGC1888
3148720 A G Y2233 FGC1886
The placement of the following SNPs, listed by ISOGG as SNPs under investigation, was specified within the scope of this work: F108, F803, F815, F1082, F1126, F1169, F1213, F1337, F1349, F1528, F1537, F1594, F1734, F1780, F1836, F1839, F1858, F1875, F1974, F2023, F2145, F2230, F2313, F2343, F2440, F2628, F2657, F2777, F2851, F2877, F2894, F2934, F3084, F3121, F3193, F3207, F3389, F3621, F3680. On May 8, 2013 ISOGG classified all of these SNPs as belonging to level L245 or below. Our analysis showed that this scheme needs to be modified. All of these SNPs except F1213, F1349, F1594, F1734, F1780, F1836, F1839, F2230 and F2877 belong to level Q-L275, as they are positive in samples HG03914, HG03652, HG03864, AJ1, AJ2, Ar1 and Ir1. The remaining SNPs are positive in all samples in this study that are positive for M378 and L245. Consequently, these two groups of SNPs are at the same level as Q-L275 and Q-M378, respectively³⁸.
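The reassignment above amounts to matching each SNP's set of positive samples against the carrier sets of known levels. A minimal Python sketch of that matching, restricted to the seven samples the text names for these SNPs (Kz1 is left out because the text does not list it here). The level carrier sets are our reading of the text, and the comments naming F108 and F1213 simply restate the classification given above.

```python
from typing import Dict, FrozenSet, Optional

# Samples named in the text as tested for the F-series SNPs.
TESTED = frozenset({"HG03914", "HG03652", "HG03864", "AJ1", "AJ2", "Ar1", "Ir1"})

# Carriers of each known level among those samples, as we read the text.
LEVELS: Dict[str, FrozenSet[str]] = {
    "Q-L275": TESTED,
    "Q-M378": frozenset({"AJ1", "AJ2", "Ar1", "Ir1"}),
    "Q-L245": frozenset({"AJ1", "AJ2", "Ar1"}),
}


def same_level(positive: FrozenSet[str],
               levels: Dict[str, FrozenSet[str]] = LEVELS) -> Optional[str]:
    """Return the known level whose carriers exactly match the SNP's positive samples.

    None means the pattern matches no known level and needs a closer look
    (a new level, or an inconsistent call).
    """
    observed = positive & TESTED
    for name, carriers in levels.items():
        if observed == carriers:
            return name
    return None


print(same_level(TESTED))                                  # pattern reported for F108
print(same_level(frozenset({"AJ1", "AJ2", "Ar1", "Ir1"})))  # pattern reported for F1213
print(same_level(frozenset({"AJ1", "Ir1"})))                # hypothetical pattern
```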
In addition, a considerable number of new SNPs were discovered at the same levels as L275, M378 and L245.
For example, the following SNPs belong to level Q-L275: Y1014-Y1022, Y1024-Y1057, Y1059-Y1069, Y1071-Y1137, Y1139, Y1142, Y1153, Y1160, Y1164, Y1166, Y1167, Y1169, Y1195, Y1220, Y1240, Y1978-Y1983, Y1985-Y1989, Y1991-Y1993, Y1995, Y1996-Y1997, Y2003, Y2005-Y2007, Y2009, Y2239, Y2243;
to level Q-M378: Y2012, Y2013, Y2016-Y2082, Y2084-Y2095, Y2097, Y2098, Y2113-Y2115, Y2226, Y2361 (Appendix 1, Table 6);
to level Q-L245: Y2116-Y2149, Y2195, Y2199, Y2204, Y2217, Y2222, Y2223, Y2235, Y2237 (Appendix 2, Table 7).
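The range notation in these lists is compact but easy to misread. The small Python helper below (our own, not part of the YFull or FGC tooling) expands it into individual SNP names; a range end may be written with or without the prefix. By our count, the three lists expand to 156, 88 and 42 SNP names respectively.

```python
import re
from typing import List

_ITEM = re.compile(r"([A-Z]+)(\d+)(?:-(?:[A-Z]+)?(\d+))?")


def expand_snp_ranges(text: str) -> List[str]:
    """Expand a list such as "Y1014-Y1022, Y1139" into individual SNP names.

    A range end may repeat the prefix ("Y2116-Y2149") or omit it ("Y2116-2149").
    """
    names: List[str] = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        match = _ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        prefix, start = match.group(1), int(match.group(2))
        stop = int(match.group(3)) if match.group(3) else start
        if stop < start:
            raise ValueError(f"descending range: {item!r}")
        names.extend(f"{prefix}{number}" for number in range(start, stop + 1))
    return names


L275_LEVEL = ("Y1014-Y1022, Y1024-Y1057, Y1059-Y1069, Y1071-Y1137, Y1139, Y1142, "
              "Y1153, Y1160, Y1164, Y1166, Y1167, Y1169, Y1195, Y1220, Y1240, "
              "Y1978-Y1983, Y1985-Y1989, Y1991-Y1993, Y1995, Y1996-Y1997, Y2003, "
              "Y2005-Y2007, Y2009, Y2239, Y2243")
M378_LEVEL = ("Y2012, Y2013, Y2016-Y2082, Y2084-Y2095, Y2097, Y2098, "
              "Y2113-Y2115, Y2226, Y2361")
L245_LEVEL = "Y2116-2149, Y2195, Y2199, Y2204, Y2217, Y2222, Y2223, Y2235, Y2237"

for level, listing in (("Q-L275", L275_LEVEL), ("Q-M378", M378_LEVEL), ("Q-L245", L245_LEVEL)):
    print(level, len(expand_snp_ranges(listing)))
```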
For now these SNPs carry no additional phylogenetic information, but they may acquire it later, once samples that belong to these levels but lack the SNPs defining the downstream levels are fully sequenced.

³⁸ It should be noted that the FTDNA research team led by Dr. Thomas Krahn, with the participation of Q yDNA Project administrator Rebekah A. Canada, came to a similar conclusion earlier. The respective data can be found on the draft SNP tree page of the Family Tree DNA website: http://ytree.ftdna.com/index.php?name=Draft&parent=31182976. No justification for these conclusions was published, but presumably samples tested under the National Geographic Geno 2.0 project were used for the analysis.
Summary
The research demonstrated the high efficiency of full Y-chromosome sequencing for defining phylogenetic structure and made it possible to build a consistent phylogenetic structure of the Q-M378 subclade, confirmed by analysis of SNP and STR markers.
As part of the research, new phylogenetic levels were defined: Q-Y2250 (downstream of Q-M378 and including Q-L301), Q-Y2220 (downstream of Q-L245) and Q-Y2200 (downstream of Q-Y2220). SNPs that may in the future mark certain European and Asian subclusters of Q-Y2220 (including the Armenian subcluster), as well as separate branches of the Jewish cluster Q-Y2200, were also defined.
The research confirmed the connection between the distribution of the Q-M378 subclade and the migration of Indo-European language speakers from Central Asia via Afghanistan and Iran to the West. That said, the materials currently at the researchers' disposal are not sufficient to form a complete picture of these migration processes. This task may be resolved in the near future as statistically significant data accumulate.
Acknowledgements
The authors wish to thank the following people for their assistance in conducting the research and preparing this article:
Mikhail Edelstein (Russia)
Askar Abdullin (Kazakhstan)
Igor Bukharov (Russia)
Nazaret Chitilian (Lebanon)
Justin Allen Loe (United States)
Gregory Magoon (United States)
Appendix 1. Table 6. SNPs at the same level as M378.
Position (hg19)   Ancestral value   Derived value   SNP name (Y) or synonym   SNP name (FGC)
2806676 A G Y2012 FGC1770
3111159 G C Y2013 FGC1758
3815203 G C Y2016 FGC1774
3929337 C A Y2017 FGC1988
4234101 A G Y2018 FGC1775
4332151 G A Y2019 FGC1776
4634427 C A Y2020 FGC1777
4775787 T C Y2021 FGC1779
4778576 A G Y2022 FGC1780
4783438 T C Y2023
4961249 C A Y2024 FGC1781
5011266 A G Y2025
5266522 A G Y2026 FGC1782
5496739 A C Y2027 FGC1783
5687522 T A Y2028 FGC1784
5751055 T G Y2029 FGC1785
5872168 C T Y2226
5963558 G A Y2030
6085717 C A Y2031 FGC1788
6430659 T G Y2032 FGC1789
6617825 T C Y2033 FGC1790
6618215 T C Y2034 FGC1791
6746675 T C Y2035 FGC1792
6774328 T C Y2036 FGC1793
6986250 T C Y2037 FGC1794
7045044 C T Y2038 FGC1795
7071796 C G Y2039 FGC1796
7094691 A G Y2040 FGC1797
7159039 C G Y2041 FGC1798
7160439 G A Y2042 FGC1799
7339849 G T Y2043 FGC1801
7431253 C T Y2044 FGC1803
7437821 C G Y2045 FGC1804
7550568 G C Y2046 FGC1805
7652630 G A Y2047
7778164 G A Y2048 FGC1807
7856334 A G Y2049 FGC1808
7952263 C T Y2050 FGC1809
8067818 C G Y2051 FGC1810
8681004 T C Y2052 FGC1812
8682184 C T Y2053 FGC1813
8821295 A G Y2054 FGC1814
9074666 C T Y2055 FGC1815
9170505 G T Y2056 FGC1817
13127815 A G Y2057 FGC1818
13928638 G C Y2058 FGC1820
14017272 A G Y2059 FGC1825
14193680 G A Y2060 FGC1827
14293849 T A Y2061 FGC1830
14435779 A G Y2062 FGC1833
14540558 C T Y2063 FGC1834
14674385 C T Y2064 FGC1835
14733633 C A Y2065 FGC1836
15498011 C A Y2066
15521110 T C Y2067 FGC1838
15699493 C T Y2068 FGC1841
16217389 A AT Y2069
16654310 C G Y2070 FGC1842
16678163 C T Y2071 FGC1843
17230548 G A Y2072 FGC1844
17447489 C T Y2073 FGC1845
17959860 A G Y2074 FGC1850
18243302 C T Y2075 FGC1852
18714407 C A Y2076 FGC1854
18768735 G T Y2077
18768736 C A Y2078
18769454 A G Y2079 FGC1767
18803642 T G Y2080 FGC1855
18856911 G C Y2081 FGC1856
19373808 A T Y2082 FGC1858
21365952 G A Y2084 FGC1861
21479863 G A Y2085 FGC1862
21647670 G C Y2086 FGC1863
21832029 C A Y2087 FGC1864
22022365 A G Y2088 FGC1865
22101157 C T Y2089 FGC1866
22440644 G A Y2361
22624047 G A Y2090 FGC1768
22931328 T A Y2091 FGC1869
23053626 A G Y2092 FGC1872
23078557 G T Y2093 FGC1873
23166596 T C Y2094 FGC1874
23279919 G T Y2095 FGC1875
23566714 C T Y2097 FGC1877
23615574 AT A Y2098
28516009 A T Y2113
28593688 T C Y2114
28687807 A G Y2115
________________________
*Note: Y2098 is a deletion; Y2069 is an insertion.
Appendix 2. Table 7. SNPs at the same level as L245.
Position (hg19)   Ancestral value   Derived value   SNP name (Y)   SNP name (FGC)
2794289 C G Y2116 FGC1987
3127708 T C Y2117 FGC1771
3709585 A C Y2118 FGC1773
4502969 T C Y2119 FGC1759
4671322 C A Y2120 FGC1778
7219594 T C Y2121 FGC1800
7408851 C A Y2122 FGC1802
7590793 C T Y2123 FGC1806
8614513 C G Y2124 FGC1811
9144039 A T Y2223 FGC1901
9382621 G T Y2222 FGC1902
9798919 G A Y2125 FGC1816
13956388 G A Y2126 FGC1821
13982835 C T Y2127 FGC1823
14012662 G A Y2128 FGC1824
14045736 T C Y2129 FGC1826
14202870 A G Y2130 FGC1828
14285880 C G Y2131 FGC1829
14296099 C A Y2217 FGC1831
14402304 G A Y2132 FGC1832
15569048 C T Y2133 FGC1839
15614105 C G Y2134 FGC1840
16519324 A G Y2135
16757414 G GA Y2237
17686482 T C Y2136 FGC1846
17686883 A G Y2137 FGC1847
17763793 T A Y2138 FGC1848
17860015 G T Y2139 FGC1849
18134822 T C Y2140 FGC1851
18575106 G A Y2141 FGC1853
19300050 C T Y2142 FGC1857
21118566 T C Y2143 FGC1859
22015887 C A Y2144 FGC1989
22934317 ATC A Y2235
23010582 C T Y2145 FGC1870
23042385 C A Y2146 FGC1871
23648959 T G Y2147 FGC1878
23733052 A G Y2148 FGC1879
28520821 A G Y2149
28646637 C G Y2195 FGC1883
22767464 G A Y2199 FGC1868
21235857 A G Y2204 FGC1860
________________________
*Note: Y2235 is a deletion; Y2237 is an insertion.
Appendix 3. Table 8. Private SNPs for Kz1 sample.
Position (hg19)   Ancestral value   Derived value   SNP name (Y)   SNP name (FGC)
2980949 T C YFS026208
3027441 C A YFS026210 FGC4858
3751684 G A YFS026242 FGC4859
4164029 A G YFS026250 FGC4860
4515848 G A YFS026257 FGC4862
4714529 G T YFS026264 FGC4864
5394870 T C YFS026279 FGC4865
5398133 A T YFS026280 FGC4866
6088200 T C YFS026301 FGC4867
6675390 A G YFS026321 FGC4868
7058898 G A YFS026329 FGC4869
7208802 C T YFS026339 FGC4870
7278041 G A YFS026340 FGC4871
7704050 C T YFS026351 FGC4856
7929100 A C YFS026356 FGC4872
8268654 G A YFS026361 FGC4873
8684090 G A YFS026366 FGC4874
8714870 C T YFS026367 FGC4875
9154952 G A YFS026372 FGC4876
9990725 C G FGC4878
13230336 G A FGC4879
13313894 G C FGC4880
13637299 G A FGC4881
14599760 G A YFS026426 FGC4882
15353330 C T YFS026439 FGC4883
15540398 G A YFS026445 FGC4884
15617600 G A YFS026447 FGC4885
15656595 A C YFS026448
15881099 G A YFS026457 FGC4886
17344441 A G YFS026496 FGC4887
17455705 C G YFS026499 FGC4888
17619239 A C YFS026502 FGC4889
18132430 T A YFS026506 FGC4890
18205189 C A YFS026508 FGC4891
18235952 C A YFS026509 FGC4892
18427622 C T YFS026514 FGC4893
18699065 G A YFS026522 FGC4894
19119009 G A YFS026534 FGC4895
21794826 T C YFS026585 FGC4896
21824228 C T YFS026586 FGC4897
22216997 C A YFS026594 FGC4898
22263424 G T FGC4899
22464918 G A YFS029304
22470401 G T YFS029305 FGC4901
22476862 T A FGC4902
22779292 G A YFS026598 FGC4904
22845858 T A YFS026600 FGC4905
22980932 G A YFS026603 FGC4906
23097922 G T YFS026606 FGC4907
23188736 C T YFS026608 FGC4908
23574588 G T YFS026618 FGC4909
28577678 T G FGC4857
28556325 T G YFS026709
Appendix 4. Table 9. Private SNPs for Ar1 sample.
Position(hg19)
Ancestralvalue
Derived value SNP name (Y) SNP name (FGC)
2837084 G A YFS030295
4687602 C T YFS030307
3264534 G T YFS030298
3692600 G A YFS030300
6849037 A G YFS030309
7389018 T C YFS030314
7809088 C T YFS030318 FGC2000
8227956 C T YFS030321 FGC2001
8310172 G A YFS030322 FGC2002
8891034 A G YFS030324 FGC2003
9455617 G C YFS030326 FGC2004
9507128 G A YFS030327 FGC2005
13207417 C T FGC2006
13862984 G A YFS030335 FGC2007
14037704 A G YFS030339 FGC2008
14266100 G A YFS030343
14271743 G T YFS030344 FGC2009
14645998 A T YFS030350
15487465 T C YFS030354 FGC2010
15532493 G C YFS030355 FGC2011
15562737 G A YFS030356 FGC2012
15649426 C G YFS030357
15949197 C T YFS030358 FGC2013
16033272 G A YFS030359 FGC2014
16914913 A T YFS030368
17143642 G A YFS030370 FGC2015
17264341 C T YFS030371 FGC2016
17350212 G T YFS030372 FGC2017
17468836 G A YFS030374 FGC2018
17522056 C A YFS030375 FGC2019
17547056 C T YFS030376 FGC1986
17969724 T C YFS030377 FGC2020
18005360 G A YFS030378 FGC2021
18082500 T C YFS030379 FGC2022
18143358 C T YFS030380
18269281 T C YFS030381 FGC2023
19295864 G A YFS030386 FGC2024
19305808 C G YFS030387 FGC2025
21920836 G T YFS030396 FGC2026
22195671 T G YFS030398 FGC2027
22546195 T C YFS030431 FGC2029
23036871 A C YFS030432 FGC2030
23193319 C G YFS030433 FGC2031
23633830 T C YFS030434 FGC2032
23749442 C G YFS030435 FGC2033
23952561 G A YFS030438 FGC2034
28546577 A G YFS030460 FGC2035
28697215 C T YFS030463 FGC2036
28728861 A G YFS030465 FGC2037
28773229 G A YFS030466 FGC2038
Appendix 5. Table 10. Private SNPs for AJ1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
3014878 G C YFS028077
3279492 T C YFS028084
4705139 G A YFS028121
4734829 G T YFS028122
5007712 T C YFS028135
6028097 T C YFS028158 FGC4835
6671453 T A YFS028174
6985833 G C YFS028180 FGC4836
7116693 C G YFS028187 FGC4837
13225084 C A FGC4839
13227006 C T FGC4840
14174284 C T YFS028277 FGC4841
14683323 G A YFS028303
15749472 C G YFS028328 FGC4842
15911171 T A YFS028333 FGC4843
17216758 C G YFS028365 FGC4844
17842405 G A YFS028379 FGC4845
18697269 A G YFS028399 FGC4846
22541678 G A YFS028484
22545510 G T YFS028485 FGC4850
22809218 A T YFS028490 FGC4851
22816094 C T YFS028491 FGC4852
22989959 T C YFS028498 FGC4853
23338485 T C YFS028509 FGC4854
Appendix 6. Table 11. Private SNPs for AJ2 sample.
Position (hg19) Ancestral value Derived value SNP name (Y) SNP name (FGC)
3085515 C A YFS030088 FGC1885
4157714 C T YFS030093 FGC1889
7357489 C T YFS030117 FGC1898
8757232 C A YFS030130 FGC1900
9761433 C T YFS030140 FGC1924
16933881 C T YFS030164 FGC1913
19228285 T C YFS030189 FGC1920
21322098 A G YFS030210 FGC1924
22128896 C T YFS030218 FGC1926
22612418 A T YFS030247 FGC1930
22720359 C T YFS030248 FGC1931
Appendix 7. Table 12. Private SNPs for Ir1 sample.
Position (hg19) Ancestral value Derived value SNP name (Y)
2808294 G A YFS030486
2848925 C T YFS030487
3241019 G A YFS030493
3331565 C T YFS030495
3617298 G A YFS030498
3905106 T C YFS030501
3983695 G A YFS030503
4048861 C G YFS030505
4976524 T C YFS030521
4976526 T C YFS030522
5021496 G C YFS030523
5219277 T A YFS030526
5844571 C T YFS030529
6531744 G A YFS030531
7398730 T C YFS030543
7685828 G T YFS030547
7997281 G C YFS030548
8350958 G A YFS030550
8482074 C G YFS030551
8874735 C A YFS030553
9459692 A G YFS030555
9832592 A G YFS030556
14022660 C A YFS030564
14273656 A G YFS030573
14401614 C T YFS030575
14532575 G T YFS030582
14916116 G A YFS030585
14996654 G A YFS030588
15012864 C A YFS030589
15240341 G C YFS030591
15799031 G C YFS030596
15933501 T A YFS030599
16253494 C T YFS030602
16280147 C T YFS030603
16304710 T C YFS030604
16875622 C T YFS030608
17529042 G A YFS030616
18106050 C T YFS030618
18903761 A C YFS030626
19157289 G A YFS030633
19198307 A T YFS030634
19526472 A C YFS030637
21359025 C G YFS030656
21567329 G A YFS030657
22564450 C T YFS030684
22621906 G T YFS030685
22687343 A T YFS030686
22910874 G A YFS030688
23018638 T C YFS030689
23054174 T G YFS030690
23198785 A T YFS030691
23435852 A C YFS030694
24484883 T C YFS030706
28759876 C T YFS030732
17188634 T C YFS030609
19001468 C T YFS030630
20534862 T C YFS030645
21599239 A G YFS030658
21836635 A T YFS030661