The origin and expansion of Pama–Nyungan languages across Australia

Remco R. Bouckaert 1,2, Claire Bowern 3 and Quentin D. Atkinson 2,4*

1 Center of Computational Evolution, University of Auckland, Auckland, New Zealand.
2 Max Planck Institute for the Science of Human History, Jena, Germany.
3 Department of Linguistics, Yale University, New Haven, CT, USA.
4 School of Psychology, University of Auckland, Auckland, New Zealand.
*e-mail: [email protected]

SUPPLEMENTARY INFORMATION
In the format provided by the authors and unedited.

ARTICLES, NATURE ECOLOGY & EVOLUTION | www.nature.com/natecolevol
https://doi.org/10.1038/s41559-018-0489-3
© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.



Supplementary Figure 1: Map of Australia showing the overlaying graph used for the landscape aware geographical model. The 1446 nodes in the graph are drawn with their neighbourhood structure showing up to eight neighbours per node, indicating non-zero transition rates. Grey edges are used to represent interior distances/rates, while blue edges represent rates near water, which includes coastal areas as well as areas adjacent to the Murray-Darling river system. Background map credits: Esri, Garmin International, Inc. (formerly DeLorme Publishing Company, Inc.), U.S. Central Intelligence Agency (The World Factbook).
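The neighbourhood structure described in this caption can be illustrated with a small sketch. It is not the authors' graph: the regular grid, the `is_land` and `near_water` callbacks, the inverse-distance base rate, and the `water_factor` parameter are all assumptions made for illustration. It only shows how a node can have up to eight neighbours with non-zero rates, and how edges that touch a near-water node can be scaled separately from interior edges.

```python
from itertools import product


def build_grid_graph(n_rows, n_cols, is_land, near_water, water_factor=0.5):
    """Return {node: {neighbour: rate}} for land cells on a regular grid.

    Hypothetical sketch: each land cell links to at most eight adjacent
    land cells. The base rate is the inverse of the grid distance, and any
    edge touching a near-water cell is multiplied by `water_factor`.
    """
    offsets = [(dr, dc) for dr, dc in product((-1, 0, 1), repeat=2)
               if (dr, dc) != (0, 0)]
    graph = {}
    for r, c in product(range(n_rows), range(n_cols)):
        if not is_land(r, c):
            continue
        edges = {}
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if inside and is_land(nr, nc):
                rate = 1.0 / (dr * dr + dc * dc) ** 0.5
                if near_water(r, c) or near_water(nr, nc):
                    rate *= water_factor  # the "blue edge" case
                edges[(nr, nc)] = rate
        graph[(r, c)] = edges
    return graph


# Toy 3 x 3 landmass whose first column is treated as coastal.
graph = build_grid_graph(3, 3, lambda r, c: True, lambda r, c: c == 0)
print(len(graph[(1, 1)]), graph[(1, 1)][(1, 0)], graph[(1, 1)][(1, 2)])
```

In this toy setup a `water_factor` below 1 makes movement slower near water and a value above 1 makes it faster, which mirrors the kind of comparison reported in Supplementary Table 3.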


[Figure 2 image not reproduced. Panels a, b and c; panel c axes: Root age (x, 0 to 60,000) and Density (y).]

Supplementary Figure 2: Inferred origin of the Pama-Nyungan language family tree under the standard Brownian diffusion model [22,23]. a) Map showing the prior distribution on the root location under the standard Brownian diffusion model - darker areas correspond to increased probability mass. Coloured polygons indicate origins implied under the rapid replacement (red), early Holocene intensification (yellow), post-ACR (green), and initial colonisation (blue) hypotheses. b) As for a, showing the posterior distribution on the root location under the standard Brownian diffusion model. c) Histogram showing the prior (light grey) and posterior (dark grey) distribution for the age of the family. Coloured bars indicate hypothesised ages as for panel a. Background map credits: Esri, Garmin International, Inc. (formerly DeLorme Publishing Company, Inc.), U.S. Central Intelligence Agency (The World Factbook).


[Figure 3 image not reproduced. Tip labels are language names; labelled clades: Western, Central, Southern, Northern, Kulin, Macro-Maric, Paman, Tangkic, Yolngu, Warluwaric.]

Supplementary Figure 3: Comparison of current Pama-Nyungan maximum clade credibility tree topology with that from Bowern and Atkinson [6]. The current analysis includes 111 additional languages. Identical languages are joined; unjoined languages appear in the current tree (on the right) but were not included in the original 2012 tree. Major clades from Bowern and Atkinson's tree [6] are coloured to facilitate comparison and nodes were labelled with the subgroup names. Note that whilst both trees are fully resolved, posterior support for some branches is low, particularly those deeper in the tree.


[Figure 4 image not reproduced. Time axis runs from -6000 to 0; tip labels are language names; numbers on branches are posterior support values. Labelled clades: Central NSW, Yotayotic, Waka Kabi, Tangkic, Yolngu, Arandic, Lower Murray, Marrngu, Durubulic, Kartu, Pitta Pittic, EastKarnic, Ngumpin Yapa, Mayi, Karnic, SouthWest, Kanyara Mantharta, Gumbaynggiric, Wati, Warluwaric, Bigambalic, Pama Maric, Ngayarta, Kulin, Paakantyi Region, Arabana Wangkangurru, Bandjalangic, Thura Yura, Yardli, Kalkatungic, Yuin Kuri.]

Supplementary Figure 4: The diversification of the Pama-Nyungan language family through time. Maximum clade credibility tree showing the inferred timing and emergence of the major branches and their subsequent diversification. Values above each branch indicate posterior support for the descendant clade. This tree is available in nexus format as the Supplementary Data File 2 – ‘PamaNyunganMCC.txt’.


[Figure 5 image not reproduced. Map markers are circled numbers, each identifying one sampled language.]

Supplementary Figure 5: Map showing the geographic range data for each of the sampled Pama-Nyungan languages. Circled numbers indicate location of languages, and can be found in Supplementary Table 8. Background map credits: Esri, Garmin International, Inc. (formerly DeLorme Publishing Company, Inc.), U.S. Central Intelligence Agency (The World Factbook).


Supplementary Figure 6: Map showing the prior distribution on the root location together with the geographic range implied under each of the previously proposed origin hypotheses. a) Prior distribution on the root location under the standard founder-dispersal model. b) Prior distribution on the root location under the best fitting two-times-slower-near-water founder-dispersal model. Background map credits: Esri, Garmin International, Inc. (formerly DeLorme Publishing Company, Inc.), U.S. Central Intelligence Agency (The World Factbook).


Supplementary Table 1 – Four prior proposals for the location and timing of origin of the Pama-Nyungan language expansion.

These proposals were gleaned from the published literature on Australian linguistics and archaeology. They are considered to be the most prominent and most testable contemporary hypotheses regarding the expansion of Pama-Nyungan, and those that have received the most discussion and attention. We summarise the original location, putative time of initial diversification of the family, the mechanism of expansion, the basis of the evidence used by the author of the hypothesis, and the main references in the literature.

† This range spans Evans and Jones' [7] (p185, p189) date estimates for the age of the family of 4-5kya and McConvell's [9] (p125, p129) 5-6kya estimates.

‡ Williams et al.'s [11] radiocarbon date modelling shows an expansion following improved climate after 9kya, with sustained population growth rates until 7kya. Smith [14] suggests an 8kya age for the family.

* Clendon [12] (p. 46): “when the climate became warmer and wetter after ca. 13,000 BP, the continent was reoccupied from a relatively compact demographic base consisting mainly of people from the dividing range speaking a relatively small number of languages. In this sense, then, the glacial maximum constituted a bottleneck in time through which the Pama-Nyungan languages had to pass before attaining their later diversity.”

# Dixon [16] (p89-90) argues the Australian languages spread within ‘a few thousand years’ of the initial colonization. Recent estimates put the initial colonization of Sahul at 48.8 (±1.3) kya [4].

∆ Since Pama-Nyungan languages are not found throughout most of Arnhem Land and the Kimberleys, we consider this hypothesis to also include Pama-Nyungan languages adjacent to this region. Excluding these languages from the hypothesis does not change our result in any way.

Hypothesis 1. Rapid replacement
  Timing: c. 4-6kya†
  Location: Gulf of Carpentaria region, including southwest of Gulf and northwestern Queensland
  Reason: Expansion linked to one or more of technological advantage (e.g. backed artefacts), ceremonial advantage or, for later stages of the expansion, the dingo.
  Evidence: Archaeological record from mid-late Holocene, and McConvell’s ‘back-tracking’ method.
  Main references: 5, 7, 8, 9

Hypothesis 2. Early Holocene
  Timing: c. 7-9kya‡
  Location: Likely expansion from Late Pleistocene refugia - Gulf Plains/Einasleigh Uplands, Brigalow Belt South, Murray Darling Depression
  Reason: Expansion in Holocene Climatic Optimum; with further fragmentation following ENSO-onset.
  Evidence: Extrapolation from counts of carbon dates to a continent-wide model. Smith (2013) argues that a Pama-Nyungan boundary must have been in place well before 4.5 kya to impede spread of stone tools.
  Main references: 11, 14, 15

Hypothesis 3. Post Antarctic Cold Reversal (ACR)
  Timing: c. 10-13kya
  Location: Dividing range*
  Reason: Pama-Nyungan expansion from refugia after ACR following a ‘bottleneck’ during the last glacial maximum.
  Evidence: Model based on the author’s synthesis of linguistic and archaeological data.
  Main references: 12

Hypothesis 4. Initial Colonisation
  Timing: c. 40-55kya#
  Location: With initial colonization of Australia from the north (including Cape York, Arnhem Land, the Kimberleys). ∆
  Reason: Expanding into uninhabited territory as original colonisers, with subsequent ongoing diffusion.
  Evidence: Argues it is implausible to assume widespread language shift and major population expansion into inhabited areas.
  Main references: 16, 17


Supplementary Table 2 – Bayes factors comparing support for the four origin hypotheses under the founder dispersal model with a relaxed and strict clock on rates of geographic movement.

Bayes factors were estimated based on the ratio of posterior to prior frequency of the root location and age agreeing with each of the four hypotheses. H1 = rapid replacement, H2 = early-Holocene intensification, H3 = post-ACR, H4 = initial colonisation. Bayes factors for ‘Geography only’ do not consider the timing component of the hypotheses. We present only comparisons with respect to H1 since it is always favoured. Positive Bayes Factors support H1. A Bayes factor of 5 to 20 is taken as substantial support, greater than 20 as strong support, and greater than 100 as decisive [78].

Standard Founder Dispersal Models

Analysis | Full analysis: H1 vs H2 | Full analysis: H1 vs H3 | Full analysis: H1 vs H4 | Geography only: H1 vs H2 | Geography only: H1 vs H3 | Geography only: H1 vs H4
Standard Founder Dispersal Model (relaxed clock) | 336.0 | 6163.0 | 153.0 | 6.2 | 76.7 | 31.0
Standard Founder Dispersal Model (strict clock) | 1310.7 | 5831.4 | 145.2 | 12.2 | 172.2 | 86.5
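The table note says the Bayes factors come from the ratio of posterior to prior frequency of samples agreeing with each hypothesis. One plausible reading of that description is sketched below. The function names, the tuple layout, and all sample counts are hypothetical and chosen only for illustration; they are not values or code from the study.

```python
def support_ratio(posterior_hits, posterior_total, prior_hits, prior_total):
    """Posterior frequency divided by prior frequency for one hypothesis."""
    if prior_hits == 0 or posterior_total == 0:
        raise ValueError("need at least one prior hit and one posterior sample")
    # Written as a single fraction to limit floating-point rounding.
    return (posterior_hits * prior_total) / (posterior_total * prior_hits)


def bayes_factor(h_a, h_b):
    """Bayes factor of hypothesis a over hypothesis b.

    Each argument is a tuple
    (posterior_hits, posterior_total, prior_hits, prior_total).
    The ratio is undefined if hypothesis b has no posterior hits.
    """
    return support_ratio(*h_a) / support_ratio(*h_b)


def interpret(bf):
    """Map a Bayes factor onto the support categories quoted in the note."""
    if bf > 100:
        return "decisive"
    if bf > 20:
        return "strong"
    if bf >= 5:
        return "substantial"
    return "less than substantial"


# Hypothetical counts of samples whose root location and age match each
# hypothesis, out of 1000 posterior and 1000 prior samples.
h1 = (600, 1000, 100, 1000)
h2 = (50, 1000, 250, 1000)
bf_h1_vs_h2 = bayes_factor(h1, h2)
print(round(bf_h1_vs_h2, 1), interpret(bf_h1_vs_h2))
```

When a hypothesis receives no posterior samples at all, the ratio is undefined, so in practice some lower bound or pseudo-count would be needed; how the authors handled that case is not stated in this section.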


Supplementary Table 3 – Comparison of the standard homogeneous (equal rates) founder dispersal model with a range of heterogeneous founder dispersal models.

The first column denotes whether and to what extent the speed of movement is faster or slower near water. The second column denotes whether average rates across branches in the tree are set to be equal (strict) or allowed to vary across branches according to a relaxed random walk [25,26] (relaxed). The third column gives the marginal likelihood for each model. The best fitting model is marked with an asterisk (bold in the original).

Rates near water | Rates across branches | Log Marginal Likelihood
10 times faster | Relaxed clock | -157548.7
10 times faster | Strict clock | -157548.8
5 times faster | Relaxed clock | -157500.7
5 times faster | Strict clock | -157486.8
2 times faster | Relaxed clock | -157449.7
2 times faster | Strict clock | -157451.6
Equal | Relaxed clock | -157413.4
Equal | Strict clock | -157423.2
2 times slower | Relaxed clock | -157402.9
2 times slower | Strict clock | -157398.1 *
5 times slower | Relaxed clock | -157399.4
5 times slower | Strict clock | -157413.1
10 times slower | Relaxed clock | -157429.6
10 times slower | Strict clock | -157431.1
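As a quick check on how the models in this table rank, the sketch below takes the log marginal likelihoods listed above, finds the highest one, and measures how far every other model falls behind it in log units. The variable names are illustrative; the numbers are copied from the table.

```python
# Log marginal likelihoods from Supplementary Table 3, keyed by
# (rates near water, clock on rates across branches).
log_ml = {
    ("10 times faster", "relaxed"): -157548.7,
    ("10 times faster", "strict"): -157548.8,
    ("5 times faster", "relaxed"): -157500.7,
    ("5 times faster", "strict"): -157486.8,
    ("2 times faster", "relaxed"): -157449.7,
    ("2 times faster", "strict"): -157451.6,
    ("equal", "relaxed"): -157413.4,
    ("equal", "strict"): -157423.2,
    ("2 times slower", "relaxed"): -157402.9,
    ("2 times slower", "strict"): -157398.1,
    ("5 times slower", "relaxed"): -157399.4,
    ("5 times slower", "strict"): -157413.1,
    ("10 times slower", "relaxed"): -157429.6,
    ("10 times slower", "strict"): -157431.1,
}

# Higher log marginal likelihood means better fit.
best = max(log_ml, key=log_ml.get)

# Gap to the best model in log units (0 for the best model itself).
gaps = {model: log_ml[best] - value for model, value in log_ml.items()}

# Models within 5 log units of the best one, excluding the best itself.
close = sorted(model for model, gap in gaps.items() if 0 < gap <= 5)

for model in sorted(gaps, key=gaps.get):
    print(f"{model[0]:>16} {model[1]:>8} {gaps[model]:8.1f}")
```

With these values the best model is the strict-clock model that is two times slower near water, and two relaxed-clock models (five times and two times slower) sit within 5 log units of it.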


Supplementary Table 4 – Bayes factors comparing support for the four origin hypotheses across the best-fitting heterogeneous founder dispersal models.

Bayes factors were estimated based on the ratio of posterior to prior frequency of the root location and age agreeing with each of the four hypotheses. H1 = rapid replacement, H2 = early-Holocene intensification, H3 = post-ACR, H4 = initial colonisation. Bayes factors for ‘Geography only’ do not consider the timing component of the hypotheses. We present only comparisons with respect to H1 since it is always favoured. Positive Bayes Factors support H1. A Bayes factor of 5 to 20 is taken as substantial support, greater than 20 as strong support, and greater than 100 as decisive [78]. We include the best fitting heterogeneous founder dispersal model and two further models within 5 log likelihood units of the best fitting model. All other models were >10 log units from the best fitting model.

Analysis | Full analysis: H1 vs H2 | Full analysis: H1 vs H3 | Full analysis: H1 vs H4 | Geography only: H1 vs H2 | Geography only: H1 vs H3 | Geography only: H1 vs H4

Best-fitting Heterogeneous Founder Dispersal Model
2 x slower near water Founder Dispersal Model (strict clock) | 46.2 | 1930.4 | 1938.5 | 7.3 | 49.8 | 100.0

Close-to-best Heterogeneous Founder Dispersal Models
5 x slower near water Founder Dispersal Model (relaxed clock) | 91.3 | 1854.4 | 1862.2 | 5.2 | 31.5 | 34.8
2 x slower near water Founder Dispersal Model (relaxed clock) | 47.8 | 1885.7 | 1893.6 | 6.5 | 53.1 | 106.6


Supplementary Table 5 – Bayes factors comparing support for the four origin hypotheses under the standard Brownian spatial diffusion model.

Bayes factors were estimated based on the ratio of posterior to prior frequency of the root location and age agreeing with each of the four hypotheses. H1 = rapid replacement, H2 = early-Holocene intensification, H3 = post-ACR, H4 = initial colonisation. Bayes factors for ‘Geography only’ do not consider the timing component of the hypotheses. We present only comparisons with respect to H1 since it is always favoured. Positive Bayes Factors support H1. A Bayes factor of 5 to 20 is taken as substantial support, greater than 20 as strong support, and greater than 100 as decisive [78].

Standard Diffusion Model

Analysis | Full analysis: H1 vs H2 | Full analysis: H1 vs H3 | Full analysis: H1 vs H4 | Geography only: H1 vs H2 | Geography only: H1 vs H3 | Geography only: H1 vs H4
Standard Brownian Diffusion Model (no founder effect) | 859285 | 402218 | 9141 | 3.3 | 14.8 | 5.5


Supplementary Table 6 – Model fit and log Bayes Factor comparisons for alternative cognate evolution models.

No. | Cognate evolution model | Clock model | Rates across meanings | Log marginal likelihood | Log BF vs 1 | Log BF vs 2 | Log BF vs 3 | Log BF vs 4 | Log BF vs 5 | Log BF vs 6
1 | Covarion | Relaxed | Fixed | -157136 | - | 70 | 274 | 5436 | 99 | 516
2 | CTMC + gamma | Relaxed | Fixed | -157206 | -70 | - | 204 | 5366 | 29 | 446
3 | CTMC | Relaxed | Fixed | -157410 | -274 | -204 | - | 5162 | -175 | 242
4 | Stochastic Dollo | Relaxed | Fixed | -162572 | -5436 | -5366 | -5162 | - | -5337 | -4920
5 | Covarion | Relaxed | Estimated | -157235 | -99 | -29 | 175 | 5337 | - | 417
6 | Covarion | Strict | Fixed | -157652 | -516 | -446 | -242 | 4920 | -417 | -

Following prior work [26,27,38], we compare each of four cognate evolution model variants under a relaxed clock and fixed rates across meaning classes. For the best-fitting covarion model we then evaluate support for a strict clock and estimated rates across meaning classes. Log marginal likelihoods were calculated using stepping stone estimates (higher is better). The right-hand side of the table shows log Bayes factors for models labelled in the rows against models labelled in the columns, where a positive number indicates the model in the row is favoured. Log Bayes factors over 10 indicate decisive support for a model [78]. The covarion model with a log-normal uncorrelated relaxed clock and relative substitution rates fixed to 1 for all meaning classes fits best.
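The pairwise log Bayes factors in this table are differences of log marginal likelihoods: the entry for row model i against column model j is the log marginal likelihood of i minus that of j. The sketch below rebuilds the matrix from the six log marginal likelihoods reported in the table; the variable names are illustrative.

```python
# Models 1 to 6 of Supplementary Table 6 and their log marginal likelihoods.
models = [
    "Covarion / relaxed / fixed",
    "CTMC + gamma / relaxed / fixed",
    "CTMC / relaxed / fixed",
    "Stochastic Dollo / relaxed / fixed",
    "Covarion / relaxed / estimated",
    "Covarion / strict / fixed",
]
log_ml = [-157136, -157206, -157410, -162572, -157235, -157652]

# log_bf[i][j] > 0 means the row model i is favoured over column model j.
log_bf = [[row - col for col in log_ml] for row in log_ml]

for name, row in zip(models, log_bf):
    cells = " ".join(f"{value:6d}" for value in row)
    print(f"{name:<36} {cells}")

# The best model beats every other model, so its row has no negative entry.
best_index = max(range(len(log_ml)), key=log_ml.__getitem__)
print("best:", models[best_index])
```

The diagonal of the computed matrix is zero, whereas the table shows a dash there because a model is not compared with itself.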


Supplementary Table 7 – Bayes factors comparing support for the four origin hypotheses under a founder dispersal model with 5%, 10% and 15% false negatives and false positives (see Supplementary Text).

Bayes factors were estimated based on the ratio of posterior to prior frequency of the root location and age agreeing with each of the four hypotheses. H1 = rapid replacement, H2 = early-Holocene intensification, H3 = post-ACR, H4 = initial colonisation. Bayes factors for ‘Geography only’ do not consider the timing component of the hypotheses. We present only comparisons with respect to H1 since it is always favoured. A Bayes factor of 5 to 20 is taken as substantial support, greater than 20 as strong support, and greater than 100 as decisive [78]. We include the best fitting heterogeneous founder dispersal model and two further models within 5 log likelihood units of the best fitting model. All other models were >10 log units from the best fitting model.

Analysis | Full analysis: H1 vs H2 | Full analysis: H1 vs H3 | Full analysis: H1 vs H4 | Geography only: H1 vs H2 | Geography only: H1 vs H3 | Geography only: H1 vs H4

Simulated false negatives
Replacing 5% of 1’s with 0’s – run 1 | 627.45 | 813,120.90 | 20,243.67 | 5.8 | 43.41 | 130.78
Replacing 5% of 1’s with 0’s – run 2 | 181.26 | 667,678.95 | 16,622.71 | 4.53 | 56.78 | 171.06
Replacing 5% of 1’s with 0’s – run 3 | 1,101.52 | 716,344.34 | 17,834.30 | 6.02 | 40.53 | 81.4
Replacing 5% of 1’s with 0’s – run 4 | 1,773,669 | 790,119.01 | 19,671.01 | 5 | 61.72 | 35.42
Replacing 10% of 1’s with 0’s – run 1 | 518.23 | 773,815.01 | 19,265.10 | 4.47 | 52.67 | 317.33
Replacing 10% of 1’s with 0’s – run 2 | 1,617.42 | 805,045.21 | 20,042.62 | 4.96 | 48.14 | 169.2
Replacing 10% of 1’s with 0’s – run 3 | 1,791,597.07 | 798,105.17 | 19,869.84 | 5.65 | 35.25 | 355,947.39
Replacing 10% of 1’s with 0’s – run 4 | 610.02 | 787,669.95 | 19,610.04 | 7.22 | 86.41 | 65.08
Replacing 15% of 1’s with 0’s – run 1 | 1,799,135.35 | 801,463.25 | 19,953.44 | 5.78 | 81.06 | 81.4
Replacing 15% of 1’s with 0’s – run 2 | 1,303.70 | 832,631.27 | 20,729.41 | 4.8 | 28.12 | 127.06
Replacing 15% of 1’s with 0’s – run 3 | 1,819.60 | 898,149.47 | 22,360.57 | 6.4 | 116.03 | 116.52
Replacing 15% of 1’s with 0’s – run 4 | 1,199,423.57 | 534,308.83 | 13,302.29 | 5.21 | 58.43 | 88.01

Simulated false positives
Merging 5% of cognate sets – run 1 | 1,432,307.75 | 638,052.07 | 15,885.11 | 6.68 | 217.25 | 109.08
Merging 5% of cognate sets – run 2 | 367.17 | 654,261.17 | 16,288.66 | 5.98 | 87.23 | 87.6
Merging 5% of cognate sets – run 3 | 185.45 | 640,393.99 | 15,943.42 | 4.96 | 108.63 | 54.54
Merging 5% of cognate sets – run 4 | 435.73 | 592,685.13 | 14,755.65 | 5.78 | 36.21 | 54.54
Merging 10% of cognate sets – run 1 | 119.68 | 494,029.38 | 12,299.49 | 5.53 | 28.39 | 19.96
Merging 10% of cognate sets – run 2 | 297.46 | 611,578.11 | 15,226.01 | 6.29 | 100.6 | 28.86
Merging 10% of cognate sets – run 3 | 268.41 | 639,830.36 | 15,929.39 | 4.59 | 89.29 | 53.8
Merging 10% of cognate sets – run 4 | 135.08 | 643,985.10 | 16,032.82 | 6.03 | 137.02 | 137.59
Merging 15% of cognate sets – run 1 | 983 | 678,913.93 | 16,902.42 | 5.99 | 57.09 | 114.66
Merging 15% of cognate sets – run 2 | 381.12 | 685,966.19 | 17,078.00 | 5.27 | 52.58 | 44.01
Merging 15% of cognate sets – run 3 | 522.87 | 627,408.10 | 15,620.12 | 5.56 | 135.78 | 136.35
Merging 15% of cognate sets – run 4 | 115.53 | 560,712.42 | 13,959.65 | 5.61 | 52.77 | 35.33
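The row labels above describe two kinds of simulated coding error: replacing a percentage of 1's with 0's (false negatives) and merging a percentage of cognate sets (false positives). The exact procedure is given in the Supplementary Text, which is not part of this section, so the sketch below is only one possible reading. It assumes a binary matrix with one row per language and one column per cognate set, picks entries and column pairs uniformly at random, and ignores meaning classes; the function names and the toy matrix are hypothetical.

```python
import random


def add_false_negatives(matrix, fraction, rng):
    """Return a copy of `matrix` with `fraction` of its 1 entries set to 0."""
    ones = [(i, j) for i, row in enumerate(matrix)
            for j, value in enumerate(row) if value == 1]
    n_flip = round(fraction * len(ones))
    noisy = [row[:] for row in matrix]
    for i, j in rng.sample(ones, n_flip):
        noisy[i][j] = 0
    return noisy


def merge_cognate_sets(matrix, fraction, rng):
    """Return a copy with `fraction` of the columns merged into other columns.

    A merged column is combined with its target by logical OR and then
    dropped, so the result has fewer cognate sets than the input.
    """
    n_cols = len(matrix[0])
    n_merge = round(fraction * n_cols)
    chosen = rng.sample(range(n_cols), 2 * n_merge)
    absorbed_into = dict(zip(chosen[:n_merge], chosen[n_merge:]))
    merged = []
    for row in matrix:
        new_row = row[:]
        for dropped, target in absorbed_into.items():
            new_row[target] = max(new_row[target], new_row[dropped])
        merged.append([value for j, value in enumerate(new_row)
                       if j not in absorbed_into])
    return merged


# Toy data: 4 languages by 20 cognate sets, with 40 entries equal to 1.
rng = random.Random(1)
toy = [[(i + j) % 2 for j in range(20)] for i in range(4)]
fewer_ones = add_false_negatives(toy, 0.10, rng)
fewer_sets = merge_cognate_sets(toy, 0.10, rng)
print(sum(map(sum, fewer_ones)), len(fewer_sets[0]))
```

Both functions leave the input matrix untouched, so several noisy replicates (the "run 1" to "run 4" pattern in the table) can be drawn from the same data by reusing the random generator.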


Extended Data Table 8 – List of sampled Pama-Nyungan languages.

Language; Ascii Name; Abbreviation (Fig. 2); Latitude; Longitude; Subgroup; Glottolog ID; Date; Extended Fig. 1

Arabana Arabana Arabana -28.34 136.07 ArabanaWangkangurru arab1267 1970 250

Wangkangurru Wangkangurru Wangkangurru -25.35 137.01 ArabanaWangkangurru wang1290 1970 249

Alyawarr Alyawarr Alyawarr -21.89 135.74 Arandic alya1239 2000 246

Antekerrepenhe Antekerrepenhe Antekerrepen -22.07 137.43 Arandic ante1238 1970 247

Central Anmatyerr CentralAnmatyerr Cntr Anmatyrr -21.98 133.42 Arandic anma1239 2000 245

Kaytetye Kaytetye Kaytetye -21.05 133.60 Arandic kayt1238 1970 244

Western Arrarnta WesternArrarnta Wst Arrarnta -24.18 132.48 Arandic west2441 2000 177

Bandjalang Bandjalang Bandjalang -28.82 152.60 Bandjalangic band1339 1980 96

Githabul Githabul Githabul -28.36 151.94 Bandjalangic gida1240 1970 83

Minjungbal Minjungbal Minjungbal -28.75 153.22 Bandjalangic band1339 1900 94

Nerang Creek NerangCreek Nerang Creek -28.75 153.22 Bandjalangic yugu1249 1900 95

Ng’goi Mwoi NggoiMwoi Nggoi Mwoi -28.75 153.22 Bandjalangic band1339 1880 91

Tweed River and Point Dangar TweedRiverandPointDangar Tweed River -28.75 153.22 Bandjalangic band1339 1880 92

Yugambeh Yugambeh Yugambeh -28.75 153.22 Bandjalangic yugu1249 1970 93

Bigambal Bigambal Bigambal -29.07 149.92 Bigambalic biga1237 1970 78

Glen Innes, New England GlenInnes Glen Innes -29.54 150.93 Bigambalic yuga1244 1880 81

Tenterfield Tenterfield Tenterfield -29.54 150.93 Bigambalic yuga1244 1880 82

Gamilaraay Gamilaraay Gamilaraay -30.35 150.27 Central NSW gami1243 1960 80

Kamilaroi Kamilaroi Kamilaroi -30.26 150.06 Central NSW gami1243 1820 79

Muruwari Muruwari Muruwari -29.18 146.90 Central NSW muru1266 1960 121

Ngiyambaa Ngiyambaa Ngiyambaa -31.44 146.41 Central NSW ngiy1239 1970 118

Wailwan Wailwan Wailwan -31.44 146.41 Central NSW wayi1238 1880 119

Wiradjuri Wiradjuri Wiradjuri -32.97 147.42 Central NSW wira1262 1970 117

Yuwaalaraay Yuwaalaraay Yuwaalaraay -29.12 148.31 Central NSW yuwa1242 1970 122

Coobenpil Coobenpil Coobenpil -27.51 153.46 Durubulic yaga1256 1900 89

Durubul Durubul Durubul -27.33 152.95 Durubulic yaga1256 1940 86

Guwar Guwar Guwar -27.20 153.41 Durubulic guwa1244 1900 88

Janday Janday Janday -27.67 153.42 Durubulic 1880 90

Yagara Yagara Yagara -27.66 152.67 Durubulic yaga1256 1940 84

Yugarabul Yugarabul Yugarabul -27.66 152.67 Durubulic yaga1256 1930 85

Garlali Garlali Garlali -28.27 143.63 EastKarnic kala1380 1965 130

Kungadutyi Kungadutyi Kungadutyi -26.22 142.04 EastKarnic ngur1261 1960 259

Kungkari Kungkari Kungkari -24.95 144.22 EastKarnic kuun1236 1900 263

Pirriya Pirriya Pirriya -25.61 143.17 EastKarnic pirr1240 1880 262

Punthamara Punthamara Punthamara -26.52 143.24 EastKarnic punt1240 1960 260

Wangkumara Wangkumara Wangkumara -27.95 142.25 EastKarnic wong1246 1960 131

Wangkumara (Garlali) WangkumaraMcDWur WangkumaraD -26.52 143.24 EastKarnic wong1246 1970 261

Gumbaynggir Gumbaynggir Gumbaynggir -30.11 152.57 Gumbaynggiric kumb1268 1970 98

Yaygirr Yaygirr Yaygirr -29.85 153.21 Gumbaynggiric yayg1236 1900 97

Kalkatungu Kalkatungu Kalkatungu -20.69 139.79 Kalkatungic kalk1246 1960 281

Yalarnnga Yalarnnga Yalarnnga -22.17 139.97 Kalkatungic yala1262 1970 268

Jiwarli Jiwarli Jiwarli -23.15 116.22 Kanyara-Mantharta djiw1239 1980 205

Payungu Payungu Payungu -22.65 113.94 Kanyara-Mantharta bayu1240 1980 208

Purduna Purduna Purduna -23.64 114.84 Kanyara-Mantharta burd1238 1970 207

Thalanyji Thalanyji Thalanyji -22.71 114.87 Kanyara-Mantharta dhal1245 2000 209

Tharrgari Tharrgari Tharrgari -23.88 115.68 Kanyara-Mantharta dhar1247 2000 206

Warriyangga Warriyangga Warriyangga -23.89 117.01 Kanyara-Mantharta wari1262 1980 204

Yingkarta Yingkarta Yingkarta -25.72 114.97 Kanyara-Mantharta ying1247 1970 198

Badjiri Badjiri Badjiri -28.18 145.60 Karnic badj1244 1960 127

Cooper’s Creek CoopersCreek Coopers Creek -28.26 140.33 Karnic 1880 254

Diyari Diyari Diyari -28.62 138.51 Karnic dira1238 1970 252

Guwa Guwa Guwa -22.20 142.99 Karnic guwa1242 1900 61

Karuwali Karuwali Karuwali -24.36 142.04 Karnic karr1236 1900 264

Mithaka Mithaka Mithaka -25.34 139.64 Karnic mith1236 1960 257

Mount Freeling Diyari MountFreelingDiyari Mt Frlng Diyari -28.93 137.46 Karnic dier1241 1880 251

Ngamini Ngamini Ngamini -27.07 139.30 Karnic ngam1265 1970 255

Nhirrpi Nhirrpi Nhirrpi -28.10 141.57 Karnic yand1252 1950 132

Yanda Yanda Yanda -22.17 141.36 Karnic yand1251 1900 269

Yandruwandha Yandruwandha Yandruwandha -28.91 140.28 Karnic yand1253 1970 253

Yarluyandi Yarluyandi Yarluyandi -26.02 139.26 Karnic dier1240 1970 256

Yawarrawarrka Yawarrawarrka Yawarrawarrk -26.48 140.72 Karnic yawa1258 1970 258

Badimaya Badimaya Badimaya -28.34 117.34 Kartu badi1246 1990 200

Champion Bay ChampionBay Champion Bay -28.80 114.50 Kartu nhan1238 1930 195

Irwin and Murchison River IrwinMurchison Irwin Mrchsn -27.67 114.20 Kartu badi1246 1940 196

Kalaamaya Kalaamaya Kalaamaya -30.68 119.05 Kartu kala1401 1970 187

Malgana Malgana Malgana -26.44 113.94 Kartu malg1242 1970 197

Muliarra Tribe MuliarraTribe Muliarra Trb -26.90 118.90 Kartu waja1257 1886 201


Nhanta Nhanta Nhanta -27.58 115.19 Kartu nhan1238 1990 199

Wajarri Wajarri Wajarri -26.04 117.21 Kartu waja1257 1990 202

Bindjali Bindjali Bindjali -35.24 140.46 Kulin warr1257 1970 143

Bunganditj Bunganditj Bunganditj -37.22 140.61 Kulin bung1264 1870 165

Colac Colac Colac -38.36 143.51 Kulin cola1237 1860 158

Gunditjmara Gunditjmara Gunditjmara -38.16 142.24 Kulin west2443 1860 160

Hopkins River HOPKINSRIVER Hopkinsriver -35.24 140.46 Kulin warr1257 1880 144

Keerraywoorroong Keerraywoorroong Kraywooroong -36.36 142.32 Kulin warr1257 1900 162

Lake Hindmarsh LakeHindmarsh Lake Hindmar -35.90 141.43 Kulin 1900 163

Mathi-Mathi MathiMathi Mathi Mathi -34.62 144.12 Kulin west2443 1960 149

Piangil Piangil Piangil -34.69 142.27 Kulin west2443 1870 146

The Tatiarra Country THETATIARRACOUNTRY The Tatiarra -35.90 141.43 Kulin west2443 1880 164

Tjapwurrung Tjapwurrung Tjapwurrung -37.39 142.71 Kulin west2443 1970 161

Warrnambool Warrnambool Warrnambool -38.16 142.24 Kulin warr1257 1970 159

Wathawurrung Wathawurrung Wathawurrung -38.09 144.08 Kulin wath1238 1900 157

Wathiwathi Wathiwathi Wathiwathi -34.95 143.01 Kulin west2443 1840 147

Wemba-Wemba WembaWemba Wemba Wemba -35.93 143.83 Kulin west2443 1960 150

Woiwurrung Woiwurrung Woiwurrung -37.80 145.22 Kulin woiw1237 1900 156

Yaraldi Balkurra Yaraldi -35.45 139.42 Lower Murray — 1840 167

Keramin Keramin Keramin -34.70 139.67 Lower Murray nort2756 1840 142

Ned’s Corner Station, Murray River NedsCornerStation Neds Corner -34.39 141.11 Lower Murray upper1415 1880 145

Ngaiawang Ngaiawang Ngaiawang -34.52 139.67 Lower Murray nort2756 1830 141

Ngarrindjeri Ngarrindjeri Ngarrindjeri -35.80 140.04 Lower Murray narr1259 1840 166

Pytu Reach PytuReach Pytu Reach -35.48 139.16 Lower Murray narr1259 1880 168

Wellington Wellington Wellington -34.59 138.87 Lower Murray lowe1402 1830 169

Yitha-Yitha YithaYitha Yitha Yitha -34.01 143.80 Lower Murray yara1253 1840 148

Karajarri Karajarri Karajarri -18.96 122.28 Marrngu kara1476 1990 222

Karajarri (Nekes and Worms) KarajarriNW Karajarri NW -18.96 122.28 Marrngu kara1476 1930 223

Mangala (Bidyadanga) MangalaMcK Mangala Mc K -19.38 123.51 Marrngu mang1383 1970 226

Mangala (Nekes and Worms) MangalaNW Mangala N W -19.38 123.51 Marrngu mang1383 1930 224

Northern Mangarla NorthernMangarla Nrth Mangrla -19.38 123.51 Marrngu mang1383 1970 225

Northern Nyangumarta NorthernNyangumarta Nrt Nyngmrta -20.69 121.84 Marrngu nyan1301 1970 220

Nyangumarta Nyangumarta Nyangumarta -20.69 121.84 Marrngu nyan1301 1990 221

Mayi-Kulan MayiKulan Mayi Kulan -18.56 141.81 Mayi mayk1239 1920 272

Mayi-Kutuna MayiKutuna Mayi Kutuna -18.45 140.63 Mayi maya1280 1960 278

Mayi-Thakurti MayiThakurti Mayi Thakurti -18.45 140.63 Mayi mayk1239 1960 277

Mayi-Yapi MayiYapi Mayi Yapi -19.73 141.12 Mayi mayk1239 1960 279

Ngawun Ngawun Ngawun -19.51 142.18 Mayi ngaw1240 1900 271

Wanamara Wanamara Wanamara -20.92 141.73 Mayi 1970 270

Kariyarra Kariyarra Kariyarra -21.23 118.26 Ngayarta kari1304 1990 214

Kurrama Kurrama Kurrama -22.82 117.54 Ngayarta kurr1243 1990 213

Martuthunira Martuthunira Martuthunira -20.98 115.87 Ngayarta mart1255 1970 210

Ngarla Ngarla Ngarla -20.22 119.42 Ngayarta ngar1286 1990 215

Ngarluma Ngarluma Ngarluma -20.89 116.97 Ngayarta ngar1287 1990 211

Nyamal Nyamal Nyamal -21.15 119.86 Ngayarta nyam1271 1990 216

Panyjima Panyjima Panyjima -22.58 119.61 Ngayarta pany1241 1980 217

Yindjibarndi Yindjibarndi Yindjibarndi -22.03 117.44 Ngayarta yind1247 1970 212

Yinhawangka Yinhawangka Yinhawangka -24.02 118.69 Ngayarta pany1241 1980 203

Bilinarra Bilinarra Bilinarra -16.77 130.80 Ngumpin-Yapa ngar1235 1990 238

Gurindji Gurindji Gurindji -17.60 130.56 Ngumpin-Yapa guri1247 1990 239

Jaru Jaru Jaru -18.16 128.46 Ngumpin-Yapa jaru1254 1990 233

Jaru-McC JaruMcC Jaru McC -18.16 128.46 Ngumpin-Yapa jaru1254 1990 234

Jiwarliny Jiwarliny Jiwarliny -20.10 124.39 Ngumpin-Yapa walm1241 1990 227

Malngin Malngin Malngin -16.97 129.45 Ngumpin-Yapa maln1239 1990 236

Mudburra Mudburra Mudburra -17.07 131.90 Ngumpin-Yapa mudb1240 1990 241

Mudburra (McConvell) MudburraMcC Mudburra McC -17.07 131.90 Ngumpin-Yapa mudb1240 1990 240

Ngardily Ngardily Ngardily -18.20 128.57 Ngumpin-Yapa nort2753 1990 235

Ngarinyman Ngarinyman Ngarinyman -16.51 130.50 Ngumpin-Yapa ngar1235 1990 237

Southern Walmajarri SouthernWalmajarri Sth Walmajri -20.07 126.39 Ngumpin-Yapa walm1241 1970 230

WalmajarriBilliluna WalmajarriBilliluna Walmajarri B -20.07 126.39 Ngumpin-Yapa walm1241 1990 229

Walmajarri (Hudson and Richards) WalmajarriHR Walmajarri HR -20.07 126.39 Ngumpin-Yapa walm1241 1980 231

Walmajarri (Nekes and Worms) WalmajarriNW Walmajarri NW -20.07 126.39 Ngumpin-Yapa walm1241 1930 228

Warlmanpa Warlmanpa Warlmanpa -18.36 132.52 Ngumpin-Yapa warl1255 1990 242

Warlpiri Warlpiri Warlpiri -20.83 130.74 Ngumpin-Yapa warl1254 2000 178

Warumungu Warumungu Warumungu -19.13 134.21 Ngumpin-Yapa waru1265 1990 243


Bibbulman Bibbulman Bibbulman -34.43 116.08 Nyungar 1900 190

Eucla Eucla Eucla -31.24 128.46 Nyungar mirn1243 1880 174

Kaniyang Kaniyang Kaniyang -33.82 116.82 Nyungar kani1276 1970 189

Mirniny Mirniny Mirniny -31.24 128.46 Nyungar mirn1243 1950 175

New Norcia and Leschenault Bay NewNorciaandLeschenaultBay New Norcia -29.92 115.76 Nyungar nyun1247 1880 194

Ngadjumaya Ngadjumaya Ngadjumaya -32.12 122.94 Nyungar ngad1258 1950 186

Nyungar Nyungar Nyungar -31.66 118.55 Nyungar nyun1247 1930 188

Pinjarra Pinjarra Pinjarra -32.84 115.92 Nyungar nyun1247 1970 192

Wardandi Wardandi Wardandi -33.87 115.45 Nyungar nyun1247 1900 191

Watjuk Watjuk Watjuk -32.00 116.07 Nyungar nyun1247 1900 193

Kurnu Kurnu Kurnu -29.97 145.68 Paakantyi darl1243 1960 120

Paakantyi Paakantyi Paakantyi -31.02 142.77 Paakantyi darl1243 1960 134

Aghu-Tharrnggala AghuTharrnggala Aghu Thrngala -15.26 143.43 PamaMaric aghu1254 1970 34

Alngith Alngith Alngith -12.58 141.93 PamaMaric alng1239 1970 10

Aminungo Aminungo Aminungo -23.62 147.58 PamaMaric east2716 1880 66

Ayapathu Ayapathu Ayapathu -14.55 142.73 PamaMaric ayab1239 1970 19

Barna Barna Barna -21.98 147.98 PamaMaric east2716 1880 67

Barrow Point BarrowPoint Barrow Point -14.59 144.09 PamaMaric barr1247 1960 36

Belyando Belyando Belyando -21.76 145.93 PamaMaric east2716 1880 58

Bidyara-Gungabula BidyaraGungabula Bdyra Gngbla -25.89 147.69 PamaMaric bidy1243 1970 124

Bindal Bindal Bindal -19.71 146.77 PamaMaric bind1237 1970 54

Biri Biri Biri -21.11 146.80 PamaMaric biri1256 1970 56

Coonambella Coonambella Coonambella -18.80 146.06 PamaMaric wulg1239 1920 52

Dharawala Dharawala Dharawala -23.68 146.93 PamaMaric bidy1243 1970 65

Dharumbal Dharumbal Dharumbal -22.42 150.12 PamaMaric dhar1248 2000 69

Djabugay Djabugay Djabugay -16.70 145.55 PamaMaric dyaa1242 1970 43

Dyirbal Dyirbal Dyirbal -17.79 145.68 PamaMaric dyir1250 1970 45

Flinders Island FlindersIsland Flinders Isl -14.09 144.27 PamaMaric flin1247 1960 37

Gangulu Gangulu Gangulu -23.01 148.54 PamaMaric gang1268 1970 68

Granite Range GraniteRange Granite Range -16.46 144.26 PamaMaric 1885 42

Gudang Gudang Gudang -11.01 142.62 PamaMaric guda1244 1970 5

Gudjal Gudjal Gudjal -19.23 144.83 PamaMaric gudj1237 1990 49

Gugu-Badhun GuguBadhun Gugu Badhun -18.83 145.01 PamaMaric gugu1253 1970 48

Gugu-Mini Gugumini Gugumini -16.13 143.95 PamaMaric gugu1257 1980 40

Gunggari Gunggari Gunggari -26.30 146.73 PamaMaric kung1258 1950 125

Gunya Gunya Gunya -26.89 146.56 PamaMaric guny1241 1970 126

Guugu-Yimidhirr GuuguYimidhirr Guugu Yimidh -14.95 144.91 PamaMaric guug1239 1970 38

Guwamu Guwamu Guwamu -27.61 149.03 PamaMaric guwa1243 1950 123

Ikarranggal Ikarranggal Ikarranggal -14.48 143.54 PamaMaric gurd1238 1970 35

Injinoo Injinoo Injinoo -11.01 142.62 PamaMaric urad1238 1970 4

Kaanju Kaanju Kaanju -13.38 142.99 PamaMaric kanj1260 1970 16

Kala Kawaw Ya KKY KKY -9.38 142.44 PamaMaric kala1377 1980 1

Kala Lagaw Ya KLY KLY -10.41 142.23 PamaMaric kala1377 1980 3

Kok-Nar KokNar Kok Nar -16.27 141.77 PamaMaric kokn1236 1970 29

Koko-Bera KokoBera Koko Bera -15.44 141.76 PamaMaric gugu1254 1970 28

Kugu Nganhcara KuguNganhcara Kugu Nganhca -14.30 141.88 PamaMaric wikn1246 1980 25

Kukatj Kukatj Kukatj -17.83 141.15 PamaMaric guga1239 1980 276

Kuku-Wura KukuWura Kuku Wura -15.19 144.28 PamaMaric gugu1256 1970 39

Kuku Yalanji KukuYalanji Kuku Yalanji -16.46 144.26 PamaMaric kuku1273 1980 41

Kunjen Kunjen Kunjen -16.37 142.59 PamaMaric kunj1245 1970 30

Kurtjar Kurtjar Kurtjar -16.99 141.52 PamaMaric kung1262 1970 275

Kuuk Thaayorre KuukThaayorre Kuuk Thaayor -14.85 141.85 PamaMaric thay1249 1970 26

Kuuku-Ya’u KuukuYau Kuuku Yau -12.57 143.04 PamaMaric kuuk1238 1990 15

Linngithigh Linngithigh Linngithigh -13.23 141.80 PamaMaric leni1238 1960 23

Lower Burdekin LowerBurdekin Lwr Burdekin -19.98 147.98 PamaMaric 1885 55

Mabuiag Mabuiag Mabuiag -9.94 142.20 PamaMaric kala1377 1905 2

Margany Margany Margany -26.58 144.80 PamaMaric marg1253 1970 128

Mbabaram Mbabaram Mbabaram -17.48 144.72 PamaMaric mbab1239 1970 46

Mbakwithi Mbakwithi Mbakwithi -12.15 141.99 PamaMaric angu1242 1980 9

Mbiywom Mbiywom Mbiywom -13.01 142.32 PamaMaric mbiy1238 1970 22

Mpalityan Mpalityan Mpalityan -12.31 142.50 PamaMaric mpal1237 1970 13

Natal Downs NatalDowns Natal Downs -20.96 145.56 PamaMaric east2716 1880 59

Nggoth Nggoth Nggoth -12.58 141.93 PamaMaric ngko1236 1970 11

Ntra’ngith Ntrangith Ntrangith -12.49 142.07 PamaMaric tyan1235 1970 12

Nyawaygi Nyawaygi Nyawaygi -18.80 146.06 PamaMaric nyaw1247 1970 51

Olkola Olkola Olkola -16.37 142.59 PamaMaric oyka1239 1970 32

Pakanh Pakanh Pakanh -15.50 142.68 PamaMaric paka1251 1990 33

Tagalag Tagalag Tagalag -18.20 143.02 PamaMaric taga1279 1900 273

Tambo Tambo Tambo -25.19 146.19 PamaMaric tamb1252 1886 64

Thaynakwith Thaynakwith Thaynakwith -11.79 142.16 PamaMaric urad1238 1970 8

Umpila Umpila Umpila -13.41 143.38 PamaMaric umpi1239 1980 17

Umpithamu Umpithamu Umpithamu -14.05 143.28 PamaMaric umbi1243 2000 18

Upper Paroo UpperParoo Upper Paroo -26.58 144.80 PamaMaric 1880 129


Uradhi Uradhi Uradhi -11.55 142.33 PamaMaric wuth1237 1970 7

Uw Oykangand UwOykangand Uw Oykangand -16.37 142.59 PamaMaric oyka1239 1970 31

Wadjabangayi Wadjabangayi Wadjabangayi -24.71 145.60 PamaMaric – 1880 63

Walangama Walangama Walangama -17.65 142.29 PamaMaric wala1263 1970 274

Wargamay Wargamay Wargamay -18.59 146.01 PamaMaric warr1255 1970 50

Warungu Warungu Warungu -18.37 145.01 PamaMaric waru1264 1970 47

Wik Muminh WikMuminh Wik Muminh -13.87 142.29 PamaMaric kuku1283 1970 20

Wik Mungkan WikMungkan Wik Mungkan -13.87 142.29 PamaMaric wikm1247 1970 21

Wik Ngatharr WikNgatharr Wik Ngatharr -13.64 141.60 PamaMaric wika1238 1970 24

Wulguru Wulguru Wulguru -19.41 146.23 PamaMaric wulg1239 2000 53

Yadhaykenu Yadhaykenu Yadhaykenu -11.14 142.54 PamaMaric yadh1237 1970 6

Yambina Yambina Yambina -21.75 146.96 PamaMaric east2716 1930 57

Yidiny Yidiny Yidiny -17.26 145.79 PamaMaric yidi1250 1975 44

Yiningay Yiningay Yiningay -21.14 143.91 PamaMaric sout2765 1900 60

Yinwum Yinwum Yinwum -12.40 142.74 PamaMaric yinw1236 1970 14

Yirandali Yirandali Yirandali -23.05 144.70 PamaMaric yira1239 1880 62

Yir Yoront YirYoront Yir Yoront -15.10 142.07 PamaMaric yiry1247 1975 27

Junction of King’s Creek and the Georgina River KingsCreekandtheGeorginaRiver Kings Creek -23.83 140.47 PittaPittic 1885 267

Pitta-Pitta PittaPitta Pitta Pitta -23.83 140.47 PittaPittic pitt1246 1970 265

Roxburgh Downs, Lower Georgina RoxburghDowns-LowerGeorgina Roxburg Dwns -23.83 140.47 PittaPittic 1885 266

Wangkayutyuru Wangkayutyuru Wangkayuturu -24.00 138.58 PittaPittic wang1289 1970 248

Ganggalida Ganggalida Ganggalida -18.17 138.62 Tangkic gang1267 1970 285

Kayardild Kayardild Kayardild -17.06 139.43 Tangkic kaya1319 1970 289

Lardil Lardil Lardil -16.62 139.38 Tangkic lard1243 1970 291

Minkin Minkin Minkin -17.86 139.72 Tangkic mink1237 1970 286

Nguburindi Nguburindi Nguburindi -17.86 139.72 Tangkic ngub1238 1970 288

Yangarella Yangarella Yangarella -17.86 139.72 Tangkic nyan1300 1970 287

Yangkaal Yangkaal Yangkaal -17.06 139.43 Tangkic nyan1300 1970 290

Adnyamathanha Adnyamathanha Adnyamathnha -30.28 139.03 Thura-Yura adny1235 1975 137

Guyani Guyani Guyani -30.40 137.74 Thura-Yura guya1249 1870 138

Kaurna Kaurna Kaurna -34.52 138.47 Thura-Yura kaur1267 1820 170

Narrungga Narrungga Narrungga -34.32 137.61 Thura-Yura naru1238 1840 171

Ngadjuri Ngadjuri Ngadjuri -32.89 139.32 Thura-Yura ngad1257 1850 140

Nukunu Nukunu Nukunu -32.65 138.27 Thura-Yura nugu1241 1970 139

Parnkala Parnkala Parnkala -33.22 136.37 Thura-Yura bang1339 1840 172

Wirangu Wirangu Wirangu -31.58 133.53 Thura-Yura wira1265 1960 173

Batyala Batyala Batyala -25.23 153.15 Waka-Kabi east2717 1950 72

Bayali Bayali Bayali -23.55 150.72 Waka-Kabi baya1257 1970 70

Dalla Dalla Dalla -25.59 152.34 Waka-Kabi east2717 1880 73

Dawson River DawsonRiver Dawson River -26.34 151.35 Waka-Kabi waka1274 1880 74

Duungidjawu Duungidjawu Duungidjawu -26.85 153.04 Waka-Kabi duun1241 1950 87

Gooreng Gooreng GoorengGooreng Gooreng Goor -24.52 151.25 Waka-Kabi gure1255 1960 71

Mary River and Bunya Bunya Country MaryRiverandBunyaBunyaCountry Mary River -26.34 151.35 Waka-Kabi east2717 1880 75

Upper Brisbane River UpperBrisbaneRiver Upr Brsbne Rvr -27.67 150.97 Waka-Kabi east2717 1880 77

Waka-Waka WakaWaka Waka Waka -26.29 151.06 Waka-Kabi waka1274 1970 76

Bularnu Bularnu Bularnu -19.50 139.69 Warluwaric bula1255 1970 280

Wakaya Wakaya Wakaya -19.65 136.64 Warluwaric waga1260 1980 283

Warluwarra Warluwarra Warluwarra -20.33 138.63 Warluwaric warl1256 1970 282

Yanyuwa Yanyuwa Yanyuwa -15.66 136.33 Warluwaric yany1243 1970 292

Yindjilandji Yindjilandji Yindjilandji -19.06 137.71 Warluwaric yind1248 1950 284

Kartujarra Kartujarra Kartujarra -23.76 124.21 Wati kart1247 1990 183

Kukatja Kukatja Kukatja -20.18 127.23 Wati kuka1246 1990 232

Manjiljarra Manjiljarra Manjiljarra -22.27 124.75 Wati mart1256 1970 182

Martu Wangka MartuWangka Martu Wangka -23.76 124.21 Wati mart1256 1990 184

Ngaanyatjarra Ngaanyatjarra Ngaanyatjrra -27.14 122.83 Wati ngaa1240 1990 185

Pintupi-Luritja PintupiLuritja Pintupi Lrtj -22.75 127.84 Wati pint1250 1990 180

Pitjantjatjara Pitjantjatjara Pitjantjatja -26.58 130.79 Wati pitj1243 1990 176

Wangkajunga Wangkajunga Wangkajunga -21.84 126.29 Wati wang1288 1980 181

Wangkatja Wangkatja Wangkatja -21.22 128.96 Wati wang1300 1990 179

Warnman Warnman Warnman -22.41 122.18 Wati wanm1242 1990 218

Yulparija Yulparija Yulparija -21.51 123.60 Wati yul1238 1980 219

Malyangapa Malyangapa Malyangapa -30.98 141.01 Yardli yarl1236 1950 135

Wadikali Wadikali Wadikali -29.21 141.56 Yardli yarl1236 1940 133

Yardliyawarra Yardliyawarra Yardliyawarr -31.08 139.38 Yardli yarl1236 1900 136

Dhangu Dhangu Dhangu -12.33 136.69 Yolngu dhan1270 2000 298

Dhayyi Dhayyi Dhayyi -12.47 135.82 Yolngu djar1245 2000 302

Dhuwal Dhuwal Dhuwal -12.47 135.82 Yolngu marr1258 2000 304

Dhuwala Dhuwala Dhuwala -12.76 136.41 Yolngu mada1281 2000 299

Djambarrpuyngu Djambarrpuyngu Djambarrpuyn -12.47 135.82 Yolngu marr1258 2000 303


Djapu Djapu Djapu -12.47 135.82 Yolngu djap1238 2000 305

Djinang Djinang Djinang -12.45 134.94 Yolngu djin1253 2000 294

Golpa Golpa Golpa -11.68 136.22 Yolngu yann1237 2000 296

Gumatj Gumatj Gumatj -12.76 136.41 Yolngu guma1253 2000 300

Gupapuyngu Gupapuyngu Gupapuyngu -12.76 136.41 Yolngu gupa1247 2000 301

Rirratjingu Rirratjingu Rirratjingu -11.94 136.37 Yolngu rirr1238 2000 297

Ritharrngu Ritharrngu Ritharrngu -13.09 135.46 Yolngu rita1239 1980 293

Yan-nhangu Yannhangu Yannhangu -11.90 135.52 Yolngu yann1237 2000 295

Zorc Zorc Zorc -12.47 135.82 Yolngu dhuw1249 1980 306

Dhudhuroa Dhudhuroa Dhudhuroa -37.37 147.25 Yotayotic dhud1236 1850 155

Pallanganmiddang Pallanganmiddang Pallanganmid -36.38 146.63 Yotayotic pall1243 1860 153

Yabula Yabula YabulaYabula Yabula Yabul -35.69 145.32 Yotayotic yort1237 1880 151

Yorta Yorta YortaYorta Yorta Yorta -36.51 145.51 Yotayotic yort1237 1880 152

Awabakal Awabakal Awabakal -33.05 151.53 Yuin-Kuri awab1243 1830 103

Birrpayi Birrpayi Birrpayi -31.43 152.37 Yuin-Kuri sydn1236 1870 101

Darkinyung Darkinyung Darkinyung -32.80 150.46 Yuin-Kuri awab1243 1880 106

Dharawal Dharawal Dharawal -34.38 150.55 Yuin-Kuri thur1254 1870 110

Dharuk Dharuk Dharuk -33.34 150.24 Yuin-Kuri sydn1236 1790 108

Dhurga Dhurga Dhurga -36.36 149.93 Yuin-Kuri dhur1239 1960 113

Gundungurra Gundungurra Gundungurra -35.13 149.51 Yuin-Kuri nort2760 1850 112

Hawkesbury river Hawkesbury Hawkesbury -33.34 150.24 Yuin-Kuri 1885 107

Iyora Iyora Iyora -33.52 151.08 Yuin-Kuri sydn1236 1790 105

Jaitmatang Jaitmatang Jaitmatang -37.62 147.71 Yuin-Kuri sout2771 1840 154

Karree Karree Karree -33.05 151.33 Yuin-Kuri awab1243 1880 104

Katthang Katthang Katthang -31.99 152.07 Yuin-Kuri wori1245 1880 102

Moneroo Moneroo Moneroo -36.08 148.86 Yuin-Kuri sout2771 1886 114

Ngarigu Ngarigu Ngarigu -36.08 148.86 Yuin-Kuri sout2771 1890 115

Ngunawal Ngunawal Ngunawal -34.76 148.77 Yuin-Kuri nort2760 1890 116

Port MacQuarie PortMacQuarie Port McQuari -31.43 152.37 Yuin-Kuri sydn1236 1880 100

Steele’s Gadang SteeleGDG Steele G D G -34.38 150.55 Yuin-Kuri wori1245 1790 109

Thanggati Thanggatti Thanggatti -30.82 152.41 Yuin-Kuri dyan1250 1970 99

Thurrawal Thurrawal Thurrawal -34.38 150.55 Yuin-Kuri thur1254 1970 111

Columns represent the language name, the ASCII name as used in BEAST XML files, the abbreviation used in Figure 2, the location, the subgroup affiliation used to make entries monophyletic, the Glottolog ID (if any), the approximate date of attestation, and the number used in Supplementary Fig. 5.


Supplementary Methods

The following provides additional detail regarding the Bayesian phylogeography model specification and priors. First, we provide background about Bayesian inference of phylogenies and provide details regarding the tree prior (section 1). Then we describe three models of cognate evolution (section 2): the continuous time Markov chain (CTMC) model, the covarion model and the stochastic Dollo model. We then describe the treatment of rate heterogeneity (section 3), how to augment phylogenetic inference with geographical information (section 4), matrix exponentiation (section 5) and specifics on MCMC proposals (section 6). All of these topics are covered elsewhere in the literature in more detail, and we provide references throughout the text. In addition, we describe two follow-up analyses evaluating the robustness of our findings under a simple Brownian diffusion model (section 7) and to errors in cognate coding (section 8).

1 Bayesian phylogenetics

Let there be $n$ languages and let $T$ be a bifurcating tree with $n$ leaf nodes $x_1, \ldots, x_n$ associated with the $n$ languages. The internal nodes of the tree are $x_{n+1}, \ldots, x_{2n-1}$ and by convention the root node is $x_{2n-1}$. For each of the $n$ languages we have sequences of cognate data such that for language $i$ we have a sequence $S_i$ of $k$ binary data points $s_{i1}, \ldots, s_{ik}$. Together, the sequences $S_i$ ($i \in 1, \ldots, n$) form the data $D$. Using Bayes’ theorem, the posterior probability of a tree $T$ given cognate data $D$ consisting of cognate sequences $S_1, \ldots, S_n$ is then given by

$$P(T \mid D, \theta) \propto P(T, \theta)\, P(D \mid T, \theta) \tag{1}$$

where $P(T, \theta)$ is the prior on the tree and the set of parameters $\theta$ governing the evolutionary and dispersal models, $P(D \mid T, \theta)$ the likelihood of the data given the tree and model parameters, and $P(T \mid D, \theta)$ the posterior probability of the tree, given the data and set of parameters. Following prior work$^{1–11}$, we assume that cognates evolve independently and Eq (1) reduces to the more tractable

$$P(T \mid D, \theta) \propto P(T, \theta) \prod_{j=1}^{k} P(S_{\cdot j} \mid T, \theta) \tag{2}$$

where $S_{\cdot j} = \{s_{1j}, \ldots, s_{nj}\}$ is the cognate data at site $j$ consisting of cognates $s_{1j}$ to $s_{nj}$. Let $\pi_i$ be the index of $x_i$’s parent; then the site probability $P(S_{\cdot j} \mid T, \theta)$ is calculated as

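$$P(S_{\cdot j} \mid T, \theta) = \sum_{v_{n+1}=0}^{1} \cdots \sum_{v_{2n-1}=0}^{1} \; \prod_{i=1}^{n} P(x_i = s_{ij} \mid x_{\pi_i} = v_{\pi_i}, \theta) \times \prod_{i=n+1}^{2n-2} P(x_i = v_i \mid x_{\pi_i} = v_{\pi_i}, \theta) \times p(x_{2n-1} = v_{2n-1}, \theta) \tag{3}$$

where $P(x_i = v_i \mid x_{\pi_i} = v_{\pi_i}, \theta)$ is the probability of ending in value $v_i$ at node $x_i$ over the branch into $x_i$ starting at the parent of $x_i$ with its parent value $v_{\pi_i}$, and $p(x_{2n-1} = v_{2n-1}, \theta)$ is the probability of value $v_{2n-1}$ at the root. The branch probabilities are determined by a substitution model (see below), and the sum in Eq (3) can be calculated efficiently using Felsenstein’s pruning algorithm$^{12}$.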
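The recursion behind Eq (3) can be illustrated with a minimal Python sketch. It is not the BEAST implementation: the tree layout (nested dictionaries), the function names, the symmetric two-state substitution model and the three-language tree are all assumptions chosen for illustration. Each node returns the probability of the data below it for each of its two possible states, so the nested sums in Eq (3) are evaluated in one pass over the tree.

```python
import math
from itertools import product

def transition(t, rate01=1.0, rate10=1.0):
    """Closed-form 2x2 transition matrix of a two-state chain after time t."""
    total = rate01 + rate10
    pi0, pi1 = rate10 / total, rate01 / total    # stationary frequencies
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1.0 - decay)],
            [pi0 * (1.0 - decay), pi1 + pi0 * decay]]

def partial(node, site):
    """Return [L0, L1]: probability of the data below `node` given its state."""
    if "children" not in node:                   # leaf: observed 0, 1 or '?'
        state = site[node["name"]]
        if state == "?":                         # missing data: any state fits
            return [1.0, 1.0]
        return [1.0 if state == 0 else 0.0, 1.0 if state == 1 else 0.0]
    result = [1.0, 1.0]
    for child, branch_length in node["children"]:
        p = transition(branch_length)
        below = partial(child, site)
        for v in (0, 1):                         # v is the state at `node`
            result[v] *= p[v][0] * below[0] + p[v][1] * below[1]
    return result

def site_likelihood(tree, site, root_freqs=(0.5, 0.5)):
    """Weight the root partials by the root state probabilities."""
    l0, l1 = partial(tree, site)
    return root_freqs[0] * l0 + root_freqs[1] * l1

# Hypothetical three-language tree ((A:1.0, B:1.0):0.5, C:1.5).
TREE = {"children": [
    ({"children": [({"name": "A"}, 1.0), ({"name": "B"}, 1.0)]}, 0.5),
    ({"name": "C"}, 1.5),
]}

print(site_likelihood(TREE, {"A": 1, "B": 1, "C": 0}))
# Sanity check: the probabilities of all 2^3 site patterns add up to one.
print(sum(site_likelihood(TREE, dict(zip("ABC", s)))
          for s in product((0, 1), repeat=3)))
```

The cost grows linearly with the number of nodes, whereas evaluating Eq (3) term by term would require a sum over every combination of internal node states.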

To account for the fact that cognates were not included in $D$ unless they were present in at least one of the languages, instead of $P(S_{\cdot j} \mid T, \theta)$ from Eq (3) we use an ascertainment correction$^{12}$

$$P(S_{\cdot j} \mid S_{\cdot j} \neq 0, T, \theta) = \frac{P(S_{\cdot j} \mid T, \theta)}{1 - P(S_{\cdot j} = 0 \mid T, \theta)} \tag{4}$$


where $S_{\cdot j} = 0$ indicates a cognate vector with all zero entries. Following Chang et al.$^{2}$, for those languages where $S_{\cdot j}$ includes missing data we substitute a question mark (representing an unknown state). Note that missing data can change across cognate sets, since languages have different sets of missing meaning classes. The term $P(S_{\cdot j} = 0 \mid T, \theta)$ is the same within a meaning class and for efficiency is only calculated once. Both probability terms in the fraction of Eq (4) can be calculated using Felsenstein’s peeling (pruning) algorithm$^{12}$.
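As a small numerical illustration of Eq (4), the sketch below applies the correction on a deliberately simplified star tree in which every language hangs directly off the root. The star tree, the symmetric two-state model, the branch lengths and all function names are assumptions made for brevity, not part of the analysis described here. After conditioning on the cognate being observed in at least one language, the probabilities of the observable patterns again sum to one.

```python
import math
from itertools import product

def transition(t):
    """Symmetric two-state transition matrix (both rates set to 1)."""
    decay = math.exp(-2.0 * t)
    same, diff = 0.5 * (1.0 + decay), 0.5 * (1.0 - decay)
    return [[same, diff], [diff, same]]

def pattern_probability(pattern, branch_lengths, root_freqs=(0.5, 0.5)):
    """P(S | T) when every leaf is attached directly to the root."""
    total = 0.0
    for v in (0, 1):                              # v is the root state
        term = root_freqs[v]
        for state, t in zip(pattern, branch_lengths):
            term *= transition(t)[v][state]
        total += term
    return total

def ascertained_probability(pattern, branch_lengths):
    """Eq (4): condition on the cognate being present in at least one language."""
    if not any(pattern):
        raise ValueError("all-absent patterns are never observed")
    all_zero = (0,) * len(pattern)
    p_unobserved = pattern_probability(all_zero, branch_lengths)
    return pattern_probability(pattern, branch_lengths) / (1.0 - p_unobserved)

BRANCHES = (0.3, 0.8, 1.2)    # hypothetical branch lengths for three languages
observable = [s for s in product((0, 1), repeat=3) if any(s)]
# The corrected probabilities of the observable patterns sum to 1 up to rounding.
print(sum(ascertained_probability(s, BRANCHES) for s in observable))
```

Because the denominator does not depend on the observed pattern, it only needs to be computed once for all sites that share the same tree, parameters and missing-data layout.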

As can be seen in Eq (1), the above requires a prior distribution over trees. The pure birth (also known as Yule) tree prior$^{13}$ commonly used to model species diversification cannot be applied to our data because it assumes all lineages have been sampled at the same time, whereas our languages are sampled over a range of 210 years. To accommodate this stratified sampling, we use the birth-death skyline model$^{14}$ to account for the proportion of languages sampled at the various sampling times (so-called rho sampling), where the rho parameter was set to be proportional to the number of languages at a particular sample time. As with a pure birth prior, we assume a constant birth rate through time (with a uniform(0,1) prior on birth events and death events set to zero) but allow one rate before and one rate after 210 years BP to account for the fact that all attested languages were sampled in the last 210 years. The model also requires an origin age, which was sampled with a uniform(0, 55Kya) prior so as to span the age ranges of our four candidate hypotheses.

We note that whilst we assume a strictly bifurcating tree (lineages can only give rise to two daughter lineages), we can in practice accommodate multifurcations (lineages splitting into many descendant lineages over a short time period) because the time between diversification events can be arbitrarily small if the data support this.

2 Models of Cognate Evolution

We considered three models of cognate evolution, as outlined below.

2.1 CTMC model

The simplest model describing cognate evolution along a branch of a tree is the continuous time Markov chain (CTMC) model$^{3,5}$ over two states: a cognate being present and a cognate being absent. The CTMC model is specified by an infinitesimal time rate matrix (governed by a single parameter $\lambda$) defined, with rows and columns ordered as states 0 and 1, as

$$Q = \begin{pmatrix} - & \lambda \\ 1 & - \end{pmatrix} \tag{5}$$

Note that in principle it is possible to specify two parameters, but since the rate matrix is normalised in BEAST such that the expected number of mutations per unit of time is 1, there is only 1 degree of freedom. Therefore, we fix one rate to 1 and estimate the other. By convention, the diagonal entries are the rates of leaving a state and are left blank in the matrix since they can be calculated as minus the sum of all other entries in the same row.


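The finite-time transition probabilities for this CTMC model satisfy the Kolmogorov forward equation (the differential form of the Chapman-Kolmogorov equation)

$$\dot{P}(t) = P(t)\,Q \quad \text{with initial condition} \quad P(0) = I$$

where $I$ is the identity matrix; equivalently, over a small time step $\delta t$, $P(t + \delta t) \approx P(t)\,(I + \delta t\, Q)$. The solution is $P(t) = \exp(tQ)$. So, we calculate the transition probability of going from character $j$ to character $k$ over time span $t$ as the $(j, k)$ entry of the exponential of $t$ times $Q$, i.e.

$$P(x_i = k \mid x_{\pi_i} = j, t, \theta) = \left[e^{tQ}\right]_{j,k}$$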

In our analyses, we used a Dirichlet(1,1) prior over frequencies for the binary CTMC model.
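To make the relation between a two-state rate matrix and its finite-time transition probabilities concrete, here is a minimal sketch in plain Python. The truncated Taylor series is an assumption chosen for readability (production software uses more robust exponentiation methods), the gain rate of 0.5 is an arbitrary illustrative value, and the normalisation that BEAST applies to the rate matrix is omitted.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, terms=40):
    """exp(tQ) by a truncated Taylor series (adequate for small matrices and t)."""
    n = len(q)
    tq = [[t * x for x in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]             # holds (tQ)^m / m!
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, tq)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def binary_rate_matrix(gain_rate, loss_rate=1.0):
    """Two-state Q; each diagonal entry is minus the rest of its row."""
    return [[-gain_rate, gain_rate], [loss_rate, -loss_rate]]

Q = binary_rate_matrix(0.5)       # 0.5 is an arbitrary illustrative value
P = mat_exp(Q, 0.7)
print(P[0][1], P[1][0])           # P(0 -> 1) and P(1 -> 0) over t = 0.7
```

Each row of the resulting matrix sums to one, and multiplying the matrices for two consecutive time spans gives the matrix for their total, which is the Chapman-Kolmogorov property.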

2.2 Covarion model

The covarion model3, 4, 15 extends the CTMC model by allowing cognates to be in either

a ‘fast’ or ‘slow’ state. Hence, for the binary cognate present (1) and absent (0) data, there is a

fast 0, a fast 1, a slow 0 and a slow 1, totalling four states. The infinitesimal time rate matrix has

two rate parameters: the switch rate $s$, which determines the rate of moving from slow to fast and vice versa, and the transition rate $\alpha$, which determines the rate at which a slow 0 transitions into a slow 1 and vice versa. The rate at which a fast 0 transitions into a fast 1 and vice versa is fixed to 1. The base frequencies $(f_0, f_1)$ represent the proportions of 0s and 1s present at the stationary distribution, and the rate matrix $Q$ is defined as

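$$
\begin{array}{l}
\text{fast}\left\{\begin{array}{l} 0: \\ 1: \end{array}\right. \\[2pt]
\text{slow}\left\{\begin{array}{l} 0: \\ 1: \end{array}\right.
\end{array}
\begin{pmatrix}
- & 1 & s & 0 \\
1 & - & 0 & s \\
s & 0 & - & \alpha \\
0 & s & \alpha & -
\end{pmatrix}
\begin{pmatrix} f_0 \\ f_1 \\ f_0 \\ f_1 \end{pmatrix}
=
\begin{pmatrix}
- & f_1 & s f_0 & 0 \\
f_0 & - & 0 & s f_1 \\
s f_0 & 0 & - & \alpha f_1 \\
0 & s f_1 & \alpha f_0 & -
\end{pmatrix}
= Q \tag{6}
$$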

In our analyses, we used the following priors: a Gamma(0.5, 10) prior on the switch rate,

and a uniform(0,1) prior on the mutation rate. Hidden frequencies, which reflect the proportion

of fast and slow evolving sites, were fixed at (1/2,1/2) ensuring the BEAST implementation

forms a reversible substitution model, and cognate frequencies have a uniform(0.001,0.999)

prior to ensure numerical stability, though the estimated values never came close to these bound-

aries.
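
The construction of the covarion rate matrix can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, no overall rate normalisation is applied, and the stationary distribution used in the check assumes the hidden frequencies of (1/2, 1/2) mentioned above.

```python
def covarion_rate_matrix(s, alpha, f0, f1):
    """Build the 4x4 covarion matrix; state order is fast 0, fast 1, slow 0, slow 1."""
    base = [
        [0.0, 1.0, s, 0.0],
        [1.0, 0.0, 0.0, s],
        [s, 0.0, 0.0, alpha],
        [0.0, s, alpha, 0.0],
    ]
    freqs = [f0, f1, f0, f1]
    # Scale each column by the frequency of the state being entered.
    Q = [[base[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    # Diagonal entries are minus the sum of the other entries in the row.
    for i in range(4):
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)
    return Q

f0, f1 = 0.7, 0.3
Q = covarion_rate_matrix(s=0.2, alpha=0.1, f0=f0, f1=f1)

# Detailed balance check: pi_i * Q[i][j] == pi_j * Q[j][i] for all i, j,
# with pi = (f0/2, f1/2, f0/2, f1/2) under equal hidden frequencies.
pi = [f0 / 2, f1 / 2, f0 / 2, f1 / 2]
reversible = all(
    abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) < 1e-12
    for i in range(4) for j in range(4)
)
print(reversible)
```

The detailed balance check is one way to see why fixing the hidden frequencies at (1/2, 1/2) gives a reversible model in this parameterisation.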

2.3 Stochastic Dollo model

The stochastic Dollo model16–18 assumes each cognate can only arise once (with Poisson rate $\lambda$) but can be lost multiple times with death rate $\mu$. Once the cognate is lost, it cannot arise again. The infinitesimal rate matrix of this process, with rows and columns ordered as states 0 and 1, is

$$
Q = \begin{pmatrix} - & 0 \\ \mu & - \end{pmatrix}
$$

We used a uniform(0,1) prior on the death rate in our analyses.

3 Rate variation

We considered two forms of rate variation: variation across branches in the tree and vari-

ation across sites (here, cognate sets).

The strict clock model is the simplest model of rate variation across branches. This as-

sumes no rate variation and uses a single parameter, the clock rate c, which serves as a scale

factor for all branches in the tree. In our experiments, we used a uniform(0,1e-4) prior on c.

The upper bound is never reached in practice but does reduce the number of samples required

for burn-in. The uncorrelated relaxed clock model19 allows rate variation across branches by sampling a rate multiplier for each branch, with the rates drawn from a log-normal distribution with mean $c$ (the average clock rate) and standard deviation $\sigma$. Both $c$ and $\sigma$ were estimated, using a uniform(0,1e-4) prior on $c$ and an exponential prior with mean 1/3 on $\sigma$.
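
A minimal sketch of drawing per-branch rates under an uncorrelated log-normal relaxed clock is given below. It assumes that the standard deviation is specified in log space and that the mean $c$ is specified in real space; this parameterisation, the function name and the example values are assumptions for illustration, not details taken from the analysis files.

```python
import math
import random

def sample_branch_rates(n_branches, c, sigma, rng):
    """Draw one rate per branch from a log-normal with real-space mean c.

    sigma is taken to be the standard deviation of log(rate) (an assumption).
    """
    # For a log-normal, E[rate] = exp(mu + sigma^2 / 2), so solve for mu.
    mu = math.log(c) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]

rng = random.Random(1)
rates = sample_branch_rates(200000, c=5e-5, sigma=0.5, rng=rng)
mean_rate = sum(rates) / len(rates)
print(mean_rate)  # should be close to c = 5e-5
```

Setting `sigma` close to zero makes all branch rates nearly equal to `c`, which recovers the strict clock as a limiting case.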

We also considered rate variation across cognate sets. The rate used for a particular cog-

nate set on a particular branch is equal to the overall clock rate c times the branch specific rate

times the site rate. Previous applications of the CTMC model of cognate evolution have considered rate variation across cognate sets modelled using a gamma distribution3–5 with mean 1 and shape parameter $\alpha$, after the method proposed in ref. 20. We considered gamma-distributed rate variation under the CTMC model, using an exponential prior with mean 1 on $\alpha$. In addition, since meaning classes can evolve at different rates21, we compared model fit for a single rate across all meaning classes versus fitting one rate parameter for each meaning class, as described in ref. 2. When estimating separate rates for each meaning class we used a Dirichlet(1,...,1) prior over the 200 meaning classes in our data set.

4 Bayesian phylogeography

$D$ in Eq (1) above can include geographic information in addition to cognate data. We can add location data for each language $x_i$, represented by a location $\mathrm{pos}_i = (\mathrm{lat}_i, \mathrm{long}_i)$ with latitude $\mathrm{lat}_i$ and longitude $\mathrm{long}_i$, giving a vector $\mathrm{pos}_{1,\ldots,n}$ of locations. This can then be incorporated into the analysis alongside the cognate data, so that Eq (1) becomes

$$
P(T \mid D, \mathrm{pos}_{1,\ldots,n}, \theta) \propto P(T, \theta)\, P(D, \mathrm{pos}_{1,\ldots,n} \mid T, \theta) \tag{7}
$$

Consistent with previous phylogeographic modelling approaches3, 22, 23, we assume that the geographical dispersal process is independent of cognate evolution, hence $P(D, \mathrm{pos}_{1,\ldots,n} \mid T, \theta) = P(D \mid T, \theta)\, P(\mathrm{pos}_{1,\ldots,n} \mid T, \theta)$, so Eq (7) can be written as

$$
P(T \mid D, \mathrm{pos}_{1,\ldots,n}, \theta) \propto P(T, \theta)\, P(D \mid T, \theta)\, P(\mathrm{pos}_{1,\ldots,n} \mid T, \theta) \tag{8}
$$

and $P(\mathrm{pos}_{1,\ldots,n} \mid T, \theta)$ is calculated through Eq (2) in the Methods.

The likelihood of observing the set of tip locations $\mathrm{pos}_{1\ldots n}$ given a tree $T$, with precision $b$ governing the diffusion process, and other parameters $\theta$ (including branch lengths, cognate model parameters, and hyperparameters of priors) is

$$
p(\mathrm{pos}_{1\ldots n} \mid T, b, \theta) = \int_{\mathrm{pos}_{n+1}} \cdots \int_{\mathrm{pos}_{2n-1}} \prod_{i=1}^{2n-2} f(x_i = \mathrm{pos}_i \mid x_{\pi_i} = \mathrm{pos}_{\pi_i}, \theta, b)\; f(\mathrm{pos}_{2n-1} \mid \theta, b)\; d\mathrm{pos}_{n+1} \cdots d\mathrm{pos}_{2n-1} \tag{9}
$$

where the first density $f(x_i = \mathrm{pos}_i \mid x_{\pi_i} = \mathrm{pos}_{\pi_i}, \theta, b)$ represents the migration from parent node $x_{\pi_i}$ to node $x_i$ and the second density represents the root location prior.

Since this integral is intractable, we approximate it using MCMC by augmenting the state space with the locations of internal nodes in the tree, using the following density:

$$
p(\mathrm{pos}_{1\ldots n} \mid T, b, \theta) = \left( \prod_{i=1}^{2n-2} f(x_i = \mathrm{pos}_i \mid x_{\pi_i} = \mathrm{pos}_{\pi_i}, \theta, b) \right) f(\mathrm{pos}_{2n-1} \mid \theta, b) \tag{10}
$$

sampling the locations $\mathrm{pos}_{n+1}, \ldots, \mathrm{pos}_{2n-1}$ of all internal nodes in the tree.
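
The augmented density can be sketched as a sum of per-branch log densities plus a root term. The example below is a simplified stand-in, not the model used in the analysis: it assumes an isotropic Gaussian step on planar coordinates with variance equal to branch length divided by the precision $b$, and all function and variable names are hypothetical.

```python
import math

def log_gaussian_step(child, parent, branch_length, b):
    """Log density of a 2-D isotropic Gaussian step (assumed migration kernel)
    with variance branch_length / b along each axis."""
    var = branch_length / b
    d2 = (child[0] - parent[0]) ** 2 + (child[1] - parent[1]) ** 2
    return -math.log(2.0 * math.pi * var) - d2 / (2.0 * var)

def augmented_log_density(parent_of, branch_length, pos, b, log_root_prior):
    """Log of the product over all non-root nodes of the migration density,
    times the root location prior, given locations for every node."""
    root = next(node for node in pos if node not in parent_of)
    total = log_root_prior(pos[root])
    for node, parent in parent_of.items():
        total += log_gaussian_step(pos[node], pos[parent], branch_length[node], b)
    return total

# Toy tree with n = 2 tips (nodes 1 and 2) and root node 3 = 2n - 1.
parent_of = {1: 3, 2: 3}
branch_length = {1: 1.0, 2: 2.0}
pos = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 0.0)}  # root location is sampled by MCMC
value = augmented_log_density(parent_of, branch_length, pos, b=1.0,
                              log_root_prior=lambda p: 0.0)  # flat root prior (assumed)
print(value)
```

In an MCMC run the internal entries of `pos` would be updated by proposals, and this quantity would be re-evaluated for each proposed state.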

5 Matrix exponentiation

We used matrix exponentiation of the landscape-aware rate matrix $R$ to obtain the probability of arriving at node $j$ after time $t$ when starting in node $i$. Whilst there are well-known pitfalls of matrix exponentiation24, we found that eigendecomposition of $R$ into $L \Lambda M$, where $\Lambda$ is a diagonal matrix of eigenvalues and $L$ and $M$ contain the left and right eigenvectors, resulted in numerically stable exponentiation using $P(i \mid j, t) = \left[L e^{\Lambda t} M\right](i, j)$. Decomposing $R$ and then multiplying $L \Lambda M$ back together showed that the largest absolute difference for any entry did not exceed $10^{-12}$. If only entry $(i, j)$ of the matrix is required, this can be calculated in $O(N^2)$.
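
The eigendecomposition route can be illustrated on a toy two-node rate matrix whose decomposition is known in closed form. The matrix, the helper names and the convention used here (eigenvectors stored as columns of `L`, with `M` its inverse) are assumptions for this sketch; the landscape-aware matrix used in the analysis is far larger.

```python
import math

# Symmetric toy rate matrix with eigenvalues 0 and -2.
R = [[-1.0, 1.0], [1.0, -1.0]]
h = 1.0 / math.sqrt(2.0)
L = [[h, h], [h, -h]]       # eigenvectors as columns
eigenvalues = [0.0, -2.0]   # diagonal of Lambda
M = [[h, h], [h, -h]]       # inverse of L (equal to its transpose here)

def reconstruct(L, eigenvalues, M):
    """Multiply L * Lambda * M back together to check the decomposition."""
    n = len(eigenvalues)
    return [[sum(L[i][k] * eigenvalues[k] * M[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def transition_entry(L, eigenvalues, M, i, j, t):
    """Entry (i, j) of L * exp(Lambda * t) * M, using row i of L and column j of M."""
    return sum(L[i][k] * math.exp(eigenvalues[k] * t) * M[k][j]
               for k in range(len(eigenvalues)))

rebuilt = reconstruct(L, eigenvalues, M)
max_error = max(abs(rebuilt[i][j] - R[i][j]) for i in range(2) for j in range(2))
p_stay = transition_entry(L, eigenvalues, M, 0, 0, t=0.5)
print(max_error, p_stay)
```

The reconstruction error plays the same role as the check described above, and `p_stay` can be compared with the analytic value $(1 + e^{-2t})/2$ for this toy matrix.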

6 MCMC proposal

In addition to the default operators in BEAST25 we developed several novel proposal

mechanisms to improve the efficiency with which the MCMC algorithm explores the state

space. First, we introduced additional nearest-neighbour interchange and subtree-prune-regraft

proposals that only propose changes to the tree involving nodes above the sub-family level specified by the monophyletic constraints on the sub-family (see Supplementary Table 8). Second, we introduced a meta-operator that works by first applying any proposal that changes the tree topology. Then, for every internal node $x_i$ that has the potential of having been affected by the topology proposal, we randomly sample $h_i$ such that the location of $x_i$ is randomly assigned to one of its children. The Hastings ratio1, 26 for the meta proposal (which corrects for biases of the random walk) is 1, so the Hastings ratio for the combined tree topology and meta proposal is the same as the Hastings ratio of the tree topology proposal. Finally, we added an operator that randomly selects a node, then randomly samples $h_i$. We otherwise used default operators in BEAST25 for the tree and parameters of the covarion, clock model and tree prior (see BEAST XML files for details).
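
One possible reading of the location-resampling step in the meta-operator is sketched below: each affected internal node takes the location of one of its children, chosen uniformly at random. The data structures, the function name and the requirement that affected nodes are visited children-first are assumptions of this example rather than details of the BEAST operators.

```python
import random

def resample_locations(affected, children, pos, rng):
    """For each affected internal node, draw h uniformly and copy the location
    of the chosen child. `affected` is assumed to list children before parents."""
    h = {}
    for node in affected:
        h[node] = rng.randrange(len(children[node]))
        pos[node] = pos[children[node][h[node]]]
    return h

# Toy tree: tips 1-3, internal node 4 = (1, 2), root 5 = (4, 3).
children = {4: (1, 2), 5: (4, 3)}
pos = {1: (-20.0, 130.0), 2: (-25.0, 135.0), 3: (-30.0, 140.0), 4: None, 5: None}
h = resample_locations([4, 5], children, pos, random.Random(7))
print(h, pos[4], pos[5])
```

Because each choice is uniform and independent of the current state, the forward and reverse moves are equally likely, which is consistent with a Hastings ratio of 1 for this step.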

7 Phylogeography based on standard Brownian diffusion

In the main text we introduce and report results based on a new founder-dispersal model

of language expansion, in which one lineage migrates while the other remains. Standard dif-

fusion based phylogeographic models3, 9–11, 22 assume that, following a lineage split, descendent

lineages disperse at equal rates. This assumption is biased towards a posterior distribution on

ancestral nodes (and hence the origin) near the centre of the geographic range of the descen-

dent languages, inconsistent with proposed Pama-Nyungan homelands. However, in order to

ensure our findings are not contingent on the assumptions of the founder-dispersal model, we

repeated the main set of analyses using the standard Brownian diffusion model. This model,

described in detail in Bouckaert et al3, has previously been applied to test hypotheses regarding

the expansion of Indo-European3, Arawakan10, Ainu9 and Bantu11 language families. In order

to correct for distortion due to projection onto a plane, we implement this model on a sphere23. Supplementary Fig. 2 and Supplementary Table 5 show that our findings remain essentially

unchanged under the standard Brownian diffusion model, indicating our analysis is robust to

variation in these spatial diffusion model assumptions. Bayes Factors again reveal support for

the origin implied under the rapid replacement hypothesis over the three alternative hypotheses.

8 Robustness to errors in cognate coding

In order to evaluate the robustness of our findings to errors in cognate assignment, we

injected noise into the binary data matrix in the form of both false negative and false positive

errors. False negatives are cases in which a truly cognate form is wrongly judged to be non-cognate. False positives are cases in which words from two distinct cognate sets are wrongly

judged to be cognate. We model false negatives by randomly selecting 1’s in the binary version

of our matrix and reassigning them to be falsely non-cognate 0’s. False positives are modelled

by randomly selecting a cognate set in the binary matrix and merging it with another cognate

set for the same meaning class to form a single cognate set that is the union of the two previ-

ously distinct sets. We introduced false negatives and false positives at rates of 5%, 10% and

15%, and analysed four replicates of each of these using the same procedure as in our main

analysis. Supplementary Table 7 shows Bayes Factor support for H1 (the rapid replacement

hypothesis) remains consistent across all of these datasets. These additional analyses, together

with the low observed error count and general agreement between our 2012 (ref. 6) and current Pama-Nyungan datasets (Supplementary Note 2; Supplementary Fig. 3), demonstrate the robustness of our findings to any potential errors in the cognate coding.
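
The two kinds of noise described in this section can be sketched on a toy data set as follows. The representation (each meaning class mapped to a list of cognate sets, each a set of language names), the function names and the interpretation of the error rate as a fraction of the 1s in the binary matrix are assumptions made for illustration.

```python
import random

def inject_false_negatives(cognate_sets, rate, rng):
    """Remove a fraction `rate` of (language, cognate set) memberships,
    which corresponds to turning 1s into 0s in the binary matrix."""
    memberships = [(meaning, k, lang)
                   for meaning, sets in cognate_sets.items()
                   for k, members in enumerate(sets)
                   for lang in sorted(members)]
    n_flip = round(rate * len(memberships))
    for meaning, k, lang in rng.sample(memberships, n_flip):
        cognate_sets[meaning][k].discard(lang)
    return n_flip

def inject_false_positive(cognate_sets, meaning, rng):
    """Merge two randomly chosen cognate sets of one meaning class into their union."""
    sets = cognate_sets[meaning]
    i, j = rng.sample(range(len(sets)), 2)
    merged = sets[i] | sets[j]
    cognate_sets[meaning] = [s for k, s in enumerate(sets) if k not in (i, j)] + [merged]
    return merged

rng = random.Random(42)
data = {
    "hand": [{"A", "B", "C"}, {"D", "E"}],
    "water": [{"A", "B"}, {"C"}, {"D", "E"}],
}
n_flipped = inject_false_negatives(data, rate=0.1, rng=rng)
merged = inject_false_positive(data, "water", rng)
total = sum(len(s) for sets in data.values() for s in sets)
print(n_flipped, total, len(data["water"]))
```

Repeating this with different random seeds gives independent noisy replicates of the same underlying data.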

Supplementary Note 1 - Correlations between the archaeological record

and our Pama-Nyungan tree

In order to provide an approximate absolute time scale, we used the Wati separation and

subsequent divergence to calibrate rates of cognate replacement in our analysis. We selected

this calibration for several reasons. First, the Wati group is well-defined, providing a clear

point on the tree to attach a calibration. Second, the calibration is made more secure by the

fact that the Western Desert is one of the best studied areas of Australia in terms of Holocene

and Pleistocene settlement. Further, our constraints are conservative in that they span a broad

date range and are based on work by scholars whose own speculation regarding the origin of

Pama-Nyungan as a whole is earlier than is supported by our analysis27–30, supporting the use

of the Wati calibration as an independent a priori assumption and highlighting the value of

a quantitative, model-based approach to historical inference. Genetic evidence has also been

argued to support a relatively late presence of Wati languages in the Western Desert31, 32, further

supporting this choice of calibration and making it more difficult to argue for a substantially

older age for the group (which would be required to increase support for hypotheses 2, 3 and

4).

As with any absolute chronology derived using phylogenetic inference, our time estimates

are dependent on the chosen rate calibrations. We reviewed the literature for additional calibra-

tion points and evaluated a number of other potential links between the archaeological record

and our language phylogeny. We did not identify further calibration points that ex ante provide

uncontroversial links between archaeology and language, due to ambiguity in the archaeologi-

cal record about the timing and movement of people and/or ambiguity about how any signature

in the archaeological record could be tied to specific nodes in the tree, including uncertainty

in the tree itself. However, there are numerous points in the archaeological record that can be

aligned ex post with our linguistic findings. These multiple connections in multiple, diverse,

regions of the country, based on data that were not considered in our initial analysis, lend further

support to our inferred chronology. In addition, these links provide insight into the linguistic

and cultural identity of the peoples represented in the archaeological record.

In the following sections we discuss the most promising links between the archaeological

record and linguistic subgroups occupying different parts of the country, as well as the implications of our dated findings for interpreting these potential links. All clade ages are means

derived from the maximum clade credibility tree (Supplementary Fig. 4).

Central Australia

Karnic – Hercus and Clark33 state for the Southeastern Simpson Desert that sites are less

than 5000 years old - Veth34 quotes 2840 ± 80 BP but does not specify which site. This region

is contemporary Wangkangurru Country, in the western part of the Karnic area and could be

associated with the spread of Karnic or Western Karnic, unless the groups diversified elsewhere

and the Wangkangurru expansion into the Simpson Desert is recent. We infer an age for the

breakup of Western Karnic of 2800 years.

Thura-Yura – Walshe35 finds dates from 1500–600BP for stone hearths in Adnyamath-

anha territory in the Flinders Ranges. The site has been disturbed, making it a controversial

calibration point, but these dates are consistent with our findings, where the most recent common ancestor of Adnyamathanha and its nearest neighbour Parnkala is 1552 years old.

Arandic – There is evidence for human incursion into the north-west edge of Simpson

Desert (Therreyerte), base layer estimated at 3,040 BP (Smith, 198836, p279, cited in Smith,

201328, p112), in contemporary Arandic country. This could correspond to the breakup of

Arandic but has also been linked to the Karnic subgroup Arabana-Wangkangurru (cf. discussion

in 37). The archaeological dates fall between our inferred breakup of Arandic and Karnic (c. 4800 BP) and the later breakup of Arandic (c. 1900 years).

Warluwaric – The Bunnengalla 1 site on Musselbrook Creek, near Burketown, is described by Slack et al.38. They claim that ‘the site provides a record of Late Holocene occupation from at least 6000 years BP with a considerable increase in occupation debris from 1300 BP’. Bunnengalla 1 (in the Boodjamulla National Park) is on the border of Waanyi (Garrwan, Non-Pama-Nyungan) and Warluwaric (Wakaya) territory, 250 km northwest of our hypothesised origin point for the Pama-Nyungan dispersal. This archaeological finding puts people in the relevant area for the initial separation of the Warluwaric group c. 5500BP, and is broadly consistent with a separation of Wakaya from its closest relative c. 1150BP.

Eastern Australia

Paman – Turney and Hobbs39 identify an increase in human-based activity in inland

Queensland sites (defined as >1km from the coastline) after 4860BP (±15 years). They also

suggest (p1745) that this is ‘possibly as a result of a significant expansion in the Aboriginal pop-

ulation. Prior to this time, relatively little activity is recognised, which we interpret to reflect a

low population density. Activity along coastal locations occurs significantly later in time, how-

ever.’ This is consistent with our tree, which has the entrance of speakers of Paman languages

into Queensland after 5076 years BP. David and Cole40 (p801-802) in a more localised study,

identify a transition of rock art style (to simplify, from engraving styles to ochre paintings) in

the eastern Cape York Peninsula region between 3000 and 2000 years ago. This coincides with

the break-up of the main clades of southeastern Paman.

Durubalic – Walters41 finds intensification of use of marine resources in the Moreton Bay

region starting about 1000 years ago. This is highly consistent with the breakup of the Durubalic

subgroup in the region (the Durubalic group is dated to approximately 1018 BP in our consensus

tree).

Yuin-Kuri – Hiscock42 discusses a ‘dramatic increase’ in archaeological finds between

4000BP and 2000 BP in the Hunter River Valley, pointing to increased population in that region.

This is consistent with our estimate for the separation of the Yuin-Kuri lineage 3700BP and

subsequent breakup from 2500BP. It would also fit with the expansion of Yuin-Kuri from north

to south, and the break-up of Central NSW (the sister subgroup to Yuin-Kuri) and Yuin-Kuri

around this time.

Lower Murray and Lake Mungo – The Willandra Lakes region, including Lake Mungo,

has featured prominently in the literature on Pleistocene Australian Aboriginal life, because

of the wealth of footprints preserved in lake sedimentation. Fitzsimmons, Stern and Murray-

Wallace43 find a hearth at Lake Mungo dated to 5-3.2Kya, and note hearths from other Willandra

Lakes sites from about the same period, with occupation extending to the present. This overlaps

with the time (4kya) proposed by our tree for the breakup of Lower Murray and other Victorian

subgroups; assuming a migration path down the Murray River and then into Victoria (consistent

with the order of branching for these languages) would place the relevant group in this region

at the relevant time.

St George et al.44 provide new dates for the coastal middens at Long Point in the Coorong

(contemporary Ngarrindjeri territory). They find no evidence for middens older than 2500 years.

Our dates suggest an older breakup of the Lower Murray group – 3300 years. However, given

that St George et al’s findings are at the southern end of the Lower Murray region, it is possible

that the difference in dates reflects the time that speakers took to move to the Coorong from

further north; that is, that the Lower Murray subgroup broke up as speakers migrated down the

Murray River, beginning about 3300 years ago and reaching the Coorong by 2500 years ago.

Western Australia

Pilbara groups – the Mandu Mandu rock shelter has both Pleistocene and Terminal Holocene

dates (the latter is most relevant; after 2,420 ± 50 BP; 34, p110). The linguistic affiliation of

these people is unclear - for example they could represent the Pilbara and SW Western Aus-

tralian clades as a whole or just the Kartu subgroup. Veitch, Hook and Bradshaw45 describe ‘an

abundance’ of archaeological finds dating to within the last 2000 years BP for the Hamersley

Plateau, which could correspond to the Ngayarta breakup or Pilbara languages more generally.

Our findings support the former - we infer a separation of the Ngayarta languages after 2400

BP and subsequent break up after 1900 BP.

Yolngu and Offshore Islands

Yolngu – Bourke et al.46 review archaeological findings in the Arnhem Land region, in both contemporary Pama-Nyungan (Yolngu) speaking areas (Blue Mud Bay) and non-Pama-Nyungan areas. They find evidence for human habitation in coastal areas from 3500 years ago (Blyth River, Central Arnhem Land) and 3000 years ago (Blue Mud Bay). These dates are more recent than the separation of Yolngu inferred in our trees (around 5300 years ago); the breakup of the Yolngu languages is dated to about 2200 years ago.

Sir Edward Pellew Islands – Vanderlin Island in the Sir Edward Pellew Group in the Gulf

of Carpentaria is currently inhabited by Yanyuwa language speakers. Sim and Wallace47 note

a hiatus in occupation of the island group following the marine transgression (6700BP) until

4200BP, suggesting a possible arrival of Pama-Nyungan (possibly Yanyuwa or a parent group)

at this time. However, a later hiatus from 2500BP to 1700BP could also represent Yanyuwa

arrival. The islands are also relatively close to the coast (just 2km), and Yanyuwa is spoken on

the mainland. Given the ease of access, periods of occupation or apparent abandonment on the

island do not necessitate the presence or absence of the Yanyuwa language on the mainland. We

infer an age for the separation of Yanyuwa of c. 3400BP.

Torres Strait Islands – The Torres Strait Islands are inhabited by mainly Pama-Nyungan

speakers, including speakers of the KKY (Kalaw Kawaw Ya), KLY (Kala Lagaw Ya) and

Mabuiag varieties. The Eastern Torres Strait islands are inhabited by speakers of Meryam

Mir, a Papuan (Eastern Trans Fly) language. The Western Torres languages form a clade in

our analysis; however, the timing and identity of colonisers is unclear. Initial dating indicated

occupation from 2500BP48, but more recent work has revealed evidence for earlier occupation.

For example, western Torres Strait occupation may date to more than 7000BP, with a hiatus

between 3000-1800BP49. Complicating claims about the identity of any colonists, the Torres

Strait settlement is also thought to have been influenced by an Austronesian incursion from

3500BP50 and Papuan maritime horticultural people between 3800 and 2600BP51. The islands

were also a way point for trade between Papua New Guinea and the Australian mainland and

their low elevation means settlements have long been at risk from cyclones. The general picture

then is that there is not a secure link that can be made between nodes on the Pama-Nyungan

tree and any particular arrival event in the archaeological record on these islands. Our anal-

ysis suggests that Western Torres split from its nearest neighbour about 3650 years ago, with

the internal diversification of Western Torres varieties being much more recent (a few hundred

years).

Wellesley Islands – The Wellesley Islands in the Gulf of Carpentaria are currently inhab-

ited by languages from the Tangkic sub-group. Our sample includes the Kayardild language

on Bentinck Island (South Wellesley Islands) and the Lardil and Yangkaal languages on Morn-

ington and Forsyth Islands respectively (North Wellesley Islands). The earliest evidence of

occupation on the islands goes back more than 3000 years, which could represent the initial

separation of these languages from their mainland relatives. However, evidence of settlement

increases from 2000BP (particularly at Mornington Island) and most sites occur within the last

300-500 years52, raising the possibility that the current inhabitants arrived more recently. This

is further complicated by proposals for an earlier origin of the Tangkic group as a whole on

Mornington Island, with back migration to the mainland and subsequent recolonisation of the

South Wellesleys52. This makes it difficult to assign an age to any particular split in the group.

Our analysis indicates Kayardild and Yangkaal diverged very recently, between 244 and 480 BP.

This is significantly younger than the earliest archaeological dates and in line with colonisation coinciding with the increased occurrence of sites from 300-500BP.

Supplementary Note 2 - Comparison between our tree topology and prior

work

Here we compare our cognate assignments and tree topology to Bowern and Atkinson’s6

Bayesian phylogenetic classification of 194 Pama-Nyungan languages, based on an earlier ver-

sion of the data and analysed without spatial or temporal information. In addition to high-

lighting noteworthy features of the new tree, this comparison allows us to quantify the error

rate in cognate judgments and evaluate the robustness of our inferences following five years

of improvements to the data (as well as increased language coverage and more realistic, tem-

porally and spatially explicit modelling assumptions). While there are numerous classifica-

tions of Pama-Nyungan languages (see Koch53 for summary and discussion), only Bowern and

Atkinson6 provides an explicit proposal for groupings beyond the approximately 28 lower level

subgroups that are now established in the literature. These 28 groupings are generally agreed;

even Dixon30, whose classification rejects the notion of a Pama-Nyungan family and who does

not accept the validity of genetic classifications in many cases, uses mostly the same lower level

groups (calling some linguistic areas, others ‘small families’, without providing evidence).

First, we consider the difference in data/coding between the 2012 and current analyses.

111 languages were added between the two studies, as new data became available in Chirila54,

especially for languages in the Kulin and Paman subgroups. A number of additional wordlists

from 19th century sources were also included, where the degree to which they differed from

other languages in the region was not certain. 17 additional cognate meaning classes were also

coded, adding to the number of forms used for each language. 200 meaning classes were used

in the final analysis (compared to 187 in the 2012 analysis). Due to improvements in the Chirila

database over the 5 years between the present and the 2012 analysis, more forms were available

for languages already in the tree. Among the 38,570 cognate sets represented by languages that

occur in both datasets, 1209 ‘missing’ cognate codes were replaced with actual forms. In ad-

dition, 1034 changes to individual existing cognate codes were made. These changes corrected

a combination of typographical errors, cognate coding errors (spurious similarities which were

detected in the light of more data), updates to understanding about cognacy markers in the lan-

guages (that is, previously overlooked cognates), and a more consistent treatment of marginal

judgement cases. In summary, leaving aside the addition of new languages and meaning classes, 5.8% of the data changed between 2012 and 2017: 2.7% through the correction of errors and the remaining 3.1% through the addition of previously missing data.
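
The percentages in this summary follow from the counts given above, as the short calculation below shows. It assumes that the 38,570 figure is the denominator for both kinds of change; the variable names are for illustration only.

```python
n_codes = 38570        # cognate codes for languages present in both datasets
filled_missing = 1209  # 'missing' codes replaced with actual forms
corrected = 1034       # existing codes that were changed

pct_filled = 100 * filled_missing / n_codes
pct_corrected = 100 * corrected / n_codes
pct_total = pct_filled + pct_corrected

print(f"{pct_filled:.1f}% filled, {pct_corrected:.1f}% corrected, {pct_total:.1f}% changed")
# prints "3.1% filled, 2.7% corrected, 5.8% changed"
```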

Next, we compare the 2012 tree topology to the topology of our current tree - the 2012

analysis did not include time estimates or a spatial component, so this aspect of the analysis can-

not be compared. Supplementary Fig. 3 plots the two maximum clade credibility trees facing

one another with tips aligned. The tree comparison figure was made using the plot.cophylo

command in the R package phytools55. This function takes two phylogenetic trees in nexus

format and optimally positions nodes so as to align the tips, making visual comparison of phy-

logenies more straightforward.

The two trees are highly consistent in most respects. The low-level subgroups are all

recovered, as are the well supported intermediate groupings. Bowern and Atkinson6 (here-

after B&A2012) found five main groups of Pama-Nyungan languages. Northern, Southern,

and Eastern were well supported; Western and Central were less well supported (0.88), and the combined Western and Central clade did not receive strong support (0.54); neither did the (Northern(Western,Central)) clade (also 0.54). Within Western Pama-Nyungan, two groups (Yolngu

and Warluwaric) split first, with low support, while the rest of Western Pama-Nyungan received

strong posterior support. B&A2012 described these findings but did not regard the largest

groupings as conclusively demonstrated. We note that all previous trees of Pama-Nyungan

were at best agnostic about higher level structure, with the exception of O’Grady’s proposal

for Nyungic (broadly, what we call Western Pama-Nyungan). While Bowern and Atkinson re-

garded their findings as rebutting ‘rake’ models of Pama-Nyungan (e.g., 56) in toto, conclusions

about the initial breakup of the family were tentative. Below we discuss differences across each

region.

The Southern group is congruous in the B&A2012 and current trees, with the exception

of the internal structure of Kulin. It is not surprising that adding 9 Kulin varieties to the sample

changed the internal classification of this group. The current classification more closely reflects

Blake’s57 classification, which was based mostly on morphological and phonological evidence.

There are several changes among the Northern group. Guwa moves out of the group. Its

nearest phylogenetic neighbor, Yanda, was not included in the 2012 tree; Yanda, Guwa, and

Badjiri now form a group within Karnic, with reasonable support (0.75). In the 2012 tree,

Karnic was not monophyletic, with Eastern Karnic languages a sister to Yardli (compare 58). In

the current tree, Yardli is a sister to an expanded Karnic that also includes the ‘Karnic fringe’

(see 59 for the term) languages Yanda, Guwa, and Badjiri. Bowern has previously59 argued that

Badjiri is not Karnic but this was based primarily on pronoun and scant morphological case

data. A full lexical comparison had not been undertaken at that time. We consider this result

suggestive but not proven.

Compared to B&A2012, Kalkatungic (Kalkatungu and Yalarnnga) moves out of the Northern group to be a sister (with Warluwaric) of the Central + Western languages, but with low support. Lexical data are not decisive for classification here; there are few cognates shared with other groups, and those that are shared provide conflicting evidence for classification. For example, Kalkatungu has rnuku ‘ankle’, shared with languages of both the Pama-Maric and Ngayarta groups (Hale (60) reconstructed *nukal for Proto-Paman, for example), but we are not aware of other groups that share this word in this meaning. Another example is Yalarnnga tatya ‘bite’, shared with Yardli, Wangkumara, and Paakintyi. These languages are too far away for this word to be a loan, but the word is not found elsewhere. Other cognates, like ngama ‘breast’, are so common across Pama-Nyungan that they are not diagnostic for subgrouping. In this case, the low posterior support for the placement of this group accurately reflects the difficulties in classification based on these data.

The Central group of B&A2012 does not appear in the current tree. Instead, the three groups that comprised it (Arandic, Thura-Yura, and a macro-Karnic-Yardli group) are successive sisters to Western Pama-Nyungan. These nodes are very poorly supported (posterior support 0.26–0.36). We consider this question unresolved at present, pending more detailed analysis of these groups, with additional lexical and other material.
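As a reading aid for the support values quoted in this section, the sketch below illustrates how a posterior clade support can be understood: as the fraction of trees in a posterior sample in which a given set of languages forms a clade. It is a minimal illustration only, not the pipeline used for the analyses reported here. The nested-tuple tree encoding, the function names `clades` and `clade_support`, and the four toy trees with tips A to D are all assumptions made for the example.

```python
def clades(tree):
    """Return (tips, found) for a tree given as nested tuples of tip names.

    tips  : frozenset of all tip labels below this node
    found : set of frozensets, one per internal node (clade) in the subtree
    """
    if isinstance(tree, str):          # a tip: no internal clades below it
        return frozenset([tree]), set()
    tips, found = frozenset(), set()
    for child in tree:
        child_tips, child_found = clades(child)
        tips |= child_tips
        found |= child_found
    found.add(tips)                    # the clade defined by this node
    return tips, found


def clade_support(sampled_trees, group):
    """Fraction of sampled trees in which `group` forms a clade."""
    target = frozenset(group)
    hits = sum(1 for tree in sampled_trees if target in clades(tree)[1])
    return hits / len(sampled_trees)


# Hypothetical posterior sample of four topologies (toy data, not from the study).
sample = [
    ("A", (("B", "C"), "D")),
    ("A", ("B", ("C", "D"))),
    ("A", (("B", "C"), "D")),
    (("A", "B"), ("C", "D")),
]

print(clade_support(sample, ["B", "C"]))       # B and C grouped in 2 of 4 trees
print(clade_support(sample, ["B", "C", "D"]))  # B, C and D grouped in 3 of 4 trees
```

In this toy sample the grouping of B and C is recovered in only half of the trees, which is the kind of situation that values such as 0.26–0.36 describe: the data do not consistently favor one arrangement over the alternatives.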

The internal grouping of Western Pama-Nyungan changes as follows. In B&A2012, the Yolngu and Warluwaric subgroups were sisters in the first clade to split from Western Pama-Nyungan (posterior probability of 0.66). In the 2017 tree, Yolngu is the first branch to split from the rest of Pama-Nyungan. Warluwaric splits from the Western-Central group several nodes down, but still very early in the breakup of the family.

A finding noted in B&A2012, but not studied directly, was the striking congruence between the sequence of subgroup and language breakup in the tree and the geographical distribution of languages. That is, in a number of different parts of the country, the tree is consistent with a migration/spread along the coast or inland waterways, with the languages splitting as they spread. Examples include the North-to-South axis of Yuin-Kuri, the South-to-North axis of Gumbaynggir-Bandjalangic-Durubalic-Waka-Kabic, the East-to-West split within Lower Murray (along the Murray River), the North-to-South breakup of Karnic along the Diamantina and Bulloo Rivers, the West-to-Northeast spread of Thura-Yura inland, and the North-to-South spread of Western Pama-Nyungan. Note crucially that these inferences were made in the 2012 paper without any information from geography. Such results attest to the strong geographic signal in the data.

Supplementary References

1. Drummond, A. J. & Bouckaert, R. R. Bayesian evolutionary analysis with BEAST (Cambridge University Press, Cambridge, 2015).

2. Chang, W., Cathcart, C., Hall, D. & Garrett, A. Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Language 91, 194–244 (2015).

3. Bouckaert, R. et al. Mapping the origins and expansion of the Indo-European language family. Science 337, 957–960 (2012).

4. Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).

5. Gray, R. D. & Atkinson, Q. D. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426, 435–439 (2003).

6. Bowern, C. & Atkinson, Q. Computational phylogenetics and the internal structure of Pama-Nyungan. Language 88, 817–845 (2012).

7. Kitchen, A., Ehret, C., Assefa, S. & Mulligan, C. J. Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proceedings of the Royal Society of London B: Biological Sciences DOI: 10.1098/rspb.2009.0408 (2009).

8. Lee, S. & Hasegawa, T. Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages. Proceedings of the Royal Society of London B: Biological Sciences DOI: 10.1098/rspb.2011.0518 (2011).

9. Lee, S. & Hasegawa, T. Evolution of the Ainu language in space and time. PLoS ONE 8, e62243 (2013).

10. Walker, R. S. & Ribeiro, L. A. Bayesian phylogeography of the Arawak expansion in lowland South America. Proceedings of the Royal Society B: Biological Sciences 278, 2562–2567 (2011).

11. Grollemund, R. et al. Bantu expansion shows that habitat alters the route and pace of human dispersals. Proceedings of the National Academy of Sciences 112, 13296–13301 (2015).

12. Felsenstein, J. Inferring phylogenies, vol. 2 (Sinauer Associates, Sunderland, 2004).

13. Gernhard, T. The conditioned reconstructed process. Journal of Theoretical Biology 253, 769–778 (2008).

14. Stadler, T., Kühnert, D., Bonhoeffer, S. & Drummond, A. J. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of the National Academy of Sciences 110, 228–233 (2013).

15. Tuffley, C. & Steel, M. Modeling the covarion hypothesis of nucleotide substitution. Mathematical Biosciences 147, 63–91 (1998).

16. Nicholls, G. K. & Gray, R. D. Dated ancestral trees from binary trait data and their application to the diversification of languages. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 545–566 (2008).

17. Alekseyenko, A. V., Lee, C. J. & Suchard, M. A. Wagner and Dollo: a stochastic duet by composing two parsimonious solos. Systematic Biology 57, 772–784 (2008).

18. Atkinson, Q., Nicholls, G., Welch, D. & Gray, R. From words to dates: water into wine, mathemagic or phylogenetic inference? Transactions of the Philological Society 103, 193–219 (2005).

19. Drummond, A. J., Ho, S. Y. W., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol 4, e88 (2006).

20. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution 39, 306–314 (1994).

21. Pagel, M., Atkinson, Q. D. & Meade, A. Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449, 717–720 (2007).

22. Lemey, P., Rambaut, A., Welch, J. J. & Suchard, M. A. Phylogeography takes a relaxed random walk in continuous space and time. Mol Biol Evol 27, 1877–1885 (2010).

23. Bouckaert, R. Phylogeography by diffusion on a sphere: whole world phylogeography. PeerJ 4, e2406 (2016).

24. Moler, C. & Van Loan, C. Nineteen dubious ways to compute the exponential of a matrix. SIAM Review 20, 801–836 (1978).

25. Bouckaert, R. R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).

26. Green, P. J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995).

27. Veth, P. Islands in the interior: a model for the colonization of Australia’s arid zone. Archaeology in Oceania 24, 81–92 (1989).

28. Smith, M. The Archaeology of Australia’s Deserts (Cambridge University Press, 2013).

29. Dixon, R. M. W. The Australian linguistic area. In Aikhenvald, A. Y. & Dixon, R. M. W. (eds.) Areal diffusion and genetic inheritance: Problems in comparative linguistics, 64–104 (Oxford University Press, Oxford / New York, 2001).

30. Dixon, R. M. W. Australian languages: Their nature and development, vol. 1 (Cambridge University Press, Cambridge, 2002).

31. Birdsell, J. B. Microevolutionary patterns in Aboriginal Australia: a gradient analysis of clines (Oxford University Press, USA, New York, 1993).

32. Veth, P. Origins of the Western Desert language: convergence in linguistic and archaeological space and time models. Archaeology in Oceania 35, 11–19 (2000).

33. Hercus, L. & Clarke, P. Nine Simpson Desert wells. Archaeology in Oceania 21, 51–62 (1986).

34. Veth, P. M. Islands in the interior: the dynamics of prehistoric adaptations within the arid zone of Australia, vol. 3 (Intl Monographs in Prehistory, 1993).

35. Walshe, K. Aboriginal Occupation at Hawker Lagoon, Southern Flinders Ranges, South Australia. Australian Archaeology 60, 24–33 (2005).

36. Smith, M. A. Central Australian Seed Grinding Implements and Pleistocene Grindstones. In Meehan, B. & Jones, R. (eds.) Archaeology with Ethnography, 94–108 (Dept. of Prehistory, Research School of Pacific Studies, Australian National University, Canberra, 1988).

37. Hercus, L. A. A grammar of the Arabana-Wangkangurru language: Lake Eyre Basin, South Australia (1994).

38. Slack, M., Fullagar, R., Border, A., Diamond, J. & Field, J. Late Holocene Occupation at Bunnengalla 1, Musselbrook Creek, Northwest Queensland. Australian Archaeology 54–58 (2005).

39. Turney, C. & Hobbs, D. ENSO influence on Holocene Aboriginal populations in Queensland, Australia. Journal of Archaeological Science 33, 1744–1748 (2006).

40. David, B. & Cole, N. Rock art and inter-regional interaction in northeastern Australian prehistory. Antiquity 64, 788–806 (1990).

41. Walters, I. Intensified fishery production at Moreton Bay, southeast Queensland, in the late Holocene. Antiquity 63, 215–224 (1989).

42. Hiscock, P. Technological Change in the Hunter River Valley and the Interpretation of Late Holocene Change in Australia. Archaeology in Oceania 21, 40–50 (1986).

43. Fitzsimmons, K. E., Stern, N. & Murray-Wallace, C. V. Depositional history and archaeology of the central Lake Mungo lunette, Willandra Lakes, southeast Australia. Journal of Archaeological Science 41, 349–364 (2014).

44. St George, C. et al. Radiocarbon dates for coastal midden sites at Long Point in the Coorong, South Australia. Australian Archaeology 77, 141–147 (2013).

45. Veitch, B., Hook, F. & Bradshaw, E. A Note on Radiocarbon Dates from the Paraburdoo, Mount Brockman and Yandicoogina Areas of the Hamersley Plateau, Pilbara, Western Australia. Australian Archaeology 60, 58–61 (2005).

46. Bourke, P., Brockwell, S., Faulkner, P. & Meehan, B. Climate variability in the mid to late Holocene Arnhem Land Region, North Australia: Archaeological archives of environmental and cultural change. Archaeology in Oceania 42, 91–101 (2007).

47. Sim, R. & Wallis, L. Northern Australian offshore island use during the Holocene: the archaeology of Vanderlin Island, Sir Edward Pellew Group, Gulf of Carpentaria. Australian Archaeology 67, 95–106 (2008).

48. Barham, A. et al. Late Holocene maritime societies in the Torres Strait Islands, northern Australia – cultural arrival or cultural emergence? In East of Wallace’s Line: Studies of past and present maritime cultures of the Indo-Pacific region, 223–314 (AA Balkema, 2000).

49. Wright, D. & Jacobsen, G. Further radiocarbon dates from Dabangay, a mid- to late Holocene settlement site in western Torres Strait. Australian Archaeology 76, 79–83 (2013).

50. David, B. et al. Badu 15 and the Papuan-Austronesian settlement of Torres Strait. Archaeology in Oceania 39, 65–78 (2004).

51. McNiven, I. J. et al. Mask Cave: Red-slipped pottery and the Australian-Papuan settlement of Zenadh Kes (Torres Strait). Archaeology in Oceania 41, 49–81 (2006).

52. Memmott, P., Round, E., Rosendahl, D. & Ulm, S. Fission, fusion and syncretism: linguistic and environmental changes amongst the Tangkic people of the southern Gulf of Carpentaria, northern Australia. Land and Language in Cape York Peninsula and the Gulf Country. Culture and Language Use 18, 105–136 (2016).

53. Koch, H. & Nordlinger, R. Historical relations among the Australian languages: genetic classification and contact-based diffusion. In The Languages and Linguistics of Australia: A comprehensive guide, 23–90 (Walter de Gruyter GmbH & Co KG, Berlin, 2014).

54. Bowern, C. Chirila: Contemporary and Historical Resources for the Indigenous Languages of Australia. Language Documentation and Conservation 10, 1–45 (2016).

55. Revell, L. J. Phytools: an R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution 3, 217–223 (2012).

56. Bellwood, P. Prehistoric cultural explanations for widespread language families. In McConvell, P. & Evans, N. (eds.) Archaeology and linguistics: Aboriginal Australia in global perspective, 123–134 (Oxford University Press, Melbourne, 1997).

57. Blake, B. Kulin and its neighbours (2011). Manuscript, La Trobe University.

58. Hercus, L. & Austin, P. The Yarli languages. In Bowern, C. & Koch, H. (eds.) Australian languages: Classification and the comparative method, no. 249 in Current Issues in Linguistic Theory, 179–206 (John Benjamins, Amsterdam, 2004).

59. Bowern, C. Karnic classification revisited. In Simpson, J., Nash, D., Laughren, M. & Alpher, B. (eds.) Forty years on: Ken Hale and Australian languages, 245–61 (Pacific Linguistics, 2001).

60. Hale, K. L. Classification of northern Paman languages. Oceanic Linguistics 3, 248–265 (1964).
