arXiv:1908.02540v1 [physics.soc-ph] 7 Aug 2019


Migrant mobility flows characterized with digital data

Mattia Mazzoli,1,∗ Boris Diechtiareff,2 Antònia Tugores,1 Willian Wives,2 Natalia Adler,3 Pere Colet,1 and José J. Ramasco1,†

1 Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, 07122 Palma de Mallorca, Spain
2 United Nations Children’s Fund - UNICEF, Brasília, DF 70750-521, Brazil
3 United Nations Children’s Fund - UNICEF, 3 UN Plaza, New York, NY 10017, USA

Monitoring migration flows is crucial to respond to humanitarian crises and to design efficient policies. This information usually comes from surveys and border controls, but timely accessibility and methodological concerns reduce its usefulness. Here, we propose a method to detect migration flows worldwide using geolocated Twitter data. We focus on the migration crisis in Venezuela and show that the calculated flows are consistent with official statistics at country level. Our method is versatile and far-reaching, as it can be used to study different features of migration such as preferred routes, settlement areas, mobility through several countries, spatial integration in cities, etc. It provides finer geographical and temporal resolutions, allowing the exploration of issues not contemplated in official records. It is our hope that these new sources of information can complement official ones, helping authorities and humanitarian organizations to better assess when and where to intervene on the ground.

INTRODUCTION

Migration is a ubiquitous phenomenon in human history. People move to improve their living conditions or, simply, to escape social distress [1] and natural disasters [2]. The collection of data on migration flows dates back at least to 1871, when the United Kingdom registered the difference in the number of inhabitants over a decade [3]. The data showed that the population changes could not be explained by the number of births and deaths alone, hence another factor had to be involved: migration. Official data on migration flows relies on the comparison of heterogeneous records across countries, which usually have different time scales and time coverage [4]. Data sources include national censuses, which are released every 10 years, specific surveys, border controls and residence permit requests. Surveys and census records are helpful tools for estimating migratory statistics, but the information is not provided at a multinational scale, they are costly and, consequently, the update times tend to be long. Looking at a single country, the net population variation, besides changes due to births and deaths, can be calculated as the difference between the number of immigrants and emigrants. Migrants’ mobility, however, can involve several countries, biasing the single-country statistics. The same individual can enter a country multiple times, and the trajectory of a migrant can cross several countries. This adds inconsistencies to the information from a local perspective, e.g., recurring and returning migrants can affect the number of border crossing events. To capture the complexity of the migratory phenomenon, the concept of interregional flows must be introduced [5]. In this context, another potential data source is air traffic records, which allow the estimation of human mobility worldwide [6]. However, they capture only one transportation mode and only long- or medium-distance movements, and they are biased toward wealthier individuals.

∗ mattia@ifisc.uib-csic.es
† jramasco@ifisc.uib-csic.es

Given this situation, there have been recent calls for new sources providing reliable data on the mobility of migrants and, especially, refugees [7–10]. Data associated with the use of information and communication technologies (ICT) has experienced a large growth in the last decade. ICT data contains temporal and spatial records and it is a useful, fast and inexpensive information source to characterize human mobility at different scales [11]. For example, mobile phone records have been employed to study human mobility with promising results at urban [12–15] and inter-urban level [16–18]. They have also been used to analyze the distribution and integration of migrant communities in cities [19, 20]. Yet the phone sector is too fragmented to cover a global scale since the data is usually restricted to a single country, making cross-border movements hard to observe. Other examples with a similar, yet even more reduced, geographical scope are the GPS tracks left by cars [21].

To expand the focus beyond national borders, we need data coming from services that are genuinely transnational. The widespread adoption of online social networks has finally introduced such a global dimension. For instance, the advertising tool of Facebook has been used as a source of migration data, although the internal user classification of the platform is not very transparent [22]. In a more open context, data from Twitter has been used to uncover worldwide patterns of human mobility [23–25], infer international migration flows [26–28], estimate cross-border movements [29] and study immigrant integration [30, 31]. Twitter data is also available in countries where census data is inaccessible, outdated or only attainable in the local language [32]. Furthermore, Twitter data has been shown to bear mobility information compatible with the one provided by cell phone records in cities [33]. On the other hand, the data is sparse (i.e., users are not continuously active) and there are biases towards younger and richer individuals [33–36]. This implies that, when taking into account smaller scales, a thorough validation exercise is necessary. Although geolocated Twitter data is sparser than census, surveys and mobile phone records, the observed level of correlation allows for the interchangeability of these sources to study population density and mobility [33, 37, 38].

In this work, we introduce a method to uncover migration flows using Twitter data. As a proof of concept, we focus on the current Venezuelan migration crisis. At country level, the flows obtained from Twitter have been validated against official estimations released by the International Organization for Migration (IOM) in September 2018, the Federal Police (Polícia Federal) of Brazil and by the United Nations High Commissioner for Refugees (UNHCR or ACNUR) in November 2018 and January 2019. Furthermore, our method provides information at finer geographical and temporal resolutions. It also allows the exploration of other issues not contemplated in official records, such as mobility across multiple countries, routes taken by the migrants, as well as the places where they settle down.

MATERIALS AND METHODS

Classical data sources

Traditionally, official statistics come from census surveys, household or labor surveys, population registers, administrative records and border control. For the sake of completeness, we briefly summarize the characteristics of these sources and offer references for further information. Starting with censuses, census surveys focus on migrant stocks or migrant flows and include socio-economic features such as the country of birth, citizenship, age, sex, education, occupation and time of arrival of migrants in the new country [39, 40]. Spatial information in census surveys is typically given at regional scale, but modern censuses provide information even at finer scales. The typical limitations of censuses for migration data are their cost, the slow updating (usually every 5 or 10 years), which reduces the validity of the information since it gets outdated fast, and the fact that they do not capture illegal immigration. Secondly, household surveys, apart from measuring migrant stocks or migrant flows, focus on the drivers and the impact of migration, internal displacement, emigration and immigration of a given country. A third source comes from labor force surveys, which produce statistics on migrant stocks in the labor market of the country. These data allow for country-by-country comparability and they offer detailed data on small population groups. However, many countries do not include crucial migration questions in their census surveys, such as citizenship, country of birth and year of immigration of the person; the surveys may be infrequent or costly and can also present issues with sample size and coverage.

Population registers are held by local authorities and, like the surveys, provide information on the people living in the area. However, there are differences in how the registers are designed, in how they define residence and in their population coverage, which sometimes excludes foreigners: e.g., citizens of some countries are not required to register inside EU states, and sometimes departing migrants are not required to unregister. As a result, the registers lack comparability from country to country. Other important classical sources of migrant flows and stocks, gathered in a different way, are administrative records, which can include records of citizens changing their usual residence, tourist visas, work visas, study permits, residence permits and work permits. Even if these records are continuous and comprehensive, they lack compatible definitions among different countries; moreover, they often lack coverage and availability in some countries. Further issues may come from the fact that this data often does not cover naturalized or illegal residents who overstay their visas. Data may not represent the total number of immigrants in the country, e.g., if the visa granted to the head of the family covers his or her dependents and, as a further example, authorities may not track renewals or changes of citizenship. Lastly, border post records of people leaving or entering keep track of flows between countries. These records often do not discern between migration flows and other kinds of mobility, such as tourism. Moreover, in places where the possibility of evading the border control is high, the records become highly unreliable. A review of these classical methods can be found in [41, 42]. In particular, here we employ official data from border control, permit statistics and registers gathered by the UNHCR [43, 44], the IOM [45] and the Federal Police of Brazil [46] to validate the information obtained from Twitter.

Twitter data for mobility information

Classical data sources have major caveats in terms of comparability, coverage and immediacy of the furnished information [7–10, 47]. ICT data has the potential to complement classical sources by generating estimates of socio-economic phenomena such as transport, mobility, urbanism and migration. Of all the possible ICT data, like mobile phone records, social networks, etc., Twitter is one of the most accessible since the data is openly distributed. Twitter data is not free from biases: several studies have shown that people tweeting are mostly young, wealthy and tend to live in cities [34, 48]. However, more recent studies show that the relevance of some of these biases, like those related to age or geographical origin, is slowly decreasing [35, 48].

Beyond demographics, the impact of biases in the study of aggregated mobility from geolocated tweets has been addressed in [33, 37]. Origin-destination (OD) matrices obtained from Twitter, mobile phone records and mobility surveys were found to be equivalent at scales larger than one square kilometer. This is due to the fact that mobility patterns are not strongly dissimilar across age groups and gender [49], so the demographic biases do not strongly influence the extracted OD matrices. A similar effect could be expected in refugee and migrant displacements, which may be carried out in groups, so that detecting some members could be enough to describe the mobility patterns.

Accessing and filtering Twitter data

The data was accessed using the publicly available Twitter Streaming API [50], selecting tweets with geographical information (a description of the script employed is included in Availability of data and materials). We collected data from January 2015 to April 2019. The location has to be provided at the level of place with a bounding box smaller than 40 km on the longest side; otherwise the tweets are not included in the spatial analysis, but we still use them to count country-level flows (using the country identification in the tweet). A minimal filter for bots is implemented by discarding users tweeting more than twenty times per hour on average over their tweeting life span. Multi-thread tweets are counted as a single one. The analysis focuses on South and Central America and the Caribbean, where over 70% of the Venezuelan migration is concentrated, according to the UN [51]. We divided this extensive territory into a grid of cells of 40 km side, which became our basic spatial units. The position of the users is approximated by the centroid of the cells from which they tweet. With this information, we calculate the radius of gyration of each user's trips:

\[
r_g = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( r_{cm} - r_i \right)^2} \qquad (1)
\]

where n is the number of tweets, r_i the cell centroid of tweet i and r_cm the center of mass of the user's movements. We disregard users with r_g > 5000 km (see the distribution of r_g in the Supporting Information) because this implies that all their trips are long distance. These accounts are typically multiuser, institutional or corporate, and offer no usable mobility information.
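The bot filter and the radius-of-gyration cut described above can be sketched as follows. This is a minimal illustration and not the authors' script: the function names are hypothetical, and the cell centroids are assumed to be already projected onto a plane with coordinates in kilometers, which the text does not specify.

```python
import math

def is_probable_bot(n_tweets, lifespan_hours, max_rate=20.0):
    """Flag accounts averaging more than max_rate tweets per hour over their life span."""
    if lifespan_hours <= 0:
        # Assumption: a zero-length life span carries no rate information.
        return False
    return n_tweets / lifespan_hours > max_rate

def radius_of_gyration(points):
    """Eq. (1): root-mean-square distance of the points to their center of mass.

    points: list of (x_km, y_km) cell centroids, one entry per tweet
    (planar coordinates are an assumption of this sketch).
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

def keep_user(points, n_tweets, lifespan_hours, rg_max_km=5000.0):
    """Keep users that pass the bot filter and have r_g <= rg_max_km."""
    return (not is_probable_bot(n_tweets, lifespan_hours)
            and radius_of_gyration(points) <= rg_max_km)
```

A real pipeline would need a geodesic treatment of distances at continental scales; the planar version only illustrates the structure of Eq. (1).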

Ethics statement

As a final comment, we adhere to strict data responsibility and ethics principles, ensuring that no personally identifiable information is kept. The metadata of the tweets, which can be considered personal, is deleted before storage and the user ID is irreversibly hashed using the SHA-512 algorithm. We do not track individuals: all the spatial analyses are performed using aggregated trips and numbers of residents at a minimal scale of 1 km^2 and, in most cases, over 1,600 km^2. All the figures and tables show aggregated data, and only when the number of individuals in a cell is larger than three.
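The one-way transformation of user IDs mentioned above can be illustrated with Python's standard `hashlib` module. This is only a sketch: the function name is hypothetical, and the optional salt is an assumption added for illustration, since the text states only that SHA-512 is used.

```python
import hashlib

def pseudonymize_user_id(user_id, salt=b""):
    """Return the SHA-512 hex digest of a user ID.

    The salt parameter is an assumption of this sketch; the paper only
    states that SHA-512 is applied to the user ID.
    """
    payload = salt + str(user_id).encode("utf-8")
    return hashlib.sha512(payload).hexdigest()
```

The same ID always maps to the same 128-character digest, so a user's tweets can still be grouped without storing the original identifier.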

Resident classification

There is no unique definition of resident applicable to a situation with sparse data such as ours. Given the lack of a clear definition, we provide several definitions and check whether they lead to differences in the outcomes of the analyses. Every tweet has an associated country code. Users tweeting from Venezuela fewer than five times in the period 2015-18, or with all their tweets posted in a time window shorter than three months, were excluded from the study as non-representative data. After that, for each user we identified the most common country of origin of the tweets posted every month. By doing so, we created the user's history as a sequence of country flags, with one flag for each month. In particular, if the most common country was Venezuela, we labeled the corresponding month as "Ven". If, for a given month, there were no tweets posted or there was a tie between Venezuela and another country, we labeled the month as "Und" (for "undetermined"). Then we considered the following criteria to identify Twitter Users who are Venezuelan residents (TUVs), based on filters with a gradual level of restriction:

• The first criterion is the most restrictive one. We considered as TUVs the users flagged as Venezuela three months in a row at least once in our database. In this way, we obtained approximately 160,000 TUVs.

• The second criterion is slightly less restrictive. It included the users identified before and those with a sub-sequence Ven-Und-Ven. With this method we estimated around 200,000 TUVs.

• The third criterion is similar to the previous one. We included all the users coming from the first criterion and added those with a sequence Und-Ven-Ven or Ven-Ven-Und. This yielded around 216,000 TUVs.

• The fourth criterion is comprehensive of all of the above. With this method, we obtained around 253,000 TUVs.
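The monthly flagging and the four criteria listed above can be sketched as follows. This is an illustrative reading and not the authors' code: the function names are hypothetical, the patterns are assumed to be contiguous months, the preliminary exclusion of sparse users is omitted, and ties between two countries other than Venezuela are resolved alphabetically because the text does not say how they are handled.

```python
from collections import Counter

VEN = "VE"  # country code assumed for Venezuela in the tweet metadata

def monthly_flags(tweets, months):
    """Build one flag per month from (month, country_code) pairs."""
    counts = {m: Counter() for m in months}
    for month, country in tweets:
        if month in counts:
            counts[month][country] += 1
    flags = []
    for m in months:
        c = counts[m]
        if not c:
            flags.append("Und")  # no tweets that month
            continue
        top = max(c.values())
        leaders = sorted(k for k, v in c.items() if v == top)
        if leaders == [VEN]:
            flags.append("Ven")
        elif VEN in leaders:
            flags.append("Und")  # tie between Venezuela and another country
        else:
            flags.append(leaders[0])  # assumption: alphabetical tie-break
    return flags

def contains(flags, pattern):
    """True if pattern occurs as a contiguous run inside flags."""
    k = len(pattern)
    return any(flags[i:i + k] == pattern for i in range(len(flags) - k + 1))

def tuv_criteria(flags):
    """Evaluate the four resident criteria on a sequence of monthly flags."""
    c1 = contains(flags, ["Ven", "Ven", "Ven"])
    c2 = c1 or contains(flags, ["Ven", "Und", "Ven"])
    c3 = (c1 or contains(flags, ["Und", "Ven", "Ven"])
          or contains(flags, ["Ven", "Ven", "Und"]))
    return {1: c1, 2: c2, 3: c3, 4: c2 or c3}
```

For instance, a user with flags Ven, Und, Ven, CO satisfies criteria 2 and 4 but not criteria 1 and 3.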

Determining the number of TUVs is crucial not only to measure the displacement flows but also to infer the upscaling factors, as discussed in the subsection on upscaling factors. These factors allow us to translate the observed TUV numbers into total flows. We estimated them as the ratio between the population of the country projected from the census and the number of TUVs, updated every year according to the empirical flows recorded.

FIG. 1. Country-level flow validation. Comparison between the migrant flows estimated from the upscaled Twitter data N_twitter following the different resident criteria and the official numbers N_official from the UNHCR in January 2018. These official numbers in some countries are projections. Each point corresponds to a country. Panel (a) shows in log-log scale all the countries of the considered area, and panel (b) a zoom on those receiving the strongest flows plus the global total (ALL). In both cases, the grey dashed lines are the diagonal. The comparison produces R^2 = 0.98 for all the countries and R^2 = 0.99 for Brazil, Colombia, Ecuador and Peru alone. The country codes are Argentina (AR), Aruba (AW), Bolivia (BO), Brazil (BR), Chile (CL), Colombia (CO), Costa Rica (CR), Curaçao (CW), Dominican Republic (DO), Ecuador (EC), Guyana (GY), Panama (PA), Paraguay (PY), Peru (PE) and Trinidad and Tobago (TT).

Definition of migrants

The UN definition of long-term migrant requires the person to stay in the new country for at least 12 months [52]. The number of migrants estimated by UN agencies in a single country often relies on records referring to residence permits or flows observed at the border. The objective in this work is to capture flows of people leaving a country in a humanitarian crisis. Definitions based on long time scales are not adequate for describing population mobility under these circumstances, and there is no universal definition of migrant [52]. Therefore, stretching the term, we consider here as a migrant any individual leaving Venezuela during the time window of observation.

RESULTS

Upscaling factors

FIG. 2. Migrants’ routes. Number of individuals observed in every 40×40 km^2 cell in the area of study. The heatmap scale is logarithmic. Only cells with more than 3 individuals are displayed. (a) The full South American continent plus the Caribbean and Central America; (b) a zoom on the northern area focused on the Caribbean; (c) a zoom highlighting the Southern Cone; (d) a zoom on Brazil. Map tiles by Carto, under CC-BY 4.0. Data by OpenStreetMap, under ODbL.

The number of TUVs identified with the four criteria and detected as active (posting tweets) every year is shown in Table I. Independently of the criterion used, the number of TUVs is relatively stable in 2015 and 2016, and then we find a decline of over 20% in 2017 and of almost 50% in 2018. The migration crisis had its first peak in the last months of 2016, so there is a combination of factors that can explain the drop in the number of TUVs. The first one is migration to other countries (estimating this is our target), which can dissuade users from using geolocated social networks, but this factor alone does not explain the full extent of the change. We see a drop of 43,000 TUVs with criterion 4 between 2016 and 2017, while the observed number of TUVs leaving the country in 2016 with the same criterion is 11,000. Other factors, such as the general use of geolocated Twitter as well as economic and social stress, must have contributed to such a systematic decline. This impression is further confirmed when, as a check, the number of geolocated users residing in Colombia is analyzed using the corresponding criterion 4 for this country. We observe a drop from 236,000 to 198,000 between 2016 and 2017. A general drop in the use of geolocated Twitter seems to be happening in the South American region.

The population of Venezuela in 2011, according to the census, was P(2011) = 29 million inhabitants [53]. The United Nations Population Division projects the Venezuelan population at around P(2015) = 30.1 million for 2015 [54]. As a simplifying assumption, we neglect the population changes due to the difference between natality and mortality. Given the short period of time considered, its effects on the total population are much weaker than those introduced by migration. The number of active TUVs that same year according to criterion 4 was u_4(2015) = 157,500. This gives us an upscaling factor of S_4(2015) = P(2015)/u_4(2015). For the following year, we have a number e_4(2015) of TUVs leaving the country according to the same criterion 4. This implies a population decrease given by P(2016) = P(2015) − S_4(2015) e_4(2015). In general, we can write a recurrence formula for the upscaling factor associated with criterion k (k = 1, ..., 4):

\[
S_k(t) = \frac{P(t-1) - e_k(t-1)\, S_k(t-1)}{u_k(t)}, \qquad (2)
\]

where the first year corresponds to 2015 with P(2015) = 30.1 million, as stated before. Applying these calculations for a given criterion k, we obtain an upscaling factor per year. These factors are the inverse of the population fraction tweeting with geolocation during the corresponding year and, with all due caution regarding possible biases, one can assume that for criterion k one TUV leaving the country represents S_k(t) actual migrants. Measuring the TUVs detected abroad in 2018 and upscaling the flows according to the first exit year, we obtain the numbers shown in Table II and in Figure 1.
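The recurrence of Eq. (2) can be sketched in a few lines. The function and argument names are hypothetical; `active` stands for u_k(t) and `leavers` for e_k(t), both given as dictionaries indexed by year, and the population is updated as P(t) = P(t−1) − S_k(t−1) e_k(t−1), as in the text.

```python
def upscaling_factors(p0, active, leavers):
    """Yearly upscaling factors S_k(t) following Eq. (2).

    p0:      population in the first year, P(t0)
    active:  {year: u_k(year)}, number of active TUVs per year
    leavers: {year: e_k(year)}, number of TUVs leaving the country per year
    """
    years = sorted(active)
    population = p0
    factors = {years[0]: p0 / active[years[0]]}
    for prev, year in zip(years, years[1:]):
        # P(t) = P(t-1) - S_k(t-1) * e_k(t-1)
        population -= factors[prev] * leavers[prev]
        # S_k(t) = P(t) / u_k(t), which is Eq. (2) written in two steps
        factors[year] = population / active[year]
    return factors
```

With P(2015) = 30.1 million and u_4(2015) = 157,500 from Table I, the first factor is about 191, i.e., one TUV stands for roughly 191 inhabitants. The factors for later years require the yearly numbers of leaving TUVs, which are not all listed in the text.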

Validation of external flows

We are interested in estimating the number of Venezuelan migrants in each country because these are the numbers reflected in the statistics of entries and registers, and they constitute the basis for the records used by national and international authorities. Note that the official sources are highly heterogeneous in nature and vary from country to country. For this purpose, we count for each year the number of TUVs appearing for the first time in a different country and determine the migrant flows by upscaling as discussed before. Some of the TUVs can be passing through and continue their travel to third countries where, in turn, they will be counted as well. The comparison of the flows obtained in Table II with the numbers provided by the international and national agencies shows a good alignment with our estimation, an impression further confirmed in Figure 1 for the UNHCR data in most of the countries in the area, with R^2 over 0.9. The main outlier in Table II is Brazil, where the IOM and UNHCR give values well below our estimations. However, the Federal Police of Brazil registered a total number of entries over 199,000 from January 2017 until December 2018 (even though half of them end up returning to Venezuela [46]). Our numbers lie within the range provided by the different sources of information. Given the similarity across the upscaled flows obtained with the different criteria, we continued our analysis with criterion 4, which provides the largest statistics.

TABLE I. Number of TUVs detected according to the four criteria discussed in the subsection on resident classification, in thousands of individuals (K).

Year    crit. 1    crit. 2    crit. 3    crit. 4
2015    106 K      129 K      134 K      157.5 K
2016    110 K      132 K      137 K      159.5 K
2017    82 K       98 K       100 K      116 K
2018    51 K       60 K       62 K       71 K

TABLE II. Estimated migration of Venezuelan citizens in the four main countries receiving the flows. The different lines in "Data" correspond to the different criteria for establishing Venezuelan residents with the Twitter data, and the last four lines correspond to official figures from the IOM, the United Nations (UNHCR) and the Federal Police of Brazil (PFB). The units are thousands of individuals (K).

2018            Brazil    Peru     Ecuador    Colombia    Total
Data (1st)      168K      589K     247K       1,080K      3,970K
Data (2nd)      163K      588K     242K       1,070K      3,920K
Data (3rd)      163K      593K     246K       1,090K      3,960K
Data (4th)      160K      590K     243K       1,080K      3,920K
IOM Sep18       75K       414K     209K       935K        2,600K
UNHCR Nov18     85K       500K     220K       1,000K      3,000K
UNHCR Jan19     96K       506K     221K       1,100K      3,400K
PFB Dec18       199K      –        –          –           –
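A comparison of this kind can be sketched with the four-country values of Table II. The text does not state how R^2 is computed, so the squared Pearson correlation used here is an assumption, and the function name is hypothetical. The result is not expected to reproduce the values quoted in Fig. 1, which uses a different reference date and a different set of countries.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Values from Table II, in thousands of individuals.
twitter_crit4 = {"Brazil": 160, "Peru": 590, "Ecuador": 243, "Colombia": 1080}
unhcr_jan19 = {"Brazil": 96, "Peru": 506, "Ecuador": 221, "Colombia": 1100}

countries = sorted(twitter_crit4)
r2 = r_squared([twitter_crit4[c] for c in countries],
               [unhcr_jan19[c] for c in countries])
```

A correlation-based R^2 measures linear association only; agreement with the diagonal of Fig. 1 would additionally require a slope close to one.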

Migration routes

Beyond flow estimations, it is important to identify the preferred routes taken by migrants in order to better provide targeted humanitarian assistance. As mentioned, we took as geographical basis a grid of cells of 40 km side covering the full South American continent, the Caribbean and part of Central America. We counted the cumulative number of TUVs posting messages in each cell. The number of distinct TUVs in the whole period (2015-2018) is plotted as a heat map in Fig. 2. The densest areas are located in Venezuela, specifically near Caracas. The relevant users for the analysis are those classified as residents (TUV) and who have been detected abroad. Most of their trips begin in Venezuela, and the density there matches the population distribution. Once in other countries, the routes also concentrate in large cities such as Buenos Aires, Santiago, Sao Paulo, Rio de Janeiro and Lima (which can be temporary stops or travel destinations) and around the principal roads and rivers. For example, following a preferred route from Venezuela to Manaus, a split occurs: some TUVs take the ferry navigating the Amazon to the Brazilian city of Belem and subsequently travel along the coast to the cities of Fortaleza and Salvador, while other TUVs travel from Manaus through the forest via the BR-319 highway towards the border with Peru and Bolivia. Overall, the preferred migration route in South America is the Pan-American highway, passing through Colombia, Ecuador, Peru and all the way to Chile and Argentina. Other minor routes are also detected, such as the one from La Paz in Bolivia to Cordoba in Argentina.
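The cell counting described above can be sketched as follows. This is only an illustration under stated assumptions: the flat equirectangular projection, the constant `KM_PER_DEG`, and the function names `cell_of` and `distinct_users_per_cell` are hypothetical choices, not the projection or code used in the paper.

```python
import math
from collections import defaultdict

CELL_KM = 40.0        # cell side quoted in the text
KM_PER_DEG = 111.32   # assumption: approximate km per degree on a flat projection


def cell_of(lat, lon):
    """Map a point to an integer (row, col) cell on an approximate 40 km grid."""
    y_km = lat * KM_PER_DEG
    x_km = lon * KM_PER_DEG  # crude: ignores the shrinking of longitude with latitude
    return (math.floor(y_km / CELL_KM), math.floor(x_km / CELL_KM))


def distinct_users_per_cell(tweets):
    """tweets: iterable of (user_id, lat, lon). Returns {cell: number of distinct users}."""
    users = defaultdict(set)
    for user_id, lat, lon in tweets:
        users[cell_of(lat, lon)].add(user_id)  # a set counts each user once per cell
    return {cell: len(ids) for cell, ids in users.items()}
```

Using a set per cell makes repeated tweets by the same user count once, which matches the "number of distinct TUVs" shown in the heat map.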

To better understand the potential application of our method, we built a vector representation in each cell. For every TUV tweeting from the cell, we built a unit vector pointing from the present location to the cell of the consecutive tweet, with the condition that it must be closer than 500 km, to exclude as much as possible air travel and noise coming from infrequent users. We separated the unit vectors into those pointing towards or away from Venezuela, and we added those in every cell in each category. The resulting outgoing vectors are displayed in red in Figure 3, while the incoming ones are in blue. These maps provide information on the main ground exit routes from Venezuela and also on the total flow observed in both directions. To go further, we can establish a line intersecting the routes (center of the parallel dashed lines in Figure 3) and consider the upscaled number of TUVs to calculate the number of people crossing the line per year and the direction in which they are going. The results for 2017 are shown in the bottom-right corner of the maps, while the information collected year by year is included in Table III. It is important to note that the numbers are relatively small. This is because we need to see two consecutive tweets, one above and another below the line, to identify a user. However, not all TUVs have these two tweets, as some may travel directly from Venezuela by plane or simply do not tweet so often during their trips. These numbers are, therefore, underestimations, although they are proportional to the total flow. We would need to further upscale them according to the fraction of TUVs active enough to be detected on both sides of the lines. Without further processing, we can, nevertheless, compare results obtained on the same route year by year and between routes with the same methodology. In Table III, we observe a clear dominance of the outflows over the inflows to Venezuela. On the Pan-American highway, there is an important growth in the flows year by year, almost doubling in the last period between 2017 and 2018. The route entering Roraima has, in turn, a flow that is close to a factor 10 below the Pan-American one, with an initial increase until 2017 followed by a strong decline in the following year.
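A minimal sketch of the per-cell vector construction is given below. The 500 km threshold comes from the text; everything else is an assumption made for illustration: the cell size in degrees (`CELL_DEG`), the reference point `VENEZUELA_REF`, and the rule that a step points "away" when it increases the distance to that reference point. The paper does not specify how the direction was decided, so this rule should not be read as the authors' criterion.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0
MAX_STEP_KM = 500.0            # threshold quoted in the text
CELL_DEG = 0.36                # assumption: roughly 40 km expressed in degrees
VENEZUELA_REF = (8.0, -66.0)   # assumption: (lat, lon) reference point inside Venezuela


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(point):
    """Integer grid cell containing a (lat, lon) point."""
    return (math.floor(point[0] / CELL_DEG), math.floor(point[1] / CELL_DEG))


def new_flows():
    """Per-cell accumulators for the summed (east, north) unit vectors."""
    return defaultdict(lambda: {"away": [0.0, 0.0], "toward": [0.0, 0.0]})


def add_user_vectors(trajectory, flows):
    """Add one user's unit vectors; trajectory is a time-ordered list of (lat, lon)."""
    for p, q in zip(trajectory, trajectory[1:]):
        if cell_of(p) == cell_of(q):
            continue  # no displacement between cells
        if haversine_km(p, q) >= MAX_STEP_KM:
            continue  # likely air travel or a sparse user
        east = math.radians(q[1] - p[1]) * math.cos(math.radians((p[0] + q[0]) / 2))
        north = math.radians(q[0] - p[0])
        norm = math.hypot(east, north)
        moving_away = haversine_km(q, VENEZUELA_REF) > haversine_km(p, VENEZUELA_REF)
        acc = flows[cell_of(p)]["away" if moving_away else "toward"]
        acc[0] += east / norm
        acc[1] += north / norm
```

Summing unit vectors rather than raw displacements gives every qualifying step the same weight, so the length of the resulting arrow in a cell reflects how many users moved in a coherent direction.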

Recurrence

Note that 25% of the migrant TUVs travel back and forth from Venezuela (recurrent travelers), as we will show below, while the remaining 75% stay most of the time abroad. To estimate the time spent in a location, we


FIG. 3. Crossing routes. Map of two main ground migrant exit routes from Venezuela: in (a) the Pan-American road with a portion of Colombia, Ecuador and Peru, and in (b) the area of Roraima, Amazonas and Para states in Brazil. Blue arrows indicate flows toward Venezuela and red ones away from it. Only cells with more than three TUVs are shown in the maps. The lightness of the color of the arrows is proportional to the net in- and out-flows, from light to darker colors. The upscaled net flows crossing the dashed lines in 2017 are displayed in the bottom-right corner of each plot.

TABLE III. Upscaled number of TUVs crossing the lines of Figure 3 away (←) and toward (→) Venezuela in the two routes considered.

Route              Direction     2015     2016     2017     2018
Pan-American road  away (←)     6,550   11,300   18,500   36,200
Pan-American road  toward (→)   5,500    9,300   13,500   23,700
Roraima            away (←)     1,200    1,490    2,600    1,070
Roraima            toward (→)   1,200    1,300    1,400      720

take the following convention: the time between consecutive tweets is assigned to the location of the first one. Applied to countries, this allows us to define the cumulative time spent abroad for each TUV after his/her first trip to another country, t_out. Additionally, we can calculate the time span between the first tweet abroad and the last tweet of the user, t_Tot. In this way, we define for each TUV the ratio between the time spent abroad and the total time after the first exit from Venezuela, R = t_out / t_Tot. There is a wide range of behaviors, as can be seen in the distribution of Figure 4. TUVs detected abroad for the first time in their last tweet are not considered. Note that the precision is limited by the heterogeneity in the user inter-event time distribution and the total time window that we are able to analyze. Some TUVs correspond to the canonical view of migrants, leaving the country and coming back only seldom, while others go back and forth. More recent migrant TUVs may be classified as non-recurrent, even though they might be classified differently if observed for longer time periods. Even so, we need to establish a criterion to discern between frequent returners and those staying mostly abroad. The distribution of R shows two clear peaks at the extreme values of the domain with a valley in the region between 0.4 and 0.75, so the threshold is set at 0.5. Results are similar for other values of the threshold provided that they are in the range [0.4, 0.75]. We find 12,518 TUVs classified as recurrent and 22,459 as non-recurrent.

FIG. 4. Time spent abroad. Probability distribution of the fraction of time spent abroad after the first country exit. TUVs with R lower than 0.5 are classified as recurrent.
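The ratio R and the 0.5 threshold described above can be illustrated with a short sketch. The input format (a time-ordered list of `(timestamp, country_code)` pairs with numeric timestamps), the home code `"VE"` and the function names are assumptions introduced for this example only.

```python
def abroad_ratio(tweets, home="VE"):
    """Return R = t_out / t_Tot for one user, or None if R is undefined.

    tweets: time-ordered list of (timestamp, country_code); timestamps are numbers
    (for instance days). The time between consecutive tweets is assigned to the
    country of the first tweet of the pair.
    """
    first_out = next((i for i, (_, c) in enumerate(tweets) if c != home), None)
    if first_out is None or first_out == len(tweets) - 1:
        return None  # never abroad, or first detected abroad in the last tweet
    t_tot = tweets[-1][0] - tweets[first_out][0]
    if t_tot <= 0:
        return None
    after_exit = tweets[first_out:]
    t_out = sum(t1 - t0
                for (t0, c0), (t1, _) in zip(after_exit, after_exit[1:])
                if c0 != home)
    return t_out / t_tot


def is_recurrent(r, threshold=0.5):
    """Users spending less than the threshold fraction abroad count as recurrent."""
    return r < threshold
```

For a user who tweets from Venezuela on day 0, from Colombia on day 10, from Venezuela on day 20, and from Colombia on days 50 and 60, the time abroad is 10 + 10 = 20 days out of a 50-day span, so R = 0.4 and the user is classified as recurrent.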

Recurrent TUVs stay most of the time in Venezuela, so it can be assumed that they still reside somewhere in the country. Similarly, non-recurrent individuals are likely to have a residence place abroad. We define as residence place/country the location from which they tweet most


FIG. 5. Validation of new residents. Scatter plot comparing the estimations of new Venezuelan residents obtained with our method and the official data from the international agencies in each country. Every circle is a country, the dashed gray line is the diagonal, and R² = 0.97.

in the last month of activity. Out of the 22,459 non-recurrent TUVs, 16,292 have a place of residence assigned in the geographical area of our analysis outside Venezuela. To upscale these numbers in each of the countries, we apply the same technique as in previous sections, see Eq. (2). The factor is calculated as the ratio between the updated Venezuelan population and the number of TUVs in each year. The number of migrant TUVs in every country is upscaled according to the factor corresponding to their first year of exit. The results per country are shown in Table IV and compared with the official estimations from international agencies. The numbers of Table II refer to entries across the border, so a single individual can contribute to more than one country. In contrast, the records in Table IV assign one country to each migrant and the numbers contain only distinct individuals in the full study area. Our method provides numbers below the official statistics in Brazil, Ecuador and Colombia, while in Peru it is slightly higher. We see that the new settlement places concentrate in Argentina, Chile, Colombia and Peru, while flows also pass through other countries such as Brazil, where the fraction of migrants who settle is lower. We have also performed a systematic comparison between the number of Venezuelan residents in each country provided by our method and by the international agencies. The result is displayed in Figure 5, where one can see an acceptable agreement with an R² over 0.97.
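The residence assignment and the per-year upscaling described above can be sketched as follows. This is an illustration, not the paper's Eq. (2): the 30-day window standing in for "the last month of activity", the input formats and the function names are assumptions, and the numbers in the usage note are toy values, not data from the study.

```python
from collections import Counter, defaultdict


def residence_country(tweets, window_days=30):
    """Most frequent country in the last month of a user's activity.

    tweets: time-ordered list of (day, country_code); window_days=30 is an assumption.
    """
    last_day = tweets[-1][0]
    recent = [country for day, country in tweets if last_day - day <= window_days]
    return Counter(recent).most_common(1)[0][0]


def upscale_residents(migrants, population, n_tuvs):
    """Upscale migrant users per country with the factor of their first exit year.

    migrants: iterable of (residence_country, first_exit_year)
    population: {year: Venezuelan population}; n_tuvs: {year: number of TUVs}
    """
    totals = defaultdict(float)
    for country, year in migrants:
        totals[country] += population[year] / n_tuvs[year]  # one user stands for `factor` people
    return dict(totals)
```

With toy inputs `population = {2017: 1000, 2018: 900}` and `n_tuvs = {2017: 10, 2018: 3}`, the factors are 100 and 300, so one 2017 migrant and one 2018 migrant residing in the same country add up to 400 estimated residents.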

TABLE IV. Estimated number of Venezuelan residents in each of the countries in late 2018. The units are thousands of individuals (K).

2018          Brazil   Peru   Ecuador   Colombia
Data (4th)       58K   504K      170K       826K
IOM Sep18        75K   414K      209K       935K
UNHCR Nov18      85K   500K      220K     1,000K
UNHCR Jan19      96K   506K      221K     1,100K

Spatial integration of migrants

In addition to large-scale numbers, the data also allow for a local study of the place of residence of the migrant population. As an example, we look at three of the main cities in South America: Bogota in Colombia, Sao Paulo in Brazil and Lima in Peru, where the statistics are more reliable. The urban space is divided into a grid of cells of 1×1 km² area, and we assign to each migrant the most common cell from which they tweet during night hours (between 8 PM and 8 AM local time). The resulting heat maps are displayed in Fig. 6. Below them, we also show the distribution of the local population obtained from local Twitter users as a basis for comparison. Besides a visual inspection, we have calculated the segregation index h proposed in [31]. This metric is the ratio between the entropy of the spatial distribution of the migrant community and that of the local population, with a correction to take into account finite size effects. If h = 1, both populations are similarly distributed, while smaller h indicates segregation. As a complement, we also calculate the normalized mutual information (NMI) between the distributions of migrants and locals. The NMI is a way to compare the distributions of two variables and it ranges from 0 (the variables are independent) to 1 (they come from the same distribution) [55]. The results for the former Venezuelan residents are shown in Table V. All the values of h and NMI are low. In these cities, Venezuelan migrants are far from being well integrated from a spatial point of view. There are many causes behind this behavior, ranging from housing prices and availability to the presence of migrant communities from the same country. Moreover, the distribution of locals says nothing about migrants' residence places. Hence, it should not be considered a proxy for the migrant distribution in the three cities. Specifically, in the case of Sao Paulo, both metrics are very low, although this must be taken with caution because only 50 users were detected there against 1,300 in Bogota and 570 in Lima.
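The night-time residence assignment and the entropy ratio behind h can be sketched as below. This is a simplified illustration: it omits the finite size correction of [31], so `entropy_ratio` is not the published index h, and the input formats and function names are assumptions made for the example.

```python
import math
from collections import Counter


def is_night(hour):
    """Night hours as defined in the text: between 8 PM and 8 AM local time."""
    return hour >= 20 or hour < 8


def home_cell(tweets):
    """Most common cell among a user's night-time tweets, or None if there are none.

    tweets: iterable of (local_hour, cell_id).
    """
    night = Counter(cell for hour, cell in tweets if is_night(hour))
    return night.most_common(1)[0][0] if night else None


def entropy(counts):
    """Shannon entropy (natural log) of a {cell: number of residents} distribution."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def entropy_ratio(migrant_counts, local_counts):
    """Uncorrected ratio of migrant to local spatial entropy (no finite size correction)."""
    return entropy(migrant_counts) / entropy(local_counts)
```

If locals are spread evenly over four cells and migrants evenly over only two of them, the ratio is ln 2 / ln 4 = 0.5; migrants concentrated in a single cell give a ratio of 0.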

Temporal distribution of upscaled outflows

Considering only outflow from Venezuela, we can unfold a time series of the upscaled number of exits per month. The results are shown in Fig. 7a. The monthly


FIG. 6. Residence locations in the main cities. Log-scale heat map of the population distribution in the main cities of the area. In the top row, the data correspond to numbers of migrant TUVs and in the bottom row to the local geolocated Twitter users. The scale of the heat map is logarithmic and the maximum is rescaled in each of the maps. Panels (a) and (d) show results for Bogota (Colombia), (b) and (e) for Sao Paulo (Brazil), and (c) and (f) for Lima (Peru).

TABLE V. Segregation indicators h and NMI for migrant TUVs.

       Bogota   Lima   Sao Paulo
h        0.59   0.62        0.27
NMI      0.05   0.07        0.04

numbers start to increase in 2015, peak in late 2016, and increase again later until the end of our time window in January 2019. The histogram shows some peaks and valleys that can be correlated with special events during the crisis. This can be seen in the lower panel, Fig. 7b, where we consider only the first exit from Venezuela per TUV. In this version, the impact of the events is more clearly appreciated, since they correlate better with the outflow. In all cases, we are only showing the upscaled outflows. As discussed for the recurrent TUVs, there exists an inflow that partially compensates for the exits.
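The two monthly series (all exits and first exits per TUV) can be sketched as follows. The details are assumptions made for illustration: an exit is taken to be a pair of consecutive tweets in which the first is in Venezuela and the second is abroad, it is dated by the second tweet, and each exit is weighted by a per-year upscaling factor supplied by the caller. The paper does not spell out these choices in this section.

```python
from collections import Counter


def monthly_exits(users, factor, first_only=False, home="VE"):
    """Upscaled number of exits per (year, month).

    users: {user_id: time-ordered list of ((year, month), country_code)}
    factor: {year: upscaling factor}, assumed to be computed elsewhere
    first_only: count only the first exit of each user
    """
    series = Counter()
    for tweets in users.values():
        for (_, c0), (year_month, c1) in zip(tweets, tweets[1:]):
            if c0 == home and c1 != home:
                series[year_month] += factor[year_month[0]]
                if first_only:
                    break  # ignore later exits of a returning user
    return dict(series)
```

Comparing the output with `first_only=False` and `first_only=True` separates the contribution of users who return and leave again from that of new leavers.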

DISCUSSION

According to five UN agencies, massive gaps in data covering refugees, asylum seekers, migrants and internally displaced populations threaten the lives and wellbeing of millions of children on the move. Through a joint Call to Action [56], the agencies confirm the critical need to improve the availability, reliability, timeliness and accessibility of data and evidence to better understand how migration and forcible displacement affect the wellbeing of people. In this work, we have developed a method to contribute to filling these glaring data and knowledge gaps by extracting migrant mobility flows from geolocated Twitter data. We focus on the current crisis in Venezuela, although the method is universal and can be translated to other contexts, conditioned only on the data coverage.

With these data, we have obtained estimations of migration flows, which we have validated at country level with official statistics, and added information on preferred routes and settlement places. This work has proved useful for humanitarian agencies, such as UNICEF, in better understanding the magnitude of the Venezuelan crisis in a country of continental proportions, such as Brazil, shaping the design of broader interventions beyond the border-crossing in the North of the country. In addition, the simplicity of the method, coupled with the open availability and pervasiveness of the Twitter data, can bring a new generation of studies to explore migratory crises from a multidisciplinary point of view. Authorities and humanitarian organizations can thus count on this extra data source to implement more informed response protocols in humanitarian contexts.


FIG. 7. Exit times distribution. (a) Total upscaled exits from Venezuela, F_exit. (b) First exits from Venezuela per TUV, upscaled to obtain F_users. Events annotated in the panels: US sanctions; Food riots; Luiz Manuel Diaz dead; Govt loses elections; Maduro declares economic emergency; Riots in Caracas; Referendum denied; Meds and food shortages; Maduro declares new constituent assembly; New US sanctions; New cryptocurrency Petro available; New Bolivar currency devalued; No more dollars accepted; Bank of England not returning Venezuela's gold.

LIST OF ABBREVIATIONS

The abbreviations included in the main text are:

• UNHCR: United Nations High Commissioner for Refugees;

• IOM: International Organization for Migration;

• UNICEF: United Nations International Children's Emergency Fund (now, United Nations Children's Fund);

• NMI: Normalized mutual information;

• TUV: Twitter users classified as Venezuelan residents.

DATA AND MATERIALS

In this work, we use several data sources: geolocated Twitter, UNHCR, IOM and Federal Police of Brazil. The access links are included as references in the manuscript: the Twitter API in Ref. [50], the UNHCR in Refs. [43, 44, 51], the UN Population Division DESA in Ref. [54], the IOM in Ref. [45], the Federal Police of Brazil in Ref. [46] and the Venezuelan census in Ref. [53]. For Twitter, the data are downloaded using the streaming API. An updated user guide on the usage of the Twitter streaming API can be found at https://developer.twitter.com/en/docs

ACKNOWLEDGMENTS

We thank Riccardo Gallotti and Daniela Paolotti for their useful comments and suggestions. M.M. is funded by the Conselleria d'Innovacio, Recerca i Turisme of the Government of the Balearic Islands and the European Social Fund with grant code FPI/2090/2018. A.T. acknowledges financial support from the AEI, Spanish National Research Agency, with grant code PTA2017-13872-I and the Government of the Balearic Islands. M.M., A.T., P.C. and J.J.R. also acknowledge funding from the Spanish Ministry of Science, Innovation and Universities, the AEI and FEDER (EU) under the


grant PACSS (RTI2018-093732-B-C22) and the Maria de Maeztu program for Units of Excellence in R&D (MDM-2017-0711).

[1] Jones, D. Conflict resolution: Wars without end. Nature 519, 148 (2015).

[2] Lopez-Carr, D. and Marter-Kenyon, J. Human adaptation: Manage climate-induced resettlement. Nature 517, 265 (2015).

[3] Ravenstein, E. G. The laws of migration. Journal of the Statistical Society of London 48, 167–235 (1885).

[4] Abel, G. J. and Sander, N. Quantifying global international migration flows. Science 343, 1520–1522 (2014).

[5] Rogers, A., Willekens, F., and Ledent, J. Migration and settlement: a multiregional comparative study. Environment and planning A 15, 1585–1612 (1983).

[6] Barrat, A., Barthelemy, M., Pastor-Satorras, R., and Vespignani, A. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the U.S.A. 101, 3747–3752 (2004).

[7] Willekens, F., Massey, D., Raymer, J., and Beauchemin, C. International migration under the microscope. Science 352, 897–899 (2016).

[8] Editorial article. Data on movements of refugees and migrants are flawed. Nature 543, 5–6 (2017).

[9] Butler, D. What the numbers say about refugees. Nature 543, 22–23 (2017).

[10] Dijstelbloem, H. Migration tracking is a mess. Nature 543, 32 (2017).

[11] Barbosa, H., Barthelemy, M., Ghoshal, G., James, C. R., Lenormand, M., Louail, T., Menezes, R., Ramasco, J. J., Simini, F., and Tomasini, M. Human mobility: Models and applications. Physics Reports 734, 1–74 (2018).

[12] Kang, C., Ma, X., Tong, D., and Liu, Y. Intra-urban human mobility patterns: An urban morphology perspective. Physica A: Statistical Mechanics and its Applications 391(4), 1702–1717 (2012).

[13] Calabrese, F., Diao, M., Di Lorenzo, G., Ferreira Jr, J., and Ratti, C. Understanding individual mobility patterns from urban sensing data: A mobile phone trace example. Transportation Research Part C: Emerging Technologies 26, 301–313 (2013).

[14] Louail, T., Lenormand, M., Ros, O. G. C., Picornell, M., Herranz, R., Frias-Martinez, E., Ramasco, J. J., and Barthelemy, M. From mobile phone data to the spatial structure of cities. Scientific Reports 4, 5276 (2014).

[15] Yang, Y., Tan, C., Liu, Z., Wu, F., and Zhuang, Y. Urban dreams of migrants: A case study of migrant integration in shanghai. In Procs. of The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-2018). AAAI, (2018).

[16] Gonzalez, M. C., Hidalgo, C. A., and Barabasi, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).

[17] Krings, G., Calabrese, F., Ratti, C., and Blondel, V. D. Urban gravity: a model for inter-city telecommunication flows. Journal of Statistical Mechanics: Theory and Experiment 2009, L07003 (2009).

[18] Kung, K. S., Greco, K., Sobolevsky, S., and Ratti, C. Exploring universal patterns in human home-work commuting from mobile phone data. PLoS ONE 9, e96180 (2014).

[19] Bajardi, P., Delfino, M., Panisson, A., Petri, G., and Tizzoni, M. Unveiling patterns of international communities in a global city using mobile phone data. EPJ Data Science 4, 3 (2015).

[20] Alfeo, A., Cimino, M. G. C. A., Lepri, B., Pentland, A. S., and Vaglini, G. Assessing refugees' integration via spatio-temporal similarities of mobility and calling behaviors. IEEE Transactions on Computational Social Systems (Early Access), 1–13 (2019).

[21] Gallotti, R., Bazzani, A., Rambaldi, S., and Barthelemy, M. A stochastic model of randomly accelerated walkers for human mobility. Nature Communications 7, 12600 (2016).

[22] Zagheni, E., Weber, I., Gummadi, K., et al. Leveraging facebook's advertising platform to monitor stocks of migrants. Population and Development Review 43, 721–734 (2017).

[23] Hawelka, B., Sitko, I., Beinat, E., Sobolevsky, S., Kazakopoulos, P., and Ratti, C. Geo-located twitter as proxy for global mobility patterns. Cartography and Geographic Information Science 41, 260–271 (2014).

[24] Lenormand, M., Goncalves, B., Tugores, A., and Ramasco, J. J. Human diffusion and city influence. Journal of The Royal Society Interface 12, 20150473 (2015).

[25] Dredze, M., García-Herranz, M., Rutherford, A., and Mann, G. Twitter as a source of global mobility patterns for social good. arXiv preprint arXiv:1606.06343 (2016).

[26] Zagheni, E., Garimella, V. R. K., Weber, I., et al. Inferring international and internal migration patterns from twitter data. In Proceedings of the 23rd International Conference on World Wide Web, 439–444. ACM, (2014).

[27] Aswad, F. and Menezes, R. Refugee and immigration: twitter as a proxy for reality. In The Thirty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS-31), 17627. AAAI Publications, (2018).

[28] Hausman, R., Hinz, J., and Yildirim, M. A. Measuring venezuelan emigration with twitter. Technical report, Kiel Working Paper, No. 2106, Kiel Institute for the World Economy (IfW), Kiel, (2018).

[29] Blanford, J. I., Huang, Z., Savelyev, A., and MacEachren, A. M. Geo-located tweets. Enhancing mobility maps and capturing cross-border movement. PLoS ONE 10, e0129202 (2015).

[30] Arribas-Bel, D. The spoken postcodes. Regional Studies, Regional Science 2, 458–461 (2015).

[31] Lamanna, F., Lenormand, M., Salas-Olmedo, M. H., Romanillos, G., Goncalves, B., and Ramasco, J. J. Immigrant community integration in world cities. PLoS ONE 13, e0191612 (2018).

[32] Stepanova, E. The role of information communication technologies in the arab spring. Ponars Eurasia 15, 1–6 (2011).

[33] Lenormand, M., Picornell, M., Cantu-Ros, O. G., Tugores, A., Louail, T., Herranz, R., Barthelemy, M., Frias-Martinez, E., and Ramasco, J. J. Cross-checking different sources of mobility information. PLoS One 9, e105184 (2014).

[34] Mislove, A., Lehmann, S., Ahn, Y.-Y., Onnela, J.-P., and Rosenquist, J. N. Understanding the demographics of twitter users. In Fifth international AAAI conference on weblogs and social media, (2011).

[35] Bokanyi, E., Kondor, D., Dobos, L., Sebok, T., Steger, J., Csabai, I., and Vattay, G. Race, religion and the city: twitter word frequency patterns reveal dominant demographic dimensions in the United States. Palgrave Communications 2, 16010 (2016).

[36] Sloan, L. Who tweets in the United Kingdom? profiling the Twitter population using the british social attitudes survey 2015. Social Media & Society 3, 2056305117698981 (2017).

[37] Lenormand, M., Tugores, A., Colet, P., and Ramasco, J. J. Tweets on the road. PLoS ONE 9, e105407 (2014).

[38] Mazzoli, M., Molas, A., Bassolas, A., Lenormand, M., Colet, P., and Ramasco, J. J. Field theory for recurrent mobility. in press XX, XX (2019).

[39] Bilsborrow, R. E., Hugo, G., Oberai, A. S., et al. International migration statistics: Guidelines for improving data collection systems. International Labour Organization, Geneva, Switzerland, (1997).

[40] United Nations, Economic Commission for Europe, Committee on Environmental Policy. Principles and Recommendations for Population and Housing Censuses, Revision 2. United Nations Publications, Geneva, Switzerland, (2008).

[41] Hughes, C., Zagheni, E., Abel, G. J., Sorichetta, A., Wisniowski, A., Weber, I., and Tatem, A. J. Inferring migrations: Traditional methods and new approaches based on mobile phone, social media, and other big data: Feasibility study on inferring (labour) mobility and migration in the european union from big data and social media data. (2016).

[42] Migration Data Portal. Migration data sources. https://migrationdataportal.org/themes/migration-data-sources, (2019). Accessed: 2019-07-17.

[43] UNHCR website. Number of refugees and migrants from venezuela reaches 3 million. https://www.unhcr.org/news/press/2018/11/5be4192b4/number-refugees-migrants-venezuela-reaches-3-million.html?query=venezuela, (2018). Accessed: 2019-05-20.

[44] UNHCR website. R4v america latina y el caribe, refugiados y migrantes venezolanos en la region - enero 2019. https://data2.unhcr.org/es/documents/details/68070, (2019). Accessed: 2019-05-20.

[45] IOM website. Migration trends in the americas. https://www.iom.int/venezuela-migration-trends-americas-september-2018, (2018). Accessed: 2019-05-20.

[46] Federal Police of Brazil 2018. http://www.casacivil.gov.br/central-de-conteudos/noticias/2018/dezembro/comite-federal-apresenta-balanco-de-acoes-de-acolhimento-de-venezuelanos, (2018). Accessed: 2019-05-20.

[47] Cesare, N., Lee, H., McCormick, T., Spiro, E., and Zagheni, E. Promises and pitfalls of using digital traces for demographic research. Demography 55, 1979–1999 (2018).

[48] Sloan, L. and Morgan, J. Who tweets with their location? understanding the relationship between demographic characteristics and the use of geoservices and geotagging on twitter. PloS one 10, e0142209 (2015).

[49] Lenormand, M., Louail, T., Cantu-Ros, O. G., Picornell, M., Herranz, R., Arias, J. M., Barthelemy, M., San Miguel, M., and Ramasco, J. J. Influence of sociodemographic characteristics on human mobility. Scientific Reports 5, 10075 (2015).

[50] Documentation on the Twitter access API. https://developer.twitter.com/en/docs. Accessed: 2019-04-17.

[51] Joint unhcr-iom press release: Venezuelan outflow continues unabated, stands now at 3.4 million. https://www.unhcr.org/ph/15238-venezuelan-outflow-continues-unabated-stands-now-at-3-4-million.html. Accessed: 2019-05-20.

[52] International Organization for Migration, U. N. Glossary on Migration. IOM, Geneva, Switzerland, (2019). Accessed: 2019-07-17.

[53] Instituto Nacional de Estadística de Venezuela. Censo de poblacion y vivienda de venezuela 2011. http://www.redatam.ine.gob.ve/Censo2011/index.html, (2011). Accessed: 2019-05-20.

[54] United Nations Population Division DESA. World population prospects 2017. https://population.un.org/wpp/Download/Standard/Population/, (2017). Accessed: 2019-05-20.

[55] Mackay, D. J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge, UK, (2003).

[56] Unicef: A call to action: Protecting children on the move starts with better data. https://data.unicef.org/resources/call-action-protecting-children-move-starts-better-data/. Accessed: 2019-05-20.