Mobile phones: call detail records and GPS data from smartphone operating systems
Mobile phone Call Detail Records (CDR) capture the signal sent to cell towers for each outgoing and incoming call. They can therefore track the approximate location of individuals and, as a result, reveal movements across space (Williams et al., 2015). Caller details are typically anonymized before the data are shared. Some telecommunications providers are also open to social research and provide documented, anonymized digital trace data from their customers to researchers interested in analysing these data (e.g. Cesare et al., 2018; Chi et al., 2020).
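The following Python sketch shows one simple way movement could be inferred from CDR-like records. Each device is assigned its most frequent cell-tower region per month, and a move is flagged when that region changes. The record layout (device_id, month, region) and the monthly modal-location rule are illustrative assumptions. They are not a standard operator schema, nor the method used in the studies cited here.

```python
from collections import Counter, defaultdict

def modal_regions(records):
    """Map (device_id, month) -> region where the device made/received most calls.

    `records` is an iterable of (device_id, month, region) tuples, a
    hypothetical layout; real CDR schemas differ by operator. Ties are
    broken by whichever region was encountered first.
    """
    counts = defaultdict(Counter)
    for device_id, month, region in records:
        counts[(device_id, month)][region] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def detect_moves(records):
    """Return (device_id, from_region, to_region, month) for each change in
    a device's modal region between consecutive observed months.

    This tracks devices, not people. Months without any calls are simply
    absent, which produces the 'blank spots' discussed in the text.
    """
    by_device = defaultdict(list)
    for (device_id, month), region in modal_regions(records).items():
        by_device[device_id].append((month, region))
    moves = []
    for device_id, series in by_device.items():
        series.sort()  # ISO 'YYYY-MM' strings sort chronologically
        for (_, prev), (month, cur) in zip(series, series[1:]):
            if cur != prev:
                moves.append((device_id, prev, cur, month))
    return moves

records = [
    ("d1", "2020-01", "A"), ("d1", "2020-01", "A"), ("d1", "2020-01", "B"),
    ("d1", "2020-02", "B"),
    ("d2", "2020-01", "C"), ("d2", "2020-02", "C"),
]
print(detect_moves(records))  # [('d1', 'A', 'B', '2020-02')]
```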
Reliability CDR provides reliable measures of migration in terms of consistency over time. Movements are recorded automatically because this is required to operate the telephone network. An advantage in terms of reliability is that location information does not rely on self-reports by individuals, which may be subject to response biases (a common issue in surveys). However, reliability issues may arise when using CDR data from different operators. This is a common problem because most countries have several telecommunication companies, and as consumers switch providers, measures of movement over time become less reliable. As a result, CDR data are often used for narrowly defined locations and limited time frames.
Validity The key disadvantage is that such data refer to mobile devices, not to individuals as such. Individuals may share the same device or give it to others. Furthermore, many migrants may change devices and/or SIM cards after migrating to other countries, given that service providers offer deals limited to particular countries. Therefore, most contributions using phone data have analysed mobility within narrow geographic units (cities, regions) rather than movements across borders. CDRs can also be biased because locations are only recorded when calls are made, leaving blank spots in the migration process.Footnote 6
Scope While CDR data are usually more helpful for identifying internal (sub-national) migration patterns,Footnote 7 in some cases they can also be used to measure international migration at the sub-regional level, particularly when combined with other sources. For example, CDR have been used to track internal displacement following natural disasters such as the Haiti and Nepal earthquakes (Bengtsson et al., 2011), and combining CDR with satellite data can help to map movements between cross-border communities (Hughes et al., 2016).
Recent work has leveraged Google Location History data to analyse migration flows. Google Location History is collected through smartphones running Google's Android operating system and through Google services used on smartphones (e.g. Google Maps or Gmail). Pilot research suggests that this novel source could shed light on international migration through ‘fine scale mobility with rare, long distance and international trips’ documented through changes in users’ locations (Ruktanonchai et al., 2018). Using the same data, Kraemer et al. (2020) described ‘global human mobility patterns, aggregated from over 300 million smartphone users’. According to the authors, the data cover nearly all countries and 65% of the Earth’s populated surface, including cross-border movements and international migration. The advantage of CDR and smartphone location data for measuring migration lies in their timeliness and locational detail. Phone records are therefore particularly useful for studying sudden movements in defined geographic locations. Fast-evolving migration situations are difficult to capture with “traditional” data sources such as sample surveys and administrative data, and impossible to capture with censuses.
Without linking mobile phone records to other data sources, CDR offers limited scope for migration scholars. The only information available is time and location. The type, channel and motivation of a change in location remain unobserved. It thus remains unclear who moved, why they moved, where they wanted to go, who they travelled with, through which channel they travelled, and whether they are likely to stay in their current location. This lack of contextual information is a key shortcoming compared to “traditional” sample survey research.
Accessibility CDR data are not commonly available to researchers, and access depends on the willingness of telecommunication companies to collaborate. Operators in different countries may need to comply with different data protection legislation, which limits the extent and level of detail of the data that can be shared. In addition, access often comes with large fees.
Ethics CDR data pose serious ethical concerns. When entering a mobile phone contract or installing a smartphone operating system, many users may not be aware that their location data are collected and analysed for various purposes (Beduschi, 2020; Brayne, 2018; Molnar, 2019). Such data uses are often hidden in the fine print. Since telecommunication and smartphone operating system providers are often private companies, there is a lack of transparency about what they do with the data. In many countries, governments can mandate companies to provide access to data, for example, for the purpose of criminal investigations (Brayne, 2018). In the field of migration, granular CDR data can be used to target humanitarian assistance to specific populations in specific locations; however, they could also be used by authorities to enforce immigration policies, protect borders and identify individuals entering or residing in a country with an irregular status.
Social media
Geo-located social media activity, such as activity on Facebook (Zagheni et al., 2017), Twitter (Chi et al., 2020; Fiori et al., 2017; Martin et al., 2020; UN Global Pulse, 2017; Zagheni et al., 2014), Skype (Kikas et al., 2015), or LinkedIn (State et al., 2014), has been used to infer migration flows and stocks. These inferences are based on the location where users log in, or on location information that users provide themselves through geo-tagged posts or profile information (e.g. nationality or birthplace).
Movement is usually inferred from changes users make to their self-reported location on the respective platform, or from changes in the location of log-ins. For example, data from the Facebook advertising platform can yield information on ‘home country’ and country of current residence. This means that Facebook could be used as a ‘real-time census’ to estimate, among other things, the number of users classified by the social media platform as ‘expats’ (users living in a country other than their ‘home country’) at the national or global level at a certain point in time (Zagheni et al., 2017). Using changes in Facebook users’ locations over time, other researchers identified an increase in the number of Venezuelan migrants in Spain in early 2018, which was confirmed by official statistics from the Spanish National Statistical Office (Spyratos et al., 2019).
Reliability There are a host of reliability concerns involved in measuring migration using social media data. First, certain segments of the population may be over- or under-represented (for instance, on average, young people are more likely to use Facebook than older people).Footnote 8 Second, even frequent users may choose not to provide information on their past and current location. Certain types of migrants may deliberately avoid providing information on their location on social networks. Third, it is difficult to verify whether changes in location are accurate, given that this information is sometimes self-reported on a voluntary basis. Fourth, the user base of social media providers constantly changes, which complicates analysis of trends over time (see e.g. Cesare et al., 2018).
Validity With many kinds of social media data, there is a lack of transparency about how key measures relevant to migration are generated. For example, there is limited information on how Facebook identifies who is an “expat” or how it labels users as speakers of a different language. This complicates meaningful interpretation of migration patterns observable in the data.
Scope The advantage of geo-located social media data is that, in many countries, certain social media platforms are wildly popular, so that real-time data on large volumes of movements can potentially be accessed. Such data may be particularly useful for studying broader migration trends. The level of detail provided by geo-coded social media data is limited in many cases but more extensive than that of CDR data. For example, Facebook provides aggregate-level information on the number of users with specific characteristics such as age, gender, or even education and income proxies, as well as a vast range of preferences (measured via users’ “likes” of particular pages). Researchers use changes in the number of people with given characteristics living in a specific place to infer ‘migration flows’. This rests on the assumption that an increase in the ‘stock’ of people who report living somewhere reflects people moving there, and that a decrease reflects people leaving. Information on friendship networks across countries, recently made available by Facebook, may be used in the future to forecast cross-country migration trends (Tjaden et al., 2021). Despite the availability of additional characteristics, the data provide no information about the causes, means, or consequences of migration. Governments, law enforcement agencies, international organizations and research institutes have attempted to monitor the social media activity of migrants before, during and after migration to understand changes in migration patterns (Brenner & Frouws, 2019; Dekker et al., 2018; Sanchez et al., 2018).Footnote 9
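The stock-to-flow inference described above amounts to differencing two snapshots of aggregate counts, as the Python sketch below shows. The snapshot format, keyed by hypothetical (home_country, residence_country) pairs, and the numbers in the example are illustrative assumptions, not real Facebook figures. Only net changes can be recovered, so the result cannot separate arrivals from departures. Growth or decline in the platform's user base is also indistinguishable from migration unless it is corrected for separately.

```python
def net_stock_change(stock_t0, stock_t1):
    """Difference two snapshots of user counts keyed by
    (home_country, residence_country).

    Under the assumption described in the text, a positive value is read as
    net in-migration of that group to that residence country, and a negative
    value as net out-migration. Missing keys are treated as zero users.
    """
    keys = sorted(set(stock_t0) | set(stock_t1))
    return {k: stock_t1.get(k, 0) - stock_t0.get(k, 0) for k in keys}

# Hypothetical snapshots of users whose 'home country' is Venezuela (VE)
t0 = {("VE", "ES"): 1000, ("VE", "CO"): 5000}
t1 = {("VE", "ES"): 1500, ("VE", "CO"): 5200, ("VE", "PE"): 300}
print(net_stock_change(t0, t1))
# {('VE', 'CO'): 200, ('VE', 'ES'): 500, ('VE', 'PE'): 300}
```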
Access Many social media companies offer public APIs to allow researchers access to certain parts of their data. In many cases (e.g. Facebook, Twitter), access can be obtained at no cost, which is a substantial advantage over traditional sources such as censuses, administrative data and surveys. However, access modalities can change at any time because the data are provided by private companies rather than by taxpayer-funded government or research bodies that are mandated to provide systematic data over time.
Ethics Users of social media are often unaware of the data being collected about them, and there is a general lack of understanding of how such data are and can be used by the companies themselves or by third parties (Cesare et al., 2018; Zwitter, 2014). Migration enforcement agencies may use such data for surveillance purposes, a risk that is particularly serious in contexts of irregular migration and forced displacement. Agencies could monitor the communication of specific groups or individuals on Twitter and Facebook to identify irregular migrants and track them during their journeys. Companies such as Facebook, however, only give researchers access to anonymized, aggregate-level data, which limits the possibility of using the data to harm individuals. Any information on narrowly defined locations and groups becomes inaccessible if the underlying target population falls below a threshold at which specific individuals risk being identified. However, this safeguard does not apply to attempts to monitor public communication in social media groups that may indicate changes in migration patterns. The European Asylum Support Office (EASO) suspended its efforts to monitor migrants’ communication on social media following concerns raised by the EU’s own data protection body.Footnote 10
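The threshold rule mentioned above is a form of small-cell suppression, sketched minimally in Python below. The threshold value is a hypothetical parameter, because platforms do not necessarily publish the exact cut-off they apply. Real systems may also round or bound estimates rather than simply withholding them.

```python
def suppress_small_cells(counts, threshold):
    """Withhold aggregate counts for groups smaller than `threshold`.

    `counts` maps a group label (e.g. a location/characteristic combination)
    to a user count. Groups below the threshold are replaced with None so
    that very small, potentially identifiable populations are not released.
    """
    return {group: (n if n >= threshold else None) for group, n in counts.items()}

# Hypothetical audience counts; the threshold of 1000 is illustrative only
audience = {"city_X/age_18-24": 12500, "village_Y/age_65+": 40}
print(suppress_small_cells(audience, threshold=1000))
# {'city_X/age_18-24': 12500, 'village_Y/age_65+': None}
```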
Email IP addresses
Repeated logins to the same website and IP addresses from email activity have also been used to estimate international mobility patterns and users’ likelihood of moving to another country (Zagheni & Weber, 2012). Rather than relying on locations self-reported by users, certain online services such as email providers collect data on where users log into their accounts.
Reliability and validity The same reliability and validity limitations apply as for social media data. As with social media log-ins, email log-ins are recorded per device (IP address), not necessarily per person. For example, it is possible, though presumably rare, that several people use the same email account, which would distort any aggregate measure of migration.
Scope The scope for migration analysis is further reduced in the case of email log-ins, given that additional socio-demographic and socio-economic information about the users (which is available for Facebook) is lacking or not publicly accessible.
Access Most email providers also do not provide public APIs that make data available to researchers. Email communication is considered personal and private communication whereas some communication on social media platforms is (intentionally or unintentionally) made public by users.
Ethics As with social media data, there are issues concerning consent and data privacy. Users may not be aware that email providers track their location. Moreover, governmental enforcement agencies may compel companies to share the content of specific individuals’ emails for the purpose of criminal investigations or intelligence, which could also be misused against migrants with an irregular status (Brayne, 2018).
Online search data
Online search data have also been used more recently to study migration. Records of Google searches, for example, have been explored to forecast the number of asylum-seeker arrivals in Europe (Connor, 2017) or internal migration within the U.S. (Lin et al., 2019). Search data generated through Google’s online search platform can be used to measure migration intentions and predict subsequent emigration flows (Böhme et al., 2018; UN Global Pulse, 2014). For example, researchers retrieve data on how often individuals in country A have ‘googled’ a term that the researcher believes to indicate an intention to migrate (to country B), such as ‘jobs’, ‘visa’, or the name of the destination country. Google Trends reports this as relative search interest, normalized to the total search volume of a given place and period, rather than as raw counts.
Reliability Google searches are recorded consistently, so the measure as such is highly reliable. The main advantage of search data is that Google’s search engine is widely used across the globe, and search data have been successfully used to study other social behaviours (e.g. flu outbreaks). Despite this broad coverage, important countries (e.g. China) are missing entirely. Reliability issues emerge regarding applicability across country contexts, languages and specific populations. Preliminary research in this area suggests that online searches (e.g. Google searches) are related to actual movement at the aggregate level, yet the selection of specific search terms in different country contexts appears to be highly important. Syrians looking for ways to flee to Europe ‘google’ different terms than Canadians looking for a job in the US. The meaning of the same search terms may also vary across languages. Overall, this means that Google searches may be indicative of migration from a certain country to another, but are difficult to scale up to multiple migration contexts (see Tjaden et al., 2021).
Validity Online search data have one obvious shortcoming: ‘searching’ is not ‘doing’. The fact that someone looks up information on another country or, more explicitly, gathers information on how to move there, does not mean that they will actually move. Search data (similarly to survey data on emigration intentions) are a ‘pre-behavioural’ proxy for actual migration. Some studies suggest that intentions are a good predictor of eventual migration (Van Dalen & Henkens, 2013; Tjaden et al., 2019), but research also suggests that the strength of the predictor varies considerably depending on where migrants are from and where they want to go (Tjaden et al., 2019).
Scope A major disadvantage of Google search data is the high level of aggregation at which data is made available. Search data is made available at the population level for countries or, in certain countries like the US, for subregions. Search data does not include any additional information about those who show interest in migrating, and thus renders any individual-level analysis impossible.
Access Google search data is freely and publicly accessible via the Google Trends platform and API.
Ethics The potential risk of misuse is limited given the high level of aggregation and anonymity of the data that the company makes available. Serious concerns would arise if data for specific locations and IP addresses were used to infer individual-level migration behaviour. Google itself analyses individual-level location data to provide targeted advertisements to users of its search engine. However, there is a lack of transparency regarding the conditions under which such data may be shared with governments or other third parties. In addition, the usual concerns about users’ unawareness of how their data are used apply.
Bibliometric data
Bibliometrics is a field of research that uses statistical methods to systematically analyse publication records (books, articles etc.). One sub-field of bibliometrics, scientometrics, is the analysis of scientific publications. Detailed information about academic output is recorded and made accessible through scientific databases (e.g. Scopus, Web of Science, Google Scholar and others). This information has been used to model the international mobility of academics (Czaika & Orazbayev, 2018; Laudel, 2003; Moed & Halevi, 2014; Sudakova & Tarasyev, 2019; Wang et al., 2019). A change in a researcher’s affiliation from an institution in one country to an institution in another country is taken to indicate migration.
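The affiliation-based measure can be sketched in Python as follows. Each author's publications are ordered by year, and a move is recorded whenever the country of affiliation changes. The `Publication` record, the single-country-per-paper simplification and the year-level ordering are hypothetical. Real bibliometric records often list several affiliations per author and require name disambiguation.

```python
from dataclasses import dataclass

@dataclass
class Publication:
    author_id: str            # hypothetical disambiguated author identifier
    year: int
    affiliation_country: str  # simplification: one country per paper

def affiliation_moves(pubs):
    """Return (author_id, from_country, to_country, year) for each change in
    affiliation country between an author's consecutive publications."""
    by_author = {}
    # sorted() is stable, so same-year papers keep their input order
    for p in sorted(pubs, key=lambda p: (p.author_id, p.year)):
        by_author.setdefault(p.author_id, []).append(p)
    moves = []
    for author, papers in by_author.items():
        for prev, cur in zip(papers, papers[1:]):
            if cur.affiliation_country != prev.affiliation_country:
                moves.append((author, prev.affiliation_country,
                              cur.affiliation_country, cur.year))
    return moves

pubs = [
    Publication("A", 2015, "DE"), Publication("A", 2019, "US"),
    Publication("A", 2017, "DE"), Publication("B", 2016, "FR"),
]
print(affiliation_moves(pubs))  # [('A', 'DE', 'US', 2019)]
```

The year returned is that of the first paper listing the new affiliation. It therefore only bounds the actual move date from above, which echoes the caveat that self-reported affiliations can be outdated.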
Reliability Measuring migration through changes in affiliations is consistent and reliable. Scientists have an interest in publishing their work in recognized journals and books, institutions have an interest in researchers indicating their home institution, and most research outlets require authors to provide this information. Nevertheless, the measure depends on the accuracy of self-reported affiliations, which can be outdated.
Validity Migration analysis based on bibliometric data has the potential to collect additional context information including socio-demographic characteristics of the professionals (age, gender, ethnic origin, for example, may be inferred based on name recognition algorithms and web scraping individual professionals’ web pages). Additional information about the universities, faculty and chair may be matched with additional effort.
Scope The drawback of this data source is its restriction to a narrowly defined group of professionals (i.e. academics) for whom public disclosure of affiliation is the norm. However, it may be possible to extend this approach to other professions where public information on affiliations is common (e.g. athletes, musicians).
Access Bibliographic data have become available through the digitalization of entire libraries, publishers’ records and academic journals, and through ambitious projects such as Google Books and Google Scholar that aim to index all published academic work. Most academics make their affiliations public to gain visibility and broaden their reach.
Ethics Compared to previously described sources, ethical concerns are limited because the personal information used for analysis is provided voluntarily and knowingly. The population is restricted to regular labour migrants which limits the potential for misuse by authorities.
Remote sensing technologies
Remote sensing is an umbrella term for collecting information about something without making physical contact. In current usage, remote sensing refers to the use of satellite- or aircraft-based sensor technologies (e.g. drones). Remote sensing is commonly used in geography, earth sciences, climate research, agricultural studies, wildlife studies, the military and intelligence gathering, but also increasingly for urban planning, tourism, commerce, and various humanitarian applications (Miller et al., 2019). Changes in human activity visible in the images (e.g. settlements, refugee camps, light emissions at night) can be used to infer mobility.
Reliability If applied consistently, measuring migration with remote sensing technology, by averaging physical quantities over pixels, can yield reliable migration measures. Algorithms automatically detect changes in visual patterns on satellite or drone images over time. For example, the population size of settlements can be estimated by counting rooftops visible on satellite or drone images. Depending on the proximity and resolution of the image, individuals within certain localities can be identified. Comparing images over time can then be used to estimate immigration into, and emigration from, a narrowly defined location.
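The rooftop-counting approach reduces to simple arithmetic, sketched below in Python. The persons-per-dwelling figures are hypothetical occupancy assumptions that would in practice come from surveys or census data. Using a low and a high value yields a range rather than a point estimate. Only the net change between two image dates can be recovered.

```python
def estimated_population(rooftops, persons_per_dwelling):
    """Population implied by a rooftop (or tent) count and an assumed occupancy."""
    return rooftops * persons_per_dwelling

def net_change_range(roofs_t0, roofs_t1, occupancy_low, occupancy_high):
    """Range of net population change between two image dates.

    Only the *net* change is observable: 100 arrivals combined with 100
    departures look the same as no movement at all.
    """
    delta = roofs_t1 - roofs_t0
    low = estimated_population(delta, occupancy_low)
    high = estimated_population(delta, occupancy_high)
    return (min(low, high), max(low, high))

# Hypothetical counts from two images of the same settlement
print(net_change_range(1200, 1500, occupancy_low=4, occupancy_high=6))  # (1200, 1800)
```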
Validity The obvious downside of satellite and drone images for measuring migration is that no additional individual-level information about migrants is available: who is moving, from where, to where, how etc. By itself, remote sensing provides information on how many tents, rooftops or individuals are present in a certain locality, but no information about what happened when fewer dots and shadows appear in the next set of images.
Scope There is a rapidly growing number of examples with relevance for migration studies. First, drones and satellite images inform policies and direct aid to refugees. For instance, the United Nations Institute for Training and Research (UNITAR) mapped refugee camps in Jordan and elsewhere with its Operational Satellite Applications Programme.Footnote 11 Civil society organizations such as Human Rights Watch or Amnesty International use satellite imagery to document humanitarian needs of displaced populations at borders or in refugee camps by measuring the growth of settlements.Footnote 12 In such cases, satellite images indicate where aid and assistance are most needed (Bitelli et al., 2017; Quinn et al., 2018; Shatnawi et al., 2020; Tiede et al., 2017).
Satellite imagery also forms a key part of the ‘smart border’ agenda, which attempts to use modern technology to improve border management around the world and track ‘illegal’ crossings. Systems relying on remote sensing were developed “to assist border authorities with more effective surveillance and reliable decision-making support” (Al Fayez et al., 2019). In contrast, civil society organizations use the same technology to monitor deaths and violations of migrants' rights at the maritime borders of the EU.Footnote 13
For the moment, remote sensing appears to be most useful for informing operations on the ground (managing refugee camps, targeting humanitarian assistance, managing borders etc.) and less so for research on migration per se. The technology can also be used to monitor slow-onset emigration driven by changes in climate, which can themselves be inferred from images.
Access With improvements in the quality and accessibility of satellite imagery (Popkin, 2018) provided by the European Space Agency, NASA, and others, researchers are also exploring ways to use remote sensing data to measure human migration globally. Public and private bodies offer access to satellite imagery for research purposes, and tech companies offer cloud computing power to conduct complex and demanding analyses within minutes.Footnote 14 Depending on the specific data provider, access can be free of charge for research institutes or come with a fee.
Ethics Ethical issues are a key concern for remote sensing technologies because information is collected without the knowledge or consent of individuals. New high-resolution imagery, particularly from drones, can be combined with face recognition technology to identify individuals. Law enforcement, policing and intelligence agencies use such approaches (Brayne, 2018; Hayes, 2017; Leese et al., 2021; Molnar, 2019). This raises serious concerns in undemocratic countries with low data protection standards and with policies aimed at suppressing and controlling groups in society. Drones may also increasingly be operated by companies in addition to governments, which raises concerns about unknown privacy violations by non-governmental actors.
International air travel
At first glance, international air passenger traffic belongs to the realm of tourism and transport studies, not migration (see Sect. 2.1.). However, there have been attempts to use this information to infer migration flows. For example, Gabrielli et al. (2019) used dyadic monthly air passenger traffic between 239 countries and territories worldwide from January 2010 to March 2018 to estimate the number of passengers on commercial flights operated globally. The study explored whether a travel surplus (an increase in travel from A to B without an increase in return travel from B to A within a year) can be linked to migration flows.
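The surplus idea can be illustrated with a simplified Python sketch. Passengers travelling from A to B over a twelve-month window are summed, and those travelling from B to A over the same window are subtracted. The data layout, the window and the example figures are assumptions for illustration only, and the sketch does not reproduce the model of Gabrielli et al. (2019).

```python
def passenger_surplus(traffic, origin, dest, months):
    """Net passengers from origin to dest over `months`:
    (origin->dest) minus (dest->origin).

    `traffic` maps (origin, dest, month) to a monthly passenger count;
    missing entries count as zero. A persistent positive surplus is the
    kind of signal this approach tries to link to migration.
    """
    outbound = sum(traffic.get((origin, dest, m), 0) for m in months)
    inbound = sum(traffic.get((dest, origin, m), 0) for m in months)
    return outbound - inbound

window = [f"2017-{m:02d}" for m in range(1, 13)]
# Hypothetical: 10,000 passengers/month A->B, 9,500/month B->A
traffic = {("A", "B", m): 10_000 for m in window}
traffic.update({("B", "A", m): 9_500 for m in window})
print(passenger_surplus(traffic, "A", "B", window))  # 6000
```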
Reliability Air passenger data is highly standardized and consistent as it is subject to international industry standards.
Validity Passenger data do not measure migration directly and can only be used to infer different types of migration indirectly. Since air passenger data do not allow researchers to track individual passengers or specific cohorts by date of entry, researchers need to make assumptions about the passengers’ length of stay. This is problematic because the publicly available data do not indicate who the passengers are, how long they will stay in the country, on which visa they are travelling etc. In addition, flight passenger data give a selective picture of global mobility. According to Recchi et al. (2019), 44 percent of registered cross-border trips occur on commercial flights, and this proportion increases with the distance between countries.
Scope Overall, the data may be used to estimate international migration flows if combined with additional data sources. For now, research is still at an exploratory stage and the methods appear underdeveloped. In the future, this approach may make it possible to estimate the number of visa overstayers between countries, one indicator of irregular migration.
Access Air passenger data are collected by airlines, some of which make them available for purchase. The EU recently made a free, public dataset available.Footnote 15
Ethics Ethical concerns regarding flight data are limited given the current state of the available data. Flight data are aggregated at the country and month level and anonymized. Currently, misuse to the disadvantage of individuals is unlikely.
Online news data
New advances in technology have made available online news aggregators such as Google News or the Global Database of Events, Language, and Tone (GDELT).Footnote 16 Such platforms monitor the world's news media from nearly every corner of every country in print, broadcast, and web formats. This data has the potential to capture acts of past or prospective migration that were not covered in traditional sources such as administrative data or surveys.
Reliability Migration measures based on online news aggregator data can be considered reliable to the extent that algorithms deriving information on migration apply consistently across all countries and news sources. The issue is that the success of the algorithm in detecting migration may vary by country, by quality of the news outlets, by language and type of migration to be covered. In addition, algorithms may capture the same migration events several times as the same event may have been covered by several news outlets.
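Several outlets may report the same event, so some form of de-duplication is needed before counting. The minimal Python sketch below collapses reports that share a location and event type within a few days of each other. The record fields and the two-day window are hypothetical choices, not GDELT's schema or any published procedure.

```python
from datetime import date

def deduplicate_events(reports, window_days=2):
    """Collapse news reports of the same (location, event_type) whose dates lie
    within `window_days` of the first report in a cluster into one event."""
    reports = sorted(reports, key=lambda r: (r["location"], r["event_type"], r["date"]))
    events = []
    for r in reports:
        last = events[-1] if events else None
        if (last is not None
                and last["location"] == r["location"]
                and last["event_type"] == r["event_type"]
                and (r["date"] - last["date"]).days <= window_days):
            last["sources"].add(r["source"])  # same event, another outlet
            continue
        events.append({"location": r["location"], "event_type": r["event_type"],
                       "date": r["date"], "sources": {r["source"]}})
    return events

reports = [
    {"location": "X", "event_type": "flood", "date": date(2020, 5, 1), "source": "outlet_a"},
    {"location": "X", "event_type": "flood", "date": date(2020, 5, 2), "source": "outlet_b"},
    {"location": "X", "event_type": "flood", "date": date(2020, 6, 10), "source": "outlet_a"},
]
for e in deduplicate_events(reports):
    print(e["date"], sorted(e["sources"]))
# 2020-05-01 ['outlet_a', 'outlet_b']
# 2020-06-10 ['outlet_a']
```

Comparing each report with the first date of its cluster, rather than with the most recent report, keeps clusters from chaining indefinitely. A wider window catches more duplicates but risks merging distinct events.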
Validity The large volume of news articles required to collect information on migration leads researchers to use natural language processing algorithms. The emerging evidence is still unclear on how accurately such algorithms detect events that actually involve migration.
Scope Approaches are still very recent, but several uses of these data exist. Carammia et al. (2020) have used the GDELT database to measure political, social and economic “push factor” events that could motivate people to leave their country. In combination with other data sources, they attempt to forecast displacement and migration with a view to setting up early warning systems, currently under development in the EU.Footnote 17 In a similar vein, the Internal Displacement Monitoring Centre (IDMC) uses the GDELT database to track internal displacement.Footnote 18 The IOM is also experimenting with such data to improve estimates of the number of migrants who went missing along their journeys, for instance in its Missing Migrants Project (Borja & Black, 2021). Apart from eye-witness reports, news articles are the main way to systematically collect data on migrant fatalities and bring this tragic topic to the attention of policymakers.
Access GDELT and Google News can be accessed online free of charge for researchers.
Ethics Ethical concerns arise in countries with low standards of journalism and data privacy. It is possible, for example, that the identity of individual migrants is revealed in a news article and picked up by automated text analysis. In theory, this information could be used by enforcement agencies to press charges in cases of irregular migration, or by smugglers for debt collection. Even when interviewees consent to the use of their personal information, they may not be aware that it may enter migration databases. Such abuses are possible with traditional media sources; however, digital applications may exacerbate the problem by providing cheaper, faster and broader access to data.