hart, adam g orcid: 0000-0002-4795-9986, carpenter ...eprints.glos.ac.uk/5961/1/5961 - hart - 2018 -...

22
This is a peer-reviewed, post-print (final draft post-refereeing) version of the following published document, This is the peer reviewed version of the following article: Hart, A. et al. (2018) 'Testing the potential of Twitter mining methods for data acquisition: evaluating novel opportunities for ecological research in multiple taxa', Methods in Ecology and Evolution 00, pp.1-12 which has been published in final form at https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13063. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. and is licensed under All Rights Reserved license: Hart, Adam G ORCID: 0000-0002-4795-9986, Carpenter, William S, Hlustik- Smith, Estelle, Reed, Matt ORCID: 0000-0003-1105-9625, Goodenough, Anne E ORCID: 0000-0002-7662-6670 and Ellison, Aaron (2018) Testing the potential of Twitter mining methods for data acquisition: Evaluating novel opportunities for ecological research in multiple taxa. Methods in Ecology and Evolution, 9 (11). 2194 -2205. doi:10.1111/2041-210x.13063 Official URL: https://doi.org/10.1111/2041-210x.13063 DOI: http://dx.doi.org/10.1111/2041-210x.13063 EPrint URI: http://eprints.glos.ac.uk/id/eprint/5961 Disclaimer The University of Gloucestershire has obtained warranties from all depositors as to their title in the material deposited and as to their right to deposit such material. The University of Gloucestershire makes no representation or warranties of commercial utility, title, or fitness for a particular purpose or any other warranty, express or implied in respect of any material deposited. The University of Gloucestershire makes no representation that the use of the materials will not infringe any patent, copyright, trademark or other property or proprietary rights. The University of Gloucestershire accepts no liability for any infringement of intellectual property rights in any material deposited but will remove such material from public view pending investigation in the event of an allegation of any such infringement. PLEASE SCROLL DOWN FOR TEXT.

Upload: others

Post on 10-Feb-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

  • This is a peer-reviewed, post-print (final draft post-refereeing) version of the following published document, This is the peer reviewed version of the following article: Hart, A. et al. (2018) 'Testing the potential of Twitter mining methods for data acquisition: evaluating novel opportunities for ecological research in multiple taxa', Methods in Ecology and Evolution 00, pp.1-12 which has been published in final form at https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13063. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. and is licensed under All Rights Reserved license:

    Hart, Adam G ORCID: 0000-0002-4795-9986, Carpenter, William S, Hlustik-Smith, Estelle, Reed, Matt ORCID: 0000-0003-1105-9625, Goodenough, Anne E ORCID: 0000-0002-7662-6670 and Ellison, Aaron (2018) Testing the potential of Twitter mining methods for data acquisition: Evaluating novel opportunities for ecological research in multiple taxa. Methods in Ecology and Evolution, 9 (11). 2194 -2205. doi:10.1111/2041-210x.13063

    Official URL: https://doi.org/10.1111/2041-210x.13063DOI: http://dx.doi.org/10.1111/2041-210x.13063EPrint URI: http://eprints.glos.ac.uk/id/eprint/5961

    Disclaimer

    The University of Gloucestershire has obtained warranties from all depositors as to their title in the material deposited and as to their right to deposit such material.

    The University of Gloucestershire makes no representation or warranties of commercial utility, title, or fitness for a particular purpose or any other warranty, express or implied in respect of any material deposited.

    The University of Gloucestershire makes no representation that the use of the materials will not infringe any patent, copyright, trademark or other property or proprietary rights.

    The University of Gloucestershire accepts no liability for any infringement of intellectual property rights in any material deposited but will remove such material from public view pending investigation in the event of an allegation of any such infringement.

    PLEASE SCROLL DOWN FOR TEXT.

  • Testing the potential of Twitter mining methods for data acquisition:

    Evaluating novel opportunities for ecological research in multiple taxa

    Adam G. Hart1, William S. Carpenter, Estelle Hlustik-Smith, Matt Reed, Anne E. Goodenough

    1 School of Natural and Social Sciences, University of Gloucestershire, Cheltenham, UK.

    [email protected]

    Abstract 1. Social media provides unique opportunities for data collection. Retrospective analysis of social

    media posts has been used in seismology, political science and public risk perception studies but has

    not been used extensively in ecological research. There is currently no assessment of whether such

    data are valid and robust in ecological contexts.

    2. We used “Twitter mining” methods to search Twitter (a microblogging site) for terms relevant to

    three nationwide UK ecological phenomena: winged ant emergence; autumnal house spider

    sightings; and starling murmurations. To determine the extent to which Twitter-mined data were

    reliable and suitable for answering specific ecological questions the data so gathered were analysed

    and the results directly compared to the findings of three published studies based on primary data

    collected by citizen scientists during the same time period.

    3. Twitter-mined data proved robust for quantifying temporal ecological patterns. There was striking

    similarity in the temporal patterns of winged ant emergence between previously published work and

    our analysis of Twitter-mined data at national scales; this was also the case for house spider

    sightings. Spatial data were less available but analysis of Twitter-mined data was able to replicate

    most spatial findings from all three studies. Baseline ecological findings, such as the sex ratio of

    house spider sightings, could also be replicated. Where Twitter mining was less successful was

    answering specific questions and testing hypotheses. Thus, we were unable to determine the

    influence of microhabitat on winged ants or test predation and weather hypotheses for initiation of

    murmuration behaviour.

    4. Twitter mining clearly has great potential to generate spatiotemporal ecological data and to

    answer specific ecological questions. However, we found that the types and usefulness of data

    differed substantially between the three phenomena. Consequently, we suggest that understanding

    users’ behaviour when posting on ecological topics would be useful if using social media is to

    generate ecological data.

    Keywords House spiders, Phenology, Social media, Spatial ecology, Starling mumurations, Twitter mining,

    Winged ants

    mailto:[email protected]

  • 1. Introduction

    Public participation in scientific research, especially when members of the public directly assist

    scientists in the collection or processing of data, has become known as citizen science (Bonney et al.,

    2009). Citizen science is now an established research method that is increasingly well used (e.g.

    Cooper, Shirk, & Zuckerberg, 2014; McKinley et al., 2017; Newson et al., 2016; Pescott et al., 2015;

    Theobald et al., 2015). Instrumental in the increase of citizen science research has been the

    development of web-enabled mobile devices, especially smart phones. Such technology has allowed

    people to participate in projects locally, nationally and internationally in fields from astronomy

    (Kuchner et al., 2017) to public health (Rowbotham, McKinnon, Leach, Lamberts, & Hawe, 2017).

    One field that has been particularly successful in using citizen science is ecology, especially for

    studies on spatiotemporal distribution where the ubiquity of the public is a major advantage (e.g.

    Hart, Hesselberg, Nesbit, & Goodenough, 2017). There are some challenges with using citizen

    science data (Lukyanenko, Parsons, & Wiersma, 2016) including the fact that: (1) data often cannot

    be validated; (2) data on rare species or complex ecological phenomena can be hard to obtain; (3)

    data can be of lower quality and consistency than those collected by experts (but see Willis et al.,

    2017); and (4) recording frequency can be biased towards highly populated areas or times such as

    weekends and holidays. Despite these issues, large datasets can be assembled across larger spatial

    scales and longer time periods than would otherwise be possible. Thus, citizen science is now widely

    used to understand species’ distributions (Fournier et al., 2017), record the spread of non-native

    species, pests, or diseases (Parr & Sewell, 2017), monitor population trends (Dennis, Morgan,

    Brereton, Roy, & Fox, 2017), quantify seasonal phenological patterns (Méndez, de Jaime, &

    Alcántara, 2017), and better understand behaviour (Goodenough, Little, Carpenter, & Hart, 2017).

    The same technological developments that have facilitated the rise of citizen science have also led to

    the rise of social media applications including Facebook, Twitter, Flikr, Instagram and Snapchat. The

    willingness with which people post on public-facing social media, and the sheer volume of such

    posts, makes these platforms a potential source of useful scientific data (e.g. Tufekci, 2014). Such

    data still make use of citizens for scientific research (and is therefore still “citizen science”), but are

    gathered post-hoc and indirectly, with citizens contributing, usually unknowingly, as an incidental

    consequence of social media activity.

    Twitter, a microblogging application, is a particularly valuable tool for researchers seeking to make

    use of information contained in, or derived from, social media (Kumar, Morstatter, & Liu, 2014).

    Users of Twitter (“tweeters”) can post short messages (140 characters until 2017, thence 280),

    termed “tweets” and reply to other users’ messages with tweets being visible to users and non-users

  • via search engines. Each tweet has date- and time-stamps; some users also indicate location. The

    immediacy with which users can tweet, the apparent desire to share information, and the ability to

    search tweets for text or hashtags (keywords related to a topic that are proceeded by #) enables

    researchers to examine Twitter archives to extract information. This process is known as “Twitter

    mining” and has been used in a number of disciplines including seismology (Crooks, Croitoru,

    Stefanidis, & Radzikowski, 2013), coordinating disaster relief (Purohit et al., 2013), studying voter

    behaviour (Grover, Kar, Dwivedi, & Janssen, 2017), quantifying the timing of crop planting (Zipper,

    2018) and evaluating public perception of ecological risks (Fellenor et al., 2017).

    The immediacy of Twitter gives Twitter mining great potential for studying memorable or significant

    ecological phenomena. However, although this has been suggested, for example in monitoring

    invasive species (Daume, 2016) and studying phenology (Catlin-Groves, 2012), few ecological studies

    have used the approach (an exception being Fuka, Osborne-Gowey, and Fuka (2013) to map range

    shifts). Before widespread use of Twitter mining for ecological research is recommended, it is

    necessary to be confident that ecological data in this way reflect ecology rather than patterns of

    social media use. This validation can only be achieved if Twitter-derived data can be directly

    compared to data gathered on the same phenomena by other means.

    In this study, we compare Twitter-derived data on ecological phenomena with primary data from

    published ecological studies in three taxa. The primary studies used citizen science to gather data

    across the UK to answer different ecological questions. The first study quantified spatiotemporal

    distribution and environmental triggers of ant mating flights (primarily of Lasius niger) (Hart et al.,

    2017). It included analyses of synchronicity and spatial concurrence of winged ant emergences

    (“flying ants”) at national and regional scales. The second study assessed geographical patterns,

    seasonal peaks, daily rhythms and location of spiders (primarily those in the genera Tengenaria and

    Eratigena) within houses during the autumn (Hart, Nesbit, & Goodenough, 2018). The third study

    examined starling (Sturnus vulgaris) murmuration behaviour to assess the effect of predators and

    temperature (Goodenough et al., 2017). Here, for each study subject, we compare Twitter-mined

    data with published data to determine: (1) whether the research questions in each study could be

    answered robustly through Twitter mining (i.e. to determine whether tweets contain relevant

    information, provide a sufficient sample size and contain the necessary spatiotemporal data); and (2)

    for those research questions that can be answered using Twitter data, whether the findings replicate

    the published results. We draw our findings together to highlight the opportunities and challenges

    presented by Twitter mining, and offer suggestions on the use of this approach in future ecological

    projects.

  • 2. Materials and methods

    2.1 Ecological studies

    Three published datasets based on ecological studies conducted in the UK were used for comparison

    purposes. The winged ant study (Hart et al., 2017) ran for three UK summers from 1st June (day 1) to

    4th September (day 96) in 2012, 2013 and 2014. To find out more about the species involved an

    additional study in 2013 where the public also submitted samples of the winged ants they recorded

    (N = 436). The house spider study (Hart et al., 2018) ran across autumn and winter 2013/14 from 1st

    August 2013 (day 1) to 28th January 2014 (day 181). The starling murmuration study (Goodenough

    et al., 2017) ran across autumn and winter in 2014/15 and 2015/16 covering the period 1st October

    (day 1) to 31st March (day 183). See above and individual publications for more details.

    2.2 Twitter mining

    Data were mined from Twitter’s Application Program Interface (API), which allows access to

    Twitter’s raw data, using proprietary code commercially developed by the eponymous company

    “FollowtheHashtag.com”. The data provided included measures of influence and suggestion of the

    likely gender of the Twitter user, using algorithms developed by the company. These secondary

    analyses were not relevant to our research and only the basic components, as described below,

    were used. For tweets about ants and starlings, use of hashtags generated extensive datasets. For

    ants, the search hashtags were #flyingants and #flyingantday; for starlings the search hashtags were

    #murmuration, #murmurations, #murmurating and #starlingsurvey. For house spiders, the planned

    hashtags (#spider, #spiders, #housespider and #housespiders) generated just 38 tweets over 2 years

    combined. Accordingly, the search was changed to also include any tweet including “house spider*”.

    For each tweet, information was available on: (1) tweet content, (2) associated media (photographs,

    gifs or video), (3) date, (4) time, (5) twitter username, (6) tweeter biography (if available) and (7)

    self-declared tweeter location when available. These data were automatically parsed to create a

    comma separated variable file that could be loaded into Excel. Data included all tweets within the

    period covered by the relevant study and, for ants and spiders, the corresponding period from the

    preceding year (ants: 1st June–4th September 2011; spiders: 1st August 2012–28th January 2013).

    This enabled us to identify whether publicity surrounding the studies influenced Twitter activity

    2.3 Tweet cleaning and processing

    Tweets were manually processed before analysis on a tweet-bytweet basis. All tweets that did not

    relate to the phenomenon under consideration were removed, including non-relevant tweets (e.g.

    those about “The Flying Ants” band or “Murmuration Theatre”), tweets about flying ants, house

    spiders or starling murmurations not reporting a primary sighting, and non-UK tweets. Also, all

  • retweets were removed, again manually by searching for “RT” within the tweet contents. Finally,

    tweets from the organizations and individuals running the initially surveys (@uglosbioscience,

    @society_biology, @ RoyalSocBio, @AdamHartScience, @Dockling83, @RebeccaNesbit,

    @Natwittle) were deleted. This reduced the number of tweets from 3,009 to 2,345 for ants (77.9%

    retention), 11,227 to 6,218 for spiders (55.1% retention), and for starlings 1,520 to 135 (8.8%

    retention) (Table 1).

    There was no automated geotagging information for tweets but locational data could be determined

    for some based on content or tweeter biographies on a tweet-by-tweet basis. If a location was

    specified in the tweet (e.g. “#flyingants in Cardiff”) this could be manually transformed into latitude

    and longitude. The second method used the “home location” information present in around a third

    of Twitter biographies. Because there was no reason to suppose that the tweeter’s home location

    was the location of the ecological record, this approach was only used when additional confirmatory

    information was given in the tweet (e.g. “amazing #murmuration from my garden” or “my house is

    being overrun with #housespiders”). Again, location was transformed into latitude and longitude.

    Table 1 details the relative use of different methods; all work was done manually.

    Because tweeting is often undertaken on personal mobile devices as a rapid reaction to an

    ephemeral event it was assumed that tweet date (and in the case of spiders, tweet time) was closely

    related to that of the initial observation. Where there was evidence that the observation occurred

    on a different date (e.g. “fantastic #murmuration last night”) an amended date was generated and

    used in subsequent analyses. This happened for

  • Table 1 Sample sizes of Citizen Science (CS) studies and Twitter analyses (N= number of individual tweets)

    Flying ants House spiders Starling

    murmurations

    CS Twitter CS Twitter CS Twitter

    Submitted (raw data)

    Records year -1 - 577 - 6,384 - -

    Records year 1 6,034 806 10,268 4,893 1,644 312

    Records year 2 5,023 850 - - 1,567 1194

    Records year 3 4,982 766 - - - -

    Processed data (postcleaning)

    Records year -1 - 533 - 3,320 - -

    Records year 1 5,073 642 9,905 2,898 553 31

    Records year 2 4,074 598 - - 513 104

    Records year 3 4,247 572 - - - -

    Location derivable postcleaning

    Records year -1 - 107 (64+43)a - 697 (0+697) - -

    Records year 1 All 85 (68+17)a All 909 (0+909) All 28 (28+0)a

    Records year 2 All 162 (82+21)a - - All 84 (84+0)a

    Records year 3 All 138 (89+49)a - - - -

    Flying ants, year 1 = 2012, House spiders year 1 = 2013 and Starling murmurations year 1 = 2014. a Numbers in

    parentheses denote method of establishing location (location specified in tweet content + location specified in the

    biography of tweeter coupled with information within the content of the tweet specifying that they were at home or close

    by).

    Figure 1 Data from UK-wide study of winged ant emergences derived from Twitter mentions (top row) and a citizen science study (Hart et al., 2017) (bottom row) clearly show the close similarity between methods in temporal pattern and relative number, with no significant differences in distribution for any year (see Results). The number of tweets reporting ant emergences (See Methods) per week across the UK summer is shown for years −1 to 3 (2011–2014), whereas ant emergences reported as part of the flying ant citizen science study start in year 1 (2012)

  • Using Twitter data from the same year, only 5 of 597 (0.8%) tweets contained video or photos that

    were unambiguously of winged ants and clear enough for identification to genus (Lasius in all cases).

    3.1.2 Temporal patterns

    The temporal pattern of winged ant emergences was markedly different between years (Hart et al.,

    2017) and this was replicated in the Twitter-derived data in this study (Figure 1). There was broad

    agreement between the datasets at monthly level, with 97% of sightings occurring in July and August

    in Hart et al. (2017) compared to 95% in the same period for Twitter-derived data. However, more

    striking is the remarkable agreement in the national-scale temporal patterns described in Hart et al.

    (2017) and those derived from corresponding Twitter data for each of the three study years (Figure

    1). National-scale Twitter-derived data for 2011 (the year preceding the start of the study by Hart et

    al. (2017)) are generally comparable in terms of patterns and amplitude, strongly suggesting that

    Twitter-derived data for 2012–2014 (Years 1–3 of Hart et al. (2017)) were not confounded by social

    media activity related to the original study.

    Figure 2 A subset of data taken from the Greater London region for weekly winged ant emergences derived from Twitter mentions (top row) and a UK-wide citizen science study (Hart et al., 2017) (bottom row). The number of tweets reporting ant emergences (See Methods) per week across the UK summer is shown for years −1 to 3 (2011–2014) whereas ant emergences reported as part of the flying ant citizen science study start in year 1 (2012). As with the national data (Figure 1), the close similarity between methods in temporal pattern and relative number is clearly apparent

    The relatively high density of records from Greater London allowed Hart et al. (2017) to use London

    as a subset of data. London was a consistent subset in Hart et al. (2017) and in tweets response (Hart

    et al., 2017: 16.1%, 13.8% and 13.4% of all records in 2012, 2013 and 2014, respectively versus 5.1%,

    7.9% and 8.7% of all tweets in the same 3 years) (overall: 1,944/13,394 = 14.5% vs. 131/1827 =

    7.2%). Hart et al. (2017) analysed emergences using the subset of data relating to London, and once

  • again the patterns found in that study clearly agree with the patterns obtained using Twitter-derived

    data (Figure 2).

    A two-sample Kolmogorov–Smirnov test was used to compare temporal patterns of national ant

    sightings based on Twitter-derived and published data for each of the three survey years. To ensure

    that annual data could be compared directly (the samples sizes from Twitter differed by almost an

    order of magnitude from published data), data were transformed to give, for each dataset, the

    percentage of sightings reported each week that the survey was open. There was no significant

    difference in the temporal patterns shown by Hart et al. (2017) and Twitter-derived data for any

    year for either the national-scale data (2012: Z = 0.844, n1 = 14, n2 = 14, p = 0.415; 2013: Z = 0.530, n1

    = 14, n2 = 14, p = 0.941; 2014: Z = 0.705, n1 = 14, n2 = 14, p = 0.730) or the regional London subset

    (2012: Z = 1.095, n1 = 14, n2 = 14, p = 0.181; 2013: Z = 0.1.278, n1 = 14, n2 = 14, p = 0.076; 2014: Z =

    0.1.278, n1 = 14, n2 = 14, p = 0.076).

    3.1.3 Location: latitude and longitude

    Hart et al. (2017) found a weak but significant northwards and westwards movement in winged ant

    sightings as summer progressed (i.e. latitude was positively related and longitude was negatively

    related to date). Year was factored into their model but separate annual analyses were not reported.

    For comparison purposes, we have performed annual correlations in the original dataset (Table 2)

    showing that latitude was significantly positively related to date for all 3 years and that longitude

    was significantly negatively related to date every year. An analysis of Twitter-derived data showed

    latitude was likewise significantly positively related to date for all 3 years, but that longitude was

    only significantly negatively related to date in 2014 (Table 2), the year where the effect size in the

    original data was the largest.

    3.1.4 Spatial co-occurrence

    Hart et al. (2017) demonstrated that observations of winged ants were not significantly clustered at

    national, regional or local (Meteorological Office weather station) scales. They did this by calculating

    Euclidean distances between observations on a given day and using bootstrapping to compare these

    distances to the mean Euclidean distance for the same number of samples randomly drawn from the

    relevant full dataset nationally (UK), regional (London subset) or locally (closest Meteorological

    Office weather station) for that year (Hart et al. (2017)). The distance between observation locations

    at any scale on specific days was no lower than the distance for the related comparison data points.

    It would be possible to undertake the same analyses on Twitter-derived data but there were

    insufficient data: 128 tweets per year, on average, had reliable locational data, only four more data

  • points than the number of weather stations (N = 124) used for analysis of local spatial synchrony by

    Hart et al. (2017).

    3.1.5 Environmental triggers of ant flights

    Hart et al. (2017) performed detailed analyses of the environmental triggers of ant flights by

    comparing weather (specifically wind, temperature and pressure) on flight days with non-flight days

    (3 days before and after the focal day) at the same location. As with spatial synchrony, the density of

    tweets with suitable locational information was too low for such analysis. Theoretically it would be

    possible to examine tweets for weather information but only 124 of 1827 (6.8%) tweets across the 3

    years directly mentioned weather (hot* n = 49, warm* n = 8, sun* = 45, heatwave n = 5, humid n = 5,

    heat n = 12). No tweets mentioned any antonyms of weather conditions found by Hart et al. (2017)

    to be favourable for ant flight (cold*, cool, cloudy, windy, rain*).

    Table 2 Spatial patterns in winged ant emergence in relation to latitude and longitude in the original dataset and the Twitter-derived data from the same year

    Original data Twitter-derived data

    Year F df p Dir R2 F df p Dir R2

    Latitude 2012 13.960 1,5071

  • 3.2 Spiders

    3.2.1 Temporal pattern

    A two-sample Kolmogorov–Smirnov test was used to compare the temporal pattern of house spider

    tweets with the temporal pattern of house spider records from Hart et al. (2018). No rescaling of the

    house spider data to percentages was necessary as the sample sizes of the two datasets were

    approximately equal. There was no significant difference between the temporal distribution of

    recorded sightings and sightings derived from tweets (two-sample Kolmogorov–Smirnov test: Z =

    1.248, n1 = 26, n2 = 26, p = 0.089) (Figure 3).

    3.2.2 Time of day

    Sighting times reported by Hart et al. (2018) were significantly unimodal with a pronounced peak in

    early evening (mean 19:35 GMT (19:25–19:45 95% CI); Rayleigh’s test: Z = 981.6, N = 9,807, p <

    0.001). Sighting times derived from the time that a tweet was posted and were again significantly

    unimodal with a pronounced peak in early evening (mean 21:02 GMT (20:47–21:17 95% CI);

    Rayleigh’s test: Z = 410.7, N = 2,898 p < 0.001) (Figure 4). There was a statistically significant

    difference in the circular (time) distributions between the datasets (Mardia–Watson–Wheeler test:

    W = 97.687, n1 = 9807, n2 = 2,898, p < 0.001) with the Twitter data yielding a significantly later time.

    3.2.3 Latitude and longitude

    Hart et al. (2018) found a statistically significant but weak effect of latitude and longitude on spider

    phenology with sightings moving northwards and westwards through the autumn; a similar effect

    was found here using Twitter-derived data (Spearman rank correlation for latitude: rs = 0.067, n =

    1,606, p = 0.008; longitude: rs = −0.070, n = 1,606, p = 0.005). The r2 values estimated from Pearson

    were 0.004 and 0.002 for latitude and longitude, respectively, versus 0.076 and 0.027 in Hart et al.

    (2018).

    3.2.4 Sex ratio

    The relative ease with which larger spiders can be sexed allowed Hart et al. (2018) to ask

    respondents specifically to provide sex information for recorded sightings. They found 3,795 male

    spiders (82.3%) and 818 female spiders (17.7%), giving a highly significant male-skewed sex ratio for

    spiders recorded in residential homes. Of tweets that reported sex of observed spiders, 43 reported

    males (75.4%) and 14 reported females (24.6%), again giving a significant male-bias (chi-square

    goodness-of-fit test: χ2 = 14.754, df = 1, p = 0.0001). A chi-square test for association between Hart

    et al. (2018) data and Twitter-derived data was not significant (χ2 = 1.793 df = 1, p = 0.181), and thus

    the sex ratio does not differ between the datasets.

  • Figure 3 Sightings of house spiders (likely Tegenaria and Eratigena were reported via a citizen science study (Hart et al., 2018) (top) and derived from mentions within in tweets for the same year (middle) and the previous year (bottom). Week 1 begins on August 1st. The citizen science data and Twitter data are not significantly different (see Results)

    3.2.5 Location within the house

    Hart et al. (2018) asked respondents about where in the house their spider was seen made and

    8,241/9,905 respondents (83.2%) provided this information. Useable room information derivable

    from tweets was far lower but was available for 417/2,898 tweets (14.4%). The top five rooms in

    Hart et al. (2018) were, in descending order, living room, bathroom, bedroom, hallway/stairs and

    kitchen. Using Twitter-derived data, the top five rooms, in descending order, were bedroom,

  • bathroom, living room, hallway/stairs, and kitchen. The distribution of sightings was significantly

    different (chi- square test for association: χ2 = 130, df = 8, p < 0.001), a pattern driven by the higher

    frequency of sightings in bedrooms and lower frequency of sightings in living rooms for Twitter-

    derived data.

    Hart et al. (2018) also asked respondents about the location of reported spiders within a room

    (options were wall, floor, ceiling, furniture, door/window and sink/bath). This gave 7,789 records

    from 9,905 (78.6%) that could be analysed. Useable information derivable from tweets gave 265

    records from 2,898 (9.1%). The order of locations, in declining order was floor, wall, ceiling, sink,

    door/window, furniture (Hart et al., 2018) and furniture, floor, wall, door/window, ceiling, sink (this

    study). This difference was significant (chi-square test for association: χ2 = 1720, df = 5, p < 0.001)

    and was driven by the higher proportion of furniture-related sightings associated with Twitter-

    derived data (this was the most reported location in this study and only the fifth highest reported

    location in Hart et al. (2018)).

    Figure 4 The time of day of sightings of house spiders (likely Tegenaria and Eratigena) given by participants in a citizen science study (Hart et al., 2017) (left) and derived from tweets. There was a statistically significant difference in the circular (time) distributions between the datasets (see Results)

    3.3 Starlings

    The number of tweets referencing starling murmurations was considerably lower (N = 135) than the

    number of records obtained by Goodenough et al. (2017) (N = 1,066). In contrast to tweets on the

    other taxa, geographical location was almost always given in murmuration tweets (2014/15 = 90.3%;

    2015/16 = 80.8%). Accordingly, it was possible to map murmuration sightings from both original

  • records and Twitter-derived data (Figure 5). The main spatial patterns in the large dataset reported

    in Goodenough et al. (2017) were also present in the Twitter-derived data. Specifically, key

    murmuration hotspots (including Blackpool, Aberystwyth, Brighton, the Somerset levels and East

    Anglia) were identified in both datasets as were the limited sightings in Scotland (probably reflecting

    a lack of people recording murmurations rather than an absence of the phenomenon).

    Murmuration size and duration, were important components of Goodenough et al.’s (2017) analysis

    and compulsory questions in that study. Conversely, murmuration size, was mentioned in just nine

    tweets across 2 years (6.7%), while duration was only mentioned in one tweet. Given this lack of

    data, it was not possible to test the relationship between size and duration, nor the spatiotemporal

    patterns in these parameters, using Twitter-derived data to replicate Goodenough et al.’s (2017)

    analyses.

    3.3.1 Predator presence and temperature

    To test whether predator presence or temperature influenced murmuration size or duration,

    Goodenough et al. (2017) asked respondents to record these variables during murmuration events.

    Just five tweets mentioned birds of prey (specifically sparrowhawk Accipiter nisus (N = 3), peregrine

    Falco pereginus (N = 1) and kestrel Falco tinnunculus (N = 1)), a total of 3.7% of records versus 29.6%

    in Goodenough et al. (2017). Only one tweet specifically mentioned the absence of birds of prey.

    Temperature information was never given. The lack of information on predators and temperature

    (and murmuration size/duration) meant that no analyses could be undertaken to examine

    relationships between these variables.

    4. Discussion

    It has been suggested previously that Twitter-derived data has potential for ecological research (e.g.

    Daume, 2016), but this has not been extensively tested. Here, by comparing Twitter-derived data

    with published datasets on three ecological phenomena we have, for the first time, been able to

    provide a robust analysis of the value of Twitter mining in ecology. The comparator datasets were

    gathered using citizen science studies so we are in effect comparing primary, direct citizen science

    data (collected during nationwide citizen science campaigns) with secondary, indirect citizen science

    data (collected from Twitter). Citizen science is proving to be a reliable technique for gathering data

    across large spatial scales (e.g. Cooper et al., 2014) but is not without shortcomings (data are rarely

    validated, for example) (Lukyanenko et al., 2016). However, the citizen science studies used here

    provide the only data with which we can meaningfully compare nationwide Twitter-derived data,

    and the ecological insights derived from these studies have been published previously. If we are

  • willing to accept that citizen science provides useful ecological data, then using such studies as the

    basis for a cautious comparison of methods is an acceptable approach. With this caveat in mind, we

    found that such data can be used successfully but there some important limitations.

    Figure 5 Map of starling murmurations based on Goodenough et al. (2017) (top) and Twitter-derived data (bottom); the size of the symbol denotes the number of records from that location

  • Central to the winged ants and house spider focal comparator studies were temporal aspects of the

    phenomena and we were able to replicate the main temporal findings of both studies using Twitter

    data. Indeed, the complex pattern of peaks and troughs in ant emergence and the autumnal peak in

    spider sightings within homes as reported on Twitter were so similar to the published data that

    Twitter mining would have yielded conclusions identical to those in the published dataset. This was

    also the case for winged ant emergences from the London subset, indicating that Twitter data can be

    robust at sub-national scales provided that there are sufficient tweets. It is possible that people

    could both be tweeting about a phenomenon and recording identical data in the citizen science

    studies, leading to a form of pseudoreplication. We doubt that this is a substantial effect since in

    cases where we had tweets from the year before the citizen science study, there was no pronounced

    increase in the subsequent year. In any case, anonymity (in the case of the citizen science studies)

    and identify ambiguity (Twitter names are not necessarily relatable to an individual) means we were

    unable to investigate this further.

    For both ants and spiders, it is perhaps the immediacy of Twitter, the “urgency” of the phenomena

    in question and the desire to connect to other users (Chen, 2011) that contributes to this success.

    The emergence of winged ants is popular in the media (e.g. Vulliamy, 2017) and frequently evokes

    an emotional response from tweeters (e.g. “#Flyingants have taken over London today!”). Likewise,

    the annual appearance of house spiders has garnered considerable media attention (e.g. Duell,

    2017). Many people have negative attitude to spiders in their homes and often share spider

    sightings on this basis (a typical tweet being “Just found a MASSIVE house spider in my bedroom, I’m

    too scared to sleep now”). Consequently, tweets are generally posted on the same day as the event.

    At a finer scale, the time of spider sightings using Twitter-derived data yielded a significantly later

    mean time than that reported in Hart et al. (2018). We suspect that while tweets may be posted

    soon after the actual event, they are not always posted immediately. Thus, asking people to report

    an exact time retrospectively is a more accurate measure of the timing than using tweet posting

    time, at least for this phenomenon.

    All tweets have an automatic date/time stamp and temporal data do not rely on additional user

    input. Conversely, spatial data were not, at the time of the study, automatically available. Other

    social media sites such as Flickr, which geotag uploaded images, have been used in studies of species

    distributions using either a keyword tagging or mining tag approach (El Qadi et al., 2017; Stafford et

    al., 2010). It is possible to use the World Wide Web Consortium (W3C) Geolocation API to enable

    device information from web browsers of mobile phones or laptops (Doty & Wilde, 2010) to be

    accessed but this was not available in this retrospective study. More recently, Twitter has launched

    the option of having latitude and longitude automatically added to tweets via “share precise

  • location” in version 6.26.0. This only became live on 9 December 2016 and does not apply to

    retrospective mining, such as that used here, but might open new research avenues in the future

    provided sufficient users opt in.

    Despite the above limitations, we were able to replicate most baseline spatial and spatiotemporal

    findings of the comparator studies by trawling tweet content and tweeter biography to determine

    likely location. Like Hart et al. (2017) we found latitude was significantly positively related to date of

    winged ant emergence (i.e. emergences move northwards) for all 3 years using Twitter data and

    significantly negatively related to longitude for 1 year (2014) but we were unable to replicate their

    findings for longitude for the other 2 years where original effect sizes were much smaller. This

    suggests that weak trends might be harder to quantify using Twitter, possibly because of the smaller

    sample sizes involved or the courser spatial scale inherent in determining location indirectly. We

    were able to replicate both the latitude and longitude relationships for spider sightings found by

    Hart et al. (2018), probably because of the much higher sample size. However, although baseline

    spatial findings could be replicated, more complex analyses were not possible. For example, the

    paucity of spatial data precluded replication of the spatial clustering analyses or modelling

    environmental triggers of ant flights, which were major components of the original study.

    In stark contrast to ant and spider tweets, location information was provided in ca. 90% of tweets

    relating to starling murmurations. Given the fact that murmurations often become hotspots for

    people wanting to view them, and thus location is relevant to both the tweeter and the followers, it

    is not perhaps surprising that most tweets gave locational information. We were able to replicate

    the broad spatial occurrence map of Goodenough et al. (2017) and to identify a number of the same

    key locations. However, our ability to further analyse starling murmurations spatially was hampered

    by the relatively small number of people tweeting about them. There is no a priori reason to assume

    that murmurations are less “tweet worthy” than house spiders but it is possible that many of those

    visiting murmurations are not typical tweeters. Delving in to the motivations behind tweeting (e.g.

    Toubia & Stephen, 2013) is beyond the scope of this paper and has not been carried out for

    ecological phenomena but our findings throughout strongly suggest that it would be a sensible

    approach if Twitter mining is to be used for ecology research. It might be, for example, that there is

    considerable bias in tweeting about ecological phenomena perceived negatively or that tweeters are

    more prone to exaggerate or embellish reports compared to those respondents motivated to fill in

    details in a bespoke citizen science campaign.

    The ant and spider studies were primarily spatiotemporal explorations but the starling murmuration

    study tested two causal hypotheses viz. the “safer together” hypothesis (murmurations protect

  • against predators) and the “warmer together” hypothesis (murmurations advertise roost location).

    Goodenough et al. (2017) were able to support the former because they had specifically requested

    data on murmuration size, duration, predator presence and temperature. Such information was

    rarely recorded on tweets, probably because the tweeter’s motivations for tweeting and the

    restricted character count did not encourage sharing this ecologically useful information (also found

    by Fuka et al., 2013), and it was thus not possible to replicate their findings.

    All three comparator studies had additional idiosyncratic ecological findings that we were not always

    successful in replicating. The winged ant study (Hart et al., 2017) was able to identify the species for

    a substantial subset of 1 year’s records. Tweets did not provide species identifications although

    uploaded media allowed identification in a few cases. Hart et al. (2017) also found that ant nests in

    urban areas or associated with heat-retaining structures emerged earlier but few tweets provided

    relevant information and so we were unable to replicate these findings. Neither Twitter data nor the

    comparison citizen science data were able to identify spiders to species, although comparison with

    other studies led Hart, Nesbit and Goodenough to conclude that their data were likely reflecting the

    ecology of Tengenaria and Eratigena, the spiders most commonly referred to as house spiders. The

    sex ratio of house spiders (Hart et al., 2018) was a result we were able to replicate with Twitter data

    even though the number of tweets mentioning sex was relatively low (N = 57). The clear sexual

    dimorphism in many of the larger spider species possibly contributed to tweeters including that

    information on their tweets. Hart et al. (2018) analysed the rooms in which spiders were sighted

    position in those rooms and, perhaps because of the immediate relevance of this finer-scale

    locational information (e.g. “I’ve just seen a giant spider on the floor of my bathroom!”), 14.4% of

    tweets reported room location and 9.1% reported location within room. Comparative analysis,

    however, showed a difference in the distribution between and within rooms relative to the original

    dataset. This might be due to the difference between the objective report of a sighting and a tweet.

    Twitter is not an objective reporting medium and the emotional response of seeing a spider

    somewhere close to sleeping or bathing areas might be sufficient to tip tweeters over a “tweeting

    threshold”. As discussed above, the motivation and behaviour of people on social media are likely to

    affect when and what people post and, in turn, the availability and reliability of Twitter-derived

    information.

    We conclude that Twitter mining has great potential for providing data on ecological phenomena,

    especially temporal data and, for some phenomena, spatial data. This is especially true for

    phenomena that have a specific date of occurrence (e.g. winged ant emergence) or a specific

    location (e.g. a murmuration site). However, while broad-scale spatial patterns can be identified at

    both national and sub-national levels, low sample sizes can preclude detailed analysis of spatial data

  • in relation to, for example, environmental parameters or temporal data, especially when effect sizes

    are small. Tweets were very much less successful in gathering data needed for testing specific

    hypotheses or answering specific question (e.g. abiotic cues for ant emergence, biotic cues for

    murmuration behaviour) simply because so few tweeters provided the necessary (possibly rather

    obscure) biological detail. Overall, we conclude that Twitter mining, and likely other social media

    data mining, is a form of indirect citizen science that represents a real opportunity for ecological

    research, especially for phenological studies of relatively charismatic events and species, provided

    that the pattern of Twitter usage is well-matched to the questions and phenomena of interest.

    Acknowledgements

    A.G.H. and A.E.G. conceived the idea and designed methodology; M.R. oversaw the collection the

    Twitter data; E.H.S. helped with cleaning the Twitter data, A.G.H. and A.E.G. analysed the data;

    W.S.C. produced the maps, A.G.H. and A.E.G. led wrote the manuscript. All authors contributed

    critically to the drafts and gave final approval for publication.

    Data accessibility

    Data used for this study are archived in the University of Gloucestershire Repository

    http://eprints.glos.ac.uk/id/eprint/5772.

    ORCID

    Adam G. Hart http://orcid.org/0000-0002-4795-9986

    http://eprints.glos.ac.uk/id/eprint/5772http://orcid.org/0000-0002-4795-9986

  • References Bonney, R., Cooper, C. B., Dickinson, J., Kelling, S., Phillips, T., Rosenberg, K. V., & Shirk, J. (2009).

    Citizen science: A developing tool for expanding science knowledge and scientific literacy.

    BioScience, 59, 977–984. https://doi.org/10.1525/bio.2009.59.11.9

    Catlin-Groves, C. L. (2012). The citizen science landscape: From volunteers to citizen sensors and

    beyond. International Journal of Zoology, 2012. https://doi.org/10.1155/2012/349630

    Chen, G. M. (2011). Tweet this: A uses and gratifications perspective on how active Twitter use

    gratifies a need to connect with others. Computers in Human Behavior, 27(2), 755–762.

    https://doi. org/10.1016/j.chb.2010.10.023

    Cooper, C. B., Shirk, J., & Zuckerberg, B. (2014). The invisible prevalence of citizen science in global

    research: Migratory birds and climate change. PLoS ONE, 9, e106508.

    https://doi.org/10.1371/journal. pone.0106508

    Crooks, A., Croitoru, A., Stefanidis, A., & Radzikowski, J. (2013). #Earthquake: Twitter as a distributed

    sensor system. Transactions in GIS, 17(1), 124–147. https://doi. org/10.1111/j.1467-

    9671.2012.01359.x

    Daume, S. (2016). Mining twitter to monitor invasive alien species—an analytical framework and

    sample information topologies. Ecological Informatics, 31, 70–82.

    https://doi.org/10.1016/j.ecoinf.2015.11.014

    Dennis, E. B., Morgan, B. J., Brereton, T. M., Roy, D. B., & Fox, R. (2017). Using citizen science

    butterfly counts to predict species population trends. Conservation Biology, 31, 1350–1361.

    https://doi. org/10.1111/cobi.12956

    Doty, N., & Wilde, E. (2010). Geolocation privacy and application platforms. In Proceedings of the 3rd

    ACM SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS (pp. 65–69)

    ACM.

    Duell, M. (2017). Wet weather sparks huge spider invasion in homes across UK as creatures rush

    inside for warmth and shelter. The Daily Mail, 25th August

    http://www.dailymail.co.uk/news/article-4822640/UK-weather-Bank-Holiday-weekend-

    SPIDERinvasion.html

    El Qadi, M. M., Dorin, A., Dyer, A., Burd, M., Bukovac, Z., & Shrestha, M. (2017). Mapping species

    distributions with social media geo-tagged images: Case studies of bees and flowering plants

    in Australia. Ecological informatics, 39, 23–31.

    Fellenor, J., Barnett, J., Potter, C., Urquhart, J., Mumford, J. D., & Quine, C. P. (2017). The social

    amplification of risk on Twitter: The case of ash dieback disease in the United Kingdom.

    Journal of Risk Research, 1–21. https://doi.org/10.1080/13669877.2017.1281339

    Fournier, A., Sullivan, A. R., Bump, J. K., Perkins, M., Shieldcastle, M. C., & King, S. L. (2017).

    Combining citizen science species distribution models and stable isotopes reveals migratory

    connectivity in the secretive Virginia rail. Journal of Applied Ecology, 54, 618–627. https://

    doi.org/10.1111/1365-2664.12723

    https://doi.org/10.1155/2012/349630https://doi.org/10.1016/j.ecoinf.2015.11.014http://www.dailymail.co.uk/news/article-4822640/UK-weather-Bank-Holiday-weekend-SPIDERinvasion.htmlhttp://www.dailymail.co.uk/news/article-4822640/UK-weather-Bank-Holiday-weekend-SPIDERinvasion.htmlhttps://doi.org/10.1080/13669877.2017.1281339

  • Fuka, M. Z., Osborne-Gowey, J. D., & Fuka, D. R. (2013). Shifting species ranges and changing

    phenology: A new approach to mining social media for ecosystems observations. In AGU Fall

    Meeting Abstracts.

    Goodenough, A. E., Little, N., Carpenter, W. S., & Hart, A. G. (2017). Birds of a feather flock together:

    Insights into starling murmuration behaviour revealed using citizen science. PLoS ONE, 12,

    e0179277. https://doi.org/10.1371/journal.pone.0179277

    Grover, P., Kar, A. K., Dwivedi, Y. K., & Janssen, M. (2017). The untold story of USA presidential

    elections in 2016-insights from twitter analytics. In A. K. Kar, P. Vigneswara Ilavarasan, M. P.

    Gupta, Y. K. Dwivedi , M. Mäntymäki, M. Janssen, A. Simintiras, & S. Al-Sharhan, (Eds.),

    Conference on e-Business, e-Services and e-Society (pp. 339– 350).

    Cham, Switzerland: Springer. Hart, A. G., Hesselberg, T., Nesbit, R., & Goodenough, A. E. (2017). The

    spatial distribution and environmental triggers of ant mating flights: Using citizen-science

    data to reveal national patterns. Ecography, 40, https://doi.org/10.1111/ecog.03140

    Hart, A. G., Nesbit, R., & Goodenough, A. E. (2018). Spatiotemporal variation in house spider

    phenology at a national scale using citizen science. Arachnology, 17, 331–334.

    https://doi.org/10.13156/ arac.2017.17.7.331

    Kuchner, M. J., Faherty, J. K., Schneider, A. C., Meisner, A. M., Filippazzo, J. C., Gagné, J., … Mokaev,

    K. (2017). The first brown dwarf discovered by the backyard worlds: Planet 9 citizen science

    project. The Astrophysical Journal Letters, 841(2), L19. https://doi. org/10.3847/2041-

    8213/aa7200

    Kumar, S., Morstatter, F., & Liu, H. (2014). Twitter data analytics. New York: Springer.

    https://doi.org/10.1007/978-1-4614-9372-3

    Lukyanenko, R., Parsons, J., & Wiersma, Y. F. (2016). Emerging problems of data quality in citizen

    science. Conservation Biology, 30, 447–449. https://doi.org/10.1111/cobi.12706

    McKinley, D. C., Miller-Rushing, A. J., Ballard, H. L., Bonney, R., Brown, H., Cook-Patton, S. C., … Ryan,

    S. F. (2017). Citizen science can improve conservation science, natural resource

    management, and environmental protection. Biological Conservation, 208, 15–28. https://

    doi.org/10.1016/j.biocon.2016.05.015

    Newson, S. E., Moran, N. J., Musgrove, A. J., Pearce-Higgins, J. W., Gillings, S., Atkinson, P. W., …

    Baillie, S. R. (2016). Long-term changes in the migration phenology of UK breeding birds

    detected by largescale citizen science recording schemes. Ibis, 158, 481–495. https://

    doi.org/10.1111/ibi.12367

    Parr, J., & Sewell, J. (2017). Citizen sentinels: The role of citizen scientists in reporting and monitoring

    invasive non-native species. In J. A. Cigliano, & H. L. Ballard (Eds.), Citizen Science for Coastal

    and Marine Conservation (pp. 59–76). London, UK: Routledge.

    Pescott, O. L., Walker, K. J., Pocock, M. J., Jitlal, M., Outhwaite, C. L., Cheffings, C. M., … Roy, D. B.

    (2015). Ecological monitoring with citizen science: The design and implementation of

    schemes for recording plants in Britain and Ireland. Biological Journal of the Linnean Society,

    115, 505–521. https://doi.org/10.1111/bij.12581

    https://doi.org/10.1371/journal.pone.0179277https://doi.org/10.1111/ecog.03140https://doi.org/10.1007/978-1-4614-9372-3https://doi.org/10.1111/cobi.12706https://doi.org/10.1111/bij.12581

  • Purohit, H., Hampton, A., Shalin, V. L., Sheth, A. P., Flach, J., & Bhatt, S. (2013). What kind of#

    conversation is Twitter? Mining# psycholinguistic cues for emergency coordination.

    Computers in Human Behavior, 29, 2438–2447. https://doi.org/10.1016/j.chb.2013.05.007

    Rowbotham, S., McKinnon, M., Leach, J., Lamberts, R., & Hawe, P. (2017). Does citizen science have

    the capacity to transform population health science? Critical Public Health, 1–11.

    https://doi.org/10.1080 /09581596.2017.1395393

    Stafford, R., Hart, A. G., Collins, L., Kirkhope, C. L., Williams, R. L., Rees, S. G., … Goodenough, A. E.

    (2010). Eu-social science: The role of internet social networks in the collection of bee

    biodiversity data. PLoS ONE, 5, e14381. https://doi.org/10.1371/journal.pone.0014381

    Theobald, E. J., Ettinger, A. K., Burgess, H. K., DeBey, L. B., Schmidt, N. R., Froehlich, H. E., … Parrish,

    J. K. (2015). Global change and local solutions: Tapping the unrealized potential of citizen

    science for biodiversity research. Biological Conservation, 181, 236–244. https://doi.

    org/10.1016/j.biocon.2014.10.021

    Toubia, O., & Stephen, A. T. (2013). Intrinsic vs. image-related utility in social media: Why do people

    contribute content to twitter? Marketing Science, 32, 368–392.

    https://doi.org/10.1287/mksc.2013.0773

    Tufekci, Z. (2014). Big questions for social media big data: Representativeness, validity and other

    methodological pitfalls. ICWSM, 14, 505–514.

    Vulliamy, E. (2017). Flying ant day 2017: When is it? What is it? Everything you need to know. The

    Independent July 5th https://www.independent.co.uk/news/science/flying-ant-day-2017-

    when-is-it-what-is-ituk-ants-infestation-a7824186.html

    Willis, C. G., Law, E., Williams, A. C., Franzone, B. F., Bernardos, R., Bruno, L., … Davis, C. C. (2017).

    CrowdCurio: An online crowdsourcing platform to facilitate climate change studies using

    herbarium specimens. New Phytologist, 215, 479–488. https://doi.org/10.1111/nph.14535

    Zipper, S. C. (2018). Agricultural research using social media data. Agronomy Journal, 110, 349–358.

    https://doi.org/10.2134/ agronj2017.08.0495

    https://doi.org/10.1016/j.chb.2013.05.007https://doi.org/10.1080%20/09581596.2017.1395393https://doi.org/10.1371/journal.pone.0014381https://doi.org/10.1287/mksc.2013.0773https://www.independent.co.uk/news/science/flying-ant-day-2017-when-is-it-what-is-ituk-ants-infestation-a7824186.htmlhttps://www.independent.co.uk/news/science/flying-ant-day-2017-when-is-it-what-is-ituk-ants-infestation-a7824186.htmlhttps://doi.org/10.1111/nph.14535