2.1. Nipah Virus Spillover Events To investigate the spatial and temporal patterns of Nipah virus spillover in Bangladesh, we compiled data on the number of spillover events and affected administrative districts during 2001–2018. Cases prior to 2007 were detected through community investigations following reports of clusters of encephalitis. Cases from 2007 onward reflect those identified through systematic surveillance for Nipah virus infection at three tertiary care hospitals combined with investigations of all cases detected to look for clusters, as well as any reports of possible outbreaks through media or other information sources. Independent spillover events were defined as index cases of Nipah virus infection within a given outbreak year. This definition excludes cases that resulted from secondary human-tohuman transmission following spillover.
2.2. Climate Data Expanding on the results from Cortes et al. showing associations between climate and the number of spillover events during 2007–2013, we used data from 20 weather stations in Bangladesh. Mean temperature at 3 hour intervals and daily precipitation between 1953–2015 were obtained from the Bangladesh Meteorological Department. Daily temperature and precipitation summary data from 2015 onward were obtained from the National Climatic Data Center and merged with the older data. We also downloaded monthly indices for three major climate cycles that lead to temperature and precipitation anomalies in the region: the multivariate El Niño–Southern Oscillation (ENSO) index (MEI), the Indian Ocean dipole mode index (DMI), and the subtropical Indian Ocean dipole index (SIOD). Data were retrieved from the Japan Agency for Marine-Earth Science and Technology Application Laboratory [80] and the National Oceanic and Atmospheric Administration Physical Sciences Laboratory. On the basis of the frequency of Nipah virus spillovers occurring in winter, we focused on weather summary statistics for each year that covered the period from the start of the preceding December to the end of February of a focal outbreak year. We calculated the mean and recorded the minimum temperature over all stations, the percentage of days below 17 0C, and the cumulative precipitation from all stations over the focal period. The choice of 17 0C was arbitrary but represents an upper bound for relative coolness during winter that does not produce any zeros. Mean winter MEI, DMI, and SIOD values were also calculated for each year.
2.3. Survey of Bat Roost Sites and Food Resources The spatial distribution of Pteropus medius in Bangladesh was inferred from a countrywide survey of villages as part of investigations regarding risk factors for Nipah spillover performed over the winters of 2011–2012 and 2012–2013. Briefly, trained teams of data collectors interviewed key informants within villages, who identified known bat roost sites (both occupied and unoccupied) in the village and within 5 km of the village and reported details of the duration of roost occupancy and perceived population trends. The interviewers also mapped the location and number of date palm trees (Phoenix sylvestris) and known feeding sites that bats were reported to visit within 500 m of the villages. Feeding sites included fruit trees planted in orchards or in residential areas: jujube (Ziziphus mauritiana), banana, mango, guava, lychee, star fruit, jackfruit, papaya, sapodilla (Manilkara zapota), mulberry, hog plum (Spondias mombin), Indian olive (Elaeocarpus serratus), and other species.
2.4. Spatial Covariates of Bat Roost Sites To evaluate spatial covariates that could explain the occupancy (presence/absence of bats) and abundance (estimated population size) of bats living in mapped roost sites, we extracted data from available raster surfaces describing human population density, land-use, bioclimatic variables (e.g., mean annual temperature and precipitation), elevation, slope, and forest cover. Spatial covariate raster files were downloaded from WorldPop [82,83], the Socioeconomic Data and Applications Center (SEDAC) [84], WorldClim [85], and a study on global forest-cover change [86]. We also calculated the distance from an index roost site to the nearest village, neighboring roost, date palm tree, and feeding site, and the number of villages, other mapped roosts, date palm trees, and feeding sites within a 15 km radius around each roost. Average nightly foraging distances of individual P. medius in two colonies in Bangladesh were estimated to be 10.8 km and 18.7 km; thus, 15 km was chosen to represent the distance a bat might expect to travel to reach a suitable feeding site. The number of potential covariates was initially reduced by removing variables that were colinear (Pearson’s correlation greater than 0.7). Descriptions, sources, spatial resolution, and distribution statistics for all 32 covariates are provided in Table S1 (Supplementary Materials).
2.5. Historical Land-Use Data Given the reliance of P. medius on tall trees for roosting and various native and cultivated fruit trees for food, we gathered data on historical changes in land-use, particularly forested lands, across Bangladesh from data sources covering separate but overlapping time periods. Reconstructed natural biomes and anthropogenic biomes from 1700–2000 were extracted from rasters produced by Ellis et al. [87] using the HYDE 3.1 data model [88] and available from SEDAC. We reclassified their land-use subcategories into three primary categories: dense settlements, consisting of urban and suburban areas with high human population density (>100 persons/km2 for settlements, >2500 persons/km2 for urban areas), rice villages and other croplands or rangelands, and forested areas, including populated woodlands and remote forests. Land-use data for the years 1992, 2004, 2015, and 2018 were downloaded from the Organization for Economic Cooperation and Development (OECD) land-cover database [89], derived from European Space Agency Climate Change Initiative land-cover maps. Data for 1990 and 2016 were provided by the World Bank. Land cover over the period 1930–2014 came from an analysis by Reddy et al. Lastly, forest cover from 2000 and subsequent forest loss as of 2017 were calculated from maps produced by Hansen et al. [86] using the R package gfc analysis. For the calculations from Hansen et al. data, we chose a cutoff of 40% forest-cover density to match the definition of dense forests used by Reddy et al. Across these datasets, we calculated the percentage of Bangladesh’s total land area (147,570 km2 that was classified as forest.
2.6. Statistical Analysis Separate Nipah virus spillover events were clustered geographically by the latitude and longitude of affected administrative districts and temporally by the date of illness of each index case using a bivariate normal kernel via the R package MASS. To examine the association between Nipah virus spillovers and climate variables, separate generalized linear models were produced that examined climate summary statistics and the number of spillover districts or independent spillover events assuming a Poisson distribution for each response. Model selection was performed to choose the best-fitting combination of climate covariates according to Akaike’s information criterion corrected for small sample sizes (AICc) [96] using the R package MuMIn.
The importance of spatial covariates in explaining variation in the occupancy and abundance of bats at roost sites was assessed through a combination of linear modeling and machine learning. The covariates were standardized, and data were split into two sets: an occupancy dataset of 488 mapped roost sites with a binary variable describing whether bats were currently present or not and an abundance dataset of 323 mapped roost sites with the estimated count of bats at each currently occupied roost at the time of the interview. Both datasets were split into training (80%) and testing (20%) sets for the validation of models. Generalized linear models (GLMs) were fit with all potential covariates, assuming a binomial distribution for roost site occupancy and a negative binomial distribution for roost counts, which was chosen because of the observed overdispersion of the data, with a variance–mean ratio greater than unity. Due to a large number of potential covariates, least absolute shrinkage and selection operator (LASSO) regularization was implemented to reduce the number of covariates and minimize prediction error. We also used random forests to perform covariate selection and assess explanatory power. This machine learning method constructs many decision trees using random subsets of the response variable and covariates then averages the predictions. This method of constructing and averaging a set of uncorrelated decision trees reduces overfitting relative to single decision trees. Linear modeling and random forests were performed in R using the packages caret, glmnet, and ranger.