    Figure 1. Distribution of the 177 spatially unique anthrax localities used for genetic algorithm for rule-set prediction model building. Print: Black dots indicate the data used for model building (n = 130) and gray dots indicate the independent hold-out sample used for model validation (n = 47). Online: Yellow dots indicate the data used for model building (n = 130) and green dots indicate the independent hold-out sample used for model validation (n = 47). This figure appears in color at www.ajtmh.org.

    Figure 2. Predicted distribution of Bacillus anthracis in the 48 contiguous United States based on the genetic algorithm for rule-set prediction. Color ramp from light to dark represents increasing number of models that agree from the best subset. Stars indicate areas where recent laboratory-confirmed outbreaks have occurred, but lack enough spatial resolution to be included in model building or accuracy metrics. This figure appears in color at www.ajtmh.org.

  • 1

    Smith KL, De Vos V, Price LB, Hugh-Jones ME, Keim P, 2000. Bacillus anthracis diversity in Kruger National Park. J Clin Microbiol 38 :3780–3784.

  • 2

    Gainer RS, Saunders R, 1989. Aspects of the epidemiology of anthrax in Wood Buffalo National Park and environs. Can Vet J 30 :953–956.

  • 3

    Kaufmann AF, 1990. Observations on the occurrence of anthrax as related to soil type and rainfall. Salisbury Med Bull Suppl 68 :16–17.

  • 4

    Smith KL, De Vos V, Bryden HB, Hugh-Jones ME, Klevytska A, Price LB, Keim P, Scholl DT, 1999. Meso-scale ecology of anthrax in southern Africa: a pilot study of diversity and clustering. J Appl Microbiol 87 :204–207.

  • 5

    Van Ness G, Stein CD, 1956. Soils of the United States favorable for anthrax. J Am Vet Med Assoc 128 :7–9.

  • 6

    Van Ness GB, 1959. Anthrax—a soil borne disease. Soil Conserv 21 :206–208.

  • 7

    Van Ness GB, 1959. Soil relationship in the Oklahoma-Kansas anthrax outbreak of 1957. J Soil Water Conserv 1 :70–71.

  • 8

    Van Ness GB, 1971. Ecology of anthrax. Science 172 :1303–1307.

  • 9

    Dragon DC, Rennie RP, 1995. The ecology of anthrax spores: tough but not invincible. Can Vet J 36 :295–301.

  • 10

    Stein CD, 1945. The history and distribution of anthrax in livestock in the United States. Vet Med (Praha) 40 :340–349.

  • 11

    Stein CD, van Ness GB, 1955. A ten year survey of anthrax in livestock with special reference to outbreaks in 1954. Vet Med (Praha) 50 :579–590.

  • 12

    Hugh-Jones ME, de Vos V, 2002. Anthrax and wildlife. Rev Sci Tech 21 :359–383.

  • 13

    Grinnell J, 1917. The niche-relationships of the California thrasher. Auk 34 :427–433.

  • 14

    Hutchinson GE, 1957. Concluding remarks. Cold Spring Harb Symp Quant Biol 22 :415–427.

  • 15

    Peterson AT, Bauer JT, Mills JN, 2004. Ecologic and geographic distribution of filovirus disease. Emerg Infect Dis 10 :40–47.

  • 16

    Stockwell D, Peters D, 1999. The GARP modelling system: problems and solutions to automated spatial prediction. Int J Geogr Inf Sci 13 :143–158.

  • 17

    Stockwell DRB, Peterson AT, 2002. Effects of sample size on accuracy of species distribution models. Ecol Modell 148 :1–13.

  • 18

    Anderson RP, Lew D, Peterson AT, 2003. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol Modell 162 :211–232.

  • 19

    Kluza DA, McNyset KM, 2005. Ecological niche modeling of aquatic invasion species. Aquat Invaders 16 :1–7.

  • 20

    Ron RS, 2005. Predicting the distribution of the amphibian pathogen Batrachochytrium dendrobatidis in the New World. Biotropica 37 :209–221.

  • 21

    Peterson AT, 2001. Predicting species’ geographic distributions based on ecological niche modeling. Condor 103 :599–605.

  • 22

    Peterson AT, Vieglais DA, 2001. Predicting species invasions using ecological niche modeling: new approaches from bioinformatics attack a pressing problem. Bioscience 51 :363–371.

  • 23

    Anderson RP, Gomez-Laverde M, Peterson AT, 2002. Geographical distributions of spiny pocket mice in South America: insights from predictive models. Glob Ecol 11 :131–141.

  • 24

    Raxworthy CJ, Martinez-Meyer E, Horning N, Nussbaum RA, Schneider GE, Ortega-Huerta MA, Townsend Peterson A, 2003. Predicting distributions of known and unknown reptile species in Madagascar. Nature 426 :837–841.

  • 25

    Wiley EO, McNyset KM, Peterson AT, Robins CR, Stewart AM, 2003. Niche modeling and geographic range predictions in the marine environment using a machine-learning algorithm. Oceanography 16 :120–127.

  • 26

    McNyset KM, 2005. Use of ecological niche modelling to predict distributions of freshwater fish species in Kansas. Ecol Freshwater Fish 14 :243–255.

  • 27

    Peterson AT, Sanchez-Cordero V, Beard CB, Ramsey JM, 2002. Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico. Emerg Infect Dis 8 :662–667.

  • 28

    Costa J, Peterson AT, Beard CB, 2002. Ecologic niche modeling and differentiation of populations of Triatoma brasiliensis neiva, 1911, the most important Chagas’ disease vector in northeastern Brazil (hemiptera, reduviidae, triatominae). Am J Trop Med Hyg 67 :516–520.

  • 29

    Beard CB, Pye G, Steurer FJ, Rodriquez R, Campman R, Townsend Peterson A, Ramsey J, Wirtz RA, Robinson LE, 2003. Chagas disease in a domestic transmission cycle in southern Texas, USA. Emerg Infect Dis 9 :103–105.

  • 30

    Adjemian JCZ, Girvetz EH, Beckett L, Foley JE, 2006. Analysis of genetic algorithm for rule-set production (GARP) modeling approach for predicting distributions of fleas implicated as vectors of plague, Yersinia pestis, in California. J Med Entomol 43 :93–103.

  • 31

    Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A, 2005. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25 :1965–1978.

  • 32

    Hay SI, Tatem AJ, Graham AJ, Goetz SJ, Rogers DJ, 2006. Global environmental data for mapping infectious disease distribution. Hay S, Graham AJ, Rogers DJ, eds. Global Mapping of Infectious Diseases: Methods, Examples, and Emerging Application. London: Academic Press, 38–79.

  • 33

    Peterson AT, Papes M, Kluza DA, 2003. Predicting the potential invasive distributions of four alien plant species in North America. Weed Sci 51 :863–868.

  • 34

    Peterson AT, Cohoon KP, 1999. Sensitivity of distributional prediction algorithms to geographic data completeness. Ecol Modell 117 :159–164.

  • 35

    Centor RM, 1991. Signal detectability: the use of ROC curves and their analyses. Med Decis Making 11 :102–106.

  • 36

    Zweig MH, Campbell G, 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39 :561–577.

  • 37

    Hanley JA, McNeil BJ, 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143 :29–36.

  • 38

    Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J, Zanecki SR, Pearson T, Simonson T, Uren JM, Kachur SM, Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL, Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P, 2007. Global genetic population structure of Bacillus anthracis. PLoS ONE 2 :e461.

  • 39

    ProMED-mail, Anthrax—Bovine, USA (Montana). ProMED-mail 2005; September 16, 2005: 20050916.2737. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 40

    ProMED-mail, Anthrax—Cattle—USA (Montana). PromMED-mail 1999; May 28, 1999: 19990528.0895. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 41

    ProMED-mail, Anthrax—Cervidae, Livestock—USA (Texas). ProMED-mail 1999, July 9, 1999. 20050709.1944. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 42

    Peterson AT, Sánchez-Cordero V, Martínez-Meyer E, Navarro-Sigüenza AG, 2006. Tracking population extirpations via melding ecological niche modeling with land-cover information. Ecol Modell 195 :229–236.

Modeling the Geographic Distribution of Bacillus anthracis, the Causative Agent of Anthrax Disease, for the Contiguous United States using Predictive Ecologic Niche Modeling

  • 1 Spatial Epidemiology and Ecology Research Laboratory, Department of Geography, California State University, Fullerton, California; Environmental Protection Agency, Office of Research and Development/Western Ecology Division, Corvallis, Oregon; Department of Geography, College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, California; Department of Environmental Studies, School of the Coast and Environment, Louisiana State University, Baton Rouge, Louisiana

The ecology and distribution of Bacillus anthracis are poorly understood despite continued anthrax outbreaks in wildlife and livestock throughout the United States. Little work is available to define the potential environments that may lead to prolonged spore survival and subsequent outbreaks. This study used the genetic algorithm for rule-set prediction (GARP) modeling system to model the ecological niche for B. anthracis in the contiguous United States using wildlife and livestock outbreaks and several environmental variables. The modeled niche is defined by a narrow range of normalized difference vegetation index, precipitation, and elevation, with the geographic distribution heavily concentrated in a narrow corridor from southwest Texas northward into the Dakotas and Minnesota. Because disease control programs rely on vaccination and carcass disposal, and vaccination in wildlife remains untenable, understanding the distribution of B. anthracis plays an important role in efforts to prevent or eradicate the disease. Likewise, these results potentially aid in differentiating endemic/natural outbreaks from industrial-contamination related outbreaks or bioterrorist attacks.

INTRODUCTION

Anthrax is a zoonotic disease that remains a problem in many countries worldwide, including the United States, for herbivorous livestock and wildlife species and, secondarily, humans.1 Although anthrax is a disease of antiquity, little is known about the spatial ecology or the specific geography of environmental conditions that promote long-term survival of Bacillus anthracis, the causative agent of anthrax.2–4 Much of the literature addresses the ubiquitous nature of B. anthracis as a soil-borne bacterium, and many of these studies recognize the environmental constraints on long-term survivability of B. anthracis spores in soils (e.g., soil pH, calcium levels).1,5–9 Although early literature argued that B. anthracis could replicate in soil,3,8 the current literature supports the view that B. anthracis only replicates in the animal host and can then survive long periods of dormancy in soil.1 Despite a body of literature defining the specifics of anthrax outbreaks regionally, and a body of work on the specific soil parameters required to maintain B. anthracis in the environment, few studies have evaluated the geographic distribution of these environmental characteristics. Some early work did map the distribution of anthrax outbreaks at the county level and through simple visualization suggested potential high risk areas based on soil type, but without any associated quantitative analysis.5 Although some spatio-temporal analyses have been used to relate specific soil conditions to anthrax outbreaks and spore persistence in the Kruger National Park in Africa,1 no such detailed spatial analyses exist for North America.

Bacillus anthracis is an Old World bacterial species and was most likely introduced into the United States by early European colonists8,10 through cattle trading, bone meal production, and industrial hide tanning. However, the exact timing and specific introduction pathway remain unclear.10 Anthrax was a significant problem in domestic American livestock and wildlife through the 1950s.10,11 A series of surveys showed an annual increase in the number of counties affected by anthrax between 1915 and 1944, from 157 counties in 21 states in 1915 to 405 counties in 37 states by 1944.10 Continued surveys reported an increase in the spatial distribution and incidence of the disease up until the mid-1950s.11 With the introduction of a mass-produced vaccine,11 disease management improved and the number of counties affected decreased after the 1950s. Anthrax, however, still remains a problem in both livestock and wildlife in certain parts of the United States.12 Although vaccination is inexpensive and readily available, it is often used in reaction to an outbreak rather than as a disease preventative. This reactive approach perpetuates the likelihood of future outbreaks by not preventing the continued introduction of viable spores into disease-promoting environments. Further complicating disease eradication, the current vaccine is administered through injection and is therefore only useful for livestock or farmed wild species that can be safely handled. As a result, in areas such as western Texas where the disease remains a problem in white-tailed deer (Odocoileus virginianus),12 vaccination is ineffective for disease control until an oral vaccine can be developed.

Because anthrax remains a problem in both livestock and wildlife, a clear understanding of the spatial ecology and corresponding geographic distribution of B. anthracis is essential. Given that the disease remains enzootic, surveillance efforts must be targeted on areas of greatest risk of infection, and cover all ecologic components of the disease (hosts, reservoirs, potential vectors, outbreak-triggering climatic events). However, disease surveillance is expensive and requires multi-agency networks and a multi-disciplinary approach. These networks must be clearly identified and the goal of the surveillance made explicit. This study aims to identify the geographic potential of B. anthracis in the United States, improve the current understanding of the disease and its ecology, and target areas of risk that require further research or surveillance.

Ecologic niche modeling (ENM) is one tool for evaluating the potential distribution of species. Its approaches define the niche of the target species and predict its potential geographic and ecologic distribution through the analysis of relationships between combinations of environmental variables (e.g., temperature, precipitation, and elevation derived from digital maps or satellite data) and species’ locality data. The ecologic niche follows the Hutchinsonian definition as the hyper-volume of ecologic parameters that allow a species to maintain populations without immigration.13–15

We evaluated the specific geography of B. anthracis in the continental United States using ENM to clearly delineate the areas most suitable for spore survival in the 48 contiguous states. Defining the potential spatial extent of B. anthracis can be useful for generating new research hypotheses about disease persistence and for targeting surveillance efforts to areas at greatest risk of potential disease presence, with the ultimate goal of providing sound methods for improving disease control. At the same time, a great deal of biologic information can be extracted from the modeling process, providing insights into the ecologic requirements, biogeography, and evolution of an important zoonotic disease agent.

METHODS

Ecologic niche modeling.

This study used the Genetic Algorithm for Rule-Set Prediction (GARP) to develop an ENM for anthrax in the United States. The DesktopGARP version 1.1.3 (DG) application was used to develop all GARP models (available from http://www.lifemapper.org/desktopgarp). GARP is a presence-only modeling technique that determines non-random associations between point localities (anthrax outbreak locations) and environmental parameters (environmental coverages),16,17 such as satellite-derived data and interpolated field measurements. It generates presence/absence predictions on the basis of a set of heterogeneous rules (rule-set) derived from a series of rule types in an iterative process. GARP uses four specific IF/THEN rule types in model development: 1) atomic rules, where predicted locations are defined by a specific value of each environmental variable (e.g., IF temperature = [22°C] AND precipitation = [380 mm] THEN species = present/absent); 2) range rules, where predicted locations are defined by a range of values for each variable (e.g., IF temperature = [18–22°C] AND precipitation = [350–380 mm]); 3) negated range rules, where predicted locations are defined as values outside of a defined range (e.g., IF NOT temperature = [18–22°C] AND precipitation = [350–380 mm]); and 4) logit rules, where predicted locations are fit to a logistic regression model with the environmental variables.16 GARP can be considered a super-set of individual modeling approaches, because range rules are similar to bioclimatic rules and other approaches use only logistic regression, and it should therefore have higher predictive accuracy than any single modeling approach.16
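The four rule types can be illustrated with a minimal Python sketch. This is not DesktopGARP code. The class names, the `predict` helper, the first-match ordering, the reading of a negated range rule as "not inside the joint range," and the 0.5 logit cutoff are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Env = Dict[str, float]  # environmental values for one pixel, keyed by variable name


@dataclass
class AtomicRule:
    """Applies when every listed variable equals a single stated value."""
    values: Dict[str, float]
    presence: bool  # what the rule predicts where it applies

    def applies(self, env: Env) -> bool:
        return all(env[name] == value for name, value in self.values.items())


@dataclass
class RangeRule:
    """Applies when every listed variable lies inside its (low, high) range."""
    ranges: Dict[str, Tuple[float, float]]
    presence: bool

    def applies(self, env: Env) -> bool:
        return all(lo <= env[name] <= hi for name, (lo, hi) in self.ranges.items())


@dataclass
class NegatedRangeRule:
    """Applies when the pixel is NOT inside the joint range (assumed reading)."""
    ranges: Dict[str, Tuple[float, float]]
    presence: bool

    def applies(self, env: Env) -> bool:
        return not all(lo <= env[name] <= hi for name, (lo, hi) in self.ranges.items())


@dataclass
class LogitRule:
    """Applies when a logistic model of the variables reaches a cutoff."""
    intercept: float
    coefficients: Dict[str, float]
    presence: bool
    cutoff: float = 0.5  # assumed value, not taken from the study

    def applies(self, env: Env) -> bool:
        z = self.intercept + sum(c * env[n] for n, c in self.coefficients.items())
        return 1.0 / (1.0 + math.exp(-z)) >= self.cutoff


def predict(rule_set, env: Env, default: bool = False) -> bool:
    """Return the prediction of the first rule that applies (assumed ordering)."""
    for rule in rule_set:
        if rule.applies(env):
            return rule.presence
    return default


# The example values mirror the ranges quoted in the text.
pixel = {"temperature": 20.0, "precipitation": 360.0}
range_rule = RangeRule({"temperature": (18, 22), "precipitation": (350, 380)}, True)
negated = NegatedRangeRule({"temperature": (18, 22), "precipitation": (350, 380)}, False)
```

With these definitions, `predict([negated, range_rule], pixel)` skips the negated rule, because the pixel lies inside the range, and returns the presence prediction of the range rule.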

GARP modeling is stochastic in nature, due in part to both the genetic algorithm for building models and the random partitioning of input locality data. Because of this characteristic, GARP can generate multiple solutions across multiple model runs. To evaluate this potential inter-model variation, it is critical to develop multiple models. Optimal models are those that compromise between omission (exclusion of known locations from the model) and commission (inclusion of areas with no known cases).18 DesktopGARP uses a best subset procedure to optimize model outputs by selecting models with user-defined omission and commission thresholds. GARP outputs are rasterized coverages of the study area representing presence and absence pixels that can be manipulated in a geographic information system (GIS). These individual models can be summated to identify geographic areas where none, some, or all of the models predict presence or absence.19 The greater the number of models that agree, the more certainty there is in the prediction classification.20 Likewise, similarity across models indicates strong associations between the input data set and the environmental coverages. A number of studies have confirmed the usefulness, success, and applicability of GARP to a wide range of species across terrestrial21–24 and aquatic taxa.19,25,26 Additionally, a number of studies have applied the GARP modeling approach to disease systems.15,20,27–30
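Summating model outputs amounts to a cell-by-cell sum of binary grids. The sketch below assumes each model is a small list of rows holding 0 (absent) or 1 (present). The `summate` name and the toy grids are hypothetical, and a real workflow would use the raster tools of a GIS.

```python
def summate(models):
    """Cell-by-cell sum of equally sized 0/1 grids (each grid is a list of rows)."""
    n_rows, n_cols = len(models[0]), len(models[0][0])
    for grid in models:
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError("all model grids must share one shape")
    return [
        [sum(grid[r][c] for grid in models) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three toy presence/absence grids standing in for best-subset models.
model_a = [[1, 0], [1, 1]]
model_b = [[1, 0], [0, 1]]
model_c = [[1, 1], [0, 1]]
agreement = summate([model_a, model_b, model_c])
```

Each cell of `agreement` counts the models that predict presence there, so a value equal to the number of models marks full agreement and 0 marks agreement on absence.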

Input data.

A GIS database of specific anthrax outbreak localities within the 48 contiguous states was developed from a variety of data sources to represent the distribution of B. anthracis for modeling. These data were assimilated during 2000–2005, with the exception of a 1957 outbreak report that could be mapped at the point level for Oklahoma and a 1968 outbreak for east central California. Point level maps were derived from coordinate pairs (latitude/longitude) collected by field personnel, address-matched using a GIS to farm front gates from diagnostic laboratory records, or heads-up digitized (capturing data by clicking the computer mouse on-screen at appropriate spatial locations) over high resolution satellite imagery using field reports and paper maps as guides to case locations. Point data representing both confirmed wildlife and livestock outbreaks were available from six states representing three regions of anthrax outbreaks for the contiguous 48 states: 1) the Dakotas Region (North Dakota, South Dakota, Minnesota), 2) the Southern Region (Oklahoma and Texas), and 3) the Western Region (Nevada, California; Figure 1). Table 1 summarizes the sample sizes and methods of data collection for each of the states used in this analysis.

A set of environmental coverages was constructed from publicly available satellite-derived climatic and biophysical parameters. Nineteen variables were downloaded from the WorldClim data set representative of various temperature and precipitation measurements (www.worldclim.org).31 Thirteen additional environmental variables, including temperature and vegetation measures (mean normalized difference vegetation index [NDVI]), were provided by the Trypanosomiasis and Land Use in Africa (TALA) research group at Oxford University (Oxford, United Kingdom).32 Two continuous soil parameters from the state soil geographic (STATSGO) data set (soil moisture, soil pH; www.ncgc.nrcs.usda.gov/products/datasets/statsgo) were used to incorporate measures of known ecologic factors that impact or promote spore survival. Both soil variables were rasterized for inclusion in the ENM. All environmental coverages were re-sampled to 0.10 degree² (~8 × 8 km) and clipped to the boundary of the 48 contiguous states. All data sets were prepared using ERDAS Imagine version 8.7 (Leica GeoSystems, St. Gallen, Switzerland), ArcGIS 9.0, and ArcView 3.2a (Environmental Systems Research Institute, Redlands, CA).

All coverages were subjected to a culling procedure prior to inclusion in modeling, first eliminating variables that represented similar parameters from the two data sources. Combinations of coverages were then evaluated using a jackknife procedure in DG (N-1 variables are used to build models iteratively until all N-1 combinations of variables have been used).33 The jackknife procedure is useful for eliminating variables that lead to overfitting.34 To evaluate jackknife results, a correlation matrix was derived from a set of models using the N-1 procedure and the measure of omission error for a 20-model set.26 Environmental variables were excluded from the final variable set if they increased omission between model outputs. A combination of jackknife evaluations and systematic model development and omission evaluation led to the selection of the environmental coverage set. A final coverage set of six environmental variables was used that captured temperature, precipitation, elevation, soil moisture and pH, and mean NDVI (Table 2).
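The N-1 jackknife can be sketched as a loop that leaves out one coverage at a time. The variable names, the `toy_omission` scorer, and the ranking helper below are hypothetical stand-ins. In the study, each subset would be scored by building a model set in DG and measuring omission.

```python
def jackknife_subsets(variables):
    """Yield (left_out, remaining) for every N-1 combination of the variables."""
    for left_out in variables:
        yield left_out, [v for v in variables if v != left_out]


def rank_by_omission(variables, mean_omission):
    """Score each N-1 subset with a caller-supplied omission function.

    Returns (omission, left_out) pairs sorted from lowest to highest omission.
    A low value suggests that the left-out variable was raising omission.
    """
    return sorted(
        (mean_omission(subset), left_out)
        for left_out, subset in jackknife_subsets(variables)
    )


# Hypothetical candidate coverages; "noisy_layer" is an invented example.
candidates = ["temperature", "precipitation", "elevation", "ndvi", "noisy_layer"]


def toy_omission(subset):
    """Invented scorer: omission is higher whenever the noisy layer is used."""
    return 0.30 if "noisy_layer" in subset else 0.10


ranking = rank_by_omission(candidates, toy_omission)
```

In this toy case the first entry of `ranking` names `noisy_layer`, the variable whose removal lowers omission, which is the kind of variable the culling step would exclude.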

Model building and evaluation.

For this study, 177 spatially unique anthrax outbreak locations were available for model building (Table 1). Prior to initiating DG, a randomly selected, independent hold-out sample of ~25% (n = 47) of the original data was withheld for later calculation of accuracy metrics. The remaining ~75% of the data (n = 130) were used for model building. A training/testing partition (50% and 50%, respectively) internal to DG was used for model building. To maximize DG performance, model runs were set to a maximum of 1,000 models and the best subset procedure was used to select the 20 best models under a 10% hard omission threshold and a 50% commission threshold for a final 10-model best subset. The final 10 models were summated within the GIS to visualize the geographic areas of presence/absence predicted across the best subsets.
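The data partition and best-subset selection can be sketched as follows. The function names and the `runs` records are hypothetical. The selection logic is an assumption about how the thresholds are applied: keep the first 20 runs whose omission is at or below 10%, then keep the half of those closest to the median commission.

```python
import random
import statistics


def split_holdout(points, n_holdout, seed=0):
    """Randomly withhold n_holdout points; return (model_building, holdout)."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def best_subset(runs, max_omission=0.10, n_low_omission=20, commission_fraction=0.5):
    """Pick a best subset from model runs (assumed reading of the thresholds).

    runs: list of dicts with "omission" and "commission" as proportions,
    in the order the models were produced.
    """
    low_omission = [r for r in runs if r["omission"] <= max_omission][:n_low_omission]
    if not low_omission:
        return []
    median_commission = statistics.median(r["commission"] for r in low_omission)
    ranked = sorted(low_omission, key=lambda r: abs(r["commission"] - median_commission))
    return ranked[: int(len(low_omission) * commission_fraction)]


# 177 localities split into 130 model-building and 47 hold-out points.
localities = list(range(177))
building, holdout = split_holdout(localities, n_holdout=47, seed=1)
```

With 20 low-omission models and a commission fraction of 0.5, `best_subset` returns 10 models, matching the size of the final subset described above.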

An area under the curve (AUC) in a receiver operating characteristic (ROC) analysis was used to evaluate the predictive performance of the 10-model best subset using measures of specificity (absence of commission error) and sensitivity (absence of omission error) following other GARP studies.25,26 The ROC analysis is a threshold-independent assessment of model quality derived from a plot of sensitivity (true positive rate; y-axis) versus 1 - specificity (false positive rate; x-axis) constructed from the best subset to determine if models are predicting better than random.35,36 Likewise, AUCs, as used here, are based on all pixels of presence and all pixels of absence. The AUC of a given model set is compared with the AUC of a random prediction using a z-test. Successful models have AUC scores approaching 1.0 (a perfect model or a measure of reality); the higher the AUC, the better the model is predicting presence/absence. Models predicting no better than random will have an AUC approaching 0.5.37 The ROC was derived from the 25% independent test data points withheld from the original GARP model building data sets.26
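A minimal sketch of the AUC calculation follows, using the rank-based definition: the probability that a randomly chosen presence point scores higher than a randomly chosen absence pixel, with ties counted as one half. Scores here stand for the number of best-subset models predicting presence. The standard error follows the Hanley and McNeil approximation, and using it for the z-test against 0.5 is an assumption, since the exact test used in the study is not described.

```python
import math


def auc(presence_scores, absence_scores):
    """Rank-based AUC: P(presence score > absence score), ties count one half."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def hanley_mcneil_se(area, n_presence, n_absence):
    """Standard error of an AUC under the Hanley and McNeil approximation."""
    q1 = area / (2.0 - area)
    q2 = 2.0 * area * area / (1.0 + area)
    variance = (
        area * (1.0 - area)
        + (n_presence - 1) * (q1 - area * area)
        + (n_absence - 1) * (q2 - area * area)
    ) / (n_presence * n_absence)
    return math.sqrt(variance)


def z_against_random(area, n_presence, n_absence):
    """z statistic comparing an AUC with the 0.5 expected from a random model."""
    return (area - 0.5) / hanley_mcneil_se(area, n_presence, n_absence)


# Toy scores (models agreeing, 0 to 10) at test points and at absence pixels.
example_auc = auc([9, 10, 7], [0, 3, 8])
```

In the toy example, eight of the nine presence/absence pairs are ranked correctly, so `example_auc` is 8/9.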

Two measures of omission were calculated from the 10-best model subset and the independent test data.26 First, total omission was calculated as the total number of independent test points predicted absent by the summated grid of all 10 best models. Second, an average omission was calculated as the average omission across each of the 10 best models. Omission indices are useful for evaluating the success of GARP at predicting known localities not included in model building. Two commission indices were also developed. First, total commission was calculated as the total number of pixels predicted present across all ten models divided by the total number of pixels in the study area. Second, an average commission was calculated as the average of the total number of cells predicted present divided by the total number of pixels within the study area on a model-by-model basis for each of the 10 models in the best subset. Little difference between these two measures indicates little variation in the rule-sets across the models; whereas a large difference indicates high variation across the models.
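These four indices can be sketched with plain lists. The function names and toy inputs are hypothetical. The sketch also assumes one reading of the definitions above: total omission counts test points that no model predicts present, and total commission counts pixels that every model predicts present. Both are expressed as proportions.

```python
def omission_metrics(test_predictions):
    """Return (total_omission, average_omission) as proportions.

    test_predictions[m][i] is 1 if model m predicts test point i present, else 0.
    """
    n_models = len(test_predictions)
    n_points = len(test_predictions[0])
    # Total omission: points predicted absent by the summated grid, that is,
    # points that none of the models predicts present.
    missed_by_all = sum(
        1 for i in range(n_points) if not any(model[i] for model in test_predictions)
    )
    average = sum(model.count(0) / n_points for model in test_predictions) / n_models
    return missed_by_all / n_points, average


def commission_metrics(pixel_predictions):
    """Return (total_commission, average_commission) as proportions.

    pixel_predictions[m][j] is 1 if model m predicts pixel j present, else 0.
    """
    n_models = len(pixel_predictions)
    n_pixels = len(pixel_predictions[0])
    # Total commission (assumed reading): pixels predicted present by every model.
    present_in_all = sum(
        1 for j in range(n_pixels) if all(model[j] for model in pixel_predictions)
    )
    average = sum(sum(model) / n_pixels for model in pixel_predictions) / n_models
    return present_in_all / n_pixels, average


# Two toy models evaluated at four test points and over five pixels.
total_om, avg_om = omission_metrics([[1, 1, 0, 0], [1, 0, 1, 0]])
total_com, avg_com = commission_metrics([[1, 1, 0, 0, 0], [1, 0, 1, 0, 0]])
```

In this toy case each model predicts two of five pixels present but they agree on only one, so average commission (0.4) exceeds total commission (0.2), the kind of gap the text associates with variation across rule-sets.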

RESULTS

The modeling process reached convergence of accuracy (0.01) prior to the maximum iteration setting of 1,000 models. The AUC score from the ROC analysis was 0.7916 and was significantly different from a line of no information (P < 0.01). Average omission was 23.2% and total omission was 6.8%. The geographic prediction for B. anthracis in the United States is shown in Figure 2 as the summation (or overlap) of the 10-model best subset. Table 3 summarizes the accuracy metrics and AUC score for this analysis.

The predicted distribution of B. anthracis is primarily restricted to a narrow corridor from southwest Texas northward through western Oklahoma, central Kansas, central Nebraska, and into the Dakotas and Minnesota (Figure 2). This north/south corridor expands eastward in South Dakota into western Minnesota. From North Dakota, the distribution is predicted westward through Montana into Idaho, especially along the Snake River drainage basin merging with the Columbia River drainage into central Oregon and Washington. There are also small areas predicted in eastern Michigan and northwestern Ohio along the shorelines of Lake Huron and Lake Erie, respectively. Bacillus anthracis is also predicted to occur from western Texas across southern New Mexico into Arizona. The southwestern half of Arizona was identified as suitable, as were western California and a narrow strip of the state east of the Central Valley. The distribution is patchy in Nevada, but is predicted along the central California/Nevada border and appears to be a continuation south from the Snake River drainage.

A review of the individual rule-sets (logic strings of atomic, range, negated range, and logit regression rules) developed during the modeling process indicates that narrow envelopes of mean NDVI, precipitation, and elevation define the ecological niche for B. anthracis in this analysis. Rule-sets from this modeling process were dominated by range rules (rules defining presence or absence based on the occurrence of points within a specific range of values from a given variable). Table 4 summarizes the primary presence-predicting rules from a single model experiment to demonstrate the biologic information contained within the GARP model output.

DISCUSSION

This study presents the first estimate of the geographic potential for B. anthracis in the contiguous United States on the basis of the modeled ecological niche. According to the models, there is an important north–south corridor from west Texas up to the Canadian border and two east–west corridors traversing Arizona and Montana into California and Idaho/Washington, respectively. Although the specifics of disease introduction into the United States are unclear,10 it is hypothesized that B. anthracis was imported into Mexico by the Spanish and by colonial activities eastwards into what is now Texas and Louisiana10 and then disseminated across the central and western states through the movement of livestock via cattle trails and railways. The historical cattle trails and railways for the United States have been mapped (Hugh-Jones ME, unpublished data), and this mapping has confirmed that routes of cattle movement most likely traversed the regions of high model agreement from Texas up through the Dakotas and west through Arizona and Montana into the west and Pacific Northwest, which suggests that animals were moved through areas that could support long-term spore survival. Additional genetic evidence from a global analysis of the B. anthracis genome suggests a close relationship between a North American sub-lineage and the dominant European sub-group, which supports a European introduction of the species.38

Previous studies have also indicated that industry-related outbreaks, those associated with the processing of imported, contaminated hides, hair, and wool and the resulting downstream livestock outbreaks that led to production of contaminated bone meal, were concentrated in the New England area and across states east of the Mississippi River.8 However, none of these eastern states have reported cases associated with naturally occurring spores in the past several decades. This suggests that the closure of these industries led to a decrease in industrial outbreaks. Additionally, this suggests that once these sources of spores were removed, the environments of these areas were unable to sustain long-term spore survival.8 This is supported by the ecological niche models because little area is predicted as suitable north of the Ohio or east of the Mississippi River drainages. Although this may in part be caused by a lack of locality data from the eastern states, the overall geographic extent is predicted to occur from central Texas north and west. The limited areas of predicted distribution in Michigan and Ohio are consistent with soil maps produced in the 1950s to delineate potential outbreak areas based solely on soil conditions,5 but the GARP models are more restrictive in the geographic extent of area predicted in both states. Likewise, the same soils-based map suggests that Arizona and New Mexico may contain adequate soil pH to maintain spores (Figure 1),5 but lack adequate soil moisture. The GARP models predict parts of both states with high model agreement. These differences may be caused by the additional environmental signals captured with improved environmental data, improved techniques in deriving soil measurements, and the more sophisticated analyses provided here. Although anthrax has not been reported in Arizona, there have been historical cases in New Mexico.11

The accuracy metrics (Table 3) show that these models successfully predicted 76.8% of the independent test points (average omission = 23.2%). Apart from successfully predicting the validation data, the models predict well in regions from which no explicit validation data were included. For example, the northwestern corner of Montana, which had anthrax cases on at least two ranches in 200539 (specific localities not available for modeling), was predicted present by the models. Likewise, cases reported near Billings, Montana in 199940 also occurred within the areas of greatest model agreement. Ozona and Sonora, Texas also reported severe outbreaks (estimates > 1,000 deer) in the summer of 2005,41 and both of these areas are within the areas of highest model agreement (Figure 2). As for those points in the validation data set that the models seemingly failed to correctly predict, closer examination shows that they were randomly selected from the 1957 eastern Oklahoma outbreak. This outbreak occurred in a region of eastern Oklahoma that has reported little to no anthrax since 1957. One study described the event in detail and suggested that the environmental conditions for the region of the state where cases occurred should not normally sustain B. anthracis spores.6 The study suggested that anthropogenic activities around the time of the outbreak, such as limestone mining efforts and road building, led to a temporary surface soil environment that could support spores. Although it is difficult to confirm that the 1957 outbreak was caused by these superficial environmental conditions, or potentially undocumented industrial feed contamination common at the time, we had no expectation that the models would predict this area. This was confirmed by a lack of prediction in the eastern part of the state across the best model subset.
When those independent validation points from Oklahoma were removed from the calculation of average omission, only 16.7% of the independent validation points were omitted. Likewise, total omission improved from 6.8% to 0% because all independent points are predicted by at least one of the 10 best subset models. This indicates that the GARP modeling process can be robust to the inclusion of small amounts of outlier data. This is important because it is not always apparent a priori if a given outbreak is derived from naturally occurring populations or was artificial, as may be the case in Oklahoma. The models were re-run excluding the Oklahoma localities included in the initial GARP calculation. An evaluation of the best subset showed that the overall geographic extent of the predicted distribution of anthrax did not change.
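The two omission measures discussed above can be illustrated with a minimal sketch. It assumes that average omission is the mean, across the best subset models, of the fraction of independent test points each model fails to predict, and that total omission is the fraction of test points predicted by none of the models (the latter follows from the statement that total omission falls to 0% when every point is predicted by at least one model). The function name and the toy matrix are hypothetical and are not taken from the study.

```python
def omission_metrics(pred):
    """Return (total_omission, average_omission) as fractions.

    pred is a list of rows, one row per model in the best subset.
    Each row holds one 0/1 value per independent test point,
    where 1 means the model predicts presence at that point.
    """
    n_models = len(pred)
    n_points = len(pred[0])

    # Per-model omission: fraction of test points the model predicts absent.
    per_model = [sum(1 for v in row if not v) / n_points for row in pred]
    average = sum(per_model) / n_models

    # Total omission: fraction of test points predicted absent by every model.
    missed_by_all = sum(
        1 for j in range(n_points) if not any(row[j] for row in pred)
    )
    total = missed_by_all / n_points
    return total, average


# Hypothetical toy example: 3 models, 4 test points.
# The third point (index 2) is missed by every model, like an outlier locality.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
total, average = omission_metrics(toy)

# Dropping the outlier column lowers both measures.
toy_without_outlier = [row[:2] + row[3:] for row in toy]
total_wo, average_wo = omission_metrics(toy_without_outlier)
```

In this toy case, removing the single point that no model predicts brings total omission to zero and reduces average omission, which mirrors the direction of the change reported when the Oklahoma points were excluded.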

The accuracy of the predicted distribution of B. anthracis and the occurrence of recent outbreaks within that distribution indicate that B. anthracis has established a natural ecology in North America, despite the introduction of vaccination efforts to control the disease (and therefore potentially restrict the survival of B. anthracis). Although there are vaccination efforts in livestock and some farmed wildlife, the disease is still prevalent over a large geographic area in both groups. In general, anthrax vaccination has become a reactionary practice in the United States, used in response to outbreaks once they have begun. Integrating modeling approaches such as these can improve our understanding of where vaccination should be a priority before, during, and after outbreak events. Additionally, the location of outbreaks in free-ranging wildlife within the predicted distribution suggests a mechanism for maintaining the disease in unvaccinated wildlife populations, even in areas where livestock herds may be vaccinated. In this way, ENM is useful for identifying areas where wildlife populations should be monitored for the disease more regularly and where post-outbreak control measures such as carcass disposal should be used (e.g., western Texas, the eastern Dakotas, and northwestern Minnesota).

Although our results are presented at the continental scale, with a relatively large pixel size (8 × 8 km), GARP is scalable and not limited to broad-scale data. Local-scale niche parameters could be investigated, perhaps on a state-by-state basis, using higher-resolution data (with the exception of the STATSGO soils data, all of the parameters discussed in this paper are available at ~1.2 × 1.2 km). Additionally, more specific filtering techniques may be useful for subdividing the predicted distribution into areas that may or may not currently support B. anthracis transmission cycles or animal/spore interactions. For example, future work is planned to assess the usefulness of land cover/land use data for determining areas within the predicted distribution that do not support livestock or wildlife populations.42 However, even at lower resolutions, these models provide a geographically explicit understanding of the natural ecology of B. anthracis, and should help shape future studies and control measures for anthrax. Future modeling efforts should also incorporate data from genetic studies38 to develop individual model experiments for each strain in the United States. Such an analysis could be useful to determine whether clade- or strain-specific ecologic affinities can be found to improve the niche definition and enhance predictive power. One other local-level study from Africa has provided evidence for strain-specific differences in physiologic tolerances for certain soil conditions and associated differences in those strains’ geographic distributions.1 Ecological niche modeling, when coupled with genetic methods, provides a tool for determining continental-scale differences in ecologic affinities and corresponding geographies for diverse genetic lineages of B. anthracis.

These results also provide a preliminary tool that could be useful for discriminating between endemic/naturally occurring outbreaks, contaminated feed or industrial-related outbreaks, and potential bioterrorist attacks. Although continued efforts to refine and validate these models are certainly required, outbreaks occurring outside of predicted geographic regions (areas unsuitable for spore survival) may indicate the intentional release of spores or may direct epidemiologists to investigate sources of contamination, such as feed. When modeling and genetic analysis are combined, there is potential for identifying areas of risk for subsequent natural outbreaks and improving trace-back analyses for epidemiologists or forensic scientists in the case of intentional release.
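The screening idea described above can be sketched as a two-step calculation: sum the binary outputs of the best subset models into a model-agreement surface, then flag any reported outbreak that falls in a cell predicted by fewer than some minimum number of models. This is only an illustration under stated assumptions. The grids are tiny nested lists rather than georeferenced rasters, the outbreak names and cell indices are made up, and the `min_models` threshold is a hypothetical parameter, not a value proposed in this paper.

```python
def agreement_grid(models):
    """Sum binary presence grids cell by cell.

    models is a list of grids; each grid is a list of rows of 0/1 values.
    The result has the same shape and counts how many models predict presence.
    """
    n_rows = len(models[0])
    n_cols = len(models[0][0])
    return [
        [sum(m[r][c] for m in models) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def flag_unexpected(outbreaks, agreement, min_models=1):
    """Return names of outbreaks located in cells with low model agreement.

    outbreaks maps a name to a (row, col) cell index (hypothetical layout).
    """
    return [
        name
        for name, (r, c) in outbreaks.items()
        if agreement[r][c] < min_models
    ]


# Hypothetical toy data: two models on a 2 x 2 grid.
models = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
agreement = agreement_grid(models)

outbreaks = {"outbreak_A": (0, 0), "outbreak_B": (1, 1)}
flagged = flag_unexpected(outbreaks, agreement, min_models=1)
```

A flagged outbreak in such a scheme would be a prompt for further investigation (for example, of feed sources), not evidence of intentional release on its own.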

Table 1

Specific locality data used to develop genetic algorithm for rule-set prediction (GARP) models*

Outbreak location     Outbreak date      Resolution          No.   GARP no.†  Geocoding technique
Dakotas region
    Minnesota‡        2000–2005          Farm location       60    23         GPS coordinates
    North Dakota§     2005               Farm location       88    54         Address matching
    South Dakota¶     2005               Farm location       49    40         Address matching
Southern region
    Oklahoma#         1957               Pasture locations   21    10         Heads-up digitizing
    Texas**           2001–2005          Carcass locations   122   28         GPS coordinates
Western region
    Nevada††          2002               Carcass locations   32    2          GPS coordinates
    California‡‡      1968, 1984, 1991   Farm locations      42    20         Heads-up digitizing

* GPS = global positioning system.
† Sample size of spatially unique anthrax localities in each state.
‡ Minnesota Board of Animal Health.
§ North Dakota State University Veterinary Diagnostic Laboratory.
¶ South Dakota State University Agriculture Extension Office and Geographic Information System Center for Excellence.
# Oklahoma Department of Agriculture.
** Centers for Disease Control and Prevention and Louisiana State University–World Health Organization Collaborating Center field investigations.
†† U.S. Department of Agriculture Animal and Plant Inspection Service.
‡‡ (1968) Centers for Disease Control and Prevention Report; (1984) California Department of Food and Agriculture report; (1991) Dr. Frank Paterson, investigating veterinarian.

Table 2

Environmental coverages used to develop the genetic algorithm for rule-set prediction model*

Environmental variable                                Reference or source
Mean annual temperature (°C)                          31
Annual precipitation (mm)                             31
Elevation (meters above mean sea level)               32
Soil moisture (lowest liquid limit as % of weight)    STATSGO U.S. soil database
Soil pH (lowest reaction, no units)                   STATSGO U.S. soil database
Mean NDVI (no units)                                  32

* STATSGO = state soil geographic; NDVI = normalized difference vegetation index.

Table 3

Model sample sizes and accuracy metrics for genetic algorithm for rule-set prediction model development and validation*

Metric                              Model specifications
N to build models                   130†
N to test models (independent)      47
Total omission                      6.8%
Average omission                    23.2%
Total commission                    66.5%
Average commission                  41.6%
AUC                                 0.7916‡§

* AUC = area under curve.
† N was divided into 50% training/50% testing at each model iteration.
‡ z = 10.503 (P < 0.01).
§ SE = 0.0394.

Table 4

Primary presence rules (logic strings) from the output of a single model from best model subset*

* NDVI = normalized difference vegetation index.
Task 9
    1 range rule
        If precipitation = (219.69, 1,078.77) and elevation = (221.29, 682.60) and soil pH 1 = (0.00, 7.90) and NDVI = (0.24, 0.38)
        Then sp = presence
    4 logit rule
        If temperature × 0.0039 - precipitation × 0.0156 - elevation × 0.0078 - soil pH 1 × 0.0000 + NDVI × 0.0117
        Then sp = presence
    6 range rule
        If precipitation = (117.11, 2,784.11) and elevation = (778.71, 3,777.24) and soil moisture = (0.00, 0.90) and NDVI = (0.01, 0.65)
        Then sp = absence
    12 negated range rule
        If not temperature = (−2.90, 23.20) and elevation = (144.40, 2,277.97) and NDVI = (−0.99, 0.70)
        Then sp = absence
    13 negated range rule
        If not temperature = (0.32, 22.20) and precipitation = (53.00, 2,271.23) and elevation = (−9.37, 2,969.94) and soil moisture = (0.00, 0.90) and NDVI = (0.12, 0.55)
        Then sp = absence
    14 range rule
        If temperature = (1.21, 21.42) and precipitation = (1,091.59, 2,335.34) and elevation = (−566.79, 3,815.68) and soil moisture = (0.00, 0.90) and soil pH 1 = (0.00, 9.04) and NDVI = (−0.92, 0.69)
        Then sp = absence
    16 range rule
        If temperature = (2.32, 22.09) and elevation = (221.29, 1,432.23) and soil pH 1 = (0.00, 7.90) and NDVI = (0.27, 0.46)
        Then sp = presence
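
To make the logic strings in Table 4 easier to read, the following minimal sketch shows how a single range rule could be checked against the environmental values of one map cell. It encodes the two presence range rules from the table (rules 1 and 16) and assumes inclusive bounds and a plain dictionary of variable values. The variable names, the `matches_range_rule` helper, and the example cell are hypothetical. The sketch does not reproduce how DesktopGARP orders, selects, or applies its full rule set, and it leaves out the logit and negated range rules.

```python
# Presence range rules 1 and 16 from Table 4, as (lower, upper) bounds.
# Variable names are illustrative; units follow Table 2.
RULE_1 = {
    "precipitation": (219.69, 1078.77),
    "elevation": (221.29, 682.60),
    "soil_ph": (0.00, 7.90),
    "ndvi": (0.24, 0.38),
}
RULE_16 = {
    "temperature": (2.32, 22.09),
    "elevation": (221.29, 1432.23),
    "soil_ph": (0.00, 7.90),
    "ndvi": (0.27, 0.46),
}


def matches_range_rule(cell, rule):
    """Return True if every variable named in the rule lies within its bounds.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    Variables not named in the rule are ignored.
    """
    return all(low <= cell[var] <= high for var, (low, high) in rule.items())


# Hypothetical cell values, not taken from the study data.
cell = {
    "temperature": 15.0,
    "precipitation": 500.0,
    "elevation": 400.0,
    "soil_ph": 7.0,
    "ndvi": 0.30,
}
in_rule_1 = matches_range_rule(cell, RULE_1)
in_rule_16 = matches_range_rule(cell, RULE_16)

# A greener cell falls outside rule 1's NDVI range but still satisfies rule 16.
greener_cell = dict(cell, ndvi=0.42)
greener_in_rule_1 = matches_range_rule(greener_cell, RULE_1)
greener_in_rule_16 = matches_range_rule(greener_cell, RULE_16)
```

The second example shows why overlapping rules matter: a cell can fail one presence rule and still be covered by another rule in the same model.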
Figure 1.

Distribution of the 177 spatially unique anthrax localities used for genetic algorithm for rule-set prediction model building. Print: Black dots indicate the data used for model building (n = 130) and gray dots indicate the independent hold-out sample used for model validation (n = 47). Online: Yellow dots indicate the data used for model building (n = 130) and green dots indicate the independent hold-out sample used for model validation (n = 47). This figure appears in color at www.ajtmh.org.

Citation: The American Journal of Tropical Medicine and Hygiene Am J Trop Med Hyg 77, 6; 10.4269/ajtmh.2007.77.1103

Figure 2.

Predicted distribution of Bacillus anthracis in the 48 contiguous United States based on the genetic algorithm for rule-set prediction. Color ramp from light to dark represents increasing number of models that agree from the best subset. Stars indicate areas where recent laboratory-confirmed outbreaks have occurred, but lack enough spatial resolution to be included in model building or accuracy metrics. This figure appears in color at www.ajtmh.org.


* Address correspondence to Jason K. Blackburn, Spatial Epidemiology and Ecology Research Laboratory, Department of Geography, California State University, Fullerton, 800 North St. College Drive, Fullerton, CA 92834. E-mail: jablackburn@fullerton.edu

Authors’ addresses: Jason K. Blackburn, Spatial Epidemiology and Ecology Research Laboratory, Department of Geography, California State University, Fullerton, 800 North St. College Drive, Fullerton, CA 92834. Kristina M. McNyset, Environmental Protection Agency, Office of Research and Development/Western Ecology Division, Corvallis, OR 97333. Andrew Curtis, Department of Geography, College of Letters, Arts, and Sciences, University of Southern California, Kaprielian Hall, Room 416, Los Angeles, CA 90089. Martin E. Hugh-Jones, Department of Environmental Studies, School of the Coast and Environment, Louisiana State University, Baton Rouge, LA 70803.

Acknowledgments: We thank R. Scachetti-Pereira for developing the DesktopGARP implementation of the GARP algorithm; N. Dyer, R. Daly, M. Wimberly, and L. Glaser for providing outbreak data; D. Wiklund for providing data from the Dakotas and Minnesota; and D. Rogers and S. Hay for developing and providing the TALA environmental data set. We also thank the editor of the American Journal of Tropical Medicine and Hygiene and two anonymous reviewers for strengthening this manuscript. The Texas data set was a combination of data from Louisiana State University field investigations from 2000–2005 and the Centers for Disease Control and Prevention outbreak response in 2001.

Financial support: Texas field data was supported by the RV Ranch and staff. Other support was provided by Louisiana State University.

REFERENCES

  • 1

    Smith KL, De Vos V, Price LB, Hugh-Jones ME, Keim P, 2000. Bacillus anthracis diversity in Kruger National Park. J Clin Microbiol 38 :3780–3784.

  • 2

    Gainer RS, Saunders R, 1989. Aspects of the epidemiology of anthrax in Wood Buffalo National Park and environs. Can Vet J 30 :953–956.

  • 3

    Kaufmann AF, 1990. Observations on the occurrence of anthrax as related to soil type and rainfall. Salisbury Med Bull Suppl 68 :16–17.

  • 4

    Smith KL, De Vos V, Bryden HB, Hugh-Jones ME, Klevytska A, Price LB, Keim P, Scholl DT, 1999. Meso-scale ecology of anthrax in southern Africa: a pilot study of diversity and clustering. J Appl Microbiol 87 :204–207.

  • 5

    Van Ness G, Stein CD, 1956. Soils of the United States favorable for anthrax. J Am Vet Med Assoc 128 :7–9.

  • 6

    Van Ness GB, 1959. Anthrax—a soil borne disease. Soil Conserv 21 :206–208.

  • 7

    Van Ness GB, 1959. Soil relationship in the Oklahoma-Kansas anthrax outbreak of 1957. J Soil Water Conserv 1 :70–71.

  • 8

    Van Ness GB, 1971. Ecology of anthrax. Science 172 :1303–1307.

  • 9

    Dragon DC, Rennie RP, 1995. The ecology of anthrax spores: tough but not invincible. Can Vet J 36 :295–301.

  • 10

    Stein CD, 1945. The history and distribution of anthrax in livestock in the United States. Vet Med (Praha) 40 :340–349.

  • 11

    Stein CD, van Ness GB, 1955. A ten year survey of anthrax in livestock with special reference to outbreaks in 1954. Vet Med (Praha) 50 :579–590.

  • 12

    Hugh-Jones ME, de Vos V, 2002. Anthrax and wildlife. Rev Sci Tech 21 :359–383.

  • 13

    Grinnell J, 1917. The niche-relationships of the California thrasher. Auk 34 :427–433.

  • 14

    Hutchinson GE, 1957. Concluding remarks. Cold Spring Harb Symp Quant Biol 22 :415–427.

  • 15

    Peterson AT, Bauer JT, Mills JN, 2004. Ecologic and geographic distribution of filovirus disease. Emerg Infect Dis 10 :40–47.

  • 16

    Stockwell D, Peters D, 1999. The GARP modelling system: problems and solutions to automated spatial prediction. Int J Geogr Inf Sci 13 :143–158.

  • 17

    Stockwell DRB, Peterson AT, 2002. Effects of sample size on accuracy of species distribution models. Ecol Modell 148 :1–13.

  • 18

    Anderson RP, Lew D, Peterson AT, 2003. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol Modell 162 :211–232.

  • 19

    Kluza DA, McNyset KM, 2005. Ecological niche modeling of aquatic invasion species. Aquat Invaders 16 :1–7.

  • 20

    Ron RS, 2005. Predicting the distribution of the amphibian pathogen Batrachochytrium dendrobatidis in the New World. Biotropica 37 :209–221.

  • 21

    Peterson AT, 2001. Predicting species’ geographic distributions based on ecological niche modeling. Condor 103 :599–605.

  • 22

    Peterson AT, Vieglais DA, 2001. Predicting species invasions using ecological niche modeling: new approaches from bioinformatics attack a pressing problem. Bioscience 51 :363–371.

  • 23

    Anderson RP, Gomez-Laverde M, Peterson AT, 2002. Geographical distributions of spiny pocket mice in South America: insights from predictive models. Glob Ecol 11 :131–141.

  • 24

    Raxworthy CJ, Martinez-Meyer E, Horning N, Nussbaum RA, Schneider GE, Ortega-Huerta MA, Townsend Peterson A, 2003. Predicting distributions of known and unknown reptile species in Madagascar. Nature 426 :837–841.

  • 25

    Wiley EO, McNyset KM, Peterson AT, Robins CR, Stewart AM, 2003. Niche modeling and geographic range predictions in the marine environment using a machine-learning algorithm. Oceanography 16 :120–127.

  • 26

    McNyset KM, 2005. Use of ecological niche modelling to predict distributions of freshwater fish species in Kansas. Ecol Freshwater Fish 14 :243–255.

  • 27

    Peterson AT, Sanchez-Cordero V, Beard CB, Ramsey JM, 2002. Ecologic niche modeling and potential reservoirs for Chagas disease, Mexico. Emerg Infect Dis 8 :662–667.

  • 28

    Costa J, Peterson AT, Beard CB, 2002. Ecologic niche modeling and differentiation of populations of Triatoma brasiliensis neiva, 1911, the most important Chagas’ disease vector in northeastern Brazil (hemiptera, reduviidae, triatominae). Am J Trop Med Hyg 67 :516–520.

  • 29

    Beard CB, Pye G, Steurer FJ, Rodriquez R, Campman R, Townsend Peterson A, Ramsey J, Wirtz RA, Robinson LE, 2003. Chagas disease in a domestic transmission cycle in southern Texas, USA. Emerg Infect Dis 9 :103–105.

  • 30

    Adjemian JCZ, Girvetz EH, Beckett L, Foley JE, 2006. Analysis of genetic algorithm for rule-set production (GARP) modeling approach for predicting distributions of fleas implicated as vectors of plague, Yersinia pestis, in California. J Med Entomol 43 :93–103.

  • 31

    Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A, 2005. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25 :1965–1978.

  • 32

    Hay SI, Tatem AJ, Graham AJ, Goetz SJ, Rogers DJ, 2006. Global environmental data for mapping infectious disease distribution. Hay S, Graham AJ, Rogers DJ, eds. Global Mapping of Infectious Diseases: Methods, Examples, and Emerging Application. London: Academic Press, 38–79.

  • 33

    Peterson AT, Papes M, Kluza DA, 2003. Predicting the potential invasive distributions of four alien plant species in North America. Weed Sci 51 :863–868.

  • 34

    Peterson AT, Cohoon KP, 1999. Sensitivity of distributional prediction algorithms to geographic data completeness. Ecol Modell 117 :159–164.

  • 35

    Centor RM, 1991. Signal detectability: the use of ROC curves and their analyses. Med Decis Making 11 :102–106.

  • 36

    Zweig MH, Campbell G, 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39 :561–577.

  • 37

    Hanley JA, McNeil BJ, 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143 :29–36.

  • 38

    Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, Ravel J, Zanecki SR, Pearson T, Simonson T, Uren JM, Kachur SM, Leadem-Dougherty RR, Rhoton SD, Zinser G, Farlow J, Coker PR, Smith KL, Wang B, Kenefic LJ, Fraser-Liggett CM, Wagner DM, Keim P, 2007. Global genetic population structure of Bacillus anthracis. PLoS ONE 2 :e461.

  • 39

    ProMED-mail, Anthrax—Bovine, USA (Montana). ProMED-mail 2005; September 16, 2005: 20050916.2737. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 40

    ProMED-mail, Anthrax—Cattle—USA (Montana). ProMED-mail 1999; May 28, 1999: 19990528.0895. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 41

    ProMED-mail, Anthrax—Cervidae, Livestock—USA (Texas). ProMED-mail 2005; July 9, 2005: 20050709.1944. Accessed August 23, 2007. Available from http://www.promedmail.org.

  • 42

    Peterson AT, Sánchez-Cordero V, Martínez-Meyer E, Navarro-Sigüenza AG, 2006. Tracking population extirpations via melding ecological niche modeling with land-cover information. Ecol Modell 195 :229–236.
