There has been heightened interest in the effects of large-scale environmental change, such as deforestation, on human health.1 For example, Gibb et al.2 have suggested that land use conversion is creating increasingly hazardous interfaces between people and wildlife reservoirs of zoonotic diseases. Similarly, Dobson et al.3 have discussed how efforts to reduce deforestation could substantially reduce the risk of zoonosis outbreaks. Aside from virus spillovers from human–wildlife contact, there is also a growing (and often contentious) literature on the relationship between deforestation and endemic diseases. This topic has even been incorporated into global health policy discussions. For example, the Pan American Health Organization lists deforestation as an important contributor to malaria and leishmaniasis.4,5
Despite the importance of this topic, it is challenging to study how environmental exposures, such as deforestation, influence disease risk. For example, environmental exposures often do not vary enough within populations to enable their effects to be identified, and confounders often affect entire communities.6 For these reasons, environmental epidemiology studies often rely on observational data collected across large spatial scales and/or over long time periods. An important and relatively recent development has been the use of causal-inference approaches (e.g., instrumental variables [IVs]) to analyze these types of data more appropriately.6 For example, a recent high-profile study evaluated how malaria is causally related to deforestation.7 In this study, the authors attempted to disentangle the effect of malaria on deforestation from the effect of deforestation on malaria by applying an IV approach to municipal-level data on malaria incidence from the Brazilian Amazon. This approach purportedly enabled the authors to make causal claims about the relationship between deforestation and malaria by controlling for unobserved determinants of the outcome and removing bidirectional feedbacks. In particular, they found that deforestation increases malaria (e.g., they estimated that a 10% increase in deforestation leads to a 3.3% increase in malaria incidence) through ecological mechanisms, whereas malaria reduces deforestation through socioeconomic mechanisms. These results have important implications for land use policy and public health interventions by highlighting win–win solutions for conservation and health.
An important characteristic of causal-inference approaches that is not necessarily evident is that they depend critically on the plausibility of the underlying assumptions and that, unlike the assumptions of standard statistical models, these assumptions are not testable in observational studies.8 For this reason, careful consideration of the assumptions that substantiate causality claims is critical.9 For the IV approach to be valid, the key assumptions are that the instrument “1) is independent of the unmeasured confounding, 2) affects the treatment, and 3) affects the outcome only indirectly through its effect on the treatment.”10 These assumptions can be clearly illustrated through causal graphs. For example, in the graph shown in Figure 1A, assumption 1 entails no arrow between the IV and the omitted variable, assumption 2 implies an arrow from the IV to the treatment, and assumption 3 implies that the only path from the IV to the outcome of interest is through the treatment.
To understand how the IV approach works without delving into the underlying equations, it is useful to consider a classic example described in Ref. 9. Researchers might be interested in understanding the causal effect of schooling on wages. The problem with just performing a regression between these variables is that there might be additional factors (e.g., motivation) that influence both wages and years of education. If these additional factors are ignored (i.e., are omitted variables), then our estimate of the causal effect of schooling on wages based on a straightforward regression model will be biased. Although this might seem an insurmountable problem, as omitted variables are always likely to exist, it is possible to use an IV approach to estimate the effect of schooling on wages. In this example, a good IV would be birth date because this variable causes variation in years of education (assumption 2; arrow from birth date to schooling in Figure 1B), is only likely to affect wages through schooling (assumption 3; the only path from birth date to wages is through schooling in Figure 1B), and is likely to be unrelated to other omitted variables (e.g., motivation) that influence wages (assumption 1; no arrow between birth date and omitted variables in Figure 1B). The reason birth date causes variation in schooling is that many states in the United States require students to enter school in August if they turn six before a particular cutoff date. For instance, in states with a December 31 birthday cutoff, children born in the fourth quarter will enter school at the age of 5 years 7 months, whereas those born in the first quarter will enter school at the age of 6 years 7 months. If compulsory schooling laws require students to remain in school until their 16th birthday, students born in different quarters of the year will have experienced different numbers of years of schooling on average.
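The logic of the schooling example can be sketched with a small simulation. This is an illustrative sketch only, not the analysis in Ref. 9: the binary instrument `early_birth`, the variable names, and all coefficients (including the hypothetical true effect of 0.10) are assumptions chosen for the demonstration. The data are generated so that all three IV assumptions hold, and the naive regression slope is compared with the single-instrument IV estimate, cov(Z, Y) / cov(Z, X).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = 0.10  # hypothetical causal effect of one year of schooling on log wages

# Omitted variable: influences both schooling and wages but is never observed.
motivation = rng.normal(size=n)

# Hypothetical binary instrument: 1 if born early in the year, 0 otherwise.
# Assumption 1: it is drawn independently of motivation.
early_birth = rng.integers(0, 2, size=n)

# Assumption 2: the instrument shifts years of schooling.
schooling = 12.0 - 0.5 * early_birth + motivation + rng.normal(size=n)

# Assumption 3: birth timing affects wages only through schooling.
log_wage = 1.0 + true_effect * schooling + 0.5 * motivation + rng.normal(size=n)


def ols_slope(x, y):
    """Slope of a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


ols_est = ols_slope(schooling, log_wage)
iv_est = iv_slope(early_birth, schooling, log_wage)

print(f"true effect: {true_effect:.3f}")
print(f"naive regression estimate: {ols_est:.3f}")
print(f"IV estimate: {iv_est:.3f}")
```

Under these assumed coefficients, the naive regression slope converges to roughly 0.34 because motivation is omitted, whereas the IV estimate should be close to the true value of 0.10, up to sampling noise.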
To quantify how deforestation impacts malaria, the authors of the high-profile article7 mentioned previously relied on aerosol pollution as an instrument for deforestation (Figure 2A). There is a strong association between deforestation and aerosol pollution because fires are often used to clear the land in the Brazilian Amazon region. However, it is deforestation that causes aerosol pollution (Figure 2B) rather than aerosol pollution that causes deforestation, as implicitly assumed by the authors. As a result, assumption (2) is clearly violated. The authors also relied on optimal temperature as an instrumental variable for malaria (Figure 2C). However, there is a substantial literature supporting the nonlinear effect of temperature on agricultural gross domestic product and overall economic production.11 As a result, it is possible that temperature affects deforestation not only through malaria but also through other causal paths (dashed arrow in Figure 2D), violating assumption (3). Using simulations and analytical derivations (available in the Supplemental Appendix), it is possible to show that, because of these assumption violations, the parameter of interest estimated using the IV approach can be substantially biased, potentially even having the opposite sign to the true parameter. What this means is that, because of the bias associated with invalid assumptions, it is possible that the correct conclusion could have been the opposite of that reported by the authors (e.g., deforestation decreases, rather than increases, malaria through ecological mechanisms).
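The two violations described above can be illustrated with a toy simulation. This is a minimal sketch under assumed linear models; it is not the simulation from the Supplemental Appendix, and every coefficient and variable name here is hypothetical. Case A makes the "instrument" a consequence of the treatment (as with aerosol pollution and deforestation), and Case B lets a valid-looking instrument also affect the outcome directly (as temperature might affect deforestation through economic activity).

```python
import numpy as np


def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimate: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(1)
n = 200_000

# Case A: the treatment causes the "instrument" (reverse arrow, assumption 2).
true_a = -0.3                        # hypothetical true effect of x on y
u = rng.normal(size=n)               # unobserved confounder
x = u + rng.normal(size=n)           # treatment, driven by the confounder
z = 0.8 * x + rng.normal(size=n)     # "instrument" is downstream of the treatment
y = true_a * x + 1.0 * u + rng.normal(size=n)
iv_a = iv_slope(z, x, y)             # collapses to the confounded regression slope

# Case B: the instrument also affects the outcome directly (assumption 3).
true_b = 0.3                         # hypothetical true effect of x on y
u = rng.normal(size=n)
z = rng.normal(size=n)               # instrument, independent of the confounder
x = 0.5 * z + u + rng.normal(size=n)
y = true_b * x - 0.4 * z + 0.5 * u + rng.normal(size=n)  # -0.4 * z is the direct path
iv_b = iv_slope(z, x, y)

print(f"Case A: true effect {true_a:+.2f}, IV estimate {iv_a:+.2f}")
print(f"Case B: true effect {true_b:+.2f}, IV estimate {iv_b:+.2f}")
```

With these assumed values, the population IV estimate in Case A equals the confounded regression slope (about +0.2 when the true effect is −0.3), and in Case B it equals the true effect plus the direct path divided by the instrument strength (0.3 − 0.4/0.5 = −0.5). In both cases the estimate has the opposite sign to the true effect, although the direction and size of the bias depend entirely on the chosen coefficients.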
Although synergisms between forest conservation and public health initiatives have been strongly emphasized in the literature,3 important trade-offs may also exist.12 The use of causal-inference approaches is likely to be critical for a better characterization of the relationship between large-scale environmental changes (e.g., deforestation) and disease risk, but conclusions based on these methods might be as unreliable as (or even more unreliable than) those from traditional methods if careful attention is not given to the plausibility of the underlying assumptions. We urge researchers in this field to embrace causal-inference approaches while at the same time being very careful with the untestable assumptions that inherently come with these approaches.
Gibb R, Redding DW, Chin KQ, Donnelly CA, Blackburn TM, Newbold T, Jones KE, 2020. Zoonotic host diversity increases in human-dominated ecosystems. Nature 584: 392–402.
PAHO, 2012. Chapter 3: the environment and human security. Health in the Americas, 2012. Washington, DC: Pan American Health Organization.
PAHO, 2017. Suriname. Health in the Americas, 2017. Summary: Regional Outlook and Country Profiles. Washington, DC: Pan American Health Organization.
Pearce N, Vandenbroucke JP, Lawlor DA, 2019. Causal inference in environmental epidemiology: old and new approaches. Epidemiology 30: 311–316.
MacDonald AJ, Mordecai EA, 2019. Amazon deforestation drives malaria transmission, and malaria burden reduces forest clearing. Proc Natl Acad Sci U S A 116: 22212–22218.
Angrist JD, Krueger AB, 2001. Instrumental variables and the search for identification: from supply and demand to natural experiments. J Econ Perspect 15: 69–85.