Air Quality and COVID-19 Mortality in California

MEDS
Examining the relationship between PM 2.5, COVID-19 mortality rates, and poverty rates across 52 counties in California
Author
Affiliation
Published

December 5, 2022

Introduction

Air pollution is now considered the largest environmental health threat. Of all of the common pollutants, PM2.5 has an especially severe impact on human health due to it’s ability to affect both the respiratory and vascular systems(). PM2.5 is mainly produced by the combustion of fuels and wood. Exposure to PM2.5 can exacerbate heart and lung diseases, which are both known risk factors for negative outcomes from COVID-19().

A 2020 study on county-level COVID-19 mortality across the United States found a statistically significant relationship between air pollution exposure and COVID-19 death rates. This study provided strong evidence for area-level determinants of negative health outcomes. While this cannot inform individual outcomes, this research can be used to inform policy decisions in the United States.()

As wildfires become more frequent in the west, Californians are exposed to much higher levels of pollutants, specifically PM2.5. This places Californians at a now greater risk for negative health outcomes from respiratory illness. In this project, I will study the effects of PM2.5 exposure on COVID-19 mortality across 52 counties in California in 2020. This will help us understand what areas are at the greatest risk for negative health outcomes, therefore informing resource allocations and preparation plans for future pandemics.

Data

COVID-19 Data

I accessed COVID-19 data from the California Open Data Portal, specifically the COVID-19 Time-Series Metrics by County and State. This source provides daily county-level counts on cases, deaths, and tests from February 1, 2020 to present. The data comes from the California COVID-19 Dashboard, which provides the most comprehensive data on reported COVID-19 cases in California. That being said, many COVID-19 cases went unreported (asymptomatic cases, cases without a positive test) and thus would not be included in this data set.

Air Quality Data

I accessed air quality data from the US EPA Outdoor Air Quality Data portal. This source provides air quality data from Air Quality System (AQS) and AirNow (when AQS data is not available). AQS tracks data from thousands of monitors across the United States. It provides Daily Mean PM2.5 Concentrations and Daily AQI Values from each monitor. Within California, there are 151 unique air monitoring stations across 52 counties in California- not every county has an air quality monitoring station.

Poverty Data

I accessed poverty data from the United States Census Bureau on Poverty Status in the Past 12 Months in 2020. This data is the most comprehensive data available on poverty rates at the county level in California. The data set provides 5-year estimates on poverty status by age, race, education, and employment.

Analysis

I will start by fitting a linear regression model for COVID-19 mortality with a single predictor indicating the county’s 6-year mean PM2.5 concentrations to examine the relationship between PM2.5 and COVID-19 mortality on a county-level scale.

Deathsi=β0+β1PM2.5i+εi

For the values for COVID-19 mortality, I’m choosing to study cumulative deaths in 2020 as a percent of cumulative cases in 2020 in each county: (Cumulative Deaths / Cumulative Cases) * 100%. This allows us to compare COVID-19 mortality rates in counties with varying populations. Since this data is from 2020, I assume that members of the population being studied are not vaccinated against COVID-19. I will remove two outliers: Inyo County and Alpine County. Inyo County had extremely high COVID mortality (~3%), and Alpine County had extremely low COVID mortality (0%).

For the PM2.5 values, I will be using the average of the daily PM2.5 values between February 2020 and December 2020. The county PM2.5 values are the averages of all stations in that county.

This linear regression model with a single predictor does not account for the numerous other influences on COVID-19 mortality and air quality, so we cannot draw strong conclusions from this basic model.

One obvious confounding variable in this analysis is poverty. To account for this, I will also perform a multiple linear regression with interactions.

Deathsi=β0+β1Povertyi+β2PM2.5i+β3PovertyiPM2.5i+εi

Since poverty is likely to impact both COVID-19 mortality and one’s exposure to poor air quality, I will include poverty in this model. The poverty values are county-scale poverty rates (%).

Results

Models

Simple Linear Regression Model Deathsi=β0+β1PM2.5i+εi

Interaction Model Deathsi=β0+β1Povertyi+β2PM2.5i+β3PovertyiPM2.5i+εi

Plots

Figure 1: Plot of COVID-19 deaths as a percent of cases (cumulative) and long term average daily PM 2.5 values from 2015-2020 in California. Inyo and Alpine County (outliers) were removed. Points are labeled by county. The coral line represents the linear regression model described above. We can see a positive correlation between average daily PM 2.5 and cumulative deaths as a percent of cumulative cases, but there are high residuals.

Figure 2: Plot of COVID-19 deaths as a percent of cases (cumulative) and long term average daily PM 2.5 values from 2015-2020 in California. Color indicates the poverty level in each county- red points are counties with high poverty levels, blue points are counties with low poverty levels, and purple points are counties with intermediate poverty levels. Inyo and Alpine County (outliers) were removed. The blue line represents the simple linear regression model described in the section above.

Residuals

Simple Linear Regression Model

Table: Residuals (From Simple Linear Regression Model)
Min Q1 Median Q3 Max
-0.91711 -0.20024 -0.02783 0.20909 1.07737

The median value suggests that the model generally predicts slightly lower values for COVID-19 deaths as a percent of cases than observed (-0.02783%). The relatively large maximum and minimum values indicate that some observed values are more than a full percent off from the model (1.07737%).

Interaction Model

Table: Residuals (From Interaction Model)
Min Q1 Median Q3 Max
-0.88871 -0.24042 -0.02951 0.15008 1.13647

The median value suggests that the model generally predicts slightly lower values for COVID-19 deaths as a percent of cases than observed (-0.02951%). The relatively large maximum and minimum values indicate that some observed values are a full percent off from the model (1.13647%).

Coefficients

Simple Linear Regression lm() Output

The intercept 0.87 states that, according to this model, if PM2.5 was 0, we would expect cumulative deaths in 2020 to be 0.87% of cumulative cases in 2020 on average.

According to this model, for every 1 microgram / m3 increase in PM2.5, we would expect cumulative deaths as a percent of cumulative cases in 2020 to increase by 0.03%. This value is relatively low, indicating that changes in PM2.5 have a small effect on COVID-19 mortality on the county scale.

Interaction Model lm() Output

The intercept 0.87 states that, according to this model, if the PM2.5 concentration and the poverty rate were both 0, we would expect cumulative deaths in 2020 to be 0.89% of cumulative cases in 2020 on average.

According to this model, for every 1 microgram / m3 increase in PM2.5, when the poverty rate is 0%, we would expect cumulative deaths as a percent of cumulative cases in 2020 to neither increase nor decrease. This value indicates that changes in PM2.5 have little effect on COVID-19 mortality on the county scale.

Additionally, for every 1% increase in the poverty rate when PM2.5 is 0 micrograms / m3, we expect cumulative deaths as a percent of cumulative cases in 2020 to increase by 0.01%. This indicates that increases in poverty rates slightly increase COVID-19 mortality rates on the county scale.

The effect of PM2.5 on COVID-19 mortality increases by 0.001% for every percent increase in poverty rates (not shown in table due to rounding). This indicates that on this scale, PM2.5 and poverty rates have minimal interaction.

The p-value

The p-value is an indication of the likelihood that the coefficient estimates from our model differ from 0 purely by chance. A low p-value would indicate that there is a low chance that we would see our estimated coefficient value if the true value of that coefficient was 0. If the p-value is below 0.05, we can reject the null hypothesis: air quality (and poverty for our second model) have no effect on COVID-19 mortality.

Simple Linear Regression Model

The p-value for our simple linear regression model was 0.2189. Since this is greater than 0.05, we fail to reject our null hypothesis. Using this model, there is not sufficient evidence to conclude that there is a relationship between particulate matter and COVID-19 mortality.

Interaction Model

The p-value for our interaction model was 0.333, Since this is greater than 0.05, we fail to reject our null hypothesis. Using this model, there is not sufficient evidence to conclude that there is a relationship between particulate matter, poverty, and COVID-19 mortality.

R-Squared

Simple Linear Regression Model

The Multiple R-Squared value for this model was 0.03132. This indicates that this model does not fit the data very well.

Interaction Model

The Adjusted R-Squared value for this model was 0.01006. This indicates that this model does not fit the data very well.

Confidence Interval

When the sampling distribution of a point estimate can be modeled as normal, the point estimate we observe will be within 1.96 standard errors of the true value of interest about 95% of the time.

Simple Linear Regression Model - 95% confidence intervals for all point estimates

Intercept Average PM 2.5 (B1)
0.4, 1.33 -0.02, 0.08

We can be 95% confident that the interval between 0.4% and 1.33% captures the true value of the intercept— the value of COVID deaths as a percent of cases when the concentration of PM 2.5 is 0 micrograms / cubic meter.

We can also be 95% confident that the interval between -0.02% and 0.08% captures the true value of B1 — the percent change in COVID deaths as a percent of cases that a 1 unit increase in PM 2.5 will cause.

Interaction Model - 95% confidence intervals for all point estimates

Intercept Average PM2.5 (B1) Poverty Level (B2) Average PM : Poverty Level (B3)
-0.8, 2.58 -0.18, 0.18 -0.11, 0.12 -0.01, 0.01

Again, for the interaction model, we can be 95% confident that the intervals in the table above contain the true values for each of our point estimates.

B1, B2, and B3 indicate that our model is quite unsure of the effects that each variable will have on COVID mortality.

Further Research

There was a very similar study done by researchers at the Harvard T.H. Chan School of Public Health() which examined the relationship between PM 2.5 and COVID-19 mortality rates across counties in the United States. They found that an increase of PM 2.5 by 1 microgram per cubic meter was associated with an 11% increase in COVID-19 mortality in that county. This research involved accounting for many different confounding variables that could affect COVID-19 mortality rates.

To more rigorously analyze PM 2.5’s relationship with COVID-19 mortality in California, I would account for additional variables such as other county level health metrics, number of hospital beds, and other environmental factors that could influence air quality. I would also consider studying city- or ZIP code- level data to achieve a higher spatial resolution.

While area-level data can not be used to determine outcomes for individual patients, a more rigorous analysis would be able to inform policy decisions in local governments and resource allocations.

Affiliated GitHub Repository: https://github.com/jilliannallison/final_project

References

“Air Pollution and Health.” n.d. https://unece.org/air-pollution-and-health.
Kinney, Patrick. 2008. “Climate Change, Air Quality, and Human Health.” American Journal of Preventative Medicine 35 (5): 459–67. https://doi.org/https://doi.org/10.1016/j.amepre.2008.08.025.
Wu, X. 2020a. “Air Pollution and COVID-19 Mortality in the United States: Strengths and Limitations of an Ecological Regression Analysis.” Science Advances 6 (45). https://doi.org/https://doi.org/10.1126/sciadv.abd4049.
———. 2020b. “Air Pollution and COVID-19 Mortality in the United States: Strengths and Limitations of an Ecological Regression Analysis.” Science Advances 6 (45). https://doi.org/https://doi.org/10.1126/sciadv.abd4049.

Citation

BibTeX citation:
@online{allison2022,
  author = {Jillian Allison},
  title = {Air {Quality} and {COVID-19} {Mortality} in {California}},
  date = {2022-12-05},
  url = {https://jilliannallison.github.io/posts/2022-12-05-covid-air-quality},
  langid = {en}
}
For attribution, please cite this work as:
Jillian Allison. 2022. “Air Quality and COVID-19 Mortality in California.” December 5, 2022. https://jilliannallison.github.io/posts/2022-12-05-covid-air-quality.