Predicting the Mortality Rate Due to Opioid Addiction At the County Level

Daniel Lee

September 19, 2018

Annual Cause of Death by Opioid Overdose

Number of Deaths by Demographic Factors 1999 - 2016

Living with an Opioid Addiction

Reading Between the Numbers - Why We Should Care

  • Not represented in the death numbers are those who are struggling with opioid addiction
  • Addiction to opioids, like any other addiction, leads to:
  • Health Problems
  • Financial Problems
  • Relational Problems

Opioids

  • Diverse class of moderately strong to very strong painkillers
  • Oxycodone
  • Hydrocodone
  • Fentanyl

Problem

  • Can we predict how bad the problem will be in a county?
  • Useful to help allocate resources
  • Learn more about causes of opioid overdose
  • Make biggest impact

Problem

  • Predict each county's 2016 mortality rate caused by opioid overdose from the following independent variables:
  • County's median household income
  • Age and race demographic information
  • Unemployment
  • Poverty rate estimates
  • Educational attainment
  • Opioid prescription rate

USA by Opioid Overdose Mortality Rate, 2016

Top Ten Counties with Highest Mortality Rate Caused by Opioid Overdose in 2016

County with Highest Mortality Rate Due to Opioid Addiction

  • Highest county: Harrison, KY
  • 118 deaths per 100,000 people due to opioid overdose

Description of Data

Crude Opioid Mortality Rate

  • Estimated rate for deaths caused by opioid overdose in the county for 2016 (per 100,000 people)
  • Specifically, the types of drug-related deaths include the following:
  • Drug poisonings (overdose)
  • Unintentional
  • Suicide
  • Homicide
  • Undetermined

Crude Opioid Mortality Rate

  • The specific drugs included in the death rates are the following:
  • Opium
  • Heroin
  • Other opioids
  • Methadone
  • Other synthetic narcotics
  • Other unspecified narcotics
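
A crude rate of this kind is the number of deaths divided by the population, scaled to 100,000 people. The sketch below illustrates that arithmetic only. The function name and the example county figures are hypothetical and are not taken from the project's data.

```python
def crude_rate_per_100k(deaths: int, population: int) -> float:
    """Crude mortality rate: deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


# Hypothetical county: 12 opioid overdose deaths among 40,000 residents.
example_rate = crude_rate_per_100k(12, 40_000)
print(round(example_rate, 1))
```

A crude rate is not adjusted for age or other demographics, which is one reason the demographic percentages are kept as separate predictor variables.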

Distribution of County Level Opioid Overdose Mortality Rate

Distribution of County Level Opioid Overdose Mortality Rate (Mortality Rate Greater Than Zero)

Violin Plots of All Variables Except Crude Mortality Rate

Correlation Heat Map of All the Variables

Pearson's Correlation Between Crude Mortality Rate and All Other Variables
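
As a rough illustration of how such a ranking can be produced, the sketch below computes Pearson's r between a target column and each remaining column, then sorts by absolute value. The column names and all numbers are made up for the example. The original analysis may well have used a library routine instead of a hand-written function.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five counties (not the project's data).
columns = {
    "crude_mortality_rate": [0.0, 5.2, 11.8, 20.1, 30.4],
    "median_household_income": [38_000, 42_500, 47_000, 51_000, 60_500],
    "unemployment_rate": [7.9, 6.5, 6.8, 5.1, 4.9],
}
target = columns["crude_mortality_rate"]
correlations = {
    name: pearson_r(values, target)
    for name, values in columns.items()
    if name != "crude_mortality_rate"
}
# Strongest relationships first, regardless of sign.
ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: {r:+.3f}")
```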

Conclusion From Exploratory Data Analysis

  • Counties with higher opioid addiction mortality rates have
  • Higher group quarters (GQ) population estimates
  • Higher population percentage of 45 - 49 year olds
  • Higher population estimates
  • Higher median household income

Conclusion From Exploratory Data Analysis

  • Further research
  • Why do counties with a larger population living in group quarters have higher opioid overdose mortality rates?
  • Possibly consider looking into specific group quarters estimates by county
  • Why do counties with a higher percentage of 45 - 49 year olds have higher opioid overdose mortality rates?

In Depth Analysis

Procedures

  • For a given regression prediction algorithm, do the following:
  • Split data into 80% train set and 20% test set
  • Use default hyperparameter values to fit data (no cross-validation used)
  • Calculate RMSE
  • Generate Bootstrap Sampling Distribution of RMSE (10000 bootstrap samples)
  • Tune hyperparameter values using RandomizedSearchCV (1000 iterations) and 5-fold cross-validation
  • Select model with lowest RMSE
  • Generate Bootstrap Sampling Distribution of RMSE (10000 bootstrap samples)
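
The bootstrap step above can be sketched as follows. The slides do not say exactly what is resampled, so this example assumes the (actual, predicted) pairs from the 20% test set are resampled with replacement and the RMSE is recomputed each time. The function names and the small arrays are hypothetical, and only 1,000 resamples are drawn here to keep the example quick.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse(actual, predicted, n_samples=10_000, seed=0):
    """Resample (actual, predicted) pairs with replacement; return one RMSE per resample."""
    rng = random.Random(seed)
    n = len(actual)
    distribution = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        distribution.append(
            rmse([actual[i] for i in idx], [predicted[i] for i in idx])
        )
    return distribution


# Hypothetical held-out mortality rates and model predictions.
y_test = [0.0, 0.0, 12.5, 8.1, 0.0, 25.3, 4.4, 0.0]
y_pred = [1.2, 0.4, 10.0, 9.5, 2.1, 19.8, 5.0, 0.3]

point_estimate = rmse(y_test, y_pred)
rmse_distribution = bootstrap_rmse(y_test, y_pred, n_samples=1_000)
print(round(point_estimate, 3), round(min(rmse_distribution), 3),
      round(max(rmse_distribution), 3))
```

Running the same routine once for the default model and once for the tuned model gives two RMSE distributions that can be compared side by side.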

Bootstrap RMSE Distributions for All Methods

Boxplot of Bootstrap RMSE Distributions for All Methods

Observations

  • Tree-based methods tend to have lower RMSE than linear methods
  • Tuning hyperparameters lowers RMSE

Predicted vs Actual Opioid Mortality Rate

Feature Importances

Let's take a closer look at the distribution of the population sizes for the counties

Population estimates by county group (population_distribution)

         All counties   Counties with zero mortality   Counties with greater than zero mortality
count           2,962                          2,261                                         701
mean          108,844                         30,427                                     361,770
std           339,283                         37,773                                     631,186
min             1,183                          1,183                                      18,646
25%            12,949                         10,242                                      91,251
50%            28,180                         19,920                                     174,827
75%            74,528                         37,304                                     390,918
max        10,137,915                        849,843                                  10,137,915

Observations from Distribution of Population from Counties

  • Minimum population size for counties with greater than zero mortality rate is 18646
  • 1078 counties have zero mortality rate and have population less than 18646
  • 1183 counties have zero mortality rate and have population greater than or equal to 18646
  • 701 counties have greater than zero mortality rate and have population greater than or equal to 18646

Observations from Distribution of Population from Counties

  • Remove 1078 counties that have zero mortality rate and have population less than 18646
  • Redo prediction analysis with the remaining 1884 counties
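
A minimal sketch of this filtering step, assuming each county is a record with a population and a mortality rate. The field names and the five example counties are hypothetical. The cutoff is derived from the data as the smallest population among counties with a nonzero rate, which is how the 18646 figure arises in the slides.

```python
# Hypothetical county records (not the project's data).
counties = [
    {"county": "A", "population": 5_000, "mortality_rate": 0.0},
    {"county": "B", "population": 18_646, "mortality_rate": 9.7},
    {"county": "C", "population": 25_000, "mortality_rate": 0.0},
    {"county": "D", "population": 400_000, "mortality_rate": 14.2},
    {"county": "E", "population": 12_000, "mortality_rate": 0.0},
]

# Smallest population among counties that report a nonzero rate.
threshold = min(c["population"] for c in counties if c["mortality_rate"] > 0)

# Every nonzero-rate county is at or above the threshold by construction,
# so this filter only drops small counties with a zero rate.
kept = [c for c in counties if c["population"] >= threshold]

print(threshold, [c["county"] for c in kept])
```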

Bootstrap RMSE Distributions for All Methods (Population >= 18646)

Two-sample t-test p-values (two_sample_t_test_p_values)

   Model 1                                             Model 2                                             p-value
0  Stochastic Gradient Boosting (Tuned Hyperparam...   Random Forest Regression (Tuned Hyperparameters)    1.515526e-25
1  Stochastic Gradient Boosting (Tuned Hyperparam...   Gradient Boosting Regression (Tuned Hyperparam...   8.911868e-37
2  Stochastic Gradient Boosting (Tuned Hyperparam...   Stochastic Gradient Boosting (Default Hyperpar...   1.082024e-164
3  Stochastic Gradient Boosting (Tuned Hyperparam...   Stochastic Gradient Boosting (Tuned Hyperparam...   1.264246e-192
4  Stochastic Gradient Boosting (Tuned Hyperparam...   Stochastic Gradient Boosting (Default Hyperpar...   0.000000e+00
5  Stochastic Gradient Boosting (Tuned Hyperparam...   Gradient Boosting Regression (Default Hyperpar...   0.000000e+00
6  Stochastic Gradient Boosting (Tuned Hyperparam...   LASSO Regression (Tuned Hyperparameters)            0.000000e+00
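
For illustration, a two-sample comparison of two bootstrap RMSE distributions might look like the sketch below. It is an assumption that an unequal-variance (Welch) statistic is appropriate here, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples such as thousands of bootstrap draws. The slides do not show which test variant or library was used. The two samples are simulated stand-ins, not the project's results.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Welch's t statistic with a two-sided, normal-approximation p-value."""
    n_a, n_b = len(sample_a), len(sample_b)
    std_err = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    # Two-sided tail probability; tiny values can underflow to 0.0.
    p_value = 2 * NormalDist().cdf(-abs(t_stat))
    return t_stat, p_value


# Simulated stand-ins for two models' bootstrap RMSE distributions.
rng = random.Random(0)
rmse_model_a = [rng.gauss(5.0, 0.3) for _ in range(2_000)]
rmse_model_b = [rng.gauss(5.2, 0.3) for _ in range(2_000)]

t_stat, p_value = welch_t_test(rmse_model_a, rmse_model_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3e}")
```

With distributions this large, even small differences in mean RMSE give extremely small p-values, so the size of the RMSE gap is worth reading alongside the p-value.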

Boxplot of Bootstrap RMSE Distributions for All Methods (Population >= 18646)

Predicted vs Actual Opioid Mortality Rate (Population >= 18646)

Feature Importances (Population >= 18646)

Conclusion from Prediction Analysis

  • Model of choice: XGBoost's stochastic gradient boosting model with tuned hyperparameters
  • Use this model to do the following:
  • Target counties with high opioid overdose mortality rate
  • Help in investigating the leading causes of opioid addiction
  • Predict the change in opioid overdose mortality rate when some of the predictor variable values change

Further Research

  • Investigate why the following are important features in making accurate opioid overdose mortality rate predictions:
  • american_indian_or_alaska_native_%
  • asian_or_pacific_islander_%
  • 45_49_years_%

Special Thanks

  • Special thanks to Tommy Blanchard for mentoring me throughout this project
  • Special thanks to Peter Lee for helping me with the presentation slides
  • Special thanks to Abraham Choe for helping me with the blog layout