Forecasting Influenza in the United States

Machine learning methods for spatiotemporally granular projections

This page presents results for the project "Deep Learning for Forecasting and Nowcasting Seasonal Diseases" from the thesis "Machine Learning for Epidemiological Prediction." The goal of this project is to obtain accurate projections of Influenza incidence at the state and city levels up to 8 weeks in advance. Since "ground truth" Influenza reports from the CDC are typically available only with a delay, this project is both a forecasting and a "nowcasting" effort: real-time digital data can be combined with the historical epidemiological information that is available to estimate Influenza incidence in real time.

To skip straight to the interactive dashboard of predictions, click below.

Data Sets

Two geographic granularities of prediction are evaluated: state-level and city-level.

The state-level epidemiological dataset is collected from the CDC's official counts from October 4, 2009 to May 14, 2017; after states with missing data are removed, it contains 37 states. Since models are evaluated on the second half of each dataset, the evaluation period for the state-level dataset is August 1, 2013 to May 1, 2016.
The city-level epidemiological dataset was compiled by IMS Health using insurance claims from 61.5% of physicians in the United States, for the period January 1, 2004 to July 20, 2010. After removing cities with population over 100,000 and keeping only cities for which Google Trends data are also available, this dataset contains 180 cities. The evaluation period for the city-level dataset is September 30, 2007 to June 20, 2010.
Certain models also integrate digital data collected from Google Trends. Google Trends data are available at the national, state, and Designated Market Area (DMA) levels. The DMA regions used by Google Trends are similar to the cities or counties in the city-level epidemiological dataset. Time series downloaded from Google Trends represent the normalized weekly number of people searching for a given keyword or key phrase in a given location. Time series for 256 keywords used in previous work on digital epidemiology for influenza-like illness (ILI) prediction are extracted and included in the models.
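The evaluation protocol described above (dropping locations with missing data, then holding out the second half of each weekly series) can be sketched as follows. This is a minimal illustration, assuming the data sit in a NumPy array with one row per week and one column per location; the function names and toy values are hypothetical, and the page does not say how missing data were detected or whether training windows were updated during the evaluation period.

```python
import numpy as np

def drop_incomplete_locations(ili, names):
    """Keep only the locations (columns) with no missing weekly values."""
    keep = ~np.isnan(ili).any(axis=0)
    return ili[:, keep], [name for name, k in zip(names, keep) if k]

def split_in_half(series):
    """Chronological split: first half for training, second half for evaluation.
    Rows are weeks in time order, so nothing is shuffled."""
    mid = series.shape[0] // 2
    return series[:mid], series[mid:]

# Toy example (hypothetical values): 4 weeks x 3 locations, one gap in "TX".
ili = np.array([
    [1.0, 2.0, np.nan],
    [1.5, 2.5, 3.0],
    [2.0, 3.0, 3.5],
    [2.5, 3.5, 4.0],
])
clean, kept = drop_incomplete_locations(ili, ["MA", "NY", "TX"])
train, evaluation = split_in_half(clean)
print(kept)                       # ['MA', 'NY']
print(evaluation[:, 0].tolist())  # [2.0, 2.5]
```

Splitting by time rather than at random keeps every evaluation week strictly after the training weeks, which mirrors how the models would be used in practice.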

Models

The bulk of the analysis focuses on comparing different supervised machine learning methods for Influenza prediction. The forecasting methods evaluated are as follows:

Persistence: This standard time-series baseline propagates the most recently observed ILI incidence value forward.
Linear Autoregression - AR(GO): A linear regression mapping 52 autoregressive terms to the predicted Influenza incidence at a 1-8 week time horizon. Uses L1 regularization for feature selection, with penalty parameter chosen by 4-fold cross-validation. Nowcasting models also include synchronous Google Trends information for 256 terms.
Linear Network Autoregression - AR(GO)-net LR: A linear regression mapping 52 autoregressive terms in r regions in the dataset (with r chosen by 4-fold cross-validation) to the predicted Influenza incidence at a 1-8 week time horizon for each region. A separate model is fitted for each region in the dataset. Uses L1 regularization for feature selection, with penalty parameter chosen by 4-fold cross-validation. Nowcasting models also include synchronous Google Trends information for 256 terms from all locations in the dataset.
Nonparametric Network Autoregression - AR-net RF: Has the same inputs and outputs as the AR-net LR model above, but uses a random forest model with 50 decision trees. Maximum tree depth is chosen by 4-fold cross-validation.
Gated Recurrent Unit Neural Network - AR-net GRU: Accepts as input a 52-by-|R| matrix of incidence time series for all regions R in the dataset (where |R| is the number of regions), and outputs predictions for all regions simultaneously.
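A minimal sketch of the two simplest models for a single location: the persistence baseline and an L1-regularized linear autoregression without search data. It assumes one model is fitted per horizon (a direct multi-step strategy, which is one reading of "mapping 52 autoregressive terms to the predicted Influenza incidence at a 1-8 week time horizon") and swaps in scikit-learn's LassoCV with cv=4 for the penalty search. The helper names, the newest-first lag ordering, and the synthetic series are assumptions, not the thesis code.

```python
import numpy as np
from sklearn.linear_model import LassoCV

N_LAGS = 52  # number of autoregressive terms, as in the model list above

def persistence_forecast(y, max_horizon):
    """Persistence baseline: horizons 1..max_horizon all get the last observed value."""
    return np.full(max_horizon, y[-1])

def lagged_design(y, n_lags, horizon):
    """Row for week t: the n_lags values ending at week t, newest first.
    Target: the value `horizon` weeks later (week t + horizon)."""
    X, target = [], []
    for t in range(n_lags - 1, len(y) - horizon):
        X.append(y[t - n_lags + 1 : t + 1][::-1])
        target.append(y[t + horizon])
    return np.asarray(X), np.asarray(target)

def fit_ar_l1(y, horizon, n_lags=N_LAGS):
    """Fit one L1-regularized linear AR model for a single horizon.
    LassoCV chooses the penalty strength by 4-fold cross-validation."""
    X, target = lagged_design(y, n_lags, horizon)
    return LassoCV(cv=4).fit(X, target)

def forecast_ar_l1(model, y, n_lags=N_LAGS):
    """Predict `horizon` weeks past the end of y (horizon fixed at fit time)."""
    latest = y[-n_lags:][::-1].reshape(1, -1)
    return float(model.predict(latest)[0])

# Synthetic weekly series with a 52-week seasonal cycle (hypothetical data).
rng = np.random.default_rng(0)
weeks = np.arange(300)
y = 2 + np.sin(2 * np.pi * weeks / 52) + 0.1 * rng.standard_normal(300)

X, target = lagged_design(y, N_LAGS, horizon=4)
model = fit_ar_l1(y, horizon=4)
print(persistence_forecast(y, 8))  # eight copies of y[-1]
print(forecast_ar_l1(model, y))    # 4-week-ahead AR forecast
```

Fitting a separate model per horizon lets each horizon select its own lags and penalty. Note that LassoCV's integer `cv` produces contiguous, unshuffled folds, so the folds are not strictly forward-in-time splits.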

Nowcasting methods (denoted ARGO) use the same historical epidemiological data as above, plus 1-8 weeks of search query data obtained from Google Trends (GT). Search query data are obtained for 256 flu-related search terms (chosen in previous work) at the same geographic and temporal granularity as the epidemiological data.
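For nowcasting, the autoregressive design matrix can be extended with search data. The sketch below assumes that "synchronous" means the target week's Google Trends values are appended to the 52 most recent ILI values that would be available given a reporting delay of `horizon` weeks. The page does not specify how the 1-8 weeks of search query data are windowed, so this layout and the function name `argo_design` are hypothetical. The resulting matrix could be passed to the same L1-regularized regression described for AR(GO).

```python
import numpy as np

def argo_design(ili, gt, n_lags, horizon):
    """Hypothetical ARGO-style design matrix for one location.

    ili: (n_weeks,) weekly ILI series.
    gt:  (n_weeks, n_terms) weekly Google Trends series for the same weeks.
    For target week t, ILI is assumed known only up to week t - horizon, so the
    features are the n_lags ILI values ending at t - horizon (newest first)
    followed by the search volumes for week t itself.
    """
    X, target = [], []
    for t in range(n_lags - 1 + horizon, len(ili)):
        last_seen = t - horizon
        ar_part = ili[last_seen - n_lags + 1 : last_seen + 1][::-1]
        X.append(np.concatenate([ar_part, gt[t]]))
        target.append(ili[t])
    return np.asarray(X), np.asarray(target)

# Random placeholder data: 100 weeks, 256 search terms (hypothetical values).
rng = np.random.default_rng(1)
ili = rng.random(100)
gt = rng.random((100, 256))
X, target = argo_design(ili, gt, n_lags=52, horizon=1)
print(X.shape)  # (48, 308): 52 ILI lags + 256 search terms per row
```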

Summary of Model Evaluations

The following violin plots compare the five models listed above on the two datasets, in both the forecasting and the nowcasting scenarios. In the forecasting scenario, the deep learning model outperforms the baseline models at longer time horizons. In the nowcasting scenario, the simpler models perform better (likely due to convergence issues for the more complex models).

Explore Detailed Results

While aggregated results can help compare across methodologies and model architectures, it is also important to examine how models perform over entire time-series in specific locations. To enter a dashboard of detailed model-by-model and location-by-location results, click below.