We are interested in a class of unsupervised methods to detect possible disease outbreaks, that is, rapid increases in the number of cases of a particular disease that deviate from the pattern observed in the past. The motivating application for this article is outbreak detection using generalized additive models (GAMs) to model weekly counts of certain infectious diseases. We can use the distance between the predicted and observed counts for a specific week to determine whether an important departure has occurred. Unfortunately, this approach may not work as desired because GAMs can be very sensitive to the presence of a small proportion of observations that deviate from the assumed model. The outbreak itself may pull the predicted values toward the atypical counts, and thus mask the outliers by making them appear less extreme. We illustrate this phenomenon with influenza-like-illness doctor-visits data from the United States for the 2006-2008 flu seasons. One way to avoid this masking problem is to derive an algorithm for fitting GAMs that can resist the effect of a small number of atypical observations. In this article we discuss such an outlier-robust fit for GAMs based on the backfitting algorithm. The basic idea is to replace the maximum-likelihood-based weights used in the generalized local scoring algorithm with those derived from robust quasi-likelihood equations (Cantoni and Ronchetti 2001b). These robust estimators for generalized linear models work well for the Poisson family of distributions, and also for binomial distributions with relatively large numbers of trials. We show that the resulting estimated mean function is resistant to the presence of outliers in the response variable and that it also remains close to the usual GAM estimator when the data do not contain atypical observations.
We illustrate the use of this approach on the detection of the recent outbreak of H1N1 flu by looking at the weekly counts of influenza-like-illness (ILI) doctor visits, as reported through the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet), and also apply our method to the numbers of requested isolates in Canada. Weeks with a sudden increase in ILI visits or requested isolates are much more clearly identified as atypical by the robust fit because the observed counts are far from those predicted by the fitted GAM.

}, keywords = {Outliers, Robust quasi-likelihood, Robustness}, issn = {0162-1459}, doi = {10.1198/jasa.2011.tm09654}, author = {Alimadad, Azadeh and Salibian-Barrera, Mat{\'\i}as} }