Annals of Epidemiology
Volume 19, Issue 12, Pages 908-914, December 2009

Multiple Imputation for Missing Laboratory Data: An Example from Infectious Disease Epidemiology

  • Zuber D. Mulla, PhD

      Affiliations

    • Department of Obstetrics and Gynecology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso
    • Department of Epidemiology and Biostatistics, University of South Florida College of Public Health, Tampa
    • Corresponding author. Address correspondence to: Zuber D. Mulla, Department of Obstetrics and Gynecology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, 4800 Alberta Ave., El Paso, TX 79905. Tel: (915) 545-6710. Fax: (915) 545-6946.
  • Byungtae Seo, PhD

      Affiliations

    • Department of Mathematics and Statistics, Texas Tech University, Lubbock
  • Ramaswami Kalamegham, PhD

      Affiliations

    • Department of Obstetrics and Gynecology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso
    • Department of Pathology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso
  • Bahij S. Nuwayhid, MD, PhD

      Affiliations

    • Department of Obstetrics and Gynecology, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso

Received 13 May 2009; accepted 9 August 2009; published online 7 October 2009.

Purpose

To present multiple imputation (MI) as an appropriate method to address missing values for a laboratory parameter (serum albumin) in an epidemiologic study.

Methods

A data set of patients who were hospitalized for invasive group A streptococcal infections was accessed. Age was the exposure of interest. The outcome was hospital mortality. Several variables, including serum albumin, were considered to be potential confounders. Of the 201 records, 91 had missing values for serum albumin. The MI procedure in SAS was used to perform 20 imputations of serum albumin by using a Markov chain Monte Carlo approach. Logistic regression was then performed on each of the 20 filled-in data sets, and the results were appropriately combined by using the MIANALYZE procedure.
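The combining step described above is conventionally done with Rubin's rules, which is the approach PROC MIANALYZE applies. The following is a minimal Python sketch of that pooling arithmetic, not the authors' SAS code. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the study.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates (e.g. log odds ratios) and their
    squared standard errors using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b      # total variance
    # Degrees of freedom for the t reference distribution (Rubin, 1987)
    df = inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, sqrt(t), df


# Hypothetical log odds ratios and variances from 3 imputed data sets,
# NOT the study's data (the study used 20 imputations).
log_or, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
```

The pooled standard error is larger than any single data set's standard error because the between-imputation term carries the extra uncertainty due to the missing albumin values.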

Results

Age (≥55 years vs. 0–54 years) was not a risk factor for hospital mortality in the complete-case analysis (n=110): adjusted odds ratio (OR)=2.43 (95% confidence interval [CI]: 0.79–7.53). Age was a significant risk factor in the imputed data set (n=201): adjusted OR=3.08 (95% CI: 1.22–7.78).
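One way to see why the first interval is nonsignificant and the second is significant is to back-calculate an approximate standard error on the log-odds scale from each reported interval. The sketch below assumes symmetric Wald-type intervals and a normal reference distribution; the imputed-data interval was presumably based on a t distribution, so its values are approximate. The helper name `wald_p_from_ci` is hypothetical.

```python
from math import log
from statistics import NormalDist


def wald_p_from_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate SE, z statistic, and two-sided p value from a reported
    odds ratio and its confidence limits (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)  # SE of the log odds ratio
    z = log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Reported results: complete-case analysis (n=110) and imputed data (n=201)
se_cc, z_cc, p_cc = wald_p_from_ci(2.43, 0.79, 7.53)
se_mi, z_mi, p_mi = wald_p_from_ci(3.08, 1.22, 7.78)
```

Under these assumptions the imputed analysis has the smaller standard error, which is consistent with its narrower interval excluding 1.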

Conclusions

Epidemiologists frequently encounter data sets that contain missing values. Traditional missing data techniques such as the complete-subject analysis may lead to biased results. We have demonstrated the use of a novel technique, MI, to account for missing data.

Key Words: Streptococcus pyogenes, Serum Albumin, Missing Data, Multiple Imputation, Markov Chains, Monte Carlo Methods

Selected Abbreviations and Acronyms: CI, confidence interval; EM, expectation-maximization; GAS, group A streptococcal; MAR, missing at random; MCMC, Markov chain Monte Carlo; MI, multiple imputation; OR, odds ratio

PII: S1047-2797(09)00285-3

doi:10.1016/j.annepidem.2009.08.002
