A General Framework for Two-Stage Analysis of Genome-wide Association Studies and Its Application to Case-Control Studies


Two-stage analyses of genome-wide association studies have been proposed as a means to improving power for designs including family-based association and gene-environment interaction testing. In these analyses, all markers are first screened via a statistic that may not be robust to an underlying assumption, and the markers thus selected are then analyzed in a second stage with a test that is independent from the first stage and is robust to the assumption in question. We give a general formulation of two-stage designs and show how one can use this formulation both to derive existing methods and to improve upon them, opening up a range of possible further applications. We show how using simple regression models in conjunction with external data such as average trait values can improve the power of genome-wide association studies. We focus on case-control studies and show how it is possible to use allele frequencies derived from an external reference to derive a powerful two-stage analysis. An illustration involving the Wellcome Trust Case-Control Consortium data shows several genome-wide-significant associations, subsequently validated, that were not significant in the standard analysis. We give some analytic properties of the methods and discuss some underlying principles.

American Journal of Human Genetics 2012; 90(5):P760-P773