2016/2017 QMeHSS Workshop (2/10/2017)

The February 10, 2017 Workshop on Quantitative Methods in Education, Health, and the Social Sciences (QMeHSS) will take place from 10:30am - 12:00pm in NORC Conference room 344. Stephen Raudenbush and Daniel Schwartz from the University of Chicago will be leading this workshop. NORC is located at 1155 E. 60th Street.

Estimation in Multisite Randomized Trials with Heterogeneous Treatment Effects

Stephen W. Raudenbush

Lewis-Sebring Distinguished Service Professor, Department of Sociology, the College, and the Harris School of Public Policy; Chair, Committee on Education, University of Chicago


Daniel Schwartz

Department of Statistics, University of Chicago



A multi-site randomized trial is a fleet of independent experiments, each testing the same hypothesis. Impacts will vary if sites vary in organizational capacity or if persons vary in response to intervention. We consider two targets of statistical generalization: a population of sites and a population of persons. We introduce a hierarchical linear model that incorporates design weights and inverse-probability-of-treatment weights. Maximizing a weighted log likelihood yields consistent estimators of the average treatment effect, the variance of treatment effects, and the covariance between the treatment effect and the control group mean. However, these estimates can be seriously inefficient when the aim is to generalize to a population of sites. Alternative estimators that use precision weighting may produce biased estimates but with comparatively small standard errors. Applying Xie and Meng's (2014) criterion of self-efficiency, we develop tools to diagnose and manage the bias-variance tradeoff. We propose scale-free measures of between-site imbalance in precision, heterogeneity of impact, and precision-impact correlation, and we use these to study the behavior of alternative estimators. We apply these methods to two iconic field trials: the US National Welfare to Work Experiment and the US National Head Start Impact Study, each of which has highly variable site sample sizes. When the aim is to generalize to a population of sites, the risk of bias is minimal in the Welfare to Work study, and precision weighting improves efficiency; in the case of Head Start, consistent estimators are grossly self-inefficient, meaning estimates can be improved by discarding data from many small sites, but precision-weighted estimators may be biased. Implications for study design and analysis appear to be profound.
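The bias-variance tradeoff the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' estimators; it is a minimal hypothetical example, with made-up site sizes and effect distributions, contrasting two ways to combine site-level impact estimates when the target is the population-of-sites average effect: the unweighted mean of site estimates (consistent for that target) versus an inverse-sampling-variance ("precision") weighted mean, which always has a smaller conditional standard error but can be biased when impact and precision (site size) are correlated across sites.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multisite trial: J sites with highly variable sample sizes
# and heterogeneous site-specific treatment effects (all numbers invented).
J = 40
n_j = rng.integers(10, 400, size=J)        # imbalanced site sample sizes
tau_j = rng.normal(5.0, 2.0, size=J)       # true site effects (heterogeneous)
sigma = 10.0                               # within-site outcome SD

# Site-level difference-in-means estimates and their sampling variances,
# assuming half of each site is randomized to treatment.
v_j = 4 * sigma**2 / n_j
tau_hat = tau_j + rng.normal(0.0, np.sqrt(v_j))

# Unweighted mean of site estimates: consistent for the
# population-of-sites average effect.
est_unw = tau_hat.mean()
se_unw = np.sqrt(v_j.sum()) / J            # conditional SE given the tau_j

# Precision (inverse-variance) weighting: minimizes the conditional SE,
# but up-weights large sites, so it is biased for the site-population
# mean whenever effect size and precision are correlated.
w = (1.0 / v_j) / (1.0 / v_j).sum()
est_pw = (w * tau_hat).sum()
se_pw = np.sqrt((w**2 * v_j).sum())

print(f"unweighted:        {est_unw:.2f} (SE {se_unw:.2f})")
print(f"precision-weighted: {est_pw:.2f} (SE {se_pw:.2f})")
```

Because inverse-variance weights minimize the conditional variance among weighted averages, `se_pw` is always at most `se_unw`; whether the efficiency gain is worth the bias is exactly the kind of question the self-efficiency diagnostics in the talk address.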