CALDER Research Method: Panel Data Models (Fixed Effect Models)

One of the analytic issues that plagues education research is the non-random assignment of teachers and students to schools and classrooms.

CALDER's use of longitudinal data allows us to employ panel data models to address this problem. Unobserved heterogeneity that does not change over time, which would typically doom a cross-sectional study to biased estimates and incorrect statistical tests, can be differenced out in fixed effects models. These models can therefore substantially reduce the bias arising from non-random assignment.

Example

A fixed effect model can eliminate unobservable, time-invariant differences across individuals that affect achievement. If students with high unobservable ability tend to be clustered in one type of school, and gains are positively related to unobservable ability, we might improperly ascribe the high gains at that type of school to the school type, rather than to the non-random distribution of students. To control for this, we include a separate unobservable ability term for each student and difference it out in our estimation. 1
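The example above can be illustrated with a small simulation. This is a minimal sketch rather than CALDER's actual estimation code: the function names, sample sizes, and the data-generating process are all assumptions made for illustration. It assumes that students can attend different school types in different periods (otherwise a student fixed effect would absorb school type entirely), that there are no time-specific effects, and that the true effect of school type on gains is zero.

```python
import numpy as np

def simulate_panel(n_students=2000, n_periods=4, true_effect=0.0, seed=0):
    """Simulate gains where high-ability students cluster in type-1 schools.

    All parameter values here are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n_students)          # unobserved, time-invariant
    # Higher ability raises the chance of attending a type-1 school each period.
    p_type1 = 1.0 / (1.0 + np.exp(-1.5 * ability))
    school_type = (rng.random((n_students, n_periods)) < p_type1[:, None]).astype(float)
    noise = rng.normal(scale=0.5, size=(n_students, n_periods))
    gains = true_effect * school_type + ability[:, None] + noise
    return school_type, gains

def pooled_ols_slope(x, y):
    """Slope from a pooled regression of y on x that ignores the panel structure."""
    x, y = x.ravel(), y.ravel()
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def within_slope(x, y):
    """Student fixed-effects (within) slope: demean x and y by student first."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())

school_type, gains = simulate_panel()
print("pooled OLS estimate:    ", round(pooled_ols_slope(school_type, gains), 3))
print("fixed-effects estimate: ", round(within_slope(school_type, gains), 3))
```

In this setup the pooled estimate attributes the ability gap to school type, while the within-student estimate should fall close to the true effect of zero, because each student's ability is removed when the data are demeaned by student.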

Previous Research

Hanushek and Pace (2002) provide an overview of the advantages of fixed effects models. 2

The CALDER research team has published extensively using fixed effects models. For example, Hanushek, Kain, and Rivkin (1988 and 2002); Rivkin, Hanushek, and Kain (1988); Figlio and Lucas (2003); and Bifulco and Ladd (2003) use models that control for unobserved heterogeneity across schools (using school fixed effects) and across students (using student fixed effects). Other examples abound: Goldhaber and Anthony (forthcoming) model teacher fixed effects, Hanushek, Kain, and Rivkin (2002) model classroom fixed effects, and Goldhaber, Anthony, and Perry (2004) model district fixed effects.

OTHER CALDER RESEARCH METHODS

Hazard models
Regression discontinuity
Qualitative studies within a longitudinal data system

Notes

1. More technically, if we call the individual "fixed effect" $c_i$, we can write a model for each individual $i$ in period $t$ as $y_{it} = a_t + x_{it} b + c_i + e_{it}$ and then estimate the mean effect $b$ using a regression of the form $Y = a + Xb + e$ (where $a$, $b$, and $e$ are understood to be vectors of time-specific effects, coefficients on $X$, and residuals, respectively, and $X$ is a matrix of explanatory variables). However, because this regression leaves $c_i$ in the residual, it will not produce an unbiased estimate of the mean $b$ (the mean effect of each column of $X$ on $Y$) when $c_i$ is correlated with $x_{it}$. Since the covariance of $c_i$ and $x_{it}$ is unknown, it is difficult even to quantify the nature of the bias in these cases. However, since we have multiple observations on individuals over time (panel data), we can estimate an equation of the form $y_{it} - y_{i,t-1} = (a_t - a_{t-1}) + (x_{it} - x_{i,t-1}) b + (e_{it} - e_{i,t-1})$. The fixed effect differences to $c_i - c_i = 0$ (since the effect $c_i$ for each individual does not vary over time) and is therefore dropped from the regression. Under these assumptions, it is not a problem that we cannot observe $c_i$. Any unobservable heterogeneity that does not change over time does not bias our coefficient estimate in this type of fixed-effect regression.
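The differencing step in note 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the estimator used in any of the cited papers: the function name, the simulated data, the time effects, and the true coefficient of 2.0 are all hypothetical. The individual effect c is built to be correlated with x, so a pooled regression is biased while the first-difference regression is not.

```python
import numpy as np

def first_difference_estimate(x, y):
    """Estimate b by regressing period-to-period changes in y on changes in x.

    x and y have shape (n_individuals, n_periods). Differencing removes any
    individual effect that is constant over time. Demeaning each column of
    differences is equivalent to including a separate intercept for each
    period transition, which absorbs the change in time-specific effects.
    """
    dx = np.diff(x, axis=1)
    dy = np.diff(y, axis=1)
    dx = dx - dx.mean(axis=0, keepdims=True)
    dy = dy - dy.mean(axis=0, keepdims=True)
    return float((dx * dy).sum() / (dx ** 2).sum())

# Hypothetical data: c is unobserved and correlated with x by construction.
rng = np.random.default_rng(1)
n, t = 1000, 3
c = rng.normal(size=(n, 1))                  # individual fixed effect
x = c + rng.normal(size=(n, t))              # explanatory variable
a = np.array([0.0, 0.5, 1.0])                # time-specific effects
y = a + 2.0 * x + c + rng.normal(scale=0.5, size=(n, t))

b_fd = first_difference_estimate(x, y)
b_pooled = float(np.polyfit(x.ravel(), y.ravel(), 1)[0])  # ignores c entirely
print("pooled estimate:           ", round(b_pooled, 3))
print("first-difference estimate: ", round(b_fd, 3))
```

With these assumed values the pooled slope should land well above 2.0, since it picks up the correlation between x and the omitted c, while the first-difference slope should land near 2.0.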

2. We can also model the differences across individuals in the coefficients of interest (in the example, the effect of x on y), using random coefficient models, often implemented using longitudinal data as hierarchical linear models with random effects (not discussed here for brevity's sake).