Generalized Estimating Equations

Generalized Estimating Equations (GEE) estimate generalized linear models for panel, cluster, or repeated measures data when observations may be correlated within a cluster but are uncorrelated across clusters. GEE supports estimation of the same one-parameter exponential families as Generalized Linear Models (GLM).
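To make the "correlated within a cluster, uncorrelated across clusters" idea concrete, here is a minimal NumPy sketch of a GEE for an identity link with an exchangeable working correlation. It is not how statsmodels implements GEE: the function names `exchangeable_corr` and `gee_identity_link` are hypothetical, the data are simulated, and the correlation parameter `rho` is held fixed rather than estimated from the residuals.

```python
import numpy as np

def exchangeable_corr(n, rho):
    """n x n working correlation: 1 on the diagonal, rho everywhere else."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def gee_identity_link(y_clusters, x_clusters, rho):
    """Hand-rolled GEE for an identity link with a fixed exchangeable rho.

    Returns the coefficient estimate and its sandwich (robust) covariance.
    """
    p = x_clusters[0].shape[1]
    bread = np.zeros((p, p))
    rhs = np.zeros(p)
    for y, x in zip(y_clusters, x_clusters):
        r_inv = np.linalg.inv(exchangeable_corr(len(y), rho))
        bread += x.T @ r_inv @ x
        rhs += x.T @ r_inv @ y
    beta = np.linalg.solve(bread, rhs)

    # Clusters are treated as independent, so the "meat" is a sum of
    # per-cluster outer products of the estimating function.
    meat = np.zeros((p, p))
    for y, x in zip(y_clusters, x_clusters):
        r_inv = np.linalg.inv(exchangeable_corr(len(y), rho))
        u = x.T @ r_inv @ (y - x @ beta)
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv

# Simulated data: 300 clusters of 4 observations sharing a random intercept.
rng = np.random.default_rng(0)
y_clusters, x_clusters = [], []
for _ in range(300):
    x = np.column_stack([np.ones(4), rng.normal(size=4)])
    y = x @ np.array([1.0, 2.0]) + rng.normal() + rng.normal(size=4)
    y_clusters.append(y)
    x_clusters.append(x)

beta, cov = gee_identity_link(y_clusters, x_clusters, rho=0.5)
print("estimate:", beta)
print("robust standard errors:", np.sqrt(np.diag(cov)))
```

The sandwich form is what keeps the standard errors usable even when the working correlation is not the true one.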

See Module Reference for commands and arguments.

Examples

The following illustrates a Poisson regression with exchangeable correlation within clusters using data on epilepsy seizures.

In [1]: import statsmodels.api as sm

In [2]: import statsmodels.formula.api as smf

In [3]: data = sm.datasets.get_rdataset('epil', package='MASS').data

In [4]: fam = sm.families.Poisson()

In [5]: ind = sm.cov_struct.Exchangeable()

In [6]: mod = smf.gee("y ~ age + trt + base", "subject", data,
   ...:               cov_struct=ind, family=fam)
   ...: 

In [7]: res = mod.fit()

In [8]: print(res.summary())

Several notebook examples of GEE usage are available on the Wiki: Wiki notebooks for GEE

References

  • KY Liang and S Zeger (1986). “Longitudinal data analysis using generalized linear models”. Biometrika, 73(1), 13-22.
  • S Zeger and KY Liang (1986). “Longitudinal Data Analysis for Discrete and Continuous Outcomes”. Biometrics, 42(1), 121-130.
  • A Rotnitzky and NP Jewell (1990). “Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data”. Biometrika, 77, 485-497.
  • Xu Guo and Wei Pan (2002). “Small sample performance of the score test in GEE”. http://www.sph.umn.edu/faculty1/wp-content/uploads/2012/11/rr2002-013.pdf
  • LA Mancl and TA DeRouen (2001). “A covariance estimator for GEE with improved small-sample properties”. Biometrics, 57(1), 126-134.

Module Reference

Model Class

GEE(endog, exog, groups[, time, family, …]) Estimation of marginal regression models using Generalized Estimating Equations (GEE).

Results Classes

GEEResults(model, params, cov_params, scale) This class summarizes the fit of a marginal regression model using GEE.
GEEMargins(results, args[, kwargs]) Estimated marginal effects for a regression model fit with GEE.

Dependence Structures

The dependence structures currently implemented are:

CovStruct([cov_nearest_method]) A base class for correlation and covariance structures of grouped data.
Autoregressive([dist_func]) A first-order autoregressive working dependence structure.
Exchangeable() An exchangeable working dependence structure.
GlobalOddsRatio(endog_type) Estimate the global odds ratio for a GEE with ordinal or nominal data.
Independence([cov_nearest_method]) An independence working dependence structure.
Nested([cov_nearest_method]) A nested working dependence structure.

Families

The distribution families are the same as for GLM. Those currently implemented are:

Family(link, variance) The parent class for one-parameter exponential families.
Binomial([link]) Binomial exponential family distribution.
Gamma([link]) Gamma exponential family distribution.
Gaussian([link]) Gaussian exponential family distribution.
InverseGaussian([link]) InverseGaussian exponential family.
NegativeBinomial([link, alpha]) Negative Binomial exponential family.
Poisson([link]) Poisson exponential family.