🛸 The Directed Prediction Index (DPI).

The Directed Prediction Index (DPI) is a simulation-based method for quantifying the relative endogeneity of outcome versus predictor variables in multiple linear regression models.

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

📬 baohws@foxmail.com

📋 psychbruce.github.io

Citation

Installation

## Method 1: Install from CRAN
install.packages("DPI")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/DPI", force=TRUE)

Computation Details

$$
\begin{aligned}
\text{DPI}_{X \rightarrow Y} & = t^2 \cdot \Delta R^2 \\
& = t_{\beta_{XY|Covs}}^2 \cdot (R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2) \\
& = t_{partial.r_{XY|Covs}}^2 \cdot (R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2)
\end{aligned}
$$

In econometrics and broader social sciences, an exogenous variable is assumed to have a unidirectional (causal or quasi-causal) influence on an endogenous variable ($ExoVar \rightarrow EndoVar$). By quantifying the relative endogeneity of outcome versus predictor variables in multiple linear regression models, the DPI can suggest a more plausible direction of influence (e.g., $\text{DPI}_{X \rightarrow Y} > 0 \text{: } X \rightarrow Y$) after controlling for a sufficient number of potential confounding variables.

  1. It uses $\Delta R_{Y vs. X}^2$ to test whether $Y$ (outcome), compared to $X$ (predictor), can be more strongly predicted by $m$ observable control variables (included in a regression model) and $k$ unobservable random covariates (specified by k.cov; see the DPI() function). A higher $R^2$ indicates relatively higher dependence (i.e., relatively higher endogeneity) in a given variable set.
  2. It also uses $t_{partial.r}^2$ to penalize insignificant partial correlation ($r_{partial}$, with equivalent $t$ test as $\beta_{partial}$) between $Y$ and $X$, while ignoring the sign ($\pm$) of this correlation. A higher $t^2$ (equivalent to $F$ test value when $df = 1$) indicates a more robust (less spurious) partial relationship when controlling for other variables.
  3. Simulation samples with k.cov random covariates are generated to test the statistical significance of DPI.
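
The three steps above can be illustrated with a minimal base-R sketch. It does not call the package and is not its implementation: the helpers `dpi_once()` and `simulate_dpi()`, the toy data, and the choice to draw random covariates as independent standard-normal noise are all assumptions made for illustration. The package's DPI() function may generate its covariates and test significance differently.

```r
# Hypothetical helper (not part of the DPI package): one DPI value from
# DPI = t^2 * (R2[Y ~ X + Covs] - R2[X ~ Y + Covs]), using base R only.
dpi_once <- function(data, x, y, covs = character(0)) {
  fit_y <- summary(lm(reformulate(c(x, covs), response = y), data = data))
  fit_x <- summary(lm(reformulate(c(y, covs), response = x), data = data))
  t_xy <- fit_y$coefficients[x, "t value"]  # t test of X's partial slope on Y
  delta_r2 <- fit_y$r.squared - fit_x$r.squared
  t_xy^2 * delta_r2
}

# Toy data (made up for illustration): C1 is an observed control variable.
set.seed(1)
n <- 200
toy <- data.frame(C1 = rnorm(n))
toy$X <- 0.5 * toy$C1 + rnorm(n)
toy$Y <- 0.4 * toy$X + 0.3 * toy$C1 + rnorm(n)

# Assumption: each simulated sample adds k_cov independent standard-normal
# covariates; the package may generate its random covariates differently.
simulate_dpi <- function(data, x, y, covs, k_cov = 1, n_sim = 200) {
  replicate(n_sim, {
    noise <- as.data.frame(matrix(rnorm(nrow(data) * k_cov), ncol = k_cov))
    names(noise) <- paste0("RandCov", seq_len(k_cov))
    dpi_once(cbind(data, noise), x, y, c(covs, names(noise)))
  })
}

dpi_sims <- simulate_dpi(toy, "X", "Y", covs = "C1", k_cov = 1, n_sim = 200)
mean(dpi_sims)                       # average DPI across simulated samples
quantile(dpi_sims, c(0.025, 0.975))  # percentile interval of simulated values
```

Because the $t$ test of the partial relationship is the same whichever variable is treated as the outcome, swapping `x` and `y` in `dpi_once()` only flips the sign of $\Delta R^2$, so $\text{DPI}_{Y \rightarrow X} = -\text{DPI}_{X \rightarrow Y}$ in this sketch.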