
🛸 The Directed Prediction Index (DPI).

The Directed Prediction Index (DPI) is a simulation-based method for quantifying the relative endogeneity of outcome versus predictor variables in multiple linear regression models.

Author

Bruce H. W. S. Bao 包寒吴霜

📬 baohws@foxmail.com

📋 psychbruce.github.io

Citation

Bao, H. W. S. (2025). DPI: The Directed Prediction Index. https://CRAN.R-project.org/package=DPI
- Note: This is the original citation. For the APA-7 citation of the version you have installed, see the message printed when you run library(DPI).

Installation

## Method 1: Install from CRAN
install.packages("DPI")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/DPI", force=TRUE)

Computation Details

$\begin{aligned} \text{DPI}_{X \rightarrow Y} & = t^2 \cdot \Delta R^2 \\ & = t_{\beta_{XY|Covs}}^2 \cdot (R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2) \\ & = t_{partial.r_{XY|Covs}}^2 \cdot (R_{Y \sim X + Covs}^2 - R_{X \sim Y + Covs}^2) \end{aligned}$
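
As an illustration of the formula above, the following base R sketch computes a single point estimate of $t^2 \cdot \Delta R^2$ by hand, without any simulation. The simulated data, the variable names (`dat`, `cov1`, `fit_y`, `fit_x`, `dpi_hand`), and the use of the unadjusted $R^2$ are assumptions made for this example and are not taken from the package source. The sketch also checks that the $t$ statistic of the regression coefficient matches the one derived from the partial correlation.

```r
# Hypothetical simulated data with one observed covariate (assumption).
set.seed(1)
n <- 200
cov1 <- rnorm(n)
x <- 0.5 * cov1 + rnorm(n)
y <- 0.4 * x + 0.3 * cov1 + rnorm(n)
dat <- data.frame(y, x, cov1)

# Two mirrored regressions that swap the roles of Y and X.
fit_y <- lm(y ~ x + cov1, data = dat)  # Y ~ X + Covs
fit_x <- lm(x ~ y + cov1, data = dat)  # X ~ Y + Covs

# t statistic of the partial regression coefficient of X in Y ~ X + Covs.
t_beta <- coef(summary(fit_y))["x", "t value"]

# The same t statistic recovered from the partial correlation r_{XY|Covs},
# computed here as the correlation between covariate-adjusted residuals.
r_part <- cor(resid(lm(y ~ cov1, data = dat)),
              resid(lm(x ~ cov1, data = dat)))
df_res <- df.residual(fit_y)
t_r <- r_part * sqrt(df_res / (1 - r_part^2))

# Difference in (unadjusted) R^2 between the two regressions.
delta_r2 <- summary(fit_y)$r.squared - summary(fit_x)$r.squared

# Point estimate following the formula: t^2 * Delta R^2.
dpi_hand <- t_beta^2 * delta_r2
```

Here `t_beta` and `t_r` agree up to floating-point error, which is why the second and third lines of the formula are interchangeable. The sign of `dpi_hand` is determined entirely by `delta_r2`, since $t^2$ is never negative.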

In econometrics and the broader social sciences, an exogenous variable is assumed to have a unidirectional (causal or quasi-causal) influence on an endogenous variable ($ExoVar \rightarrow EndoVar$). By quantifying the relative endogeneity of outcome versus predictor variables in multiple linear regression models, the DPI can suggest a more plausible direction of influence (e.g., $\text{DPI}_{X \rightarrow Y} > 0 \text{: } X \rightarrow Y$) after controlling for a sufficient number of potential confounding variables.

- It uses $\Delta R_{Y vs. X}^2$ to test whether $Y$ (outcome), compared to $X$ (predictor), can be more strongly predicted by $m$ observable control variables (included in a regression model) and $k$ unobservable random covariates (specified by k.cov; see the DPI() function). A higher $R^2$ indicates relatively higher dependence (i.e., relatively higher endogeneity) in a given variable set.
- It also uses $t_{partial.r}^2$ to penalize a nonsignificant partial correlation ($r_{partial}$, whose $t$ test is equivalent to that of $\beta_{partial}$) between $Y$ and $X$, while ignoring the sign ($\pm$) of this correlation. A higher $t^2$ (equal to the $F$ statistic when the numerator $df = 1$) indicates a more robust (less spurious) partial relationship when controlling for other variables.
- Simulation samples with k.cov random covariates are generated to test the statistical significance of the DPI.
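
The simulation idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `dpi_once()` is hypothetical, the random covariates are drawn as independent standard normal noise, and the z test on the simulated mean and standard deviation is just one plausible way to summarise the draws. How DPI() actually generates covariates and tests significance is not described in this section.

```r
# Hypothetical simulated data with one observed covariate (assumption).
set.seed(2)
n <- 200
cov1 <- rnorm(n)
x <- 0.5 * cov1 + rnorm(n)
y <- 0.4 * x + 0.3 * cov1 + rnorm(n)
dat <- data.frame(y, x, cov1)

# Hypothetical helper (not part of the DPI package): one DPI value with
# k_cov extra covariates of pure noise added to both regressions.
dpi_once <- function(data, k_cov) {
  stopifnot(k_cov >= 1)
  noise <- as.data.frame(matrix(rnorm(nrow(data) * k_cov), ncol = k_cov))
  names(noise) <- paste0("rand", seq_len(k_cov))
  d <- cbind(data, noise)
  fit_y <- lm(y ~ ., data = d)  # Y ~ X + Covs + random covariates
  fit_x <- lm(x ~ ., data = d)  # X ~ Y + Covs + random covariates
  t_xy <- coef(summary(fit_y))["x", "t value"]
  t_xy^2 * (summary(fit_y)$r.squared - summary(fit_x)$r.squared)
}

# Repeat the draw many times to obtain a simulated distribution.
n_sim <- 500
dpi_sim <- replicate(n_sim, dpi_once(dat, k_cov = 1))

# Summarise the draws (assumed summary: mean, SD, and a z test).
dpi_mean <- mean(dpi_sim)
dpi_se <- sd(dpi_sim)
z <- dpi_mean / dpi_se
p_value <- 2 * pnorm(-abs(z))
```

Each call to `dpi_once()` adds fresh noise covariates, so the spread of `dpi_sim` reflects how sensitive the index is to unobserved variables of this kind. Increasing `k_cov` in the sketch adds more noise covariates per draw.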