The Directed Prediction Index (DPI) is a simulation-based and conservative method for quantifying the relative endogeneity (relative dependence) of outcome (Y) vs. predictor (X) variables in multiple linear regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of potential confounding variables, it suggests a more plausible influence direction from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
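
As a rough illustration of the R-squared comparison described above, the two models can be fitted directly with base R on the airquality data used in the Examples. This is only a sketch of the raw comparison, not the DPI() simulation itself, and the object names (d, fit_y, fit_x) are illustrative.

```r
# Use complete cases only so that both models are fitted to the same rows
d = na.omit(airquality)

# Y-as-outcome model: Ozone predicted by Solar.R and all other covariates
fit_y = lm(Ozone ~ ., data = d)

# X-as-outcome model: Solar.R predicted by Ozone and all other covariates
fit_x = lm(Solar.R ~ ., data = d)

r2_y = summary(fit_y)$r.squared
r2_x = summary(fit_x)$r.squared

# A positive difference means Y is better explained than X by the same variable set
delta_r2 = r2_y - r2_x
delta_r2
```

A positive difference is what the description reads as Ozone being the more endogenous variable relative to Solar.R.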

Usage

DPI(
  model,
  y,
  x,
  data = NULL,
  k.cov = 1,
  n.sim = 1000,
  seed = NULL,
  progress,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500
)

Arguments

model

Model object (lm).

y

Dependent (outcome) variable.

x

Independent (predictor) variable.

data

[Optional] Defaults to NULL. If data is specified, model is ignored and a linear model lm({y} ~ {x} + .) is fitted internally, so that all remaining variables in data enter as covariates. This is helpful for exploring all variables in a dataset.

k.cov

Number of random covariates (simulating potential omitted variables) added to each simulation sample.

  • Defaults to 1. Please also test different k.cov values as robustness checks (see DPI_curve).

  • If k.cov > 0, the raw data (without bootstrapping) are used, with k.cov random variables appended, for simulation.

  • If k.cov = 0 (not suggested), bootstrap samples (resampling with replacement) are used for simulation.
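
The k.cov > 0 case can be sketched in base R as follows. This is a simplified illustration rather than the package's implementation: it assumes the random covariates are independent standard-normal variables, which may differ from how DPI() generates them, and the helper sim_one() and the settings k_cov and n_sim are hypothetical names chosen for this example.

```r
set.seed(1)
d = na.omit(airquality)
k_cov = 3    # number of random covariates per simulation sample (must be > 0 here)
n_sim = 200  # kept small so the sketch runs quickly

sim_one = function(data, y, x, k) {
  # Append k independent standard-normal covariates to the raw data (assumed distribution)
  noise = as.data.frame(matrix(rnorm(nrow(data) * k), ncol = k))
  names(noise) = paste0("rand", seq_len(k))
  aug = cbind(data, noise)

  # Fit the Y-as-outcome and X-as-outcome models on the augmented data
  fit_y = lm(reformulate(setdiff(names(aug), y), response = y), data = aug)
  fit_x = lm(reformulate(setdiff(names(aug), x), response = x), data = aug)

  t_xy = coef(summary(fit_y))[x, "t value"]
  r2_y = summary(fit_y)$r.squared
  r2_x = summary(fit_x)$r.squared

  # DPI as defined in the Value section: t.beta.xy^2 * (R2.Y - R2.X)
  c(DPI = t_xy^2 * (r2_y - r2_x), t.beta.xy = t_xy, delta.R2 = r2_y - r2_x)
}

# One row per simulation sample
sims = as.data.frame(t(replicate(n_sim, sim_one(d, "Ozone", "Solar.R", k_cov))))
colMeans(sims)
```

Rerunning the sketch with several k_cov values gives a quick impression of how sensitive the quantities are to the number of added covariates, in the spirit of the robustness check recommended above.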

n.sim

Number of simulation samples. Defaults to 1000.

seed

Random seed for replicable results. Defaults to NULL.

progress

Whether to show a progress bar. Defaults to FALSE when n.sim < 5000.

file

File name of saved plot (".png" or ".pdf").

width, height

Width and height (in inches) of saved plot. Defaults to 6 and 4.

dpi

Dots per inch (figure resolution). Defaults to 500.

Value

Returns a data.frame of simulation results:

  • DPI

    • t.beta.xy^2 * (R2.Y - R2.X)

  • t.beta.xy

    • t value for coefficient of X predicting Y (always equal to t value for coefficient of Y predicting X) when controlling for all other covariates

  • df.beta.xy

    • residual degrees of freedom (df) of t.beta.xy

  • r.partial.xy

    • partial correlation (always with the same t value as t.beta.xy) between X and Y when controlling for all other covariates

  • delta.R2

    • R2.Y - R2.X

  • R2.Y

    • \(R^2\) of regression model predicting Y using X and all other covariates

  • R2.X

    • \(R^2\) of regression model predicting X using Y and all other covariates
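
The symmetry noted for t.beta.xy, and its link to r.partial.xy, can be checked with base R. The sketch below uses the raw airquality data without any random covariates, so its numbers will differ slightly from the printed output in the Examples. The conversion r = t / sqrt(t^2 + df) is the standard identity between a regression t value and the corresponding partial correlation, and the object names are illustrative.

```r
d = na.omit(airquality)

fit_y = lm(Ozone ~ ., data = d)    # Y-as-outcome model
fit_x = lm(Solar.R ~ ., data = d)  # X-as-outcome model

# t value of X predicting Y, and of Y predicting X, given the same covariates
t_y = coef(summary(fit_y))["Solar.R", "t value"]
t_x = coef(summary(fit_x))["Ozone", "t value"]

# Both models have the same number of predictors, hence the same residual df
df_res = df.residual(fit_y)

# Partial correlation between X and Y recovered from the t value
r_partial = t_y / sqrt(t_y^2 + df_res)

all.equal(t_y, t_x)  # the two t values agree up to numerical tolerance
```

Because the t value is identical in both directions, it carries no directional information by itself, and the sign of DPI is determined entirely by delta.R2.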

Examples

library(DPI)

# Fit the full model first, then test the direction Solar.R (X) -> Ozone (Y)
model = lm(Ozone ~ ., data=airquality)
DPI(model, y="Ozone", x="Solar.R", seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + Wind + Temp + Month + Day
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 1, n.sim = 1000, seed = 1
#> 
#> 
#> ── (1) Partial Correlation (pr_XY) ──
#> 
#> Estimated r(partial) = 0.205, t(104) = 2.133, p = 0.0353 *  
#> ── (2) Delta R^2 (= R^2_Y - R^2_X) ──
#> 
#>     Estimate  Sim.SE z.value    p.z sig Sim.Conf.Interval
#> ΔR²    0.453 (0.011)  40.021 <1e-99 ***    [0.423, 0.471]
#> ── (3) Directed Prediction Index (DPI) ──
#> 
#>     Estimate  Sim.SE z.value   p.z sig Sim.Conf.Interval
#> DPI    2.063 (0.181)  11.395 4e-30 ***    [1.658, 2.411]

# Pass the dataset directly (the model is fitted internally) and add 10 random covariates
DPI(data=airquality, y="Ozone", x="Solar.R", k.cov=10, seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + (Solar.R + Wind + Temp + Month + Day)
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 10, n.sim = 1000, seed = 1
#> 
#> 
#> ── (1) Partial Correlation (pr_XY) ──
#> 
#> Estimated r(partial) = 0.204, t(95) = 2.030, p = 0.0452 *  
#> ── (2) Delta R^2 (= R^2_Y - R^2_X) ──
#> 
#>     Estimate  Sim.SE z.value   p.z sig Sim.Conf.Interval
#> ΔR²    0.414 (0.035)  11.950 7e-33 ***    [0.338, 0.472]
#> ── (3) Directed Prediction Index (DPI) ──
#> 
#>     Estimate  Sim.SE z.value    p.z sig Sim.Conf.Interval
#> DPI    1.748 (0.569)   3.071 0.0021 **     [0.716, 3.043]