The Directed Prediction Index (DPI) is a simulation-based, conservative method for quantifying the relative endogeneity (relative dependence) of an outcome variable (Y) versus a predictor variable (X) in multiple linear regression models. It compares the proportion of variance explained (R-squared) by the Y-as-outcome model with that explained by the X-as-outcome model, while controlling for a sufficient number of potential confounding variables. From this comparison it suggests the more plausible direction of influence: from the more exogenous variable (X) to the more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
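The core comparison can be illustrated with two ordinary lm() fits on the built-in airquality data. This is a minimal sketch of the R-squared contrast only, not the package's implementation: it uses a single fit on the complete cases, with no random covariates and no simulation.

```r
# Minimal sketch (not the DPI package's implementation): compare R^2 of the
# Y-as-outcome model with R^2 of the X-as-outcome model, both controlling
# for the same covariates, on the complete cases of airquality.
d <- na.omit(airquality)

fit_y <- lm(Ozone ~ Solar.R + Wind + Temp + Month + Day, data = d)  # Y as outcome
fit_x <- lm(Solar.R ~ Ozone + Wind + Temp + Month + Day, data = d)  # X as outcome

r2_y <- summary(fit_y)$r.squared
r2_x <- summary(fit_x)$r.squared
c(R2.Y = r2_y, R2.X = r2_x, delta.R2 = r2_y - r2_x)
```

A positive difference means that Y is better explained by X and the covariates than X is by Y and the covariates. The DPI logic reads this as Y being the more endogenous variable. DPI() repeats this kind of comparison across n.sim simulation samples instead of relying on one fit.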
Usage
DPI(
  model,
  y,
  x,
  data = NULL,
  k.cov = 1,
  n.sim = 1000,
  seed = NULL,
  progress,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500
)
Arguments
- model
Model object (lm).
- y
Dependent (outcome) variable.
- x
Independent (predictor) variable.
- data
[Optional] Defaults to NULL. If data is specified, model is ignored and a linear model lm({y} ~ {x} + .) is fitted internally. This is helpful for exploring all variables in a dataset.
- k.cov
Number of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1. Please also test different k.cov values as robustness checks (see DPI_curve).
If k.cov > 0, the raw data (without bootstrapping) are used for simulation, with k.cov random variables appended.
If k.cov = 0 (not recommended), bootstrap samples (resampling with replacement) are used for simulation.
- n.sim
Number of simulation samples. Defaults to 1000.
- seed
Random seed for replicable results. Defaults to NULL.
- progress
Whether to show a progress bar. Defaults to FALSE (if n.sim < 5000).
- file
File name of the saved plot (".png" or ".pdf").
- width, height
Width and height (in inches) of the saved plot. Defaults to 6 and 4.
- dpi
Dots per inch (figure resolution). Defaults to 500.
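The data argument and the two k.cov modes can be combined into a sketch of a single simulation draw. sim_once() below is a hypothetical helper, not part of the DPI package API. Drawing the random covariates from a standard normal distribution is an assumption, because the text above does not say how they are generated.

```r
# Hypothetical helper (not the DPI package's internal code): build one
# simulation sample and fit lm({y} ~ {x} + .) on it.
# Assumption: random covariates are drawn from a standard normal distribution.
sim_once <- function(data, y, x, k.cov = 1) {
  data <- na.omit(data)  # drop incomplete rows first so new columns line up
  n <- nrow(data)
  if (k.cov > 0) {
    # Raw data (no resampling) with k.cov random columns appended
    rand <- matrix(rnorm(n * k.cov), nrow = n)
    colnames(rand) <- paste0("R", seq_len(k.cov))
    data <- cbind(data, as.data.frame(rand))
  } else {
    # k.cov = 0: bootstrap sample, rows drawn with replacement
    data <- data[sample.int(n, n, replace = TRUE), , drop = FALSE]
  }
  # "." expands to every column not already in the formula, so all remaining
  # variables (including any random covariates) enter as covariates.
  lm(as.formula(paste(y, "~", x, "+ .")), data = data)
}

set.seed(1)
fit10 <- sim_once(airquality, y = "Ozone", x = "Solar.R", k.cov = 10)
fit0  <- sim_once(airquality, y = "Ozone", x = "Solar.R", k.cov = 0)
fit10$df.residual  # 111 rows - 16 coefficients = 95
```

The residual df of 95 for k.cov = 10 matches the t(95) reported in the k.cov = 10 example. With k.cov = 1 the same arithmetic gives 104, which matches t(104) in the first example.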
Value
Returns a data.frame of simulation results with the following columns:
DPI
t.beta.xy^2 * (R2.Y - R2.X)
t.beta.xy
t value of the coefficient of X predicting Y (always equal to the t value of the coefficient of Y predicting X), controlling for all other covariates
df.beta.xy
residual degrees of freedom (df) of t.beta.xy
r.partial.xy
partial correlation between X and Y, controlling for all other covariates (always with the same t value as t.beta.xy)
delta.R2
R2.Y - R2.X
R2.Y
\(R^2\) of the regression model predicting Y from X and all other covariates
R2.X
\(R^2\) of the regression model predicting X from Y and all other covariates
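Several of these columns can be reproduced from two lm() fits on one dataset. This sketch checks three things: that the t value of X in the Y model equals the t value of Y in the X model, that the partial correlation follows from t and its df as r = t / sqrt(t^2 + df), and that DPI = t.beta.xy^2 * delta.R2. No random covariates or simulation are used, so the values will not match the DPI() printouts exactly.

```r
# Sketch: single-dataset versions of the returned columns (no simulation).
d <- na.omit(airquality)
covs <- "Wind + Temp + Month + Day"

fit_y <- lm(as.formula(paste("Ozone ~ Solar.R +", covs)), data = d)  # Y on X
fit_x <- lm(as.formula(paste("Solar.R ~ Ozone +", covs)), data = d)  # X on Y

t_yx <- coef(summary(fit_y))["Solar.R", "t value"]  # X predicting Y
t_xy <- coef(summary(fit_x))["Ozone", "t value"]    # Y predicting X (same value)
df   <- fit_y$df.residual

r_partial <- t_yx / sqrt(t_yx^2 + df)  # partial correlation recovered from t
delta_r2  <- summary(fit_y)$r.squared - summary(fit_x)$r.squared
dpi       <- t_yx^2 * delta_r2

c(t.beta.xy = t_yx, df.beta.xy = df, r.partial.xy = r_partial,
  delta.R2 = delta_r2, DPI = dpi)
```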
Examples
model = lm(Ozone ~ ., data=airquality)
DPI(model, y="Ozone", x="Solar.R", seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + Wind + Temp + Month + Day
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 1, n.sim = 1000, seed = 1
#>
#>
#> ── (1) Partial Correlation (pr_XY) ──
#>
#> Estimated r(partial) = 0.205, t(104) = 2.133, p = 0.0353 *
#> ── (2) Delta R^2 (= R^2_Y - R^2_X) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> ΔR² 0.453 (0.011) 40.021 <1e-99 *** [0.423, 0.471]
#> ── (3) Directed Prediction Index (DPI) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> DPI 2.063 (0.181) 11.395 4e-30 *** [1.658, 2.411]
DPI(data=airquality, y="Ozone", x="Solar.R", k.cov=10, seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + (Solar.R + Wind + Temp + Month + Day)
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 10, n.sim = 1000, seed = 1
#>
#>
#> ── (1) Partial Correlation (pr_XY) ──
#>
#> Estimated r(partial) = 0.204, t(95) = 2.030, p = 0.0452 *
#> ── (2) Delta R^2 (= R^2_Y - R^2_X) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> ΔR² 0.414 (0.035) 11.950 7e-33 *** [0.338, 0.472]
#> ── (3) Directed Prediction Index (DPI) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> DPI 1.748 (0.569) 3.071 0.0021 ** [0.716, 3.043]
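In the printouts above, each row is consistent with z.value ≈ Estimate / Sim.SE (for example, 2.063 / 0.181 ≈ 11.4) and a two-sided normal p value. The sketch below summarizes a vector of simulated statistics in that spirit. summarize_sim() is hypothetical, and using the mean as the estimate and a 95% percentile interval are assumptions, not confirmed package behavior.

```r
# Hypothetical summary of n.sim simulated values of one statistic.
# Assumptions: Estimate = mean, Sim.SE = SD across simulations,
# z = Estimate / Sim.SE, two-sided normal p value, percentile interval.
summarize_sim <- function(sims, level = 0.95) {
  est <- mean(sims)
  se  <- sd(sims)
  z   <- est / se
  p   <- 2 * pnorm(-abs(z))
  ci  <- quantile(sims, c((1 - level) / 2, (1 + level) / 2), names = FALSE)
  data.frame(Estimate = est, Sim.SE = se, z.value = z, p.z = p,
             CI.lower = ci[1], CI.upper = ci[2])
}

# The z -> p step reproduces the printed DPI p values:
2 * pnorm(-11.395)  # about 4e-30
2 * pnorm(-3.071)   # about 0.0021
```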