The Directed Prediction Index (DPI) is a simulation-based method for quantifying the relative endogeneity (relative dependence) of an outcome variable (Y) versus a predictor variable (X) in multiple linear regression models. It compares the proportion of variance explained (R-squared) by the model with Y as the outcome against the model with X as the outcome, while controlling for a sufficient number of potential confounding variables. When the Y-as-outcome model explains more variance, this suggests that influence more plausibly runs from the more exogenous variable (X) to the more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
Usage
```r
DPI(
  model,
  y,
  x,
  data = NULL,
  k.cov = 1,
  n.sim = 1000,
  seed = NULL,
  progress,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500
)
```
Arguments
- model
Model object (`lm`).
- y
Name of the dependent (outcome) variable (e.g., `"Ozone"`).
- x
Name of the independent (predictor) variable (e.g., `"Solar.R"`).
- data
[Optional] Defaults to `NULL`. If `data` is specified, `model` is ignored and a linear model `lm({y} ~ {x} + .)` is fitted internally, where `{y}` and `{x}` stand for the names given in `y` and `x`. This is helpful for exploring all variables in a dataset.
- k.cov
Number of random covariates (simulating potential omitted variables) added to each simulation sample.
Defaults to `1`. Please also test different `k.cov` values as robustness checks (see `DPI_curve()`).
If `k.cov > 0`, each simulation sample uses the raw data (without bootstrapping) with `k.cov` random variables appended.
If `k.cov = 0` (not recommended), bootstrap samples (resampling with replacement) are used for simulation.
- n.sim
Number of simulation samples. Defaults to `1000`.
- seed
Random seed for replicable results. Defaults to `NULL`.
- progress
Show a progress bar. Defaults to `FALSE` if `n.sim < 5000`.
- file
File name of the saved plot (`".png"` or `".pdf"`).
- width, height
Width and height (in inches) of the saved plot. Default to `6` and `4`, respectively.
- dpi
Dots per inch (figure resolution). Defaults to `500`.
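A minimal base-R sketch of how `k.cov`, `n.sim`, and `seed` could interact in a simulation loop. It is not the package's internal code. The helper `simulate_dpi()` is hypothetical, and the sketch assumes two things: complete-case data, and standard-normal random covariates (this page does not say which distribution `DPI()` uses). With `k.cov = 0` it falls back to bootstrap resampling, as described for that setting.

```r
# Hypothetical sketch, NOT the package's internal implementation.
# ASSUMPTIONS: complete cases only; appended random covariates are standard normal.
simulate_dpi <- function(data, y, x, k.cov = 1, n.sim = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)            # replicable random draws
  data <- na.omit(data)                         # ASSUMPTION: drop rows with NA
  covs <- setdiff(names(data), c(y, x))         # all remaining columns as covariates

  # DPI for one simulation sample `d`, with extra covariate names `extra`
  one_dpi <- function(d, extra) {
    s_y <- summary(lm(reformulate(c(x, covs, extra), response = y), data = d))
    s_x <- summary(lm(reformulate(c(y, covs, extra), response = x), data = d))
    t_xy <- coef(s_y)[x, "t value"]             # t of X predicting Y
    t_xy^2 * (s_y$r.squared - s_x$r.squared)    # DPI = t^2 * (R2.Y - R2.X)
  }

  vapply(seq_len(n.sim), function(i) {
    if (k.cov > 0) {
      # raw data plus k.cov random covariates (ASSUMPTION: standard normal)
      rand <- matrix(rnorm(nrow(data) * k.cov), ncol = k.cov,
                     dimnames = list(NULL, paste0("Random", seq_len(k.cov))))
      one_dpi(cbind(data, rand), colnames(rand))
    } else {
      # k.cov = 0: bootstrap sample, rows resampled with replacement
      boot <- data[sample.int(nrow(data), replace = TRUE), , drop = FALSE]
      one_dpi(boot, character(0))
    }
  }, numeric(1))
}

sims <- simulate_dpi(airquality, y = "Ozone", x = "Solar.R",
                     k.cov = 1, n.sim = 200, seed = 1)
summary(sims)
```

Each element of `sims` is one DPI value. One plausible way to obtain estimates and simulation intervals like those printed in the examples is to summarise these `n.sim` draws (mean, standard deviation, quantiles). The exact summary rules used by `DPI()` are not documented here. `n.sim = 200` is used only to keep the illustration quick.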
Value
Returns a data.frame of simulation results with the following columns:
- DPI
`t.beta.xy^2 * (R2.Y - R2.X)`
- t.beta.xy
t value for the coefficient of X predicting Y when controlling for all other covariates (always equal to the t value for the coefficient of Y predicting X)
- df.beta.xy
residual degrees of freedom (df) of `t.beta.xy`
- r.partial.xy
partial correlation between X and Y when controlling for all other covariates (always has the same t value as `t.beta.xy`)
- delta.R2
`R2.Y - R2.X`
- R2.Y
\(R^2\) of the regression model predicting Y from X and all other covariates
- R2.X
\(R^2\) of the regression model predicting X from Y and all other covariates
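The columns above can be computed for a single data set with two ordinary `lm()` fits. The sketch below uses a hypothetical helper, `dpi_components()`, which is not part of the package. It uses the complete cases of `airquality` with no random covariates. It also adds a `t.beta.yx` column solely so that the stated equality of the two t values can be checked.

```r
# Hypothetical helper, NOT part of the DPI package: one data set, no simulation.
dpi_components <- function(data, y, x) {
  data <- na.omit(data)                         # ASSUMPTION: complete cases only
  covs <- setdiff(names(data), c(y, x))         # all other columns as covariates
  m_y <- lm(reformulate(c(x, covs), response = y), data = data)  # Y ~ X + covariates
  m_x <- lm(reformulate(c(y, covs), response = x), data = data)  # X ~ Y + covariates
  s_y <- summary(m_y)
  s_x <- summary(m_x)
  t_xy <- coef(s_y)[x, "t value"]               # t of X in the Y-as-outcome model
  t_yx <- coef(s_x)[y, "t value"]               # t of Y in the X-as-outcome model
  df_res <- m_y$df.residual                     # residual df of t.beta.xy
  data.frame(
    DPI          = t_xy^2 * (s_y$r.squared - s_x$r.squared),
    t.beta.xy    = t_xy,
    t.beta.yx    = t_yx,                        # extra column, only for checking symmetry
    df.beta.xy   = df_res,
    r.partial.xy = t_xy / sqrt(t_xy^2 + df_res),  # partial r recovered from t and df
    delta.R2     = s_y$r.squared - s_x$r.squared,
    R2.Y         = s_y$r.squared,
    R2.X         = s_x$r.squared
  )
}

res <- dpi_components(airquality, y = "Ozone", x = "Solar.R")
res
```

Both models contain the same number of predictors, so they share the residual df (105 here). The t value and the partial correlation are therefore symmetric in X and Y, while \(R^2\) is not, so only the R-squared difference carries directional information. The printed t(104) and t(95) in the examples are 1 and 10 below this value, consistent with `k.cov = 1` and `k.cov = 10` random covariates being included.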
Examples
```r
model = lm(Ozone ~ ., data=airquality)
DPI(model, y="Ozone", x="Solar.R", seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + Wind + Temp + Month + Day
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 1, n.sim = 1000, seed = 1
#>
#> ── (1) Strength: Partial Correlation (pr_XY) ──
#>
#> Estimated r(partial) = 0.205, t(104) = 2.133, p = 0.0353 *
#> ── (2) Direction: ΔR² (= R²Y - R²X) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> ΔR² 0.453 (0.011) 40.021 <1e-99 *** [0.423, 0.471]
#> ── (3) DPI: The Directed Prediction Index ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> DPI 2.063 (0.181) 11.395 4e-30 *** [1.658, 2.411]
DPI(data=airquality, y="Ozone", x="Solar.R", k.cov=10, seed=1)
#> Sample size: N.valid = 111
#> Model formula: Ozone ~ Solar.R + (Solar.R + Wind + Temp + Month + Day)
#> Directed prediction tested: "Solar.R" (X) -> "Ozone" (Y)
#> Simulation sample settings: k.random.covs = 10, n.sim = 1000, seed = 1
#>
#> ── (1) Strength: Partial Correlation (pr_XY) ──
#>
#> Estimated r(partial) = 0.204, t(95) = 2.030, p = 0.0452 *
#> ── (2) Direction: ΔR² (= R²Y - R²X) ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> ΔR² 0.414 (0.035) 11.950 7e-33 *** [0.338, 0.472]
#> ── (3) DPI: The Directed Prediction Index ──
#>
#> Estimate Sim.SE z.value p.z sig Sim.Conf.Interval
#> DPI 1.748 (0.569) 3.071 0.0021 ** [0.716, 3.043]
```
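The printed `z.value` and `p.z` columns appear consistent with z = Estimate / Sim.SE and a two-sided p-value from the standard normal distribution. This is inferred from the printed numbers and is not stated on this page. The snippet below checks it using the rounded DPI values from the two examples.

```r
# Rounded values copied from the two DPI rows printed above
est <- c(first = 2.063, second = 1.748)   # Estimate
se  <- c(first = 0.181, second = 0.569)   # Sim.SE
z <- est / se                             # ASSUMPTION: z.value = Estimate / Sim.SE
p <- 2 * pnorm(-abs(z))                   # ASSUMPTION: two-sided normal p-value
round(z, 2)
signif(p, 2)
```

The small gaps from the printed z values (11.395 and 3.071) arise because the inputs are the rounded Estimate and Sim.SE values.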