Tabulate data (cosine similarity and standardized effect size) and conduct the permutation test of significance for the Word Embedding Association Test (WEAT) and Single-Category Word Embedding Association Test (SC-WEAT).
For WEAT, a two-sample permutation test is conducted (i.e., rearrangements of data).
For SC-WEAT, a one-sample permutation test is conducted (i.e., rearrangements of +/- signs of the data).
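The two resampling schemes can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `perm_two_sample` and `perm_sign_flip` are hypothetical, the inputs are per-word scores (mean cosine similarity to A1 minus mean cosine similarity to A2) copied from the rounded example output on this page, and the p value rule follows the "greater than" wording given for the p.side argument.

```r
# Hypothetical helpers for illustration only; not part of the package.

# Two-sample test (WEAT): enumerate every way of reassigning the pooled
# per-word scores to two groups of the original sizes.
perm_two_sample <- function(x, y, side = 2) {
  pooled <- c(x, y)
  observed <- mean(x) - mean(y)
  splits <- combn(length(pooled), length(x))  # one column per reassignment
  stats <- apply(splits, 2, function(i) mean(pooled[i]) - mean(pooled[-i]))
  if (side == 1) mean(stats > observed) else mean(abs(stats) > abs(observed))
}

# One-sample test (SC-WEAT): randomly flip the +/- sign of each score.
perm_sign_flip <- function(x, nsim = 10000, side = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  observed <- mean(x)
  stats <- replicate(
    nsim,
    mean(x * sample(c(-1, 1), length(x), replace = TRUE))
  )
  if (side == 1) mean(stats > observed) else mean(abs(stats) > abs(observed))
}

# Rounded per-word scores taken from the example output on this page.
king  <- c(0.104, 0.090)
queen <- c(-0.176, -0.129)
occupation <- c(
  0.087, 0.081, 0.079, 0.071, 0.045, 0.033, 0.022, 0.020, 0.008, 0.013,
  0.001, 0.001, -0.003, -0.006, -0.067, -0.078, -0.064, -0.083, -0.069,
  -0.092, -0.162
)

p_weat    <- perm_two_sample(king, queen)        # exact: only 6 reassignments
p_sc_weat <- perm_sign_flip(occupation, seed = 1)  # approximate, by resampling
```

With only four target words, no reassignment gives a difference strictly greater than the observed one, so `p_weat` is 0 under this rule. The sign-flip p value depends on the random seed and on the rounding of the inputs, so it will not reproduce the package output digit for digit.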
Usage
test_WEAT(
  data,
  T1,
  T2,
  A1,
  A2,
  use.pattern = FALSE,
  labels = list(),
  p.perm = TRUE,
  p.nsim = 10000,
  p.side = 2,
  seed = NULL,
  pooled.sd = "Caliskan"
)
Arguments
- data
A wordvec (data.table) or embed (matrix); see data_wordvec_load.
- T1, T2
Target words (a vector of words or a regular expression pattern). If only T1 is specified, it will tabulate data for single-category WEAT (SC-WEAT).
- A1, A2
Attribute words (a vector of words or a regular expression pattern). Both must be specified.
- use.pattern
Defaults to FALSE (using a vector of words). If you use regular expressions in T1, T2, A1, and A2, please set this argument to TRUE.
- labels
Labels for target and attribute concepts (a named list), such as (the default) list(T1="Target1", T2="Target2", A1="Attrib1", A2="Attrib2").
- p.perm
Permutation test to get the exact or approximate p value of the overall effect. Defaults to TRUE. See also the sweater package.
- p.nsim
Number of samples for resampling in the permutation test. Defaults to 10000.
If p.nsim is larger than the number of all possible permutations (rearrangements of data), then it will be ignored and an exact permutation test will be conducted. Otherwise (in most cases for real data and always for SC-WEAT), a resampling test is performed, which takes much less computation time and produces an approximate p value (comparable to the exact one).
- p.side
One-sided (1) or two-sided (2) p value. Defaults to 2.
In their article, Caliskan et al. (2017) reported one-sided p values for WEAT. Here, I suggest reporting the two-sided p value as a more conservative estimate. Users take full responsibility for the choice.
The one-sided p value is calculated as the proportion of sampled permutations where the difference in means is greater than the test statistic.
The two-sided p value is calculated as the proportion of sampled permutations where the absolute difference is greater than the test statistic.
- seed
Random seed for reproducible results of the permutation test. Defaults to NULL.
- pooled.sd
Method used to calculate the pooled SD for the effect size estimate in WEAT.
Defaults to "Caliskan": sd(data.diff$cos_sim_diff), which is strongly recommended and identical to Caliskan et al.'s (2017) original approach.
If otherwise specified, it will calculate the pooled SD as: \(\sqrt{[(n_1 - 1) \cdot \sigma_1^2 + (n_2 - 1) \cdot \sigma_2^2] / (n_1 + n_2 - 2)}\). This is NOT recommended because it may overestimate the effect size, especially when there are only a few T1 and T2 words that have small variances.
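The difference between the two pooled.sd options can be illustrated with a small base R sketch. It is only an illustration: the variable names are hypothetical, and the per-word scores are the rounded cos_sim_diff values printed in the King/Queen example on this page, so the results are approximate.

```r
# Rounded per-word scores (cos_sim_diff) from the King/Queen example output.
s_t1 <- c(0.104, 0.090)    # T1 words: "king", "King"
s_t2 <- c(-0.176, -0.129)  # T2 words: "queen", "Queen"
mean_diff <- mean(s_t1) - mean(s_t2)

# "Caliskan" option: SD of the scores of all target words taken together.
sd_all <- sd(c(s_t1, s_t2))

# Alternative: classical pooled SD built from the two within-group variances.
n1 <- length(s_t1)
n2 <- length(s_t2)
sd_pooled <- sqrt(((n1 - 1) * var(s_t1) + (n2 - 1) * var(s_t2)) / (n1 + n2 - 2))

d_caliskan <- mean_diff / sd_all     # about 1.72
d_pooled   <- mean_diff / sd_pooled  # about 10.2
```

Because each group here has only two words with very similar scores, the within-group variances are tiny and the alternative pooled SD inflates the effect size roughly sixfold, which is the overestimation the argument description warns about.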
Value
A list object of new class weat:
words.valid
Valid (actually matched) words
words.not.found
Words not found
data.raw
A data.table of cosine similarities between all word pairs
data.mean
A data.table of mean cosine similarities across all attribute words
data.diff
A data.table of differential mean cosine similarities between the two attribute concepts
eff.label
Description for the difference between the two attribute concepts
eff.type
Effect type: WEAT or SC-WEAT
eff
Raw effect, standardized effect size, and p value (if p.perm=TRUE)
Download
Download pre-trained word vector data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
References
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.
Examples
## cc() is more convenient than c()!
weat = test_WEAT(
  demodata,
  labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
  T1=cc("king, King"),
  T2=cc("queen, Queen"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  seed=1)
weat
#>
#> ── WEAT (Word Embedding Association Test) ──────────────────────────────────────
#>
#> 2 King (T1) valid words
#> 2 Queen (T2) valid words
#> 8 Male (A1) valid words
#> 8 Female (A2) valid words
#>
#> Relative semantic similarities (differences):
#> ───────────────────────────────────────────────
#>         Target cos_sim_diff std_diff closer_to
#> ───────────────────────────────────────────────
#> "king"  King          0.104    1.587 Male
#> "King"  King          0.090    1.598 Male
#> "queen" Queen        -0.176   -1.751 Female
#> "Queen" Queen        -0.129   -1.781 Female
#> ───────────────────────────────────────────────
#>
#> Mean differences for single target category:
#> ─────────────────────────────────────────────
#> Target mean_raw_diff mean_std_diff     p
#> ─────────────────────────────────────────────
#> King           0.097         1.592 <.001 ***
#> Queen         -0.152        -1.766 <.001 ***
#> ─────────────────────────────────────────────
#> Permutation test: approximate p values (forced to two-sided)
#>
#> Overall effect (raw and standardized mean differences):
#> ─────────────────────────────────────────────────────────
#> Target     Attrib      mean_diff_raw eff_size     p
#> ─────────────────────────────────────────────────────────
#> King/Queen Male/Female         0.249    1.716 <.001 ***
#> ─────────────────────────────────────────────────────────
#> Permutation test: exact p value = 0.00e+00 (two-sided)
#>
sc_weat = test_WEAT(
  demodata,
  labels=list(T1="Occupation", A1="Male", A2="Female"),
  T1=cc("
    architect, boss, leader, engineer, CEO, officer, manager,
    lawyer, scientist, doctor, psychologist, investigator,
    consultant, programmer, teacher, clerk, counselor,
    salesperson, therapist, psychotherapist, nurse"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  seed=1)
sc_weat
#>
#> ── SC-WEAT (Single-Category Word Embedding Association Test) ───────────────────
#>
#> 21 Occupation (T1) valid words
#> 8 Male (A1) valid words
#> 8 Female (A2) valid words
#>
#> Relative semantic similarities (differences):
#> ──────────────────────────────────────────────────
#>                   cos_sim_diff std_diff closer_to
#> ──────────────────────────────────────────────────
#> "architect"              0.087    1.048 Male
#> "boss"                   0.081    1.027 Male
#> "leader"                 0.079    0.985 Male
#> "engineer"               0.071    0.775 Male
#> "CEO"                    0.045    0.637 Male
#> "officer"                0.033    0.448 Male
#> "manager"                0.022    0.342 Male
#> "lawyer"                 0.020    0.238 Male
#> "scientist"              0.008    0.170 Male
#> "doctor"                 0.013    0.169 Male
#> "psychologist"           0.001    0.030 Male
#> "investigator"           0.001    0.012 Male
#> "consultant"            -0.003   -0.055 Female
#> "programmer"            -0.006   -0.144 Female
#> "teacher"               -0.067   -0.800 Female
#> "clerk"                 -0.078   -1.035 Female
#> "counselor"             -0.064   -1.196 Female
#> "salesperson"           -0.083   -1.207 Female
#> "therapist"             -0.069   -1.245 Female
#> "psychotherapist"       -0.092   -1.323 Female
#> "nurse"                 -0.162   -1.491 Female
#> ──────────────────────────────────────────────────
#>
#> Overall effect (raw and standardized mean differences):
#> ─────────────────────────────────────────────────────────
#> Target     Attrib      mean_diff_raw eff_size    p
#> ─────────────────────────────────────────────────────────
#> Occupation Male/Female        -0.008   -0.124 .599
#> ─────────────────────────────────────────────────────────
#> Permutation test: approximate p value = 5.99e-01 (two-sided)
#>
if (FALSE) {
  ## the same as the first example, but using regular expression
  weat = test_WEAT(
    demodata,
    labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
    use.pattern=TRUE,  # use regular expression below
    T1="^[kK]ing$",
    T2="^[qQ]ueen$",
    A1="^male$|^man$|^boy$|^brother$|^he$|^him$|^his$|^son$",
    A2="^female$|^woman$|^girl$|^sister$|^she$|^her$|^hers$|^daughter$",
    seed=1)
  weat

  ## replicating Caliskan et al.'s (2017) results
  ## WEAT7 (Table 1): d = 1.06, p = .018
  ## (requiring installation of the `sweater` package)
  Caliskan.WEAT7 = test_WEAT(
    as_wordvec(sweater::glove_math),
    labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
    T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
    T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
    A1=cc("male, man, boy, brother, he, him, his, son"),
    A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
    p.side=1, seed=1234)
  Caliskan.WEAT7
  # d = 1.055, p = .0173 (= 173 counts / 10000 permutation samples)

  ## replicating Caliskan et al.'s (2017) supplemental results
  ## WEAT7 (Table S1): d = 0.97, p = .027
  Caliskan.WEAT7.supp = test_WEAT(
    demodata,
    labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
    T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
    T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
    A1=cc("male, man, boy, brother, he, him, his, son"),
    A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
    p.side=1, seed=1234)
  Caliskan.WEAT7.supp
  # d = 0.966, p = .0221 (= 221 counts / 10000 permutation samples)
}
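The King/Queen example reports an exact p value while the WEAT7 replications report counts out of 10000 permutation samples. A short base R sketch shows why this may be the case. It assumes that the number of possible rearrangements for WEAT equals the binomial coefficient choose(n1 + n2, n1); this counting rule is an assumption, since the page does not spell it out, and the helper name `n_rearrangements` is hypothetical.

```r
# Hypothetical helper: number of distinct ways to split n1 + n2 pooled
# target words into two groups of sizes n1 and n2.
n_rearrangements <- function(n1, n2) choose(n1 + n2, n1)

p.nsim <- 10000  # default number of resampling draws

n_king_queen <- n_rearrangements(2, 2)  # 6
n_weat7      <- n_rearrangements(8, 8)  # 12870

# p.nsim exceeds the number of rearrangements: all of them can be enumerated.
exact_king_queen <- p.nsim > n_king_queen  # TRUE
# p.nsim is smaller than the number of rearrangements: resampling is used.
exact_weat7 <- p.nsim > n_weat7            # FALSE
```

Under this assumption, raising p.nsim above 12870 in the WEAT7 calls would be expected to trigger the exact test instead of resampling.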