Compute a matrix of cosine similarity/distance of word pairs.
Source: R/01-basic.R
Usage
pair_similarity(
  data,
  words = NULL,
  pattern = NULL,
  words1 = NULL,
  words2 = NULL,
  distance = FALSE
)
Arguments
- data
  A wordvec (data.table) or embed (matrix), see data_wordvec_load.
- words
  [Option 1] Character string(s).
- pattern
  [Option 2] Regular expression (see str_subset). If neither words nor pattern is specified (i.e., both are NULL), then all words in the data will be extracted.
- words1, words2
  [Option 3] Two sets of words for only n1 * n2 word pairs. See examples.
- distance
  Compute cosine distance instead? Defaults to FALSE (cosine similarity).
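As a rough sketch of what a single cell of the returned matrix represents, the base R snippet below computes cosine similarity for two toy vectors. The helper name cos_sim_sketch and the vectors are hypothetical, and treating cosine distance as 1 - similarity is an assumption (the common convention), not something this page states about the package internals.

```r
# Hypothetical helper, not part of the package:
# cosine similarity of two numeric vectors of equal length.
cos_sim_sketch <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy vectors (made-up values, not real word embeddings).
a <- c(1, 0, 1)
b <- c(1, 1, 0)

sim <- cos_sim_sketch(a, b)  # dot product is 1, both norms are sqrt(2)
dst <- 1 - sim               # assumed definition of cosine distance
```

Under this assumption, a word compared with itself has similarity 1 and distance 0, which matches the diagonal of the similarity matrices in the examples.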
Download
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
Examples
pair_similarity(demodata, c("China", "Chinese"))
#>             China   Chinese
#> China   1.0000000 0.7678081
#> Chinese 0.7678081 1.0000000
pair_similarity(demodata, pattern="^Chi")
#> 4 words matched...
#>             China    Chicago    Chinese      Chile
#> China   1.0000000 0.13040186 0.76780811 0.38012317
#> Chicago 0.1304019 1.00000000 0.09174141 0.08685822
#> Chinese 0.7678081 0.09174141 1.00000000 0.21538189
#> Chile   0.3801232 0.08685822 0.21538189 1.00000000
pair_similarity(demodata,
                words1=c("China", "Chinese"),
                words2=c("Japan", "Japanese"))
#>             Japan Japanese
#> China   0.5967756 0.413391
#> Chinese 0.4226447 0.642242
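
The n1 * n2 layout of the words1/words2 result can be sketched in base R as a product of row-normalized matrices. The toy embedding matrix emb and the helper cross_cos_sketch are hypothetical illustrations with made-up values; this is not claimed to be how pair_similarity is implemented.

```r
# Toy embedding matrix (hypothetical values): one row per word.
emb <- rbind(
  China    = c(1, 0, 0),
  Chinese  = c(1, 1, 0),
  Japan    = c(0, 1, 0),
  Japanese = c(0, 1, 1)
)

# Hypothetical helper: cosine similarity between every word in w1
# (rows of the result) and every word in w2 (columns of the result).
cross_cos_sketch <- function(m, w1, w2) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to unit length
  unit[w1, , drop = FALSE] %*% t(unit[w2, , drop = FALSE])
}

res <- cross_cos_sketch(emb,
                        c("China", "Chinese"),
                        c("Japan", "Japanese"))
dim(res)  # rows follow words1, columns follow words2
```

Because only the requested rows and columns are multiplied, the result has one row per word in words1 and one column per word in words2, rather than the full square matrix over all words.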