Load word vector data (wordvec or embed) from an ".RData" file.

Usage

data_wordvec_load(
  file,
  as = c("wordvec", "embed"),
  normalize = FALSE,
  verbose = TRUE
)

load_wordvec(file, normalize = TRUE)

load_embed(file, normalize = TRUE)

Arguments

file

Name of an .RData file produced by data_transform. It can also be an .RData file containing an embedding matrix with words as row names.

as

Load as wordvec (data.table) or embed (matrix). Defaults to the original class of the R object in file. The two wrapper functions load_wordvec and load_embed automatically reshape the data to the corresponding class and normalize all word vectors (for faster future use).

normalize

Normalize all word vectors to unit length? Defaults to FALSE for data_wordvec_load and to TRUE for the wrappers load_wordvec and load_embed. See normalize.

verbose

Print information to the console? Defaults to TRUE.
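
As a rough illustration of what normalize = TRUE implies, the base R sketch below scales each row of a small matrix to unit length. It assumes Euclidean (L2) length; the matrix `m`, its words, and its values are made up for illustration, and this is not the package's own implementation.

```r
# Toy embedding matrix: 3 hypothetical words, 4 dimensions (values are made up)
m = matrix(
  c(3, 4, 0, 0,
    1, 1, 1, 1,
    0, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("apple", "banana", "cherry"), paste0("dim", 1:4))
)

# Euclidean (L2) length of each row vector
row_len = sqrt(rowSums(m^2))

# Divide every row by its own length
# (row_len is recycled down the columns, so element i scales row i)
m_unit = m / row_len

# After scaling, every row should have length 1
sqrt(rowSums(m_unit^2))
```

Once vectors have unit length, the cosine similarity of two words reduces to a plain dot product, which is one plausible reading of the "faster future use" remark above.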

Value

A wordvec (data.table) or embed (matrix).

Download

Download pre-trained word vector data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
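
If no pre-trained file is at hand, the second form accepted by the file argument (an embedding matrix with words as row names) can be sketched with base R alone. The words, the random values, and the file name `toy_embed.RData` below are hypothetical; the call to load_embed is left as a comment because it needs the package, and whether the loader expects anything beyond row names is not confirmed here.

```r
# Hypothetical toy data: 3 words, 5 dimensions of random values
set.seed(1)
words = c("apple", "banana", "cherry")
emb = matrix(rnorm(length(words) * 5), nrow = length(words),
             dimnames = list(words, NULL))

# Save the matrix as the only object in an .RData file
path = file.path(tempdir(), "toy_embed.RData")
save(emb, file = path)

# Reload into a fresh environment to check what the file contains
env = new.env()
obj_names = load(path, envir = env)
loaded = get(obj_names[1], envir = env)
identical(rownames(loaded), words)

# With the package attached, the file could then be read with, e.g.:
# d = load_embed(path)

unlink(path)  # clean up the temporary file
```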

Examples

library(PsychWordVec)  # provides demodata, load_wordvec(), load_embed()

d = demodata[1:200]
save(d, file="demo.RData")
d = load_wordvec("demo.RData")
#> Loading...

#>  Word vectors data: 200 vocab, 300 dims (time cost = 0.005 secs)
#>  All word vectors have been normalized to unit length 1.
d
#> # wordvec (data.table): [200 × 2] (normalized)
#>        word                      vec
#>   1:     in [ 0.0530, ...<300 dims>]
#>   2:    for [-0.0085, ...<300 dims>]
#>   3:   that [-0.0124, ...<300 dims>]
#>   4:     is [ 0.0037, ...<300 dims>]
#>   5:     on [ 0.0167, ...<300 dims>]
#> ----                                
#> 196: really [ 0.0440, ...<300 dims>]
#> 197:  found [-0.0144, ...<300 dims>]
#> 198:   used [ 0.0964, ...<300 dims>]
#> 199:    lot [ 0.0678, ...<300 dims>]
#> 200:  money [ 0.0644, ...<300 dims>]
d = load_embed("demo.RData")
#> Loading...

#>  Word vectors data: 200 vocab, 300 dims (time cost = 0.005 secs)
#>  All word vectors have been normalized to unit length 1.
d
#> # embed (matrix): [200 × 300] (normalized)
#>                dim1 ...     dim300
#>   1: in      0.0530 ... <300 dims>
#>   2: for    -0.0085 ... <300 dims>
#>   3: that   -0.0124 ... <300 dims>
#>   4: is      0.0037 ... <300 dims>
#>   5: on      0.0167 ... <300 dims>
#> ----                              
#> 196: really  0.0440 ... <300 dims>
#> 197: found  -0.0144 ... <300 dims>
#> 198: used    0.0964 ... <300 dims>
#> 199: lot     0.0678 ... <300 dims>
#> 200: money   0.0644 ... <300 dims>
unlink("demo.RData")  # delete file for code check

if (FALSE) {
# please first manually download the .RData file
# (see https://psychbruce.github.io/WordVector_RData.pdf)
# or transform plain text data by using `data_transform()`

# the RData file must be on your disk
# the following code cannot run unless you have the file
library(bruceR)
set.wd()
d = load_embed("../data-raw/GloVe/glove_wiki_50d.RData")
d
}