Load word vectors data (wordvec or embed) from ".RData" file.
Source: R/01-basic.R
data_wordvec_load.Rd
Usage
data_wordvec_load(
  file,
  as = c("wordvec", "embed"),
  normalize = FALSE,
  verbose = TRUE
)
load_wordvec(file, normalize = TRUE)
load_embed(file, normalize = TRUE)
Arguments
- file
File name of .RData transformed by data_transform. Can also be an .RData file containing an embedding matrix with words as row names.
- as
Load as wordvec (data.table) or embed (matrix). Defaults to the original class of the R object in file. The two wrapper functions load_wordvec and load_embed automatically reshape the data to the corresponding class and normalize all word vectors (for faster future use).
- normalize
Normalize all word vectors to unit length? Defaults to FALSE. See normalize.
- verbose
Print information to the console? Defaults to TRUE.
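To make the normalize argument concrete, here is a minimal base-R sketch of what scaling word vectors to unit length amounts to. It assumes that "unit length" means Euclidean (L2) length; the toy matrix, its words, and its values are made up for illustration and are not the package's internal code.

```r
# Toy embedding matrix: 3 words x 4 dimensions (illustrative values only)
embed <- matrix(
  c(1, 2, 2, 0,
    0, 3, 4, 0,
    2, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("apple", "banana", "cherry"), paste0("dim", 1:4))
)

# Euclidean (L2) norm of each row, i.e., of each word vector
row_norms <- sqrt(rowSums(embed^2))

# Divide each row by its own norm: the length-3 vector is recycled
# down the columns, so element [i, j] is divided by row_norms[i]
embed_unit <- embed / row_norms

# Every word vector now has length 1 (up to floating-point error)
unit_lengths <- sqrt(rowSums(embed_unit^2))
print(unit_lengths)
```

With unit-length rows, the cosine similarity between two words reduces to a plain dot product, which is presumably the "faster future use" mentioned under the as argument.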
Download
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
Examples
d = demodata[1:200]
save(d, file="demo.RData")
d = load_wordvec("demo.RData")
#> Loading...
#> ✔ Word vectors data: 200 vocab, 300 dims (time cost = 0.005 secs)
#> ✔ All word vectors have been normalized to unit length 1.
d
#> # wordvec (data.table): [200 × 2] (normalized)
#>        word                      vec
#>   1:     in [ 0.0530, ...<300 dims>]
#>   2:    for [-0.0085, ...<300 dims>]
#>   3:   that [-0.0124, ...<300 dims>]
#>   4:     is [ 0.0037, ...<300 dims>]
#>   5:     on [ 0.0167, ...<300 dims>]
#> ----
#> 196: really [ 0.0440, ...<300 dims>]
#> 197:  found [-0.0144, ...<300 dims>]
#> 198:   used [ 0.0964, ...<300 dims>]
#> 199:    lot [ 0.0678, ...<300 dims>]
#> 200:  money [ 0.0644, ...<300 dims>]
d = load_embed("demo.RData")
#> Loading...
#> ✔ Word vectors data: 200 vocab, 300 dims (time cost = 0.005 secs)
#> ✔ All word vectors have been normalized to unit length 1.
d
#> # embed (matrix): [200 × 300] (normalized)
#>                dim1 ...     dim300
#>   1: in      0.0530 ... <300 dims>
#>   2: for    -0.0085 ... <300 dims>
#>   3: that   -0.0124 ... <300 dims>
#>   4: is      0.0037 ... <300 dims>
#>   5: on      0.0167 ... <300 dims>
#> ----
#> 196: really  0.0440 ... <300 dims>
#> 197: found  -0.0144 ... <300 dims>
#> 198: used    0.0964 ... <300 dims>
#> 199: lot     0.0678 ... <300 dims>
#> 200: money   0.0644 ... <300 dims>
unlink("demo.RData") # delete file for code check
if (FALSE) {
  # please first manually download the .RData file
  # (see https://psychbruce.github.io/WordVector_RData.pdf)
  # or transform plain text data by using `data_transform()`
  # the RData file must be on your disk
  # the following code cannot run unless you have the file
  library(bruceR)
  set.wd()
  d = load_embed("../data-raw/GloVe/glove_wiki_50d.RData")
  d
}
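If no pre-trained file is at hand, the other input form described under the file argument (an .RData file containing an embedding matrix with words as row names) can be tried out with base R alone. The sketch below is an assumption-laden illustration: the object name toy_embed, the file name toy_embed.RData, and the random values are hypothetical, and the final load_embed call is left commented out because it needs the package attached.

```r
# Toy embedding matrix with words as row names
# (random values, not real embeddings)
set.seed(1)
toy_embed <- matrix(
  rnorm(3 * 5), nrow = 3,
  dimnames = list(c("king", "queen", "apple"), NULL)
)

# Save the single object to an .RData file in a temporary directory
rdata_file <- file.path(tempdir(), "toy_embed.RData")
save(toy_embed, file = rdata_file)

# Inspect the file with base R: load into a fresh environment so that
# nothing in the global workspace gets overwritten
env <- new.env()
obj_names <- load(rdata_file, envir = env)
loaded <- env[[obj_names[1]]]
print(is.matrix(loaded))
print(rownames(loaded))

# With the package attached, the file could then be passed on, e.g.:
# d <- load_embed(rdata_file)

unlink(rdata_file)  # clean up the temporary file
```

Loading into a separate environment is a design choice for inspection only; it shows which object the file holds without touching existing variables.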