All functions

as_embed() as_wordvec() `[`(<embed>) pattern()
Word vector data classes: wordvec and embed.
cosine_similarity() cos_sim() cos_dist()
Cosine similarity/distance between two vectors.
data_transform()
Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.
data_wordvec_load() load_wordvec() load_embed()
Load word vectors data (wordvec or embed) from an ".RData" file.
data_wordvec_subset() subset(<wordvec>) subset(<embed>)
Extract a subset of word vectors data (with S3 methods).
demodata
Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).
dict_expand()
Expand a dictionary from the most similar words.
dict_reliability()
Reliability analysis and PCA of a dictionary.
get_wordvec()
Extract word vector(s).
most_similar()
Find the Top-N most similar words.
normalize()
Normalize all word vectors to unit length (norm = 1).
orth_procrustes()
Orthogonal Procrustes rotation for matrix alignment.
pair_similarity()
Compute a matrix of cosine similarity/distance of word pairs.
plot_network()
Visualize a (partial correlation) network graph of words.
plot_similarity()
Visualize cosine similarity of word pairs.
plot_wordvec()
Visualize word vectors.
plot_wordvec_tSNE()
Visualize word vectors with dimensionality reduced using t-SNE.
sum_wordvec()
Calculate the sum vector of multiple words.
tab_similarity()
Tabulate cosine similarity/distance of word pairs.
test_RND()
Relative Norm Distance (RND) analysis.
test_WEAT()
Word Embedding Association Test (WEAT) and Single-Category WEAT.
text_init()
Install required Python modules in a new conda environment and initialize the environment, necessary for all text_* functions designed for contextualized word embeddings.
text_model_download()
Download pre-trained language models from HuggingFace.
text_model_remove()
Remove downloaded models from the local .cache folder.
text_to_vec()
Extract contextualized word embeddings from transformers (pre-trained language models).
text_unmask()
<Deprecated> Fill in the blank mask(s) in a query (sentence).
tokenize()
Tokenize raw text for training word embeddings.
train_wordvec()
Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm.
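
Several entries above (cosine_similarity(), normalize(), most_similar()) rest on the same small piece of vector arithmetic. The following is a minimal Python sketch of that arithmetic, not the package's implementation and not a call into it. The function names mirror the index entries for readability only. The toy two-dimensional vectors are invented for illustration, and cosine distance is assumed to be defined as 1 minus cosine similarity.

```python
import math


def cos_sim(v1, v2):
    """Cosine similarity: dot(v1, v2) / (|v1| * |v2|)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


def cos_dist(v1, v2):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    return 1.0 - cos_sim(v1, v2)


def normalize(v):
    """Scale a vector so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def most_similar(vectors, word, topn=10):
    """Return up to topn (word, similarity) pairs, most similar first."""
    target = vectors[word]
    scored = [
        (other, cos_sim(target, vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vocabulary; real embeddings have hundreds of dimensions.
toy = {
    "king": [1.0, 0.0],
    "queen": [0.8, 0.6],
    "apple": [0.0, 1.0],
}

if __name__ == "__main__":
    print(cos_sim(toy["king"], toy["queen"]))
    print(cos_dist(toy["king"], toy["apple"]))
    print(normalize([3.0, 4.0]))
    print(most_similar(toy, "king", topn=2))
```

Once every vector has been normalized to unit length, cosine similarity reduces to a plain dot product. This is one reason to normalize before running many similarity queries.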