Function reference
-
as_embed()
as_wordvec()
`[`(<embed>)
pattern()
- Word vectors data class: wordvec and embed.
-
cosine_similarity()
cos_sim()
cos_dist()
- Cosine similarity/distance between two vectors.
-
data_transform()
- Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.
-
data_wordvec_load()
load_wordvec()
load_embed()
- Load word vectors data (wordvec or embed) from an ".RData" file.
-
data_wordvec_subset()
subset(<wordvec>)
subset(<embed>)
- Extract a subset of word vectors data (with S3 methods).
-
demodata
- Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).
-
dict_expand()
- Expand a dictionary from the most similar words.
-
dict_reliability()
- Reliability analysis and PCA of a dictionary.
-
get_wordvec()
- Extract word vector(s).
-
most_similar()
- Find the Top-N most similar words.
-
normalize()
- Normalize all word vectors to unit length (a norm of 1).
-
orth_procrustes()
- Orthogonal Procrustes rotation for matrix alignment.
-
pair_similarity()
- Compute a matrix of cosine similarity/distance of word pairs.
-
plot_network()
- Visualize a (partial correlation) network graph of words.
-
plot_similarity()
- Visualize cosine similarity of word pairs.
-
plot_wordvec()
- Visualize word vectors.
-
plot_wordvec_tSNE()
- Visualize word vectors with dimensionality reduced using t-SNE.
-
sum_wordvec()
- Calculate the sum vector of multiple words.
-
tab_similarity()
- Tabulate cosine similarity/distance of word pairs.
-
test_RND()
- Relative Norm Distance (RND) analysis.
-
test_WEAT()
- Word Embedding Association Test (WEAT) and Single-Category WEAT.
-
text_init()
- Install required Python modules in a new conda environment and initialize the environment, necessary for all text_* functions designed for contextualized word embeddings.
-
text_model_download()
- Download pre-trained language models from HuggingFace.
-
text_model_remove()
- Remove downloaded models from the local .cache folder.
-
text_to_vec()
- Extract contextualized word embeddings from transformers (pre-trained language models).
-
text_unmask()
- <Deprecated> Fill in the blank mask(s) in a query (sentence).
-
tokenize()
- Tokenize raw text for training word embeddings.
-
train_wordvec()
- Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm.
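
Several entries above (cosine_similarity(), cos_sim(), cos_dist(), normalize()) rest on the same arithmetic, so a small base-R sketch may help when reading the index. It does not call the package. The helper names (cos_sim_sketch, cos_dist_sketch, normalize_sketch) and the toy three-dimensional vectors are hypothetical, and it assumes the conventional definitions: cosine distance as 1 minus cosine similarity, and normalization as scaling each row to Euclidean length 1.

```r
# Base-R sketch only: hypothetical helpers, not the package's own functions.

# Cosine similarity: dot product divided by the product of Euclidean norms.
cos_sim_sketch <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Cosine distance, assuming the conventional definition 1 - similarity.
cos_dist_sketch <- function(x, y) {
  1 - cos_sim_sketch(x, y)
}

# Scale every row of a numeric matrix to Euclidean length 1.
# Dividing a matrix by a vector of length nrow(m) recycles down columns,
# so each row i is divided by its own norm.
normalize_sketch <- function(m) {
  m / sqrt(rowSums(m^2))
}

# Toy "embedding" matrix: one word per row (made-up values).
emb <- rbind(
  king  = c(1, 2, 2),
  queen = c(2, 1, 2),
  apple = c(-2, 0, 1)
)

sim_kq  <- cos_sim_sketch(emb["king", ], emb["queen", ])
dist_ka <- cos_dist_sketch(emb["king", ], emb["apple", ])

# After normalization, a plain dot product equals the cosine similarity.
emb_unit <- normalize_sketch(emb)
dot_kq   <- sum(emb_unit["king", ] * emb_unit["queen", ])
```

Once the rows have unit length, a full similarity matrix reduces to a single matrix product (here, `emb_unit %*% t(emb_unit)`), which is one reason normalizing up front can be convenient before pairwise comparisons.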