
Word Embedding Research Framework for Psychological Science.

An integrated toolbox for word embedding research that provides:

  1. A collection of pre-trained static word vectors in the .RData compressed format;
  2. A series of functions to process, analyze, and visualize word vectors;
  3. A range of tests to examine conceptual associations, including the Word Embedding Association Test (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018), with permutation tests of significance;
  4. A set of training methods to locally train (static) word vectors from text corpora, including Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and FastText (Bojanowski et al., 2017);
  5. A group of functions to download pre-trained language models (e.g., GPT, BERT), extract contextualized (dynamic) word vectors (based on the R package text), and perform language analysis tasks (e.g., fill-in-the-blank mask prediction).
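
To make item 3 more concrete, the following is a minimal base R sketch of the Word Embedding Association Test effect size with a one-sided permutation test. It is not PsychWordVec's implementation: the helper names (cos_sim, assoc, weat_sketch), the random toy vectors, and details such as the use of sd() and a one-sided p value are assumptions made for illustration, and the package's own functions may differ.

```r
# Toy illustration of a WEAT-style effect size and permutation test (base R only).
# All vectors below are random placeholders, not real word embeddings.
set.seed(1)

# Cosine similarity between two numeric vectors
cos_sim <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))

# s(w, A, B): mean cosine similarity of w to attribute set A minus that to B
# (A and B are matrices with one attribute word per row)
assoc <- function(w, A, B) {
  mean(apply(A, 1, cos_sim, v = w)) - mean(apply(B, 1, cos_sim, v = w))
}

# Hypothetical helper: X and Y are target sets, A and B are attribute sets
weat_sketch <- function(X, Y, A, B, n_perm = 1000) {
  s <- c(apply(X, 1, assoc, A = A, B = B),
         apply(Y, 1, assoc, A = A, B = B))
  n_x <- nrow(X)
  is_x <- seq_along(s) <= n_x

  # Effect size: difference in mean association, scaled by the SD over all targets
  effect <- (mean(s[is_x]) - mean(s[!is_x])) / sd(s)

  # Permutation test: reassign target words to X and Y at random
  stat <- function(idx) sum(s[idx]) - sum(s[-idx])
  observed <- stat(which(is_x))
  permuted <- replicate(n_perm, stat(sample(length(s), n_x)))

  list(effect_size = effect, p_value = mean(permuted > observed))
}

# Random toy "embeddings": n words with d dimensions each
toy <- function(n, d = 8) matrix(rnorm(n * d), nrow = n)
res <- weat_sketch(toy(4), toy(4), toy(4), toy(4), n_perm = 500)
```

With real data, the rows of X, Y, A, and B would be the vectors of the target and attribute words taken from a loaded embedding matrix.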

⚠️ All users should update the package to version ≥ 0.3.0. Old versions (≤ 0.2.0) may run slowly, and some old functions have been deprecated.

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

Email: baohws@foxmail.com

Homepage: psychbruce.github.io

Citation

Installation

## Method 1: Install from CRAN
install.packages("PsychWordVec")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
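
Because versions ≤ 0.2.0 are discouraged, it may help to check the installed version after installation. The snippet below is a small sketch using base R only; needs_update() is a hypothetical helper written for this example, not a function provided by PsychWordVec.

```r
# Hypothetical helper (not part of PsychWordVec): returns TRUE if `pkg` is
# not installed or is older than `min_version`.
needs_update <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(TRUE)
  utils::packageVersion(pkg) < min_version
}

if (needs_update("PsychWordVec", "0.3.0")) {
  message("Install or update PsychWordVec, e.g. install.packages(\"PsychWordVec\")")
}
```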

Types of Data for PsychWordVec

Note: Word embedding is a natural language processing technique that maps word semantics into a low-dimensional embedding matrix, in which each word (strictly, each token) is represented as a numeric vector of (uninterpretable) semantic features. Users are advised to import word vector data as the embed class using load_embed(), which automatically normalizes all word vectors to unit length (see the normalize() function) and speeds up most functions in PsychWordVec.
                    embed                             wordvec
Basic class         matrix                            data.table
Row size            vocabulary size                   vocabulary size
Column size         dimension size                    2 (variables: word, vec)
Advantage           faster (with matrix operation)    easier to inspect and manage
Function to get     as_embed()                        as_wordvec()
Function to load    load_embed()                      load_wordvec()
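
The two layouts and the effect of unit-length normalization can be illustrated with a toy example in base R. This is only a sketch under stated assumptions: the three-dimensional vectors are made up, a plain matrix stands in for the embed class, and a base data.frame with a list column stands in for the data.table-based wordvec class. The package's own classes and its normalize() function may behave differently in detail.

```r
# Toy "embed"-like matrix: one row per word, one column per dimension.
# The numbers are invented for illustration.
emb <- rbind(king = c(3, 4, 0), queen = c(1, 2, 2), apple = c(0, 0, 5))

# Scale every row to unit length (L2 norm = 1).
unit <- emb / sqrt(rowSums(emb^2))

# For unit vectors, cosine similarity equals the dot product, so all
# pairwise similarities come from a single matrix multiplication.
cos_all <- tcrossprod(unit)

# "wordvec"-like layout: two columns, `word` and a list column `vec`.
# (A base R data.frame stands in for data.table here.)
wv <- data.frame(word = rownames(unit))
wv$vec <- lapply(seq_len(nrow(unit)), function(i) unname(unit[i, ]))
```

The matrix form makes whole-vocabulary operations a matter of linear algebra, which is in line with the "faster" advantage listed for embed, while the two-column form keeps each word next to its vector, which is easier to inspect.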

Functions in PsychWordVec

See the documentation (help pages) for their usage and details.