
# PsychWordVec: Word Embedding Research Framework for Psychological Science

An integrated toolbox for word embedding research that provides:

  1. A collection of pre-trained static word vectors in the .RData compressed format;
  2. A series of functions to process, analyze, and visualize word vectors;
  3. A range of tests to examine conceptual associations, including the Word Embedding Association Test (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018), with permutation tests of significance;
  4. A set of training methods to locally train (static) word vectors from text corpora, including Word2Vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and FastText (Bojanowski et al., 2017);
  5. A group of functions to download pre-trained language models (e.g., GPT, BERT), extract contextualized (dynamic) word vectors (based on the R package text), and perform language analysis tasks (e.g., fill in the blank masks).
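As an illustration of point 3, here is a minimal sketch of a WEAT analysis. The function name `test_WEAT()` and its argument names, as well as the data file name, are assumptions for illustration; check the package help pages for the exact signature:

```r
library(PsychWordVec)

## Load pre-trained word vectors as an `embed` object
## (file name is hypothetical; see load_embed() documentation)
d = load_embed("glove_wiki_50d.RData")

## Word Embedding Association Test (Caliskan et al., 2017),
## comparing two target word sets against two attribute word sets,
## with a permutation test of significance
## (function and argument names are assumptions; see ?test_WEAT)
test_WEAT(
  d,
  T1 = c("rose", "tulip", "daisy"),    # target words, set 1
  T2 = c("ant", "wasp", "moth"),       # target words, set 2
  A1 = c("love", "peace", "happy"),    # attribute words, set 1
  A2 = c("hate", "war", "awful"),      # attribute words, set 2
  seed = 1                             # for a reproducible permutation test
)
```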

⚠️ All users should update the package to version ≥ 0.3.2. Older versions may be slow and have other problems.


Han-Wu-Shuang (Bruce) Bao 包寒吴霜




  • Bao, H.-W.-S. (2023). PsychWordVec: Word embedding research framework for psychological science. R package version 0.3.x.
  • Bao, H.-W.-S., Wang, Z.-X., Cheng, X., Su, Z., Yang, Y., Zhang, G.-Y., Wang, B., & Cai, H. (2023). Using word embeddings to investigate human psychology: Methods and applications. Advances in Psychological Science, 31(6), 887–904.
    [包寒吴霜, 王梓西, 程曦, 苏展, 杨盈, 张光耀, 王博, 蔡华俭. (2023). 基于词嵌入技术的心理学研究:方法及应用. 心理科学进展, 31(6), 887–904.]


## Method 1: Install from CRAN
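Install the released version from CRAN:

```r
install.packages("PsychWordVec")
```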

## Method 2: Install from GitHub
```r
devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
```

## Types of Data for PsychWordVec

Note: Word embedding is a natural language processing technique that embeds word semantics into a low-dimensional embedding matrix, with each word (more precisely, each token) quantified as a numeric vector representing its (uninterpretable) semantic features. Users are advised to import word vector data as the `embed` class using the function `load_embed()`, which automatically normalizes all word vectors to unit length 1 (see the `normalize()` function) and accelerates most functions in PsychWordVec.
|                  | `embed`                        | `wordvec`                    |
| ---------------- | ------------------------------ | ---------------------------- |
| Basic class      | matrix                         | data.table                   |
| Row size         | vocabulary size                | vocabulary size              |
| Column size      | dimension size                 | 2 (variables: word, vec)     |
| Advantage        | faster (with matrix operation) | easier to inspect and manage |
| Function to get  | `as_embed()`                   | `as_wordvec()`               |
| Function to load | `load_embed()`                 | `load_wordvec()`             |
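A short sketch of moving between the two classes, using the loading and conversion functions named in the table (the file path is hypothetical):

```r
library(PsychWordVec)

## load_embed() imports the data and normalizes all vectors to unit length
d = load_embed("path/to/wordvec.RData")  # hypothetical file path

## convert to the data.table-based `wordvec` class,
## which is easier to inspect and manage
dt = as_wordvec(d)

## convert back to the matrix-based `embed` class,
## which is faster for matrix operations
m = as_embed(dt)
```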

## Functions in PsychWordVec

See the documentation (help pages) for their usage and details.