
😷 The Fill-Mask Association Test (掩码填空联系测验)

The Fill-Mask Association Test (FMAT) is an integrative, versatile, and probability-based method that uses Masked Language Models (BERT) to measure conceptual associations (e.g., attitudes, biases, stereotypes) as propositional representations in natural language.

The Python (conda) environment and the "transformers" module can be installed automatically by the FMAT_load() function, but users must also select the Python interpreter of this conda environment in RStudio afterwards (or from code, as sketched after these steps):

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose "…/textrpp_condaenv/python.exe"
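
If you prefer to set the interpreter from code rather than through the RStudio menus, the same selection can be sketched with the reticulate package. The environment name "textrpp_condaenv" below is only inferred from the path above, so treat it as an assumption and adjust it to your system:

## Optional sketch: select the conda environment from code instead of
## the RStudio menus (environment name inferred from the path above).
library(reticulate)
use_condaenv("textrpp_condaenv", required = TRUE)
py_config()  # check which Python interpreter is now active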

FMAT Workflow (Bao, 2023)

A full list of BERT-family models is available at Hugging Face. Use the FMAT_load() function to download and load specific BERT models. All downloaded model files are saved in your local folder "C:/Users/[YourUserName]/.cache/".
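
For example, a minimal sketch of downloading and loading two of the models listed below (the first call downloads the files; later calls load them from the local cache):

library(FMAT)

## Download (first time) and load two BERT models; the model files are
## cached in the local folder noted above and reused on later calls.
models = FMAT_load(c("bert-base-uncased", "bert-base-cased"))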

Several necessary pre-processing steps are built into the functions for easier and more direct use (see FMAT_run() for details and the sketch after the list below).

  • For those BERT variants using <mask> rather than [MASK] as the mask token, the input query will be automatically modified so that users can always use [MASK] in query design.
  • For some BERT variants, special prefix characters such as \u0120 and \u2581 will be automatically added so that whole words (rather than subwords) are matched for [MASK].
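
The sketch below illustrates this: queries are always written with [MASK], even for RoBERTa-family models that internally use <mask>. The FMAT_query() call and its MASK argument follow the package help pages, but treat the exact syntax as an assumption and check ?FMAT_query and ?FMAT_run for your installed version.

library(FMAT)

## Queries always use [MASK]; FMAT_run() rewrites it as <mask> for
## models such as roberta-base and adds whole-word prefix characters
## where needed, so the same query works across models.
models = FMAT_load(c("bert-base-uncased", "roberta-base"))
query = FMAT_query(
  "[MASK] is a nurse.",
  MASK = .(Male = "He", Female = "She")
)
data = FMAT_run(models, query)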

Improvements are still needed. If you find bugs or have problems using the functions, please report them at GitHub Issues or send me an email.

Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

📬 baohws@foxmail.com

📋 psychbruce.github.io

Citation

  • Bao, H.-W.-S. (2023). FMAT: The Fill-Mask Association Test (Version 2023.8) [Computer software]. https://CRAN.R-project.org/package=FMAT
  • Bao, H.-W.-S. (2023). The Fill-Mask Association Test (FMAT): Using AI language models to better understand society and culture [Manuscript submitted for publication].

Installation

## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force=TRUE)

BERT Models

The reliability and validity of the following 12 representative BERT models have been established in my research articles, but future work is needed to examine the performance of other models.

(model name on Hugging Face - downloaded file size)

  1. bert-base-uncased (420MB)
  2. bert-base-cased (416MB)
  3. bert-large-uncased (1.25GB)
  4. bert-large-cased (1.25GB)
  5. distilbert-base-uncased (256MB)
  6. distilbert-base-cased (251MB)
  7. albert-base-v1 (45.2MB)
  8. albert-base-v2 (45.2MB)
  9. roberta-base (478MB)
  10. distilroberta-base (316MB)
  11. vinai/bertweet-base (517MB)
  12. vinai/bertweet-large (1.32GB)

If you are new to BERT, please first read an introductory overview of BERT and masked language modeling before using the FMAT.

While the FMAT is an innovative method for the intelligent computational analysis of psychology and society, you may also need an integrative toolbox for other text-analytic methods. Another R package I developed, PsychWordVec, is a useful and user-friendly package for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.