Note: This function is deprecated and will no longer be updated, because I have developed a new package, FMAT, as an integrative toolbox for the Fill-Mask Association Test (FMAT).

Predict the most probable token(s) for the masked position(s) in a sequence, based on the Python module transformers.

Usage

text_unmask(query, model, targets = NULL, topn = 5)

Arguments

query

A query (sentence/prompt) with masked token(s) [MASK]. Multiple queries are also supported. See examples.

model

Model name at HuggingFace. See text_model_download. If the model has not been downloaded, it will be downloaded automatically.

targets

Specific target word(s) to be filled in the blank [MASK]. Defaults to NULL (i.e., return topn). If specified, then topn will be ignored (see examples).

topn

Number of the most likely predictions to return. Defaults to 5. If targets is specified, topn is automatically set to the length of targets.

Value

A data.table of query results:

query_id (if there is more than one query)

query ID (indicating multiple queries)

mask_id (if there is more than one [MASK] in query)

[MASK] ID (position in sequence, indicating multiple masks)

prob

Probability of the predicted token in the sequence

token_id

Predicted token ID (to replace [MASK])

token

Predicted token (to replace [MASK])

sequence

Complete sentence with the predicted token

Details

Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful for obtaining a statistical understanding of the language on which the model was trained. See https://huggingface.co/tasks/fill-mask for details.
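
To make the idea of "predicting which words should replace those masks" concrete, here is a minimal base-R sketch of how a fill-mask probability can be derived. It is not the implementation of text_unmask: the toy vocabulary, the logit values, and the helper softmax() are all made up for illustration. The sketch assumes only that the model assigns a score (logit) to every vocabulary token at the masked position and that these scores are normalized into probabilities.

```r
# Hypothetical logits for a 6-word toy vocabulary at the [MASK] position
# (made-up numbers, not output from any real model)
logits <- c(capital = 4.0, city = 2.0, center = 1.0,
            heart = 0.5, king = -1.0, river = -2.0)

# Softmax: convert logits to probabilities that sum to 1
softmax <- function(x) {
  z <- exp(x - max(x))  # subtract the max for numerical stability
  z / sum(z)
}
prob <- softmax(logits)

# Analogue of `topn`: keep the n most probable tokens
topn <- 3
top <- sort(prob, decreasing = TRUE)[seq_len(topn)]

# Analogue of `targets`: look up only the specified tokens,
# regardless of their rank in the vocabulary
targets <- c("city", "river")
target_prob <- prob[targets]

print(round(top, 3))
print(round(target_prob, 3))
```

In this toy setting, the topn analogue returns the highest-ranked tokens, whereas the targets analogue returns probabilities for the requested tokens even when they rank low, which is why topn is irrelevant once targets is given.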

Examples

if (FALSE) {
# text_init()  # initialize the environment

model = "distilbert-base-cased"

text_unmask("Beijing is the [MASK] of China.", model)

# multiple [MASK]s:
text_unmask("Beijing is the [MASK] [MASK] of China.", model)

# multiple queries:
text_unmask(c("The man worked as a [MASK].",
              "The woman worked as a [MASK]."),
            model)

# specific targets:
text_unmask("The [MASK] worked as a nurse.", model,
            targets=c("man", "woman"))
}