<Deprecated> Fill in the blank mask(s) in a query (sentence).

Note: This function is deprecated and will no longer be updated, because I have developed a new package, FMAT, as an integrative toolbox for the Fill-Mask Association Test (FMAT).

Predict the most probable token(s) for the masked position(s) in a sequence, using the Python module transformers.

Usage

text_unmask(query, model, targets = NULL, topn = 5)

Arguments

query: A query (sentence/prompt) with masked token(s) [MASK]. Multiple queries are also supported. See examples.
model: Model name at HuggingFace. See text_model_download. If the model has not been downloaded yet, it will be downloaded automatically.
targets: Specific target word(s) to fill in the blank [MASK]. Defaults to NULL (i.e., return the topn most likely tokens). If specified, topn is ignored (see examples).
topn: Number of the most likely predictions to return. Defaults to 5. If targets is specified, topn is automatically set to the length of targets.

Value

A data.table of query results:

query_id (if there is more than one query): query ID (indicating multiple queries)
mask_id (if there is more than one [MASK] in query): [MASK] ID (position in sequence, indicating multiple masks)
prob: Probability of the predicted token in the sequence
token_id: Predicted token ID (to replace [MASK])
token: Predicted token (to replace [MASK])
sequence: Complete sentence with the predicted token
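
A result table with these columns can be summarized with ordinary R tools. The sketch below is only an illustration in base R: it builds a mock table with a subset of the columns listed above (the tokens and probabilities are made-up placeholders, not output of text_unmask) and keeps the most probable token for each query.

```r
# Mock result with a subset of the documented columns.
# All values are hypothetical placeholders, not real model output.
res <- data.frame(
  query_id = c(1L, 1L, 2L, 2L),
  prob     = c(0.10, 0.05, 0.20, 0.02),
  token    = c("token_a", "token_b", "token_c", "token_d"),
  stringsAsFactors = FALSE
)

# Keep the row with the highest probability within each query.
best <- do.call(
  rbind,
  lapply(split(res, res$query_id), function(d) d[which.max(d$prob), ])
)
rownames(best) <- NULL
print(best)
```

Because the real return value is a data.table, the same grouping could presumably also be written with data.table syntax; base R is used here only to keep the sketch free of dependencies.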

Details

Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want a statistical understanding of the language on which the model was trained. See https://huggingface.co/tasks/fill-mask for details.
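
As a rough illustration of the prediction step, the sketch below turns raw model scores (logits) at one [MASK] position into probabilities with a softmax, then either keeps the topn most likely tokens or looks up specific targets. It is a minimal base-R sketch under stated assumptions, not the implementation of text_unmask: the vocabulary, the logit values, and the helper names softmax and rank_mask are all hypothetical.

```r
# Toy vocabulary and made-up logits for a single [MASK] position.
# A real model scores its whole vocabulary; these values are hypothetical.
vocab  <- c("capital", "city", "heart", "center", "banana")
logits <- c(5.0, 3.0, 2.0, 2.5, -1.0)

# Softmax: convert logits into probabilities that sum to 1.
softmax <- function(x) {
  z <- exp(x - max(x))  # subtract the maximum for numerical stability
  z / sum(z)
}

# Hypothetical helper: rank candidate tokens for one mask.
# If `targets` is given, `topn` is ignored and only the targets are returned.
rank_mask <- function(vocab, logits, targets = NULL, topn = 5) {
  prob <- softmax(logits)
  res <- data.frame(token = vocab, prob = prob, stringsAsFactors = FALSE)
  if (!is.null(targets)) {
    res <- res[match(targets, res$token), ]
  } else {
    res <- head(res[order(-res$prob), ], topn)
  }
  rownames(res) <- NULL
  res
}

top3 <- rank_mask(vocab, logits, topn = 3)
tgt  <- rank_mask(vocab, logits, targets = c("city", "banana"))
print(top3)
print(tgt)
```

In this sketch the target probabilities are read from the full distribution over the toy vocabulary and are not renormalized across the targets.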

Examples

if (FALSE) {
# text_init()  # initialize the environment

model = "distilbert-base-cased"

text_unmask("Beijing is the [MASK] of China.", model)

# multiple [MASK]s:
text_unmask("Beijing is the [MASK] [MASK] of China.", model)

# multiple queries:
text_unmask(c("The man worked as a [MASK].",
              "The woman worked as a [MASK]."),
            model)

# specific targets:
text_unmask("The [MASK] worked as a nurse.", model,
            targets=c("man", "woman"))
}