Note: This function has been deprecated and will no longer be updated, since I have developed the new package FMAT as the integrative toolbox of the Fill-Mask Association Test (FMAT).
Predict the most likely masked token(s) in a sequence, based on the Python module transformers.
Arguments
- query: A query (sentence/prompt) with masked token(s) [MASK]. Multiple queries are also supported (see examples).
- model: Model name at HuggingFace. See text_model_download. If the model has not been downloaded yet, it will be downloaded automatically.
- targets: Specific target word(s) to fill in the blank [MASK]. Defaults to NULL (i.e., return the topn most likely tokens). If targets is specified, then topn will be ignored (see examples).
- topn: Number of the most likely predictions to return. Defaults to 5. If targets is specified, then topn will automatically change to the length of targets.
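For instance, a minimal sketch of requesting more candidates via topn (the model name is only illustrative, taken from the examples below):

text_unmask("Beijing is the [MASK] of China.",
            "distilbert-base-cased", topn = 10)  # return 10 candidates instead of the default 5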
Value
A data.table of query results:
- query_id: query ID, indicating multiple queries (present only if there is more than one query)
- mask_id: [MASK] ID (position in sequence, indicating multiple masks; present only if there is more than one [MASK] in query)
- prob: probability of the predicted token in the sequence
- token_id: predicted token ID (to replace [MASK])
- token: predicted token (to replace [MASK])
- sequence: complete sentence with the predicted token
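As a sketch of working with these columns (a hypothetical post-processing example; the query and model name are only illustrative):

library(data.table)
res = text_unmask("The [MASK] worked as a nurse.", "distilbert-base-cased")
res[order(-prob)]       # sort candidate tokens by probability
res[, .(token, prob)]   # keep only the tokens and their probabilities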
Details
Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language on which the model was trained. See https://huggingface.co/tasks/fill-mask for details.
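Under the hood, this task corresponds to the transformers "fill-mask" pipeline. A minimal sketch of the equivalent Python-level call from R via reticulate (assuming transformers is installed in the active Python environment; this is an illustration, not necessarily the exact code path of text_unmask):

library(reticulate)
transformers = import("transformers")
fill_mask = transformers$pipeline("fill-mask", model = "distilbert-base-cased")
preds = fill_mask("Beijing is the [MASK] of China.")
str(preds[[1]])  # each prediction has score, token, token_str, and sequence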
Examples
if (FALSE) {
# text_init() # initialize the environment
model = "distilbert-base-cased"
text_unmask("Beijing is the [MASK] of China.", model)
# multiple [MASK]s:
text_unmask("Beijing is the [MASK] [MASK] of China.", model)
# multiple queries:
text_unmask(c("The man worked as a [MASK].",
              "The woman worked as a [MASK]."),
            model)

# specific targets:
text_unmask("The [MASK] worked as a nurse.", model,
            targets = c("man", "woman"))
}