Note: This function has been deprecated and will no longer be updated, since I have developed the new package FMAT as the integrative toolbox for the Fill-Mask Association Test (FMAT).
Predict the probable masked token(s) in a sequence,
based on the Python module transformers.
Arguments
- query
A query (sentence/prompt) with masked token(s) [MASK]. Multiple queries are also supported. See examples.
- model
Model name at HuggingFace. See text_model_download. If the model has not been downloaded, it will be downloaded automatically.
- targets
Specific target word(s) to be filled in the blank [MASK]. Defaults to NULL (i.e., return the topn most likely predictions). If specified, then topn will be ignored (see examples).
- topn
Number of the most likely predictions to return. Defaults to 5. If targets is specified, then topn automatically changes to the length of targets (see the sketch after this list).
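For instance, a brief sketch of how topn and targets interact (illustrative calls; the row counts follow from the behavior described above):
# model name used in the examples below
model = "distilbert-base-cased"
# topn controls how many candidate predictions are returned:
text_unmask("Beijing is the [MASK] of China.", model, topn=10)  # 10 rows
# if targets is given, topn is ignored (one row per target):
text_unmask("Beijing is the [MASK] of China.", model,
            targets=c("capital", "city"))  # 2 rows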
Value
A data.table of query results:
- query_id
Query ID (returned only if there is more than one query).
- mask_id
[MASK] ID (position in the sequence; returned only if there is more than one [MASK] in query).
- prob
Probability of the predicted token in the sequence.
- token_id
Predicted token ID (to replace [MASK]).
- token
Predicted token (to replace [MASK]).
- sequence
Complete sentence with the predicted token.
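A minimal sketch of inspecting the returned data.table (assuming the columns described above):
res = text_unmask("Beijing is the [MASK] of China.",
                  model="distilbert-base-cased")
str(res)              # columns: prob, token_id, token, sequence
res[which.max(prob)]  # the row with the most likely completion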
Details
Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. Such models are useful for getting a statistical understanding of the language on which the model was trained. See https://huggingface.co/tasks/fill-mask for details.
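For reference, the same fill-mask task can be run directly through reticulate and the Python module transformers. This is a minimal sketch of the underlying mechanism (assuming Python and transformers are installed), not the exact implementation of this function:
library(reticulate)
transformers = import("transformers")
fill_mask = transformers$pipeline("fill-mask", model="distilbert-base-cased")
res = fill_mask("Beijing is the [MASK] of China.")
# each element of res has $score, $token, $token_str, and $sequence
str(res[[1]])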
Examples
if (FALSE) {
# text_init() # initialize the environment
model = "distilbert-base-cased"
text_unmask("Beijing is the [MASK] of China.", model)
# multiple [MASK]s:
text_unmask("Beijing is the [MASK] [MASK] of China.", model)
# multiple queries:
text_unmask(c("The man worked as a [MASK].",
"The woman worked as a [MASK]."),
model)
# specific targets:
text_unmask("The [MASK] worked as a nurse.", model,
targets=c("man", "woman"))
}
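As a follow-up to the last example, the returned data.table can be used to compare the two target completions (a sketch assuming the columns described in the Value section):
model = "distilbert-base-cased"
res = text_unmask("The [MASK] worked as a nurse.", model,
                  targets=c("man", "woman"))
res[, .(token, prob)]  # compare probabilities of "man" vs. "woman"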
