Compute multiple features of surnames and given names.
Source: R/ChineseNames.R
compute_name_index.Rd
Compute all available name features (indices) based on familyname and givenname.
You can either input a data frame with a variable of Chinese full names (and a variable of birth years, if necessary) or just input a vector of full names (and a vector of birth years, if necessary).
Usage 1: Input a single value or a vector as name [and birth, if necessary].
Usage 2: Input a data frame as data, together with the variable name given in var.fullname (or var.surname and/or var.givenname) [and var.birthyear, if necessary].
Caution. Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and therefore may not be accurate.
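The two calling conventions described above can be sketched as follows. This is a minimal illustration, assuming the ChineseNames package is installed; the two names and birth years are taken from the Examples section, and the object names (full_names, birth_years, res1, res2, d) are only illustrative.

```r
# Sketch only: the calls run only if the ChineseNames package is available.
full_names = c("王秀英", "李军")   # names from the Examples section
birth_years = c(1960, 1961)

if (requireNamespace("ChineseNames", quietly = TRUE)) {
  library(ChineseNames)

  # Usage 1: pass vectors directly via `name` (and `birth`)
  res1 = compute_name_index(name = full_names, birth = birth_years)

  # Usage 2: pass a data frame plus the relevant variable names
  d = data.frame(name = full_names, birth = birth_years)
  res2 = compute_name_index(d, var.fullname = "name", var.birthyear = "birth")
}
```

Both forms should return the same kind of result; Usage 2 is likely the more convenient one when the names already sit in a larger data set.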
Usage
compute_name_index(
  data = NULL,
  var.fullname = NULL,
  var.surname = NULL,
  var.givenname = NULL,
  var.birthyear = NULL,
  name = NA,
  birth = NA,
  index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
  NU.approx = TRUE,
  digits = 4,
  return.namechar = TRUE,
  return.all = FALSE
)
Arguments
- data
Data frame.
- var.fullname
Variable name of Chinese full names (e.g., "name").
- var.surname
Variable name of Chinese surnames (e.g., "surname").
- var.givenname
Variable name of Chinese given names (e.g., "givenname").
- var.birthyear
Variable name of birth year (e.g., "birth").
- name
If no data, you can just input a vector of full name(s).
- birth
If no data, you can just input a vector of birth year(s).
- index
Which indices to compute?
By default, it computes all available name indices:
  NLen: full-name length (2~4).
  SNU: surname uniqueness (1~6).
  SNI: surname initial (1~26).
  NU: name-character uniqueness (1~6).
  CCU: character-corpus uniqueness (1~6).
  NG: name gender (-1~1).
  NV: name valence (1~5).
  NW: name warmth (1~5).
  NC: name competence (1~5).
For details, see https://psychbruce.github.io/ChineseNames/
- NU.approx
Whether to approximately compute name-character uniqueness (NU) using the nearest two birth cohorts with relative weights (which would be more precise than just using a single birth cohort). Default is TRUE.
- digits
Number of decimal places. Default is 4.
- return.namechar
Whether to return separate name characters. Default is TRUE.
- return.all
Whether to return all temporary variables in the computation of the final variables. Default is FALSE.
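To make the idea behind NU.approx concrete, here is a rough sketch of weighting the two nearest birth cohorts by how close each is to the birth year. It is not the package's internal code: the exact weighting scheme is not specified here, so simple linear weights are assumed, and the helper weighted_nu, the cohort midpoints, and the NU values are all hypothetical.

```r
# Hypothetical helper, NOT taken from ChineseNames.
# cohort_mid: midpoints of the two nearest birth cohorts (assumed values)
# cohort_nu:  NU of a name character within each cohort (made-up values)
weighted_nu = function(birth, cohort_mid, cohort_nu) {
  dist = abs(birth - cohort_mid)
  # If the birth year coincides with both midpoints, fall back to the mean
  if (sum(dist) == 0) return(mean(cohort_nu))
  # The nearer cohort gets the larger weight; the two weights sum to 1
  w = 1 - dist / sum(dist)
  sum(w * cohort_nu)
}

cohort_mid = c(1955, 1965)
cohort_nu = c(2.0, 3.0)

# Single-cohort version: take NU from the nearest cohort only
nu_single = cohort_nu[which.min(abs(1963 - cohort_mid))]

# Two-cohort version: weighted combination of both cohorts
nu_approx = weighted_nu(1963, cohort_mid, cohort_nu)
```

With these made-up numbers, a birth year of 1963 lies closer to the 1965 cohort, so the weighted value sits between the two cohort values but nearer the later one, rather than jumping straight to it.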
Value
A new data frame (of class data.table) with name indices appended.
Full names are split into name0 (surnames, with compound surnames automatically detected), name1, name2, and name3 (given-name characters).
Note
For details and examples, see https://psychbruce.github.io/ChineseNames/
Examples
## Prepare ##
sn = familyname$surname[1:12]
gn = c(top100name.year$name.all.1960[1:6],
       top100name.year$name.all.2000[1:6],
       top100name.year$name.all.1960[95:100],
       top100name.year$name.all.2000[95:100])
demodata = data.frame(name=paste0(sn, gn),
                      birth=c(1960:1965, 2000:2005,
                              1960:1965, 2000:2005))
demodata
#>      name birth
#> 1  王秀英  1960
#> 2    李军  1961
#> 3    张平  1962
#> 4  刘建华  1963
#> 5    陈伟  1964
#> 6    杨勇  1965
#> 7    黄鑫  2000
#> 8    赵杰  2001
#> 9    吴涛  2002
#> 10   周浩  2003
#> 11   徐宇  2004
#> 12 孙俊杰  2005
#> 13 王秀花  1960
#> 14 李新华  1961
#> 15 张雪梅  1962
#> 16   刘荣  1963
#> 17   陈峰  1964
#> 18 杨春花  1965
#> 19   黄娟  2000
#> 20 赵志豪  2001
#> 21 吴天宇  2002
#> 22 周文博  2003
#> 23 徐子怡  2004
#> 24   孙灿  2005
## Compute ##
newdata = compute_name_index(demodata,
                             var.fullname="name",
                             var.birthyear="birth")
newdata
#>       name birth name0 name1 name2 name3 NLen    SNU SNI     NU    CCU      NG
#>  1: 王秀英  1960    王    秀    英          3 1.1257  23 1.3158 3.5975 -0.8840
#>  2:   李军  1961    李    军                2 1.1325  12 1.8763 3.0228  0.8241
#>  3:   张平  1962    张    平                2 1.1529  26 1.6493 2.9196  0.1704
#>  4: 刘建华  1963    刘    建    华          3 1.2753  12 1.5465 3.1711  0.2838
#>  5:   陈伟  1964    陈    伟                2 1.3415   3 2.0655 3.8865  0.6859
#>  6:   杨勇  1965    杨    勇                2 1.4725  25 2.2930 3.8916  0.9313
#>  7:   黄鑫  2000    黄    鑫                2 1.6292   8 1.9089 5.2161  0.4574
#>  8:   赵杰  2001    赵    杰                2 1.6754  26 1.7429 4.1740  0.5029
#>  9:   吴涛  2002    吴    涛                2 1.6992  23 2.0333 4.4778  0.8763
#> 10:   周浩  2003    周    浩                2 1.7071  26 1.8644 4.4124  0.8955
#> 11:   徐宇  2004    徐    宇                2 1.8247  24 1.5635 4.0210  0.5025
#> 12: 孙俊杰  2005    孙    俊    杰          3 1.8370  19 1.7784 4.3870  0.4234
#> 13: 王秀花  1960    王    秀    花          3 1.1257  23 1.6076 3.4917 -0.9282
#> 14: 李新华  1961    李    新    华          3 1.1325  12 1.6567 3.1363  0.1740
#> 15: 张雪梅  1962    张    雪    梅          3 1.1529  26 1.9998 3.8696 -0.8495
#> 16:   刘荣  1963    刘    荣                2 1.2753  12 1.6599 3.8888 -0.1303
#> 17:   陈峰  1964    陈    峰                2 1.3415   3 2.4063 4.1185  0.8346
#> 18: 杨春花  1965    杨    春    花          3 1.4725  25 1.7471 3.3374 -0.5918
#> 19:   黄娟  2000    黄    娟                2 1.6292   8 2.2167 4.7391 -0.9959
#> 20: 赵志豪  2001    赵    志    豪          3 1.6754  26 1.9377 3.7412  0.7828
#> 21: 吴天宇  2002    吴    天    宇          3 1.6992  23 1.8800 3.3503  0.5210
#> 22: 周文博  2003    周    文    博          3 1.7071  26 1.7989 3.4332  0.5658
#> 23: 徐子怡  2004    徐    子    怡          3 1.8247  24 1.8541 3.7488 -0.2639
#> 24:   孙灿  2005    孙    灿                2 1.8370  19 2.7219 4.7010  0.4339
#>       name birth name0 name1 name2 name3 NLen    SNU SNI     NU    CCU      NG
#>         NV   NW   NC
#>  1: 4.0938 3.80 3.50
#>  2: 3.9375 3.40 3.70
#>  3: 3.6875 3.30 3.10
#>  4: 4.2188 3.65 3.30
#>  5: 4.2500 3.50 3.40
#>  6: 4.1875 3.60 3.80
#>  7: 3.9375 3.20 3.60
#>  8: 4.6250 3.90 4.30
#>  9: 3.7500 3.80 3.60
#> 10: 4.2500 3.90 3.60
#> 11: 3.7500 3.70 3.60
#> 12: 4.5625 3.85 3.95
#> 13: 3.8438 3.80 3.00
#> 14: 4.1250 3.50 3.20
#> 15: 3.8750 3.25 3.25
#> 16: 4.1875 3.90 3.80
#> 17: 4.1250 3.40 3.80
#> 18: 3.9375 3.80 2.65
#> 19: 4.1250 3.60 2.80
#> 20: 4.2812 3.75 3.65
#> 21: 3.7812 3.60 3.45
#> 22: 4.5312 3.80 3.80
#> 23: 3.6875 3.45 2.90
#> 24: 4.1875 4.20 3.30
#>         NV   NW   NC
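The SNI values in the output above are consistent with the alphabetical position (1~26) of the first letter of each surname's pinyin. The sketch below illustrates that reading with base R only. The pinyin lookup is typed in by hand for the 12 demo surnames and is an assumption for illustration; the package presumably derives SNI from its own surname data instead.

```r
# Hand-typed pinyin for the 12 surnames in the demo data (illustrative only)
surname = c("王", "李", "张", "刘", "陈", "杨",
            "黄", "赵", "吴", "周", "徐", "孙")
pinyin = c("wang", "li", "zhang", "liu", "chen", "yang",
           "huang", "zhao", "wu", "zhou", "xu", "sun")

# Alphabetical position of the first pinyin letter: A = 1, ..., Z = 26
initial = toupper(substr(pinyin, 1, 1))
sni = match(initial, LETTERS)

data.frame(surname, pinyin, initial, sni)
```

The resulting sni vector matches the SNI column for rows 1 to 12 of the output above (23, 12, 26, 12, 3, 25, 8, 26, 23, 26, 24, 19).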