Compute multiple features of surnames and given names. — compute_name_index

Compute all available name features (indices) based on the familyname and givenname datasets. You can input either a data frame containing a variable of Chinese full names (and, if necessary, a variable of birth years) or simply a vector of full names (and, if necessary, a vector of birth years).

Usage 1: Input a single value or a vector via name [and birth, if necessary].
Usage 2: Input a data frame via data, together with the variable name for var.fullname (or var.surname and/or var.givenname) [and var.birthyear, if necessary].
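
The two usages can be sketched as follows, assuming the ChineseNames package is installed. The two names and birth years are borrowed from the Examples section of this page; the object names by_vector, d, and by_frame are only illustrative.

```r
library(ChineseNames)

# Usage 1: pass names (and birth years) directly as vectors.
by_vector = compute_name_index(name = c("王秀英", "李军"),
                               birth = c(1960, 1961))

# Usage 2: pass a data frame plus the names of its columns.
d = data.frame(name = c("王秀英", "李军"),
               birth = c(1960, 1961))
by_frame = compute_name_index(d,
                              var.fullname = "name",
                              var.birthyear = "birth")
```

Both calls describe the same two people, so they should produce the same indices; Usage 2 is usually more convenient when the names already sit in a data set.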

Caution. Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and therefore may not be accurate.
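
One way to keep this caveat visible is to flag the affected rows before interpreting NU. The following is a minimal base-R sketch; the data frame people, its third row (birth year 2012), and the column NU.forecast are hypothetical and introduced only for this illustration.

```r
# Hypothetical input data (not taken from the package)
people = data.frame(name = c("王秀英", "徐子怡", "周文博"),
                    birth = c(1960, 2004, 2012))

# TRUE where NU would rest on forecasting (birth year >= 2010)
people$NU.forecast = people$birth >= 2010
people
```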

Usage

compute_name_index(
  data = NULL,
  var.fullname = NULL,
  var.surname = NULL,
  var.givenname = NULL,
  var.birthyear = NULL,
  name = NA,
  birth = NA,
  index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
  NU.approx = TRUE,
  digits = 4,
  return.namechar = TRUE,
  return.all = FALSE
)

Arguments

data

Data frame.

var.fullname

Variable name of Chinese full names (e.g., "name").

var.surname

Variable name of Chinese surnames (e.g., "surname").

var.givenname

Variable name of Chinese given names (e.g., "givenname").

var.birthyear

Variable name of birth year (e.g., "birth").

name

If no data is supplied, you can simply input a vector of full name(s).

birth

If no data is supplied, you can simply input a vector of birth year(s).

index

Which indices to compute?

By default, it computes all available name indices:

NLen: full-name length (2~4).
SNU: surname uniqueness (1~6).
SNI: surname initial (1~26).
NU: name-character uniqueness (1~6).
CCU: character-corpus uniqueness (1~6).
NG: name gender (-1~1).
NV: name valence (1~5).
NW: name warmth (1~5).
NC: name competence (1~5).

For details, see https://psychbruce.github.io/ChineseNames/
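
To compute only some of the indices, a shorter vector can presumably be passed to index. The sketch below assumes that index accepts any subset of the codes listed above (the default vector suggests this, but it is an assumption here) and that the ChineseNames package is installed; the object name few is illustrative.

```r
library(ChineseNames)

# Request three indices instead of the default nine
# (assumes `index` accepts a subset of the documented codes)
few = compute_name_index(name = "王秀英", birth = 1960,
                         index = c("NLen", "SNU", "NU"))
names(few)
```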

NU.approx

Whether to approximate name-character uniqueness (NU) by weighting the two nearest birth cohorts (which is more precise than using a single birth cohort). Default is TRUE.
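
The idea of weighting two neighbouring cohorts can be illustrated with plain linear interpolation. This is only a sketch under stated assumptions, not the package's internal code: the helper interpolate_cohorts, the cohort midpoints, and the cohort values are all hypothetical, and the package may define its weights differently.

```r
# Hypothetical helper: blend the values of the two cohorts nearest to `birth`,
# weighting each by how close its midpoint is to the birth year.
interpolate_cohorts = function(birth, midpoints, values) {
  stopifnot(length(midpoints) == length(values), length(midpoints) >= 2)
  n = length(midpoints)
  # Outside the covered range: fall back to the nearest single cohort
  if (birth <= midpoints[1]) return(values[1])
  if (birth >= midpoints[n]) return(values[n])
  lo = max(which(midpoints <= birth))  # cohort at or before the birth year
  hi = lo + 1                          # next cohort
  w_hi = (birth - midpoints[lo]) / (midpoints[hi] - midpoints[lo])
  (1 - w_hi) * values[lo] + w_hi * values[hi]
}

midpoints = c(1965, 1975, 1985)  # hypothetical cohort midpoints
values = c(2.0, 3.0, 5.0)        # hypothetical per-cohort uniqueness values
interpolate_cohorts(1970, midpoints, values)  # halfway between 2.0 and 3.0: 2.5
```

With a single cohort, a person born in 1970 would receive either 2.0 or 3.0; blending the two gives a value that changes smoothly with birth year.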

digits

Number of decimal places. Default is 4.

return.namechar

Whether to return separate name characters. Default is TRUE.

return.all

Whether to also return all intermediate variables used in computing the final variables. Default is FALSE.

Value

A new data frame (of class data.table) with name indices appended. Full names are split into name0 (surnames, with compound surnames automatically detected), name1, name2, and name3 (given-name characters).
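
The layout of name0 to name3 can be mimicked in base R. The sketch below is not the package's implementation: the helper split_fullname and its three-entry compound-surname list are hypothetical, and the real function detects compound surnames with its own data.

```r
# Hypothetical helper: split one full name into surname and given-name characters
split_fullname = function(fullname, compound = c("欧阳", "司马", "诸葛")) {
  n = nchar(fullname)
  # Use a two-character surname only if it is a listed compound surname
  # and at least one character remains for the given name
  s_len = if (n >= 3 && substr(fullname, 1, 2) %in% compound) 2 else 1
  given = strsplit(substr(fullname, s_len + 1, n), "")[[1]]
  given = c(given, rep("", 3))[1:3]  # pad with "" up to three slots
  data.frame(name0 = substr(fullname, 1, s_len),
             name1 = given[1], name2 = given[2], name3 = given[3],
             stringsAsFactors = FALSE)
}

split_fullname("王秀英")  # single-character surname, two given-name characters
split_fullname("欧阳修")  # compound surname from the hypothetical list
```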

Note

For details and examples, see https://psychbruce.github.io/ChineseNames/

Examples

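## Prepare ##
library(ChineseNames)  # provides familyname, top100name.year, compute_name_index()

# Surnames: the first 12 entries of familyname
sn = familyname$surname[1:12]
# Given names: rows 1:6 and 95:100 of the 1960 and 2000 columns
gn = c(top100name.year$name.all.1960[1:6],
       top100name.year$name.all.2000[1:6],
       top100name.year$name.all.1960[95:100],
       top100name.year$name.all.2000[95:100])
# paste0() recycles the 12 surnames across the 24 given names
demodata = data.frame(name=paste0(sn, gn),
                      birth=c(1960:1965, 2000:2005,
                              1960:1965, 2000:2005))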
demodata
#>      name birth
#> 1  王秀英  1960
#> 2    李军  1961
#> 3    张平  1962
#> 4  刘建华  1963
#> 5    陈伟  1964
#> 6    杨勇  1965
#> 7    黄鑫  2000
#> 8    赵杰  2001
#> 9    吴涛  2002
#> 10   周浩  2003
#> 11   徐宇  2004
#> 12 孙俊杰  2005
#> 13 王秀花  1960
#> 14 李新华  1961
#> 15 张雪梅  1962
#> 16   刘荣  1963
#> 17   陈峰  1964
#> 18 杨春花  1965
#> 19   黄娟  2000
#> 20 赵志豪  2001
#> 21 吴天宇  2002
#> 22 周文博  2003
#> 23 徐子怡  2004
#> 24   孙灿  2005

## Compute ##
newdata = compute_name_index(demodata,
                             var.fullname="name",
                             var.birthyear="birth")
newdata
#>       name birth  name0  name1  name2  name3  NLen    SNU   SNI     NU    CCU
#>     <char> <int> <char> <char> <char> <char> <int>  <num> <num>  <num>  <num>
#>  1: 王秀英  1960     王     秀     英            3 1.1257    23 1.3158 3.5975
#>  2:   李军  1961     李     军                   2 1.1325    12 1.8763 3.0228
#>  3:   张平  1962     张     平                   2 1.1529    26 1.6493 2.9196
#>  4: 刘建华  1963     刘     建     华            3 1.2753    12 1.5465 3.1711
#>  5:   陈伟  1964     陈     伟                   2 1.3415     3 2.0655 3.8865
#>  6:   杨勇  1965     杨     勇                   2 1.4725    25 2.2930 3.8916
#>  7:   黄鑫  2000     黄     鑫                   2 1.6292     8 1.9089 5.2161
#>  8:   赵杰  2001     赵     杰                   2 1.6754    26 1.7429 4.1740
#>  9:   吴涛  2002     吴     涛                   2 1.6992    23 2.0333 4.4778
#> 10:   周浩  2003     周     浩                   2 1.7071    26 1.8644 4.4124
#> 11:   徐宇  2004     徐     宇                   2 1.8247    24 1.5635 4.0210
#> 12: 孙俊杰  2005     孙     俊     杰            3 1.8370    19 1.7784 4.3870
#> 13: 王秀花  1960     王     秀     花            3 1.1257    23 1.6076 3.4917
#> 14: 李新华  1961     李     新     华            3 1.1325    12 1.6567 3.1363
#> 15: 张雪梅  1962     张     雪     梅            3 1.1529    26 1.9998 3.8696
#> 16:   刘荣  1963     刘     荣                   2 1.2753    12 1.6599 3.8888
#> 17:   陈峰  1964     陈     峰                   2 1.3415     3 2.4063 4.1185
#> 18: 杨春花  1965     杨     春     花            3 1.4725    25 1.7471 3.3374
#> 19:   黄娟  2000     黄     娟                   2 1.6292     8 2.2167 4.7391
#> 20: 赵志豪  2001     赵     志     豪            3 1.6754    26 1.9377 3.7412
#> 21: 吴天宇  2002     吴     天     宇            3 1.6992    23 1.8800 3.3503
#> 22: 周文博  2003     周     文     博            3 1.7071    26 1.7989 3.4332
#> 23: 徐子怡  2004     徐     子     怡            3 1.8247    24 1.8541 3.7488
#> 24:   孙灿  2005     孙     灿                   2 1.8370    19 2.7219 4.7010
#>       name birth  name0  name1  name2  name3  NLen    SNU   SNI     NU    CCU
#>          NG     NV    NW    NC
#>       <num>  <num> <num> <num>
#>  1: -0.8840 4.0938  3.80  3.50
#>  2:  0.8241 3.9375  3.40  3.70
#>  3:  0.1704 3.6875  3.30  3.10
#>  4:  0.2838 4.2188  3.65  3.30
#>  5:  0.6859 4.2500  3.50  3.40
#>  6:  0.9313 4.1875  3.60  3.80
#>  7:  0.4574 3.9375  3.20  3.60
#>  8:  0.5029 4.6250  3.90  4.30
#>  9:  0.8763 3.7500  3.80  3.60
#> 10:  0.8955 4.2500  3.90  3.60
#> 11:  0.5025 3.7500  3.70  3.60
#> 12:  0.4234 4.5625  3.85  3.95
#> 13: -0.9282 3.8438  3.80  3.00
#> 14:  0.1740 4.1250  3.50  3.20
#> 15: -0.8495 3.8750  3.25  3.25
#> 16: -0.1303 4.1875  3.90  3.80
#> 17:  0.8346 4.1250  3.40  3.80
#> 18: -0.5918 3.9375  3.80  2.65
#> 19: -0.9959 4.1250  3.60  2.80
#> 20:  0.7828 4.2812  3.75  3.65
#> 21:  0.5210 3.7812  3.60  3.45
#> 22:  0.5658 4.5312  3.80  3.80
#> 23: -0.2639 3.6875  3.45  2.90
#> 24:  0.4339 4.1875  4.20  3.30
#>          NG     NV    NW    NC