Skip to contents

In Excel, we can use VLOOKUP, HLOOKUP, XLOOKUP (a new function released in 2019), or the combination of INDEX and MATCH to search, match, and look up values. Here I provide a similar function.

Usage

LOOKUP(
  data,
  vars,
  data.ref,
  vars.ref,
  vars.lookup,
  return = c("new.data", "new.var", "new.value")
)

Arguments

data

Main data.

vars

Character (vector), specifying the variable(s) to be searched in data.

data.ref

Reference data containing both the reference variable(s) and the lookup variable(s).

vars.ref

Character (vector), with the same length and order as vars, specifying the reference variable(s) to be matched in data.ref.

vars.lookup

Character (vector), specifying the variable(s) to be looked up and returned from data.ref.

return

What to return. Default ("new.data") is to return a data frame with the lookup values added. You may also set it to "new.var" or "new.value".

Value

New data object, new variable, or new value (see the argument return).

Details

If multiple values were simultaneously matched, a warning message would be printed.

Examples

ref = data.table(City=rep(c("A", "B", "C"), each=5),
                 Year=rep(2013:2017, times=3),
                 GDP=sample(1000:2000, 15),
                 PM2.5=sample(10:300, 15))
ref
#>       City  Year   GDP PM2.5
#>     <char> <int> <int> <int>
#>  1:      A  2013  1615   285
#>  2:      A  2014  1507    62
#>  3:      A  2015  1497   268
#>  4:      A  2016  1040   213
#>  5:      A  2017  1558   296
#>  6:      B  2013  1726   237
#>  7:      B  2014  1026    25
#>  8:      B  2015  1221   187
#>  9:      B  2016  1401    36
#> 10:      B  2017  1825   102
#> 11:      C  2013  1670    52
#> 12:      C  2014  1458   128
#> 13:      C  2015  1392   115
#> 14:      C  2016  1649   286
#> 15:      C  2017  1439   183

data = data.table(sub=1:5,
                  city=c("A", "A", "B", "C", "C"),
                  year=c(2013, 2014, 2015, 2016, 2017))
data
#>      sub   city  year
#>    <int> <char> <num>
#> 1:     1      A  2013
#> 2:     2      A  2014
#> 3:     3      B  2015
#> 4:     4      C  2016
#> 5:     5      C  2017

LOOKUP(data, c("city", "year"), ref, c("City", "Year"), "GDP")
#>      sub   city  year   GDP
#>    <int> <char> <num> <int>
#> 1:     1      A  2013  1615
#> 2:     2      A  2014  1507
#> 3:     3      B  2015  1221
#> 4:     4      C  2016  1649
#> 5:     5      C  2017  1439
LOOKUP(data, c("city", "year"), ref, c("City", "Year"), c("GDP", "PM2.5"))
#>      sub   city  year   GDP PM2.5
#>    <int> <char> <num> <int> <int>
#> 1:     1      A  2013  1615   285
#> 2:     2      A  2014  1507    62
#> 3:     3      B  2015  1221   187
#> 4:     4      C  2016  1649   286
#> 5:     5      C  2017  1439   183