In some situations, you may want to use `encodefrom()` to collapse values, that is, to group unique raw values into a smaller set of clean values and labels. For example, say you have the following data set, which gives the census division number and name for a subset of states:
id | state | cendiv | cendiv_name |
---|---|---|---|
1 | AL | 6 | East South Central |
2 | AK | 9 | Pacific |
3 | AZ | 8 | Mountain |
4 | AR | 7 | West South Central |
5 | CA | 9 | Pacific |
6 | CO | 8 | Mountain |
7 | CT | 1 | New England |
8 | DE | 5 | South Atlantic |
10 | FL | 5 | South Atlantic |
12 | HI | 9 | Pacific |
14 | IL | 3 | East North Central |
15 | IN | 3 | East North Central |
16 | IA | 4 | West North Central |
31 | NJ | 2 | Middle Atlantic |
33 | NY | 2 | Middle Atlantic |
Rather than using the nine census divisions, you want to group states by their four census regions, using the following crosswalk:
cendiv | cenreg | cenregnm |
---|---|---|
1 | 1 | Northeast |
2 | 1 | Northeast |
3 | 2 | Midwest |
4 | 2 | Midwest |
5 | 3 | South |
6 | 3 | South |
7 | 3 | South |
8 | 4 | West |
9 | 4 | West |
As long as

1. `raw` values are unique in the crosswalk, and
2. the `clean` and `label` columns have a 1:1 match,

then you can use `encodefrom()` to collapse categories as you move from raw to clean values.
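The two conditions above can be checked on a crosswalk before encoding. The following is a minimal base-R sketch, assuming the crosswalk is a data frame; `check_cw()` is a hypothetical helper written for illustration and is not part of crosswalkr.

```r
## hypothetical helper (NOT part of crosswalkr): checks the two conditions
## a crosswalk must meet before collapsing values with encodefrom()
check_cw <- function(cw, raw, clean, label) {
    ## 1. every raw value appears only once in the crosswalk
    raw_unique <- !anyDuplicated(cw[[raw]])
    ## 2. clean and label match 1:1: among the distinct (clean, label) pairs,
    ##    no clean value and no label appears more than once
    pairs <- unique(cw[c(clean, label)])
    one_to_one <- !anyDuplicated(pairs[[clean]]) && !anyDuplicated(pairs[[label]])
    c(raw_unique = raw_unique, one_to_one = one_to_one)
}

cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'),
                 stringsAsFactors = FALSE)

## both checks are TRUE for the census crosswalk
check_cw(cw, raw = 'cendiv', clean = 'cenreg', label = 'cenregnm')
```

If, say, one row for region 4 were labelled `'Pacific West'` instead of `'West'`, region 4 would carry two labels and `one_to_one` would be `FALSE`.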
```r
## packages
library(dplyr)
library(crosswalkr)

## data
df <- tibble(id = c(1:8, 10, 12, 14:16, 31, 33),
             state = c('AL','AK','AZ','AR','CA','CO','CT','DE','FL','HI',
                       'IL','IN','IA','NJ','NY'),
             cendiv = c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2),
             cendiv_name = c('East South Central','Pacific','Mountain',
                             'West South Central','Pacific','Mountain','New England',
                             'South Atlantic','South Atlantic','Pacific',
                             'East North Central','East North Central',
                             'West North Central','Middle Atlantic','Middle Atlantic'))

## crosswalk
cw <- tibble(cendiv = 1:9,
             cenreg = c(1,1,2,2,3,3,3,4,4),
             cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                          'South','South','South','West','West'))

## encode new column
df <- df %>%
    mutate(cenreg = encodefrom(., var = cendiv, cw_file = cw, raw = cendiv,
                               clean = cenreg, label = cenregnm))

## show
df
```

```
## # A tibble: 15 x 5
##       id state cendiv cendiv_name        cenreg
##    <dbl> <chr>  <dbl> <chr>              <dbl+lbl>
##  1     1 AL         6 East South Central 3 [South]
##  2     2 AK         9 Pacific            4 [West]
##  3     3 AZ         8 Mountain           4 [West]
##  4     4 AR         7 West South Central 3 [South]
##  5     5 CA         9 Pacific            4 [West]
##  6     6 CO         8 Mountain           4 [West]
##  7     7 CT         1 New England        1 [Northeast]
##  8     8 DE         5 South Atlantic     3 [South]
##  9    10 FL         5 South Atlantic     3 [South]
## 10    12 HI         9 Pacific            4 [West]
## 11    14 IL         3 East North Central 2 [Midwest]
## 12    15 IN         3 East North Central 2 [Midwest]
## 13    16 IA         4 West North Central 2 [Midwest]
## 14    31 NJ         2 Middle Atlantic    1 [Northeast]
## 15    33 NY         2 Middle Atlantic    1 [Northeast]
```
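To see what the collapse amounts to, the same raw-to-clean mapping can be sketched in base R with `match()`, assuming the crosswalk meets the two conditions listed earlier. This is only an illustration and not how `encodefrom()` is implemented; it also returns a plain factor rather than the labelled `<dbl+lbl>` column shown above.

```r
## base-R illustration of collapsing raw values through a crosswalk
## (NOT encodefrom()'s implementation)
cendiv <- c(6,9,8,7,9,8,1,5,5,9,3,3,4,2,2)   # raw values from the example data

cw <- data.frame(cendiv = 1:9,
                 cenreg = c(1,1,2,2,3,3,3,4,4),
                 cenregnm = c('Northeast','Northeast','Midwest','Midwest',
                              'South','South','South','West','West'),
                 stringsAsFactors = FALSE)

## row of the crosswalk that holds each raw value
idx <- match(cendiv, cw$cendiv)

## clean codes: several raw values map onto the same clean value
cenreg <- cw$cenreg[idx]

## one label per distinct clean value, ordered by clean code
lv <- unique(cw[c('cenreg', 'cenregnm')])
lv <- lv[order(lv$cenreg), ]
cenreg_f <- factor(cenreg, levels = lv$cenreg, labels = lv$cenregnm)

table(cenreg_f)
```

The counts come out as 3 Northeast, 3 Midwest, 4 South and 5 West states, matching the `cenreg` column printed above.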