Recodes a vector (numeric, character or factor) according to a set of rules.
Similar to the recode() function in package car, but more flexible.
recode(x, rules, cuts, values, ...)
x | A numeric or character vector, or a factor.
rules | A character string, or a vector of character strings, with the recoding specifications.
cuts | A vector of one or more unique cut points.
values | A vector of output values.
... | Other parameters, for compatibility with other functions such as recode() in package car, but also factor() in package base.
Similar to the recode() function in package car, the recoding rules are separated by semicolons, of the form input = output, and allow for:
a single value | 1 = 0
a range of values | 2:5 = 1
a set of values | c(6,7,10) = 2
else | everything that is not covered by the previously specified rules
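As a rough illustration of what these rule forms mean, the base R sketch below applies the equivalent of "1 = 0; 2:5 = 1; c(6,7,10) = 2; else = 3" by hand. It does not call recode() itself: the vector x, the helper names and the else output of 3 are made up for this example, and the range is assumed to include both of its endpoints.

```r
# Hypothetical input vector, chosen to exercise every rule form
x <- c(1, 2, 5, 6, 7, 10, 11)

# One logical mask per rule
is_single <- x == 1                # a single value:    1 = 0
is_range  <- x >= 2 & x <= 5       # a range of values: 2:5 = 1 (assumed inclusive)
is_set    <- x %in% c(6, 7, 10)    # a set of values:   c(6,7,10) = 2

out <- rep(NA_real_, length(x))
out[is_single] <- 0
out[is_range]  <- 1
out[is_set]    <- 2

# else: everything not covered by the rules above
out[!(is_single | is_range | is_set)] <- 3

out
```

Leaving out the final assignment would keep the uncovered value as NA instead of 3.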
Unlike the recode() function in package car, this function allows the : sequence operator (even for factors), so that rules such as c(1,3,5:7) or c(a,d,f:h) are valid.
Since all rules are specified in a string, it does not matter whether the c() function is used or not. It is accepted for compatibility reasons, but a simpler way to specify a set of rules is "1,3,5:7=A; else=B".
Special values lo and hi may also appear in the range of values.
In package car, a character output has to be quoted, as in "1:2='A'", but that is not mandatory in this function: "1:2=A" does just as well. Output values such as "NA" or "missing" are converted to NA.
Another difference from the car package: the output is not automatically converted to a factor, even if the original variable is a factor. That choice is left to the user, through the argument as.factor.result, which defaults to FALSE.
An important difference is the treatment of values not present in the recoding rules. By default, package car copies all those values into the new object, whereas in this package they become NA, and new values are added only if they are found in the rules. Users can choose to copy all other values not present in the recoding rules by explicitly adding else=copy to the rules.
Since the two functions have the same name, users who load both packages may end up calling one instead of the other, depending on the order in which the packages are loaded. To minimize the impact of this namespace collision, special efforts have been made to keep this function compatible with the recode() function in package car (while offering more). The argument ... accepts arguments specific to the car package, such as as.factor.result, as.numeric.result and levels. In addition, it also accepts labels and ordered, specific to function factor() in package base.
Blank spaces outside category labels are ignored; see the examples that use category labels containing spaces.
It is possible to use recode() in a similar way to function cut(), by specifying a vector of cuts, which works for both numeric and character/factor objects. For any number c of cuts, there should be c + 1 values; if not otherwise specified, the argument values is automatically constructed as a sequence of numbers from 1 to c + 1.
Unlike the function cut(), arguments such as include.lowest or right are not necessary, because the final outcome can be changed by tweaking the cut values.
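The cut-based behaviour can be approximated in base R, which may help when reasoning about where a boundary value ends up. This is only a sketch, not the package's implementation. It assumes each cut point closes its interval on the right, an assumption drawn from the examples, where a value equal to the cut falls in the lower group. findInterval() is a base R function; x, cuts, values, group and out are illustrative names.

```r
x <- 10:20
cuts <- c(12, 16)                     # c = 2 cut points
values <- seq_len(length(cuts) + 1)   # default output values: 1 to c + 1

# left.open = TRUE makes each interval (lower, upper],
# so a value equal to a cut (12 or 16) stays in the lower group (assumption)
group <- findInterval(x, cuts, left.open = TRUE) + 1
out <- values[group]

out
```

Moving a cut slightly (for instance 11.5 instead of 12) changes which group a boundary value lands in, which is the kind of tweak that stands in for include.lowest or right.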
x <- rep(1:3, 3)
x
[1] 1 2 3 1 2 3 1 2 3

recode(x, "1:2 = A; else = B")
[1] "A" "A" "B" "A" "A" "B" "A" "A" "B"

set.seed(1234)
x <- factor(sample(letters[1:10], 20, replace = TRUE), levels = letters[1:10])
x
[1] b g g g i g a c g f g f c j c i c c b c
Levels: a b c d e f g h i j

recode(x, "b:d = 1; g:hi = 2; else = NA") # note the "hi" special value
[1] 1 2 2 2 2 2 NA 1 2 NA 2 NA 1 2 1 2 1 1 1 1

recode(x, "a, c:f = A; g:hi = B; else = C", as.factor.result = TRUE)
[1] C B B B B B A A B A B A A B A B A A C A
Levels: A B C

recode(x, "a, c:f = 1; g:hi = 2; else = 3", as.factor.result = TRUE,
       labels = c("one", "two", "three"), ordered = TRUE)
[1] three two two two two two one one two one two one
[13] one two one two one one three one
Levels: one < two < three

set.seed(1234)
categories <- c("An", "example", "that has", "spaces")
x <- factor(sample(categories, 20, replace = TRUE), levels = categories)
sort(x)
[1] An An An An An example example example
[9] example that has that has that has that has that has that has that has
[17] that has spaces spaces spaces
Levels: An example that has spaces

recode(sort(x), "An : 'that has' = 1; spaces = 2")
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# same thing with
recode(sort(x), "An : that has = 1; spaces = 2")
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# same using cut values
recode(sort(x), cuts = "that has")
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

# modifying the output values
recode(sort(x), cuts = "that has", values = 0:1)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

# more treatment of "else" values
x <- 10:20

# recoding rules don't overlap all existing values, the rest are empty
recode(x, "8:15=1")
[1] 1 1 1 1 1 1 NA NA NA NA NA

# all other values are copied
recode(x, "8:15=1; else=copy")
[1] 1 1 1 1 1 1 16 17 18 19 20