expss
package provides tabulation functions with support of SPSS-style labels, multiple / nested banners, weights and multiple-response variables. Additionally it offers useful functions for data processing workflow in the social / marketing research surveys - popular data transformation functions from SPSS Statistics (RECODE
, COUNT
, COMPUTE
, DO IF
, etc.) and Excel (COUNTIF
, VLOOKUP
, etc.). Proper methods for labelled variables add value labels support to base R and other packages. Package aimed to help people to move data processing from ‘Excel’/‘SPSS’ to R. See examples below. You can get help about any function by typing ?function_name
in the R console.
expss
is on CRAN, so for installation you can print in the console install.packages("expss")
.
We will use for demonstartion well-known mtcars
dataset. Let’s start with adding labels to the dataset. Then we can continue with tables creation.
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
mpg = "Miles/(US) gallon",
cyl = "Number of cylinders",
disp = "Displacement (cu.in.)",
hp = "Gross horsepower",
drat = "Rear axle ratio",
wt = "Weight (1000 lbs)",
qsec = "1/4 mile time",
vs = "Engine",
vs = c("V-engine" = 0,
"Straight engine" = 1),
am = "Transmission",
am = c("Automatic" = 0,
"Manual"=1),
gear = "Number of forward gears",
carb = "Number of carburetors"
)
For quick cross-tabulation there are fre
and cro
family of function. For simplicity we demonstrate here only cro_cpct
which caluclates column percent. Documentation for other functions, such as cro_cases
for counts, cro_rpct
for row percent, cro_tpct
for table percent and cro_fun
for custom summary functions can be seen by typing ?cro
and ?cro_fun
in the console.
# 'cro' examples
# multiple banners
mtcars %>%
calculate(cro_cpct(cyl, list(total(), am, vs))) %>%
htmlTable(caption = "Table with multiple banners (column %).")
Table with multiple banners (column %). | |||||||
#Total | Transmission | Engine | |||||
---|---|---|---|---|---|---|---|
Automatic | Manual | V-engine | Straight engine | ||||
Number of cylinders | |||||||
4 | 34.4 | 15.8 | 61.5 | 5.6 | 71.4 | ||
6 | 21.9 | 21.1 | 23.1 | 16.7 | 28.6 | ||
8 | 43.8 | 63.2 | 15.4 | 77.8 | |||
#Total cases | 32 | 19 | 13 | 18 | 14 |
# nested banners
mtcars %>%
calculate(cro_cpct(cyl, list(total(), am %nest% vs))) %>%
htmlTable(caption = "Table with nested banners (column %).")
Table with nested banners (column %). | |||||||
#Total | Transmission | ||||||
---|---|---|---|---|---|---|---|
Automatic | Manual | ||||||
Engine | Engine | ||||||
V-engine | Straight engine | V-engine | Straight engine | ||||
Number of cylinders | |||||||
4 | 34.4 | 42.9 | 16.7 | 100 | |||
6 | 21.9 | 57.1 | 50 | ||||
8 | 43.8 | 100 | 33.3 | ||||
#Total cases | 32 | 12 | 7 | 6 | 7 |
magrittr
piping. Documentation for this interface can be seen via ?tables
.
mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(), am) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
tab_pivot() %>%
htmlTable(caption = "Table with summary statistics. Statistics labels in rows.")
Table with summary statistics. Statistics labels in rows. | ||||
#Total | Transmission | |||
---|---|---|---|---|
Automatic | Manual | |||
Miles/(US) gallon | ||||
Mean | 20.1 | 17.1 | 24.4 | |
Std. dev. | 6 | 3.8 | 6.2 | |
Valid N | 32 | 19 | 13 | |
Displacement (cu.in.) | ||||
Mean | 230.7 | 290.4 | 143.5 | |
Std. dev. | 123.9 | 110.2 | 87.2 | |
Valid N | 32 | 19 | 13 | |
Gross horsepower | ||||
Mean | 146.7 | 160.3 | 126.8 | |
Std. dev. | 68.6 | 53.9 | 84.1 | |
Valid N | 32 | 19 | 13 | |
Weight (1000 lbs) | ||||
Mean | 3.2 | 3.8 | 2.4 | |
Std. dev. | 1 | 0.8 | 0.6 | |
Valid N | 32 | 19 | 13 | |
1/4 mile time | ||||
Mean | 17.8 | 18.2 | 17.4 | |
Std. dev. | 1.8 | 1.8 | 1.8 | |
Valid N | 32 | 19 | 13 |
mtcars %>%
tab_cells(mpg, disp, hp, wt, qsec) %>%
tab_cols(total(label = "#Total| |"), am) %>%
tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
tab_pivot() %>%
htmlTable(caption = "Table with the same summary statistics. Statistics labels in columns.")
Table with the same summary statistics. Statistics labels in columns. | |||||||||||
#Total | Transmission | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Automatic | Manual | ||||||||||
Mean | Std. dev. | Valid N | Mean | Std. dev. | Valid N | Mean | Std. dev. | Valid N | |||
Miles/(US) gallon | 20.1 | 6 | 32 | 17.1 | 3.8 | 19 | 24.4 | 6.2 | 13 | ||
Displacement (cu.in.) | 230.7 | 123.9 | 32 | 290.4 | 110.2 | 19 | 143.5 | 87.2 | 13 | ||
Gross horsepower | 146.7 | 68.6 | 32 | 160.3 | 53.9 | 19 | 126.8 | 84.1 | 13 | ||
Weight (1000 lbs) | 3.2 | 1 | 32 | 3.8 | 0.8 | 19 | 2.4 | 0.6 | 13 | ||
1/4 mile time | 17.8 | 1.8 | 32 | 18.2 | 1.8 | 19 | 17.4 | 1.8 | 13 |
mtcars %>%
tab_cols(total(), vs) %>%
tab_cells(mpg) %>%
tab_stat_mean() %>%
tab_stat_valid_n() %>%
tab_cells(am) %>%
tab_stat_cpct(total_row_position = "none", label = "col %") %>%
tab_stat_rpct(total_row_position = "none", label = "row %") %>%
tab_stat_tpct(total_row_position = "none", label = "table %") %>%
tab_pivot(stat_position = "inside_rows") %>%
htmlTable(caption = "Different statistics for differen variables.")
Different statistics for differen variables. | ||||||
#Total | Engine | |||||
---|---|---|---|---|---|---|
V-engine | Straight engine | |||||
Miles/(US) gallon | ||||||
Mean | 20.1 | 16.6 | 24.6 | |||
Valid N | 32 | 18 | 14 | |||
Transmission | ||||||
Automatic | col % | 59.4 | 66.7 | 50 | ||
row % | 100 | 63.2 | 36.8 | |||
table % | 59.4 | 37.5 | 21.9 | |||
Manual | col % | 40.6 | 33.3 | 50 | ||
row % | 100 | 46.2 | 53.8 | |||
table % | 40.6 | 18.8 | 21.9 |
mtcars %>%
tab_cells(cyl) %>%
tab_cols(total(), vs) %>%
tab_rows(am) %>%
tab_stat_cpct(total_row_position = "above",
total_label = c("number of cases", "row %"),
total_statistic = c("u_cases", "u_rpct")) %>%
tab_pivot() %>%
htmlTable(caption = "Table with split by rows and with custom totals.")
Table with split by rows and with custom totals. | |||||||
#Total | Engine | ||||||
---|---|---|---|---|---|---|---|
V-engine | Straight engine | ||||||
Transmission | |||||||
Automatic | Number of cylinders | #number of cases | 19 | 12 | 7 | ||
#row % | 100 | 63.2 | 36.8 | ||||
4 | 15.8 | 42.9 | |||||
6 | 21.1 | 57.1 | |||||
8 | 63.2 | 100 | |||||
Manual | Number of cylinders | #number of cases | 13 | 6 | 7 | ||
#row % | 100 | 46.2 | 53.8 | ||||
4 | 61.5 | 16.7 | 100 | ||||
6 | 23.1 | 50 | |||||
8 | 15.4 | 33.3 |
mtcars %>%
tab_cells(dtfrm(mpg, disp, hp, wt, qsec)) %>%
tab_cols(total(label = "#Total| |"), am) %>%
tab_stat_fun_df(
function(x){
frm = reformulate(".", response = names(x)[1])
model = lm(frm, data = x)
dtfrm('Coef.' = coef(model),
confint(model)
)
}
) %>%
tab_pivot() %>%
htmlTable(caption = "Linear regression by groups.")
Linear regression by groups. | |||||||||||
#Total | Transmission | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Automatic | Manual | ||||||||||
Coef. | 2.5 % | 97.5 % | Coef. | 2.5 % | 97.5 % | Coef. | 2.5 % | 97.5 % | |||
(Intercept) | 27.3 | 9.6 | 45.1 | 21.8 | -1.9 | 45.5 | 13.3 | -21.9 | 48.4 | ||
`Displacement (cu.in.)` | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -0.1 | 0.1 | ||
`Gross horsepower` | 0 | -0.1 | 0 | 0 | -0.1 | 0 | 0 | 0 | 0.1 | ||
`Weight (1000 lbs)` | -4.6 | -7.2 | -2 | -2.3 | -5 | 0.4 | -7.7 | -12.5 | -2.9 | ||
`1/4 mile time` | 0.5 | -0.4 | 1.5 | 0.4 | -0.7 | 1.6 | 1.6 | -0.2 | 3.4 |
Here we use truncated dataset with data from product test of two samples of chocolate sweets. 150 respondents tested two kinds of sweets (codenames: VSX123 and SDF546). Sample was divided into two groups (cells) of 75 respondents in each group. In cell 1 product VSX123 was presented first and then SDF546. In cell 2 sweets were presented in reversed order. Questions about respondent impressions about first product are in the block A (and about second tested product in the block B). At the end of the questionnaire there was a question about the preferences between sweets.
List of variables:
id
Respondent Idcell
First tested product (cell number)s2a
Agea1_1-a1_6
What did you like in these sweets? Multiple response. First tested producta22
Overall quality. First tested productb1_1-b1_6
What did you like in these sweets? Multiple response. Second tested productb22
Overall quality. Second tested productc1
Preferencesdata(product_test)
w = product_test # shorter name to save some keystrokes
# here we recode variables from first/second tested product to separate variables for each product according to their cells
# 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample
# also we recode preferences from first/second product to true names
# for first cell there are no changes, for second cell we should exchange 1 and 2.
w = w %>%
do_if(cell == 1, {
recode(a1_1 %to% a1_6, other ~ copy) %into% (h1_1 %to% h1_6)
recode(b1_1 %to% b1_6, other ~ copy) %into% (p1_1 %to% p1_6)
recode(a22, other ~ copy) %into% h22
recode(b22, other ~ copy) %into% p22
c1r = c1
}) %>%
do_if(cell == 2, {
recode(a1_1 %to% a1_6, other ~ copy) %into% (p1_1 %to% p1_6)
recode(b1_1 %to% b1_6, other ~ copy) %into% (h1_1 %to% h1_6)
recode(a22, other ~ copy) %into% p22
recode(b22, other ~ copy) %into% h22
recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) %into% c1r
}) %>%
compute({
# recode age by groups
age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
# count number of likes
# codes 2 and 99 are ignored.
h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6)
p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6)
})
# here we prepare labels for future usage
codeframe_likes = num_lab("
1 Liked everything
2 Disliked everything
3 Chocolate
4 Appearance
5 Taste
6 Stuffing
7 Nuts
8 Consistency
98 Other
99 Hard to answer
")
overall_liking_scale = num_lab("
1 Extremely poor
2 Very poor
3 Quite poor
4 Neither good, nor poor
5 Quite good
6 Very good
7 Excellent
")
w = apply_labels(w,
c1r = "Preferences",
c1r = num_lab("
1 VSX123
2 SDF456
3 Hard to say
"),
age_cat = "Age",
age_cat = c("18 - 25" = 1, "26 - 35" = 2),
h1_1 = "Likes. VSX123",
p1_1 = "Likes. SDF456",
h1_1 = codeframe_likes,
p1_1 = codeframe_likes,
h_likes = "Number of likes. VSX123",
p_likes = "Number of likes. SDF456",
h22 = "Overall quality. VSX123",
p22 = "Overall quality. SDF456",
h22 = overall_liking_scale,
p22 = overall_liking_scale
)
cro(w$c1r) %>% htmlTable(caption = "Distribution of preferences." )
Distribution of preferences. | |
#Total | |
---|---|
Preferences | |
VSX123 | 94 |
SDF456 | 50 |
Hard to say | 6 |
#Total cases | 150 |
# 'na_if(c1r, 3)' remove 'hard to say' from vector
w %>% calculate(c1r %>% na_if(3) %>% table %>% chisq.test)
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 13.444, df = 1, p-value = 0.0002457
Further we calculate answers distribution of survey questions.
w %>%
tab_cols(total(), age_cat) %>%
tab_cells(c1r) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
htmlTable(caption = "Preferences")
Preferences | ||||
#Total | Age | |||
---|---|---|---|---|
18 - 25 | 26 - 35 | |||
Preferences | ||||
VSX123 | 62.7 | 65.7 | 60 | |
SDF456 | 33.3 | 31.4 | 35 | |
Hard to say | 4 | 2.9 | 5 | |
#Total cases | 150 | 70 | 80 |
w %>%
tab_cols(total(), age_cat, c1r) %>%
tab_cells(h22) %>%
tab_stat_mean(label = "<b><u>Mean</u></b>") %>%
tab_stat_cpct() %>%
tab_cells(p22) %>%
tab_stat_mean(label = "<b><u>Mean</u></b>") %>%
tab_stat_cpct() %>%
tab_pivot() %>%
htmlTable(caption = "Overall liking")
Overall liking | ||||||||
#Total | Age | Preferences | ||||||
---|---|---|---|---|---|---|---|---|
18 - 25 | 26 - 35 | VSX123 | SDF456 | Hard to say | ||||
Overall quality. VSX123 | ||||||||
Mean | 5.5 | 5.4 | 5.6 | 5.3 | 5.8 | 5.5 | ||
Extremely poor | ||||||||
Very poor | ||||||||
Quite poor | 2 | 2.9 | 1.2 | 3.2 | ||||
Neither good, nor poor | 10.7 | 11.4 | 10 | 14.9 | 2 | 16.7 | ||
Quite good | 39.3 | 45.7 | 33.8 | 40.4 | 38 | 33.3 | ||
Very good | 33.3 | 24.3 | 41.2 | 30.9 | 38 | 33.3 | ||
Excellent | 14.7 | 15.7 | 13.8 | 10.6 | 22 | 16.7 | ||
#Total cases | 150 | 70 | 80 | 94 | 50 | 6 | ||
Overall quality. SDF456 | ||||||||
Mean | 5.4 | 5.3 | 5.4 | 5.4 | 5.3 | 5.7 | ||
Extremely poor | ||||||||
Very poor | 0.7 | 1.2 | 1.1 | |||||
Quite poor | 2.7 | 4.3 | 1.2 | 2.1 | 4 | |||
Neither good, nor poor | 16.7 | 20 | 13.8 | 18.1 | 14 | 16.7 | ||
Quite good | 31.3 | 27.1 | 35 | 28.7 | 38 | 16.7 | ||
Very good | 35.3 | 35.7 | 35 | 35.1 | 34 | 50 | ||
Excellent | 13.3 | 12.9 | 13.8 | 14.9 | 10 | 16.7 | ||
#Total cases | 150 | 70 | 80 | 94 | 50 | 6 |
w %>%
tab_cols(total(), age_cat, c1r) %>%
tab_cells(h_likes) %>%
tab_stat_mean() %>%
tab_cells(mrset(h1_1 %to% h1_6)) %>%
tab_stat_cpct() %>%
tab_cells(p_likes) %>%
tab_stat_mean() %>%
tab_cells(mrset(p1_1 %to% p1_6)) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
htmlTable(caption = "Likes")
Likes | ||||||||
#Total | Age | Preferences | ||||||
---|---|---|---|---|---|---|---|---|
18 - 25 | 26 - 35 | VSX123 | SDF456 | Hard to say | ||||
Number of likes. VSX123 | ||||||||
Mean | 2 | 2 | 2.1 | 1.9 | 2.2 | 2.3 | ||
Likes. VSX123 | ||||||||
Liked everything | ||||||||
Disliked everything | 3.3 | 1.4 | 5 | 4.3 | 2 | |||
Chocolate | 34 | 38.6 | 30 | 35.1 | 32 | 33.3 | ||
Appearance | 29.3 | 21.4 | 36.2 | 25.5 | 38 | 16.7 | ||
Taste | 32 | 38.6 | 26.2 | 23.4 | 48 | 33.3 | ||
Stuffing | 27.3 | 20 | 33.8 | 28.7 | 26 | 16.7 | ||
Nuts | 66.7 | 72.9 | 61.3 | 69.1 | 60 | 83.3 | ||
Consistency | 12 | 4.3 | 18.8 | 8.5 | 14 | 50 | ||
Other | ||||||||
Hard to answer | ||||||||
#Total cases | 150 | 70 | 80 | 94 | 50 | 6 | ||
Number of likes. SDF456 | ||||||||
Mean | 2 | 2 | 2.1 | 2 | 2 | 2 | ||
Likes. SDF456 | ||||||||
Liked everything | ||||||||
Disliked everything | 1.3 | 1.4 | 1.2 | 2.1 | ||||
Chocolate | 32 | 27.1 | 36.2 | 29.8 | 34 | 50 | ||
Appearance | 32 | 35.7 | 28.7 | 34 | 30 | 16.7 | ||
Taste | 39.3 | 42.9 | 36.2 | 36.2 | 44 | 50 | ||
Stuffing | 27.3 | 24.3 | 30 | 31.9 | 20 | 16.7 | ||
Nuts | 61.3 | 60 | 62.5 | 58.5 | 68 | 50 | ||
Consistency | 10 | 5.7 | 13.8 | 11.7 | 6 | 16.7 | ||
Other | 0.7 | 1.2 | 1.1 | |||||
Hard to answer | ||||||||
#Total cases | 150 | 70 | 80 | 94 | 50 | 6 |
w %>%
tab_cols(total(label = "#Total| |"), c1r) %>%
tab_cells(list(unvr(mrset(h1_1 %to% h1_6)))) %>%
tab_stat_cpct(label = var_lab(h1_1)) %>%
tab_cells(list(unvr(mrset(p1_1 %to% p1_6)))) %>%
tab_stat_cpct(label = var_lab(p1_1)) %>%
tab_pivot(stat_position = "inside_columns") %>%
htmlTable(caption = "Likes - side by side comparison")
Likes - side by side comparison | |||||||||||
#Total | Preferences | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
VSX123 | SDF456 | Hard to say | |||||||||
Likes. VSX123 | Likes. SDF456 | Likes. VSX123 | Likes. SDF456 | Likes. VSX123 | Likes. SDF456 | Likes. VSX123 | Likes. SDF456 | ||||
Liked everything | |||||||||||
Disliked everything | 3.3 | 1.3 | 4.3 | 2.1 | 2 | ||||||
Chocolate | 34 | 32 | 35.1 | 29.8 | 32 | 34 | 33.3 | 50 | |||
Appearance | 29.3 | 32 | 25.5 | 34 | 38 | 30 | 16.7 | 16.7 | |||
Taste | 32 | 39.3 | 23.4 | 36.2 | 48 | 44 | 33.3 | 50 | |||
Stuffing | 27.3 | 27.3 | 28.7 | 31.9 | 26 | 20 | 16.7 | 16.7 | |||
Nuts | 66.7 | 61.3 | 69.1 | 58.5 | 60 | 68 | 83.3 | 50 | |||
Consistency | 12 | 10 | 8.5 | 11.7 | 14 | 6 | 50 | 16.7 | |||
Other | 0.7 | 1.1 | |||||||||
Hard to answer | |||||||||||
#Total cases | 150 | 150 | 94 | 94 | 50 | 50 | 6 | 6 |
We can save labelled dataset as *.csv file with accompanying R code for labelling.
write_labelled_csv(w, file filename = "product_test.csv")
Or, we can save dataset as *.csv file with SPSS syntax to read data and apply labels.
write_labelled_spss(w, file filename = "product_test.csv")
Here we demonstrate limited labels support in base R - value labels automatically used as factors levels. So every function which converts labelled variable to factor will utilize labels. Note that variables labels is not supported in such conversions.
with(mtcars, table(am, vs)) %>% knitr::kable()
V-engine | Straight engine | |
---|---|---|
Automatic | 12 | 7 |
Manual | 6 | 7 |
boxplot(mpg ~ am, data = mtcars)
Excel toy table:
A | B | C | |
---|---|---|---|
1 | 2 | 15 | 50 |
2 | 1 | 70 | 80 |
3 | 3 | 30 | 40 |
4 | 2 | 30 | 40 |
Code for creating the same table in R:
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40"
)
w
is the name of our table.
Excel: IF(B1>60, 1, 0)
R: Here we create new column with name d
with results. ifelse
function is from base R not from ‘expss’ package but included here for completeness.
w$d = ifelse(w$b>60, 1, 0)
If we need to use multiple transformations it is often convenient to use compute
function. Inside compute
we can put arbitrary number of statements:
w = compute(w, {
d = ifelse(b>60, 1, 0)
e = 42
abc_sum = sum_row(a, b, c)
abc_mean = mean_row(a, b, c)
})
Count 1’s in entire dataset.
Excel: COUNTIF(A1:C4, 1)
R:
count_if(1, w)
or
calculate(w, count_if(1, a, b, c))
Count values greater than 1 in each row of dataset.
Excel: COUNTIF(A1:C1, ">1")
R:
w$d = count_row_if(gt(1), w)
or
w = compute(w, {
d = count_row_if(gt(1), a, b, c)
})
Count values less than or equal to 1 in column A of dataset.
Excel: COUNTIF(A1:A4, "<=1")
R:
count_col_if(le(1), w$a)
Table of criteria:
Excel | R |
---|---|
“<1” | lt(1) |
“<=1” | le(1) |
“<>1” | ne(1) |
“=1” | eq(1) |
“>=1” | ge(1) |
“>1” | gt(1) |
Sum all values in dataset.
Excel: SUM(A1:C4)
R:
sum(w, na.rm = TRUE)
Calculate average of each row of dataset.
Excel: AVERAGE(A1:C1)
R:
w$d = mean_row(w)
or
w = compute(w, {
d = mean_row(a, b, c)
})
Sum values of column A
of dataset.
Excel: SUM(A1:A4)
R:
sum_col(w$a)
Sum values greater than 40 in entire dataset.
Excel: SUMIF(A1:C4, ">40")
R:
sum_if(gt(40), w)
or
calculate(w, sum_if(gt(40), a, b, c))
Sum values less than 40 in each row of dataset.
Excel: SUMIF(A1:C1, "<40")
R:
w$d = sum_row_if(lt(40), w)
or
w = compute(w, {
d = sum_row_if(lt(40), a, b, c)
})
Calculate average of B
column with column A
values less than 3.
Excel: AVERAGEIF(A1:A4, "<3", B1:B4)
R:
mean_col_if(lt(3), w$a, data = w$b)
or, if we want calculate means for both b
and c
columns:
calculate(w, mean_col_if(lt(3), a, data = dtfrm(b, c)))
Our dictionary for lookup:
X | Y | |
---|---|---|
1 | 1 | apples |
2 | 2 | oranges |
3 | 3 | peaches |
Code for creating the same dictionary in R:
dict = read.csv(text = "
x,y
1,apples
2,oranges
3,peaches",
stringsAsFactors = FALSE
)
Excel: VLOOKUP(A1, $X$1:$Y$3, 2, FALSE)
R:
w$d = vlookup(w$a, dict, 2)
or, we can use column names:
w$d = vlookup(w$a, dict, "y")
SPSS:
COMPUTE d = 1.
R:
w$d = 1
or, in specific data.frame
w = compute(w, {
d = 1
})
There can be arbitrary number of statements inside compute
.
SPSS:
IF(a = 3) d = 2.
R:
Default dataset should be already predefined as in previous example.
w = compute(w, {
d = ifelse(a == 3, 2, NA)
})
or,
w = compute(w, {
d = ifs(a == 3 ~ 2)
})
SPSS:
DO IF (a>1).
COMPUTE d = 4.
END IF.
R:
w = do_if(w, a>1, {
d = 4
})
There can be arbitrary number of statements inside do_if
.
SPSS:
COUNT cnt = a1 TO a5 (LO THRU HI).
R:
cnt = count_row_if(lo %thru% hi, a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5 (SYSMIS).
R:
cnt = count_row_if(NA, a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5 (1 THRU 5).
R:
cnt = count_row_if(1 %thru% 5, a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5 (1 THRU HI).
R:
cnt = count_row_if(1 %thru% hi, a1 %to% a5)
or,
cnt = count_row_if(ge(1), a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5 (LO THRU 1).
R:
cnt = count_row_if(lo %thru% 1, a1 %to% a5)
or,
cnt = count_row_if (le(1), a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5 (1 THRU 5, 99).
R:
cnt = count_row_if(1 %thru% 5 | 99, a1 %to% a5)
SPSS:
COUNT cnt = a1 TO a5(1,2,3,4,5, SYSMIS).
R:
cnt = count_row_if(c(1:5, NA), a1 %to% a5)
count_row_if
can be used with default dataset inside the compute
.
SPSS:
RECODE V1 (0=1) (1=0) (2, 3=-1) (9=9) (ELSE=SYSMIS)
R:
recode(v1) = c(0 ~ 1, 1 ~ 0, 2:3 ~ -1, 9 ~ 9, other ~ NA)
SPSS:
RECODE QVAR(1 THRU 5=1)(6 THRU 10=2)(11 THRU HI=3)(ELSE=0).
R:
recode(qvar) = c(1 %thru% 5 ~ 1, 6 %thru% 10 ~ 2, 11 %thru% hi ~ 3, other ~ 0)
SPSS:
RECODE STRNGVAR ('A', 'B', 'C'='A')('D', 'E', 'F'='B')(ELSE=' ').
R:
recode(strngvar) = c(c('A', 'B', 'C') ~ 'A', c('D', 'E', 'F') ~ 'B', other ~ ' ')
SPSS:
RECODE AGE (MISSING=9) (18 THRU HI=1) (0 THRU 18=0) INTO VOTER.
R:
voter = recode(age, NA ~ 9, 18 %thru% hi ~ 1, 0 %thru% 18 ~ 0)
# or
recode(age, NA ~ 9, 18 %thru% hi ~ 1, 0 %thru% 18 ~ 0) %into% voter
recode
can be used inside the compute
.
SPSS:
VARIABLE LABELS a "Fruits"
b "Cost"
c "Price".
R:
w = apply_labels(w,
a = "Fruits",
b = "Cost",
c = "Price"
)
SPSS:
VALUE LABELS a
1 "apples"
2 "oranges"
3 "peaches".
R:
w = apply_labels(w,
a = num_lab("
1 apples
2 oranges
3 peaches
")
)
or,
val_lab(w$a) = num_lab("
1 apples
2 oranges
3 peaches
")
R:
fre(w$a) # Frequency of fruits
Fruits | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
---|---|---|---|---|---|
apples | 1 | 25 | 25 | 25 | 25 |
oranges | 2 | 50 | 50 | 50 | 75 |
peaches | 1 | 25 | 25 | 25 | 100 |
#Total | 4 | 100 | 100 | 100 | |
|
0 | 0 |
cro_cpct(w$b, w$a) # Column percent of cost by fruits
Fruits | |||
---|---|---|---|
apples | oranges | peaches | |
Cost | |||
15 | 50 | ||
30 | 50 | 100 | |
70 | 100 | ||
#Total cases | 1 | 2 | 1 |
cro_mean(dtfrm(w$b, w$c), w$a) # Mean cost and price by fruits
Fruits | |||
---|---|---|---|
apples | oranges | peaches | |
Cost | 70 | 22.5 | 30 |
Price | 80 | 45 | 40 |