Tables with labels in R

2017-04-10

Introduction

The expss package provides tabulation functions with support for SPSS-style labels, multiple/nested banners, weights and multiple-response variables. Additionally, it offers useful functions for the data processing workflow in social/marketing research surveys: popular data transformation functions from SPSS Statistics (RECODE, COUNT, COMPUTE, DO IF, etc.) and Excel (COUNTIF, VLOOKUP, etc.). Proper methods for labelled variables add value label support to base R and other packages. The package aims to help people move data processing from ‘Excel’/‘SPSS’ to R. See the examples below. You can get help on any function by typing ?function_name in the R console.

Installation

expss is on CRAN, so you can install it by typing install.packages("expss") in the console.

Cross-tabulation examples

We will use the well-known mtcars dataset for demonstration. Let’s start by adding labels to the dataset. Then we can continue with creating tables.

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

For quick cross-tabulation there are the fre and cro families of functions. For simplicity, we demonstrate only cro_cpct here, which calculates column percentages. Documentation for the other functions, such as cro_cases for counts, cro_rpct for row percentages, cro_tpct for table percentages and cro_fun for custom summary functions, can be seen by typing ?cro and ?cro_fun in the console.

# 'cro' examples
# multiple banners
mtcars %>% 
    calculate(cro_cpct(cyl, list(total(), am, vs))) %>% 
    htmlTable(caption = "Table with multiple banners (column %).")
Table with multiple banners (column %).
                        #Total  Transmission        Engine
                                Automatic  Manual   V-engine  Straight engine
Number of cylinders
  4                     34.4    15.8       61.5     5.6       71.4
  6                     21.9    21.1       23.1     16.7      28.6
  8                     43.8    63.2       15.4     77.8
  #Total cases          32      19         13       18        14
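To make "column percent" concrete, here is a minimal base R cross-check (a sketch, not how expss computes it): each cell is its count divided by the column total, times 100. It reloads the unlabelled mtcars, so am appears as the raw codes 0 (automatic) and 1 (manual) rather than as labels.

```r
data(mtcars)

# Cross-tabulate cylinders (rows) by transmission (columns; 0 = automatic, 1 = manual)
counts = table(cyl = mtcars$cyl, am = mtcars$am)

# margin = 2 divides each cell by its column total, so every column sums to 100
col_pct = round(prop.table(counts, margin = 2) * 100, 1)
print(col_pct)

# Column totals correspond to the "#Total cases" row
print(colSums(counts))
```

The rounded values should agree with the Automatic and Manual columns of the table above.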
# nested banners          
mtcars %>% 
    calculate(cro_cpct(cyl, list(total(), am %nest% vs))) %>% 
    htmlTable(caption = "Table with nested banners (column %).")         

Table with nested banners (column %).
                        #Total  Transmission
                                Automatic                  Manual
                                Engine                     Engine
                                V-engine  Straight engine  V-engine  Straight engine
Number of cylinders
  4                     34.4              42.9             16.7      100
  6                     21.9              57.1             50
  8                     43.8    100                        33.3
  #Total cases          32      12        7                6         7
There is also a more sophisticated interface for table construction with magrittr piping. Its documentation can be seen via ?tables.

mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(), am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot() %>% 
    htmlTable(caption = "Table with summary statistics. Statistics labels in rows.")
Table with summary statistics. Statistics labels in rows.
                        #Total  Transmission
                                Automatic  Manual
Miles/(US) gallon
  Mean                  20.1    17.1       24.4
  Std. dev.             6       3.8        6.2
  Valid N               32      19         13
Displacement (cu.in.)
  Mean                  230.7   290.4      143.5
  Std. dev.             123.9   110.2      87.2
  Valid N               32      19         13
Gross horsepower
  Mean                  146.7   160.3      126.8
  Std. dev.             68.6    53.9       84.1
  Valid N               32      19         13
Weight (1000 lbs)
  Mean                  3.2     3.8        2.4
  Std. dev.             1       0.8        0.6
  Valid N               32      19         13
1/4 mile time
  Mean                  17.8    18.2       17.4
  Std. dev.             1.8     1.8        1.8
  Valid N               32      19         13
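As a rough base R check of the Miles/(US) gallon block above, the sketch below computes the ordinary mean, standard deviation and number of non-missing values of mpg per transmission group. It assumes that w_mean, w_sd and w_n reduce to these unweighted statistics when no weight is supplied (the figures above are consistent with that); describe is a hypothetical helper name.

```r
data(mtcars)

# Mean, standard deviation and number of non-missing values of a vector
describe = function(x) {
    c(Mean = mean(x, na.rm = TRUE),
      SD = sd(x, na.rm = TRUE),
      N = sum(!is.na(x)))
}

# One column per transmission group: "0" = automatic, "1" = manual
mpg_by_am = sapply(split(mtcars$mpg, mtcars$am), describe)
print(round(mpg_by_am, 1))

# The "#Total" column: the same statistics for all 32 cars
print(round(describe(mtcars$mpg), 1))
```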
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(total(label = "#Total| |"), am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot() %>% 
    htmlTable(caption = "Table with the same summary statistics. Statistics labels in columns.")
Table with the same summary statistics. Statistics labels in columns.
                       #Total                      Transmission
                                                   Automatic                   Manual
                       Mean   Std. dev.  Valid N   Mean   Std. dev.  Valid N   Mean   Std. dev.  Valid N
Miles/(US) gallon      20.1   6          32        17.1   3.8        19        24.4   6.2        13
Displacement (cu.in.)  230.7  123.9      32        290.4  110.2      19        143.5  87.2       13
Gross horsepower       146.7  68.6       32        160.3  53.9       19        126.8  84.1       13
Weight (1000 lbs)      3.2    1          32        3.8    0.8        19        2.4    0.6        13
1/4 mile time          17.8   1.8        32        18.2   1.8        19        17.4   1.8        13
mtcars %>%
    tab_cols(total(), vs) %>%
    tab_cells(mpg) %>% 
    tab_stat_mean() %>% 
    tab_stat_valid_n() %>% 
    tab_cells(am) %>%
    tab_stat_cpct(total_row_position = "none", label = "col %") %>%
    tab_stat_rpct(total_row_position = "none", label = "row %") %>%
    tab_stat_tpct(total_row_position = "none", label = "table %") %>%
    tab_pivot(stat_position = "inside_rows") %>% 
    htmlTable(caption = "Different statistics for different variables.")
Different statistics for different variables.
                        #Total  Engine
                                V-engine  Straight engine
Miles/(US) gallon
  Mean                  20.1    16.6      24.6
  Valid N               32      18        14
Transmission
  Automatic  col %      59.4    66.7      50
             row %      100     63.2      36.8
             table %    59.4    37.5      21.9
  Manual     col %      40.6    33.3      50
             row %      100     46.2      53.8
             table %    40.6    18.8      21.9
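Row and table percentages differ from column percentages only in the denominator. A minimal base R sketch for the Transmission by Engine part of the table above (not expss code; it uses the raw 0/1 codes of the unlabelled mtcars):

```r
data(mtcars)

# Transmission (rows: 0 = automatic, 1 = manual) by engine (columns: 0 = V, 1 = straight)
counts = table(am = mtcars$am, vs = mtcars$vs)

# Row %: each cell divided by its row total
row_pct = round(prop.table(counts, margin = 1) * 100, 1)

# Table %: each cell divided by the grand total of 32 cars
table_pct = round(prop.table(counts) * 100, 1)

print(row_pct)
print(table_pct)
```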
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot() %>% 
    htmlTable(caption = "Table with split by rows and with custom totals.")
Table with split by rows and with custom totals.
                          #Total  Engine
                                  V-engine  Straight engine
Transmission
  Automatic
    Number of cylinders
      #number of cases    19      12        7
      #row %              100     63.2      36.8
      4                   15.8              42.9
      6                   21.1              57.1
      8                   63.2    100
  Manual
    Number of cylinders
      #number of cases    13      6         7
      #row %              100     46.2      53.8
      4                   61.5    16.7      100
      6                   23.1    50
      8                   15.4    33.3
mtcars %>% 
    tab_cells(dtfrm(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(label = "#Total| |"), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = names(x)[1])
            model = lm(frm, data = x)
            dtfrm('Coef.' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() %>% 
    htmlTable(caption = "Linear regression by groups.")
Linear regression by groups.
                         #Total                Transmission
                                               Automatic             Manual
                         Coef.  2.5 %  97.5 %  Coef.  2.5 %  97.5 %  Coef.  2.5 %  97.5 %
(Intercept)              27.3   9.6    45.1    21.8   -1.9   45.5    13.3   -21.9  48.4
`Displacement (cu.in.)`  0      0      0       0      0      0       0      -0.1   0.1
`Gross horsepower`       0      -0.1   0       0      -0.1   0       0      0      0.1
`Weight (1000 lbs)`      -4.6   -7.2   -2      -2.3   -5     0.4     -7.7   -12.5  -2.9
`1/4 mile time`          0.5    -0.4   1.5     0.4    -0.7   1.6     1.6    -0.2   3.4
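The same "one model per group" idea can be sketched in base R with split() and lapply(). This is an illustration under our own assumptions, not what tab_stat_fun_df does internally: the formula is written out explicitly instead of being built with reformulate(), so the coefficient names are the plain column names rather than the labels shown above, and fit_group is a hypothetical helper name.

```r
data(mtcars)

# Fit mpg on four predictors and return coefficients with 95% confidence intervals
fit_group = function(d) {
    model = lm(mpg ~ disp + hp + wt + qsec, data = d)
    cbind(Coef. = coef(model), confint(model))
}

# One fit per transmission group ("0" = automatic, "1" = manual) ...
by_am = lapply(split(mtcars, mtcars$am), fit_group)
# ... plus the "#Total" fit on all 32 cars
results = c(list(Total = fit_group(mtcars)), by_am)
print(lapply(results, round, 1))
```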

Example of data processing with multiple-response variables

Here we use a truncated dataset with data from a product test of two samples of chocolate sweets. 150 respondents tested two kinds of sweets (codenames: VSX123 and SDF456). The sample was divided into two groups (cells) of 75 respondents each. In cell 1, product VSX123 was presented first and then SDF456. In cell 2, the sweets were presented in reverse order. Questions about respondents' impressions of the first product are in block A (and of the second tested product in block B). At the end of the questionnaire there was a question about the preference between the sweets.

List of variables:

data(product_test)

w = product_test # shorter name to save some keystrokes

# here we recode variables from first/second tested product to separate variables for each product according to their cells
# 'h' variables - VSX123 sample, 'p' variables - 'SDF456' sample
# also we recode preferences from first/second product to true names
# for first cell there are no changes, for second cell we should exchange 1 and 2.
w = w %>% 
    do_if(cell == 1, {
        recode(a1_1 %to% a1_6, other ~ copy) %into% (h1_1 %to% h1_6)
        recode(b1_1 %to% b1_6, other ~ copy) %into% (p1_1 %to% p1_6)
        recode(a22, other ~ copy) %into% h22
        recode(b22, other ~ copy) %into% p22
        c1r = c1
    }) %>% 
    do_if(cell == 2, {
        recode(a1_1 %to% a1_6, other ~ copy) %into% (p1_1 %to% p1_6)
        recode(b1_1 %to% b1_6, other ~ copy) %into% (h1_1 %to% h1_6)
        recode(a22, other ~ copy) %into% p22
        recode(b22, other ~ copy) %into% h22
        recode(c1, 1 ~ 2, 2 ~ 1, other ~ copy) %into% c1r
    }) %>% 
    compute({
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
        # count number of likes
        # codes 2 and 99 are ignored.
        h_likes = count_row_if(1 | 3 %thru% 98, h1_1 %to% h1_6) 
        p_likes = count_row_if(1 | 3 %thru% 98, p1_1 %to% p1_6) 
    })
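The trickiest step above is swapping preference codes 1 and 2 for cell 2. A toy base R version of the same logic follows; the data frame toy and its values are hypothetical, made up for illustration.

```r
# Hypothetical toy data: 'cell' is the presentation-order group, 'c1' the preference
# as asked (1 = first product tasted, 2 = second product, 3 = hard to say)
toy = data.frame(cell = c(1, 1, 2, 2, 2), c1 = c(1, 2, 1, 2, 3))

# Cell 1 tasted VSX123 first, so its codes already mean 1 = VSX123, 2 = SDF456.
# Cell 2 tasted the products in reverse order, so 1 and 2 are swapped (3 - code);
# any other code (here 3 = hard to say) is copied unchanged.
toy$c1r = ifelse(toy$cell == 2 & toy$c1 %in% c(1, 2), 3 - toy$c1, toy$c1)
print(toy)
```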

# here we prepare labels for future usage
codeframe_likes = num_lab("
    1 Liked everything
    2 Disliked everything
    3 Chocolate
    4 Appearance
    5 Taste
    6 Stuffing
    7 Nuts
    8 Consistency
    98 Other
    99 Hard to answer
")

overall_liking_scale = num_lab("
    1 Extremely poor 
    2 Very poor
    3 Quite poor
    4 Neither good, nor poor
    5 Quite good
    6 Very good
    7 Excellent
")

w = apply_labels(w, 
    c1r = "Preferences",
    c1r = num_lab("
        1 VSX123 
        2 SDF456
        3 Hard to say
    "),
    
    age_cat = "Age",
    age_cat = c("18 - 25" = 1, "26 - 35" = 2),
    
    h1_1 = "Likes. VSX123",
    p1_1 = "Likes. SDF456",
    h1_1 = codeframe_likes,
    p1_1 = codeframe_likes,
    
    h_likes = "Number of likes. VSX123",
    p_likes = "Number of likes. SDF456",
    
    h22 = "Overall quality. VSX123",
    p22 = "Overall quality. SDF456",
    h22 = overall_liking_scale,
    p22 = overall_liking_scale
)


cro(w$c1r) %>% htmlTable(caption = "Distribution of preferences." )

Distribution of preferences.
                #Total
Preferences
  VSX123        94
  SDF456        50
  Hard to say   6
  #Total cases  150
Is there a significant difference between the preferences for VSX123 and SDF456? Yes, the difference is significant.

# 'na_if(c1r, 3)' removes 'Hard to say' from the vector
w %>% calculate(c1r %>% na_if(3) %>% table %>% chisq.test) 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 13.444, df = 1, p-value = 0.0002457
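With "Hard to say" removed, the test compares the 94 and 50 answers against equal expected counts of (94 + 50) / 2 = 72, so X-squared = (94 - 72)^2 / 72 + (50 - 72)^2 / 72 = 968 / 72 ≈ 13.444. A base R sketch reproducing this from the counts in the distribution table above:

```r
# Preference counts from the distribution table ("Hard to say" excluded)
prefs = c(VSX123 = 94, SDF456 = 50)

# For a plain vector, chisq.test tests against equal probabilities by default
res = chisq.test(prefs)

# The same statistic by hand: sum of (observed - expected)^2 / expected
expected = rep(sum(prefs) / length(prefs), length(prefs))
x2_manual = sum((prefs - expected)^2 / expected)

print(res)
print(x2_manual)
```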

Next, we calculate the distribution of answers to the survey questions.

w %>% 
    tab_cols(total(), age_cat) %>% 
    tab_cells(c1r) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    htmlTable(caption = "Preferences")
Preferences
                #Total  Age
                        18 - 25  26 - 35
Preferences
  VSX123        62.7    65.7     60
  SDF456        33.3    31.4     35
  Hard to say   4       2.9      5
  #Total cases  150     70       80
w %>% 
    tab_cols(total(), age_cat, c1r) %>% 
    tab_cells(h22) %>% 
    tab_stat_mean(label = "<b><u>Mean</u></b>") %>% 
    tab_stat_cpct() %>% 
    tab_cells(p22) %>% 
    tab_stat_mean(label = "<b><u>Mean</u></b>") %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    htmlTable(caption = "Overall liking")
Overall liking
                          #Total  Age                Preferences
                                  18 - 25  26 - 35  VSX123  SDF456  Hard to say
Overall quality. VSX123
  Mean                    5.5     5.4      5.6      5.3     5.8     5.5
  Extremely poor
  Very poor
  Quite poor              2       2.9      1.2      3.2
  Neither good, nor poor  10.7    11.4     10       14.9    2       16.7
  Quite good              39.3    45.7     33.8     40.4    38      33.3
  Very good               33.3    24.3     41.2     30.9    38      33.3
  Excellent               14.7    15.7     13.8     10.6    22      16.7
  #Total cases            150     70       80       94      50      6
Overall quality. SDF456
  Mean                    5.4     5.3      5.4      5.4     5.3     5.7
  Extremely poor
  Very poor               0.7              1.2      1.1
  Quite poor              2.7     4.3      1.2      2.1     4
  Neither good, nor poor  16.7    20       13.8     18.1    14      16.7
  Quite good              31.3    27.1     35       28.7    38      16.7
  Very good               35.3    35.7     35       35.1    34      50
  Excellent               13.3    12.9     13.8     14.9    10      16.7
  #Total cases            150     70       80       94      50      6
w %>% 
    tab_cols(total(), age_cat, c1r) %>% 
    tab_cells(h_likes) %>% 
    tab_stat_mean() %>% 
    tab_cells(mrset(h1_1 %to% h1_6)) %>% 
    tab_stat_cpct() %>% 
    tab_cells(p_likes) %>% 
    tab_stat_mean() %>% 
    tab_cells(mrset(p1_1 %to% p1_6)) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    htmlTable(caption = "Likes") 
Likes
                          #Total  Age                Preferences
                                  18 - 25  26 - 35  VSX123  SDF456  Hard to say
Number of likes. VSX123
  Mean                    2       2        2.1      1.9     2.2     2.3
Likes. VSX123
  Liked everything
  Disliked everything     3.3     1.4      5        4.3     2
  Chocolate               34      38.6     30       35.1    32      33.3
  Appearance              29.3    21.4     36.2     25.5    38      16.7
  Taste                   32      38.6     26.2     23.4    48      33.3
  Stuffing                27.3    20       33.8     28.7    26      16.7
  Nuts                    66.7    72.9     61.3     69.1    60      83.3
  Consistency             12      4.3      18.8     8.5     14      50
  Other
  Hard to answer
  #Total cases            150     70       80       94      50      6
Number of likes. SDF456
  Mean                    2       2        2.1      2       2       2
Likes. SDF456
  Liked everything
  Disliked everything     1.3     1.4      1.2      2.1
  Chocolate               32      27.1     36.2     29.8    34      50
  Appearance              32      35.7     28.7     34      30      16.7
  Taste                   39.3    42.9     36.2     36.2    44      50
  Stuffing                27.3    24.3     30       31.9    20      16.7
  Nuts                    61.3    60       62.5     58.5    68      50
  Consistency             10      5.7      13.8     11.7    6       16.7
  Other                   0.7              1.2      1.1
  Hard to answer
  #Total cases            150     70       80       94      50      6
w %>% 
    tab_cols(total(label = "#Total| |"), c1r) %>% 
    tab_cells(list(unvr(mrset(h1_1 %to% h1_6)))) %>% 
    tab_stat_cpct(label = var_lab(h1_1)) %>% 
    tab_cells(list(unvr(mrset(p1_1 %to% p1_6)))) %>% 
    tab_stat_cpct(label = var_lab(p1_1)) %>% 
    tab_pivot(stat_position = "inside_columns") %>% 
    htmlTable(caption = "Likes - side by side comparison")  
Likes - side by side comparison
                     #Total                        Preferences
                                                   VSX123                        SDF456                        Hard to say
                     Likes. VSX123  Likes. SDF456  Likes. VSX123  Likes. SDF456  Likes. VSX123  Likes. SDF456  Likes. VSX123  Likes. SDF456
Liked everything
Disliked everything  3.3            1.3            4.3            2.1            2
Chocolate            34             32             35.1           29.8           32             34             33.3           50
Appearance           29.3           32             25.5           34             38             30             16.7           16.7
Taste                32             39.3           23.4           36.2           48             44             33.3           50
Stuffing             27.3           27.3           28.7           31.9           26             20             16.7           16.7
Nuts                 66.7           61.3           69.1           58.5           60             68             83.3           50
Consistency          12             10             8.5            11.7           14             6              50             16.7
Other                               0.7                           1.1
Hard to answer
#Total cases         150            150            94             94             50             50             6              6

We can save the labelled dataset as a *.csv file with accompanying R code for labelling.

write_labelled_csv(w, filename = "product_test.csv")

Or we can save the dataset as a *.csv file with SPSS syntax to read the data and apply labels.

write_labelled_spss(w, filename = "product_test.csv")

Labels support for base R

Here we demonstrate the limited label support in base R: value labels are automatically used as factor levels, so every function that converts a labelled variable to a factor will use the labels. Note that variable labels are not supported in such conversions.

with(mtcars, table(am, vs)) %>% knitr::kable()
           V-engine  Straight engine
Automatic  12        7
Manual     6         7
boxplot(mpg ~ am, data = mtcars)
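What happens in such conversions can be mimicked in base R with factor(): the value codes become the factor levels and the value labels become the level names. This sketches the effect only, not the expss implementation; am_labels is a hypothetical name.

```r
data(mtcars)

# Hypothetical value-label set, mirroring the one applied to 'am' above: label = code
am_labels = c("Automatic" = 0, "Manual" = 1)

# Codes become factor levels; the label names become the level names
am_factor = factor(mtcars$am, levels = am_labels, labels = names(am_labels))
print(table(am_factor))
```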

Excel functions translation guide

Excel toy table:

   A  B   C
1  2  15  50
2  1  70  80
3  3  30  40
4  2  30  40

Code for creating the same table in R:

w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40"
)

w is the name of our table.

IF

Excel: IF(B1>60, 1, 0)

R: Here we create a new column named d with the results. The ifelse function is from base R, not from the ‘expss’ package, but it is included here for completeness.

w$d = ifelse(w$b>60, 1, 0)

If we need multiple transformations, it is often convenient to use the compute function. Inside compute we can put an arbitrary number of statements:

w = compute(w, {
    d = ifelse(b>60, 1, 0)
    e = 42
    abc_sum = sum_row(a, b, c)
    abc_mean = mean_row(a, b, c)
})
COUNTIF

Count 1s in the entire dataset.

Excel: COUNTIF(A1:C4, 1)

R:

count_if(1, w)

or

calculate(w, count_if(1, a, b, c))

Count values greater than 1 in each row of the dataset.

Excel: COUNTIF(A1:C1, ">1")

R:

w$d = count_row_if(gt(1), w)  

or

w = compute(w, {
    d = count_row_if(gt(1), a, b, c) 
})
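For comparison, a plain base R version of the row-wise count (a sketch, not part of expss): comparing the columns with 1 gives a logical matrix, and rowSums() counts the TRUE values in each row. The toy table is recreated so that the block runs on its own and contains only the columns a, b and c.

```r
# Toy table recreated so the block is self-contained
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40")

# Comparing the columns with 1 gives a logical matrix;
# rowSums counts the TRUE values in each row (NA cells are skipped)
w$d = rowSums(w[, c("a", "b", "c")] > 1, na.rm = TRUE)
print(w)
```

For the toy table this gives 3, 2, 3 and 3.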

Count values less than or equal to 1 in column A of the dataset.

Excel: COUNTIF(A1:A4, "<=1")

R:

count_col_if(le(1), w$a)

Table of criteria:

Excel   R
"<1"    lt(1)
"<=1"   le(1)
"<>1"   ne(1)
"=1"    eq(1)
">=1"   ge(1)
">1"    gt(1)
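One way to picture these criteria, as a hypothetical sketch rather than the expss implementation, is a function that takes a threshold and returns a predicate; a COUNTIF-like function then counts the values for which the predicate is TRUE. The names gt_sketch, le_sketch and count_if_sketch are made up for illustration.

```r
# Hypothetical criterion builders: each takes a threshold and returns a predicate
gt_sketch = function(y) function(x) x > y
le_sketch = function(y) function(x) x <= y

# Hypothetical COUNTIF-like helper: apply the predicate, count TRUE results, ignore NA
count_if_sketch = function(criterion, x) sum(criterion(x), na.rm = TRUE)

values = c(2, 1, 3, 2, NA)
n_gt = count_if_sketch(gt_sketch(1), values)  # 3: the values 2, 3 and 2
n_le = count_if_sketch(le_sketch(1), values)  # 1: the value 1
print(c(n_gt, n_le))
```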
SUM/AVERAGE

Sum all values in the dataset.

Excel: SUM(A1:C4)

R:

sum(w, na.rm = TRUE)

Calculate the average of each row of the dataset.

Excel: AVERAGE(A1:C1)

R:

w$d = mean_row(w)  

or

w = compute(w, {
    d = mean_row(a, b, c) 
})

Sum the values of column A of the dataset.

Excel: SUM(A1:A4)

R:

sum_col(w$a)
SUMIF/AVERAGEIF

Sum values greater than 40 in the entire dataset.

Excel: SUMIF(A1:C4, ">40")

R:

sum_if(gt(40), w)

or

calculate(w, sum_if(gt(40), a, b, c))

Sum values less than 40 in each row of the dataset.

Excel: SUMIF(A1:C1, "<40")

R:

w$d = sum_row_if(lt(40), w)  

or

w = compute(w, {
    d = sum_row_if(lt(40), a, b, c) 
})
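In base R, the row-wise SUMIF can be sketched by replacing values that fail the criterion with 0 and then summing each row. This is a comparison sketch, not expss code; the toy table is recreated so the block is self-contained.

```r
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40")

m = as.matrix(w[, c("a", "b", "c")])
# Keep values below 40, turn the rest into 0, then add up each row
w$d = rowSums(ifelse(m < 40, m, 0), na.rm = TRUE)
print(w)
```

For the toy table this gives 17, 1, 33 and 32.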

Calculate the average of column B for rows where column A is less than 3.

Excel: AVERAGEIF(A1:A4, "<3", B1:B4)

R:

mean_col_if(lt(3), w$a, data = w$b)

or, if we want to calculate means for both the b and c columns:

calculate(w, mean_col_if(lt(3), a, data = dtfrm(b, c)))
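The AVERAGEIF logic (a criterion on one column, averaging another) amounts to logical indexing in base R. A minimal sketch for comparison, with the toy table recreated:

```r
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40")

# Rows where the criterion column a is below 3
selected = w$a < 3

# Average of column b over the selected rows
avg_b = mean(w$b[selected], na.rm = TRUE)

# Averages of both b and c over the same rows
avg_bc = colMeans(w[selected, c("b", "c")], na.rm = TRUE)
print(avg_b)
print(avg_bc)
```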
VLOOKUP

Our dictionary for lookup:

   X  Y
1  1  apples
2  2  oranges
3  3  peaches

Code for creating the same dictionary in R:

dict = read.csv(text = "
x,y
1,apples
2,oranges
3,peaches",
stringsAsFactors = FALSE
)

Excel: VLOOKUP(A1, $X$1:$Y$3, 2, FALSE)

R:

w$d = vlookup(w$a, dict, 2)

or, we can use column names:

w$d = vlookup(w$a, dict, "y")
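An exact-match lookup like VLOOKUP(..., FALSE) can also be written in base R with match(), which returns the position of each key in the dictionary (NA when absent). A sketch for comparison, recreating both tables:

```r
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40")
dict = read.csv(text = "
x,y
1,apples
2,oranges
3,peaches",
stringsAsFactors = FALSE
)

# match() finds, for each value of w$a, the row of dict whose x equals it
# (exact match, like VLOOKUP with FALSE); unmatched values give NA
w$d = dict$y[match(w$a, dict$x)]
print(w)
```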

SPSS functions translation guide

COMPUTE

SPSS:

COMPUTE d = 1.

R:

w$d = 1

or, within a specific data.frame:

w = compute(w, {
    d = 1
})

There can be an arbitrary number of statements inside compute.

IF

SPSS:

IF(a = 3) d = 2.

R:

As in the previous example, the statements are evaluated inside compute, so the columns of w can be referred to by name.

w = compute(w, {
    d = ifelse(a == 3, 2, NA)
})

or,

w = compute(w, {
    d = ifs(a == 3 ~ 2)
})
DO IF

SPSS:

DO IF (a>1).
    COMPUTE d = 4.
END IF.

R:

w = do_if(w, a>1, {
    d = 4
})

There can be an arbitrary number of statements inside do_if.
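A base R sketch of the same DO IF, assuming d does not exist yet (a new SPSS variable starts as system-missing, here NA). which() skips rows where the condition is NA, much like SPSS skips cases whose DO IF condition is missing. How do_if treats an already existing d is not covered by this sketch.

```r
w = read.csv(text = "
a,b,c
2,15,50
1,70,80
3,30,40
2,30,40")

# Assumption: d is a new variable, so it starts as missing in every row
w$d = NA_real_
# Assign only in rows where the condition holds; which() drops rows where a > 1 is NA
w$d[which(w$a > 1)] = 4
print(w)
```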

COUNT

SPSS:

COUNT cnt = a1 TO a5 (LO THRU HI).

R:

cnt = count_row_if(lo %thru% hi, a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5 (SYSMIS).

R:

cnt = count_row_if(NA, a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5 (1 THRU 5).

R:

cnt = count_row_if(1 %thru% 5, a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5 (1 THRU HI).

R:

cnt = count_row_if(1 %thru% hi, a1 %to% a5)

or,

cnt = count_row_if(ge(1), a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5 (LO THRU 1).

R:

cnt = count_row_if(lo %thru% 1, a1 %to% a5)

or,

cnt = count_row_if(le(1), a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5 (1 THRU 5, 99).

R:

cnt = count_row_if(1 %thru% 5 | 99, a1 %to% a5)

SPSS:

COUNT cnt = a1 TO a5(1,2,3,4,5, SYSMIS).

R:

cnt = count_row_if(c(1:5, NA), a1 %to% a5)

The count_row_if calls above are meant to be evaluated inside compute, where the dataset's columns (here a1 to a5) can be referred to by name.
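To show what these COUNT specifications compute, here is a base R sketch on hypothetical data (the survey data frame and its values are made up): each cell is tested against the value list and the TRUE results are summed per row. It mirrors COUNT cnt = a1 TO a5 (1 THRU 5, 99) and COUNT cnt = a1 TO a5 (SYSMIS).

```r
# Hypothetical survey data with five items a1..a5
survey = data.frame(
    a1 = c(1, 6, NA),
    a2 = c(99, 2, 3),
    a3 = c(7, 5, NA),
    a4 = c(0, 1, 2),
    a5 = c(5, 99, 8)
)
m = as.matrix(survey[, paste0("a", 1:5)])

# (1 THRU 5, 99): values in [1, 5] or equal to 99
hits = (m >= 1 & m <= 5) | m == 99
# Missing cells give NA in 'hits'; na.rm = TRUE treats them as not counted
cnt = rowSums(hits, na.rm = TRUE)

# (SYSMIS): number of missing values per row
cnt_missing = rowSums(is.na(m))

print(cnt)
print(cnt_missing)
```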

RECODE

SPSS:

RECODE V1 (0=1) (1=0) (2, 3=-1) (9=9) (ELSE=SYSMIS).

R:

recode(v1) = c(0 ~ 1, 1 ~ 0, 2:3 ~ -1, 9 ~ 9, other ~ NA)

SPSS:

RECODE QVAR(1 THRU 5=1)(6 THRU 10=2)(11 THRU HI=3)(ELSE=0).

R:

recode(qvar) = c(1 %thru% 5 ~ 1, 6 %thru% 10 ~ 2, 11 %thru% hi ~ 3, other ~ 0)

SPSS:

RECODE STRNGVAR ('A', 'B', 'C'='A')('D', 'E', 'F'='B')(ELSE=' '). 

R:

recode(strngvar) = c(c('A', 'B', 'C') ~ 'A', c('D', 'E', 'F') ~ 'B', other ~ ' ')

SPSS:

RECODE AGE (MISSING=9) (18 THRU HI=1) (0 THRU 18=0) INTO VOTER. 

R:

voter = recode(age, NA ~ 9, 18 %thru% hi ~ 1, 0 %thru% 18 ~ 0)
# or
recode(age, NA ~ 9, 18 %thru% hi ~ 1, 0 %thru% 18 ~ 0) %into% voter

recode can also be used inside compute.
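In the AGE example the value 18 falls in both 18 THRU HI and 0 THRU 18; SPSS applies the first matching specification, so 18 becomes 1. The sketch below reproduces that order in base R with nested ifelse() calls. We assume expss's recode follows the same first-match rule (the age_cat recoding in the product test example relies on it), but check ?recode to be sure; the age values are hypothetical.

```r
# Hypothetical ages, including a missing value and the boundary value 18
age = c(17, 18, 45, NA, 0)

# Rules checked in SPSS order: MISSING first, then 18 THRU HI, then 0 THRU 18.
# Because the first matching rule wins, 18 becomes 1, not 0.
# Values matching no rule (e.g. negative ages) become NA.
voter = ifelse(is.na(age), 9,
        ifelse(age >= 18, 1,
        ifelse(age >= 0 & age <= 18, 0, NA)))
print(voter)
```

For these ages the result is 0, 1, 1, 9, 0.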

VARIABLE LABELS

SPSS:

VARIABLE LABELS a "Fruits"
                b "Cost"
                c "Price".

R:

w = apply_labels(w,
                 a = "Fruits",
                 b = "Cost",
                 c = "Price"
)
VALUE LABELS

SPSS:

VALUE LABELS a
    1 "apples"
    2 "oranges"
    3 "peaches". 

R:

w = apply_labels(w, 
                 a = num_lab("
                        1 apples
                        2 oranges
                        3 peaches 
                    ")
)

or,

val_lab(w$a) = num_lab("
    1 apples
    2 oranges
    3 peaches 
")
Tables

R:

fre(w$a) # Frequency of fruits
Fruits    Count  Valid percent  Percent  Responses, %  Cumulative responses, %
apples    1      25             25       25            25
oranges   2      50             50       50            75
peaches   1      25             25       25            100
#Total    4      100            100      100
<NA>      0                     0
cro_cpct(w$b, w$a) # Column percent of cost by fruits
                Fruits
                apples   oranges   peaches
Cost
  15                     50
  30                     50        100
  70            100
  #Total cases  1        2         1
cro_mean(dtfrm(w$b, w$c), w$a) # Mean cost and price by fruits
        Fruits
        apples   oranges   peaches
Cost    70       22.5      30
Price   80       45        40