Codebook example with manual labelling

Ruben Arslan

2019-01-08

In this vignette, you can see how to add metadata to a dataset when it isn’t already stored in its attributes. For this example, we’ll use the bfi and bfi.dictionary datasets from the psych package. The labelled package provides convenience functions for setting the relevant attributes.

knit_by_pkgdown <- !is.null(knitr::opts_chunk$get("fig.retina"))
library(dplyr)
library(codebook)
library(labelled)
pander::panderOptions("table.split.table", Inf)
ggplot2::theme_set(ggplot2::theme_bw())

data("bfi", package = 'psych')
bfi <- bfi %>% as_tibble() # tbl_df() is deprecated in current dplyr
data("bfi.dictionary", package = 'psych')
bfi.dictionary <- tibble::rownames_to_column(bfi.dictionary, "variable") %>% 
  as_tibble()

Let’s start by getting an overview of our dataset

head(bfi, 20)
## # A tibble: 20 x 28
##       A1    A2    A3    A4    A5    C1    C2    C3    C4    C5    E1    E2
##    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     2     4     3     4     4     2     3     3     4     4     3     3
##  2     2     4     5     2     5     5     4     4     3     4     1     1
##  3     5     4     5     4     4     4     5     4     2     5     2     4
##  4     4     4     6     5     5     4     4     3     5     5     5     3
##  5     2     3     3     4     5     4     4     5     3     2     2     2
##  6     6     6     5     6     5     6     6     6     1     3     2     1
##  7     2     5     5     3     5     5     4     4     2     3     4     3
##  8     4     3     1     5     1     3     2     4     2     4     3     6
##  9     4     3     6     3     3     6     6     3     4     5     5     3
## 10     2     5     6     6     5     6     5     6     2     1     2     2
## 11     4     4     5     6     5     4     3     5     3     2     1     3
## 12     2     5     5     5     5     5     4     5     4     5     3     3
## 13     5     5     5     6     4     5     4     3     2     2     3     3
## 14     5     5     5     6     6     4     4     4     2     1     2     2
## 15     4     5     2     2     1     5     5     5     2     2     3     4
## 16     4     3     6     6     3     5     5     5     3     5     1     1
## 17     4     6     6     2     5     4     4     4     4     4     1     2
## 18     5     5     5     4     5     5     5     5     4     3     2     2
## 19     4     4     5     4     3     5     4     5     4     6     1     2
## 20     4     4     6     5     5     1     1     1     5     6     1     1
## # … with 16 more variables: E3 <int>, E4 <int>, E5 <int>, N1 <int>,
## #   N2 <int>, N3 <int>, N4 <int>, N5 <int>, O1 <int>, O2 <int>, O3 <int>,
## #   O4 <int>, O5 <int>, gender <int>, education <int>, age <int>

and our data dictionary.

bfi.dictionary
## # A tibble: 28 x 8
##    variable ItemLabel Item         Giant3  Big6    Little12  Keying IPIP100
##    <chr>    <fct>     <fct>        <fct>   <fct>   <fct>      <int> <fct>  
##  1 A1       q_146     Am indiffer… Cohesi… Agreea… Compassi…     -1 B5:A   
##  2 A2       q_1162    Inquire abo… Cohesi… Agreea… Compassi…      1 B5:A   
##  3 A3       q_1206    Know how to… Cohesi… Agreea… Compassi…      1 B5:A   
##  4 A4       q_1364    Love childr… Cohesi… Agreea… Compassi…      1 B5:A   
##  5 A5       q_1419    Make people… Cohesi… Agreea… Compassi…      1 B5:A   
##  6 C1       q_124     Am exacting… Stabil… Consci… Orderlin…      1 B5:C   
##  7 C2       q_530     Continue un… Stabil… Consci… Orderlin…      1 B5:C   
##  8 C3       q_619     Do things a… Stabil… Consci… Orderlin…      1 B5:C   
##  9 C4       q_626     Do things i… Stabil… Consci… Industri…     -1 B5:C   
## 10 C5       q_1949    Waste my ti… Stabil… Consci… Industri…     -1 B5:C   
## # … with 18 more rows

How to add variable and value labels

Using the var_label function from the labelled package, we can easily assign a label to a variable (or a list of labels to a dataset).

# First, let's see what we know about these variables.
bfi <- bfi %>% # here we use the pipe (feeding the bfi argument into the pipe)
  mutate(education = as.double(education), # the labelled class is a bit picky and doesn't like integers
         gender = as.double(gender))

bfi.dictionary %>% tail(3)
## # A tibble: 3 x 8
##   variable  ItemLabel Item             Giant3 Big6  Little12 Keying IPIP100
##   <chr>     <fct>     <fct>            <fct>  <fct> <fct>     <int> <fct>  
## 1 gender    gender    males=1, female… <NA>   <NA>  <NA>         NA <NA>   
## 2 education education in HS, fin HS, … <NA>   <NA>  <NA>         NA <NA>   
## 3 age       age       age in years     <NA>   <NA>  <NA>         NA <NA>
var_label(bfi$gender) <- "Self-reported gender"
attributes(bfi$gender) # check what we're doing
## $label
## [1] "Self-reported gender"
var_label(bfi) <- list(age = "age in years", education = "Highest degree")

# or using dplyr syntax
bfi <- bfi %>% set_variable_labels(
  age = "age in years", 
  education = "Highest degree")

Now, we saw that the value labels were encoded in the variable label. This is not what we want. Instead, we assign value labels.

bfi <- bfi %>% 
  add_value_labels(
    gender = c("male" = 1, "female" = 2),
    education = c("in high school" = 1, "finished high school" = 2,
                  "some college" = 3, "college graduate" = 4, 
                  "graduate degree" = 5) # don't use abbreviations if you can avoid it
    )
attributes(bfi$gender) # check what we're doing
## $label
## [1] "Self-reported gender"
## 
## $labels
##   male female 
##      1      2 
## 
## $class
## [1] "haven_labelled"
# We could also assign the attributes manually, but then there's no error checking.
attributes(bfi$gender) <- list(
  label = "Self-reported gender", 
  labels = c(male = 1, female = 2), # doubles, matching the type of the data
  class = "haven_labelled")

As we see, adding value labels turned the variable gender into a different type (from a plain numeric vector to the haven_labelled class).
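To make the relationship between stored codes and value labels concrete, here is a minimal base R sketch. It does not use the labelled package; the toy vector and the object names (gender_codes, gender_labels, gender_text) are made up for illustration.

```r
# Toy vector standing in for bfi$gender (values invented for illustration)
gender_codes <- c(1, 2, 2, NA, 1)

# Value labels are a named vector: the names are the labels, the values are the codes
gender_labels <- c(male = 1, female = 2)

# Look up the label for each stored code; missing or unlabelled codes become NA
gender_text <- names(gender_labels)[match(gender_codes, gender_labels)]
gender_text
```

The data itself stays numeric; the labels are only consulted when we want a human-readable version.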

This is all pretty tedious, and we have the data we need in a nice dictionary already. With a few easy steps, we can transform it.

First, we take only the personality items. We did the rest already.

dict <- bfi.dictionary %>% 
  filter(!variable %in% c("gender", "education", "age")) %>% # we did those already
  mutate(label = paste0(Big6, ": ", Item)) %>% # make sure we name the construct in the label
  select(variable, label, Keying)

Now, we turn this data.frame with variable and label columns into a named list and assign it as variable labels.

var_label(bfi) <- dict_to_list(dict)
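For readers who want to see what such a named list looks like, here is a small base R sketch. We assume the result is a list with one label per variable, named after the variable; this is an illustration with a made-up two-row dictionary (toy_dict), not the implementation of dict_to_list.

```r
# Hypothetical miniature dictionary with the same two columns used above
toy_dict <- data.frame(
  variable = c("A1", "A2"),
  label = c("Agreeableness: Am indifferent to the feelings of others.",
            "Agreeableness: Inquire about others' well-being."),
  stringsAsFactors = FALSE
)

# One list element per variable, named after the variable
toy_labels <- setNames(as.list(toy_dict$label), toy_dict$variable)
str(toy_labels)
```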

Now, we want to assign value labels to all Likert items. First, we need to define a named vector.

value_labels <- c("Very Inaccurate" = 1, 
                  "Moderately Inaccurate" = 2, 
                  "Slightly Inaccurate" = 3,
                  "Slightly Accurate" = 4,
                  "Moderately Accurate" = 5,
                  "Very Accurate" = 6)

We’re going to be using these labels many times, so let’s put the step of assigning them into a function.

add_likert_label <- function(x) {
  val_labels(x) <- value_labels
  x
}

Now, for all personality items (we get them from our data dictionary), we assign these value labels.

personality_items <- dict %>% pull(variable)
bfi <- bfi %>% 
  mutate_at(personality_items, add_likert_label)
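The same pattern, applying one function to a chosen set of columns, can be sketched in base R. The data frame toy, the vector toy_items and the helper tag_with_labels are hypothetical; the helper only sets a plain "labels" attribute and is not a substitute for val_labels.

```r
# Toy data frame; item values are invented for illustration
toy <- data.frame(A1 = c(2L, 5L), A2 = c(4L, 4L), age = c(16L, 18L))
toy_items <- c("A1", "A2")  # hypothetical stand-in for personality_items

# Hypothetical helper: attach a "labels" attribute, leave the values untouched
tag_with_labels <- function(x) {
  attr(x, "labels") <- c("Very Inaccurate" = 1L, "Very Accurate" = 6L)
  x
}

# Apply the helper to the selected columns only
toy[toy_items] <- lapply(toy[toy_items], tag_with_labels)

names(attr(toy$A1, "labels"))      # the items now carry labels
is.null(attr(toy$age, "labels"))   # columns outside the selection are unchanged
```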

However, some of our items are reverse-coded. This information is contained in the Keying variable in our data dictionary. We’ll use it to rename the variables to end with the letter R.

# rename the reverse-keyed items so that their names end in R
reverse_coded_items <- dict %>% filter(Keying == -1) %>% pull(variable)

bfi <- bfi %>% 
  rename_at(reverse_coded_items, add_R)
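As a rough illustration of the renaming step, the following base R sketch appends an R to selected names. We assume that this is the effect of add_R, judging from names such as A1R and C4R in the output; the vectors toy_names and toy_reversed are made up.

```r
toy_names <- c("A1", "A2", "C4", "age")
toy_reversed <- c("A1", "C4")  # hypothetical stand-in for reverse_coded_items

# Append "R" to the reverse-keyed names only
is_reversed <- toy_names %in% toy_reversed
toy_names[is_reversed] <- paste0(toy_names[is_reversed], "R")
toy_names

# A pattern like "[0-9]_?R$" then picks out exactly the renamed items
ends_in_digit_r <- grepl("[0-9]_?R$", toy_names)
ends_in_digit_r
```

Encoding the keying in the name makes it possible to select reverse-keyed items later by pattern alone, without going back to the dictionary.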

Next, we can conveniently call the reverse_labelled_values function on all variables ending with a number and R.

head(bfi$A1R, 3)
## <Labelled integer>: Agreeableness: Am indifferent to the feelings of others.
## [1] 2 2 5
## 
## Labels:
##  value                 label
##      1       Very Inaccurate
##      2 Moderately Inaccurate
##      3   Slightly Inaccurate
##      4     Slightly Accurate
##      5   Moderately Accurate
##      6         Very Accurate
labelled::val_labels(bfi$A1R)
##       Very Inaccurate Moderately Inaccurate   Slightly Inaccurate 
##                     1                     2                     3 
##     Slightly Accurate   Moderately Accurate         Very Accurate 
##                     4                     5                     6
bfi <- bfi %>% 
  mutate_at(
    vars(matches("[0-9]_?R$")), # only for variables that end in 1R 2_R 3R etc
    reverse_labelled_values)
labelled::val_labels(bfi$A1R)
##       Very Inaccurate Moderately Inaccurate   Slightly Inaccurate 
##                     6                     5                     4 
##     Slightly Accurate   Moderately Accurate         Very Accurate 
##                     3                     2                     1
head(bfi$A1R, 3)
## <Labelled double>: Agreeableness: Am indifferent to the feelings of others.
## [1] 5 5 2
## 
## Labels:
##  value                 label
##      6       Very Inaccurate
##      5 Moderately Inaccurate
##      4   Slightly Inaccurate
##      3     Slightly Accurate
##      2   Moderately Accurate
##      1         Very Accurate

As you can see, the underlying numeric values have been reversed, but each response still carries the label the participant chose.
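The arithmetic behind this can be sketched in base R. We assume that reversing means mapping each value v to min + max - v over the labelled range, which matches the output above (2 2 5 becomes 5 5 2 on a 1 to 6 scale); the helper flip is hypothetical and is not the implementation of reverse_labelled_values.

```r
responses <- c(2, 2, 5)  # the first three A1R values printed above
likert <- c("Very Inaccurate" = 1, "Moderately Inaccurate" = 2,
            "Slightly Inaccurate" = 3, "Slightly Accurate" = 4,
            "Moderately Accurate" = 5, "Very Accurate" = 6)

# Hypothetical helper: mirror values within the range spanned by `possible`
flip <- function(v, possible) min(possible) + max(possible) - v

reversed_responses <- flip(responses, likert)  # reverse the data ...
reversed_labels <- flip(likert, likert)        # ... and the codes behind the labels

# The text attached to each response is the same before and after
before <- names(likert)[match(responses, likert)]
after <- names(reversed_labels)[match(reversed_responses, reversed_labels)]
identical(before, after)
```

Because values and labels are flipped together, a participant who chose "Moderately Inaccurate" is still shown as having chosen it, while the number used for aggregation changes from 2 to 5.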

Aggregating scales

Now, we can form scale aggregates. The codebook function aggregate_and_document_scale does this for us and automatically sets the correct attributes. For some of the select calls here, we have to set ignore.case to FALSE, or the prefix would also match the age and education variables.

bfi$consc <- aggregate_and_document_scale(bfi %>% select(starts_with("C")))
bfi$extra <- aggregate_and_document_scale(bfi %>% select(starts_with("E", ignore.case = FALSE)))
bfi$open <- aggregate_and_document_scale(bfi %>% select(starts_with("O")))
bfi$agree <- aggregate_and_document_scale(bfi %>% select(starts_with("A", ignore.case = FALSE)))
bfi$neuro <- aggregate_and_document_scale(bfi %>% select(starts_with("N")))
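To illustrate why case sensitivity matters and what the aggregation amounts to, here is a base R sketch on invented data. It only mimics the idea: we assume the scale score is a plain row mean in which any missing item makes the score missing, which is consistent with the "aggregated by rowMeans" labels and the missing counts in the codebook below. The attributes that aggregate_and_document_scale sets are not reproduced.

```r
# Toy data; values are invented for illustration
toy <- data.frame(
  E1R = c(4, 2, NA), E2R = c(5, 3, 4), E3 = c(6, 1, 2),
  education = c(3, 5, 2)
)

# A case-insensitive prefix match also catches "education"
matches_any_case <- grepl("^E", names(toy), ignore.case = TRUE)

# A case-sensitive match keeps only the E items
e_items <- names(toy)[startsWith(names(toy), "E")]

# Row means without na.rm: a row with any missing item yields NA
toy$extra <- rowMeans(toy[e_items])
toy$extra
```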

Last, we can assign some metadata to the dataset itself. We might want to give it a meaningful name and description, for example.

metadata(bfi)$name <- "25 Personality items representing 5 factors"
metadata(bfi)$description <- "25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.

The first 25 items are organized by five putative factors: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness. The item data were collected using a 6 point response scale: 1 Very Inaccurate 2 Moderately Inaccurate 3 Slightly Inaccurate 4 Slightly Accurate 5 Moderately Accurate 6 Very Accurate

To see an example of the data collection technique, visit https://SAPA-project.org or the International Cognitive Ability Resource at https://icar-project.com. The items given were sampled from the International Personality Item Pool of Lewis Goldberg using the sampling technique of SAPA. This is a sample data set taken from the much larger SAPA data bank."
metadata(bfi)$identifier <- "https://CRAN.R-project.org/package=psych"
metadata(bfi)$datePublished <- "2010-01-01"
metadata(bfi)$creator <- list(
      "@type" = "Person",
      givenName = "William", familyName = "Revelle",
      email = "revelle@northwestern.edu", 
      affiliation = list("@type" = "Organization",
        name = "Northwestern University"))
metadata(bfi)$citation <- "Revelle, W., Wilt, J., and Rosenthal, A. (2010) Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link In Gruszka, A. and Matthews, G. and Szymura, B. (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer."
metadata(bfi)$url <- "https://CRAN.R-project.org/package=psych"
metadata(bfi)$temporalCoverage <- "Spring 2010" 
metadata(bfi)$spatialCoverage <- "Online" 
# We don't want to look at the code in the codebook.
knitr::opts_chunk$set(warning = TRUE, message = TRUE, echo = FALSE)

Finally, we can generate our codebook.

Metadata

Description

Dataset name: 25 Personality items representing 5 factors

25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.

The first 25 items are organized by five putative factors: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness. The item data were collected using a 6 point response scale: 1 Very Inaccurate 2 Moderately Inaccurate 3 Slightly Inaccurate 4 Slightly Accurate 5 Moderately Accurate 6 Very Accurate

To see an example of the data collection technique, visit https://SAPA-project.org or the International Cognitive Ability Resource at https://icar-project.com. The items given were sampled from the International Personality Item Pool of Lewis Goldberg using the sampling technique of SAPA. This is a sample data set taken from the much larger SAPA data bank.

  • Temporal Coverage: Spring 2010
  • Spatial Coverage: Online
  • Citation: Revelle, W., Wilt, J., and Rosenthal, A. (2010) Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link In Gruszka, A. and Matthews, G. and Szymura, B. (Eds.) Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, Springer.
  • URL: https://CRAN.R-project.org/package=psych
  • Identifier: https://CRAN.R-project.org/package=psych
  • Date published: 2010-01-01

  • Creator:

    • @type: Person
    • givenName: William
    • familyName: Revelle
    • email:
    • affiliation:

      • @type: Organization
      • name: Northwestern University
  • keywords: A1R, A2, A3, A4, A5, C1, C2, C3, C4R, C5R, E1R, E2R, E3, E4, E5, N1R, N2R, N3R, N4R, N5R, O1, O2R, O3, O4, O5R, gender, education, age, consc, extra, open, agree and neuro

Variables

gender

Self-reported gender

Distribution

0 missing values.

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
gender | Self-reported gender | numeric | 1. male, 2. female | 0 | 2800 | 2800 | 1.67 | 0.47 | 1 | 1 | 2 | 2 | 2 | ▃▁▁▁▁▁▁▇

Value labels

  • male: 1
  • female: 2

education

Highest degree

Distribution

223 missing values.

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
education | Highest degree | numeric | 1. in high school, 2. finished high school, 3. some college, 4. college graduate, 5. graduate degree | 223 | 2577 | 2800 | 3.19 | 1.11 | 1 | 3 | 3 | 4 | 5 | ▂▂▁▇▁▂▁▃

Value labels

  • in high school: 1
  • finished high school: 2
  • some college: 3
  • college graduate: 4
  • graduate degree: 5

age

age in years

Distribution

0 missing values.

Summary statistics

name | label | data_type | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
age | age in years | integer | 0 | 2800 | 2800 | 28.78 | 11.13 | 3 | 20 | 26 | 35 | 86 | ▁▇▆▃▂▁▁▁

Scale: consc

Overview

Reliability: Cronbach’s α [95% CI] = 0.73 [0.71;0.74].

Missing: 93.
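As a reminder of what the reliability figure summarises, here is a base R sketch of the textbook formula for Cronbach's α, α = k/(k-1) × (1 - sum of item variances / variance of the sum score). The data and the helper cronbach_alpha are invented, and dropping incomplete rows is an assumption of this sketch; the values reported in this codebook come from the package's own computation, which may handle missing data differently.

```r
# Toy item scores (invented); rows are respondents, columns are items
toy_items <- data.frame(
  i1 = c(4, 5, 2, 6, 3),
  i2 = c(5, 5, 1, 6, 2),
  i3 = c(3, 6, 2, 5, 3)
)

# Hypothetical helper implementing the textbook formula
cronbach_alpha <- function(d) {
  d <- d[stats::complete.cases(d), , drop = FALSE]  # assumption: listwise deletion
  k <- ncol(d)
  item_variances <- vapply(d, stats::var, numeric(1))
  total_variance <- stats::var(rowSums(d))
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

alpha_toy <- cronbach_alpha(toy_items)
alpha_toy
```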

Reliability details

Reliability
95% Confidence Interval
lower estimate upper
0.7108 0.7267 0.7426
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.7267 0.7301 0.6942 0.351 2.705 0.008117 4.266 0.9513 0.34
Reliability if an item is dropped:
  raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
C1 0.694 0.6964 0.6401 0.3645 2.294 0.009337 0.003733 0.3478
C2 0.6736 0.6749 0.6189 0.3416 2.076 0.009891 0.005605 0.3383
C3 0.6887 0.694 0.6443 0.3618 2.268 0.009564 0.007021 0.3597
C4R 0.6538 0.6629 0.6028 0.3296 1.967 0.01066 0.003672 0.3237
C5R 0.6897 0.6902 0.6283 0.3577 2.228 0.009562 0.001734 0.3476
Item statistics
  n raw.r std.r r.cor r.drop mean sd
C1 2779 0.6457 0.6702 0.5399 0.4502 4.502 1.241
C2 2776 0.6964 0.7097 0.6027 0.5046 4.37 1.318
C3 2780 0.6639 0.6748 0.5389 0.4642 4.304 1.289
C4R 2774 0.7365 0.7306 0.6413 0.5525 4.447 1.375
C5R 2784 0.7197 0.6819 0.566 0.4775 3.703 1.629
Non missing response frequency for each item
  1 2 3 4 5 6 miss
C1 0.02627 0.05793 0.09896 0.2357 0.3663 0.2148 0.0075
C2 0.03206 0.08501 0.1066 0.2316 0.3465 0.1981 0.008571
C3 0.03022 0.08921 0.1054 0.2665 0.3388 0.1698 0.007143
C4R 0.02271 0.08219 0.1615 0.1702 0.2862 0.2772 0.009286
C5R 0.1024 0.1674 0.2205 0.125 0.2037 0.181 0.005714

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
C1 | Conscientiousness: Am exacting in my work. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 21 | 2779 | 2800 | 4.5 | 1.24 | 1 | 4 | 5 | 5 | 6 | ▁▁▁▂▅▁▇▅
C2 | Conscientiousness: Continue until everything is perfect. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 24 | 2776 | 2800 | 4.37 | 1.32 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▆▁▇▅
C3 | Conscientiousness: Do things according to a plan. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 20 | 2780 | 2800 | 4.3 | 1.29 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▆▁▇▅
C4R | Conscientiousness: Do things in a half-way manner. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 26 | 2774 | 2800 | 4.45 | 1.38 | 1 | 3 | 5 | 6 | 6 | ▁▂▁▅▅▁▇▇
C5R | Conscientiousness: Waste my time. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 16 | 2784 | 2800 | 3.7 | 1.63 | 1 | 2 | 4 | 5 | 6 | ▃▆▁▇▅▁▇▆

Scale: extra

Overview

Reliability: Cronbach’s α [95% CI] = 0.76 [0.75;0.78].

Missing: 87.

Reliability details

Reliability
95% Confidence Interval
lower estimate upper
0.748 0.7617 0.7755
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.7617 0.7618 0.7266 0.3901 3.198 0.007027 4.145 1.061 0.3818
Reliability if an item is dropped:
  raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
E1R 0.7257 0.7254 0.6731 0.3978 2.642 0.00837 0.004394 0.3818
E2R 0.6902 0.6931 0.6342 0.3608 2.258 0.009509 0.002792 0.3546
E3 0.7279 0.7262 0.6737 0.3988 2.653 0.008241 0.007098 0.396
E4 0.7019 0.7032 0.6464 0.372 2.37 0.009073 0.003275 0.3767
E5 0.7436 0.7442 0.6914 0.4211 2.909 0.007824 0.004344 0.419
Item statistics
  n raw.r std.r r.cor r.drop mean sd
E1R 2777 0.7238 0.7027 0.5882 0.5163 4.026 1.632
E2R 2784 0.7797 0.7647 0.6936 0.6054 3.858 1.605
E3 2775 0.683 0.7011 0.5827 0.5046 4.001 1.353
E4 2791 0.7467 0.7459 0.6625 0.578 4.422 1.458
E5 2779 0.6432 0.6637 0.5229 0.4542 4.416 1.335
Non missing response frequency for each item
  1 2 3 4 5 6 miss
E1R 0.08678 0.1322 0.162 0.1455 0.2348 0.2387 0.008214
E2R 0.09124 0.1383 0.2152 0.1232 0.2407 0.1915 0.005714
E3 0.05369 0.1056 0.1485 0.2977 0.2677 0.1268 0.008929
E4 0.05016 0.09387 0.0971 0.1612 0.3375 0.2601 0.003214
E5 0.03418 0.07953 0.1036 0.2227 0.3383 0.2217 0.0075

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
E1R | Extraversion: Don’t talk a lot. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 23 | 2777 | 2800 | 4.03 | 1.63 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▆▅▁▇▇
E2R | Extraversion: Find it difficult to approach others. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 16 | 2784 | 2800 | 3.86 | 1.61 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▇▅▁▇▆
E3 | Extraversion: Know how to captivate people. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 25 | 2775 | 2800 | 4 | 1.35 | 1 | 3 | 4 | 5 | 6 | ▂▃▁▃▇▁▇▃
E4 | Extraversion: Make friends easily. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 9 | 2791 | 2800 | 4.42 | 1.46 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▂▃▁▇▆
E5 | Extraversion: Take charge. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 21 | 2779 | 2800 | 4.42 | 1.33 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▅▁▇▅

Scale: open

Overview

Reliability: Cronbach’s α [95% CI] = 0.6 [0.58;0.62].

Missing: 74.

Reliability details

Reliability
95% Confidence Interval
lower estimate upper
0.5769 0.6002 0.6234
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.6002 0.6073 0.5681 0.2362 1.546 0.01186 4.587 0.8084 0.2261
Reliability if an item is dropped:
  raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
O1 0.5316 0.5341 0.4762 0.2227 1.146 0.01428 0.009206 0.2278
O2R 0.5672 0.5701 0.5103 0.249 1.326 0.01334 0.007597 0.2164
O3 0.4974 0.5006 0.4418 0.2004 1.002 0.01527 0.007096 0.1961
O4 0.6115 0.6208 0.5603 0.2904 1.637 0.0119 0.00437 0.2854
O5R 0.5117 0.528 0.4738 0.2185 1.118 0.01504 0.01154 0.2039
Item statistics
  n raw.r std.r r.cor r.drop mean sd
O1 2778 0.6151 0.6496 0.5156 0.3907 4.816 1.13
O2R 2800 0.654 0.5991 0.4298 0.3321 4.287 1.565
O3 2772 0.6747 0.6926 0.5911 0.4505 4.438 1.221
O4 2786 0.4979 0.5193 0.2903 0.2179 4.892 1.221
O5R 2780 0.6704 0.6577 0.5237 0.4162 4.51 1.328
Non missing response frequency for each item
  1 2 3 4 5 6 miss
O1 0.007919 0.03708 0.07559 0.2181 0.333 0.3283 0.007857
O2R 0.06393 0.09857 0.1554 0.1386 0.2561 0.2875 0
O3 0.02742 0.05231 0.1053 0.2796 0.3402 0.1952 0.01
O4 0.01974 0.04487 0.05528 0.1726 0.3184 0.3891 0.005
O5R 0.02518 0.06871 0.1309 0.1892 0.3176 0.2683 0.007143

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
O1 | Openness: Am full of ideas. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 22 | 2778 | 2800 | 4.82 | 1.13 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▂▅▁▇▇
O2R | Openness: Avoid difficult reading material. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 0 | 2800 | 2800 | 4.29 | 1.57 | 1 | 3 | 5 | 6 | 6 | ▂▃▁▅▃▁▇▇
O3 | Openness: Carry the conversation to a higher level. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 28 | 2772 | 2800 | 4.44 | 1.22 | 1 | 4 | 5 | 5 | 6 | ▁▁▁▂▆▁▇▅
O4 | Openness: Spend time reflecting on things. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 14 | 2786 | 2800 | 4.89 | 1.22 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▁▃▁▆▇
O5R | Openness: Will not probe deeply into a subject. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 20 | 2780 | 2800 | 4.51 | 1.33 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▃▅▁▇▇

Scale: agree

Overview

Reliability: Cronbach’s α [95% CI] = 0.7 [0.69;0.72].

Missing: 91.

Reliability details

Reliability
95% Confidence Interval
lower estimate upper
0.6855 0.703 0.7206
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.703 0.713 0.6828 0.332 2.485 0.008952 4.652 0.8984 0.3376
Reliability if an item is dropped:
  raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
A1R 0.7185 0.7255 0.673 0.3979 2.643 0.008725 0.00653 0.376
A2 0.6172 0.6256 0.5795 0.2946 1.671 0.0119 0.01695 0.2866
A3 0.6003 0.6129 0.5578 0.2836 1.584 0.01244 0.009431 0.3219
A4 0.6858 0.6935 0.6498 0.3613 2.263 0.009825 0.01586 0.3651
A5 0.643 0.6555 0.6051 0.3224 1.903 0.01115 0.0126 0.3376
Item statistics
  n raw.r std.r r.cor r.drop mean sd
A1R 2784 0.5807 0.5664 0.3764 0.3084 4.587 1.408
A2 2773 0.728 0.748 0.6665 0.5636 4.802 1.172
A3 2774 0.7603 0.7674 0.7092 0.587 4.604 1.302
A4 2781 0.6542 0.6307 0.4712 0.3944 4.7 1.48
A5 2784 0.6866 0.6992 0.5957 0.4886 4.56 1.259
Non missing response frequency for each item
  1 2 3 4 5 6 miss
A1R 0.02945 0.0801 0.121 0.1444 0.2938 0.3312 0.005714
A2 0.01695 0.04544 0.05445 0.1994 0.3689 0.3148 0.009643
A3 0.03244 0.062 0.07462 0.2033 0.3554 0.2722 0.009286
A4 0.04639 0.07731 0.06652 0.1622 0.2352 0.4124 0.006786
A5 0.02119 0.06681 0.09124 0.2216 0.3495 0.2496 0.005714

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
A1R | Agreeableness: Am indifferent to the feelings of others. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 16 | 2784 | 2800 | 4.59 | 1.41 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▃▃▁▇▇
A2 | Agreeableness: Inquire about others’ well-being. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 27 | 2773 | 2800 | 4.8 | 1.17 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▁▅▁▇▇
A3 | Agreeableness: Know how to comfort others. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 26 | 2774 | 2800 | 4.6 | 1.3 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▂▅▁▇▆
A4 | Agreeableness: Love children. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 19 | 2781 | 2800 | 4.7 | 1.48 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▁▃▁▅▇
A5 | Agreeableness: Make people feel at ease. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | 16 | 2784 | 2800 | 4.56 | 1.26 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▅▁▇▆

Scale: neuro

Overview

Reliability: Cronbach’s α [95% CI] = 0.81 [0.8;0.82].

Missing: 106.

Reliability details

Reliability
95% Confidence Interval
lower estimate upper
0.803 0.814 0.825
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.814 0.8147 0.7991 0.4679 4.396 0.005607 3.838 1.196 0.4137
Reliability if an item is dropped:
  raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
N1R 0.7581 0.7583 0.711 0.4396 3.138 0.007474 0.006093 0.4132
N2R 0.7632 0.7634 0.7159 0.4465 3.226 0.007322 0.005421 0.4137
N3R 0.7553 0.7567 0.7312 0.4374 3.11 0.007663 0.01787 0.3946
N4R 0.7953 0.7969 0.7688 0.4952 3.924 0.006405 0.01817 0.489
N5R 0.8126 0.8128 0.787 0.5205 4.343 0.005854 0.01374 0.5344
Item statistics
  n raw.r std.r r.cor r.drop mean sd
N1R 2778 0.8 0.8025 0.7648 0.6672 4.071 1.571
N2R 2779 0.7873 0.7917 0.7496 0.6526 3.492 1.526
N3R 2789 0.8081 0.806 0.7425 0.6748 3.783 1.603
N4R 2764 0.7152 0.7145 0.5985 0.5428 3.814 1.57
N5R 2771 0.6806 0.6744 0.5318 0.4865 4.03 1.619
Non missing response frequency for each item
  1 2 3 4 5 6 miss
N1R 0.06983 0.1202 0.1854 0.1537 0.2354 0.2354 0.007857
N2R 0.104 0.1835 0.2551 0.1479 0.1925 0.1169 0.0075
N3R 0.09215 0.1574 0.2119 0.1309 0.2288 0.1789 0.003929
N4R 0.08973 0.1375 0.22 0.1451 0.237 0.1708 0.01286
N5R 0.08697 0.118 0.183 0.1379 0.2382 0.236 0.01036

Summary statistics

name | label | data_type | value_labels | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
N1R | Emotional Stability: Get angry easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 22 | 2778 | 2800 | 4.07 | 1.57 | 1 | 3 | 4 | 5 | 6 | ▂▅▁▆▅▁▇▇
N2R | Emotional Stability: Get irritated easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 21 | 2779 | 2800 | 3.49 | 1.53 | 1 | 2 | 3 | 5 | 6 | ▃▆▁▇▅▁▆▃
N3R | Emotional Stability: Have frequent mood swings. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 11 | 2789 | 2800 | 3.78 | 1.6 | 1 | 3 | 4 | 5 | 6 | ▃▆▁▇▅▁▇▆
N4R | Emotional Stability: Often feel blue. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 36 | 2764 | 2800 | 3.81 | 1.57 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▇▅▁▇▆
N5R | Emotional Stability: Panic easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | 29 | 2771 | 2800 | 4.03 | 1.62 | 1 | 3 | 4 | 5 | 6 | ▃▃▁▆▅▁▇▇

Missingness report

Among those who finished the survey. Only variables that have missing values are shown.

## Warning: Could not figure out who finished the surveys, because the
## variables expired and ended were missing.

Codebook table

name | label | data_type | value_labels | scale_item_names | missing | complete | n | mean | sd | p0 | p25 | p50 | p75 | p100 | hist
A1R | Agreeableness: Am indifferent to the feelings of others. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 16 | 2784 | 2800 | 4.59 | 1.41 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▃▃▁▇▇
A2 | Agreeableness: Inquire about others’ well-being. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 27 | 2773 | 2800 | 4.8 | 1.17 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▁▅▁▇▇
A3 | Agreeableness: Know how to comfort others. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 26 | 2774 | 2800 | 4.6 | 1.3 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▂▅▁▇▆
A4 | Agreeableness: Love children. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 19 | 2781 | 2800 | 4.7 | 1.48 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▁▃▁▅▇
A5 | Agreeableness: Make people feel at ease. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 16 | 2784 | 2800 | 4.56 | 1.26 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▅▁▇▆
C1 | Conscientiousness: Am exacting in my work. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 21 | 2779 | 2800 | 4.5 | 1.24 | 1 | 4 | 5 | 5 | 6 | ▁▁▁▂▅▁▇▅
C2 | Conscientiousness: Continue until everything is perfect. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 24 | 2776 | 2800 | 4.37 | 1.32 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▆▁▇▅
C3 | Conscientiousness: Do things according to a plan. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 20 | 2780 | 2800 | 4.3 | 1.29 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▆▁▇▅
C4R | Conscientiousness: Do things in a half-way manner. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 26 | 2774 | 2800 | 4.45 | 1.38 | 1 | 3 | 5 | 6 | 6 | ▁▂▁▅▅▁▇▇
C5R | Conscientiousness: Waste my time. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 16 | 2784 | 2800 | 3.7 | 1.63 | 1 | 2 | 4 | 5 | 6 | ▃▆▁▇▅▁▇▆
E1R | Extraversion: Don’t talk a lot. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 23 | 2777 | 2800 | 4.03 | 1.63 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▆▅▁▇▇
E2R | Extraversion: Find it difficult to approach others. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 16 | 2784 | 2800 | 3.86 | 1.61 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▇▅▁▇▆
E3 | Extraversion: Know how to captivate people. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 25 | 2775 | 2800 | 4 | 1.35 | 1 | 3 | 4 | 5 | 6 | ▂▃▁▃▇▁▇▃
E4 | Extraversion: Make friends easily. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 9 | 2791 | 2800 | 4.42 | 1.46 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▂▃▁▇▆
E5 | Extraversion: Take charge. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 21 | 2779 | 2800 | 4.42 | 1.33 | 1 | 4 | 5 | 5 | 6 | ▁▂▁▂▅▁▇▅
N1R | Emotional Stability: Get angry easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 22 | 2778 | 2800 | 4.07 | 1.57 | 1 | 3 | 4 | 5 | 6 | ▂▅▁▆▅▁▇▇
N2R | Emotional Stability: Get irritated easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 21 | 2779 | 2800 | 3.49 | 1.53 | 1 | 2 | 3 | 5 | 6 | ▃▆▁▇▅▁▆▃
N3R | Emotional Stability: Have frequent mood swings. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 11 | 2789 | 2800 | 3.78 | 1.6 | 1 | 3 | 4 | 5 | 6 | ▃▆▁▇▅▁▇▆
N4R | Emotional Stability: Often feel blue. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 36 | 2764 | 2800 | 3.81 | 1.57 | 1 | 3 | 4 | 5 | 6 | ▃▅▁▇▅▁▇▆
N5R | Emotional Stability: Panic easily. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 29 | 2771 | 2800 | 4.03 | 1.62 | 1 | 3 | 4 | 5 | 6 | ▃▃▁▆▅▁▇▇
O1 | Openness: Am full of ideas. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 22 | 2778 | 2800 | 4.82 | 1.13 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▂▅▁▇▇
O2R | Openness: Avoid difficult reading material. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 0 | 2800 | 2800 | 4.29 | 1.57 | 1 | 3 | 5 | 6 | 6 | ▂▃▁▅▃▁▇▇
O3 | Openness: Carry the conversation to a higher level. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 28 | 2772 | 2800 | 4.44 | 1.22 | 1 | 4 | 5 | 5 | 6 | ▁▁▁▂▆▁▇▅
O4 | Openness: Spend time reflecting on things. | integer | 1. Very Inaccurate, 2. Moderately Inaccurate, 3. Slightly Inaccurate, 4. Slightly Accurate, 5. Moderately Accurate, 6. Very Accurate | NA | 14 | 2786 | 2800 | 4.89 | 1.22 | 1 | 4 | 5 | 6 | 6 | ▁▁▁▁▃▁▆▇
O5R | Openness: Will not probe deeply into a subject. | numeric | 6. Very Inaccurate, 5. Moderately Inaccurate, 4. Slightly Inaccurate, 3. Slightly Accurate, 2. Moderately Accurate, 1. Very Accurate | NA | 20 | 2780 | 2800 | 4.51 | 1.33 | 1 | 4 | 5 | 6 | 6 | ▁▂▁▃▅▁▇▇
gender | Self-reported gender | numeric | 1. male, 2. female | NA | 0 | 2800 | 2800 | 1.67 | 0.47 | 1 | 1 | 2 | 2 | 2 | ▃▁▁▁▁▁▁▇
education | Highest degree | numeric | 1. in high school, 2. finished high school, 3. some college, 4. college graduate, 5. graduate degree | NA | 223 | 2577 | 2800 | 3.19 | 1.11 | 1 | 3 | 3 | 4 | 5 | ▂▂▁▇▁▂▁▃
age | age in years | integer | NA | NA | 0 | 2800 | 2800 | 28.78 | 11.13 | 3 | 20 | 26 | 35 | 86 | ▁▇▆▃▂▁▁▁
consc | 5 C items aggregated by rowMeans | numeric | NA | C1, C2, C3, C4R, C5R | 93 | 2707 | 2800 | 4.26 | 0.95 | 1 | 3.6 | 4.4 | 5 | 6 | ▁▁▂▅▇▇▇▅
extra | 5 E items aggregated by rowMeans | numeric | NA | E1R, E2R, E3, E4, E5 | 87 | 2713 | 2800 | 4.14 | 1.06 | 1 | 3.4 | 4.2 | 5 | 6 | ▁▁▃▅▇▇▇▆
open | 5 O items aggregated by rowMeans | numeric | NA | O1, O2R, O3, O4, O5R | 74 | 2726 | 2800 | 4.59 | 0.81 | 1.2 | 4 | 4.6 | 5.2 | 6 | ▁▁▁▂▇▇▇▅
agree | 5 A items aggregated by rowMeans | numeric | NA | A1R, A2, A3, A4, A5 | 91 | 2709 | 2800 | 4.64 | 0.9 | 1 | 4.2 | 4.8 | 5.4 | 6 | ▁▁▁▂▃▆▇▇
neuro | 5 N items aggregated by rowMeans | numeric | NA | N1R, N2R, N3R, N4R, N5R | 106 | 2694 | 2800 | 3.84 | 1.19 | 1 | 3 | 4 | 4.8 | 6 | ▂▃▆▇▇▇▇▅