look_for()
It is a common need to easily get a description of all variables in a data frame.
When a data frame is converted into a tibble (e.g. with dplyr::as_tibble()), it gets a nicer printing that shows the first rows of the data frame as well as the type of each column.
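A minimal sketch of the conversion, assuming the dplyr package is installed. It also assumes that the printout below comes from R's built-in iris data set, which the column names suggest.

```r
# Convert the built-in iris data frame to a tibble.
# Printing the tibble shows the first 10 rows and each column's type (<dbl>, <fct>).
library(dplyr)

iris_tbl <- as_tibble(iris)
print(iris_tbl)
```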
## # A tibble: 150 × 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ℹ 140 more rows
However, when there are too many variables, they cannot all be printed, and the remaining ones are simply listed.
## # A tibble: 2,000 × 17
## id_woman id_household weight interview_date date_of_birth age residency
## <dbl> <dbl> <dbl> <date> <date> <dbl> <dbl+lbl>
## 1 391 381 1.80 2012-05-05 1997-03-07 15 2 [rural]
## 2 1643 1515 1.80 2012-01-23 1982-01-06 30 2 [rural]
## 3 85 85 1.80 2012-01-21 1979-01-01 33 2 [rural]
## 4 881 844 1.80 2012-01-06 1968-03-29 43 2 [rural]
## 5 1981 1797 1.80 2012-05-11 1986-05-25 25 2 [rural]
## 6 1072 1015 0.998 2012-02-20 1993-07-03 18 2 [rural]
## 7 1978 1794 0.998 2012-02-23 1967-01-28 45 2 [rural]
## 8 1607 1486 0.998 2012-02-20 1989-01-21 23 2 [rural]
## 9 738 711 0.192 2012-03-09 1962-07-24 49 2 [rural]
## 10 1656 1525 0.192 2012-03-15 1980-12-25 31 2 [rural]
## # ℹ 1,990 more rows
## # ℹ 10 more variables: region <dbl+lbl>, instruction <dbl+lbl>,
## # employed <dbl+lbl>, matri <dbl+lbl>, religion <dbl+lbl>,
## # newspaper <dbl+lbl>, radio <dbl+lbl>, tv <dbl+lbl>,
## # ideal_nb_children <dbl+lbl>, test <dbl+lbl>
Note: in the R console, value labels (if defined) are usually printed, but they do not appear in an R Markdown document like this vignette.
dplyr::glimpse()
The function dplyr::glimpse() lets you take a quick look at all the variables in a data frame.
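A short sketch, again assuming dplyr and the built-in iris data. glimpse() prints one line per column. It returns its input invisibly, so it can also be placed in the middle of a pipeline.

```r
library(dplyr)

# Prints "Rows:", "Columns:", then one line per variable with its type and first values.
res <- glimpse(iris)
```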
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
## Rows: 2,000
## Columns: 17
## $ id_woman <dbl> 391, 1643, 85, 881, 1981, 1072, 1978, 1607, 738, 165…
## $ id_household <dbl> 381, 1515, 85, 844, 1797, 1015, 1794, 1486, 711, 152…
## $ weight <dbl> 1.803150, 1.803150, 1.803150, 1.803150, 1.803150, 0.…
## $ interview_date <date> 2012-05-05, 2012-01-23, 2012-01-21, 2012-01-06, 201…
## $ date_of_birth <date> 1997-03-07, 1982-01-06, 1979-01-01, 1968-03-29, 198…
## $ age <dbl> 15, 30, 33, 43, 25, 18, 45, 23, 49, 31, 26, 45, 25, …
## $ residency <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ region <dbl+lbl> 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, …
## $ instruction <dbl+lbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, …
## $ employed <dbl+lbl> 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ matri <dbl+lbl> 0, 2, 2, 2, 1, 0, 1, 1, 2, 5, 2, 3, 0, 2, 1, 2, …
## $ religion <dbl+lbl> 1, 3, 2, 3, 2, 2, 3, 1, 3, 3, 2, 3, 2, 2, 2, 2, …
## $ newspaper <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ radio <dbl+lbl> 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, …
## $ tv <dbl+lbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, …
## $ ideal_nb_children <dbl+lbl> 4, 4, 4, 4, 4, 5, 10, 5, 4, 5, 6, 10, …
## $ test <dbl+lbl> 0, 9, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, …
It shows the first values of each variable as well as its type. However, some important information is not displayed, such as variable labels and value labels.
labelled::look_for()
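look_for(), provided by the labelled package, prints a data dictionary of all variables in the console. It shows variable labels when available, the type of each variable, the number of missing values, and a list of values corresponding to:
- levels for factors;
- value labels for labelled vectors;
- the range of values for numeric and date variables (only with details = "full").
## pos variable label col_type missing values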
## 1 Sepal.Length — dbl 0
## 2 Sepal.Width — dbl 0
## 3 Petal.Length Length of petal dbl 0
## 4 Petal.Width Width of Petal dbl 0
## 5 Species — fct 0 setosa
## versicolor
## virginica
## pos variable label col_type missing values
## 1 id_woman Woman Id dbl 0
## 2 id_household Household Id dbl 0
## 3 weight Sample weight dbl 0
## 4 interview_date Interview date date 0
## 5 date_of_birth Date of birth date 0
## 6 age Age at last anniv~ dbl 0
## 7 residency Urban / rural res~ dbl+lbl 0 [1] urban
## [2] rural
## 8 region Region dbl+lbl 0 [1] North
## [2] East
## [3] South
## [4] West
## 9 instruction Level of instruct~ dbl+lbl 0 [0] none
## [1] primary
## [2] secondary
## [3] higher
## 10 employed Employed? dbl+lbl 7 [0] no
## [1] yes
## [9] missing
## 11 matri Matrimonial status dbl+lbl 0 [0] single
## [1] married
## [2] living togeth~
## [3] windowed
## [4] divorced
## [5] separated
## 12 religion Religion dbl+lbl 4 [1] Muslim
## [2] Christian
## [3] Protestant
## [4] no religion
## [5] other
## 13 newspaper Read newspaper? dbl+lbl 0 [0] no
## [1] yes
## 14 radio Listen to radio? dbl+lbl 0 [0] no
## [1] yes
## 15 tv Watch TV? dbl+lbl 0 [0] no
## [1] yes
## 16 ideal_nb_children Ideal number of c~ dbl+lbl 0 [96] don't know
## [99] missing
## 17 test Ever tested for H~ dbl+lbl 29 [0] no
## [1] yes
## [9] missing
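The iris dictionary above shows labels on the two petal columns. The vignette's own code is not shown here, so the sketch below is an assumption. It attaches those labels with labelled::set_variable_labels() and then calls look_for() on the result.

```r
library(labelled)

# Hypothetical reconstruction: label two columns, leave the others unlabelled.
iris_lab <- set_variable_labels(
  iris,
  Petal.Length = "Length of petal",
  Petal.Width  = "Width of Petal"
)

# Build (and print) the data dictionary.
dict <- look_for(iris_lab)
dict
```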
Note that lookfor() and generate_dictionary() are synonyms of look_for() and work exactly the same way.
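A quick check of the synonyms, assuming the built-in iris data. Called with the same arguments, the three names should produce identical dictionaries.

```r
library(labelled)

# The same dictionary built with each of the three function names.
d1 <- look_for(iris)
d2 <- lookfor(iris)
d3 <- generate_dictionary(iris)
```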
If there is not enough space to print full labels in the console, they will be truncated (truncation is indicated by a ~).
When a data frame has dozens or even hundreds of variables, it can become difficult to find a specific one. In such a case, you can provide an optional list of keywords, which can be simple character strings or regular expressions, to search for specific variables.
## pos variable label col_type missing values
## 3 Petal.Length Length of petal dbl 0
## 4 Petal.Width Width of Petal dbl 0
## pos variable label col_type missing values
## 1 Sepal.Length — dbl 0
## 2 Sepal.Width — dbl 0
## 5 Species — fct 0 setosa
## versicolor
## virginica
## pos variable label col_type missing values
## 3 Petal.Length Length of petal dbl 0
## 4 Petal.Width Width of Petal dbl 0
## 5 Species — fct 0 setosa
## versicolor
## virginica
## pos variable label col_type missing values
## 5 Species — fct 0 setosa
## versicolor
## virginica
## pos variable label col_type missing values
## 3 Petal.Length Length of petal dbl 0
## 4 Petal.Width Width of Petal dbl 0
## 5 Species — fct 0 setosa
## versicolor
## virginica
## pos variable label col_type missing values
## 9 instruction Level of instruction dbl+lbl 0 [0] none
## [1] primary
## [2] secondary
## [3] higher
## 13 newspaper Read newspaper? dbl+lbl 0 [0] no
## [1] yes
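A sketch of keyword searches on the built-in iris data. The exact calls behind the outputs above are not shown, so these patterns are illustrative assumptions. They combine plain strings, regular expressions and several keywords at once. A variable is kept if it matches any keyword.

```r
library(labelled)

# Plain string: case-insensitive by default, so "petal" matches "Petal.*".
look_for(iris, "petal")

# Regular expressions: names starting with "s", then names ending with "s".
look_for(iris, "^s")
look_for(iris, "s$")

# Several keywords: results are combined.
look_for(iris, "petal", "species")
```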
By default, look_for() looks through both variable names and variable labels. Use labels = FALSE to look only through variable names.
## pos variable label col_type missing values
## 13 newspaper Read newspaper? dbl+lbl 0 [0] no
## [1] yes
## ! Nothing found. Sorry.
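A sketch of the labels argument, assuming a hypothetical label added to iris with set_variable_labels(). The keyword "length of" occurs only in that label and not in any variable name. It is therefore found by default but not with labels = FALSE.

```r
library(labelled)

# Hypothetical label, used only for this illustration.
iris_lab <- set_variable_labels(iris, Petal.Length = "Length of petal")

# Searches names and labels: finds Petal.Length through its label.
with_labels <- look_for(iris_lab, "length of")

# Searches names only: nothing matches.
names_only <- look_for(iris_lab, "length of", labels = FALSE)
```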
Similarly, the search is case insensitive by default. To make the search case sensitive, use ignore.case = FALSE.
## pos variable label col_type missing values
## 1 Sepal.Length — dbl 0
## 2 Sepal.Width — dbl 0
## ! Nothing found. Sorry.
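A sketch with the built-in iris data, whose names are capitalised ("Sepal.Length"). A lower-case keyword therefore only matches when case is ignored.

```r
library(labelled)

# Default: case-insensitive, matches Sepal.Length and Sepal.Width.
look_for(iris, "sepal")

# Case-sensitive: "sepal" no longer matches "Sepal".
look_for(iris, "sepal", ignore.case = FALSE)
```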
If you just want to use the search feature of look_for() without computing the details of each variable, simply indicate details = "none" or details = FALSE.
## pos variable label
## 1 id_woman Woman Id
## 2 id_household Household Id
## 7 residency Urban / rural residency
## 16 ideal_nb_children Ideal number of children
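A sketch on the built-in iris data. The output above shows only the pos, variable and label columns with details = "none". The test below assumes the returned tibble likewise omits the col_type column.

```r
library(labelled)

# Search only: no types, missing counts or value lists are computed.
hits <- look_for(iris, "petal", details = "none")
hits

# Described above as equivalent to details = "none".
look_for(iris, "petal", details = FALSE)
```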
If you want more details (which can be time-consuming for large data frames), indicate details = "full" or details = TRUE.
## pos variable label col_type missing unique_values
## 1 id_woman Woman Id dbl 0 2000
## 2 id_household Household Id dbl 0 1814
## 3 weight Sample weight dbl 0 351
## 4 interview_date Interview date date 0 165
## 5 date_of_birth Date of birth date 0 1740
## 6 age Age at last anniv~ dbl 0 36
## 7 residency Urban / rural res~ dbl+lbl 0 2
##
## 8 region Region dbl+lbl 0 4
##
##
##
## 9 instruction Level of instruct~ dbl+lbl 0 4
##
##
##
## 10 employed Employed? dbl+lbl 7 3
##
##
## 11 matri Matrimonial status dbl+lbl 0 6
##
##
##
##
##
## 12 religion Religion dbl+lbl 4 6
##
##
##
##
## 13 newspaper Read newspaper? dbl+lbl 0 2
##
## 14 radio Listen to radio? dbl+lbl 0 2
##
## 15 tv Watch TV? dbl+lbl 0 2
##
## 16 ideal_nb_children Ideal number of c~ dbl+lbl 0 18
##
## 17 test Ever tested for H~ dbl+lbl 29 3
##
##
## values na_values na_range
## range: 1 - 2000
## range: 1 - 1814
## range: 0.044629 -~
## range: 2011-12-01~
## range: 1962-02-07~
## range: 14 - 49
## [1] urban
## [2] rural
## [1] North
## [2] East
## [3] South
## [4] West
## [0] none
## [1] primary
## [2] secondary
## [3] higher
## [0] no 9
## [1] yes
## [9] missing
## [0] single
## [1] married
## [2] living togeth~
## [3] windowed
## [4] divorced
## [5] separated
## [1] Muslim
## [2] Christian
## [3] Protestant
## [4] no religion
## [5] other
## [0] no
## [1] yes
## [0] no
## [1] yes
## [0] no
## [1] yes
## [96] don't know
## [99] missing
## [0] no 9
## [1] yes
## [9] missing
## Rows: 17
## Columns: 14
## $ pos <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
## $ variable <chr> "id_woman", "id_household", "weight", "interview_date", …
## $ label <chr> "Woman Id", "Household Id", "Sample weight", "Interview …
## $ col_type <chr> "dbl", "dbl", "dbl", "date", "date", "dbl", "dbl+lbl", "…
## $ missing <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 4, 0, 0, 0, 0, 29
## $ levels <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, …
## $ value_labels <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <1, 2>, …
## $ class <named list> "numeric", "numeric", "numeric", "Date", "Date", …
## $ type <chr> "double", "double", "double", "double", "double",…
## $ na_values <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <…
## $ na_range <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, …
## $ n_na <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 4, 0, 0, 0, 0, 29
## $ unique_values <int> 2000, 1814, 351, 165, 1740, 36, 2, 4, 4, 3, 6, 6,…
## $ range <named list> <1, 2000>, <1, 1814>, <0.044629, 4.396831>, <2011…
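A sketch that inspects some of the extra columns listed in the glimpse() output above, using the built-in iris data. Columns such as unique_values, n_na and range are only computed with details = "full".

```r
library(labelled)

full <- look_for(iris, details = "full")

# Number of distinct values per variable (Species is a 3-level factor).
full$unique_values

# Range of a numeric variable, stored in a list column.
full$range[full$variable == "Sepal.Length"]
```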
look_for()
look_for() returns a detailed tibble, which is summarized before printing. To deactivate the default printing and see the full results, simply use dplyr::as_tibble(), dplyr::glimpse() or even utils::View().
## # A tibble: 17 × 7
## pos variable label col_type missing levels value_labels
## <int> <chr> <chr> <chr> <int> <name> <named list>
## 1 1 id_woman Woman Id dbl 0 <NULL> <NULL>
## 2 2 id_household Household Id dbl 0 <NULL> <NULL>
## 3 3 weight Sample weight dbl 0 <NULL> <NULL>
## 4 4 interview_date Interview date date 0 <NULL> <NULL>
## 5 5 date_of_birth Date of birth date 0 <NULL> <NULL>
## 6 6 age Age at last ann… dbl 0 <NULL> <NULL>
## 7 7 residency Urban / rural r… dbl+lbl 0 <NULL> <dbl [2]>
## 8 8 region Region dbl+lbl 0 <NULL> <dbl [4]>
## 9 9 instruction Level of instru… dbl+lbl 0 <NULL> <dbl [4]>
## 10 10 employed Employed? dbl+lbl 7 <NULL> <dbl [3]>
## 11 11 matri Matrimonial sta… dbl+lbl 0 <NULL> <dbl [6]>
## 12 12 religion Religion dbl+lbl 4 <NULL> <dbl [5]>
## 13 13 newspaper Read newspaper? dbl+lbl 0 <NULL> <dbl [2]>
## 14 14 radio Listen to radio? dbl+lbl 0 <NULL> <dbl [2]>
## 15 15 tv Watch TV? dbl+lbl 0 <NULL> <dbl [2]>
## 16 16 ideal_nb_children Ideal number of… dbl+lbl 0 <NULL> <dbl [2]>
## 17 17 test Ever tested for… dbl+lbl 29 <NULL> <dbl [3]>
## Rows: 17
## Columns: 7
## $ pos <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
## $ variable <chr> "id_woman", "id_household", "weight", "interview_date", "…
## $ label <chr> "Woman Id", "Household Id", "Sample weight", "Interview d…
## $ col_type <chr> "dbl", "dbl", "dbl", "date", "date", "dbl", "dbl+lbl", "d…
## $ missing <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 4, 0, 0, 0, 0, 29
## $ levels <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <…
## $ value_labels <named list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <1, 2>, <…
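A sketch of the three ways to see the full results, assuming dplyr and the built-in iris data. View() is left as a comment because it needs an interactive session with a data viewer.

```r
library(labelled)
library(dplyr)

dict <- look_for(iris)

# Plain tibble printing instead of the summarised look_for printing.
as_tibble(dict)

# One line per column of the dictionary.
glimpse(dict)

# In an interactive session: utils::View(dict)
```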
The tibble returned by look_for() can easily be manipulated for advanced programming. When a column has several values for one variable (e.g. levels or value_labels), results are stored as nested named lists. To convert these named lists into simpler character vectors, you can use convert_list_columns_to_character().
## # A tibble: 17 × 7
## pos variable label col_type missing levels value_labels
## <int> <chr> <chr> <chr> <int> <chr> <chr>
## 1 1 id_woman Woman Id dbl 0 "" ""
## 2 2 id_household Household Id dbl 0 "" ""
## 3 3 weight Sample weight dbl 0 "" ""
## 4 4 interview_date Interview date date 0 "" ""
## 5 5 date_of_birth Date of birth date 0 "" ""
## 6 6 age Age at last ann… dbl 0 "" ""
## 7 7 residency Urban / rural r… dbl+lbl 0 "" "[1] urban;…
## 8 8 region Region dbl+lbl 0 "" "[1] North;…
## 9 9 instruction Level of instru… dbl+lbl 0 "" "[0] none; …
## 10 10 employed Employed? dbl+lbl 7 "" "[0] no; [1…
## 11 11 matri Matrimonial sta… dbl+lbl 0 "" "[0] single…
## 12 12 religion Religion dbl+lbl 4 "" "[1] Muslim…
## 13 13 newspaper Read newspaper? dbl+lbl 0 "" "[0] no; [1…
## 14 14 radio Listen to radio? dbl+lbl 0 "" "[0] no; [1…
## 15 15 tv Watch TV? dbl+lbl 0 "" "[0] no; [1…
## 16 16 ideal_nb_children Ideal number of… dbl+lbl 0 "" "[96] don't…
## 17 17 test Ever tested for… dbl+lbl 29 "" "[0] no; [1…
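A sketch on the built-in iris data. Species is a factor, so its levels sit in a list column until they are flattened into one string per variable. This assumes a version of labelled recent enough to include convert_list_columns_to_character().

```r
library(labelled)

dict <- look_for(iris)

# Turn list columns (levels, value_labels) into character columns.
flat <- convert_list_columns_to_character(dict)
flat$levels
```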
Alternatively, you can use lookfor_to_long_format() to transform the results into a long format with one row per factor level and per value label.
## # A tibble: 41 × 7
## pos variable label col_type missing levels value_labels
## <int> <chr> <chr> <chr> <int> <chr> <chr>
## 1 1 id_woman Woman Id dbl 0 <NA> <NA>
## 2 2 id_household Household Id dbl 0 <NA> <NA>
## 3 3 weight Sample weight dbl 0 <NA> <NA>
## 4 4 interview_date Interview date date 0 <NA> <NA>
## 5 5 date_of_birth Date of birth date 0 <NA> <NA>
## 6 6 age Age at last annive… dbl 0 <NA> <NA>
## 7 7 residency Urban / rural resi… dbl+lbl 0 <NA> [1] urban
## 8 7 residency Urban / rural resi… dbl+lbl 0 <NA> [2] rural
## 9 8 region Region dbl+lbl 0 <NA> [1] North
## 10 8 region Region dbl+lbl 0 <NA> [2] East
## # ℹ 31 more rows
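A sketch on the built-in iris data, assuming a recent version of labelled. The four numeric variables keep one row each, and the factor Species expands to one row per level. This is the same arithmetic as the 41 rows above: 6 unlabelled variables plus 35 value labels.

```r
library(labelled)

dict <- look_for(iris)

# One row per factor level / value label; other variables keep a single row.
long <- lookfor_to_long_format(dict)
nrow(long)  # 4 numeric variables + 3 Species levels = 7
```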
Both can be combined:
## # A tibble: 41 × 7
## pos variable label col_type missing levels value_labels
## <int> <chr> <chr> <chr> <int> <chr> <chr>
## 1 1 id_woman Woman Id dbl 0 <NA> <NA>
## 2 2 id_household Household Id dbl 0 <NA> <NA>
## 3 3 weight Sample weight dbl 0 <NA> <NA>
## 4 4 interview_date Interview date date 0 <NA> <NA>
## 5 5 date_of_birth Date of birth date 0 <NA> <NA>
## 6 6 age Age at last annive… dbl 0 <NA> <NA>
## 7 7 residency Urban / rural resi… dbl+lbl 0 <NA> [1] urban
## 8 7 residency Urban / rural resi… dbl+lbl 0 <NA> [2] rural
## 9 8 region Region dbl+lbl 0 <NA> [1] North
## 10 8 region Region dbl+lbl 0 <NA> [2] East
## # ℹ 31 more rows
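A sketch of the combination on the built-in iris data. Applying the long format first and then convert_list_columns_to_character() should leave no list column in the result.

```r
library(labelled)

# Long format first, then any remaining list columns turned into text.
long_chr <- convert_list_columns_to_character(
  lookfor_to_long_format(look_for(iris))
)
long_chr
```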