The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

The wcde package allows for R users to easily download data from the Wittgenstein Centre for Demography and Human Capital Data Explorer as well as containing a number of helpful functions for working with education specific demographic data.

Installation

You can install the released version of wcde from CRAN with:

install.packages("wcde")

Install the developmental version with:

library(devtools)
install_github("guyabel/wcde", ref = "main")

Getting data into R

The get_wcde() function can be used to download data from the Wittgenstein Centre Human Capital Data Explorer. It requires three user inputs

  • indicator: a short code for the indicator of interest
  • scenario: a number referring to a SSP narrative, by default 2 is used (for SSP2)
  • country_code (or country_name): corresponding to the country of interest
library(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"))
#> # A tibble: 192 × 6
#>    scenario name    country_code education          period     etfr
#>       <dbl> <chr>          <dbl> <chr>              <chr>     <dbl>
#>  1        2 Brazil            76 No Education       2020-2025  2.16
#>  2        2 Albania            8 No Education       2020-2025  2.31
#>  3        2 Brazil            76 Incomplete Primary 2020-2025  2.16
#>  4        2 Albania            8 Incomplete Primary 2020-2025  2.51
#>  5        2 Brazil            76 Primary            2020-2025  2.16
#>  6        2 Albania            8 Primary            2020-2025  2.17
#>  7        2 Brazil            76 Lower Secondary    2020-2025  1.71
#>  8        2 Albania            8 Lower Secondary    2020-2025  1.88
#>  9        2 Brazil            76 Upper Secondary    2020-2025  1.30
#> 10        2 Albania            8 Upper Secondary    2020-2025  1.61
#> # … with 182 more rows

# download education specific survivorship rates
get_wcde(indicator = "eassr",
         country_name = c("Niger", "Korea"))
#> # A tibble: 6,912 × 8
#>    scenario name              country_code age    sex   education   period eassr
#>       <dbl> <chr>                    <dbl> <chr>  <chr> <chr>       <chr>  <dbl>
#>  1        2 Niger                      562 15--19 Male  No Educati… 2020-… 0.987
#>  2        2 Republic of Korea          410 15--19 Male  No Educati… 2020-… 0.999
#>  3        2 Niger                      562 15--19 Male  Incomplete… 2020-… 0.987
#>  4        2 Republic of Korea          410 15--19 Male  Incomplete… 2020-… 0.999
#>  5        2 Niger                      562 15--19 Male  Primary     2020-… 0.989
#>  6        2 Republic of Korea          410 15--19 Male  Primary     2020-… 0.999
#>  7        2 Niger                      562 15--19 Male  Lower Seco… 2020-… 0.990
#>  8        2 Republic of Korea          410 15--19 Male  Lower Seco… 2020-… 0.999
#>  9        2 Niger                      562 15--19 Male  Upper Seco… 2020-… 0.992
#> 10        2 Republic of Korea          410 15--19 Male  Upper Seco… 2020-… 0.999
#> # … with 6,902 more rows

Indicator codes

The indicator input must match the short code from the indicator table. The find_indicator() function can be used to look up short codes (given in the first column) from the wic_indicators data frame:

find_indicator(x = "tfr")
#> # A tibble: 2 × 6
#>   indicator description                       `wcde-v3`  wcde-…¹ wcde-…² defin…³
#>   <chr>     <chr>                             <chr>      <chr>   <chr>   <chr>  
#> 1 etfr      Total Fertility Rate by Education projectio… projec… projec… The av…
#> 2 tfr       Total Fertility Rate              projectio… past-a… past-a… The av…
#> # … with abbreviated variable names ¹​`wcde-v2`, ²​`wcde-v1`, ³​definition_latest

Temporal coverage

By default, get_wdce() returns data for all years or available periods or years. The filter() function in dplyr can be used to filter data for specific years or periods, for example:

library(tidyverse)
get_wcde(indicator = "e0",
         country_name = c("Japan", "Australia")) %>%
  filter(period == "2015-2020")
#> # A tibble: 0 × 6
#> # … with 6 variables: scenario <dbl>, name <chr>, country_code <dbl>,
#> #   sex <chr>, period <chr>, e0 <dbl>

get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020)
#> # A tibble: 44 × 6
#>    scenario name              country_code age     year sexratio
#>       <dbl> <chr>                    <dbl> <chr>  <dbl>    <dbl>
#>  1        2 China                      156 All     2020    1.05 
#>  2        2 Republic of Korea          410 All     2020    0.999
#>  3        2 China                      156 0--4    2020    1.14 
#>  4        2 Republic of Korea          410 0--4    2020    1.05 
#>  5        2 China                      156 5--9    2020    1.16 
#>  6        2 Republic of Korea          410 5--9    2020    1.05 
#>  7        2 China                      156 10--14  2020    1.17 
#>  8        2 Republic of Korea          410 10--14  2020    1.07 
#>  9        2 China                      156 15--19  2020    1.17 
#> 10        2 Republic of Korea          410 15--19  2020    1.08 
#> # … with 34 more rows

Past data is only available for selected indicators. These can be viewed using the version column:

wic_indicators %>%
  filter(`wcde-v2` == "past-available") %>%
  select(1:2)
#> # A tibble: 28 × 2
#>    indicator description                                     
#>    <chr>     <chr>                                           
#>  1 asfr      Age-Specific Fertility Rate                     
#>  2 assr      Age-Specific Survival Ratio                     
#>  3 bmys      Mean Years of Schooling by Broad Age            
#>  4 bpop      Population Size by Broad Age (000's)            
#>  5 bprop     Educational Attainment Distribution by Broad Age
#>  6 cbr       Crude Birth Rate                                
#>  7 cdr       Crude Death Rate                                
#>  8 e0        Life Expectancy at Birth                        
#>  9 epop      Population Size by Education (000's)            
#> 10 ggapedu15 Gender Gap in Educational Attainment (15+)      
#> # … with 18 more rows

The filter() function can also be used to filter specific indicators to specific age, sex or education groups

get_wcde(indicator = "sexratio",
         country_name = c("China", "South Korea")) %>%
  filter(year == 2020,
         age == "All")
#> # A tibble: 2 × 6
#>   scenario name              country_code age    year sexratio
#>      <dbl> <chr>                    <dbl> <chr> <dbl>    <dbl>
#> 1        2 China                      156 All    2020    1.05 
#> 2        2 Republic of Korea          410 All    2020    0.999

Country names and codes

Country names are guessed using the countrycode package.

get_wcde(indicator = "tfr",
         country_name = c("U.A.E", "Espania", "Österreich"))
#> # A tibble: 48 × 5
#>    scenario name                 country_code period      tfr
#>       <dbl> <chr>                       <dbl> <chr>     <dbl>
#>  1        2 United Arab Emirates          784 2020-2025  1.35
#>  2        2 Spain                         724 2020-2025  1.19
#>  3        2 Austria                        40 2020-2025  1.45
#>  4        2 United Arab Emirates          784 2025-2030  1.39
#>  5        2 Spain                         724 2025-2030  1.25
#>  6        2 Austria                        40 2025-2030  1.48
#>  7        2 United Arab Emirates          784 2030-2035  1.41
#>  8        2 Spain                         724 2030-2035  1.32
#>  9        2 Austria                        40 2030-2035  1.51
#> 10        2 United Arab Emirates          784 2035-2040  1.44
#> # … with 38 more rows

The get_wcde() functions accepts ISO alpha numeric codes for countries via the country_code argument:

get_wcde(indicator = "etfr", country_code = c(44, 100))
#> # A tibble: 192 × 6
#>    scenario name     country_code education          period     etfr
#>       <dbl> <chr>           <dbl> <chr>              <chr>     <dbl>
#>  1        2 Bahamas            44 No Education       2020-2025  2.16
#>  2        2 Bulgaria          100 No Education       2020-2025  1.86
#>  3        2 Bahamas            44 Incomplete Primary 2020-2025  2.16
#>  4        2 Bulgaria          100 Incomplete Primary 2020-2025  1.86
#>  5        2 Bahamas            44 Primary            2020-2025  2.16
#>  6        2 Bulgaria          100 Primary            2020-2025  1.86
#>  7        2 Bahamas            44 Lower Secondary    2020-2025  1.71
#>  8        2 Bulgaria          100 Lower Secondary    2020-2025  1.86
#>  9        2 Bahamas            44 Upper Secondary    2020-2025  1.43
#> 10        2 Bulgaria          100 Upper Secondary    2020-2025  1.51
#> # … with 182 more rows

A full list of available countries and region aggregates, and their codes, can be found in the wic_locations data frame.

wic_locations
#> # A tibble: 232 × 8
#>    name                       isono conti…¹ region dim   wcde-…² wcde-…³ wcde-…⁴
#>    <chr>                      <dbl> <chr>   <chr>  <chr> <lgl>   <lgl>   <lgl>  
#>  1 World                        900 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  2 Africa                       903 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  3 Asia                         935 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  4 Europe                       908 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  5 Latin America and the Car…   904 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  6 Northern America             905 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  7 Oceania                      909 <NA>    <NA>   area  TRUE    TRUE    TRUE   
#>  8 Afghanistan                    4 Asia    South… coun… TRUE    TRUE    TRUE   
#>  9 Albania                        8 Europe  South… coun… TRUE    TRUE    TRUE   
#> 10 Algeria                       12 Africa  North… coun… TRUE    TRUE    TRUE   
#> # … with 222 more rows, and abbreviated variable names ¹​continent, ²​`wcde-v3`,
#> #   ³​`wcde-v2`, ⁴​`wcde-v1`

Scenarios

By default get_wcde() returns data for Medium (SSP2) scenario. Results for different SSP scenarios can be returned by passing a different (or multiple) scenario values to the scenario argument in get_data().

get_wcde(indicator = "growth",
         country_name = c("India", "China"),
         scenario = c(1:3, 22, 23)) %>%
  filter(period == "2095-2100")
#> # A tibble: 10 × 5
#>    scenario name  country_code period    growth
#>       <dbl> <chr>        <dbl> <chr>      <dbl>
#>  1        1 India          356 2095-2100 -1.05 
#>  2        1 China          156 2095-2100 -1.11 
#>  3        2 India          356 2095-2100 -0.545
#>  4        2 China          156 2095-2100 -1.03 
#>  5        3 India          356 2095-2100  0.170
#>  6        3 China          156 2095-2100 -0.428
#>  7       22 India          356 2095-2100 -0.545
#>  8       22 China          156 2095-2100 -1.03 
#>  9       23 India          356 2095-2100 -0.545
#> 10       23 China          156 2095-2100 -1.03

Set include_scenario_names = TRUE to include a columns with the full names of the scenarios

get_wcde(indicator = "tfr",
         country_name = c("Kenya", "Nigeria", "Algeria"),
         scenario = 1:3,
         include_scenario_names = TRUE) %>%
  filter(period == "2045-2050")
#> # A tibble: 9 × 7
#>   scenario scenario_name              scenario_abb name    countr…¹ period   tfr
#>      <dbl> <chr>                      <chr>        <chr>      <dbl> <chr>  <dbl>
#> 1        1 Rapid Development (SSP1)   SSP1         Kenya        404 2045-…  1.62
#> 2        1 Rapid Development (SSP1)   SSP1         Nigeria      566 2045-…  2.62
#> 3        1 Rapid Development (SSP1)   SSP1         Algeria       12 2045-…  1.52
#> 4        2 Medium (SSP2)              SSP2         Kenya        404 2045-…  2.32
#> 5        2 Medium (SSP2)              SSP2         Nigeria      566 2045-…  3.75
#> 6        2 Medium (SSP2)              SSP2         Algeria       12 2045-…  2.04
#> 7        3 Stalled Development (SSP3) SSP3         Kenya        404 2045-…  3.02
#> 8        3 Stalled Development (SSP3) SSP3         Nigeria      566 2045-…  4.83
#> 9        3 Stalled Development (SSP3) SSP3         Algeria       12 2045-…  2.66
#> # … with abbreviated variable name ¹​country_code

Additional details of the pathways for each scenario numeric code can be found in the wic_scenarios object. Further background and links to the corresponding literature are provided in the Data Explorer

wic_scenarios
#> # A tibble: 9 × 6
#>   scenario_name                          scena…¹ scena…² wcde-…³ wcde-…⁴ wcde-…⁵
#>   <chr>                                    <dbl> <chr>   <lgl>   <lgl>   <lgl>  
#> 1 Rapid Development (SSP1)                     1 SSP1    TRUE    TRUE    TRUE   
#> 2 Medium (SSP2)                                2 SSP2    TRUE    TRUE    TRUE   
#> 3 Stalled Development (SSP3)                   3 SSP3    TRUE    TRUE    TRUE   
#> 4 Inequality (SSP4)                            4 SSP4    TRUE    FALSE   TRUE   
#> 5 Conventional Development (SSP5)              5 SSP5    TRUE    FALSE   TRUE   
#> 6 Medium - Zero Migration (SSP2-ZM)           22 SSP2-ZM TRUE    TRUE    FALSE  
#> 7 Medium - Double Migration (SSP2-DM)         23 SSP2-DM TRUE    TRUE    FALSE  
#> 8 Medium - Constant Enrolment Rate (SSP…      20 SSP2-C… FALSE   FALSE   TRUE   
#> 9 Medium - Fast Track Education (SSP2-F…      21 SSP2-FT FALSE   FALSE   TRUE   
#> # … with abbreviated variable names ¹​scenario, ²​scenario_abb, ³​`wcde-v3`,
#> #   ⁴​`wcde-v2`, ⁵​`wcde-v1`

All countries data

Data for all countries can be obtained by not setting country_name or country_code

get_wcde(indicator = "mage")
#> # A tibble: 3,876 × 5
#>    scenario name                     country_code  year  mage
#>       <dbl> <chr>                           <dbl> <dbl> <dbl>
#>  1        2 Bulgaria                          100  2020  40.1
#>  2        2 Myanmar                           104  2020  24.6
#>  3        2 Burundi                           108  2020  11.5
#>  4        2 Belarus                           112  2020  35.9
#>  5        2 Cambodia                          116  2020  22.0
#>  6        2 Algeria                            12  2020  23.5
#>  7        2 Cameroon                          120  2020  13.5
#>  8        2 Canada                            124  2020  35.9
#>  9        2 Cape Verde                        132  2020  21.8
#> 10        2 Central African Republic          140  2020  10.7
#> # … with 3,866 more rows

Multiple indicators

The get_wdce() function needs to be called multiple times to download multiple indicators. This can be done using the map() function in purrr

mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
  mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi
#> # A tibble: 3 × 2
#>   ind       d                    
#>   <chr>     <list>               
#> 1 odr       <tibble [3,876 × 5]> 
#> 2 nirate    <tibble [3,648 × 5]> 
#> 3 ggapedu25 <tibble [23,256 × 6]>

mi %>%
  filter(ind == "odr") %>%
  select(-ind) %>%
  unnest(cols = d)
#> # A tibble: 3,876 × 5
#>    scenario name                     country_code  year    odr
#>       <dbl> <chr>                           <dbl> <dbl>  <dbl>
#>  1        2 Bulgaria                          100  2020 0.347 
#>  2        2 Myanmar                           104  2020 0.0930
#>  3        2 Burundi                           108  2020 0.0486
#>  4        2 Belarus                           112  2020 0.246 
#>  5        2 Cambodia                          116  2020 0.0790
#>  6        2 Algeria                            12  2020 0.0937
#>  7        2 Cameroon                          120  2020 0.0505
#>  8        2 Canada                            124  2020 0.268 
#>  9        2 Cape Verde                        132  2020 0.0792
#> 10        2 Central African Republic          140  2020 0.0501
#> # … with 3,866 more rows

mi %>%
  filter(ind == "nirate") %>%
  select(-ind) %>%
  unnest(cols = d)
#> # A tibble: 3,648 × 5
#>    scenario name                     country_code period    nirate
#>       <dbl> <chr>                           <dbl> <chr>      <dbl>
#>  1        2 Bulgaria                          100 2020-2025 -10.7 
#>  2        2 Myanmar                           104 2020-2025   7.46
#>  3        2 Burundi                           108 2020-2025  28.0 
#>  4        2 Belarus                           112 2020-2025  -5.95
#>  5        2 Cambodia                          116 2020-2025  12.8 
#>  6        2 Algeria                            12 2020-2025  17.3 
#>  7        2 Cameroon                          120 2020-2025  27.0 
#>  8        2 Canada                            124 2020-2025   1.58
#>  9        2 Cape Verde                        132 2020-2025  11.8 
#> 10        2 Central African Republic          140 2020-2025  33.4 
#> # … with 3,638 more rows

mi %>%
  filter(ind == "ggapedu25") %>%
  select(-ind) %>%
  unnest(cols = d)
#> # A tibble: 23,256 × 6
#>    scenario name                     country_code  year education    ggapedu25
#>       <dbl> <chr>                           <dbl> <dbl> <chr>            <dbl>
#>  1        2 Bulgaria                          100  2020 No Education -4.63e- 3
#>  2        2 Myanmar                           104  2020 No Education -4.30e- 2
#>  3        2 Burundi                           108  2020 No Education  1.47e- 1
#>  4        2 Belarus                           112  2020 No Education -5.76e- 4
#>  5        2 Cambodia                          116  2020 No Education -1.19e- 1
#>  6        2 Algeria                            12  2020 No Education -1.63e- 1
#>  7        2 Cameroon                          120  2020 No Education -1.02e- 1
#>  8        2 Canada                            124  2020 No Education  1.36e-20
#>  9        2 Cape Verde                        132  2020 No Education  2.61e- 2
#> 10        2 Central African Republic          140  2020 No Education -3.13e- 1
#> # … with 23,246 more rows

Previous versions

Previous versions of projections from the Wittgenstein Centre for Demography are available using the version argument in get_wdce() to "wcde-v1" or "wcde-v2", where "wcde-v3" is used as the default since 2024.

get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"),
         version = "wcde-v2")
#> # A tibble: 204 × 6
#>    scenario name    country_code education          period     etfr
#>       <dbl> <chr>          <dbl> <chr>              <chr>     <dbl>
#>  1        2 Brazil            76 No Education       2015-2020  2.47
#>  2        2 Albania            8 No Education       2015-2020  1.88
#>  3        2 Brazil            76 Incomplete Primary 2015-2020  2.47
#>  4        2 Albania            8 Incomplete Primary 2015-2020  1.88
#>  5        2 Brazil            76 Primary            2015-2020  2.47
#>  6        2 Albania            8 Primary            2015-2020  1.88
#>  7        2 Brazil            76 Lower Secondary    2015-2020  1.89
#>  8        2 Albania            8 Lower Secondary    2015-2020  1.9 
#>  9        2 Brazil            76 Upper Secondary    2015-2020  1.37
#> 10        2 Albania            8 Upper Secondary    2015-2020  1.57
#> # … with 194 more rows

Note, not all indicators and scenarios are available in all versions - see the the wic_indicators and wic_scenarios objects for further details (above).

Server

If you have trouble with connecting to the IIASA server you can try back versions using the server option in get_wcde(), which can be set to github, 1&1.

get_wcde(indicator = "etfr",
         country_name = c("Brazil", "Albania"), 
         version = "wcde-v2", server = "github")
#> # A tibble: 204 × 6
#>    scenario name    country_code education          period     etfr
#>       <dbl> <chr>          <dbl> <chr>              <chr>     <dbl>
#>  1        2 Brazil            76 No Education       2015-2020  2.47
#>  2        2 Albania            8 No Education       2015-2020  1.88
#>  3        2 Brazil            76 Incomplete Primary 2015-2020  2.47
#>  4        2 Albania            8 Incomplete Primary 2015-2020  1.88
#>  5        2 Brazil            76 Primary            2015-2020  2.47
#>  6        2 Albania            8 Primary            2015-2020  1.88
#>  7        2 Brazil            76 Lower Secondary    2015-2020  1.89
#>  8        2 Albania            8 Lower Secondary    2015-2020  1.9 
#>  9        2 Brazil            76 Upper Secondary    2015-2020  1.37
#> 10        2 Albania            8 Upper Secondary    2015-2020  1.57
#> # … with 194 more rows

You may also set server = search-available to search through the three possible data location to download the data whereever it is available.

Working with population data

Population data for a range of age-sex-educational attainment combinations can be obtained by setting indicator = "pop" in get_wcde() and specifying a pop_age, pop_sex and pop_edu arguments. By default each of the three population breakdown arguments are set to “total”

get_wcde(indicator = "pop", country_name = "India")
#> # A tibble: 17 × 5
#>    scenario name  country_code  year      pop
#>       <dbl> <chr>        <dbl> <dbl>    <dbl>
#>  1        2 India          356  2020 1389966.
#>  2        2 India          356  2025 1445480.
#>  3        2 India          356  2030 1501725.
#>  4        2 India          356  2035 1548067.
#>  5        2 India          356  2040 1583687.
#>  6        2 India          356  2045 1607695.
#>  7        2 India          356  2050 1620358.
#>  8        2 India          356  2055 1625062.
#>  9        2 India          356  2060 1622572.
#> 10        2 India          356  2065 1612143.
#> 11        2 India          356  2070 1594676.
#> 12        2 India          356  2075 1570024.
#> 13        2 India          356  2080 1539493.
#> 14        2 India          356  2085 1504981.
#> 15        2 India          356  2090 1468261.
#> 16        2 India          356  2095 1430167.
#> 17        2 India          356  2100 1391608.

The pop_age argument can be set to all to get population data broken down in five-year age groups. The pop_sex argument can be set to both to get population data broken down into female and male groups. The pop_edu argument can be set to four, six or eight to get population data broken down into education categorizations with different levels of detail.

get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
#> # A tibble: 85 × 6
#>    scenario name  country_code  year education           pop
#>       <dbl> <fct>        <dbl> <dbl> <fct>             <dbl>
#>  1        2 World          900  2020 Under 15       2012336.
#>  2        2 World          900  2020 No Education    756762.
#>  3        2 World          900  2020 Primary        1208824.
#>  4        2 World          900  2020 Secondary      2883491.
#>  5        2 World          900  2020 Post Secondary  943560.
#>  6        2 World          900  2025 Under 15       2002922.
#>  7        2 World          900  2025 No Education    724867.
#>  8        2 World          900  2025 Primary        1212577.
#>  9        2 World          900  2025 Secondary      3114657.
#> 10        2 World          900  2025 Post Secondary 1096623.
#> # … with 75 more rows

The population breakdown arguments can be used in combination to provide further breakdowns, for example sex and education specific population totals

get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
#> # A tibble: 238 × 7
#>    scenario name  country_code  year sex    education               pop
#>       <dbl> <fct>        <dbl> <dbl> <fct>  <fct>                 <dbl>
#>  1        2 World          900  2020 Male   Under 15           1037900.
#>  2        2 World          900  2020 Male   No Education        308168.
#>  3        2 World          900  2020 Male   Incomplete Primary  197055.
#>  4        2 World          900  2020 Male   Primary             426676.
#>  5        2 World          900  2020 Male   Lower Secondary     623289.
#>  6        2 World          900  2020 Male   Upper Secondary     848609.
#>  7        2 World          900  2020 Male   Post Secondary      484476.
#>  8        2 World          900  2020 Female Under 15            974436.
#>  9        2 World          900  2020 Female No Education        448594.
#> 10        2 World          900  2020 Female Incomplete Primary  186376.
#> # … with 228 more rows

The full age-sex-education specific data can also be obtained by setting indicator = "epop" in get_wcde().

Population pyramids

Create population pyramids by setting male population values to negative equivalent to allow for divergent columns from the y axis.

w <- get_wcde(indicator = "pop", country_code = 900,
              pop_age = "all", pop_sex = "both", pop_edu = "four",
              version = "wcde-v2")
w
#> # A tibble: 6,510 × 8
#>    scenario name  country_code  year age   sex    education          pop
#>       <dbl> <fct>        <dbl> <int> <fct> <fct>  <fct>            <dbl>
#>  1        2 World          900  1950 0--4  Male   Under 15       172362.
#>  2        2 World          900  1950 0--4  Male   No Education        0 
#>  3        2 World          900  1950 0--4  Male   Primary             0 
#>  4        2 World          900  1950 0--4  Male   Secondary           0 
#>  5        2 World          900  1950 0--4  Male   Post Secondary      0 
#>  6        2 World          900  1950 0--4  Female Under 15       166026.
#>  7        2 World          900  1950 0--4  Female No Education        0 
#>  8        2 World          900  1950 0--4  Female Primary             0 
#>  9        2 World          900  1950 0--4  Female Secondary           0 
#> 10        2 World          900  1950 0--4  Female Post Secondary      0 
#> # … with 6,500 more rows

w <- w %>%
  mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
         pop_pm = pop_pm/1e3)
w
#> # A tibble: 6,510 × 9
#>    scenario name  country_code  year age   sex    education          pop pop_pm
#>       <dbl> <fct>        <dbl> <int> <fct> <fct>  <fct>            <dbl>  <dbl>
#>  1        2 World          900  1950 0--4  Male   Under 15       172362.  -172.
#>  2        2 World          900  1950 0--4  Male   No Education        0      0 
#>  3        2 World          900  1950 0--4  Male   Primary             0      0 
#>  4        2 World          900  1950 0--4  Male   Secondary           0      0 
#>  5        2 World          900  1950 0--4  Male   Post Secondary      0      0 
#>  6        2 World          900  1950 0--4  Female Under 15       166026.   166.
#>  7        2 World          900  1950 0--4  Female No Education        0      0 
#>  8        2 World          900  1950 0--4  Female Primary             0      0 
#>  9        2 World          900  1950 0--4  Female Secondary           0      0 
#> 10        2 World          900  1950 0--4  Female Post Secondary      0      0 
#> # … with 6,500 more rows

Standard plot

Use standard ggplot code to create population pyramid with

  • scale_x_symmetric() from the lemon package to allow for equal male and female x-axis
  • fill colours set to the wic_col4 object in the wcde package which contains the names of the colours used in the Wittgenstein Centre Human Capital Data Explorer Data Explorer.

Note wic_col6 and wic_col8 objects also exist for equivalent plots of population data objects with corresponding numbers of categories of education.

library(lemon)

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_symmetric(labels = abs) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  theme_bw()

Sex label position

Add male and female labels on the x-axis by

  • Creating a facet plot with the strips on the bottom with transparent backgrounds and no space between.
  • Set the x axis to have zero expansion beyond the values in the data allowing the two sides of the pyramids to meet.
  • Add a geom_blank() to allow for equal x-axis and additional space at the end of largest columns.
w <- w %>%
  mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))

w %>%
  filter(year == 2020) %>%
  ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  labs(x = "Population (millions)", y = "Age") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin( b = 0, t = 0)))

Animate

Animate the pyramid through the past data and projection periods using the transition_time() function in the gganimate package

library(gganimate)

ggplot(data = w,
       mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
  geom_col() +
  geom_vline(xintercept = 0, colour = "black") +
  scale_x_continuous(labels = abs, expand = c(0, 0)) +
  scale_fill_manual(values = wic_col4, name = "Education") +
  facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
  geom_blank(mapping = aes(x = pop_max * 1.1)) +
  theme(panel.spacing.x = unit(0, "pt"),
        strip.placement = "outside",
        strip.background = element_rect(fill = "transparent"),
        strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
  transition_time(time = year) +
  labs(x = "Population (millions)", y = "Age",
       title = 'SSP2 World Population {round(frame_time)}')

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.