The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The wcde package allows for R users to easily download
data from the Wittgenstein Centre
for Demography and Human Capital Data Explorer as well as containing
a number of helpful functions for working with education specific
demographic data.
You can install the released version of wcde from CRAN with:
install.packages("wcde")
Install the developmental version with:
library(devtools)
install_github("guyabel/wcde", ref = "main")
The get_wcde() function can be used to download data
from the Wittgenstein Centre Human Capital Data Explorer. It requires
three user inputs
indicator: a short code for the indicator of
interestscenario: a number referring to a SSP narrative, by
default 2 is used (for SSP2)country_code (or country_name):
corresponding to the country of interestlibrary(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"))
#> # A tibble: 192 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2020-2025 2.24
#> 2 2 Albania 8 No Education 2020-2025 2.31
#> 3 2 Brazil 76 Incomplete Primary 2020-2025 2.24
#> 4 2 Albania 8 Incomplete Primary 2020-2025 2.51
#> 5 2 Brazil 76 Primary 2020-2025 2.24
#> 6 2 Albania 8 Primary 2020-2025 2.17
#> 7 2 Brazil 76 Lower Secondary 2020-2025 1.78
#> 8 2 Albania 8 Lower Secondary 2020-2025 1.88
#> 9 2 Brazil 76 Upper Secondary 2020-2025 1.35
#> 10 2 Albania 8 Upper Secondary 2020-2025 1.61
#> # ℹ 182 more rows
# download education specific survivorship rates
get_wcde(indicator = "eassr",
country_name = c("Niger", "Korea"))
#> # A tibble: 8,448 × 8
#> scenario name country_code age sex education period eassr
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 2 Niger 562 15--19 Male No Educati… 2020-… 0.986
#> 2 2 Republic of Korea 410 15--19 Male No Educati… 2020-… 0.999
#> 3 2 Niger 562 15--19 Male Incomplete… 2020-… 0.988
#> 4 2 Republic of Korea 410 15--19 Male Incomplete… 2020-… 0.999
#> 5 2 Niger 562 15--19 Male Primary 2020-… 0.989
#> 6 2 Republic of Korea 410 15--19 Male Primary 2020-… 0.999
#> 7 2 Niger 562 15--19 Male Lower Seco… 2020-… 0.992
#> 8 2 Republic of Korea 410 15--19 Male Lower Seco… 2020-… 0.999
#> 9 2 Niger 562 15--19 Male Upper Seco… 2020-… 0.992
#> 10 2 Republic of Korea 410 15--19 Male Upper Seco… 2020-… 0.999
#> # ℹ 8,438 more rows
The indicator input must match the short code from the indicator
table. The find_indicator() function can be used to look up
short codes (given in the first column) from the
wic_indicators data frame:
find_indicator(x = "tfr")
#> # A tibble: 2 × 6
#> indicator description `wcde-v3` `wcde-v2` `wcde-v1` definition_latest
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 etfr Total Fertility Rat… projecti… projecti… projecti… The average numb…
#> 2 tfr Total Fertility Rate past-ava… past-ava… past-ava… The average numb…
By default, get_wdce() returns data for all years or
available periods or years. The filter() function in dplyr can
be used to filter data for specific years or periods, for example:
library(tidyverse)
get_wcde(indicator = "e0",
country_name = c("Japan", "Australia")) %>%
filter(period == "2015-2020")
#> # A tibble: 4 × 6
#> scenario name country_code sex period e0
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Japan 392 Male 2015-2020 81.1
#> 2 2 Australia 36 Male 2015-2020 81.1
#> 3 2 Japan 392 Female 2015-2020 87.3
#> 4 2 Australia 36 Female 2015-2020 85.1
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020)
#> # A tibble: 44 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.05
#> 2 2 Republic of Korea 410 All 2020 1
#> 3 2 China 156 0--4 2020 1.14
#> 4 2 Republic of Korea 410 0--4 2020 1.05
#> 5 2 China 156 5--9 2020 1.16
#> 6 2 Republic of Korea 410 5--9 2020 1.05
#> 7 2 China 156 10--14 2020 1.17
#> 8 2 Republic of Korea 410 10--14 2020 1.07
#> 9 2 China 156 15--19 2020 1.17
#> 10 2 Republic of Korea 410 15--19 2020 1.08
#> # ℹ 34 more rows
Past data is only available for selected indicators. These can be viewed using the version column:
wic_indicators %>%
filter(`wcde-v3` == "past-available") %>%
select(1:2)
#> # A tibble: 28 × 2
#> indicator description
#> <chr> <chr>
#> 1 asfr Age-Specific Fertility Rate
#> 2 assr Age-Specific Survival Ratio
#> 3 bmys Mean Years of Schooling by Broad Age
#> 4 bpop Population Size by Broad Age (000's)
#> 5 bprop Educational Attainment Distribution by Broad Age
#> 6 cbr Crude Birth Rate
#> 7 cdr Crude Death Rate
#> 8 e0 Life Expectancy at Birth
#> 9 epop Population Size by Education (000's)
#> 10 ggapedu15 Gender Gap in Educational Attainment (15+)
#> # ℹ 18 more rows
The filter() function can also be used to filter
specific indicators to specific age, sex or education groups
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020,
age == "All")
#> # A tibble: 2 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.05
#> 2 2 Republic of Korea 410 All 2020 1
Country names are guessed using the countrycode package.
get_wcde(indicator = "tfr",
country_name = c("U.A.E", "Espania", "Österreich"))
#> # A tibble: 90 × 5
#> scenario name country_code period tfr
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 United Arab Emirates 784 1950-1955 6.88
#> 2 2 Spain 724 1950-1955 2.49
#> 3 2 Austria 40 1950-1955 2.08
#> 4 2 United Arab Emirates 784 1955-1960 6.81
#> 5 2 Spain 724 1955-1960 2.65
#> 6 2 Austria 40 1955-1960 2.52
#> 7 2 United Arab Emirates 784 1960-1965 6.65
#> 8 2 Spain 724 1960-1965 2.82
#> 9 2 Austria 40 1960-1965 2.78
#> 10 2 United Arab Emirates 784 1965-1970 6.50
#> # ℹ 80 more rows
The get_wcde() functions accepts ISO alpha numeric codes
for countries via the country_code argument:
get_wcde(indicator = "etfr", country_code = c(44, 100))
#> # A tibble: 192 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Bahamas 44 No Education 2020-2025 2.2
#> 2 2 Bulgaria 100 No Education 2020-2025 1.86
#> 3 2 Bahamas 44 Incomplete Primary 2020-2025 2.2
#> 4 2 Bulgaria 100 Incomplete Primary 2020-2025 1.86
#> 5 2 Bahamas 44 Primary 2020-2025 2.2
#> 6 2 Bulgaria 100 Primary 2020-2025 1.86
#> 7 2 Bahamas 44 Lower Secondary 2020-2025 1.74
#> 8 2 Bulgaria 100 Lower Secondary 2020-2025 1.86
#> 9 2 Bahamas 44 Upper Secondary 2020-2025 1.46
#> 10 2 Bulgaria 100 Upper Secondary 2020-2025 1.51
#> # ℹ 182 more rows
A full list of available countries and region aggregates, and their
codes, can be found in the wic_locations data frame.
wic_locations
#> # A tibble: 232 × 8
#> name isono continent region dim `wcde-v3` `wcde-v2` `wcde-v1`
#> <chr> <dbl> <chr> <chr> <chr> <lgl> <lgl> <lgl>
#> 1 World 900 <NA> <NA> area TRUE TRUE TRUE
#> 2 Africa 903 <NA> <NA> area TRUE TRUE TRUE
#> 3 Asia 935 <NA> <NA> area TRUE TRUE TRUE
#> 4 Europe 908 <NA> <NA> area TRUE TRUE TRUE
#> 5 Latin America and… 904 <NA> <NA> area TRUE TRUE TRUE
#> 6 Northern America 905 <NA> <NA> area TRUE TRUE TRUE
#> 7 Oceania 909 <NA> <NA> area TRUE TRUE TRUE
#> 8 Afghanistan 4 Asia South… coun… TRUE TRUE TRUE
#> 9 Albania 8 Europe South… coun… TRUE TRUE TRUE
#> 10 Algeria 12 Africa North… coun… TRUE TRUE TRUE
#> # ℹ 222 more rows
By default get_wcde() returns data for Medium (SSP2)
scenario. Results for different SSP scenarios can be returned by passing
a different (or multiple) scenario values to the scenario
argument in get_data().
get_wcde(indicator = "growth",
country_name = c("India", "China"),
scenario = c(1:3, 22, 23)) %>%
filter(period == "2095-2100")
#> # A tibble: 10 × 5
#> scenario name country_code period growth
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 India 356 2095-2100 -1
#> 2 1 China 156 2095-2100 -1.1
#> 3 2 India 356 2095-2100 -0.5
#> 4 2 China 156 2095-2100 -1
#> 5 3 India 356 2095-2100 0.2
#> 6 3 China 156 2095-2100 -0.4
#> 7 22 India 356 2095-2100 -0.5
#> 8 22 China 156 2095-2100 -1
#> 9 23 India 356 2095-2100 -0.5
#> 10 23 China 156 2095-2100 -1
Set include_scenario_names = TRUE to include a columns
with the full names of the scenarios
get_wcde(indicator = "tfr",
country_name = c("Kenya", "Nigeria", "Algeria"),
scenario = 1:3,
include_scenario_names = TRUE) %>%
filter(period == "2045-2050")
#> # A tibble: 9 × 7
#> scenario scenario_name scenario_abb name country_code period tfr
#> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 Rapid Development (SSP1) SSP1 Kenya 404 2045-… 1.64
#> 2 1 Rapid Development (SSP1) SSP1 Nige… 566 2045-… 2.62
#> 3 1 Rapid Development (SSP1) SSP1 Alge… 12 2045-… 1.53
#> 4 2 Medium (SSP2) SSP2 Kenya 404 2045-… 2.34
#> 5 2 Medium (SSP2) SSP2 Nige… 566 2045-… 3.76
#> 6 2 Medium (SSP2) SSP2 Alge… 12 2045-… 2.06
#> 7 3 Stalled Development (SS… SSP3 Kenya 404 2045-… 3.04
#> 8 3 Stalled Development (SS… SSP3 Nige… 566 2045-… 4.87
#> 9 3 Stalled Development (SS… SSP3 Alge… 12 2045-… 2.68
Additional details of the pathways for each scenario numeric code can
be found in the wic_scenarios object. Further background
and links to the corresponding literature are provided in the Data
Explorer
wic_scenarios
#> # A tibble: 9 × 6
#> scenario_name scenario scenario_abb `wcde-v3` `wcde-v2` `wcde-v1`
#> <chr> <dbl> <chr> <lgl> <lgl> <lgl>
#> 1 Rapid Development (SSP1) 1 SSP1 TRUE TRUE TRUE
#> 2 Medium (SSP2) 2 SSP2 TRUE TRUE TRUE
#> 3 Stalled Development (SSP3) 3 SSP3 TRUE TRUE TRUE
#> 4 Inequality (SSP4) 4 SSP4 TRUE FALSE TRUE
#> 5 Conventional Development … 5 SSP5 TRUE FALSE TRUE
#> 6 Medium - Zero Migration (… 22 SSP2-ZM TRUE TRUE FALSE
#> 7 Medium - Double Migration… 23 SSP2-DM TRUE TRUE FALSE
#> 8 Medium - Constant Enrolme… 20 SSP2-CER FALSE FALSE TRUE
#> 9 Medium - Fast Track Educa… 21 SSP2-FT FALSE FALSE TRUE
Data for all countries can be obtained by not setting
country_name or country_code
get_wcde(indicator = "mage")
#> # A tibble: 6,858 × 5
#> scenario name country_code year mage
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 1950 26.9
#> 2 2 Myanmar 104 1950 21.8
#> 3 2 Burundi 108 1950 19.5
#> 4 2 Belarus 112 1950 26.6
#> 5 2 Cambodia 116 1950 18.6
#> 6 2 Algeria 12 1950 19.5
#> 7 2 Cameroon 120 1950 20.4
#> 8 2 Canada 124 1950 27.7
#> 9 2 Cape Verde 132 1950 17.9
#> 10 2 Central African Republic 140 1950 22.4
#> # ℹ 6,848 more rows
The get_wdce() function needs to be called multiple
times to download multiple indicators. This can be done using the
map() function in purrr
mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi
#> # A tibble: 3 × 2
#> ind d
#> <chr> <list>
#> 1 odr <tibble [6,858 × 5]>
#> 2 nirate <tibble [6,630 × 5]>
#> 3 ggapedu25 <tibble [52,776 × 6]>
mi %>%
filter(ind == "odr") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 6,858 × 5
#> scenario name country_code year odr
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 1950 0.1
#> 2 2 Myanmar 104 1950 0.05
#> 3 2 Burundi 108 1950 0.06
#> 4 2 Belarus 112 1950 0.13
#> 5 2 Cambodia 116 1950 0.05
#> 6 2 Algeria 12 1950 0.06
#> 7 2 Cameroon 120 1950 0.06
#> 8 2 Canada 124 1950 0.12
#> 9 2 Cape Verde 132 1950 0.07
#> 10 2 Central African Republic 140 1950 0.09
#> # ℹ 6,848 more rows
mi %>%
filter(ind == "nirate") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 6,630 × 5
#> scenario name country_code period nirate
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 1950-1955 11
#> 2 2 Myanmar 104 1950-1955 19.8
#> 3 2 Burundi 108 1950-1955 27.9
#> 4 2 Belarus 112 1950-1955 9.9
#> 5 2 Cambodia 116 1950-1955 25.2
#> 6 2 Algeria 12 1950-1955 29.1
#> 7 2 Cameroon 120 1950-1955 16.2
#> 8 2 Canada 124 1950-1955 20
#> 9 2 Cape Verde 132 1950-1955 27.4
#> 10 2 Central African Republic 140 1950-1955 15.1
#> # ℹ 6,620 more rows
mi %>%
filter(ind == "ggapedu25") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 52,776 × 6
#> scenario name country_code year education ggapedu25
#> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 1950 No Education -17.8
#> 2 2 Myanmar 104 1950 No Education -19.3
#> 3 2 Burundi 108 1950 No Education -22.9
#> 4 2 Belarus 112 1950 No Education -19.8
#> 5 2 Cambodia 116 1950 No Education -27.3
#> 6 2 Algeria 12 1950 No Education -1.2
#> 7 2 Cameroon 120 1950 No Education -19
#> 8 2 Canada 124 1950 No Education 0
#> 9 2 Cape Verde 132 1950 No Education -23.7
#> 10 2 Central African Republic 140 1950 No Education -20.2
#> # ℹ 52,766 more rows
Previous versions of projections from the Wittgenstein Centre for
Demography are available using the version argument in
get_wdce(). Set version to "wcde-v1"
or "wcde-v2"
or "wcde-v3"
(the default since 2024).
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"),
version = "wcde-v2")
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2015-2020 2.47
#> 2 2 Albania 8 No Education 2015-2020 1.88
#> 3 2 Brazil 76 Incomplete Primary 2015-2020 2.47
#> 4 2 Albania 8 Incomplete Primary 2015-2020 1.88
#> 5 2 Brazil 76 Primary 2015-2020 2.47
#> 6 2 Albania 8 Primary 2015-2020 1.88
#> 7 2 Brazil 76 Lower Secondary 2015-2020 1.89
#> 8 2 Albania 8 Lower Secondary 2015-2020 1.9
#> 9 2 Brazil 76 Upper Secondary 2015-2020 1.37
#> 10 2 Albania 8 Upper Secondary 2015-2020 1.57
#> # ℹ 194 more rows
Note, not all indicators and scenarios are available in all versions
- see the the wic_indicators and wic_scenarios
objects for further details or see above.
If you have trouble with connecting to the IIASA server you can try
alternative hosts using the server option in
get_wcde(), which can be set to "iiasa"
(default) "github" and "1&1".
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"),
version = "wcde-v2", server = "github")
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2015-2020 2.47
#> 2 2 Albania 8 No Education 2015-2020 1.88
#> 3 2 Brazil 76 Incomplete Primary 2015-2020 2.47
#> 4 2 Albania 8 Incomplete Primary 2015-2020 1.88
#> 5 2 Brazil 76 Primary 2015-2020 2.47
#> 6 2 Albania 8 Primary 2015-2020 1.88
#> 7 2 Brazil 76 Lower Secondary 2015-2020 1.89
#> 8 2 Albania 8 Lower Secondary 2015-2020 1.9
#> 9 2 Brazil 76 Upper Secondary 2015-2020 1.37
#> 10 2 Albania 8 Upper Secondary 2015-2020 1.57
#> # ℹ 194 more rows
You may also set server = "search-available" to search
through the three possible data location to download the data wherever
it is available.
Population data for a range of age-sex-educational attainment
combinations can be obtained by setting indicator = "pop"
in get_wcde() and specifying a pop_age,
pop_sex and pop_edu arguments. By default each
of the three population breakdown arguments are set to “total”
get_wcde(indicator = "pop", country_name = "India")
#> # A tibble: 31 × 5
#> scenario name country_code year pop
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 India 356 1950 353104.
#> 2 2 India 356 1955 394095
#> 3 2 India 356 1960 440828.
#> 4 2 India 356 1965 494677.
#> 5 2 India 356 1970 551306.
#> 6 2 India 356 1975 616603.
#> 7 2 India 356 1980 688875.
#> 8 2 India 356 1985 771520.
#> 9 2 India 356 1990 861205.
#> 10 2 India 356 1995 954786.
#> # ℹ 21 more rows
The pop_age argument can be set to all to
get population data broken down in five-year age groups. The
pop_sex argument can be set to both to get
population data broken down into female and male groups. The
pop_edu argument can be set to four,
six or eight to get population data broken
down into education categorizations with different levels of detail.
get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
#> # A tibble: 155 × 6
#> scenario name country_code year education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <dbl>
#> 1 2 World 900 1950 Under 15 857662.
#> 2 2 World 900 1950 No Education 779147.
#> 3 2 World 900 1950 Primary 495037.
#> 4 2 World 900 1950 Secondary 320938.
#> 5 2 World 900 1950 Post Secondary 24890.
#> 6 2 World 900 1955 Under 15 971774.
#> 7 2 World 900 1955 No Education 779160.
#> 8 2 World 900 1955 Primary 548060
#> 9 2 World 900 1955 Secondary 384565.
#> 10 2 World 900 1955 Post Secondary 35093.
#> # ℹ 145 more rows
The population breakdown arguments can be used in combination to provide further breakdowns, for example sex and education specific population totals
get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
#> # A tibble: 434 × 7
#> scenario name country_code year sex education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <dbl>
#> 1 2 World 900 1950 Male Under 15 438532.
#> 2 2 World 900 1950 Male No Education 333362
#> 3 2 World 900 1950 Male Incomplete Primary 109944.
#> 4 2 World 900 1950 Male Primary 167428.
#> 5 2 World 900 1950 Male Lower Secondary 102241.
#> 6 2 World 900 1950 Male Upper Secondary 66705.
#> 7 2 World 900 1950 Male Post Secondary 16416.
#> 8 2 World 900 1950 Female Under 15 419131.
#> 9 2 World 900 1950 Female No Education 445785.
#> 10 2 World 900 1950 Female Incomplete Primary 72736.
#> # ℹ 424 more rows
The full age-sex-education specific data can also be obtained by
setting indicator = "epop" in get_wcde().
Create population pyramids by setting male population values to negative equivalent to allow for divergent columns from the y axis.
w <- get_wcde(indicator = "pop", country_code = 900,
pop_age = "all", pop_sex = "both", pop_edu = "four",
version = "wcde-v3")
w
#> # A tibble: 4,650 × 8
#> scenario name country_code year age sex education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <fct> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 170451.
#> 2 2 World 900 1950 0--4 Female Under 15 163206.
#> 3 2 World 900 1950 5--9 Male Under 15 136460.
#> 4 2 World 900 1950 5--9 Female Under 15 130304.
#> 5 2 World 900 1950 10--14 Male Under 15 131621.
#> 6 2 World 900 1950 10--14 Female Under 15 125620
#> 7 2 World 900 1950 15--19 Male No Education 33724.
#> 8 2 World 900 1950 15--19 Male Primary 49895
#> 9 2 World 900 1950 15--19 Male Secondary 35984.
#> 10 2 World 900 1950 15--19 Male Post Secondary 312.
#> # ℹ 4,640 more rows
w <- w %>%
mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
pop_pm = pop_pm/1e3)
w
#> # A tibble: 4,650 × 9
#> scenario name country_code year age sex education pop pop_pm
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 1.70e5 -170.
#> 2 2 World 900 1950 0--4 Female Under 15 1.63e5 163.
#> 3 2 World 900 1950 5--9 Male Under 15 1.36e5 -136.
#> 4 2 World 900 1950 5--9 Female Under 15 1.30e5 130.
#> 5 2 World 900 1950 10--14 Male Under 15 1.32e5 -132.
#> 6 2 World 900 1950 10--14 Female Under 15 1.26e5 126.
#> 7 2 World 900 1950 15--19 Male No Education 3.37e4 -33.7
#> 8 2 World 900 1950 15--19 Male Primary 4.99e4 -49.9
#> 9 2 World 900 1950 15--19 Male Secondary 3.60e4 -36.0
#> 10 2 World 900 1950 15--19 Male Post Seconda… 3.12e2 -0.312
#> # ℹ 4,640 more rows
Use standard ggplot code to create population pyramid with
scale_x_symmetric() from the lemon
package to allow for equal male and female x-axiswic_col4 object in the wcde
package which contains the names of the colours used in the Wittgenstein
Centre Human Capital Data Explorer Data Explorer.Note wic_col6 and wic_col8 objects also
exist for equivalent plots of population data objects with corresponding
numbers of categories of education.
library(lemon)
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_symmetric(labels = abs) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
theme_bw()
Add male and female labels on the x-axis by
geom_blank() to allow for equal x-axis and
additional space at the end of largest columns.w <- w %>%
mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin( b = 0, t = 0)))
Animate the pyramid through the past data and projection periods
using the transition_time() function in the gganimate
package
library(gganimate)
ggplot(data = w,
mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
transition_time(time = year) +
labs(x = "Population (millions)", y = "Age",
title = 'SSP2 World Population {round(frame_time)}')
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.