European differences in educational attainment

Jonas Schöley

2018-05-23

Here I demonstrate how to use the tricolore package to color-code a choropleth map of Europe according to the regional distribution of educational attainment.

library(tricolore)
euro_education
#> # A tibble: 324 x 4
#>    id    ed0_2 ed3_4 ed5_8
#>    <chr> <dbl> <dbl> <dbl>
#>  1 AT11  0.165 0.557 0.279
#>  2 AT12  0.147 0.551 0.302
#>  3 AT13  0.169 0.432 0.399
#>  4 AT21  0.106 0.6   0.294
#>  5 AT22  0.14  0.586 0.274
#>  6 AT31  0.157 0.553 0.291
#>  7 AT32  0.138 0.547 0.315
#>  8 AT33  0.179 0.539 0.282
#>  9 AT34  0.196 0.527 0.277
#> 10 BE10  0.291 0.269 0.441
#> # ... with 314 more rows

The data set euro_education contains the relative share of the population by educational attainment in the European regions in 2016. The variable id gives the NUTS-2 geocodes of the regions; the variables ed0_2, ed3_4, and ed5_8 give the proportion of the population by highest educational attainment, classified according to the ISCED system.

Take the first row of the data set as an example: in the Austrian region of “Burgenland” (id = AT11), 16.5% of the population aged 25–64 had attained “lower secondary or less” education (ed0_2), 55.7% had attained “upper secondary” education (ed3_4), and 27.9% had attained “tertiary” education (ed5_8).
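As a quick sanity check on the compositional nature of the data, the three shares of a region should add up to roughly one. The sketch below retypes the first printed row into a named vector (the names burgenland and total are introduced here for illustration only). The small deviation from 1 is due to the rounding of the printed values.

```r
# Shares for AT11 as printed in the first row of the table above
burgenland <- c(ed0_2 = 0.165, ed3_4 = 0.557, ed5_8 = 0.279)

# The three parts of a composition should add up to (about) 1
total <- sum(burgenland)
total

# Express the shares as percentages, as quoted in the text
round(100 * burgenland, 1)
```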

The education composition is ternary, i.e. made up of three elements, and can therefore be color-coded as the weighted mixture of three primary colors, each primary mapped to one of the three elements. Such a color scale is called a ternary balance scheme [1]. This is what tricolore does.
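To make the idea of a weighted color mixture concrete, here is a deliberately simplified sketch in base R. It assumes red, green, and blue as the three primaries and mixes them linearly in sRGB. The function mix_ternary() is hypothetical and is not how tricolore computes its colors; it only illustrates the principle.

```r
# Simplified ternary color mixing (illustration only, NOT tricolore's algorithm).
# Assumption: p1, p2, p3 are mapped to pure red, green, and blue.
mix_ternary <- function(p1, p2, p3) {
  P <- cbind(p1, p2, p3)
  stopifnot(all(P >= 0), all(rowSums(P) > 0))
  # re-close each composition so that its parts sum to exactly 1
  P <- P / rowSums(P)
  # rows: primary assigned to p1, p2, p3; columns: R, G, B intensity
  primaries <- rbind(c(1, 0, 0),
                     c(0, 1, 0),
                     c(0, 0, 1))
  # weighted mixture of the primaries, one row per composition
  mixed <- P %*% primaries
  grDevices::rgb(mixed[, 1], mixed[, 2], mixed[, 3])
}

# a pure composition yields the pure primary
mix_ternary(1, 0, 0)
# the Burgenland composition from the first row of the data
mix_ternary(0.165, 0.557, 0.279)
```

A perfectly balanced composition ends up as a neutral grey in this sketch, while a composition dominated by one element takes on the hue of that element's primary.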

It takes three steps to transform the compositional data and the geodata of Europe into a ternary color-coded map:

1. Using the Tricolore() function, color-code each composition in the euro_education data set and add the resulting vector of hex sRGB colors as a new variable to the data frame. Store the color key separately.

# color-code the data set and generate a color-key
tric <- Tricolore(euro_education, p1 = 'ed0_2', p2 = 'ed3_4', p3 = 'ed5_8',
                  breaks = 4)
#> Warning: Ignoring unknown aesthetics: z

tric contains both a vector of color-coded compositions (tric$hexsrgb) and the corresponding color key (tric$legend).

# add the vector of colors to the `euro_education` data
euro_education$rgb <- tric$hexsrgb
euro_education
#> # A tibble: 324 x 5
#>    id    ed0_2 ed3_4 ed5_8 rgb      
#>    <chr> <dbl> <dbl> <dbl> <chr>    
#>  1 AT11  0.165 0.557 0.279 #049465FF
#>  2 AT12  0.147 0.551 0.302 #049465FF
#>  3 AT13  0.169 0.432 0.399 #3F7F78FF
#>  4 AT21  0.106 0.6   0.294 #049465FF
#>  5 AT22  0.14  0.586 0.274 #049465FF
#>  6 AT31  0.157 0.553 0.291 #049465FF
#>  7 AT32  0.138 0.547 0.315 #049465FF
#>  8 AT33  0.179 0.539 0.282 #049465FF
#>  9 AT34  0.196 0.527 0.277 #049465FF
#> 10 BE10  0.291 0.269 0.441 #636363FF
#> # ... with 314 more rows
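The color strings carry eight hex digits rather than the more familiar six. In R's color notation the last pair of digits encodes the alpha (opacity) channel. A minimal base-R sketch that decodes the first color of the table (the names hex and channels are illustrative):

```r
# First color from the table above: 8 hex digits = red, green, blue, alpha
hex <- "#049465FF"

# Decode into integer channel values between 0 and 255
channels <- grDevices::col2rgb(hex, alpha = TRUE)
channels[, 1]

# An alpha value of 255 (hex FF) means the color is fully opaque
```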

2. Join the color-coded euro_education data frame with a data frame holding the geodata of the European NUTS-2 regions.

tricolore comes with low-resolution geodata of the European NUTS-2 regions (euro_geo_nuts2), which I will use for this map.

# merge the geodata with the color-coded compositional data
euro_educ_map <- dplyr::left_join(euro_education, euro_geo_nuts2, by = 'id')
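It can help to see what such a join does to the shape of the data. The sketch below uses two small made-up data frames (toy_educ and toy_geo, with invented ids and coordinates) in place of the real data. Each region row is repeated once per polygon vertex, and a region without geodata is kept with missing coordinates.

```r
library(dplyr)

# Hypothetical compositional data: one row per region
toy_educ <- data.frame(
  id  = c("AA01", "AA02", "AA03"),
  rgb = c("#049465FF", "#3F7F78FF", "#636363FF")
)

# Hypothetical geodata: one row per polygon vertex, no polygon for AA03
toy_geo <- data.frame(
  id    = rep(c("AA01", "AA02"), each = 3),
  long  = c(0, 1, 0, 1, 2, 1),
  lat   = c(0, 0, 1, 0, 0, 1),
  group = rep(c("AA01.1", "AA02.1"), each = 3)
)

# Every vertex row now carries the color of its region
toy_map <- left_join(toy_educ, toy_geo, by = "id")
toy_map

# Regions that found no match in the geodata (their coordinates are NA)
unmatched <- anti_join(toy_educ, toy_geo, by = "id")
unmatched
```

Checking for unmatched ids before plotting is a cheap way to spot regions that would otherwise silently go missing from the map.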

3. Using ggplot2 and the joined color-coded education data and geodata, plot a ternary choropleth map of educational attainment in the European regions. Add the color key to the map.

The secret ingredient is scale_fill_identity() to make sure that each region is colored according to the value in the rgb variable of euro_educ_map.

library(ggplot2)

plot_educ <-
  # using data `euro_educ_map`...
  ggplot(euro_educ_map) +
  # ...draw a polygon for each `group` along `long` and `lat`...
  geom_polygon(aes(x = long, y = lat, group = group, fill = rgb)) +
  # ...and color each region according to the color code in the variable `rgb`
  scale_fill_identity()

plot_educ 
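To see the effect of scale_fill_identity() in isolation, here is a small sketch with made-up data (toy_map and toy_plot are hypothetical names, and the two triangles stand in for regions). The fill values reported by layer_data() are exactly the strings stored in the rgb column.

```r
library(ggplot2)

# Hypothetical toy data: two triangles, each vertex row carrying the
# hex color of the polygon it belongs to
toy_map <- data.frame(
  long  = c(0, 1, 0, 1, 2, 1),
  lat   = c(0, 0, 1, 0, 0, 1),
  group = rep(c("a", "b"), each = 3),
  rgb   = rep(c("#049465FF", "#636363FF"), each = 3)
)

toy_plot <-
  ggplot(toy_map) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = rgb)) +
  # use the strings in `rgb` as literal colors, not as category labels
  scale_fill_identity()

# the fill values that ggplot2 will actually draw
fills <- unique(layer_data(toy_plot)$fill)
fills
```

Without the identity scale, ggplot2 would treat the hex strings as levels of a discrete variable and assign its default palette to them.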

Using annotation_custom() and ggplotGrob() we can add the color key produced by Tricolore() to the map. Internally, the color key is produced with the ggtern package. For the key to render correctly, ggtern has to be loaded after ggplot2. Don’t worry, the ggplot2 functions still work.

library(ggtern)
#> --
#> Consider donating at: http://ggtern.com
#> Even small amounts (say $10-50) are very much appreciated!
#> Remember to cite, run citation(package = 'ggtern') for further info.
#> --
#> 
#> Attaching package: 'ggtern'
#> The following objects are masked from 'package:ggplot2':
#> 
#>     %+%, aes, annotate, calc_element, ggplot, ggplotGrob,
#>     ggplot_build, ggplot_gtable, ggsave, layer_data, theme,
#>     theme_bw, theme_classic, theme_dark, theme_gray, theme_light,
#>     theme_linedraw, theme_minimal, theme_void

plot_educ +
  annotation_custom(
    ggplotGrob(tric$legend),
    xmin = 55e5, xmax = Inf, ymin = 35e5, ymax = Inf
  )
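The placement arguments of annotation_custom() are given in the coordinate units of the plot, and Inf extends the inset to the edge of the panel. The following sketch shows the same inset technique with two plain ggplot2 plots. All object names and the 0 to 10 coordinate range are made up, and the line plot merely stands in for the color key.

```r
library(ggplot2)

# Hypothetical main plot on a 0-10 coordinate range
main_plot <-
  ggplot(data.frame(x = c(0, 10), y = c(0, 10)), aes(x = x, y = y)) +
  geom_point()

# Hypothetical stand-in for the color key: any ggplot2 plot works
key_plot <-
  ggplot(data.frame(x = 1:3, y = 1:3), aes(x = x, y = y)) +
  geom_line()

# Convert the key into a grob (graphical object) so it can be drawn as an inset
key_grob <- ggplot2::ggplotGrob(key_plot)

# Place the grob in the upper right corner: from x = 6 and y = 6
# up to the panel edges
inset_plot <-
  main_plot +
  annotation_custom(key_grob, xmin = 6, xmax = Inf, ymin = 6, ymax = Inf)

inset_plot
```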

Because the color key behaves just like a ggplot2 plot we can change it to our liking.

plot_educ <-
  plot_educ +
  annotation_custom(
    ggplotGrob(tric$legend +
                 theme(plot.background = element_rect(fill = NA, color = NA)) +
                 labs(L = '0-2', T = '3-4', R = '5-8')),
    xmin = 55e5, xmax = Inf, ymin = 35e5, ymax = Inf
  )
plot_educ

Some final touches… [2]

plot_educ +
  theme_void() +
  labs(title = 'European inequalities in educational attainment',
       subtitle = 'Regional distribution of ISCED education levels for people aged 25-64 in 2016.',
       caption = 'Data by eurostat (edat_lfse_04).')

Literature

Brewer, C. A. (1994). Color Use Guidelines for Mapping and Visualization. In A. M. MacEachren & D. R. F. Taylor (Eds.), Visualization in Modern Cartography (pp. 123–147). Oxford, UK: Pergamon.

Dorling, D. (2012). The Visualization of Spatial Social Structure. Chichester, UK: Wiley. Retrieved from https://sasi.group.shef.ac.uk/thesis/prints.html


  1. See for example Dorling (2012) and Brewer (1994).

  2. In these maps some cities are colored incorrectly. The problem and a solution are described here. I’ve omitted the fix for the sake of brevity.