Youzhi Yu, University of Chicago
Joining country-level data across independent sources is deceptively
hard: the same country is spelled "US",
"U.S.", "United States" and
"United States of America", and a naïve join treats them as
different entities. countryatlas resolves this friction
by adopting ISO 3166 codes as a universal join key and by stitching
together three otherwise disjoint resources — map geometry, World Bank
development indicators, and a comprehensive country-code crosswalk —
into a single, map-ready table. This vignette presents the package’s
design philosophy, its complete functional vocabulary, and worked
examples spanning data assembly, the join engine, diagnostics, reference
data, analysis helpers and a full grammar of honest cartographic
displays. All examples run offline against a bundled data snapshot.
The package rests on a single conviction: if a task does not make it easier to get country data onto a map — or to make that map honest — it does not belong here. Concretely, three packages are combined:
ggplot2::map_data("world") (or Natural Earth via sf) supplies polygon geometry, i.e. where countries are;
WDI supplies World Bank indicators, i.e. what is true about them;
countrycode supplies the crosswalk of ISO codes, continents and regions that makes a reliable join possible.

Three design commitments follow. First, the happy path is one
call: world_data(2020) returns a tibble ready to
map. Second, the ISO code is the spine: every function
speaks iso3c/iso2c internally and exposes it,
so anything the package produces joins to anything else — and to the
user’s own data. Third, no country is lost silently:
entities that map backends spell idiosyncratically are matched
through a curated override table rather than dropped, and unmatched
values are reported explicitly.
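The cost of joining on raw names can be seen in a minimal base R sketch. The two data frames and the hand-written `crosswalk` vector are hypothetical stand-ins chosen for illustration; the package itself relies on countrycode and its override table rather than a lookup like this one.

```r
# Two hypothetical sources that spell the same countries differently.
gdp <- data.frame(country = c("United States", "Czechia"), gdp = c(1, 2))
pop <- data.frame(country = c("U.S.", "Czech Republic"), pop = c(3, 4))

# A naive join on the raw names matches nothing.
naive <- merge(gdp, pop, by = "country")

# Illustrative crosswalk from name variants to ISO 3166-1 alpha-3 codes.
crosswalk <- c(
  "United States" = "USA", "U.S." = "USA",
  "Czechia" = "CZE", "Czech Republic" = "CZE"
)
gdp$iso3c <- unname(crosswalk[as.character(gdp$country)])
pop$iso3c <- unname(crosswalk[as.character(pop$country)])

# Joining on the ISO code recovers both countries.
keyed <- merge(gdp, pop, by = "iso3c", suffixes = c("_gdp", "_pop"))

nrow(naive)
nrow(keyed)
```

The naive join returns zero rows, while the keyed join returns one row per country, with the original spellings preserved in `country_gdp` and `country_pop`.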
To keep every example reproducible without a network connection, this
vignette uses the bundled world_snapshot dataset, a curated
set of indicators for one recent year.
snapshot <- world_snapshot$countries
dplyr::glimpse(snapshot)
#> Rows: 215
#> Columns: 10
#> $ iso3c <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "ARG"…
#> $ iso2c <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AG", "AR", "AM", …
#> $ country <chr> "Afghanistan", "Albania", "Algeria", "American Samoa",…
#> $ continent <chr> "Asia", "Europe", "Africa", "Oceania", "Europe", "Afri…
#> $ region <chr> "South Asia", "Europe & Central Asia", "Middle East & …
#> $ income <fct> Low income, Upper middle income, Upper middle income, …
#> $ gdp_per_capita <dbl> 377.6656, 5867.6510, 4544.4669, 13709.0975, 39780.4153…
#> $ population <dbl> 40578842, 2451636, 45477389, 48342, 79705, 35635029, 9…
#> $ life_expectancy <dbl> 65.61700, 78.76900, 76.12900, 72.75200, 84.01600, 64.2…
#> $ co2_per_capita <dbl> 0.278420956, 1.845257616, 4.160058529, 0.002068595, NA…
world_data()
The headline function is generalised but backward-compatible. The classic call returns the polygon-backed, enriched tibble exactly as before:
# Live World Bank API call (not evaluated here to keep the vignette offline):
world_data(2020)
world_data(
2020,
indicator = c(life_exp = "SP.DYN.LE00.IN", co2 = "EN.GHG.CO2.PC.CE.AR5"),
geometry = "sf",
region = "Africa"
)
The indicator argument accepts one or many WDI codes; a
named vector drives clean column names. A range of
years (2000:2020) yields a panel keyed on
iso3c and year. The geometry
argument switches between the classic "polygon" backend, a
modern "sf" backend with real projections, and
"none" for pure analysis.
world_map() encapsulates the plotting boilerplate and
offers several honest styles. A continuous fill on a skewed indicator
hides most of the variation, so binned and quantile styles are
first-class:
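Why binning matters for a skewed indicator can be sketched in base R. The values below are hypothetical and the code uses only `cut()` and `quantile()`; it illustrates the idea and is not a description of how world_map() computes its classes.

```r
# Hypothetical, strongly right-skewed indicator values (e.g. GDP per capita).
gdp <- c(400, 900, 1500, 3000, 6000, 12000, 45000, 110000)

# Equal-width bins: most values collapse into the lowest class.
equal_width <- cut(gdp, breaks = 4)

# Quantile bins: breaks at the quartiles give each class the same count.
breaks <- quantile(gdp, probs = seq(0, 1, by = 0.25))
quantile_bins <- cut(gdp, breaks = breaks, include.lowest = TRUE)

table(equal_width)
table(quantile_bins)
```

With these values, six of the eight observations fall into the first equal-width class, whereas each quartile class holds exactly two, so a quantile fill spends its colours where the data actually are.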
Totals (population, total emissions) are misrepresented by a choropleth because visual weight follows land area rather than the value, so a large total in a small country is barely visible. A bubble map at country centroids is the right idiom:
Tiny states vanish on a geographic map. An equal-area tile grid gives every country the same visual weight:
Origin–destination data (trade, migration, flights) is drawn as great-circle arcs, with both endpoints resolved to centroids automatically:
flows <- data.frame(
from = c("China", "Germany", "Brazil", "India"),
to = c("United States", "France", "Japan", "United Kingdom"),
weight = c(500, 200, 150, 120)
)
flow_map(flows, from, to, weight)
This is the package’s mission, exposed for the reader’s own data. Given a
frame keyed on messy names, standardize_country() attaches
ISO codes and classifications:
messy <- data.frame(
nation = c("U.S.", "S. Korea", "Czechia", "Kosovo", "Cote d'Ivoire"),
value = c(10, 8, 6, 4, 7)
)
standardize_country(messy, nation, warn = FALSE)
#> # A tibble: 5 × 6
#> nation value iso3c iso2c continent region
#> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 U.S. 10 USA US Americas North America
#> 2 S. Korea 8 KOR KR Asia East Asia & Pacific
#> 3 Czechia 6 CZE CZ Europe Europe & Central Asia
#> 4 Kosovo 4 XKX XK Europe Europe & Central Asia
#> 5 Cote d'Ivoire 7 CIV CI Africa Sub-Saharan Africa
join_world() goes one step further (auto-detecting the
country column, standardising it and attaching geometry), while
country_join() reconciles two independent tables that each
key on country names:
left <- data.frame(country = c("Czechia", "South Korea"), gdp = c(1, 2))
right <- data.frame(nation = c("Czech Republic", "Korea, Rep."), pop = c(10, 51))
country_join(left, right, country, nation)
#> # A tibble: 2 × 5
#> country gdp iso3c nation pop
#> <chr> <dbl> <chr> <chr> <dbl>
#> 1 Czechia 1 CZE Czech Republic 10
#> 2 South Korea 2 KOR Korea, Rep. 51
check_country_match() is a pre-flight report;
wdj_overrides() is the curated match table that replaces
the old drop-list; and audit_coverage() reports missingness
before a half-empty map is published.
check_country_match(c("USA", "Cote d'Ivoire", "Yugoslavia", "Wakanda"))
#> # A tibble: 4 × 4
#> input iso3c matched suggestion
#> <chr> <chr> <lgl> <chr>
#> 1 USA USA TRUE <NA>
#> 2 Cote d'Ivoire CIV TRUE <NA>
#> 3 Yugoslavia <NA> FALSE Yugoslavia
#> 4 Wakanda <NA> FALSE Canada
audit_coverage(snapshot)$na_rates
#> # A tibble: 4 × 4
#> indicator n n_missing na_rate
#> <chr> <int> <int> <dbl>
#> 1 gdp_per_capita 215 9 0.0419
#> 2 population 215 0 0
#> 3 life_expectancy 215 0 0
#> 4 co2_per_capita 215 12 0.0558
The entities the previous version dropped (Kosovo, Micronesia, the Virgin Islands and a dozen others) are now matched:
dropped <- c("Kosovo", "Micronesia", "Virgin Islands", "Canary Islands",
"Saint Martin")
standardize_country(data.frame(region = dropped), region, warn = FALSE)
#> # A tibble: 5 × 4
#> iso3c iso2c continent region
#> <chr> <chr> <chr> <chr>
#> 1 XKX XK Europe Europe & Central Asia
#> 2 FSM FM Oceania East Asia & Pacific
#> 3 VIR VI Americas Latin America & Caribbean
#> 4 ESP ES Europe Europe & Central Asia
#> 5 MAF MF Americas Latin America & Caribbean
convert_country() exposes the full countrycode
vocabulary with first-class shortcuts for the high-value schemes:
convert_country(c("Japan", "Brazil", "Germany"), to = "flag")
#> [1] "🇯🇵" "🇧🇷" "🇩🇪"
convert_country(c("Japan", "Brazil", "Germany"), to = "currency")
#> [1] "JPY" "BRL" "EUR"
Country-group membership is a curated, dated table:
country_groups("G7")
#> # A tibble: 7 × 3
#> group iso3c country
#> <chr> <chr> <chr>
#> 1 G7 CAN Canada
#> 2 G7 FRA France
#> 3 G7 DEU Germany
#> 4 G7 ITA Italy
#> 5 G7 JPN Japan
#> 6 G7 GBR United Kingdom
#> 7 G7 USA United States
in_group(c("France", "United States", "Japan", "Brazil"), "EU")
#> [1] TRUE FALSE FALSE FALSE
The package also bundles country_meta (static
per-country attributes), common_indicators (a friendly
indicator catalogue), country_groups_tbl and
world_tiles.
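A membership check of this kind reduces to a lookup against a group table keyed on iso3c. The sketch below rebuilds the G7 rows listed above in base R; the `groups` data frame and the `is_member()` helper are hypothetical names used for illustration, not the package's internals.

```r
# G7 membership as listed above, keyed on ISO 3166-1 alpha-3 codes.
groups <- data.frame(
  group = "G7",
  iso3c = c("CAN", "FRA", "DEU", "ITA", "JPN", "GBR", "USA")
)

# Hypothetical helper: TRUE where a code belongs to the named group.
is_member <- function(iso3c, group) {
  iso3c %in% groups$iso3c[groups$group == group]
}

is_member(c("FRA", "USA", "JPN", "BRA"), "G7")
```

Because the table is keyed on codes rather than names, the same lookup works for any data that has already been standardised to iso3c.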
Small, in-spirit transforms that keep an analysis from leaving the package mid-pipeline:
snapshot |>
rank_countries(gdp_per_capita) |>
filter(rank <= 5) |>
select(country, gdp_per_capita, rank, percentile)
#> # A tibble: 5 × 4
#> country gdp_per_capita rank percentile
#> <chr> <dbl> <int> <dbl>
#> 1 Bermuda 109643. 2 0.995
#> 2 Ireland 97794. 4 0.985
#> 3 Luxembourg 107467. 3 0.990
#> 4 Monaco 214360. 1 1
#> 5 Switzerland 90605. 5 0.980
snapshot |>
aggregate_regions(population, by = "region", fun = "sum")
#> # A tibble: 8 × 2
#> region population
#> <chr> <dbl>
#> 1 East Asia & Pacific 2356430840
#> 2 Europe & Central Asia 922299160
#> 3 Latin America & Caribbean 649887983
#> 4 Middle East & North Africa 498069857
#> 5 North America 373018004
#> 6 South Asia 1932289074
#> 7 Sub-Saharan Africa 1229208573
#> 8 <NA> 3220137
World Bank fetches are memoised with an optional on-disk cache, and
multiple indicators are fetched in parallel where the platform supports
forking. The bundled world_snapshot makes every example
here run without the network. The cache can be cleared with
clear_wdi_cache().
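The idea behind memoisation can be shown with a small in-memory cache in base R. This is a sketch only: `make_cached()` and `slow_fetch()` are hypothetical, no network call is made, and the package's own caching (including the on-disk option) may be implemented differently.

```r
# Wrap a fetch function so that each key is fetched at most once.
make_cached <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fetch(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical stand-in for an expensive API request; counts its own calls.
calls <- 0
slow_fetch <- function(code) {
  calls <<- calls + 1
  paste("data for", code)
}

cached_fetch <- make_cached(slow_fetch)
first <- cached_fetch("SP.POP.TOTL")
second <- cached_fetch("SP.POP.TOTL")  # served from the cache
calls
```

The second request returns the stored value without invoking `slow_fetch()` again, which is the behaviour that makes repeated vignette builds and exploratory sessions cheap.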
countryatlas keeps its original soul — ISO codes as the
universal join key, one call to a map-ready table — and extends it into
a complete toolkit: any indicator and any year span, a modern
sf backend, an exposed join engine for the user’s own data,
honest diagnostics, curated reference data, analysis helpers, and a full
vocabulary of projected, area-honest maps.
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Red Hat Enterprise Linux 8.4 (Ootpa)
#>
#> Matrix products: default
#> BLAS/LAPACK: /software/openblas-0.2.19-el8-x86_64/lib/libopenblas_haswellp-r0.2.19.so; LAPACK version 3.6.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/Chicago
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 ggplot2_4.0.3 countryatlas_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 maps_3.4.3
#> [5] tidyselect_1.2.1 parallel_4.4.1 countrycode_1.8.0 jquerylib_0.1.4
#> [9] scales_1.4.0 yaml_2.3.10 fastmap_1.2.0 R6_2.6.1
#> [13] labeling_0.4.3 generics_0.1.4 classInt_0.4-11 knitr_1.50
#> [17] tibble_3.3.0 bslib_0.9.0 pillar_1.11.0 RColorBrewer_1.1-3
#> [21] rlang_1.1.6 utf8_1.2.6 cachem_1.1.0 xfun_0.53
#> [25] sass_0.4.10 S7_0.2.2 viridisLite_0.4.2 memoise_2.0.1
#> [29] cli_3.6.5 withr_3.0.2 magrittr_2.0.3 class_7.3-23
#> [33] stringdist_0.9.17 digest_0.6.37 grid_4.4.1 lifecycle_1.0.4
#> [37] vctrs_0.6.5 KernSmooth_2.23-26 proxy_0.4-27 evaluate_1.0.5
#> [41] glue_1.8.0 farver_2.1.2 e1071_1.7-16 rmarkdown_2.29
#> [45] tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1