The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
You can install the development version from GitHub with:
::install_github("rOpenGov/regions") devtools
or the released version from CRAN:
install.packages("regions")
You can review the complete package documentation on regions.dataobservaotry.eu. If you find any problems with the code, please raise an issue on Github. Pull requests are welcome if you agree with the Contributor Code of Conduct
If you use regions
in your work, please cite the package.
In international comparison, using nationally aggregated indicators often have many disadvantages, which result from the very different levels of homogeneity, but also from the often very limited observation numbers in a cross-sectional analysis. When comparing European countries, a few missing cases can limit the cross-section of countries to around 20 cases which disallows the use of many analytical methods. Working with sub-national statistics has many advantages: the similarity of the aggregation level and high number of observations can allow more precise control of model parameters and errors, and the number of observations grows from 20 to 200-300.
Yet the change from national to sub-national level comes with a huge data processing price. While national boundaries are relatively stable, with only a handful of changes in each recent decade. The change of national boundaries requires a more-or-less global consensus. But states are free to change their internal administrative boundaries, and they do it with large frequency. This means that the names, identification codes and boundary definitions of sub-national regions change very frequently. Joining data from different sources and different years can be very difficult.
There are numerous advantages of switching from a national level of the analysis to a sub-national level comes with a huge price in data processing, validation and imputation. The regions package aims to help this process.
This package is an offspring of the eurostat package on rOpenGov. It started as a tool to validate and re-code regional Eurostat statistics, but it aims to be a general solution for all sub-national statistics. It will be developed parallel with other rOpenGov packages.
Frequent boundary changes: as opposed to national boundaries, the territorial units, typologies are often change, and this makes the validation and recoding of observation necessary across time. For example, in the European Union, sub-national typologies change about every three years and you have to make sure that you compare the right French region in time, or, if you can make the time-wise comparison at all.
library(regions)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
<- data.frame (
example_df geo = c("FR", "DEE32", "UKI3" ,
"HU12", "DED",
"FRK"),
values = runif(6, 0, 100 ),
stringsAsFactors = FALSE )
recode_nuts(dat = example_df,
nuts_year = 2013) %>%
select ( .data$geo, .data$values, .data$code_2013) %>%
::kable() knitr
geo | values | code_2013 |
---|---|---|
FR | 25.502412 | FR |
UKI3 | 2.070591 | UKI3 |
DED | 40.306681 | DED |
FRK | 42.378169 | FR7 |
HU12 | 48.944542 | NA |
DEE32 | 98.442229 | NA |
Hierarchical aggregation and special imputation: missingness is very frequent in sub-national statistics, because they are created with a serious time-lag compared to national ones, and because they are often not back-casted after boundary changes. You cannot use standard imputation algorithms because the observations are not similarly aggregated or averaged. Often, the information is seemingly missing, and it is present with an obsolete typology code. This is a basic example which shows you how to impute data from a larger territorial unit, such as a national statistic, to lower territorial units:
library(regions)
<- data.frame (
upstream country_code = rep("AU", 2),
year = c(2019:2020),
my_var = c(10,12)
)
<- australia_states
downstream
<- impute_down (
imputed upstream_data = upstream,
downstream_data = downstream,
country_var = "country_code",
regional_code = "geo_code",
values_var = "my_var",
time_var = "year" )
::kable(imputed) knitr
geo_code | year | geo_name | country_code | my_var | method |
---|---|---|---|---|---|
AU-NSW | 2019 | New South Wales state | AU | 10 | imputed from AU actual |
AU-QLD | 2019 | Queensland state | AU | 10 | imputed from AU actual |
AU-SA | 2019 | South Australia state | AU | 10 | imputed from AU actual |
AU-TAS | 2019 | Tasmania state | AU | 10 | imputed from AU actual |
AU-VIC | 2019 | Victoria state | AU | 10 | imputed from AU actual |
AU-WA | 2019 | Western Australia state | AU | 10 | imputed from AU actual |
AU-ACT | 2019 | Australian Capital Territory territory | AU | 10 | imputed from AU actual |
AU-NT | 2019 | Northern Territory territory | AU | 10 | imputed from AU actual |
AU-NSW | 2020 | New South Wales state | AU | 12 | imputed from AU actual |
AU-QLD | 2020 | Queensland state | AU | 12 | imputed from AU actual |
AU-SA | 2020 | South Australia state | AU | 12 | imputed from AU actual |
AU-TAS | 2020 | Tasmania state | AU | 12 | imputed from AU actual |
AU-VIC | 2020 | Victoria state | AU | 12 | imputed from AU actual |
AU-WA | 2020 | Western Australia state | AU | 12 | imputed from AU actual |
AU-ACT | 2020 | Australian Capital Territory territory | AU | 12 | imputed from AU actual |
AU-NT | 2020 | Northern Territory territory | AU | 12 | imputed from AU actual |
ISO-3166-2
based Google
data and the European UnionNUTS1
to NUTS2
or
from ISO-3166-1
to ISO-3166-2
(impute
down)NUTS1
or from ISO-3166-2
to ISO-3166-1
(aggregate; under development)NUTS1
to NUTS2
or from
ISO-3166-1
to ISO-3166-2
(disaggregate; under
development)We started building an experimental APIs data is running regions regularly and improving known statistical data sources. See: Digital Music Observatory, Green Deal Data Observatory, Economy Data Observatory.
Thanks for @KKulma for the improved continous integration on Github.
Please note that the regions project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.