pixieweb makes it easy to download open statistical data from PX-Web APIs — the platform used by Statistics Sweden (SCB), Statistics Norway (SSB), Statistics Finland, and many others. This vignette walks you from zero to a tidy tibble in five steps.
px_api() accepts a short alias ("scb",
"ssb", "statfi") or a full URL. Use
px_api_catalogue() to list known instances.
PX-Web organises data into tables. Each table holds
a data cube with one or more dimensions (called
variables). Use get_tables() to
search:
The result is a tibble. You can narrow it further on the client side
with table_search(), and inspect tables with
table_describe():
table_describe() now shows the subject path, time period
range, and data source alongside the title — making it much easier to
pick the right table.
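Client-side narrowing is essentially a text filter over the tibble that get_tables() returned. The base R sketch below illustrates the idea on a hand-made listing. It is not how table_search() is implemented: the column names (id, title), the table titles, the IDs other than TAB638, and the helper search_titles() are all assumptions made for illustration.

```r
# Hypothetical stand-in for a table listing; real column names may differ
# and the titles are invented.
tables <- data.frame(
  id = c("TAB638", "TAB0001", "TAB0002"),
  title = c(
    "Population by region and year",
    "Deaths by region and sex",
    "Consumer price index by month"
  ),
  stringsAsFactors = FALSE
)

# Keep the rows whose title matches the search term, ignoring case.
# search_titles() is a hypothetical helper, not part of pixieweb.
search_titles <- function(tbl, term) {
  tbl[grepl(term, tbl$title, ignore.case = TRUE), , drop = FALSE]
}

hits <- search_titles(tables, "region")
hits$id
```

Because the filter runs on data already downloaded, it costs no extra API calls.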
Once you have a table ID, inspect what variables (dimensions) it has:
Each variable has a set of available values (codes). Look at a specific variable’s values:
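To make the relationship between variables, values, and the data cube concrete, here is a small base R sketch that models a table as a named list of code vectors. It does not call pixieweb. The variable names mirror ones used in this vignette, but the Kon and Tid codes are invented for illustration.

```r
# Hypothetical variables (dimensions) of a population table.
# Each element holds the codes available for that variable.
variables <- list(
  Region = c("0180", "1480"),
  Kon    = c("1", "2"),
  Tid    = c("2021", "2022", "2023")
)

names(variables)   # the table's dimensions
variables$Tid      # the codes available for one dimension

# Every combination of codes is one cell of the data cube.
cube <- expand.grid(variables, stringsAsFactors = FALSE)
nrow(cube)
```

The number of rows in cube is the product of the number of codes per variable, which is why selecting fewer values per variable shrinks the download.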
Now you know which variables the table has and what values are
available. Pass your selections to get_data():

    pop <- get_data(scb, "TAB638",
      Region = c("0180", "1480"),
      ContentsCode = "*",
      Tid = px_top(5)
    )
    pop

Two things to note about this call. "*" means “all measures in this table”. Omitting a variable (here Kon, sex) eliminates it, which gives totals for both sexes. Not all variables allow this; see vignette("introduction-to-pixieweb") for mandatory vs eliminable variables.

Selection helpers like px_top(), px_from(),
and px_range() let you select values without knowing exact
codes. Use them when you want “the latest N periods” or “everything from
2020 onward” rather than typing out specific year codes.
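The three selection styles can be pictured as operations on a sorted vector of period codes. The base R sketch below only illustrates those semantics. It is not pixieweb's implementation, the helper names (latest_n, from_onward, between_codes) are hypothetical, and the period codes are invented. Check the package documentation for the actual arguments of px_from() and px_range().

```r
# Hypothetical period codes, sorted oldest to newest.
periods <- as.character(2015:2023)

# "The latest N periods"
latest_n <- function(codes, n) tail(codes, n)

# "Everything from a given code onward"
from_onward <- function(codes, start) codes[codes >= start]

# "Everything between two codes, inclusive"
between_codes <- function(codes, start, end) {
  codes[codes >= start & codes <= end]
}

latest_n(periods, 3)
from_onward(periods, "2020")
between_codes(periods, "2017", "2019")
```

The practical benefit of selecting by position or bound is that the same query keeps working when the agency publishes a new period.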
prepare_query()

You can skip this section if you prefer the direct approach above.
prepare_query() inspects the table and fills in sensible
defaults — handy when you don’t want to specify every variable:
It prints a summary of what was chosen and why. When you’re happy,
pass the query to get_data():
Set maximize_selection = TRUE to automatically include
as many variables as the API’s cell limit allows:
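The cell limit is easiest to reason about as arithmetic: a request asks for one cell per combination of selected values. The base R sketch below shows that calculation under stated assumptions. The selection, the ContentsCode values, the limit of 150000, and the helper cells_requested() are all hypothetical, and this is not how prepare_query() decides what to include.

```r
# Number of cells a selection requests: the product of the number of
# values chosen for each variable.
# cells_requested() is a hypothetical helper, not part of pixieweb.
cells_requested <- function(selection) prod(lengths(selection))

# Hypothetical selection and limit; real limits vary by API.
selection <- list(
  Region       = c("0180", "1480"),
  ContentsCode = c("A", "B", "C"),
  Tid          = as.character(2019:2023)
)
cell_limit <- 150000

cells_requested(selection)
cells_requested(selection) <= cell_limit
```

Adding one more variable multiplies the count by the number of values selected for it, so the limit is usually reached by wide selections on several variables at once.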
The result is a standard tibble. Use your favourite tidyverse tools:
    library(ggplot2)

    pop |>
      ggplot(aes(x = Tid, y = value, colour = Region_text)) +
      # One line per region
      geom_line(aes(group = Region_text)) +
      # Separate panel for each measure (Population, Deaths, etc.)
      facet_wrap(~ ContentsCode_text, scales = "free_y") +
      # Rotate x-axis labels to avoid overlap
      theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
      labs(
        title = "Population over time",
        caption = px_cite(pop) # Auto-generated data citation
      )

Notice the _text suffix: get_data() returns
both raw code columns (Region = "0180") and human-readable
label columns (Region_text = "Stockholm"). Use
_text columns for display and plotting; use the raw codes
for filtering and joining.
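A short base R sketch of that division of labour, using a hand-made data frame in place of a real get_data() result. The column layout follows the description above, but the values and the English label "Gothenburg" are invented for illustration.

```r
# Hypothetical slice of a downloaded table; the values are invented.
pop_demo <- data.frame(
  Region      = c("0180", "0180", "1480", "1480"),
  Region_text = c("Stockholm", "Stockholm", "Gothenburg", "Gothenburg"),
  Tid         = c("2022", "2023", "2022", "2023"),
  value       = c(100, 110, 60, 65),
  stringsAsFactors = FALSE
)

# Filter on the raw code, which does not depend on label language
# or spelling.
sthlm <- pop_demo[pop_demo$Region == "0180", ]

# Use the label column when presenting the result.
unique(sthlm$Region_text)
```

Codes are also the safer join key when combining tables, since labels may differ in language or wording between tables.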
Other useful helpers:

- data_minimize(): remove columns where all values are identical
- data_legend(): generate a caption string from variable metadata
- px_cite(): create a citation for the downloaded data

vignette("introduction-to-pixieweb") covers the data model, codelists, saved queries, and query composition.
vignette("multi-api") shows how to compare data across national statistics agencies.