R package ropenaq

M. Salmon

2016-09-14

Introduction

This R package provides access to the OpenAQ API. OpenAQ is a community of scientists, software developers, and lovers of open environmental data who are building an open, real-time database that provides programmatic and historical access to air quality data. See their website at https://openaq.org/ and the API documentation at https://docs.openaq.org/. The package contains 5 functions that correspond to the 5 types of query offered by the OpenAQ API: cities, countries, latest, locations and measurements. The package uses the dplyr package: all output tables are data.frame (dplyr “tbl_df”) objects, which can be further processed and analysed.

Finding which measurements are available

Three functions of the package return lists of the available information. The data are organised hierarchically: measurements are obtained from locations, locations are in cities, and cities are in countries.

The aq_countries function

The aq_countries function shows for which countries information is available on the platform. It is the simplest function because it does not require any argument. Each country is identified by its ISO 3166-1 alpha-2 code.

library("ropenaq")
countries_table <- aq_countries()
library("knitr")
kable(countries_table)
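| name                   | code | cities | locations |   count |
|:-----------------------|:-----|-------:|----------:|--------:|
| Australia              | AU   |     11 |        28 |  816513 |
| Bosnia and Herzegovina | BA   |      4 |        11 |  166827 |
| Bangladesh             | BD   |      1 |         2 |    5861 |
| Brazil                 | BR   |     73 |       113 | 1258245 |
| Canada                 | CA   |     11 |       157 |  591249 |
| Chile                  | CL   |     96 |       104 | 1856820 |
| China                  | CN   |      5 |         6 |   50606 |
| Colombia               | CO   |      1 |         1 |    4338 |
| Ethiopia               | ET   |      1 |         1 |     635 |
| United Kingdom         | GB   |    105 |       152 | 1654882 |
| Indonesia              | ID   |      2 |         3 |   15435 |
| Israel                 | IL   |      1 |         1 |    1826 |
| India                  | IN   |     34 |        79 | 1261824 |
| Mongolia               | MN   |      1 |        12 |  917464 |
| Mexico                 | MX   |      5 |        48 |  584598 |
| Nigeria                | NG   |      1 |         1 |    2541 |
| Netherlands            | NL   |     67 |       109 | 2170561 |
| Peru                   | PE   |      1 |        11 |  209592 |
| Philippines            | PH   |      1 |         1 |     958 |
| Poland                 | PL   |     10 |        15 |  445616 |
| Singapore              | SG   |      1 |         1 |    1275 |
| Thailand               | TH   |     33 |        61 | 1111023 |
| United States          | US   |    688 |      1756 | 8823333 |
| Viet Nam               | VN   |      2 |         3 |   12155 |
| Kosovo                 | XK   |      1 |         1 |    3803 |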
attr(countries_table, "meta")
#> # A tibble: 1 × 6
#>         name   license                  website  page limit found
#>       <fctr>    <fctr>                   <fctr> <int> <int> <int>
#> 1 openaq-api CC BY 4.0 https://docs.openaq.org/     1   100    25
attr(countries_table, "timestamp")
#> # A tibble: 1 × 2
#>             lastModif           queriedAt
#>                <dttm>              <dttm>
#> 1 2016-09-14 09:21:55 2016-09-14 09:25:44

The aq_cities function

Using the aq_cities function, one can get all cities for which information is available on the platform. For each city, one gets the number of locations, the count of measurements, the URL-encoded city name (cityURL), and the country it is in.

cities_table <- aq_cities()
kable(head(cities_table))
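| city      | country | locations | count | cityURL   |
|:----------|:--------|----------:|------:|:----------|
| 76t       | TH      |         1 |     4 | 76t       |
| ABBEVILLE | US      |         1 |  2702 | ABBEVILLE |
| Aberdeen  | GB      |         3 | 26702 | Aberdeen  |
| Aberdeen  | US      |         2 |  6525 | Aberdeen  |
| ADA       | US      |         1 |  8158 | ADA       |
| ADAIR     | US      |         1 | 11253 | ADAIR     |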

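The optional country argument restricts the query to a given country instead of the whole world. In the example below, limit = 10 also restricts the output to the first 10 cities (see the section on paging and limit).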

cities_tableIndia <- aq_cities(country="IN", limit = 10)
kable(cities_tableIndia)
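| city       | country | locations |  count | cityURL    |
|:-----------|:--------|----------:|-------:|:-----------|
| Agra       | IN      |         1 |  20640 | Agra       |
| Ahmedabad  | IN      |         1 |  11525 | Ahmedabad  |
| Aurangabad | IN      |         1 |   4780 | Aurangabad |
| Barddhaman | IN      |         1 |   2260 | Barddhaman |
| Bengaluru  | IN      |         5 |  72068 | Bengaluru  |
| Chandrapur | IN      |         2 |  31771 | Chandrapur |
| Chennai    | IN      |         4 |  47736 | Chennai    |
| Chittoor   | IN      |         1 |   2013 | Chittoor   |
| Delhi      | IN      |        15 | 286831 | Delhi      |
| Faridabad  | IN      |         1 |  35013 | Faridabad  |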

If the country code is not available on the platform (or is misspelled), an error is thrown, e.g.

#aq_cities(country="PANEM")
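When you loop over several country codes, one invalid code stops the whole loop. Below is a minimal sketch for cases where you would rather get NULL and a message. The helper safe_query() is hypothetical (it is not part of ropenaq) and simply wraps any query function in tryCatch().

```r
# Hypothetical helper, not part of ropenaq: call a query function and
# return NULL (with a message) instead of stopping when it throws an error.
safe_query <- function(query_fun, ...) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      message("Query failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage with ropenaq (needs network access), e.g. in a loop over codes:
# for (code in c("IN", "PANEM")) {
#   res <- safe_query(aq_cities, country = code)
#   if (is.null(res)) next
#   print(nrow(res))
# }
```

Because the wrapper takes the function and its arguments separately, it works unchanged with aq_locations(), aq_measurements() or aq_latest().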

The aq_locations function

The aq_locations function has far more arguments than the first two functions. One can filter locations in several ways:

- by country, city or location;
- by parameter (valid values are “pm25”, “pm10”, “so2”, “no2”, “o3”, “co” and “bc”);
- from a given date and/or up to a given date;
- by value, between a minimum and a maximum;
- by position, within a circle around a central point defined by the latitude, longitude and radius arguments.

The output table also contains URL-encoded strings for the city and the location. Below is an example.

Here we only look for locations with PM2.5 information in Chennai, India.

locations_chennai <- aq_locations(country = "IN", city = "Chennai", parameter = "pm25")
kable(locations_chennai)
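| location                    | city    | country | sourceName       | count | lastUpdated         | firstUpdated        | latitude | longitude | pm25 | pm10  | no2   | so2   | o3    | co    | bc    | cityURL | locationURL                   |
|:----------------------------|:--------|:--------|:-----------------|------:|:--------------------|:--------------------|---------:|----------:|:-----|:------|:------|:------|:------|:------|:------|:--------|:------------------------------|
| US Diplomatic Post: Chennai | Chennai | IN      | StateAir_Chennai |  6483 | 2016-09-14 08:30:00 | 2015-12-11 21:30:00 | 13.05237 |  80.25193 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | Chennai | US+Diplomatic+Post%3A+Chennai |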
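The latitude, longitude and radius arguments of aq_locations let the API do the spatial filtering. If you want to check distances locally on a returned table, you can use its latitude and longitude columns. The sketch below uses the haversine formula on a spherical Earth. It is a swapped-in technique, not necessarily what the API does internally. It assumes distances in metres, so check the API documentation for the unit that radius expects. haversine_m() and within_radius() are hypothetical helpers, not part of ropenaq.

```r
# Great-circle (haversine) distance in metres between points given in
# decimal degrees, assuming a spherical Earth. Vectorised over its inputs.
haversine_m <- function(lat1, lon1, lat2, lon2, earth_radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_m * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: keep the rows of a table with latitude/longitude
# columns that lie within radius_m metres of a central point.
within_radius <- function(locations, lat, lon, radius_m) {
  d <- haversine_m(lat, lon, locations$latitude, locations$longitude)
  # which() drops rows with missing coordinates
  locations[which(d <= radius_m), , drop = FALSE]
}

# Usage, e.g. on the table above:
# within_radius(locations_chennai, lat = 13.05, lon = 80.25, radius_m = 10000)
```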

Getting measurements

Two functions return data: aq_measurements and aq_latest. In both of them, the city and location arguments need to be given as URL-encoded strings, such as those in the cityURL and locationURL columns.
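The simplest source of these strings is the cityURL or locationURL column of an aq_cities() or aq_locations() result. To build them yourself, here is a minimal sketch. It assumes the API expects the encoding shown in those columns, with reserved characters percent-encoded and spaces written as "+" (e.g. "US+Diplomatic+Post%3A+Chennai"). aq_url_encode() is a hypothetical helper, not part of ropenaq. It relies on base R's utils::URLencode().

```r
# Hypothetical helper, not part of ropenaq: percent-encode reserved
# characters, then write spaces as "+" to match the cityURL/locationURL
# columns. URLencode() handles one string at a time, hence vapply().
aq_url_encode <- function(x) {
  vapply(x, function(s) {
    encoded <- utils::URLencode(s, reserved = TRUE)
    gsub("%20", "+", encoded, fixed = TRUE)
  }, character(1), USE.NAMES = FALSE)
}

# Usage (the query itself needs network access):
# aq_measurements(country = "IN", city = aq_url_encode("Delhi"),
#                 location = aq_url_encode("Anand Vihar"), parameter = "pm25")
```

Note that URLencode() returns its input unchanged if it already contains a %xx sequence, so already-encoded strings are not encoded twice.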

The aq_measurements function

The aq_measurements function has many arguments for making a query specific to, say, a given parameter at a given location, or to a circle around a central point defined by the latitude, longitude and radius arguments. Below we get the PM2.5 measurements for Anand Vihar in Delhi, India.

results_table <- aq_measurements(country = "IN", city = "Delhi", location = "Anand+Vihar", parameter = "pm25")
kable(head(results_table))
| location    | parameter | value | unit  | country | city  | dateUTC             | dateLocal           | latitude | longitude | cityURL | locationURL |
|:------------|:----------|------:|:------|:--------|:------|:--------------------|:--------------------|---------:|----------:|:--------|:------------|
| Anand Vihar | pm25      |    52 | µg/m³ | IN      | Delhi | 2016-09-14 08:35:00 | 2016-09-14 14:05:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |
| Anand Vihar | pm25      |    67 | µg/m³ | IN      | Delhi | 2016-09-14 08:05:00 | 2016-09-14 13:35:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |
| Anand Vihar | pm25      |    67 | µg/m³ | IN      | Delhi | 2016-09-14 07:35:00 | 2016-09-14 13:05:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |
| Anand Vihar | pm25      |    61 | µg/m³ | IN      | Delhi | 2016-09-14 07:05:00 | 2016-09-14 12:35:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |
| Anand Vihar | pm25      |    61 | µg/m³ | IN      | Delhi | 2016-09-14 06:35:00 | 2016-09-14 12:05:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |
| Anand Vihar | pm25      |    57 | µg/m³ | IN      | Delhi | 2016-09-14 06:05:00 | 2016-09-14 11:35:00 |  28.6508 |   77.3152 | Delhi   | Anand+Vihar |

One could also get all available parameters in the same table by leaving out the parameter argument.
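Such a table is in long format, with one row per location, time and parameter. Below is a minimal sketch that reshapes it to wide format, with one column per parameter, using base R's stats::reshape(). It assumes the location, dateUTC, parameter and value columns shown in the output above. to_wide() is a hypothetical helper, not part of ropenaq.

```r
# Hypothetical helper, not part of ropenaq: turn a long measurements
# table (one row per location, time and parameter) into a wide one
# (one column per parameter).
to_wide <- function(measurements) {
  long <- as.data.frame(measurements[, c("location", "dateUTC",
                                         "parameter", "value")])
  wide <- stats::reshape(long,
                         idvar = c("location", "dateUTC"),
                         timevar = "parameter",
                         direction = "wide")
  # reshape() names the new columns "value.<parameter>"; drop the prefix
  names(wide) <- sub("^value\\.", "", names(wide))
  rownames(wide) <- NULL
  wide
}

# Usage (needs network access):
# to_wide(aq_measurements(country = "IN", city = "Delhi",
#                         location = "Anand+Vihar"))
```

If the same location, time and parameter appear twice, only the first value is kept in the wide table.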

The aq_latest function

This function returns a table with the latest measurements for the locations selected by the arguments. If all arguments are NULL, it returns the latest measurements for all locations.

tableLatest <- aq_latest()
kable(head(tableLatest))
| location          | city                 | country | latitude | longitude | parameter |   value | lastUpdated         | unit  | cityURL              | locationURL       |
|:------------------|:---------------------|:--------|---------:|----------:|:----------|--------:|:--------------------|:------|:---------------------|:------------------|
| 100 ail           | Ulaanbaatar          | MN      | 47.93291 | 106.92138 | co        | -94.000 | 2016-09-13 04:00:00 | µg/m³ | Ulaanbaatar          | 100+ail           |
| 100 ail           | Ulaanbaatar          | MN      | 47.93291 | 106.92138 | no2       |  20.000 | 2016-09-14 09:15:00 | µg/m³ | Ulaanbaatar          | 100+ail           |
| 100 ail           | Ulaanbaatar          | MN      | 47.93291 | 106.92138 | o3        |  35.000 | 2016-09-14 09:15:00 | µg/m³ | Ulaanbaatar          | 100+ail           |
| 100 ail           | Ulaanbaatar          | MN      | 47.93291 | 106.92138 | pm10      | 198.000 | 2016-09-14 09:15:00 | µg/m³ | Ulaanbaatar          | 100+ail           |
| 100 ail           | Ulaanbaatar          | MN      | 47.93291 | 106.92138 | so2       |   3.000 | 2016-09-14 09:15:00 | µg/m³ | Ulaanbaatar          | 100+ail           |
| 16th and Whitmore | Omaha-Council Bluffs | US      | 41.32247 | -95.93799 | o3        |   0.008 | 2016-09-14 07:00:00 | ppm   | Omaha-Council+Bluffs | 16th+and+Whitmore |

Below are the latest values for Anand Vihar at the time this vignette was compiled (cache=FALSE).

tableLatest <- aq_latest(country="IN", city="Delhi", location="Anand+Vihar")
kable(head(tableLatest))
| location    | city  | country | latitude | longitude | parameter |  value | lastUpdated         | unit  | cityURL | locationURL |
|:------------|:------|:--------|---------:|----------:|:----------|-------:|:--------------------|:------|:--------|:------------|
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | co        | 1300.0 | 2016-03-21 14:45:00 | µg/m³ | Delhi   | Anand+Vihar |
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | no2       |   43.1 | 2016-09-14 08:35:00 | µg/m³ | Delhi   | Anand+Vihar |
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | o3        |   30.9 | 2016-09-14 08:35:00 | µg/m³ | Delhi   | Anand+Vihar |
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | pm10      |  232.0 | 2016-09-14 08:35:00 | µg/m³ | Delhi   | Anand+Vihar |
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | pm25      |   52.0 | 2016-09-14 08:35:00 | µg/m³ | Delhi   | Anand+Vihar |
| Anand Vihar | Delhi | IN      |  28.6508 |   77.3152 | so2       |   18.0 | 2016-03-21 14:45:00 | µg/m³ | Delhi   | Anand+Vihar |
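"Latest" does not necessarily mean "recent": in the Anand Vihar output, the co and so2 values were last updated in March 2016, while the other parameters date from September 2016. Below is a minimal sketch that drops values older than a chosen cut-off. The 24-hour cut-off is an arbitrary choice, and the sketch assumes lastUpdated is a date-time (POSIXct) column. drop_stale() is a hypothetical helper, not part of ropenaq.

```r
# Hypothetical helper, not part of ropenaq: keep the rows of an
# aq_latest() table whose lastUpdated time is at most max_age_hours
# older than a reference time (by default the most recent lastUpdated).
drop_stale <- function(latest, max_age_hours = 24,
                       reference = max(latest$lastUpdated, na.rm = TRUE)) {
  age_hours <- as.numeric(difftime(reference, latest$lastUpdated,
                                   units = "hours"))
  # which() also drops rows with a missing lastUpdated
  latest[which(age_hours <= max_age_hours), , drop = FALSE]
}

# Usage: drop_stale(tableLatest)
```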

Paging and limit

For all endpoints/functions, there are limit and page arguments, which indicate, respectively, how many results per page should be returned and which page should be queried. How, then, can one get all results corresponding to a query? First, look at the number of results, e.g.

how_many <- attr(aq_measurements(city = "Delhi",
                            parameter = "pm25"), "meta")
knitr::kable(how_many)
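| name       | license   | website                  | page | limit | found |
|:-----------|:----------|:-------------------------|-----:|------:|------:|
| openaq-api | CC BY 4.0 | https://docs.openaq.org/ |    1 |   100 | 52147 |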
how_many$found
#> [1] 52147

Then one can loop over the pages. Note that the maximum value of limit is 1000.

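meas <- NULL
# one query per page of (at most) 1000 results;
# seq_len() avoids looping over 1:0 when nothing is found
for (page in seq_len(ceiling(how_many$found / 1000))) {
  meas <- dplyr::bind_rows(meas,
                           aq_measurements(city = "Delhi",
                                           parameter = "pm25",
                                           page = page,
                                           limit = 1000))
}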
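Growing meas with bind_rows() inside the loop copies the accumulated table at every iteration. An alternative is to collect the pages in a list and bind them once at the end. fetch_all_pages() below is a hypothetical helper, not part of ropenaq. It takes any function of page and limit and the number of results found. Base rbind() keeps the sketch dependency-free; dplyr::bind_rows(pages) would work as well.

```r
# Hypothetical helper, not part of ropenaq: fetch every page of a query
# and bind the pages once at the end. fetch_page must accept `page` and
# `limit` arguments; `found` is the total number of results.
fetch_all_pages <- function(fetch_page, found, limit = 1000) {
  n_pages <- ceiling(found / limit)
  pages <- lapply(seq_len(n_pages),
                  function(p) fetch_page(page = p, limit = limit))
  # returns NULL when there is nothing to fetch
  do.call(rbind, pages)
}

# Usage with the query above (needs network access):
# meas <- fetch_all_pages(
#   function(page, limit) aq_measurements(city = "Delhi", parameter = "pm25",
#                                         page = page, limit = limit),
#   found = how_many$found)
```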

If you really need a lot of data, using the API and this package may not be the best choice for you. You can look into downloading CSV data from the OpenAQ website, e.g. here, or the daily CSV output here. Or you might want to contact OpenAQ.

Other packages of interest for getting air quality data
