rdwd: climate data from the German Weather Service

Berry Boessenkool, berry-b@gmx.de

2017-01-24

Vignette Rmd source code

Interactive map vignette

Intro

The R package rdwd, available on CRAN and at github.com/brry/rdwd, contains code to select, download and read weather data from measuring stations across Germany. The German Weather Service (Deutscher Wetterdienst, DWD) provides over 25 thousand datasets with weather observations on its FTP server at

ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate.

Package structure

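To use those datasets, rdwd is designed to do three main things: select files (selectDWD), download them (dataDWD) and read them into R (readDWD, which dataDWD calls by default).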

selectDWD relies on the output of indexDWD, which recursively lists all files on the FTP server (using RCurl::getURL). Because this is time-consuming, the result is stored in the package dataset fileIndex, from which metaIndex, geoIndex, mapDWD and metaInfo are derived.

Package installation

install.packages("rdwd")
# get the latest development version from github (requires berryFunctions):
berryFunctions::instGit("brry/rdwd")

# For full usage, as needed in indexDWD and metaDWD(..., current=TRUE):
install.packages("RCurl") # only a suggested, not a mandatory, dependency
library(rdwd)

If direct installation from CRAN doesn’t work, your R version might be too old. In that case, updating R is strongly recommended: r-project.org. If you can’t update R, try installing from source (github) via instGit as shown above. If that’s not possible either, here’s a manual workaround that might work for most functions: on the github package page, click Clone or Download -> Download ZIP (top right), unzip the file somewhere, then run

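setwd("that/path")  # placeholder: the folder where the ZIP was unzipped
dd <- dir("rdwd-master/R", full.names=TRUE)  # all files in the package R folder
dummy <- sapply(dd, source)  # source each file; return values are discarded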

This creates all R functions as objects in your globalenv workspace (and overwrites existing objects of the same name!).
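If overwriting objects in the global workspace is a concern, one option is to source the files into a separate environment. The sketch below uses a hypothetical helper called source_into_env, which is not part of rdwd. It relies only on base R (`sys.source`, `new.env`). Package datasets such as fileIndex are not created this way.

```r
# Hypothetical helper, not part of rdwd: source all .R files of a folder
# into a new environment instead of the global workspace.
source_into_env <- function(folder) {
  env <- new.env(parent = globalenv())
  files <- dir(folder, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) sys.source(f, envir = env)  # definitions land in env
  env
}

# Usage, assuming the ZIP was unzipped to "that/path":
# rdwd_env <- source_into_env("that/path/rdwd-master/R")
# ls(rdwd_env)       # names of the sourced functions
# attach(rdwd_env)   # optional: call them without the rdwd_env$ prefix
```

Attaching places the functions on the search path. Objects of the same name in the global workspace then mask them rather than being overwritten.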

Basic usage

tdir <- tempdir()  # download into a temporary directory
link <- selectDWD("Potsdam", res="daily", var="kl", per="recent")  # file URL
file <- dataDWD(link, read=FALSE, dir=tdir)  # download only, do not read yet
## rdwd::dirDWD: adding to directory 'C:/Users/berry/AppData/Local/Temp/RtmpEDyaiK'
## rdwd::dataDWD: 1 file already existing and not downloaded again:  'daily_kl_recent_tageswerte_KL_03987_akt.zip'
## Now downloading 0 files...
## rdwd::fileDWD: Creating 1 file: 'daily_kl_recent_tageswerte_KL_03987_akt.zip'
clim <- readDWD(file, dir=tdir)  # read the downloaded zip file into a data.frame

str(clim)
## 'data.frame':    550 obs. of  18 variables:
##  $ STATIONS_ID             : int  3987 3987 3987 3987 3987 3987 3987 3987 3987 3987 ...
##  $ MESS_DATUM              : POSIXct, format: "2015-07-23" "2015-07-24" ...
##  $ QUALITAETS_NIVEAU       : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ LUFTTEMPERATUR          : num  19.4 20.5 20.3 16.3 16.1 16.6 15.8 15.4 14.6 18.4 ...
##  $ DAMPFDRUCK              : num  15.9 15 15.8 11 14.5 14.2 11.1 11.3 10.5 10.5 ...
##  $ BEDECKUNGSGRAD          : num  5.6 4.6 5.9 5.8 7 6 4.4 5.3 4.2 4.3 ...
##  $ LUFTDRUCK_STATIONSHOEHE : num  1003 1002 991 999 991 ...
##  $ REL_FEUCHTE             : num  72.2 65 67.6 60.9 79.2 ...
##  $ WINDGESCHWINDIGKEIT     : num  3.4 3.3 6.9 6.3 4.3 5.9 5.8 7 4.3 3.3 ...
##  $ LUFTTEMPERATUR_MAXIMUM  : num  23.4 26.4 26.7 21.2 22.2 20 21.4 19.8 19.6 26.8 ...
##  $ LUFTTEMPERATUR_MINIMUM  : num  14.5 14.7 12.8 13.1 12.4 14 12.4 11.8 10.7 10 ...
##  $ LUFTTEMP_AM_ERDB_MINIMUM: num  12.9 12.7 12.7 12 12 12.9 10 9.3 7.7 7 ...
##  $ WINDSPITZE_MAXIMUM      : num  8.6 9.3 22.7 19.1 14.4 17.7 17.3 17 10.7 7.8 ...
##  $ NIEDERSCHLAGSHOEHE      : num  0 3.4 1.1 0.2 3.9 0 0.1 0.4 0 0 ...
##  $ NIEDERSCHLAGSHOEHE_IND  : int  0 6 6 6 6 6 6 6 0 0 ...
##  $ SONNENSCHEINDAUER       : num  6.82 9.87 6.18 7.97 1.87 ...
##  $ SCHNEEHOEHE             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ eor                     : Factor w/ 1 level "eor": 1 1 1 1 1 1 1 1 1 1 ...

Plotting examples

Recent temperature time series:

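par(mar=c(4,4,2,0.5), mgp=c(2.7, 0.8, 0), cex=0.8)
# columns 2 and 4: MESS_DATUM (date) and LUFTTEMPERATUR (air temperature)
plot(clim[,c(2,4)], type="l", xaxt="n", las=1, main="Daily temp Potsdam")
berryFunctions::monthAxis(ym=TRUE)  # month labels on the x-axis
abline(h=0)                         # reference line at 0 °C
mtext("Source: Deutscher Wetterdienst", adj=-0.1, line=0.5, font=3)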

Long-term climate graph:

link <- selectDWD("Potsdam", res="monthly", var="kl", per="h")
clim <- dataDWD(link)
## rdwd::dirDWD: creating directory 'S:/Dropbox/Public/rdwd/vignettes/DWDdata'
## rdwd::fileDWD: Creating 1 file: 'monthly_kl_historical_monatswerte_03987_18930101_20151231_hist.zip'
clim$month <- substr(clim$MESS_DATUM_BEGINN,5,6)  # month from YYYYMMDD start date
temp <- tapply(clim$LUFTTEMPERATUR, clim$month, mean)      # mean temperature per month
prec <- tapply(clim$NIEDERSCHLAGSHOEHE, clim$month, mean)  # mean precipitation per month
library(berryFunctions)
climateGraph(temp, prec, main="Potsdam 1893:2015")
mtext("Source: Deutscher Wetterdienst", adj=-0.05, line=2.8, font=3)
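The month extraction and averaging above can be checked on a tiny data.frame with made-up values (not DWD data). The sketch assumes MESS_DATUM_BEGINN is stored as an integer like 18930101, as the `substr(…, 5, 6)` call implies. `substr` coerces such a date to the string "YYYYMMDD", so characters 5 and 6 hold the month.

```r
# Made-up toy values, NOT DWD data: two months in each of three years.
toy <- data.frame(
  MESS_DATUM_BEGINN = c(20000101L, 20000201L, 20010101L, 20010201L, 20020101L, 20020201L),
  LUFTTEMPERATUR    = c(-1.0, 0.5, 1.0, 2.5, 0.0, 1.5)
)
# integer 20000101 becomes "20000101"; characters 5-6 are the month "01"
toy$month <- substr(toy$MESS_DATUM_BEGINN, 5, 6)
# average over all years for each calendar month
temp <- tapply(toy$LUFTTEMPERATUR, toy$month, mean)
temp  # "01": mean(-1, 1, 0) = 0;  "02": mean(0.5, 2.5, 1.5) = 1.5
```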

Station selection

Weather stations can be selected geographically with the interactive map.

The DWD station IDs can be obtained from station names with

findID("Potsdam")
## Potsdam 
##    3987
findID("Koeln", exactmatch=FALSE)
## Warning: in rdwd::findID: ID determined from name 'Koeln' has 4 elements
## (2665, 2666, 2667, 2968).
##               Koeln-Bonn Koeln-Botanischer Garten           Koeln-Porz-Eil 
##                     2667                     2665                     2666 
##          Koeln-Stammheim 
##                     2968
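The effect of `exactmatch` can be illustrated with a toy lookup table built from the names and IDs printed above. `find_id_sketch` is a hypothetical helper written for illustration; it is not rdwd's findID code.

```r
# Toy lookup table: names and IDs copied from the findID output above.
stations <- c("Potsdam" = 3987, "Koeln-Bonn" = 2667,
              "Koeln-Botanischer Garten" = 2665, "Koeln-Porz-Eil" = 2666,
              "Koeln-Stammheim" = 2968)

# Hypothetical helper, not rdwd's findID implementation.
find_id_sketch <- function(name, exactmatch = TRUE) {
  if (exactmatch) {
    hit <- names(stations) == name                       # whole name must match
  } else {
    hit <- grepl(name, names(stations), fixed = TRUE)    # substring match
  }
  if (sum(hit) > 1) {
    warning("ID determined from name '", name, "' has ", sum(hit), " elements")
  }
  stations[hit]
}

find_id_sketch("Potsdam")                    # named 3987
find_id_sketch("Koeln", exactmatch = FALSE)  # four IDs, with a warning
```

With `exactmatch = TRUE`, "Koeln" matches nothing, because no station is named exactly "Koeln".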

Available files

File selection by station name/id and folder happens with selectDWD. It needs an index of all the available files on the server. The package contains such an index (fileIndex) that is updated (at least) with each CRAN release of the package. The selectDWD function documentation contains an overview of the FTP folder structure.

head(rdwd:::fileIndex) # 28'798 rows in Jan 2017 (with some almost duplicate files)
##     res var        per    id    start      end
## 1 daily  kl historical                        
## 2 daily  kl historical                        
## 3 daily  kl historical                        
## 4 daily  kl historical 00001 19370101 19860630
## 5 daily  kl historical 00003 18910101 20110331
## 6 daily  kl historical 00044 19710301 20151231
##                                                                              path
## 1 /daily/kl/historical/BESCHREIBUNG_obsgermany_climate_daily_kl_historical_de.pdf
## 2  /daily/kl/historical/DESCRIPTION_obsgermany_climate_daily_kl_historical_en.pdf
## 3                   /daily/kl/historical/KL_Tageswerte_Beschreibung_Stationen.txt
## 4                /daily/kl/historical/tageswerte_00001_19370101_19860630_hist.zip
## 5                /daily/kl/historical/tageswerte_00003_18910101_20110331_hist.zip
## 6                /daily/kl/historical/tageswerte_00044_19710301_20151231_hist.zip
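The fileIndex columns follow from the path alone: the folders give res, var and per, and the file name gives id, start and end. The sketch below reproduces that split for paths taken from this vignette. It is a hypothetical reconstruction, not the code of indexDWD. The regular expressions assume five-digit IDs and eight-digit dates, as in the file names shown.

```r
# Hypothetical sketch, not rdwd's indexDWD code. Files without a station ID
# (e.g. description files) get empty strings, as in fileIndex.
paths <- c(
  "/daily/kl/historical/KL_Tageswerte_Beschreibung_Stationen.txt",
  "/daily/kl/historical/tageswerte_00001_19370101_19860630_hist.zip",
  "/daily/kl/recent/tageswerte_KL_03987_akt.zip"
)

# first capture group of `pattern` in each string, or "" if there is no match
first_group <- function(x, pattern) {
  m <- regmatches(x, regexec(pattern, x))
  vapply(m, function(v) if (length(v) > 1) v[2] else "", character(1))
}

folders <- strsplit(sub("^/", "", paths), "/", fixed = TRUE)
fname <- basename(paths)
index <- data.frame(
  res   = vapply(folders, `[`, character(1), 1),
  var   = vapply(folders, `[`, character(1), 2),
  per   = vapply(folders, `[`, character(1), 3),
  id    = first_group(fname, "_([0-9]{5})_"),
  start = first_group(fname, "_([0-9]{8})_[0-9]{8}_hist"),
  end   = first_group(fname, "_[0-9]{8}_([0-9]{8})_hist"),
  path  = paths,
  stringsAsFactors = FALSE
)
index
```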

If you find fileIndex to be outdated (e.g. Error in download.file … : cannot open URL), please let me know and I will update it. Meanwhile, use current=TRUE in selectDWD:

# all files at a given path, with current file index (RCurl required):
links <- selectDWD(res="monthly", var="more_precip", per="hist", current=TRUE)

fileIndex is created with the function indexDWD, defined at https://github.com/brry/rdwd/blob/master/R/meta.R#L185.

# recursively list files on the FTP-server:
files <- indexDWD("hourly/sun") # use dir="some_path" to save the output elsewhere
berryFunctions::headtail(files, 5, na=TRUE)

# with other FTP servers, this should also work...
funet <- indexDWD(base="ftp.funet.fi/pub/standards/RFC/ien", folder="")
# list a single folder directly with RCurl:
p <- RCurl::getURL("ftp.funet.fi/pub/standards/RFC/ien/",
                   verbose=TRUE, ftp.use.epsv=TRUE, dirlistonly=TRUE)

File selection

selectDWD is designed to be very flexible:

# inputs can be vectorized, and period can be abbreviated:
selectDWD(c("Potsdam","Wuerzburg"), res="hourly", var="sun", per="hist")
## [[1]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_03987_18930101_20151231_hist.zip"
## 
## [[2]]
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/hourly/sun/historical/stundenwerte_SD_05705_19510101_20151231_hist.zip"
# Time period can be doubled to get both filenames:
selectDWD("Potsdam", res="daily", var="kl", per="rh", outvec=TRUE)
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/recent/tageswerte_KL_03987_akt.zip"
## [2] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/daily/kl/historical/tageswerte_03987_18930101_20151231_hist.zip"
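The following sketches how abbreviated and doubled period arguments such as "hist", "h" or "rh" could be expanded into folder names. `expand_per` is a hypothetical helper written for illustration; it is not taken from selectDWD.

```r
# Hypothetical helper, not rdwd code: expand a period argument to folder names.
expand_per <- function(per) {
  periods <- c(h = "historical", r = "recent")
  if (per %in% c("rh", "hr")) {
    # doubled abbreviation: one period per letter, in the given order
    return(unname(periods[strsplit(per, "")[[1]]]))
  }
  # otherwise a single, possibly abbreviated, period name
  hit <- pmatch(per, periods)
  if (is.na(hit)) stop("unknown period: ", per)
  unname(periods[hit])
}

expand_per("hist")  # "historical"
expand_per("rh")    # "recent" "historical"
```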

Several stations may have differing numbers of available files across all folders. That is why outvec defaults to FALSE: selectDWD then returns a list with one element per station.

lapply(selectDWD(id=c(3467,5116)), substr, 58, 1e4)
## Warning: in rdwd::selectDWD: in file index 'fileIndex', there are 4 files
## with ID 3467.
## Warning: in rdwd::selectDWD: in file index 'fileIndex', there are 2 files
## with ID 5116.
## [[1]]
## [1] "/daily/more_precip/historical/tageswerte_RR_03467_19930601_20151231_hist.zip"   
## [2] "/daily/more_precip/recent/tageswerte_RR_03467_akt.zip"                          
## [3] "/monthly/more_precip/historical/monatswerte_RR_03467_19930601_20151231_hist.zip"
## [4] "/monthly/more_precip/recent/monatswerte_RR_03467_akt.zip"                       
## 
## [[2]]
## [1] "/daily/more_precip/historical/tageswerte_RR_05116_19930101_20061231_hist.zip"   
## [2] "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"
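The reason for the list output can be shown with the paths printed above. Grouping them by station ID gives elements of different lengths (4 and 2). A flat vector, as with outvec=TRUE, no longer records which file belongs to which station. The grouping code is an illustrative sketch, not selectDWD's implementation.

```r
# Paths copied from the selectDWD output above (base URL removed).
paths <- c(
  "/daily/more_precip/historical/tageswerte_RR_03467_19930601_20151231_hist.zip",
  "/daily/more_precip/recent/tageswerte_RR_03467_akt.zip",
  "/monthly/more_precip/historical/monatswerte_RR_03467_19930601_20151231_hist.zip",
  "/monthly/more_precip/recent/monatswerte_RR_03467_akt.zip",
  "/daily/more_precip/historical/tageswerte_RR_05116_19930101_20061231_hist.zip",
  "/monthly/more_precip/historical/monatswerte_RR_05116_19920701_20061231_hist.zip"
)
ids <- c(3467L, 5116L)

# one list element per requested ID, matching the zero-padded "_03467_" pattern
by_id <- lapply(ids, function(i) grep(sprintf("_%05d_", i), paths, value = TRUE))
lengths(by_id)  # 4 2

# flattening loses the station grouping
flat <- unlist(by_id)
```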

Metadata

selectDWD also uses a complete data.frame with meta information, metaIndex, derived from the “Beschreibung” (description) files listed in fileIndex.

# All metadata at all folders:
data(metaIndex)
str(metaIndex, vec.len=2)
## 'data.frame':    38028 obs. of  12 variables:
##  $ Stations_id  : int  1 1 1 1 1 ...
##  $ von_datum    : int  18910101 19120101 19120101 19120101 19310101 ...
##  $ bis_datum    : int  19860630 19860630 19860630 19860630 19860630 ...
##  $ Stationshoehe: num  478 478 478 478 478 ...
##  $ geoBreite    : num  47.8 47.8 ...
##  $ geoLaenge    : num  8.85 8.85 ...
##  $ Stationsname : chr  "Aach" "Aach" ...
##  $ Bundesland   : chr  "Baden-Wuerttemberg" "Baden-Wuerttemberg" ...
##  $ res          : chr  "monthly" "daily" ...
##  $ var          : chr  "more_precip" "more_precip" ...
##  $ per          : chr  "recent" "historical" ...
##  $ hasfile      : logi  FALSE TRUE FALSE ...
View(data.frame(sort(unique(rdwd:::metaIndex$Stationsname)))) # 5831 entries

dataDWD can download (and readDWD can correctly read) such a metadata file from any folder on the FTP server:

# file with station metadata for a given path:
m_link <- selectDWD(res="monthly", var="more_precip", per="hist", meta=TRUE)
substr(m_link, 50, 1e4) # (Monatswerte = monthly values, Beschreibung = description)
## [1] "/climate/monthly/more_precip/historical/RR_Monatswerte_Beschreibung_Stationen.txt"
meta_monthly_rain <- dataDWD(m_link, dir=tdir) # not executed in vignette creation
str(meta_monthly_rain)

Meta files may list stations for which there are actually no files. For example, Tucheim (5116) is listed in the metadata at …/monthly/more_precip/recent/RR_Monatswerte_Beschreibung_Stationen.txt, but has no file in that folder (only in …/monthly/more_precip/historical).
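One way to flag such stations is to check, for each metadata row, whether a file with the same ID exists in the same res/var/per folder. This is presumably what the hasfile column of metaIndex records. The sketch uses toy data frames that mirror the Tucheim example. The objects meta and files and the key construction are illustrative, not rdwd code.

```r
# Toy metadata: station 5116 (Tucheim) listed in two monthly folders.
meta <- data.frame(Stations_id = c(5116L, 5116L), res = "monthly",
                   var = "more_precip", per = c("recent", "historical"),
                   stringsAsFactors = FALSE)
# Toy file index: only the historical folder has a file for 5116.
files <- data.frame(res = "monthly", var = "more_precip", per = "historical",
                    id = 5116L, stringsAsFactors = FALSE)

# a metadata row "has a file" if the same folder + ID occurs in the file index
meta_key <- paste(meta$res, meta$var, meta$per, meta$Stations_id, sep = "/")
file_key <- paste(files$res, files$var, files$per, files$id, sep = "/")
meta$hasfile <- meta_key %in% file_key
meta  # hasfile: FALSE for recent, TRUE for historical
```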
