Introduction for the UKgrid Dataset

Rami Krispin (@Rami_Krispin)

2018-09-10

The UK National Electricity Transmission System Dataset

Intro

UKgrid is an R dataset package containing the historical national demand on the UK electricity transmission system, along with other related variables. The dataset is a half-hourly time series with observations starting in January 2011, sourced from the National Grid UK website.

Installation

Install the package from CRAN:

install.packages("UKgrid")

or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("RamiKrispin/UKgrid")

Usage

library(UKgrid)

data("UKgrid")

str(UKgrid)
#> 'data.frame':    132912 obs. of  21 variables:
#>  $ TIMESTAMP                : POSIXct, format: "2011-01-01 00:00:00" "2011-01-01 00:30:00" ...
#>  $ ND                       : int  34606 35092 34725 33649 32644 32092 31101 29874 28795 28097 ...
#>  $ I014_ND                  : int  34677 35142 34761 33698 32698 32112 31135 29919 28868 28137 ...
#>  $ TSD                      : int  35648 36089 36256 35628 34752 34134 33174 32471 31415 30696 ...
#>  $ I014_TSD                 : int  35685 36142 36234 35675 34805 34102 33301 32521 31490 30737 ...
#>  $ ENGLAND_WALES_DEMAND     : int  31058 31460 31109 30174 29253 28688 27714 26575 25623 25034 ...
#>  $ EMBEDDED_WIND_GENERATION : int  484 520 520 512 512 464 464 473 473 511 ...
#>  $ EMBEDDED_WIND_CAPACITY   : int  1730 1730 1730 1730 1730 1730 1730 1730 1730 1730 ...
#>  $ EMBEDDED_SOLAR_GENERATION: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ EMBEDDED_SOLAR_CAPACITY  : int  79 79 79 79 79 79 79 79 79 79 ...
#>  $ NON_BM_STOR              : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ PUMP_STORAGE_PUMPING     : int  60 16 549 998 1126 1061 1092 1616 1639 1618 ...
#>  $ I014_PUMP_STORAGE_PUMPING: int  67 20 558 997 1127 1066 1104 1622 1642 1621 ...
#>  $ FRENCH_FLOW              : int  1939 1939 1989 1991 1992 1992 1992 1993 1992 1992 ...
#>  $ BRITNED_FLOW             : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ MOYLE_FLOW               : int  -382 -381 -382 -381 -382 -381 -381 -381 -381 -381 ...
#>  $ EAST_WEST_FLOW           : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ I014_FRENCH_FLOW         : int  1922 1922 1974 1975 1975 1975 1975 1975 1975 1975 ...
#>  $ I014_BRITNED_FLOW        : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ I014_MOYLE_FLOW          : int  -382 -381 -382 -381 -382 -381 -381 -381 -381 -381 ...
#>  $ I014_EAST_WEST_FLOW      : int  0 0 0 0 0 0 0 0 0 0 ...

A variable dictionary is available in the dataset documentation.

The extract_grid function

The extract_grid function extracts the UKgrid series in different formats (xts, zoo, ts, data.frame, data.table and tbl) and frequencies (half-hourly, hourly, daily, weekly, monthly and quarterly), and can subset the series by time frame.

For example, you can extract the national demand variable (ND) in xts format:


nd_half_hourly <- extract_grid(type = "xts", # default
                          columns = "ND", # default
                          aggregate = NULL # default
                          )



library(xts)

class(nd_half_hourly)
#> [1] "xts" "zoo"
periodicity(nd_half_hourly)
#> 30 minute periodicity from 2011-01-01 to 2018-07-31 23:30:00

library(TSstudio)
ts_plot(ts.obj = nd_half_hourly, 
        title = "UK National Demand - Half-Hourly")

Alternatively, you can aggregate the series to an hourly frequency with the aggregate argument:


nd_hourly <- extract_grid(type = "ts", 
                          columns = "ND", 
                          aggregate = "hourly" 
                          )

class(nd_hourly)
#> [1] "ts"
frequency(nd_hourly)
#> [1] 24

ts_plot(ts.obj = nd_hourly, 
        title = "UK National Demand - Hourly")

The columns argument selects which UKgrid columns to extract. The full list of columns is available in the dataset documentation (?UKgrid). For instance, let’s select the “ND” and “TSD” columns at a daily frequency:


df <- extract_grid(type = "xts", 
                          columns = c("ND","TSD"), 
                          aggregate = "daily" 
                          )

head(df)
#>                 ND     TSD
#> 2011-01-01 1671744 1736617
#> 2011-01-02 1760123 1827986
#> 2011-01-03 1878748 2000659
#> 2011-01-04 2076052 2179372
#> 2011-01-05 2103866 2185785
#> 2011-01-06 2135202 2200160

ts_plot(ts.obj = df, 
        title = "UK National and Transmission System Demand - Daily")
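The daily values above are roughly 48 times a typical half-hourly reading, which is consistent with the half-hourly readings being summed within each day. The sketch below illustrates that kind of aggregation in base R on a small synthetic series. The values are made up, and this is only an illustration under that assumption, not the package's implementation.

```r
# Synthetic half-hourly series covering two full days (96 readings).
# The demand values are invented for illustration only.
timestamp <- seq(from = as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
                 by = "30 min",
                 length.out = 96)
demand <- rep(c(30000L, 35000L), times = 48)

half_hourly <- data.frame(TIMESTAMP = timestamp, ND = demand)

# Derive the calendar date of each reading, then sum the readings per date
half_hourly$DATE <- as.Date(half_hourly$TIMESTAMP, tz = "UTC")
daily <- aggregate(ND ~ DATE, data = half_hourly, FUN = sum)

daily
```

Each synthetic day holds 24 readings of 30000 and 24 readings of 35000, so both daily totals come to 1560000.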

Note: by default, when any of the data frame family of structures (data.frame, data.table or tbl) is used, the output includes the timestamp of the data, even if it was not selected in the columns argument.
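As a minimal sketch of this behaviour, the call below requests only the ND column as a data.frame, using the same arguments shown earlier. The name of the timestamp column in the result is assumed to match the raw dataset (TIMESTAMP), so inspect the output with names() rather than relying on it.

```r
library(UKgrid)

# Request a single column in data.frame format at the native half-hourly frequency
nd_df <- extract_grid(type = "data.frame",
                      columns = "ND",
                      aggregate = NULL)

# Per the note above, the result should carry a timestamp column next to ND
names(nd_df)
head(nd_df)
```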

Last but not least, you can subset the series by time range with the start and end arguments:


df1 <- extract_grid(type = "zoo", 
                          columns = "ND", 
                          aggregate = "daily", 
                          start = 2015,
                          end = 2017)

head(df1)
#> 2015-01-01 2015-01-02 2015-01-03 2015-01-04 2015-01-05 2015-01-06 
#>    1442371    1564118    1653984    1684045    1882487    1866439

ts_plot(ts.obj = df1, 
        title = "UK National Demand - Daily between 2015 and 2017")
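The example above passes years as integers. For a narrower window, the sketch below assumes that start and end also accept Date objects. That assumption is not confirmed anywhere in this document, so check the function's help page (?extract_grid) before relying on it. The object name nd_2016_h1 is only illustrative.

```r
library(UKgrid)

# Assumption: start and end accept Date objects as well as integer years
nd_2016_h1 <- extract_grid(type = "zoo",
                           columns = "ND",
                           aggregate = "daily",
                           start = as.Date("2016-01-01"),
                           end = as.Date("2016-06-30"))

# Inspect the first and last observations to confirm the extracted window
head(nd_2016_h1)
tail(nd_2016_h1)
```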