Local Data Visualization

Data Visualization

The values collected in the vignette Local Data Collection can be nicely visualized, but they first require a bit of pre-processing and formatting.

Recall pinned dataset

After a period of background data collection, it is time to read the collected data back and bring it to light.

library(pins)

board <- board_local()
inverter_df <- board |>
  pin_read("inverter_data")
head(inverter_df)
#> # A tibble: 6 × 6
#>   date                device_id    inverter output_power today_energy lifetime_energy
#>   <dttm>              <chr>        <chr>             [W]       [kW/h]          [kW/h]
#> 1 2024-09-22 19:35:41 E07000011776 1                   0        0.646            252.
#> 2 2024-09-22 19:35:41 E07000011776 2                   0        0.654            266.
#> 3 2024-09-22 19:43:29 E07000011776 1                   0        0.647            252.
#> 4 2024-09-22 19:43:29 E07000011776 2                   0        0.654            266.
#> 5 2024-09-24 19:09:59 E07000011776 1                   6        1.66             255.
#> 6 2024-09-24 19:09:59 E07000011776 2                   6        1.64             269.

Convert to a time series

We now convert the tibble into a tsibble time series so that we can use time-series-specific functions.

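The time series is chosen to be regular at a 30-minute interval: each timestamp is floored to its 30-minute slot, and the values within a slot are aggregated. We also temporarily drop the units, which are not yet {tsibble}-friendly. Finally, fill_gaps() turns implicit gaps into explicit rows, adding an NA entry for every 30-minute slot where no data was collected.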
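To make these two steps concrete, here is a minimal base-R sketch on three toy timestamps (assumed to be UTC): each timestamp is floored to its 30-minute slot, then the observations are joined onto a complete regular grid so that empty slots become explicit NA rows. It only illustrates the idea behind lubridate::floor_date() and fill_gaps(); it is not how those packages implement them, and it assumes at most one observation per slot (the vignette code aggregates with summarize() first).

```r
# Toy timestamps (assumed UTC) and made-up power readings
stamps <- as.POSIXct(c("2024-09-24 19:09:59", "2024-09-24 19:43:29",
                       "2024-09-24 21:12:00"), tz = "UTC")
power <- c(6, 0, 3)

# Floor each timestamp to its 30-minute slot (1800 seconds)
slot <- as.POSIXct(floor(as.numeric(stamps) / 1800) * 1800,
                   origin = "1970-01-01", tz = "UTC")

# Build the complete regular grid, then left-join the observations onto it:
# slots without any observation show up explicitly as NA rows
grid <- data.frame(time_index = seq(min(slot), max(slot), by = "30 min"))
observed <- data.frame(time_index = slot, output_power = power)
filled <- merge(grid, observed, by = "time_index", all.x = TRUE)
filled
```

The grid runs from 19:00 to 21:00, so the two empty slots (20:00 and 20:30) appear as NA rows, just like the NA rows in the tsibble output.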

library(dplyr)
library(tsibble)

inverter_ts <- inverter_df |>
  # floor timestamps to 30-minute slots and build a per-inverter identifier
  mutate(time_index = date |> lubridate::floor_date(unit = "30 minutes"),
         device_inverter = stringr::str_c(device_id, "_", inverter)) |>
  # aggregate the readings that fall into the same slot
  summarize(.by = c(time_index, device_id, inverter, device_inverter),
            lifetime_energy = max(lifetime_energy),
            today_energy = max(today_energy),
            output_power = mean(output_power)) |>
  # units are not yet {tsibble} friendly
  units::drop_units() |>
  # convert to time-series
  as_tsibble(key = c(device_id, inverter, device_inverter),
             index = time_index) |>
  # make implicit gaps explicit (NA rows)
  fill_gaps()

head(inverter_ts)
#> # A tsibble: 6 x 7 [30m] <?>
#> # Key:       device_id, inverter, device_inverter [1]
#>   time_index          device_id    inverter device_inverter lifetime_energy today_energy output_power
#>   <dttm>              <chr>        <chr>    <chr>                     <dbl>        <dbl>        <dbl>
#> 1 2024-09-24 19:00:00 E07000011433 1        E07000011433_1             47.0         1.59            6
#> 2 2024-09-24 19:30:00 E07000011433 1        E07000011433_1             47.0         1.59            0
#> 3 2024-09-24 20:00:00 E07000011433 1        E07000011433_1             NA          NA              NA
#> 4 2024-09-24 20:30:00 E07000011433 1        E07000011433_1             NA          NA              NA
#> 5 2024-09-24 21:00:00 E07000011433 1        E07000011433_1             NA          NA              NA
#> 6 2024-09-24 21:30:00 E07000011433 1        E07000011433_1             NA          NA              NA

The gap-filled tsibble is large, and missing values show up very early in the data: only around 8% of the slots hold actual observations. This will require a substantial imputation effort.
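Completeness here simply means the share of slots that are not NA. A minimal sketch of that measure, on a made-up toy vector rather than the real data:

```r
# Share of time slots that actually hold an observation
completeness <- function(x) mean(!is.na(x))

# Toy gap-filled series: 3 observed slots out of 10
toy_lifetime <- c(47.0, 47.0, NA, NA, NA, NA, NA, NA, NA, 47.8)
completeness(toy_lifetime)  # 0.3
```

On the real data, the same function could be applied per series, for instance with `inverter_ts |> as_tibble() |> summarize(.by = device_inverter, completeness = completeness(lifetime_energy))`. This is one way to check the order of magnitude of the figure quoted above, not necessarily how it was computed.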

Naive imputation

lifetime_energy is a monotonically increasing counter, so linear interpolation is a sensible way to impute it: the imputed series stays monotonic.
We then naively impute the missing today_energy values as the difference between two consecutive steps of lifetime_energy.

Although this is correct in aggregate, it is far from representative of the daily seasonality of photovoltaic energy production.
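The two naive rules can be illustrated with a minimal base-R sketch on a made-up toy series. Here stats::approx() with rule = 2 stands in for the linear interpolation done by na_interpolation(option = "linear"); it is a substitute chosen for the illustration, not the {imputeTS} internals.

```r
# Toy cumulative energy counter [kWh] and daily energy, NA where missing
lifetime <- c(252.0, NA, NA, 253.5, NA, 254.0)
today    <- c(0.6,   NA, NA, 1.4,   NA, 1.9)

idx <- seq_along(lifetime)
ok  <- !is.na(lifetime)

# Linear interpolation between observed points; rule = 2 would carry the
# nearest observed value into leading or trailing gaps
lifetime_imp <- approx(idx[ok], lifetime[ok], xout = idx, rule = 2)$y
lifetime_imp  # 252.00 252.50 253.00 253.50 253.75 254.00

# Naive 'today_energy' fill: the step increase of the interpolated counter
step      <- c(NA, diff(lifetime_imp))
today_imp <- ifelse(is.na(today), step, today)
today_imp     # 0.60 0.50 0.50 1.40 0.25 1.90
```

The filled today_energy values are counter increments over a single 30-minute step, which is one reason this naive fill does not look like a daily production curve.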

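library(imputeTS)

full_inverter <- inverter_ts |>
  # impute each device/inverter series separately, so that the
  # interpolation never bridges two different inverters
  group_by_key() |>
  mutate(
    # impute 'lifetime_energy' gaps with a linear interpolation
    lifetime_energy = na_interpolation(lifetime_energy, option = "linear"),
    # fill missing 'today_energy' with the step of the interpolated counter
    today_energy = coalesce(today_energy, lifetime_energy - lag(lifetime_energy))
  ) |>
  ungroup()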

Gaps visualization

{imputeTS} provides a convenient toolset to visualize the imputed values of a time series:

ggplot_na_imputations(x_with_na = inverter_ts |> filter(device_id == "E07000011433") |> pull(lifetime_energy),
                      x_with_imputations = full_inverter |> filter(device_id == "E07000011433") |> pull(lifetime_energy),
                      ylab = "Total energy [kWh]",
                      title = "Imputed Values, first device",
                      size_imputations = 0.2,
                      size_points = 0.6
                      )
ggplot_na_imputations(x_with_na = inverter_ts |> filter(device_id == "E07000011776") |> pull(lifetime_energy),
                      x_with_imputations = full_inverter |> filter(device_id == "E07000011776") |> pull(lifetime_energy),
                      ylab = "Total energy [kWh]",
                      title = "Imputed Values, second device",
                      size_imputations = 0.2,
                      size_points = 0.6
                      )

ggplot_na_imputations(x_with_na = inverter_ts |> filter(device_id == "E07000011433") |> pull(today_energy),
                      x_with_imputations = full_inverter |> filter(device_id == "E07000011433") |> pull(today_energy),
                      ylab = "Daily energy [kWh]",
                      title = "Imputed Values, first device",
                      size_imputations = 0.2,
                      size_points = 0.6)
ggplot_na_imputations(x_with_na = inverter_ts |> filter(device_id == "E07000011776") |> pull(today_energy),
                      x_with_imputations = full_inverter |> filter(device_id == "E07000011776") |> pull(today_energy),
                      ylab = "Daily energy [kWh]",
                      title = "Imputed Values, second device",
                      size_imputations = 0.2,
                      size_points = 0.6)

Conclusion

A much better imputation is of course possible on this very challenging dataset. Intuitively, production follows a daily seasonality modulated by the local solar radiation. It is now up to you to dig into it!
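As one possible starting point, and only as a sketch under stated assumptions, the energy gained over a gap could be spread across the missing slots in proportion to a crude solar profile instead of linearly. The half-sine profile between an assumed 06:00 sunrise and 18:00 sunset, as well as the helpers `solar_weight()` and `spread_by_profile()`, are hypothetical and not part of the package; a real model would rely on the local solar radiation.

```r
# Hypothetical clear-sky weight for an hour of day: a half-sine between an
# ASSUMED sunrise (06:00) and sunset (18:00), zero at night
solar_weight <- function(hour, sunrise = 6, sunset = 18) {
  w <- sin(pi * (hour - sunrise) / (sunset - sunrise))
  ifelse(hour > sunrise & hour < sunset, w, 0)
}

# Hypothetical helper: distribute the energy gained between two observed
# counter readings over the following slots (the last slot being the one of
# the second reading), proportionally to the profile.
# Assumes end_value >= start_value (no counter reset).
spread_by_profile <- function(start_value, end_value, slot_hours) {
  w <- solar_weight(slot_hours)
  if (sum(w) == 0) w <- rep(1, length(w))  # night-only gap: spread evenly
  increments <- (end_value - start_value) * w / sum(w)
  start_value + cumsum(increments)  # cumulative counter, still monotonic
}

# Half-hour slots from 05:30 to 19:00 between readings of 252 and 254 kWh
slots <- seq(5.5, 19, by = 0.5)
imputed <- spread_by_profile(252, 254, slots)
```

Unlike the linear interpolation, this keeps the counter flat at night and concentrates the gain around midday, while still ending exactly on the next observed reading.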