library(healthyR.ts)
In this vignette we will discuss how to use the tidy_fft function, what it does, and what it produces.
The tidy_fft function has only six parameters, and three of them come with sensible defaults. When you use this function, it is important to supply a complete time series data set, one with no missing values, because gaps will affect your results.
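Because gaps distort the results, it can help to check a series before calling tidy_fft. The sketch below uses a hypothetical base R helper, is_complete_monthly(), which is not part of healthyR.ts. It assumes monthly data stamped on the first day of each month and returns TRUE only when every month between the first and last date is present and no value is NA.

```r
# Hypothetical helper (not part of healthyR.ts): TRUE when a monthly series
# has exactly one row per month between its first and last date and no NA values.
is_complete_monthly <- function(dates, values) {
  dates <- sort(as.Date(dates))
  expected <- seq(dates[1], dates[length(dates)], by = "month")
  length(dates) == length(expected) &&
    all(dates == expected) &&
    !anyNA(values)
}

monthly_dates <- seq(as.Date("2015-01-01"), as.Date("2019-12-01"), by = "month")
monthly_values <- rep(1000L, length(monthly_dates))

is_complete_monthly(monthly_dates, monthly_values)          # TRUE
is_complete_monthly(monthly_dates[-7], monthly_values[-7])  # FALSE: July 2015 dropped
```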
The function and its full parameters are as follows:
tidy_fft(
  .data,
  .date_col,
  .value_col,
  .frequency = 12L,
  .harmonics = 1L,
  .upsampling = 10L
)
The .data argument is the time series data that gets passed to the function. The .date_col argument is the column that holds the date or datetime of interest. The .value_col argument is the column that holds the value being analyzed by the function; this can be counts, averages, or any other type of value in the time series. The .frequency argument describes the cyclical nature of the data, for example 12 for monthly data or 7 for daily data with a weekly cycle. The .harmonics argument tells the function how many times the fft should be run internally and how many filters should be made. Finally, the .upsampling argument tells the function how much to upsample the time parameter, that is, how many time points to generate for each original observation.
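To build intuition for .harmonics and .upsampling, here is a minimal base R sketch of the general idea: keep the mean plus the first few Fourier frequencies of a series and evaluate the resulting wave on a finer time grid. The function fourier_wave() is hypothetical and only illustrates the technique; it is not the tidy_fft() implementation.

```r
# Sketch only (hypothetical helper, not healthyR.ts code): keep the mean plus the
# first `n_harmonics` Fourier frequencies of `x`, then evaluate that wave on a
# grid with `upsampling` points per original time step.
fourier_wave <- function(x, n_harmonics = 1L, upsampling = 10L) {
  n <- length(x)
  stopifnot(n_harmonics >= 1, n_harmonics < n / 2, upsampling >= 1)
  X <- stats::fft(x)  # unnormalized DFT, so X[1] equals sum(x)
  # time grid starting at 0, with exact integers at the original observations
  t_fine <- (0:((n - 1) * upsampling)) / upsampling
  y_hat <- rep(Re(X[1]) / n, length(t_fine))  # mean (DC) term
  for (k in seq_len(n_harmonics)) {
    # for real x, bin k and its conjugate bin n - k combine into one cosine
    # with amplitude 2 * |X[k + 1]| / n and phase Arg(X[k + 1])
    y_hat <- y_hat +
      2 * Mod(X[k + 1]) / n * cos(2 * pi * k * t_fine / n + Arg(X[k + 1]))
  }
  data.frame(time = t_fine + 1, y_hat = y_hat)
}

# A single yearly cosine on 12 monthly points is captured by one harmonic
x <- 1000 + 50 * cos(2 * pi * (0:11) / 12)
wave <- fourier_wave(x, n_harmonics = 1, upsampling = 5)
nrow(wave)        # 56: 11 steps times 5, plus 1
head(wave$time)   # 1.0 1.2 1.4 1.6 1.8 2.0
```

Raising n_harmonics lets the wave follow more of the series' ups and downs, while raising upsampling only adds evaluation points between the original observations.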
Let's work through a simple example, starting with some data.
suppressPackageStartupMessages(library(healthyR.data))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(timetk))
data_tbl <- healthyR_data %>%
  filter(ip_op_flag == 'I') %>%
  summarise_by_time(
    .date_var = visit_end_date_time,
    .by = "month",
    value = n()
  ) %>%
  filter_by_time(
    .date_var = visit_end_date_time,
    .start_date = "2015",
    .end_date = "2019"
  ) %>%
  rename(date_col = visit_end_date_time)
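For readers without healthyR.data, the same filter-then-count-by-month step can be sketched in base R. The visit records below are made up and only the column names mirror the real data set; this illustrates what summarise_by_time(.by = "month", value = n()) computes rather than reproducing timetk's code.

```r
# Made-up visit records; the column names mirror healthyR_data
visits <- data.frame(
  visit_end_date_time = as.POSIXct(
    c("2015-01-03 10:00:00", "2015-01-20 08:30:00",
      "2015-02-11 14:00:00", "2015-02-15 09:45:00"),
    tz = "UTC"
  ),
  ip_op_flag = c("I", "I", "I", "O")
)

# Keep inpatient visits only, like filter(ip_op_flag == 'I')
inpatient <- visits[visits$ip_op_flag == "I", ]

# Floor each timestamp to the first day of its month, then count rows per month
month_start <- as.Date(format(inpatient$visit_end_date_time, "%Y-%m-01"))
counts <- table(month_start)
monthly_tbl <- data.frame(
  date_col = as.Date(names(counts)),
  value = as.integer(counts)
)
monthly_tbl  # 2015-01-01 has 2 visits, 2015-02-01 has 1
```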
Now that we have our sample data, let’s check it out.
glimpse(data_tbl)
#> Rows: 60
#> Columns: 2
#> $ date_col <dttm> 2015-01-01, 2015-02-01, 2015-03-01, 2015-04-01, 2015-05-01, ~
#> $ value <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981,~
Let's take a look at a time series plot of the data.
suppressPackageStartupMessages(library(timetk))
data_tbl %>%
  plot_time_series(
    .date_var = date_col,
    .value = value
  )
Now that we know what our data looks like, let's run it through the function and assign the result to a variable called output:
output <- tidy_fft(
  .data = data_tbl,
  .date_col = date_col,
  .value_col = value,
  .harmonics = 8,
  .frequency = 12,
  .upsampling = 5
)
Now that we have run the function, let's take a look at the output.
The function invisibly returns a list object, hence the need to assign it to a variable. The list has four sections: data, plots, parameters, and model.
In this section we will go over all of the data components that are returned. We can access them in the usual way with output$data, which itself returns another list of seven objects. Let's go through them all.
The data element, accessed by output$data$data, is the original data with a few elements added to it, repeated for each harmonic level and each upsampled time point. Let's take a look:
output$data$data %>%
  glimpse()
#> Rows: 2,400
#> Columns: 6
#> $ harmonic <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat <dbl> 978.4624, 979.1071, 979.7605, 980.4221, 981.0918, 981.7692,~
#> $ x <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
#> $ error_term <dbl> 193.537557, NA, NA, NA, NA, -15.769221, NA, NA, NA, NA, -24~
The error_data element, accessed by output$data$error_data, is a tibble that holds the original data, a few other elements, and an error term equal to the actual value minus the harmonic output (y_actual - y_hat). This is done for each harmonic level.
output$data$error_data %>%
  glimpse()
#> Rows: 480
#> Columns: 6
#> $ harmonic <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
#> $ time <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ y_actual <int> 1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 98~
#> $ y_hat <dbl> 978.4624, 981.7692, 985.2620, 988.9026, 992.6511, 996.4664,~
#> $ x <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, ~
#> $ error_term <dbl> 193.5375572, -15.7692207, -24.2620436, 17.0973566, -1.65113~
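The row counts suggest that error_data is the data table restricted to the time points that carry an observed value (2,400 / 5 = 480 rows), with error_term = y_actual - y_hat. The base R sketch below shows that relationship on the first six rows copied from the output$data$data glimpse() above; it reproduces the printed numbers but is an illustration, not the package code.

```r
# First six rows of harmonic 1, copied from the output$data$data glimpse above
upsampled <- data.frame(
  harmonic = factor(rep(1, 6)),
  time     = 1 + (0:5) / 5,
  y_actual = c(1172, NA, NA, NA, NA, 966),
  y_hat    = c(978.4624, 979.1071, 979.7605, 980.4221, 981.0918, 981.7692)
)

# Keep only the original (non-upsampled) time points, then compute the error
errors <- upsampled[!is.na(upsampled$y_actual), ]
errors$error_term <- errors$y_actual - errors$y_hat
errors[, c("time", "y_actual", "y_hat", "error_term")]
# error_term: 193.5376 and -15.7692, matching the error_data glimpse above
```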
The input_vector is just the value column that was passed to the function.
output$data$input_vector
#>  [1] 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069 1065  980 1115
#> [16]  997 1083 1032  962  993  921  911  928 1030 1072  938 1077  961 1041 1060
#> [31] 1018  988 1007 1009  979 1023 1145  985 1015 1016 1040 1117 1057 1040  829
#> [46] 1027  949  916 1009  918  961  908  961  904  913  862  849  913  860  887
The maximum_harmonic_tbl is a tibble that holds the data for the maximum harmonic entered into the function, which is the most flexible fit returned.
output$data$maximum_harmonic_tbl %>%
  glimpse()
#> Rows: 300
#> Columns: 6
#> $ harmonic <fct> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,~
#> $ time <dbl> 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2,~
#> $ y_actual <int> 1172, NA, NA, NA, NA, 966, NA, NA, NA, NA, 961, NA, NA, NA,~
#> $ y_hat <dbl> 987.7486, 990.7363, 993.1633, 995.0839, 996.5661, 997.6894,~
#> $ x <dbl> 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 0, 0, 0,~
#> $ error_term <dbl> 184.251402, NA, NA, NA, NA, -31.689374, NA, NA, NA, NA, -40~
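The row counts are consistent with maximum_harmonic_tbl being the rows of output$data$data for the highest harmonic level only: 2,400 rows / 8 harmonics = 300 rows, or 60 observations times 5 upsampled points. The sketch below shows that kind of filter on a small made-up long table; the relationship is an assumption drawn from the row counts, not taken from the package code.

```r
# Made-up long table: 3 harmonic levels times 4 time points
long_tbl <- data.frame(
  harmonic = factor(rep(1:3, each = 4)),
  time     = rep(1:4, times = 3),
  y_hat    = c(10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33)
)

# The highest level is the last factor level (levels built from 1:3 sort numerically)
max_level <- tail(levels(long_tbl$harmonic), 1)
max_tbl <- long_tbl[long_tbl$harmonic == max_level, ]
nrow(max_tbl)  # 4
```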
The differenced_value_tbl is a tibble that holds the lag 1 difference of the value column supplied.
output$data$differenced_value_tbl %>%
  glimpse()
#> Rows: 59
#> Columns: 1
#> $ value <int> -206, -5, 45, -15, 82, 70, -13, -69, 40, -120, 88, -4, -85, 135,~
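Base R's diff() computes the same lag 1 difference. Applied to the input_vector printed above, it reproduces the 59 rows and the leading values of differenced_value_tbl; this is a sketch of the computation, not the package code.

```r
# input_vector as printed above (monthly inpatient counts, 2015 to 2019)
input_vector <- c(
  1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981, 1069,
  1065, 980, 1115, 997, 1083, 1032, 962, 993, 921, 911, 928, 1030,
  1072, 938, 1077, 961, 1041, 1060, 1018, 988, 1007, 1009, 979, 1023,
  1145, 985, 1015, 1016, 1040, 1117, 1057, 1040, 829, 1027, 949, 916,
  1009, 918, 961, 908, 961, 904, 913, 862, 849, 913, 860, 887
)

diffs <- diff(input_vector, lag = 1)  # x[t] - x[t - 1]
length(diffs)  # 59, one fewer than the input
head(diffs)    # -206 -5 45 -15 82 70
```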
The dff_tbl is a tibble that holds the fft values: the complex values themselves along with their real and imaginary parts.
output$data$dff_tbl %>%
  glimpse()
#> Rows: 60
#> Columns: 3
#> $ dff_trans <cpl> 59925.00000+0.00000i, -608.62672-917.15896i, -187.91767-1762~
#> $ real_part <dbl> 59925.000000, -608.626716, -187.917671, -267.179120, -94.543~
#> $ imag_part <dbl> 0.000000, -917.158962, -1762.682114, -519.897564, 66.663422,~
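A table like dff_tbl can be sketched with stats::fft() applied to the input_vector printed above. R's fft() is unnormalized, so the first element is the sum of the series: 59,925, which is 60 times 998.75, the intercept of the harmonic model. That dff_tbl is built exactly this way is an assumption suggested by the matching first rows, not something the source states.

```r
input_vector <- c(
  1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981, 1069,
  1065, 980, 1115, 997, 1083, 1032, 962, 993, 921, 911, 928, 1030,
  1072, 938, 1077, 961, 1041, 1060, 1018, 988, 1007, 1009, 979, 1023,
  1145, 985, 1015, 1016, 1040, 1117, 1057, 1040, 829, 1027, 949, 916,
  1009, 918, 961, 908, 961, 904, 913, 862, 849, 913, 860, 887
)

dff_trans <- stats::fft(input_vector)  # complex vector, one value per observation
dff_tbl <- data.frame(
  dff_trans = dff_trans,
  real_part = Re(dff_trans),
  imag_part = Im(dff_trans)
)
nrow(dff_tbl)          # 60
dff_tbl$real_part[1]   # 59925, the sum of the series
```

For a real-valued series the spectrum is conjugate-symmetric, so element 60 is the complex conjugate of element 2 and only the first half carries distinct information.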
The last piece of the data section is the ts_obj, which is a ts version of the input_vector.
output$data$ts_obj
#>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
#> 2015 1172  966  961 1006  991 1073 1143 1130 1061 1101  981 1069
#> 2016 1065  980 1115  997 1083 1032  962  993  921  911  928 1030
#> 2017 1072  938 1077  961 1041 1060 1018  988 1007 1009  979 1023
#> 2018 1145  985 1015 1016 1040 1117 1057 1040  829 1027  949  916
#> 2019 1009  918  961  908  961  904  913  862  849  913  860  887
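A monthly ts object like ts_obj can be sketched with base R's ts(), assuming the series starts in January 2015 with .frequency = 12 as in the call above. Printing it gives the same Jan to Dec by year layout.

```r
input_vector <- c(
  1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981, 1069,
  1065, 980, 1115, 997, 1083, 1032, 962, 993, 921, 911, 928, 1030,
  1072, 938, 1077, 961, 1041, 1060, 1018, 988, 1007, 1009, 979, 1023,
  1145, 985, 1015, 1016, 1040, 1117, 1057, 1040, 829, 1027, 949, 916,
  1009, 918, 961, 908, 961, 904, 913, 862, 849, 913, 860, 887
)

# frequency = 12 tells R there are 12 observations per cycle (one year)
ts_obj <- ts(input_vector, start = c(2015, 1), frequency = 12)
ts_obj
end(ts_obj)  # 2019 12
```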
There are a total of five plots returned in the list. Three of them are ggplot plots and two of them are plotly::ggplotly plots.
The harmonic_plot is a ggplot plot that shows all of the harmonic waves on the same graph when .harmonics is greater than 1.
output$plots$harmonic_plot
The diff_plot is a ggplot plot of the lag 1 differenced_value_tbl.
output$plots$diff_plot
The max_har_plot is a ggplot plot of the maximum harmonic wave entered into .harmonics.
output$plots$max_har_plot
The harmonic_plotly is a plotly::ggplotly version of the harmonic_plot.
output$plots$harmonic_plotly
The max_har_plotly is a plotly::ggplotly version of the max_har_plot.
output$plots$max_har_plotly
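The parameters element is a list of input parameters and internal parameters. In the printed list below, harmonics and upsampling show 5 and 8, the reverse of the .harmonics = 8 and .upsampling = 5 passed to tidy_fft above, while the data tables above reflect 8 harmonic levels and time steps of 0.2.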
output$parameters
#> $harmonics
#> [1] 5
#>
#> $upsampling
#> [1] 8
#>
#> $start_date
#> [1] "2015-01-01 UTC"
#>
#> $end_date
#> [1] "2019-12-01 UTC"
#>
#> $freq
#> [1] 12
The model portion has four pieces, which we will look at below.
The m element is an internal parameter equal to .frequency / 2 (12 / 2 = 6 here, as printed below). It is fed into TSA::harmonic along with the ts_obj.
The harmonic_obj element is the object returned from TSA::harmonic, a matrix of cosine and sine regressors.
The harmonic_model element is the harmonic model, a stats::lm fit of ts_obj on the TSA::harmonic regressors.
The model_summary element is a summary of the harmonic model.
output$model$m
#> [1] 6
output$model$harmonic_obj %>% head()
#> cos(2*pi*t) cos(4*pi*t) cos(6*pi*t) cos(8*pi*t) cos(10*pi*t)
#> [1,] 1.000000e+00 1.0 1.000000e+00 1.0 1.000000e+00
#> [2,] 8.660254e-01 0.5 3.419656e-13 -0.5 -8.660254e-01
#> [3,] 5.000000e-01 -0.5 -1.000000e+00 -0.5 5.000000e-01
#> [4,] 1.216555e-12 -1.0 -5.468654e-12 1.0 4.263785e-12
#> [5,] -5.000000e-01 -0.5 1.000000e+00 -0.5 -5.000000e-01
#> [6,] -8.660254e-01 0.5 3.319385e-12 -0.5 8.660254e-01
#> cos(12*pi*t) sin(2*pi*t) sin(4*pi*t) sin(6*pi*t) sin(8*pi*t)
#> [1,] 1 -4.722001e-13 -9.444002e-13 -5.054579e-12 -1.888800e-12
#> [2,] -1 5.000000e-01 8.660254e-01 1.000000e+00 8.660254e-01
#> [3,] 1 8.660254e-01 8.660254e-01 -4.370648e-12 -8.660254e-01
#> [4,] -1 1.000000e+00 2.433110e-12 -1.000000e+00 -4.866219e-12
#> [5,] 1 8.660254e-01 -8.660254e-01 -7.560404e-13 8.660254e-01
#> [6,] -1 5.000000e-01 -8.660254e-01 1.000000e+00 -8.660254e-01
#> sin(10*pi*t)
#> [1,] 1.276978e-12
#> [2,] 5.000000e-01
#> [3,] -8.660254e-01
#> [4,] 1.000000e+00
#> [5,] -8.660254e-01
#> [6,] 5.000000e-01
output$model$harmonic_model
#>
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#>
#> Coefficients:
#> (Intercept) har_cos(2*pi*t) har_cos(4*pi*t) har_cos(6*pi*t)
#> 998.750 -1.008 28.600 10.900
#> har_cos(8*pi*t) har_cos(10*pi*t) har_cos(12*pi*t) har_sin(2*pi*t)
#> 21.500 27.108 6.750 23.582
#> har_sin(4*pi*t) har_sin(6*pi*t) har_sin(8*pi*t) har_sin(10*pi*t)
#> -9.469 3.600 -8.487 -27.282
output$model$model_summary
#>
#> Call:
#> stats::lm(formula = ts_obj ~ har_)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -140.6 -58.0 9.1 38.7 127.6
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 998.750 9.358 106.732 <2e-16 ***
#> har_cos(2*pi*t) -1.008 13.234 -0.076 0.9396
#> har_cos(4*pi*t) 28.600 13.234 2.161 0.0357 *
#> har_cos(6*pi*t) 10.900 13.234 0.824 0.4142
#> har_cos(8*pi*t) 21.500 13.234 1.625 0.1108
#> har_cos(10*pi*t) 27.108 13.234 2.048 0.0460 *
#> har_cos(12*pi*t) 6.750 9.358 0.721 0.4742
#> har_sin(2*pi*t) 23.582 13.234 1.782 0.0811 .
#> har_sin(4*pi*t) -9.469 13.234 -0.715 0.4778
#> har_sin(6*pi*t) 3.600 13.234 0.272 0.7868
#> har_sin(8*pi*t) -8.487 13.234 -0.641 0.5244
#> har_sin(10*pi*t) -27.282 13.234 -2.062 0.0447 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 72.48 on 48 degrees of freedom
#> Multiple R-squared: 0.3057, Adjusted R-squared: 0.1466
#> F-statistic: 1.921 on 11 and 48 DF, p-value: 0.05984
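The harmonic regression above can be approximated by hand in base R. This sketch assumes the regressors follow the column pattern printed for harmonic_obj: cosines for k = 1 to m and sines for k = 1 to m - 1, with m = 6. It is a hand-rolled approximation for illustration, not TSA's source, and the names yr and har are illustrative. Fit on the input_vector printed earlier, it should land on the same intercept and R-squared as the summary above.

```r
input_vector <- c(
  1172, 966, 961, 1006, 991, 1073, 1143, 1130, 1061, 1101, 981, 1069,
  1065, 980, 1115, 997, 1083, 1032, 962, 993, 921, 911, 928, 1030,
  1072, 938, 1077, 961, 1041, 1060, 1018, 988, 1007, 1009, 979, 1023,
  1145, 985, 1015, 1016, 1040, 1117, 1057, 1040, 829, 1027, 949, 916,
  1009, 918, 961, 908, 961, 904, 913, 862, 849, 913, 860, 887
)
freq <- 12
m <- freq / 2                                # 6, matching output$model$m
yr <- (seq_along(input_vector) - 1) / freq   # years since January 2015

# Cosines for k = 1..m; sines for k = 1..(m - 1), because sin(2*pi*m*yr) is
# zero at every monthly point when m = freq / 2, so that column is dropped
cos_terms <- sapply(seq_len(m), function(k) cos(2 * pi * k * yr))
sin_terms <- sapply(seq_len(m - 1), function(k) sin(2 * pi * k * yr))
har <- cbind(cos_terms, sin_terms)
colnames(har) <- c(sprintf("cos(%d*pi*t)", 2L * seq_len(m)),
                   sprintf("sin(%d*pi*t)", 2L * seq_len(m - 1)))

fit <- lm(input_vector ~ har)
coef(fit)[1]             # intercept: mean(input_vector) = 998.75
summary(fit)$r.squared   # compare with Multiple R-squared above
```

Because the regressors are orthogonal over five complete years, the intercept is simply the mean of the series, and each coefficient can be read on its own.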