The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
SNSeg supports change-points estimation for both univariate and multivariate time series (including high-dimensional time series with dimension greater than 10.) using Self-Normalization (SN) based framework. Please read Zhao, Jiang and Shao (2022) <doi.org/10.1111/rssb.12552> for details of the SN-based algorithms.
The package contain three functions for change-points estimation:
SNSeg_Uni
works for a univariate time
series.SNSeg_Multi
works for multivariate time
series with dimension up to ten.SNSeg_HD
works for high-dimensional time
series with dimension above ten.All functions contain two important input arguments:
grid_size_scale
and grid_size
.
grid_size_scale
: the trimming parameter \(\epsilon\), a parameter to control the
local window size grid_size
. The range of
grid_size_scale
should be between 0.05 and 0.5. Any input
grid_size_scale
smaller than 0.05 will be automatically
changed to 0.05, and any input grid_size_scale
greater than
0.5 will be automatically changed to 0.5.grid_size
: the local window size \(h\) for local scanning for each time point.
According to Zhao et al. (2022), grid_size
is a product of
grid_size_scale
and the length of time.Users can set their own grid_size
or leave it as
NULL
in the input arguments. If grid_size
is
NULL
, these functions will compute the SN test statistic
using the value of grid_size_scale
. If
grid_size
is set by users, the functions will first
calculate grid_size_scale
by diving the length of time, and
then compute the SN test statistic.
For the other input arguments: * ts
: Users should enter
a time series for this argument. For SNSeg_Uni
, the
dimension of ts
should be exactly one in most cases (or two
when paras_to_test = 'bivcor'
); for
SNSeg_Multi
, the dimension must be at least two but no more
than ten; for SNSeg_HD
, the dimension must be greater than
ten. * paras_to_test
: This argument in functions
SNSeg_Uni
and SNSeg_Multi
allows users to
enter the parameter(s) they would like to test the change in. *
confidence
: Users should choose a confidence level among
0.9, 0.95, 0.99, 0.995 and 0.995. A smaller confidence level is easier
to reject the null hypothesis and detect a change-point.
The package also offers an option to plot the time series (this only
works for the univariate time series cases!) Users need to set
plot_SN = TRUE
to visualize the time series, and
est_cp_loc = TRUE
to add the estimated change-point
locations in the plot.
max_SNsweep
To visualize the computed SN-based test statistic at each time point,
users can apply the function max_SNsweep
by plugging in the
output object from one of the functions SNSeg_Uni
,
SNSeg_Multi
and SNSeg_HD
. The options
est_cp_loc = TRUE
and critical_loc = TRUE
are
provided to draw the estimated change-point locations and the critical
value threshold inside the test statistic plot.
max_SNsweep
also returns the SN-based test statistic for
each time point. A large number of test statistics in the output can be
messy for users who only seek to generate a plot. To hide the test
statistics output, users can create an arbitrary variable name and set
it to the max_SNsweep
function, e.g.,
SN_stat <- max_SNsweep(...)
.
SNSeg_estimate
The function SNSeg_estimate
allows users to compute the
parameter estimates (e.g., mean, variance, acf, quantile, etc.) of each
of the segments separated by the estimated change-points. To use this
function, users should use the output of the functions
SNSeg_Uni
, SNSeg_Multi
and
SNSeg_HD
as the input of SNSeg_estimate
.
The typical S3 methods summary
, print
and
plot
are available to SNSeg_Uni
,
SNSeg_Multi
and SNSeg_HD
objects. The
summary
method displays the parameter to be tested, the
estimated change-point amount and locations, the grid_size
,
confidence level as well as the critical value of the SN-based test. The
print
method shows the change-point locations. The
plot
method plots the time series, and similar to the
argument plot_SN = TRUE
, the plot
method
allows users to generate time series segmentation plot(s) with the
estimated change-point locations. It also provides ts_index
option to allow users to plot any individual time series they want.
Users can apply their preferred color of the change-point(s) within the
plot(s) by setting cpts.col
to any color.
We then provide the examples of the functions SNSeg_Uni
,
SNSeg_Multi
and SNSeg_HD
.
SNSeg_Uni
:The function SNSeg_Uni
detect change-points for a
univariate time series based on the change in a single or
parameters.
paras_to_test
allows one or a combination of multiple
parameters from “mean”, “variance”, “acf” and a numeric quantile value
between 0 and 1. For instance, to test the change in autocorrelation and
80th quantile, users should set paras_to_test
to c(‘acf’,
0.8).paras_to_test
should be set to bivcor
.paras_to_test
using their own
parameters. In this case, the input of paras_to_test
needs
to be a function which returns a numeric value.We provide examples for different cases:
# Please run the following function before running examples:
<- function(u,p,trunc_r,gpd_scale,gpd_shape){
mix_GauGPD <- u<p
indicator <- rep(0, length(u))
rv >0] <- qtruncnorm(u[indicator>0]/p,a=-Inf,b=trunc_r)
rv[indicator<=0] <- qgpd((u[indicator<=0]-p)/(1-p), loc=trunc_r, scale=gpd_scale,shape=gpd_shape)
rv[indicatorreturn(rv)
}
set.seed(7)
<- 2000
n <- 2
reptime <- round(n*c(0,cumsum(c(0.5,0.25)),1))
cp_sets <- c(0.4,0,0.4)
mean_shift <- -0.7
rho <- MAR(n, reptime, rho)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,] + mean_shift[index]
ts[tau1
}<- ts[,2]
ts # grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined & generate time series segmentation plot
<- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 116,
plot_SN = TRUE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result#> [1] 1018 1487
# Parameter estimates (mean) of each segment
SNSeg_estimate(result)
#> $mean
#> [1] 0.39444849 0.02107613 0.38362925
#>
#> attr(,"class")
#> [1] "SNSeg_estimate"
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
We then show how to use the S3 methods summary
,
print
and plot
.
summary(result)
#> There are 2 change-point(s) detected at 90th confidence level based on the change in the single mean parameter.
#>
#> The critical value of SN-based test is 135.39283594
#>
#> The detected change-point location(s) are 1018,1487 with a grid_size of 116
print(result)
#> The detected change-point location(s) are 1018,1487
plot(result, cpts.col = 'red')
We can see both the plot_SN = TRUE
option and the
plot.SN
function generates the same plot. Users can select
any choice based on their preferences. The class of the argument
result
is SNSeg_Uni
.
set.seed(7)
<- MAR_Variance(2, "V1")
ts <- ts[,2]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "variance", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (variance) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
set.seed(7)
<- MAR_Variance(2, "A3")
ts <- ts[,2]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "acf", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 92, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (acf) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
library(mvtnorm)
set.seed(7)
<- 1000
n <- list(4*matrix(c(1,0.8,0.8,1), nrow=2),
sigma_cross matrix(c(1,0.2,0.2,1), nrow=2),
matrix(c(1,0.8,0.8,1), nrow=2))
<- round(c(0,n/3,2*n/3,n))
cp_sets <- length(cp_sets)-2
noCP <- rep(0.5, noCP+1)
rho_sets <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets, sigma_cross)
ts <- ts[1][[1]]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "bivcor", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 77,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (bivariate correlation) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
Please download the packages truncnorm
and
evd
before running the following code.
library(truncnorm)
library(evd)
set.seed(7)
<- 1000
n <- c(0,n/2,n)
cp_sets <- length(cp_sets)-2
noCP <- 2
reptime <- 0.2
rho # AR time series with no change-point (mean, var)=(0,1)
<- MAR(n, reptime, rho)*sqrt(1-rho^2)
ts <- 0
trunc_r <- pnorm(trunc_r)
p <- 2
gpd_scale <- 0.125
gpd_shape for(ts_index in 1:reptime){
2]+1):n, ts_index] <- mix_GauGPD(pnorm(ts[(cp_sets[2]+1):n, ts_index]),
ts[(cp_sets[
p,trunc_r,gpd_scale,gpd_shape)
}<- ts[,2]
ts # grid_size undefined
# test in 90% quantile
<- SNSeg_Uni(ts, paras_to_test = 0.9, confidence = 0.9,
result grid_size_scale = 0.066, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (90th quantile) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
To perform the SN-based test using any general functional, users
should define their own function as the input of
paras_to_test
. The input of this function should consist of
the followings:
ts
: A univariate time series provided by the user.The user-defined function should return a numeric value as the output.
set.seed(7)
<- 500
n <- 2
reptime <- round(n*c(0,cumsum(c(0.5,0.25)),1))
cp_sets <- c(0.4,0,0.4)
mean_shift <- -0.7
rho <- MAR(n, reptime, rho)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,] + mean_shift[index]
ts[tau1
}<- ts[,2]
ts # Define a general functional for the input 'paras_to_test'
= function(ts){
paras_to_test mean(ts)
}<- SNSeg_Uni(ts, paras_to_test = paras_to_test,
result.SNCP.general confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result.SNCP.general# Parameter estimates (general functional) of each segment
SNSeg_estimate(result.SNCP.general)
# plot the SN-based test statistic
<- max_SNsweep(result.SNCP.general, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
set.seed(7)
<- 1000
n <- c(0,333,667,1000)
cp_sets <- length(cp_sets)-1
no_seg <- 0
rho # AR time series with no change-point (mean, var)=(0,1)
<- MAR(n, 2, rho)*sqrt(1-rho^2)
ts <- length(cp_sets)-1
no_seg <- c(1,1.6,1)
sd_shift for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,]*sd_shift[index]
ts[tau1
}<- 2
d <- ts[,2]
ts
# Test in 90th and 95th quantile with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9, 0.95), confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# Test in 90th quantile and the variance with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9, 'variance'),
result confidence = 0.95, grid_size_scale = 0.078,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = FALSE)
# Test in 90th quantile, variance and acf with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9,'variance', "acf"),
result confidence = 0.9, grid_size_scale = 0.064,
grid_size = NULL, plot_SN = TRUE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp result
# Test in 60th quantile, mean, variance and acf with grid_size defined
<- SNSeg_Uni(ts, paras_to_test = c(0.6, 'mean', "variance",
result.last "acf"), confidence = 0.9, grid_size_scale = 0.05,
grid_size = 83, plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result.last# Parameter estimates (of the last example) of each segment
SNSeg_estimate(result.last)
# SN-based test statistic segmentation plot
<- max_SNsweep(result.last, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
SNSeg_Multi
The function SNSeg_Multi
detects change-points for
multivariate time series based on the change in multivariate means or
covariances. Users can set paras_to_test
to
mean
or covariance
for change-points
estimation.
mean
, the dimension is equivalent to the
number of time series. If the parameter is covariance
, the
dimension is equivalent to the number of unknown parameters within the
covariance matrix.For examples of multivariate means and covariances, please check the following commands:
# Please run this function before running examples:
<- function(d, rho){
exchange_cor_matrix <- matrix(rho, d, d)
tmp diag(tmp) <- 1
return(tmp)
}
library(mvtnorm)
set.seed(10)
<- 5
d <- 1000
n <- round(n*c(0,cumsum(c(0.075,0.3,0.05,0.1,0.05)),1))
cp_sets <- c(-3,0,3,0,-3,0)/sqrt(d)
mean_shift <- sign(mean_shift)*ceiling(abs(mean_shift)*10)/10
mean_shift <- 0.5
rho_sets <- list(exchange_cor_matrix(d,0))
sigma_cross <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets=c(0,n), sigma_cross)
ts <- length(cp_sets)-2
noCP <- length(cp_sets)-1
no_seg for(rep_index in 1:2){
for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2] <- ts[[rep_index]][,tau1:tau2] + mean_shift[index]
ts[[rep_index]][,tau1
}
}<- ts[1][[1]]
ts
# grid_size undefined
<- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.95,
result grid_size_scale = 0.079, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# grid_size defined
<- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.99,
result grid_size_scale = 0.05, grid_size = 65,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (multivariate mean) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(2,3))
plot(result, cpts.col = 'red')
library(mvtnorm)
set.seed(10)
<- 2
reptime <- 4
d <- 1000
n <- list(exchange_cor_matrix(d,0.2),
sigma_cross 2*exchange_cor_matrix(d,0.5),
4*exchange_cor_matrix(d,0.5))
<- c(0.3,0.3,0.3)
rho_sets <- c(0,0,0) # with mean change
mean_shift <- round(c(0,n/3,2*n/3,n))
cp_sets <- MAR_MTS_Covariance(n, reptime, rho_sets, cp_sets, sigma_cross)
ts <- length(cp_sets)-2
noCP <- length(cp_sets)-1
no_seg for(rep_index in 1:reptime){
for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2] <- ts[[rep_index]][,tau1:tau2] +
ts[[rep_index]][,tau1
mean_shift[index]
}
}<- ts[[1]]
ts
# grid_size undefined
<- SNSeg_Multi(ts, paras_to_test = "covariance",
result confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined
<- SNSeg_Multi(ts, paras_to_test = "covariance",
result confidence = 0.9, grid_size_scale = 0.05,
grid_size = 81, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (covariance estimate) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(1,1))
plot(result, cpts.col = 'red')
SNSeg_HD
The function SNSeg_HD
performs change-points estimation
for high-dimensional time series (dimension greater than 10) using the
change in high-dimensional means. For usage examples of
SNSeg_HD
, we provide the followig code:
<- 600
n <- 100
p <- 5
nocp <- round(seq(0,nocp+1,1)/(nocp+1)*n)
cp_sets <- 5
num_entry <- sqrt(4/5) # Wang et al(2020)
kappa <- rep(c(0,kappa),100)[1:(length(cp_sets)-1)]
mean_shift set.seed(1)
<- matrix(rnorm(n*p,0,1),n,p)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,1:num_entry] <- ts[tau1:tau2,1:num_entry] +
ts[tau1# sparse change
mean_shift[index]
}# SN segmentation plot
# grid_size undefined (plot the first 3 time series)
<- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
result grid_size = NULL, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1:3))
# grid_size defined (plot the 1st, 3rd and 5th time series)
<- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
result grid_size = 52, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1,3,5))
# Estimated change-point locations
$est_cp
result# Parameter estimates (high-dimensional means) of each segment
<- SNSeg_estimate(result)
summary.stat # SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red', ts_index = c(1:3))
We note that only the selected time series were plotted by setting
values for ts_index
.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.