The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Getting started with functional statistical testing

library(funStatTest)

The funStatTest package implements various statistics for two sample comparison testing regarding functional data.

Authorship and license

This package is developed by:

It implements statistics (and related experiments) introduced and used in [1].

Data simulation

Here are some functions used to simulate clustered trajectories of functional data based on the Karhunen-Loève decomposition.

The functional data simulation process is described in [1] (section 3.1).

Simulate a single trajectory

simu_vec <- simul_traj(100)
plot(simu_vec, xlab = "point", ylab = "value")

Simulate trajectories from two samples diverging by a delta

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)
str(simu_data)
#> List of 5
#>  $ mat_sample1: num [1:100, 1:50] 10 10.1 10.2 10.3 10.4 ...
#>  $ mat_sample2: num [1:100, 1:75] 0 0.00241 0.00497 0.00775 0.01071 ...
#>  $ c_val      : num 10
#>  $ distrib    : chr "normal"
#>  $ delta_shape: chr "constant"

Graphical representation of simulated data

# constant delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "constant", distrib = "normal"
)
plot_simu(simu_data)

# linear delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "linear", distrib = "normal"
)
plot_simu(simu_data)

# quadratic delta
simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5, 
    delta_shape = "quadratic", distrib = "normal"
)
plot_simu(simu_data)

Statistics

MO median statistic

The \(MO\) median statistic [1] is implemented in the stat_mo() function.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_mo(MatX, MatY)
#> [1] 0.9988008

MED median statistic

The \(MED\) median statistic [1] is implemented in the stat_med() function.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_med(MatX, MatY)
#> [1] 0.9993601

WMW statistic

The Wilcoxon-Mann-Whitney statistic [2] (noted \(WMW\) in [1]) is implemented in the stat_wmw() function.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_wmw(MatX, MatY)
#> [1] 0.9984562

HKR statistics

The Horváth-Kokoszka-Reeder statistics [3] (noted \(HKR1\) and \(HKR2\) in [1]) are implemented in the stat_hkr() function.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_hkr(MatX, MatY)
#> $T1
#> [1] 56962058560
#> 
#> $T2
#> [1] 297023.2
#> 
#> $eigenval
#>  [1] 3.664005e+01 5.308985e+00 1.522118e+00 6.759658e-01 2.275435e-01
#>  [6] 1.705847e-01 9.892918e-02 6.026029e-02 3.071646e-02 1.788522e-02
#> [11] 1.102164e-02 9.424057e-03 6.172824e-03 5.108395e-03 2.334137e-03
#> [16] 1.450027e-03 3.118851e-04 6.975138e-05 1.716764e-09

CFF statistic

The Cuevas-Febrero-Fraiman statistic [4] (noted \(CFF\) in [1]) is implemented in the stat_cff() function.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

stat_cff(MatX, MatY)
#> [1] 511045.7

Compute multiple statistics

The function comp_stat() allows to compute multiple statistics defined above in a single call on the same data.

simu_data <- simul_data(
    n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2

res <- comp_stat(MatX, MatY, stat = c("mo", "med", "wmw", "hkr", "cff"))
res
#> $mo
#> [1] 0.9989056
#> 
#> $med
#> [1] 0.9992861
#> 
#> $wmw
#> [1] 0.9985099
#> 
#> $hkr
#>            [,1]
#> T1 5.132184e+10
#> T2 3.018523e+05
#> 
#> $cff
#> [1] 511580.1

Permutation-based computation of p-values

P-values associated to the different statistics defined above can be computed with the permutation-based method as follow:

# simulate small data for the example
simu_data <- simul_data(
    n_point = 20, n_obs1 = 4, n_obs2 = 5, c_val = 10, 
    delta_shape = "constant", distrib = "normal"
)

MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
res <- permut_pval(
    MatX, MatY, n_perm = 100, stat = c("mo", "med", "wmw", "hkr", "cff"), 
    verbose = TRUE)
res
#> $mo
#> [1] 0.01980198
#> 
#> $med
#> [1] 0.01980198
#> 
#> $wmw
#> [1] 0.01980198
#> 
#> $hkr
#>         T1         T2 
#> 0.01980198 0.01980198 
#> 
#> $cff
#> [1] 0.01980198

:warning: computing p-values based on permutations may take some time (for large data or when using a large number of simulations. :warning:

Simulation-based power analysis

We use our simulation scheme with permutation-based p-values computation to run a power analysis to evaluate the different statistics.

# simulate a few small data for the example
res <- power_exp(
    n_simu = 20, alpha = 0.05, n_perm = 100, 
    stat = c("mo", "med", "wmw", "hkr", "cff"), 
    n_point = 25, n_obs1 = 4, n_obs2 = 5, c_val = 10, delta_shape = "constant", 
    distrib = "normal", max_iter = 10000, verbose = FALSE
)
res$power_res
#> $mo
#> [1] 1
#> 
#> $med
#> [1] 1
#> 
#> $wmw
#> [1] 1
#> 
#> $hkr
#> T1 T2 
#>  1  1 
#> 
#> $cff
#> [1] 1

References

1.
Smida, Z, Cucala, L, Gannoun, A, and Durif, G 2022 A median test for functional data. Journal of Nonparametric Statistics, 34(2): 520–553. DOI: https://doi.org/10.1080/10485252.2022.2064997
2.
Chakraborty, A and Chaudhuri, P 2015 A Wilcoxon–Mann–Whitney-type test for infinite-dimensional data. Biometrika, 102(1): 239–246. DOI: https://doi.org/10.1093/biomet/asu072
3.
Horváth, L, Kokoszka, P, and Reeder, R 2013 Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(1): 103–122. DOI: https://doi.org/10.1111/j.1467-9868.2012.01032.x
4.
Cuevas, A, Febrero, M, and Fraiman, R 2004 An anova test for functional data. Computational Statistics & Data Analysis, 47(1): 111–122. DOI: https://doi.org/10.1016/j.csda.2003.10.021

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.