Getting Started with twinsvm

twinsvm fits twin support vector machines and provides a standard C-SVC SVM as a baseline for comparison. Binary fits expect a two-level factor: the first level is treated as class B and the second as class A. Multiclass fits use one-vs-one majority voting, with ties resolved in favor of the first factor level.
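The voting rule can be illustrated in base R without fitting anything. The sketch below is not twinsvm's internal code: the pairwise winners are made-up values, and the tie rule is read here as "the earliest factor level among those tied", which is an assumption about the sentence above.

```r
lv <- c("alpha", "beta", "gamma")
pairs <- combn(lv, 2)  # columns: alpha/beta, alpha/gamma, beta/gamma

# Hypothetical winners of the three pairwise classifiers, one row per observation.
winners <- rbind(
  c("alpha", "alpha", "beta"),   # clear majority for alpha
  c("alpha", "gamma", "beta"),   # three-way tie
  c("beta",  "gamma", "gamma")   # clear majority for gamma
)

# Count the votes each class receives for every observation.
votes <- t(vapply(
  seq_len(nrow(winners)),
  function(i) tabulate(match(winners[i, ], lv), nbins = length(lv)),
  integer(length(lv))
))
colnames(votes) <- lv

# ties.method = "first" picks the earliest column among tied maxima.
pred <- factor(lv[max.col(votes, ties.method = "first")], levels = lv)
pred
#> [1] alpha alpha gamma
#> Levels: alpha beta gamma
```

With three classes there are three pairwise classifiers, so each row of the vote matrix sums to 3.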

Generate data and fit a twin SVM

library(twinsvm)

set.seed(1)
dat <- gen_moons(100, noise = 0.12)
fit <- tsvm(dat$x, dat$y, kernel = "rbf", gamma = 2, c1 = 0.1, c2 = 0.1)
head(predict(fit, dat$x))
#> [1] B B B B B B
#> Levels: B A
mean(predict(fit, dat$x) == dat$y)
#> [1] 1

Plot the boundary

plot(fit)

For a linear twin SVM, the two fitted planes are drawn as dashed lines.

linear_fit <- tsvm(dat$x, dat$y, kernel = "linear")
plot(linear_fit)
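
In the twin-SVM formulation of Jayadeva, Khemchandani, and Chandra (2007), each plane is fitted close to one class, and a new point is assigned to the class whose plane is nearer. A minimal base-R sketch of that rule follows. The plane coefficients are hand-picked for illustration rather than taken from tsvm(), and nearest_plane() is a hypothetical helper, not part of twinsvm.

```r
# Hypothetical planes w'x + b = 0, chosen by hand for illustration.
w_a <- c(1, 0); b_a <- -1   # plane for class A: x1 =  1
w_b <- c(1, 0); b_b <-  1   # plane for class B: x1 = -1

# Assign each row of x to the class whose plane is closer (perpendicular distance).
nearest_plane <- function(x, w_a, b_a, w_b, b_b) {
  d_a <- abs(as.vector(x %*% w_a) + b_a) / sqrt(sum(w_a^2))
  d_b <- abs(as.vector(x %*% w_b) + b_b) / sqrt(sum(w_b^2))
  factor(ifelse(d_a <= d_b, "A", "B"), levels = c("B", "A"))
}

x_new <- rbind(c(0.8, 0), c(-1.2, 3))
pred <- nearest_plane(x_new, w_a, b_a, w_b, b_b)
pred
#> [1] A B
#> Levels: B A
```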

Cross-validation

cv <- cv_tsvm(
  dat$x,
  dat$y,
  c1_grid = c(0.1, 1),
  c2_grid = c(0.1, 1),
  gamma_grid = c(1, 2),
  kernel = "rbf",
  k = 3
)
cv$best_params
#> $c1
#> [1] 1
#> 
#> $c2
#> [1] 1
#> 
#> $gamma
#> [1] 1
plot(cv)
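
To get a feel for the cost of this search, the candidate grid and a fold assignment can be laid out in base R. This is only a sketch under two assumptions not stated above: that cv_tsvm() tries every combination of the three grids, and that folds are assigned by simple random shuffling.

```r
# Every combination of the three grids used in the call above.
grid <- expand.grid(c1 = c(0.1, 1), c2 = c(0.1, 1), gamma = c(1, 2))
nrow(grid)
#> [1] 8

# A plain random assignment of 100 rows to k = 3 folds (assumed scheme).
set.seed(1)
n <- 100
k <- 3
fold <- sample(rep(seq_len(k), length.out = n))
fold_sizes <- as.vector(table(fold))
fold_sizes
#> [1] 34 33 33

# Under the exhaustive-grid assumption, the search needs this many fits.
n_fits <- nrow(grid) * k
n_fits
#> [1] 24
```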

Multiclass

set.seed(4)
x3 <- rbind(
  matrix(rnorm(30, -2, 0.25), ncol = 2),
  cbind(rnorm(15, 2, 0.25), rnorm(15, -2, 0.25)),
  matrix(rnorm(30, 2, 0.25), ncol = 2)
)
y3 <- factor(rep(c("alpha", "beta", "gamma"), each = 15))

multi <- tsvm(x3, y3, kernel = "linear")
head(predict(multi, x3))
#> [1] alpha alpha alpha alpha alpha alpha
#> Levels: alpha beta gamma
head(predict(multi, x3, type = "votes"))
#>      alpha beta gamma
#> [1,]     2    1     0
#> [2,]     2    1     0
#> [3,]     2    1     0
#> [4,]     2    1     0
#> [5,]     2    1     0
#> [6,]     2    1     0
confusion(multi, x3, y3)
#> $table
#>        predicted
#> truth   alpha beta gamma
#>   alpha    15    0     0
#>   beta      0   15     0
#>   gamma     0    0    15
#> 
#> $accuracy
#> [1] 1

Compare with standard SVM

timing <- data.frame(
  n = c(40, 80, 120),
  tsvm_seconds = NA_real_,
  svms_seconds = NA_real_
)

for (i in seq_len(nrow(timing))) {
  set.seed(i)
  d <- gen_moons(timing$n[i], noise = 0.12)
  timing$tsvm_seconds[i] <- system.time(tsvm(d$x, d$y, kernel = "rbf", gamma = 2))[["elapsed"]]
  timing$svms_seconds[i] <- system.time(svms(d$x, d$y, kernel = "rbf", gamma = 2))[["elapsed"]]
}
timing
#>     n tsvm_seconds svms_seconds
#> 1  40            0            0
#> 2  80            0            0
#> 3 120            0            0

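The timing table is generated on the machine running this vignette, so the values vary between builds; the zeros above mean that each fit finished below the timer's resolution at these sample sizes. The kernel twin-SVM formulations invert an (n + 1) × (n + 1) matrix, where n is the number of training rows, so they are meant for small to moderate data.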
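
The size of that matrix can be seen in a base-R sketch of the least-squares kernel form described by Kumar and Gopal (2009). This is an illustration, not twinsvm's implementation: rbf_kernel() is a hypothetical helper, the data are simulated, and the small ridge term is an assumption added for numerical stability.

```r
# RBF kernel matrix between the rows of X and the rows of Y.
rbf_kernel <- function(X, Y, gamma) {
  d2 <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * X %*% t(Y)
  exp(-gamma * pmax(d2, 0))
}

set.seed(1)
A <- matrix(rnorm(40, mean =  1), ncol = 2)  # 20 points, class A
B <- matrix(rnorm(40, mean = -1), ncol = 2)  # 20 points, class B
C <- rbind(A, B)
n <- nrow(C)                                 # 40 training rows

# Augmented kernel blocks: one column per training row plus a bias column.
H <- cbind(rbf_kernel(A, C, gamma = 1), 1)   # 20 x (n + 1)
G <- cbind(rbf_kernel(B, C, gamma = 1), 1)   # 20 x (n + 1)

c1 <- 1
c2 <- 1
ridge <- 1e-6 * diag(n + 1)  # small stabilizer, an assumption of this sketch

# Each surface comes from one (n + 1) x (n + 1) linear system.
S1 <- crossprod(G) + crossprod(H) / c1 + ridge
S2 <- crossprod(H) + crossprod(G) / c2 + ridge
z1 <- -solve(S1, crossprod(G, rep(1, nrow(B))))  # surface near class A
z2 <-  solve(S2, crossprod(H, rep(1, nrow(A))))  # surface near class B

dim(S1)
#> [1] 41 41
```

Because the system grows with the number of training rows rather than the number of features, solving it costs roughly cubic time in n, which is why the kernel forms suit small to moderate data.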

Visualization

circles <- gen_circles(100, noise = 0.04)
lift_plot(circles$x, circles$y, gamma = 1)

compare_methods() draws a single data set under each of the three fitted classifiers, side by side in one row.

set.seed(2)
small <- gen_moons(60, noise = 0.1)
compare_methods(small$x, small$y, gamma = 1, c1 = 0.2, c2 = 0.2, cost = 1)

morph_boundary() returns a gganimate object. Rendering is left to the user so package examples stay fast.

anim <- morph_boundary(dat$x, dat$y, param = "gamma", range = c(0.5, 2), kernel = "rbf", n = 5)
class(anim)
#> [1] "gganim"          "ggplot2::ggplot" "ggplot"          "ggplot2::gg"    
#> [5] "S7_object"       "gg"

Validation

The standard SVM baseline is tested against e1071, which is backed by LIBSVM. There is no existing R twin-SVM package to match against, so twin-SVM tests validate plane-distance behavior, nonlinear kernel improvement, and agreement between the least-squares and original QP formulations. The algorithms follow Jayadeva, Khemchandani, and Chandra (2007) and Kumar and Gopal (2009).
