README


SVMODT: Support Vector Machine based Oblique Decision Trees

The svmodt package for R implements recursive oblique decision trees that use linear Support Vector Machines (SVMs) to define the split at each node. Traditional decision trees use axis-aligned splits, which keeps them easy to interpret. Oblique decision trees instead split on linear combinations of features, which makes finding the optimal split harder. SVMs offer a principled way to choose such splits: they identify the hyperplane that maximizes the margin between classes.
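
To make the contrast with axis-aligned splits concrete, here is a minimal base-R sketch of a single oblique split. It does not use svmodt: the toy data and the hyperplane parameters `w` and `b` are hand-picked assumptions for illustration, whereas a linear SVM would estimate them from the data.

```r
# Toy 2-D data: the classes are separated by the diagonal x1 + x2 = 1.
# Neither x1 alone nor x2 alone separates them with a single threshold.
x <- data.frame(
  x1 = c(0.1, 0.7, 0.2, 0.9, 0.3, 0.8),
  x2 = c(0.2, 0.1, 0.6, 0.7, 0.9, 0.4)
)
y <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical hyperplane parameters (hand-picked for this sketch;
# a linear SVM would learn them from the data).
w <- c(x1 = 1, x2 = 1)
b <- -1

# Oblique split: route each row by the sign of w . x + b.
decision <- drop(as.matrix(x) %*% w) + b
side <- ifelse(decision > 0, "right", "left")

# Cross-tabulate the split side against the true class.
table(side, y)
```

An axis-aligned tree would need several nested thresholds to approximate this diagonal boundary, while one linear combination of the two features captures it in a single node.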

Installation

# install.packages("devtools")
devtools::install_github("AneeshAgarwala/svmodt")

Key Features

Examples

library(svmodt)

# Load data
data(wdbc)  # Dataset bundled with the package
wdbc$diagnosis <- factor(wdbc$diagnosis)

# Split
set.seed(123)
train_idx <- sample(nrow(wdbc), floor(0.8 * nrow(wdbc)))
train_data <- wdbc[train_idx, ]
test_data <- wdbc[-train_idx, ]

SVMODT Tree Workflow

# Train with class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 2,
  feature_method = "mutual",
  class_weights = "balanced",
  verbose = TRUE
)

# Predict
predictions <- predict(tree, test_data)

# Visualize Split Boundary at Individual Node(s)
viz <- plot(
  tree = tree,
  original_data = train_data,
  response_col = "diagnosis",
  plot.type = "boundary"
)

# Visualize Overall Surface Split(s)
viz <- plot_surface(
  tree = tree,
  data = train_data,
  response = "diagnosis",
  plot.type = "surface"
)
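
The workflow above produces `predictions` but does not score them. The following base-R sketch shows one way to do so. `evaluate_predictions` is a hypothetical helper, not part of svmodt, and it assumes that `predict()` returns one class label per row of `test_data`, which this README does not confirm. Toy labels stand in for the real vectors so the snippet runs on its own.

```r
# Hypothetical helper (not part of svmodt): confusion matrix and accuracy.
evaluate_predictions <- function(predicted, actual) {
  actual <- factor(actual)
  # Align predicted labels to the levels of the true labels.
  predicted <- factor(predicted, levels = levels(actual))
  confusion <- table(Predicted = predicted, Actual = actual)
  list(
    confusion = confusion,
    accuracy = sum(diag(confusion)) / sum(confusion)
  )
}

# Toy labels standing in for `predictions` and `test_data$diagnosis`.
actual    <- c("B", "B", "M", "M", "B", "M")
predicted <- c("B", "M", "M", "M", "B", "B")

result <- evaluate_predictions(predicted, actual)
result$confusion
result$accuracy
```

If the assumption about `predict()` holds, the same call would be `evaluate_predictions(predictions, test_data$diagnosis)`.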

Advanced Usage

Feature Selection with Penalties

# Penalize previously used features to promote diversity
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 3,
  feature_method = "mutual",
  penalize_used_features = TRUE,
  feature_penalty_weight = 0.5
)

Dynamic Feature Selection

set.seed(123)
# Decrease number of features at deeper levels
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 5,
  max_features = 10,
  max_features_strategy = "decrease",
  max_features_decrease_rate = 0.8
)

# Random feature selection at each node
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_features_strategy = "random",
  max_features_random_range = c(0.3, 0.8)
)
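
To give a feel for what a "decrease" strategy with a rate of 0.8 could mean, here is a sketch of a geometric schedule. This is an assumption for illustration only: the README does not state the rule svmodt applies, and `features_at_depth` is a hypothetical function, not part of the package.

```r
# Hypothetical schedule (an assumption, not svmodt's confirmed rule):
# shrink the feature budget geometrically with depth, never below 1.
features_at_depth <- function(depth, max_features = 10, decrease_rate = 0.8) {
  budget <- floor(max_features * decrease_rate^(depth - 1))
  pmax(1L, as.integer(budget))
}

# Feature budget at depths 1 through 5 under this assumed rule.
features_at_depth(1:5)
```

Under this assumed rule, nodes near the root see the full budget and deeper nodes, which hold fewer observations, fit their SVMs on progressively fewer features.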

Handle Imbalanced Data

# Balanced class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "balanced"
)

set.seed(123)
# Custom class weights
custom_weights <- c("B" = 1, "M" = 3)
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "custom",
  custom_class_weights = custom_weights
)
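
As a rough guide to what "balanced" weighting usually means, the sketch below computes inverse-frequency weights, n / (k * n_c), where n is the number of rows, k the number of classes, and n_c the size of class c. This is a common convention, not a confirmed description of svmodt's internals. `balanced_weights` is a hypothetical helper, and toy labels are used so the snippet runs without the package.

```r
# Hypothetical helper (not part of svmodt): inverse-frequency class weights.
balanced_weights <- function(y) {
  counts <- table(y)
  w <- sum(counts) / (length(counts) * counts)
  # Return a plain named numeric vector, one weight per class.
  setNames(as.numeric(w), names(counts))
}

# Toy labels: six benign ("B") and two malignant ("M") cases.
y <- factor(c(rep("B", 6), rep("M", 2)))
balanced_weights(y)
```

The result has the same shape as `custom_weights` above (a named numeric vector keyed by class label), so it could serve as a starting point for hand-tuned custom weights.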
