
Title: Feature Selection Using Supervised Filter-Based Methods
Version: 0.1.0
Description: Tidy tools to apply filter-based supervised feature selection methods. These methods score and rank feature relevance using metrics such as p-values, correlation, and importance scores (Kuhn and Johnson (2019) <doi:10.1201/9781315108230>).
License: MIT + file LICENSE
URL: https://github.com/tidymodels/filtro
BugReports: https://github.com/tidymodels/filtro/issues
Depends: R (≥ 4.1)
Imports: purrr, rlang (≥ 1.1.0), stats, tibble
Suggests: aorsf, dplyr, FSelectorRcpp, modeldata, partykit, ranger, testthat (≥ 3.0.0), titanic
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-07-15 16:22:14 UTC; franceslin
Author: Frances Lin [aut, cre], Max Kuhn [aut], Emil Hvitfeldt [aut], Posit Software, PBC [cph, fnd]
Maintainer: Frances Lin <franceslinyc@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-18 15:20:24 UTC

filtro: Feature Selection Using Supervised Filter-Based Methods

Description

Tidy tools to apply filter-based supervised feature selection methods. These methods score and rank feature relevance using metrics such as p-values, correlation, and importance scores (Kuhn and Johnson (2019) doi:10.1201/9781315108230).

Author(s)

Maintainer: Frances Lin franceslinyc@gmail.com

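Authors:

  • Max Kuhn

  • Emil Hvitfeldt

Other contributors:

  • Posit Software, PBC [copyright holder, funder]

See Also

Useful links:

  • https://github.com/tidymodels/filtro

  • Report bugs at https://github.com/tidymodels/filtro/issues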


Compute F-statistic and p-value scores using ANOVA F-test

Description

Evaluate the relationship between a numeric outcome and a categorical predictor, or vice versa, by computing the ANOVA F-statistic or p-value. Returns a tibble with one row per predictor and four columns: name, score, predictor, and outcome.

Usage

get_scores_aov(score_obj, data, outcome)

Arguments

score_obj

A score object. See score_aov() for details.

data

A data frame or tibble containing the outcome and predictor variables.

outcome

A character string specifying the name of the outcome variable.

Details

The score_obj object may include the following components:

neg_log10

A logical value indicating whether to apply a negative log10 transformation to p-values (default is TRUE).

  • If TRUE, p-values are transformed as -log10(pval). In this case:

    • The default fallback_value is Inf

    • The default direction is "maximize"

  • If FALSE, raw p-values are used. In this case:

    • The fallback_value should be set to 0

    • The direction should be set to "minimize"
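The effect of neg_log10 on a single p-value can be sketched in base R. This is an illustration under assumptions, not the package's internal code: it uses the built-in iris data and stats::anova() to obtain a p-value, and the variable names are chosen here for the example.

```r
# Base-R illustration only; filtro's internals may differ.
fit <- stats::lm(Sepal.Length ~ Species, data = iris)
pval <- stats::anova(fit)[["Pr(>F)"]][1]

# neg_log10 = TRUE: small p-values become large scores, so "maximize" applies.
score_neg_log10 <- -log10(pval)

# neg_log10 = FALSE: the raw p-value is the score, so "minimize" applies.
score_raw <- pval

# A p-value of exactly 0 maps to Inf under the transformation,
# which is consistent with Inf as the default fallback_value.
p_zero_score <- -log10(0)
```

In both cases the most relevant predictor is the same; only the scale and the direction of the ranking change.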

Value

A tibble with one row per predictor and four columns: name, score, predictor, and outcome.

Examples

data(ames, package = "modeldata")
data <- modeldata::ames |>
  dplyr::select(
    Sale_Price,
    MS_SubClass,
    MS_Zoning,
    Lot_Frontage,
    Lot_Area,
    Street
  )
# Define outcome
outcome <- "Sale_Price"
# Create a score object
score_obj <- score_aov()
score_res <- get_scores_aov(score_obj, data, outcome)
score_res
# Change score type
score_obj$score_type <- "pval"
score_res <- get_scores_aov(score_obj, data, outcome)
score_res
# Use raw p-values instead of -log10(p-values)
score_obj$score_type <- "pval"
score_obj$neg_log10 <- FALSE
score_obj$direction <- "minimize"
score_obj$fallback_value <- 0
score_res <- get_scores_aov(score_obj, data, outcome)
score_res
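To make the shape of the result concrete, the following base-R sketch computes one F-statistic per categorical predictor and arranges the scores in the four documented columns. The helper anova_fstat() is hypothetical and written for this example; it is not a filtro function, it returns a plain data frame rather than a tibble, and it uses the built-in mtcars data instead of ames.

```r
# Hypothetical helper, not part of filtro: one F-statistic per factor predictor.
anova_fstat <- function(data, outcome, predictors) {
  scores <- vapply(predictors, function(p) {
    # Fit outcome ~ predictor and take the F value of the predictor term
    fit <- stats::lm(stats::reformulate(p, response = outcome), data = data)
    stats::anova(fit)[["F value"]][1]
  }, numeric(1))
  data.frame(
    name = "fstat",
    score = unname(scores),
    predictor = predictors,
    outcome = outcome
  )
}

# Treat three mtcars columns as categorical predictors of mpg
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear), am = factor(am))
res <- anova_fstat(cars, outcome = "mpg", predictors = c("cyl", "gear", "am"))

# Larger F-statistics first, matching direction = "maximize"
res[order(res$score, decreasing = TRUE), ]
```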

Construct a new score object

Description

Create a new score object that contains associated metadata, such as range, fallback_value, score_type, direction, and other relevant attributes.

Usage

new_score_obj(
  subclass = c("cat_num", "cat_cat", "num_num", "any"),
  outcome_type = c("numeric", "factor"),
  predictor_type = c("numeric", "factor"),
  case_weights = NULL,
  range = NULL,
  inclusive = NULL,
  fallback_value = NULL,
  score_type = NULL,
  trans = NULL,
  sorts = NULL,
  direction = NULL,
  deterministic = NULL,
  tuning = NULL,
  ties = NULL,
  calculating_fn = NULL,
  label = NULL,
  ...
)

Arguments

subclass

A character string indicating the type of predictor-outcome combination the scoring method supports. One of:

  • "cat_num"

  • "cat_cat"

  • "num_num"

  • "any"

outcome_type

A character string indicating the outcome type. One of:

  • "numeric"

  • "factor"

predictor_type

A character string indicating the predictor type. One of:

  • "numeric"

  • "factor"

case_weights

A logical value, indicating whether the model accepts case weights (TRUE) or not (FALSE).

range

A numeric vector of length two, specifying the minimum and maximum possible values, respectively.

inclusive

A logical vector of length two, indicating whether the lower and upper bounds of the range are inclusive (TRUE) or exclusive (FALSE).
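How range and inclusive combine can be shown with a small check. The function in_range() below is a hypothetical helper written for illustration; whether filtro performs such a check internally is not stated in this documentation.

```r
# Hypothetical helper: test scores against a range with open or closed bounds.
in_range <- function(score, range, inclusive = c(TRUE, TRUE)) {
  lower_ok <- if (inclusive[1]) score >= range[1] else score > range[1]
  upper_ok <- if (inclusive[2]) score <= range[2] else score < range[2]
  lower_ok & upper_ok
}

# Range (0, 1]: the lower bound is excluded, the upper bound is included
checked <- in_range(c(0, 0.5, 1), range = c(0, 1), inclusive = c(FALSE, TRUE))
checked
```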

fallback_value

A numeric scalar used as a fallback value. Typical values include:

  • 0

  • 1

  • Inf

score_type

A character string indicating the type of scoring metric to compute. Available options include:

  • ANOVA F-Test: "fstat", "pval"

  • Correlation: "pearson", "spearman"

  • Cross Tabulation: "pval_chisq", "pval_fisher"

  • Random Forest: "imp_rf", "imp_rf_conditional", "imp_rf_oblique"

  • Information Gain: "infogain", "gainratio", "symuncert"

  • ROC AUC: "roc_auc"
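For orientation, several of these metrics have direct counterparts in base R's stats package. The sketch below computes them on the built-in mtcars data. It assumes nothing about how filtro calculates its scores; the mapping from score_type names to these calls is this example's own reading of the names.

```r
# Correlation between two numeric variables ("pearson", "spearman")
pearson <- stats::cor(mtcars$mpg, mtcars$wt, method = "pearson")
spearman <- stats::cor(mtcars$mpg, mtcars$wt, method = "spearman")

# Cross tabulation of two categorical variables ("pval_chisq", "pval_fisher")
tab <- table(mtcars$cyl, mtcars$am)
# chisq.test() warns about small expected counts for this table;
# the warning is suppressed here because only the p-value is of interest.
pval_chisq <- suppressWarnings(stats::chisq.test(tab))$p.value
pval_fisher <- stats::fisher.test(tab)$p.value

c(pearson = pearson, spearman = spearman,
  pval_chisq = pval_chisq, pval_fisher = pval_fisher)
```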

trans

Currently not used.

sorts

An optional function used to sort the scores. Common options include:

  • identity

  • abs

  • function(score) max(score, 1 - score)
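The sorting functions listed above can be tried on plain numeric vectors. The sketch below is an assumption about how such a function might be used to build a sort key; it does not call filtro, and the score values are made up for the example. Note that max() collapses a vector to a single number, so the folding function is applied one score at a time.

```r
# Made-up ROC AUC scores: 0.15 is as informative as 0.85, just inverted
auc <- c(a = 0.90, b = 0.15, c = 0.55)
fold_auc <- function(score) max(score, 1 - score)
auc_key <- vapply(auc, fold_auc, numeric(1))
auc_order <- names(sort(auc_key, decreasing = TRUE))

# Made-up correlations: abs() ranks by strength regardless of sign
r <- c(x = -0.8, y = 0.3, z = 0.5)
r_order <- names(sort(abs(r), decreasing = TRUE))

auc_order
r_order
```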

direction

A character string indicating the optimization direction. One of:

  • "maximize"

  • "minimize"

  • "target"
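A minimal sketch of how direction determines ranking order follows. The helper rank_scores() is hypothetical, and it covers only "maximize" and "minimize", because this documentation does not describe how "target" is evaluated.

```r
# Hypothetical helper: order predictors from best to worst for a direction.
rank_scores <- function(score, direction = c("maximize", "minimize")) {
  direction <- match.arg(direction)
  order(score, decreasing = direction == "maximize")
}

# Made-up F-statistics: larger is better
fstat <- c(a = 12.1, b = 3.4, c = 40.2)
best_fstat <- names(fstat)[rank_scores(fstat, "maximize")]

# Made-up raw p-values: smaller is better
pval <- c(a = 0.001, b = 0.2, c = 0.00001)
best_pval <- names(pval)[rank_scores(pval, "minimize")]

best_fstat
best_pval
```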

deterministic

A logical value, indicating whether the score is deterministic (TRUE) or not (FALSE).

tuning

A logical value, indicating whether the model should be tuned (TRUE) or not (FALSE).

ties

An optional logical value indicating whether ties in score can occur (TRUE) or not (FALSE).

calculating_fn

An optional function used to compute the score. A default function is selected based on the score_type.

label

A named character string that can be used for printing and plotting.

...

Currently not used.

Value

A score object.

Examples

# Create a score object
new_score_obj()

Create a score object for ANOVA F-test F-statistics and p-values

Description

Construct a score object containing metadata for univariate feature scoring using the ANOVA F-test. Output a score object containing associated metadata such as range, fallback_value, score_type ("fstat" or "pval"), direction, and other relevant attributes.

Usage

score_aov(
  range = c(0, Inf),
  fallback_value = Inf,
  score_type = "fstat",
  direction = "maximize"
)

Arguments

range

A numeric vector of length two, specifying the minimum and maximum possible values, respectively.

fallback_value

A numeric scalar used as a fallback value. Typical values include:

  • 0

  • Inf (default)

For F-statistics, the fallback_value should be Inf. For p-values, the default applies a negative log10 transformation, so the fallback_value should also be Inf. If raw p-values are used instead, set the fallback_value to 0.
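Why Inf suits a "maximize" score can be illustrated with a short sketch. It rests on an assumption that this documentation does not spell out: that the fallback value stands in for scores that could not be computed (NA), so that such predictors rank as best rather than being dropped. The helper fill_fallback() is hypothetical and the scores are made up.

```r
# Hypothetical helper: replace missing scores with the fallback value.
fill_fallback <- function(score, fallback_value) {
  score[is.na(score)] <- fallback_value
  score
}

# Made-up F-statistics; the score for "b" could not be computed
scores <- c(a = 4.2, b = NA, c = 1.3)
filled <- fill_fallback(scores, fallback_value = Inf)

# With direction = "maximize", the filled predictor ranks first
filled_order <- names(sort(filled, decreasing = TRUE))
filled_order
```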

score_type

A character string indicating the type of scoring metric to compute. Available options include:

  • "fstat"

  • "pval"

direction

A character string indicating the optimization direction. One of:

  • "maximize" (default)

  • "minimize"

  • "target"

For F-statistics, the direction should be "maximize". For p-values, the default applies a negative log10 transformation, so the direction should also be "maximize". If raw p-values are used instead, set the direction to "minimize".

Value

A score object containing associated metadata such as range, fallback_value, score_type ("fstat" or "pval"), direction, and other relevant attributes.

Examples

# Create a score object
score_aov()
# Change score type
score_obj <- score_aov()
score_obj$score_type <- "pval"
