
Package: KSD
Type: Package
Title: Goodness-of-Fit Tests using Kernelized Stein Discrepancy
Version: 1.0.1
Date: 2021-01-11
Description: An adaptation of Kernelized Stein Discrepancy, this package provides a goodness-of-fit test of whether a given i.i.d. sample is drawn from a given distribution. It works for any distribution once its score function (the derivative of log-density) can be provided. This method is based on "A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation" by Liu, Lee, and Jordan, available at <doi:10.48550/arXiv.1602.03253>.
License: MIT + file LICENSE
LazyData: TRUE
RoxygenNote: 7.1.1
Imports: pryr, graphics, stats
Suggests: datasets, ggplot2, gridExtra, mclust, mvtnorm
NeedsCompilation: no
Packaged: 2021-01-11 07:33:36 UTC; danielkang
Author: Min Hyung Kang [aut, cre], Qiang Liu [aut]
Maintainer: Min Hyung Kang <Minhyung.Daniel.Kang@gmail.com>
Repository: CRAN
Date/Publication: 2021-01-11 08:50:16 UTC

Estimate Kernelized Stein Discrepancy (KSD)

Description

Estimate kernelized Stein discrepancy (KSD) using U-statistics, and use the bootstrap to test H0: x_i is drawn from p(X) (via KSD = 0).

Usage

KSD(x, score_function, kernel = "rbf", width = -1, nboot = 1000)

Arguments

x

Sample of size Num_Instance x Num_Dimension

score_function

(\nabla_x \log p(x)) Score function: takes x as input and outputs the score matrix of size Num_Instance x Num_Dimension. The user may use the pryr package to pass in a function that takes only the dataset as its parameter, or may instead pass in the precomputed score for the given dataset.

kernel

Type of kernel (default = 'rbf')

width

Bandwidth of the kernel (when width = -1 or 'median', set it to be the median distance between data points)

nboot

Bootstrap sample size

Value

A list containing the estimated KSD statistic and the results of the bootstrap test.

Examples

# Pass in a dataset generated by Gaussian distribution,
# use pryr package to pass in score function
model <- gmm()
X <- rgmm(model, n=100)
score_function = pryr::partial(scorefunctiongmm, model=model)
result <- KSD(X,score_function=score_function)

# Pass in a dataset generated by Gaussian distribution,
# pass in computed score rather than score function
model <- gmm()
X <- rgmm(model, n=100)
score_function = scorefunctiongmm(model=model, X=X)
result <- KSD(X,score_function=score_function)

# Pass in a dataset generated by Gaussian distribution,
# use pryr package to pass in score function
# Use the median heuristic by specifying width = -2.0
model <- gmm()
X <- rgmm(model, n=100)
score_function = pryr::partial(scorefunctiongmm, model=model)
result <- KSD(X,score_function=score_function, 'rbf',-2.0)
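
# The test is not restricted to the GMM helpers above. The sketch below is an
# illustration (not from the package documentation): it tests a sample against a
# standard normal N(0, 1), whose score function is simply -x, using both the
# function form and the precomputed-score form of the score_function argument.
X <- matrix(rnorm(100), ncol = 1)              # sample drawn from the model under test
score_normal <- function(x) -x                 # d/dx log N(x; 0, 1) = -x
result <- KSD(X, score_function = score_normal)      # pass the score function itself
result <- KSD(X, score_function = score_normal(X))   # or pass the precomputed score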

Tests 1-dimensional Gaussian Mixture Models.

Description

Tests 1-dimensional Gaussian Mixture Models.

Usage

demo_gmm()

Tests multidimensional Gaussian Mixture Models.

Description

Tests multidimensional Gaussian Mixture Models.

Usage

demo_gmm_multi()

Fits Gaussian Mixture model and computes the KSD value for the model

Description

We fit a Gaussian mixture model to a given dataset (Fisher's iris data) and compute the KSD p-value on the held-out test set. The user may tune the parameters and observe the change in results. The function reports the average of the p-values obtained across the k folds, and also plots the contour for each fold when only 2 dimensions of the data are used. If a vector is specified for nClust, each element is tried as the number of clusters and the optimal value, the one with the highest p-value, is reported; a usage sketch follows the arguments below.

Usage

demo_iris(cols = c(1, 2), nClust = 3, kfold = 5)

Arguments

cols

: Columns of the iris dataset to use. If 2 columns are selected, the contour is plotted for each fold.

nClust

: Number of clusters to estimate. If a vector, each element is tried as the number of clusters and the optimal number is reported.

kfold

: Number of folds (k) to use for k-fold cross-validation.
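
A usage sketch (assuming the suggested packages datasets, mclust, ggplot2 and gridExtra are installed):

# try 2, 3 and 4 clusters on the first two columns and report the best by p-value
demo_iris(cols = c(1, 2), nClust = 2:4, kfold = 5)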


Shows how the KSD p-value changes with respect to variation in noise

Description

We generate samples from a standard normal distribution, add varying amounts of Gaussian noise to the dataset, and observe the change in p-values; a rough sketch of this workflow follows the usage below.

Usage

demo_normal_performance()
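
A rough sketch of the same workflow done by hand (an illustration with assumed noise levels, not the demo's source code):

noise_levels <- c(0, 0.5, 1, 2)
results <- lapply(noise_levels, function(noise_sd) {
  # standard normal sample contaminated with Gaussian noise of growing magnitude
  X <- matrix(rnorm(100) + rnorm(100, mean = 0, sd = noise_sd), ncol = 1)
  KSD(X, score_function = function(x) -x)   # score of N(0, 1) is -x
})
# each element of results holds the bootstrap test output described under Value above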

Tests 1-dimensional Gamma Distribution with customized parameters

Description

We generate samples from a gamma distribution with the given parameters and add Gaussian noise to the dataset. We then compute the score of each observation under the original true distribution; a hand-rolled sketch follows the arguments below.

Usage

demo_simple_gamma(
  trueshape = 10,
  truescale = 3,
  noisemu = 5,
  noisesd = 2,
  n = 100
)

Arguments

trueshape

shape of true gamma distribution

truescale

scale of true gamma distribution

noisemu

mean of gaussian noise to add

noisesd

standard deviation of gaussian noise to add

n

number of samples to generate
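
A hand-rolled sketch of the same experiment (an illustration using the defaults above, not the demo's source code); the score of a Gamma(shape, scale) density is (shape - 1)/x - 1/scale:

trueshape <- 10; truescale <- 3
X <- matrix(rgamma(100, shape = trueshape, scale = truescale) +
              rnorm(100, mean = 5, sd = 2), ncol = 1)     # gamma sample plus noise
score_gamma <- function(x) (trueshape - 1) / x - 1 / truescale
result <- KSD(X, score_function = score_gamma(X))         # pass the precomputed score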


Tests 1-dimensional Gaussian Distribution with customized parameters

Description

We generate samples from a Gaussian distribution with the given parameters and add noise to the dataset. We then compute the score of each observation under the original true distribution; a short sketch follows the arguments below.

Usage

demo_simple_gaussian(truemu = 5, truesd = 1, noisemu = 0, noisesd = 2, n = 100)

Arguments

truemu

mean of true distribution

truesd

standard deviation of true distribution

noisemu

mean of gaussian noise to add

noisesd

standard deviation of gaussian noise to add

n

number of samples to generate
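
A hand-rolled sketch of the same experiment (an illustration using the defaults above); the score of N(mu, sd^2) is -(x - mu)/sd^2:

truemu <- 5; truesd <- 1
X <- matrix(rnorm(100, mean = truemu, sd = truesd) +
              rnorm(100, mean = 0, sd = 2), ncol = 1)      # true sample plus noise
score_gauss <- function(x) -(x - truemu) / truesd^2
result <- KSD(X, score_function = score_gauss(X))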


Returns a Gaussian Mixture Model

Description

Returns a Gaussian Mixture Model

Usage

gmm(nComp = NULL, mu = NULL, sigma = NULL, weights = NULL, d = NULL)

Arguments

nComp

(scalar) : number of components

mu

(d by k): mean of each component

sigma

(d by d by k): covariance of each component

weights

(1 by k) : mixing weight of each component (optional)

d

: number of dimensions of the data (optional)

Value

model : A Gaussian Mixture Model generated from the given parameters

Examples

# Default 1-d gaussian mixture model
model <- gmm()

# 1-d Gaussian mixture model with 3 components
model <- gmm(nComp = 3)

# 3-d Gaussian mixture model with 3 components, with specified mu,sigma and weights
mu <- matrix(c(1,2,3,2,3,4,5,6,7),ncol=3)
sigma <- array(diag(3),c(3,3,3))
model <- gmm(nComp = 3, mu = mu, sigma=sigma, weights = c(0.2,0.4,0.4), d = 3)

Calculates the likelihood for a given dataset for a GMM

Description

Calculates the likelihood for a given dataset for a GMM

Usage

likelihoodgmm(model = NULL, X = NULL)

Arguments

model

: The Gaussian Mixture Model

X

(n by d): The dataset of interest, where n is the number of samples and d is the dimension

Value

P (n by k) : The likelihood of each sample belonging to each of the k components

Examples

# compute likelihood for a default 1-d gaussian mixture model
# and dataset generated from it
model <- gmm()
X <- rgmm(model)
p <- likelihoodgmm(model=model, X=X)

Returns a perturbed model of given GMM

Description

Returns a perturbed model of given GMM

Usage

perturbgmm(model = NULL)

Arguments

model

: The base Gaussian Mixture Model

Value

perturbedModel : Perturbed model obtained by adding noise to the supplied GMM

Examples

#Add noise to default 1-d gaussian mixture model
model <- gmm()
noisymodel <- perturbgmm(model)

Plots histogram for 1-d GMM given the dataset

Description

Plots histogram for 1-d GMM given the dataset

Usage

plotgmm(data, mu = NULL)

Arguments

data

(n by 1): The dataset of interest, where n is the number of samples.

mu

: True mean of the GMM (optional)

Examples

# Plot pdf histogram for a given dataset
model <- gmm()
X <- rgmm(model)
plotgmm(data=X)

# Plot pdf histogram for a given dataset, with lines that indicate the mean
model <- gmm()
mu <- model$mu
X <- rgmm(model)
plotgmm(data=X, mu=mu)

Calculates the posterior probability for a given dataset for a GMM

Description

Calculates the posterior probability for a given dataset for a GMM

Usage

posteriorgmm(model = NULL, X = NULL)

Arguments

model

: The Gaussian Mixture Model

X

(n by d): The dataset of interest, where n is the number of samples and d is the dimension

Value

P (n by k) : The posterior probability of each sample belonging to each of the k components

Examples

# compute posterior probability for a default 1-d gaussian mixture model
# and dataset generated from it
model <- gmm()
X <- rgmm(model)
p <- posteriorgmm(model=model, X=X)

Generates dataset from Gaussian Mixture Model

Description

Generates dataset from Gaussian Mixture Model

Usage

rgmm(model = NULL, n = 100)

Arguments

model

: Gaussian Mixture Model defined by gmm()

n

: number of samples desired

Value

data (n by d): Random dataset generated from the given Gaussian Mixture Model

Note

Requires library mvtnorm

Examples

#Generate 100 samples from default gaussian mixture model
model <- gmm()
X <- rgmm(model)

#Generate 300 samples from 3-d gaussian mixture model
model <- gmm(d=3)
X <- rgmm(model,n=300)

Score function for a given GMM: calculates the score function d log p(x)/dx for a given Gaussian Mixture Model

Description

Score function for a given GMM: calculates the score function d log p(x)/dx for a given Gaussian Mixture Model

Usage

scorefunctiongmm(model = NULL, X = NULL)

Arguments

model

: The Gaussian Mixture Model

X

(n by d): The dataset of interest, where n is the number of samples and d is the dimension

Value

y : The score d log p(x)/dx computed for the given model and dataset

Examples

# Compute the score for a given Gaussian mixture model and dataset
model <- gmm()
X <- rgmm(model)
score <- scorefunctiongmm(model=model, X=X)
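
For reference, a minimal 1-d cross-check of the same quantity (a hypothetical helper, not the package's internal code): for a mixture p(x) = sum_k w_k N(x; mu_k, s_k^2), the score is d/dx log p(x) = sum_k w_k N(x; mu_k, s_k^2) (mu_k - x)/s_k^2 divided by p(x).

score_1d_gmm <- function(x, w, mu, s) {
  dens    <- sapply(seq_along(w), function(k) w[k] * dnorm(x, mu[k], s[k]))  # n x k weighted densities
  contrib <- sapply(seq_along(w), function(k) (mu[k] - x) / s[k]^2)          # per-component scores
  rowSums(dens * contrib) / rowSums(dens)
}
score_1d_gmm(seq(-3, 3, by = 1), w = c(0.5, 0.5), mu = c(-2, 2), s = c(1, 1))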
