Title: Method for Clustering Partially Observed Data
Version: 1.1
Description: Software for k-means clustering of partially observed data from Chi, Chi, and Baraniuk (2016) <doi:10.1080/00031305.2015.1086685>.
URL: http://jocelynchi.com/kpodclustr
Depends: R (≥ 3.1.0)
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.1.0
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2020-06-23 15:46:38 UTC; jtc
Author: Jocelyn T. Chi [aut, cre], Eric C. Chi [aut, ctb], Richard G. Baraniuk [aut]
Maintainer: Jocelyn T. Chi <jtchi@ncsu.edu>
Repository: CRAN
Date/Publication: 2020-06-24 09:10:06 UTC

Function for assigning clusters to rows in a matrix

Description

assign_clustpp Function for assigning clusters to rows in a matrix

Usage

assign_clustpp(X, init_centers, kmpp_flag = TRUE, max_iter = 20)

Arguments

X

Data matrix containing missing entries whose rows are observations and columns are features

init_centers

Centers for initializing k-means (the example below passes the number of clusters, k)

kmpp_flag

(Optional) Indicator for whether or not to initialize with k-means++

max_iter

(Optional) Maximum number of iterations

Author(s)

Jocelyn T. Chi

Examples

p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig

clusts <- assign_clustpp(Orig, k)
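The example above passes the number of clusters k where the usage line names init_centers. This page does not say how assign_clustpp handles that value. As a point of comparison only, base R's stats::kmeans accepts either a cluster count or a matrix of starting centers in its centers argument. The sketch below shows both forms on complete data; whether assign_clustpp forwards its arguments in the same way is an assumption.

```r
set.seed(1)
# Two well-separated groups of 20 points each, in 2 dimensions
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.2), ncol = 2))

# centers as a single number: kmeans picks that many rows of X at random
fit_from_k <- kmeans(X, centers = 2, iter.max = 20)

# centers as a matrix: one row per initial center
init_centers <- X[c(1, 21), , drop = FALSE]
fit_from_matrix <- kmeans(X, centers = init_centers, iter.max = 20)

# Cluster sizes for the matrix-initialized fit
table(fit_from_matrix$cluster)
```

Supplying an explicit matrix makes the start reproducible without a seed, while supplying a count leaves the choice of starting rows to kmeans.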


Function for finding indices of missing data in a matrix

Description

findMissing Function for finding indices of missing data in a matrix

Usage

findMissing(X)

Arguments

X

Data matrix containing missing entries whose rows are observations and columns are features

Value

A numeric vector containing indices of the missing entries in X

Author(s)

Jocelyn T. Chi

Examples

p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
missing <- findMissing(X)
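For readers who want to see what a vector of missing-entry indices looks like, here is a minimal base R sketch. It assumes the indices are linear (column-major) positions, as returned by which(is.na(X)); this page does not state which indexing findMissing uses.

```r
# Small matrix with two missing entries (filled column by column)
X <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3, ncol = 2)

# Linear indices of the missing entries, counted down the columns
missing_idx <- which(is.na(X))

# The same positions as (row, col) pairs
missing_rc <- which(is.na(X), arr.ind = TRUE)
```

Linear indices can be used directly for assignment, for example X[missing_idx] <- 0.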


Function for initial imputation for k-means

Description

initialImpute Initial imputation for k-means

Usage

initialImpute(X)

Arguments

X

Data matrix containing missing entries whose rows are observations and columns are features

Value

A data matrix containing no missing entries

Author(s)

Jocelyn T. Chi

Examples

p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
X_copy <- initialImpute(X)
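This page does not say which fill rule initialImpute applies. As one simple possibility, the sketch below replaces every missing entry with the mean of all observed entries. The function name impute_with_mean is hypothetical and is not part of the package.

```r
# Hypothetical helper: fill NAs with the mean of the observed entries
impute_with_mean <- function(X) {
  filled <- X
  filled[is.na(filled)] <- mean(X, na.rm = TRUE)
  filled
}

X <- matrix(c(1, NA, 3,
              5, 7, NA), nrow = 3, ncol = 2)
X_filled <- impute_with_mean(X)
```

Any fill that yields a complete matrix would serve as a starting point; a per-column mean is another common choice.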


k-means++

Description

kmpp Computes initial centroids via k-means++

Usage

kmpp(X, k)

Arguments

X

Data matrix whose rows are observations and columns are features

k

Number of clusters

Value

A data matrix whose rows contain initial centroids for the k clusters

Examples

n <- 10
p <- 2
X <- matrix(rnorm(n*p),n,p)
k <- 3
kmpp(X,k)
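The k-means++ seeding rule can be sketched in a few lines of base R: pick the first center uniformly at random, then pick each later center with probability proportional to its squared distance from the nearest center already chosen. The function kmpp_sketch below is a hypothetical illustration of that rule, not the package's kmpp source.

```r
# Hypothetical sketch of k-means++ seeding (assumes X has no missing
# entries and contains at least k distinct rows)
kmpp_sketch <- function(X, k) {
  n <- nrow(X)
  chosen <- integer(k)
  chosen[1] <- sample.int(n, 1)  # first center: uniform at random
  # Squared distance from every row to its nearest chosen center
  d2 <- rowSums(sweep(X, 2, X[chosen[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Rows far from all current centers are more likely to be picked;
    # rows already chosen have distance 0 and cannot be picked again
    chosen[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[chosen[j], ])^2))
  }
  X[chosen, , drop = FALSE]
}

set.seed(1)
X <- matrix(rnorm(20), 10, 2)
centers <- kmpp_sketch(X, 3)
```

The result has one row per cluster, matching the shape described under Value above.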


Function for performing k-POD

Description

kpod Function for performing k-POD, a method for k-means clustering on partially observed data

Usage

kpod(X, k, kmpp_flag = TRUE, maxiter = 100)

Arguments

X

Data matrix containing missing entries whose rows are observations and columns are features

k

Number of clusters

kmpp_flag

(Optional) Indicator for whether or not to initialize with k-means++

maxiter

(Optional) Maximum number of iterations

Value

cluster: Clustering assignment obtained with k-POD

cluster_list: List containing clustering assignments obtained in each iteration

obj_vals: List containing the value of the k-means objective function in each iteration

fit: Fit of clustering assignment obtained with k-POD (calculated as 1-(total withinss/totss))

fit_list: List containing fit of clustering assignment obtained in each iteration

Author(s)

Jocelyn T. Chi

Examples

p <- 5
n <- 200
k <- 3
sigma <- 0.15
missing <- 0.20
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
truth <- Data$truth

kpod_result <- kpod(X,k)
kpodclusters <- kpod_result$cluster
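To make the idea of k-means on partially observed data concrete, here is a minimal base R sketch of an impute-and-cluster loop: fill the missing entries, run k-means, refill each missing entry from the centroid of its assigned cluster, and repeat until the assignments stop changing. The function kpod_sketch, its initial mean fill, and its stopping rule are assumptions made for illustration; they are not taken from the package source. The fit value follows the formula given under Value, using the tot.withinss and totss fields of stats::kmeans.

```r
# Hypothetical sketch of an impute-and-cluster loop
kpod_sketch <- function(X, k, maxiter = 100) {
  miss <- is.na(X)
  filled <- X
  filled[miss] <- mean(X, na.rm = TRUE)  # assumed initial fill
  km <- kmeans(filled, centers = k, nstart = 5)
  for (iter in seq_len(maxiter)) {
    # Row i of centroid_rows is the centroid of the cluster assigned to row i
    centroid_rows <- km$centers[km$cluster, , drop = FALSE]
    filled[miss] <- centroid_rows[miss]
    # Restart k-means from the previous centroids on the refilled data
    km_new <- kmeans(filled, centers = km$centers)
    converged <- identical(km_new$cluster, km$cluster)
    km <- km_new
    if (converged) break
  }
  list(cluster = km$cluster, fit = 1 - km$tot.withinss / km$totss)
}

set.seed(42)
# Two well-separated groups of 20 points in 3 dimensions, 10% of entries removed
X <- rbind(matrix(rnorm(60, mean = 0, sd = 0.2), ncol = 3),
           matrix(rnorm(60, mean = 5, sd = 0.2), ncol = 3))
X[sample(length(X), 12)] <- NA
res <- kpod_sketch(X, 2)
```

Only the missing positions are ever overwritten, so the observed entries of X are unchanged throughout the loop.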


Make test data

Description

makeData Function for making test data

Usage

makeData(p, n, k, sigma, missing, seed = 12345)

Arguments

p

Number of features (or variables)

n

Number of observations

k

Number of clusters

sigma

Variance

missing

Desired proportion of missing entries, given as a fraction between 0 and 1 (the examples pass 0.05 and 0.20)

seed

(Optional) Seed (default seed is 12345)

Author(s)

Jocelyn T. Chi

Examples

p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05

X <- makeData(p,n,k,sigma,missing)$Orig
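The examples on this page read the fields Orig and Missing from the result of makeData. As a rough sketch of how a complete matrix and a partially observed copy can be produced together, the hypothetical helper below masks a given fraction of entries uniformly at random. The name mask_at_random and the uniform masking rule are assumptions; this page does not describe how makeData chooses which entries to remove.

```r
# Hypothetical helper: hide a fraction of entries uniformly at random
mask_at_random <- function(X, missing, seed = 12345) {
  set.seed(seed)
  n_missing <- round(length(X) * missing)
  idx <- sample.int(length(X), n_missing)  # linear indices to hide
  X_missing <- X
  X_missing[idx] <- NA
  list(Orig = X, Missing = X_missing)
}

X_full <- matrix(rnorm(200), nrow = 100, ncol = 2)
Data_sketch <- mask_at_random(X_full, missing = 0.05)
```

With 200 entries and missing = 0.05, ten entries are set to NA and the rest are left as they were.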
