Type: Package
Title: Encoders for Categorical Variables
Version: 0.1.1
Author: nl zhang
Maintainer: nl zhang <setseed2016@gmail.com>
Description: Contains some commonly used categorical variable encoders, such as 'LabelEncoder' and 'OneHotEncoder'. Inspired by the encoders implemented in Python 'sklearn.preprocessing' package (see http://scikit-learn.org/stable/modules/preprocessing.html).
License: GPL-2 | GPL-3
LazyData: TRUE
Imports: Matrix (≥ 1.2-6), data.table (≥ 1.9.6), methods
RoxygenNote: 5.0.1
NeedsCompilation: no
Packaged: 2017-03-08 03:14:25 UTC; nl
Repository: CRAN
Date/Publication: 2017-03-08 08:22:03

An S4 class to represent a LabelEncoder.

Description

An S4 class to represent a LabelEncoder.

Slots

type

A character string denoting the input type: character, factor, or numeric

mapping

A data.frame to store the mapping table


An S4 class to represent a LabelEncoder with character input.

Description

An S4 class to represent a LabelEncoder with character input.

Slots

classes

A character vector to store the unique values of classes


An S4 class to represent a LabelEncoder with factor input.

Description

An S4 class to represent a LabelEncoder with factor input.

Slots

classes

A factor vector to store the unique values of classes


An S4 class to represent a LabelEncoder with numeric input.

Description

An S4 class to represent a LabelEncoder with numeric input.

Slots

classes

A numeric vector to store the unique values of classes
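
The slot descriptions above suggest a parent class holding type and mapping, with one subclass per input type adding a classes slot. The following is a minimal sketch of such a layout, not the package's source: the class names prefixed with "Sketch", the column names of the mapping table, and the assumption that the subclasses inherit from the parent are all hypothetical.

```r
library(methods)

# Hypothetical parent class mirroring the documented slots `type` and `mapping`.
setClass("SketchLabelEncoder",
         slots = c(type = "character", mapping = "data.frame"))

# Hypothetical subclass: adds a `classes` slot typed to match the input.
setClass("SketchLabelEncoder.Character",
         contains = "SketchLabelEncoder",
         slots = c(classes = "character"))

enc <- new("SketchLabelEncoder.Character",
           type = "character",
           classes = c("a", "d", "e"),
           mapping = data.frame(classes = c("a", "d", "e"), ind = 1:3,
                                stringsAsFactors = FALSE))

# Inheritance lets a method written for the parent class accept the subclass.
print(is(enc, "SketchLabelEncoder"))
```

Under this assumption, a method registered for the parent class would also dispatch on any of the three subclasses.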


LabelEncoder.fit fits a LabelEncoder object

Description

LabelEncoder.fit fits a LabelEncoder object

Usage

LabelEncoder.fit(y)

Arguments

y

A character, factor, or numeric vector, which may include NA

Value

Returns an object of S4 class LabelEncoder.

Examples

# factor y
y <- factor(c('a','d','e',NA),exclude=NULL)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,factor(c('d','d',NA,'f')))
print(z)

# character y
y <- c('a','d','e',NA)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,c('d','d',NA,'f'))
print(z)

# numeric y
set.seed(123)
y <- sample(c(1:10,NA),5)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc, sample(c(1:10,NA),5))
print(z)
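
The idea behind label encoding can be sketched in base R with match(). This is an illustration only: sketch_fit and sketch_transform are hypothetical helpers, and the manual does not state which integers the package itself assigns, in particular for NA.

```r
# Hypothetical helpers, not part of the package API.
sketch_fit <- function(y) {
  # Sorted unique values; na.last = TRUE keeps NA as the final class.
  sort(unique(y), na.last = TRUE)
}

sketch_transform <- function(classes, y) {
  # match() gives the position of each value in `classes`
  # and NA for values that were not seen during fitting.
  match(y, classes)
}

classes <- sketch_fit(c("a", "d", "e", NA))
z <- sketch_transform(classes, c("d", "d", NA, "f"))
print(z)
```

In this sketch NA is treated as a class of its own, while the unseen value 'f' maps to NA.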

An S4 class to represent a OneHotEncoder

Description

An S4 class to represent a OneHotEncoder

Slots

n_columns

An integer value to store the number of columns of input data

n_values

A numeric vector to store the number of unique values in each column of input data

column_encoders

A list that stores the LabelEncoder for each column of input data


OneHotEncoder.fit fits a OneHotEncoder object

Description

OneHotEncoder.fit fits a OneHotEncoder object

Usage

OneHotEncoder.fit(X)

Arguments

X

A matrix or data.frame, which can include NA

Value

Returns an object of S4 class OneHotEncoder

Examples

# matrix input
X1 <- matrix(c(0, 1, 0, 1, 0, 1, 2, 0, 3, 0, 1, 2),c(4,3),byrow=FALSE)
oenc <- OneHotEncoder.fit(X1)
z <- transform(oenc,X1,sparse=TRUE)
# return a sparse matrix
print(z)

# data.frame
X2 <- cbind(data.frame(X1),X4=c('a','b','d',NA),X5=factor(c(1,2,3,1)))
oenc <- OneHotEncoder.fit(X2)
z <- transform(oenc,X2,sparse=FALSE)
# return a dense matrix
print(z)
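
One way to picture one-hot encoding is as per-column label encoding followed by column offsets. The base-R sketch below assumes that layout; sketch_one_hot is a hypothetical helper, and the package's column order and NA handling may differ.

```r
# Hypothetical helper, not the package's implementation.
sketch_one_hot <- function(X) {
  X <- as.data.frame(X)
  # Sorted unique values of every column (sort() drops NA by default).
  classes  <- lapply(X, function(col) sort(unique(col)))
  n_values <- vapply(classes, length, integer(1))
  # Number of output columns that precede each input column's block.
  offsets  <- unname(cumsum(c(0L, n_values)))[seq_along(n_values)]

  out <- matrix(0L, nrow = nrow(X), ncol = sum(n_values))
  for (j in seq_along(classes)) {
    idx <- match(X[[j]], classes[[j]])   # integer code per row
    ok  <- !is.na(idx)                   # NA rows stay all zero in this block
    out[cbind(which(ok), offsets[j] + idx[ok])] <- 1L
  }
  out
}

X1 <- matrix(c(0, 1, 0, 1, 0, 1, 2, 0, 3, 0, 1, 2), c(4, 3), byrow = FALSE)
z <- sketch_one_hot(X1)
print(dim(z))
```

The three columns of X1 have 2, 3 and 4 distinct values, so the sketch produces 9 indicator columns, and every row contains exactly three ones.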

inverse.transform transforms an integer vector back to the original vector

Description

inverse.transform transforms an integer vector back to the original vector

Usage

inverse.transform(enc, z)

## S4 method for signature 'LabelEncoder,numeric'
inverse.transform(enc, z)

Arguments

enc

A fitted LabelEncoder

z

A vector of integers

Value

A vector of characters, factors or numerics.

Examples

# character vector y
y <- c('a','d','e',NA)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,c('d','d',NA,'f'))
print(z)
inverse.transform(lenc,z)

# factor vector y
y <- factor(c('a','d','e',NA),exclude=NULL)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,factor(c('a','d',NA,'f')))
inverse.transform(lenc,z)

# numeric vector y
set.seed(123)
y <- c(1:10,NA)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
newy <- sample(c(1:10,NA),5)
print(newy)
z <- transform(lenc,newy)
inverse.transform(lenc, z)
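
Conceptually, the inverse step is an index lookup into the stored classes. A minimal base-R sketch, assuming the encoder keeps the unique values in a vector (the variable names here are illustrative, not package objects):

```r
classes <- c("a", "d", "e")             # unique values seen at fit time
z <- match(c("d", "d", "f"), classes)   # encode; unseen "f" becomes NA
original <- classes[z]                  # decode by indexing back into classes
print(original)
```

An NA code decodes to NA, so a value that was unseen at fit time cannot be recovered.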

transform transforms a new data set using the fitted encoder

Description

transform transforms a new data set using the fitted encoder

Usage

transform(enc, ...)

## S4 method for signature 'LabelEncoder.Numeric'
transform(enc, y)

## S4 method for signature 'LabelEncoder.Character'
transform(enc, y)

## S4 method for signature 'LabelEncoder.Factor'
transform(enc, y)

## S4 method for signature 'OneHotEncoder'
transform(enc, X, sparse = TRUE,
  new.feature.error = TRUE)

Arguments

enc

A fitted encoder, i.e., LabelEncoder or OneHotEncoder

...

Additional arguments passed to the method for the given encoder

y

A vector of character, factor or numeric values

X

A data.frame or matrix

sparse

If TRUE, return a sparse matrix; otherwise return a dense matrix. Default: TRUE

new.feature.error

If TRUE, throw an error when X contains feature values that were not seen during fitting; otherwise such values are ignored. Default: TRUE
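
The two policies for unseen feature values can be sketched for a single column in base R. This is an assumption-laden illustration: sketch_encode_column is hypothetical, and it reads "ignored" as leaving that row's indicator block all zero, which the manual does not spell out.

```r
# Hypothetical helper illustrating both policies for one column.
sketch_encode_column <- function(classes, x, new.feature.error = TRUE) {
  idx <- match(x, classes)
  unseen <- is.na(idx) & !is.na(x)
  if (any(unseen) && new.feature.error) {
    stop("unseen feature value(s): ",
         paste(unique(x[unseen]), collapse = ", "))
  }
  out <- matrix(0L, nrow = length(x), ncol = length(classes))
  ok <- !is.na(idx)
  out[cbind(which(ok), idx[ok])] <- 1L   # rows with unseen values stay all zero
  out
}

classes <- c("a", "b", "d")
# Strict policy: capture the error message instead of aborting.
strict <- tryCatch(sketch_encode_column(classes, c("a", "z")),
                   error = function(e) conditionMessage(e))
# Lenient policy: the unseen value "z" yields an all-zero row.
lenient <- sketch_encode_column(classes, c("a", "z"), new.feature.error = FALSE)
print(strict)
print(lenient)
```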

Value

If enc is a OneHotEncoder, the returned value is a sparse or dense matrix. If enc is a LabelEncoder, the returned value is a vector.

Examples

# matrix X
X1 <- matrix(c(0, 1, 0, 1, 0, 1, 2, 0, 3, 0, 1, 2),c(4,3),byrow=FALSE)
oenc <- OneHotEncoder.fit(X1)
z <- transform(oenc,X1,sparse=TRUE)
# return a sparse matrix
print(z)

# data.frame X
X2 <- cbind(data.frame(X1),X4=c('a','b','d',NA),X5=factor(c(1,2,3,1)))
oenc <- OneHotEncoder.fit(X2)
z <- transform(oenc,X2,sparse=FALSE)
# return a dense matrix
print(z)

# factor vector y
y <- factor(c('a','d','e',NA),exclude=NULL)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,factor(c('d','d',NA,'f')))
print(z)

# character vector y
y <- c('a','d','e',NA)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc,c('d','d',NA,'f'))
print(z)

# numeric vector y
set.seed(123)
y <- sample(c(1:10,NA),5)
lenc <- LabelEncoder.fit(y)
# new values are transformed to NA
z <- transform(lenc, sample(c(1:10,NA),5))
print(z)
