The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Title: Concise and Efficient Tools for Everyday Statistical Production
Version: 1.0
Description: A set of concise and efficient tools for statistical production. Can also be used for data management. In statistical production, you deal with complex data and need to control your process at each step of your work. Concise functions are very helpful, because you do not hesitate to use them. The following functions are included in the package. 'dup' checks duplicates. 'miss' checks missing values. 'tac' computes contingency table of all columns. 'toc' compares two tables, spotting significant deviations. 'chi2_find' compares columns within a data.frame, spotting related categories of (a more complex function).
Encoding: UTF-8
Imports: dplyr, rlang, tibble
RoxygenNote: 7.3.2
License: MIT + file LICENSE
NeedsCompilation: no
Packaged: 2025-10-28 12:50:38 UTC; vincent.reduron
Author: Vincent Reduron [cre, aut]
Maintainer: Vincent Reduron <vincent.reduron@laposte.net>
Repository: CRAN
Date/Publication: 2025-10-31 18:30:02 UTC

short for 'paste0()'

Description

short for 'paste0()'

Usage

a %+% b

Arguments

a

string

b

string

Value

string


Find modalities related to a criterion

Description

Find modalities related to a criterion

Usage

chi2_find(df, criterion)

Arguments

df

data.frame

criterion

character string: criterion that spots target rows

Value

data.frame


coltypes()

Description

Create vector of df's column types. Similar to colnames(), but with column types instead of names.

Usage

coltypes(df)

Arguments

df

data.frame

Value

vector


Analysis of the cardinality of a key/identifier in a table

Description

Creates multiple result tables. The term "n-plicate" is used to generalize the notion of duplicate: a n_plicate can be a duplicate, a triplicate, etc.

Usage

dup(tab, keyby, count_what = "rows", partition = NULL, view = TRUE)

Arguments

tab

Either an R dataframe or a reference to a remote table ("remote table")

keyby

(character vector) names of the column(s) considered as keys

count_what

(character vector) defines what to count by key (by *keyby*). 'rows' to count distinct rows, otherwise the name of the columns whose distinct values are to be counted

partition

(character vector) names of the columns by which to break down the analysis

view

automatic opening of generated tables

Value

A set of dataframes in the global environment. * nup_r_tab: table of n-plicate counts * nup_xpl_r: table of n-plicate examples * nup_exZ_r: table of examples of (n-plicates with value 0) * nup_r_tab_part: table of n-plicate counts broken down by the modalities of the 'partition' columns

Examples

# Check if "name" is a unique key of the starwars table (yes !)
dup(dplyr::starwars, keyby = "name", view = FALSE)

# Check if "key" is a unique key of the basic table (no !)
basic <- data.frame("key"   = c("a", "b", "c", "d", NA, "a", "e", "f"), 
                    "value" = c(112, 117, 317,  NA,  0,  17, 117, 112))
dup(basic, keyby = "key", view = FALSE)


get_recursion_depth

Description

get recursion depth of a list

Usage

get_recursion_depth(x, depth = 0)

Arguments

x

: input list

depth

: depth of x in another list (1 if x in a list. 2 if x is in a list of lists. Etc.)

Value

integer


Contingency table for column 'col_name' in data.frame 'df

Description

Contingency table for column 'col_name' in data.frame 'df

Usage

get_tac_column(df, col_name, values, strates)

Arguments

df

Input data.frame

col_name

string : name of column to which generate the contingency table

values

Vector of columns that serve as measures (amounts, counts, etc.)

strates

Vector of column names by which to stratify the contingency tables


is.Date

Description

Returns TRUE or FALSE depending on whether its argument is of Date type or not

Usage

is.Date(x)

Arguments

x

object

Value

TRUE/FALSE


is.POSIXct

Description

Returns TRUE or FALSE depending on whether its argument is of POSIXct type or not

Usage

is.POSIXct(x)

Arguments

x

object

Value

TRUE/FALSE


is.POSIXlt

Description

Returns TRUE or FALSE depending on whether its argument is of POSIXlt type or not

Usage

is.POSIXlt(x)

Arguments

x

object

Value

TRUE/FALSE


is.POSIXt

Description

Returns TRUE or FALSE depending on whether its argument is of POSIXxt type or not

Usage

is.POSIXt(x)

Arguments

x

object

Value

TRUE/FALSE


Missing: Generate a synthetic table of missing values for all columns of a data.frame

Description

Get a synthetic table of missing values for all columns of a data.frame

Usage

miss(df, values = NULL, view = FALSE)

Arguments

df

data.frame: Input data.frame

values

column: Variable (~weight) to measure the number of missing values (otherwise, count of rows)

view

boolean: Display a glimpse of cases with NA values

Value

data.frame

Examples

miss(mtcars)  # Checking NA values for all columns of mtcars (none)


Computes a contingency table (tac) of all columns in a dataframe for control purposes

Description

Contingency table (tac) of all columns in a dataframe for control purposes

Usage

tac(
  df,
  values = NULL,
  sample_rate = 0.01,
  num_but_discrete = "NULL",
  strates = NULL
)

Arguments

df

Input data.frame

values

Vector of columns that serve as measures (amounts, counts, etc.)

sample_rate

Sampling rate, if df is a remote table

num_but_discrete

Vector of names of numeric columns with discrete modalities (not continuous)

strates

Vector of column names by which to stratify the contingency tables

Value

data.frame

Examples

tab <- tac(iris) # calculate column frequencies


TAC-based Outlier Control (TOC)

Description

Generalized detection of outlier values in a database, based on contingency tables (tac)

Usage

toc(
  df1,
  df2,
  values = NULL,
  a = 10,
  r = 0.34,
  sample_rate = 0.01,
  num_but_discrete = "NULL"
)

Arguments

df1

Input data.frame (to compare with df2)

df2

Input data.frame (to compare with df1)

values

Vector of columns that serve as measures (amounts, counts, etc.)

a

Allowed absolute variation

r

Allowed relative variation

sample_rate

Sampling rate, if df is a remote table

num_but_discrete

Numeric variables to be treated as discrete modal variables. If 'all', all numeric variables are treated as discrete modal variables.

Details

Currently does not work with values parameter

Value

data.frame


Scoring significativity of difference between two values x and y

Description

Difference score between x and y (0 = no significant difference, >0 = presence of significant difference)

Usage

toc_score(x, y, a)

Arguments

x

(num) First value to compare

y

(num) Second value to compare

a

(num) Absolute difference threshold below which all differences are considered normal

Value

numeric

Examples

toc_score(15, 1500, a = 500) # 1.91: significant difference
toc_score(1432, 1501, a = 100) # 0: non-significant difference

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.