Given a dataset of careers and incomes, how large is the income difference between any pair of careers? Given a dataset of travel time records, how much longer does a trip take when we choose public transportation mode A instead of B? We developed a framework named “EDOIF” to answer such questions.
EDOIF is a nonparametric framework based on the principle of “Estimation Statistics”. Its main purpose is to infer the order of empirical distributions from different categories, based on the probability that a value from one distribution is greater than the expectation of another distribution. Given a set of ordered pairs of category and real values, the framework is capable of inferring which categories dominate which, estimating the difference between each pair of categories as a confidence interval, and visualizing the results.
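The idea of "a probability of finding a value in one distribution that is greater than an expectation of another distribution" can be illustrated with a few lines of base R. This is only a sketch on simulated data, not the package's internal code; the names `a`, `b`, `pBoverA`, and `pAoverB` are made up for the illustration.

```r
# Illustrative sketch only: an empirical version of the dominance idea.
set.seed(1)
a <- rnorm(150, mean = 10, sd = 8)  # hypothetical category A
b <- rnorm(150, mean = 50, sd = 8)  # hypothetical category B

# Fraction of B's values that exceed the mean of A, and vice versa.
pBoverA <- mean(b > mean(a))
pAoverB <- mean(a > mean(b))

cat("P(B > mean(A)) ~", pBoverA, "\n")
cat("P(A > mean(B)) ~", pAoverB, "\n")
```

In this setup nearly every value of B lies above the mean of A, while nearly no value of A lies above the mean of B. That asymmetry is the kind of evidence that suggests B dominates A.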
You can install our package from CRAN:
install.packages("EDOIF")
For the newest version on GitHub, run the following command in an R terminal:
remotes::install_github("DarkEyes/EDOIF")
This requires the “remotes” package to be installed before installing EDOIF.
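Since the GitHub route depends on “remotes”, one way to handle the prerequisite is to install it only when it is missing. The following is a minimal sketch; `missing_pkgs` is a hypothetical helper written for this example, not part of EDOIF or remotes.

```r
# Sketch: install the helper package only when it is missing.
# missing_pkgs() is a hypothetical helper defined here for illustration.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Guarded so that nothing is installed when the file is run non-interactively.
if (interactive()) {
  to_install <- missing_pkgs("remotes")
  if (length(to_install) > 0) install.packages(to_install)
  remotes::install_github("DarkEyes/EDOIF")
}
```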
library(EDOIF)
#== Simulation: generating distributions of five categories:
# Category5 dominates Category4
# Category4 dominates Category3
# Category3 dominates Category2
# Category2 and Category1 share the same mean, so neither dominates the other
nInv=150 # number of samples per category
initMean=10
stepMean=20
std=8
simData1<-c()
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("Category1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category5"),times=nInv) )
#== parameter setting
bootT=1000 # number of bootstrap resamples (sampling with replacement)
alpha=0.05 # significance level
#== Calling the class constructor
A1<-EDOIF(simData1$Values,simData1$Group, bootT=bootT, alpha=alpha, methodType ="perc")
#== Visualizing results
print(A1) # print the results in text mode
plot(A1, fontSize=10) # print the results in graphic mode
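The simulated input can also be built in one step as a data frame, which avoids the repeated appends. The sketch below is an alternative written for illustration, not the authors' code. The name `simData2` and the fixed seed are additions, and the parameter values are chosen so that the category means are 10, 10, 50, 70, and 90, in line with the text-mode results.

```r
# Sketch: build the same kind of simulated data as a data frame.
set.seed(123)  # fixed seed (an addition) so the run is reproducible
nInv <- 150
initMean <- 10
stepMean <- 20
std <- 8

# Mean of each category; Category1 and Category2 share initMean.
groupMeans <- initMean + c(0, 0, 2, 3, 4) * stepMean
groupNames <- paste0("Category", 1:5)

simData2 <- data.frame(
  Values = rnorm(nInv * 5, mean = rep(groupMeans, each = nInv), sd = std),
  Group  = rep(groupNames, each = nInv)
)

# Sample mean per category as a quick sanity check.
print(tapply(simData2$Values, simData2$Group, mean))
```

The two columns can then be passed to the constructor in the same way as before, for example `EDOIF(simData2$Values, simData2$Group, bootT=bootT, alpha=alpha, methodType="perc")`.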
Graphic mode results
1. A plot of the mean of each of the five categories with its (1 - alpha) confidence interval. The horizontal axis shows the categories and the vertical axis shows the values within each category's distribution.
2. A dominant-distribution network of the five categories. Each node represents a category and each edge represents a dominant-distribution relation between two categories: an edge from category A to B means that A dominates B. A larger node indicates a category with a higher mean.
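To make the network description concrete, the sketch below encodes the relations of this example as an adjacency matrix and computes a density. It assumes that density means the number of edges divided by the number of unordered category pairs, which is consistent with the value 0.9 in the text-mode results (9 of 10 pairs are ordered, all except Category1 and Category2). The package may compute it differently; `adj` and `netDensity` are illustrative names.

```r
# Sketch: the dominant-distribution network as an adjacency matrix.
cats <- paste0("Category", 1:5)
adj <- matrix(0, nrow = 5, ncol = 5, dimnames = list(cats, cats))

# adj[A, B] = 1 means there is an edge from A to B, i.e. A dominates B.
for (i in 2:5) {
  for (j in 1:(i - 1)) adj[i, j] <- 1
}
adj["Category2", "Category1"] <- 0  # no dominance between this pair

# Assumed definition: edges divided by the number of unordered pairs.
netDensity <- sum(adj) / choose(length(cats), 2)
print(netDensity)
```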
Text mode results
EDOIF (Empirical Distribution Ordering Inference Framework)
=======================================================
Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
Using Mann-Whitney test to report whether A ≺ B
A dominant-distribution network density:0.900000
Distribution: Category1
Mean:10.840671 95CI:[ 9.706981,12.014179]
Distribution: Category2
Mean:11.044785 95CI:[ 9.806991,12.446037]
Distribution: Category3
Mean:50.462935 95CI:[ 49.208005,51.757706]
Distribution: Category4
Mean:70.299726 95CI:[ 69.103924,71.502505]
Distribution: Category5
Mean:91.190505 95CI:[ 89.895480,92.518455]
=======================================================
Mean difference of Category2 (n=150) minus Category1 (n=150): Category1 ⊀ Category2
:p-val 0.4463
Mean Diff:0.204114 95CI:[ -1.545130,1.930609]
Mean difference of Category3 (n=150) minus Category1 (n=150): Category1 ≺ Category3
:p-val 0.0000
Mean Diff:39.622264 95CI:[ 37.984831,41.378232]
Mean difference of Category4 (n=150) minus Category1 (n=150): Category1 ≺ Category4
:p-val 0.0000
Mean Diff:59.459055 95CI:[ 57.921328,61.127817]
Mean difference of Category5 (n=150) minus Category1 (n=150): Category1 ≺ Category5
:p-val 0.0000
Mean Diff:80.349835 95CI:[ 78.620391,82.133270]
Mean difference of Category3 (n=150) minus Category2 (n=150): Category2 ≺ Category3
:p-val 0.0000
Mean Diff:39.418150 95CI:[ 37.543210,41.241722]
Mean difference of Category4 (n=150) minus Category2 (n=150): Category2 ≺ Category4
:p-val 0.0000
Mean Diff:59.254941 95CI:[ 57.304359,61.098774]
Mean difference of Category5 (n=150) minus Category2 (n=150): Category2 ≺ Category5
:p-val 0.0000
Mean Diff:80.145720 95CI:[ 78.313321,82.040234]
Mean difference of Category4 (n=150) minus Category3 (n=150): Category3 ≺ Category4
:p-val 0.0000
Mean Diff:19.836791 95CI:[ 18.047421,21.762239]
Mean difference of Category5 (n=150) minus Category3 (n=150): Category3 ≺ Category5
:p-val 0.0000
Mean Diff:40.727570 95CI:[ 39.004372,42.627946]
Mean difference of Category5 (n=150) minus Category4 (n=150): Category4 ≺ Category5
:p-val 0.0000
Mean Diff:20.890780 95CI:[ 19.079287,22.625807]
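Each entry above pairs a Mann-Whitney p-value with a bootstrap confidence interval for the mean difference. The sketch below reproduces that kind of summary for one pair using base R only. It is an approximation for illustration, not the package's implementation: the one-sided alternative, the resampling scheme, and the names `a`, `b`, `bootDiffs`, and `ciDiff` are assumptions.

```r
# Sketch: percentile bootstrap CI of a mean difference plus a Mann-Whitney test.
set.seed(2020)
a <- rnorm(150, mean = 10, sd = 8)  # hypothetical category A
b <- rnorm(150, mean = 50, sd = 8)  # hypothetical category B
bootT <- 1000  # number of bootstrap resamples
alpha <- 0.05  # significance level

# Resample each group with replacement and record the difference of means.
bootDiffs <- replicate(
  bootT,
  mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE))
)

# Percentile interval: the alpha/2 and 1 - alpha/2 quantiles of the resampled differences.
ciDiff <- quantile(bootDiffs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# One-sided Mann-Whitney (Wilcoxon rank-sum) test: does a tend to be smaller than b?
pval <- wilcox.test(a, b, alternative = "less")$p.value

cat(sprintf("Mean Diff: %f CI: [%f, %f]\n", mean(b) - mean(a), ciDiff[1], ciDiff[2]))
cat(sprintf("p-val: %.4f, a precedes b: %s\n", pval, pval < alpha))
```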
For more examples, please see the package vignettes.
Amornbunchornvej, Chainarong, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong. “A nonparametric framework for inferring orders of categorical data from category-real pairs.” Heliyon 6, no. 11 (2020): e05435, ISSN 2405-8440, https://doi.org/10.1016/j.heliyon.2020.e05435. arXiv