In the first step, we generate a simple dataset in which C1 and C2 are dominated by C3, C3 is dominated by C4, and C4 is dominated by C5. There is no dominant-distribution relation between C1 and C2.
# Simulation section
nInv<-100
initMean=10
stepMean=20
std=8
simData1<-c()
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("C1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C5"),times=nInv) )
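The same structure can be built in a single step, which makes the intended ordering of the population means easier to see. The following is a minimal sketch in base R and is not part of the EDOIF package; the names `groupMeans` and `simCheck` are introduced here for illustration only, and the call to `set.seed` is an added assumption (the code above does not fix a seed).

```r
set.seed(1)  # assumption: fixed seed for reproducibility (not in the original code)
nInv     <- 100
initMean <- 10
stepMean <- 20
std      <- 8

# Population mean of each category, matching the step-by-step code above
groupMeans <- c(C1 = initMean,
                C2 = initMean,
                C3 = initMean + 2 * stepMean,
                C4 = initMean + 3 * stepMean,
                C5 = initMean + 4 * stepMean)

# rep(..., each = nInv) supplies one mean per draw, so a single rnorm call suffices
simCheck <- data.frame(
  Values = rnorm(nInv * length(groupMeans),
                 mean = rep(groupMeans, each = nInv), sd = std),
  Group  = rep(names(groupMeans), each = nInv)
)

# Sample means per category should be close to 10, 10, 50, 70, 90
round(tapply(simCheck$Values, simCheck$Group, mean), 2)
```

C1 and C2 share the same population mean, which is why no dominant-distribution relation is expected between them.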
The framework is then used to analyze this dataset.
# Simple ordering inference section
library(EDOIF)
## Loading required package: boot
# parameter setting
bootT=1000 # Number of times of sampling with replacement
alpha=0.05 # significance level
#======= input
Values=simData1$Values
Group=simData1$Group
#=============
A1<-EDOIF(Values,Group,bootT = bootT, alpha=alpha )
We print the result of our framework below.
print(A1) # print results in text
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.900000
## Distribution: C1
## Mean:9.666647 95CI:[ 7.972723,11.398717]
## Distribution: C2
## Mean:10.200762 95CI:[ 8.734067,11.520001]
## Distribution: C3
## Mean:50.954129 95CI:[ 49.481904,52.453449]
## Distribution: C4
## Mean:70.434180 95CI:[ 68.957075,71.964985]
## Distribution: C5
## Mean:89.464439 95CI:[ 87.955008,91.040930]
## =======================================================
## Mean difference of C2 (n=100) minus C1 (n=100): C1 ⊀ C2
## :p-val 0.3217
## Mean Diff:0.534114 95CI:[ -1.884361,2.659585]
##
## Mean difference of C3 (n=100) minus C1 (n=100): C1 ≺ C3
## :p-val 0.0000
## Mean Diff:41.287481 95CI:[ 38.907920,43.572537]
##
## Mean difference of C4 (n=100) minus C1 (n=100): C1 ≺ C4
## :p-val 0.0000
## Mean Diff:60.767532 95CI:[ 58.411505,63.011712]
##
## Mean difference of C5 (n=100) minus C1 (n=100): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:79.797791 95CI:[ 77.431684,81.998652]
##
## Mean difference of C3 (n=100) minus C2 (n=100): C2 ≺ C3
## :p-val 0.0000
## Mean Diff:40.753367 95CI:[ 38.659321,42.829140]
##
## Mean difference of C4 (n=100) minus C2 (n=100): C2 ≺ C4
## :p-val 0.0000
## Mean Diff:60.233418 95CI:[ 58.125003,62.328341]
##
## Mean difference of C5 (n=100) minus C2 (n=100): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:79.263677 95CI:[ 77.280514,81.364740]
##
## Mean difference of C4 (n=100) minus C3 (n=100): C3 ≺ C4
## :p-val 0.0000
## Mean Diff:19.480051 95CI:[ 17.386984,21.559089]
##
## Mean difference of C5 (n=100) minus C3 (n=100): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:38.510310 95CI:[ 36.492999,40.755761]
##
## Mean difference of C5 (n=100) minus C4 (n=100): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:19.030259 95CI:[ 16.761173,21.199440]
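To make the quantities in this printout concrete, the sketch below computes a percentile bootstrap confidence interval for a mean difference and a Mann-Whitney p-value for one pair of samples, using base R only. It is an illustration under stated assumptions and not the package's internal code: the samples `a` and `b`, the fixed seed, and the choice of a one-sided test (`alternative = "less"`) are all assumptions made here.

```r
set.seed(1)  # assumption: fixed seed, for reproducibility only
bootT <- 1000
alpha <- 0.05

# Two illustrative samples: b has the larger population mean
a <- rnorm(100, mean = 10, sd = 8)
b <- rnorm(100, mean = 50, sd = 8)

# Resample each group with replacement and record the difference of means
bootDiff <- replicate(bootT,
  mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE)))

# Percentile confidence interval at level 1 - alpha
ci <- quantile(bootDiff, probs = c(alpha / 2, 1 - alpha / 2))

# One-sided Mann-Whitney (Wilcoxon rank-sum) test: is a shifted below b?
pval <- wilcox.test(a, b, alternative = "less")$p.value

ci
pval
```

A confidence interval that lies entirely above zero, together with a small p-value, corresponds to the pattern reported above for pairs such as C1 and C3.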
The first plot shows the mean-difference confidence intervals.
plot(A1,options =1)
The second plot shows the mean confidence intervals.
plot(A1,options =2)
The third plot is a dominant-distribution network.
out<-plot(A1,options =3)
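The network density of 0.9 printed earlier is consistent with 9 inferred dominance relations among the 10 possible pairs of five categories. The sketch below reproduces that number in base R, assuming the density is the number of dominance edges divided by n(n-1)/2; this definition is inferred from the printed values rather than taken from the package documentation, and the names `adj`, `nEdges`, and `nPairs` are illustrative.

```r
groups <- c("C1", "C2", "C3", "C4", "C5")

# adj[i, j] is TRUE when category i is dominated by category j
adj <- matrix(FALSE, nrow = 5, ncol = 5, dimnames = list(groups, groups))
adj["C1", c("C3", "C4", "C5")] <- TRUE
adj["C2", c("C3", "C4", "C5")] <- TRUE
adj["C3", c("C4", "C5")]       <- TRUE
adj["C4", "C5"]                <- TRUE

nEdges <- sum(adj)                                    # dominance relations found
nPairs <- length(groups) * (length(groups) - 1) / 2   # unordered pairs of categories
netDensity <- nEdges / nPairs
netDensity
```

The only missing edge is the one between C1 and C2, for which the test reported no dominance.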
Next, we generate a more complicated dataset of mixture distributions. C1, C2, C3, and C4 are dominated by C5. There is no dominant-distribution relation among C1, C2, C3, and C4.
library(EDOIF)
# parameter setting
bootT=1000
alpha=0.05
nInv<-1200
start_time <- Sys.time()
#======= input
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
Values=simData3$Values
Group=simData3$Group
#=============
A3<-EDOIF(Values,Group, bootT=bootT, alpha=alpha, methodType ="perc")
A3
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.400000
## Distribution: C3
## Mean:33.159351 95CI:[ -71.119126,86.997846]
## Distribution: C4
## Mean:67.847456 95CI:[ 35.314927,85.271463]
## Distribution: C1
## Mean:81.074608 95CI:[ 78.975417,82.791467]
## Distribution: C2
## Mean:82.081975 95CI:[ 80.462070,83.606068]
## Distribution: C5
## Mean:151.686441 95CI:[ 139.940396,172.427549]
## =======================================================
## Mean difference of C4 (n=1200) minus C3 (n=1200): C3 ⊀ C4
## :p-val 0.8641
## Mean Diff:34.688105 95CI:[ -44.006314,151.947976]
##
## Mean difference of C1 (n=1200) minus C3 (n=1200): C3 ⊀ C1
## :p-val 0.7300
## Mean Diff:47.915257 95CI:[ -6.457703,197.811170]
##
## Mean difference of C2 (n=1200) minus C3 (n=1200): C3 ⊀ C2
## :p-val 0.6598
## Mean Diff:48.922624 95CI:[ -5.131462,153.797076]
##
## Mean difference of C5 (n=1200) minus C3 (n=1200): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:118.527090 95CI:[ 56.693962,258.004712]
##
## Mean difference of C1 (n=1200) minus C4 (n=1200): C4 ⊀ C1
## :p-val 0.2998
## Mean Diff:13.227152 95CI:[ -4.423225,54.590803]
##
## Mean difference of C2 (n=1200) minus C4 (n=1200): C4 ⊀ C2
## :p-val 0.2463
## Mean Diff:14.234519 95CI:[ -3.152270,45.619266]
##
## Mean difference of C5 (n=1200) minus C4 (n=1200): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:83.838985 95CI:[ 57.831809,121.986657]
##
## Mean difference of C2 (n=1200) minus C1 (n=1200): C1 ⊀ C2
## :p-val 0.4188
## Mean Diff:1.007367 95CI:[ -1.464983,3.534059]
##
## Mean difference of C5 (n=1200) minus C1 (n=1200): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:70.611832 95CI:[ 58.398214,90.900656]
##
## Mean difference of C5 (n=1200) minus C2 (n=1200): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:69.604466 95CI:[ 57.677873,89.601949]
plot(A3)
end_time <- Sys.time()
end_time - start_time
## Time difference of 15.72098 secs
Finally, we generate data in which \(A\) dominates \(B\) under different degrees of uniform noise, and plot the two densities.
library(ggplot2)
nInv<-1000
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
#plot(density(simData3$V3))
# V3 is labelled "B" and V5 is labelled "A"
dat <- data.frame(dens = c(simData3$V3, simData3$V5),
                  lines = rep(c("B", "A"), each = nInv))
# Overlay the two density curves
p1 <- ggplot(dat, aes(x = dens, fill = lines)) +
  geom_density(alpha = 0.5) +
  xlim(-400, 400) + ylim(0, 0.07) +
  ylab("Density [0,1]") + xlab("Values") +
  theme(axis.text.x = element_text(face = "bold", size = 12))
theme_update(text = element_text(face="bold", size=12) )
p1$labels$fill<-"Categories"
plot(p1)
## Warning: Removed 4 rows containing non-finite values (stat_density).
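The warning above arises because `xlim()` sets scale limits, so observations outside the range are dropped before the density is estimated. The sketch below shows, in base R, how one might count such observations beforehand. The vector `x` is a hypothetical stand-in for the simulated values (it is not produced by `SimNonNormalDist`), and the fixed seed is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, for reproducibility only

# Hypothetical sample: mostly normal values plus a small share of uniform noise
x <- c(rnorm(990, mean = 80, sd = 16),
       runif(10, min = -1000, max = 1000))

# Values outside the x-axis limits are the ones that would be dropped
outside <- x < -400 | x > 400
sum(outside)          # number of observations that fall outside the limits
range(x[!outside])    # range of the observations that remain
```

If dropping points is undesirable, `coord_cartesian(xlim = c(-400, 400))` zooms the plot without removing data from the density estimate.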