The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

R package rEMM - Extensible Markov Model for Modelling Temporal Relationships Between Clusters

Implements TRACDS (Temporal Relationships between Clusters for Data Streams), a generalization of Extensible Markov Model (EMM), to model transition probabilities in sequence data. TRACDS adds a temporal or order model to data stream clustering by superimposing a dynamically adapting Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN data stream clustering).

Interface classes DSC_tNN and DSC_EMM for the stream package are provided.

To cite package ‘rEMM’ in publications use:

Hahsler M, Dunham M (2010). “rEMM: Extensible Markov Model for Data Stream Clustering in R.” Journal of Statistical Software, 35(5), 1-31. ISSN 1548-7660, doi:10.18637/jss.v035.i05 https://doi.org/10.18637/jss.v035.i05.

@Article{,
  title = {{rEMM}: Extensible Markov Model for Data Stream Clustering in {R}},
  author = {Michael Hahsler and Margaret H. Dunham},
  journal = {Journal of Statistical Software},
  year = {2010},
  volume = {35},
  number = {5},
  pages = {1--31},
  doi = {10.18637/jss.v035.i05},
  issn = {1548-7660},
}

Installation

Stable CRAN version: Install from within R with

install.packages("rEMM")

Current development version: Install from r-universe.

install.packages("rEMM",
    repos = c("https://mhahsler.r-universe.dev". "https://cloud.r-project.org/"))

Usage

We use a artificial dataset with a mixture of four clusters components. Points are generated using a fixed sequence <1,2,1,3,4> through the four clusters. The lines below indicate the sequence.

library(rEMM)

data("EMMsim")

plot(EMMsim_train, pch = NA)
lines(EMMsim_train, col = "gray")
points(EMMsim_train, pch = EMMsim_sequence_train)

EMM recovers the components and the sequence information. We use EMM and then recluster the found structure assuming that we know that there are 4 components. The graph below represents a Markov model of the found sequence.

emm <- EMM(threshold = 0.1, measure = "euclidean")
build(emm, EMMsim_train)
emmc <- recluster_hclust(emm, k = 4, method = "average")
plot(emmc)

We can now score new sequences (we use a test sequence created in the same way as the training data) by calculating the product the transition probabilities in the model. The high score indicates this.

score(emmc, EMMsim_test)

## [1] 0.71

References

Michael Hahsler and Margaret H. Dunham. rEMM: Extensible Markov model for data stream clustering in R. Journal of Statistical Software, 35(5):1-31, 2010.
Michael Hahsler and Margaret H. Dunham. Temporal structure learning for clustering massive data streams in real-time. In SIAM Conference on Data Mining (SDM11), pages 664–675. SIAM, April 2011.

Acknowledgements

Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from the National Human Genome Research Institute.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.