This vignette explains how proxyC computes the similarity and distance measures.
\[ \vec{x} = [x_1, x_2, \dots, x_n] \\ \vec{y} = [y_1, y_2, \dots, y_n] \] The length of a vector is written \(n = ||\vec{x}||\), while \(|\vec{x}|\) denotes the absolute values of its elements.
Operations on vectors are element-wise:
\[ \vec{z} = \vec{x}\vec{y} \\ n = ||\vec{x}|| = ||\vec{y}|| =||\vec{z}|| \]
Summation of the elements of vectors is written using sigma without specifying the range:
\[ \sum{\vec{x}} = \sum_{i=1}^{n}{x_i} \]
When the elements of a vector are compared with a value inside a pair of square brackets, the summation counts the number of elements that are equal (or unequal) to the value:
\[ \sum{[\vec{x} = 1]} = \sum_{i=1}^{n}{[x_i = 1]} \]
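The notation above maps directly onto vectorised operations in base R. The following is a small illustration with made-up values; the variable names are chosen here and are not part of proxyC.

```r
# Two example vectors of equal length (values are made up for illustration)
x <- c(1, 0, 2, 1)
y <- c(1, 3, 0, 1)

n <- length(x)        # length of the vector: 4
z <- x * y            # element-wise product: 1 0 0 1
total <- sum(x)       # summation over all elements: 4
ones <- sum(x == 1)   # bracket notation: number of elements equal to 1, here 2
```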
Similarity measures are available in proxyC::simil().
\[ simil = \frac{\sum{\vec{x}\vec{y}}}{\sqrt{\sum{\vec{x} ^ 2}} \sqrt{\sum{\vec{y} ^ 2}}} \]
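The formula above is the cosine similarity. As a minimal sketch in base R (the helper name `cosine_simil` is hypothetical and only mirrors the formula, it is not how proxyC is implemented internally):

```r
# Cosine similarity written exactly as in the formula
cosine_simil <- function(x, y) {
  sum(x * y) / (sqrt(sum(x ^ 2)) * sqrt(sum(y ^ 2)))
}

x <- c(1, 2, 0, 3)
y <- c(2, 4, 0, 6)               # y is a multiple of x

cosine_simil(x, y)               # 1, up to floating-point rounding
cosine_simil(c(1, 0), c(0, 1))   # orthogonal vectors give 0
```

Because both vectors are divided by their norms, rescaling either vector leaves the result unchanged.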
\[ simil = \frac{Cov(\vec{x},\vec{y})}{\sqrt{Var(\vec{x})} \sqrt{Var(\vec{y})}} \]
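This measure is the Pearson correlation coefficient. The sketch below assumes the usual definition, in which the covariance is divided by the product of the two standard deviations, and checks it against base R's `cor()`. The helper name `pearson_simil` is hypothetical.

```r
# Pearson correlation: covariance scaled by both standard deviations
pearson_simil <- function(x, y) {
  cov(x, y) / (sd(x) * sd(y))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

pearson_simil(x, y)                          # 0.8
all.equal(pearson_simil(x, y), cor(x, y))    # TRUE
```

`cov()` and `sd()` both use the \(n - 1\) denominator, which cancels in the ratio.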
The values of \(x\) and \(y\) are Boolean for “jaccard”.
\[ e = \sum{\vec{x} \vec{y}} \\ w = \text{user-provided weight} \\ simil = \frac{e}{\sum{\vec{x} ^ w} + \sum{\vec{y} ^ w} - e} \]
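A base R sketch of the Jaccard formula above. The helper name `jaccard_simil` and the default `w = 1` are assumptions made for this illustration; with Boolean input the weight has no effect because \(0^w = 0\) and \(1^w = 1\).

```r
# Jaccard similarity with a user-provided weight w, as in the formula
jaccard_simil <- function(x, y, w = 1) {
  e <- sum(x * y)
  e / (sum(x ^ w) + sum(y ^ w) - e)
}

x <- c(1, 1, 0, 1, 0)
y <- c(1, 0, 0, 1, 1)

jaccard_simil(x, y)          # 2 shared elements / 4 in the union = 0.5
jaccard_simil(x, y, w = 2)   # also 0.5 for Boolean vectors
```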
The values must be \(0 \le x \le 1.0\) and \(0 \le y \le 1.0\).
\[ simil = \frac{\sum{min(\vec{x}, \vec{y})}}{\sum{max(\vec{x}, \vec{y})}} \]
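The fuzzy Jaccard formula above can be sketched with `pmin()` and `pmax()`, which take element-wise minima and maxima. The helper name `fjaccard_simil` is hypothetical, and the range check is added here only to reflect the stated constraint on the values.

```r
# Fuzzy Jaccard: sum of element-wise minima over sum of element-wise maxima
fjaccard_simil <- function(x, y) {
  stopifnot(all(x >= 0 & x <= 1), all(y >= 0 & y <= 1))
  sum(pmin(x, y)) / sum(pmax(x, y))
}

x <- c(0.2, 0.8, 0.0, 1.0)
y <- c(0.4, 0.4, 0.0, 0.5)

fjaccard_simil(x, y)   # (0.2 + 0.4 + 0 + 0.5) / (0.4 + 0.8 + 0 + 1.0) = 0.5
```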
The values of \(x\) and \(y\) are Boolean for “dice”.
\[ e = \sum{\vec{x} \vec{y}} \\ w = \text{user-provided weight} \\ simil = \frac{2 e}{\sum{\vec{x} ^ w} + \sum{\vec{y} ^ w}} \]
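A base R sketch of the Dice formula above, using a hypothetical helper `dice_simil` with an assumed default of `w = 1`. For Boolean vectors the result is tied to the Jaccard similarity \(J\) by \(2J / (1 + J)\), which the example uses as a sanity check.

```r
# Dice similarity with a user-provided weight w, as in the formula
dice_simil <- function(x, y, w = 1) {
  e <- sum(x * y)
  2 * e / (sum(x ^ w) + sum(y ^ w))
}

x <- c(1, 1, 0, 1, 0)
y <- c(1, 0, 0, 1, 1)

dice_simil(x, y)    # 2 * 2 / (3 + 3) = 0.667

# Jaccard similarity of the same Boolean vectors, for comparison
j <- sum(x * y) / (sum(x) + sum(y) - sum(x * y))
all.equal(dice_simil(x, y), 2 * j / (1 + j))   # TRUE
```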
\[ e = \sum{[\vec{x} = \vec{y}]} \\ n = ||\vec{x}|| = ||\vec{y}|| \\ u = n - e \\ simil = \frac{e - u}{e + u} \]
\[ t = \sum{[\vec{x} = 1][\vec{y} = 1]} \\ f = \sum{[\vec{x} = 0][\vec{y} = 0]} \\ n = ||\vec{x}|| = ||\vec{y}|| \\ simil = \frac{t + 0.5 f}{n} \]
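The formula above (the Faith similarity) counts joint presences in full and joint absences at half weight. A minimal base R sketch, with a hypothetical helper name and made-up Boolean vectors:

```r
# Faith similarity: joint 1s count fully, joint 0s count half
faith_simil <- function(x, y) {
  both_one  <- sum(x == 1 & y == 1)   # t in the formula
  both_zero <- sum(x == 0 & y == 0)   # f in the formula
  (both_one + 0.5 * both_zero) / length(x)
}

x <- c(1, 1, 0, 1, 0)
y <- c(1, 0, 0, 1, 1)

faith_simil(x, y)   # (2 + 0.5 * 1) / 5 = 0.5
```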
\[ simil = \sum{[\vec{x} = \vec{y}]} \]
Distance measures are available in proxyC::dist().
Smoothing of the vectors can be performed when method is “chisquared”, “kullback”, “jeffreys” or “jensen”: the value of smooth will be added to each element of \(\vec{x}\) and \(\vec{y}\).
\[ dist = \sum{|\vec{x} - \vec{y}|} \]
\[ dist = \sum{\frac{|\vec{x} - \vec{y}|}{|\vec{x}| + |\vec{y}|}} \]
\[ dist = \sqrt{\sum{(\vec{x} - \vec{y}) ^ 2}} \]
\[ p = \text{user-provided parameter} \\ dist = \Bigl( \sum{|\vec{x} - \vec{y}| ^ p} \Bigr) ^ \frac{1}{p} \]
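The Minkowski formula above generalises two familiar cases: \(p = 1\) gives the sum of absolute differences (Manhattan distance) and \(p = 2\) gives the Euclidean distance. The sketch below uses a hypothetical helper `minkowski_dist` and compares one value against `stats::dist()` from base R, which computes distances between the rows of a matrix.

```r
# Minkowski distance with a user-provided parameter p
minkowski_dist <- function(x, y, p) {
  sum(abs(x - y) ^ p) ^ (1 / p)
}

x <- c(0, 3, 1)
y <- c(4, 0, 1)

minkowski_dist(x, y, p = 1)   # 4 + 3 + 0 = 7
minkowski_dist(x, y, p = 2)   # sqrt(16 + 9 + 0) = 5

# Cross-check against base R for p = 3
ref <- as.numeric(stats::dist(rbind(x, y), method = "minkowski", p = 3))
all.equal(minkowski_dist(x, y, p = 3), ref)   # TRUE
```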
\[ dist = \sum{[\vec{x} \ne \vec{y}]} \]
\[ dist = \max{|\vec{x} - \vec{y}|} \]
\[ O_{ij} = \text{augmented matrix from } \vec{x} \text{ and } \vec{y} \\ E_{ij} = \text{matrix of expected count for } O_{ij} \\ dist = \sum{\frac{(O_{ij} - E_{ij}) ^ 2}{ E_{ij}}} \]
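The chi-squared distance above can be sketched in base R under two assumptions that the text does not spell out: that the augmented matrix \(O\) is the two vectors stacked as rows, and that the expected counts \(E\) are the usual row-total times column-total over grand-total values. The helper name `chisq_dist` and its `smooth` argument are hypothetical.

```r
# Chi-squared distance from a 2 x n table built from x and y
chisq_dist <- function(x, y, smooth = 0) {
  O <- rbind(x, y) + smooth                       # observed counts (assumed layout)
  E <- outer(rowSums(O), colSums(O)) / sum(O)     # expected counts under independence
  sum((O - E) ^ 2 / E)
}

x <- c(10, 20, 30)
y <- c(20, 20, 20)

chisq_dist(x, y)   # 16 / 3, about 5.33

# Same statistic as Pearson's chi-squared test on the stacked table
ref <- as.numeric(chisq.test(rbind(x, y), correct = FALSE)$statistic)
all.equal(chisq_dist(x, y), ref)   # TRUE
```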
\[ \vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\ \vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\ dist = \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{p}}}} \]
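A base R sketch of the Kullback-Leibler formula above, which also shows why smoothing matters: a zero in \(\vec{x}\) makes \(\vec{q} / \vec{p}\) infinite. The helper name `kullback_dist` is hypothetical, and adding `smooth` before normalising is an assumption about the order of operations.

```r
# Kullback-Leibler divergence in bits, following the formula: sum(q * log2(q / p))
kullback_dist <- function(x, y, smooth = 0) {
  p <- (x + smooth) / sum(x + smooth)
  q <- (y + smooth) / sum(y + smooth)
  sum(q * log2(q / p))
}

x <- c(1, 1, 2)
y <- c(2, 1, 1)
kullback_dist(x, y)   # 0.5 * log2(2) + 0 + 0.25 * log2(0.5) = 0.25

# A zero count in x gives an infinite result unless smoothing is applied
x0 <- c(0, 1, 3)
y0 <- c(1, 1, 2)
is.finite(kullback_dist(x0, y0))               # FALSE
is.finite(kullback_dist(x0, y0, smooth = 1))   # TRUE
```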
\[ \vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\ \vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\ dist = \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{p}}}} + \sum{\vec{p} \log_2{\frac{\vec{p}}{\vec{q}}}} \]
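The formula above (the Jeffreys divergence) is the sum of the two directed Kullback-Leibler terms, so it is symmetric in \(\vec{x}\) and \(\vec{y}\). A minimal sketch with hypothetical helper names:

```r
# Directed term in the same orientation as the formula: sum(q * log2(q / p))
kl <- function(p, q) sum(q * log2(q / p))

# Jeffreys divergence: both directions added together
jeffreys_dist <- function(x, y) {
  p <- x / sum(x)
  q <- y / sum(y)
  kl(p, q) + kl(q, p)
}

x <- c(1, 1, 2)
y <- c(2, 1, 1)

jeffreys_dist(x, y)                                   # 0.25 + 0.25 = 0.5
all.equal(jeffreys_dist(x, y), jeffreys_dist(y, x))   # TRUE
```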
\[ \vec{p} = \frac{\vec{x}}{\sum{\vec{x}}} \\ \vec{q} = \frac{\vec{y}}{\sum{\vec{y}}} \\ \vec{m} = \frac{1}{2} (\vec{p} + \vec{q}) \\ dist = \frac{1}{2} \sum{\vec{q} \log_2{\frac{\vec{q}}{\vec{m}}}} + \frac{1}{2} \sum{\vec{p} \log_2{\frac{\vec{p}}{\vec{m}}}} \]
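The last formula (the Jensen-Shannon divergence) compares both distributions with their midpoint \(\vec{m}\). The sketch below uses a hypothetical helper `jensen_dist` and assumes strictly positive input, since a zero element would produce \(0 \cdot \log_2 0\), which R evaluates to `NaN`.

```r
# Jensen-Shannon divergence in bits, following the formula
jensen_dist <- function(x, y) {
  p <- x / sum(x)
  q <- y / sum(y)
  m <- (p + q) / 2
  0.5 * sum(q * log2(q / m)) + 0.5 * sum(p * log2(p / m))
}

x <- c(1, 1, 2)
y <- c(2, 1, 1)

jensen_dist(x, y)                                 # about 0.061
all.equal(jensen_dist(x, y), jensen_dist(y, x))   # TRUE: the measure is symmetric
jensen_dist(x, 2 * x)                             # 0: only the proportions matter
```

With base-2 logarithms the result lies between 0 and 1.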