README

Version: 0.1.1

TL;DR

| | |
|---|---|
| What | `kibior` is an R package dedicated to easing the pain of data handling in science, notably with biological data. |
| Where | `kibior` uses `Elasticsearch` as its database and search engine. |
| Who | `kibior` is built for data science and data manipulation, for any task involving data, notably `sharing data`. It mainly targets bioinformaticians and, more broadly, data scientists. |
| When | Available now from this repository or from CRAN. |
| Public instances | Use the `$get_kibio_instance()` method to connect to `Kibio` and access known datasets. See `Kibio datasets` at the end of this document for a complete list. |
| Cite this package | In an R session, run `citation("kibior")`. |
| Publication | `coming soon`. |

Main features

This package supports:

- Pushing, pulling, joining, sharing and searching tabular data between an R session and one or multiple Elasticsearch instances/clusters.
- Querying and filtering massive datasets with the Elasticsearch engine.
- Multiple simultaneous Elasticsearch connections to different addresses.
- Method autocompletion in supported environments (e.g. R console, RStudio).
- Importing and exporting datasets from and to files.
- Server-side execution (i.e. on Elasticsearch instances/clusters) for most operations.

How

Install

```r
# Get from CRAN
install.packages("kibior")

# or get the latest version from GitHub (requires the devtools package)
devtools::install_github("regisoc/kibior")
```

Run

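```r
# Load the package
library(kibior)

# Connect to a specific Elasticsearch instance: host name or address, and port number
# (e.g. Kibior$new("localhost", 9200), 9200 being Elasticsearch's default port)
kc <- Kibior$new("server_or_address", port)

# Or try something bigger: connect to the public Kibio instance and list its datasets
kibio <- Kibior$get_kibio_instance()
kibio$list()
```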

Examples

Here is a selection of the features provided by `kibior`. See the Introduction vignette for more advanced usage.

Example: `push` datasets

```r
# Push data (R memory -> Elasticsearch)
library(dplyr)  # provides the %>% pipe and the starwars/storms datasets
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")
```

Example: `pull` datasets

```r
# Pull data with column selection (Elasticsearch -> R memory)
kc$pull("sw", query = "homeworld:(naboo || tatooine)",
        columns = c("name", "homeworld", "height", "mass", "species"))
# See the Introduction vignette for the query syntax
```
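The query string above keeps rows whose `homeworld` is Naboo or Tatooine, and `columns` keeps five fields. As a rough mental model only, the base R sketch below applies the same filter and column selection to a small hypothetical data frame standing in for the `"sw"` index. It is not how `kibior` or Elasticsearch evaluate queries: Elasticsearch matches analyzed text tokens, which the exact, case-insensitive comparison here only approximates.

```r
# Hypothetical stand-in for the "sw" index (made-up rows, not the real starwars data)
sw <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Naboo", "Tatooine", "Alderaan", NA),
  height    = c(96, 172, 150, 180),
  mass      = c(32, 77, 49, NA),
  species   = c("Droid", "Human", "Human", "Human"),
  stringsAsFactors = FALSE
)

# Approximates query = "homeworld:(naboo || tatooine)": case-insensitive match on either value
# (a missing homeworld never matches, since NA %in% ... is FALSE)
keep <- tolower(sw$homeworld) %in% c("naboo", "tatooine")

# Approximates columns = c(...): keep only the requested fields
res <- sw[keep, c("name", "homeworld", "height", "mass", "species")]
print(res)
```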

Example: `copy` datasets

```r
# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")
```

Example: `delete` datasets

```r
# Delete a dataset
kc$delete("sw_copy")
```

Example: `list`, `match` dataset names

```r
# List available datasets
kc$list()

# Match dataset (index) names starting with "s"
kc$match("s*")
```
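`kc$match("s*")` takes a wildcard pattern. To make the wildcard semantics concrete, here is a local base R sketch that filters a hypothetical vector of index names with the same glob. It only illustrates the pattern and does not contact Elasticsearch; treating the output of `kc$list()` as a character vector is an assumption.

```r
# Hypothetical index names, standing in for what kc$list() might report
indices <- c("sw", "st", "sw_copy", "genes")

# utils::glob2rx() turns the shell-style wildcard "s*" into an anchored regular expression
pattern <- utils::glob2rx("s*")

# Keep only the names matching the pattern
matched <- grep(pattern, indices, value = TRUE)
print(matched)
```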

Example: get column names (`columns`) and unique values (`keys`)

```r
# Get the column names of all datasets whose names start with "s"
kc$columns("s*")

# Get the unique values of the "homeworld" column
kc$keys("sw", "homeworld")
```
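Conceptually, `kc$keys("sw", "homeworld")` returns the distinct values of one column. The local equivalent on a hypothetical vector is sketched below. Whether `kibior` reports missing values or orders its result is not stated here, so this sketch's choice to drop `NA` and sort is an assumption made for determinism.

```r
# Hypothetical values of the "homeworld" column (made-up, not the real starwars data)
homeworld <- c("Tatooine", "Naboo", "Tatooine", NA, "Alderaan", "Naboo")

# Distinct non-missing values, sorted only to make the result deterministic
keys <- sort(unique(homeworld[!is.na(homeworld)]))
print(keys)
```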

Example: basic Elasticsearch statistical methods

```r
# Count the number of rows in a dataset
kc$count("st")

# Count the rows matching a query (storms named Anita)
kc$count("st", query = "name:anita")

# Generic statistics on two columns
kc$stats("sw", c("height", "mass"))

# Average of two columns, restricted to rows matching a query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
```
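These counts and averages are computed by Elasticsearch, not in R. To show what numbers to expect, the sketch below computes the same quantities locally on small hypothetical data frames standing in for the `"st"` and `"sw"` indices. Skipping missing values mirrors Elasticsearch's `avg` aggregation, which ignores documents lacking the field; that `kibior` relies on this aggregation is an assumption.

```r
# Hypothetical stand-ins for the "st" and "sw" indices (made-up values)
st <- data.frame(name = c("Anita", "Anita", "Bob", "Clara"), stringsAsFactors = FALSE)
sw <- data.frame(
  homeworld = c("Naboo", "Naboo", "Tatooine", "Naboo"),
  height    = c(96, 180, 172, NA),
  mass      = c(32, 80, 77, 60),
  stringsAsFactors = FALSE
)

# Like kc$count("st"): total number of rows
n_all <- nrow(st)

# Like kc$count("st", query = "name:anita"): rows whose name matches, case-insensitively
n_anita <- sum(tolower(st$name) == "anita")

# Like kc$avg("sw", c("height", "mass"), query = "homeworld:naboo"):
# per-column means over matching rows, skipping missing values
naboo <- sw[tolower(sw$homeworld) == "naboo", c("height", "mass")]
avgs <- colMeans(naboo, na.rm = TRUE)

print(c(n_all = n_all, n_anita = n_anita))
print(avgs)
```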

Example: `join`

```r
# Inner join between:
#   1/ an Elasticsearch-based dataset filtered with a query ("sw"),
#   2/ and an in-memory R dataset (dplyr::starwars)
kc$inner_join("sw", dplyr::starwars,
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
```
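The inner join keeps only rows whose `name` appears on both sides, after the Elasticsearch side has been filtered by `left_query` and trimmed to `left_columns`. The same relational result can be sketched locally with base R's `merge()` on two hypothetical data frames. This illustrates inner-join semantics only; it makes no claim about where `kibior` executes the join or how it names overlapping columns.

```r
# Hypothetical left dataset, standing in for the "sw" index (made-up values)
left <- data.frame(
  name       = c("A", "B", "C"),
  hair_color = c("black", "brown", "black"),
  mass       = c(80, 75, 60),
  height     = c(180, 170, 165),
  stringsAsFactors = FALSE
)
# Hypothetical in-memory right dataset
right <- data.frame(
  name    = c("A", "C", "D"),
  species = c("Human", "Droid", "Human"),
  stringsAsFactors = FALSE
)

# Equivalent of left_query = "hair_color:black" and left_columns = c("name", "mass", "height")
left_sel <- left[left$hair_color == "black", c("name", "mass", "height")]

# Equivalent of by = "name": merge() performs an inner join by default (all = FALSE)
joined <- merge(left_sel, right, by = "name")
print(joined)
```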

kibior: easy scientific data handling, searching and sharing with Elasticsearch
