sbo provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:

- kgram_freqs(): Extract k-gram frequency tables from a text corpus.
- sbo_predictor(): Train a next-word predictor via Stupid Back-off.
- eval_sbo_predictor(): Test text predictions against an independent corpus (see the sketch after the prediction example below).

You can install the latest release of sbo from CRAN:
install.packages("sbo")
You can install the development version of sbo from GitHub:
# install.packages("devtools")
devtools::install_github("vgherard/sbo")
This example shows how to build a text predictor with sbo:
library(sbo)
p <- sbo_predictor(sbo::twitter_train,            # 50k tweets, example dataset
                   N = 3,                         # Train a 3-gram model
                   dict = sbo::twitter_dict,      # Top 1k words appearing in corpus
                   .preprocess = sbo::preprocess, # Preprocessing transformation
                   EOS = ".?!:;"                  # End-Of-Sentence characters
                   )
The object p can now be used to generate predictive text as follows:
predict(p, "i love") # a character vector
#> [1] "you" "it" "my"
predict(p, "you love") # another character vector
#> [1] "<EOS>" "me" "the"
predict(p,
        c("i love", "you love", "she loves", "we love", "you love", "they love")
        ) # a character matrix
#>      [,1]    [,2]  [,3]
#> [1,] "you"   "it"  "my"
#> [2,] "<EOS>" "me"  "the"
#> [3,] "you"   "my"  "me"
#> [4,] "you"   "our" "it"
#> [5,] "<EOS>" "me"  "the"
#> [6,] "to"    "you" "and"
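The functions kgram_freqs() and eval_sbo_predictor() listed above are not demonstrated in this example. The following is a minimal sketch of how they fit into the same workflow; it assumes the held-out dataset sbo::twitter_test shipped with the package and mirrors the arguments of the sbo_predictor() call above, so exact details may differ from your setup.

library(sbo)

# Sketch: build the same predictor in two steps, first extracting k-gram
# frequency tables, then training the predictor from them.
f <- kgram_freqs(sbo::twitter_train,            # text corpus
                 N = 3,                         # order of the N-gram model
                 dict = sbo::twitter_dict,      # fixed dictionary
                 .preprocess = sbo::preprocess, # same preprocessing as above
                 EOS = ".?!:;"                  # End-Of-Sentence characters
                 )
p <- sbo_predictor(f) # predictor trained from the frequency tables

# Evaluate next-word predictions against an independent corpus
# (sbo::twitter_test is assumed here as the package's held-out test set).
set.seed(840)
res <- eval_sbo_predictor(p, test = sbo::twitter_test)
mean(res$correct) # fraction of correctly predicted next words

Going through kgram_freqs() keeps the frequency tables around, so further predictors can be derived without re-processing the corpus.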
For help, see the sbo website.