tknz_sent() and preprocess() now have separate implementations for Windows and for UNIX-like OSs, because the previous C++ implementation behaved unpredictably on Windows (see #30). This fix also introduced minor changes to the tknz_sent() output in some corner cases (e.g. tknz_sent("") now returns character(0), whereas it used to return "").

perplexity() gets a new argument, exp, that allows it to return the cross-entropy per word rather than the perplexity (its exponential).

perplexity.character() gets a new argument, detailed, that allows it to return the cross-entropies and word lengths of individual sentences alongside the total perplexity of the input document. Closes #28.

?kgram_freqs.

R requirements 3.5 -> 4.0.

SystemRequirements: C++11 (see this tidyverse blog post).

verbose arguments now default to FALSE.

probability(), perplexity() and sample_sentences() now accept only language_model class objects as their model argument.

as_dictionary(NULL) now returns an empty dictionary.

.preprocess and .tknz_sent arguments to be ignored in process_sentences().

max_lines and batch_size arguments in kgram_freqs.connection().

dictionary.

dictionary() with batch processing and non-trivial size constraints on vocabulary size.
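The tknz_sent("") change can affect code that post-processes its output. In base R, character(0) is a zero-length vector, while "" is a vector of length one holding an empty string. The sketch below does not call kgrams. The hypothetical variables old_output and new_output simply stand in for the previous and current return values described in the entry.

```r
# Hypothetical stand-ins for the value of tknz_sent("") before and after the change
old_output <- ""            # previous behaviour: one empty sentence
new_output <- character(0)  # current behaviour: no sentences at all

# Counting sentences with length() now gives 0 instead of 1
length(old_output)
length(new_output)

# A loop over the result no longer runs once on an empty string
n_iterations <- 0
for (sentence in new_output) n_iterations <- n_iterations + 1
n_iterations
```

Downstream code that assumed at least one element (for example, indexing result[[1]]) may therefore need a length check.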
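The perplexity() entries rely on the relation stated there: perplexity is the exponential of the per-word cross-entropy. The base R sketch below uses natural-log units accordingly. It also shows one way per-sentence cross-entropies and word lengths could be combined into a document-level value, as a word-count-weighted mean. This assumes each sentence's cross-entropy is averaged over the word count reported for it. The numbers and variable names are hypothetical, and the sketch does not call kgrams.

```r
# Hypothetical per-sentence results: cross-entropy per word (natural log)
# and the number of words scored in each sentence
sentence_cross_entropy <- c(4.2, 3.8, 5.1)
sentence_length <- c(10, 7, 12)

# Each H_s averages over n_s words, so the document-level cross-entropy
# per word is the length-weighted mean of the sentence values
doc_cross_entropy <- sum(sentence_length * sentence_cross_entropy) / sum(sentence_length)

# Perplexity is the exponential of the per-word cross-entropy
doc_perplexity <- exp(doc_cross_entropy)

# Taking the log of the perplexity recovers the cross-entropy
stopifnot(isTRUE(all.equal(log(doc_perplexity), doc_cross_entropy)))

doc_cross_entropy
doc_perplexity
```

A plain mean(sentence_cross_entropy) would give short and long sentences equal weight, which generally differs from the per-word average over the whole document.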