The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
global_idf3
.bind_tf_idf2
.
norm=TRUE
. Cosine nomalization is now performed on tf_idf
values as in the RMeCab package.tf="itf"
and idf="df"
options.pack
now preserves doc_id
type when it’s factor.MECABRC
environment variable or ~/.mecabrc
to set up dictionaries.tokenize
now skips resetting the output encodings to UTF-8.split
is FALSE
.grain_size
argument to tokenize
.bind_lr
function.RcppParallel::parallelFor
instead of tbb::parallel_for
. There are no user’s visible changes.tokenize
can now accept a character vector in addition to a data.frame like object.gbs_tokenize
is now deprecated. Please use the tokenize
function instead.is_blank
.partial
argument to gbs_tokenize
and tokenize
. This argument controls the partial parsing mode, which forces to extract given chunks of sentences when activated.posDebugRcpp
function.bind_tf_idf2
can calculate and bind the term frequency, inverse document frequency, and tf-idf of the tidy text dataset.collapse_tokens
, mute_tokens
, and lexical_density
can be used for handling a tidy text dataset of tokens.tokenize
, it still requires MeCab and its dictionaries installed and available).tokenize
now preserves the original order of docid_field
.bind_tf_idf2
function and is_blank
function.prettify
now can extract columns only specified by col_select
.NEWS.md
file to track changes to the package.tokenize
now takes a data.frame as its first argument, returns a data.frame only. The former function that gets character vector and returns a data.frame or named list was renamed as gbs_tokenize
.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.