- tokenize_tweets() function, which is no longer supported.
- tokenize_ptb() function for Penn Treebank tokenizations (@jrnold) (#12).
- chunk_text() to split long documents into pieces (#30).
- tokenize_tweets() preserves usernames, hashtags, and URLs (@kbenoit) (#44).
- stopwords() function has been removed in favor of using the stopwords package (#46).
- tif package. (#49)
- tokenize_skip_ngrams has been improved to generate unigrams and bigrams, according to the skip definition (#24).
- tokenizers supports (@ironholds) (#26).
- tokenize_skip_ngrams now supports stopwords (#31).
- NA consistently (#33).
- tokenize_words() gains arguments to preserve or strip punctuation and numbers (#48).
- tokenize_skip_ngrams() and tokenize_ngrams() to return properly marked UTF-8 strings on Windows (@patperry) (#58).
- tokenize_tweets() now removes stopwords prior to stripping punctuation, making its behavior more consistent with tokenize_words() (#76).
- tokenize_character_shingles() tokenizer.
- tokenize_words() and
tokenize_word_stems().