tidytext 0.4.3
- Updated package anchors in roxygen comments
tidytext 0.4.2
tidytext 0.4.1
- Fixed bug for FREX stm tidier (#228)
tidytext 0.4.0
- hunspell is now a suggested dependency, thanks to @MichaelChirico
(#221)
- Added stm() tidiers for high FREX and lift words
(#223)
- Removed tweet-specific tokenizers because of changes in upstream
dependencies (#227)
tidytext 0.3.4
- Updated the tidy method for a quanteda dfm because of
the upcoming release of Matrix (#218)
tidytext 0.3.3
- scale_x/y_reordered() now uses a function, labels, as its main input (#200)
- Fixed how to_lower is passed to underlying tokenization
function for character shingles (#208)
- Added support for tidying STM models that use content,
thanks to @jonathanvoelkle (#209)
tidytext 0.3.2
- Update testing for rlang change + testthat 3e
tidytext 0.3.1
- Check for installation of stopwords more gracefully
- Update tidiers and casters for new version of quanteda
tidytext 0.3.0
- Use vdiffr conditionally
- Bug fix/breaking change for collapse argument to unnest_functions(). This argument now takes either NULL (do not collapse text across rows for tokenizing) or a
character vector of variables (use said variables to collapse text
across rows for tokenizing). This fixes a long-standing bug and provides
more consistent behavior, but does change results for many situations
(such as n-gram tokenization).
tidytext 0.2.6
- Move one vignette to pkgdown site, because of dependency
removal
- Move all CI from Travis to GH actions
tidytext 0.2.5
- reorder_within() now handles multiple variables, thanks
to @tmastny
(#170)
- Move stopwords to Suggests so tidytext can be installed on older
versions of R
- Pass to_lower argument to other tokenizing functions,
for more consistent behavior (#175)
- Add glance() method for stm’s estimated regressions,
thanks to @vincentarelbundock (#176)
tidytext 0.2.4
- Update tidying test for new tibble release (inner names for
columns)
- Deprecate SE versions of main functions (have long been replaced by
tidy eval semantics)
- Improve error handling throughout
tidytext 0.2.3
- Wrapper tokenization functions for n-grams, characters, sentences,
tweets, and more, thanks to @ColinFay (#137).
- Simplify get_sentiments() thanks to @jennybc (#151).
- Fix flaky tests for corpus tidiers.
tidytext 0.2.2
- Access NRC lexicon via textdata package
tidytext 0.2.1
- Fix bug in augment() function for stm topic model.
- Warn when tf-idf is negative, thanks to @EmilHvitfeldt (#112).
- Switch from importing broom to importing generics, for lighter
dependencies (#133).
- Add functions for reordering factors (such as for ggplot2 bar plots)
thanks to @tmastny
(#110).
- Update to tibble() where appropriate, thanks to @luisdza (#136).
- Clarify documentation about impact of lowercase conversion on URLs
(#139).
- Change how sentiment lexicons are accessed from package (remove NRC
lexicon entirely, access AFINN and Loughran lexicons via textdata
package so they are no longer included in this package).
tidytext 0.2.0
- Improvements to documentation (#117)
- Fix for NSE thanks to @lepennec (#122).
- Tidier for estimated regressions from stm package
thanks to @jefferickson (#115).
- Tidier for correlated topic model from topicmodels
package (#123).
tidytext 0.1.9
- Updates to documentation (#109) thanks to Emil Hvitfeldt.
- Add new tokenizers for tweets, Penn Treebank to
unnest_tokens().
- Better error message (#111) and code styling.
- Declare dependency for tests.
tidytext 0.1.8
- Updates to documentation (#102), README, and vignettes.
- Add tokenizing by character shingles thanks to Kanishka Misra
(#105).
- Fix tests for skip grams thanks to Lincoln Mullen (#106).
tidytext 0.1.7
- Updated more docs/tests so package can build on R-oldrel. (Still
trying!)
tidytext 0.1.6
- unnest_tokens can now unnest a data frame with a list
column (which formerly threw the error "unnest_tokens expects all columns of input to be atomic vectors (not lists)").
The unnested result repeats the objects within each list. (It’s still
not possible when collapse = TRUE, in which case tokens can span
multiple lines).
- Add get_tidy_stopwords() to obtain stopword lexicons in
multiple languages in a tidy format.
- Add a dataset nma_words of negators, modals, and
adverbs that affect sentiment analysis (#55).
- Updated various vignettes/docs/tests so package can build on
R-oldrel.
tidytext 0.1.5
- Change how NA values are handled in unnest_tokens so they no longer cause other columns to
become NA (#82).
- Update tidiers and casters to align with quanteda v1.0 (#87).
- Handle input/output object classes (such as data.table)
consistently (#88).
tidytext 0.1.4
- Fix tidier for quanteda dictionary for correct class (#71).
- Add a pkgdown
site.
- Convert NSE from underscored function to tidyeval
(unnest_tokens, bind_tf_idf, all sparse
casters) (#67, #74).
- Added tidiers for topic models from the stm package
(#51).
tidytext 0.1.3
- get_sentiments now works regardless of whether tidytext has been loaded or not (#50).
- unnest_tokens now supports data.table objects
(#37).
- Fixed to_lower parameter in unnest_tokens to work properly for all tokenizing options.
- Updated tidy.corpus, glance.corpus, tests,
and vignette for changes to quanteda API
- Removed the deprecated pair_count function, which is
now in the in-development widyr package
- Added tidiers for LDA models from the mallet package
- Added the Loughran and McDonald dictionary of sentiment words
specific to financial reports
- unnest_tokens preserves custom attributes of data
frames and data.tables
tidytext 0.1.2
- Updated DESCRIPTION to require purrr >= 0.1.1.
- Fixed cast_sparse, cast_dtm, and other
sparse casters to ignore groups in the input (#19)
- Changed unnest_tokens so that it no longer uses tidyr’s
unnest, but rather a custom version that removes some overhead. In some
experiments, this sped up unnest_tokens on large inputs by about 40%.
This also moves tidyr from Imports to Suggests for now.
- unnest_tokens now checks that there are no list columns
in the input, and raises an error if present (since those cannot be
unnested).
- Added a format argument to unnest_tokens so that it can
process html, xml, latex or man pages using the hunspell package, though
only when token = "words".
- Added a get_sentiments function that takes the name of
a lexicon (“nrc”, “bing”, or “sentiment”) and returns just that
sentiment data frame (#25)
tidytext 0.1.1
- Added documentation for n-grams, skip n-grams, and regex
- Added codecov and appveyor
- Added tidiers for LDA objects from topicmodels and a vignette on
topic modeling
- Added function to calculate tf-idf of a tidy text dataset and a
tf-idf vignette
- Fixed a bug when tidying by line/sentence/paragraph/regex and there
are multiple non-text columns
- Fixed a bug when unnesting using n-grams and skip n-grams (entire
text was not being collapsed)
- Added ability to pass a (custom tokenizing) function to token. Also
added a collapse argument that makes the choice whether to combine lines
before tokenizing explicit.
- Changed tidy.dictionary to return a tbl_df rather than a
data.frame
- Updated cast_sparse to work with dplyr 0.5.0
- Deprecated the pair_count function, which has been
moved to pairwise_count in the widyr package. This will
be removed entirely in a future version.
tidytext 0.1.0
- Initial release for text mining using tidy tools