This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?
- Create a file named prefix_*.R in the R/ folder, where * is the name of the dataset. Supported prefixes include dataset_ and lexicon_.
- Inside that file, create three functions named download_*(), process_*() and dataset_*().
- The download_*() function should take one argument named folder_path. It has two tasks. First, it should check whether the file is already downloaded; if it is, the function should return invisible(). If the file isn't at the path, the function should download the file to that path.
- The process_*() function should take two arguments, folder_path and name_path. folder_path denotes the path to the file downloaded by download_*(), and name_path is the path where the polished data should live. The main point of process_*() is to turn the downloaded file into a .rds file containing a tidy tibble.
- The dataset_*() function should wrap load_dataset().
- Add the process_*() function to the named list process_functions in the file process_functions.R.
- Add the download_*() function to the named list download_functions in the file download_functions.R.
- Update the print_info list in the info.R file.
- Add dataset_*.R to the @include tags in download_functions.R.
- Update README.Rmd.
- Update _pkgdown.yml.
- Update the NEWS.md file.
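The download and process steps above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the dataset name "example", the file name example.csv, and the URL are hypothetical placeholders, and the two named lists are assumed to be keyed by dataset name. The dataset_*() wrapper is left out because the arguments of load_dataset() are not described here.

```r
# Hypothetical dataset "example"; the URL below is a placeholder, not a real source.
download_example <- function(folder_path) {
  file_path <- file.path(folder_path, "example.csv")
  # Task 1: nothing to do when the file is already downloaded.
  if (file.exists(file_path)) {
    return(invisible())
  }
  # Task 2: otherwise download the file to that path.
  utils::download.file(
    url = "https://example.com/example.csv",
    destfile = file_path
  )
}

process_example <- function(folder_path, name_path) {
  raw <- utils::read.csv(
    file.path(folder_path, "example.csv"),
    stringsAsFactors = FALSE
  )
  # Convert to a tibble when the tibble package is installed;
  # otherwise the data stays a plain data frame.
  if (requireNamespace("tibble", quietly = TRUE)) {
    raw <- tibble::as_tibble(raw)
  }
  # Store the polished data as an .rds file at name_path.
  saveRDS(raw, name_path)
}

# Assumed shape of the registration lists: one entry per dataset name.
download_functions <- list(example = download_example)
process_functions <- list(example = process_example)

# Usage without network access: place a small CSV where download_example()
# expects it, so the function returns early instead of downloading.
folder <- tempfile("textdata_example_")
dir.create(folder)
utils::write.csv(
  data.frame(word = c("good", "bad"), value = c(1, -1)),
  file.path(folder, "example.csv"),
  row.names = FALSE
)
download_functions[["example"]](folder)
rds_path <- file.path(folder, "example.rds")
process_functions[["example"]](folder, rds_path)
print(readRDS(rds_path))
```

Looking the functions up by name in the two lists mirrors the registration steps in the checklist; in the package itself the lists live in download_functions.R and process_functions.R.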
What are the guidelines for adding datasets?
- Use "word" instead of "words" for column names.
- For datasets that come with both a testing and a training set, let the user pick which one to retrieve with a split argument, similar to how dataset_ag_news() does.
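One way to offer such a split argument in base R is match.arg(), sketched below. The helper name example_file_name() and the file naming scheme are hypothetical, and whether dataset_ag_news() is implemented exactly this way is an assumption.

```r
# Hypothetical helper: maps the user's choice of split to a file name.
example_file_name <- function(split = c("train", "test")) {
  # match.arg() picks the first choice by default and
  # raises an error for values outside the allowed set.
  split <- match.arg(split)
  paste0("example_", split, ".rds")
}

example_file_name()        # "example_train.rds"
example_file_name("test")  # "example_test.rds"
```

Listing the allowed values in the function signature keeps them visible in the generated documentation and gives users an informative error on a typo.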