Installation, Initialization, and Data Cleaning

Prerequisites

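leadeR relies on spaCy, a Python NLP library, which it accesses through the spacyr R package. You will need a working Python installation with spaCy and its English language model, en_core_web_sm.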

Install spaCy and the English model from a terminal:

pip install spacy
python -m spacy download en_core_web_sm

Installing leadeR

Install leadeR from GitHub:

# install.packages("remotes")
remotes::install_github("mmukaigawara/leadeR")

Initialization

Before using any leadeR function, initialize spaCy and (optionally) set a seed for reproducibility of bootstrap results.

library(leadeR)
library(data.table)

spacyr::spacy_initialize()

set.seed(02138)
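The seed matters because bootstrap procedures resample the data at random. The toy example below is not part of leadeR. It uses only base R and a made-up numeric vector x to show that two bootstrap runs started from the same seed return identical results. Running it resets the random number generator, so call set.seed(02138) again before continuing with the leadeR steps.

```r
# Toy illustration only (not leadeR code): bootstrap the mean of a
# made-up vector twice from the same seed and compare the results.
x <- c(3.1, 4.7, 2.2, 5.9, 4.4, 3.8, 6.1, 2.9)

bootstrap_means <- function(x, n_boot = 1000) {
  # Each replicate resamples x with replacement and records its mean
  replicate(n_boot, mean(sample(x, replace = TRUE)))
}

set.seed(02138)  # the literal 02138 is read by R as the number 2138
run1 <- bootstrap_means(x)

set.seed(02138)  # same seed, so the same sequence of resamples
run2 <- bootstrap_means(x)

identical(run1, run2)
#> [1] TRUE
```

Without the second set.seed() call, run2 would draw different resamples and generally differ from run1.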

Sample data

The package ships with three speeches by John F. Kennedy:

Dataset       Date                Occasion
jfk19610120   January 20, 1961    Inaugural Address
jfk19610925   September 25, 1961  Address Before the UN General Assembly
jfk19630610   June 10, 1963       Commencement Address at American University

To take a quick look at one of the speeches:

head(jfk19610120)

Text cleaning

Speech transcripts often contain editorial annotations in brackets, parentheses, or curly braces. The clean_text() function removes these and normalizes whitespace.

jfk1 <- clean_text(jfk19610120)
jfk2 <- clean_text(jfk19610925)
jfk3 <- clean_text(jfk19630610)
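The following rough base-R approximation makes the cleaning behavior described above concrete. It is a hypothetical sketch, not leadeR's actual clean_text() implementation. The function name clean_text_sketch and its regular expressions are assumptions. It works on a plain character vector and does not handle nested brackets.

```r
# Hypothetical sketch for illustration; leadeR's clean_text() may differ.
# Input: a character vector. Nested brackets are not handled.
clean_text_sketch <- function(x) {
  # Replace [square-bracket], (parenthetical), and {curly-brace}
  # annotations with a space
  x <- gsub("\\[[^\\]]*\\]", " ", x, perl = TRUE)
  x <- gsub("\\([^)]*\\)", " ", x, perl = TRUE)
  x <- gsub("\\{[^}]*\\}", " ", x, perl = TRUE)
  # Collapse runs of whitespace (spaces, tabs, newlines) into one space
  x <- gsub("\\s+", " ", x, perl = TRUE)
  # Remove leading and trailing whitespace
  trimws(x)
}

clean_text_sketch("[Applause] Thank you (laughter) all   very much {sic} for coming.")
#> [1] "Thank you all very much for coming."
```

Annotations are replaced with a space rather than deleted outright, so the words on either side stay separated. The whitespace pass then tidies up the extra spaces.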

Users may need additional cleaning steps depending on the source of their text data (e.g., removing headers, footers, or speaker labels).
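As one illustration, suppose a transcript arrives as a character vector with one element per line. It contains page footers such as "Page 2" and speaker labels such as "THE PRESIDENT:". The helper below is hypothetical and not part of leadeR. Its patterns are assumptions about this made-up format. It drops the footers, strips the labels, and removes lines left empty. Adjust the patterns to your own source before relying on it.

```r
# Hypothetical extra cleaning step (not part of leadeR). The footer and
# speaker-label patterns are assumptions about one made-up transcript format.
# Input: a character vector with one element per line of text.
extra_clean_sketch <- function(lines) {
  # Drop page footers of the form "Page <number>"
  lines <- lines[!grepl("^\\s*Page \\d+\\s*$", lines, perl = TRUE)]
  # Strip a leading speaker label in capitals, e.g. "THE PRESIDENT:"
  lines <- sub("^[A-Z][A-Z .]*:\\s*", "", lines, perl = TRUE)
  # Drop lines that are empty or whitespace-only
  lines[nzchar(trimws(lines))]
}

raw <- c("THE PRESIDENT: Thank you very much.",
         "Page 2",
         "",
         "Good afternoon, everyone.")
extra_clean_sketch(raw)
#> [1] "Thank you very much."      "Good afternoon, everyone."
```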