Installation, Initialization, and Data Cleaning

Prerequisites

leadeR relies on spaCy, a Python NLP library, via the spacyr R package. You will need:

Python (3.8 or later)
spaCy with an English language model

Install spaCy and the English model from a terminal:

pip install spacy
python -m spacy download en_core_web_sm
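
Before moving on, it can help to confirm from R that a Python interpreter is visible at all. The following is a minimal sketch using base R only; `find_python` is a hypothetical helper, not part of leadeR or spacyr, and the interpreter it finds on the PATH is not necessarily the one spacyr will use.

```r
# Sketch: look for a Python interpreter on the PATH from within R.
# `find_python` is a hypothetical helper, not a leadeR or spacyr function.
find_python <- function(candidates = c("python3", "python")) {
  paths <- Sys.which(candidates)   # named character vector; "" when not found
  found <- paths[nzchar(paths)]    # keep only the candidates that resolved
  if (length(found) == 0) NA_character_ else unname(found[1])
}

py <- find_python()
if (is.na(py)) {
  message("No Python interpreter found on the PATH.")
} else {
  message("Python found at: ", py)
}
```

If no interpreter is reported, install Python (or fix the PATH) before running the pip commands above.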

Installing leadeR

Install leadeR from GitHub:

# install.packages("remotes")
remotes::install_github("mmukaigawara/leadeR")

Initialization

Before using any leadeR function, initialize spaCy and (optionally) set a seed for reproducibility of bootstrap results.

library(leadeR)
library(data.table)

spacyr::spacy_initialize()

set.seed(02138)
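
To illustrate why the seed matters, here is a small base R sketch of a bootstrap run twice with the same seed. The data and the `boot_means` helper are made up for illustration, and the sketch assumes (as the text above suggests) that leadeR's bootstrap draws from R's random number generator.

```r
# Sketch: the same seed yields the same bootstrap resamples (base R only).
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)   # toy data, invented for this example

# Hypothetical helper: bootstrap distribution of the sample mean
boot_means <- function(x, n_boot = 200) {
  replicate(n_boot, mean(sample(x, replace = TRUE)))
}

set.seed(2138)
run1 <- boot_means(x)
set.seed(2138)
run2 <- boot_means(x)

identical(run1, run2)    # TRUE: resetting the seed reproduces the resamples

# R reads the literal 02138 as the number 2138, so both spellings set the same seed.
identical(02138, 2138)   # TRUE
```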

Sample data

The package ships with three speeches by John F. Kennedy:

Dataset	Date	Occasion
`jfk19610120`	January 20, 1961	Inaugural Address
`jfk19610925`	September 25, 1961	Address Before the UN General Assembly
`jfk19630610`	June 10, 1963	Commencement Address at American University

head(jfk19610120)

Text cleaning

Speech transcripts often contain editorial annotations in brackets, parentheses, or curly braces. The clean_text() function removes these and normalizes whitespace.

jfk1 <- clean_text(jfk19610120)
jfk2 <- clean_text(jfk19610925)
jfk3 <- clean_text(jfk19630610)
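
For intuition, the kind of cleaning described above can be sketched in base R. This is not leadeR's implementation of clean_text(); `strip_annotations` is a hypothetical helper that operates on plain character vectors, and the real function may handle nesting, input types, and edge cases differently.

```r
# Sketch of bracket-style annotation removal in base R.
# `strip_annotations` is a hypothetical helper, NOT leadeR's clean_text().
strip_annotations <- function(text) {
  text <- gsub("\\[[^]]*\\]", " ", text, perl = TRUE)   # [square brackets]
  text <- gsub("\\([^)]*\\)", " ", text, perl = TRUE)   # (parentheses)
  text <- gsub("\\{[^}]*\\}", " ", text, perl = TRUE)   # {curly braces}
  text <- gsub("\\s+", " ", text, perl = TRUE)          # collapse whitespace runs
  trimws(text)                                          # drop leading/trailing space
}

strip_annotations("Ask not [applause] what your country  can do (laughter) for you.")
# "Ask not what your country can do for you."
```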

Users may need additional cleaning steps depending on the source of their text data (e.g., removing headers, footers, or speaker labels).
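
As one example of such a step, speaker labels at the start of a line could be removed with a regular expression. The sketch below uses base R; `drop_speaker_labels` and the label pattern (an all-caps label followed by a colon) are assumptions for illustration and would need adjusting to the conventions of a given transcript source.

```r
# Sketch: drop speaker labels such as "THE PRESIDENT:" at the start of lines.
# `drop_speaker_labels` and its pattern are hypothetical, not part of leadeR.
drop_speaker_labels <- function(lines) {
  # All-caps label (letters, spaces, periods) followed by a colon at line start
  sub("^\\s*[A-Z][A-Z .]*:\\s*", "", lines, perl = TRUE)
}

raw <- c(
  "THE PRESIDENT: Thank you.",
  "Q: Mr. President, a question.",
  "We choose to go."
)
drop_speaker_labels(raw)
# "Thank you." "Mr. President, a question." "We choose to go."
```

Lines without a leading label are returned unchanged, so the helper can be applied to a whole transcript before calling clean_text().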
