lexicon is a collection of lexical hash tables, dictionaries, and word lists. The data prefixes help to categorize the data types:
Prefix | Meaning |
---|---|
key_ | A data.frame with a lookup and return value |
hash_ | A keyed data.table hash table |
freq_ | A data.table of terms with frequencies |
profanity_ | A profane words vector |
pos_ | A part of speech vector |
pos_df_ | A part of speech data.frame |
sw_ | A stopword vector |
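
Because the prefixes are part of the object names, they can be used to filter the datasets in an installed copy of the package. The following is a minimal sketch, assuming lexicon is already installed; it relies only on base R's `data()` listing, and the variable names (`items`, `stopword_sets`, `hash_tables`) are illustrative rather than part of the package.

```r
# List every dataset shipped with the installed lexicon package.
# data() returns a character matrix; the "Item" column holds the object names.
items <- data(package = "lexicon")$results[, "Item"]

# Keep only the names that start with a given prefix.
stopword_sets <- grep("^sw_", items, value = TRUE)    # stopword vectors
hash_tables   <- grep("^hash_", items, value = TRUE)  # keyed data.table lookups

print(stopword_sets)
print(hash_tables)
```

Anchoring the pattern with `^` matters here, since a prefix such as `pos_` would otherwise also match inside unrelated names.
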
Data | Description |
---|---|
cliches | Common Cliches |
common_names | First Names (U.S.) |
constraining_loughran_mcdonald | Loughran-McDonald Constraining Words |
emojis_sentiment | Emoji Sentiment Data |
freq_first_names | Frequent U.S. First Names |
freq_last_names | Frequent U.S. Last Names |
function_words | Function Words |
grady_augmented | Augmented List of Grady Ward’s English Words and Mark Kantrowitz’s Names List |
hash_emojis | Emoji Description Lookup Table |
hash_emojis_identifier | Emoji Identifier Lookup Table |
hash_emoticons | Emoticons |
hash_grady_pos | Grady Ward’s Moby Parts of Speech |
hash_internet_slang | List of Internet Slang and Corresponding Meanings |
hash_lemmas | Lemmatization List |
hash_nrc_emotions | NRC Emotion Table |
hash_sentiment_emojis | Emoji Sentiment Polarity Lookup Table |
hash_sentiment_huliu | Hu Liu Polarity Lookup Table |
hash_sentiment_jockers | Jockers Sentiment Polarity Table |
hash_sentiment_jockers_rinker | Combined Jockers & Rinker Polarity Lookup Table |
hash_sentiment_loughran_mcdonald | Loughran-McDonald Polarity Table |
hash_sentiment_nrc | NRC Sentiment Polarity Table |
hash_sentiment_senticnet | Augmented SenticNet Polarity Table |
hash_sentiment_sentiword | Augmented Sentiword Polarity Table |
hash_sentiment_slangsd | SlangSD Sentiment Polarity Table |
hash_sentiment_socal_google | SO-CAL Google Polarity Table |
hash_valence_shifters | Valence Shifters |
key_contractions | Contraction Conversions |
key_corporate_social_responsibility | Nadra Pencle and Irina Malaescu’s Corporate Social Responsibility Dictionary |
key_grade | Grades Data Set |
key_rating | Ratings Data Set |
key_regressive_imagery | Colin Martindale’s English Regressive Imagery Dictionary |
key_sentiment_jockers | Jockers Sentiment Data Set |
modal_loughran_mcdonald | Loughran-McDonald Modal List |
nrc_emotions | NRC Emotions |
pos_action_verb | Action Word List |
pos_df_irregular_nouns | Irregular Nouns Word Dataframe |
pos_df_pronouns | Pronouns |
pos_interjections | Interjections |
pos_preposition | Preposition Words |
profanity_alvarez | Alejandro U. Alvarez’s List of Profane Words |
profanity_arr_bad | Stack Overflow user2592414’s List of Profane Words |
profanity_banned | bannedwordlist.com’s List of Profane Words |
profanity_racist | Titus Wormer’s List of Racist Words |
profanity_zac_anger | Zac Anger’s List of Profane Words |
sw_dolch | Leveled Dolch List of 220 Common Words |
sw_fry_100 | Fry’s 100 Most Commonly Used English Words |
sw_fry_1000 | Fry’s 1000 Most Commonly Used English Words |
sw_fry_200 | Fry’s 200 Most Commonly Used English Words |
sw_fry_25 | Fry’s 25 Most Commonly Used English Words |
sw_jockers | Matthew Jockers’s Expanded Topic Modeling Stopword List |
sw_loughran_mcdonald_long | Loughran-McDonald Long Stopword List |
sw_loughran_mcdonald_short | Loughran-McDonald Short Stopword List |
sw_lucene | Lucene Stopword List |
sw_mallet | MALLET Stopword List |
sw_python | Python Stopword List |
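
As a rough illustration of how two of these data types might be combined, the sketch below removes stopwords with a `sw_` vector and then looks up the remaining tokens in a `hash_` table. It assumes that lexicon and data.table are installed and that the hash table still carries its key after loading; the `tokens` vector and the variable names are made up for the example.

```r
library(data.table)  # attached so that joins on the keyed hash_ tables work as expected

# A made-up token vector for illustration.
tokens <- c("the", "lexicon", "package", "is", "a", "good", "resource")

# sw_ objects are plain character vectors, so base R set operations apply.
content_tokens <- tokens[!tokens %in% lexicon::sw_fry_100]

# hash_ objects are keyed data.tables; .() joins on the table's key.
# nomatch = NULL drops tokens that have no entry in the table.
polarity <- lexicon::hash_sentiment_jockers_rinker[.(content_tokens), nomatch = NULL]

print(content_tokens)
print(polarity)
```

Joining through the key avoids hard-coding column names, which may differ from one `hash_` table to another.
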
To download the development version of lexicon, either download the zip ball or tar ball, decompress it, and run R CMD INSTALL on it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/lexicon")
You are welcome to:
- submit suggestions and bug-reports at: https://github.com/trinker/lexicon/issues
- send a pull request on: https://github.com/trinker/lexicon/
- compose a friendly e-mail to: tyler.rinker@gmail.com