The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

tidyEmoji

R-CMD-check CRAN status Lifecycle: stable

tidyEmoji helps you discover, count, categorise and sentiment-score the emoji in any text column β€” social-media posts, product reviews, chat logs, survey responses, support tickets β€” and hands the result back as tidy data frames that drop straight into a tidyverse workflow.

Unicode is awkward to work with and not every code point is an emoji, which makes emoji statistics fiddly. tidyEmoji takes care of that, including grapheme-aware detection so skin-tone modifiers (πŸ‘πŸ½) and multi-person sequences (πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦) are treated as a single emoji rather than being split apart.

Installation

install.packages("tidyEmoji")

# development version
# install.packages("devtools")
devtools::install_github("PursuitOfDataScience/tidyEmoji")

Usage

library(tidyEmoji)
library(dplyr)

reviews <- data.frame(text = c("Best purchase ever \U0001f600\U0001f60d",
                               "It broke after a day \U0001f621",
                               "Does the job.",
                               "Wearing my mask \U0001f637\U0001f637",
                               "Shipped fast \U0001f3c1\U0001f600"))

How much emoji is in the data?

reviews %>% emoji_summary(text)        # entries with emoji vs. total
#> # A tibble: 1 Γ— 2
#>   emoji_tweets total_tweets
#>          <int>        <int>
#> 1            4            5
reviews %>% emoji_filter(text)         # keep only the rows that have emoji
#> # A tibble: 4 Γ— 1
#>   text                   
#>   <chr>                  
#> 1 Best purchase ever πŸ˜€πŸ˜
#> 2 It broke after a day 😑
#> 3 Wearing my mask 😷😷   
#> 4 Shipped fast πŸπŸ˜€

Which emoji are most common?

reviews %>% emoji_frequency(text)      # every emoji, with name + category
#> # A tibble: 5 Γ— 5
#>   emoji name                         shortcode      group                 n
#>   <chr> <chr>                        <chr>          <chr>             <int>
#> 1 πŸ˜€    grinning face                grinning       Smileys & Emotion     2
#> 2 😷    face with medical mask       mask           Smileys & Emotion     2
#> 3 🏁    chequered flag               checkered_flag Flags                 1
#> 4 😍    smiling face with heart-eyes heart_eyes     Smileys & Emotion     1
#> 5 😑    enraged face                 rage           Smileys & Emotion     1
reviews %>% top_n_emojis(text, n = 3)  # just the most frequent
#> # A tibble: 3 Γ— 4
#>   emoji_name     unicode emoji_category        n
#>   <chr>          <chr>   <chr>             <int>
#> 1 grinning       πŸ˜€      Smileys & Emotion     2
#> 2 mask           😷      Smileys & Emotion     2
#> 3 checkered_flag 🏁      Flags                 1

Pull the emoji out

emoji_tokens() gives one tidy row per emoji occurrence, with its name, category and sentiment β€” ready to count, join or plot.

reviews %>% emoji_tokens(text)
#> # A tibble: 7 Γ— 5
#>   text                    .emoji .emoji_name    .emoji_category .emoji_sentiment
#>   <chr>                   <chr>  <chr>          <chr>                      <dbl>
#> 1 Best purchase ever πŸ˜€πŸ˜ πŸ˜€     grinning face  Smileys & Emot…            0.572
#> 2 Best purchase ever πŸ˜€πŸ˜ 😍     smiling face … Smileys & Emot…            0.678
#> 3 It broke after a day 😑 😑     enraged face   Smileys & Emot…           -0.173
#> 4 Wearing my mask 😷😷    😷     face with med… Smileys & Emot…           -0.171
#> 5 Wearing my mask 😷😷    😷     face with med… Smileys & Emot…           -0.171
#> 6 Shipped fast πŸπŸ˜€       🏁     chequered flag Flags                      0.571
#> 7 Shipped fast πŸπŸ˜€       πŸ˜€     grinning face  Smileys & Emot…            0.572

Categorise and score sentiment

reviews %>% emoji_categorize(text)     # which Unicode categories each row spans
#> # A tibble: 4 Γ— 2
#>   text                    .emoji_category        
#>   <chr>                   <chr>                  
#> 1 Best purchase ever πŸ˜€πŸ˜ Smileys & Emotion      
#> 2 It broke after a day 😑 Smileys & Emotion      
#> 3 Wearing my mask 😷😷    Smileys & Emotion      
#> 4 Shipped fast πŸπŸ˜€       Flags|Smileys & Emotion
reviews %>% emoji_sentiment(text)      # mean emoji sentiment per row (-1 to +1)
#> # A tibble: 5 Γ— 3
#>   text                    .emoji_n .emoji_sentiment
#>   <chr>                      <int>            <dbl>
#> 1 Best purchase ever πŸ˜€πŸ˜        2            0.625
#> 2 It broke after a day 😑        1           -0.173
#> 3 Does the job.                  0           NA    
#> 4 Wearing my mask 😷😷           2           -0.171
#> 5 Shipped fast πŸπŸ˜€              2            0.572

emoji_sentiment() uses the bundled Emoji Sentiment Ranking lexicon (Kralj Novak et al., 2015). See the package vignette for a fuller tour.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.