The hackeRnews package is an R wrapper for the Hacker News API. It was developed as a project for the Advanced R classes at the Warsaw University of Technology.
The hackeRnews package is available on CRAN and can be installed with:
install.packages("hackeRnews")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("szymanskir/hackeRnews")
The Hacker News API returns a single item per request, so retrieving 200 items requires 200 separate API calls. Processing that many requests sequentially takes a significant amount of time. To address this, the hackeRnews package uses the built-in support for parallel requests in httr2 (httr2::req_perform_parallel).
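To illustrate the idea, here is a minimal sketch of fetching several items concurrently with httr2. It is not the package's actual implementation: the helper names item_url and fetch_items are hypothetical, and the endpoint shown is the public Hacker News item URL. Error handling and rate limiting are left out.

```r
# Hypothetical helper: build the item URL(s) for the given Hacker News item ids.
# as.integer() avoids scientific notation for large ids (e.g. 1e+05).
item_url <- function(ids) {
  sprintf("https://hacker-news.firebaseio.com/v0/item/%d.json", as.integer(ids))
}

# Hypothetical helper: fetch several items concurrently instead of one by one.
fetch_items <- function(ids) {
  # One request object per item, since the API serves one item per call.
  requests <- lapply(item_url(ids), httr2::request)
  # Perform all requests concurrently.
  responses <- httr2::req_perform_parallel(requests)
  # Parse each JSON response body into an R list.
  lapply(responses, httr2::resp_body_json)
}
```

Calling fetch_items(c(1, 2, 3)) would issue the three requests concurrently and return one parsed item per id, assuming all requests succeed.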
# Example: word cloud of the words used in the latest job story titles
library(hackeRnews)
library(dplyr)
library(ggplot2)
library(ggwordcloud)
library(stringr)
library(tidytext)

job_stories <- get_latest_job_stories()

# get titles, normalize used words, remove non alphabet characters
title_words <- unlist(
  lapply(job_stories, function(job_story) job_story$title) %>%
    str_replace_all('[^A-Za-z]', ' ') %>%  # '|' inside a character class is a literal pipe, so it is dropped here
    str_replace_all('\\s+', ' ') %>%
    str_to_upper() %>%
    str_split(' ')
)

# remove stop words
data('stop_words')
df <- data.frame(word = title_words, stringsAsFactors = FALSE) %>%
  filter(str_length(word) > 0 & !str_to_lower(word) %in% stop_words$word) %>%
  count(word)

# add colors to beautify visualization
df <- df %>%
  mutate(color = factor(sample(10, nrow(df), replace = TRUE)))

word_cloud <- ggplot(df, aes(label = word, size = n, color = color)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 15)

word_cloud

# Example: bar chart of the current best stories and their scores
library(hackeRnews)
library(stringr)
library(ggplot2)

best_stories <- get_best_stories(max_items = 10)

df <- data.frame(
  title = sapply(best_stories, function(best_story) str_wrap(best_story$title, 42)),
  score = sapply(best_stories, function(best_story) best_story$score),
  stringsAsFactors = FALSE
)

# order the titles by score so that the bars are sorted
df$title <- factor(df$title, levels = df$title[order(df$score)])

best_stories_plot <- ggplot(df, aes(x = title, y = score, label = score)) +
  geom_col() +
  geom_label() +
  coord_flip() +
  ggtitle('Best stories') +
  xlab('Story title') +
  ylab('Score')

best_stories_plot

# Example: sentiment of the comments under the two best stories
library(hackeRnews)
library(dplyr)
library(ggplot2)
library(stringr)
library(textdata)
library(tidytext)

data('stop_words')

best_stories <- get_best_stories(max_items = 2)

# one data frame of (story_title, word) rows per story, combined into a single data frame
words_by_story <- lapply(best_stories, function(story) {
  words <- get_comments(story) %>%
    pull(text) %>%
    str_replace_all('[^A-Za-z]', ' ') %>%
    str_to_lower() %>%
    str_replace_all('\\s+', ' ') %>%
    str_split(' ', simplify = TRUE)

  # drop empty strings and stop words
  filtered_words <- words[words != ""] %>%
    setdiff(stop_words$word)

  data.frame(
    story_title = rep(story$title, length(filtered_words)),
    word = filtered_words,
    stringsAsFactors = FALSE
  )
}) %>% bind_rows()

sentiment <- get_sentiments("afinn")

sentiment_plot <- words_by_story %>%
  inner_join(sentiment, by = "word") %>%
  ggplot(aes(x = value, fill = story_title)) +
  geom_density(alpha = 0.5) +
  scale_x_continuous(breaks = c(-5, 0, 5),
                     labels = c("Negative", "Neutral", "Positive"),
                     limits = c(-6, 6)) +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.position = 'top') +
  labs(fill = 'Story') +
  ggtitle('Sentiment for 2 chosen stories')

sentiment_plot