meanr is an R package for sentiment analysis. Its main function, `score()`, computes sentiment as a simple sum of the counts of positive (+1) and negative (-1) sentiment words in a piece of text. More sophisticated techniques are available in R, for example the qdap package's `polarity()` function. This package uses the Hu and Liu sentiment dictionary, same as everybody else.
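The scoring rule described above can be sketched in a few lines of base R. This is not meanr's implementation, which is written in C. The tiny `pos_words`/`neg_words` lexicon is a hypothetical stand-in for the Hu and Liu dictionary, and the tokenization (lowercase, replace punctuation with spaces, split on whitespace) is an assumption made for illustration.

```r
# Hypothetical toy lexicon standing in for the Hu and Liu dictionary
pos_words <- c("good", "great", "happy", "love")
neg_words <- c("bad", "sad", "hate", "terrible")

# Sketch of dictionary-based scoring: +1 per positive word, -1 per negative word
toy_score <- function(x) {
  # Assumed tokenization: lowercase, punctuation -> space, split on whitespace
  cleaned <- gsub("[[:punct:]]", " ", tolower(x))
  tokens <- strsplit(trimws(cleaned), "[[:space:]]+")
  positive <- vapply(tokens, function(w) sum(w %in% pos_words), integer(1))
  negative <- vapply(tokens, function(w) sum(w %in% neg_words), integer(1))
  data.frame(positive = positive, negative = negative,
             score = positive - negative, wc = lengths(tokens))
}

toy_score(c("I love this, it is great!", "Bad day. I hate Mondays."))
```

The first string has two positive words out of six tokens (score 2); the second has two negative words out of five tokens (score -2).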
meanr is significantly faster than everything else I tried (which was actually the motivation for its creation), but I don't claim to have tried everything. I believe the package is quite fast. However, the method is merely a dictionary lookup, so it ignores word context, which more sophisticated methods take into account. On the other hand, those more sophisticated tools are very slow. If you have a large volume of text, I believe there is value in getting a "first glance" at the data, and meanr allows you to do this very quickly.
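To make the context limitation concrete: a pure lookup scores each word on its own, so a negated phrase still counts as positive. The snippet below uses a hypothetical two-word lexicon and simple whitespace tokenization. It illustrates dictionary lookup in general and is not meanr's code.

```r
# Hypothetical two-word lexicon; real dictionaries are far larger
pos_words <- "good"
neg_words <- "bad"

# Context-free lookup: each token is scored independently of its neighbours
lookup_score <- function(s) {
  w <- strsplit(tolower(s), "[[:space:]]+")[[1]]
  sum(w %in% pos_words) - sum(w %in% neg_words)
}

lookup_score("This is good")      # returns 1
lookup_score("This is not good")  # also returns 1: "not" is in neither list
lookup_score("Not bad at all")    # returns -1, though the phrase is mildly positive
```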
The stable version is available on CRAN and can be installed with `install.packages("meanr")`.
The development version is maintained on GitHub.
I have a dataset that, for legal reasons, I cannot describe, much less provide. You can think of it as a collection of tweets (they are not tweets), but take my word for it that it's real English-language text. The data is in the form of a vector of strings, which we'll call `x`.
```r
# x: a character vector with one document per element
x = readRDS("x.rds")
length(x)
## [1] 655760
sum(nchar(x))
## [1] 162663972

library(meanr)
# Time the scoring of all 655,760 strings
system.time(s <- score(x))
##    user  system elapsed
##   1.072   0.000   0.285
head(s)
##   positive negative score  wc
## 1        2        0     2  32
## 2        5        0     5  29
## 3        4        2     2  67
## 4       12        3     9 203
## 5        8        2     6 101
## 6        4        3     1  99
```
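The `score` column in the output above is positive minus negative, following the +1/-1 rule described at the start. A quick check against the six rows printed by `head(s)`, with the values copied from that output:

```r
# Values copied from the head(s) output above
positive <- c(2, 5, 4, 12, 8, 4)
negative <- c(0, 0, 2, 3, 2, 3)
score    <- c(2, 5, 2, 9, 6, 1)

# Each score equals its positive count minus its negative count
all(score == positive - negative)  # TRUE
```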
The `score()` function receives a vector of strings and operates on each one as follows:
This is all done in four passes over each string; each pass corresponds to one of the enumerated items above. The hash tables use perfect hash functions generated by gperf.
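Base R has no perfect hashing, but a hashed environment gives expected constant-time word lookups, without the collision-free guarantee of a gperf-generated perfect hash. The sketch below swaps in that ordinary hashing technique to show the lookup step; the word lists and the helpers `make_lexicon()` and `polarity()` are hypothetical, not part of meanr.

```r
# Build a hashed environment mapping each lexicon word to its polarity
make_lexicon <- function(pos, neg) {
  env <- new.env(hash = TRUE, size = length(pos) + length(neg))
  for (w in pos) assign(w, 1L, envir = env)
  for (w in neg) assign(w, -1L, envir = env)
  env
}

# Hypothetical toy word lists
lex <- make_lexicon(pos = c("good", "great"), neg = c("bad", "awful"))

# Look up one token; words absent from the lexicon contribute 0
polarity <- function(word, lex) {
  get0(word, envir = lex, inherits = FALSE, ifnotfound = 0L)
}

tokens <- c("a", "great", "film", "with", "an", "awful", "ending")
p <- vapply(tokens, polarity, integer(1), lex = lex, USE.NAMES = FALSE)
p       # 0 1 0 0 0 -1 0
sum(p)  # 0: one positive and one negative word cancel out
```

Summing the per-token values gives the same +1/-1 score described at the top of this README.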