library(malaytextr)
There is a data frame of Malay root words that can be used as a dictionary:
head(malayrootwords)
#> Col Word Root Word
#> 1 pengabadian abadi
#> 2 pengabdian abdi
#> 3 pengacaraan acara
#> 4 pengadangan adang
#> 5 pengadilan adil
#> 6 pengairan air
stem_malay() first looks for the root word in a dictionary, for which the malayrootwords data frame can be used. It then removes the “extra suffix”, the “prefix” and lastly the “suffix”.
To stem the word “banyaknya”, pass it as the word argument. The function returns a data frame with the word “banyaknya” and the stemmed word “banyak”:
stem_malay(word = "banyaknya", dictionary = malayrootwords)
#> 'Root Word' is now returned instead of 'root_word'
#> Col Word Root Word
#> 1 banyaknya banyak
To stem words in a data frame:
x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))
stem_malay(word = x,
dictionary = malayrootwords,
col_feature1 = "text")
#> 'Root Word' is now returned instead of 'root_word'
#> Col Word Root Word
#> 1 banyaknya banyak
#> 2 sangat sangat
#> 3 terkedu kedu
#> 4 pengetahuan tahu
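The lookup-then-strip order described for stem_malay() can be illustrated with a small base R sketch. This is not the package's implementation: `stem_sketch`, the two-row `dictionary` and the single suffix "nya" are hypothetical and chosen only for illustration, and the column names are assumed from the printed header of malayrootwords.

```r
# Hypothetical two-row dictionary; column names assumed from the
# printed header of malayrootwords ("Col Word", "Root Word")
dictionary <- data.frame(
  `Col Word` = c("pengadilan", "pengairan"),
  `Root Word` = c("adil", "air"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: dictionary lookup first, suffix stripping as fallback
stem_sketch <- function(word, dictionary, suffixes = c("nya")) {
  # Step 1: exact match against the dictionary
  hit <- match(word, dictionary[["Col Word"]])
  if (!is.na(hit)) {
    return(dictionary[["Root Word"]][hit])
  }
  # Step 2: strip the first matching suffix (illustrative list only)
  for (s in suffixes) {
    if (endsWith(word, s) && nchar(word) > nchar(s)) {
      return(substr(word, 1, nchar(word) - nchar(s)))
    }
  }
  # No rule applied: return the word unchanged
  word
}

stem_dictionary <- stem_sketch("pengadilan", dictionary)
stem_fallback <- stem_sketch("banyaknya", dictionary)
stem_unchanged <- stem_sketch("sangat", dictionary)
```

A real stemmer needs a much longer affix list and rules for prefixes, which is what stem_malay() provides. The sketch only shows why a dictionary hit takes priority over affix stripping.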
remove_url() removes all URLs found in a string:
x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")
remove_url(x)
#> [1] "test " "another one to try"
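For readers curious how URL stripping of this kind can work, here is a minimal base R sketch using a regular expression. The pattern `url_pattern` is an assumption made for illustration and is not taken from the package source, so remove_url() may behave differently on edge cases.

```r
x <- c("test https://t.co/fkQC2dXwnc",
       "another one https://www.google.com/ to try")

# Assumed pattern: "http://" or "https://" followed by non-whitespace characters
url_pattern <- "https?://\\S+"

# Delete every match, then collapse the runs of spaces left behind
no_urls <- gsub(url_pattern, "", x, perl = TRUE)
no_urls <- gsub(" +", " ", no_urls)
no_urls
```

A pattern this simple misses URLs written without a scheme, such as "www.example.com".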
There is a data frame of Malay stop words:
head(malaystopwords)
#> # A tibble: 6 × 1
#> stopwords
#> <chr>
#> 1 ada
#> 2 sampai
#> 3 sana
#> 4 itu
#> 5 sangat
#> 6 saya
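A typical use of a stop word list is filtering tokens before further analysis. The sketch below uses a hypothetical character vector holding only the six words printed above, plus a made-up sentence. With the package loaded, the full list would be malaystopwords$stopwords.

```r
# Hypothetical subset: only the six stop words shown by head(malaystopwords)
stopwords <- c("ada", "sampai", "sana", "itu", "sangat", "saya")

# Made-up example sentence, split on single spaces
text <- "saya sangat suka buku itu"
tokens <- strsplit(text, " ", fixed = TRUE)[[1]]

# Keep only the tokens that are not stop words
kept <- tokens[!tokens %in% stopwords]
cleaned <- paste(kept, collapse = " ")
cleaned
```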
sentiment_general is a lexicon of words that have been labelled as positive or negative. It is useful for tasks such as sentiment analysis, which determines the overall sentiment expressed in a piece of text. To use the lexicon, split the text into words and check each word against the lexicon to find its sentiment. Note that this lexicon was built from a general corpus sourced from news articles.
head(sentiment_general)
#> # A tibble: 6 × 2
#> Word Sentiment
#> <chr> <chr>
#> 1 aduan Negative
#> 2 agresif Negative
#> 3 amaran Negative
#> 4 anarki Negative
#> 5 ancaman Negative
#> 6 aneh Negative
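Checking each word against the lexicon can be sketched in base R as follows. The three-row `lexicon` is a hypothetical stand-in that reuses words and column names from the output above, and the sentence and whitespace tokenizer are assumptions made for illustration. With the package loaded, sentiment_general would take the place of `lexicon`.

```r
# Hypothetical mini-lexicon with the same column names as sentiment_general
lexicon <- data.frame(
  Word = c("aduan", "amaran", "ancaman"),
  Sentiment = c("Negative", "Negative", "Negative"),
  stringsAsFactors = FALSE
)

# Made-up example sentence
text <- "Polis mengeluarkan amaran selepas menerima aduan"

# Lowercase and split on whitespace (a deliberately simple tokenizer)
tokens <- strsplit(tolower(text), "\\s+")[[1]]

# Look up each token; words missing from the lexicon get NA
token_sentiment <- lexicon$Sentiment[match(tokens, lexicon$Word)]

# Count the labelled tokens per sentiment class
sentiment_counts <- table(token_sentiment, useNA = "no")
sentiment_counts
```

Comparing the positive and negative counts gives a rough overall score. Stemming the tokens first, for example with stem_malay(), would let inflected forms match lexicon entries.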