library(malaytextr)
There is a data frame of Malay root words that can be used as a dictionary:
head(malayrootwords)
#>   Col Word Root Word
#> 1       ad       ada
#> 2       ak       aku
#> 3      akn      akan
#> 4      ank      anak
#> 5       ap       apa
#> 6      awl      awal
stem_malay() first looks up root words in a dictionary (the malayrootwords data frame can be used for this), then removes the “extra suffix”, the “prefix” and lastly the “suffix”.
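The lookup-then-strip order described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: `sketch_stem()`, `toy_dict`, its column names and the three affix lists are all hypothetical, and applying the rules only to words missing from the dictionary is an assumption.

```r
# Toy dictionary (hypothetical; the real one is malayrootwords)
toy_dict <- data.frame(word = c("ad", "ak"),
                       root = c("ada", "aku"),
                       stringsAsFactors = FALSE)

# Hypothetical helper, not part of malaytextr
sketch_stem <- function(words, dictionary) {
  # Named vector: names are the written forms, values are the root words
  lookup <- setNames(as.character(dictionary$root),
                     as.character(dictionary$word))
  vapply(words, function(w) {
    # Step 1: dictionary lookup
    if (w %in% names(lookup)) return(unname(lookup[[w]]))
    # Steps 2-4: strip affixes in a fixed order (illustrative lists only)
    w <- sub("(nya|lah|kah)$", "", w)  # "extra suffix"
    w <- sub("^(ter|ber|di)", "", w)   # prefix
    w <- sub("(kan|an|i)$", "", w)     # suffix
    w
  }, character(1), USE.NAMES = FALSE)
}

sketch_stem(c("ad", "banyaknya", "terkedu", "sangat"), toy_dict)
#> [1] "ada"    "banyak" "kedu"   "sangat"
```

A sketch this naive over-strips short words and ignores many Malay affixes, which is why a curated dictionary is consulted first.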
To stem the word “banyaknya”, pass it as Word. The result is a data frame containing the word “banyaknya” and its stemmed form “banyak”:
stem_malay(Word = "banyaknya", dictionary = malayrootwords)
#>    Col Word root_word
#> 1 banyaknya    banyak
To stem words in a data frame:
x <- data.frame(text = c("banyaknya", "sangat", "terkedu", "pengetahuan"))

stem_malay(Word = x,
           dictionary = malayrootwords,
           col_feature1 = "text")
#>      Col Word root_word
#> 1   banyaknya    banyak
#> 2      sangat    sangat
#> 3     terkedu      kedu
#> 4 pengetahuan      tahu
remove_url() removes all URLs found in a string:
x <- c("test https://t.co/fkQC2dXwnc", "another one https://www.google.com/ to try")
remove_url(x)
#> [1] "test " "another one to try"
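For comparison, a similar effect can be approximated in base R with a regular expression. This is a minimal sketch under the assumption that a URL is anything starting with http:// or https:// and running to the next whitespace; `sketch_remove_url()` is a hypothetical name, and the pattern is not taken from the package.

```r
# Hypothetical helper, not part of malaytextr
sketch_remove_url <- function(x) {
  # Delete "http://" or "https://" plus all following non-space characters
  gsub("https?://\\S+", "", x, perl = TRUE)
}

urls <- c("test https://t.co/fkQC2dXwnc",
          "another one https://www.google.com/ to try")
sketch_remove_url(urls)
```

This sketch only deletes the URL itself, so the spaces on either side of it stay in place and a URL in the middle of a string leaves two consecutive spaces behind.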