Trie Usage

Casimir Saternos

2017-01-05

Other Trie Packages

The rtrie package provides a few simple functions for working with tries that represent lists of words. Other R packages on CRAN provide trie implementations with different characteristics.

Also see various packages related to trees in general.

Trie Creation

A list of words can be created programmatically, or read from a data file or other source.
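As a small illustration of the programmatic route, the snippet below uses only base R to derive a sorted, de-duplicated word vector from a string. The sample text and the name `word_vec` are illustrative assumptions and are not part of rtrie. A vector built this way could be passed to char_tree in place of a column read from a file.

```r
# Hypothetical sample text; any character data would do
text <- "The stubborn student studied the sturdy stump."

# Split on runs of non-letters, lower-case, then drop empty strings and duplicates
word_vec <- unlist(strsplit(text, "[^A-Za-z]+"))
word_vec <- unique(tolower(word_vec[nzchar(word_vec)]))
word_vec <- sort(word_vec)
print(word_vec)  # "stubborn" "student" "studied" "stump" "sturdy" "the"
```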

  library(rtrie)
  # Read the dictionary file bundled with rtrie into a data frame
  words <- read.delim(
    file = system.file("extdata", "dictionary.txt", package = "rtrie"),
    header = FALSE, sep = " ",
    stringsAsFactors = FALSE)
  class(words)
## [1] "data.frame"

Here the data frame was read from a file. Creating a trie then takes a single call to the char_tree function, which builds the trie from a character vector passed as its first argument (here, column V1 of the data frame).

  trie <- char_tree(words$V1, 'X')
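The following minimal base-R sketch builds a character trie from nested lists, which shows concretely what such a structure holds. It illustrates the general technique and is not rtrie's implementation. The node layout (`children`, `end`) and the helper names (`build_trie`, `prefix_words`, `trie_has_word`) are hypothetical. Words are inserted one character at a time, and a node's `end` flag records that the path from the root spells a complete word.

```r
# Minimal character trie in base R: an illustrative sketch, NOT rtrie's
# implementation. Each node is a list with:
#   children - named list of child nodes, keyed by a single character
#   end      - TRUE when the path from the root spells a complete word
new_node <- function() list(children = list(), end = FALSE)

# Insert a word (a vector of single characters) below `node` and return the
# updated node; R lists are copied on modification, so the result is reassigned
trie_insert <- function(node, chars) {
  if (length(chars) == 0) {
    node$end <- TRUE
    return(node)
  }
  ch <- chars[1]
  child <- node$children[[ch]]  # NULL if there is no edge for this character yet
  if (is.null(child)) child <- new_node()
  node$children[[ch]] <- trie_insert(child, chars[-1])
  node
}

build_trie <- function(words) {
  root <- new_node()
  for (w in words) root <- trie_insert(root, strsplit(w, "")[[1]])
  root
}

# Follow `prefix` one character at a time; NULL if the path does not exist
find_node <- function(node, prefix) {
  for (ch in strsplit(prefix, "")[[1]]) {
    node <- node$children[[ch]]
    if (is.null(node)) return(NULL)
  }
  node
}

# Depth-first collection of every word stored at or below `node`
collect_words <- function(node, prefix) {
  out <- if (node$end) prefix else character(0)
  for (ch in names(node$children)) {
    out <- c(out, collect_words(node$children[[ch]], paste0(prefix, ch)))
  }
  out
}

# Hypothetical prefix query: all stored words that start with `prefix`
prefix_words <- function(root, prefix) {
  node <- find_node(root, prefix)
  if (is.null(node)) character(0) else sort(collect_words(node, prefix))
}

# Hypothetical word test: the path must exist AND be marked as a word end
trie_has_word <- function(root, word) {
  node <- find_node(root, word)
  !is.null(node) && node$end
}

tr <- build_trie(c("stub", "stubby", "stud", "study", "stunt"))
print(prefix_words(tr, "stub"))  # "stub" "stubby"
print(trie_has_word(tr, "stu"))  # FALSE: "stu" is only a prefix
```

A lookup here walks one node per character of the query and then visits only the matching subtree. The recursive, copy-based insert favours clarity over speed.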

The dictionary file contains 79,339 words. To get an idea of performance, trie creation can be timed with the microbenchmark package.

  library(rtrie)
  library(microbenchmark)
  # Time a single run of trie creation
  timings <- microbenchmark(
    trie <- char_tree(words$V1, 'X'),
    times = 1
  )

  # microbenchmark records times in nanoseconds; convert to seconds
  print(paste(timings$time / 1e9, 'seconds'))
## [1] "17.299074397 seconds"

Match the Beginning of Words

Which words start with “stu”?

  cat(matching_words('stu', trie))
## stub stubbed stubbier stubbily stubbing stubble stubbled stubbles stubbly stubborn stubby stubs stucco stuccoed stuccoer stuccoes stuccos stuck stud studbook studded studdie studdies studding student students studfish studied studier studiers studies studio studios studious studs studwork study studying stuff stuffed stuffer stuffers stuffier stuffily stuffing stuffs stuffy stuiver stuivers stull stulls stultify stum stumble stumbled stumbler stumbles stummed stumming stump stumpage stumped stumper stumpers stumpier stumping stumps stumpy stums stun stung stunk stunned stunner stunners stunning stuns stunsail stunt stunted stunting stuntman stuntmen stunts stupa stupas stupe stupefy stupes stupid stupider stupidly stupids stupor stupors sturdied sturdier sturdies sturdily sturdy sturgeon sturt sturts stutter stutters
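A trie can answer a prefix query like this by walking the characters of the prefix and then listing the words in the subtree below. For comparison, base R's `startsWith()` does the same job with a linear scan over every word. On the full dictionary, `words$V1[startsWith(words$V1, "stu")]` should return the same set of words, assuming matching_words reports every stored word with the prefix. The small vector below is a hypothetical stand-in for `words$V1`.

```r
# Hypothetical stand-in for words$V1
word_vec <- c("stub", "stubby", "stud", "study", "stunt", "sty", "tub")

# Linear scan: test every word for the prefix (startsWith needs R >= 3.3.0)
stu_words <- word_vec[startsWith(word_vec, "stu")]
print(stu_words)  # "stub" "stubby" "stud" "study" "stunt"
```

The scan is simple but touches every word on each query, whereas a trie only needs to visit the nodes under the prefix.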

Test if a Character String is a Word

Is “stunt” a word?

  cat(is_word('stunt', trie))
## TRUE

How about “stu”?

  cat(is_word('stu', trie))
## FALSE
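The FALSE for “stu” reflects the difference between a stored word and a mere prefix: “stu” begins many words in the dictionary but is not itself one of them. The same distinction can be expressed in base R on a plain vector. Here, `word_vec` is a hypothetical stand-in for `words$V1`.

```r
# Hypothetical stand-in for words$V1
word_vec <- c("stub", "stud", "study", "stunt")

print("stunt" %in% word_vec)             # TRUE: an exact, stored word
print("stu" %in% word_vec)               # FALSE: not stored as a word
print(any(startsWith(word_vec, "stu")))  # TRUE: but it is a prefix of stored words
```

In a trie, the same check can be made by following the characters of the query and testing an end-of-word marker on the final node, without scanning the whole list.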