Introduction

datastructures implements various data structures that are frequently used in computer science. Among these are for example Fibonacci heaps, stacks, queues or hashmaps. Advanced data structures are essential in many computer science and statistics problems, for example graph algorithms or string analysis. The package uses Boost and STL data types and extends these to R with Rcpp modules.

Fibonacci and binomial heaps

Fibonacci and binomial heaps are priority queue data structures using the minimum heap property. They can be represented using collections of trees or linked lists consisting of nodes of elements. Every node is a pair of keys and values, where the key decides the priority of the node in the heap. Fibonacci heaps have various applications, one of the most famous being in efficiently finding shortest paths using Dijkstra’s algorithm. Fibonacci heaps can add values in amortized \(\mathcal{O}(1)\) time and remove the minimum value in \(\mathcal{O}(log \, n)\) time making them a good choice in many real world scenarios. Binomial heaps have in general slightly worse asymptotic bounds. However, removing the minimum element (pop) has an asymptotically tight bound of \(\Theta(log \, n)\). The two classes have the exact same methods, so every method explained here is available for the other class, too.

You can create a heap like this:

  fheap <- fibonacci_heap("numeric", "character")
  bheap <- binomial_heap("numeric", "character")

This gives us heaps with numeric keys and character values. You can pick from either integer, numeric or character keys and values. So if we insert several key-value pairs, the pair with the minimum key would have the highest priority. Let’s insert some values and have a look:

  keys   <- sample(seq(0, 1, by=.2))
  values <- paste0("V", keys)
  fheap  <- insert(fheap, keys, values)
  
  size(fheap)
## [1] 6

There are also some other ways to insert into a heap. If you want to insert a vectorial value, do this:

  fheap  <- insert(fheap, -1, letters)
  peek(fheap)
## $`-1`
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

If you want to add multiple values, you need to use a vectorial key and a matrix as value:

  fheap  <- insert(fheap, c(-2, -3), matrix(letters, 2))
  peek(fheap)
## $`-3`
##  [1] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"

Furthermore, you can use lists as values to insert into a heap:

  fheap  <- insert(fheap, c(-4, -5), list("a", letters[1:4]))
  peek(fheap)
## $`-5`
## [1] "a" "b" "c" "d"

Since Fibonacci and binomial heaps use the minimum heap property the first element in the heap should be 0 -> V0 even though we inserted the pairs in a random fashion.

  peek(fheap)
## $`-5`
## [1] "a" "b" "c" "d"
  size(fheap)
## [1] 11

That worked nicely. peek gives us the first element from the heap without removing it. If we want to have it removed we call the pop function. Of course this changes the priority of the heap:

  pop(fheap)
## $`-5`
## [1] "a" "b" "c" "d"
  size(fheap)
## [1] 10

You can alternatively insert values like this:

  fheap[-10] <- "V-1"
  peek(fheap)
## $`-10`
## [1] "V-1"

This works for all the described insert methods above.

Hashmaps and bimaps

Hashmaps and bimaps use hash functions to store key-value pairs. Given a key, the hash function computes a bucket where the value is stored, making lookup, insertion and deletion on average possible in \(\mathcal{O}(1)\) time. In the worst case these runtimes decrease to \(\mathcal{O}(n)\) time, where \(n\) is the number of entries in a bucket. The difference between the two classes is that hashmaps compute a mapping from keys to values (the hash function) \[ \, f: \mathcal{K} \rightarrow \mathcal{V} \, \] in ‘one direction’, while bimaps also compute a mapping \[ \, f: \mathcal{V} \rightarrow \mathcal{K} \, \] in the ‘other direction’. So a bimap is essentially a combination of two hashmaps.

Hashmaps and bimaps can get created like this, this time we use <character, integer> pairs:

  hash  <- hashmap("character", "integer")
  bimap <- bimap("character", "integer")

Insertion to maps works just like before, by calling the insert function:

  keys   <- paste0("V", 1:5)
  values <- 1:5
  hash   <- insert(hash, keys[1:4], values[1:4])
  bimap  <- insert(bimap, keys[1:4], values[1:4])
  hash[keys[5]]  <- values[5]
  bimap[keys[5]] <- values[5]

Values (and keys in the case of bimaps) can then be accessed like this:

  get(hash, keys[1])
## [[1]]
## [1] 1
  hash[keys[1]]
## [[1]]
## [1] 1
  get(bimap, keys[1], "values")
## [1] 1
  get(bimap, values[2], "keys")
## [1] "V2"
  get(bimap, keys[1])
## [1] 1

For hashmaps the subset-operator [ is also supported for accessing elements. This does not make so much sense for bimaps, because here we have to choose whether we want to access keys or values. In the example on the top we showed how both is possible by either providing keys or values as third argument to the get function. If you leave it out, then the third argument defaults to values.

If you want to directly access keys or values as vectors or have a look at a few random elements, you would call:

  keys(bimap)
## [1] "V1" "V2" "V3" "V4" "V5"
  values(hash)
## [[1]]
## [1] 5
## 
## [[2]]
## [1] 4
## 
## [[3]]
## [1] 2
## 
## [[4]]
## [1] 3
## 
## [[5]]
## [1] 1
  head(bimap)
## $V1
## [1] 1
## 
## $V2
## [1] 2
## 
## $V3
## [1] 3
## 
## $V4
## [1] 4
## 
## $V5
## [1] 5

Queues and stacks

Queues and stackes are two list datastructures using the STL’s std::deque in the backend, i.e. insertion at the end and getting the first element can be done in constant time \(\mathcal{O}(1)\) . Queues and stacks are for example used in depths-first-search and breadth-first-search making them versatile datastructures. While queues use the first-in-first-out principle, stacks use the last-in-first-out principle.

The two datatypes use the exact same methods. You can instantiate a stack or a queue using:

  qu <- queue("numeric")
  st <- stack("numeric")

As before, you can pick either numeric, integer or character keys. Now, let’s again insert some data:

  keys <- seq(0, 1, by=.2)
  print(keys)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
  qu <- insert(qu, keys)
  st <- insert(st, keys)

BEWARE: If you add one vector to queues and stacks it will add one element:

  peek(qu)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
  peek(st)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0

If you want to add multiple elements, you need to add a list:

  st <- insert(st, list(1, rnorm(5)))
  pop(st)
## [1] -1.7577717  0.8551206 -2.9142047  2.7786205 -0.3233633
  pop(st)
## [1] 1

The pop operation removed the head element from the data structure.

For a queue pop and peek return the first element that got inserted, while a stack returns the last element that was inserted.

Author