In this post, we will be implementing the Sorted Set data structure in Go. Why? well because while working on my git project and following the book, the author used the ruby’s sorted set in the code. So that’s why I was also looking for the sorted set, and thought why not build it myself, from scratch (ain’t I building git for my toy project so why not also this ;) )
What is Sorted Set
I don’t know … Well sorted set is set data-structure, that is ordered (usually in ascending) and enforces that all the elements that are present in the set are unique.
Let’s first decide on what data-structure our sorted set will be build upon? Redis has implemented its sorted set, using two data-structures.
- Hashmap (for O(1) lookup time)
- Skip List (for O(log n) INSERT/REMOVE time)
In go we have hashmap(luckily…), as for skip list we will have to implement it.
Implementing Skip List
So after knowing that first I will have to implement the skip list, I embarced on the journey of finding the tutorials, videos understanding the skip list data-structure first. For understanding the skiplist I watched the MIT Lecture on skip list.
Skip list is an easier alternative to the tree-type data-structures, that gives us the probabilitic(not deterministic,
meaning for most of the time) O(log n)
time complexity for FINDING/INSERTING/REMOVING an element while maintaining
the desired order.
Skip list is built upon linked list, which just holds two value (i) the value itself, and (ii) the pointer to the next
node in the structure.
The difference comes in skip list is that it also have the levels added in this.
What skip list does, is it adds more linked list and adds them in level order, the lowest level has all the nodes, the level above it will probabilistically have the half as much node and so on. So that when we start the top level, search our element in that level, if we don’t find the element in that level or found the greater value, we go a level down till we find the value, if we do not found it, then it is not present in the skip list.
Let’s first get the stucture in place.
skiplist.go
package main
const (
MaxHeight = 16
)
type node struct {
key []byte
value []byte
tower [MaxHeight]*node
}
type SkipList struct {
head *node
height int
}
The MaxHeight const tells the max height the skip list will have, this was used in the original paper written by William Pugh, and it can store 65,536 nodes.
So our node has key, value pair and a tower helps in simulating the linked list stacking without actually stacking the linked list, by using the pointer to the next node in that level.
Let’s take an example
Level 3: head ▶ A ────────────────▶ nil
Level 2: head ▶ A ────────▶ D ────▶ nil
Level 1: head ────────▶ C ────▶ E ▶ nil
Level 0: head ▶ A ▶ B ▶ C ▶ D ▶ E ▶ nil
If we will see the tower array of the head node, in the tower[0], that is level 0 of skip list, will have the pointer to the node A, then the tower[1], that is level 1 of skip list, will have the pointer to the node C, and so on. So without having to make multiple linked list data, we will have a single linked list.
Our SkipList has two fields, head
that is a sentinal node, that just acts as dummy node, helping
us in getting rid of many edge cases, and another field height
tells the current height or levels of the skiplist.
Add a function that will return the pointer to the skiplist with initialized value.
func NewSkipList() *SkipList {
return &SkipList{
head: &node{}, height: 1,
}
}
Find
Now we want to implement the Find
feature of the skip list, that will take the key and searches
the whole skiplist and if the key is found, returns its value otherwise based on your preference, return either a bool, error or anything else.
func (s *SkipList) Find(key []byte) ([]byte, error) {
found, _ := s.search(key)
if found == nil {
return nil, errors.New("key not found")
}
return found.value, nil
}
The Find
API will use another unexported function that will do the actual searching in the skiplist.
The search
function will return two things, firstly the node pointer if we found it in the skiplist,
secondly the journey array that tells what nodes was visited on what level, that will help in INSERT/REMOVE
node in skiplist. The journey array will help in splicing, i.e. connecting and disconneting the node connections
import (
"bytes"
)
// ...
func (s *SkipList) search(key []byte) (*node, [MaxHeight]*node) {
var next *node
var journey [MaxHeight]*node
prev := s.head
for lvl := s.height - 1; lvl >= 0; lvl-- {
for next = prev.tower[lvl]; next != nil; next = prev.tower[lvl] {
if bytes.Compare(key, next.key) <= 0 {
break
}
prev = next
}
journey[lvl] = prev
}
if next != nil && bytes.Equal(key, next.key) {
return next, journey
}
return nil, journey
}
Now let’s create the ADD functionality, with the search
in place, it won’t we much difficult.
But wait before that, we need to get what height of what level the node we are inserting should be added to,
meaning should it only be added the level 0 or to more upper level?
To do so, we have two ways, firstly we can have a function coinToss
that will simulate the tossing of coin.
If we get HEAD we will bump the node a level, we will do so until we get TAIL. Suppose we added a node,
it will first be added to the level 0 (all nodes are required to be added in level 0), then we will
toss the coin, if we get the TAIL on the first toss, we will stop there, so the inserted node will only remain in
the level 0.
Secondly, we can use a function that will generate a random number and then based on that number,
we will decide what top most level it will added. This is what we are going to use. It will tell us
the topmost level and we add the node pointer from the topmost level we got to the lowest level.
import (
"bytes"
"math"
_ "unsafe"
)
const (
MaxHeight = 16
PValue = 0.5
)
var probabilities [MaxHeight]uint32
func init() {
probability := 1.0
for lvl := 0; lvl < MaxHeight; lvl++ {
probabilities[lvl] = uint32(probability * float64(math.MaxUint32))
probability *= PValue
}
}
//go:linkname Uint32 runtime.fastrand
func Uint32() uint32
func randomHeight() int {
seed := Uint32
height := 1
for height < MaxHeight && seed <=probabilities[height] {
height++
}
return height
}
In this we are initializing the probabilites
array (notice; array not slice), after that
we are populating it with the uint32
values. The logic behind is that if in case we
uint32
whose max value is 10
(for examples sake) we can have 2
probability of
0-5
will occupy the level 1 and 5-10
will occupy the level 2, we can further divide
the 5-10
, so that 5-7
occupy the level 2 and 7-10
occupy the level 3, and so on.
The init
function is called automatically by the Go while compiling the package.
And the last function just creates the random number, this method is preferred over the math.rand because this is more efficient here.
Insert
Now let’s add the Insert
function that will insert the node at the desired sorted position.
func (s *SkipList) Insert(key, val []byte) *node {
found, journey := s.search(key)
if found != nil {
found.value = val
return found
}
height := randomHeight()
n := &node{key: key, value: val}
for lvl := 0; lvl < height; lvl++ {
prev = journey[lvl]
if prev == nil {
// the nil prev means the there was not this height
// value in the skip list so this in the new height
// so now the prev will be the head
prev = s.head
}
n.tower[lvl] = prev.tower[lvl]
prev.tower[lvl] = n
}
if height > s.height {
s.height = height
}
return n
}
We are using the search
to find the node, if present then overwrite the value with
the given value, here we can also return error
depending on your use case. If node is not present, then
we will have the journey array that will tell us the appropriate neighbours where to splice the
node between.
Remove
Now let’s remove the node, it will be same as to what we are doing.
func (s *SkipList) Remove(key []byte) (bool, error) {
found, journey := s.search(key)
if found == nil {
return false, fmt.Errorf("key not found")
}
for lvl := 0; lvl < s.heigth; lvl++ {
if journey[lvl].tower[lvl] != found {
break
}
journey[lvl].tower[lvl] = found.tower[lvl]
found.tower[lvl] = nil
}
found = nil
s.shrink()
return true, nil
}
We are just splicing the nodes and connecting the correct node. As for the function shrink
, it reduces the height of the
skip list if neccessary.
func (s *SkipList) shrink() {
for lvl := s.height - 1; lvl >= 0; lvl-- {
if s.head.tower[lvl] == nil {
s.height--
}
}
}
Guy’s we are done with our implementation of skip list. But I would like to add one more thing, Iterator.
Iterator
I want to create the iterator on the skip list, because in my gitgo
project I want to
be able iterate on this underlying skip list, because the tower
field in our node
struct
is not exported(i.e. it starts with the lower-case letter) which good.
type Iterator interface {
Next() bool
Key() string
}
type skipIterator struct {
curr *node
}
func (it *skipIterator) Next() bool {
next := it.curr.tower[0]
if next == nil {
return false
}
it.curr = next
return true
}
func (it *skipIterator) Get() string {
return string(it.curr.key)
}
Implementing Sorted Set
Now let’s create the focus of this blog, Sorted Set. We have already done the hard part, now its easy. Let’s get the structure in place.
type SortedSet struct {
hashmap map[string]*node
list *Skiplist
}
This is the design choice by Redis, it uses hashmap for O(1)
lookup time, and O(log n)
for FIND/INSERT/REMOVE
operations.
This is the whole code for the sorted set implementation.
func NewSortedSet() *SortedSet {
return &SortedSet{
hashmap: make(map[string]*node),
list: NewSkipList(),
}
}
func (s *Sorted) Add(key string) {
n := s.list.Insert([]byte(key), nil)
s.hashmap[key] = n
}
func (s *SoretdSet) Remove(key string) bool {
_, err := s.list.Delete([]byte(key))
if err != nil {
return false
}
delete(s.hashmap, key)
return true
}
func (s *SortedSet) Contains(key string) (bool, *node) {
if node, found := s.hashmap[key]; found {
return true, node
}
return false, nil
}
func (s *SortedSet) Len() int { return len(s.hashmap) }
func (s *SortedSet) Iterator() Iterator {
return &skipIterator{curr: s.list.head}
}
And voilaaa, we are done with our Sorted Set implementation. In this we just abstracted the FIND/INSERT/DELETE and made the necessary API exported, for users to use.
You can use the iterator like this:
it := sortedset.Iterator()
for it.Next() {
val := it.Get()
}