Reservoir Sampling

Reservoir Sampling

·

3 min read

Reservoir Sampling

Normally, its very simple to choose an element randomly from an array. Just generate a random index from [0, array_length-1]:

const arr = ["a", "b", "c", "d", "e"];
const idx = ~~(Math.random() * arr.length);

console.log(arr[idx])

But this only works for arrays with known size. What if we want to select an element from an array which we don't know the size beforehand? Like a streaming data, as the elements come one by one, we want to select an element randomly with equal probability, how can we do it?

We can use the reservoir sampling algorithm to do it. The idea is, as the elements come, we keep track of the size of the elements we already seen, and generate an equal probability to decide if the current element should be choosen.

For example,

1. element a come

count is 1, generate p(probability) from [0, 1), if p < 1, then choose current element
this way, the probability of choosing element a is 100%, so we choose a

2. element b come

count is 2, generate p from [0, 2), if p < 1, then choose current element
this way, we have 1/2 probability to choose b, and 1/2 probability to keep the last choosen element a

3. element c come

count is 3, generate p from [0, 3). if p < 1, then choose current element
this way, we have 1/3 probability to choose c, 2/3 probability to keep last choosen element
and infer from last step, now we have 2/3 * 1/2 = 1/3 probability to choose b, and 2/3 * 1/2 = 1/3 probability to choose a

...

As you can see, as the streaming data comes, we always have equal probability to choose the elements we already seen. This is reservoir sampling, pretty simple.

Linked List Random Node

Now Let't use this algorithm to solve the leetcode Linked List Random Node problem. The code is strictly in line with the above process, no new things to know.

class ListNode {
    val: number;
    next: ListNode | null;
    constructor(val?: number, next?: ListNode | null) {
        this.val = (val === undefined ? 0 : val);
        this.next = (next === undefined ? null : next);
    }
}

class Solution {

    private head: ListNode | null;

    constructor(head: ListNode | null) {
        this.head = head;
    }

    getRandom(): number {
        let node = this.head as ListNode;
        let res = 0;
        let count = 1;

        while (node != null) {
            // if count == 1, then 100% use node
            // if count == 2, then 50% use node
            // if count == 3, then 33% use node
            if (Math.random() * count < 1) {
                res = node.val;
            }
            count += 1;
            node = node.next;
        }

        return res;
    }
}

/**
 * Your Solution object will be instantiated and called as such:
 * var obj = new Solution(head)
 * var param_1 = obj.getRandom()
 */

K sampling

What if we want to select k elements with equal probability, not just one? Well, actually, the reservoir sampling algorithm is just designed for this k elements sampling problem. The above one element sampling problem is just the case when the value of k is 1.

const inputArray = ["a", "b", "c", "d", "e"];
const outputArray = [];
const k = 2;

// if we only have k elements, then choose them all (100% probability)
for (let i = 0; i < k; i++) {
    outputArray.push(inputArray[i]);
}

// as more elements come,
// generate an equal probability to decide if new element should be added into output array
for (let i = k; i < inputArray.length; i++) {
    const idx = ~~(Math.random() * (i+1)) // [0, i+1), i+1 is count
    if (idx < k) {
        outputArray[idx] = inputArray[i];
    }
}

console.log(outputArray);