Heap (data structure)

Heap (data structure)

·

9 min read

store a binary tree in an array

First, we can store a binary tree in an array using level order traversal.

Say we have a tree below.

            a
        b       c
      d   e   f   g
     h i j k l m n o

This tree can be store as

[a, b, c, d, e, f, g, h, i, j, k, l, n, n, o]

Next, we need to know some special features for this structure:

  • for any node, its left child is at the index of nodeIndex*2+1
  • for any node, its right child is at the index of nodeIndex*2+2
  • for any node, its parent is at the index of (nodeIndex-1)/2

For example:

  • node a's left child's index is 0*2+1 = 1
  • node a's right child's index is 0*2+2 = 2
  • node g's left child's index is 6*2+1 = 13
  • node g's right child's index is 6*2+2 = 14
  • node g's parent's index is (6-1)/2 = 2

Heap

A heap must be a complete binary tree. If the tree is not complete like below:

   a
  b c
    d e

Then we will get holes in the stored array:

[a, b, c, -, -, d, e]

max heap / min heap

There are two types of heap: max heap and min heap.

In min heap, for any given node, the value of the node is less than its children. The root of the tree is the samllest value in the tree, so it is called min heap.

In max heap, for any given node, the value of the node is bigger than its children. The root of the tree is the biggest value in the tree, so it is call max heap.

insert

If we already have to heap, how to insert a value in it?

Say we have a max heap as below:

    50
  30  20
15 10 8 16

The stored array as: [50, 30, 20, 15, 10, 8, 16]

If we need to insert value 60, then we can follow the next steps:

  • push 60 into the end of the array

So array becomes [50, 30, 20, 15, 10, 8, 16, 60], and tree becomes:

      50
    30  20
  15 10 8 16
60
  • compare this new node with its parent node

If new node's value is bigger than its parent, then swap these two nodes, If not, then stop.

      50
    30  20
  60 10 8 16
15
  • repeat step 2
      50
    60  20
  30 10 8 16
15
  • repeat step 2
      60
    50  20
  30 10 8 16
15
  • Now here is no parent node, stop

delete

You can only delete the root node of the heap. The question is, after root node is deleted, how to make sure the tree can still be a valid heap.

Say we have a max heap as below:

    50
  30  20
15 10 8 16

Then follow the next steps to see the delete process.

  • delete root node

  30  20
15 10 8 16
  • move the last node to the top
    16
  30  20
15 10 8
  • compare current node 16 with its children

If current node's value is less than its children, then swap these two nodes, if not, stop.

    30 
  16  20
15 10 8
  • repeat step 3
    30 
  16  20
15 10 8

heapify

If the input is an array, how to constuct a valid heap from it?

We can initialize an empty heap, and insert array element one by one. But in this way, the time complexity is O(nlog(n)). We can use the following heapify process to make it faster.

Say we have a heap as below.

     10
  20    15
12 40 25 18
  • Find last node 18, no children, pass
  • Find next last node 25, no children, pass
  • Find next last node 40, no children, pass
  • Find next last node 12, no children, pass
  • Find next last node 15

Compare node 15 with its children, find max node 25, swap.

     10
  20    25
12 40 15 18

After node 15 was moved down, check its children again, no children, pass.

  • Find next last node 20

Compare node 20 with its children, find max node 40, swap

     10
  40    25
12 20 15 18

After node 20 was moved down, check its children again, no children, pass.

  • Find next last node 10

Compare node 10 with its children, find max node 40, swap

     40
  10    25
12 20 15 18

After node 10 was moved down, check its children, find max value 20, swap

     40
  20    25
12 10 15 18

After node 10 was moved down, check it children, no children, pass

heap sort

From the above process, we can conclude that if we delete element from heap one by one, then we can get a sorted array. This is heap sort.

heap

As we know all the key steps of implementing a heap, let's write it in code.

class heapq {
    public static heapify(nums: number[]) {
        for (let i = nums.length - 1; i >= 0; i--) {
            heapq._heapify(nums, i);
        }
    }

    public static heappush(nums: number[], val: number) {
        nums.push(val);
        heapq._swapWithParent(nums, nums.length - 1);
    }

    public static heappop(nums: number[]): number {
        const res = nums[0];
        nums[0] = nums[nums.length - 1];
        nums.splice(nums.length - 1, 1);
        heapq._heapify(nums, 0);
        return res;
    }

    private static _swapWithParent(nums: number[], index: number) {
        if (index === 0) return;
        const parentIndex = Math.trunc((index - 1) / 2);
        if (nums[index] < nums[parentIndex]) {
            [nums[index], nums[parentIndex]] = [nums[parentIndex], nums[index]];
            heapq._swapWithParent(nums, parentIndex);
        }
    }

    private static _heapify(nums: number[], index: number) {
        const leftChildIndex = index * 2 + 1;
        const rightChildIndex = index * 2 + 2;

        let minValueIndex = index;
        if (leftChildIndex < nums.length && nums[leftChildIndex] < nums[minValueIndex]) {
            minValueIndex = leftChildIndex;
        }
        if (rightChildIndex < nums.length && nums[rightChildIndex] < nums[minValueIndex]) {
            minValueIndex = rightChildIndex;
        }

        if (minValueIndex !== index) {
            [nums[index], nums[minValueIndex]] = [nums[minValueIndex], nums[index]];
            heapq._heapify(nums, minValueIndex);
        }
    }
}

Kth Largest Element in a Stream

The problem is to find the kth larest element in a stream. We can use a min heap. As the streaming data comes constantly, we add it into min heap and delete nodes to keep the min heap size as k. We know the root node of min heap is the minimum value of the heap, so all other node's values are bigger then the root node value. Because the heap size is k, so the root node value is the kth largest value.

class KthLargest {
    private k: number;
    public nums: number[];

    constructor(k: number, nums: number[]) {
        this.k = k;
        this.nums = nums;
        heapq.heapify(this.nums);
        while (this.nums.length > this.k) {
            heapq.heappop(this.nums);
        }
    }

    add(val: number): number {
        heapq.heappush(this.nums, val);
        if (this.nums.length > this.k) {
            heapq.heappop(this.nums);
        }
        return this.nums[0];
    }
}

Find Median from Data Stream

The problem is to find the median value from streaming data. We can use a min heap and a max heap to solve this problem.

Say, at some point, we get streaming data as [1,2,3,4,5,6,7]. We know the median value is 4. We can use a max heap to store all values below and equals to 4 [1,2,3,4], 4 is the root node. Then we use a min heap to save all values bigger the 4 [5,6,7]. Now the median value is the root element of the max heap.

What if the streaming data length is even? We can still use these two heaps. If we find two heap size is equal, then we know the length is even, we can compute the average of root node of max heap and root node of min heap.

Let see how to do it in code.

// Make the heap element type generic
// so we use use it to both min heap and max heap
type HeapqComparator<T> = (t1: T, t2: T) => boolean;

class heapq {

    public static heapify<T>(elements: T[], comparator: HeapqComparator<T>) {
        for (let i = elements.length - 1; i >= 0; i--) {
            heapq._heapify(elements, i, comparator);
        }
    }

    private static _heapify<T>(elements: T[], index: number, comparator: HeapqComparator<T>) {
        const lci = index * 2 + 1;
        const rci = index * 2 + 2;

        let topi = index;

        // note the arguments order the function comparator
        // if its in heap, then we need elements[topi] - elements[lci] > 0
        if (lci < elements.length && comparator(elements[topi], elements[lci])) {
            topi = lci;
        }
        if (rci < elements.length && comparator(elements[topi], elements[rci])) {
            topi = rci;
        }

        if (topi !== index) {
            [elements[topi], elements[index]] = [elements[index], elements[topi]];
            heapq._heapify(elements, topi, comparator);
        }
    }

    public static heappush<T>(elements: T[], val: T, comparator: HeapqComparator<T>) {
        elements.push(val);
        heapq._swapWithParent(elements, elements.length - 1, comparator);

    }

    private static _swapWithParent<T>(elements: T[], index: number, comparator: HeapqComparator<T>) {
        if (index === 0) return;
        const pi = Math.trunc((index - 1) / 2);

        // note the arguments order the function comparator
        // if its min heap, then we need elements[pi] - elements[index] > 0
        if (comparator(elements[pi], elements[index])) {
            [elements[index], elements[pi]] = [elements[pi], elements[index]];
            heapq._swapWithParent(elements, pi, comparator);
        }
    }

    public static heappop<T>(elements: T[], comparator: HeapqComparator<T>): T {
        const res = elements[0];
        elements[0] = elements[elements.length - 1];
        elements.splice(elements.length - 1, 1);
        heapq._heapify(elements, 0, comparator);
        return res;
    }
}

// type HeapElement = number;
// const nums: HeapElement[] = [4, 3, 6, 2, 1];
// const comparator = (a: HeapElement, b: HeapElement) => (b - a) > 0;

// heapq.heapify(nums, comparator);
// while (nums.length > 0) {
//     const v = heapq.heappop(nums, comparator);
//     console.log(v);
// }

type HeapElement = number;

class MedianFinder {
    private minHeap: HeapElement[];
    private maxHeap: HeapElement[];

    public static minHeapComparator: HeapqComparator<HeapElement> = (a: HeapElement, b: HeapElement) => (a - b) > 0;
    public static maxHeapComparator: HeapqComparator<HeapElement> = (a: HeapElement, b: HeapElement) => (b - a) > 0;

    constructor() {
        this.minHeap = [];
        this.maxHeap = [];
    }

    addNum(num: number): void {
        if (this.maxHeap.length === 0 || num <= this.maxHeap[0]) {
            heapq.heappush(this.maxHeap, num, MedianFinder.maxHeapComparator);

            // if max heap's length minus min heap's length bigger than 1
            // adjust two heap
            if (this.maxHeap.length - this.minHeap.length > 1) {
                const v = heapq.heappop(this.maxHeap, MedianFinder.maxHeapComparator);
                heapq.heappush(this.minHeap, v, MedianFinder.minHeapComparator);
            }
        } else {
            heapq.heappush(this.minHeap, num, MedianFinder.minHeapComparator);

            // if max heap's length lower then min heap's length
            // adjust two heap
            if (this.minHeap.length > this.maxHeap.length) {
                const v = heapq.heappop(this.minHeap, MedianFinder.minHeapComparator);
                heapq.heappush(this.maxHeap, v, MedianFinder.maxHeapComparator);
            }
        }
    }

    findMedian(): number {
        if (this.maxHeap.length === this.minHeap.length) {
            return (this.maxHeap[0] + this.minHeap[0]) / 2;
        } else {
            return this.maxHeap[0];
        }
    }
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 */


// var obj = new MedianFinder();
// obj.addNum(-1);
// console.log(obj.findMedian());
// obj.addNum(-2);
// console.log(obj.findMedian());
// obj.addNum(-3);
// console.log(obj.findMedian());
// obj.addNum(-4);
// console.log(obj.findMedian());