Disjoint-set Data Structure

Disjoint-set Data Structure

·

4 min read

Disjoint-set data structure

According to wiki:

In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets.

The key is every subsets use a special data structure, to make it easy to do two operations:

  • merge two subsets: Union
  • find which subsets the element belongs to: Find

For example, we have an array [s1, s2, s3, s4, s5]. First, we make every element as a unique subset. Every subsets have a concept called subset leader. For initialization, we make every element as the leader itself. We can use a map to express this data structure, with map key as the element, map value as subset leader.

{
    s1: s1,
    s2: s2,
    s3: s3,
    s4: s4,
    s5: s5,
}

So now if we call function Find for element s2, we should its leader s2 too.

Now let call function Union for subset s1 and s2 to merge the two subsets. So how to express this merged state? We can make s1 as s2's leader. So later, we can find s1 and s2 have the same leader so we know they are in the same subset. So now the data structure becomes:

{
    s1: s1,
    s2: s1,  // here, s2's leader is s1
    s3: s3,
    s4: s4,
    s5: s5,
}

So now if we want to find the leader of s2, we will find s1 first, then call Find recursively. When we find the element's leader is itself, then we find the top leader.

So this is the basic process of disjoint set. Let's implement it in code.

class DisjointSet {
    private m = new Map<number, number>();

    constructor(universe: number[]) {
        universe.forEach(e => {
            this.m.set(e, e);
        });
    }

    public find(k: number): number {
        const v = this.m.get(k) as number;
        if (v === k) {
            return v;
        } else {
            // call find recursively to get the top leader
            return this.find(v);
        }
    }

    public union(a: number, b: number): boolean {
        const x = this.find(a);
        const y = this.find(b);
        if (x === y) return false;

        this.m.set(x, y);
        // can also set(y, x);
    }
}

Unbalanced Problem

From the above process, we can see that, as we do many merge operations, this map tree may become very deep. We may need to call Find recursively many times to find the top leader.

For example, the structure may becomes:

{
    s1: s1,
    s2: s1,
    s3: s2,
    s4: s3,
    s5: s4,
}

In this case, we need to call Find many times to find s5's top leader.

There are two ways to solve this problem.

The first solution is called union by rank. The idea is, for each subset, we store a value called rank, which stands for the tree's deep degree. Then when we call Union, we make the leader with lower rank points to the leader with higher rank. This way, the tree should become more balanced.

The second solution is called path compression. The idea is, when we call Find to find element s5's leader, say we find the result is s1. Before returns the result, we can set s5's leader as s1 directly. So next time when we want to find s5's leader, we should get the result instantly.

Now let see how to code these two solutions.

class DisjointSet {
    private m = new Map<number, number>();

    // every leader have its rank
    private rank = new Map<number, number>();

    constructor(universe: number[]) {
        universe.forEach(e => {
            this.m.set(e, e);
            this.rank.set(e, 0); // rank init with 0
        });
    }

    public find(k: number): number {
        // path compression
        const v = this.m.get(k) as number;
        if (v !== k) {
            this.m.set(k, this.find(v));
        }
        return this.m.get(k) as number;
    }

    public union(a: number, b: number): boolean {
        const x = this.find(a);
        const y = this.find(b);
        if (x === y) return false;

        const rx = this.rank.get(x) as number;
        const ry = this.rank.get(y) as number;

        // higher rank value as leader
        if (rx > ry) {
            this.m.set(y, x);
        } else if (rx < ry) {
            this.m.set(x, y);
        } else {
            // same rank, either can be leader
            this.m.set(x, y);

            // now tree becomes deeper
            this.rank.set(y, ry + 1);
        }
        return true;
    }
}

redundant-connection

Now let's see how to use this data structure to solve problems.

The problem is to find the reduncdant connection. We could make every node as a subset of the disjoint set. Then we can iterate all the edges. For every iteration, we check if the two nodes are in the same subsets. If they are, then this edge is redundant. If they are not, then we call union to make them as one subsets.

function findRedundantConnection(edges: number[][]): number[] {
    const universe = Array.from({ length: edges.length }, (_, i) => i + 1);
    const ds = new DisjointSet(universe);

    for (let i = 0; i < edges.length; i++) {
        const [u, v] = edges[i];
        // union returns false means they were already connected
        if (!ds.union(u, v)) {
            return edges[i];
        }
    }

    return [];
};

// console.log(findRedundantConnection([[1, 2], [1, 3], [2, 3]]));