Disjoint-set data structure
According to wiki:
In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets.
The key is every subsets use a special data structure, to make it easy to do two operations:
- merge two subsets:
Union
- find which subsets the element belongs to:
Find
For example, we have an array [s1, s2, s3, s4, s5]
. First, we make every element as a unique subset. Every subsets have a concept called subset leader. For initialization, we make every element as the leader itself. We can use a map to express this data structure, with map key as the element, map value as subset leader.
{
s1: s1,
s2: s2,
s3: s3,
s4: s4,
s5: s5,
}
So now if we call function Find
for element s2
, we should its leader s2
too.
Now let call function Union
for subset s1
and s2
to merge the two subsets. So how to express this merged state? We can make s1
as s2
's leader. So later, we can find s1
and s2
have the same leader so we know they are in the same subset. So now the data structure becomes:
{
s1: s1,
s2: s1, // here, s2's leader is s1
s3: s3,
s4: s4,
s5: s5,
}
So now if we want to find the leader of s2
, we will find s1
first, then call Find
recursively. When we find the element's leader is itself, then we find the top leader.
So this is the basic process of disjoint set. Let's implement it in code.
class DisjointSet {
private m = new Map<number, number>();
constructor(universe: number[]) {
universe.forEach(e => {
this.m.set(e, e);
});
}
public find(k: number): number {
const v = this.m.get(k) as number;
if (v === k) {
return v;
} else {
// call find recursively to get the top leader
return this.find(v);
}
}
public union(a: number, b: number): boolean {
const x = this.find(a);
const y = this.find(b);
if (x === y) return false;
this.m.set(x, y);
// can also set(y, x);
}
}
Unbalanced Problem
From the above process, we can see that, as we do many merge operations, this map tree may become very deep. We may need to call Find
recursively many times to find the top leader.
For example, the structure may becomes:
{
s1: s1,
s2: s1,
s3: s2,
s4: s3,
s5: s4,
}
In this case, we need to call Find
many times to find s5
's top leader.
There are two ways to solve this problem.
The first solution is called union by rank. The idea is, for each subset, we store a value called rank
, which stands for the tree's deep degree. Then when we call Union
, we make the leader with lower rank points to the leader with higher rank. This way, the tree should become more balanced.
The second solution is called path compression. The idea is, when we call Find
to find element s5
's leader, say we find the result is s1
. Before returns the result, we can set s5
's leader as s1
directly. So next time when we want to find s5
's leader, we should get the result instantly.
Now let see how to code these two solutions.
class DisjointSet {
private m = new Map<number, number>();
// every leader have its rank
private rank = new Map<number, number>();
constructor(universe: number[]) {
universe.forEach(e => {
this.m.set(e, e);
this.rank.set(e, 0); // rank init with 0
});
}
public find(k: number): number {
// path compression
const v = this.m.get(k) as number;
if (v !== k) {
this.m.set(k, this.find(v));
}
return this.m.get(k) as number;
}
public union(a: number, b: number): boolean {
const x = this.find(a);
const y = this.find(b);
if (x === y) return false;
const rx = this.rank.get(x) as number;
const ry = this.rank.get(y) as number;
// higher rank value as leader
if (rx > ry) {
this.m.set(y, x);
} else if (rx < ry) {
this.m.set(x, y);
} else {
// same rank, either can be leader
this.m.set(x, y);
// now tree becomes deeper
this.rank.set(y, ry + 1);
}
return true;
}
}
redundant-connection
Now let's see how to use this data structure to solve problems.
The problem is to find the reduncdant connection. We could make every node as a subset of the disjoint set. Then we can iterate all the edges. For every iteration, we check if the two nodes are in the same subsets. If they are, then this edge is redundant. If they are not, then we call union to make them as one subsets.
function findRedundantConnection(edges: number[][]): number[] {
const universe = Array.from({ length: edges.length }, (_, i) => i + 1);
const ds = new DisjointSet(universe);
for (let i = 0; i < edges.length; i++) {
const [u, v] = edges[i];
// union returns false means they were already connected
if (!ds.union(u, v)) {
return edges[i];
}
}
return [];
};
// console.log(findRedundantConnection([[1, 2], [1, 3], [2, 3]]));