Z algorithm is a string matching algorithm, which has time complexity of O(n)
.
The key idea of Z algorithm is to build an Z array, which stores the matching length starts from each character. For example, if we have a string ababc
, then we initialize an array with the same length [0,0,0,0,0]
. The first element means, how long could a string start from the first character to match the original string. So just string matching itself. So first element is the length of the string. [5,0,0,0,0]
. Then comes second element. Same logic, how long could babc
matching the original string? Of course, its 0, so [5,0,0,0,0]
. Same logic for other characters.
ababc
=> 50200
Before we dig into how to get this array, let consider how this array be helpful for string matching first. The idea is simple. Say if we want to find needle
in a haystack
. We can concat them and put a special character in between like needle$haystack
. Then we calculate Z array for this string. Then we can iterate Z array, if there is a value equals to the length of the needle
, then it means there is a match.
OK, now let's see how to calculate the Z array.
We could use the brute force way to calculate it. But that's not efficient at all, time complexity becomes O(n^2)
.
How to speed up this process? See an example below.
i
a b a b c d a b a b e f
0 2 0 0 0 4 0 2 0 0 0
- - - - - - - -
See the position of i
, it has a value 4, which means a substring start from this character with window 4 equals first 4 characters of the string. Since the string is the same, then its value in Z array should be equal too. So z[i+1] = z[1]
, z[i+2] = z[2]
, and z[i+3] = z[3]
. Wow, this really could speed up things.
But really? Take a look at another example.
i
a b a b a b c d a b a b e f
_ 0 4 0 2 0 0 0 4 0 2 0 0 0
- - - - - - - -
* *
In this example, at the end of the first window, there are 2 more characters matching, makes them not equal.
Let's see one more example.
i
a b a b c d a b a b a b
_ 0 2 0 0 0 4 0 4 0 2 0
- - - - - - - -
* *
This time, at the end of the second window, there are 2 more characters matching. also makes them not equal.
So, as you can see now, we can use this feature, but with limits. We need to check if current matching part could exceed the window. If it could, then we should start from the window, and match every characters manually again.
Now, let's see the code.
function strStr(haystack: string, needle: string): number {
const s = needle + "$" + haystack;
const z = [s.length];
let i = 1;
let window = 0;
while(i < s.length) {
// calculate the window manually
while(i + window < s.length && s[window] === s[i+window]) window++;
z[i] = window;
// if window is 0, then its no use, move to next
if (window === 0) {
i++;
continue;
}
// we have a window, then calculate values inside the window
let j = 1;
// j + z[j] means, if matching part could exceed the window
while(j < window && j + z[j] < window) {
z[i+j] = z[j];
j++;
}
// if could exceed, move to that position, start match manully again
i += j;
window -= j;
}
for (let i = 0; i < s.length; i++) {
if (z[i] === needle.length) {
return i - needle.length - 1;
}
}
return -1;
};