I got a problem from codility site and was asked to solve it. Here's a
link on stackoverflow. In two words: assume we have a string S and we need to choose such a prefix P of S, that product of its length and occurrence in S will be maximum.
For example, S = "abababa" has the following prefixes:
"a", whose product equals 1 * 4 = 4,
"ab", whose product equals 2 * 3 = 6,
"aba", whose product equals 3 * 3 = 9,
"abab", whose product equals 4 * 2 = 8,
"ababa", whose product equals 5 * 2 = 10,
"ababab", whose product equals 6 * 1 = 6,
"abababa", whose product equals 7 * 1 = 7.
I like problems like that. It has obvious solution in O(N ** 3) time, elegant and easy to guess if you know z-algorithm O(N ** 2) solution and tricky O(N) solution.
Ok, first solution is naive brute-force. You choose length from 1 to N, get a prefix of its length and count occurences by brute-force searching. So, choosing length takes O(N) time and brute force takes O(N ** 2) time, totally O(N ** 3).
Ok, it doesn't really what we want to achieve. Let's think how we can reduce complexity. Anyone who ever learned strings knows that string occurence can be searched in O(N) time, so with a little code we can achieve O(N ** 2) time compexity solution. I used Z-algorithm. Here's the code:
vector<int> z_function(string &S); //z-function, z[0] = S.length()
int calculate(string &S) {
vector<int> z = z_function(S);
int ans = 0;
for (int i = 1; i <= S.length(); ++i) { //iterate on length
int k = 0; //counter of occurrences
for (int j = 0; j < S.length(); ++j)
if (z[j] >= i) ++k;
//update answer
ans = max(ans, i * k);
}
return ans;
}
But it is still not enough though. How we can get rid of nested loop? We can precalc prefix occurences with Z-algo and finding them in constant time. Initially Z-algo returns an array, where z[i] is the length of longest substring starting from S[i], which is also prefix of S. For "abacaba" it will return {7, 0, 1, 0, 3, 0, 1}. And we know that if prefix of S with length i has N occurrences, then all smaller prefixes of S will have at least N occurrences. So, we create an array of occurrences, iterate over z-array, updating new created array and finally iterate over it backwards, updating as explained above.
vector<int> z_function(string &S); //z-function, z[0] = S.length()
int calculate(string &S) {
vector<int> z = z_function(S);
int n = S.length();
vector<int> cnt(n + 1);
//cnt[i] - count of i-length prefix occurrences of S
for (int i = 0; i < n; ++i)
++cnt[z[i]];
//if cnt[i] is in S, cnt[i - 1] will be in S
int previous = 0;
for (int i = n; i > 0; --i) {
cnt[i] += previous;
previous = cnt[i];
}
int ans = 0;
for (int i = 1; i <= S.length(); ++i) { //iterate on length
int test = cnt[i] * i;
//update answer
ans = max(ans, test);
}
return ans;
}
We separately count occurrences in O(N) time and find the answer in O(N) time as wanted.
For me it is the most easiest way to solve this problem. If anyone knows better solution, please kindly share.