Official

G - Compress Strings Editorial by en_translator


First, if there exist \(i\) and \(j\) such that \(S_i\) is a substring of \(S_j\), then the condition that “\(S_i\) is contained as a substring” is completely implied by the condition that “\(S_j\) is contained as a substring,” so one can ignore \(S_i\). From now on, we assume there are no such \(i\) and \(j\).

For two strings \(X\) and \(Y\), we define \(f(X, Y)\) as the maximum \(k\) such that the last \(k\) characters of \(X\) coincide with the first \(k\) characters of \(Y\).

Consider a string with the minimal length that contains all of \(S_1,S_2,\dots,S_N\) as substrings, and take a pair \((l_i,r_i)\) so that its \(l_i\)-th through \((r_i-1)\)-th characters match \(S_i\), for each \(i\).

Without loss of generality, we assume \(l_1 \leq l_2\leq \dots \leq l_N\). Then, for all \(i=1,2,\dots,N-1\) it holds that:

  • \(l_i < l_{i+1} \leq r_i\),
  • \(r_i-l_{i+1}=f(S_i,S_{i+1})\).

Otherwise, it contradicts the assumption in the first paragraph or the minimality assumed in the second paragraph.

Therefore, when finding the minimum length of a string that contains all of \(S_1,S_2,\dots,S_N\) as substrings, we only have to consider the following type of strings:

  • for a permutation \((p_1,p_2,\dots,p_N)\) of \((1,2,\dots,N)\), the string is a concatenation of \(S_{p_1}\), the last \(|S_{p_2}|-f(S_{p_1},S_{p_2})\) characters of \(S_{p_2}\), the last \(|S_{p_3}|-f(S_{p_2},S_{p_3})\) characters of \(S_{p_3}\), \(\dots\), and the last \(|S_{p_N}|-f(S_{p_{N-1}},S_{p_N})\) characters of \(S_{p_N}\).

Thus, the original problem is reduced to minimizing \(\displaystyle\sum_{i=1}^{N-1} f(S_{p_i},S_{p_{i+1}})\) for a permutation \((p_1,p_2,\dots,p_N)\) of \((1,2,\dots,N)\).

By precalculating \(f_(S_i,S_j)\) for all pairs \(i,j\) using string algorithms such as Z algorithm, KMP (Knuth–Morris–Pratt) algorithm, or rolling hash, this problem can be solved with bit DP (Dynamic Programming) just as the traveling salesman problem.

Hence, the problem has been solved in a total of about \(O(N\sum|S_i|+2^NN^2)\).

Sample code (C++):

#include <bits/stdc++.h>
#include <atcoder/string>

using namespace std;
using namespace atcoder;

// Does s contain t as a substring?
bool contains(const string &s, const string &t) {
    int n = s.size(), m = t.size();
    vector<int> za = z_algorithm(t + s);
    for (int i = m; i <= n; i++) {
        if (za[i] >= m) return true;
    }
    return false;
}

int main() {
    int n;
    cin >> n;
    vector<string> s(n);
    for (int i = 0; i < n; i++) cin >> s[i];
    for (auto it = s.begin(); it < s.end();) {
        bool flag = false;
        for (auto jt = s.begin(); jt < s.end(); jt++) {
            if (jt != it) flag |= contains(*jt, *it);
        }
        if (flag) it = s.erase(it);
        else ++it;
    }
    n = s.size();
    vector d(n, vector<int>(n));
    for (int i = 0; i < n; i++) {
        string now = s[i];
        for (int j = 0; j < n; j++) {
            if (j != i) now += s[j];
        }
        vector<int> za = z_algorithm(now);
        int cur = s[i].size();
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            d[j][i] = s[i].size();
            for (int k = 0; k < s[j].size(); k++) {
                if (za[cur + k] >= s[j].size() - k) {
                    d[j][i] = k + s[i].size() - s[j].size();
                    break;
                }
            }
            cur += s[j].size();
        }
    }
    vector dp(1 << n, vector<int>(n, 1 << 30));
    for (int i = 0; i < n; i++) dp[1 << i][i] = s[i].size();
    for (int bit = 1; bit < (1 << n) - 1; bit++) {
        for (int i = 0; i < n; i++) {
            if (~bit >> i & 1) continue;
            for (int j = 0; j < n; j++) {
                if (bit >> j & 1) continue;
                dp[bit | 1 << j][j] = min(dp[bit | 1 << j][j], dp[bit][i] + d[i][j]);
            }
        }
    }
    cout << *min_element(dp.back().begin(), dp.back().end()) << endl;
}

posted:
last update: