公式

Ex - XOR Sum of Arrays 解説 by en_translator


This problem can be solved with an advanced algorithm on sequence called rolling hash. First, let’s take a look at rolling hash.

What is a rolling hash?

A hash is a value \(\text{hash}(d)\) obtained from a data \(d\) and a hash function \(\text{hash}\) (where \(\text{hash}(d)\) has high entropy than \(d\)). Hash is typically used to determine if a pair of huge data has the same contents.

For example, when we have sequences \(A\) and \(B\) of length \(N\) each, consider how we can determine if these coincide. This costs \(\mathrm{O}(N)\) if you compare naively, but if you precalculate \(\text{hash}(A)\) and \(\text{hash}(B)\) (which we assume fit in a \(64\)-bit integer) beforehand, then we can determine it in an \(\mathrm{O}(1)\) time.

A rolling hash is an algorithm to compute the hash of a sequence fast.
Given a sequence \(A = (A_1, A_2, \dots, A_N)\), let us define the hash function \(R\) as follows (where \(p\) is a sufficiently large prime and \(x\) is an integer chosen from \([0, p)\) uniformly at random at runtime). Such a function is called a rolling hash.

\[R(A) = \left(\sum_{i=1}^N A_i x^{N-i} \right) \bmod{p}\]

Rolling hash has some good properties.
One is that, with an \(\mathrm{O}(N)\), we can find the hash of an arbitrary subarray \(A(a, b)\) of \(A\) in an \(\mathrm{O}(1)\) time (where operations \(\text{mod }p\) can be performed in an \(\mathrm{O}(1)\) time). First, enumerate the hashes of \(A(1, 1), A(1, 2),\dots, A(1, N)\) by

\[R(A(1, 1)) = A_1, R(A(1, n)) = R(A(1, n-1)) \times x + A_n (n \gt 1)\]

in an \(\mathrm{O}(N)\) time. Also, precalculate \(x^1, x^2,\dots,x^N\) in an \(\mathrm{O}(N)\) time too.

Then the hash of \(A(a, b)\) can be calculated by

\[R(A(a, b)) = R(A(1, b)) - R(A(1,a-1)) \times x^{b-(a-1)},\]

which can be obtained from the precomputed values.

Another is the collision resistance of the hash function. For a hash function \(H\) and sequences \(A\) and \(B\), the hashes of \(A\) and \(B\) are said to be colliding if \(A \neq B\) and \(H(A) = H(B)\).
A hash function requires an important property that “we cannot easily find \(A\) and \(B\) such that \(H(A) \neq H(B)\).” This is because if such \(A\) and \(B\) can be found with ease, a malicious person (in programming contests, the problemsetter) can let you compare such \(A\) and \(B\) so that you wrongly determine that “we have \(H(A) = H(B)\), so it must be \(A = B\)!” Such a property is called the collision resistance.

For example, consider a hash function \(H(A) = A_1 + A_2 + \dots + A_N\), which is bad. This hash works properly if \(A\) is random, but problemsetters may predict that some of you may use such \(H\) and add a test case that let you compare \(A = (2, 7, 4), B = (1, 4, 8)\); they always collide and lead to a wrong answer, and that’s why it is a bad hash.

A hash function that you use in a programming contest should basically satisfy the following property:

  • no matter how malicious test case the problemsetters prepare, the hash collision does not happen with a very high probability.

It is mathematically proven that the rolling hash satisfies such a condition.

Suppose that the hashes of two distinct sequences \(A\) and \(B\) of length \(N\) each collide. Then the following equation holds:

\[H(A) = H(B) \iff \sum_{i=1}^N x^{N-i} (A_i - B_i) \equiv 0 \pmod{p}.\]

Since the left hand side is a polynomial modulo \(p\) of \(x\) with degree \(N-1\), there are at most \((p-1)\) such \(x\)s. Moreover, \(x\) is chosen uniformly at random at runtime, so the problemsetters are unable to guess \(x\) that program chooses to make the hashes collide. Therefore, if \(N \ll p\), the rolling hash is a hash function with a very low probability of collision.

Solution when \(\text{XOR}\) is replaced by \(+\)

First, let us solve the case where the \(\text{XOR}\) in \(S(B,C)\) is replaced by \(+\), that is, where

\[S(B, C) = (B_1+C_1, B_2+C_2, ..., B_{|B|} + C_{|B|}).\]

Let us define the hash function \(H\) by the rolling hash. Then, for sequences \(B\) and \(C\) with equal length,

\[H(S(B,C)) = H(B) + H(C),\]

so we can find \(H(S(B,C))\) in the same complexity as \(H(B)\) and \(H(C)\).

Based on this fact, we can easily solve this problem. First, perform an appropriate \(\mathrm{O}(N)\) so that we can find the hash of \(A(a, b)\) for any \(a\leq b\) in an \(\mathrm{O}(1)\) time. Then, given each query \((a,b,c,d,e,f)\), we can find the LCP (Longest Common Prefix) of \(S(H(a,b), H(c,d))\) and \(S(e,f)\) with binary search and hash in an \(\mathrm{O}(\log N)\) time, which we can use to find the answer in an \(\mathrm{O}(\log N)\) time.

Therefore, the problem can be solved in a total of \(\mathrm{O}(N + Q \log N)\) time. The probability that the hash function collides can be bounded by \(N Q \log N / p\) because it compares sequences of length \(N\) \(\mathrm{O}(Q \log N)\) times in total. If we let \(p \sim 10^{18}\), we have \(N Q \log N / p \sim 10^{-6}\), which is small enough. (If we evaluate strictly, it may be bounded by \(NQ / p\), but we omit the details here.)

Applying for the original problem

Let’s get back to the original problem. In order to apply the solution above when

\[S(B, C) = (B_1 \oplus C_1, B_2 \oplus C_2, ..., B_{|B|} \oplus C_{|B|}),\]

we need to construct a hash such that:

  • \(H(S(B,C)) = H(B) \oplus H(C)\);
  • \(H(A(a,b))\) can be found fast enough with a proper precalculation; and
  • it has a collision resistance.

There are several ways to construct a hash function; this time, we explain a solution using Nimber.

In Rolling Hash, we used an operation \(\text{mod }p\) to find the hash value. In fact, the properties of rolling hash that “we can find the hash of any subarray fast” and “it has a collision resistance” can be shown by the fact that the operations on \(\text{mod }p\) has an algebraic structure called a field. (We omit the details.)

  • See also: [Commutable field (Wikipedia)]

So let’s try finding a field such that \(\oplus\) is the addition and use it for the rolling hash. If such a field exists, then we can apply the field for the rolling hash and use the solution above to solve this problem.
Nimber is a field whose addition is xor and multiplication is the Nim product, which is a great fit for this problem. (By the way, the Nim product is often used in competitive programming.)

Nim product and Nimber

The Nim product \(\otimes\) is an operation defined as follows:

\[a \otimes b := \text{mex} \lbrace (a' \otimes b) \oplus (a \otimes b') \oplus (a' \otimes b') \vert a' \lt a, b' \lt b \rbrace.\]

Let \(A\) be sets consisting of integers between \(0\) (inclusive) and \(2^{2^n}\) (exclusive) (where \(n\) is an integer), and \(\oplus\) and \(\otimes\) be operations defined on \(A\). Then it is known that the algebraic structure whose addition is \(\oplus\) and multiplication is \(\otimes\) forms a field. Such a field is called Nimber.

Therefore, one can use Nimber for the rolling hash to solve this problem in about \(\mathrm{O}(M (N + Q \log N))\) time, where the \(M\) is the complexity of Nim product, with a very low hash collision probability of \(NQ \log N / 2^{64} \sim 10^{-7}\).

  • This problem has \(86\) cases, so this solution will result in WA (Wrong Answer) with the probability of \(1 - (1-10^{-7})^{86} \sim 10^{-5}\); considering that a contestant will make about \(10^4\)-\(10^5\) submissions, we may consider that is a sufficiently low once-in-a-lifetime probability.

The Nim product can be computed by about \(64\) reference to the precalculation table in a recursive way, which is fast enough. (For more details, see also Sample Code and Article by kyopro_friends (Japanese).)

  • Sample code (C++)
#include <cassert>
#include <cstdio>
#include <ctime>
#include <random>
using namespace std;

using u64 = unsigned long long;
constexpr int N_MAX = 1e6 + 10;
int N, Q;
u64 A[N_MAX], pw[N_MAX], hs[N_MAX], basis, small[256][256];

template <bool is_pre = false>
u64 nim_product(u64 a, u64 b, int p = 64) {
  if (min(a, b) <= 1) return a * b;
  if (!is_pre and p <= 8) return small[a][b];
  p >>= 1;
  u64 a1 = a >> p, a2 = a & ((1ull << p) - 1);
  u64 b1 = b >> p, b2 = b & ((1ull << p) - 1);
  u64 c = nim_product<is_pre>(a1, b1, p);
  u64 d = nim_product<is_pre>(a2, b2, p);
  u64 e = nim_product<is_pre>(a1 ^ a2, b1 ^ b2, p);
  return nim_product<is_pre>(c, 1uLL << (p - 1), p) ^ d ^ ((d ^ e) << p);
}

void init() {
  for (int i = 0; i < 256; i++)
    for (int j = 0; j < 256; j++) small[i][j] = nim_product<true>(i, j, 8);
  pw[0] = 1, hs[0] = basis = 0;
  mt19937_64 rng(time(NULL));
  basis = rng();
  for (int i = 1; i <= N; i++) {
    pw[i] = nim_product(pw[i - 1], basis);
    hs[i] = nim_product(hs[i - 1], basis) ^ A[i - 1];
  }
}
u64 get(int l, int r) { return nim_product(hs[l], pw[r - l]) ^ hs[r]; }
void send(int flag) { printf(flag ? "Yes\n" : "No\n"); }

int main() {
  scanf("%d %d", &N, &Q);
  for (int i = 0; i < N; i++) scanf("%llu", &(A[i]));
  init();
  while (Q--) {
    int a, b, c, d, e, f;
    scanf("%d %d %d %d %d %d", &a, &b, &c, &d, &e, &f);
    --a, --c, --e;
    int l = 0, h = min(f - e, b - a) + 1;
    while (l + 1 < h) {
      int m = (l + h) / 2;
      ((get(a, a + m) ^ get(c, c + m) ^ get(e, e + m)) ? h : l) = m;
    }
    send(e + l != f and (a + l == b or (A[a + l] ^ A[c + l]) < A[e + l]));
  }
}

Other problems we can solve with Nimber

投稿日時:
最終更新: