Travelling salesman problem

Let's solve the leetcode problem Find the Shortest Superstring.

This is actually the same as the classic Travelling salesman problem. So let's see the what is travelling salesman problem first, and then go back to this leetcode problem later.

Travelling salesman problem

The travelling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?"

A standard way to solve these problems is to try all possible orders of visiting the n points, which results in a time complexity of O(n!). With the method of dynamic programming, we can has time complexity as O(n^2 * 2^n). Now let's see how the dynamic programming approach works.

Say we have cities as {A, B, C, D, E}. The key is to find the relationship between below f({A, B, C, D}) and f({A, B, C}). If we already know the answer for cities {A, B, C}, how we calculate answer for another new city D? The formula is as below.

f({A, B, C, D}) = min(
  f({A, B, C}, A) + cost(A, D),
  f({A, B, C}, B) + cost(A, D),
  f({A, B, C}, C) + cost(A, D),
)

This formula means that the answer depends on the last element of f({A, B, C}). The last element may be any element in the set {A, B, C}, so we need to check all of them and select the minimum one.

Now we have the formula, next we need to know how to express them in code. The function f has 2 parameters, so we can express them as a 2 dimensional array. But how to express the set structure? Well, we can make use of the trick that expressing set as bits.

Specifically, if we have set {A, B, C, D, E}, then we can express them with index [0, 1, 2, 3, 4]. Then each index correspoding to one bit of the int, from least significant bit to most significant bit. For example, if we want to express set {A, B, D}, then we can have int 0b01011.

With these information, we can move to the leetcode problem.

Find the Shortest Superstring

The goal is to merge strings together to achieve maxmium overlaps. So it is the same as the travelling salesman problem.

First, we need to calculate the overlaps between words, coresponding to the distance between cities.

int calculateOverlap(string& a, string& b) {
  auto maxOverlapSize = min(a.size(), b.size());
  for (auto i = maxOverlapSize; i > 0; i--) {
    if (a.substr(a.size() - i) == b.substr(0, i)) return i;
  }
  return 0;
}

auto wordSize = words.size();
vector<vector<int>> overlaps(wordSize, vector<int>(wordSize, 0));
for (size_t i = 0; i < wordSize; i++) {
  for (size_t j = 0; j < wordSize; j++) {
    if (i == j) continue;
    overlaps[i][j] = calculateOverlap(words[i], words[j]);
  }
}

Next, we go through the dp process.

auto setSize = 1 << wordSize;
vector<vector<int>> dp(setSize, vector<int>(wordSize, 0));
vector<vector<int>> parent(setSize, vector<int>(wordSize, -1));

// for each set
for (auto setKey = 0; setKey < setSize; setKey++) {

  // for each last element in the set
  for (auto i = 0; i < wordSize; i++) {

    // word i is not in setKey
    if (((setKey >> i) & 1) == 0) continue;

    auto prevSetKey = setKey ^ (1 << i);

    // for each new element
    for (auto j = 0; j < wordSize; j++) {
      // word j is not in prevSetKey
      if (((prevSetKey >> j) & 1) == 0) continue;

      auto maxmiumOverlap = dp[prevSetKey][j] + overlaps[j][i];
      if (maxmiumOverlap >= dp[setKey][i]) {
        dp[setKey][i] = maxmiumOverlap;
        parent[setKey][i] = j;
      }
    }
  }
}

Note that we iterate all the words one by one, so some of them may not in the set, so we need to rule out the cases with if (((setKey >> i) & 1) == 0) continue; and if (((prevSetKey >> j) & 1) == 0) continue;

Another thing needed to be noted is that we have another 2 dimensional array here named as parent. This is used to store the best option for specific set. Becuase the dp array can only tell us the maximum overlap value, with this additional array, we can get the best options for each set.

// find last best string
int last = 0;
int setKey = setSize - 1;
for (auto j = 1; j < wordSize; j++) {
  if (dp[setKey][last] < dp[setKey][j]) {
    last = j;
  }
}

// go back
string res = words[last];
while (true) {
  auto prev = parent[setKey][last];
  if (prev == -1) break;

  res = words[prev].substr(0, words[prev].size() - overlaps[prev][last]) +
        res;
  setKey ^= 1 << last;
  last = prev;
}

That's all. Below is the complete code.


class Solution {
 public:
  int calculateOverlap(string& a, string& b) {
    auto maxOverlapSize = min(a.size(), b.size());
    for (auto i = maxOverlapSize; i > 0; i--) {
      if (a.substr(a.size() - i) == b.substr(0, i)) return i;
    }
    return 0;
  }

  string shortestSuperstring(vector<string>& words) {
    auto wordSize = words.size();
    vector<vector<int>> overlaps(wordSize, vector<int>(wordSize, 0));
    for (size_t i = 0; i < wordSize; i++) {
      for (size_t j = 0; j < wordSize; j++) {
        if (i == j) continue;
        overlaps[i][j] = calculateOverlap(words[i], words[j]);
      }
    }

    auto setSize = 1 << wordSize;
    vector<vector<int>> dp(setSize, vector<int>(wordSize, 0));
    vector<vector<int>> parent(setSize, vector<int>(wordSize, -1));

    for (auto setKey = 0; setKey < setSize; setKey++) {
      for (auto i = 0; i < wordSize; i++) {
        // word i is not in setKey
        if (((setKey >> i) & 1) == 0) continue;

        auto prevSetKey = setKey ^ (1 << i);

        for (auto j = 0; j < wordSize; j++) {
          // word j is not in prevSetKey
          if (((prevSetKey >> j) & 1) == 0) continue;

          auto maxmiumOverlap = dp[prevSetKey][j] + overlaps[j][i];
          if (maxmiumOverlap >= dp[setKey][i]) {
            dp[setKey][i] = maxmiumOverlap;
            parent[setKey][i] = j;
          }
        }
      }
    }

    // find last best string
    int last = 0;
    int setKey = setSize - 1;
    for (auto j = 1; j < wordSize; j++) {
      if (dp[setKey][last] < dp[setKey][j]) {
        last = j;
      }
    }

    // go back
    string res = words[last];
    while (true) {
      auto prev = parent[setKey][last];
      if (prev == -1) break;

      res = words[prev].substr(0, words[prev].size() - overlaps[prev][last]) +
            res;
      setKey ^= 1 << last;
      last = prev;
    }

    return res;
  }
};

yaox023's blog

yaox023's blog

Travelling salesman problem

Travelling salesman problem

Find the Shortest Superstring