Single Round Match 750 Editorials

In Community Stories, February 8, 2019

LexOrder

Let’s start by finding out when the answer can be “IMPOSSIBLE”. Clearly, this happens if and only if the string B comes immediately after the string A in lexicographic order. So, this is our question: given a string A, when does the next string exist, and how do we find it?

Strings that are bigger than a given string A fall into two categories: strings that start with A, and strings that don’t. For example, if A = “cat”, a string from the first category is “cathode” and a string from the second category is “ceiling”.

It should be obvious that any string from the first category is smaller than any string from the second category. A string from the second category already differs from A somewhere within its first length(A) characters, and at the first such difference its character is bigger – while a string from the first category still matches A there. In our example, “cathode” < “ceiling” because at index 1 we have ‘a’ < ‘e’.

Thus, when looking for the next string after A, we have to look among strings that start with A. Each of these strings has the form A + (some suffix). And thanks to how string comparison works, the smaller the suffix, the smaller the whole string will be. Thus, among all strings that start with A the smallest string is A itself, and the second smallest string is obviously A+’a’.

This gives us a full solution that is really easy to implement: Set C = A + ‘a’. If C = B, the answer is “IMPOSSIBLE”. And if it isn’t, you just found a string C such that A < C < B, and you can return it.

def between(A, B):
    C = A + 'a'
    if C == B: return "IMPOSSIBLE"
    else: return C

Addendum: You may have wondered about the way-too-general phrasing I used in the first paragraph. What’s up with “when does the next string exist”? Isn’t it obvious that in an ordering there is always a next string? Well, it isn’t. And to see why, just look at the same question in the opposite direction. Given that B = “cat”, what is the immediately previous string in lexicographic order?

Another interesting question to ponder: how far is it from “cat” to “dog”? That is, if you start with “cat” and always go to the next string in lexicographic order, after how many steps will you reach “dog”?

VisitN

There were many different ways to solve this task. The limit of 2000 moves is quite generous and allows for many strategies.

For example, one possibility was to simply run a depth-first search (DFS) from the cell where you started. DFS will eventually visit the whole board: each of the 900 cells is discovered exactly once and left as finished exactly once, so a full traversal takes at most 2*899 = 1798 moves, comfortably below 2000. To get a full solution, you just need to track the number of cells visited so far and stop as soon as you have discovered n of them.
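
The DFS strategy can be sketched in Python as follows. The 30×30 grid and the move letters N/S/E/W come from the problem; the function name and interface are my own illustrative assumptions, not the required signature.

```python
import sys

# A sketch of the DFS strategy described above: discover cells one at a
# time, and stop as soon as n distinct cells have been visited.
def visit_dfs(n, r, c, size=30):
    sys.setrecursionlimit(5000)      # the recursion can go up to 900 deep
    moves = []
    visited = [[False] * size for _ in range(size)]
    count = 0

    def dfs(r, c):
        nonlocal count
        visited[r][c] = True
        count += 1
        if count == n:
            return True              # discovered enough cells; stop unwinding
        for dr, dc, step, back in ((-1, 0, 'N', 'S'), (1, 0, 'S', 'N'),
                                   (0, -1, 'W', 'E'), (0, 1, 'E', 'W')):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size and not visited[nr][nc]:
                moves.append(step)
                if dfs(nr, nc):
                    return True
                moves.append(back)   # backtrack: one extra move per cell
        return False

    dfs(r, c)
    return ''.join(moves)
```

Each cell is entered once and backtracked out of at most once, which matches the 2*899 = 1798 bound.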

In our reference solution we opted for a solution with a much simpler implementation. In order to traverse the whole board we first go to the top left corner (by taking r steps north and c steps west) and then we just use a fixed pattern to visit everything (e.g., 29 steps east, step south, 29 steps west, step south, repeat). As in the above solution, this would solve the task for n = 900. If we are given a smaller value of n, all we need to do is to count the number of visited cells and stop as soon as the count reaches n.

public String visit(int n, int r, int c) {
    String visitEverything = "";
    for (int i=0; i<r; ++i) visitEverything += 'N';
    for (int i=0; i<c; ++i) visitEverything += 'W';
    for (int i=0; i<15; ++i) {
        visitEverything += "EEEEEEEEEEEEEEEEEEEEEEEEEEEEES";
        visitEverything += "WWWWWWWWWWWWWWWWWWWWWWWWWWWWWS";
    }

    String answer = "";
    
    boolean[][] visited = new boolean[30][30]; // Java initializes these to false

    int visitedCount = 1;
    visited[r][c] = true;

    for (char step : visitEverything.toCharArray()) {
        if (visitedCount == n) break;
        answer += step;
        if (step == 'N') --r;
        if (step == 'S') ++r;
        if (step == 'W') --c;
        if (step == 'E') ++c;
        if (!visited[r][c]) { ++visitedCount; visited[r][c] = true; }
    }
    return answer;
}

Challenge: How would you solve the task with the strictest possible limit of only 899 moves?

IdenticalBags

The key property to observe in this problem is that the answer is monotone. If we can make K identical bags, we can also make K-1 such bags (by filling them in exactly the same way and having more candy left over). Thus, the answer X forms a threshold: if the number of bags you want to form is less than or equal to X, it can be done, otherwise it cannot.

This suggests that a good way to look for the right X would be binary search. And indeed, the question “can we make K identical bags?” turns out to be much easier than the question “what is the maximum number of identical bags we can make?”

The main thing we gained by rephrasing the question is that once we fix K, we can fill the bags using a very simple greedy algorithm. How many candies of type i can I place into each bag? Obviously, at most (candy[i] div K). Sum this up over all i, and you’ll get one of two outcomes. If each of your bags contains fewer than bagSize candies, the value K is obviously too large, otherwise it’s obviously small enough. And that’s all we needed.
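
The greedy check can be written in a few lines of Python (a sketch; the function name is mine):

```python
# Feasibility check for a fixed number of bags k, as described above.
# Greedily, each bag can take candy[i] // k candies of type i.
def can_make(candy, bag_size, k):
    per_bag = 0
    for c in candy:
        per_bag += c // k
        if per_bag >= bag_size:   # early exit; also keeps the sum small
            return True
    return False
```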

When doing the implementation, we need two more technical details. The first detail is the upper bound for the binary search. Clearly, if you set K > max(candy), you cannot put any candy into your bags. Hence, a good upper bound is max(candy)+1. Alternatively, you could hard-wire the value 10^18 + 1, which is guaranteed to be too large.

The second detail is avoiding accidental overflows. This should not happen to a properly implemented binary search: the smallest value of K you will test is more than half of the correct X, and thus the number of candies in the bag will be at most roughly twice the required number. Still, some loose implementations may get bitten by something like trying K = 1 for the maximum input and getting a bag with 10^20 candies. An alternative for such solutions is to stop counting as soon as the current bag size reaches the desired amount. This is also shown (although not necessary) in the implementation given below.

public long makeBags(long[] candy, long bagSize) {
    long lo = 0, hi = 1_000_000_000_000_000_001L;
    while (hi - lo > 1) {
        long med = (hi + lo) / 2;
        long filled = 0;
        for (long c : candy) {
            filled += c / med;
            if (filled >= bagSize) break;
        }
        if (filled >= bagSize) lo = med; else hi = med;
    }
    return lo;
}

SimulateBST

Everyone who has learned how BSTs work should know what their main issue is: if you don’t do any balancing, the operations on them can be slow because some branches of the tree can be very deep, and the time complexity of all operations is linear in the depth of the tree. Thus, the task name is deceiving you a little bit. If you do a straightforward simulation, it will time out.

I was delighted when I came up with the idea of this problem and realized that there are very simple ways to simulate BST insertion efficiently. Of course, we will be “cheating”: we’ll use more powerful data structures to do so. Below, I describe the approach that has the simplest implementation among the ones I know.

The key observation is that whenever we have some BST, we can look at the null pointers in the tree – i.e., the places where a node doesn’t have a child. Each of these is a possible place for the next insertion. And, more importantly, each of these corresponds to a range of values. A sample tree is shown in the figure below, with the places for future addition labeled A through E.

    40
   /  \
 20    60
/  \  /  \
A  B  C   70
         /  \
         D  E

Here, any value in the range (-inf,20) will be inserted at A, anything in the range (20,40) at B, (40,60) goes to C, (60,70) to D, and (70,+inf) to E.

Hence, our very simple way to simulate the BST insertions efficiently will use two data structures:
A map “depths” where the keys are numbers already in the BST, and the value associated with the key is the depth of the node with that number. (We need this to discard duplicates and properly report their depth.)
An ordered map “ranges” where we store the current ranges, and the depth of the root node of each of those. In the figure above, the depth of range A is 2.

Whenever we get a new insertion, we proceed as follows:
Look for it in “depths” to see whether it’s a duplicate. If it is, report its depth and we are done.
In logarithmic time, find the range in “ranges” that contains the new value. Set the depth of the new value to the depth of that range, and split the range into two smaller ones, one deeper than the previous one.

In the implementation I’m using one additional trick: as the beginning of each range is the same as the end of the previous one, we can store the end of each range implicitly. Thus, the key into “ranges” will simply be the left endpoint of each range. The entire insertion is then implemented as simply as this:

    if (depths.containsKey( S[n] )) {
        D[n] = depths.get( S[n] );
    } else {
        int left = ranges.floorKey( S[n] );
        D[n] = ranges.get( left );
        ranges.put( left, D[n]+1 );
        ranges.put( S[n], D[n]+1 );
        depths.put( S[n], D[n] );
    }
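
For comparison, here is a self-contained Python sketch of the same idea. It keeps the left endpoints of the ranges in a sorted list and uses bisect instead of an ordered map; insort makes each insertion O(n) in the worst case, so this sketches the idea rather than the O(log n) bound (all names are mine):

```python
import bisect

# Simulate BST insertion depths using the "ranges" idea described above.
# Returns the depth of each value's node (root = depth 0); duplicates
# report the depth of the already existing node.
def bst_depths(values):
    depths = {}                    # value already in the BST -> its depth
    neg_inf = float('-inf')
    starts = [neg_inf]             # sorted left endpoints of the open ranges
    range_depth = {neg_inf: 0}     # left endpoint -> depth of an insertion there
    out = []
    for v in values:
        if v in depths:
            out.append(depths[v])
            continue
        i = bisect.bisect_right(starts, v) - 1
        left = starts[i]           # the range containing v starts here
        d = range_depth[left]
        depths[v] = d
        range_depth[left] = d + 1  # the range splits into two deeper ones
        range_depth[v] = d + 1
        bisect.insort(starts, v)
        out.append(d)
    return out
```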

This solution simulates each insertion in O(log n) time, hence its total time complexity is O(n log n).

PurpleSubsequences

Algorithmically, the most interesting part of this problem is probably the one where we need to make sure we only count each subsequence once, even if it has multiple occurrences. How do we do that? A general technique for problems like this one is to pick one “canonical” occurrence of each sequence and only count that one. In this particular problem, the occurrence we’ll pick is the one that would be picked greedily – i.e., the one with the lexicographically smallest sequence of indices.

In fact, it will be quite helpful to imagine the greedy algorithm that checks whether some sequence S is a subsequence of the given sequence A. What does this algorithm do? It looks for the first occurrence of S[0] in A, then for the first occurrence of S[1] that comes later, then for the first occurrence of S[2] that comes after that one, and so on.

We can now mimic this behavior and write a brute-force solution that will generate each distinct subsequence of A exactly once. Call generate(-1, empty sequence) to start it.

generate(last, prefix):
    for i = last+1 to length(A)-1:
        if this is the first occurrence of A[i] we have seen in this loop:
            generate(i, prefix+[ A[i] ] )

Another formulation of essentially the same algorithm:

generate(last, prefix):
    for each possible value v:
        find the smallest i > last such that A[i] = v
        if that i exists:
            generate(i, prefix+[v] )
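
The first formulation translates directly into Python (a sketch that collects the subsequences as tuples; names are mine):

```python
# Generate each distinct non-empty subsequence of A exactly once, by
# always using the lexicographically smallest sequence of indices.
def generate(A, last=-1, prefix=()):
    out = []
    seen = set()                   # values already tried in this loop
    for i in range(last + 1, len(A)):
        if A[i] in seen:
            continue               # not the first occurrence: skip
        seen.add(A[i])
        out.append(prefix + (A[i],))
        out.extend(generate(A, i, prefix + (A[i],)))
    return out
```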

The second component of a full solution is a way to tell whether or not our sequence contains an increasing subsequence of a given length L (with L at most 6). The way we’ll use in this solution is based on the online O(n log n) algorithm that finds the longest increasing subsequence (LIS). This algorithm processes the elements of the sequence one at a time, and it maintains a state: a vector of values. For each i > 0, the value we store is the smallest possible x[i] such that the sequence we processed so far has an increasing subsequence of length i that ends with the value x[i].

Clearly, the length of the longest increasing subsequence is the largest i for which x[i] is defined. Additionally, the key observation that makes the algorithm efficient is that the values x[i] themselves are always strictly increasing, and whenever we process a new element of the input sequence, at most one of the x[i] changes. Thus, the algorithm can use binary search to find and update the correct x[i] efficiently. (In our solution of this problem, this part will not be necessary.)
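
One step of this algorithm can be sketched in Python with bisect (for strictly increasing subsequences; the function name is mine, and the list is 0-based):

```python
import bisect

# Process one new element v of the input sequence. Here x is the state:
# x[i] (0-based) is the smallest value ending an increasing subsequence
# of length i+1, so x itself is strictly increasing.
def lis_step(x, v):
    i = bisect.bisect_left(x, v)       # first position with x[i] >= v
    if i == len(x):
        return x + [v]                 # v extends the LIS by one
    return x[:i] + [v] + x[i + 1:]     # v improves exactly one entry
```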

Let’s focus on the claim that the values x[i] in any state of the algorithm form an increasing sequence. Why is this the case? The answer is simple: if you can form an increasing subsequence of length 3 that ends in x[3], then by removing its last element you get an increasing subsequence of length 2 that ends in something smaller than x[3] – and thus x[2], the smallest such ending value, must also be smaller than x[3].

Exercise: Armed with the observation we just made, prove the claim that whenever you append a new value to the sequence, at most one of the values x[i] changes. Hint: if the new value is v, think about what happens to the first x[i] that is greater than v.

In our problem, the longest increasing subsequence we care about has length L. Thus, as soon as x[L] is defined, we do not care about the exact values x[i], we are already in the state “the current sequence is purple”. And once it is purple, it will remain purple regardless of how many additional elements we append to it. Thus, at any moment during its execution, the LIS algorithm is either in the special purple state, or its state is an increasing sequence of fewer than L integers.

For L=6 and maximum value 20, we get that the total number of states is (1 purple state) + choose(20,1) + choose(20,2) + … + choose(20,5) = 21,700. That is a nice small number.
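
The state count above is easy to verify (math.comb requires Python 3.8+):

```python
from math import comb

# 1 purple state plus all strictly increasing sequences of fewer than
# L = 6 values drawn from {1, ..., 20}
states = 1 + sum(comb(20, k) for k in range(1, 6))
print(states)  # 21700
```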

And now we can bring those two parts together to get a full solution. We will run the recursive function that generates each subsequence of A exactly once, and at any moment we will ask the question: given the prefix we already chose, how many ways to produce a purple subsequence are there? And here we bring in the second part: we do not care about the exact prefix we chose. All we care about is the index of the last value we picked (this determines where we can and cannot go next), and the state of the LIS algorithm after it processed the values we already picked (because when we know the state, we can continue the computation when we pick the next element).

In the above solution, there are at most n * 21,700 distinct calls of the recursive function, and thus we can get it to run within the time limit by applying memoization.

In the reference solution, instead of going for the recursive approach I used a forward-looking version of dynamic programming. In this solution, for each pair (n, LIS state) I count the number of ways in which it can be reached by the recursive generator. As an initialization, for each value v I look at its first index in A and I increment dp[that index][ the sequence [v] ] by 1. Then, I process all pairs ordered by n, and from each pair I find all possibilities for the next step of the recursive generator, and I add the number of ways to get to my current state to the number of ways to get to the new state.

Here is the main body of this program:

for (int n=0; n<N; ++n) {
    for (int s=0; s<S; ++s) {
        // current state: the last number we picked is at index n,
        // and LIS algorithm is in state number s

        // simple optimization: if we couldn't get here, do nothing
        if (dp[n][s] == 0) continue;

        // a helper array to remember which values we've seen
        boolean[] seen = new boolean[MAXVAL];
        for (int i=0; i<seen.length; ++i) seen[i] = false;

        // iterate over all possibilities for the next element
        for (int n2=n+1; n2<N; ++n2) {
            // check whether this is a new value
            if (seen[ A[n2] ]) continue;
            seen[ A[n2] ] = true;

            // retrieve the current state of the LIS algorithm
            // and process the value A[n2] to get the new state
            int[] curr = decode_state(s,L);
            int[] next = LIS_step( curr, A[n2] );
            int s2 = encode_state(next,L);
       	 
            // we just discovered some new ways to reach the state (n2,s2):
            dp[n2][s2] += dp[n][s];
        }
    }
}

The final answer is then simply the sum of dp[n][purple state] over all n.