Match Editorial
SRM 426
Saturday, November 22, 2008

Match summary

SRM 426 provided an unusual problem set that shook up the usual pecking order. In division I, a relatively straightforward easy problem led to a medium that required careful implementation and a hard that looked innocuous, but actually provided a real test of competitors' skills in rigorous analysis.

The lack of action during the challenge phase belied a set of submissions that was rich in challenge-fodder, and the system tests were brutal, massacring a good proportion of the medium and hard submissions. After the dust settled on the battlefield (and after a short delay due to some technical problems), three hard submissions remained, giving their authors the top three spots in the division. Congratulations go to yuhch123, pashka and andrewzta who came first, second and third. dskloet and monsoon rounded out the top five.

In division II, the hard problem unfortunately proved too difficult for anyone to solve, but the attempts at naïve brute-force solutions provided some interest during the challenge phase. Fast solutions on the first two problems with a challenge were enough for thinfaifai and siara to take the top two spots, with ludvik, lucasufrn and aydarov coming in positions three to five.

The Problems

KnockoutTourney
Used as: Division Two - Level One:
Value 250
Submission Rate 737 / 910 (80.99%)
Success Rate 539 / 737 (73.13%)
High Score atemplar for 249.99 points (0 mins 8 secs)
Average Score 181.71 (for 539 correct submissions)

As with most easy division II problems, the solution here is simply a case of simulating the process. In each round, every competitor in an odd-numbered position of the list is paired against the competitor one position higher, so if at any point you and your opponent occupy such a pair of positions, then you will meet in the current round. Otherwise, you need to calculate your position in the list for the next round. This is simply a case of noticing that if your position is p (1-based indexing), then exactly p/2 competitors (rounded down) before you in the list will be eliminated this round. Your position for the next round is therefore p - (p/2). The situation is exactly the same for your rival. The problem lends itself nicely to a recursive solution. Note that the total number of competitors N is actually irrelevant.
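
A minimal sketch of this recursion in Java (the method name and signature are illustrative, not the problem's actual interface):

```java
public class KnockoutTourney {
    // Returns the round in which the competitors at 1-based positions
    // "you" and "rival" first meet.
    public static int meetRound(int you, int rival) {
        // In each round positions (1,2), (3,4), ... are paired, so two
        // competitors meet now iff they occupy such a pair of positions.
        if ((you + 1) / 2 == (rival + 1) / 2) return 1;
        // Otherwise p/2 competitors before position p are eliminated,
        // so the next-round position is p - p/2.
        return 1 + meetRound(you - you / 2, rival - rival / 2);
    }
}
```

For example, positions 5 and 7 out of 8 competitors meet in round 2, while positions 1 and 8 only meet in the final (round 3).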

ShufflingMachine
Used as: Division Two - Level Two:
Value 500
Submission Rate 148 / 910 (16.26%)
Success Rate 123 / 148 (83.11%)
High Score thinfaifai for 416.46 points (13 mins 17 secs)
Average Score 258.33 (for 123 correct submissions)
Used as: Division One - Level One:
Value 250
Submission Rate 542 / 653 (83.00%)
Success Rate 516 / 542 (95.20%)
High Score Petr for 243.20 points (4 mins 46 secs)
Average Score 166.50 (for 516 correct submissions)

First of all, let's call all positions listed in cardsReceived good, and all positions not in cardsReceived bad. Let's start from the easiest possible case, K = 1, when the player is interested in only one card. We need to determine at which position we should place that card before shuffling. The most natural brute-force way to do that is to check all possible positions, performing maxShuffles shuffles for each one. The quality of position X is the number of times that the card from position X appears at a good position after one of those maxShuffles shuffles. Obviously, if we are interested in only one card, we place it at the position with the highest quality. The probability of getting this card will be equal to Q / maxShuffles, where Q is the respective quality.

The situation is very similar when we have two, three, or more cards. Since the quality of each position is independent of the qualities of the other positions, we just pick the two (three, or more) positions with the highest qualities and place the cards we are interested in on those positions. The expected number of cards we get will be equal to the sum of those K qualities divided by maxShuffles:


    // requires java.util.Arrays
    public double stackDeck(int[] shuffle, int maxShuffles, int[] cardsReceived, int K) {
        int N = shuffle.length;
        int[] total = new int[N];   // total[i] = quality of starting position i
        int[] order = new int[N];   // order[p] = starting position of the card currently at p
        for (int i = 0; i < N; i++)
            order[i] = i;
        for (int i = 0; i < maxShuffles; i++) {
            int[] cpy = order.clone();
            for (int j = 0; j < N; j++)
                order[shuffle[j]] = cpy[j];
            // the card now at good position cardsReceived[j] scores a
            // point for its starting position
            for (int j = 0; j < cardsReceived.length; j++)
                total[order[cardsReceived[j]]]++;
        }
        Arrays.sort(total);
        int sum = 0;
        for (int i = N - K; i < N; i++)   // sum the K highest qualities
            sum += total[i];
        return sum / (double)maxShuffles;
    }

DistinctDigits
Used as: Division Two - Level Three:
Value 1000
Submission Rate 34 / 910 (3.74%)
Success Rate 0 / 34 (0.00%)
High Score null for null points (NONE)
Average Score No correct submissions

This problem looks quite scary at first glance. Fortunately, one may notice that the answer for example 5 (1000, 10000000) is only about 19 thousand, so the maximum total number of distinct numbers shouldn't be much bigger. This means we can easily enumerate all possible non-decreasing digit sequences and check whether each of them can represent a number between low and high. Note: unlike in many similar problems, the answer here is not equal to the total number of distinct numbers between 1 and high minus the total number of distinct numbers between 1 and low.

To go through all possible sequences we can use a DFS on a graph where each vertex is a pair (sequence S, digit D). We start from the pair (empty string, digit 0), and at each step we have three options: stop and check the current sequence S, append the digit to the sequence and continue to state (S + D, D), or move on to the next digit and continue with state (S, D + 1). Now we need to solve the harder part of the problem: checking whether a sequence S can be reordered to represent a number X between low and high.
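
The DFS described above can be sketched as follows; it generates every non-decreasing digit sequence of length up to maxLen exactly once (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceEnumerator {
    // Collects every non-empty non-decreasing digit sequence of length <= maxLen.
    public static List<String> enumerate(int maxLen) {
        List<String> out = new ArrayList<>();
        dfs(new StringBuilder(), 0, maxLen, out);
        return out;
    }

    // State (S, d): S is the sequence built so far, d the smallest digit
    // that may still be appended.
    private static void dfs(StringBuilder s, int d, int maxLen, List<String> out) {
        if (d > 9) return;
        if (s.length() < maxLen) {
            // Option 1: append digit d and stay on the same digit.
            s.append((char) ('0' + d));
            out.add(s.toString());   // every prefix is itself a candidate sequence
            dfs(s, d, maxLen, out);
            s.deleteCharAt(s.length() - 1);
        }
        // Option 2: move on to the next digit.
        dfs(s, d + 1, maxLen, out);
    }
}
```

For maxLen = 2 this yields 10 sequences of length one plus 55 of length two (e.g. "07" but never "70"), so 65 in total.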

First we take care of the most obvious case: when the length of the sequence is not equal to the lengths of low and high. Having done that, the problem reduces to one of the following three subproblems:

Subproblem A1. Given a sequence of digits S and a number X, determine whether it is possible to reorder the sequence such that the result is not less than X. This can be solved by a greedy approach: build a number T by picking the biggest digit in S as the first digit of T, the second-biggest as the second digit, and so on. Since T is the biggest number we can build from S, subproblem A1 is solvable if and only if T >= X.

Subproblem A2. Given a sequence of digits S and a number X, determine whether it is possible to reorder the sequence such that the result is not greater than X. This can be solved similarly to A1, but we build the smallest possible number T, making sure to avoid leading zeroes in T.
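
Both greedy checks fit in a few lines of Java. This is a sketch with illustrative names; S is passed as a string of digits of the same length as the decimal representation of x:

```java
import java.util.Arrays;

public class ReorderCheck {
    // A1: can S be reordered into a number >= x?  Build the largest T.
    public static boolean canReachAtLeast(String s, long x) {
        char[] d = s.toCharArray();
        Arrays.sort(d);   // ascending; reversing gives the largest number
        String t = new StringBuilder(new String(d)).reverse().toString();
        return Long.parseLong(t) >= x;
    }

    // A2: can S be reordered into a number <= x?  Build the smallest T,
    // moving the smallest nonzero digit to the front to avoid leading zeroes.
    public static boolean canReachAtMost(String s, long x) {
        char[] d = s.toCharArray();
        Arrays.sort(d);
        if (d[0] == '0') {
            int i = 0;
            while (i < d.length && d[i] == '0') i++;
            if (i == d.length) return x >= 0;   // all zeroes
            char tmp = d[0]; d[0] = d[i]; d[i] = tmp;
        }
        return Long.parseLong(new String(d)) <= x;
    }
}
```

For instance, "123" can be reordered to reach at least 300 (as 321) but not 322, and "102" can stay at most 115 (as 102) but not at most 101.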

Subproblem B. Given a sequence of digits S and numbers low and high of length equal to S, determine whether it is possible to reorder S to obtain a number T such that low <= T <= high. First we need to make sure that the first digits of low and high are distinct. If both low and high start with a digit d, then we discard d from S, low, and high, and continue solving the problem with the smaller parameters (for example, if low = 12345, high = 16177 and S = "22341", then we discard '1' and continue with low = 2345, high = 6177 and S = "2234"). Of course, if S does not contain d, then we already know that the answer to the subproblem is negative. So, when the first digits of low and high are distinct, we meet one of the three following possibilities (let x and y be the first digits of low and high, respectively):

  1. S contains a digit d, x < d < y. Then we put d as the first digit of a number T and append all the other digits of S in any order. Obviously low < T < high, so the subproblem is solved, and the answer is positive.
  2. All digits of S are either less than x or greater than y. The subproblem is solved as well, but the answer is now negative.
  3. S contains a digit d1 = x and/or a digit d2 = y, with all other digits being either less than x or greater than y. In this case, we have two choices: use digit d1, discard it from S and from low, and continue with subproblem A1; or use d2 and continue with subproblem A2. Fortunately, we can already solve subproblems A1 and A2 in linear time, and the answer to this subproblem is positive if the answer to either of the two cases is positive.

CatchTheMice
Used as: Division One - Level Two:
Value 500
Submission Rate 278 / 653 (42.57%)
Success Rate 123 / 278 (44.24%)
High Score monsoon for 453.85 points (9 mins 15 secs)
Average Score 295.14 (for 123 correct submissions)

This problem admitted a wide variety of different solutions, but all of them were a little tricky to implement. While few successful challenges were registered on this problem, there were a large number of challengeable solutions taken down by the system tester, with most failing due to either time-outs or precision issues.

Before delving into the solution methods, there are a few general points to make. We can define a function f(t), which is the size of the largest cage that cannot capture all the mice at time t. It's quite easy to see that this is the larger of the largest difference in x-coordinate and the largest difference in y-coordinate between any pair of mice. We can therefore write:

f(t) = max_{i,j}( max( x_j(t) - x_i(t), y_j(t) - y_i(t) ) )

Now, since all the coordinates are simple linear functions of time, any difference between coordinates is also a linear function of time, so we can write f(t) as:

f(t) = max_k( A_k * t + B_k )

where A_k and B_k are given by the pairwise differences in the given velocities and positions for each pair of mice.

There are a few things to notice about this function. Firstly, it must be piecewise linear, such that any minimum can only happen at an articulation point of the function. Secondly, it is the maximum across a set of convex functions (linear functions are non-strictly convex), and therefore must be convex itself. Since we are looking for the largest cage that cannot capture the mice at any point in time, we are looking for the minimum of f(t) in time.

Converging Solutions

Ternary Search

The second point above immediately leads us to possibly the most obvious solution method. We are looking for the minimum of a function which is known to be convex, so ternary search in time applies. f(t) is very easy to calculate at a point in time, so the code ends up being fairly compact (see pashka's code for an example). However, there is a pitfall with ternary search: precision. This actually stems from the method chosen by most competitors to calculate the value of f(t), which was to calculate min_i(coordinate) and max_i(coordinate) and take the difference between the two. Why is this problematic? Because the absolute values of the coordinates can be relatively large, and when you take the difference between two large numbers, it is the absolute error that propagates, not the relative error. This means that a small relative error in time, and therefore a small error in the calculated position, can become a much larger relative error in the calculated answer. Many competitors who placed their convergence criteria on time therefore saw their solutions fail due to the larger error in the answer. Possibly the unluckiest failure was that of neal_wu, whose solution performed a single iteration too few to converge enough for the system tests. Note that if the value of f(t) is instead calculated using the formula above, the differences are taken while all the values are integers and can therefore be represented exactly. The precision issue then disappears, and a relative error in time will be of the same order as the relative error in the answer.
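
A sketch of this approach, evaluating f(t) directly from the pairwise (A_k, B_k) lines so that all differences are taken on exact integers. The names xp/xv/yp/yv (positions and velocities) and the assumption that the cage may be placed at any time t >= 0 are illustrative, not the problem's exact interface:

```java
public class TernarySearchCage {
    // f(t) = max over all ordered pairs and both axes of A*t + B,
    // where A and B are exact integer velocity/position deltas.
    static double f(double t, int[] xp, int[] xv, int[] yp, int[] yv) {
        int n = xp.length;
        double best = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                best = Math.max(best, (xv[i] - xv[j]) * t + (xp[i] - xp[j]));
                best = Math.max(best, (yv[i] - yv[j]) * t + (yp[i] - yp[j]));
            }
        return best;
    }

    public static double solve(int[] xp, int[] xv, int[] yp, int[] yv) {
        double lo = 0, hi = 1e9;
        // Fixed iteration count rather than a convergence tolerance on time.
        for (int it = 0; it < 200; it++) {
            double m1 = lo + (hi - lo) / 3, m2 = hi - (hi - lo) / 3;
            if (f(m1, xp, xv, yp, yv) < f(m2, xp, xv, yp, yv)) hi = m2;
            else lo = m1;
        }
        return f(lo, xp, xv, yp, yv);
    }
}
```

Two mice approaching each other head-on give an answer of 0 at the moment they meet; two stationary mice give simply the larger coordinate difference.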

Binary Search

A different approach ignores f(t) entirely and asks, "given a cage size, is it possible to capture all the mice at any point in time?" This question is clearly monotonic in the cage size, since any cage smaller than the desired answer will never be able to capture all the mice and any cage larger than the answer will definitely be able to. Binary search therefore applies; we just need to work out some way of answering the question.

The trick here is to notice that for a given cage size L, at the exact point in time when capturing all the mice transitions from being possible to being impossible (or vice-versa), two mice must be exactly a distance of L apart in either x- or y-coordinate (if such a time exists). This is quite easy to see: beforehand, the maximum delta must be less than L and afterwards it must be greater than L, so at that exact moment it must be equal to L. We can therefore check all pairs of mice, look at the point in time when they are exactly L apart and check whether all the mice fit in a cage of size L at that time. For an example of this approach, see yuhch123's code. His solution is slightly different in that he calculates the range of time during which each pair of mice is within a distance of L and checks that the intersection of all these ranges is not empty. This is somewhat quicker, in that the complexity of each binary search iteration is O(N^2) rather than O(N^3).
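
The interval-intersection variant can be sketched as follows. This is an illustrative reconstruction, not yuhch123's actual code; xp/xv/yp/yv name the positions and velocities, and the cage is assumed usable at any time t >= 0:

```java
public class BinarySearchCage {
    // Intersects range[0..1] with the times at which every pairwise delta
    // along one axis satisfies |a*t + b| <= L. Returns false only when a
    // constant delta already exceeds L.
    static boolean fits(double L, int[] p, int[] v, double[] range) {
        int n = p.length;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double a = v[i] - v[j], b = p[i] - p[j];
                if (a == 0) {
                    if (Math.abs(b) > L) return false;
                } else {
                    // Solve -L <= a*t + b <= L for t.
                    double t1 = (-L - b) / a, t2 = (L - b) / a;
                    range[0] = Math.max(range[0], Math.min(t1, t2));
                    range[1] = Math.min(range[1], Math.max(t1, t2));
                }
            }
        return true;
    }

    public static double solve(int[] xp, int[] xv, int[] yp, int[] yv) {
        double lo = 0, hi = 2e9;
        for (int it = 0; it < 100; it++) {
            double mid = (lo + hi) / 2;
            double[] range = {0, 1e18};
            boolean ok = fits(mid, xp, xv, range) && fits(mid, yp, yv, range)
                         && range[0] <= range[1];
            if (ok) hi = mid; else lo = mid;
        }
        return hi;
    }
}
```

Each feasibility check is O(N^2), and the binary search converges to the smallest feasible cage size.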

Non-converging Solutions

Naïve O(N^5)

The most naïve exact solution would be to check all possible articulation points of f(t) and take the minimum over all of them. Since there are O(N^2) values of A_k and B_k, and the articulation points can only happen at an intersection of two lines, we can try all O(N^4) intersections, and if we work out the value of f(t) in O(N), we get an overall complexity of O(N^5). This will clearly require some optimization to run in time, but is apparently possible. See ainu7's code for an example.

Better O(N^3) solution

A better solution instead separates f(t) into g(t) = max_{i,j}( x_j(t) - x_i(t) ) and h(t) = max_{i,j}( y_j(t) - y_i(t) ) and calculates each one separately. There can only be O(N^2) articulation points in each of g(t) and h(t), so we can examine them all in O(N^3). The overall minimum of f(t) must then happen at either some articulation point of g(t) or h(t), or some point at which g(t) = h(t). The points where the functions are equal can be found by combining the times of all the articulation points together, sorting them, and looking for the functions intersecting within each range between two adjacent times. Since the functions are both linear within these ranges, this is just the intersection of two straight lines. See Gassa's solution for an example.

O(N^2 * log(N))

I introduce this approach since it is the way the reference solution works and is a tidy way of explicitly constructing f(t). It is not the most efficient way, but it can also lead to an O(N * log(N)) solution, which I think would be hard to beat asymptotically with an exact solution. If you can reduce it to O(N), then present your solution on the message boards!

Since f(t) is convex, the piecewise linear segments must appear in increasing order of gradient. If we calculate all the values of A_k and B_k and then sort them by A_k, we therefore know the order in which they will appear. We can therefore build up f(t) one segment at a time, maintaining the current solution on a stack. For each new line, we look at the last segment added to the stack. If this segment lies completely under the new line, then we pop it from the stack, discard it, and repeat with the segment now at the top of the stack. We then add the new line as a segment starting at the point where it intersects the last segment remaining on the stack. Since each line is added to and removed from the stack at most once, this process is linear in the number of lines. In C++, this process looks like:


int N = xp.size();
// (A_k, B_k) pairs: velocity delta and position delta for each ordered pair
vector<pair<long long, long long> > lines;

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        if (i != j) {
            lines.push_back(make_pair(xv[i] - xv[j], xp[i] - xp[j]));
            lines.push_back(make_pair(yv[i] - yv[j], yp[i] - yp[j]));
        }

sort(lines.begin(), lines.end());

// line contains the current f(t): each entry stores the time at which its
// segment starts, as an exact (numerator, denominator) pair, plus the index
// of the line it comes from
stack<pair<pair<long long, long long>, int> > line;

for (int i = 0; i < (int)lines.size(); i++) {
    // for equal gradients, only the line with the largest intercept matters
    if (i + 1 < (int)lines.size() && lines[i].first == lines[i + 1].first) continue;

    // Pop from the stack until the next line intersects the last segment
    while (!line.empty()) {
        int j = line.top().second;
        if (line.top().first.first * (lines[i].first - lines[j].first)
                < line.top().first.second * (lines[j].second - lines[i].second))
            break;
        line.pop();
    }

    if (line.empty())
        line.push(make_pair(make_pair(0, 1), i));
    else {
        int j = line.top().second;
        line.push(make_pair(make_pair(
            lines[j].second - lines[i].second, lines[i].first - lines[j].first), i));
    }
}

// Work back through the stack, checking articulation points to find the answer
double ret = 1000000000.0;
while (!line.empty()) {
    int j = line.top().second;
    ret = min(ret, lines[j].second
        + 1.0 * lines[j].first * line.top().first.first / line.top().first.second);
    line.pop();
}

return ret;

O(N * log(N)) and beyond

Of course, the above trick could also be used to calculate min_i(coordinate) and max_i(coordinate) in O(N * log(N)). For x and y, these piecewise linear functions could then be combined in O(N * log(N)) or O(N) to calculate g(t) and h(t) using a method similar to that described in the O(N^3) solution. g(t) and h(t) can then be combined in almost exactly the same way to get the final answer in O(N * log(N)) overall.

If anyone is feeling masochistic, then try implementing this solution in the practice room (it is likely to be quite ugly). Alternatively, if you can think of an asymptotically faster solution then let us know on the message boards! And no, you cannot claim O(N) with a constant number of ternary search iterations, since the number of iterations is not really constant: it has to be proportional to the logarithm of the input dimensions to give the constant precision required in the answer.

LongStraightRoad
Used as: Division One - Level Three:
Value 1000
Submission Rate 41 / 653 (6.28%)
Success Rate 3 / 41 (7.32%)
High Score yuhch123 for 507.57 points (36 mins 32 secs)
Average Score 424.49 (for 3 correct submissions)

There were many conditions that competitors had to check in the solution of this problem, but only one provided any real difficulty: ensuring that the given ordering of the signs is feasible. It's difficult to know for sure, but I would guess that many competitors implemented a naïve search, only to find that it fails that annoying last example case, to which is attached the innocent-looking statement, "There is no way the signs could be in this order." Determining whether the given ordering of signs was possible required some tricky detective work.

So what is the naïve solution? Simply generate a graph with a node for each sign and each place, with an edge between any sign and any place that it references, giving the difference in position between the two. Then check whether all paths between any two nodes have the same length. This shows whether the sign and place positions can be represented consistently. The check can be carried out in any number of ways: DFS, BFS, Floyd-Warshall, etc. It also tells us whether we have enough information to know how far it is to the final destination: this is equivalent to checking that the destination and the last sign in the list are in the same component.

The problem here is that while we can then check that the ordering within each component is valid, we can't tell whether we can fit the components together in a way consistent with the ordering. The trick is to rewrite all the constraints we have as inequalities. At the start, it helps the analysis to relax the constraints on the ordering to make it a non-strict ordering (i.e., two signs are allowed to be colocated). We will reintroduce the strict ordering constraint at the end. We can then write all the constraints as a single set of inequalities:

P[i][j] <= x[j] - x[i] <= Q[i][j]

If i is a sign and j is a place that it references, then P[i][j] = Q[i][j] and we end up with an equality constraint as required. If i and j are both signs, with i < j, then P[i][j] = 0 (remembering that we have relaxed the strict inequality for the moment) and Q[i][j] is infinite (just set it to some very large value). Notice that these inequalities represent all the constraints we have put on the positions of the signs, so if we can find a solution that satisfies all of them (and the strict inequalities on top), then we have our answer: the signs are consistent. The next trick is to examine the inequalities in P and Q separately. We want to find some set of differences that satisfies all the inequalities simultaneously. Noticing that:

P[i][k] <= x[k] - x[i]

and

P[k][j] <= x[j] - x[k]

together imply that

P[i][k] + P[k][j] <= x[j] - x[i]

it is clear that the overall value of x[j] - x[i] has a lower bound given by the length of any path between i and j. This lower bound must therefore be at least as large as the longest path between the two vertices. Notice that there can be no cycles of non-zero length in our graph (or it would be inconsistent anyway), so this longest path is well-defined. Now what happens when we try to set the distances to be the longest path between each pair of vertices? We actually get a solution that meets all the constraints in P. More importantly, we get a solution in which each distance is as small as possible. We can therefore check whether this solution satisfies the constraints given in Q. Since all the inequalities in Q go in the other direction, if one of them is violated by our P solution, then we would need a solution with a smaller distance between two vertices. However, we have just shown that our solution contains the minimal distances: there is no valid solution with any smaller distances, so no solution can exist that satisfies all the constraints simultaneously.

Our proof of the existence of a solution is therefore constructive: we have a solution such that if it is invalid, no valid solution can exist. Similarly, we could have done exactly the same thing with Q using the shortest path between any two vertices. In fact, the check of validity comes down to checking whether the final matrices of shortest and longest distances describe valid ranges for each pair of vertices.

But what of the strict inequality constraint? We have to ensure that we can construct a solution in which x[j] - x[i] > 0 for every pair of signs with j > i. This is actually trivial now that we have constructed the solution that gives the maximum value of x[j] - x[i] satisfying all the other constraints. If this maximum value is greater than 0, then we also have a solution that satisfies the strict inequality. Otherwise, we would again have to violate our maximum value constraint to get a solution.

Since the shortest and longest distances can be calculated in four or five lines using Floyd-Warshall, the final solution can be very short and simple for such an analytically tricky problem. See pashka's solution for a very tidy implementation, although note that he uses doubles with a small tolerance to get around the strict inequality issue.
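
The core consistency check can be sketched as a small difference-constraint solver. This is an illustrative reconstruction, not pashka's actual code; the mapping from signs and places to node indices, and the handling of the strict ordering, are omitted. The caller supplies bounds lo[i][j] <= x[j] - x[i] <= hi[i][j], with NEG/INF marking absent bounds and zeroes on the diagonal:

```java
import java.util.Arrays;

public class DifferenceConstraints {
    public static final long INF = Long.MAX_VALUE / 4, NEG = -INF;

    // Convenience: an n x n matrix filled with v, zero on the diagonal.
    public static long[][] fill(int n, long v) {
        long[][] m = new long[n][n];
        for (long[] row : m) Arrays.fill(row, v);
        for (int i = 0; i < n; i++) m[i][i] = 0;
        return m;
    }

    // Returns true iff some assignment of positions satisfies every constraint.
    public static boolean feasible(long[][] lo, long[][] hi) {
        int n = lo.length;
        // Floyd-Warshall: tighten lower bounds by longest paths and
        // upper bounds by shortest paths.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) {
                    if (lo[i][k] > NEG && lo[k][j] > NEG)
                        lo[i][j] = Math.max(lo[i][j], lo[i][k] + lo[k][j]);
                    if (hi[i][k] < INF && hi[k][j] < INF)
                        hi[i][j] = Math.min(hi[i][j], hi[i][k] + hi[k][j]);
                }
        // The bounds crossing anywhere means no solution exists.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (lo[i][j] > hi[i][j]) return false;
        return true;
    }
}
```

For example, the equalities x1 - x0 = 5 and x2 - x1 = 3 force x2 - x0 = 8, so adding the range 0 <= x2 - x0 <= 7 makes the system infeasible, while a range up to 10 keeps it feasible.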

Authors: Olexiy and StevieT, TopCoder Members