JOIN
 Match Editorial
2007 TopCoder Open
Round 4

Saturday, April 28, 2007

## Match summary

One hundred and forty-eight of the top algorithm coders gathered in a rumble for one of 48 spots in the 2007 TCO finals. As expected, the competition was vicious, denying three of the top 10 rated coders a place in the finals.

Over recent years, it has often been the case that a single problem was enough to advance through the later online rounds of a TopCoder tournament. This was not the case this time around -- 16 contestants solved all three problems and plenty of coders found themselves below the cutoff line with two problems solved.

liympanda recovered from his recent rating decline in the best possible way, winning the match with the fastest hard despite resubmitting the easy. The main subject of post-match chat, however, was underdog nemesis_nitt, who now boasts a new yellow color thanks to his performance in this tournament, including his second-place finish in this match. Close behind them were usual suspects SnapDragon and tomek.

# The Problems

PolishNotation
Used as: Division One - Level One:
 Value 250 Submission Rate 144 / 148 (97.30%) Success Rate 132 / 144 (91.67%) High Score JongMan for 242.29 points (5 mins 5 secs) Average Score 202.08 (for 132 correct submissions)
It didn't take long for most coders to warm up their dynamic programming circuits, as evidenced by the high success rate on this problem.

The rules in the problem statement describing a valid expression in prefix notation form the backbone of the DP solution:
• If a string is all digits (without a leading unary minus), then there is only one way of interpreting that string in prefix notation, as the integer contained in the string.
• If a string starts with an operator, then it must be composed of two valid subexpressions in prefix notation, as described by the second rule. Moreover, these are independent so we can separately calculate the number of ways of interpreting each of them (this is the dynamic programming part, splitting a problem into smaller subproblems of the same form), and multiply them together. We do this for each possible split point and sum the products.
• Additionally, if the string starts with a minus and all remaining characters are digits, then there is an additional way to interpret it, as a negative integer.
For slick implementations of the above idea, take a peek at JongMan's highest-scoring solution or kalinov's solution.

SortingInIterations
Used as: Division One - Level Two:
 Value 550 Submission Rate 96 / 148 (64.86%) Success Rate 60 / 96 (62.50%) High Score tomekkulczynski for 462.41 points (12 mins 52 secs) Average Score 320.29 (for 60 correct submissions)
With constraints as high as 400000 on all parameters, quadratic solutions were bound to time out, for example on the decreasing-subsequence case, which can easily be generated with a0=399999, X=1, Y=399999, M=400000 and n=400000.

Suppose the distinct numbers appearing in the sequence are sorted and labeled x1, x2, x3 etc. so that x1 < x2 < x3 ... and so on. Here is another way to rephrase the process in the problem:
1. In iteration 1, remove all occurrences of x1 from the blackboard.
2. If x2 appears after the last occurrence of the element removed (initially x1), all such occurrences of x2 can be removed in the same iteration. If all occurrences of x2 have been removed, repeat this step for x3 etc.
3. If not all occurrences of x2 (or one of the later elements, if we got further in step 2) were removed, then we have to start a new iteration to remove them – back to step 1.
Even though a single iteration of step 2 can take long (when it handles multiple elements), if we can implement each operation in it in sublinear time, then the overall running time of the solution will be subquadratic.

This is because each distinct value is considered at most twice – once in step 2 and possibly again in step 1 (if not all occurrences were removed in step 2 and a new iteration was started). If we handle multiple elements in step 2, we will go back to step 1 fewer times.

The final trick is the If x2 appears after condition in step 2. To evaluate the condition, we need to keep track of where numbers are on the blackboard and be able to update this information efficiently. Various structures were used by contestants, for example mapping numbers to lists containing indices where they appear (and keeping these sorted), or having a big sorted list of (number, position) pairs. All of these suffice as long as they guarantee that the amortized operation complexity doesn't become linear on degenerate cases. As described by Ying in this forum post, this operation can even be done in amortized constant time, for a total time running time of O(n+M).

FourSubstrings
Used as: Division One - Level Three:
 Value 900 Submission Rate 62 / 148 (41.89%) Success Rate 39 / 62 (62.90%) High Score liympanda for 775.28 points (11 mins 47 secs) Average Score 549.16 (for 39 correct submissions)
The problem begged for a dynamic programming solution, but it wasn't necessarily obvious what the solution was.

Suppose we scan the text from left to right and, at each character, we contemplate placing substrings there (if any of them match the text). In order to be able to calculate how many of the newly covered letters have already been covered by words placed recently (and shouldn't be counted again), we also need to keep track of how many letters starting from the current character have already been covered. Additionally, we need to know which of the four substrings we've already placed – a 4-bit bitmask will do.

The state space in our DP solution will thus consist of (index, covered, mask) triplets, indicating that we're currently at character index, that the next covered letters have already been covered by previously placed substrings and should not be counted towards the score again, and that strings still to be found in the text are represented by mask. A sample state using the example in the problem statement is (1, 3, 0001), meaning that we're at character 1, have just placed the substring "our" (last bit in bitmask is set), and since we only just did it, the next 3 letters are already covered:
Now, when calculating some function f(index, covered, mask), one option is not to place a substring at the current position and move on to the next character. This leads us to state (index+1, max(0, covered-1), mask). If we place substring i, we go to state (index, max(covered, length of substring i), mask with set bit i). When placing a substring, we don't move to the next character because the problem statement allows us multiple substrings to begin at a single location.

Since we have to calculate two different values (the least and most covered characters), we calculate two DP functions in parallel (say, mini and maxi). Sample code for the calculation function (without the base cases):
```   mini[index][covered][mask] = mini[index+1][max(0, covered-1)][mask];

for ( int i=0; i<4; ++i ) {
if ( (mask&(1<<i)) != 0 || !starts[index][i] ) continue;

int nextc = max( covered, len[i] );
int nextm = mask | (1<<i);
calc( index, nextc, nextm );