JOIN
Get Time
statistics_w  Match Editorial
SRM 301
Tuesday, May 9, 2006

Match summary

This was the first competition after the TopCoder Open finals. Some of the TCO finalists tried for a rematch, while the coders who hadn't made it to Vegas finally got the chance to compete after almost two weeks.

In division 1, the problem set seemed a little strange one: it began with a 250 that was not that easy, followed by a simple straightforward 450 in the medium spot, and then a really difficult and tricky 1,000 pointer. The final positions were not clear until the end of the system tests, with more than 25 submissions of the hard problem failing at that point. The challenge phase also proved crucial, with lots of points gained that substantially affected final scores.

misof won the competition with good performances in medium and hard problems, a not-so-bad one in the easy one, and the help of two successful challenges. A little more than 50 points behind him, the TCO champion Petr showed that he always has the resources for a comeback. After resubmitting the first problem and finishing the coding phase out of the top 10 — something he is not used to — Peter got 5 successful challenges, and just one unsuccessful one, to secure second place. Finishing a little more than 200 points behind them was a pack of 5 other experienced red coders. AdrianKuegel, fuwenjie, tmyklebu, andrewzta and John Dethridge completed the exclusive top 7 that correctly solved the hard problem.

In division 2 things were similar, with only 2 coders correctly completing the hard problem to get the first two spots, without getting the medium, and the rest trying to correctly get the tricky medium after quickly solving a pretty simple easy one.
it4.kp finished with the victory thanks to his fast solution of the easy one and the highest-scoring surviving hard solution. The rookie cae, with the other correct submission on the hard problem, easily took second place over another first-timer, DeatH FaNG, who managed to get the bronze with the fastest submission on the medium and a pretty quick easy one.

The Problems

InsertionSortCount rate it discuss it
Used as: Division Two - Level One:
Value 250
Submission Rate 313 / 348 (89.94%)
Success Rate 256 / 313 (81.79%)
High Score Patriot for 249.23 points (1 mins 35 secs)
Average Score 203.39 (for 256 correct submissions)

The simplest to think approach to solve this problem was to simulate the algorithm (translating the pseudo-code given in the problem statement to your favorite language), simply adding a counter for keeping the number of moves updated and return it at the end.

But, as in most cases, there is an easier, less error-prone and shorter and therefore faster to code implementation. If you can come up with it fast enough, the difference in coding will even be worth the time lost when thinking it.

The first thing we have to ask ourselves when looking at this problem is: When is a number going to be moved? If we solve that, the implementation will be straightforward. There are two ways to look at it that leads to the same final code:

1. When inserting ith number, which numbers will move?
2. When is jth number going to be moved?

The answer to the first question is exactly the number of elements greater than the ith element that appear before it, because they are going to be in R by the time the ith element is added to it

The answer to the second question is once for each smaller number that is inserted after it. The elements that come before can't move it, because it is not even in R yet, and the greater elements that come after in the input are not going to move it, because they are inserted in R after it.

Both answers give us the final idea: The number of moves is the number of pairs (i,j) i<j such as A[i]>A[j]. Implementation for this is really simple:

int r=0;
for(int j=0;j<n;j++)for(int i=0;i<j;i++) if (A[i]>A[j]) r++;
return r;

For more about Insertion Sort and other sorting algorithms you can read the sorting tutorial.

IndicatorMotionReverse rate it discuss it
Used as: Division Two - Level Two:
Value 500
Submission Rate 152 / 348 (43.68%)
Success Rate 25 / 152 (16.45%)
High Score rajkon for 298.58 points (27 mins 37 secs)
Average Score 221.86 (for 25 correct submissions)
Used as: Division One - Level One:
Value 250
Submission Rate 311 / 347 (89.63%)
Success Rate 192 / 311 (61.74%)
High Score Kawigi for 222.82 points (10 mins 10 secs)
Average Score 151.63 (for 192 correct submissions)

For participants who took part on SRM 298 or tried the division 2 easy problem for that challenge in the practice room, this one may have sound familiar. Actually, what we have here is the calculation of the inverse function of the one asked in that problem. But this case was a little trickier.

The first thing to note here is that given the actual state and the following one, there is exactly one instruction that can make that conversion (each one of the four instructions gives a different next state in all cases), so the sequence of plain instructions (without the repetition counter) is unique and can be easily calculated by iterating the states and having a nextState function that given two characters returns the instruction needed. This function can be implemented with a bunch of if's, a precalculated table or making use of the order that states have and their relation with the instructions. For details about this, check SRM 298 editorial for the division 2 easy problem.

With that sequence calculated, all we need is the shortest and lexicographically first program taking reptitions. Of course we will have all consecutive appearances of the same instruction together to have a shorter program. If there are n>99, we have to split between several instructions as in example 4, so we proceed as follows: We need at least n/99 (rounded up) instructions, so we will have exactly that amount to keep the program as short as possible. Since the first one is the one that counts for the lexicographic comparisson, we'll make that instruction the one with less number of repetitions (because smaller numbers come first lexicographically than bigger ones). Seeing all this, is clear that all but the first of the generated instructions will have a 99 as counter, and the first one will have exactly the rest (which may also be 99 if the number of repetitions is an exact multiple of 99).

When all this is done, the only thing needed at the was to check if the return program has more than 100 characters and adjust the length according to the problem statement.

See Kawigi's solution for a clean implementation.

CorrectingParenthesization rate it discuss it
Used as: Division Two - Level Three:
Value 1000
Submission Rate 72 / 348 (20.69%)
Success Rate 2 / 72 (2.78%)
High Score it4.kp for 690.55 points (21 mins 7 secs)
Average Score 677.07 (for 2 correct submissions)

What we have here is a classical dynammic programming problem. When seeing just one string as input and some counting as output, usually DP is the way to go. In division 2, maybe some coders are not that used to it, but this problem is classical and may be one excellent chance to learn how to apply it, because the idea and implementation is quite clear. In the rules for making a well formed string given in the problem statement a string is "built" based on other smaller strings. This already has a smell to a recursive function. Also, since every rule use substrings (consecutive sets of characters included in the string to be formed) and a string has O(n2) substrings (for an introduction to the big O notation, check the algorithm complexity tutorial), this function has a very restricted domain (50x50=2500 at most), so memoization in a table will be quite possible.

First let's try to see whether s is a well formed string and that will lead us to the solution. Let's call f(i,j) to the functiont that says wether the substring that starts in ith character of s and ends in the jth character is a well formed string for each i,j.
if i>j, f(i,j) = true, because we are representing the empty string. Otherwise:
f(i,j) = ( f(i+1,j-1) AND enc(s[i],s[j]) ) OR there is a k, i<k<j such as f(i,k) AND f(k+1,j)
where enc(c,d) will be true only when c is an opening character and d a matching closing one. This is easy to implement as a recursive function, but what we need is the number of characters to correct. All we need to see is that, for each case in the above specification, we can check how many characters are needed to change to accomplish it. In the first case, it depends on s[i] and s[j], can be 0 if they match, 1 if they don't but at least one of them "points" to the inside, so changing the other will make them match, or 2 if both are pointing outside. In the other cases, is simply f(i,k) + f(k+1,j). So we take the minimum between this options and we are done. Also note that you always need (j-i+1) even, because an odd string is never a correct parenthesization. Implementation in C++ follows:

int mem[51][51];
string s;
int enc(char c, char d) {
   string p="([{)]}";
   if (p.find(c)/3>0 && p.find(d)/3<1) return 2;
   if (p.find(c)%3==p.find(d)%3 && c!=d) return 0;
   return 1;
}
int calc(int i, int j) {
   if (i>j) return 0;
   if (mem[i][j]>=0) return mem[i][j];
   int r=calc(i+1,j-1)+enc(s[i],s[j]); //enc returns 0, 1 or 2 as stated above
   for(int k=i+1;k<j;k+=2) r = min(r,calc(i,k)+calc(k+1,j));
   mem[i][j]=r;
   return mem[i][j];
}
int getMinChanges(string s) {
   this->s=s;
   for(int i=0;i<s.size();++i)for(int j=0;j<s.size();++j) mem[i][j]=-1;
   return calc(0,s.size()-1);
}
EscapingJail rate it discuss it
Used as: Division One - Level Two:
Value 450
Submission Rate 297 / 347 (85.59%)
Success Rate 226 / 297 (76.09%)
High Score reid for 444.87 points (3 mins 3 secs)
Average Score 368.54 (for 226 correct submissions)

Here we have prisoners and chains between them, and, if familiarized with graphs, this is the first thing that comes in mind when looking a scenario like this one. We can model the problem by having a graph with a node for each prisoner and an edge for each chain labeled with the length of the chain. There will be no edge between nodes representing unchained prisioners.

Now, the next step is to know what limits the maximum distance two prisoners can achieve. It should be clear from the first example that is not necessarily the length of the chain between them, because the sum of lengths of a "chain path" from one to the other could be shorter. This thought says it all: What limits their distance is a path of chains from one to the other. Of course it will be the shorter of all paths between a pair the one that have the upper bound we are looking for.

To find such shortest path in a graph there are several algorithms, but the one that fits better here is Floyd-Warshall all pairs shortest path algorith. Firstly, because we need the shortest path between all pairs to find the best one, and secondly, because it is the easiest and shortest to implement. With all the shortest paths (ie. upeer bounds) calculated, we simply iterate over all pairs to find the longest and return it. If there is a pair of prisoners that does not have any path connecting them (represented by a path of "infinite" length in the simpler implementation of Floyd-Warshall), it means they can achieve any distance, so we return -1.

For a simple implementation of this see reid's solution.

More formally, what this problem is asking to find the diameter of a graph, which is exactly the longest of all shortest paths between all pairs of vertices. For articles about this and other subjects in graph theory, check http://mathworld.wolfram.com/GraphDiameter.html.

ContextFreeGrammars rate it discuss it
Used as: Division One - Level Three:
Value 1000
Submission Rate 36 / 347 (10.37%)
Success Rate 7 / 36 (19.44%)
High Score misof for 662.35 points (22 mins 54 secs)
Average Score 509.11 (for 7 correct submissions)

This problem was a really difficult one, only 7 coders manage to correctly solving it, all experienced red coders with over 2500 rating points.

There were two approaches that i know of to solve it, one that is pure dynammic programming with a couple of "happy ideas" and another one (that misof told me after the end of the match) making use of language theory. I will explain the first approach because theory background is not needed to understand it and add some comments on the second approach at the end.

This approach consists, as most dynammic programming, in defining a recursive function possible to memoize in a table, that solves the problem. If you are familiarized with DP, and you should in order to take this problem, you know that once you get the recursive function in order, all the rest is just carefull dumb programming.

As string parsing usually does, what we need is a function over all the substrings of word. Since every substring can be defined with a start and end index (i will call them i and j), we have less than 35x35=1225 substrings to cover. We have plenty of space left for more parameters. When seeing the trees on the examples, the option comes to mind. If we add a char seed as an aditional parameter we have our function.

f(i,j,c), then, will say the number of parsing trees that the substring between i and j has using c as a seed. Of course, f(0,n-1,seed) it's the final answer to the problem, where n is the size of word. Also, f has a domain of size 35x35x26=31850, which is perfectly fine for the memory limits.

So, how to recursively define f(i,j,c)? Let's call s to the substring of word that goes from index i to index j. First, we have to try using all rules that have c in its left side. For each of this rules, r, we need to "break" s into little substrings to assign to each letter in the right side expression of r. We are actually matching s into r, with uppercase letters acting as variables and lowercase letters as constants. Of course, if r is made only with a couple of nonterminals, for example, the number of possible partitions (ie. ways to match it) is clearly exponential, and we will run out of time. But this part sounds a lot like another well known dynammic programming problem, partition a string into substrings and do something with them. Since what we have to do with each part is independent, we can just mantain an index on s (or the right side of r) and do it recursively.

This leads to the definition of a new function g(i,j,r,p) where r is a rule of rules (after correctly parsing it into all rules without pipes) and p is a position on it. At first glance it will seem the domain for this function is 35x35x500x35 aproximately, because there can be close to 500 rules and the limit on the length of each rule is 35 (rules that have an expression with more than 35 letters are clearly useless). But, the number of positions over all rules can never be greater than 2500, because that's the maximum number of total characters of rules, so the domain of this function is actually bounded by 35x35x2500=3062500, which is small enough to fit in topcoder memory limits. So to memorize g we will use a 3 dimensional array with the first two dimensions having size 35 for i and j, and the other one having size 2500 to memoize r and p together.

For details on how to map r and p to a number, see the code below. You can also do this part by having two dimensions and variable size arrays.

Now, we define f and g in terms of smaller versions of themselves:

f(i,j,c) = sum over all rules r that have c as left side of g(i,j,r,0)
g(i,j,r,p) {
   if (p>=size of r) {
      if (i>j) return 1 else return 0
   }
   if (i>j) return 0
   let RS be the string on the right side of the rule r
   if (RS[p] is lowercase) {
      if (RS[p] == word[i]) return g(i+1,j,r,p+1) //advance to next letter
      else return 0 //failed match
   } else {
      //we decide which substring starting in i
      //we assign to the current nonterminal RS[p]
      res = 0
      for k from i to j, inclusive:
         res = res + f(i,k,RS[p]) * g(k+1,j,r,p+1)
      return res
   }
}

If we memorize the results of this two functions, we'll be on our way. The total number of calls to the first function is it's domain size 31850 multiplied by the length of one run, that is bounded for the maximum number of rules 500, in total, in the order of 31850x500=15925000 total operations are made to calculate all calls of f.

The cost of calculating all calls of g are bounded in a similar way by 3062500x35=107187500, taking into account the generosity on the rule bounds, is not bad.

If we get rid of f and try to use only g, the running time of each call will grow because we wont be memoizing the repetition of sums over each letter, and we'll go over the time limit.

Now that we have functions that do the work and run in time, all we have to do is do the boring part of writing the C++/Java/VB/C# code and parse that annoying input. I won't get into details on how to do this because there are several ways in each of the languages, all with advantages and disadvantages.

To have a little summary of the "happy ideas" used, there were basically three:
1. The most basic one, use DP and indexes to represent substrings and add some extra parameters needed to count parsing trees.
2. Use two mutually dependent functions, neither one of them is ok by itself.
3. Memorize g correctly, to get it to fit in memory limits. Multiplying the range on each parameter is sometimes an upper-bound too high.

I will copy-paste my original solution in Java now, even though it's a little long, because it's the easier to follow with this editorial (because both were written by the same person). You are more than welcome to check the code from the 7 coders that correctly solved the problem and see their implementations. Some are clear enough to look at, and a couple are just-to-be-written-not-read code. I think many of them use the approach presented in this editorial, even though they do not have the two functions presented here defined right there, but the background idea is there.

char w[]; //word
int offset[]; //to map r,p into a number
int endR[], iniR[]; //init and end of the rules with each letter in the left side
char expr[][]; //expressions of each rule
static int maxRules = 600;
static int maxChars = 26;
static int maxCharsRules = 46;
static int maxPosRules = maxCharsRules*50;
static long LIMIT = 1000000000;
   
boolean term(char c) {
   return c>='a' && c<='z';
}

long memoRP[][][];
long cRP(int i, int j, int r, int p) { //the g function in the above description
   char[] e=expr[r];
   if (i==j && p==e.length) return 1;
   if (i>=j || p>=e.length) return 0;
   int h=offset[r]+p;
   if (memoRP[i][j-1][h]>=0) return memoRP[i][j-1][h];
   long ret=0;
   char c=e[p];
   if (term(c)) {
      if (c==w[i]) ret=cRP(i+1,j,r,p+1);
   } else {
      int hc=c-'A';
      int lim = j-e.length+p+1;
      for(int k=i+1;k<=lim;++k) {
         ret += cS(i,k,hc)*cRP(k,j,r,p+1);
         if (ret>LIMIT) {
            ret=LIMIT+1;
            break;
         }
      }
   }
   return memoRP[i][j-1][h]=ret;
}

long memoS[][][];
long cS(int i, int j, int h) { //the f function in the above description
   if (memoS[i][j-1][h]>=0) return memoS[i][j-1][h];
   long ret=0;
   for(int r=iniR[h];r<endR[h];++r) {
      ret += cRP(i,j,r,0);
      if (ret>LIMIT) {
         ret=LIMIT+1;
         break;
      }
   }
   return memoS[i][j-1][h]=ret;
}

public int countParsingTrees(String[] rules, char seed, String word) {
   Arrays.sort(rules);
   offset = new int[maxRules];
   w=word.toCharArray();
   memoRP=new long[w.length][w.length][maxPosRules];
   for(int i=0;i<w.length;++i)
   for(int j=0;j<w.length;++j)
   for(int k=0;k<maxPosRules;++k) memoRP[i][j][k]=-1;
   memoS=new long[w.length][w.length][maxChars];
   for(int i=0;i<w.length;++i)
   for(int j=0;j<w.length;++j)
   for(int k=0;k<maxChars;++k) memoS[i][j][k]=-1;
   iniR=new int[maxChars];
   for(int i=0;i<maxChars;++i) iniR[i]=100000;
   endR=new int[maxChars];
   expr=new char[maxRules][maxCharsRules];
   int nr=0;
   for(String rs : rules) { //parsing time!
      int h=rs.charAt(0)-'A';
      iniR[h]=Math.min(iniR[h],nr);
      String[] p = rs.substring(6,rs.length()).split(" \\| ");
      for(String e : p) {
         expr[nr]=e.toCharArray();
         offset[nr+1]=offset[nr]+e.length();
         ++nr;
      }
      endR[h]=Math.max(endR[h],nr);
   }
   int r=(int)cS(0,w.length,seed-'A');
   return r<=LIMIT?r:-1;
}

A few words about the other approach i talked about. The idea required a little language theory and knowledge of Context Free Grammars and the chomsky normal form. Since there's an efficient algorithm to convert any grammar to a chomsky normal form equivalent one, we can first do that transformation. As you can see, in chomsky normal form there is no need for the g function we discussed previously, because f can be directly implemented with a single loop that makes the partition between the only 2 nonterminals, if that's the case of that rule. This bound of 2 nonterminals gives us a bound of at most j-i partitions, instead of an exponential explosion, and eliminates the need for the g function. For an implementation with this idea see misof's solution.

Author
By soul-net
TopCoder Member