Match Editorial

2006 TopCoder Open
Online Round 4
March 22, 2006

Match summary

Round 4 always features some of the hardest problems TopCoder has to offer, and tonight was no exception. While the easy stumped no one (literally), only 37 coders solved the medium, and only 4 solved the hard. Chief amongst them was andrewzta with a 30 point lead over Ruberik, who opted to do the hard first, though it ended up not mattering for him. Eryx and krijgertie got 3rd and 4th with failed medium submissions.

The Problems

DrawTree

Used as: Division One - Level One:

Value 250

Submission Rate 96 / 96 (100.00%)

Success Rate 95 / 96 (98.96%)

High Score Eryx for 245.02 points (4 mins 4 secs)

Average Score 210.47 (for 95 correct submissions)

The easy problem had nothing tricky in it, but it did require a reasonable amount of code. There are a lot of different ways to do this problem, but most of them involve a recursive function that adds strings to a global variable. The calls to the recursive function should tell how much to indent. Adding the pipes is a little bit tricky, but we can just add a loop to find the last child, and add pipes after every child but the last one.

MonotoneSEMin

Used as: Division One - Level Two:

Value 500

Submission Rate 56 / 96 (58.33%)

Success Rate 38 / 56 (67.86%)

High Score halyavin for 426.43 points (12 mins 14 secs)

Average Score 268.24 (for 38 correct submissions)

The medium problem turned out to be rather difficult. The first step to solving it correctly is to realize the shape of the function. Since it is only evaluated at discrete steps, those are the only points where its value matters, and so the function ends up being piecewise flat. That is, it looks like a staircase (albeit an uneven one). Once we have this figured out, we can work on optimizing the squared error. Lets think about the simple case where we have the string "10". In this case, we'd really like to make the function go from 1 to 0, but that wouldn't be monotically increasing. So, we have to make the function lower for the first bit, or higher for the second bit, or both. It turns out that the lowest squared error occurs when we make the function 0.5 for both of them. In fact, we can derive a more general formula: if an interval has A ones and B zeros, and we have to assign everything in the interval the same value, then the best value is A/(A+B).

This fact is far from obvious, though it does seem reasonable intuitively. To prove it is true, one can use calculus. First, call the value assigned to the flat function x. Then determine the squared error in terms of x, A, and B. Set the derivative to 0, and solve for x in terms of A and B.

Now, we are almost there, we know how to pick our function for an interval if it has to be flat, and we know that the overall function is piecewise flat. So, the final step is to partition the whole sequence of bits into the proper flat segments. We'd like these segments to be as small as possible, subject to the constraint that they have to give a monotone function, since small intervals will be better suited to the bits in them. One way to do this is to start with a segment for every bit. This will give a function with low squared error, but it will obviously not be monotone. Next, pick two adjacent segments which violate the monoticity requirement and merge them. At first glance, it may seem like the way we pick which two to merge (out of many possibilities) will matter, but it turns out that it doesn't. Eventually, we have a monotone function, and we return its squared error.

So, if each interval [i..j] indicates that the function should return A/(A+B) for that interval, then the code ends up looking vaguely like this:

bits[1..N]
for(i = 1..N)
    create interval [i..i] 
while(not monotone)
    find intervals [i..j], [j+1..k] that violate monotonicity
    merged [i..j], [j+1..k] into [i..k]
return squared error of function

While the greedy solution above works (in O(N) when done properly), most coders opted to use some form of dynamic programming instead. They computed the best squared error for the first j bits, as well as the value of the function at the j^th bit. Then, using dynamic programming, then tried adding each sized interval after the first j bits, with the same value all along it, as described. See SnapDragon's solution for an example of this algorithm.

TreeReconstruct

Used as: Division One - Level Three:

Value 1000

Submission Rate 11 / 96 (11.46%)

Success Rate 4 / 11 (36.36%)

High Score Eryx for 733.78 points (18 mins 35 secs)

Average Score 585.95 (for 4 correct submissions)

Lets start by thinking of the simple cases of 2 or 3 nodes. Obviously, if the subset contains 2 nodes the original tree may have only had 2 nodes also, with the same distance between them. However, this is not the case when there are 3 nodes. Lets call the three nodes A, B and C, and call AB, AC, and BC the distances between them. Now, imagine that there was some other node D in the original tree, and that A, B, and C were all connected to it. In this case, it is not hard to see that AB + AC = BC + 2*AD. Thus, working backwards, we can figure out AD. If AD is 0, then there wasn't an extra node in the original graph -- A was simply between B and C. Otherwise, AD lets us figure out BD and CD, and thus we can insert D. In some cases, it will turn out that BD or CD is 0, in which case we clearly don't insert a new node, since D is equal to B or C. Those are all the possible cases. The impossible cases come from AD being negative, or implying that BD or CD are negative. Also, since the original weights have to be integral, if any of the weigts are fractional, the case is invalid.

Once we can do 3 nodes properly, adding more isn't that much harder conceptually. We can build up the graph by adding nodes to it one node at a time. Each time we add one of the nodes from the input, we may also need to add an extra node, which is not in the input (or which has not been added from the input yet). However, though this sounds like a reasonable approach, it ends up getting a bit messy. There are a number of cases to deal with, and it is a bit harder to figure out where exactly to insert when there are already a bunch of nodes in the graph. Instead, we can use a slightly different approach, courtesy of radeye.

For every triple, A, B and C of nodes in the input, there must be some node D (possibly the same as A, B or C) that is on the path between each pair. That is, getting from any one to any other goes through D. Now, we can figure out the distances AD, BD, and CD, as described above -- this part is no more complicated when there are extra nodes. Next, consider a fifth node E. We are interested in how far E is from D: DE. We know that DE ≥ AE-AD, since AD+DE≥AE. We have similar inequalities for B and C. Finally, since D lies on the paths between pairs of A, B, and C, at least one of these inequalities must have equality, otherwise E would be between A and D, between B and D, and between C and D, which is impossible. The only way to have equality and still satisfy the inequalities is to set DE to the max of (AE-AD,BE-BD,CE-CD). Thus, for every triple A, B, and C, we can find D and then find DE for every node E in the input. This gives us a distance vector which is unique to D. If multiple triples generate the same distance vector, they are generating the same node D. Thus, by placing all the generated distance vectors, along with the input distance vectors in a set, the return ends up being the size of the set. Here is radeye's implementation:

public class TreeReconstruct {
  public int reconstruct(String[] g1, String[] g2) {
    HashSet r = new HashSet() ;
    int n = g1.length ;
    int[][] d = new int[n][n] ;
    StringBuffer sb = new StringBuffer() ;
    for (int i=0; i<n; i++) {
      sb.setLength(0) ;
      for (int j=0; j<n; j++) {
        d[i][j] = Integer.parseInt("" + g1[i].charAt(j) + g2[i].charAt(j), 16) ;
        sb.append(d[i][j]).append(",") ;
      }
      r.add(sb.toString()) ;
    }
    for (int a=0; a<n; a++)
      for (int b=a+1; b<n; b++)
        for (int c=b+1; c<n; c++) {
          int da = d[a][b] + d[a][c] - d[b][c] ;
          int db = d[a][b] + d[b][c] - d[a][c] ;
          int dc = d[a][c] + d[b][c] - d[a][b] ;
          if (((da | db | dc) & 0x1000001) != 0)
            return -1 ;
          sb.setLength(0) ;
          for (int e=0; e<n; e++) {
            int de = Math.max(Math.max(d[a][e]-da/2, d[b][e]-db/2),
                d[c][e]-dc/2) ;
            int cs = 0 ;
            if (da/2 + de == d[a][e])
              cs++ ;
            if (db/2 + de == d[b][e])
              cs++ ;
            if (dc/2 + de == d[c][e])
              cs++ ;
            if (de < 0 || cs < 2)
              return -1 ;
            sb.append(de).append(",") ;
          }
          r.add(sb.toString()) ;
        }
    return r.size() ;
  }
}

By lbackstrom
TopCoder Member