Match Editorial

SRM 155
Thursday, July 17, 2003

Match summary

Incan Investigators Find Fraud by Deadbeat Dads in Twisty Two-Toned Trees...SnapDragon Wins Again!

SnapDragon continues his dominance, winning for the sixth time out of the past eight challenges. Yarin was hot on his heels until a late resubmit to fix a typo. ZorbaTHut used one of only a handful of challenges in Division 1 to pull into second place. In Division 2, TheFaxman finished all three problems in a lightning quick 24 minutes, only to watch ambclams pull ahead for the victory two minutes later.

SRM 155 marks the debut of VB as a TopCoder programming language. A scoring bug briefly showed one VB.NET user's total score as a heart-stopping -8000, but otherwise, the debut went smoothly. Welcome aboard VB coders!

The Problems

Quipu

Used as: Division Two - Level One:

Value 250

Submission Rate 197 / 227 (86.78%)

Success Rate 182 / 197 (92.39%)

High Score kats for 248.05 points (2 mins 31 secs)

Average Score 190.82 (for 182 correct submissions)

In this problem, you had to determine the size of each cluster of knots by counting consecutive 'X's, allowing for the possibility of empty clusters. You could do this with two nested loops, with the inner loop processing a single cluster and the outer loop walking from cluster to cluster:

  int number = 0;
  for (int i = 0; i < knots.length(); i++) {
    int cluster = 0;
    while (knots.charAt(i) == 'X') {
      cluster++;
      i++;
    }
    number = number*10 + cluster;
  }

Alternatively, you could collapse these two loops into a single loop by incrementing every time you see an 'X' and multiplying by 10 every time you see a '-' (except for the trailing '-'):

  int number = 0;
  for (int i = 0; i < knots.length()-1; i++) {
    if (knots.charAt(i) == 'X') number++;
    else number *= 10; // '-' case
  }

In either version, you could also ignore the leading '-' since multiplying 0 by 10 has no effect.

BenfordsLaw

Used as: Division Two - Level Two:

Value 500

Submission Rate 130 / 227 (57.27%)

Success Rate 105 / 130 (80.77%)

High Score TheFaxman for 446.93 points (10 mins 2 secs)

Average Score 277.27 (for 105 correct submissions)

According to Benford's Law, the probability that a number starts with digit D is log₁₀(1+1/D). Therefore, given a sample of N numbers, you would expect E = N*log₁₀(1+1/D) of them to start with D. A digit is questionable if its actual frequency is less than E/threshold or greater than E*threshold. You count how many times each leading digit actually occurs and compare it to these border values.

Many people lost time by accidentally using natural logarithms (base e) rather than base-10 logarithms. To convert from one base to the other, you use the identity log₁₀(x) = ln(x) / ln(10). Another easy mistake to make was to accidentally use integer division in calculating 1+1/D, instead of converting to doubles.

PaternityTest

Used as: Division Two - Level Three:

Value 1000

Submission Rate 70 / 227 (30.84%)

Success Rate 52 / 70 (74.29%)

High Score ambclams for 953.86 points (6 mins 18 secs)

Average Score 691.68 (for 52 correct submissions)

Used as: Division One - Level One:

Value 300

Submission Rate 123 / 136 (90.44%)

Success Rate 117 / 123 (95.12%)

High Score shadowless for 295.24 points (3 mins 37 secs)

Average Score 227.52 (for 117 correct submissions)

There are two basic approaches to deciding whether a particular man could be the father. The brute force approach is to test every possible partitioning of characters from the child's sample between the mother and the potential father, and accept any partitioning where half of the characters come from the mother and half from the father. With up to 2²⁰ possible partitionings per potential father, and up to 5 potential fathers, there could be up to 5 million partitionings to test. Timeout is certainly a concern in this range, but with a modicum of care it should be possible to test all 5 million partitionings in a few seconds.

A more efficient approach is to count how many characters in the child's sample match the mother and how many match the potential father, allowing for the fact that some characters could match both. If any characters match neither the mother nor the potential father, then the man can immediately be ruled out. Otherwise, a valid partitioning is possible if and only if at least half the characters match the mother and at least half match the potential father. Why? Well, consider an example. Suppose the sample is 20 characters long, with 12 characters matching the mother and 13 characters matching the father. Then, there are 7 characters that match only the mother, 8 characters that match only the father, and 5 characters that match both. To create a valid partitioning, we only need to assign 3 of the overlapping characters to the mother and 2 to the father, which is clearly possible.

But wait! We are guaranteed by the constraints that at least half the characters in the child's sample match the mother, so we don't even need to check that. We merely need to check that no characters match neither the mother nor the potential father, and that at least half the characters match the father, as in the following helper method:

  boolean possibleFather(String child, String mother, String man) {
    int match = 0;
    for (int i = 0; i < child.length(); i++) {
      if (child.charAt(i) == man.charAt(i)) match++;
      else if (child.charAt(i) != mother.charAt(i)) return false;
    }
    return match >= child.length()/2;
  }

The rest of the solution is simply to check each man, and keep the indices of those who cannot be ruled out.

QuipuReader

Used as: Division One - Level Two:

Value 450

Submission Rate 98 / 136 (72.06%)

Success Rate 67 / 98 (68.37%)

High Score ChristopherH for 395.86 points (10 mins 48 secs)

Average Score 271.22 (for 67 correct submissions)

This problem is similar to Quipu, the Division II Easy, but extends it in two ways. First, there are now several cords instead of just one. Second, there can now be an arbitrary number of '-'s between clusters of knots. The tricky part of the problem is deciding where one digit ends and the next digit begins. This is easy enough in examples like

  "-XXX--XX---XXXX-"
  "-XX--XXXX---XX--"

where one or more columns of dashes separate adjacent digits, but much harder in examples like

  "XXX-XX-XXXX"
  "XX-XXXX-XX-"
  "X--XXX---X-"

where adjacent digits are not separated by columns of dashes. A column begins a new digit if it satisfies two conditions. First, it must contain at least one 'X' that falls to the immediate right of a '-'. Second, it must not contain any 'X's that fall to the immediate right of another 'X'. These conditions can be tested for column i > 0 as follows:

    boolean newDigit = false;
    for (int j = 0; j < knots.length; j++) {
      if (knots[j].charAt(i) == 'X') {
        newDigit = (knots[j].charAt(i-1) == '-');
        if (!newDigit) break;
      }
    }

The rest of the code walks over all the columns, multiplying the numbers for all the cords by 10 when a new digit is begun, and incrementing the appropriate cord's number for each 'X' that is found.

RedBlack

Used as: Division One - Level Three:

Value 1000

Submission Rate 20 / 136 (14.71%)

Success Rate 16 / 20 (80.00%)

High Score SnapDragon for 621.54 points (25 mins 44 secs)

Average Score 465.91 (for 16 correct submissions)

Red-black trees are described in most textbooks on data structures, but they can be quite tricky to implement, often taking well over a hundred lines of code. This problem concerns a variant of red-black trees that is much simpler and shorter. Instead of the four different kinds of rotations used by standard red-black trees, this variant uses a single rebalancing transformation called a twist. The problem is to simulate a sequence of inserts and count how many twists occur.

There are four different configurations in which a twist can occur, but all four are twisted into the same target shape. By writing a method to produce this shape, we avoid replicating the rebalancing code in four different places. The desired shape is

            (RED) y
                 / \
                /   \
         (BLK) x     z (BLK)
              / \   / \
             T1 T2 T3 T4

and the method that produces this shape is

  Node twist(Node x,Node y,Node z,  Node T1,Node T2,Node T3,Node T4) {
    twistCount++;
    x.color = BLK;  x.L = T1;  x.R = T2;
    z.color = BLK;  z.L = T3;  z.R = T4;
    y.color = RED;  y.L = x;   y.R = z;
    return y;
  }

where I have abbreviated left and right as L and R.

The insertion code can be written as a recursive method. Except for the rebalancing code, it looks just like any other insertion into a binary search tree.

  Node ins(int x,Node t) {
    if (t == EMPTY) return new Node(RED,x,EMPTY,EMPTY);
    if (x < t.value) t.L = ins(x,t.L);
    else             t.R = ins(x,t.R);
  
    ...rebalance by calling twist if necessary...

    return t;
  }

The tricky part, of course, is the rebalancing code. We need to call twist whenever t has a red child that has a red child (t itself will always be black in this case.) The four configurations of red child and red grandchild can be coded very compactly as

    // check if child and grandchild are red
    if (t.L.color == RED && t.L.L.color == RED)
      return twist(t.L.L,t.L,t,  t.L.L.L,t.L.L.R,t.L.R,t.R);
    if (t.L.color == RED && t.L.R.color == RED)
      return twist(t.L,t.L.R,t,  t.L.L,t.L.R.L,t.L.R.R,t.R);
    if (t.R.color == RED && t.R.L.color == RED)
      return twist(t,t.R.L,t.R,  t.L,t.R.L.L,t.R.L.R,t.R.R);
    if (t.R.color == RED && t.R.R.color == RED)
      return twist(t,t.R,t.R.R,  t.L,t.R.L,t.R.R.L,t.R.R.R);

Although these masses of .L's and .R's can be hard to read, they correspond directly to the x,y,z and T1,T2,T3,T4 from the diagrams given in the problem:

       (BLK) z         (BLK) z               x (BLK)         x (BLK)
            / \             / \             / \             / \
     (RED) y   T4    (RED) x   T4         T1   z (RED)    T1   y (RED)
          / \             / \                 / \             / \
   (RED) x   T3         T1   y (RED)   (RED) y   T4         T2   z (RED)
        / \                 / \             / \                 / \
      T1   T2             T2   T3         T2   T3             T3   T4

Another trick that helps to make this code compact is to use a sentinel node to represent the bottom of the tree, instead of null. The sentinel node, called EMPTY, is simply a black node with arbitrary values in its other fields. Notice how the ins method checks for t == EMPTY instead of t == null. Without the sentinel node, tests like

    if (t.L.color == RED && t.L.L.color == RED)
      return ...;

would have to be written as

    if (t.L != null && t.L.color == RED &&
        t.L.L != null && t.L.L.color == RED)
      return ...;

The last detail to remember about insertions is that the root is colored black at the end of every insertion. This can be accomplished by a wrapper method that merely calls the main insertion method (ins) and colors the result black.

  Node insert(int x,Node root) {
    root = ins(x,root);
    root.color = BLK;
    return root;
  }

See my solution in the practice room for the complete code.

Many people complicated their solutions by trying to maintain parent pointers. Parent pointers are necessary only if you try to code insertion using loops instead of recursion, but such an approach tends to be messy.

This variant of red-black trees is a handy data structure to add to your toolbox. Although it may have seemed difficult in a timed setting, it is one of the simplest forms of balanced binary search trees you'll ever find.

By vorthys
TopCoder Member