Match Editorial

2004 TopCoder Open
Qualification Problem Set 4
September 7-8, 2004

Summary

The Easy problem was really easy, but the Hard problem made this set brutal. Only 7 coders passed system tests on the Hard, led by Eryx and tomek. Congratulations to jasonw, who made red for the first time by solving both problems.

The Problems

Genetics

Used as: Division One - Level One:

Value 200

Submission Rate 206 / 215 (95.81%)

Success Rate 200 / 206 (97.09%)

High Score Eryx for 197.93 points (2 mins 20 secs)

Average Score 173.19 (for 200 correct submissions)

Once you know how to handle a single pair of genes, it is simply a matter of looping through all the pairs.

    String answer = "";
    for (int i = 0; i < dom.length(); i++)
      answer += qualityOf(g1.charAt(i), g2.charAt(i), dom.charAt(i));
    return answer;

Now, what does the qualityOf function do? Basically, you want the following logic:

    char qualityOf(char x, char y, char dom) {
      if (both uppercase) return uppercase
      if (both lowercase) return lowercase
      // one uppercase and one lowercase
      if (dom == 'D') return uppercase
      if (dom == 'R') return lowercase
    }

Noting that uppercase letters have smaller ASCII codes than lowercase letters, you could write

      if (x == y) return x;
      if (dom == 'D') return min(x,y);
      if (dom == 'R') return max(x,y);

or even just

      return (dom == 'D') ? min(x,y) : max(x,y);

because min and max will give you the right answer when x and y are equal.

PhoneSearch

Used as: Division One - Level Three:

Value 1100

Submission Rate 29 / 215 (13.49%)

Success Rate 7 / 29 (24.14%)

High Score Eryx for 661.61 points (21 mins 49 secs)

Average Score 502.68 (for 7 correct submissions)

This was by far the hardest problem in the qualification round. The first step was to convert the single-letter frequencies into relative frequencies for each of the 26*26*26 three-letter prefixes. Suppose the phone book has some arbitrarily large number of names in it, and consider some prefix XYZ. Let N be the sum of the single-letter frequencies. The fraction of names that start with X is freq[X]/N, the fraction of names starting with X that have Y in the second position is freq[Y]/N, and the fraction of names starting with XY that have Z in the third position is freq[Z]/N. Altogether, the fraction of names that start with XYZ is (freq[X]*freq[Y]*freq[Z])/(N*N*N). Multiply all such fractions by N*N*N to get the relative frequencies of all prefixes as integers.

Next, we need some way to sum the relative frequencies of all prefixes between two prefixes lo and hi. (I'll assume the range from lo to hi is inclusive, but you could also make it exclusive, or—my favorite—inclusive of lo but exclusive of hi.) Pretend we have a function sum(lo,hi) that computes this sum. A very efficient way to calculate the sum is to precalculate a table of cumulative sums, and simply return cumulative[hi+1]-cumulative[lo]. But the problem sizes were small enough that you could probably get away with a simple loop.

Now, suppose lopage is the lowest page the desired prefix might occur on, and lopre is the lowest prefix that might occur on that page. Similarly, suppose hipage is the highest page the desired prefix might occur on, and hipre is the highest prefix that might occur on that page. There are P=hipage-lopage+1 pages under consideration.

The first page on which we expect the desired prefix, pre, to appear depends on the weighted fraction of prefixes that precede pre within the range from lopre to hipre. This fraction is sum(lopre,pre-1)/sum(lopre,hipre), where pre-1 is the prefix just before pre alphabetically. To turn this fraction into a page number, we adjust it as lopage + P*sum(lopre,pre-1)/sum(lopre,hipre), rounded down. Similarly, we expect the first page of the next higher prefix to be lopage + P*sum(lopre,pre)/sum(lopre,hipre), also rounded down. We expect the desired prefix to extend onto the same page as the next prefix, except when the page break falls exactly between the two prefixes, which occurs when P*sum(lopre,pre)/sum(lopre,hipre) is exactly an integer. In that case, the desired prefix will extend only to the previous page. The last page on which we expect pre to appear can thus be calculated as lopage + (P*sum(lopre,pre) - 1)/sum(lopre,hipre), where the minus one protects us from the page break.

The main search loop then looks something like

  flips = 0;
  lopage = 0;
  hipage = total number of pages - 1;
  lopre = AAA;
  hipre = ZZZ;
  while (lopage <= hipage) {
    P = hipage - lopage + 1;
    firstpage = lopage + P*sum(lopre,pre-1)/sum(lopre,hipre);
    lastpage = lopage + (P*sum(lopre,pre) - 1)/sum(lopre,hipre);
    midpage = (firstpage+lastpage)/2;
    flips++;

    if (prefix < first prefix on midpage) {
      hipage = midpage - 1;
      hipre = first prefix on midpage - 1;
    }
    else if (prefix > last prefix on midpage) {
      lopage = midpage + 1;
      lopre = last prefix on midpage + 1;
    } 
    else // found right page!
      return flips;
  }
  // no more pages to search!
  return -flips;

Notice that this code is nearly identical to a binary search, except for the calculation of midpage.

By vorthys
TopCoder Member