JOIN
 Match Editorial
2004 TopCoder Open
Qualification Problem Set 3

September 7-8, 2004

Summary

This set probably had the easiest hard problem, but it also had the most difficult easy problem to balance things out. reid was the quickest to finish them both, and ended up less than 50 points ahead of snewman - if only there was a challenge phase. qubits, a TopCoder from way back showed up and took third place, only a hair behind snewman.

The Problems

Rank
Used as: Division One - Level One:
 Value 300 Submission Rate 175 / 200 (87.50%) Success Rate 154 / 175 (88.00%) High Score reid for 295.59 points (2 mins 47 secs) Average Score 214.45 (for 154 correct submissions)

If you did this problem the hard way, it was surely the hardest of the qual easies. The hard way is to sort the elements, and then go through and rank them, and finally to put the ranks in the right places in the return. Granted, this isn't that hard, but its a lot more work than the easy way that gave reid over 295/300.

First, lets say there are no ties: in this case your rank is equal to one plus the number of people who beat you. Now, lets say you tied with one person, so the two of you get ranks n and n+1, and the average is hence n+0.5. Next, if you tied with 2 other people, the three of you would get ranks of n, n+1, and n+2, for an average of n+1. The pattern developing here is that every tie adds 0.5 to your rank. Hence, your ranks is one plus the number of people who beat you, plus 0.5 times the number of people who tied you. Therefore, two simple nested loops are sufficient to calculate the ranks of all the competitors.

GeneticCrossover
Used as: Division One - Level Three:
 Value 950 Submission Rate 82 / 200 (41.00%) Success Rate 62 / 82 (75.61%) High Score reid for 741.70 points (12 mins 49 secs) Average Score 525.11 (for 62 correct submissions)

Was it the lower score, or was this problem really easier? We'll never really know, but it did end up with more submissions than any other hard. The basic principle needed to solve it is known as linearity of expectations. This says that, if you know the expected values of a number of variables, then the expected value of the sum of those variables is equal to the sum of the expected values of the variables, regardless of the dependencies among those variables.

In this problem, the variables were the values contributed by each gene, elements of dom or rec. The expected value of each of these variables was equal to the p*dom+(1-p)*r, where p is the probability that a gene will be expressed dominantly. Since the return is simply the sum of all these expected values (by linearity of expectations), we only need to find p for each gene.

If a gene does not depend on another gene, then p is equal to the probability that one or both of the genes from the parents are dominant. There are only 4 cases to consider for each gene, and it isn't hard to just look at all four pairings. Alternatively, p = 1-(2-q1)*(2-q2)/4, where q1 and q2 are the numbers of recessive genes from each parent has. If gene i depends on gene j, we still have to compute the above probability for gene i. Since gene i may only be dominant if gene j is also, the we multiply the above value by the probability that gene j is dominant, which gives us the probability that both j and i are dominant.

```   double[] p;
int[] dep;
public double cross(String p1a, String p1b, String p2a, String p2b, int[] dom, int[] rec, int[] dep){
this.dep = dep;
char[][] ch = new char[4][];
ch[0]=p1a.toCharArray();
ch[1]=p1b.toCharArray();
ch[2]=p2a.toCharArray();
ch[3]=p2b.toCharArray();
p = new double[p1a.length()];
for(int i = 0; i<p1a.length(); i++){
for(int j = 0; j<2; j++){
for(int k = 0; k<2; k++){
char c1 = ch[j][i];
char c2 = ch[k+2][i];
if(c1>='A'&&c1<='Z')p[i] += .25;
else if(c2>='A'&&c2<='Z')p[i] += .25;
}
}
}
double ret = 0;
for(int i = 0; i<r.length; i++){
double r = rec(i);
ret += r * dom[i] + (1-r)*rec[i];
}
return ret;
}
double rec(int idx){
if(dep[idx] == -1)return p[idx];
else return p[idx] * rec(dep[idx]);
}
```
By lbackstrom
TopCoder Member