
Match Editorial 
2004 TopCoder Open
Qualification Problem Set 3September 78, 2004
Summary
This set probably had the easiest hard problem, but it also had the most
difficult easy problem to balance things out. reid was the quickest to finish
them both, and ended up less than 50 points ahead of snewman  if only there was
a challenge phase. qubits, a TopCoder from way back showed up and took third
place, only a hair behind snewman.
The Problems
Rank
Used as: Division One  Level One:
Value

300

Submission Rate

175 / 200 (87.50%)

Success Rate

154 / 175 (88.00%)

High Score

reid for 295.59 points (2 mins 47 secs)

Average Score

214.45 (for 154 correct submissions)

If you did this problem the hard way, it was surely the hardest of the qual
easies. The hard way is to sort the elements, and then go through and rank
them, and finally to put the ranks in the right places in the return. Granted,
this isn't that hard, but its a lot more work than the easy way that gave reid
over 295/300.
First, lets say there are no ties: in this case your rank is equal to one plus
the number of people who beat you. Now, lets say you tied with one person, so
the two of you get ranks n and n+1, and the average is hence n+0.5. Next, if
you tied with 2 other people, the three of you would get ranks of n, n+1, and
n+2, for an average of n+1. The pattern developing here is that every tie adds
0.5 to your rank. Hence, your ranks is one plus the number of people who beat
you, plus 0.5 times the number of people who tied you. Therefore, two simple
nested loops are sufficient to calculate the ranks of all the competitors.
GeneticCrossover
Used as: Division One  Level Three:
Value

950

Submission Rate

82 / 200 (41.00%)

Success Rate

62 / 82 (75.61%)

High Score

reid for 741.70 points (12 mins 49 secs)

Average Score

525.11 (for 62 correct submissions)

Was it the lower score, or was this problem really easier? We'll never really
know, but it did end up with more submissions than any other hard. The basic
principle needed to solve it is known as linearity of expectations. This
says that, if you know the expected values of a number of variables, then the
expected value of the sum of those variables is equal to the sum of the expected
values of the variables, regardless of the dependencies among those
variables.
In this problem, the variables were the values contributed by each gene, elements
of dom or rec. The expected value of each of these variables was
equal to the p*dom+(1p)*r, where p is the probability that a gene will
be expressed dominantly. Since the return is simply the sum of all these
expected values (by linearity of expectations), we only need to find p for each
gene.
If a gene does not depend on another gene, then p is equal to the probability
that one or both of the genes from the parents are dominant. There are only 4
cases to consider for each gene, and it isn't hard to just look at all four
pairings. Alternatively, p = 1(2q_{1})*(2q_{2})/4,
where q_{1} and q_{2} are the numbers of recessive genes from
each parent has. If gene i depends on gene j, we still have to compute the
above probability for gene i. Since gene i may only be dominant if gene j is
also, the we multiply the above value by the probability that gene j is
dominant, which gives us the probability that both j and i are dominant.
double[] p;
int[] dep;
public double cross(String p1a, String p1b, String p2a, String p2b, int[] dom, int[] rec, int[] dep){
this.dep = dep;
char[][] ch = new char[4][];
ch[0]=p1a.toCharArray();
ch[1]=p1b.toCharArray();
ch[2]=p2a.toCharArray();
ch[3]=p2b.toCharArray();
p = new double[p1a.length()];
for(int i = 0; i<p1a.length(); i++){
for(int j = 0; j<2; j++){
for(int k = 0; k<2; k++){
char c1 = ch[j][i];
char c2 = ch[k+2][i];
if(c1>='A'&&c1<='Z')p[i] += .25;
else if(c2>='A'&&c2<='Z')p[i] += .25;
}
}
}
double ret = 0;
for(int i = 0; i<r.length; i++){
double r = rec(i);
ret += r * dom[i] + (1r)*rec[i];
}
return ret;
}
double rec(int idx){
if(dep[idx] == 1)return p[idx];
else return p[idx] * rec(dep[idx]);
}
By
lbackstrom
TopCoder Member