Match Editorial

SRM 174
Saturday, December 13, 2003

Match summary

TopCoder served up a gamblers' special on the weekend as, improbably enough, every Division One problem dealt with chance. The horse to back was SnapDragon, who galloped across the finish line after less than thirty minutes' coding. Yarin would have been another good bet, returning from his Final Four showing at the TopCoder Open (coincidentally held in a casino) to place a distant second in this match. Half a pace behind came WishingBone to complete the trifecta. Probabilistic lunacy extended to the harder problems in Division Two, though the easiest was a tame deterministic affair. Olexiy cruised through them all for the division win, with fellow first-timers monsoon and pigworlds rounding out the top three in a crowded field.

The Problems

CrossWord

Used as: Division Two - Level One:

Value 250

Submission Rate 182 / 221 (82.35%)

Success Rate 160 / 182 (87.91%)

High Score mpbailey for 248.04 points (2 mins 32 secs)

Average Score 202.60 (for 160 correct submissions)

Given an array of strings filled with the characters '.' and 'X', we are to count the substrings of given length that consist only of '.' and that do not adjoin a '.' on either side. We can find all substrings fitting this description by making a single pass over each string. Upon encountering an 'X', we consider the number of uninterrupted '.' characters we have seen thus far. To account for cases where a suitable substring occurs at the very end of a string, we append a sentinel 'X' to each.

def crossword(board, size):
    seen = 0
    for row in board:
        row += 'X'
        count = 0
        for ch in row:
            if ch == '.':
                count += 1
            else:
                if count == size:
                    seen += 1
                count = 0
    return seen

We must not forget to initialize the '.' count after seeing an 'X'.

BirthdayOdds

Used as: Division Two - Level Two:

Value 500

Submission Rate 179 / 221 (81.00%)

Success Rate 125 / 179 (69.83%)

High Score hw_Tim for 495.43 points (2 mins 44 secs)

Average Score 356.05 (for 125 correct submissions)

Used as: Division One - Level One:

Value 250

Submission Rate 138 / 140 (98.57%)

Success Rate 106 / 138 (76.81%)

High Score overwise for 247.15 points (3 mins 3 secs)

Average Score 217.13 (for 106 correct submissions)

Given the number of days in a planetary year and the number of people in a room, we are to calculate the probability that at least two of them share a birthday. Oops! That's not exactly what the problem says. It asks us to calculate the number of people such that the probability of a shared birthday reaches a given threshold. But the required number of people in the room will not exceed the number of days in a year by more than one, so we can start with an empty room and add people to it, calculating the probability at each increment until we cross the threshold.

The superficial difficulty is that there are many ways for some set of people to have a birthday in common. There may be one or more pairs of people sharing a birthday, or several triples, even quadruples or greater. We can dispose of this red herring by calculating the probability that no two people share a birthday. There must be a shared birthday in all other cases, so we subtract this probability from 1. Better yet, we can leave it alone, aiming instead for 1 minus the threshold probability.

Just what is the probability that n people share no birthday among m possible days? The first of these n people can be born on any one of the m days. The second can only be born on one of m-1 days remaining out of the m days in the year. The third must be born on one of m-2 out of m days in the year, and so on down to (m-n+1)/m. We multiply each of these ratios to obtain the final odds.

def people(threshold, days):
    threshold = 1.0 - threshold/100.0
    prob = 1.0
    i = 0
    while (prob > threshold):
        prob *= (1.0*days-i)/days
        i += 1
    return i

Our calculation is simplified by the fact that we are interested in all possible orderings of birthdays. All cases where person x is born on March 3rd and person y on July 4th are distinct from those where x is born on July 4th and y on March 3rd.

ProbabilityTree

Used as: Division Two - Level Three:

Value 1000

Submission Rate 67 / 221 (30.32%)

Success Rate 43 / 67 (64.18%)

High Score pigworlds for 880.76 points (10 mins 45 secs)

Average Score 580.60 (for 43 correct submissions)

We are asked to compute the probability of each event in something called a probability tree and to report which ones fall within a given interval. There's some beating around the bush in the problem statement, but the crucial facts are as follows.

Each node in the probability tree is marked with the probability q that its event will occur if the event associated with its parent also occurs, and the probability r that its event will occur even if that of its parent does not. If we know the probability p that the parent event will occur, we can calculate this event's probability with the formula p*q+p*r.

If we don't know the probability of the parent event, we had better wait until we've calculated it. This suggests an iterative approach in which we make multiple passes over the nodes of the tree to calculate the probability for every node whose parent probability is already known. We make at least one new calculation with each pass until, finally, all probabilities are known.

def odds(tree, lower, upper):
    n = len(tree)
    indices = []
    probs = [-1] * n
    done = [0] * n
    probs[0] = float(tree[0])/100.0
    if (probs[0] > lower/100.0 and probs[0] < upper/100.0):
        indices.append(0)
    m = 1
    while (1):
        if (m == n):
            break
        for i in range(1, n):
            subs = tree[i].split()
            parent = int(subs[0])
            if (probs[parent] < 0.0 or done[i]):
                continue
            p1 = float(subs[1])/100.0
            p2 = float(subs[2])/100.0
            probs[i] = p1*probs[parent] + p2*(1.0-probs[parent])
            m += 1
            done[i] = 1
            if (probs[i] > lower/100.0 and probs[i] < upper/100.0):
                indices.append(i)
    indices.sort()
    return indices

The counter m is incremented with each new calculation until it reaches n, the number of nodes. We use the strictly-greater and strictly-less comparisons because an exclusive interval is specified. Note also that we use a sentinel array to avoid duplicate calculation, and that we sort the indices immediately before returning them.

RangeGame

Used as: Division One - Level Two:

Value 600

Submission Rate 53 / 140 (37.86%)

Success Rate 33 / 53 (62.26%)

High Score SnapDragon for 503.72 points (12 mins 56 secs)

Average Score 300.57 (for 33 correct submissions)

This problem is almost a generalization of the notorious brainteaser known as Monty Hall's dilemma. Monty Hall was the host of a game show in which the grand prize was a Mercedes Benz. If the contestant made it to the final round of the game, she would confront three closed doors. Behind one of them was the Mercedes, while the other two concealed a pair of goats. The contestant would indicate her choice by pointing to one of the doors. Instead of opening it, however, Monty Hall would open one of the two other doors to reveal a goat. The contestant would then have the opportunity to alter her choice of door before the final disclosure.

The question is whether the contestant stands to gain by switching to the remaining door. One school of thought, a fallacious one, is that the Mercedes is either behind the initially chosen door or behind the other unopened one, so the contestant's chance of winning is one half regardless of whether she changes her choice. Another fallacious line of reasoning concludes that she has a two-thirds chance by staying with her first choice, since Monty Hall can choose one of two doors if she initially picked the Mercedes, but is constrained to only one if she is standing before a goat.

In fact, the contestant has a two-thirds chance of winning if she changes her selection. To see why this is so, consider the difference between a contestant who has a policy of never switching and one who always does. The non-switcher has a one-third chance of winning, since she picks a door and sticks with it. The switching contestant, however, loses only in those cases where the non-switcher would win, and wins otherwise. An astute contestant will therefore double her chances by switching, assuming the constraint that Monty Hall consistently reveals a goat. In real life, the financial pressures of a television show might make it less likely that the contestant is offered a chance to switch when her initial choice is a door concealing a goat.

In the RangeGame problem, however, if the prize patterns are {"10", "20", "30"} and the hint history is {"20"}, the answer is not {10, 30} but {10, 10}. Why the discrepancy? Note that on Monty Hall's game show, the host will never open the door that is the contestant's initial choice. In RangeGame, on the other hand, the host is as likely to open the contestant's first choice as any other non-winning door. This changes the event space in such a way that there is no longer any advantage to pursuing the switching strategy.

The problem at hand is essentially asking us to identify the door whose number is included in the greatest number of intervals. Because ties are broken in favor of lower numbers, the optimal number will always be the lowest number in some interval. This optimality principle can be proven by contradiction. Suppose that the optimal number x is not the lowest in any interval. Consider x-1, its lesser neighbor. If x-1 is included in as many intervals as x, then x is not optimal. But if x-1 is included in fewer intervals, then x must constitute the border of some interval and is therefore its lowest number. Hence, it cannot be the case that the optimal number is not the lowest in any interval.

The upshot is that we can restrict our search to the lowest number in each interval. First, let's delegate the door-pattern parsing to a helper function. A pattern is formed of space-separated globs. We split each glob into a pair of consecutive integers, so that the entire pattern is represented by an integer array. Note the special case where an interval begins and ends with the same integer.

def intervals(patt):
    ret = []
    globs = patt.split(' ')
    for glob in globs:
        parts = glob.split('-')
        ret.append(int(parts[0]))
        if len(parts) == 1:
            ret.append(int(parts[0]))
        else:
            ret.append(int(parts[1]))
    return ret

It's also convenient to have a function that decides whether one sequence of intervals intersects another at any point. We step through consecutive pairs of integers in each array, comparing them pairwise for overlap.

def overlap(ar, br):
    for ai in range(0, len(ar), 2):
        for bi in range(0, len(br), 2):
            if (ar[ai] < br[bi] and ar[ai+1] < br[bi]):
                continue
            if (ar[ai] < br[bi+1] and ar[ai+1] > br[bi+1]):
                continue
            return 1
    return 0

In order to find the optimal door number, we take the lower number in each interval and double it, as it were, to make a temporary interval that we can pass to overlap.

def best(doors):
    max = -1
    maxlap = -1
    for door in doors:
        for i in intervals(door):
            ii = [i, i]
            lap = 0
            for d in doors:
                if (overlap(ii, intervals(d))):
                    lap += 1
            if (lap > maxlap or (lap == maxlap and i < max)):
                max = i
                maxlap = lap
    return max

Finally, we step through the hints one by one, eliminating overlapping doors and recalculating the optimal door number with each iteration.

def guess(doors, hints):
    guesses = [best(doors)]
    for hint in hints:
        doomed = []
        for door in doors:
            if overlap(intervals(hint), intervals(door)):
                doomed.append(door)
        for door in doomed:
            doors.remove(door)
        guesses.append(best(doors))
    return guesses

We use the doomed array to avoid modifying doors while we're iterating over it.

PointSystem

Used as: Division One - Level Three:

Value 800

Submission Rate 30 / 140 (21.43%)

Success Rate 21 / 30 (70.00%)

High Score SnapDragon for 680.68 points (12 mins 20 secs)

Average Score 497.37 (for 21 correct submissions)

Given the minimum number of points and minimum lead required to win a game, as well as the probability that the underdog wins the point in any turn, we are to calculate the probability that the underdog will eventually defeat his opponent. The key to solving this problem is to envision the event space as a matrix of possible scores (x,y) where the underdog has scored x points to the favorite's y. We can then formulate a recurrence that uses the probabilities of lower scores to calculate the probability of a higher score.

If the underdog has chance s of winning any given point, the odds that he can make the score (1,0) are exactly s, and the odds of a (0,1) score are 1-s. What are the odds that the score will become (1,1)? The case where the favorite catches up contributes (1-s)*s to the probability of this event, and the case where the underdog is the one who scores the second point contributes s*(1-s). There are no other ways of reaching the score (1,1) with the gain of a single point by either player.

In general, the odds of reaching a score (x,y) are P(x,y) = s*P(x-1,y) + (1-s)*P(x,y-1). We can put this formula directly to use by implementing recursion with memoization. Alternatively, we can start from the lowest scores and build our way up.

def upset(min, lead, skill):
    skill = skill/100.0
    prob = 0.0
    odds = []
    for i in range(2001):
        odds.append(2001*[0.0])
    odds[0][0] = 1.0
    for games in range(2000):
        for underdog in range(games+1):
            favorite = games-underdog
            if (favorite >= min and favorite-underdog >= lead):
                continue
            if (underdog >= min and underdog-favorite >= lead):
                prob += odds[underdog][favorite]
                continue
            odds[underdog+1][favorite] += odds[underdog][favorite]*skill
            odds[underdog][favorite+1] += odds[underdog][favorite]*(1.0-skill)
    return prob

The two continue clauses are vital to the correct functioning of this program: we must skip scores that are unattainable because the game would have ended at a prior point spread. Observe that as x+y increases, the probability of a true underdog, with s < .5, attaining a score of (x,y) tends toward zero. Experimentation shows that calculating the odds of every score up to x+y=2000 yields sufficient precision for our purposes.

By Eeyore
TopCoder Member