Match Editorial

SRM 199
Wednesday, June 16, 2004

Match summary

SRM 199 faced TopCoders with two of their most common fears: probability and geometry. venco took the top spot in Division-I with the only (out of nine submitted) passing solution to the 1000-point problem. A successful challenge by jonmac lifted him up into second place, over third-place nullspace who's rating is on track to shoot past the 2200 mark.

In Division-II, an unsuccesful challenge by Savior was not enough to knock him out of the top spot, as he solved the 1000-point problem in only 12 minutes and saw his rating jump 348 to bring him within 5 points of Division-I. loveislife and Sputnik finished second and third, separated by less than 20 points.

The Problems

StringMult

Used as: Division Two - Level One:

Value 250

Submission Rate 160 / 168 (95.24%)

Success Rate 151 / 160 (94.38%)

High Score mckavity for 247.81 points (2 mins 40 secs)

Average Score 219.12 (for 151 correct submissions)

This problem required a program to multiply a string sFactor by an integer iFactor, according to a provided definition for performing such a multiplication. The first step is to reverse sFactor if iFactor is negative. Then, to an empty string, concatenate abs(iFactor) copies of sFactor.

In pseudocode:

    if iFactor < 0 {
        reverse sFactor
        iFactor = -iFactor
    }

    ret = ""
    for i = 1 to iFactor {
        ret = ret + sfactor
    }

    return ret

If implemented correctly, you do not need to worry about the special cases of the string being empty or the integer being zero, although sometimes it's easier to just code the special cases explicitly than to convince yourself that your program handles them correctly.

TriangleCount

Used as: Division Two - Level Two:

Value 500

Submission Rate 86 / 168 (51.19%)

Success Rate 76 / 86 (88.37%)

High Score sputnik33 for 463.05 points (8 mins 9 secs)

Average Score 305.09 (for 76 correct submissions)

Used as: Division One - Level One:

Value 250

Submission Rate 133 / 151 (88.08%)

Success Rate 128 / 133 (96.24%)

High Score venco for 245.63 points (3 mins 48 secs)

Average Score 191.60 (for 128 correct submissions)

To solve this problem, coders had to write a program to return the total number of triangles (of any size) in a piece of a triangular grid. This is by no means an original problem -- see the "Triangular Tiling" article on mathworld.wolfram.com. Some people solved this problem by counting the number of triangles for the first few sizes, and then entering them into the On-Line Encyclopedia of Integer Sequences to find the closed-form solution.

However, even without finding a closed-form solution, it was a simple matter to loop over all the triangles and count them. See venco's program for a nice example of such a solution.

Transpose

Used as: Division Two - Level Three:

Value 1000

Submission Rate 26 / 168 (15.48%)

Success Rate 19 / 26 (73.08%)

High Score Savior for 853.22 points (12 mins 13 secs)

Average Score 571.82 (for 19 correct submissions)

Taking the transpose of a non-square matrix in-place sounds like a simple enough problem, but an elegant algorithm is elusive. The key is to come up with a formula that gives you, for a given index in the list, the new index that this element must be copied to. This formula can be derived as follows:

Given an index i, its position in a matrix with M rows and N columns is:
    row = i / N (integer division)
    col = i % N (modulo operator)
Swapping the row and column, and M and N, the new location i₂ is:
    i₂ = col*M + row = (i % N)*M + (i / N)

By iterating the above formula, you will eventually return to your starting element, and there will likely be many such cycles. Copying each element to its next position in a cycle of n elements requires n-1 swaps. So, to compute the total number of swaps required to copy each element to its next position (and thus inverting the matrix), you simply must find all the cycles, subtract 1 from their lengths, and compute the sum. Note that since the sum of all the cycle lengths must be M*N, this is equivalent to M*N minus the number of cycles.

See sputnik's program for a clean implementation of this solution.

ChipRace

Used as: Division One - Level Two:

Value 500

Submission Rate 94 / 151 (62.25%)

Success Rate 72 / 94 (76.60%)

High Score gilbert for 449.00 points (9 mins 48 secs)

Average Score 302.75 (for 72 correct submissions)

Problems dealing with probability intimidate many people, but the nice thing about this problem is that once you figure out how to solve it, the program can actually be quite short and elegant.

The probability of a player winning a chip is equal to the sum of the probabilities of all the different ways that can happen. That is, it is the probability that he will win the first chip, plus the probability that he will win the second chip, plus the probability that he will win the third chip, etc. The probability that he will win the first chip is the chance that he has the highest card, which is simply the number of cards he gets divided by the total number of cards.

The probability that he will win the second chip is the chance that he has the highest card of those remaining after some other player wins the first chip. This can computed by looping over all the other players and taking the sum of the chance that that other player wins the first chip, times the chance that our first player wins the second chip (his number of cards divided by the number of cards remaining). The probabilities for winning the third and later chips can be computed similarly.

This can be coded very succinctly as a recursive function. For a nice example, see gilbert's solution, who, incidentally, took an early lead after submitting his first two problems, only to knocked down later by a successful challenge to his 250-point problem. The small limits to the input parameters allow a good recursive solution to work without the need for memoization.

Of those who submitted this problem, over three quarters got it right. The most common reasons for failing were due to precision problems or timing out on the worst possible case (10 fours).

ClosestPoints

Used as: Division One - Level Three:

Value 1000

Submission Rate 9 / 151 (5.96%)

Success Rate 1 / 9 (11.11%)

High Score venco for 466.48 points (42 mins 27 secs)

Average Score 466.48 (for 1 correct submission)

At first, this problem seems simple: generate a bunch of random points and find the closest pair. However, the range of the input parameters (up to 150,000 points with coordinates -1,000,000 to 999,999, obviously prevents a simple brute force solution for running in under eight seconds. The key to many TopCoder problems is to identify which algorithm to use, but my goal for this problem was to emphasize selecting a good data structure.

This problem can be solved with the use of a "spatial" data structure, which arranges the points based on their position in order to avoid testing pairs of points that cannot possibly be the closest. The simplest such data structure is a fixed grid, where space is divided up into a three-dimensional array of boxes, and as points are generated they are sorted into their corresponding boxes. Provided that the boxes are sufficiently populated, points only need to be compared to other points in their box and to the points in the 26 neighboring boxes. Such a solution would fail if there were too few points, and the closest pair were not in neighboring boxes. It would also time out of the points were not evenly distributed, and all of the points ended up in just a few boxes. Fortunately, the pseudo-random number generator used in this problem distributes the points quite evenly.

Another solution is to use an "octree" or a "kd-tree", which are three-dimensional versions of a binary trees. Each node of an octree is divided in x, y, and z, resulting in eight children. As points are inserted into the octree, nodes are adaptive subdivided when they contain too many points. Finding the closest pair of points can be accomplished by recursively comparing the octree to itself, and stopping the recursion early at each step when the two nodes being compared are farther apart than the closest pair found so far.

A "kd-tree" (used by venco for the only passing solution to this problem) is more efficient, splitting each node by a single plane, chosen to divide the points along the major axis of their bounding box.

Several people attempted to solve this problem by sorting the points based only on their x coordinate. They then did an N² comparison, but visited the points in sorted order, and broke out of the inner loop early when the difference of their x-coordinate was larger than the closest distance found so far. These solutions actually worked fine for the largest parameter values when the points were spread out far apart. But when the range of coordinates was reduced, and there were several thousand points with each possible x coordinate, these solutions still compared too many points.

By legakis
TopCoder Member