JOIN
Get Time

2003 TopCoder Open sponsord by Intel®

Coding Tab Component tab
Overview Schedule Summary Schedule Rules Advancers
Reception Room 1 Room 2 Room 3 Room 4 Final Round Slide Show

Play-by-Play  |  Photos  |  Problem Analysis

Problem Set Analysis & Opinion

by vorthys,
TopCoder Member
Thursday, December 4, 2003

RepeatedSubstrings
Used as: Division 1 - Level 1:

Consider the compressed text "AB&0-4", which decompresses to "ABABABA". How do we know that the last character is an A? Because it is a copy of the character in position 4. How do we know that character is an A? Because it is a copy of the character in position 2, which in turn is a copy of the character in position 0. We know the character in position 0 is an A because it appears directly in the compressed text. Decompression, then, is all about propagating the known characters into all the positions that depend on them, directly or indirectly. We also need to detect cyclic dependencies, in which case we can't tell what the character is, so we write question marks to the relevant positions.

The simplest solution is to propagate known characters to all positions that depend on them directly, and iterate until there are no more changes. In pseudocode,

    text = string of 256 question marks
    loop
       current = ""
       for each token in compressed do
          if token is letter or space then append to current
          if token is &i-j then
             s = substring of text from position i to position j
             append s to current
       if current == text then return text
       text = current

This simple solution is efficient enough for the small strings considered here, but would be too slow for large texts. In the worst case, it takes time quadratic in the size of the decompressed text. A more efficient solution follows each chain of dependencies all the way to its conclusion.

    text = ""
    dependsOn = array of 256 integers
    for each token in compressed do
       if token is letter or space then append to text
       if token is &i-j then
          for each position p from i to j do
             append '-' to text
             dependsOn[current position in text] = p
    for each position p in text do
       if text[p] == '-' then
          q = p
          while text[q] == '-' do  // find known character
             text[q] = '?'
             q = dependsOn[q]
          known = text[q]
          if known == '?' continue // found cycle, leave as question marks
          q = p
          while text[q] == '?' do  // copy known into positions in chain
             text[q] = known
             q = dependsOn[q]
   return text
This revised algorithm runs in time proportional to the size of the decompressed text.

Now, for your next challenge, write the corresponding compression algorithm that always produces the shortest possible output! (Or, to be more precise, the shortest output that decompresses without any question marks.) The nice thing about this framework is that the compression engine is adjustable on the fly. You can choose to trade off compression speed vs compression quality at your whim without affecting the decompression engine at all.

MazeImprovement
Used as: Division 1 - Level 2:

First, notice that disconnecting a short deadend and reconnecting it to an adjacent deadend can eliminate up to three short deadends at once: the short deadend that was disconnected, the deadend that it was reconnected to (if that deadend happened to also be short), and a short deadend that happened to be connected to the same intersection as the original short deadend (if that intersection became a corridor when the original short deadend was disconnected). However, this transformation can never create a new short deadend. Therefore, we can process the candidates row-by-row in a single pass from north to south. Within each row, we should proceed from west to east.

The easiest part to get wrong is the tiebreakers for which deadend to reconnect a short deadend to, in case there is more than one to choose from. The preference is first for other short deadends, then for the northernmost deadend, and finally for the westernmost deadend. Putting these preferences together, the order of priority is

  1. a short deadend to the east,
  2. a short deadend to the south,
  3. a deadend to the north,
  4. a deadend to the west,
  5. a deadend to the east, and
  6. a deadend to the south.
There cannot be short deadends to the north or the west because those would already have been eliminated. A short deadend to the east takes precedence over a short deadend to the south because it is farther north.

A final point to be careful about is your representation of the connections. A common approach is to use a three-dimensional array of booleans, where connected[X][Y][D] is true if the node at coordinates <X,Y> is connected to the node in direction D. However, connections are two-way. You could handle two-way connections by storing a separate boolean for each direction, so that, for example, connected[X][Y][WEST] is duplicated in connected[X-1][Y][EAST]. But then you have to be careful to update both booleans whenever you disconnect or reconnect a node. It is very easy to forget one of them. Alternatively, you could store only a single boolean for each connection, perhaps only in the north and east directions. Then, for example, when you wanted to lookup connected[X][Y][WEST], you would have to lookup connected[X-1][Y][EAST] instead. I usually find this second approach easier, but it does mean you have to be very careful about how you handle the borders.

SkewHeaps
Used as: Division 1 - Level 3:

The key to solving this problem is realizing that the position of the root in the output list depends only on the sizes of the two subtrees, not on their shapes. Recall that the root is the minimum element in the tree. All elements that were inserted after the minimum element end up on alternating sides of the tree, whereas elements inserted before the minimum element end up on the same side of the tree (the larger side). In other words, the insertion sequence in general has the form

   [all X's] M [alternating X's and Y's]
where X's are the elements from the larger subtree, Y's are the elements from the smaller subtree, and M is the minimum element. Furthermore, by the rules of insertion, the alternating X's and Y's must end with an element from the left subtree. Altogether, there are four possibilities:
  1) L...L M RL...RL
  2) R...R M L RL...RL

  3) R...R M RL...RL
  4) L...L M L RL...RL
where L's are the elements from the left subtree and R's are the elements from the right subtree. In fact, however, cases 3 and 4 are impossible, because the first element inserted after M will go on the opposite side of the tree from all the elements inserted before M.

We can determine which case applies by looking at the sizes of the two subtrees. If the left subtree is at least two elements bigger than the right subtree, then we must fall under case 1. If the right subtree is bigger than the left subtree, then we must fall under case 2. But we have a choice if the left and right subtrees are the same size, or if the left subtree is one bigger. If the subtrees are the same size, then we could fall under either case 1 or case 2:

   1) M RL...RL
   2) R M L RL...RL
Because M is the overall minimum element, case 1 is lexicographically smaller, so we choose that case. Similarly, if the left subtree is one element bigger than the right subtree, then we could fall under either case 1 or case 2:
   1) L M RL...RL
   2) M L RL...RL
This time case 2 is lexicographically smaller, so we choose that case.

Finally, we need to figure out the internal orderings of the L's and R's. These can be obtained by recursive calls to history on the left subtree and the right subtree, respectively. Altogether, the final algorithm is

    history(tree) is
       if tree is empty then return empty sequence

       M = the element at the root
       LH = history(left subtree)  // #LH is size of LH
       RH = history(right subtree) // #RH is size of RH
       HIST = empty sequence
       if #LH == #RH or #LH > #RH+1 then
          // Case 1: L...L M RL...RL
          append first #LH-#RH elements of LH to HIST
          append M to HIST
          alternate appending elements of RH and remaining elements of LH to HIST
       else
          // Case 2: R...R M L RL...RL
          append first #RH-(#LH-1) elements of RH to HIST
          append M to HIST
          alternate appending elements of LH and remaining elements of RH to HIST
       return HIST
Notice that there is no need for special logic to handle the situations where
  • both subtrees are empty (falls under case 1),
  • the right subtree is empty but the left is not (falls under case 1 if the left subtree contains at least two elements, or under case 2 if the left subtree contains exactly one element), or
  • the left subtree is empty but the right is not (such trees are forbidden as input because they are impossible to construct using only insertions).

Your sense of algorithmic aesthetics may differ from mine, but this is quite possibly the most elegant new algorithm I've seen all year!

Skew heaps, by the way, are another excellent data structure to keep in your toolbox. Do not be frightened away by the difficulty of this problem. Skew heaps themselves are ridiculously easy to implement, and they perform comparably to ordinary binary heaps (the kind used in heapsort). In addition, skew heaps allow you to merge two priority queues in logarithmic time, which ordinary binary heaps do not.