PlaybyPlay  Photos  Problem Analysis
Problem Set Analysis & Opinion
by vorthys,
TopCoder Member
Thursday, December 4, 2003
RepeatedSubstrings
Used as: Division 1  Level 1:
Consider the compressed text "AB&04", which decompresses
to "ABABABA". How do we know that the last character is an A?
Because it is a copy of the character in position 4. How do we know
that character is an A? Because it is a copy of the character in
position 2, which in turn is a copy of the character in position 0.
We know the character in position 0 is an A because it appears directly in
the compressed
text. Decompression, then, is all about propagating the known characters into
all the positions that depend on them, directly or indirectly. We also
need to detect cyclic dependencies, in which case we can't tell what the
character is, so we write question marks to the relevant positions.
The simplest solution is to propagate known characters to all positions that
depend on them directly, and iterate until there are no more changes. In
pseudocode,
text = string of 256 question marks
loop
current = ""
for each token in compressed do
if token is letter or space then append to current
if token is &ij then
s = substring of text from position i to position j
append s to current
if current == text then return text
text = current
This simple solution is efficient enough for the small strings considered
here, but would be too slow for large texts. In the worst case, it
takes time quadratic in the size of the decompressed text.
A more efficient solution follows each chain of dependencies all the
way to its conclusion.
text = ""
dependsOn = array of 256 integers
for each token in compressed do
if token is letter or space then append to text
if token is &ij then
for each position p from i to j do
append '' to text
dependsOn[current position in text] = p
for each position p in text do
if text[p] == '' then
q = p
while text[q] == '' do // find known character
text[q] = '?'
q = dependsOn[q]
known = text[q]
if known == '?' continue // found cycle, leave as question marks
q = p
while text[q] == '?' do // copy known into positions in chain
text[q] = known
q = dependsOn[q]
return text
This revised algorithm runs in time proportional to the size of the
decompressed text.
Now, for your next challenge, write the corresponding compression
algorithm that always produces the shortest possible output! (Or, to
be more precise, the shortest output that decompresses without any
question marks.) The nice thing about this framework is that the
compression engine is adjustable on the fly. You can choose to trade
off compression speed vs compression quality at your whim without affecting the
decompression engine at all.
MazeImprovement
Used as: Division 1  Level 2:
First, notice that disconnecting a short deadend and reconnecting it
to an adjacent deadend can eliminate up to three short deadends at
once: the short deadend that was disconnected, the deadend that it was
reconnected to (if that deadend happened to also be short), and a
short deadend that happened to be connected to the same intersection
as the original short deadend (if that intersection became a
corridor when the original short deadend was disconnected). However,
this transformation can never create a new short deadend. Therefore,
we can process the candidates rowbyrow in a single pass from north to
south. Within each row, we should proceed from west to east.
The easiest part to get wrong is the tiebreakers for which deadend to
reconnect a short deadend to, in case there is more than one to choose
from.
The preference
is first for other short deadends, then for the northernmost deadend, and
finally for the westernmost deadend.
Putting these preferences together, the order of priority is
 a short deadend to the east,
 a short deadend to the south,
 a deadend to the north,
 a deadend to the west,
 a deadend to the east, and
 a deadend to the south.
There cannot be short deadends to the north or the west because those would
already have been eliminated. A short deadend to the east takes precedence
over a short deadend to the south because it is farther north.
A final point to be careful about is your representation of the connections.
A common approach is to use a threedimensional array of booleans, where
connected[X][Y][D] is true if the node at coordinates
<X,Y> is connected to the node in direction D. However,
connections are twoway. You could handle twoway connections by
storing a separate boolean for each direction, so that, for example,
connected[X][Y][WEST] is duplicated in
connected[X1][Y][EAST]. But then you have to be careful to
update both booleans whenever you disconnect or reconnect a node.
It is very easy to forget one of them.
Alternatively, you could store only a single boolean for each
connection, perhaps only in the north and east directions. Then, for
example, when you wanted to lookup connected[X][Y][WEST], you
would have to lookup connected[X1][Y][EAST] instead. I
usually find this second approach easier, but it does mean you have to
be very careful about how you handle the borders.
SkewHeaps
Used as: Division 1  Level 3:
The key to solving this problem is realizing that the position of the
root in the output list depends only on the sizes of the two
subtrees, not on their shapes. Recall that the root is the
minimum element in the tree. All elements that were inserted
after the minimum element end up on alternating sides of the
tree, whereas elements inserted before the minimum element end
up on the same side of the tree (the larger side). In other words, the
insertion sequence in general has the form
[all X's] M [alternating X's and Y's]
where X's are the elements from the larger subtree, Y's are the
elements from the smaller subtree, and M is the minimum element.
Furthermore, by the rules of insertion,
the alternating X's and Y's must end with an element from the
left subtree. Altogether, there are four possibilities:
1) L...L M RL...RL
2) R...R M L RL...RL
3) R...R M RL...RL
4) L...L M L RL...RL
where L's are the elements from the left subtree and R's are the
elements from the right subtree. In fact, however, cases 3 and 4
are impossible, because the first element inserted after M
will go on the opposite side of the tree from all the elements
inserted before M.
We can determine which case applies by looking at the sizes of the
two subtrees. If the left subtree is at least two elements bigger
than the right
subtree, then we must fall under case 1. If the right subtree is bigger
than the left subtree, then we must fall under case 2. But we have
a choice if the left and right subtrees are the same size, or
if the left subtree is one bigger. If the subtrees are the same size,
then we could fall under either case 1 or case 2:
1) M RL...RL
2) R M L RL...RL
Because M is the overall minimum element, case 1 is lexicographically
smaller, so we choose that case. Similarly, if the left subtree is one
element bigger than the right subtree, then we could fall under either case 1 or
case 2:
1) L M RL...RL
2) M L RL...RL
This time case 2 is lexicographically smaller, so we choose that case.
Finally, we need to figure out the internal orderings of the L's and R's.
These can be obtained by recursive calls to history on the
left subtree and the right subtree, respectively. Altogether, the final
algorithm is
history(tree) is
if tree is empty then return empty sequence
M = the element at the root
LH = history(left subtree) // #LH is size of LH
RH = history(right subtree) // #RH is size of RH
HIST = empty sequence
if #LH == #RH or #LH > #RH+1 then
// Case 1: L...L M RL...RL
append first #LH#RH elements of LH to HIST
append M to HIST
alternate appending elements of RH and remaining elements of LH to HIST
else
// Case 2: R...R M L RL...RL
append first #RH(#LH1) elements of RH to HIST
append M to HIST
alternate appending elements of LH and remaining elements of RH to HIST
return HIST
Notice that there is no need for special logic to handle the situations
where
 both subtrees are empty (falls under case 1),
 the right subtree is empty but the left is not (falls under case 1 if
the left subtree contains at least two elements, or under case 2 if the
left subtree contains exactly one element), or
 the left subtree is empty but the right is not (such trees are
forbidden as input because they are impossible to construct using
only insertions).
Your sense of algorithmic aesthetics may differ from mine, but this is
quite possibly the most elegant new algorithm I've seen all year!
Skew heaps, by the way, are another excellent data structure to keep in
your toolbox. Do not be frightened away by the difficulty of this
problem. Skew heaps themselves are ridiculously easy to implement, and
they perform comparably to ordinary binary heaps (the kind used in heapsort).
In addition, skew heaps allow you to merge two priority
queues in logarithmic time, which ordinary binary heaps do not.
