Saturday, July 31, 2004

Match summary

This match had a few surprises. Seven coders managed to get all three problems correct, which is not surprising. But of the top nine rated coders entering the match only one, Yarin, managed to make it into that group. The second through seventh place finishers were all reds with their circles less than half-filled, or yellows!

Quite a few rating points moved between the top finishers in this match. Yarin, with the help of 50 challenge points, barely slipped past gepa by only 4 points for the win. Coming in a comparatively distant third (actually only 128 behind) was gawry, a yellow! One more performance like that and he won't be yellow anymore. gawry's +200 point rating gain also doesn't hurt Poland's country rating. gepa's 140 point gain helps out Germany's country rating, while Yarin continues to spearhead team Sweden, and his 85 point rating gain to 3141 propels him into fourth place in the current list of top coders in algorithm competition. Congratulations Yarin. (Wanna join team "Turks and Caicos Islands"?)

Division II was won impressively by _(China) with a 1508 point debut match (including 4 successful challenges for 200 points), giving an initial rating of 1800. therealmoose (USA) takes second (which I can tell you has nothing to do with paying the writer $1000 a few days ago for a laptop! Except maybe that giving me money brings good TopCoding luck; I suggest everyone try it! I understand the larger the sum, the more effective it is). FunkFreaker (Finland) takes third in another impressive debut performance, which yields an initial rating of 1659. Four Division II coders managed to get all three problems correct, with drexelmcs01 (USA) taking fourth.

This match had one of the highest similarities between Div I and Div II problems seen recently. Two problems were shared between the divisions: HexagonalSums and SpamDetector (as 600/250 in Div I and 1100/500 in Div II).
Another similarity was between the Div I 1000 point problem, LongPipes, and the shared HexagonalSums problem, which were both knapsack problem variations. But the two problems were very different, requiring completely different techniques. LongPipes shared more similarity with Yarin's "Knapsack", the now legendary 0-1 knapsack problem in SRM 140, which no one was able to solve in time. But they too were completely different problems requiring different solutions. Even Yarin wrote a fresh solution to LongPipes from scratch during this SRM without referring back to his old problem. I will have more to say about the differences between these two problems below in the problem writeup. And unfortunately, last minute simplifications to SpamDetector left it pretty much identical to the CheatCodes problem of SRM 154; that was a mistake I was unaware of. Average was a deliberate reuse of an old problem. So this SRM doesn't win any prizes for originality.

Of all the NP-complete problems, the knapsack problem is clearly my favorite. So simple to state, so hard to solve. And variants are popular as SRM problems, although usually they are small enough to use direct brute force, or constrained in such a way that the pseudo-polynomial time knapsack algorithm (brought to you by dynamic programming) works in reasonable time.

The Problems

Average
Used as: Division Two - Level One:
There are two passes involved in this problem. On the first pass you calculate the average. On the second pass you compare to see which scores are below the average. The only trick is to do the comparison properly so that rounding or floating point equality issues don't arise. The easiest way to do this is to multiply all the scores by the number of scores. Now the average comes out a perfect integer. This works because a<b implies a*n<b*n for n>0, and a sum of integers each multiplied by n is always exactly divisible by n.

int average, total = 0, c = 0, n = math.length;
for (int i = 0; i < n; i++)
    total += (math[i] + verbal[i]) * n;
average = total / n;
for (int i = 0; i < n; i++)
    if ((math[i] + verbal[i]) * n < average) c++;
return c;

SpamDetector
Used as: Division Two - Level Two:
Used as: Division One - Level One:
In SpamDetector my goal was to write a fairly straightforward string manipulation problem that could not be solved by just calling string library routines. I wanted some kind of loop actually twiddling characters. I wasn't successful in this, as the least string intensive, and most library intensive, solution to this problem is probably the one used by Enogipe, who completed the solution the fastest. After converting to lower case, take the "keywords" and transform them by sticking in a bunch of '+' characters into something like "k+e+y+w+o+r+d+s+". This can now be used as a pattern in the Java regular expression matching method of the String class. Although Enogipe reasonably used a loop and StringBuffer to build the pattern, a complete master of inappropriate(?) library overuse might have done:

for (c = i = 0; i < subjectLine.split(" +").length; i++)
    for (j = 0; j < keywords.length; j++)
        if (subjectLine.toLowerCase().split(" +")[i].matches(
                keywords[j].toLowerCase().replaceAll(".", "$0+"))) { c++; break; }
return c;

But that borders on Perl code. One must note that the two loops here must be nested in the given order; reversing them does not work, and the "break" out of the inner loop is also necessary.

There is also a subtle issue here. The regular expression matcher only works here because of the constraint that keywords cannot have more than two identical characters in a row. Otherwise a pattern of 24 "a+"'s followed by a "b" matching against a string of 50 "a"'s will time out as the matcher backtracks through a whole lot of different possible match prefixes. A really smart matcher based on simulating a non-deterministic finite state machine will work, but not a backtracking matcher. The Java matcher backtracks and times out. The work around here is to remove the intermediate '+' characters between identical letters. Using replaceAll("(.)\\1*", "$0+") instead works too.
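As a concrete illustration, here is a sketch of that workaround, collapsing each run of identical letters before appending the '+'. The class and method names here are mine for illustration, and the word-counting semantics are an assumption, not any competitor's actual submission:

```java
public class SpamPattern {
    // Build a stretch-tolerant pattern without backtracking hazards:
    // "(.)\\1*" matches a whole run of identical letters, and "$0+"
    // keeps the run together, so "free" -> "f+r+ee+" (not "f+r+e+e+").
    static String safePattern(String keyword) {
        return keyword.toLowerCase().replaceAll("(.)\\1*", "$0+");
    }

    // Count words in the subject line matching some stretched keyword
    // (an assumed reading of the problem, for illustration only).
    static int countMatches(String subjectLine, String[] keywords) {
        int c = 0;
        for (String word : subjectLine.toLowerCase().split(" +")) {
            for (String kw : keywords)
                if (word.matches(safePattern(kw))) { c++; break; }
        }
        return c;
    }
}
```

For example, countMatches("get ffree viagraa now", {"free", "viagra"}) counts both stretched words, since "ffree" matches "f+r+ee+" and "viagraa" matches "v+i+a+g+r+a+".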
If you understand that without referring to the Java API, then you.matches("geek") in a good way. This "no triple letters" constraint was included because this was supposed to be an easy problem in which you should not have to worry about (or even consider the possibility of) library routines timing out.

There are many other ways to do the matching for this problem. One straightforward (pun intended) method is to start two pointers, one at the beginning of the pattern and one at the beginning of the subject word. If the two characters match, advance both pointers; else if the subject word character matches the previous pattern character, advance the subject word pointer; else there is no match. If you make it to the end of both strings you have a match. The nice thing about this solution, which uses only elementary operations, is that each match clearly runs in linear time in every case. The solution I put in the Div I practice room as "writer" does it yet another completely different way, which takes advantage of the no triple letters constraint.

HexagonalSums
Used as: Division Two - Level Three:
Used as: Division One - Level Two:
When I saw mention of Legendre's proof that all integers above 1719 are expressible as the sum of four hexagonal numbers, I could not help but think "this obscure, useless bit of number theory trivia is just crying out to be built into an SRM problem." Plus it is really a knapsack problem with the constraint that no solution has more than 6 elements and a guarantee that a solution always exists. It is different from most knapsack problems we see in SRMs in that the number of items to choose from is quite high (there are 707 hexagonal numbers less than a million, and you have to consider using each hexagonal number up to 5 times), but the constraints make the problem very doable.

The first step is to generate the hexagonal numbers. The very first Google hit on "hexagonal numbers" leads you to a page where Legendre's proof is mentioned and gives the formula for the nth hexagonal number as n*(2n-1), using 1-based indexing. Otherwise it is pretty easy to see that hex(0)=1 and hex(n)=hex(n-1)+4n+1 when using 0-based indexing, and that is sufficient to quickly generate a table of as many hexagonal numbers as you need. Or, since it is a two dimensional figure, it is reasonable to expect that the number of points is a quadratic function. Knowing that, one can generate Legendre polynomials from 3 data points to directly give the formula. Which is a bit of overkill, but it does give me an excuse for mentioning Legendre a couple more times.

Many ways exist to attack this problem. If you don't know about all the range limits mentioned in the problem (those hints were there for a reason) but had an idea that the answers were always small, then you might try something like a progressive deepening search. Is it equal to a hexagonal number? Is it equal to the sum of two hexagonal numbers? The sum of three hexagonal numbers? Etc. It turns out this works just fine.
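For instance, the recurrence above generates the table directly. This is just a sketch; the helper name hexagonalsBelow is my own:

```java
import java.util.ArrayList;
import java.util.List;

public class HexNumbers {
    // Generate every hexagonal number below `limit` with the recurrence
    // hex(0) = 1, hex(n) = hex(n-1) + 4n + 1 (0-based indexing),
    // which agrees with the closed form n*(2n-1) under 1-based indexing:
    // 1, 6, 15, 28, 45, ...
    static long[] hexagonalsBelow(long limit) {
        List<Long> hex = new ArrayList<Long>();
        long h = 1;
        for (int n = 1; h < limit; n++) {
            hex.add(h);
            h += 4L * n + 1;
        }
        long[] out = new long[hex.size()];
        for (int i = 0; i < out.length; i++) out[i] = hex.get(i);
        return out;
    }
}
```

Below one million this yields exactly the 707 values mentioned above.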
But if you started out by checking all combinations of 6 or fewer hexagonal numbers, you are going to have problems, because there are a lot of hexagonal numbers less than a million. The pseudo-polynomial time knapsack algorithm is even just a bit too slow for this problem, because there are so many hexagonal numbers, and each hexagonal number might be used multiple times in a sum.

When you are looking for combinations of at most N items, it is more efficient to use N-1 nested loops and some kind of hash table or tree lookup for the Nth item, instead of using N nested loops (or recursive calls). This one level of optimization is quite sufficient for this problem. And if you know that a particular sum exists and has at most N terms, then once you finish searching for N-1 terms you are done. There is no reason to continue on to verify that there is indeed an N term solution.

Since I gave so many hints, this is the sort of solution I expected people to write, if you included all the optimizations. This code is actually O(sqrt(n) log n), wow! Because for large sums you never get past the "if (sum > 146858) return 3;". You could be much sloppier and still be fast enough. For an even faster, but somewhat silly, solution, see the "writer" solution in the practice room for Div I.
hex[0] = n = 1;
while (hex[n-1] <= sum) {
    if (hex[n-1] == sum) return 1;
    hex[n] = hex[n-1] + 4*n + 1;
    n++;
}
if (search(sum, 2, 0)) return 2;
if (sum > 146858) return 3;
if (search(sum, 3, 0)) return 3;
if (sum > 130) return 4;
if (search(sum, 4, 0)) return 4;
if (sum > 26) return 5;
if (search(sum, 5, 0)) return 5;
return 6;

boolean search(int goal, int terms, int smallest) {
    if (goal < 0) return false;
    if (terms > 2) {
        for (i = smallest; hex[i] <= goal / terms; i++)
            if (search(goal - hex[i], terms - 1, i)) return true;
    } else {
        for (i = 0; hex[i] <= goal / terms; i++)
            if (Arrays.binarySearch(hex, goal - hex[i]) >= 0) return true;
    }
    return false;
}

LongPipes
Used as: Division One - Level Three:
LongPipes is a plain vanilla knapsack problem. The range of the values is too large to use the pseudo-polynomial algorithm. The number of items is too big to enumerate all the possible combinations. We know it is an NP-complete problem, so there is not going to be a polynomial time algorithm that we can use. We must somehow reduce the exponent, 38, down to a more reasonable value, such as 19. This can be done with a standard algorithmic technique, divide-and-conquer.

Divide-and-conquer is the basic idea behind many popular algorithms and meta-algorithms such as dynamic programming. Can you somehow divide the problem into pieces, do some separate processing on the pieces, and then combine the results in a way that is less work than attacking the whole problem directly? For many problems you can do this recursively, and the speedup is dramatic. For the knapsack problem it is pretty much only possible to apply this technique once, for one big speedup.

The trick, oh excuse me, I mean algorithmic technique, is to divide the items into two groups. Then calculate all the combinations of items from the first group. We need to know the sum of the lengths of the pipe segments in the combination, let's call this the "SUM," and the number of pipe segments in the combination, let's call this "NUM." Store the SUM and the NUM of each combination in a convenient data structure, such as a hash table or balanced tree, using the SUM as the key and NUM as the value. When you store each combination's info in the data structure, first check if there is already a combination with the same SUM in the data structure. If so, keep only the one with the smaller NUM. This data structure will have at most half a million entries. Be sure to include the empty combination with NUM=0 and SUM=0. Now we move to the second group of items. Calculate the SUM and NUM of all of the combinations of items from the second group (again being sure to include the empty combination SUM=0, NUM=0).
For each combination, calculate how much more you need (desiredLength - SUM). If this number is positive, do a lookup in the data structure using this positive number as the key. If there is an element in the data structure with that SUM, then we have found a combination of the correct total length which may be partly in the first group and partly in the second group. Save NUM + the NUM from the data structure if it is the minimum found so far. Once you have gone through all the combinations of items in the second group, return the best NUM - 1 (remember we are asked for the number of welds, not the number of segments), or -1 if nothing was found.

In the following code, load() processes the first group into the HashMap, map. Then search() generates the combinations of the second group and queries the HashMap to find exact fits. An optimized combination generator is shown in both load() and search(). The size of the hashtable is limited to 2^18 as a performance tuning measure.

int best = 1000;
HashMap map = new HashMap();

int minimumWelds(String[] segments, String desiredLength) {
    int n = segments.length;
    int n1 = n / 2;
    if (n1 > 18) n1 = 18;
    int n2 = n - n1;
    long len = Long.parseLong(desiredLength.replaceAll("\\.", ""));
    load(segments, n1);
    search(segments, len, n1, n2);
    if (best < 1000) return best - 1;
    return -1;
}

void load(String[] segs, int sz) {
    long[] seg = new long[sz];
    for (int i = 0; i < sz; i++)
        seg[i] = Long.parseLong(segs[i].replaceAll("\\.", ""));
    int n1 = sz / 2, n2 = sz - n1;
    int e1 = 1 << n1, e2 = 1 << n2;
    long[] a1 = new long[e1], a2 = new long[e2];
    int[] c1 = new int[e1], c2 = new int[e2];
    for (int i = 0; i < e1; i++)
        for (int b = 1, j = 0; j < n1; j++, b += b)
            if ((b & i) != 0) { a1[i] += seg[j]; c1[i]++; }
    for (int i = 0; i < e2; i++)
        for (int b = 1, j = 0; j < n2; j++, b += b)
            if ((b & i) != 0) { a2[i] += seg[j + n1]; c2[i]++; }
    for (int i1 = 0; i1 < e1; i1++)
        for (int i2 = 0; i2 < e2; i2++) {
            int num = c1[i1] + c2[i2];
            Long SUM = new Long(a1[i1] + a2[i2]);
            Integer val = (Integer) map.get(SUM);
            if (val == null || val.intValue() > num) map.put(SUM, new Integer(num));
        }
}

void search(String[] segs, long len, int off, int sz) {
    long[] seg = new long[sz];
    for (int i = 0; i < sz; i++)
        seg[i] = Long.parseLong(segs[i + off].replaceAll("\\.", ""));
    int n1 = sz / 2, n2 = sz - n1;
    int e1 = 1 << n1, e2 = 1 << n2;
    long[] a1 = new long[e1], a2 = new long[e2];
    int[] c1 = new int[e1], c2 = new int[e2];
    for (int i = 0; i < e1; i++)
        for (int b = 1, j = 0; j < n1; j++, b += b)
            if ((b & i) != 0) { a1[i] += seg[j]; c1[i]++; }
    for (int i = 0; i < e2; i++)
        for (int b = 1, j = 0; j < n2; j++, b += b)
            if ((b & i) != 0) { a2[i] += seg[j + n1]; c2[i]++; }
    for (int i1 = 0; i1 < e1; i1++)
        for (int i2 = 0; i2 < e2; i2++) {
            int num = c1[i1] + c2[i2];
            Long SUM = new Long(len - a1[i1] - a2[i2]);
            Integer val = (Integer) map.get(SUM);
            if (val != null && val.intValue() + num < best) best = val.intValue() + num;
        }
}

Possible Optimizations:
Now about the similarity of this problem to Yarin's Knapsack problem from SRM 140. Some coders expressed the opinion that the two problems were in fact the same, or that exactly the same algorithm would solve them both. This is not the case, and a better understanding of the different types of knapsack problems should clear this up.
Now, all of these problems benefit from the divide-and-conquer technique. In all the algorithms except the one for the 0-1 knapsack problem, the items are divided into two groups. All of the combinations of the items in one group are calculated and stored. Then, as each combination from the second group is generated, a single query is made into the data structure holding the results from the first group. The inner loop processing for these problems is trivial.

In the algorithm for the 0-1 knapsack problem you first divide the items into two groups, then all the combinations of each group are stored in two data structures, possibly just arrays. The two arrays are then sorted, and a merging process is used to find the maximum combination subject to the size constraint. The process is much more complex than making a single query into an indexed data structure. I don't know what the time complexity of this step is, but I strongly suspect that it is worse than n log n, where n is the size of the arrays.

So even though both problems use "divide in half" and "generate all the combinations of each half" stages, the rest of the algorithms are quite different. The LongPipes algorithm is much easier and faster and thus allows a maximum problem size of 38. Knapsack allowed a maximum problem size of 34. So if you had a working Knapsack program you probably could not use it to solve LongPipes problems of size 38 without timing out. So a working solution for LongPipes must be different from a solution for Knapsack.
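To make the contrast concrete, here is a sketch of the meet-in-the-middle combining step for the 0-1 knapsack (maximize value under a weight capacity). All names are my own, and this is not anyone's actual SRM 140 solution; instead of a literal two-pointer merge it sorts one half by weight, takes a running maximum of values, and binary searches for the best complement, which is the same flavor of sort-plus-combine work described above:

```java
import java.util.Arrays;

public class HalfKnapsack {
    // Meet-in-the-middle 0-1 knapsack: maximize total value subject to
    // total weight <= capacity. Split the items in half, enumerate all
    // 2^(n/2) combinations of each half, then combine the halves.
    static long best(long[] w, long[] v, long capacity) {
        int n = w.length, n1 = n / 2, n2 = n - n1;
        long[][] a = enumerate(w, v, 0, n1);   // {weight, value} per subset
        long[][] b = enumerate(w, v, n1, n2);
        // Sort the second half by weight, then turn values into a running
        // max, so b[i][1] is the best value with weight <= b[i][0].
        Arrays.sort(b, (x, y) -> Long.compare(x[0], y[0]));
        for (int i = 1; i < b.length; i++)
            b[i][1] = Math.max(b[i][1], b[i - 1][1]);
        long best = 0;
        for (long[] combo : a) {
            long room = capacity - combo[0];
            if (room < 0) continue;
            int idx = lastWithWeightAtMost(b, room);
            if (idx >= 0) best = Math.max(best, combo[1] + b[idx][1]);
        }
        return best;
    }

    // All subsets of items [off, off+len): subset sums of weight and value.
    static long[][] enumerate(long[] w, long[] v, int off, int len) {
        long[][] out = new long[1 << len][2];
        for (int mask = 0; mask < (1 << len); mask++)
            for (int j = 0; j < len; j++)
                if ((mask & (1 << j)) != 0) {
                    out[mask][0] += w[off + j];
                    out[mask][1] += v[off + j];
                }
        return out;
    }

    // Binary search: index of the last entry whose weight is <= room.
    static int lastWithWeightAtMost(long[][] b, long room) {
        int lo = 0, hi = b.length - 1, ans = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (b[mid][0] <= room) { ans = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return ans;
    }
}
```

Note how, compared to the LongPipes code, the combining stage needs a sort, a prefix maximum, and a search per combination rather than one exact-key hash lookup, which is exactly the extra work discussed above.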
