Google Treasure Hunt finished

Today was a slow day, so I just decided to finish the last question of the Google Treasure Hunt 2008.

Here's a quick summary of all the questions and a basic outline of how I solved each of them:

1. The Robot

A robot is in the left upper corner of a rectangular grid. It has to reach the goal on the right lower corner of that grid. With every move, the robot can go either down or right to reach the next square of the grid. No other directions are allowed. The question is: how many different paths can the robot take to reach the goal?

How I solved it: The number of further possible paths for each grid follows a pretty simple recursion: for each grid that is not adjoint to the goal, the number of possible paths is 2 (since it can go either down or right) plus the number of possible paths when standing on the right neighbour (which would be the next square if you chose to go right), plus the number of possible paths when standing on the lower neighbour. The base cases are for the squares directly above or left from the goal, there is just one possible path there.

Writing down that recursive function is pretty trivial. But just evaluating it is not enough: the width and height of Google's rectangular grid lies somewhere between 30 and 60, which drives the naive approach infeasible.

One solution is to apply a straightforward approach of dynamic programming: start by computing the number of paths for every square in the rightmost column (which is 1, because you can't go right when being in the last column), then proceed to compute the left neighbouring column, starting with the lowermost square. During each computation, you can immediately use the memorized path numbers of the square's lower and right neighbours without any recursion. This is enough to get the correct answer in a few seconds.

A better approach: I didn't notice it, but the problem can easily be reduced to the binomial coefficient or choose function: since the robot can't ever go back and it always has to reach the opposite corner, every path has the same amount of down moves (D) and right moves (R) and it's sufficient to calculate the number of possible combinations of width-1 R-moves and height-1 D-moves.

Some people apparently chose to do the whole thing in Excel or OpenOffice: every grid square is represented by a sheet cell and each cell contains a formula that calculates the number of paths based on the neighbouring cells, just like I explained above. That's actually a pretty funny way to do some limited functional programming with automatic memoization!


2. The Zip File

You have to download a ZIP archive, which contains several text files in several subdirectories. For each given path patterns (simple ones, the paths must contain a substring and the filenames have to end with a specific extension), find the matching files and take the sum of the numbers in a specific line number. Multiply all sums together, enter the result. Example:

Sum of line 1 for all files with path or name containing BCD and ending in .rtf
Sum of line 5 for all files with path or name containing EFG and ending in .js
Hint: If the requested line does not exist, do not increment the sum.
Multiply all the above sums together and enter the product below.


How I solved it: I don't know the exact command line anymore, because I solved it within my shell, and the history has gone since. But it was mostly find and sed. There may have been some awk in there, too. There's really a million ways to do, and provided you use your shell it's pretty straightforward.

3. The Network


This one was pretty easy. You get a number of hosts (named A,B,C...) and a routing table which contains four entries for every host (including a default gateway). You are given a starting host name and a destination IP and your goal is to give the sequence of hosts that a packet would travel.

I'm not quite sure what they are aiming at. You need to know next to nothing to solve it, and you don't even need to use your computer in any way. In fact, people who never cared about computer science or mathematics at all could easily solve it after a 5 minutes introduction of how a routing table works.

How I solved it: I did it manually, by following the routes and writing down which hops the packet went through. The only part of the computer that I used was the screen. By looking at it.

4. The Primes

This got more interesting again. You have to find the smallest number which can be expressed as the sum of several given amounts of consecutive prime numbers and which is also a prime number itself. Here's an example:

Find the smallest number that can be expressed as
the sum of 17 consecutive prime numbers,
the sum of 43 consecutive prime numbers,
the sum of 57 consecutive prime numbers,
the sum of 569 consecutive prime numbers,
and is itself a prime number.
How I solved it: Because the constraints are about consecutive prime numbers, all you need is contiguous windows over a (theoretically infinite) list of prime numbers. Except for the last constraint that states that the matching number should be a prime number, which is equivalent to the number being member of the list.

With a lazy language, where infinite lists are no problem, you could write a more or less complex function which generates that list of prime numbers, but don't: there are plenty of precomputed lists of prime numbers on the internet. Just read one in and use it, it's much less hassle.

The first thing to do is to find a prime number candidate. I did this by sliding the largest window (whose size is the amount of summed numbers in the last constraint) until a prime number is found. The check for a prime number itself is a simple search over the prime number list.

Once that candidate is found, it's time to check the other constraints, in descending order of their sum size. To do that, you slide a new, smaller window over your prime list, its size again being the number of summands in the current constraint, until its sum matches your candidate. However, you don't have to start the windows at the beginning of the list: quick reasoning tells you that any sum of numbers within or ahead of a bigger window whose number of summands is less than the previous window will be smaller than the sum of that window and thus can't be your candidate. It is sufficient to start at i-n+1, where i is the first index position after the bigger window and n is the size of the current window. Since you are checking your constraints in descending order of the window size, you can always use the upper bound of the previous constraint's window, subtract the current size and add 1.

All constraints have to match, so once the sum of any window gets larger than your candidate, you have to start over and look for a new candidate.

The sliding itself can be optimized a bit: instead of just increasing the indexes and recalculate the full sum, it's enough to simply subtract the first number in the window and add the next number outside of the windows upper bound. The algorithm will be fast enough in any way.

As you can see, I didn't make any assumptions about prime numbers at all. In fact, the whole algorithm works with any arbitrary list of numbers, as long as it's ordered.


This was fun, let's do it again next year. Next time, it might be a good idea to start solving as soon as the questions come out, maybe I'll have the chance to get one of the treasures :)

No comments: