Code highlighting is wonderful, but we're getting too much of it

There's been some debate about syntax highlighting: how, and even whether, it's useful at all. This article talks about how we're doing it wrong, and this much older article doubts its usefulness in general. I find the arguments against syntax highlighting thought-provoking, and I've since been forming an opinion of my own.

I admit that before I read any of these articles, I did not really question syntax highlighting. It has just been there for a long time, it certainly looks neat, and since it became some kind of bare-minimum standard feature, I even started to associate its absence with inferior or misconfigured editors (in emacs, the absence of syntax highlighting in a buffer is a good sign that you may be missing a much-desired major mode for editing that kind of file).

By now, though, I do agree that a lot of the common over-abundant syntax highlighting is questionable at best. As Linus Akesson's article demonstrates, reading a sentence where almost every word is colored according to some syntactical class neither makes reading easier, nor does it really add any valuable information. With respect to actual code, I then find myself thinking that "yes, I do know that 'if' is a keyword of the language I'm using right now, thank you very much".

But I do also think that there's a lot of (not necessarily syntax) highlighting that really does add some tremendous value.

Let's take a look at some examples:

Making stuff that I am looking for pop out

When I am looking for pairs of matching things (parens, braces or even start/end tags in XML) it is certainly useful when the editor lights the corresponding element up without any effort on my part. And when I search for expressions, it's most helpful if the editor highlights all found occurrences while the search is active, so that I may visually scan through the file.

One of the most valuable features of modern editors, however, is the ability to highlight all occurrences of the same syntactical element, like a specific variable or function, just by placing the cursor on it. IntelliJ is one editor that does this, and it is often a huge help when reading or modifying code, even more so if it's someone else's code with which I am not yet fully familiar (and I tend to dig into such code a lot). While there is also the exhaustive "Find Usages" feature which gives you a clickable list of global usages, when working locally a clear visual standout is more useful than any list in some other panel.

This screenshot, however, shows well how IntelliJ's default of subtly underlining the selected variable is drowned out in a sea of over-abundant formatting attributes. Granted, IntelliJ's default is also a little bit too subtle for my tastes, but looking for those underlined "i"s in that lit up mess hurts me more than it helps. And since this is not my code (it's part of jansi, by the way), I would appreciate the help.

Let's turn the symbol-usage highlighting up just a little bit and all the other stuff way down:

Much better. I can easily see where "i" gets used. The colorful stuff is now the stuff that matters.

Notice that in this screenshot, the variable "buff" is also still highlighted with a yellowy background. That's because it belongs to another class of stuff that I care about, which is:

Anomalies and Warnings

What IntelliJ is trying to tell me is that the code may use a StringBuilder instead of a StringBuffer at that point for efficiency. This, like any other warning, error or anything with the string "TODO" in it, is information that I am always interested in. In general, anything that I consider an anomaly should stand out from the rest, because it tells me: "hey, look at this, there's something wrong or unusual here, you should probably do something about it".

Syntax Subtleties

Syntax highlighting itself becomes useful again when the syntax is subtle enough that I welcome further help about it. Take this example of string interpolation in Scala:

Here, highlighting helps remind me that .name in $ will be interpreted as part of the string (something I most likely did not intend), that I escaped the $ in 40$ correctly ("aha, so it's not \$"), and where the expression in ${...} really ends.

This is different from, say, plain old Java code where I really do know what word is a keyword.

Less is More

In theory, a lot of common syntax highlighting aims to convey useful information that is not always immediately available from the directly surrounding code. IntelliJ for example colors local variables, fields, constants etc. differently. In practice, though, I don't need that information constantly shoved into my face. I often already know what is what, because I am actively working on the code in question. But even if I don't, I'm entirely content to do some small additional action (like command-clicking on the symbol) to find out.

Only for some of the more interesting cases, like constants or static variables, do I keep some unobtrusive formatting (like italics) enabled. With constant visual overload, on the other hand, I tend to ignore any formatting attributes altogether.

The conclusion is simple: I would not want to live without code highlighting in certain situations. But in order for it to become actually useful, I really need less highlighting than what a lot of the defaults offer.

Fixing xterm-256color with NetBSD

Update: the patch made it into NetBSD 6.1, so if you can upgrade, this fixes the problem.

I am mostly writing this article because I could not really find anything about this problem, despite the fact that it should occur somewhat commonly nowadays.

If you have ever logged into a NetBSD machine with xterm-256color as the value of your TERM variable (the default for Mac OS X's Terminal, for example), you may have noticed that any program that tries to use colors in the terminal produces garbage among its normal output, essentially rendering every colored terminal application useless.

Things to try this with are vim, emacs, screen and tmux. Depending on your configuration, starting them up is often enough to see that something is wrong, but for extra fun try M-x list-colors-display in emacs.

emacs displaying a world devoid of any colors, corrupted and bland :(


A workaround is fairly easy: set your TERM variable to xterm instead of xterm-256color. You can do this ad hoc by typing export TERM=xterm, instruct your terminal emulator to set TERM accordingly on startup, or employ some shell startup script logic. However, this is not only kludgy, it of course also disables the use of 256 color capabilities, which, at least for emacs, would be quite a nice thing to have.


The actual reason is not any incompatibility, nor any problem in NetBSD's terminfo entries, but in fact a bug in NetBSD's libterminfo. The full bug report is here, but the short version is that conditionals in terminfo are not correctly handled on NetBSD. While those are pretty rare, which is probably why this bug has gone unnoticed for so long, they are used in the color escape codes for xterm-like emulators with support for 256 colors, making the problem immediately visible in those.

I included a patch in the aforementioned bug report, so hopefully this will be fixed in one of the next releases. If you don't want to wait until then, you may apply the patch yourself.

And if you are really lazy, you may also download the patched sources of just libterminfo and compile and install them like this:

$ cp /usr/bin/tic .
$ make
$ sudo make install

The first step is needed because the build system somehow wants tic (the terminfo compiler) in the same directory when building like this.

(Keep in mind that I took the libterminfo-sources from NetBSD 6.0-rc2, which may or may not be a good version for you. I also don't know if or how much of the NetBSD sources in /usr/src you actually need to have for the compilation to work.)

Once it is installed, color codes with xterm-256color should work right away!

a happier emacs in a shiny 256 color world.

Assembly Evolution Part 1: Accessing Memory and the strange case of the Intel 4004

While it has become far less relevant for non-system developers to write assembly than it was a few decades ago, CPUs have nevertheless made it much more comfortable to do so by now. Today we are used to a lot of things: fancy indirect addressing modes with scale, an abundance of general-purpose registers instead of an accumulator and maybe one or two crippled index registers, condition codes on nearly every instruction (on ARM)...

But the basics themselves have evolved, too. Let's take a look at what past programmers had to put up with in entirely simple, everyday tasks. We'll start with the most trivial: writing to memory.

Our goal is to write a single immediate value of 3 into the memory location 5. In light of paging, segmenting and bank switching, we'll use whatever is convenient as a definition for "memory location". Also, we'll let the CPU decide what the word size should be. Since you only need 2 bits to represent a 3, it should fit into every CPU's word size (except for 1-bit CPUs, which actually existed, but that's a story for another post). If we have the choice, we'll just take the smallest.

We'll work backwards, from the present to the past, to explore the wonders of direct addressing in Intel CPUs. (One precautionary warning though: I only really tested the 4004 code in an emulator, and my habits are highly tainted by current Intel CPUs. So if I made some mistake somewhere, kindly point it out and I'll fix it!)


On a modern x86 CPU, it is of course fairly easy to write the value 3 to memory cell 5. You just do it:

mov byte [5], 3

A single instruction, simple and obvious. I cheated a bit by not using a segment prefix, nor did I set up any segment registers/selectors beforehand. But assuming a nowadays common OS environment in protected mode, you probably don't want to fiddle with those selectors anyway.


The Intel 8085 is somewhat of a direct predecessor to the 8086, the first in the line of the excessively successful x86 processors. Unlike on the 8086, there is no instruction that moves an immediate value to a directly specified address. One way around this limitation is to use the accumulator to hold our value (thanks to gp2000 for pointing out that there actually are load and store instructions that take a direct 16-bit address):

MVI A, 3    ; load immediate 3 into the accumulator
STA 0005h   ; store the accumulator at address 0005h

The number of instructions with a direct addressing mode is very limited, though. Except for branches, you can't do anything other than load and store values.

For any kind of arithmetic, though, you aren't forced to load the value from memory into a register, perform your arithmetic instruction and store it back. Virtually all instructions with a register addressing mode (that is, instructions that operate directly on the contents of a register) also allow register-indirect memory addressing through a pseudo register called "M".

This pseudo register is in reality just backed by the registers H and L paired together, each 8 bit wide, and accessing it accesses the memory location they point at (you may take a guess which register receives the High byte, and which the Low byte of the address).

So we could also load our byte with:

LXI H, 0005h    ; unlucky syntax, as this actually means HL instead of just H
MVI M, 3


We continue going further back, skipping the 8080 because it wasn't very different in that regard, and arrive at its direct predecessor instead, the Intel 8008. The 8080 and 8085 were source compatible with the 8008 (which, mind you, is not the same as binary compatible... also it may or may not have required some light automated translation), but going in the downward direction, something vital is taken from us:
The only instructions allowed to contain 14-bit (that's the width of its address bus) immediate values at all are jumps and branches. Consequently, we are left with no way to completely specify our destination address in one instruction!

Instead, we have to access H and L, together forming pseudo register M's address, one at a time:

LHI 00h
LLI 05h

By the way, the 8008 had a 7-level-deep internal stack, which I will cover another time. Because of that, and because of the lack of any direct addressing except for branches, it seems that memory is only ever addressed through either the program counter (for fetching instructions and the immediate values after them) or through HL. I wonder if the CPU designers took any shortcuts when designing that in hardware, and whether, for example, HL is always on the bus when the CPU is outside of its instruction fetch cycles.


It's hardly possible to go back further than the Intel 4004, at least if you only consider single-chip CPUs (at the time of its conception in the early 70s, there were already famous multi-chip CPUs with comfortable orthogonal instruction sets, notably the PDPs). Indeed, it was the first widely available single-chip CPU. This little thing was a 4-bit CPU with some strange quirks, which we will explore further. Overall, it bears little to no resemblance to its successor in name, the Intel 8008 (except for the internal stack).

But let's just look at the code for writing a value of 3 into the memory location at 5 first:

FIM P0, 5 ; load address 05h into pair R0,R1
SRC P0   ; set address bus to contents of R0,R1
LDM 3    ; load 3 into accumulator
WRM      ; write accumulator content to memory

That looks a bit strange.

As a 4-bit CPU, the 4004 has 4-bit-wide registers and addresses 4-bit nibbles as words in memory. It has only one accumulator, on which the majority of operations are performed, but sixteen index registers (R0-R15).

Those index registers are handy for accessing memory: besides loading values directly from ROM, an instruction exists to load data indirectly, using the contents of a register pair as the ROM address. Another instruction performs an indirect jump instead. Other than that, you can just increment index registers, although there is the interesting "ISZ" instruction that not only increments, but also branches if the result is not 0.

Because the 4004 uses 8 bits to address the 4 bit nibbles, every two consecutive index registers form a pair, which is then used for memory references.

Note that I explicitly said ROM above. This is because in the 4004 architecture, ROM and RAM are vastly different beasts, at least from the assembly programmer's perspective. You cannot access RAM directly. It always involves index register pairs, manually sending their content to the address bus (with the strangely named instruction "SRC", which for some reason spells out "send register control") and then issuing another instruction that transfers from or to the accumulator.

Interestingly, regular RAM nibbles are not your only choice among the transfer instructions. You can also transfer from and to I/O ports. But the CPU does not have any direct I/O ports; instead, they live on both the RAM and the ROM chips! You can also read and write "RAM status characters", which to me look like plain regular RAM cells within another namespace. If someone knows, I'd love to hear what they were used for (and whether they behaved differently from normal RAM).

Take a look at the data sheet. Within its mere 9 pages, the instruction set is depicted on pages 4 and 5. Especially given that fairly reasonable orthogonal instruction sets appear to have been available in multi-chip CPUs, this first single-chip CPU is clearly a strange specialization towards the desk calculator it was meant for (the Busicom 141-PF). It has the aforementioned index-register-centered RAM access, separate ROM (although there is a transfer instruction that strangely refers to some optional "read/write program memory"), a three-level internal stack that is almost useless for general purpose programming and a lone special purpose instruction for "keyboard process" (KBP).

Original 4004 CPUs go for anything from a few to a few thousand dollars on eBay, depending on their packaging and revision. If you'd like, you can instead play around with a virtual one in this JavaScript-based, fully fledged assembler, disassembler and emulator, or read the rescued source code of the Busicom 141-PF calculator. There are lots more schematics, data sheets and other resources on Intel's anniversary project page.

Update 2011/11/7: some things in the 8085 and 8008 sections got mixed up, so I rewrote them. Thanks for pointing it out!

CPU emulation is subtle

Update 2018-08-07: Reverse engineering of the original Game Boy hardware is still very much ongoing, revealing some surprising complexity. At Disobey 2018, Joonas Javanainen gave his "Reverse Engineering fine details of Game Boy hardware" talk, with new findings about instruction execution and interrupt handling, which also happen to answer some of the questions found below. He found some of the details by exploiting the fact that serial transfers (e.g. over the link cable interface) use their own completely independent and externally applied clock, effectively using them as a powerful asynchronous probe, with some very interesting and unexpected results.

Also, while the text below mentions Michael Steil's Ultimate C64 talk, he has since given the Ultimate Game Boy Talk at 33c3 in the same spirit, cramming a great amount of detailed information into the hour and presenting some new findings about the Game Boy's display pipeline as well.

It's amazing how a three-decades-old and seemingly simple device, which remains one of the best targets for starting an emulator project because of its low surface complexity, still manages to surprise with new, interesting implementation details. A prime example of fractal complexity.

I am currently toying with the idea of writing a Game Boy emulator (in Haskell), to try out some programming techniques which seem to be less common in that area, at least as far as I have looked at other emulators. In this post, however, I will focus less on those techniques and more on the subtleties of emulating a CPU in general. After all, since this is my first foray into essentially implementing a whole CPU (albeit in software), it turns out that there are a lot more peculiarities to find out and take care of than one might think - at least from the perspective of a programmer who usually develops for the hardware.

The Game Boy is in fact a pretty popular system to emulate, owing mostly to the Game Boy's immense popularity at its time and the fact that it has a comparably simple system design. There's mostly just a core CPU, RAM, the game cartridge's ROM and some memory mapped hardware with only a handful of registers, which makes writing an emulator a pretty painless experience. This in general, and my close attachment sprouting from countless hours spent playing on a Game Boy while growing up in particular, makes it a perfect choice for my experiments with full hardware emulation.

Taking a first look at the Game Boy's core CPU, emulating it looks pretty straightforward at first. You have some basic processor state (mostly registers), a reasonable amount of instructions to modify said state and some outside interaction, like memory access and interrupts. However, if you want to do it exactly right, that is, if your emulated CPU should ideally be able to fully mirror a Game Boy's behavior for all pieces of existing code, you have to take a second, deeper look at how these things are actually implemented.

As developers have pushed existing hardware beyond its conceived limits, they have started to exploit a CPU's behavior on a much lower level than initially visible and documented. CPUs have implementation details and bugs that cause side effects which seem completely invisible to normal code written according to specification. In an excellent talk about the C64, Michael Steil presents some of these details of the C64's 6502-based CPU, which are often particular to the specific revision of the chip used in that machine.

Here are some tricky areas I came across while looking at the Game Boy CPU, as well as some concrete examples. (Note that I'm not covering the obvious issues like undocumented opcodes, as it is clear at first glance that a CPU which does not know them won't run code using them.)

Memory Access. At first, this seems straightforward. Instructions may read memory at specific locations, and they may modify it. The outcomes of these operations are usually very well documented. Their implementation details, however, may vary. One such detail is the order in which memory accesses occur. For example, if a single instruction needs to read from two different memory locations to compute its result, which is read first? Exactly when, that is, in which cycle during the instruction's execution, do these accesses occur? Worst of all, some CPUs perform spurious memory accesses that have no effect on the instruction's outcome at all, usually as the result of a bug in the chip's design. They may read a single memory location twice, or they may even read memory locations that aren't involved in the operation in any way at all.

Why is it important to care about such behavior in an emulator, though? Who really cares if a value is read when it isn't used anyway, and how is access order relevant if the outcome remains the same? The answer is that while it usually doesn't matter for ordinary memory, not everything within the address space necessarily is ordinary memory. Instead, a memory mapped hardware register may exist at that location, in which case the hardware's state may change not only by writing to that register: even a simple read access may trigger a change. For example, on the C64's I/O chips, called CIAs, reading the Interrupt Control and Status Register acknowledges interrupts and clears the flags.
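To illustrate, here is a minimal Python sketch of a bus where a mere read changes hardware state. The class names are invented for illustration; the register address is borrowed from the C64's CIA1 interrupt control register at $DC0D:

```python
# Hypothetical sketch: a memory bus where reading a mapped hardware
# register has a side effect, in the spirit of the C64 CIA's interrupt
# status register (reading it acknowledges and clears the flags).

class InterruptStatusRegister:
    def __init__(self):
        self.flags = 0

    def raise_irq(self, bit):
        self.flags |= 1 << bit

    def read(self):
        value = self.flags
        self.flags = 0          # side effect: reading clears the pending flags
        return value

class Bus:
    def __init__(self):
        self.ram = bytearray(0x10000)
        self.status = InterruptStatusRegister()

    def read(self, addr):
        if addr == 0xDC0D:      # the mapped register, not ordinary memory
            return self.status.read()
        return self.ram[addr]

bus = Bus()
bus.status.raise_irq(0)
first = bus.read(0xDC0D)   # observes the pending flag and clears it
second = bus.read(0xDC0D)  # the first read already changed hardware state
```

An emulator whose instructions perform a spurious extra read here would silently acknowledge an interrupt the real program never saw.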

This becomes a problem when existing code tries to work around peculiar behavior in a manner that breaks if the bug itself doesn't happen, and becomes even worse when clever programmers explicitly use it for performance reasons. In these cases, an emulated CPU that doesn't signal those memory accesses simply won't run that code correctly.

Interrupts. Interrupts aren't so simple to begin with, if only because they disrupt normal control flow. There are already the somewhat obvious issues, like what happens if they occur more or less simultaneously. The Game Boy CPU has a (memory mapped) flag register that indicates which interrupts have occurred. Let's assume that the other hardware generates two different (by ordinal number) interrupts during the CPU's execution of one particularly long-running instruction. Which interrupt handler is executed first: the one associated with the interrupt that was signaled slightly earlier, or does the interrupt's ordinal number play a role in deciding this? A more pressing question is which bits are set in the flag register upon entry into the first chosen interrupt handler: only the one for the interrupt corresponding to the called handler?
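For the priority question, at least, the commonly documented answer is that the lowest-numbered pending bit wins (VBlank, bit 0, outranks everything else). A small sketch of that dispatch rule, with the documented handler vectors; whether real hardware also weighs arrival order is exactly the kind of detail in question here:

```python
# Sketch of Game Boy-style interrupt priority: among all pending and
# enabled interrupt bits, the lowest-numbered bit is commonly documented
# to win. Register values below are made up for the example.

HANDLERS = {0: 0x40, 1: 0x48, 2: 0x50, 3: 0x58, 4: 0x60}  # documented vectors

def pick_interrupt(if_reg, ie_reg):
    """Return the bit number of the interrupt to service, or None."""
    pending = if_reg & ie_reg & 0x1F
    if pending == 0:
        return None
    return (pending & -pending).bit_length() - 1  # index of lowest set bit

# Two interrupts pending at once: bit 0 (VBlank) wins over bit 2 (Timer).
bit = pick_interrupt(0b00101, 0xFF)
```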

Things like this should be documented, but it can sometimes be hard to come across a complete description, especially if bugs or unusual corner cases are involved. To give a more contrived example, I'm not even really sure yet that the CPU does not fetch the would-be-next instruction's opcode (resulting in a memory access) before checking for interrupts and jumping into the handler, throwing away the fetched opcode. Some documentation says "The interrupt will be acknowledged during opcode fetch period of each instruction", which is somewhat ambiguous. While this issue especially seems irrelevant for real-life code, there exists a whole scene (the demo scene) of people who have shown that every single side effect can and possibly will be used to achieve impressive feats that were otherwise inconceivable. Think about it: cleverly taking side effects within instructions into account, effectively abandoning their atomicity, basically allows you to time actions on a sub-instruction level. Which brings us to...

Timing. If you want to be able to run even the most contrived game code, cycle-accurate timing is very important, especially between the CPU and the graphics hardware, where the number of cycles during the various phases of display rendering is of particular interest. Since these timings, while often undocumented, are pretty constant in hardware, programmers tend to design code that relies on them very early in a platform's life cycle. As hinted at, a lot of pretty effects are only really possible with cycle-accurate programming, and with a tendency to push game graphics way beyond their initial specification, a game's probability of using more and more egregious techniques that rely on increasingly accurate timing behavior rises with its release date.

Superficially, core CPU timing itself seems not so interesting. There are very handy tables out there that specify exactly how many cycles an instruction takes, but this is just the tip of the iceberg. Instructions with many cycles take so long because they do a lot during their execution. Besides internal operations, that often once again includes memory accesses, with their possible side effects. With instructions like CALL NC, a16 that execute for up to 24 cycles, which is about one tenth of the time it takes to draw one screen line, a hypothetical perfect emulator would have to take even that into account (though, granted, the addresses normally used with this particular instruction are rather unlikely to suffer from memory access side effects).
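To make "cycle-accurate" concrete, one way to model this is to break an instruction into per-cycle micro-operations, so that each memory access happens at a definite cycle instead of atomically at the end. A hedged Python sketch; the Bus and CPU stubs are invented, and the cycle layout of the taken CALL below is an assumption for illustration, not a measured breakdown:

```python
# Sketch: a multi-cycle instruction as a generator that yields once per
# machine cycle, so every bus access lands in its own cycle.
# The exact cycle positions are invented, not taken from hardware docs.

class Bus:
    def __init__(self):
        self.mem = bytearray(0x10000)

    def read(self, addr):
        return self.mem[addr]       # a real bus would dispatch mapped registers here

    def write(self, addr, value):
        self.mem[addr] = value & 0xFF

class CPU:
    def __init__(self):
        self.pc, self.sp, self.carry = 0, 0xFFFE, False

def call_nc_a16(cpu, bus):
    """CALL NC, a16 modeled as per-cycle steps (opcode fetch not shown)."""
    lo = bus.read(cpu.pc); cpu.pc += 1; yield      # fetch low address byte
    hi = bus.read(cpu.pc); cpu.pc += 1; yield      # fetch high address byte
    if not cpu.carry:                              # condition met: take the call
        yield                                      # internal delay cycle (assumed)
        cpu.sp -= 1; bus.write(cpu.sp, cpu.pc >> 8); yield    # push return PC, high
        cpu.sp -= 1; bus.write(cpu.sp, cpu.pc & 0xFF); yield  # push return PC, low
        cpu.pc = (hi << 8) | lo

cpu, bus = CPU(), Bus()
bus.mem[0:2] = b'\x34\x12'                 # operand bytes: target address 0x1234
cycles = sum(1 for _ in call_nc_a16(cpu, bus))
```

The point of the generator shape is that the surrounding machine can interleave other hardware (display, timers) between any two yields, which is exactly what sub-instruction timing tricks rely on.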

Timing is hard to get completely right, partly because documentation is often sparse. For example, one thing that I haven't been able to dig up any useful documentation for is how many cycles pass between the completed execution of the instruction during which an interrupt occurred and the fetch of the first opcode of the interrupt handler.

Opting to measure these things, on the other hand, is often complicated and sometimes inaccurate. It also requires you to have means to inject your own code into the real system in the first place, a task which is not too hard on home computers, but for game systems usually requires access to additional hardware which is hard to come by.

Yet it's important to get it exactly right, because the effects of small inaccuracies may result in strange effects (or rather, their failure) whose reason is hard to pinpoint. If you don't exactly know how the hardware is implemented (which is often the case) and can't conclusively find out, you may end up in situations where fixing a timing problem causes various other ones.

A look at existing emulators, by the way, only helps to get an overall idea. All emulators with publicly available source code that I have looked at so far had small inconsistencies, mostly in exactly the places where I wasn't able to find conclusive information online. One emulator, for example, executes interrupt handlers immediately after the instruction during which the interrupt happened, without any cycles in between. This seems highly unlikely, however, because the CPU at least needs to push the return address onto the stack, requiring at minimum one memory access. Another emulator just decides to waste a few cycles before jumping into the handler instead. Putting the desire for a cycle-accurate memory write to the stack aside, this may very well be the right amount, but it's hard to trust those arbitrary numbers with no information on what really happens on a real machine to use up those cycles.

So without proper documentation about the implementation rather than just the specification of the visible interface, you often have to resort to a lot of guesswork. Unfortunately, such documentation is hard to find, if it even exists in any public form. For the Z80 CPU, on which the Game Boy's core CPU is heavily based, I felt lucky at first to come by a cycle-exact breakdown of every instruction's operation. But it quickly turned out that the Game Boy's core CPU is a quite different beast that merely bases its instruction set on the Z80's, with possibly completely different inner workings. This renders the Z80 tables almost useless, except insofar as they enable more educated guesses. It also further illustrates the difference between specification and actual implementation.

In the end, the most viable solution is probably testing, testing, testing, and if a game fails to run in some respect, thinking about which likely implementation would fit the game's expectations. Over time, and over the course of running countless different ROMs on your emulator, those guesses should converge to something close to the actual implementation of the original device--at least until some obscure game shatters your assumptions with another contrived piece of code.

Unless, of course, you have access to some very deep implementation details, like people interested in emulating a 6502 CPU now have: the people at Visual 6502 have exposed, photographed and meticulously digitized the CPU, so that its complete internal workings are now publicly available to all, and they even provide a visual transistor-level simulation of the chip in JavaScript. That, at least, should answer every conceivable question about the CPU's implementation and render virtually all detective work about it unnecessary. But where's the fun in that?

"It's really not that bad!"

Look, people, I don't care if you somehow manage to create feasible, readable code in the horrible programming language of your choice. Fact is, if the language requires you to follow meticulous coding guidelines and have extraordinary discipline, because freely using its features would create a maintenance nightmare, it is a bad language.


They squeezed "Text file busy" into ETXTBSY and "No such process" into ESRCH (very obvious). They even left out an S in EACCES, and when your argument list is too long, that condition is mnemonically referred to as E2BIG.

But "File name too long" is, of course, called ENAMETOOLONG.
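Python's errno module exposes the very same mnemonics, so you can list the whole menagerie for yourself (the exact strerror texts are whatever your platform's C library provides):

```python
# The historical Unix errno mnemonics, via Python's errno module.
import errno
import os

for name in ["ETXTBSY", "ESRCH", "EACCES", "E2BIG", "ENAMETOOLONG"]:
    code = getattr(errno, name)
    # os.strerror gives the platform's human-readable message for the code.
    print(f"{name:12} = {code:3}  {os.strerror(code)}")
```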

Google Treasure Hunt finished

Today was a slow day, so I just decided to finish the last question of the Google Treasure Hunt 2008.

Here's a quick summary of all the questions and a basic outline of how I solved each of them:

1. The Robot

A robot is in the upper left corner of a rectangular grid. It has to reach the goal in the lower right corner of that grid. With every move, the robot can go either down or right to the next square of the grid. No other directions are allowed. The question is: how many different paths can the robot take to reach the goal?

How I solved it: The number of possible paths from each square follows a pretty simple recursion: for each square from which both moves are possible, the number of possible paths is the number of possible paths when standing on the right neighbour (the square you reach by going right) plus the number of possible paths when standing on the lower neighbour. The base cases are the squares in the bottom row or the rightmost column: from those, only one straight path to the goal remains.

Writing down that recursive function is pretty trivial. But just evaluating it naively is not enough: the width and height of Google's rectangular grid lie somewhere between 30 and 60, which renders the naive exponential-time approach infeasible.

One solution is a straightforward application of dynamic programming: start by computing the number of paths for every square in the rightmost column (which is 1, because you can't go right in the last column), then proceed to compute the neighbouring column to the left, starting with the lowermost square. During each computation, you can immediately use the memoized path counts of the square's lower and right neighbours without any recursion. This is enough to get the correct answer in a few seconds.

A better approach: I didn't notice it, but the problem can easily be reduced to the binomial coefficient or choose function: since the robot can't ever go back and always has to reach the opposite corner, every path has the same number of down moves (D) and right moves (R), and it's sufficient to calculate the number of possible arrangements of width-1 R-moves and height-1 D-moves.
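That closed form is a one-liner with the standard library's binomial function (the function name is mine):

```python
from math import comb

def count_paths_closed_form(width, height):
    # a path is width-1 right moves and height-1 down moves in some order;
    # choose which of the width+height-2 steps are the right moves
    return comb(width + height - 2, width - 1)

print(count_paths_closed_form(3, 3))  # 6
```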

Some people apparently chose to do the whole thing in Excel or OpenOffice: every grid square is represented by a sheet cell and each cell contains a formula that calculates the number of paths based on the neighbouring cells, just like I explained above. That's actually a pretty funny way to do some limited functional programming with automatic memoization!

2. The Zip File

You have to download a ZIP archive, which contains several text files in several subdirectories. For each given path pattern (simple ones: the path must contain a substring and the filename has to end with a specific extension), find the matching files and sum the numbers found at a specific line number. Multiply all sums together and enter the result. Example:

Sum of line 1 for all files with path or name containing BCD and ending in .rtf
Sum of line 5 for all files with path or name containing EFG and ending in .js
Hint: If the requested line does not exist, do not increment the sum.
Multiply all the above sums together and enter the product below.

How I solved it: I don't know the exact command line anymore, because I solved it within my shell, and the history is gone by now. But it was mostly find and sed. There may have been some awk in there, too. There are really a million ways to do it, and provided you use your shell, it's pretty straightforward.
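Since the original shell history is gone, here is a rough reconstruction of the idea in Python instead (the archive name, patterns, and helper name are made up for illustration):

```python
import zipfile

def line_sum(zip_path, substring, extension, line_no):
    """Sum the number found at line `line_no` (1-based) of every file in
    the archive whose path contains `substring` and whose name ends with
    `extension`. Files that are too short are skipped, as the hint demands."""
    total = 0
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if substring in name and name.endswith(extension):
                lines = zf.read(name).decode().splitlines()
                if line_no <= len(lines):
                    total += int(lines[line_no - 1])
    return total

# hypothetical constraints matching the example above:
# answer = line_sum("hunt.zip", "BCD", ".rtf", 1) * line_sum("hunt.zip", "EFG", ".js", 5)
```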

3. The Network

This one was pretty easy. You get a number of hosts (named A,B,C...) and a routing table which contains four entries for every host (including a default gateway). You are given a starting host name and a destination IP, and your goal is to give the sequence of hosts that a packet would travel through.

I'm not quite sure what they were aiming at. You need to know next to nothing to solve it, and you don't even need to use your computer in any way. In fact, people who never cared about computer science or mathematics at all could easily solve it after a five-minute introduction to how a routing table works.

How I solved it: I did it manually, by following the routes and writing down which hops the packet went through. The only part of the computer that I used was the screen. By looking at it.

4. The Primes

This one got more interesting again. You have to find the smallest number which can be expressed as the sum of several given numbers of consecutive prime numbers and which is also a prime number itself. Here's an example:

Find the smallest number that can be expressed as
the sum of 17 consecutive prime numbers,
the sum of 43 consecutive prime numbers,
the sum of 57 consecutive prime numbers,
the sum of 569 consecutive prime numbers,
and is itself a prime number.
How I solved it: Because the constraints are about consecutive prime numbers, all you need is contiguous windows over a (theoretically infinite) list of prime numbers. The only exception is the last constraint, which states that the matching number must itself be prime, which is equivalent to the number being a member of the list.

With a lazy language, where infinite lists are no problem, you could write a more or less complex function that generates that list of prime numbers, but don't: there are plenty of precomputed lists of prime numbers on the internet. Just read one in and use it; it's much less hassle.

The first thing to do is to find a prime number candidate. I did this by sliding the largest window (whose size is the number of summands in the last constraint) until a prime number is found. The primality check itself is a simple search over the prime number list.

Once that candidate is found, it's time to check the other constraints, in descending order of their window size. To do that, you slide a new, smaller window over your prime list, its size again being the number of summands in the current constraint, until its sum matches your candidate. However, you don't have to start these windows at the beginning of the list: quick reasoning tells you that any window which fits entirely inside a bigger window sums to less than that bigger window, and therefore less than your candidate. It is thus sufficient to start at i-n+1, where i is the first index position after the bigger window and n is the size of the current window. Since you are checking your constraints in descending order of window size, you can always take the upper bound of the previous constraint's window, subtract the current size and add 1.

All constraints have to match, so once the sum of any window gets larger than your candidate, you have to start over and look for a new candidate.

The sliding itself can be optimized a bit: instead of just increasing the indexes and recalculating the full sum, it's enough to subtract the first number in the window and add the next number beyond the window's upper bound. The algorithm will be fast enough either way.
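Putting the pieces together, the whole algorithm might look roughly like this in Python (my own reconstruction, not the original code; it also simplifies the start-index optimization by always deriving the i-n+1 bound from the biggest window instead of chaining it from the previous constraint):

```python
def has_window_sum(primes, w, target, lo):
    """Slide a window of w consecutive entries, starting at index lo,
    until its sum reaches or exceeds target; report an exact hit."""
    s = sum(primes[lo:lo + w])
    i = lo
    while i + w < len(primes) and s < target:
        s += primes[i + w] - primes[i]  # slide by one: drop left, add right
        i += 1
    return s == target

def find_answer(primes, window_sizes):
    """primes: ascending list of primes (e.g. read from a precomputed file);
    window_sizes: the numbers of summands from the constraints."""
    sizes = sorted(window_sizes, reverse=True)
    prime_set = set(primes)
    big = sizes[0]
    big_sum = sum(primes[:big])
    for start in range(len(primes) - big + 1):
        # the candidate must itself be prime, and every smaller window,
        # starting no earlier than i-n+1, must be able to reach it
        if big_sum in prime_set and all(
            has_window_sum(primes, w, big_sum, start + big - w + 1)
            for w in sizes[1:]
        ):
            return big_sum
        if start + big < len(primes):
            big_sum += primes[start + big] - primes[start]
    return None
```

With a sufficiently long precomputed prime list, find_answer(primes, [569, 57, 43, 17]) would solve the example above.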

As you can see, I didn't make any assumptions about prime numbers at all. In fact, the whole algorithm works with any list of numbers, as long as it's sorted in ascending order (the start-index reasoning additionally relies on all numbers being positive, which primes are).

This was fun, let's do it again next year. Next time, it might be a good idea to start solving as soon as the questions come out, maybe I'll have the chance to get one of the treasures :)

spreading a single request over multiple access logs impossible with lighttpd?

It seems that lighttpd can't spread one HTTP request over multiple, possibly differently formatted access logs. At least I couldn't find anything in mod_accesslog's documentation or with some quick Google searches.

This is an example of how to specify the format and destination for access logging:

accesslog.filename = "/var/log/lighttpd/access.log"
accesslog.format = "%h %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

You may notice that there are just the two variables accesslog.filename and accesslog.format, only referencing "the" access log instead of several individual ones, and that the assigned values are strings, not lists (I tried specifying lists; lighttpd doesn't accept them).

Now, spreading different, individual HTTP requests over multiple log files is still possible. For example, you might want to log requests for different virtual hosts into different log files, which is quite a common intention. To do this, simply use lighttpd's conditional configuration syntax (something I like very much about lighttpd) and reassign different values to accesslog.filename (and/or accesslog.format):

$HTTP["host"] == "" {
    accesslog.filename = "/var/log/lighttpd/"
}

But it seems that you can't log the same HTTP requests to multiple destinations. Perhaps you want to have one big access log that contains every request for every virtual host and several specific access logs that each contain requests to an individual virtual host. Or you want to have two access logs which contain exactly the same requests, but with different formats.

I did the latter under Apache, in order to log more information for debugging purposes without disrupting the scripts that gather statistics from the common access logs. That's how I discovered lighttpd's apparent inability to do the same.

One workaround would be to pipe into a script instead of writing directly into a file, which lighttpd supports out of the box:

accesslog.filename = "|/path/to/some/log-script"
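Such a log-script could, for instance, duplicate each incoming line into two files (a hypothetical Python sketch; the reduced second format here is made up, a real script would produce whatever formats you actually need):

```python
def split_logs(lines, full_out, short_out):
    """Write every log line verbatim to full_out, and a reduced version,
    here just the first two whitespace-separated fields, to short_out."""
    for line in lines:
        full_out.write(line)
        fields = line.split()
        short_out.write(" ".join(fields[:2]) + "\n")

# In the actual log-script, lighttpd pipes the formatted lines into
# standard input, so you would call something like:
#   with open("full.log", "a") as f, open("short.log", "a") as s:
#       split_logs(sys.stdin, f, s)
```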

But that adds an additional layer of complexity, and you can't easily use lighttpd's conditional configuration to decide on the destination anymore. Additionally, in the above debugging scenario, where different log files contain the same requests but with different formats, you'd have to specify the set union of all format options in your lighttpd configuration and parse and reformat inside of your script.

Is there anything I have overlooked or are there any plans for future lighttpd versions?

Internet Explorer 7 caches incomplete downloads sometimes

Internet Explorer 7 shows a strange issue with downloads that haven't finished for various reasons. After one incomplete download attempt, users are unable to properly restart the download in order to fetch the complete file. Subsequent tries seem to succeed immediately, without the truncated file actually being completed or replaced.

Thus, once a download has been aborted, the user will have difficulties downloading the file at all.

This problem occurs when some isolated issue caused a premature end of the HTTP connection, even though the server would normally be able to serve the file completely. I am not yet sure about the exact conditions, but it happened often enough for users to complain. The access logs are pretty clear about it when it happens: one 200 response followed by several 304 responses for the same IP.

Internet Explorer 7's misbehavior occurs because it sometimes doesn't remove incomplete downloads from its cache (which it usually seems to do). In any case, IE7 should be able to distinguish between complete and incomplete downloads, because the server did send a correct Content-Length header, but the amount of transferred and saved data was less than the announced value.

So, what IE7 should do on subsequent retries is either
  1. try to do a partial GET to acquire the rest of the file (this would be the preferred behaviour) or
  2. redownload the file with a normal, unconditional GET.
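Option 1 is straightforward for a client that controls its own cache; a hypothetical Python sketch of such a resuming retry (my own illustration, not anything IE does internally):

```python
import os
import urllib.request

def resume_download(url, filename):
    """Retry a download the way a well-behaved client could: request only
    the bytes we don't have yet via an HTTP Range header."""
    offset = os.path.getsize(filename) if os.path.exists(filename) else 0
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honored the range, append the tail.
        # 200 OK: the server ignored the range and sent the whole file,
        # so overwrite the truncated copy from scratch.
        mode = "ab" if resp.status == 206 else "wb"
        with open(filename, mode) as out:
            out.write(resp.read())
```

A server that supports ranges answers 206 Partial Content with just the missing tail; one that doesn't falls back to 200 and the full body, which the sketch handles by starting over.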
But what IE7 actually does is this: because the file being downloaded is, allegedly, already in the browser's cache, it just sends a conditional GET to the web server, including an If-Modified-Since header that contains the last modification time of the file.

However, since the file on the server really didn't change (it's just bigger, but file size is not checked), the web server correctly responds with a 304 Not Modified status code and no data. IE7 then proceeds to copy the file out of its cache, even though it's a truncated version from a previously aborted connection.

This happens for all subsequent requests and thus effectively blocks the user from downloading the file at all. Even worse: the user, who does not see any error message, is instead just told that the retry finished without any apparent issue. This leads them to think that the problem is a truncated file on the server, and hence not a correctly retrievable file at all.

To resolve this problem, users have to clear their cache manually. However, it's hard to communicate this to possibly unknown users who (quite reasonably) assume that it's a persistent server problem.

Another workaround would be to tell the server not to honor conditional GETs with an If-Modified-Since header when MSIE 7 is detected. But even if it's possible at all to configure your web server to do this, you should only do it for the larger "download" content of your website (i.e. podcasts and other large static files), as you would effectively be disabling part of the browser's cache.

So, if you are suffering dropped connections for whatever reason, IE7 might make matters worse by turning temporary glitches into semi-permanent problems. Let's hope this bug gets fixed in future browser versions. In the meantime, tell users who complain about truncated files to clear their cache.

converting videos for iPod video using ffmpeg

This should work with all platforms where ffmpeg is available, including Linux, Windows, Mac OS X and several BSD flavors.

After experimenting a bit, I finally got the right ffmpeg command lines to convert every video (at least every video that ffmpeg supports as input) into an iPod-friendly format. Everything seems to work, including forward/backward seeking.

First of all, you need to compile xvid and faac support into ffmpeg. The corresponding arguments to ffmpeg's configure script are --enable-xvid and --enable-faac. You may need to compile and install faac and xvid first. You also need to specify --enable-gpl, since ffmpeg's configure script won't accept those options otherwise.

If you'd like to be able to encode videos in h264 format instead of MPEG4/xvid, you might also install the x264 libraries and specify --enable-x264 to ffmpeg's configure.

Now that you have everything up and running, you may try to convert a video. For a 4:3 video, use:

ffmpeg -i foo.avi -f mov -b 1800 -maxrate 2500 \
-vcodec xvid -qmin 3 -qmax 5 -s 320x240 \
-acodec aac -ab 128 \
foo.mov

For some reason, I can't get my iPod to honor the aspect ratio settings and squish a 16:9 video into letterbox format (or zoom to fullscreen, for that matter), although iTunes and Quicktime seem to have no problem doing so. To work around this, just tell ffmpeg to add the black bars at the top and bottom itself and subtract the height of the black bars from the actual video size:

ffmpeg -i foo.avi -f mov -b 1800 -maxrate 2500 \
-vcodec xvid -qmin 3 -qmax 5 \
-s 320x180 -padtop 30 -padbottom 30 \
-acodec aac -ab 128 \
foo.mov
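The padding arithmetic generalizes to other aspect ratios; a tiny Python helper (my own, purely to illustrate the calculation):

```python
def letterbox(target_w, target_h, aspect_w, aspect_h):
    """Scaled video height and symmetric vertical padding needed to
    letterbox an aspect_w:aspect_h video into a target_w x target_h frame."""
    video_h = target_w * aspect_h // aspect_w
    pad = (target_h - video_h) // 2
    return video_h, pad

print(letterbox(320, 240, 16, 9))  # (180, 30): -s 320x180 -padtop 30 -padbottom 30
```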

It may be worth playing with bitrate and quality settings, or trying 2-pass encoding. Remember, however, that at least according to Apple's specification, the iPod with video only supports up to 2500kbps for MPEG4 and up to 768kbps for h264. Another thing you might want to play with, especially if you plan on outputting your videos to a TV screen (using the optional A/V cable) instead of just the iPod, are the size and padding options.

If you know of any means to make the encoding process faster or to induce better quality, please drop me a comment. And now, enjoy your movies and clips on the go!

Update: If forward/backward skipping doesn't work, update your version of gtkpod.