Internet Explorer 7 sometimes caches incomplete downloads

Internet Explorer 7 has a strange problem with downloads that did not finish, for whatever reason. After one incomplete download attempt, users are unable to restart the download properly and fetch the complete file. Subsequent attempts appear to succeed immediately, but the truncated file is neither completed nor replaced.

Thus, once a download has been aborted, the user will have difficulty downloading the file at all.

This problem occurs when some isolated issue causes the HTTP connection to end prematurely, even though the server would normally be able to serve the complete file. I am not yet sure about the exact conditions, but it has happened often enough for users to complain. When it happens, the access logs are pretty clear about it: one 200 response followed by several 304 responses for the same file from the same IP.
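To spot affected downloads, you could scan the access log for exactly this pattern. The following Python sketch assumes the Apache/NCSA Common Log Format and flags every (IP, path) pair whose first GET was answered with 200 and whose later GETs were all answered with 304; the function name and the `min_304` threshold are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Common Log Format, e.g.:
# 192.0.2.1 - - [10/Oct/2007:13:55:36 +0200] "GET /files/a.mp3 HTTP/1.1" 200 1048576
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def find_suspect_retries(lines, min_304=2):
    """Return (ip, path) pairs seen as one 200 followed only by 304s.

    At least `min_304` of those 304 responses are required.
    """
    history = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("method") != "GET":
            continue  # skip unparsable lines and non-GET requests
        history[(m.group("ip"), m.group("path"))].append(int(m.group("status")))

    suspects = []
    for key, statuses in history.items():
        first, rest = statuses[0], statuses[1:]
        if first == 200 and len(rest) >= min_304 and all(s == 304 for s in rest):
            suspects.append(key)
    return suspects
```

A 200 entry whose byte count is smaller than the file size would strengthen the suspicion further; that comparison is left out here to keep the sketch short.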

IE7 misbehaves because it sometimes fails to remove incomplete downloads from its cache (which it usually seems to do). In every such case, IE7 should be able to tell complete and incomplete downloads apart, because the server did send a correct Content-Length header, and the amount of data transferred and saved was less than the announced value.
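The check the browser could make is simple arithmetic on the announced and received sizes. A minimal Python sketch, assuming the raw Content-Length value and the byte count of the saved copy are both available (the function name is hypothetical):

```python
def is_truncated(content_length, bytes_received):
    """Return True if fewer bytes arrived than Content-Length announced.

    content_length: raw Content-Length header value (str), or None if absent.
    bytes_received: number of body bytes actually saved.
    """
    if content_length is None:
        # Without Content-Length (e.g. chunked encoding) this check cannot decide.
        return False
    return bytes_received < int(content_length)
```

For example, `is_truncated("1048576", 524288)` is `True`, so such a copy should never be treated as a valid cache entry.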

So, what IE7 should do on subsequent retries is either
  1. issue a partial GET (a Range request) to fetch the rest of the file (this would be the preferred behaviour), or
  2. re-download the file with a normal, unconditional GET.
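Both options can be expressed with standard HTTP/1.1 request headers. The sketch below, with a hypothetical function name, builds the headers for such a retry: a Range request for the missing bytes, optionally guarded by If-Range so that a file that has changed is sent in full (200) rather than as a partial response (206), or no extra headers at all for a plain GET. Neither variant sends If-Modified-Since, so the server has no reason to answer 304.

```python
def retry_request_headers(bytes_cached, validator=None):
    """Build headers for retrying a download whose cached copy is incomplete.

    bytes_cached: size of the truncated copy already saved.
    validator: Last-Modified (or ETag) value from the first response, if known.
    """
    if bytes_cached <= 0:
        # Nothing usable saved: plain, unconditional GET (option 2).
        return {}
    # Ask only for the bytes after the truncated part (option 1).
    headers = {"Range": f"bytes={bytes_cached}-"}
    if validator is not None:
        # If-Range: send just the range if the file is unchanged,
        # otherwise send the whole file with status 200.
        headers["If-Range"] = validator
    return headers
```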
What IE7 actually does, however, is this: because the requested file is, supposedly, already in the browser's cache, it just sends a conditional GET to the web server. It does this by including an If-Modified-Since header in its request, containing the last modification time of the file.

However, since the file on the server really has not changed (it is merely larger than the cached copy, but the file size is not checked), the web server correctly responds with a 304 Not Modified status code and no data. IE7 then copies the file out of its cache, even though it is a truncated version from a previously aborted connection.
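The server side of this exchange can be sketched as follows. Assuming a static file whose modification time comes from something like `os.stat(path).st_mtime`, a typical server compares only that timestamp with the If-Modified-Since date; the size of the client's cached copy never enters the decision. This is a simplified Python illustration with a hypothetical function name, not the logic of any particular web server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(file_mtime, if_modified_since):
    """Return the status a simple static-file server would send for a GET.

    file_mtime: POSIX timestamp of the file's last modification.
    if_modified_since: raw If-Modified-Since header value, or None.
    Only timestamps are compared; the client's cached size is never known.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparsable date is ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)  # "-0000" dates are naive
            # HTTP dates have one-second resolution, so drop sub-second precision.
            modified = datetime.fromtimestamp(file_mtime, tz=timezone.utc).replace(microsecond=0)
            if modified <= since:
                return 304  # Not Modified: no body, client reuses its cached copy
    return 200  # full file in the response body
```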

This happens for all subsequent requests and thus effectively prevents the user from downloading the file at all. Worse, the user sees no error message; instead, he is told that his retry finished without any apparent problem. This leads him to believe that the file on the server is itself truncated and therefore cannot be retrieved correctly at all.

To resolve this problem, users have to clear their cache manually. However, this is hard to communicate to users you may not even know, who (quite reasonably) assume that it is a persistent server problem.

Another workaround would be to configure the server to ignore the If-Modified-Since header of conditional GETs when it detects MSIE 7. But even if your web server can be configured to do this at all, you should only do so for the large "download" content of your website (e.g. podcasts and other large static files), since you would effectively be disabling part of the browser's cache.
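As a rough sketch of such a rule, the following hypothetical Python helper decides per request whether the If-Modified-Since check should be evaluated at all. The `/downloads/` and `/podcasts/` prefixes are placeholders for wherever your large files live, and matching `MSIE 7.` in the User-Agent string is an assumption about how you would detect IE7.

```python
def should_honor_conditional_get(user_agent, path,
                                 download_prefixes=("/downloads/", "/podcasts/")):
    """Decide whether to evaluate If-Modified-Since for this request.

    Hypothetical policy: ignore conditional GETs from MSIE 7 for large
    downloadable files only, so ordinary pages keep their caching.
    """
    is_msie7 = "MSIE 7." in (user_agent or "")
    is_download = path.startswith(tuple(download_prefixes))
    return not (is_msie7 and is_download)
```

When the helper returns `False`, the server would skip the conditional check and answer with 200 and the full file; all other requests keep their normal caching behaviour.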

So, if you are suffering from dropped connections for whatever reason, IE7 might make matters worse by turning temporary glitches into semi-permanent problems. Let's hope this bug gets fixed in future browser versions. In the meantime, tell users who complain about truncated files to clear their cache.
