April 18, 2006

Torvalds creates patch for cross-platform virus

Author: Joe Barr

Linus Torvalds has examined the testing and analysis by Hans-Werner Hilse that we reported on yesterday, and has blessed it as correct. The virus fails to propagate itself on the latest kernel versions because of a bug in how GCC handles specific registers in a particular system call. He has coded a patch for the kernel that allows the virus to work on even the latest Linux kernel.

That may sound terribly complex, so let's break it down. A system call is made when an application (in this case, the virus) wants the kernel to perform a task for it: perhaps to read some data, write it to a file, and so on.
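For readers who want to see what such a request looks like, here is a minimal sketch in C, assuming a POSIX system. The helper name round_trip_through_kernel is made up for this illustration. The pipe(), write(), read() and close() functions are the standard C library wrappers, each of which issues the corresponding system call on the program's behalf.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only: ask the kernel to write
   `msg` into a pipe, then ask it to read the same bytes back into `out`.
   Returns the number of bytes read back, or -1 on failure.
   Intended for short messages that fit in the pipe's buffer. */
long round_trip_through_kernel(const char *msg, char *out, size_t out_size)
{
    int fds[2];
    if (pipe(fds) != 0) {                 /* system call: create a pipe */
        return -1;
    }

    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);   /* system call: write */
    close(fds[1]);                               /* system call: close */

    ssize_t got = -1;
    if (written == (ssize_t)len && out_size > 0) {
        got = read(fds[0], out, out_size - 1);   /* system call: read */
        if (got >= 0) {
            out[got] = '\0';              /* terminate so it is a C string */
        }
    }
    close(fds[0]);
    return (long)got;
}
```

A program using the C library this way never touches the registers itself; the library does that housekeeping for it.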

As part of the housekeeping an application does before such a call, specific registers are loaded with the additional information required to perform whatever task the call is asking for. (A register is a small storage location inside the CPU, which the CPU can access faster than anything held in memory.)

If you wanted to move a string of data like "CAPZLOQ TEKNIQ 1.0" from one place in memory to another, you would need to load the address where the string begins in one specific register, the address where you want it moved to in another register, and the number of bytes to move in yet another.
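The same three pieces of information show up as function arguments in a higher-level language. The sketch below is only an illustration: move_bytes and move_example are hypothetical names, and the three parameters stand in for the three registers described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move `count` bytes from `src` to `dst`, one byte
   at a time. The destination address, the source address and the byte
   count play the roles of the three registers. */
void move_bytes(char *dst, const char *src, size_t count)
{
    while (count--) {
        *dst++ = *src++;
    }
}

/* Move the example string, including its terminating NUL byte, into `out`.
   Returns the number of bytes moved, or 0 if `out` is too small. */
size_t move_example(char *out, size_t out_size)
{
    const char *text = "CAPZLOQ TEKNIQ 1.0";
    size_t count = strlen(text) + 1;      /* 18 characters plus the NUL */
    if (count > out_size) {
        return 0;
    }
    move_bytes(out, text, count);
    return count;
}
```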

By convention, applications assume that certain registers will not be changed during the call. The reason the virus did not work in the latest kernel is that one register, the ebx register, which the virus expects to remain unchanged, is being overwritten.
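To make the failure concrete, here is a toy model in C. It is not kernel code, and every name in it is invented for the illustration. A one-register "CPU" is passed to a stand-in system call that overwrites ebx. One caller trusts the register to survive the call; the other saves and restores it, which is what Torvalds says below that glibc does for normal programs.

```c
#include <stdint.h>

/* Toy model only: a "CPU" holding the one register discussed here. */
struct toy_cpu {
    uint32_t ebx;
};

/* Stand-in for a system call that accidentally overwrites ebx.
   The value 1 is arbitrary; any change would break a trusting caller. */
void toy_clobbering_call(struct toy_cpu *cpu)
{
    cpu->ebx = 1;
}

/* Hand-written style: keep a value in ebx and assume it is still
   there after the call. Returns what the caller actually finds. */
uint32_t call_trusting_ebx(struct toy_cpu *cpu, uint32_t value)
{
    cpu->ebx = value;
    toy_clobbering_call(cpu);
    return cpu->ebx;              /* no longer `value` */
}

/* Defensive style: save ebx before the call and restore it afterwards. */
uint32_t call_saving_ebx(struct toy_cpu *cpu, uint32_t value)
{
    cpu->ebx = value;
    uint32_t saved = cpu->ebx;    /* save */
    toy_clobbering_call(cpu);
    cpu->ebx = saved;             /* restore */
    return cpu->ebx;              /* `value` again */
}
```

The trusting caller gets back the clobbered value, while the defensive caller never notices that anything happened.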

The bug, which seems to me to be more a bug in GCC than in the kernel, does not appear in most code. It takes the rare combination of hand-crafted assembler code and old, now deprecated, system calls to trigger it. This lends support to the speculation that this virus is not new code at all, in spite of how Kaspersky Lab is trying to use it to drum up new business.

I wrote to Torvalds about Hilse's suspicion that the problem is caused by the ftruncate system call and manifests itself as an error in the old_mmap function. According to Torvalds:

This is exactly right. "sys_ftruncate()" seems to corrupt %ebx due to a compiler issue. We've had that issue before: the kernel uses some special calling conventions where the system call stack is both the saved register area _and_ the argument area to the system calls.

That speeds up system call entry, since we avoid any unnecessary argument setup costs, but sadly gcc then thinks the callee function owns the argument stack, and can overwrite it. We've had hacks to avoid it in the past, but the ftruncate case has gone unnoticed (see later on why it doesn't matter for any normal apps).

So, for sys_ftruncate(), gcc compiles it to

                movl    4(%esp), %eax   # fd, fd
                xorl    %ecx, %ecx      # length
                movl    8(%esp), %edx   # length, length
                movl    $1, 4(%esp)     #,
                jmp     do_sys_ftruncate        #

where that "movl $1, 4(%esp)" overwrites the original argument stack (the first argument, which is the save-area for %ebx).

Sad, sad. This particular case only happens with "-mregparm=3", which has been around for a long time, but only became default in 2.6.16. Which is probably why Hans-Werner didn't see the problem with older binaries. He just compiled with a different configuration.

Now, the reason normal programs don't care is that glibc saves and restores the %ebx register in the system call path. So if you use the regular C library, you'd never care. The virus has probably been written by hand in assembly, and because it didn't save/restore %ebx, it was hit by the fact that the system call modified it.

(To make it even harder to hit - it probably also only happens with the old "int 0x80" system call mechanism, not with the modern "syscall" entrypoint. Again, you'd probably only see this on old hardware _or_ if you wrote your system call entry routines by hand).

So the virus did a number of strange things to make this show up, but on the other hand the kernel does try to avoid touching user registers, even if we've never really _guaranteed_ that. So the 2.6.16 effect is a mis-feature, even if a _normal_ app would never care. It just happened to bite the infection logic of your virus thing.

Hilse has tested the patch provided by Torvalds as a workaround, and reports:

Indeed, this worked. With a recompiled kernel, everything is running as expected. And yes, it is using the int 0x80 interface from assembly code. As it's viral code, it is trying to avoid any overhead and reuses registers as much as it can (from what I can tell).

Leave it to open source hackers to debug and fix aging viral code so that it works correctly. And shame on the anti-viral industry, Kaspersky Lab in particular, for its attempts to deceive the public by passing off old code as something new.

