February 24, 2010

HowTo Read a VirtualBox VM disk w/out starting the VM

I use Sun's VirtualBox quite a bit. I happen to be one of only a small handful of users, in a company of over 54,000 employees, who run Linux as the primary OS on their workstations. That means I often find myself having to fire up a Windows XP virtual machine to view or edit a Visio diagram, or to prove to someone that something looks a certain way even in IE. Anyway ...


Sometimes I just need to move a file out of the VM and into the host file system (or vice versa) and have no other need to fire up the VM. So instead, I mount the .VDI file into my Linux file system and then unmount the image when I'm done.

I figured some of you might find it useful to be able to do the same, so I'm sharing how to do that with you, now. 


Unlike QEMU and many other VMs, VirtualBox wraps a lot of metadata around the FS in its virtual disk image, so the file system does not begin at the start of the file. To find where it does begin, I look for the file system's magic number. Since my VM is a Windows XP system running NTFS, I needed to know the magic number for that FS. This could be discovered by googling for it, but you could also do a hex dump of the first few bytes of a partition, or even check the magic file (/etc/magic or /usr/share/file/magic in Ubuntu).

That magic number happens to be 0xeb52904e544653 for NTFS. We want to tell the mount command to mount the file system starting at the offset where that magic number appears, but mount won't search for it; it only takes a numeric byte offset (hex works). We also want to use a loop device, since we're mounting a file and not an actual disk / device. Fortunately, mount makes this easy too. Here's the command I used to mount my NTFS partition from inside of the .vdi file:

sudo mount -o loop,umask=0000,offset=0x$(hd -n 1000000 ~/.VirtualBox/VDI/WindowsXP.vdi | grep "eb 52 90 4e 54 46 53" | cut -c 1-8) ~/.VirtualBox/VDI/WindowsXP.vdi /mnt

The real trick is, of course, in the $(hd -n 1000000 ~/.VirtualBox/VDI/WindowsXP.vdi | grep "eb 52 90 4e 54 46 53" | cut -c 1-8) portion of that. What is that doing?

I'm taking a hex dump of the first 1,000,000 bytes of the .VDI file. Most of that data is VirtualBox's "wrapper". Here's the first 64 bytes:


00000000  3c 3c 3c 20 69 6e 6e 6f  74 65 6b 20 56 69 72 74  |<<< innotek Virt|
00000010  75 61 6c 42 6f 78 20 44  69 73 6b 20 49 6d 61 67  |ualBox Disk Imag|
00000020  65 20 3e 3e 3e 0a 00 00  00 00 00 00 00 00 00 00  |e >>>...........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
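As an aside, that banner makes a handy sanity check before going hunting for offsets. Here is a minimal sketch, assuming the banner always contains the text "VirtualBox Disk Image" within the first 64 bytes; the function name is_vdi is made up for illustration and is not part of VirtualBox.

```shell
# Hypothetical helper: succeed if the file starts with a VirtualBox disk banner.
# Assumes the banner text sits within the first 64 bytes, as in the dump above.
is_vdi() {
  # -a treats the binary header as text, -q only sets the exit status
  head -c 64 "$1" | grep -a -q "VirtualBox Disk Image"
}
```

Something like `is_vdi ~/.VirtualBox/VDI/WindowsXP.vdi && echo "looks like a VDI"` would use it.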

Then, I am grepping for that NTFS magic number in the stream of hex output from hd. 



adam@adam-laptop:~$ hd -n 1000000 .VirtualBox/VDI/WinXP.vdi | grep  "eb 52 90 4e 54 46 53" 

00030000  eb 52 90 4e 54 46 53 20  20 20 20 00 02 08 00 00  |.R.NTFS    .....|

So, now we have the offset in the file, 0x00030000. Now, I want to pass just that output to mount from my shell expansion, hence I use the cut command to send back only the first 8 characters of output. 
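The same dump, grep and trim steps can be wrapped in a small function. This is only a sketch under a few assumptions: the name ntfs_offset is made up, od is swapped in for hd because hd is not installed everywhere, and awk takes the first field instead of cut taking a fixed width, since od's offset column is not always 8 characters wide. It also stops at the first match (grep -m 1), so a second copy of the signature further in would not add a second line to the result.

```shell
# Hypothetical helper: print the hex offset (without 0x) of the first dump line
# containing the NTFS signature in the first 1,000,000 bytes of the file.
# Like the hd version, it only sees a signature that sits within one 16-byte line.
ntfs_offset() {
  od -A x -t x1 -N 1000000 "$1" |
    grep -m 1 " eb 52 90 4e 54 46 53" |
    awk '{ print $1 }'
}
```

It would slot into the mount command as offset=0x$(ntfs_offset ~/.VirtualBox/VDI/WindowsXP.vdi). If nothing matches, it prints nothing, so it is worth checking for an empty result before mounting.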
So after all that, this is what mount sees:
sudo mount -o loop,umask=0000,offset=0x00030000 ~/.VirtualBox/VDI/WindowsXP.vdi /mnt
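Before handing an offset to mount, it can be reassuring to confirm that the bytes at that offset really look like an NTFS boot sector. A rough sketch, assuming the 4 characters "NTFS" follow the 3 leading bytes exactly as in the grep output above; has_ntfs_at is a made-up name.

```shell
# Hypothetical helper: succeed if the 4 bytes at (offset + 3) spell "NTFS".
# $1 is the image file, $2 is the byte offset (decimal, or hex with a 0x prefix).
has_ntfs_at() {
  offset=$(( $2 ))
  # Read 4 bytes just past the eb 52 90 prefix; drop NULs so the shell
  # does not complain when the bytes there are zeros.
  sig=$(dd if="$1" bs=1 skip=$((offset + 3)) count=4 2>/dev/null | tr -d '\0')
  [ "$sig" = "NTFS" ]
}
```

For the image above, that would be `has_ntfs_at ~/.VirtualBox/VDI/WindowsXP.vdi 0x00030000 && echo "offset looks right"`.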
The only thing left to do, really, is to wrap it all up in a convenient little script. 
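#!/bin/bash
# Usage: pass the name of a .vdi file that lives under ~/.VirtualBox/VDI/
VDI=~/.VirtualBox/VDI/$1
if [ -f "$VDI" ]; then
  # Find the offset of the NTFS boot sector, then loop-mount the image there
  MOUNT=$(sudo mount -o loop,umask=0000,offset=0x$(hd -n 1000000 "$VDI" | grep "eb 52 90 4e 54 46 53" | cut -c 1-8) "$VDI" /mnt 2>&1)
  if [ "$?" -ne "0" ]; then
    echo "Mount Failed!!!! "
    OUT=$(echo "$MOUNT" | grep "indicates unclean shutdown" 2>&1)
    if [ "$?" -eq "0" ]; then
      echo "NTFS was not cleanly unmounted"
    fi
    # Release the loop device (assumes /dev/loop0 was the one used)
    sudo losetup -d /dev/loop0
    exit 1
  fi
else
  echo "$1 - file not found"
fi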
This way, I can pass the name of the particular VDI file on the command line, since I have several VDI files. One other enhancement might be to allow specifying other FS types. For example, I have an OpenSolaris VM with a ZFS file system. I haven't made this enhancement yet, but it's simple enough. It just takes the time to collect the magic numbers, and then to add the logic into the script.
Hope someone else finds this useful ...
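One way that enhancement might look is a small lookup from FS type to the byte pattern to grep for. This is only a sketch: magic_for_fs is a made-up name, and the table holds just the NTFS pattern from this post. Other entries would need to be collected and verified first, and the trick only helps for file systems whose signature sits at the very start of the partition, as NTFS's does.

```shell
# Hypothetical lookup: print the hex-dump pattern to grep for a given FS type.
# Only NTFS is filled in; anything else is reported as unknown.
magic_for_fs() {
  case "$1" in
    ntfs) echo "eb 52 90 4e 54 46 53" ;;
    *)    echo "no magic number known for '$1'" >&2
          return 1 ;;
  esac
}
```

The script could then take the FS type as a second argument and use "$(magic_for_fs "$2")" in place of the hard-coded grep pattern.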

