December 11, 2012

du Know How Big Your Linux Files Are?

Last week we took a look at the df command, which reports on filesystems. A related command is du, which tells you how much disk space your files and directories use. I like the du (disk usage) command because it is faster and more powerful than clanking around with clumsy graphical file managers. It's familiar to Linux old-timers, but perhaps not so familiar to newer Linux users. It tells you the size of a single file:

$ du -h grundfos-321.pdf
2.6M grundfos-321.pdf

The -h option tells it to use nice human-readable numbers like 2.6M instead of 2648, which is the default count of 1-kilobyte blocks. This works for directories, too. This example adds the -s option to display only the total size for the directory and everything in it:

$ du -sh z-audacity
307M	z-audacity
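
If you want to see what -h is converting, or you need a plain number for a script, -k makes the 1-kilobyte unit explicit. Here is a minimal sketch; the scratch directory, the file name sample.bin and the variable names are made up for the demo, and the exact figures depend on your filesystem:

```shell
#!/bin/sh
# Create a scratch directory holding one 3 MiB file (names are illustrative).
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/sample.bin" bs=1024 count=3072 2>/dev/null

# Size in 1 KiB blocks; -k states the unit explicitly.
du -k "$demo/sample.bin"

# The same file with a human-readable size.
du -h "$demo/sample.bin"

# Total for the whole directory as a bare number, handy for arithmetic.
# du separates the size from the name with a tab, which is cut's default delimiter.
total_kb=$(du -sk "$demo" | cut -f1)
echo "directory total: ${total_kb} KiB"
```

The bare number from -sk is easier to compare or add up in a script than a string like 3.0M.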

If you omit -s, du prints the size of every subdirectory, followed by the total for the top-level directory. To list individual files as well, use -a:

$ du -ah z-audacity
4.0K    z-audacity/chapter-schedule
900K    z-audacity/futurans_1.3-1_all.deb
72K     z-audacity/jack-howto_files/patchbay1socket.png
4.0K    z-audacity/jack-howto_files/book.css

Figure: Our data grow as fast as our storage media (http://upload.wikimedia.org/wikipedia/commons/thumb/d/db/Hard_drive-bottom.jpg/277px-Hard_drive-bottom.jpg)

This particular du incantation is an old favorite for finding the ten biggest items in a directory. Because -a lists directories as well as files, the results include both. This example, run from inside my data directory, finds the ten largest entries:

$ du -a . | sort -nr | head -n 10
105367464	.
61717620	./archives
36011036	./archives/Music
31444216	./unsorted-pics
31384836	./unsorted-pics/Pictures
23290472	./archives/Pictures
20403728	./archives/Music/Terry-albums
13239772	./archives/Pictures/2009-4-7
11878452	./camcorder
7805196	./unsorted-pics/Pictures/yarrow-june-201
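
That listing mixes directories with files and prints raw block counts. Here is a rough sketch of two variations, assuming GNU coreutils (sort -h sorts human-readable sizes). The function names largest_entries and largest_files are my own, not part of du:

```shell
#!/bin/sh
# largest_entries DIR [N]: the N largest files and directories under DIR,
# with human-readable sizes. sort -h understands suffixes such as K, M and G.
largest_entries() {
    du -ah "$1" | sort -rh | head -n "${2:-10}"
}

# largest_files DIR [N]: regular files only, sizes in 1 KiB blocks.
# find hands the file names to du, so directories never appear in the list.
largest_files() {
    find "$1" -type f -exec du -k {} + | sort -nr | head -n "${2:-10}"
}

# Example calls (uncomment to try them on a directory of your own):
# largest_entries . 10
# largest_files . 10
```

The second function is the one to reach for when you want actual files, since du -a by itself always includes the directories that contain them.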

You can query multiple directories at once:

$ du -sh unsorted-pics receipt_files archives
30G	unsorted-pics
396K	receipt_files
59G	archives
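
When you query several directories, the -c option appends a grand total, and piping through sort -h puts the biggest first. This is a small sketch assuming GNU coreutils; the scratch directories and file names are invented so the commands can run anywhere:

```shell
#!/bin/sh
# Build three throwaway directories (names are illustrative only).
demo=$(mktemp -d)
mkdir "$demo/pics" "$demo/receipts" "$demo/archives"
dd if=/dev/urandom of="$demo/pics/a.bin" bs=1024 count=512 2>/dev/null
dd if=/dev/urandom of="$demo/archives/b.bin" bs=1024 count=1024 2>/dev/null

# One line per directory, then a final line labelled "total".
du -shc "$demo/pics" "$demo/receipts" "$demo/archives"

# Largest first: sort -h orders human-readable sizes such as 512K and 1.0M.
du -sh "$demo"/* | sort -rh
```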

Use wildcards to select what gets summarized. The first example reports each entry directly inside unsorted-pics, the second reports each entry one level further down, and the third uses .. to step up to the parent directory and report everything in it:

$ du -sh unsorted-pics/*
3.4M	unsorted-pics/albums
34M	unsorted-pics/bratgrrl-pics
1.8M	unsorted-pics/coast-album

$ du -sh unsorted-pics/*/*
8.0K	unsorted-pics/albums/index.html
1.8M	unsorted-pics/albums/outdoors
264K	unsorted-pics/bratgrrl-pics/2eaglessmall.jpg

$ du -sh unsorted-pics/../*
8.0K	unsorted-pics/../5-foss-groupware-servers.html
8.0K	unsorted-pics/../ardour1.html
2.5M	unsorted-pics/../Bow_Violin.pdf
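
Wildcards are expanded by the shell, not by du, so names beginning with a dot are skipped. GNU du has its own depth limit, --max-depth, which does include hidden directories, though without -a it lists directories only. Here is a sketch that shows the difference, using a made-up scratch tree:

```shell
#!/bin/sh
# Scratch tree with one visible and one hidden subdirectory (illustrative names).
demo=$(mktemp -d)
mkdir -p "$demo/albums/outdoors" "$demo/.cache"
echo "hello" > "$demo/albums/index.html"
echo "hidden" > "$demo/.cache/data"

# Shell wildcard: one line per visible entry. .cache is left out because
# * does not match names that start with a dot.
du -sh "$demo"/*

# du's own depth limit: one line per subdirectory one level down,
# hidden ones included, plus a final line for the top directory itself.
du -h --max-depth=1 "$demo"
```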

You can query the current directory for particular filetypes:

$ du -h *.jpeg *.png
32K	figure1.jpeg
128K	figure2.jpeg
8.0K	figure3.png
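
The wildcard approach only sees the current directory. To add up one filetype through a whole tree, find can feed the names to du. This is a hedged sketch; type_total is a hypothetical helper name of mine, and it reports disk usage in 1-kilobyte blocks:

```shell
#!/bin/sh
# type_total DIR PATTERN: combined disk usage, in KiB, of the regular files
# under DIR (searched recursively) whose names match PATTERN.
# Quote the pattern so the shell passes it to find unexpanded.
type_total() {
    find "$1" -type f -name "$2" -exec du -k {} + |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Example call (uncomment to try it):
# type_total . '*.png'
```

Summing in awk rather than relying on du -c keeps the result to a single number even if find has to run du more than once for a very long file list.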

Use the -S option to get the size of a directory, without including its subdirectories:

$ du -Ssh ~
17M	/home/carla

One of my favorite options is -x, which keeps du on the filesystem where it starts. My home directory includes remote mounts and links to directories on other filesystems. Compare the results of the two examples: the first tallies up everything in my homedir, including remote mounts and linked directories, while the second only adds up files on the local filesystem:

$ du -sh ~
1.9T	/home/carla

$ du -sxh ~
6.8G	/home/carla

So, when you have a conglomeration like this, how do you know where your files are actually stored? With the df command, of course. As is the case with so many of these little old Linux commands, there are many more things du can do, and you will find these in man du.
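
As a parting sketch of that df lookup: the helper below prints the filesystem and mount point behind each path you give it. The name where_stored is my own invention, and it assumes device and mount-point names without spaces, since it picks columns out of df -P output:

```shell
#!/bin/sh
# where_stored PATH...: for each path, print the filesystem that holds it
# and where that filesystem is mounted. df -P prints one header line and
# one predictable data line per path; columns 1 and 6 are the device and
# the mount point.
where_stored() {
    for p in "$@"; do
        df -P "$p" | awk -v path="$p" 'NR == 2 { print path ": " $1 " mounted on " $6 }'
    done
}

# Example call (uncomment to try it on the entries in your home directory):
# where_stored "$HOME"/*
```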