Slow Search
Posted : Tue, 21 October 2008 03:15:32
Subject : Slow Search
Hi, I have a server with 7 TB of storage on ext3. I need a way to speed up searching the filesystem. A plain find takes ages, and I cannot use locate because I could not find any option in it to search for files by size. Is there an application that can save a filesystem index into a database, so that I can query it by file attributes? Or please suggest the best way to find files by size on such a large storage volume. Thanks in advance.
PerlCoder
Posted : Sat, 25 October 2008 19:38:04
Subject : Re: Slow Search
I've never used either of these, but maybe they have what you are looking for? http://beagle-project.org/Main_Page http://www.lesbonscomptes.com/recoll/
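If size is the only attribute that matters, a small metadata index might be enough. Here is a rough Python sketch of the "save the filesystem index into a database" idea from the first post. To be clear, this is not how Beagle or Recoll work; the table layout and the function names build_index and files_larger_than are made up for illustration, and it assumes SQLite is acceptable as the database.

```python
import os
import sqlite3
import stat


def build_index(root, db_path):
    """Walk `root` once and store path, size and mtime of every regular file.

    Hypothetical helper: the schema is an assumption made for this sketch.
    File names that are not valid UTF-8 are not handled here.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    # An index on size is what makes size queries cheap later on.
    conn.execute("CREATE INDEX IF NOT EXISTS files_size_idx ON files (size)")

    for dirpath, _dirnames, filenames in os.walk(root):
        rows = []
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                rows.append((full, st.st_size, st.st_mtime))
        # One batch of inserts per directory keeps memory use bounded.
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", rows)

    conn.commit()
    return conn


def files_larger_than(conn, min_bytes):
    """Return (path, size) pairs for files of at least `min_bytes`, largest first."""
    cur = conn.execute(
        "SELECT path, size FROM files WHERE size >= ? ORDER BY size DESC",
        (min_bytes,),
    )
    return cur.fetchall()
```

The slow walk still has to happen once (and again whenever you refresh the index, for example from cron, the same way updatedb feeds locate), but after that a size query only touches the database. Files deleted since the last run stay in the table until you rebuild it.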
Egyptian
Posted : Sun, 26 October 2008 09:42:02
Subject : Slow Search
You may need to do some research on the best filesystem to use. Other filesystems, e.g. JFS, XFS, or ReiserFS, may be better suited to the way you use the filesystem. I say this because not all filesystems are created equal: some are designed for handling large files (greater than 1 GB), for example. Hope that helps.
Slightcrazed
Posted : Mon, 27 October 2008 13:44:47
Subject : Re: Slow Search
[quote=Egyptian]You may need to do some research on the best filesystem to use. Other filesystems, e.g. JFS, XFS, or ReiserFS, may be better suited to the way you use the filesystem. I say this because not all filesystems are created equal: some are designed for handling large files (greater than 1 GB), for example. Hope that helps.[/quote]
I doubt that he is in a position to change the filesystem, and even if he is, that isn't going to speed up search in any perceivable way.
I wrote (but sadly, can't FIND) a script a while back that ran find in parallel for the same reason. Instead of searching the entire tree with a single process, it dove into predefined directories and ran a separate find process for each one, all of which returned results to stdout. The problem with this method is that (A) you need a multiprocessor or multi-core system to run the individual processes efficiently, and (B) you had better have a storage back-end with some seriously high I/O limits. If (and that is a big IF) your system is processor bound when doing a find, then this method will probably net you some speed. If not, it probably won't do you much good.
Otherwise, the suggestion of a search indexer like beagle is valid, though not nearly as flexible as find.
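Since the original script is lost, here is a rough Python sketch of the same idea: split the tree at its top-level directories and search each one concurrently. Note the differences from what is described above: this uses Python threads with os.walk instead of separate find processes, and the function names scan_tree and parallel_find_by_size and the default of 4 workers are assumptions for illustration only.

```python
import os
import stat
from concurrent.futures import ThreadPoolExecutor


def scan_tree(top, min_bytes):
    """Serial walk of one subtree; returns (path, size) for large regular files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size >= min_bytes:
                hits.append((full, st.st_size))
    return hits


def parallel_find_by_size(root, min_bytes, workers=4):
    """Search each top-level directory of `root` in its own worker thread."""
    subdirs = []
    hits = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # Files sitting directly in `root` are checked here.
                size = entry.stat(follow_symlinks=False).st_size
                if size >= min_bytes:
                    hits.append((entry.path, size))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per top-level directory, like one find per directory.
        for result in pool.map(lambda d: scan_tree(d, min_bytes), subdirs):
            hits.extend(result)
    return hits
```

The same caveats apply as with the find version: the work is only as well balanced as the top-level directories are, and if the disks are already saturated by a single walk, adding workers will mostly add seek contention rather than speed.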