
find /var/www/ -type f >> FullFileList.txt

Link to this post 09 Aug 10

find /var/www/ -type f >> FullFileList.txt

I am trying to get some structure into a poorly sorted collection of 100 million html, jpg, flv, swf, doc, pdf etc. files which are spread across about a million subfolders.

find /var/www/ -type f >> FullFileList.txt

A whole run takes about 20 hours and produces about 100 million lines,

but for some reason it always fails somewhere around 60%, at an output file size of about 2 GB.

I tried it about 10 times in a row.

filesystem is XFS

Maybe it is just one corrupted filename or a read error, but it does not always stop at exactly the same line.

Now I wonder if there is a way to continue where it stopped, or, if not, whether there is a completely different method to get the same listing that can also continue instead of restarting from scratch.
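
One idea I have not tried yet: split the job over the top-level folders, so that a crashed chunk can simply be re-run on its own and every output file stays well below the 2 GB mark where it keeps dying. A rough sketch only, assuming /var/www/ has a manageable number of top-level folders:

[code]
# one list per top-level folder, so only the crashed chunk has to be redone
for dir in /var/www/*/; do
    out="filelist.$(basename "$dir").txt"
    [ -e "$out.done" ] && continue          # this chunk already finished
    find "$dir" -type f > "$out" && touch "$out.done"
done
# plus whatever sits directly in /var/www itself
find /var/www/ -maxdepth 1 -type f > filelist._toplevel.txt
# cat filelist.*.txt > FullFileList.txt once everything is done
[/code]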

thanks a lot in advance
Jonas

Link to this post 09 Aug 10

Have you tried updating your slocate database with the "updatedb" command, then doing the search with locate to bypass the direct find?

An example would be:
locate /var/www | while read -r FIL; do if [ -f "$FIL" ]; then echo "$FIL" >> ~/files.txt; fi; done

Link to this post 09 Aug 10

hey,
okay
Right, I did not try this, thank you.

updatedb  --database-root /var/www  --output /var/WWWMLOCATEDB

Question: will I be able to continue this one in case it also fails at some point?

It will probably take until tomorrow before I can see/report the results.
The file seems to be growing more slowly than it did with find and is using fewer resources (maybe it is restricted by a setting?).

Btw, do you guys know of any open-source/free search-engine-like tool which full-text indexes the whole folder (100 million files, 1 million folders, 5000 GB) with high performance and makes the search available to website visitors?

Link to this post 09 Aug 10

updatedb seems to be too slow?
After running for 2 hours it has produced only 75,000 lines, whereas find did 10 million lines in that time.
It is constantly using only 20 MB of RAM, maybe that is a setting somewhere?
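
The only settings I know of are the prune lists in /etc/updatedb.conf (assuming mlocate's standard config path), so I will double-check that nothing relevant is excluded there:

[code]
# show the active prune rules; PRUNEPATHS / PRUNEFS / PRUNENAMES decide
# what updatedb skips -- /var/www (or the xfs filesystem) must not be listed
grep -v '^[[:space:]]*#' /etc/updatedb.conf
[/code]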

Link to this post 09 Aug 10

I'm sorry, I did not realize the scope of your listing. With the quantity of files you have, I do not know of a program or method that can index them on a faster timeline.

To my knowledge, using a pre-built slocate database is the quickest method. What slows it down is the per-file verification (the [ -f ] test) in that loop; unfortunately, if you turn that off, the directories will also end up in your output file.
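
As a rough sketch of the trade-off (assuming your custom database really ends up at /var/WWWMLOCATEDB and you are running bash): dump everything quickly without the per-file check, then do the file-only filtering in a second pass that can be restarted:

[code]
# fast dump: every path under /var/www the database knows about,
# directories included, no stat() per entry
locate -d /var/WWWMLOCATEDB -r '^/var/www/' > ~/all-paths.txt

# slower second pass, restartable: keep only regular files and remember
# how many input lines have already been processed
n=$(cat ~/progress 2>/dev/null || echo 0)
while IFS= read -r p; do
    [ -f "$p" ] && printf '%s\n' "$p" >> ~/files.txt
    n=$((n + 1))
    # checkpoint every 10000 paths; a crash re-appends at most that many,
    # so a final "sort -u" on files.txt cleans up any duplicates
    [ $((n % 10000)) -eq 0 ] && echo "$n" > ~/progress
done < <(tail -n +"$((n + 1))" ~/all-paths.txt)
echo "$n" > ~/progress
[/code]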

Does anyone else know of a tool that will fit his needs?

Link to this post 09 Aug 10

hey

[quote]To my knowledge, using a pre-built slocate database is the quickest method. What slows it down is the per-file verification (the [ -f ] test) in that loop; unfortunately, if you turn that off, the directories will also end up in your output file.[/quote]

I did not even reach this step because I first have to create the database...
Getting the files out of the database, once it exists, will be no issue; even a customised regex query would only take minutes.
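
For example, something along these lines (assuming the mlocate database really ends up at /var/WWWMLOCATEDB):

[code]
# pull just the PDFs out of the prebuilt database -- no directory walk needed
locate -d /var/WWWMLOCATEDB -r '\.pdf$' > ~/pdf-files.txt
[/code]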

[b]Tries so far[/b]

find /var/www/ -type f >> FullFileList.txt:
It writes about 5 million lines (files only) an hour,
but fails somewhere around 50 million lines, mostly at a similar output size (maybe a read error, a corrupted filename, etc.).


updatedb  --database-root /var/www  --output /var/WWWMLOCATEDB

Only one try done so far, but it was only writing 35,000 lines an hour and the process failed after 4 hours :/


[quote]Does anyone else know of a tool that will fit his needs?[/quote]

yes please :) :woohoo:
