Linux.com

joonas

  • Linux.com Member
  • Posts: 9
  • Member Since: 08 Aug 10
  • Last Logged In: 05 Jan 11

Latest Posts

  • joonas
    help please: apply REGEX to multiple lines?
    Link to this post 02 Jan 11

    Hi friends! Happy new year!
    I never manage to write a regex that spans multiple lines. Does anyone have an idea how to do that?

    Imagine you have a file

    a
    b
    c

    and you want to turn it into just

    b

    but without using grep's next-line/previous-line options.
    It needs to be something like:
    sed -i "s#a\n(b\n)c\n#$1#g" file (which does not work)

    thanks a lot
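A possible reason the command above does not work: sed normally reads one line at a time, so `\n` never appears in the text it matches against. In addition, `$1` inside double quotes is expanded by the shell, and sed writes backreferences as `\1` with the group in `\(...\)`. The following is a minimal sketch, assuming GNU sed (the `-z` option is a GNU extension); the function name `collapse_abc` is made up for illustration.

```shell
#!/bin/sh
# collapse_abc [FILE...]
# Replaces every run of the three lines "a", "b", "c" with just "b".
# Assumption: GNU sed. -z splits the input on NUL bytes instead of
# newlines, so an ordinary text file is handled as one single record
# and \n can be matched inside the pattern.
# Note: the pattern is not anchored, so a line that merely ends in "a"
# followed by "b" and "c" lines would match as well.
collapse_abc() {
    sed -z 's/a\n\(b\n\)c\n/\1/g' "$@"
}

# Demo on the three-line example from the post
printf 'a\nb\nc\n' | collapse_abc
```

Once the output looks right, `-i` can be added to edit the file in place. For very large files this approach holds the whole file in memory, so it is only a starting point.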

  • joonas
    Let's draw a low-cost distributed database concept
    Link to this post 18 Aug 10

    Hello!
    Ever since I started running a bunch of separate cheap dedicated machines, which limits the growth of each project, I have shared the common dream of a (low-cost) distributed database.

    Especially the idea of one that comes as close as possible to behaving like one single machine with the combined power of hundreds of disks, and of course also hundreds of CPU cores and terabytes of RAM, to be used like an Amazon S3 "flat-rate" account, while its individual machines may be spread across several locations.

    To name a more concrete subject for discussion:
    it would already be great if you could just name individual pieces of software and say why each is a must-use or not.
    Taken as a whole, this should be enough to sketch out a concept.

    Examples:

    "Apache Hadoop: obviously a good choice because it's proven to work even for giants like Facebook, and it's free and open source"
    "OpenSolaris, because of ZFS ...?"

    But the most pressing additional question is:
    what if I'm not able to build my own perfect cluster,
    but instead just rent a bunch of cheap 100 Mbit storage servers at one ISP and a few more at another ISP in a completely different location, and call them all a distributed database cluster: will I be able to configure something like Hadoop to still run stably under those circumstances?


    Thanks a lot for reading :woohoo:

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 13 Aug 10

    Hello :)
    I was now able to complete a file list with find.

    Now that the file list is done, I'm thinking more about:
    creating the database,
    and maybe storing the text/html/css files within the database, and also, further in the future, eventually buying additional hard drives and setting up a fast full-text search...

    About your script:
    at first I thought you already had a completed one, but now I'm pretty impressed by your offer to help with a script you are writing specially for this issue! It doesn't need to be working yet (especially since part of the issue is gone), but I'd be curious anyway to look through the code/ideas you have already written down.

    Thanks, Joonas

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 10 Aug 10

    I still wonder whether there is a way to make find more error-resistant, or to just continue the file from a certain line.

    It would of course be nice to try your script! :)
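One way to find out why find stops is to keep its error messages and exit status instead of letting them scroll past. The following is a minimal sketch; the function name `list_with_log` and the file names in the usage comment are placeholders. It writes with `>` rather than `>>`, because appending to the list left over from a failed run would produce duplicate lines.

```shell
#!/bin/sh
# list_with_log ROOT LIST ERRLOG
# Writes all regular files under ROOT to LIST, sends find's error
# messages to ERRLOG, and reports the exit status and line count.
list_with_log() {
    find "$1" -type f > "$2" 2> "$3"
    status=$?
    # A non-zero status means find reported at least one error;
    # the details are in ERRLOG.
    echo "find exit status: $status, lines written: $(wc -l < "$2")" >&2
    return "$status"
}

# Usage (paths are examples only):
#   list_with_log /var/www FullFileList.txt find-errors.log
```

If the error log stays empty while the list still ends early, the cause is more likely outside find itself, for example a limit on the size of the output file.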

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 09 Aug 10

    mfillpot wrote:

    The failure may have been due to a filesize limitation, have you reviewed the size of the output files?
    This just came to me, if this indexing is failing because of size then maybe a database solution may be the best bet, it may take a while but it will be easy to search and use.


    No, it stopped at between 2 and 3 GB only.
    There is a TB-sized image file on the same partition...

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 09 Aug 10

    hey

    To my knowledge using a pre-built slocate database is the quickest method, what is slowing it down is the file verification in the find statement, unfortunately if you turn that off it will also display the directories in your output file.

    I did not even reach this step, because I first have to create the database...
    Getting the files out of the database, if it existed, would be no issue; even a customised regex would only take minutes.

    Tries so far

    find /var/www/ -type f >> FullFileList.txt:
    writing about 5 million lines (files only) an hour

    fails somewhere at about 50 million lines, mostly at a similar size (maybe a read error, corrupted filename, etc.)


    updatedb  --database-root /var/www  --output /var/WWWMLOCATEDB

    Only one try so far, but it was writing only 35,000 lines an hour and the process failed after 4 hours :/


    Does anyone else know of a tool that will fit his needs?

    yes please :) :woohoo:

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 09 Aug 10

    updatedb seems too slow?
    After running for 2 hours it got to 75,000 lines only, whereas find did 10 million lines in that time.
    It's constantly using only 20 MB of RAM; maybe that's a setting somewhere?

  • joonas
    RE: find /var/www/ -type f >> FullFileList.txt
    Link to this post 09 Aug 10

    Hey,
    okay,
    right, I did not try this, thank you.

    updatedb  --database-root /var/www  --output /var/WWWMLOCATEDB

    Question: will I be able to continue this in case it also fails at a certain point?

    It will probably take until tomorrow before I can see/report the results.
    The file seems to be growing more slowly than it did with find, and taking fewer resources (maybe it's restricted by a setting?)

    BTW, do you guys know of any open-source/free search-engine-like tool that full-text indexes the whole folder (100 million files, 1 million folders, 5000 GB) with high performance and makes the search available to website visitors?

  • joonas
    find /var/www/ -type f >> FullFileList.txt
    Link to this post 09 Aug 10

    find /var/www/ -type f >> FullFileList.txt

    I'm trying to get some structure into a poorly sorted collection of 100 million html, jpg, flv, swf, doc, pdf etc. files which are spread across a million subfolders.

    find /var/www/ -type f >> FullFileList.txt

    A whole run would take about 20 hours and fill about 100 million lines,

    but for some reason it always fails somewhere around 60% and a file size of 2 GB.

    I tried it about 10 times in a row.

    The filesystem is XFS.

    Maybe it's just one corrupted filename or a read error, but it's not always exactly the same line where it stops.

    Now I wonder if there is a way to continue,

    or, if not, whether there is a completely different method to get the same listing, one that can also continue instead of restarting.

    thanks a lot in advance
    Jonas
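One possible way to make the listing restartable is to run find once per top-level subdirectory and keep a separate list file for each, so that a rerun skips whatever already finished. The following is a rough sketch, not a tested solution for a tree of this size; the function name `list_resumable`, the output directory, and the `.list` naming are all made up for illustration. It has the side effect of keeping each output file far smaller than one combined list.

```shell
#!/bin/sh
# list_resumable ROOT OUTDIR
# Writes one list file per top-level subdirectory of ROOT into OUTDIR.
# Subdirectories whose list already exists are skipped, so an aborted
# run can be restarted without redoing finished work.
# Limitation: the glob below does not match hidden (dot) directories.
list_resumable() {
    root=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}                                # drop the trailing slash
        name=$(basename "$dir")
        [ -e "$outdir/$name.list" ] && continue     # finished in an earlier run
        # Write under a temporary name; rename only if find succeeded,
        # so a half-written list is never mistaken for a complete one.
        if find "$dir" -type f > "$outdir/$name.list.tmp"; then
            mv "$outdir/$name.list.tmp" "$outdir/$name.list"
        else
            echo "find failed in $dir" >&2
        fi
    done
    # Files that sit directly in ROOT (-maxdepth is a GNU/BSD extension)
    find "$root" -maxdepth 1 -type f > "$outdir/_toplevel.list"
}

# Usage (paths are examples only):
#   list_resumable /var/www /var/tmp/filelists
#   cat /var/tmp/filelists/*.list > FullFileList.txt
```

If a single subdirectory holds most of the files, the same idea can be applied one level deeper.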
