Linux.com

Re: Speed of R

Posted by: Anonymous [ip: 87.61.53.234] on October 23, 2008 07:08 PM
Hi (I'm not the one you're quoting).
I've used both and I've seen the issue with large datasets in R ('huge' probably means millions of records).

My experience is that as long as you have enough memory you are not likely to see any problems. But if you run out of memory, R fails most disgracefully. When R is running close to the memory limit, it slows down tremendously.

It's a well-known problem that stems from two factors: 1) R needs all its data in memory, and 2) R doesn't do 'call by reference', which means that when a function modifies an object passed to it, a new copy is made in memory. This is further aggravated by the typical coding style of R, where you often end up with many levels of nested functions, which can necessitate a new copy at every level. In some instances I had to untangle the whole process to bypass the most memory-intensive steps, or write my own custom versions to make my analyses fit the available memory.
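To put rough numbers on the second factor, here is a minimal sketch. It is written in Python purely as a stand-in, not R, and the function names (nested_copy, nested_shared, peak_bytes) are made up for the illustration. It assumes the worst case, where every level of a nested call makes and modifies its own copy, and compares peak memory against passing the same object all the way down.

```python
import tracemalloc


def nested_copy(data, depth):
    """Mimic by-value semantics: every level modifies its own copy."""
    local = list(data)   # the copy made at this level
    local[0] += 1        # change the copy; the caller's data is untouched
    if depth > 1:
        # 'local' stays alive while the deeper levels run,
        # so the copies pile up instead of replacing each other.
        return nested_copy(local, depth - 1)
    return local[0]


def nested_shared(data, depth):
    """Pass the same object down through every level; no copies."""
    if depth > 1:
        return nested_shared(data, depth - 1)
    return data[0]


def peak_bytes(func, data, depth):
    """Peak memory allocated while func(data, depth) runs."""
    tracemalloc.start()
    func(data, depth)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


data = list(range(200_000))
levels = 5

copy_peak = peak_bytes(nested_copy, data, levels)
shared_peak = peak_bytes(nested_shared, data, levels)

print(f"copy at every level: {copy_peak / 1e6:.1f} MB peak")
print(f"shared object:       {shared_peak / 1e6:.3f} MB peak")
```

The point is only that the extra memory grows with the nesting depth, because each level's copy is still alive while the deeper levels run. How many copies R really makes depends on the code and the R version, so treat this as an illustration of the worst case rather than a measurement of R.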

I know there have been some efforts to improve R's performance with large datasets, though I haven't looked into them recently. SPSS (at least until version 15) had a lot of trickery to deal with large datasets. I didn't know that about PSPP before, so I'm going to check it out.

