I will be analysing vast amount of network traffic related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what is the basic difference between these two softwares.
I am not asking which one is better, but just wanted to know what are the difference in terms of workflow between the two (besides the fact that SPSS has a GUI). I will be mostly working with scripts in either case anyway so I wanted to know about the other differences.
I have not data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And with a lot, I really mean a lot)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: In my eyes, in the recent discussion on the speed of R, this point was somehow overlooked (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and silently disappeared again.
Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high level overview of the general difference in R and SPSS:
There are GUIs for R that make it a bit easier to use, but also limit the functionality that can be used that easily. SPSS does have scripting which takes it beyond being a mere bus, but the general phylosophy of SPSS steers people towards the GUI rather than the scripts.
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables in CTABLES destroys any available option in R. To date, I am unaware package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scipts in SPSS can always be improved but using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears that it's one way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than if I want to use ggplot2 or for some other advanced data management techniques. (I despise SPSS macros).
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTex and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
Check out this video why is good to combine SPSS and R...
Link
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms that can be deployed in IBM SPSS Modeler. You must also have a copy of IBM SPSS Modeler - Essentials for R installed. IBM SPSS Modeler - Essentials for R provides you with tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that, R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.
Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.
Also with R you can integrate your workflow with your reporting by using Sweave - you write a document with embedded bits of R code that generate plots or tables, run the file through the system and out comes the report as a PDF. Great for when you want to do a weekly report, or you do a body of work and then the boss gives you an updated data set. Re-run, read it over, its done.
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?