SPSS is a nice statistics/analytics package that, since 1968 (!), seems to have been well-regarded program for classic statistics. It now has many new bits and pieces that target the predictive modelling market too.
In my experience it was previously mostly used in academia, especially the social sciences, but these days it seems it has made inroads into business and government data too.But it’s not yet common enough that we all have a copy sadly, and hence I faced a challenge of how to use a large data file in SPSS format within the other tools at our disposal.
Importing the data straight was not a big issue. Both Alteryx, and Tableau can read the files in natively which is great (this is a new feature in the latest release of Tableau – choose “Statistical file” in the “Connect” data box.)
However, they import the data as-is, and don’t seem to be able to access all the special metadata of an SPSS file all that well. This is very fair – but quite annoying if we are talking about 300,000 questionnaire responses with 400 fields somebody else designed, all helpfully encoded and code-booked as per best data practice 🙂
My first issue was that SPSS has the concept of:
- variable labels: e.g. a field called “QUESTION13PARTB” might have the label “What is your gender?”.
- factors (a.k.a. value labels): meaning the answers to that question might actually be saved in the data as 0 or 1, they actually represent female” or “male”.
My goal was to undo the variable labels and value factors so that instead of:
“QUESTION13PARTB:0” I could get “What is your gender?: Female”.
Alteryx default settings can get you part of the way there very easily. When used on an SPSS (*.SAV) file the “Input Data” tool has an option to “expand factors”. If not ticked you’ll get 0 or 1 in the above example, if ticked you get female or male. Super easy.
But I could not find a way to extract the variable labels, i.e. translate QUESTION13PARTB into “What is your gender?”.
Upon use you get the following options, which you can see pretty perfectly coincide with the task at hand.
The concept of SPSS Variable Labels is pretty much the same as Alteryx field descriptions, so the second tickbox there is basically recreating the original SPSS setup but in Alteryx. You will see the field descriptions in Alteryx tools like the select tool in that case – or you have the first option which literally replaces the coded field names with their SPSS descriptions if you prefer.
The gallery does warn that this tool is not super fast. I would agree, the default Alteryx input tool is much faster – but even with a recordset in the hundreds of thousands with a with of a few hundred columns I only had to remain patient for about 5 minutes before it was done.
Those without Alteryx – you can do this all in R (indeed that is apparently how this Alteryx tool actually works) – but it will quite possibly be a much less friendly experience if you’re coming to it from scratch! Here’s some docs on the read.spss command you might start from though for any r-curious.