The Datasaurus: a monstrous Anscombe for the 21st century

Most people trained in the ways of data visualisation will be very familiar with Anscombe’s Quartet. For the uninitiated, it’s a set of 4 fairly simple looking X-Y scatterplots that look like this.

Anscombe's Quartet

What’s so great about those then? Well, the reason data vizzers get excited starts to become clear when you realise that the dotted grey lines I have superimposed on each small chart are in fact the mean average of X and Y in each case. And they’re basically the same for each chart.

The identikit summary stats go beyond mere averages. In fact, the variance of both X and Y (and hence the standard deviation) is also pretty much the same in every chart. As is the correlation coefficient of X and Y, and the regression line that would be the line of best fit if you were to generate a linear model based on each of those 4 datasets.
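
Incidentally, you don’t have to take my word for any of this: R ships the quartet as a built-in data frame called anscombe, so a few lines will verify the claims (a quick sketch – the published figures are rounded, so the matches are near-exact rather than identical):

# means and variances: all four x columns match, as do all four y columns
sapply(anscombe, mean)
sapply(anscombe, var)

# correlation and fitted regression line for each of the 4 x/y pairs
sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))
sapply(1:4, function(i) coef(lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])))
# every pair correlates at ~0.816 and fits roughly y = 3 + 0.5x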

The point is to show the true power of data visualisation. There are a bunch of clever-sounding summary stats (r-squared is a good one) that some nefarious statisticians might like to baffle the unaware with – but they are often so heavily summarised that they can leave you with an entirely misleading impression, especially if you are not also an adept statistician.

For example, if someone tells you that their fancy predictive model demonstrates that the relationship between x and y can be expressed as “y = 3 + 0.5x”, then you have no way of knowing whether the dataset the model was trained on was that from Anscombe 1, for which it may well be a good model; Anscombe 2, for which it is not; or Anscombe 3 and 4, where outliers make that model sub-par in reality – to the point where a school child issued with a sheet of graph paper could probably make a better one.

Yes, analytics end-users, demand pictures! OK, there are so many possible summary stats out there that someone expert in collating and mentally visualising the implications of a hand-picked collection of 30 decimal numbers could perhaps form a decent idea of the distribution of a given set of data. But unless that’s a skill you already have (clue: if the word “kurtosis” isn’t intuitive to you, you don’t, and it’s nothing to be ashamed of), why spend years learning to mentally visualise such things when you could just go ahead and actually visualise it?

But anyway, the quartet was originally created by Mr Anscombe in 1973. Now, a few decades later, it’s time for an even more exciting scatterplot collection, courtesy of Justin Matejka and George Fitzmaurice, taken from their paper “Same Stats, Different Graphs“.

They’ve taken the time to create the Datasaurus Dozen. Here they are:

Datasaurus Dozen

What what? A star dataset has the same summary statistics as a bunch of lines, an X, a circle or a bunch of other patterns that look a bit like a migraine is coming on?

Yes indeed. Again, these 12 charts all have the same (well, extremely similar) X & Y means, the same X & Y standard deviations and variances, and also the same X & Y linear correlations.

12 charts are obviously more dramatic than 4, and the Datasaurus dozen certainly has a bunch of prettier shapes, but why did they call it Datasaurus? Purely click-bait? Actually no (well, maybe, but there is a valid reason as well!).

Because the 13th of the dozen (a baker’s dozen?) is the chart illustrated below. Please note that if you found Jurassic Park to be unbearably terrifying you should probably close your eyes immediately.

Datasaurus Main

Raa! And yes, this fearsome vision from the distant past also has an X mean of 54.26, a Y mean of 47.83, an X standard deviation of 16.76, a Y standard deviation of 26.93 and a correlation coefficient of -0.06, just like his twelve siblings above.

If it’s hard to believe, or you just want to play a bit, then the individual datapoints that I put into Tableau to generate the above set of charts are available in this Google sheet – or a basic interactive viz version can be found on Tableau Public here.
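
Alternatively, for R users: to the best of my knowledge the CRAN package datasauRus bundles the same points as a data frame called datasaurus_dozen (one row per point, with a dataset column naming the shape), so the stats can be recomputed per shape in a couple of lines:

library(datasauRus)  # install.packages("datasauRus") first if needed

# recompute mean and standard deviation of x and y for each shape
aggregate(cbind(x, y) ~ dataset, data = datasaurus_dozen, FUN = mean)
aggregate(cbind(x, y) ~ dataset, data = datasaurus_dozen, FUN = sd)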

Full credit is due to Wikipedia for the Anscombe dataset and Autodesk Research for the Datasaurus data.

Simpson’s paradox and the importance of segmentation

Here’s a classic business analysis scenario, which I’d like to use to illustrate one of my favourite mathematical curiosities.

Your marketers have sent out a bunch of direct mail to a proportion of your previous customers, and deliberately withheld the letters from the rest of them so that they can act as a control group.

As analyst extraordinaire, you get the job of totting up how many of these customers came back and bought something. If the percentage is higher in the group that received the mail, then the marketers will be very happy to take credit for increasing this year’s revenue.

However, once the results are back, it’s not looking great. Aggregating and crosstabulating, you realise that actually the people who were not sent the mail were a little more likely to return as customers.

Sent marketing? | Count of previous customers in group | Count of customers returning to store | Success rate
No | 300 | 40 | 13%
Yes | 300 | 32 | 11%

You go to gently break the bad news to the marketing team, only to find them already in fits of joy. Some other rival to your analytical crown got there first, and showed them that their mailing effort in fact attracted a slightly higher proportion of both men and women to come back to your shop. A universally appealing marketing message – what could be better? Bonuses all round!

Ha, being the perfect specimen of analytical talent that you are, you’ve got to assume that your inferior rival messed up the figures. That’s going to embarrass them, huh?

Let’s take a look at your rival’s scrawlings.

Gender | Sent marketing? | Count of previous customers in group | Count of customers returning to store | Success rate
Female | No | 200 | 37 | 19%
Female | Yes | 100 | 21 | 21%
Male | No | 100 | 3 | 3%
Male | Yes | 200 | 11 | 6%
  • So, a total of 100+200 people were sent the letter. That matches your 300. Same for the “not-sent” population.
  • 21 + 11 people who were sent the letter returned, that matches your 32.
  • 37 + 3 people who were not sent the letter returned, again that matches your 40.

What what what??!

This is an example of “Simpson’s Paradox”, which luckily has nothing to do with Springfield’s nuclear reactor blowing up and rupturing a hole in the very fabric of mathematical logic.

Instead, the disparities in the sample size and propensity of the two gender populations to return as customers are coming into play.

Whether they received the mailing or not, the results show that women were much more likely to return and shop at the store again anyway. Thus, whilst the marketing mail may have had a little effect, the “gender effect” here was much stronger.

Gender could therefore be considered a confounding variable. This could have been controlled for when setting up the experiment, had it been tested and known how important gender was with regard to the rate of customer return beforehand.

But apparently no-one knew about that or thought to test the basic demographic hypotheses beforehand. As it happened, with whatever sample selection method was employed, women made up only a third of the group that was sent the mailing, but two thirds of the group that wasn’t.

So, whilst the mailing was marginally successful in increasing the propensity of both men and women to return to the store, the aggregate results of this experiment hid it, because men – who were intrinsically far less likely to return to the store than women – were over-represented among the people chosen to receive the mailing.
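
For anyone who wants to check the arithmetic, here’s a minimal sketch in R that reproduces both views of the same figures:

# one row per gender x mailing combination, using the figures from the tables above
d <- data.frame(
  gender   = c("Female", "Female", "Male", "Male"),
  sent     = c("No", "Yes", "No", "Yes"),
  n        = c(200, 100, 100, 200),
  returned = c(37, 21, 3, 11)
)

# aggregate view: the mailed group looks slightly worse (32/300 vs 40/300)
agg <- aggregate(cbind(returned, n) ~ sent, data = d, FUN = sum)
agg$success_rate <- agg$returned / agg$n
agg

# segmented view: within each gender, the mailed group does better
d$success_rate <- d$returned / d$n
d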

New website launch from the Office for National Statistics

Yesterday, the UK Office for National Statistics, the institution that is “responsible for collecting and publishing statistics related to the economy, population and society”, launched its new website.

As well as a new look, they’ve concentrated on improving the search experience and making it accessible to mobile device users.

The front page is a nice at-a-glance collection of some of the major time series one sees in the news (employment rate, CPI, GDP growth etc.). And there’s plenty of help-yourself downloadable data; they claim to offer 35,000 time series which you can explore and download with their interactive time series explorer tool.

The Sun and its dangerous misuse of statistics

Here’s the (pretty abhorrent) front cover of yesterday’s Sun newspaper.

The Sun front cover

Bearing in mind that several recent terrorist atrocities are top of everyone’s mind at the moment, it’s clear what the Sun is implying here.

The text on the front page is even more overt:

Nearly one in five British Muslims have some sympathy with those who have fled the UK to fight for IS in Syria.

The implication is obviously a starkly ominous claim that 20% of Britain’s Muslims support the sort of sick action that IS have claimed responsibility for in Paris and other places that have experienced horrific, crazed, assaults in recent times.

This is pushed even more fervently by the choice to illustrate the article solely with a photo of Jihadi John, an inexcusably evil IS follower “famous” for carrying out sick, cruel, awful beheadings on various videos.

Given the fact that – most thankfully – there are far, far fewer people featuring on videos of beheadings than there are people travelling to Syria to join fighters, this is clearly not a representative or randomly chosen photo.

Sun writer Anila Baig elaborates in a later column:

It beggars belief that there are such high numbers saying that they agree with what these scumbags are doing in Syria. We’ve all seen the pictures of how innocent Yazidi girls are treated, how homosexuals are thrown off tall buildings. It’s utterly abhorrent.

The behaviours she describes are undoubtedly beyond abhorrent.

But of course, this 1 in 5 statistic isn’t true – even in the context of the small amount of data that they have used to support this story.

It is however a very dangerous claim to make in the current climate, where “Islamophobia” and other out-group prejudices make the lives of some people that follow Islam – or even look like a stereotype of someone that might – criminally unpleasant. Britain does have statutes that cover the topic of hate speech after all, and hate speech can come from many quarters.

Anyway, enough of the rants (kind of) and onto the statistics.

There are three main points I describe below, which can be summarised as:

  • The survey this headline is based on did not ask the question that the paper implies.
  • The paper deliberately misleads its readers by failing to give easily-available statistical context to the figures.
  • The methodology used to select respondents for the survey was anyway so inadequate that it’s not possible to tell how representative the results are.


Onwards:

The Sun’s claim that 1 in 5 British Muslims have sympathy for Jihadis (the headline) and those who fight for IS (in main text) comes from a survey they commissioned, which was executed by professional polling company Survation. You can read a summarised version of the quantitative results here.

The “20% sympathise with Isis” and those implications are based on responses to question 6 (page 9 of the PDF results above), which asked people to say which of a set of sentences they agreed with the most. The sentences were of the form “I have [a lot of/some/no] sympathy with young Muslims who leave the UK to join fighters in Syria”.

Results were as follows, and the Sun presumably added up the first 2 boxes to get to their 20%, which isn’t a bad approximation. Note though that an equally legitimate headline would have been “95% of Muslim respondents do not have a lot of sympathy for jihadis”.

Response | % selected
I have a lot of sympathy with young Muslims who leave the UK to join fighters in Syria | 5.3%
I have some sympathy with young Muslims who leave the UK to join fighters in Syria | 14.5%
I have no sympathy with young Muslims who leave the UK to join fighters in Syria | 71.4%
Don’t know | 8.8%

However, compare the claims that respondents have sympathy with those ‘who have fled the UK to fight for IS’ and that ‘they agree with what these scumbags are doing…homosexuals thrown off tall buildings’ (even ignoring the implications of supporting the sort of criminal mass murder seen in Paris) with the question actually asked.

There was no mention of IS or any particular act of terrorism or crime against human rights in the question whatsoever.

The question asks about joining fighters in Syria. Wikipedia has a list of the armed groups involved in the Syrian war. At the time of writing, they have been segmented into 4 groups: the Syrian Arab Republic and allies; the Syrian Opposition + al-Qaeda network and allies; the Kurdish self-administration and allies; and ISIL (aka IS) and allies.

There are perhaps in the order of 200 sub-groups (including supporters, divisions, allies etc.) within those divisions, of which the huge majority are not affiliated with IS. Even the UK is in the “non-lethal” list, having donated a few million to the Syrian opposition groups.

To be fair, the question did ask about joining fighters rather than the c. 11 non-lethal groups. But we should note that – as well as the highly examined stream of people apparently mesmerised by evildoers to the extent of travelling to fight with them – there was also a Channel 4 documentary a few months ago showing a different example of this. In it, we saw 3 former British soldiers who had decided to travel to Syria and join fighters – the ones who fight against IS. I do not know what religion, if any, those 3 soldiers followed – but is it possible someone might feel a little sympathy towards the likes of them?

It is not necessarily a good thing for someone to be travelling abroad to join any of these groups with a view to violent conflict, and I am convinced that some people do travel to join the most abhorrent of groups.

But, the point is that, despite what the Sun wrote, the question did not mention IS or any of their evil tactics, and could have in theory suggested some sort of allegiance to very many other military-esque groups.

The question only asks whether the respondent has sympathy for these young Muslims who travel abroad.

To have sympathy for someone does not mean that you agree with the aims or tactics of the people they are persuaded to go and meet.

Google tells us that the definition of sympathy is:

feelings of pity and sorrow for someone else’s misfortune.

One can quite easily imagine a situation where, even if you believe these people are travelling to Syria specifically to become human weapons targeting innocent victims, you can have some sympathy for the young person involved.

It seems plausible to have some sympathy for a person who has been brainwashed, misguided, preyed on by evildoers, and who feels their quality of life is such that the best option for their future is to go and join a group of fighters in a very troubled country. Their decisions may be absurd; perhaps they may even end up involved in some sick, awful, criminal act for which no excuses could possibly apply – but you could have some sympathy for a person being recruited to a twisted and deadly cause, whilst virulently disagreeing with the ideology and actions of the criminal group that tries to attract them.

And, guess what, the Sun handily left out some statistics that might suggest that is some of what is happening.

For every such survey that concentrates on the responses of a particular population, it’s always important to measure the base rate, or a control group rate. Otherwise, how do you know whether the population you are concentrating on is different from any other population? It’s very rare that any number is meaningful without some comparative context.

As it happens, a few months ago, the same survey group did a survey on behalf of Sky News that asked the same question of non-Muslims. The results can be seen here, on page 8, question 5, reproduced below.

As the Sun didn’t bother providing these “control group” figures, we’re left to assume that no non-Muslim could ever “show sympathy” to the young Muslims leaving the UK to join fighters. But…

Response | % selected
I have a lot of sympathy with young Muslims who leave the UK to join fighters in Syria | 4.3%
I have some sympathy with young Muslims who leave the UK to join fighters in Syria | 9.4%
I have no sympathy with young Muslims who leave the UK to join fighters in Syria | 76.8%
Don’t know | 9.6%

So, 14% of non-Muslims respond that they have a lot or some sympathy with this group of travellers. Or as the Sun might headline it: “1 in 7 Brit non-Muslims sympathy for jihadis” (just below the same picture of a lady in a bikini, obviously).

14% is less than 20%, of course – but without showing the base rate, the reader is left to assume that 20% means “20% compared to zero”, which is not the case.

Furthermore in some segments of the surveyed population, the sympathy rates in non-Muslims are higher than in Muslims.

The Sun notes:

The number among young Muslims aged 18-34 is even higher at one in four.

Here are the relevant figures for that age segment from a) the poll taken to support the Sun’s story, and b) the one that asked the same question of non-Muslims.

Response | Muslims aged 18-34 | Non-Muslims aged 18-34
I have a lot of sympathy with young Muslims who leave the UK to join fighters in Syria | 6.9% | 10.9%
I have some sympathy with young Muslims who leave the UK to join fighters in Syria | 17.6% | 19.2%
I have no sympathy with young Muslims who leave the UK to join fighters in Syria | 66.2% | 52.2%
Don’t know | 9.3% | 17.7%

So-called “Jihadi sympathisers” aged 18-34 make up a total of 24.5% of Muslims, and 30.1% of non-Muslims.

Another headline, dear Sun writers: “1 in 3 Brit non-Muslims youth sympathy for jihadis thankfully moderated by less sympathetic Muslim youth?”

A similar phenomenon can be seen when the results are broken down by non-Islamic religions. Although some of these figures are masked due to small samples, one can infer from the non-Muslim survey that more than 20% of the sample who classified themselves as belonging to some religion other than Christian, Muslim or “not religious” were at least somewhat sympathetic to these young Muslims who go to Syria.

As a final cross-survey note, the same question had also been put to a Muslim population earlier in the year, in March, again by Survation for Sky News. Here are the results of that one:

Response | % selected
I have a lot of sympathy with young Muslims who leave the UK to join fighters in Syria | 7.8%
I have some sympathy with young Muslims who leave the UK to join fighters in Syria | 20.1%
I have no sympathy with young Muslims who leave the UK to join fighters in Syria | 61.0%
Don’t know | 11.1%

Totting up the support for having some/a lot of sympathy for the young Muslims leaving the UK for Syria, we see that the proportion showing any form of sympathy fell from 27.9% in March to 19.8% now in November.

That’s a relatively sizeable fall – 8.1 percentage points, or a drop of about 29% in relative terms – again not mentioned in the Sun’s article (because that would spoil the conclusion they’re steering the reader towards). Here’s another headline I’ll donate to the writers: ‘Dramatic 30% fall in Brit Muslims sympathy for jihadis‘.

Next up, time to talk about the method behind the latest survey.

Regular readers of the Sun might have noticed that the formal surveys it commissions are normally carried out by YouGov. Why was it Survation this time? The Guardian reports that YouGov declined the work because

it could not be confident that it could accurately represent the British Muslim population within the timeframe and budget set by the paper.

So rather than up the timeframe or the budget, the Sun went elsewhere to someone that would do it cheaper and quicker. Survation apparently complied.

Given most surveys cannot ask questions to every single person alive, there’s a complex science behind selecting who should be asked and how to process the results to make it representative of what is claimed.

Here’s the World Bank on how to pick respondents for a household survey. Here’s an article from the Research Methods Knowledge Base on selecting a survey method. There is far more research on best practice on polling, and this is one reason why resourcing using professional pollsters is often a necessity if you want vaguely accurate results, especially on topics that are so controversial.

However, none of the research I’ve seen has ever suggested that one should pick out a list of people whose surname “sounds Muslim” and ask them the questions. Incredibly, that is apparently the shortcut Survation used, given they didn’t have the time or money to do the detailed and intricate work that would be necessary to generate a statistically representative sample of all British Muslims.

It might be that Survation did happen to choose a perfectly representative sample, but the problem is we just do not know. After talking to other pro-pollsters, the Guardian noted:

It cannot be determined how representative the Survation sample is because of a lack of various socioeconomic and demographic details.

Even if the rest of the story had been legit, then – being generous with the free headlines – the Sun would have been more accurate to write “1 in 5 of the 1500 people we had the time to ring who had ‘Muslim-sounding surnames’ and did in fact agree that they were Muslim sympathise with jihadis”. But it’s a bit less catchy, and even the non-pros might notice something methodologically curious there.

So, to summarise – the Sun article, which seems to me to tread dangerously near breaking various legislation on hate speech in principle if not practice, is misleadingly reporting on the results of a questionnaire:

  • that did not even ask the question that would be appropriate to make the claims it is headlining.
  • without providing the necessary comparative or historical perspective to make the results in any way meaningful.
  • that was carried out in inadequate, uncontrolled fashion with no view to producing reliable, generalisable results.

We have to acknowledge that, on the whole, people going away to join fighter groups is a bad, sad event and one that the world needs to pay attention to. Infinitely more so, obviously, if they are travelling to commit atrocities with groups that can be fairly described as evil.

But for the Sun to imply such dramatic and harmful allegations about a section of the British population against whom prejudice is already very apparent (note the 300% increase in UK anti-Muslim hate crime last week) to its huge readership – who will now struggle to entirely forget the implication during their dinner-table conversation even if they wanted to – is not only extremely poor quality data analysis, but also downright irresponsible and even dangerous.

Kruskal Wallis significance testing with Tableau and R

Whilst Tableau has an increasing number of advanced statistical functions – a case in point being the newish analytics pane from Tableau version 9 – it is not usually the easiest tool to use to calculate any semi-sophisticated function that hasn’t yet been included.

Various clever people have tried to work some magic around this, for instance by attempting “native” Z-testing. But in general, it’s sometimes a lot of workaroundy effort for little gain. However, a masterstroke of last year’s Tableau version was that it included the ability to interface with R.

R is a statistical programming language that really does have every statistical or modelling function you’ll ever need. When someone invents new ones in the future, I have no doubt an R implementation will be forthcoming. Its flexibility and comprehensiveness (and price…£0) are incredible. However, it is not easy to use for the average person. For a start, the default interface is a command line, and – believe it or not – there are people alive today who never had the singular joy of the command line being the only way to use a computer.

It must be said that in some ways Tableau has not really made R super easy to use. You still need to know the R language in order to use R functionality inside Tableau. But using the Tableau interface into R has 2 huge benefits.

  1. It integrates into your existing Tableau workspace. The Tableau interface will push any data you have in Tableau, with all your filters, calculations, parameters and so on, to R and wait for a result. There’s no need to export several files with different filters etc. and read.csv(“myfile1.csv”) them all into R. Once you’ve set up the R script, it reruns automatically whenever your data changes in Tableau.
  2. It visualises the results of the R function immediately and dynamically, just like any other Tableau viz. R has a lot of cool advanced graphics stuff, but it can be hard to remember quite how to use, and most of the basic components offer no interactivity. No drag and drop pills or “show me” to be seen there!

So recently when I needed to perform a Kruskal Wallis significance test over data I was already analysing in Tableau, this seemed the obvious way to do it.

Kruskal Wallis is fully explained in Wikipedia. To over-simplify for now, it allows you to detect whether there is a difference in data that comes from multiple groups – much as the more famous ANOVA does, except that Kruskal Wallis doesn’t require the data to meet the parametric assumptions (such as normality) that ANOVA does.
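
To make that contrast concrete, here’s a toy sketch in R (with invented scores, purely for illustration) running the same data through both tests:

# a few made-up Likert-style scores across three groups
scores <- c(4, 5, 5, 3, 1, 2, 4, 5, 2)
groups <- factor(c("A", "B", "B", "A", "C", "C", "A", "B", "C"))

summary(aov(scores ~ groups))  # one-way ANOVA: assumes normally-distributed residuals
kruskal.test(scores ~ groups)  # Kruskal-Wallis: rank-based, no normality assumption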

This is classic hypothesis testing. With my questionnaire, we start with the hypothesis that there is no difference in the way that different groups answer the same question, and see if we can disprove that.

In honesty, the fact it’s a KW test is not all that important to this post. Using R to perform a certain test in Tableau is pretty much the same as it would be for any other test, aside from needing to know the name and syntax of the test in R. And Google will never let you down on that front.

In this case, I wanted to analyse questionnaire results scored on a Likert-type scale to determine, on several dimensions, whether the answering patterns were significantly different or not. For instance, is there a real difference in how question 1 is answered based on the respondent’s age? Is there a real difference in how question 2 is answered based on the respondent’s country of origin? How confident can we be that any difference in averages truly represents a real difference and is not the result of random noise?

I wanted to do this for about 30 questions over 5 different dimensions – a total of 150 Kruskal-Wallis tests – and I didn’t overly fancy doing it manually.

Had I only a copy of pure R to hand, this might be how I would have tackled the question – assuming that I had output the results of question 1, dimensioned by group, to a CSV file beforehand, in a format like this:

RespondentID | Group | Score
R1 | Group A | 4
R2 | Group B | 5
R3 | Group B | 5
R4 | Group A | 3
R5 | Group C | 1


# move to the folder containing the questionnaire extracts
setwd('../desktop/questionnaire')
# read in the question 1 responses (columns: RespondentID, Group, Score)
q1_data<-read.csv('q1_responses.csv')
# build a 2-column data frame: column 3 holds the scores, column 2 the groups
q1_dataframe<-setNames(data.frame(q1_data[3], q1_data[2]), c('scores','groups'))
# run the Kruskal-Wallis test of scores across groups
kruskal.test(scores ~ groups, data=q1_dataframe)

This would give me output like the below.


Kruskal-Wallis rank sum test
data: scores by groups
Kruskal-Wallis chi-squared = 2.8674, df = 2, p-value = 0.2384

To understand the syntax and what it returns, see the man page for kruskal.test here.

But to pick out a single point, it shows that the “p-value” is 0.2384, aka 24%. The p-value is the probability of seeing differences in scores between the groups at least this large if, in reality, group membership made no difference at all. Values above 5-10% are conventionally held to mean that any difference seen has not been statistically proven. Here, differences this big would arise about 1 time in 4 purely through random chance.
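
Incidentally, since we’ll shortly want just that single number rather than the whole printout, note that kruskal.test() returns a result object from which the p value can be extracted directly – the Tableau calculation below relies on exactly this:

# the test result is an object; $p.value pulls out just the number
kruskal.test(scores ~ groups, data=q1_dataframe)$p.value
# returns 0.2384 (to 4 decimal places), matching the printout above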

There are endless discussions/controversies as to the use of P values, and statistical vs “clinical” differences – but this is the end point we are aiming for right now.

I could then have repeated the above process for the other 149 combinations I wanted to test and constructed a table of p values (or scripted something to do the same), but it seemed more fun to use Tableau.
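
(For the record, the scripted pure-R route needn’t be long. Here’s a hedged sketch, assuming – hypothetically – that each question’s responses for a given dimension sit in files named q1_responses.csv through q30_responses.csv with the columns shown earlier:)

# loop over the 30 question files for one dimension, collecting a p value for each
p_values <- sapply(1:30, function(i) {
  q <- read.csv(sprintf("q%d_responses.csv", i))
  kruskal.test(Score ~ Group, data = q)$p.value
})
names(p_values) <- paste0("q", 1:30)
p_values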

First you have to set up Tableau to interface with R correctly. This means running an R server (called Rserve, handily enough), which is of course also free, and can be started from within the R interface if you have it open. Here are Tableau’s handy instructions, but it can be boiled down to:
install.packages("Rserve")  # one-off: install the package the first time
library(Rserve)             # load the package
Rserve()                    # start the R server that Tableau will connect to

A few notes:

  • It’s fine to run the Rserve on the same computer as Tableau is on.
  • You only have to enter the first line the first time you use Rserve.
  • Capitalisation of Rserve matters for some commands!
  • If you are going to use Rserve a lot more than you use R you can start it from an exe file instead, as detailed in the above Tableau article.
  • Workbooks using R will not work in Tableau Reader or Tableau Online.

Then carry on the process of setting up R as per the Tableau article. Notably you have to look in the Help menu (a strange place for it, perhaps) for “Settings and performance”, then “Manage R connection”, and then type in the name of your Rserve host. If you’re running it on your own computer, then choose “localhost” with no user/password.

R connection

Next, assuming you have your Tableau workbook already set up with your questionnaire data, create a new worksheet which filters to show only a single question, and has each unique person that answered that question and the grouping variable on the rows. This layout is important, as will be discussed later.

Layout

Now it’s time to write the formula that interfaces between Tableau and R.

This Tableau blog post explains in detail how to do that in general – but the essence is to use a calculated field with one of the Tableau SCRIPT_ functions that corresponds to the type of result (string, integer etc.) that you want to retrieve from R. You then pass it literal R code, and use the placeholder variables .arg1, .arg2, … .argN to tell it which data you want Tableau to send to R.

The calculated field will then send the R script – with the relevant data inserted as R vectors into the placeholders – to the R server, and wait for a result. The result can then be used inside Tableau just as any other table calculation can be.

For the Kruskal Wallis test, we saw above that the p value is returned as a decimal number. This is what Tableau calls a “real” number, so we use the SCRIPT_REAL function.

Within that function we enter R code resembling the last line in my R-only attempt, which was the one that actually did the calculations.

You can put more than one R code line in here if you want to. However, note that a Tableau R function can only handle returning a single value or vector by default. Here I have therefore specifically asked it to return just the p value, as it would be hard to handle the general table of results and associated text. Read the R documentation for any function to understand which other types of value it can return.

So, in Tableau, here’s the calculated field to create.

Kruskal Wallis p score:


SCRIPT_REAL(
"
kruskal.test(.arg1 ~ .arg2, data = data.frame(.arg1, .arg2))$p.value
",
AVG([Score]), ATTR([Group]))

If you compare it to my first R script, you can see that I want it to replace “.arg1” with the first argument I gave after the R script (AVG([Score])) and “.arg2” with the group (ATTR([Group])). It’s safe, and necessary, to use ATTR() as you have to pass it an aggregation. Because we set the layout to 1 row per respondent, we know each row will have only a single group associated with that respondent (assuming the relationship between respondents and groups is 1:many, which it needs to be for a successful KW test).

You could use .arg3, .arg4… etc. if you needed to pass more data to R, but that’s not necessary for the Kruskal Wallis test.

You can then double click your new calculated field and it’ll appear in the text pill and display the p value we saw above (remember that 0.2384 number from above? Hooray, it matches!) repeatedly, once per respondent. The nature of the KW test is that it returns one value that describes the whole population.

KW test
Being Tableau, we can filter to show another single question (see below), change the respondent pool, highlight, exclude and do all sorts of other things to the data, and the new p value for the given breakdown is immediately shown (subject to the rules of table calculations… remember that SCRIPT_ functions are table calculations!).

But what if we want the Kruskal P value for all the questions in the survey in one table, and not have to iterate manually through every question?

We can’t just unfilter or add questions willy-nilly, because a single p value will be calculated for the whole dataset of (question 1 + question 2) responses, instead of one being calculated for (question 1) and another for (question 2). This is unlikely to be useful.

KW whole population

See how there’s no 0.2384 value in the above. Instead, the same number is repeated for every single question and respondent. It compares the scores as though they were all for the same question.

By virtue of the SCRIPT_ Tableau functions being a table calculation though, this one is relatively easy to solve.

Right clicking on the R calculation field gives the normal Edit Table Calculation option. Pick that, and then select “Advanced” under Compute Using.

Simplistically, we want to partition the P values that are calculated and returned to 1 per survey question, so put “Question” in the partition box and the other fields into the addressing box.

Advanced compute by

Hit OK, Tableau refreshes its R stuff, and all is good. That 0.2384 is back (and the 0 for question two is the result of rounding a p value so small that it shows significance at all practical levels).
1 row per response

However, we’re getting 1 row for every respondent to every question instead of a nice summary of one row per question.

We can’t just remove the respondent ID from the Tableau view because, whilst this gives the layout we want, the SCRIPT_REAL calculation will pass only the aggregated values of the respondent data to R, i.e. one average value per question in this case, not 1 per person answering the question.

This will give either an error or a wrong answer, depending on which R function you’re trying to call. In this case it would be passing the average response to the question to R to perform the KW test on, whereas R needs each individual response to the question to provide its answers.

The solution to this issue is the same as the classic trick they teach in Tableau Jedi school that allows data to be dynamically hidden yet remain within scope of a table calculation. Namely: when you use a table calculation as a filter, the calculations are mostly done before the filter is applied – in contrast to a non-table-calculation filter, where the calculations are done on the post-filtered data.

So, create a simple dummy table calculation that returns something unique for the first row. For instance, create a calculated field just called “Index” that has the following innocuous function in it.


INDEX()

Now if you put that on filters and set it to only show result “1” (the first result) you will indeed get 1 result shown – and the 0.2384 P value remains correct.

1 row

But we actually wanted the first result per question, right, not just the single first result?

Luckily our index function is also a table calculation, so you can do exactly the same compute-by operation we did above to our R-based field.

Right click the index filter pill, go back to “Edit table calculation”, and fill the Compute Using -> Advanced box in in the same way as we previously did to our R calculation function, putting “Question” into the partition box and the rest into addressing.

Tada!

1 row per question

In the interests of neatness we can hide the now-redundant respondent ID and group columns, as they don’t tell us anything useful here. Right click the respective Rows pill and untick “Show header” to turn the display off, whilst leaving it in the pills section and hence in the level of detail.

Clean table

Bonus tips:

  • You can send the values of Tableau parameters through the R function, controlling either the function itself or the data it sends to R – so, for instance, allowing a novice user to change the questions scored or the groups they want to compare without having to mess with any calculated fields or filters (see the sketch at the end of this post).
  • The R-enabled calculation can be treated as any other table calculation in Tableau – so you can use it in other fields, modify it and so on. In this example, you could create a field that highlights questions where the test showed that the differences were significant at the 95% level, like this.

Highlight 95% significance:


IF [Kruskal Wallis test p value] <= 0.05 THEN "Significant" ELSE "Not significant" END

Pop it on the colours card and you get:

Highlight significance

Now it’s easy to see that, according to Kruskal-Wallis, there was a definite difference in the way that people answered question 2 depending on which group they were in, whereas at this stringent level, there was no proven difference in how they answered question 1.
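
Finally, on the first bonus tip above: here’s a hedged sketch of what passing a parameter through to R might look like, assuming a hypothetical Tableau string parameter called [Test type] that lets a user flip between the rank-based test and classic ANOVA without touching the calculated field:

// [Test type] is a hypothetical string parameter ("kruskal" or "anova"),
// passed to R as .arg3; like the data, it arrives as a vector, hence .arg3[1]
SCRIPT_REAL(
"
if (.arg3[1] == 'kruskal') {
  kruskal.test(.arg1 ~ .arg2)$p.value
} else {
  summary(aov(.arg1 ~ factor(.arg2)))[[1]][['Pr(>F)']][1]
}
",
AVG([Score]), ATTR([Group]), [Test type])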

You can download the complete workbook here.