Future features coming to Tableau 10.2 and beyond – that they didn’t blog about

Having slowly de-jetlagged from this year’s (fantastic and huge) Tableau conference, I’d settled down to write up my notes regarding the always-thrilling “what new features are on the cards?” sessions, only to note that Tableau have already done a pretty good job of summarising it on their own blog here, here and here.

There’s little point in my replicating that list verbatim, but I did notice a few things from the keynote announcements that weren’t immediately obvious in Tableau’s blog posts. I have listed some of them below for reference. Most are just fine details, but one or two seem more major to me.

Per the conference, I’ll divide this up into “probably coming soon” vs “1-3 year vision”.

Coming soon:

Select from tooltip – a feature that will no doubt seem like it’s always been there as soon as we get it.

We can already customise tooltips to show pertinent information about a data point that doesn’t influence the viz itself. For example, if we’re analysing sales and profit per customer on a scatter plot, perhaps we’d like the tooltip to show whether the customer is a recent customer or a long-term customer when hovered over.

In today’s world, as you hover over a particular customer’s datapoint, the tooltip indeed may tell you that it’s a recent customer. But what’s the pattern in the other datapoints that are also recent customers?

In tomorrow’s world you’ll be able to click where it tells you “recent customer” and all the other “recent customers” in the viz will be highlighted. You can get the same end result today with the highlighter tool, but this will likely be far more convenient in certain situations.

A couple of new web-authoring features, to add to the list on the official blog post.

  1. You can create storypoints on the web
  2. You’ll be able to enable full-screen mode on the web

Legends per measure: this might not sound all that revolutionary, but when you think it through, it enables this sort of classic viz: a highlighted table on multiple measures – where each measure is highlighted independently of the others.

[Image: legendpermeasure.PNG]
Average sales of £10,000 no longer has to mean that a high customer age of 100 in the same table is highlighted as though it were tiny.

Yes, there are workarounds today to make something that looks similar to the above – but it’s one of those features where people yet to be convinced of Tableau’s merits react negatively when it turns out not to be a simple operation, having compared it to other tools (Excel…). Whilst recreating what you made in another tool is often exactly the wrong approach to adopting a new tool, this type of display is one of the few for which I see a good case for making creation easy.

In the 1-3 year future:

Tableau’s blog does talk about the new super-fast data engine, Hyper, but doesn’t dwell on one cool feature that was demoed on stage.

Creating a Tableau extract is sometimes a slow process. Yes, Hyper should make it faster, but at the end of the day factors like remote database performance and network speed might mean there’s simply no practical way to speed it up. Today you’re forced to sit and stare at the extract creation process until it’s done.

Hyper, though, can do its extract-making process in the background, and let you use it piece-by-piece, as it becomes available.

So if you’re making an extract of sales from the last 10 years, but so far only the information from the last 5 years has arrived at the extract creation engine, you can already start visualising what happened in those 5 years. Of course you’ll not be able to see years 6–10 at the moment, as that data is still winging its way to you through the wifi. But you can rest safe in the knowledge that once the rest of the data has arrived it’ll automatically update your charts to show the full 10-year range. No more excuses for long lunches, sorry!

It seems to me that this, and features like incremental refresh, also open the door to enabling near real-time analysis within an extract.

Geographic augmentation – Tableau can plot raw latitude and longitude points with ease. But in practice, they are just x-y points shown over a background display; there’s no analytical awareness that point x,y falls within the state of Texas whereas point y,z falls within New York. But there will be. Apparently we will be able to roll up long/lat pairs to geographic components like zip, state, and so on, even when the respective dimension doesn’t appear in the data.
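
For a sense of what that roll-up involves under the hood, here’s a minimal point-in-polygon sketch in Python. The rectangles below are crude stand-ins for real state borders – the coordinates and region names are purely illustrative, not actual boundaries:

```python
# Classify raw long/lat points into named regions via a ray-casting
# point-in-polygon test. Real boundaries would be far more detailed.
def point_in_polygon(lon, lat, polygon):
    """Is (lon, lat) inside `polygon`, a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge crosses the horizontal ray through our point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Very rough bounding boxes, NOT real state borders
regions = {
    "Texas": [(-106.6, 25.8), (-93.5, 25.8), (-93.5, 36.5), (-106.6, 36.5)],
    "New York": [(-79.8, 40.5), (-71.8, 40.5), (-71.8, 45.0), (-79.8, 45.0)],
}

def roll_up(lon, lat):
    """Return the name of the region containing the point, if any."""
    for name, poly in regions.items():
        if point_in_polygon(lon, lat, poly):
            return name
    return "unknown"

print(roll_up(-97.7, 30.3))   # Austin-ish coordinates → Texas
print(roll_up(-73.9, 40.7))   # NYC-ish coordinates → New York
```

In practice this sort of lookup would be done against detailed shapefiles and spatial indexes, but the principle is the same.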

Web authoring – the end goal is apparently that you’ll be able to do pretty much everything you can do publishing-wise in Tableau Desktop on the web. In recent times, each iteration has added more and more features – but in the longer term, the aim is to get to absolute parity.

We were reassured that this doesn’t mean that the desktop product is going away; it’s simply a different avenue of usage, and the two technologies will auto-sync so that you could start authoring on your desktop app, and then log into a website from a different computer and your work will be there waiting for you, without the need to formally  publish it.

It will be interesting to see whether, and how, this affects licensing and pricing, as today there is a large price differential between, for instance, a Tableau Online account and Tableau Desktop Professional, at least in year one.

And finally, some collaboration features on Tableau Server.

The big one, for me, is discussions (aka comments).  Right alongside any viz when published will be a discussion pane. The intention is that people will be able to comment, ask questions, explain what’s shown and so on.

But, doesn’t Tableau Server already have this? Well, yes, it does have comments, but in my experience they have not been greatly useful to many people.

The most problematic issue in my view has been the lack of notifications. That is to say, a few months after publishing a delightful dashboard, a user might have a question about what they’re seeing and correctly pop a comment on the page displaying the viz. Great.

But the dashboard author, or whichever SME might actually be able to answer the question, isn’t notified in any way.  If they happen to see that someone commented by chance, then great, they can reply (note that the questioner will not be notified that someone left them an answer though). But, unless we mandate everyone in the organisation to manually check comments on every dashboard they have access to every day, that’s rather unlikely to be the case.

And just opening the dashboard up may not even be enough, as today they tend to be displayed “below the fold” for any medium-large sized dashboard. So comments go unanswered, and people get grumpy and stop commenting, or never notice that they can even comment.

The new system however will include @user functionality, which will email the user when a comment or question has been directed at them. I’m also hoping that you’ll be able to somehow subscribe to dashboards, projects or the server such that you get notified if any comments are left that you’re entitled to see, whether or not you’re mentioned in them.

As they had it on the demo at least, the comments also show on the right-hand side of the dashboard rather than below it – which, given desktop users tend to have wide rather than tall screens, should make them more visible. They’ll also be present in the mobile app in future.

Furthermore, each time a comment is made, the server will store and show the state of the visualisation at that time, so that future readers can see exactly what the commenter was looking at when they made their comments. This will be great for the very many dashboards that are set up to autorefresh or allow view customisation.

[Image: Conversation.PNG]

(My future comment wishlist #1: ability to comment on an individual datapoint, and have that comment shown wherever that datapoint is seen).

Lastly, sandboxes. Right now, my personal experience has been that there’s not a huge incentive to publish work-in-progress to a Tableau server in most cases. Depending on your organisation’s security setup, anything you publish might automatically become public before you’re ready, and even if not, then unless you’re pretty careful with individual permissions it can be the case that you accidentally share your file too widely, or not widely enough, and/or end up with a complex network of individually-permissioned files that are easy to get mixed up.

Besides, if you always operate from the same computer, there’s little advantage (outside of backups) to publishing it if you’re not ready for someone else to look at it. But now, with all this clever versioning, recommendy, commenty, data-alerty stuff, it becomes much more interesting to do so.

So, there will apparently be a user sandbox; a private area on the server where each Tableau user can upload and work on their files, safe in the knowledge that what they do there is private – plus they can customise which dashboards, metrics and so on are shown when they enter their sandbox.

But, better yet, team sandboxes! So, in one click, you’ll be able to promote your dashboard-in-progress to a place where just your local analytics team can see it, for instance, and get their comments, feedback and help developing it, without having to fiddle around with setting up pseudo-projects or separate server installations for your team.

Furthermore, there was mention of a team activity newsfeed, so you’ll be able to see what your immediate team members have been up to in the team sandbox since you last took a peek. This should help keep awareness of what each team member is working on high, further enhancing the possibilities for collaboration and reducing the likelihood of duplicate work.

Finally, it’s mentioned on Tableau’s blogs, but I wanted to extend a huge cheer and many thanks for the forthcoming data-driven alerting feature! Lack of this style of alerting and insufficient collaboration features were the two most common complaints I have heard about Tableau Server from people considering the purchase of something that can be decidedly non-trivial in cost. Other vendors have actually gone so far as to sell add-on products to try and add these features to Tableau Server, many of which are no doubt very good – but it’s simply impossible to integrate them into the overall Tableau install as seamlessly as Tableau themselves could do.

Now we’re in 2016, where the average Very Important And Busy Executive feels like they don’t have time to open up a dashboard to see where things stand, it’s a common and obvious feature request to want to be alerted only when there is actually something to worry about – which may then prompt opening the dashboard proper to explore what’s going on. And, I have no doubt, creative analysts are going to find any number of uses to put it to outside of the obvious “let me know if my sales are poor today”.

(My future data-driven alert wishlist #1: please include a trigger to the effect of “if this metric has an unusual value”, meaning to base it on a statistical calculation derived from historic variance/standard deviation etc., rather than having to put a flat >£xxxx in as the criterion.)
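
To illustrate the sort of trigger I mean, here’s a minimal sketch in Python – the metric history, today’s value and the 3-standard-deviation threshold are all invented for illustration; a real implementation would of course read the history from the server:

```python
# Flag a metric value as "unusual" if it sits too many standard
# deviations away from its own history (a simple z-score test),
# instead of a flat ">£xxxx" threshold.
from statistics import mean, stdev

def unusual_value(history, today, threshold=3.0):
    """True if `today` deviates from the historic mean by more than
    `threshold` sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# e.g. daily sales hovering around 10,000, then a sudden bad day
history = [9800, 10150, 9900, 10300, 10050, 9950, 10100]
print(unusual_value(history, 4000))   # flags the unusual day
print(unusual_value(history, 10000))  # a normal day goes unflagged
```

The same shape of check works for “unusually good” days too, since it looks at deviation in either direction.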

What people claim to believe: Hillary Clinton edition

Back to political opinion polls today I’m afraid. Yep, the UK’s Brexit is all done and dusted (haha) but now our overseas friends seem to be facing what might be an even more unlikely choice in the grand US presidential election 2016.

Luckily, the pollsters are on hand to guide us through the inner minds and intentions of the voters-to-be. At last glance, it was looking pretty good for a Clinton victory – although, be not complacent ye Democrats, given the lack of success in the field of polling with regards to the afore-mentioned Brexit or the 2015 General Election here in the UK.

Below is perhaps my favourite (most terrifying) poll of recent times, carried out by the organisation “Public Policy Polling” concerning residents of the state of Florida. As usual, they asked several questions about the respondents’ characteristics and viewpoints, which lets us divide up the responses into those coming from Clinton supporters vs those coming from Trump supporters.

There are many insidious facts one could elucidate here on both sides, but given that at the moment the main polls are very much in favour of a Clinton win (but see the previous comment re complacency…), let’s pick out some that might hold relevance in a world where Clinton semi-landslides to victory.

Firstly, it shouldn’t particularly matter, but one can’t help but notice that Clinton is of the female persuasion. But, hey, rational voters look at policies, competence, experience or similar attributes, so a basic demographic fact alone doesn’t matter, right?

Wrong: the survey shows that just 69% of all respondents thought that gender didn’t make a difference. And, predictably, twice as many thought that the US would be better off with a male president as thought it would be better off with a female president. The effect is notably strongest among Trump supporters, where nearly 20x as many people think the US would be better with a male president than with a female one.

[Image: manorwoman]

Now, I can imagine some kind of halo effect where it’s hard for people to totally differentiate “my favourite candidate is a man and I can’t imagine having a favourite candidate that is not like him” from “my favourite candidate is a man but the fact he happens to be a man is incidental”.

But the fact that nearly 40% of Trump supporters here claim that, generically, the president should be a man (implying that if it were Ms Trump vs Mr Clinton, they might vote differently) seems a potentially stronger signal of inequality than that, especially when compared to the lower bias among Clinton supporters towards preferring a woman – equally illogical, but at least with a lower incidence. We can also note a pro-male bias in the “not sure” population.

Of course we don’t actually have an example of what the US is like when it has a female president, because none of the 43 serving presidents to date have been women.

But we do know part of what Hillary Clinton is already presidentially responsible for, apparently. “Coincidentally” (hmm…) her husband was one of the previous 43 male presidents, and apparently the majority of Trump supporters think it’s perfectly right to hold her responsible for his “behaviour”.

Yep, anything he did, for good or bad (which, let’s face it, is probably biased towards the bad for those people who support the opposing party and/or don’t appreciate cheating spouses) is in some sense his wife’s fault, for the Trumpians.

[Image: responsible]

But if she’s so obviously bad, then why does she actually poll quite well, at the time of writing? Well, of course there can be only one reason. The whole election is a fraud. And given we haven’t actually had the election yet, I guess the allegation must also entail that poll respondents are lying about their intentions, and/or that all the publishers of polls are just as corrupt as the electoral system of the US.

[Image: rigged]

Yes, THREE-QUARTERS of Trump supporters polled here apparently believe that if, as seems quite likely, Clinton wins then it can only be because the election was rigged. The whole democratic process is a sham. The US has fallen prey to semi-visible forces of uber-powerful corruption. We should presumably therefore ignore the result and give Trump the golden throne (to fit inside his golden house). Choice of winner aside, this is a pretty scary indictment of the respect that citizens feel for their own democratic system. This is not to say whether they are right or wrong to feel this way; to us Brits, it sometimes seems that money has even greater hold over some theoretically democratic outcomes in the US than it does over here – but that so many have so little regard for the system is surely…a concern.

But wait, it’s not just that she may hypothetically commit electoral fraud in the near future. She has apparently already committed crimes serious enough that she should already be locked up in prison.

[Image: prison]

Over EIGHTY PERCENT of Trump supporters polled here think she should literally go to prison; and this isn’t predicated on her winning. Well, there’s no shortage of bad things that can be laid at her door, I’m sure; she has, after all, been serving at a high level of politics for a while already and, without being an expert, it seems like there are many serious allegations that people lay at the Clintons’ feet. But it’s perhaps quite surprising that the large majority of her opponent’s supporters want to throw someone who is likely to be their next president in jail. I don’t think even the Blair war-crimes movement ever got quite that far!

Unless…well. I’m only sad they didn’t ask the same question about Trump. Perhaps we could be more at ease if at least the same proportion of people thought he should be locked up. An oft-overlooked fact is that analysis is often meaningless without some sort of carefully-chosen comparison. Perhaps there’s a baseline percentage of people who think any given prominent politician should be jailed (but I’ve not seen research on that).

It’s hard to imagine, though, that the fact Trump has himself actually appeared to threaten her with jail doesn’t play some role here with his supporters. It is apparently unprecedented for a major party nominee to have said publicly that his opponent should be jailed – but say it he did, most famously during their second presidential debate. As the Guardian reports:

Trump, embracing the spirit of the “lock her up” mob chants at his rallies, threatened: “If I win I am going to instruct my attorney general to get a special prosecutor to look into your situation – there has never been so many lies and so much deception.”

Clinton said it was “awfully good” that someone with the temperament of Trump was not in charge of the law in the country, provoking another Trump jab: “Because you’d be in jail.”

Eric Holder, who once was the US attorney general, didn’t really seem to like that plan.

So we’ve established that, in the eyes of the average Florida Trump supporter polled here, if Clinton wins then the whole shebang was fraudulent, she should already have been locked up in prison and, besides, the fact that she’s a woman should probably bar her from applying for the office of president in the first place. That’s a strong indictment. But, of course, there’s another level to explore.

Is Hillary Clinton a malevolent paranormal entity, intent on destroying humankind?

[Image: demon]

Erm…2 out of every 5 Trump supporters here think yes, she definitely is an actual demon. And the majority aren’t sure that she is not an actual demon.

Only just over 50% of the “not sure” voters are confident she’s not an actual demon. It’s also entertaining to contemplate the c. 10% of her supporters who think she might be demonic yet still fancy her as president.

The lower figures might be down to some variant of the excellent SlateStarCodex’s concept of the “Lizardman’s Constant”, which can perhaps be summed up as: there’s a lower-bound percentage of people who will believe, or claim to believe, any polled sentiment.

But there they benchmark that at around 4%, and ten times that proportion of Trump supporters here respond that they are certain that Clinton is a literal demon. There are many ways to introduce biases that lead to this sort of result, which SlateStarCodex does go over. But 40% is…big…if this poll is even remotely respectable.

So, where has this idea that she’s a demon come from? Have Trump supporters as a collective seen some special evidence that proves this must be true, that somehow the rest of us have overlooked? Surely each individual doesn’t randomly become subject to these thoughts, which even believers would probably term an unusual state of affairs – is there no smoke without fire? (Pun intended.)

Well, perhaps it has something to do with a subset of famous-enough people having stated that she is.

Trump himself did refer to her as a devil, although in fairness that just maybe possibly might be an unfortunate turn of phrase, if we want to be charitable. After all, to his credit, evidence suggests he’s not great at following a script (or at least not one you’d imagine a typical political spinner would write).

Perhaps more pertinent, for a certain subsection of viewers anyway, is presenter Alex Jones of “Infowars” fame (a website that apparently gets more monthly visitors than e.g. the Economist or Newsweek) – he of whom Trump says “your reputation is amazing…I will not let you down” – who did go on a bit of a rant on this subject.

MediaMatters have kindly transcribed:

She is an abject, psychopathic, demon from Hell that as soon as she gets into power is going to try to destroy the planet. I’m sure of that, and people around her say she’s so dark now, and so evil, and so possessed that they are having nightmares, they’re freaking out… I mean this woman is dangerous, ladies and gentleman. I’m telling you, she is a demon. This is Biblical.

There’s so much more if you’re into that sort of stuff; see it all on this video, including the physical evidence he presents of Clinton’s demonness (spoiler alert: she smells bad, and Obama is obviously one too because sometimes flies land on him).

Unfortunately I’m not aware of time series data on perception of Clinton’s level of demonicness – so I’m afraid there’s no temporal analysis to present on causal factors here.

At first glance some of this might seem kind of amusing in a macabre way – especially to us foreigners, for whom the local political process is hugely less pleasant or equitable than it should be, but doesn’t usually come with claims of supernatural possession. But the outcome may not be so funny. In the likely (but not certain) event that Clinton wins, Florida at least seems to have a significant bunch of people who think the whole debacle was rigged, and that Clinton should have a gender change, an exorcism and a long spell in jail before even being considered for the presidency.

Update 1: this sort of stuff probably doesn’t help matters – from former Congressman / radio host Joe Walsh:

Update 2: the polls are a lot closer now than they were when I started writing.

Do good and bad viz choices exist?

Browsing the wonderful timeline of Twitter one evening, I noted an interesting discussion on subjects including Tableau Public, best practice, chart choices and dataviz critique. It’s perhaps too long to go into here, but this tweet from Chris Love caught my eye.

Not being particularly adept at summarising my thoughts into 140 characters, I wanted to explore some thoughts around the subject here. Overall, I would concur with the sentiment as expressed – particularly given it had to be crammed into such a small space, and taken out of context as I have it here 🙂

But, to take the first premise, whilst there are probably no viz types that are inherently terrible or universally awesome, I think one can argue that there are good or bad viz choices in many situations. It might be the case in some instances that there’s no best or worst viz choice (although I think we may find that there often is, at least out of the limited selection most people are inclined to use). Here I am imagining something akin to a data-viz version of Harris’ “moral landscape”; it may not be clear what the best chart is, but there will be local maxima that are unquestionably better for purpose than some surrounding valleys.

So, how do we decide what the best, or at least a good, viz choice is? Well, it surely comes down to intention. What is the aim of the author?

This is not necessarily self-evident, although I would suggest defaulting to something like “clearly communicating an interesting insight based on an amalgamation of datapoints” as a common one. But there are others:

  • providing a mechanism to allow end-users to explore large datasets which may or may not contain insights,
  • providing propaganda to back up an argument,
  • or selling a lot of books or artwork

to name a few.

The reason we need to understand the intention is because that should be the measure of whether the viz is good or bad.

Imagine my aim is to communicate, to an audience of ten may-as-well-be-clones business managers, that 10% of my customers are so unprofitable that we would be better off without them – note that the details of the audience are very important here too.

I’ll go away and draw 2 different visualisations of the same data (perhaps a bar chart and, hey, why not, a 3-d hexmap radial chart 🙂 ). I’ll then give version 1 to five of the managers, and version 2 to the other five. Half an hour later, I’ll quiz them on what they learned. Simplistically, I shall feel satisfied that whichever version generated the correct understanding in the most managers was the better viz in this instance.
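
Simplistically, the scoring could be sketched like so – every quiz outcome below is fabricated purely to show the mechanics:

```python
# Score each candidate viz by the fraction of its audience that took
# away the intended insight. Each boolean records whether one of the
# five managers shown that version answered the quiz correctly.
results = {
    "bar chart": [True, True, True, False, True],
    "3-d hexmap radial chart": [False, True, False, False, False],
}

def success_rate(answers):
    """Fraction of the audience that demonstrated the correct understanding."""
    return sum(answers) / len(answers)

for viz, answers in results.items():
    print(f"{viz}: {success_rate(answers):.0%}")

# The "better viz in this instance" is simply the higher-scoring one
best = max(results, key=lambda viz: success_rate(results[viz]))
print("better viz on this (fabricated) data:", best)
```

With only five respondents per version you’d want to be cautious about the difference being down to chance, which is exactly where the formal research mentioned below earns its keep.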

Yes yes, this isn’t a perfect double-blind controlled experiment, but hopefully the point is apparent. “Proper” formal research on optimising data visualisation is certainly done, and very necessary it is too. There are far too many examples to list, but classics in the field might include the paper “Graphical Perception” by Cleveland and McGill, which helped us understand which types of charts are conducive to being visually decoded accurately by us humans, with our built-in limitations.

Commercially, companies like IBM, Autodesk and Google have research departments tackling related questions. In academia, there are groups like the University of Washington Interactive Data Lab (which, interestingly enough, started out as the Stanford Visualization Group, whose work on “Polaris” was later released commercially as none other than Tableau software).

If you’re looking for ideas to contribute to on this front, Stephen Few maintains a list of some research he’d like to see done on the subject in future, and no doubt there are infinitely many more possibilities if none of those pique your curiosity.

But the point is: for certain given aims, it is often possible to use experimental procedures, and the resulting data, to say – as surely as we can say many things – that visualisation A is better than visualisation B at achieving its aim.

But let’s not go too far in expressing certainty here! There are several things to note, all contributing to the fact that very often there is not one best viz for a single dataset – context is key.

  • What is the aim of the viz? We covered that one already. Using a set of attractive colours may be more important than correct labelling of axes if you’re wanting to sell a poster, for instance. Certain types of chart make particular comparisons easier and more accurate than others. And if you’re trying to learn or teach how to create a particular type of uber-creative chart in a certain tool, then you’re going to rather fail to accomplish that if you end up making a bar chart.
  • Who is the audience? For example, some charts can convey a lot of information in a small space; for instance box-and-whisker plots. An analyst or statistician will probably very happily receive these plots to understand and compare distributions and other descriptive stats in the right circumstances. I love them. However, extensive experience tells me that, no, the average person in the street does not. They are far less intuitive than bar or line charts to the non-analytically inclined/trained. However inefficient you might regard it, a table and 3 histograms might communicate the insight to them more successfully than a boxplot would. If they show an interest, by all means take the time to explain how to read a box plot; extol the virtues of the data-based lifestyle we all know; rejoice in being able to teach a fellow human a useful new piece of knowledge. But, in reality, your short-term job is more likely to be to communicate an important insight rather than provide an A-level statistics course – and if you don’t do well at fulfilling what you’re being employed to do, then you might not be employed to do it for all that long.

As well as there being no single best viz type in a generic sense, there’s also no one universally worst viz type. If there was, the datarati would just ban it. Which, I guess, some people are inclined to do – but, sorry, pie charts still exist. And they’re still at least “locally-good” in some contexts – like this one (source: everywhere on the internet):

[Image: pie]

But, hey, you don’t have the time to run multiple experiments on multiple audiences. Let’s imagine you’re also quite new to the game, with very little personal experience. How would you know which viz type to pick? Well, this is going to be a pretty boring answer, sorry – and there’s more to elaborate on later – but one way relates to the fact that, just like in any other field, there are actually “experts” in data viz. And outside of Michael Gove’s deluded rants, we should acknowledge they usually have some value.

In 1928, Bertrand Russell wrote an essay called ‘On the Value of Scepticism’, where he laid out 3 guidelines for life in general.

(1) that when the experts are agreed, the opposite opinion cannot be held to be certain;

(2) that when they are not agreed, no opinion can be regarded as certain by a non-expert;

and (3) that when they all hold that no sufficient grounds for a positive opinion exist, the ordinary man would do well to suspend his judgment.

So, we can bastardise these a bit to give them a dataviz context. If you’re really unsure of what viz to pick, then refer to some set of experts (in the choosing of which, we must acknowledge, there’s subjectivity…perhaps more on this in future).

If “experts” mostly think that data of type D, used to convey an insight of type I to an audience of type A for purpose P, is best represented in a line chart, then that’s probably the way to go if you don’t have substantial reason to believe otherwise. Russell would say that at least you can’t be held as being “certainly wrong” in your decision, even if your boss complains. Likewise, if there’s honestly no concurrence in opinion, then have a go and take your pick of the suggestions – again, no-one can claim you did something unquestionably wrong!

For example, my bias is towards feeling that, when communicating “standard” insights efficiently via charts to a literate but non-expert audience, you can’t go too far wrong in reading some of Stephen Few’s books. Harsh and austere they may seem at times, but I believe them to be based on quality research in fields such as human perception as well as experience in the field.

But that’s not to say that his well-founded, well-presented guidelines are always right. Just because 90% of the time you might be most successful in representing a certain type of time series as a line chart doesn’t mean that you always will be. Remember also, you may have a totally different aim, or audience, to those at whom Mr Few aims his books, in which case you cannot assume at all that the same best-practice standards would apply.

And, despite the above guidelines, because (amongst other reasons) not all possible information is ever available to us at any given time, sometimes experts are simply wrong. It turns out that the earth probably isn’t the centre of the universe, despite what you’d probably hear if you went back to the experts of a millennium ago. You should just take care to find some decent reason to doubt the prevailing expertise, rather than simply ignoring it.

What we deem as the relative “goodness” of data viz techniques is also surely not static over time. For one, not all forms of data visualisation have existed since the dawn of mankind.

The aforementioned box and whisker plot is held to have been invented by John Tukey. He was only born in 1915, so if I were to travel back 200 years in time with my perfectly presented plot, it’s unlikely I’d find many people who find it intuitive to interpret. Hence, if my aim were to communicate insights quickly and clearly, this would, on the balance of probabilities, be a bad attempt. It may not be the worst attempt, as the concept is still valid and could likely be explained to some inhabitants of the time – but in terms of bang for buck, there’d no doubt be higher peaks in the “communicating data insights quickly” landscape available to me nearby.

We should also remember that time hasn’t stopped. Contrary to Francis Fukuyama’s famous essay and book, we probably haven’t reached the end of history even politically just yet, and we most certainly haven’t done so in the world of data. Given the rate of usable data creation, it might be that we’ve only dipped our toe in so far. So, what we think is best practice today may likely not be the same a hundred years hence; some of it may not be so even next year.

Some, but not all, obstacles or opportunities surround technology. Already the world has moved very quickly from graph paper, to desktop PCs, to people carrying around super-computers with small screens in their pockets. The most effective, most efficient ways to communicate data insights will differ in each case. As an example I’m very familiar with, Tableau clearly acknowledged this in their last release, which includes facilities for displaying data differently depending on what device it’s being viewed on. Not that we need to throw the baby out with the bathwater, but even our hero Mr Tukey may not have had the iPhone 7 in mind when considering optimum data presentation.

Smartwatches have also appeared, albeit not so mainstream at the moment. How do you communicate data stories when you have literally an inch of screen to play with? Is it possible? Almost certainly so, but probably not in the same way as on a 32-inch screen. And are the personal characteristics and needs of smartwatch users the same as those of the audience who view vizzes on a larger screen anyway?

And what if Amazon (Echo), Google (Home) and others are right to think that in the future a substantial amount of our information-based interactions may be done verbally, to a box that sits on the kitchen counter and doesn’t even have a screen? What does “data visualisation” mean in this context? Is it even a thing? A lot of the questions I might want to ask my future good friend Alexa might well be questions that can only be answered by some transformation and re-presentation of data in audio form.

I can already verbally ask my phone to provide me with some forms of dataviz. In the below example, it shows me a chart and a summary table. It also provides a very brief audio summary for the occasions when I can’t view the screen, shown in the bold text above the chart. But I can’t say I’ve heard a huge amount of discussion about how to optimise the audio part of the “viz” for insight. Perhaps there should be.

[Image: phone voice-search result showing a chart, a summary table, and a brief bold audio summary above the chart]

Technology aside though, the field should not rest on its laurels; the line chart may or may not ever die, but experimentation and new ideas should always be welcomed. I’d argue that in many cases we may be able to prove that, today, for a given audience, for a given aim, with a given dataset, one of the various visualisations we most commonly have access to is demonstrably better than another – and that we can back that up via the scientific method.

But what if there’s an even better one out there we never even thought of? What if there is some form of time series that is best visualised in a pie chart? OK, it may seem pretty unlikely but, as per other fields of scientific endeavour, we shouldn’t stop people testing their hypotheses – as long as they remain ethical – or the march of progress may be severely hampered.

Plus, we might all be out of a job. If we fall into the trap of thinking the best of our knowledge today is the best of all knowledge that will ever be available, that the haphazard messy inefficiencies of creativity are a distraction from the proven-efficient execution of the task at hand, then it’ll not be too long before a lot of the typical role of a basic data analyst is swallowed up in the impending march of our robotic overlords.

Remember, a key job of a lot of data-people is really to answer important questions, not to draw charts. You do the second in order to facilitate the first, but your personal approach to insight generation is often in actuality a means to another end.

Your customer wants to know “in what month were my sales highest?”. And, lo and behold, when I open a spreadsheet in the sort of technology that many people treat as the norm these days, Google sheets, I find that I can simply type or speak in the question “What month were my sales highest?” and it tells me very clearly, for free, immediately, without employing anyone to do anything or waiting for someone to get back from their holiday.

[Image: Google Sheets answering the typed question about which month sales were highest]

Yes, that feature only copes with pretty simplistic analysis at the moment, and you have to be careful how you phrase your questions – but the results are only going to get better over time, and spread into more and more products. Microsoft Power BI already has a basic natural language feature, and Tableau is at a minimum researching one. Just wait until this is all hooked up to the various technological “cognitive services” already on offer in some form or other. A reliable, auto-generated answer to “what will my sales be next week if I launch a new product category today?” may free up a few more people to spend time with their family, euphemistically or otherwise.

So in the name of progress, we can and should, per Chris’ original tweet, be open to giving and receiving constructive criticism, whether positive or negative. There is value in this, even in the unlikely event that we have already hit on the single best, universal way of representing a particular dataset for all time.

Recall John Stuart Mill’s famous essay, “On Liberty” (written in 1859 – yes, even before the boxplot existed). It’s so very quotable for many parts of life, but let’s take for example a paragraph from chapter two, regarding the “liberty of thought and discussion”. Why shouldn’t we ban opinions, even when we believe we know them to be bad opinions?

But the peculiar evil of silencing the expression of an opinion is, that it is robbing the human race; posterity as well as the existing generation; those who dissent from the opinion, still more than those who hold it.

If the opinion is right, they are deprived of the opportunity of exchanging error for truth: if wrong, they lose, what is almost as great a benefit, the clearer perception and livelier impression of truth, produced by its collision with error.

Are pie charts good for a specific combination of time series data, audience and aim?

Well – assuming a particularly charitable view of human discourse – after rational discussion we will either establish that yes, they actually are, in which case the naysayers can “exchange error for truth” to the benefit of our entire field.

Or, if the consensus view of “no way” holds strong, then, having been tested, we will have reinforced the reason why this is in both the minds of the questioner, and ourselves – hence helping us remember the good reasons why we hold our opinions, and ensuring we never lapse into the depths of pseudo-religious dogma.

Remember the exciting new features Tableau demoed at #data15 – have we got them yet?

As we get closer towards the thrills of this year’s Tableau Conference (#data16), I wanted to look back at one of the most fun parts of last year’s conference – the “devs on stage” section. That’s the part where Tableau employees announce and demonstrate some of the new features they’re working on. No guarantees are made as to whether they’ll ever see the light of day, let alone be in the next release – but, in reality, the audience gets excited enough that there’d probably be a riot if none of them ever turned up.

Having made some notes of what was shown in last year’s conference (which was imaginatively entitled #data15), I decided to review the list and see how many of those features have turned up so far. After all, it’s all very well to announce fun new stuff to a crowd of 10,000 over-excited analysts…but does Tableau tend to follow through on it? Let’s check!

(Please bear in mind that these are just the features I found significant enough to scrawl down through the jet-lag; it’s not necessarily a comprehensive review of what was on show.)

Improvements in the Data category:

  • Improvements to the recently released automatic data cleanup feature that can import Excel-type files formatted in an otherwise painful way for analysis – Yes: Tableau 9.2 brought features like “sub-table detection” to its data interpreter feature.
  • Can now understand hundreds of different date formats – Hmm… I’m not sure. I’ve not had any problems with dates, but then again I was lucky enough never to have many!
  • The Data Source screen will now allow Tableau to natively “union” data (as in SQL UNION), as well as join it, just by clicking and dragging – Yes: Tableau 9.3 allows drag-and-drop unioning, but only on Excel and text files. Here’s hoping they expand the scope of that to databases in the future.
  • Cross-database joins – Yes: cross-database joins are in Tableau 10.

Improvements in the Visualisation category:

  • Enhancements to the text table visualisation – Yes: Tableau 9.2 brought the ability to show totals at the top of columns, and 9.3 allowed excluding totals from colour-coding.
  • Data highlighter – Yes: Tableau 10 includes the highlighter feature.
  • New native geospatial geographies – Yes: 9.2 and 9.3 both added or updated some geographies.
  • A connector to allow connection to spatial data files – No: I don’t think I’ve seen this one anywhere.
  • Custom geographic territory creation – Yes: Tableau 10 has a couple of methods to let you do that.
  • Integration with Mapbox – Yes: Tableau 9.2 lets you use Mapbox maps.
  • Tooltips can now contain worksheets themselves – No: not seen this yet.

Improvements in the Analysis category:

  • Automatic outlier detection – No.
  • Automatic cluster detection – Yes: that’s a new Tableau 10 feature.
  • You can now “use” reference lines/bands for things beyond just static display – Hmm… I don’t recall seeing any changes in this area. No?

Improvements in the Self-Service category:

  • There will be a custom server homepage for each user – Not sure: the look and feel of the home page has changed, and the user can mark favourites etc., but I have not noticed huge changes in customisation from previous versions.
  • There will be analytics on the workbooks themselves – Yes: Tableau 9.3 brought content analytics to workbooks on Server. Some metadata is shown in the content lists directly, plus you can sort by view count.
  • Searching will become better – Yes: this also came with Tableau 9.3. Search shows you the most popular results first, with indicators as to usage.
  • Version control – Yes: Tableau 9.3 brought workbook revision history for Server, and Tableau 10 enhanced it.
  • Improvements to the security UI – Yes: not 100% sure which version, but the security UI changed, and new features were also added, such as setting and locking project permissions in 9.2.
  • A web interface for managing Tableau Server – Not sure about this one, but I don’t recall seeing it anywhere. I’d venture “no”, but am open to correction!

Improvements in the Dashboarding category:

  • Improvements to web editing – Yes: most versions of Tableau since then have brought improvements here. In Tableau 10 you can create complete dashboards from scratch via the web.
  • Global formatting – Yes: this came in Tableau 10.
  • Cross-datasource filtering – Yes: this super-popular feature also came with Tableau 10.
  • Device preview – Yes: this is available in Tableau 10.
  • Device-specific dashboards – Yes: also from Tableau 10.

Improvements in the Mobile category:

  • A Tableau iPhone app – Yes: download it here. An Android app was also released recently.
  • An iPad app, Vizable – This was actually launched at #data15, so yes, it’s here.

Summary

Hey, a decent result! Most of the features demonstrated last year are already in the latest official release.

And for some of those that aren’t, such as outlier detection, it feels like a framework has been put in place for the possible later integration of them. In that particular case, you can imagine it being located in the same place, and working in the same way, as the already-released clustering function.

There are perhaps a couple that it’s slightly sad to see haven’t made it just yet – I’m mainly thinking of embedded vizzes in tooltips here. From the celebratory cheers, that was pretty popular with the assembled crowds when demoed in 2015, so it’ll be interesting to see whether any mention of development on that front is noted in this year’s talks.

There are also some features released that I’d like to see grow in scope – the union feature would be the obvious one for me. I’d love to see the ability to easily union database tables beyond Excel/text sources. And now we have cross-database joins, perhaps even unioning between different technology stacks.
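For anyone less familiar with the join/union distinction that this wish rests on, here is a minimal sketch using Python’s built-in sqlite3 module, with invented table and column names: a join matches rows side by side, while a union – roughly what Tableau’s drag-and-drop feature does, in the UNION ALL sense – stacks rows from similarly shaped tables.

```python
import sqlite3

# Hypothetical monthly sales tables, purely for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales_jan (customer TEXT, amount REAL)")
cur.execute("CREATE TABLE sales_feb (customer TEXT, amount REAL)")
cur.executemany("INSERT INTO sales_jan VALUES (?, ?)", [("Acme", 100.0), ("Beta", 50.0)])
cur.executemany("INSERT INTO sales_feb VALUES (?, ?)", [("Acme", 120.0)])

# A join widens the data: one row per matching customer, columns side by side.
joined = cur.execute("""
    SELECT j.customer, j.amount, f.amount
    FROM sales_jan j JOIN sales_feb f ON j.customer = f.customer
""").fetchall()

# A union lengthens the data: rows from both tables stacked into one column set.
unioned = cur.execute("""
    SELECT customer, amount FROM sales_jan
    UNION ALL
    SELECT customer, amount FROM sales_feb
""").fetchall()

print(joined)   # the single matched customer
print(unioned)  # all three rows, stacked
```

The point of wanting this across databases, of course, is that plenty of real-world data arrives as one identically shaped table per month or per region, held in different systems.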

Bonus points due: In my 2015 notes, I had mentioned that a feature I had heard a lot of colleague-interest in, that was not mentioned at all in the keynote, was data driven alerting; the ability to be notified only if your KPI goes wild for instance. Sales managers might get bored of checking their dashboards each day just to see if sales were down when 95% of the time everything is fine, so why not just send them an email when that event actually occurs?

Well, the exciting news on that front is that some steps towards that have been announced for Tableau 10.1, which is in beta now so will surely be released quite soon.

Described as “conditional subscriptions”, the feature will allow you to “receive email updates when data is present in your viz”. That’s perhaps a slight abstraction from the most obvious form of data-driven alerting. But it’s easy to see that, with a bit of thought, analysts will be able to build vizzes that give exactly the sort of alerting functionality my colleagues, and many many others in the wider world, have been asking for. Thanks for that, developer heroes!

 

Help decide who self-driving cars should kill

Automated self-driving cars are surely on their way. Given the direction of technological development, this seems a safe enough prediction to make – at least when taking the coward’s option of not specifying a time frame.

A self-driving car is, after all, a data processor, and we like to think that we’re getting better at dealing with data every day. Simplistically, in such a car sensors provide some data (e.g. “there is a pedestrian in front of the car”), some automated decision-making module comes up with an intervention (“best stop the car”), and a process is carried out to enact that decision (“put the brakes on”).
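That sense/decide/act description can be caricatured in a few lines – a deliberately toy sketch, bearing no resemblance to any real vehicle’s software, with all names invented:

```python
# A toy sketch of the sense -> decide -> act loop described above.
def sense(environment):
    # Sensors reduce the messy world to structured data.
    return {"pedestrian_ahead": environment.get("pedestrian_ahead", False)}

def decide(observation):
    # An (absurdly simplified) decision module maps data to an intervention.
    return "brake" if observation["pedestrian_ahead"] else "continue"

def act(intervention):
    # Actuators carry the decision out; here we just report it.
    return f"action taken: {intervention}"

result = act(decide(sense({"pedestrian_ahead": True})))
print(result)  # action taken: brake
```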

Here for example is a visualisation of what a test Google automated car “sees”.

[Image: visualisation of what a Google test car “sees”]

My hope and expectation is that, when they have reached a sophisticated enough level of operation and are at a certain threshold of prevalence, road travel will become safer.

Today’s road travel is not super-safe. According to the Association for Safe International Road Travel, around 1.3 million people die in road crashes each year – and 20-50 million more are injured or disabled. It’s the single leading cause of death amongst some younger demographics.

Perhaps automated vehicles could save some of these lives, and prevent many of the serious injuries. After all, a few years ago, The Royal Society for the Prevention of Accidents claimed that 95% of road accidents involve some human error, and 76% were solely due to human factors. There is a lot at stake here. And of course there are many more positive impacts (as well as some potential negatives) one might expect from this sort of automation beyond direct life-saving, which we’ll not go into here.

At this moment in time, humanity is getting closer to developing self-driving cars; perhaps surprisingly close to anyone who does not follow the topic. Certainly we do not have any totally automated car capable of (or authorised to be) driving every road safely at the moment, and that will probably remain true for a while yet. But, piece by piece, some manufacturers are automating at least some of the traditionally human aspects of driving, and several undoubtedly have their sights on full automation one day.

Some examples:

Land Rover are shortly to be testing semi-autonomous cars that can communicate with other such cars around them.

The test fleet will be able to recognise cones and barriers using a forward facing 3D-scanning camera; brake automatically when it senses a potential collision in a traffic jam; talk to each other via radio signals and warn of upcoming hazards; and know when an ambulance, police car, or fire engine is approaching.

BMW already sells a suite of “driver assistance” features on some cars, including what they term intelligent parking, intelligent driving and intelligent vision. For people with my driving skill level (I’m not one of the statistically improbable 80% of people who think they are above average drivers), clearly the parking assistant is the most exciting: it both finds a space that your car would actually fit into, and then does the tricky parallel or perpendicular parking steering for you. Here it is in action:

Nissan are developing a “ProPilot” feature, which also aims to help you drive safely, change lanes automatically, navigate crossroads and park.

Tesla have probably the most famous “autopilot” system available right now. This includes features that will automatically keep your car in lane at a sensible speed, change lanes safely for you, alert the driver to unexpected dangers and park the car neatly. This is likely most of what you need for full automation on some simpler trips, although Tesla are clear it’s a beta feature and that it’s important you keep your hands on the steering wheel and remain observant when using it. Presumably pre-empting our inbuilt tendency towards laziness, it even goes so far as to sense when you haven’t touched the wheel for a while and tells you to concentrate, eventually coming to a stop if it can’t tell you’re still alive and engaged.

Here’s a couple of people totally disobeying the instructions, and hence nicely displaying its features.

And here’s how to auto-park a Tesla:

 

Uber seems particularly confident (when do they not?). Earlier this month, the Guardian reported that:

Uber passengers in Pittsburgh will be able to hail self-driving cars for the first time within the next few weeks as the taxi firm tests its future vision of transportation in the city. The company said on Thursday that an unspecified number of autonomous Ford Fusions will be available to pick up passengers as with normal Uber vehicles. The cars won’t exactly be driverless – they will have human drivers as backup – but they are the next step towards a fully automated fleet.

[Image: an Uber self-driving car]

 

And of course Google have been developing a fully self-driving car for a few years now. Here’s a cheesy PR video to show their fun little pods in action.

But no matter how advanced these vehicles get, road accidents will inevitably happen.

In recent times there has been a fatality famously associated with the Tesla autopilot – although, as Tesla are at pains to point out, it is technically a product in beta and they are clear that you should always concentrate on the road and be ready to take over manually. So in reality this accident might best be attributed to a mix of the autopilot and the human.

However, there will always be some set of circumstances or seemingly unlikely event that neither human nor computer would be able to handle without someone getting injured or killed. Computers can’t beat physics: if another car is heading up your one-way road – which happens to have a brick wall on one side and a high cliff on the other – at 100 mph, then some sort of bad incident is going to happen. The new question we have to ask ourselves in the era of automation is: exactly what incident should that be?

This obviously isn’t actually a new question. In the uncountable number of human-driven road incidents requiring some degree of driver intervention to avoid danger that happen each day, a human is deciding what to do. We just don’t codify it so formally. We don’t sit around planning it out in advance.

In the contrived scenario I described above, where you’re between a wall and a cliff with an oncoming car you can’t get around, perhaps you instinctively know what you’d do. Or perhaps you don’t – but if you are unfortunate enough to have it happen to you, you’ll likely do something. This may or may not be the same action you’d rationally pick beforehand, given the scenario. We rely on a mixture of human instinct, driver training and reflexes to handle these situations, implicitly accepting that the price of over a million deaths a year is worth paying to be able to undergo road travel.

So imagine you’re the programmer of the automated car. Perhaps you believe you might eliminate just half of those deaths if you do your job correctly – which would of course be an awesome achievement. But the car still needs to know what to do if it finds itself between a rock and a hard place. How should it decide? In reality, this is complicated far further by the fact that there are a near-infinite number of possible scenarios and no-one can explicitly program for each one (hence the need for data-sciencey techniques that learn from experience rather than simple “if X then Y” code). But, simplistically, what “morals” should your car be programmed with when it comes to potentially deadly accidents?

  • Should it always try and save the driver? (akin to a human driver’s instinct for self-preservation, if that’s what you believe we have.)
  • Or should it concentrate on saving any passengers in the same car as the driver?
  • How about the other car driver involved?
  • Or any nearby, unrelated, pedestrians?
  • Or the cute puppy innocently strolling along this wall-cliff precipice?
  • Does it make a difference if the car is explicitly taking an option (“steer left and ram into the car on the opposite side of the road”) vs passively continuing to do what it is doing (“do nothing, which will result in hitting the pedestrian standing in front of the wall”)?
    • You might think this isn’t a rational factor, but anyone who has studied the famous “trolley problem” thought experiment will realise people can be quite squeamish about this. In fact, this whole debate boils down to some extent as being a realisation of that very thought experiment.
  • Does it make a difference how many people are involved? Hitting a group of 4 pedestrians vs a car that has 1 occupant? Or vice versa?
  • What about interactions with probabilities? Often you can’t be 100% sure that an accident will result in a death. What if the choice is between a 90% chance of killing 1 person or a 45% chance of killing two people?
  • Does it make a difference what the people are doing? Perhaps the driver is ignoring the speed limit, or pedestrians are jaywalking somewhere they shouldn’t. Does that change anything?
  • Does it even perhaps make a difference as to who the people involved are? Are some people more important to save than others?
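To make the probability question above concrete: the obvious first tool is expected value, and that example was chosen so that it gives no answer at all.

```python
# The two choices from the probability bullet above.
p_one, deaths_one = 0.90, 1   # 90% chance of killing one person
p_two, deaths_two = 0.45, 2   # 45% chance of killing two people

expected_one = p_one * deaths_one
expected_two = p_two * deaths_two

print(expected_one, expected_two)  # both 0.9 expected deaths
```

A strict expected-value utilitarian is indifferent between the two options, so whatever extra principle breaks the tie – risk aversion, a preference for inaction, something else – is exactly the kind of “moral” that would have to be written into the car.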

Well, the MIT Media Lab is now giving you the opportunity to feed into those sorts of decisions, via its Moral Machine website.

To quote:

From self-driving cars on public roads to self-piloting reusable rockets landing on self-sailing ships, machine intelligence is supporting or entirely taking over ever more complex human activities at an ever increasing pace. The greater autonomy given machine intelligence in these roles can result in situations where they have to make autonomous choices involving human life and limb. This calls for not just a clearer understanding of how humans make such choices, but also a clearer understanding of how humans perceive machine intelligence making such choices.

Effectively, they are crowd-sourcing life-and-death ethics. This is not to say that any car manufacturer will necessarily take the results into account, but at least they may learn what the responding humans (which we must note is far from a random sample of humanity) think they should do, and the level of certainty we feel about it.

Once you arrive, you’ll be presented with several scenarios and asked what you think the car should do in each. There will always be some death involved (although not always human death!). It’ll also give you a textual description of who is involved and what is happening. It’s then up to you to pick which of the two options given the car should take.

Here’s an example:

[Image: an example Moral Machine scenario]

You see there that a child is crossing the road, although the walk signal is on red, so they should really have waited. The car can choose to hit the child who will then die, or it can choose to ram itself into an inconvenient obstacle whereby the child will live, but the driver will die. What should it do?

You get the picture; click through a bunch on those and not only does MIT gather a sense of humanity’s moral data on these issues, but you get to compare yourself to other respondents on axes such as “saving more lives”, “upholding the law” and so on. You’ll also find out if you have implied gender, age or “social value” preferences in who you choose to kill with your decisions.

This comparison report isn’t going to be overly scientific on an individual level (you only have a few scenarios to choose from apart from anything else) but it may be thought-provoking.

After all, networked cars of the future may well be able to consult the internet and use the facts they find there to aid decisions. A simple extension of Facebook’s ability to face-recognise you in your friends’ photos could theoretically lead to input variables in these decisions like “Hey, this guy only has 5 Twitter friends, he’ll be less missed than this other one who has 5000!” or “Hey, this lady has a particularly high Klout score (remember those?) so we should definitely save her!”.

You might not think we’d be so callous as to allow the production of a score regarding “who should live?”. Well, firstly, we have to: having the car kill someone by not changing its direction or speed, when the option is there that it could do so, is still a life-and-death decision, even if it results in no new action.

Plus, we already use scores in domains that involve mortality. Perhaps stretching the comparison to its limits, here’s one example (and please do not take it that I necessarily approve or disapprove of its use – that’s a story for another day; it’s just the first one that leaps to mind).

The National Institute for Health and Care Excellence (NICE) provides guidance to the UK National Health Service on how to improve healthcare. The NHS, nationalised as it is (for the moment…beware our Government’s slow massacre of it though), still exists within the framework of capitalism and is held to account on sticking to a budget. It has to buy medicines from private companies and it can only afford so many. This implies that not everyone can have every treatment on the market. So how does it decide what treatments should be offered to who?

Under this framework, we can’t simply go on “give whatever is most likely to save this person’s life”, because some of the best treatments may cost so much that giving one to 10 people, of whom 90% will probably be cured, might mean that another 100 people, who could have been treated at an 80% success rate, will die because there was no money left for the cheaper treatment.
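To put numbers on that trade-off – and these figures are purely illustrative, not real prices or cure rates – the budget arithmetic looks like this:

```python
# Purely illustrative numbers: a fixed budget and two hypothetical treatments.
budget = 1_000_000

# Expensive treatment: cures ~90% of those who receive it.
cost_expensive, cure_rate_expensive = 100_000, 0.90
treated_expensive = budget // cost_expensive            # 10 patients treated
cured_expensive = treated_expensive * cure_rate_expensive

# Cheaper treatment: cures ~80% of those who receive it.
cost_cheap, cure_rate_cheap = 10_000, 0.80
treated_cheap = budget // cost_cheap                    # 100 patients treated
cured_cheap = treated_cheap * cure_rate_cheap

print(cured_expensive, cured_cheap)  # 9 vs 80 expected cures
```

Same budget, and the “worse” treatment saves far more lives in expectation – which is exactly why some notion of cost-effectiveness gets dragged into the decision.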

So how does it work? Well, to over-simplify, they have famously used a data-driven process involving a quality-adjusted life year (QALY) metric.

A measure of the state of health of a person or group in which the benefits, in terms of length of life, are adjusted to reflect the quality of life. One QALY is equal to 1 year of life in perfect health.

QALYs are calculated by estimating the years of life remaining for a patient following a particular treatment or intervention and weighting each year with a quality-of-life score (on a 0 to 1 scale). It is often measured in terms of the person’s ability to carry out the activities of daily life, and freedom from pain and mental disturbance.

At least until a few years ago, they had guidelines that an intervention costing the NHS less than £20k per QALY gained was deemed cost-effective. It’s vital to note that this “cost effectiveness” was not the only factor that fed into whether the treatment should be offered, but it was one such factor.
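As a sketch of the arithmetic – with an entirely hypothetical treatment, not a real NICE appraisal:

```python
# A sketch of cost-per-QALY arithmetic; the treatment figures are invented.
def cost_per_qaly(cost, extra_years, quality_weight):
    # One QALY = one year of life in perfect health, so the years gained
    # are weighted by the 0-1 quality-of-life score before dividing.
    return cost / (extra_years * quality_weight)

# E.g. a hypothetical £30,000 treatment expected to add 4 years at 0.5 quality
# gives 2 QALYs, i.e. £15,000 per QALY - under the historical £20k guideline.
ratio = cost_per_qaly(30_000, extra_years=4, quality_weight=0.5)
print(ratio)  # 15000.0
```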

This seemingly quite emotionless method of measurement sits ill with many people: how can you value life in money? Isn’t there a risk that it penalises older people? How do you evaluate “quality”? There are many potential debates, both philosophical and practical.

But if this measure isn’t to be used, then how should we decide how to divide up a limited number of resources when there’s not enough for everyone, and those who don’t get them may suffer, even die?

Likewise, if an automated car cannot keep everyone safe, just as a human-driven car has never been able to, then on which measure, involving which data, should we base the decision as to who to save?

But even if we can settle on a consensus answer to that, and technology magically improves to the point where implementing it reliably is child’s play, actually getting these vehicles onto the road en masse is not likely to be simple. Yes, time to blame humans again.

Studies have already looked at the sort of questions that the Moral Machine website poses. “The Social Dilemma of Autonomous Vehicles” by Bonnefon et al. is a paper, published in the journal Science, in which the researchers ran their own surveys as to what people thought these cars should be programmed to do in terms of the balance between specifically protecting the driver vs minimising the total number of casualties, which may include other drivers, pedestrians, and so on.

In general respondents fitted what the researchers termed a utilitarian mindset: minimise the number of casualties overall, no need to try and save the driver at all costs.

In Study 1 (n = 182), 76% of participants thought that it would be more moral for AVs to sacrifice one passenger, rather than kill ten pedestrians (with a 95% confidence interval of 69—82). These same participants were later asked to rate which was the most moral way to program AVs, on a scale from 0 (protect the passenger at all costs) to 100 (minimize the number of casualties). They overwhelmingly expressed a moral preference for utilitarian AVs programmed to minimize the number of casualties (median = 85, Fig. 2a).

(This is also reflected in the results of the Moral Machine website at the time of writing.)
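As an aside, the quoted interval is roughly what a standard normal approximation for a binomial proportion gives for 76% of n = 182 – a quick sanity check:

```python
import math

# Normal approximation to a 95% binomial proportion interval: 76% of n = 182.
n, p = 182, 0.76
standard_error = math.sqrt(p * (1 - p) / n)
lower = p - 1.96 * standard_error
upper = p + 1.96 * standard_error

print(f"{lower:.3f} to {upper:.3f}")  # 0.698 to 0.822
```

That matches the reported 69–82 to within rounding (the paper may well have used a slightly different interval method).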

Hooray for the driving public; selfless to the last, every life matters, etc. etc. Or does it?

Well, later on, the survey tackled questions around not only what these vehicles should do in emergencies, but how comfortable respondents would personally be if vehicles did behave that way – and lastly, how likely they would be to buy one that exhibited that behaviour.

Of course, even in thought experiments, bad things seem worse if they’re likely to happen to you or those you love.

even though participants still agreed that utilitarian AVs were the most moral, they preferred the self-protective model for themselves.

Once more, it appears that people praise utilitarian, self-sacrificing AVs, and welcome them on the road, without actually wanting to buy one for themselves.

Humans, at least in that study, appear to have a fairly high consensus that minimising casualties is key in these decisions. But we also have a predictable tendency to be the sort of freeloaders who prefer everybody else to follow a net-safety-promoting policy, as long as we don’t have to ourselves. This would seem to be a problem that it’s unlikely even the highest quality data or most advanced algorithm will solve for us at present.

The Tableau #MakeoverMonday doesn’t need to be complicated

For a while, a couple of key members of the insatiably effervescent Tableau community, Andy Cotgreave and Andy Kriebel, have been running a “Makeover Monday” activity. Read more and get involved here – but a simplistic summary would be that they distribute a nicely processed dataset on a topic of the day that relates to someone else’s existing visualisation, and all the rest of us Tableau fans can have a go at making our own chart, dashboard or similar to share back with the community, so we can inspire and learn from each other.

It’s a great idea, and generates a whole bunch of interesting entries each week. But Andy K noticed that each Monday’s dataset was getting way more downloads than the number of charts later uploaded, and opened a discussion as to why.

There are of course many possible reasons, but one that came through strongly was that, whilst they were interested in the principle, people didn’t think they had the time to produce something comparable to some of the masterpieces that frequent the submissions. That’s a sentiment I wholeheartedly agree with, and, in retrospect – albeit subconsciously – it’s why I never gave it a go myself.

Chris Love, someone who likely interacts with far more Tableau users than most of us do, makes the same point in his post on the benefits of Keeping It Simple Stupid. I believe it was written before the current MakeoverMonday discussions began in earnest, but was certainly very prescient in its applications to this question.

Despite this awesome community many new users I speak to are often put off sharing their work because of the high level of vizzes out there. They worry their work simply isn’t up to scratch because it doesn’t offer the same level of complexity.

 

To be clear, the original Makeover Monday guidelines did state that it was quite proper to just spend an hour fiddling around with the data. But firstly, after a hard day battling against the dark forces of poor data quality and data-free decisions at work, it can be a struggle to keep on trucking for another hour, however fun it would be in other contexts.

And that’s if you can persuade your family that they should let you keep tapping away for another hour doing what, from the outside, looks kind of like you forgot to finish work. In fact, a lot of the admiration I have for the Zen Masters comes from how they fit what they do into their lives.

But, beyond that, an hour is not going to be enough to “compete” with the best of what you see other people doing in terms of presentation quality.

I like to think I’m quite adept with Tableau (hey, I have a qualification and everything :-)), but I doubt I could create and validate something like this beauty using an unfamiliar dataset on an unfamiliar topic in under an hour.

 

It’s beautiful; the authors of this and many other Monday Makeovers clearly have an immense amount of skill and vision. It is fascinating to see both the design ideas and technical implementation required to coerce Tableau into doing certain non-native things. I love seeing this stuff, and very much hope it continues.

But if one is not prepared to commit the sort of time needed to do that regularly to this activity, then one has to try and get over the psychological difficulty of sharing a piece of work which one perceives is likely to be thought of as “worse” than what’s already there. This is through no fault of the MakeoverMonday chiefs, who make it very clear that producing a NYT infographic each week is not the aim here – but I certainly see why it’s a deterrent from more of the data-downloaders uploading their work. And it’s great to see that topic being directly addressed.

After all, for those of us who use Tableau for the day-to-day joys of business, we probably don’t rush off and produce something like this wonderful piece every time some product owner comes along to ask us an “urgent” question.

Instead, we spend a few minutes making a line chart that gives them some insight into the answer to their question. We upload an interactive bar chart, with default Tableau colours and fonts, to let them explore a bit deeper, and so on. We sit in a meeting and dynamically provide an answer, enabling live decision-making that, before we had tools like this, would have meant waiting a couple of weeks for a CSV report. Real value is generated, and people are sometimes even impressed, despite the fact that we didn’t include hand-drawn iconography, gradient-filled with the company colours.

Something like this perhaps:

Yes, it’s “simple”, and it’s unlikely to go Tableau-viral, but it makes a key story held within that data very clear to see. And it’s far more typical of the day-to-day Tableau use I see in the workplace.

For the average business question, we probably do not spend a few hours researching and designing a beautiful colour scheme in order to perform the underlying maths needed to make a dashboard combining a hexmap, a Sankey chart and a network graph in a tool that is not primarily designed to do any of those things directly.

No-one doubts that you can cajole Tableau into such artistry, that there is sometimes real value obtainable by doing so, or that those who carry it out may be creative geniuses – but unless they have a day job that is very different from mine and my colleagues’, then I suspect it’s not their day-to-day either. It’s probably more an expression of their talent and passion for the Tableau product.

Pragmatically, if I need to make, for instance, a quick network chart for “business”, then, all other things being equal, I’m afraid I’m more likely to get out a tool that’s designed to do that than take a bit more time to work out how to implement it in Tableau, no matter how much I love it. (By the way, Gephi is my tool of choice for that – it is nowhere near as user-friendly as Tableau, but it is specifically designed for that sort of graph visualisation; recent versions of Alteryx can also do the basics.) Honestly, it’s rare for me that these more unusual charts need to be part of a standard dashboard; our organisation is simply not at a level of viz-maturity where these diagrams are the most useful for most of the intended audience, and I suspect the same is true of many organisations.

And if you’re a professional whose job is creating awesome newspaper-style infographics, then I suspect that, more often than not, you’re not using Tableau as the tool that provides the final output either. That’s not its key strength in my view; that’s not how they sell it – although they are justly proud of the design-thought that goes into the software in general. But if the paper WSJ is your target audience, you might be better off using a more custom design-focused tool, like Adobe Illustrator (and Coursera will teach you that specific use-case, if you’re interested).

I hope nothing here will cause offence. I do understand the excitement and admire anyone’s efforts to push the boundaries of the tool – I have done so myself, spending way more time than is strictly speaking necessary in terms of a theoretical metric of “insights generated per hour” to make something that looks cool, whether in or out of work. For a certain kind of person it’s fun, it is a nice challenge, it’s a change from a blue line on top of an orange line, and sometimes it might even produce a revelation that really does change the world in some way.

This work surely needs to be done; adherents to (a bastardised version of) Thomas Kuhn’s theory of scientific revolutions might even claim this “pushing to the limits” as one of the ways of engendering the mini-crisis necessary to drive forward real progress in the field. I’m sure some of the valuable Tableau “ideas”, which feed the development of the software in part, have come from people pushing the envelope, finding value, and realising there should be an easier way to generate it.

There’s also the issue of engagement: depending on your aim, optimising your work for being shared worldwide may be more important to you than optimising it for efficiency, or even clarity and accuracy. This may sound like heresy, and it may even touch on ethical issues, but I suspect a survey of the most well-known visualisations outside of the data community would reveal a discontinuity with the ideals of Stephen Few et al!

But it may also be intimidating to the weary data voyager when deciding whether to participate in these sort of Tableau community activities if it seems like everyone else produces Da Vinci masterpieces on demand.

Now, I can’t prove this with data right now, sorry, but I just think it cannot be the case. You may see a lot of fancy and amazing things on the internet – but that’s the nature of how stuff gets shared around; it’s a key component of virality. If you create a default line chart, it may actually be the best answer to a given question, but outside a small community that is actively interested in the subject domain at hand, it’s not necessarily going to get much notice. I mean, you could probably find someone who made a Very Good Decision based even on those ghastly Excel 2003 default charts with the horrendous grey background, if you try hard enough.

excel2003

Never forget…

 

So, anyway, time to put my money where my mouth is and actually participate in MakeoverMonday. I don’t need to spend even an hour making something if I don’t want to, right? (After all, I’ve used up all my time writing the above!)

Tableau is sold with emphasis on its speed of data sense-making, claiming to enable producing something reasonably intelligible 10-100x faster than other tools. If we buy into that hype, then spending 10 minutes of Tableau time (necessitating making one less cup of tea, perhaps) should enable me to produce something that could have taken up to 17 hours in Excel.
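Taking that marketing claim at face value, the "17 hours" figure is just the upper bound of the claimed range applied to 10 minutes – a quick sanity check (the numbers are from the claim above; the code is purely illustrative arithmetic):

```python
# If 10 minutes of Tableau work is 100x faster than the alternative
# (the upper bound of the "10-100x" claim), the equivalent effort elsewhere is:
tableau_minutes = 10
claimed_speedup = 100

equivalent_hours = tableau_minutes * claimed_speedup / 60
print(f"{equivalent_hours:.1f} hours")  # prints "16.7 hours" - i.e. roughly 17 hours
```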

OK, that might be pushing the marketing rather too literally, but the point is hopefully clear. For #MakeoverMonday, some people may concentrate on how far they can push Tableau outside of its comfort zone, others may focus on how they can integrate the latest best practice in visual design, whereas here I will concentrate on whether I can make anything intelligible in the time that it takes to wait for a coffee in Starbucks (on a bad day) – the “10 minute” viz.

So here’s my first “baked in just 10 minutes” viz on the latest MakeoverMonday topic – the growth of the population of Bermuda. Nothing fancy, but hey, it’s a readable chart that tells you something about the population change in Bermuda over time. Click through for the slightly interactive version – although it has, for instance, the nasty default tooltips, thanks to the 10 minutes running out just as I was changing the font for the chart titles…

Bermuda population growth.png

 

 

#VisualizeNoMalaria: Let’s all help build an anti-Malaria dataset

As well as just being plain old fun, data can also be an enabler for “good” in the world. Several organisations are clearly aware of this; both Tableau and Alteryx now have wings specifically for doing good. There are whole organisations set up to promote beneficial uses of data, such as DataKind, and a bunch of people write reports on the topic – for example, Nesta’s report “Data for good”.

And it’s not hard to get involved. Here’s a simple task you can do in a few minutes (or a few weeks if you have the time) from the comfort of your home, thanks to a collaboration between Tableau, PATH and the Zambian government: Help them map Zambian buildings.

Why so? For the cause of eliminating the scourge of malaria from Zambia. In order to effectively target resources at malaria hotspots (and, in future, to predict where the disease might flare up), they’re

developing maps that improve our understanding of the existing topology—both the natural and man-made structures that are hospitable to malaria. The team can use this information to respond quickly with medicine to follow up and treat individual malaria cases. The team can also deploy resources such as indoor spraying and bed nets to effectively protect families living in the immediate vicinity.

Zambia isn’t like Manhattan. There’s no nice straightforward grid of streets that even a crazy tourist could understand with minimal training. There’s no 3D-Google-Earth-style building-level resource available. The task at hand is therefore establishing, from satellite photos, a detailed map of where buildings – and hence people – are. One day, no doubt, an AI will be employed for this job, but right now it remains one for us humans.

Full instructions are in the Tableau blog post, but honestly, it’s pretty easy:

  • If you don’t already have an OpenStreetMap user account, make a free one here.
  • Go to http://tasks.hotosm.org/project/1985 and log in with the OpenStreetMap account
  • Click a square of the map, choose “edit in iD editor”, scan around looking for buildings and have fun drawing boxes on top of them.

It may not be a particularly fascinating activity for you over the long term, but it’s more fun than a game of Threes – and you’ll be helping to build a dataset that may one day save a serious number of lives, amongst other potential uses.

Well done to all concerned for making it so easy! And if you’ve never poked around the fantastic collaborative project that is OpenStreetMap itself, there’s a bunch of interesting stuff available there for the geographically-inclined data analyst.

 

Is the EU referendum actually a great conspiracy?

(Sorry to anyone bored by the great/hideous Brexit referendum – this is the last post on the topic, well, at least until the event actually happens 🙂 )

Today is the day!  All us UK citizens can cast our direct-democracy vote as to whether the UK should remain in the EU, or say goodbye. It’s been a long, torrid, at times revolting, journey in terms of output from the campaigners, politicians and media. “It is as though the sewers have burst”, said Nick Cohen in the Observer, somewhat accurately. But the vote is today and it’ll therefore all be over soon.

Or will it? YouGov have surveyed on many, many EU-referendumy topics. One of the latest included questioning respondents on various conspiracy-esque statements about the result of the referendum. I don’t use the word “conspiracy” in a necessarily derogatory tone – some perceived “conspiracies” turn out to be true, although many do not.

Anyway, here were the statements offered up to the public to pronounce on whether they thought they were probably true, probably false or don’t know.

  • There are plans for further EU integration and enlargement that the EU are deliberately not announcing till after the referendum
  • The BBC & ITN are not commissioning an exit poll in order to allow the vote to be fixed without anyone telling
  • MI5 is working with the UK government to try and stop Britain leaving the EU
  • It is likely that the EU referendum will be rigged

I have listed them in my perception of order of seriousness, although several are open to interpretation regarding the scope and intentionality they imply. The first just relates to the timing of announcing EU events; the last implies the literal undermining of the entire democratic process – a pointless referendum beholden to corrupt, criminal actors.

But what did the respondents think of these? Did anyone seriously think that MI5 spies are secretly influencing the result? (*) That the whole referendum is a fraudulent scam?

(*) Well, it’s not quite MI5, but when the Conservative peer Baroness Warsi recently changed her view from Leave to Remain, there were people suggesting she was a  secret Remain campaign plant all this time. Amongst other far more horrific diatribes that I am reluctant to reproduce on this site.

Well, it turns out the answer is yes: a fair number of people do agree with these statements. Please click through and interact with the visualisation below in order to see the proportion of people agreeing with each statement, with the ability to break it down by age, gender, social grade, region, which political party they voted for in the 2015 general election, and – perhaps most interestingly – how they reported they intend to vote in the EU referendum itself: Leave vs Remain.

EU referendum conspiracy theory poll2

 

A few things I noticed:

There’s a sizeable number of people who agree with every one of those statements. That’s not to say that they are the same single cohort of people in each case, as the data is too high-level to determine that, but every statement has at least 15% of people in favour. There’s not one statement that over half the surveyed people thought was probably false. Not one.

To take perhaps the most dramatic one – nearly a third of the surveyed population think that it’s likely that the EU referendum will be rigged. If this implies “direct” rigging i.e. fiddling with the results, then this is quite a terrifying indictment on our view of the legitimacy of our democratic process.

Sidenote: there does seem to be a movement to “bring your own pen” to the voting stations today, under the premise that the pencils poll booths traditionally offer leave marks on the ballot papers that could be easily erased and replaced. This seems like one of the most annoying and time-consuming ways I could imagine of fixing an election result, though! If you’re going to believe in an over-arching conspiracy here, then I suspect MI5 could have far more efficient methods…

When splitting by demographics and behaviour, clear differences emerge. Flicking through the interactive version will show you the full details not represented in the below text – but in summary, for most statements:

  • A fairly similar proportion of females and males believe they are true. But for those that don’t, females are more likely to say they don’t know whereas males are more likely to go for probably false.
  • Those of social grades ABC1 are generally less likely to think any of the statements are probably true than C2DE, and more likely to think they’re probably false.
  • There is a strong difference in the beliefs of the voters based on whether they’re likely to vote for Leave or Remain. Without exception, the Leavers are more likely to think the statements are probably true than the Remainers. The proportion of Leavers who think the referendum is likely rigged is over four times the proportion of Remainers.

    EU referendum conspiracy theory poll

  • Digging down deeper into (the somewhat correlated, but not fully so) variable of which political party they supported in the 2015 election, there is one hugely obvious outlier. Those who voted UKIP are way more likely to agree with the statements than others, particularly regarding whether the EU referendum will be rigged. A majority, nearly two thirds, of UKIP voters believe this to be true, in comparison to between 14 and 23% of voters for other parties.

So, what does this mean?

Well, it shows a distinct lack of faith in the system set up for this referendum and trust in the “powers that be” – which is perhaps somewhat understandable, considering the ways the various campaigns have been run.

At first glance, though, the sheer level of disbelief in the overall integrity of the system seems a notably unhealthy sign of the times – although I would like to see similar stats from previous years in order to determine whether the figure of 28% believing the referendum will be rigged is “normal”. If so, it could certainly explain the non-amazing turnout the UK generally sees in elections.

…except that there’s a curious interaction regarding voting intention, political party and turnout. In a previous post here, we saw that UKIP supporters are one of the subsections of society that appear to be most likely to say that they will turn up and participate in the referendum. However, this is also the segment that is by far the most sceptical of the result being legitimate. UKIP was likely also one of the driving forces that led towards the referendum being called in the first place: if there was no visible block of desire to leave the EU, an issue that UKIP was originally set up to dedicate itself to, then there would have been no reason for a referendum.

That’s not to say other political parties don’t have members with anti-EU views in them, who are individually in places where they might be expected to have a higher influence in political shenanigans than the average UKIP candidate. Two of the highest profile Leave campaigners, Boris Johnson and Michael Gove, are both high-up members of the Conservative party.

But, simplistically, the people who most demanded the referendum and are most likely to go and vote in it also seem to be the people who are least likely to believe its results. Are we seeing a political form of Pascal’s wager?!

It would also suggest that no matter what the result is, the debate will be far from over. Particularly if the result goes to Remain, it seems like nearly half of those voting to Leave may feel that it has been rigged (of course, people are likely to forgive rigging more if it produces the answer they want). And even if it goes to Leave, one in ten Remainers are seemingly sceptical of its legitimacy already – a sizeable number of people who, even without the psychology surrounding losing a vote to those with different beliefs, believe that the entire system is invalid.

So recently we have learned:

Hey, it’s almost as if it isn’t really the time or place for such a consequential question about the future of the UK to be determined in this manner. Is it too late to call the whole thing off? (answer: yes, I guess it is).

Brexit: Which newspapers support Leave and which Remain?

Being a glutton for punishment, another Brexit question struck me. Which newspapers are formally standing in the Leave camp, and which in the Remain?

This question might strike you as beyond obvious based on the typical political outlook they adhere to and the output of their columnists – but it turns out it’s not as straightforward as I imagined.

Please feel free to click through and interact with the below dashboard. In the full version you can use a dropdown selector to colour code the marks based on who owns the paper, its general political outlook and which party it supported in the 2015 UK general election.

Where do newspapers officially stand on Brexit

A couple of things stood out to me:

  • Right now, the big arguments for Leave come to us tinged with (well, totally submerged in) appeals to the right wing of the political spectrum. However, there are papers that typically hold right-wing views that are pro-Remain, albeit a minority. All the more left-wing papers that have declared are pro-Remain.
  • In fact even within papers owned by the same organisation / person, it can be that some back Leave and some Remain.

    The big shocker to me here was the Mail on Sunday backing Remain. One of the big scare campaigns from Leave boils down to “dreadful immigrants will come and eat your children if you don’t vote Leave”. The Mail on Sunday famously loves this sort of stuff – a 5-second Google found “Free hotels for the Calais stowaways in soft touch Britain” as a prime example of what they publish.

    Now, whether this is proprietors hedging their bets, or decisions made at an editor rather than proprietor level I do not know – but it’s not quite what I expected. You can see the same sort of division in the Murdoch papers too.
    Capture

The EU referendum: do voters understand what they’re voting on?

The UK’s EU referendum is now less than a week away. We’re each going to individually vote on something that could dramatically affect the future of our lives and even the structure of society in the UK, so it’s a potentially important one. Recent sick tragedies have added to the mess that is the provably wrong claims from the Leave camp, and the responses from the Remain camp that seem largely ineffective, and possibly not much less biased.

Clear-cut facts seem in short supply within the public consciousness; and yet surely one of the assumptions behind the validity of a direct-democracy referendum is that those who are enfranchised to participate in the decision have something akin to “perfect knowledge” about what they are to vote on. Or at least pretty-good knowledge, if we want to grant some leniency.

If one knows almost nothing, or, worse yet, holds false beliefs around the issues to be balloted on, then to choose the option most in-line to the priorities of the voter themself, irrespective of what they are, becomes a matter of chance, a dangerous reliance on instinct, or a matter of fallible heuristics. Logically, one might assume then that the voters with the most true knowledge about the relevant issues would be in a position to make “better” decisions.

So, do we, the British population, have a decent knowledge of the key issues that apparently govern the EU battleground? Battles are being fought between camps on economics, immigration, legislative power and democratic credentials. There have been various polls on this issue, trying to establish the knowledge of the electorate. Below I have chosen one from Ipsos Mori, who asked various subsections of the eligible voting population to give their views on several “EU facts”.

One of the subsections they divided on was the self-reported response as to whether the respondent was thinking to vote for Leave, Remain or was currently undecided. This opens up the possibility I wanted to investigate: is one side more well informed about the relevant facts than the other? One might – arguably – then risk a claim that this is the side that may be executing more effectively with regard to “data driven decision making”, being technically more “qualified” to participate in decisions relating to matters of this domain.

This is admittedly arguable for several reasons. Firstly, the precise choice of the questions may not be an accurate reflection of the points of highest relevance to this decision. Ipsos Mori could not ask about every possible EU fact, so there is a possible selection bias here. However, they did ask questions on most of the topics that each side specifically campaigns on, so it seems in line with what the campaigners think are the priorities driving people’s decisions.

Another question is, with the confusing contradictory mess of the claims being put out there, is it really safe to say there is a “correct” answer? For some potential questions, my view is no: establishing a true net economic value of the EU seems beyond us at the moment for instance. However, Ipsos Mori did at least work with an external “fact checking charity”, Full Fact, to try and establish a set of questions and respective answers that could be held as independently true.

Unfortunately, Ipsos seem to have decided to release the detailed results of the survey (which is nice) in a 500+ page PDF (which is not). So to get to the bottom of my question, it seemed appropriate to extract and visualise some of this data.

Let’s get to it!

Economics

Please tell me whether you think the following statement is true or false: The UK annually pays more into the EU’s budget than it gets back

Most of the respondents were correct in thinking that the UK pays more into the EU budget than it receives back (directly back is what is implied, I believe – more on this later). 90% of Leave fans believed this, although only just over half of Remainers and Undecideds chose the correct option. Both of those groups were more uncertain, and a quarter of Remainers incorrectly thought we received back more than we put in.

Correct answer according to Ipsos Mori: TRUE

Sheet 1

There is a widely-known argument that the financial benefits of being in the EU are nonetheless net-positive due to things like the increased ease of business, investment and so on. The StrongerIn campaign writes:

And we get out more than we put in. Our annual contribution is equivalent to £340 for each household and yet the CBI says that all the trade, investment, jobs and lower prices that come from our economic partnership with Europe is worth £3000 per year to every household.

The whole financial aspect of the decision is one heavily campaigned on, very selectively, by the different sides, to the point where they seem to contradict each other directly (not rare). It’s possible that, in the resulting confusion, some respondents included those non-direct factors, under which interpretation the statement could be considered false. It might have been helpful if the question had made it very clear that it was about direct transfers of money, with no external factors.

Winner: Leave

What proportion of Child Benefit claims awarded in the UK do you think are for children living outside the UK in other countries in the European Economic Area (EEA)?

Organisations such as MigrationWatch, and the obvious media outlets that like to cause drama with such figures, have stated that the UK is paying a pile of expensive-sounding child benefits for children that live outside the UK, in the EU. It’s true that this, in accordance with the current law, is happening. But what proportion of child benefit is actually going abroad like that? Is it a worrying amount (if one could even set a mark as to when it would become worrying)?

Correct answer according to Ipsos Mori: 0.3%

Sheet 1

OK, we’re all way out here! Only 11% of both the Leave and Remain camps got this right. Leave were more likely to estimate stupendously high amounts: almost half of Leave thought it was at least 13%, which would overstate reality by around 43x (and 20% thought it was nearly a third – a whole two orders of magnitude higher than real life).
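Those overestimation factors are easy to verify against the 0.3% correct answer – a quick back-of-envelope check (the percentages are the ones quoted above; the code is just illustrative arithmetic):

```python
import math

# Correct share of Child Benefit claims going to children abroad, per Ipsos Mori
true_share = 0.3  # percent

leave_guess_mid = 13.0   # "at least 13%" - almost half of Leave respondents
leave_guess_high = 33.0  # "nearly a third" - around 20% of Leave respondents

print(leave_guess_mid / true_share)               # ~43.3: overstating reality by ~43x
print(math.log10(leave_guess_high / true_share))  # ~2.04: two orders of magnitude out
```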

That’s not to say the Remainers were correct: over half still over-estimated it to various degrees, and around 10% understated it.

Winner: Remain.

In 2014, international investment into the UK was £1,034bn. To the best of your knowledge, what share of this total amount do you think comes from businesses based in the following countries or regions?

Part of the Remain case for remaining in the EU is the supposed positive effect it has on investment in the UK. A letter from a bunch of business folk publicised by the StrongerIn campaign says:

…almost three-quarters of foreign investors cite access to the EU’s single market as a key reason for their investment in Britain.

The Vote Leave campaign disagrees on the significance:

Trade, investment and jobs will benefit if we Vote Leave… Today the USA is a more important source of investment in the UK than the EU is.

This “share of investment” is not the only metric of significance to this discussion, but it is relevant. Does the UK get a lot of investment from EU, or is it mere pennies? How well do we know where investment comes from today?

Correct answer according to Ipsos Mori: The EU provided 48% of international investment into the UK in 2014.

Sheet 2

All groups very much underestimated the percentage of international investment that comes from the EU – the Leavers most extremely, with a median response of 28% vs the Remainers’ 35%. In both cases this seems largely down to wildly overestimating the amount of investment that comes from China.

Winner: Remain

 

To the best of your knowledge, what share of this budget do you think was spent on staff, administration and maintenance of buildings?

We accept that for the EU to exist, it has to have a budget, which has to be paid for by those within it (and arguably a few of those outside of it, but that’s a different story). But is it spent in a way likely to make an effective impact, or does the majority of it go on bureaucratic administration tasks and staff costs?

Correct answer according to Ipsos Mori: 6%

Sheet 2

Ha, way out, on both sides. Leave people are the most inaccurate, thinking that the proportion of the EU budget going on admin, staff and buildings is actually 5x larger than it is. Remain aren’t that much better though, estimating it to be over 3x reality.

Winner: Remain

Please identify the top 3 contributors to the EU budget in 2014

Respondents were then given a list of 10 countries and asked to identify which were the top 3, in descending order, in terms of contribution to the EU budget – i.e. what was the direct financial cost of the EU to them.

Correct answer according to Ipsos Mori: 

  1. Germany
  2. France
  3. Italy

Top:

Sheet 4

Second most:

Sheet 4

Third most:

Sheet 4

Well, both Leave and Remain did better than half marks when stating which country made the highest contribution to the EU budget – Germany. But identifying #2 and #3 were trickier; no subpopulation got even half marks on identifying the correct answer.

Of course, probably the most relevant datapoint driving people’s voting decisions is where voters think the UK sits in the ranking of budget pay-ins. It isn’t actually in the top 3 (it is in fourth place, after Italy). However, most respondents clearly thought it did feature in the top 3 contributors. Leave were particularly bad for this, with nearly a third thinking it was the single top contributor, and around 90% convinced it was in the top 3. The figures for Remain don’t show a huge pile of knowledge either – 17% and 80% respectively.

Winner: Remain

Please identity the three which received the most from the EU in 2014.

The above covers putting money into the budget, but part of what the EU does is give money back to countries directly, for example to support farming or the development of more deprived areas of a country. So, same 10 countries: can we identify the top three in terms of receiving money directly from the EU budget?

Correct answer according to Ipsos Mori: 

  1. Poland
  2. France
  3. Spain

Top: Sheet 4

Second most: Sheet 4

Third most: Sheet 4

Hmm, we’re even worse at understanding the money flowing back into EU countries! Around half of all populations knew that Poland receives the most, OK. But after that the uncertainty was huge, with no more than 1 in 5 people of any sub-population coalescing around any answer, right or not.

Focusing on the UK, about 9% of Leavers thought it was somewhere in the top 3 recipients, whereas the Remainers were much more wrong about this, with 22% claiming the UK was in that list.

Winner: Leave

Democracy

Please tell me whether you think the following statement is true or false: The members of the European Parliament (MEPs) are directly elected by the citizens of each member state they represent

MEPs are our representatives in Europe, and yes, they are elected by us. The last election was in 2014, although with a pathetic turnout of 34% it does sound like the majority of Britain didn’t notice. But do we at least know that these people, for whom we could have voted 2 years ago, exist?

Correct answer according to Ipsos Mori: Yes

Sheet 1

Umm…not really. A little over half of Leave and nearly two thirds of Remain knew that they elect MEPs, but that still leaves a highly significant number of people who are either convinced that MEPs are unelected, or don’t know. The next chance to elect UK MEPs is likely to be in 2019, so let’s hope we can spread the word before then.

Winner: Remain (but not by a lot)

Laws and regulations

Which of the following, if any, are laws or restrictions that are in place, due to be put in place, or are suggested by the EU for implementation in the UK?

Ah, the EU laws craziness! Did you know, Europe bans us from <<insert anything fun>> and makes us do <<insert anything miserable>>? Well, in honesty, it does have some influence on what will later be entered into British law.

Below is a list of a few fun potential legislative bits and pieces. Which ones are actually true and somehow related to the EU? As there are quite a few of them, the answers according to Ipsos Mori are inline.

Sheet 3

Actually, we all did better than I expected. Only 8% of Leavers thought we’d have to rename our sausages as “emulsified high fat offal tubes”, which funnily enough the EU hasn’t made us do. Maybe we should anyway. Sausages aren’t that great for you.

Perhaps more interestingly, most of us don’t realise the restrictions that the EU has influenced us towards – although the list is perhaps “summarising” it a bit. The classic “Bendy Bananas Ban” has been categorised here as true (which only 35% of Leavers thought was the case, vs an even worse 15% of Remainers).

You’ll no doubt be amazed to hear that the law doesn’t actually read “you can’t have bananas that are too bendy”. But it does actually come from somewhere in terms of real legislation. To be exact (brace yourself for excitement): the COMMISSION IMPLEMENTING REGULATION (EU) No 1333/2011.

It states that:

…subject to the special provisions for each class and the tolerances allowed, the bananas must be…free from malformation or abnormal curvature of the fingers…

But that’s only really for “top class” bananas. Go for a class 2 and you can expect that:

The following defects of the fingers are allowed, provided the bananas retain their essential characteristics as regards quality, keeping quality and presentation:
— defects of shape,
— skin defects due to scraping, rubbing or other causes, provided that the total area affected does not cover more than 4 cm² of the surface of the finger.

So if you get an abnormally curved top class banana, the EU has let you down. However one measures that.

Winner: 5:3 to Remain (although there’s an obvious pattern driving these results: Leave are always more likely to think any law is made by the EU, whereas Remain lean towards thinking none are – so Remain are lucky there are more false statements than true ones).

To the best of your knowledge, which of these laws or taxes in force in the UK are as a result of EU regulations?

And now for current laws. Again, the answers according to Ipsos Mori are inline.

Sheet 3

Hmm…we’re less good at knowing the truth of this one, whether Leave or Remain. In fact in some instances the results are strikingly similar between groups. Around 60% of both Leave and Remain know that EU regulations surround the cap on working hours (although actually there are exemptions for certain types of jobs). But only 23% of each side understand that 2-year guarantees are a result of such regulations.

The belief that the national living wage is a result of EU regulations is similarly held by 19% of both populations, even though it’s false. All in all, the differences between Leave and Remain are probably smaller than the general level of ignorance on this topic.

Nonetheless…

Winner: 5:3 to Leave, by my count, although some questions are super-close.

Which of the following, if any, do you think are areas where only the EU has power to pass rules, and not individual EU countries?

We’ve covered which existing or proposed regulations and laws are influenced by the EU above – but what about topics where on the whole only the EU has the power to legislate?

Answers according to Ipsos Mori are inline.

Sheet 3

Hey, we’re not too shoddy on this one compared to some of the other questions. Both Leave and Remain beat 50% on knowing which domains were EU-regulated, and likewise both sides did even better at knowing which ones were regulated domestically.

Again, some of the differences in responses between groups were pretty small. The most notable ones were perhaps that Leave were 8 percentage points better at knowing that the EU has the power to rule on fishing industry controls, whereas the Remainers were 7 percentage points better at knowing that the EU does not control laws around sentences for crimes committed by non-British nationals.

Winner: 4:2 to Remain (again, differences between groups are very small in some cases).

Immigration

Another hot topic in the debate; an argument that, at the most despicable end of the Leave campaign, boils down to “if we remain in the EU then the UK will be overrun with nasty foreigners who just don’t deserve all the good things we have”. Of course many Leavers are far less obnoxious in their views, and may have more benign concerns around resourcing and space. The Remainers, depending on their views, might argue that immigration is a net benefit to the UK (or at least not a net detriment), that staying is the more ethical option, or that leaving the EU is not likely to make much difference anyway.

But are we deriving our viewpoints from accurate knowledge as to the incidence of migration into the UK?

Out of every 100 residents in the UK, about how many do you think were born in an EU member state other than the UK?

Correct answer according to Ipsos Mori: 5%

Sheet 2

Looks like the median respondent is way out again: all subpopulations over-estimate the percentage dramatically. Leave produce the most out-there answer, thinking one in five UK residents were born in an EU country other than the UK; 4x the real value.

Remain do better, but still come up with a median answer that is double that of reality.

Winner: Remain

Summary

Hooray, we’re done. Did we learn anything? Well, totting up the scores, the final Leave vs Remain results by my slightly rough scoring method are:

  • Leave: 3
  • Remain: 8

Overall winner: Remain

So, can we go so far as to say that well-informed voters are more likely to make the choice to Remain?  And hence, if we assume a perfect electorate should have perfect knowledge, then is Remain the correct way to swing?

Well, it is surely the case that the Remain voters were more likely to be a bit more accurate as a population in most of the questions above by my measure, but that conclusion is still rather a strong one to draw.

What really shows through here is the general level of ignorance in all populations; whilst it would be nice to say that 100% of Remainers got things right and 100% of Leavers got things wrong and hence Remain is the only decision we could say was based on evidence, the reality was far more mixed. There were plenty of questions where the majority of both groups got it wrong.

This is quite concerning if one has an ambition that the results of a referendum are predicated on voters basing their choice on some semblance of the reality of the present or potential future. In fact, I’ve had Brennan’s book on ‘The Ethics of Voting‘ on my to-read list for a while, and I’m now a little scared to read it in case it makes me decide that Churchill was actually wrong to imagine that democracy was even the least worst form of Government! Perhaps we are simply not yet in a place where it makes sense to hold a referendum on this topic, although there is certainly no stopping it now.

It’s also apparent that there are voters on each side who hold their opinions “despite” what they think they know about certain domains. That is to say: we can infer that nearly 1 in 5 of the Remain voters are committed to remain, despite the fact that they (incorrectly) think the UK pays the most into the EU budget, and/or (incorrectly) think that the proportion of EU-born people living in the UK is twice as high as it really is. Although the data is not available in a granular enough fashion to perform a per-respondent analysis to see whether these two subsections consist of the same individuals, this does suggest that there are reasons not elucidated in any one of these questions regarding why one might choose to vote stay or go, and hence the conclusion is incomplete.

That said, for those of us currently desiring a Remain verdict, it seems that it would do no harm to try and spread some of the more validated “truths” to the nay-sayers. Given the mess that both sides have created whilst campaigning, it may be debatable how effective that can be amongst the noise; but, if we want to believe in the validity of referendum politics, then we must try to believe that true knowledge has some impact on one’s voting choices.

However, there are yet further psychological forces to counteract even the most ardent advocate of facts driving decisions: given that research suggests we tend to disregard anyone whose opinion disagrees with ours, and that we often make up reasons to explain our behaviour after we’ve executed it (Kahneman writes excellently on this), the war for votes requires something more than simply winning the battle to expound the truth.