Workaround for the Tableau “custom SQL” restriction on SQL Server variables

SQL Server (among other databases) has a handy feature for easy re-use of queries, in the guise of variables. You can declare variables and use them repeatedly in any query in the current session. That’s pretty handy for any complicated query forms you use repeatedly, where each time you might need to change some basic criteria.

As a trivial example: here’s how to select all the customers I have that live in postcode 12345

DECLARE @postcode varchar(5)
SET @postcode = '12345'

SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS] WHERE POSTAL_CODE = @postcode

I can then run it another day for postcode 23456 by just changing the second line to

SET @postcode = '23456'

This example is a bit silly because you might as well just type 12345 or 23456 into the query itself – but imagine a more complex SQL statement, or set of coordinated SQL statements, with screenfuls of code, and you might see how it makes it much easier to maintain.

But, sad to say, you can’t use these sort of variables as part of the custom SQL feature in Tableau. If you try, then you’ll get this slightly unhelpful message:

[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'DECLARE'.
[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.

But rather than have to hardcode your queries and modify them for every extract you can use a very simple alternative: our friend, the Tableau parameter.

For each SQL variable, create a Tableau parameter of the same data type. You can put in a default value as the “Current value” when setting it up.

capture

Then, when connecting to your database, paste in the original SQL statement you had into the Custom SQL box, but removing the variable declaration and setting from the top.
SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS] WHERE POSTAL_CODE = @postcode

capture

One at a time, highlight each SQL variable there, i.e. the @postcode part of the above, press the “Insert Parameter” button under the text field and select the appropriate Tableau parameter.

(There is actually the option to create a new parameter at this stage, if you didn’t do so in advance).

You’ll end up with a similar looking SQL statement, but with the variables now referred to in this sort of way:

SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS]
WHERE POSTAL_CODE = <Parameters.Postcode>

And it’ll work!

You can then use the standard Tableau functionality to change the parameters in Tableau, and behind the scenes your SQL statement will update appropriately.

Note firstly, for performance, that each change of this sort of parameter is going to cause the query to be re-run on the database (unless it’s already been cached).

Secondly, if you have created a Tableau Data Extract from this datasource, then altering the parameter won’t change anything until you refresh the extract.

Finally, the Tableau parameter is simply a normal Tableau parameter, just like anything else. So there’s nothing to stop a Tableau user altering it, which depending on your setup might be either exactly what you want, or exactly what you don’t want. If the latter, you might consider having the datasource involving parameters in a separate workbook to the user-facing dashboard (publishing to Tableau Server to share), or at least making sure it’s not visible on the end-user dashboards if they have no reason to change it.

Quick sidenote: I wondered if the “Initial SQL” feature might work, to declare variables upon connection to the datasource which could then be used in custom SQL later. But sadly not, I guess it doesn’t count as the same session. If you try, you’ll see a message like:
[Microsoft][SQL Server Native Client 11.0][SQL Server]Must declare the scalar variable "@postcode"

Advertisements

Free up-to-date UK postcode latitude longitude data

Unless your data comes pre geo-encoded, if you’re trying to do some UK-based geospatial analysis you’ll probably need some easy way of translating addresses into latitude/longitude pairs or some similar co-ordinate system.

Whilst full-address geocoders are available, if you don’t actually need that level of precision then looking up a full postcode is often good enough and far faster (note to US readers: UK postcodes are way more precise than US zipcode – BPH Postcodes quotes an average of about 15 properties per postcode)

A few years ago this data could be a little challenging to obtain –  for free anyway. But now there are various sources offering that up sans charge in various formats and levels of up-to-dateness; some more useful than others

My current favourite is probably Doogal. There, Chris Bell has provided all sorts of address-based tools there. Need to generate a random postcode or calculate the elevation of any route? He has it all.

Most interesting to me are the big CSVs of the UK’s postcodes, past and present, obtainable by pressing “download” at the top right of this page.

It’s full postcode to long/lat mapping and includes the below columns too, which allow for some very useful groupings or lookups.

  • Postcode
  • In Use?
  • Latitude
  • Longitude
  • Easting
  • Northing
  • GridRef
  • County
  • District
  • Ward
  • DistrictCode
  • WardCode
  • Country
  • CountyCode
  • Constituency
  • Introduced
  • Terminated
  • Parish
  • NationalPark
  • Population
  • Households
  • Built up area
  • Built up sub-division
  • Lower layer super output area
  • Rural/urban
  • Region
  • Altitude

From his blog and various appreciative comments, it sounds like it’s kept up to date nice and regularly, which is truly a service to mankind.

If you have any problems with his file, then some other alternatives would include the free version of Codepoint Open (although it seems to show Eastings/Northings rather than long/lat – although it is possible to convert between the two), or a UK-Postcode API.

All this is overkill if you just have  <100 such lookups to make – you can find many web-based batch converters that’ll do the job for you very quickly, and often to address-level accuracy, if you don’t have too many to do. Doogal’s is here.

A few notes on Tableau 10’s cross-database join feature

The new version of Tableau, number 10, got released whilst I was busy sunning myself on holiday. This mean there was at least something to look forward to in returning to the world of work 🙂

It has many, many new features. I suspect one of the most impactful, in this era of data lying around everywhere you look, may end up being cross-database joins.

This means that Tableau can now do “proper” joins (i.e. not blends) that span different databases, even different database technologies. So for instance, you can execute an inner join from one table from SQL Server onto data from another table in Redshift, and analyse the result as though it all came from the same place. Better yet, you can publish the resulting datasource as a Tableau Server datasource so everyone can join in the fun, without even having to care where it came from in the first place.

Below a few notes / warnings from my early experimentations. Some may sound critical, but, overall, this is truly a great feature that will enable the less ETL-loving folks to get more done quicker, and/or without the resources of tech database specialists or sofwtare being needed.

  1. Excel counts as a database (as does any other supported file format). This means you can simulate database joins between multiple Excel files for instance without having to modify them in any way.
  2. You can use a custom SQL statement as the “table” you’re joining to or from in any join: i.e. you’re not restricted to just the formal tables on the database. This means that if you’re an SQL wizard who would prefer to have the database preprocess stuff before it hits Tableau, you can do that in the context of a cross-database join.
  3. Warning! I believe (from experience 🙂 ) Tableau downloads all the data from the tables you choose to your local computer before performing the join. This is kind of predictable if you think it through, but not totally. If you know what an Alteryx “in-database tool” is, then this is not like one of those.

    The joining is on your computer, even if you don’t need to care about that. This means that if you’re inner joining a table of 10 rows in one database to 10 out of a possible 1 billion rows in another database to create an extract, Tableau will be downloading 1 billion and 10 rows to your computer before later returning you the post-join 10 rows as your datasource. You can apply data source filters to your tables before extracting though which might help alleviate some of the potential pain here.

  4. You can do left, inner and right joins very easily, joining on any number of fields as the key. But I didn’t see a facility for conditional joins, in the sense of saying “download all sales from Redshift for a customer whose info is in Oracle where the sales are between the start and end date of the customer’s first 6 weeks according to Oralce”. You would have to let it download and show you all of the joined data and then use formula in Tableau afterwards to filter out what you don’t want.
  5. Some data connection types aren’t allowed to be involved in cross database joins at the moment. A Tableau blog post lists these as:
    • Tableau Server
    • Firebird
    • Google Analytics
    • Microsoft Analysis Services
    • Microsoft PowerPivot
    • Odata
    • Oracle Essbase
    • Salesforce
    • SAP BW
    • Splunk
    • Teradata OLAP Connector

      The big shame for me on that list is Tableau Server. Tableau Server datasources are a very cool way for non-DBAs to share analytically-useful data around the organisation (or DBA professionals to provide a lovely structured, governed, datasource ideal for Tableau users to use). I have however found that they can result in semi-silos, as exporting or manipulating them has certain limitations vs standard databases. If one could cross-database join these types of datasource one day it would be an amazing way of adding extra flexibility to that feature

  6. Performance may be super great, or it may be super slow, depending on your exact database setup.

But all those compliments and complaints aside, debt is very much due to the makers of Tableau 10 for providing one fewer reason for analysts to load up other software, or pester other departments, before getting started on analysing all your lovely data.

Help decide who self-driving cars should kill

Automated self-driving cars are surely on their way. Given the direction of technological development, this seems a safe enough prediction to make – at least when taking the coward’s option of not specifying a time frame.

A self-driving car is, after all, a data processor, and we like to think that we’re getting better at dealing with data every day. Simplistically, in such a car sensors provide some data (e.g. “there is a pedestrian in front of the car”), some automated decision-making module comes up with an intervention (“best stop the car”), and a process is carried out to enact that decision (“put the brakes on”).

Here for example is a visualisation of what a test Google automated car “sees”.

Capture.PNG

My hope and expectation is that, when they have reached a sophisticated enough level of operation and are at a certain threshold of prevalence, road travel will become safer.

Today’s road travel is not super-safe. According to the Association for Safe International Road Travel, around 1.3 million people die in road crashes each year – and 20-50 million more are injured or disabled. It’s the single leading cause of death amongst some younger demographics.

Perhaps automated vehicles could save some of these lives, and prevent many of the serious injuries. After all, a few years ago, The Royal Society for the Prevention of Accidents claimed that 95% of road accidents involve some human error, and 76% were solely due to human factors. There is a lot at stake here. And of course there are many more positive impacts (as well as some potential negatives) one might expect from this sort of automation beyond direct life-saving, which we’ll not go into here.

At this moment in time, humanity is getting closer to developing self-driving cars; perhaps surprisingly close to anyone who does not follow the topic. Certainly we do not have any totally automated car capable of (or authorised to be) driving every road safely at the moment, and that will probably remain true for a while yet. But, piece by piece, some manufacturers are automating at least some of the traditionally human aspects of driving, and several undoubtedly have their sights on full automation one day.

Some examples:

Landrover are shortly to be testing semi-autonomous cars that can communicate with other such cars around them.

The test fleet will be able to recognise cones and barriers using a forward facing 3D-scanning camera; brake automatically when it senses a potential collision in a traffic jam; talk to each other via radio signals and warn of upcoming hazards; and know when an ambulance, police car, or fire engine is approaching.

BMW already sells a suite of “driver assistance” features on some cars, including what they term intelligent parking, intelligent driving and intelligent vision. For people with my driving skill level (I’m not one of the statistically improbable 80% of people who think they are above average drivers), clearly the parking assistant is the most exciting: it both finds a space that your car would actually fit into, and then does the tricky parallel or perpendicular parking steering for you. Here it is in action:

Nissan are developing a “ProPilot” featuring, which also aims to help you drive safely, change lanes automatically, navigate crossroads and park.

Tesla are have probably the most famous “autopilot” system available right now. This includes features that will automatically keep your car in lane at a sensible speed, change lanes safely for you, alert the driver to unexpected dangers and park the car neatly for you. This is likely most of what you need for full automation for some simpler trips, although they are clear its a beta feature and that it is important you keep your hands on the steering wheel and remain observant when using it. Presumably preempting our inbuilt tendency towards laziness, it even goes so far as to sense when you haven’t touched the wheel for a while and tells you to concentrate; eventually coming to a stop if it can’t tell you’re still alive and engaged.

Here’s a couple of people totally disobeying the instructions, and hence nicely displaying its features.

And here’s how to auto-park a Tesla:

 

Uber seems particularly confident (when do they not?). Earlier this month, the Guardian reported that:

Uber passengers in Pittsburgh will be able to hail self-driving cars for the first time within the next few weeks as the taxi firm tests its future vision of transportation in the city. The company said on Thursday that an unspecified number of autonomous Ford Fusions will be available to pick up passengers as with normal Uber vehicles. The cars won’t exactly be driverless – they will have human drivers as backup – but they are the next step towards a fully automated fleet.

uber.jpg

 

And of course Google have been developing a fully self-driving car for a few years now. Here’s a cheesy PR video to show their fun little pods in action.

But no matter how advanced these vehicles get, road accidents will inevitably happen.

In recent times there has been a fatality famously associated with the Tesla autopilot – although as Tesla are obviously at pains to point out, one should remember that it is technically a product in beta and they are clear that you should always concentrate on the road and be ready to take over manually; so this accident might, at best, be attributed to a mix of the autopilot and the human in reality.

However, there will always be some set of circumstances or seemingly unlikely event that neither human or computer would be able to handle without someone getting injured or killed. Computers can’t beat physics, and if another car is heading up your one-way road, which happens to have a brick wall on one side and a high cliff on the other side, at 100 mph then some sort of bad incident is going to happen. The new question we have to ask ourselves in the era of automation is: exactly what incident should that be?

This obviously isn’t actually a new question. In the uncountable number of human-driven road incidents requiring some degree of driver intervention to avoid danger that happen each day, a human is deciding what to do. We just don’t codify it so formally. We don’t sit around planning it out in advance.

In the contrived scenario I described above, where you’re between a wall and a cliff with an oncoming car you can’t get around, perhaps you instinctively know what you’d do. Or perhaps you don’t – but if you are unfortunate enough to have it happen to you, you’ll likely do something. This may or may not the same action as you’d rationally pick beforehand, given the scenario. We rely on a mixture of human instinct, driver training and reflexes to handle these situations, implicitly accepting that the price of over a million deaths a year is worth paying to be able to undergo road travel.

So imagine you’re the programmer of the automated car. Perhaps you believe you might eliminate just half of those deaths if you do your job correctly; which would of course be an awesome achievement. But the car still needs to know what to do if it finds itself between a rock and a hard place. How should it decide? In reality, this is obviously complicated far further insomuch as there are a near-infinite number of scenarios in reality and no-one can explicitly program for each one (hence the need for data-sciencey techniques to learn from experience rather than simple “if X then Y” code). But, simplistically, what “morals” should your car be programmed with when it comes to potentially deadly accidents?

  • Should it always try and save the driver? (akin to a human driver’s instinct for self-preservation, if that’s what you believe we have.)
  • Or should it concentrate on saving any passengers in the same car as the driver?
  • How about the other car driver involved?
  • Or any nearby, unrelated, pedestrians?
  • Or the cute puppy innocently strolling along this wall-cliff precipice?
  • Does it make a difference if the car is explicitly taking an option (“steer left and ram into the car on the opposite side of the road”) vs passively continuing to do what it is doing (“do nothing which will result in you hitting the pedestrian standing in front of the wall”).
    • You might think this isn’t a rational factor, but anyone who has studied the famous “trolley problem” thought experiment will realise people can be quite squeamish about this. In fact, this whole debate boils down to some extent as being a realisation of that very thought experiment.
  • Does it make a difference how many people are involved? Hitting a group of 4 pedestrians vs a car that has 1 occupant? Or vice versa?
  • What about interactions with probabilities? Often you can’t be 100% sure that an accident will result in a death. What if the choice is between a 90% chance of killing 1 person or a 45% chance of killing two people?
  • Does it make a difference what the people are doing? Perhaps the driver is ignoring the speed limit, or pedestrians are jaywalking somewhere they shouldn’t. Does that change anything?
  • Does it even perhaps make a difference as to who the people involved are? Are some people more important to save than others?

Well, the MIT Media Lab is now giving you the opportunity to feed into those sorts of decisions, via its Moral Machine website.

To quote:

From self-driving cars on public roads to self-piloting reusable rockets landing on self-sailing ships, machine intelligence is supporting or entirely taking over ever more complex human activities at an ever increasing pace. The greater autonomy given machine intelligence in these roles can result in situations where they have to make autonomous choices involving human life and limb. This calls for not just a clearer understanding of how humans make such choices, but also a clearer understanding of how humans perceive machine intelligence making such choices.

Effectively, they are crowd-sourcing life-and-death ethics. This is not to say that any car manufacturer will necessarily take the results into account, but at least they may learn what the responding humans (which we must note is far from a random sample of humanity) think they should do, and the level of certainty we feel about it.

Once you arrive, you’ll be presented with several scenarios, and asked what you think the car should do in that scenario. There will always be some death involved (although not always human death!). It’ll also give you a textual description of who and what is happening. It’s then up to you to pick out of the two options given which the car should do.

Here’s an example:

car_ethics.PNG

You see there that a child is crossing the road, although the walk signal is on red, so they should really have waited. The car can choose to hit the child who will then die, or it can choose to ram itself into an inconvenient obstacle whereby the child will live, but the driver will die. What should it do?

You get the picture; click through a bunch on those and not only does MIT gather a sense of humanity’s moral data on these issues, but you get to compare yourself to other respondents on axes such as “saving more lives”, “upholding the law” and so on. You’ll also find out if you have implied gender, age or “social value” preferences in who you choose to kill with your decisions.

This comparison report isn’t going to be overly scientific on an individual level (you only have a few scenarios to choose from apart from anything else) but it may be thought-provoking.

After all, networked cars of the future may well be able to consult the internet and use facts it finds there to aid decisions. A simple extension of Facebook’s ability to face-recognise you in your friends’ photos could theoretically lead to input variables in these decisions like “Hey, this guy only has 5 twitter friends, he’ll be less missed than this other one who has 5000!” or  “Hey, this lady has a particularly high Klout score (remember those?) so we should definitely save her!”.

You don’t think we’d be so callous as to allow the production of a score regarding “who should live?”. Well, firstly, we have to. Having the car kill someone by not changing its direction or speed, when the option is there that it could do so, is still a life-and-death decision, even if it results in no new action.

Plus we already do use scores in domains that infer mortality. Perhaps stretching the comparison to its limits, here’s one example (and please do not take it that I necessarily approve or disapprove of its use, that’s a story for another day – it’s just the first one that leaps to mind).

The National Institute for Health and Care Excellence (NICE) provides guidance to the UK National Health Service on how to improve healthcare. The NHS, nationalised as it is (for the moment…beware our Government’s slow massacre of it though), still exists within the framework of capitalism and is held to account on sticking to a budget. It has to buy medicines from private companies and it can only afford so many. This implies that not everyone can have every treatment on the market. So how does it decide what treatments should be offered to who?

Under this framework, we can’t simply go on “give whatever is most likely to save this person’s life” because some of the best treatments may cost so much that giving it to 10 people, of which 90% will probably be cured, might mean that another 100 people who could have been treated at an 80% success rate will die, because there was no money left for the cheaper treatment.

So how does it work? Well, to over-simplify, they have famously used a data-driven process involving a Quality-adjusted life year (QALYS) metric.

A measure of the state of health of a person or group in which the benefits, in terms of length of life, are adjusted to reflect the quality of life. One QALY is equal to 1 year of life in perfect health.

QALYs are calculated by estimating the years of life remaining for a patient following a particular treatment or intervention and weighting each year with a quality-of-life score (on a 0 to 1 scale). It is often measured in terms of the person’s ability to carry out the activities of daily life, and freedom from pain and mental disturbance.

At least until a few years ago, they had guidelines that an intervention that cost the NHS less that £20k per QALY gained was deemed cost effective. It’s vital to note that this “cost effectiveness” was not the only factor that feeds into whether the treatment should be offered or not, but it was one such factor.

This seemingly quite emotionless method of measurement sits ill with many people: how can you value life in money? Isn’t there a risk that it penalises older people? How do you evaluate “quality”? There are many potential debates, both philosophical and practical.

But if this measure isn’t to be used, then how should we decide how to divide up a limited number of resources when there’s not enough for everyone, and those who don’t get them may suffer, even die?

Likewise, if an automated car cannot keep everyone safe, just as a human-driven car has never been able to, then on which measure involving which data should we base the decision as to who to save on?

But even if we can settle on a consensus answer to that, and technology magically improves to the point where implementing it reliably is childsplay, actually getting these vehicles onto the road en masse is not likely to be simple. Yes, time to blame humans again.

Studies have already looked at the sort of questions that the Moral Machine website poses you. “The Social Dilemma of Autonomous Vehicles” by Bonnefan et al is a paper, published in the journal Science, in which the researchers ran their own surveys as to what people thought these cars should be programmed to do in terms of the balance between specifically protecting the driver vs minimising the total number of causalities, which may include other drivers, pedestrians, and so on.

In general respondents fitted what the researchers termed a utilitarian mindset: minimise the number of casualties overall, no need to try and save the driver at all costs.

In Study 1 (n = 182), 76% of participants thought that it would be more moral for AVs to sacrifice one passenger, rather than kill ten pedestrians (with a 95% confidence interval of 69—82). These same participants were later asked to rate which was the most moral way to program AVs, on a scale from 0 (protect the passenger at all costs) to 100 (minimize the number of casualties). They overwhelmingly expressed a moral preference for utilitarian AVs programmed to minimize the number of casualties (median = 85, Fig. 2a).

(This is also reflected in the results of the Moral Machine website at the time of writing.)

Horray for the driving public; selfless to the last, every life matters, etc. etc. Or does it?

Well, later on, the survey tackled questions around, not only what should these vehicles do in emergencies, but how comfortable would they personally be if vehicles did behave that way, and lastly, how likely would they be to buy one that exhibited that behaviour?

Of course, even in thought experiments, bad things seem worse if they’re likely to happen to you or those you love.

even though participants still agreed that utilitarian AVs were the most moral, they preferred the selfprotective model for themselves.

Once more, it appears that people praise utilitarian, self-sacrificing AVs, and welcome them on the road, without actually wanting to buy one for themselves.

Humans, at least in that study, appear have a fairly high consensus that minimising causalities is key in these decisions. But we also have a predictable tendency to be the sort of freeloaders that prefer for everybody else to follow a net-safety-promoting policy, as long as we don’t have to ourselves. This would seem to be a problem that it’s unlikely even the highest quality data or most advanced algorithm will solve for us at present.