Transactions by active subscribers formulae in Tableau

This blog returns from the dead (dormant?) with a quick note-to-self on how to do something that sounds simple but proved slightly complicated in practice, using Tableau.

Here’s a scenario, although many others would fit the same pattern. Imagine you have a business that is subscription based, where people can subscribe and cancel whenever they wish. Whilst subscribed, your customer can buy products from you anytime they want.  They can’t buy products if not subscribed.

What you want to know is, for a given cohort of subscribers, how many of those people who are still subscribed purchased a product within their first, second…nth month?

So we want to be able to say things like: “out of the X people that started a subscription last year, Y% of those who were still subscribed for at least Z months bought a product in their Zth month”.

It’s the “who were still subscribed” part that made this a little tricky, at least with the datasource I was dealing with.

Here’s a trivially small example of what I had – a file that has 1 row per sale per customer.

Subscriber ID | Length of subscription | Month of subscription | Transaction type
1 | 5 | 1 | Sale
1 | 5 | 2 | Sale
1 | 5 | 3 | Sale
2 | 7 | 1 | Sale
2 | 7 | 6 | Sale
3 | 1 | 1 | Sale
4 | 8 | 1 | Sale
4 | 8 | 2 | Sale
4 | 8 | 4 | Sale
4 | 8 | 5 | Sale
5 | | 1 | Sale
5 | | 2 | Sale
5 | | 3 | Sale
5 | | 8 | Sale
5 | | 9 | Sale

For simplicity, let’s assume every customer has at least one sale. The columns tell you:

  • the ID number of the subscriber
  • the length of the subscription from start to finish, in months. If the length is blank then it means it’s still active today so we don’t know how long it will end up being.
  • the month number of the product sale
  • a transaction type, which for our purposes is always “sale”

Example interpretation: subscriber ID 1 had a subscription that lasted 5 months. They purchased a product in months 1, 2 and 3 (but not 4 or 5).

It’d be easy to know that you had 5 people at the start (count distinct subscriber ID), and that you had 2 transactions in month 3 (count distinct subscriber ID where month of subscription = 3). But how many of those 5 people were still subscribed at that point?

Because this example is so small, you can easily do that by eye. You can see in the data table that we had one subscription, ID 3, that only had a subscription length of 1 month. Everyone else stayed longer than 3 months – so there were 4 subscriptions left at month 3.

The figure we want to know is what proportion of the active subscribers at month 3 bought a product. The correct answer is the number of subscriptions making a product purchase at month 3 divided by the number of subscriptions still active at month 3. Here, that’s 2 / 4 = 50%.

So how do we get that in Tableau, with millions of rows of data? As you might guess, one method involves the slightly-dreaded “table calculations”. Layout is usually important with table calculations. Here’s one way that works. We’ll build it up step by step, although you can of course combine many of these steps into one big fancy formula if you insist.

Firstly, I modified the data source (by unioning) so that when a subscription was cancelled it generated a “cancelled subscription” transaction.  That looked something like this after it was done.

Subscriber ID | Length of subscription | Month of subscription | Transaction type
1 | 5 | 1 | Sale
1 | 5 | 2 | Sale
1 | 5 | 3 | Sale
1 | 5 | 5 | Cancelled subscription
2 | 7 | 1 | Sale
2 | 7 | 6 | Sale
2 | 7 | 7 | Cancelled subscription
3 | 1 | 1 | Sale
3 | 1 | 1 | Cancelled subscription
4 | 8 | 1 | Sale
4 | 8 | 2 | Sale
4 | 8 | 4 | Sale
4 | 8 | 5 | Sale
4 | 8 | 8 | Cancelled subscription
5 | | 1 | Sale
5 | | 2 | Sale
5 | | 3 | Sale
5 | | 8 | Sale
5 | | 9 | Sale

Note there are the original sales transactions plus a new “cancel” row for every subscription that was cancelled. In these transactions the “month of subscription” is set to the actual month the subscription was cancelled, which we know from the “Length of subscription” field.
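
For reference, if this data lived in a database table rather than a file, a minimal SQL sketch of that union might look something like the below (the table and column names are assumptions for illustration, not the ones I actually used):

-- the original sales rows
SELECT SubscriberID, SubscriptionLength, SubscriptionMonth, 'Sale' AS TransactionType
FROM Sales

UNION ALL

-- one synthetic "cancelled subscription" row per cancelled subscriber,
-- with the month set to the month the subscription ended
SELECT DISTINCT SubscriberID, SubscriptionLength, SubscriptionLength, 'Cancelled subscription'
FROM Sales
WHERE SubscriptionLength IS NOT NULL;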

Here are the formulae we’ll need to work out, for any given month, how many people were still active, and how many of those bought something:

  • The total count of subscribers in the cohort:
    Count of distinct subscribers in cohort

    { FIXED : COUNTD([Subscriber ID]) }
  • The number of subscribers who cancelled in the given month:
    Count of distinct subscribers cancelling

     COUNTD(
     IF [Transaction type] = "Cancelled subscription"
     THEN [Subscriber ID]
     ELSE NULL
     END
     )
  • Derived from those two figures, the number of subscribers who are still active in the given month (a worked check against the sample data follows this list):
    Count of subscribers still active

    AVG([Count of distinct subscribers in cohort]) - (RUNNING_SUM([Count of distinct subscribers cancelling]) - [Count of distinct subscribers cancelling])
  • The number of subscribers who made a purchase in the given month:
    Count of distinct subscribers making purchase

     COUNTD(
     IF [Transaction type] = "Sale" THEN [Subscriber ID]
     ELSE NULL
     END
     )
  • Finally, derived from the last two results, the proportion of subscribers who made a purchase as a percentage of those who are still active
    Proportion of distinct active subscribers making purchase

    [Count of distinct subscribers making purchase] / [Count of subscribers still active]
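
As a quick worked check against the sample data above: at month 3 the running sum of cancellations over the preceding months is 1 (subscriber 3 cancelled in month 1) and nobody cancels in month 3 itself, so subscribers still active = 5 - (1 - 0) = 4. Two of those subscribers (1 and 5) made a purchase in month 3, giving 2 / 4 = 50%, matching the manual calculation earlier.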

Let’s check whether the logic worked, by building a simple text table. Lay months on rows, and the above formulae as columns.

Table proof

That seems to match expectations. We’re certainly seeing the 50% of actives making a purchase in month 3 that we manually calculated above.

Plot a line chart with month of subscription on columns and proportion of distinct active subscribers making purchase on rows, and there we have the classic rebased propensity to purchase curve.

Propensity curve

(although this data being very small and very fake makes the curve look very peculiar!)

Note that we first experimented with this back in ye olde days of Tableau, before the incredible Level Of Detail calculations were available. I have found many cases where it’s worth re-visiting past table calculation work and considering if LoD expressions would work better, and this may well be one of them.

 

 


Actually you can use variables, CTEs and other fancy SQL with Tableau after all

A few months ago, I blogged about how you can use Tableau parameters when connecting to many database datasources in order to exert the same sort of flexibility that SQL coders can build into their queries using SQL variables.

This was necessary because Tableau does not let you use SQL variables, common table expressions, temp table creation or other such fanciness when defining the data you wish to analyse, even via custom SQL. You can copy and paste a SQL query using those features that works fine in Microsoft’s SQL Server Management Studio or Oracle’s SQL Developer into Tableau’s custom SQL datasource box, and you’ll get nothing but errors.

But it turns out that there are actually ways to use these features in Tableau, as long as you only need to run them once per session – upon connection to the database.

Why do this? Well, if you have a well-designed data warehouse or other analytically-friendly datasource hopefully you’ll never need to. However, if you’re accessing “sub-optimally” designed tables for your task, or large raw unprocessed operational data tables, or other such scenarios that are consistently among the banes of an analyst’s life, then you might find manipulating or summarising data before you use it in Tableau makes life much easier, much faster, or both.

To do this, we’ll use the “Initial SQL” feature of Tableau. This is supported on some, but not all, of the database types Tableau can connect to. You’ll know if your data connection does support it, because an option to use it will appear if you click on the relevant connection at the top left of the data source screen.

[Screenshot: the Initial SQL option shown on the data connection menu]

If you click that option, you’ll get a text box where you can go crazy with your SQL in order to set up and define whatever final data table you’d like to connect to. Note that you’ll be restricted to the features of your database and the permissions your login gives you – so no standard table creation if you don’t have write permissions, for example. But, to take the example of SQL Server, in my experience most “normal” users can write temporary tables.

Tableau isn’t great at helping you write this initial SQL – so if you’re not super-proficient and intend to write something complicated then you might want to play with it in a friendlier SQL tool first and paste it into Tableau when you know it works, or ask a friendly database-guru to do it for you.

Below then is an example of how one can use variables, CTEs and temporary tables in order to pre-build a data table you can then analyse in Tableau, by virtue of pasting code into the initial SQL box. This code will be re-run, and hence refreshed, every time you open the workbook. But as the name “initial SQL” suggests, it will not be refreshed every time you create or modify a chart if you’ve not re-connected to the datasource in between.

(For what it’s worth, this uses the default demo database in SQL server – AdventureWorks. It’s the equivalent of Tableau’s famous Superstore dataset, in the Microsoft world 🙂 )

DECLARE @granularity AS varchar(50);
SET @granularity = 'yearly';

WITH OrdersCTE AS
(
SELECT OrderDate, SalesOrderID, TotalDue
FROM [SalesLT].[SalesOrderHeader]
)

SELECT
CASE WHEN @granularity = 'yearly' THEN CONVERT(varchar,YEAR(OrderDate))
WHEN @granularity = 'monthly' THEN CONVERT(varchar,YEAR(OrderDate)) + '-' + CONVERT(varchar,MONTH(OrderDate))
WHEN @granularity = 'daily' THEN CONVERT(date,OrderDate)
END AS Period,

COUNT(SalesOrderID) AS CountOfSales,
SUM(TotalDue) AS TotalDue
INTO #initialSQLDemo
FROM OrdersCTE

GROUP BY
CASE WHEN @granularity = 'yearly' THEN CONVERT(varchar,YEAR(OrderDate))
WHEN @granularity = 'monthly' THEN CONVERT(varchar,YEAR(OrderDate)) + '-' + CONVERT(varchar,MONTH(OrderDate))
WHEN @granularity = 'daily' THEN CONVERT(date,OrderDate)
END;

Now, this isn’t supposed to be an SQL tutorial, so I’ll not explain in detail what the above does. But for those of you already familiar with (T)SQL, you’ll note that I set a variable as to how granular I want my dates to be, use a CTE to build a cut-down version of a sales table with fewer columns, and then aggregate my sales table to the level of date I asked for in my variable into a temporary table I called #initialSQLDemo.

In the above, it’s set to summarise sales by year. This means Tableau will only ever receive 1 row per year. You’ll not be able to drill down into more granular dates – but if this is detailed enough for what you want, then this might provide far better performance than if you connected to a table with 1 row per minute and, implicitly or explicitly, ask Tableau to constantly aggregate it in expensive ways.

Later on, perhaps I realise I need daily data. In which case, I can just change the second line above to:

SET @granularity = 'daily';

…which will expose data at a daily level to Tableau, and I’m good to go.

SQL / AdventureWorks aficionados will probably realise that my precise example is very contrived, but hopefully it’s clear how the initial SQL feature works, and hence in the real world you’ll be able to think of cases that are truly useful. Note though that if your code does something that is slow, then you will experience this slowness whenever you open your workbook.

A quick note on temporary tables:

If you’re following along, you might notice something. Although I created a temp table called #initialSQLDemo (which, being temporary, will be deleted from the database as soon as you close your session), it never appears in the Table list on the left of the Tableau data source screen.

[Screenshot: the data source screen, with no temporary table shown in the table list]

Why so? Well, in SQL Server, temporary tables are created in a database called “tempdb” that Tableau doesn’t seem to show. This is not fatal though, as they’re still accessible via the “New Custom SQL” option shown above.

In my example then, I dragged New Custom SQL to the data pane and entered a simple “give me everything” statement into the resulting box.
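
In this case the “give me everything” statement was just a select-all against the temp table created by the initial SQL, something like:

SELECT * FROM #initialSQLDemo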

[Screenshot: the New Custom SQL box]

Now, there is a downside in that I understand using custom SQL can reduce performance in some cases, as Tableau is limited in how it can optimise queries. But a well-designed temporary table might in any case be intrinsically faster to use than an over-sized, ill-designed permanent table.

When venturing back into Tableau’s main viz-design screen, you’ll then see that your temporary table is treated just as any other table source would be. Tableau doesn’t care that it’s temporary.

[Screenshot: fields from the temporary table in the viz-design screen]

Although we should note that if you want to use the table in the future in a different Tableau workbook, you’d have to run the initial SQL again there, as standard temporary tables are not shareable and do not persist between database sessions.

 

Future features coming to Tableau 10.2 and beyond – that they didn’t blog about

Having slowly de-jetlagged from this year’s (fantastic and huge) Tableau conference, I’d settled down to write up my notes regarding the always-thrilling “what new features are on the cards?” sessions, only to note that Tableau have already done a pretty good job of summarising it on their own blog here, here and here.

There’s little point in my replicating that list verbatim, but I did notice a few things from the keynote announcements that weren’t immediately obvious in Tableau’s blog posts. I have listed some of those below for reference. Most are just fine details, but one or two seem more major to me.

Per the conference, I’ll divide this up into “probably coming soon” vs “1-3 year vision”.

Coming soon:

Select from tooltip – a feature that will no doubt seem like it’s always been there as soon as we get it.

We can already customise tooltips to show pertinent information about a data point that doesn’t influence the viz itself. For example, if we’re scatter-plot analysing sales and profit per customer, perhaps we’d like to show whether the customer is a recent customer vs a long term customer in the tooltip when hovered over.

In today’s world, as you hover over a particular customer’s datapoint, the tooltip indeed may tell you that it’s a recent customer. But what’s the pattern in the other datapoints that are also recent customers?

In tomorrow’s world you’ll be able to click where it tells you “recent customer” and all the other “recent customers” in the viz will be highlighted. You can get the same end result today with the highlighter tool, but this will likely be far more convenient in certain situations.

A couple of new web-authoring features, to add to the list on the official blog post.

  1. You can create storypoints on the web
  2. You’ll be able to enable full-screen mode on the web

Legends per measure: this might not sound all that revolutionary, but when you think it through, it enables this sort of classic viz: a highlighted table on multiple measures – where each measure is highlighted independently of the others.

[Screenshot: a highlight table with an independent colour legend per measure]
Having average sales of £10,000 no longer has to mean that a high customer age of 100 in the same table is highlighted as though it were tiny.

Yes, there are workarounds to make something that looks similar to the above today – but it’s one of those features where people yet to be convinced of Tableau’s merits react negatively when they discover it isn’t a simple operation, after comparing it to other tools (Excel…). Whilst recreating what you made in another tool is often exactly the wrong approach to using a new tool, this type of display is one of the few I see a good case for making easy enough to create.

In the 1-3 year future:

Tableau’s blog does talk about the new super-fast data engine, Hyper, but doesn’t dwell on one cool feature that was demoed on stage.

Creating a Tableau extract is sometimes a slow process. Yes, Hyper should make it faster, but at the end of the day there are factors like remote database performance and network speed that might mean there’s simply no practical way to speed it up.  Today you’re forced to sit and stare at the extract creation process until it’s done.

Hyper, though, can do its extract-making process in the background, and let you use it piece-by-piece, as it becomes available.

So if you’re making an extract of sales from the last 10 years, but so far only the information from the last 5 years has arrived to the extract creation engine, you can already start visualising what happened in the last 5 years. Of course you’ll not be able to see years 6-10 at the moment, as it’s still winging its way to you through the wifi. But you can rest safe in the knowledge that once the rest of the data has arrived it’ll automatically update your charts to show the full 10 year range. No more excuses for long lunches, sorry!

It seems to me that this, and features like incremental refresh, also open the door to enabling near real-time analysis within an extract.

Geographic augmentation – Tableau can plot raw latitude and longitude points with ease. But in practice, they are just x-y points shown over a background display; there’s no analytical concept present that point x,y is part of the state of Texas whereas point y,z is within New York. But there will be. Apparently we will be able to roll up long/lat pairs to geographic components like zip, state, and so on, even when the respective dimension doesn’t appear in the data.

Web authoring – the end goal is apparently that you’ll be able to do pretty much everything you can do publishing-wise in Tableau Desktop on the web. In recent times, each iteration has added more and more features – but in the longer term, the aim is to get to absolute parity.

We were reassured that this doesn’t mean that the desktop product is going away; it’s simply a different avenue of usage, and the two technologies will auto-sync so that you could start authoring on your desktop app, and then log into a website from a different computer and your work will be there waiting for you, without the need to formally  publish it.

It will be interesting to see whether, and how, this affects licensing and pricing as today there is a large price differential between for instance a Tableau Online account and Tableau Desktop Professional, at least in year one.

And finally, some collaboration features on Tableau server.

The big one, for me, is discussions (aka comments).  Right alongside any viz when published will be a discussion pane. The intention is that people will be able to comment, ask questions, explain what’s shown and so on.

But, doesn’t Tableau Server already have this? Well, yes, it does have comments, but in my experience they have not been greatly useful to many people.

The most problematic issue in my view has been the lack of notifications. That is to say, a few months after publishing a delightful dashboard, a user might have a question about what they’re seeing and correctly pop a comment on the page displaying the viz. Great.

But the dashboard author, or whichever SME might actually be able to answer the question, isn’t notified in any way.  If they happen to see that someone commented by chance, then great, they can reply (note that the questioner will not be notified that someone left them an answer though). But, unless we mandate everyone in the organisation to manually check comments on every dashboard they have access to every day, that’s rather unlikely to be the case.

And just opening the dashboard up may not even be enough, as today they tend to be displayed “below the fold” for any medium-large sized dashboard. So comments go unanswered, and people get grumpy and stop commenting, or never notice that they can even comment.

The new system however will include @user functionality, which will email the user when a comment or question has been directed at them. I’m also hoping that you’ll be able to somehow subscribe to dashboards, projects or the server such that you get notified if any comments are left that you’re entitled to see, whether or not you’re mentioned in them.

As they had it on the demo at least, the comments also show on the right hand side of the dashboard rather than below it – which, given desktop users tend to have wide rather than tall screens, should make them more visible. They’ll also be present in the mobile app in future.

Furthermore, each time a comment is made, the server will store and show the state of the visualisation at that time, so that future readers can see exactly what the commenter was looking at when they made their comments. This will be great for the very many dashboards that are set up to autorefresh or allow view customisation.

[Screenshot: the demoed discussion pane alongside a viz]

(My future comment wishlist #1: ability to comment on an individual datapoint, and have that comment shown wherever that datapoint is seen).

Lastly, sandboxes. Right now, my personal experience has been that there’s not a huge incentive to publish work-in-progress to a Tableau server in most cases. Depending on your organisation’s security setup, anything you publish might automatically become public before you’re ready, and even if not, then unless you’re pretty careful with individual permissions it can be the case that you accidentally share your file too widely, or not widely enough, and/or end up with a complex network of individually-permissioned files that are easy to get mixed up.

Besides, if you always operate from the same computer, there’s little advantage (outside of backups) to publishing it if you’re not ready for someone else to look at it. But now, with all this clever versioning, recommendy, commenty, data-alerty stuff, it becomes much more interesting to do so.

So, there will apparently be a user sandbox; a private area on the server where each Tableau user can upload and work on their files, safe in the knowledge that what they do there is private – plus they can customise which dashboards, metrics and so on are shown when they enter their sandbox.

But, better yet, team sandboxes! So, in one click, you’ll be able to promote your dashboard-in-progress to a place where just your local analytics team can see it, for instance, and get their comments, feedback and help developing it, without having to fiddle around with setting up pseudo-projects or separate server installations for your team.

Furthermore, there was mention of a team activity newsfeed, so you’ll be able to see what your immediate team members have been up to in the team sandbox since you last took a peek. This should be helpful for raising awareness of what each team member is working on, further enhancing the possibilities for collaboration and reducing the likelihood of duplicate work.

Finally, it’s mentioned on Tableau’s blogs, but I wanted to extend a huge cheer and many thanks for the forthcoming data driven alerting feature! Lack of this style of alerting and insufficient collaboration features were the two most common complaints I have heard about Tableau Server from people considering the purchase of something that can be decidedly non-trivial in cost. Other vendors have actually gone so far as to sell add-on products to try and add these features to Tableau Server, many of which are no doubt very good – but it’s simply impossible to integrate them into the overall Tableau install as seamlessly as Tableau themselves could do.

Now we’re in 2016, where the average Very Important And Busy Executive feels like they don’t have time to open up a dashboard to see where things stand, it’s a common and obvious feature request to want to be alerted only when there is actually something to worry about – which may then prompt opening the dashboard proper to explore what’s going on. And, I have no doubt, creative analysts are going to find any number of uses to put it to outside of the obvious “let me know if my sales are poor today”.

(My future data driven alert wishlist #1: please include a trigger to the effect of “if this metric has an unusual value”, i.e. based on a statistical calculation derived from historic variance/standard deviation etc., rather than having to put a flat >£xxxx in as the criterion).

Remember the exciting new features Tableau demoed at #data15 – have we got them yet?

As we get closer towards the thrills of this year’s Tableau Conference (#data16), I wanted to look back at one of the most fun parts of the last year’s conference – the “devs on stage” section. That’s the part where Tableau employees announce and demonstrate some of the new features that they’re working on. No guarantees are made as to whether they’ll ever see the light of day, let alone be in the next release –  but, in reality, the audience gets excited enough that there’d probably be a riot if none of them ever turned up.

Having made some notes of what was shown in last year’s conference (which was imaginatively entitled #data15), I decided to review the list and see how many of those features have turned up so far. After all, it’s all very well to announce fun new stuff to a crowd of 10,000 over-excited analysts…but does Tableau tend to follow through on it? Let’s check!

(Please bear in mind that these are just the features I found significant enough to scrawl down through the jet-lag; it’s not necessarily a comprehensive review of what was on show.)

Improvements in the Data category:

Feature | Does it exist yet?
Improvements to the automatic data cleanup feature recently released that can import Excel type files that are formatted in an otherwise painful way for analysis | Yes – Tableau 9.2 brought features like “sub-table detection” to its data interpreter feature.
Can now understand hundreds of different date formats | Hmm…I’m not sure. I’ve not had any problems with dates, but then again I was lucky enough never to have many!
The Data Source screen will now allow Tableau to natively “union” data (as in SQL UNION), as well as join it, just by clicking and dragging. | Yes – Tableau 9.3 allows drag and drop unioning. But only on Excel and text files. Here’s hoping they expand the scope of that to databases in the future.
Cross-database joins | Yes, cross-database joins are in Tableau 10.

Improvements in the Visualisation category:

Feature | Does it exist yet?
Enhancements to the text table visualisation | Yes – Tableau 9.2 brought the ability to show totals at the top of columns, and 9.3 allowed excluding totals from colour-coding.
Data highlighter | Yes – Tableau 10 includes the highlighter feature.
New native geospatial geographies | Yes – 9.2 and 9.3 both added or updated some geographies.
A connector to allow connection to spatial data files | No – I don’t think I’ve seen this one anywhere.
Custom geographic territory creation | Yes – Tableau 10 has a couple of methods to let you do that.
Integration with Mapbox | Yes – Tableau 9.2 lets you use Mapbox maps.
Tooltips can now contain worksheets themselves. | No – not seen this yet.

Improvements in the Analysis category:

Feature | Does it exist yet?
Automatic outlier detection | No.
Automatic cluster detection | Yes, that’s a new Tableau 10 feature.
You can “use” reference lines / bands now for things beyond just static display | Hmm…I don’t recall seeing any changes in this area. No?

Improvements in the Self-Service category:

Feature | Does it exist yet?
There will be a custom server homepage for each user | Not sure – the look and feel of the home page has changed, and the user can mark favourites etc., but I have not noticed huge changes in customisation from previous versions.
There will be analytics on the workbooks themselves | Yes – Tableau 9.3 brought content analytics to workbooks on server. Some metadata is shown in the content lists directly, plus you can sort by view count.
Searching will become better | Yes – also came with Tableau 9.3. Search shows you the most popular results first, with indicators as to usage.
Version control | Yes – Tableau 9.3 brought workbook revision history for server, and Tableau 10 enhanced it.
Improvements to security UI | Yes – not 100% sure which version, but the security UI changed. New features were also added, such as setting and locking project permissions in 9.2.
A web interface for managing the Tableau server | Not sure about this one, but I don’t recall seeing it anywhere. I’d venture “no”, but am open to correction!

Improvements in the Dashboarding category:

Feature | Does it exist yet?
Improvements to web editing | Yes – most versions of Tableau since then have brought improvements here. In Tableau 10 you can create complete dashboards from scratch via the web.
Global formatting | Yes, this came in Tableau 10.
Cross datasource filtering | Yes, this super-popular feature also came with Tableau 10.
Device preview | Yes, this is available in Tableau 10.
Device specific dashboards | Yes, also from Tableau 10.

Improvements in the Mobile category:

Feature | Does it exist yet?
A Tableau iPhone app | Yes – download it here. An Android app was also released recently.
iPad app – Vizable | Was actually launched at #data15, so yes, it’s here.

Summary

Hey, a decent result! Most of the features demonstrated last year are already in the latest official release.

And for some of those that aren’t, such as outlier detection, it feels like a framework has been put in place for the possible later integration of them. In that particular case, you can imagine it being located in the same place, and working in the same way, as the already-released clustering function.

There are perhaps a couple that it’s slightly sad to see haven’t made it just yet – I’m mainly thinking of embedded vizzes in tooltips here. From the celebratory cheers, that was pretty popular with the assembled crowds when demoed in 2015, so it’ll be interesting to see whether any mention of development on that front is noted in this year’s talks.

There are also some features released that I’d like to see grow in scope – the union feature would be the obvious one for me. I’d love to see the ability to easily union database tables beyond Excel/text sources. And now we have cross-database joins, perhaps even unioning between different technology stacks.

Bonus points due: In my 2015 notes, I had mentioned that a feature I had heard a lot of colleague-interest in, that was not mentioned at all in the keynote, was data driven alerting; the ability to be notified only if your KPI goes wild for instance. Sales managers might get bored of checking their dashboards each day just to see if sales were down when 95% of the time everything is fine, so why not just send them an email when that event actually occurs?

Well, the exciting news on that front is that some steps towards that have been announced for Tableau 10.1, which is in beta now so will surely be released quite soon.

Described as “conditional subscriptions”, the feature will allow you to “receive email updates when data is present in your viz”. That’s perhaps a slight abstraction from the most obvious form of data-driven alerting. But it’s easy to see that, with a bit of thought, analysts will be able to build vizzes that give exactly the sort of alerting functionality my colleagues, and many many others in the wider world, have been asking for. Thanks for that, developer heroes!

 

Workaround for the Tableau “custom SQL” restriction on SQL Server variables

SQL Server (among other databases) has a handy feature for easy re-use of queries, in the guise of variables. You can declare variables and use them repeatedly in any query in the current session. That’s pretty handy for any complicated query forms you use repeatedly, where each time you might need to change some basic criteria.

As a trivial example: here’s how to select all the customers I have that live in postcode 12345

DECLARE @postcode varchar(5)
SET @postcode = '12345'

SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS] WHERE POSTAL_CODE = @postcode

I can then run it another day for postcode 23456 by just changing the second line to

SET @postcode = '23456'

This example is a bit silly because you might as well just type 12345 or 23456 into the query itself – but imagine a more complex SQL statement, or set of coordinated SQL statements, with screenfuls of code, and you might see how it makes it much easier to maintain.
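
To make the benefit a little more concrete, imagine the same variable driving several statements at once – then changing a single SET line updates them all. A contrived sketch (the MY_ORDERS table and its columns are invented for illustration; MY_CUSTOMERS is from the example above):

DECLARE @postcode varchar(5)
SET @postcode = '12345'

-- how many customers live in that postcode?
SELECT COUNT(*) AS CustomerCount
FROM [MY_CUSTOMERS]
WHERE POSTAL_CODE = @postcode

-- and what orders have they placed?
SELECT O.ORDER_NO, O.ORDER_DATE, O.TOTAL_DUE
FROM [MY_ORDERS] O
INNER JOIN [MY_CUSTOMERS] C ON C.CUSTOMER_NO = O.CUSTOMER_NO
WHERE C.POSTAL_CODE = @postcode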

But, sad to say, you can’t use these sort of variables as part of the custom SQL feature in Tableau. If you try, then you’ll get this slightly unhelpful message:

[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near the keyword 'DECLARE'.
[Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near ')'.

But rather than have to hardcode your queries and modify them for every extract, you can use a very simple alternative: our friend, the Tableau parameter.

For each SQL variable, create a Tableau parameter of the same data type. You can put in a default value as the “Current value” when setting it up.

[Screenshot: creating the Tableau parameter]

Then, when connecting to your database, paste the original SQL statement you had into the Custom SQL box, but remove the variable declaration and setting from the top.

SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS] WHERE POSTAL_CODE = @postcode

[Screenshot: the Custom SQL box]

One at a time, highlight each SQL variable there, i.e. the @postcode part of the above, press the “Insert Parameter” button under the text field and select the appropriate Tableau parameter.

(There is actually the option to create a new parameter at this stage, if you didn’t do so in advance).

You’ll end up with a similar looking SQL statement, but with the variables now referred to in this sort of way:

SELECT POSTAL_CODE, CUSTOMER_NO FROM [MY_CUSTOMERS]
WHERE POSTAL_CODE = <Parameters.Postcode>

And it’ll work!

You can then use the standard Tableau functionality to change the parameters in Tableau, and behind the scenes your SQL statement will update appropriately.

Note firstly, for performance, that each change of this sort of parameter is going to cause the query to be re-run on the database (unless it’s already been cached).

Secondly, if you have created a Tableau Data Extract from this datasource, then altering the parameter won’t change anything until you refresh the extract.

Finally, the Tableau parameter is simply a normal Tableau parameter, just like anything else. So there’s nothing to stop a Tableau user altering it, which depending on your setup might be either exactly what you want, or exactly what you don’t want. If the latter, you might consider having the datasource involving parameters in a separate workbook to the user-facing dashboard (publishing to Tableau Server to share), or at least making sure it’s not visible on the end-user dashboards if they have no reason to change it.

Quick sidenote: I wondered if the “Initial SQL” feature might work, to declare variables upon connection to the datasource which could then be used in custom SQL later. But sadly not, I guess it doesn’t count as the same session. If you try, you’ll see a message like:
[Microsoft][SQL Server Native Client 11.0][SQL Server]Must declare the scalar variable "@postcode"

A few notes on Tableau 10’s cross-database join feature

The new version of Tableau, number 10, got released whilst I was busy sunning myself on holiday. This meant there was at least something to look forward to in returning to the world of work 🙂

It has many, many new features. I suspect one of the most impactful, in this era of data lying around everywhere you look, may end up being cross-database joins.

This means that Tableau can now do “proper” joins (i.e. not blends) that span different databases, even different database technologies. So for instance, you can execute an inner join from one table from SQL Server onto data from another table in Redshift, and analyse the result as though it all came from the same place. Better yet, you can publish the resulting datasource as a Tableau Server datasource so everyone can join in the fun, without even having to care where it came from in the first place.

Below are a few notes and warnings from my early experimentation. Some may sound critical but, overall, this is truly a great feature that will enable the less ETL-loving folks to get more done quicker, and/or without needing the resources of tech database specialists or software.

  1. Excel counts as a database (as does any other supported file format). This means you can simulate database joins between multiple Excel files for instance without having to modify them in any way.
  2. You can use a custom SQL statement as the “table” you’re joining to or from in any join: i.e. you’re not restricted to just the formal tables on the database. This means that if you’re an SQL wizard who would prefer to have the database preprocess stuff before it hits Tableau, you can do that in the context of a cross-database join (see the sketch after this list).
  3. Warning! I believe (from experience 🙂 ) Tableau downloads all the data from the tables you choose to your local computer before performing the join. This is kind of predictable if you think it through, but not totally. If you know what an Alteryx “in-database tool” is, then this is not like one of those.

    The joining is on your computer, even if you don’t need to care about that. This means that if you’re inner joining a table of 10 rows in one database to 10 out of a possible 1 billion rows in another database to create an extract, Tableau will be downloading 1 billion and 10 rows to your computer before later returning you the post-join 10 rows as your datasource. You can apply data source filters to your tables before extracting though which might help alleviate some of the potential pain here.

  4. You can do left, inner and right joins very easily, joining on any number of fields as the key. But I didn’t see a facility for conditional joins, in the sense of saying “download all sales from Redshift for a customer whose info is in Oracle where the sales are between the start and end date of the customer’s first 6 weeks according to Oracle”. You would have to let it download and show you all of the joined data and then use formulae in Tableau afterwards to filter out what you don’t want.
  5. Some data connection types aren’t allowed to be involved in cross database joins at the moment. A Tableau blog post lists these as:
    • Tableau Server
    • Firebird
    • Google Analytics
    • Microsoft Analysis Services
    • Microsoft PowerPivot
    • Odata
    • Oracle Essbase
    • Salesforce
    • SAP BW
    • Splunk
    • Teradata OLAP Connector

      The big shame for me on that list is Tableau Server. Tableau Server datasources are a very cool way for non-DBAs to share analytically-useful data around the organisation (or for DBA professionals to provide a lovely structured, governed datasource ideal for Tableau users to use). I have however found that they can result in semi-silos, as exporting or manipulating them has certain limitations vs standard databases. If one could cross-database join these types of datasource one day, it would be an amazing way of adding extra flexibility to that feature.

  6. Performance may be super great, or it may be super slow, depending on your exact database setup.
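
To illustrate point 2 above: the “table” on either side of a cross-database join can itself be a custom SQL statement pasted into the New Custom SQL box – for instance something like the below, where the table and column names are invented for illustration. The result of that statement can then be joined, on CUSTOMER_NO say, to a table living in a completely different database.

SELECT CUSTOMER_NO, MIN(ORDER_DATE) AS FirstOrderDate
FROM SALES_ORDERS
GROUP BY CUSTOMER_NO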

But all those compliments and complaints aside, much credit is due to the makers of Tableau 10 for providing one fewer reason for analysts to load up other software, or pester other departments, before getting started on analysing all their lovely data.

Creating my first Tableau web data connector : part 3

At last, the final part of the trials and tribulations of creating my first Tableau Web Data Connector… Part 1 went through the pre-requisites, and building what I might generously term my “user interface”. Part 2 was a struggle against the forces of web security. And in this part, we battle against the data itself, until the error-message dragon is truly slayed.

So, in the last part, we ended up with a connector that could ask for the username of a BoardGameGeek user, connect to a CORS-enabled version of the API, and complain that whatever objects it found “are not valid as a React child”.

That makes sense, as I didn’t yet tell it what sort of data objects to expect.

As I’m following the Tableau web data connector tutorial, which accesses data from URLs like this one, I figured I’d look at what data that URL returns precisely, compare it to the web data connector tutorial code, and then, when I understand the relationship between the code and the data from the Yahoo URL, I might be able to adapt the code to fit the format my chosen data arrives in.

Here’s the data one gets if you point your web browser to the URL that the tutorial is teaching you how to build a WDC from:

[Screenshot: the JSON returned by the tutorial’s stock-quote URL]

OK, stock-pickers will immediately recognise that it’s stock quote data, with dates, highs, lows, symbols and all the other lovely gubbins that comes along with basic stock charts. It’s in JSON format, which is human-readable, but only in small doses.

So I need to understand the relationship between the above data, and this piece of code:

if (data.query.results) {
              var quotes = data.query.results.quote;
              var ii;
              for (ii = 0; ii < quotes.length; ++ii) {
                  var entry = {'Ticker': quotes[ii].Symbol,
                               'Day': quotes[ii].Date,
                               'Close': quotes[ii].Close};
                  dataToReturn.push(entry);
              }

I can kind of see it by eye – there’s obviously JSON entries for Symbol, Date, Close, query and so on. But can’t we make it easier?

Yes we can, because there’s such a thing as a web JSON viewer. Paste all that stock-related text into something like http://www.jsoneditoronline.org/ and you get a nice hierarchical visualisation of the JSON structure, like this:

[Screenshot: the JSON structure shown in the online JSON editor]

So if we assume that “data” is referring to the overall set of data itself, and use dot notation to traverse the hierarchy, we can see a line in the code that says:

var quotes = data.query.results.quote;

That would seem to fit into the JSON structure above, where below the “object” level you have a level called query, a sublevel called results, and a sublevel called quote.

[Screenshot: the query > results > quote levels of the JSON hierarchy]

The variable “quotes” that the code creates is therefore basically referring to everything at/below the “quote” level of the JSON hierarchy.

Then you get a bunch of records which are the data of interest itself. These are numbered 0 for the first datapoint, 1 for the next, and so on.

If you know a bit of basic programming, you might note that the WDC code has a “for loop” with a counter variable called “ii” that is set to 0 at first and increments by one each time the loop runs. That fits nicely with the idea of it iterating through datapoint 0, 1, …n until it gets to the end (which is effectively the length of the quotes dataset, i.e. quotes.length).

Each one of these JSON records then has a few attributes – and they include the ones mentioned in the code as being part of whichever “quotes” subrecord we’re on; “Symbol”, “Date”, “Close”.

   var entry = {'Ticker': quotes[ii].Symbol,
                               'Day': quotes[ii].Date,
                               'Close': quotes[ii].Close};

[Screenshot: the Symbol, Date and Close attributes of one record]

Awesome. So where does “Ticker”, “Day” and “Close” on the left hand side of those code lines come from?

Well, that’s up to us. Remember the “add code for defining columns” part of the tutorial?

myConnector.getColumnHeaders = function() {
    var fieldNames = ['Ticker', 'Day', 'Close'];
    var fieldTypes = ['string', 'date', 'float'];
    tableau.headersCallback(fieldNames, fieldTypes);
}

There they’ve defined the fieldnames they want to have Tableau display (which can be totally different from the JSON names we saw above in the datasource) and the field types.

Field types can be any from this list:

  • bool
  • date
  • datetime
  • float
  • int
  • string

So, let’s look at what’s returned when I make the call to my Boardgames datasource.

[Screenshot: the raw response from the BoardGameGeek API]

OK, I can see some fields in there, but it’s kind of messy. So I pasted it back into the online JSON editor.

Here’s what the relevant part looks like when formatted nicely:

[Screenshot: the same response formatted in the JSON editor]

So, the interesting information there is probably the boardgame’s name, year published, URLs to its image and thumbnail, the various statuses (own, want, etc.) and the number of times it’s been played (numplays).

I therefore edited the “getColumnHeaders” code snippet to reflect the fields I wanted, and the type of data that would feature in them.

myConnector.getColumnHeaders = function() {
		var fieldNames = ['BoardgameName','YearPublished','ImageURL','ImageThumbnailURL', 'Plays','Owned','PreviouslyOwned','ForTrade','Want','WantToPlay','WantToBuy','Wishlist','Preordered','LastModified'];
		var fieldTypes = ['string','int','string','string','int','int','int','int','int','int','int','int','int','datetime'];
		tableau.headersCallback(fieldNames, fieldTypes);
	}

Now I’ve defined the fields, I can go back to the retrieving results section of code (if (data.query.results)…) and, now knowing the structure of the JSON generated by my API, parse out and assign to the above variables the data I want.

I decided to call the collection of data I was building “games” rather than “quotes”, because that’s what it is. I next noted that each individual “game” within the JSON is listed in a hierarchy below the “item” entry which itself is below “items”.

[Screenshot: each boardgame listed under items > item in the JSON]

(“Amyitis” is the name of a boardgame, rather than an allergy to people called Amy, believe it or not).

So, I assigned all the items.item data to “games”

if (data.items.item) {
var games = data.items.item;

And copied the tutorial’s approach to loop through all the “items.item” entries, each of which is a boardgame, until we’ve reached the end, i.e. when the number of times we’ve looped equals the length of the data table.

var ii;
for (ii = 0; ii < games.length; ++ii) {

Finally, it’s time to assign the relevant bit of data returned from the API to the variables I’d defined above.

[Screenshot: the fields of interest within the JSON structure]

At first I got a little confused, because the JSON output had a bunch of _ and $ entries that didn’t seem similar to what was returned in the tutorial dataset. But it turns out that’s nothing to worry about. Just treat them as though they were any other text.

var entry = {'BoardgameName': games[ii].name[0]._,
'YearPublished': games[ii].yearpublished[0],
'ImageURL': 'https:' + games[ii].image[0],
'ImageThumbnailURL': 'https:' + games[ii].thumbnail[0],
'Plays': games[ii].numplays[0],
'Owned': games[ii].status[0].$.own,
'PreviouslyOwned': games[ii].status[0].$.prevowned,
'ForTrade': games[ii].status[0].$.fortrade,
'Want': games[ii].status[0].$.want,
'WantToPlay': games[ii].status[0].$.wanttoplay,
'WantToBuy': games[ii].status[0].$.wanttobuy,
'Wishlist': games[ii].status[0].$.wishlist,
'Preordered': games[ii].status[0].$.preordered,
'LastModified': games[ii].status[0].$.lastmodified
};
dataToReturn.push(entry);

In the above, you can think about games[ii] as being the individual boardgame record. The “for loop” we defined above substitutes each record into the ‘ii’ variable, so it’s accessing games[0], games[1] etc. which translates into data.items.item[0] and so on, if you remember how we defined the games variable above.

Then to find the boardgame’s name we need to traverse into the “name” child of the boardgame itself, then look for the first entry below that (always the first one here, so we can refer to that as entry [0]), and look for the field that is shown as an underscore, _.

[Screenshot: the name entry, with its value held in the underscore field]

 

Hence:

'BoardgameName': games[ii].name[0]._,

Rinse and repeat this for each element of interest, and you’ve collected your dataset ready for Tableau to use!

Of course, the world is not perfect and, in reality, I did not manage to do the above without making the odd mistake here and there. I got blank fields sometimes when I knew they shouldn’t be blank, or fields with the wrong data in them. As I was just using Notepad and the online simulator, I wasn’t really getting many useful error messages.

Lesson learned: you can use the code

tableau.log("YOUR MESSAGE HERE");

to display the message you write in the Google Chrome developer console. You might remember from part 2 that you can bring that up by pressing Ctrl Shift I in Google Chrome, and selecting the “Console” tab.

Why is that useful? Well, firstly, if it displays your message then it means that piece of code ran, so you can check you’re not skipping over any important section. And secondly, you can append text stored in variables to it. So for instance I could write:

tableau.log("the name of the game is " + games[ii].name[0]._);

And, if my code is correct, it should print out the name of the boardgame as it runs.

If I messed up, perhaps using the wrong index, or forgetting the underscore, it will output something different, and perhaps also an error message, guiding me towards the source of the problem.

That, as they say, is basically that! I ran it through the hosted Tableau Web Data Connector simulator again, this time ticking the box so it will “automatically continue to data gather phase”. And, lo and behold, below you can see that it has interpreted the information about that famous Amyitis game into a nice tabular format.

[Screenshot: the simulator showing the collection data in tabular form]

Once you’ve got to that stage, you’ve already won. Just load up Tableau Desktop for real, select to connect to a web data connector and up should pop your interface, and later, your data.

 

Creating my first Tableau web data connector – part 2

Now for the next part of the creating-a-web-data-connector journey: previously, I had got all the software I needed to create a web data connector installed (well, to be fair Notepad comes with Windows so that wasn’t hard) and designed my user interface.

Now it was time to move on to Tableau tutorial section 5 – using the information retrieved from the user of my UI in order to have Tableau connect to the BoardGameGeek API and retrieve the relevant data for analysis.

I read through the Tableau WDC tutorial, and, using some previous programming-other-stuff knowledge, realised that this tutorial was a mix of explaining the fundamentals of connecting to a generic web data connector, and explaining some very specific details of the example Yahoo finance web data connector the example showed.

The first thing to do then was to determine which bits were essential for my simple attempt and which were only there to facilitate the specific Yahoo API.

  • The “code for defining columns” section is obviously important. I need columns to put my data in.
  • The Yahoo Query YQL less so – my API is nothing to do with Yahoo.
  • The helper functions (buildUri, getFormattedDate,makeTwoDigits), no, these are just functions to help construct the YQL query needed to access their example API. It’s great to know that it’s possible to use helper functions like this if needed, but chances are, if I did need some, they’d not be the same unless my API was very similar to their Yahoo example.
  • The “code to get stock data”, well, I don’t want stock data, but I do want some sort of data so obviously some of that’s going to be relevant, so I pasted that into my file as-is, noting from the code that some of it was clearly going to need to be replaced given I don’t want boring finance data when exciting boardgames data is available instead.

It seemed obvious that providing the weblink to my API of choice – http://boardgamegeek.com/xmlapi2/collection?username=[whatever] – was going to be key. So I replaced their connectionUri code with a simple text variable of that, using the user-supplied BGG username I had got from the previous part.

connectionUri = 'http://boardgamegeek.com/xmlapi2/collection?username=' + bggUsername

 

(a URI is a Uniform Resource Identifier)

Of course this wasn’t going to be enough to make it work – I have not yet told Tableau what data I want it to receive and how to format it – but figured I’d run it through the simulator and see what happened. Nothing especially good did. But I did learn about “CORS”.

Now, again, no particular error messages tend to pop up by default even when things go fatally wrong in the simulator. It might look like it has hung, it might go blank or similar, but there’s no friendly message.

Lesson learned: If you are working with the super-basic tools I had at hand, what you need is the built-in developer tools that Chrome has, which you can bring up by pressing Ctrl Shift I together. Switch to the “Console” tab, and you might see an error message there to help you out, if you’re lucky.

Here’s what I saw:

[Screenshot: an Access-Control-Allow-Origin error in the Chrome console]

Now, it’s been a long time since I did any serious web developing, and I had never suffered an Access-Control-Allow-Origin problem before. A swift bit of Googling led me to this nice explanation where I learnt that it’s a security feature “implemented in browsers to restrict interaction between documents (or scripts) that have different origins”. That makes sense; I don’t own either the BoardGameGeek site nor Github, I’m sad to say.

Of the 3 solutions that Jvaneyck suggests, it seemed like finding a JSONP source for my API instead of the XML one would be simplest, given I have no control over either server nor any intention of building infrastructure. So, can I get the BoardGameGeek API in JSONP format? Well, another quick Google ™ led me to a post on the BoardGameGeek forum, where a kind user named strathmeyer had offered up an “Ajax-ready BGG API for any web programmers out there”.

The best thing is that I’ve enabled CORS headers, so you can use the API straight from the web browser

Yes, here indeed that was the best thing! Access-Control-Allow-Origin issue solved, and all thanks to the work of someone else 🙂

Lesson learned: I later found that Tableau 1) has a whole page dedicated to this exact problem, and 2) there’s an example in their SDK of various methods to get around this issue when connecting to XML. So next time, I should probably RT original M before going Google crazy.

But for now, all I needed to do was change the URI I pointed to from my web data connector to:

connectionUri = 'http://bgg-api.herokuapp.com/api/v1/collection?username=' + bggUsername

I ran that through the simulator once more, of course not expecting it to work before I had defined what data I was actually after – but it did eliminate the previous error message so I knew things were moving on nicely.

So when it ran through, I hit Ctrl Shift + I back in Chrome to bring up the inspector, and saw this message.

[Screenshot: a “variable is not defined” error in the Chrome console]

My variable is not defined?! Where did it go? Oops, my fault, I’d embedded a previous function in this one. I needed to be more careful with my bracket-pasting finger. But it was simplicity itself to move the code back to where it should be, and re-simulate.

 

[Screenshot: an “uncaught invariant violation” message in the Chrome console]

OK, “uncaught invariant violation” is perhaps a little cryptic, but it sounds like it’s talking about having found something it didn’t expect. That could be my data (spoiler: it was). So now I needed to tell it what to expect and what to do with it.

Tune back in, same place, different time, for the exciting conclusion!

 

The Tableau #MakeoverMonday doesn’t need to be complicated

For a while, a couple of  key members of the insatiably effervescent Tableau community, Andy Cotgreave and Andy Kriebel, have been running a “Makeover Monday” activity. Read more and get involved here – but a simplistic summary would be that they distribute a nicely processed dataset on a topic of the day that relates to someone else’s existing visualisation, and all the rest of us Tableau fans can have a go at making our own chart, dashboard or similar to share back with the community so we can inspire and learn from each other.

It’s a great idea, and generates a whole bunch of interesting entries each week. But Andy K noticed that each Monday’s dataset was getting way more downloads than the number of charts later uploaded, and opened a discussion as to why.

There are of course many possible reasons, but one that came through strongly was that, whilst they were interested in the principle, people didn’t think they had the time to produce something comparable to some of the masterpieces that frequent the submissions. That’s a sentiment I wholeheartedly agree with, and, in retrospect – albeit subconsciously – why I never gave it a go myself.

Chris Love, someone who likely interacts with far more Tableau users than most of us do, makes the same point in his post on the benefits of Keeping It Simple Stupid. I believe it was written before the current MakeoverMonday discussions began in earnest, but was certainly very prescient in its applications to this question.

Despite this awesome community many new users I speak to are often put off sharing their work because of the high level of vizzes out there. They worry their work simply isn’t up to scratch because it doesn’t offer the same level of complexity.

 

To be clear, the original Makeover Monday guidelines did make the point that it was quite proper to just spend an hour fiddling around with the data. But firstly, after a hard day battling against the dark forces of poor data quality and data-free decisions at work, it can be a struggle to keep on trucking for another hour, however fun it would be in other contexts.

And that’s if you can persuade your family that they should let you keep tapping away for another hour doing what, from the outside, looks rather like you forgot to finish work. In fact, a lot of the admiration I have for the Tableau Zen Masters comes from how they fit what they do into their lives.

But, beyond that, an hour is not going to be enough to “compete” with the best of what you see other people doing in terms of presentation quality.

I like to think I’m quite adept with Tableau (hey, I have a qualification and everything :-)), but I doubt I could create and validate something like this beauty using an unfamiliar dataset on an unfamiliar topic in under an hour.

 

It’s beautiful; the authors of this and many other Monday Makeovers clearly have an immense amount of skill and vision. It is fascinating to see both the design ideas and technical implementation required to coerce Tableau into doing certain non-native things. I love seeing this stuff, and very much hope it continues.

But if one is not prepared to commit the sort of time needed to do that regularly to this activity, then one has to try and get over the psychological difficulty of sharing a piece of work which one perceives is likely to be thought of as “worse” than what’s already there. This is through no fault of the MakeoverMonday chiefs, who make it very clear that producing a NYT infographic each week is not the aim here – but I certainly see why it’s a deterrent from more of the data-downloaders uploading their work. And it’s great to see that topic being directly addressed.

After all, for those of us who use Tableau for the day-to-day joys of business, we probably don’t rush off and produce something like this wonderful piece every time some product owner comes along to ask us an “urgent” question.

Instead, we spend a few minutes making a line chart that gives them some insight into the answer to their question. We upload an interactive bar chart, with default Tableau colours and fonts, to let them explore a bit deeper, and so on. We sit in a meeting and provide an answer on the fly, enabling live decision-making that, before we had tools like this, would have meant waiting a couple of weeks for a csv report. Real value is generated, and people are sometimes even impressed, despite the fact that we didn’t include hand-drawn iconography, gradient-filled with the company colours.

Something like this perhaps:

Yes, it’s “simple”, and it’s unlikely to go Tableau-viral, but it makes a key story held within that data very clear to see. And it’s far more typical of the day-to-day Tableau use I see in the workplace.

For the average business question, we probably do not spend a few hours researching and designing a beautiful colour scheme in order to perform the underlying maths needed to make a dashboard combining a hexmap, a Sankey chart and a network graph in a tool that is not primarily designed to do any of those things directly.

No-one doubts that you can cajole Tableau into such artistry, that there is sometimes real value obtainable by doing so, or that those who carry it out may be creative geniuses – but unless they have a day job very different from mine and my colleagues’, I suspect it’s not their day-to-day either. It’s probably more an expression of their talent and passion for the Tableau product.

Pragmatically, if I need to make, for instance, a quick network chart for “business”, then, all other things being equal, I’m afraid I’m more likely to get out a tool that’s designed to do that than to take the extra time to work out how to implement it in Tableau, no matter how much I love it (by the way, Gephi is my tool of choice for that – it is nowhere near as user-friendly as Tableau, but it is specifically designed for that sort of graph visualisation; recent versions of Alteryx can also do the basics). Honestly, it’s rare for me that these more unusual charts need to be part of a standard dashboard; our organisation is simply not at a level of viz-maturity where these diagrams are the most useful option for most people in the intended audience – and I suspect the same is true for many organisations.

And if you’re a professional whose job is creating awesome newspaper-style infographics, then I suspect that, more often than not, you’re not using Tableau as the tool that provides the final output either. That’s not its key strength in my view, and it’s not how they sell it – although they are justly proud of the design-thought that goes into the software in general. But if the paper WSJ is your target audience, you might be better off using a more custom, design-focused tool, like Adobe Illustrator (and Coursera will teach you that specific use-case, if you’re interested).

I hope nothing here will cause offence. I do understand the excitement and admire anyone’s efforts to push the boundaries of the tool – I have done so myself, spending way more time than is strictly speaking necessary in terms of a theoretical metric of “insights generated per hour” to make something that looks cool, whether in or out of work. For a certain kind of person it’s fun, it is a nice challenge, it’s a change from a blue line on top of an orange line, and sometimes it might even produce a revelation that really does change the world in some way.

This work surely needs to be done; adherents to (a bastardised version of) Thomas Kuhn’s theory of scientific revolutions might even claim this “pushing to the limits” as one of the ways of engendering the mini-crisis necessary to drive forward real progress in the field. I’m sure some of the valuable Tableau “ideas“, that feed the development of the software in part, have come from people pushing the envelope, finding value, and realising there should be an easier way to generate it.

There’s also the issue of engagement: depending on your aim, optimising your work for being shared worldwide may be more important to you than optimising it for efficiency, or even clarity and accuracy. This may sound like heresy, and it may even touch on ethical issues, but I suspect a survey of the most well-known visualisations outside of the data community would reveal a discontinuity with the ideals of Stephen Few et al!

But it may also be intimidating to the weary data voyager when deciding whether to participate in these sort of Tableau community activities if it seems like everyone else produces Da Vinci masterpieces on demand.

Now, I can’t prove this with data right now, sorry, but I just think it cannot be the case. You may see a lot of fancy and amazing things on the internet – but that’s the nature of how stuff gets shared around; it’s a key component of virality. If you create a default line chart, it may actually be the best answer to a given question, but outside a small community that is actively interested in the subject domain at hand, it’s not necessarily going to get much notice. I mean, if you try hard enough, you could probably find someone who made a Very Good Decision based on one of those ghastly Excel 2003 default charts with the horrendous grey background.

[Image: a default Excel 2003 chart, grey background and all]

Never forget…

So, anyway, time to put my money where my mouth is and actually participate in MakeoverMonday. I don’t need to spend even an hour making something if I don’t want to, right?  (after all, I’ve used up all my time writing the above!)

Tableau is sold with emphasis on its speed of data sense-making, with claims of producing something reasonably intelligible 10-100x faster than other tools. If we buy into that hype, then spending 10 minutes of Tableau time (necessitating making one fewer cup of tea, perhaps) should enable me to produce something that could have taken up to 17 hours in Excel.

OK, that might be taking the marketing rather too literally, but the point is hopefully clear. For #MakeoverMonday, some people may concentrate on how far they can push Tableau outside of its comfort zone, others may focus on how they can integrate the latest best practice in visual design, whereas here I will concentrate on whether I can make anything intelligible in the time it takes to wait for a coffee in Starbucks (on a bad day) – the “10 minute” viz.

So here’s my first “baked in just 10 minutes” viz on the latest MakeoverMonday topic – the growth of the population of Bermuda. Nothing fancy, and the time ran out just as I was changing the fonts for the chart titles, but hey, it’s a readable chart that tells you something about the population change in Bermuda over time. Click through for the slightly interactive version – which, for instance, still has the nasty default tooltips, thanks to that 10-minute limit.

[Image: my 10-minute Bermuda population growth viz]

Creating my first Tableau web data connector – the highs and the lows: part 1

After having successfully (enough) completed my introduction to creating a real live Tableau web data connector, I wanted to follow the lead of one of the inspirations for this otherwise unnecessary effort – Chris Love’s Periscope on the subject of sharing Tableau-related failure as well as unbridled success – and document something about the less smooth aspects of the journey. If nothing else, these will be useful notes if I have to do this again for real in the future.

First of all, I gave myself an unnecessarily harsh setup 🙂. I was working on my company computer (shhhhh), as it has Tableau Desktop installed and I figured that’d be important. The significance of that is that I don’t have the admin rights needed to install any new software on it. And I don’t do web development as part of the day job, so I was restricted to very generic tools. In practice, this meant Windows Notepad for the most part, and Google Chrome for testing.

I had downloaded the Tableau Web Data Connector software development kit, and was a little worried at claims that I’d need to install a web server to use it, which I’m not sure our work IT security overlords would appreciate – but in the end, I learned that you can successfully create a Tableau web data connector without having:

  • the web data connector SDK
  • a web server
  • admin rights
  • …Tableau (yes, seriously).

What I did need:

  • a web browser (I used Chrome)
  • a text editor (I used Windows Notepad, because, well, what more could you want?)

I had also decided to try and work it out without asking anyone anything. The Tableau user community is pretty amazing in terms of generously giving up answers, and I’m sure someone would have solved any problem I encountered so fast that all the fun would have gone.

My approach, then, was to follow the Tableau Web Data Connector tutorial. In this tutorial, they build a web data connector to a financial website to retrieve stock data. I figured for a first try I’d just follow along, swapping in the details of the BoardGameGeek API I wanted to use instead of the Yahoo API.

I’m not going to go through the tutorial line by line, as you might as well just read the original. Instead I’ll just highlight some points where I had problems, and what I had to do to overcome them.

The first thing I’d note in hindsight is that the tutorial is not actually “how to build the simplest possible web data connector”, as I had imagined it would be from its description as “basic”. It certainly is a relatively simple one (no authentication needed, etc.), but it does contain at least 3 functions that are only necessary because of the specific API they were connecting to. These are:

  • buildUri
  • getFormattedDate
  • makeTwoDigits

Pretty obvious when you see what they do – but it would be a mistake to imagine you should just copy the function structure of the basic connector and replace it with your own code; for the most part you won’t need the above 3 functions at all. I felt the tutorial could make that clearer, especially for someone even more of a novice than me, who has never touched web programming (I’ve done some in the distant past).

But anyway, begin at the beginning. I raced happily through parts 1 and 2 of the tutorial, copying and pasting their base code. Nothing important to change there unless you want to give your version a more appropriate title.
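In case it helps to picture what that base code boils down to, here is a stripped-down sketch of the overall shape, going from my memory of the version of the tutorial I followed (so the exact function names and library location may differ in yours): an HTML page that pulls in the Tableau web data connector JavaScript library, makes a connector object, attaches the functions you flesh out later, and registers it.

<html>
<head>
  <!-- include jQuery and the Tableau web data connector JavaScript library here,
       per whichever version of the tutorial you are following -->
  <script type="text/javascript">
  (function() {
      var myConnector = tableau.makeConnector();

      // tells Tableau what columns (and types) to expect - filled in later
      myConnector.getColumnHeaders = function() {
      };

      // fetches the data and hands rows back to Tableau - filled in later
      myConnector.getTableData = function(lastRecordToken) {
      };

      tableau.registerConnector(myConnector);
  })();
  </script>
</head>
<body>
  <!-- the form for the analyst goes here - see part 3 of the tutorial -->
</body>
</html>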

Part 3 instructed me how to create a form for the user to enter information about their data. Again, I can imagine some data connectors won’t need to prompt the analyst for any information, in which case you could disregard this section. However, I did actually need that feature, as I wanted to be able to enter a BoardGameGeek username and retrieve only the board games that person had in their collection.
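For a rough idea of what that involves (the element ID and handler name below are just my illustrative choices, not the tutorial’s), the form itself is ordinary HTML, plus a small handler that stashes the typed-in username somewhere Tableau can read it back later and then tells Tableau that the interactive phase is finished:

<form>
  BoardGameGeek username: <input type="text" id="bggUsername" />
  <button type="button" onclick="submitToTableau()">Get collection</button>
</form>

<script type="text/javascript">
  function submitToTableau() {
      // keep the username somewhere the data-gathering phase can retrieve it
      tableau.connectionData = document.getElementById('bggUsername').value;
      tableau.connectionName = 'BoardGameGeek collection';
      tableau.submit();  // ends the interactive phase and hands control back to Tableau
  }
</script>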

In the meantime, I had discovered – joyfully – that there’s a hosted Tableau Web Data Connector simulator on Github. Thank you so much! The reason it made me happy is that it meant I didn’t need to install a web server on my own computer, or do anything with the SDK that might have involved begging for admin rights over my laptop.

Once I’d developed a connector in Notepad, I could just upload it somewhere (I chose Github Pages) and use the hosted simulator to see if it worked – which is the reason you don’t actually need a copy of Tableau in the first place. I’m sure it would have been far quicker to iterate if I had installed everything locally, but it works just fine with a hosted version – as long as you beware the cache.

Lesson learned: There seems to be a lot of caching going on somewhere between or within the hosted Tableau Web Data Connector simulator and my web host (Github). Oftentimes I’d make changes to the code and re-upload, but the result shown in the simulator didn’t seem to change. Sometimes closing browsers, deleting cookies etc. helped – but not always. I’m sure there is some science to this that one could work out, but given I didn’t need to make a lot of edits to develop a relatively simple data connector, I basically settled for renaming my file “version 1.html”, “version 2.html” etc. whenever I made a modification, which guaranteed a lack of annoying caching action.
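An alternative that might avoid the file-renaming dance – though I didn’t test it myself, so treat it as a guess – is the old cache-busting trick of tacking a throwaway query string onto the address you give the simulator, so every edit looks like a brand new page to whatever is doing the caching:

amedcalf.github.io/BGGConnector.html?v=1   (first attempt)
amedcalf.github.io/BGGConnector.html?v=2   (after editing and re-uploading)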

But anyway, defining the user interface per the tutorial went smoothly. I entered the web address where I’d uploaded my connector: “amedcalf.github.io/BGGConnector.html”, unticked the “automatically continue to data phase” box (as the data phase is something I defined later, in line with the tutorial), and ran the “interactive phase”.

Up popped my bare-bones form asking for a BoardGameGeek username, I filled it in and pressed OK, and it took me back to the simulator and showed me a form that indicated that it had indeed recognised the information I gave it. Success!

[Screenshot: the simulator showing it had captured the BoardGameGeek username from my form]

No data was returned just yet – but it clearly recognised what I had entered into the form just fine.

Lesson learned: Although this wasn’t a problem for me at this stage, I later noticed that if there’s anything wrong in your code then you do not always get an error message per se. Sometimes the simulator just doesn’t close your UI, or it does but doesn’t return any information – i.e. it acts like a faulty web page. In my experience that tended to happen when I’d made a minor mistake in my code – a missing semicolon or bracket or similar. Once that was fixed, the simulator worked nicely again (if you want to go wild and use something other than Notepad that does syntax highlighting, you’ll probably suffer this much less). So you should assume that it’s your code, not the simulator, that is broken in such instances 🙂

It became clear the user interface is defined in normal HTML – so if you are a nifty web designer/coder, or want to learn to be one, you can make it as beautiful as you like. That wasn’t the point of the exercise for me though, so my effort was just a default font and a plain default box to type information into.

So, that was the user interface “designed” and tested, which takes us up to part 4 of the Tableau tutorial.

Next up: time to actually access the data I wanted. This proved to be a little more difficult the first time through, though.