New website launch from the Office of National Statistics

Yesterday, the UK Office of National Statistics, the institution that is “responsible for collecting and publishing statistics related to the economy, population and society”, launched its new website.

As well as a new look, they’ve concentrated on improving the search experience and making it accessible to mobile device users.

The front page is a nice at-a-glance collection of some of the major time series ones sees in the news (employment rate, CPI, GDP growth etc.) . And there’s plenty of help-yourself downloadable data; they claim to offer 35,000 time series which you can explore and download with their interactive time series explorer tool.

Kaggle now offers free public dataset and script combos

Kaggle, a company most famous for facilitating competitions that allow organisations to solicit the help of teams of data scientists to solve their problems in return for a nice big prize, recently introduced a new section useful even for the less competitive types: “Kaggle Datasets“.

Here they host “high quality public datasets” you can access for free. But what is especially nice is that as well as the data download itself, they host any scripts, code and results that people have already written to handle them, plus some general discussion.

For example, on the “World Food Facts” page you can see a script that “ByronVergoesHouwens” wrote to see which countries ate the most sugar, and also a chart that that script produced.  In fact you can even execute scripts online, thanks to their “Kaggle Scripts” product.

It looks like the datasets will be added to regularly, but right now the list is:

  • Amazon Fine Food Reviews
  • Twitter US Airline Sentiment
  • SF Salaries
  • First GOP debate Twitter Sentiment
  • 2013 American Community Survey
  • US Baby Names
  • May 2015 Reddit Comments
  • 2015 Notebook UX Survey
  • NIPS 2015 Papers
  • Iris (yes, the one you will have seen many times already if you’ve read ANY books/tutorials on clustering in R or similar!)
  • Meta Kaggle
  • Health Insurance Marketplace
  • US Dept of Education: College Scoreboard
  • Ocean Ship Logbooks (1750-1850)
  • World Development Indicators
  • World Food Facts
  • Hilary Clinton’s Emails (sounds fun…:-))

You made a chart. So what?

In the latest fascinating Periscope video from Chris Love, the conversation centred around a question that can be summarised as “Do data visualisations need a ‘so what’?“.

There are many ways of rephrasing this: one could ask whether it is (always) the responsibility of the viz author to highlight the story that their visualisations show? Or can a data visualisation be truly worthy of high merit even if it doesn’t lead the viewer to a conclusion?

This topic resonates strongly with me: part of my day job involves maintaining a reference library of the results from the analytical research or investigation we do. We publish this widely within our organisation, so that any employee who has cause or interest in what we found in the past can help themselves to the results. The title we happened to give the library is “So what?“.

Although the detailed results of our work may be reported in many different formats, each library entry has a templated front page that includes the same sections for each study:

  1. The title of the work.
  2. The question that the work intended to address.
  3. A summary of the scope and dataset that went into the analysis.
  4. A list of the main findings.
  5. And finally, the all-important “So what?” section.

Note the distinction between findings (e.g. “customers who don’t buy anything for 50 days are likely to never buy anything again”) and the so what (“we recommend you call the customer when it has been 40 days since you saw them if you wish to increase your sales by 10%”).

The simple answer

With the above in mind, my position is probably quite obvious. If you are going to demand a binary yes/no answer as to whether a data visualisation should have a “so what?”, then my simplistic generalization would be that the answer is yes, it should.

Most of the time, especially in a business context, the main intention behind the whole analytics pipeline is to provide some nugget of information that will lead to a specific decision or action being taken. If the data visualisation doesn’t lead to (or preferably even spoon-feed) a conclusion then there is a high risk that the audience might feel that they wasted their time in looking at it.

In reality though, the black-and-white answer portrayed above is naturally a series of various shades of grey.

A slightly more refined answer

Two key considerations are paramount to deciding whether a particular viz has to have a “so what” to be valuable.

The audience

Please note that I write this from the perspective of visualisations aimed at communities that are not necessarily all data scientist type professionals. If your intended audience is a startup data company populated entirely by computer science PhDs who live and breath dataviz, then the answers may differ. But for most of us, hobbyists or pros, this is not the audience we have, or seek.

A rule of thumb here then might be:

  • If your audience consists entirely of other analysts, then no, it is not essential to have a “so what?” aspect to your viz. However under many circumstances it still would be extremely useful to do so.
  • If your audience includes non-analysts, particularly those people who might term themselves “busy executives” or claim that they “don’t need data to make decisions” (ugh) then it is in general absolutely essential that your viz points towards a “so what”, if a viz is indeed what you intend to deliver.

Why is it OK to lose the “so what” for analysts? Well, only because these people are probably very capable of using a well-designed viz to generate their own conclusions in an analytically safe way. It’s not that they don’t need a “so what”: they almost certainly do – it’s just that you can feel more secure that, whilst not producing it yourself, you can rely on them to do that aspect of the work properly.

They might even be better than you at interpreting the results, if for instance they have extensive subject domain knowledge that you don’t. Interpretation of data is almost always a mix of analytical proficiency and domain-specific knowledge.

Even the best technical analyst cannot have knowledge of all domains. This is why it’s generally not good to let a brand spanking new super-IQ multiple-PhD analyst join an existing company and sit on their own in a dark computer-filled room for a year before entering into discussion as to what kind of analysis you might be interested in to add maximum value to your world.

The lack of an explicit “so what?” ruins many great dashboards

I’m going to go a step further and say that in many cases – especially in non-data focussed organisations – “general” dashboards turn out to be not very useful.

This may be a controversial statement in a world where every analytical software provider sells fancy new ways to make dashboards, every consultant can build ten for you quicksmart, and every “stakeholder” falls over in amazement when they see that they can view and interact with several facets of data at once in a way that was never possible with their tedious plain .csv files.

But a pattern I have often seen is:

  1. Someone sees or suggests a fancy new tool that claims dashboarding as one of its abilities (and this is not to denigrate any tool; this happens plenty even with my favourite tool de jour, Tableau).
  2. A VIP loves the theoretical power of what they see and decides they need a dashboard “on sales” for example.
  3. A analyst happily creates a “sales dashboard” – usually based on what they think the VIP should probably want to see, given that “sales” is not a very fully fleshed out description of anything.
  4. The sponsor VIP is very happy and views it as soon as it’s accessible.
  5. They may even go and visit it every day for the first week, rejoicing in the up-to-date, integrated, comprehensive, colourful data. Joy!
  6. The administrator checks the server logs next month and realises that no-one in the entire company opened the sales dashboard since week 1.
  7. The analyst is sad.

Why? Everyone (sort of…arguably…) did their job. But, after the novelty wore off, the decision maker probably got bored or “too busy” to open the dashboard every day. At best, perhaps they ask an analytical type to monitor what’s going on with the dashboard. At worst, perhaps they go back to making up decisions based on the random decay of radioactive isotopes, or something similar.

They got “too busy” because, after they had waited for the  dashboard to load up, they’d see a few nice charts with interactive filters to go through in order to try and determine whether there was anything they should actually go and do in the real world based on what they showed.

Sales are a bit up in Canada vs yesterday, horray! Yesterday they were a bit down, boo! Do I need to do something about this? Who knows? Do I want to fiddle around with 50 variations of a chart to try work it out? No, it’s not my job and quite possibly I don’t have the time or expertise (and nor should I need it) to do that, sayeth the VIP.

So are dashboards useless? Of course not. But they have to be implemented with the reality of the audience capability, interest and use-case in mind. Most dashboards (at least those that are not solely for analysts to explore) should start with

  • At least 1 clear pre-defined question to address; and
  • 1 clear pre-defined action that might realistically take place based on the answer to the question.

But I don’t want a computer running my business!

Shouldn’t you check that it would definitely be a bad idea before saying that? 🙂

But seriously, the above is not to say one has necessarily to commit blindly to taking the pre-defined action – not every organisation is ready for, or suited to, prescriptive analytics.

However, if there is no way at all that an answer that a dashboard provides could possibly lead to influencing an action, then is it really worth one’s time working on it, at least in a business context?

  • “Sales dashboard” is not a question or an action.
  • “Am I getting fewer sales this year than last year?” is a question.
  • “If I am getting fewer sales this year then I will spend more on marketing this year” is an action.
  • “What form of marketing gave the best ROI last year?” is a question.
  • “If I need to do more marketing this year then I’ll advertise using the method that gave the greatest ROI last year” is an action.

The list of questions doesn’t need to be exhaustive, in fact it usually can’t be. If someone can use a dashboard to answer 100 questions not even imagined at the time of creation, then great. Indeed this is one of the potential strengths of a well-designed dashboard – but there should be at least 1 question in mind before it is created.

Why does checking my dashboard bore me?

Note that in that example above, the listed actions actually imply that the dashboard user is only interested in the results shown on the dashboard under one particular condition: if the sales this year are lower than last year.

For 99 days in a row they might check the dashboard and see that the sales are higher this year, and hence do nothing. On the 100th day, perhaps there was a dramatic fall, so that day is the day when the appropriate advertising action is considered.

However, consider how many people will actually persist in checking the dashboard for 100 days in a row when 99% of the time the check results in no new action.

I myself am obviously very analytically inclined, am happy to and (like to think) efficient at interpreting data, and yet even I have automated rules in my Outlook email client to immediately delete unread almost every “daily report” that gets emailed to me automatically (ssssh, don’t tell anyone, that’s just between you and I). Even the simple act of double-clicking to open the attachment is too much effort in comparison with the expected value of seeing the report contents on an average day.

In this sort of circumstance, what might enable a dashboard to be truly useful is the concept of alerting.

A possible use case is as follows:

  1. A sales dashboard aimed at answering the question of whether we are getting fewer sales this year is set up.
  2. Every day, alerting software routinely checks this data, and emails the VIP (only) if it shows that yes, sales have fallen. The email also provides a direct web link to the targeted sales dashboard.
  3. When the VIP receives this email, knowing that there is something “interesting” to see, they may well be concerned enough to open the dashboard and, to the best of their ability, use whatever context is available there to decide on their next action.
  4. If the information they need isn’t there, or they don’t have the time / expertise / inclination to interpret it, then of course they will legitimately request some more work from their analyst. But at least here we see that “data” provided a trigger that has alerted a relevant decision maker that they need to…make a decision, and made it easy for them to use the dashboard tool at their disposal specifically on the day that they are likely to gain value from doing so.
  5. Everyone is happy (well, except about the poor sales).

There is an implicit “so what” in the scenario above.

Main findings

  • Sales are lower than last year.
  • Last year, TV adverts produced tremendous ROI.

So what?

  • To make sure the sales keep growing, consider buying some advertising.
  • To be safest, use a method that was proven effective last year, TV.

But aren’t there some occasions that a “so what” isn’t needed?

Yes, rules of thumb have exceptions. There are some scenarios in which one might legitimately consider not producing an explicit “so what”.

Here are a few I could think of quickly.

1: Exploratory analysis: maybe you just got access to a dataset but you don’t really know what it contains, what the typical patterns and distributions are or its scope. Building a few visualisations on top of that is a great way for an analyst to get a quick understanding of the potential of what they have, and, later, what sort of “so what?” questions could potentially be asked.

2: Data quality testing: in a similar vein to the above, you can often use histograms, line charts and so on to get a quick idea of whether your data is complete and correct. If your viz shows that all your current customers were born in the 19th century then something is probably wrong.

3: Getting inspiration: got too much time on your hands and can’t think of some other work to do? (!!!) You could pick a dataset, or set of datasets, and spend some time graphing their attributes and looking for interesting patterns, outliers, and so on that could form the basis of interesting questions.

  • Why does x correlate with y?
  • Why is x look like a Gaussian distribution whereas y looks like a gamma distribution?
  • Why does store X sell the most of product Y?

This doesn’t have to be done on an individual basis. An interactive dataviz might be a great basis for a group brainstorming discussion, whether within a group of analysts or a far wider audience of interested parties..

4: Learning technical skills: perhaps you are trialling new analysis software or techniques, or trying to improve your existing skills. Working with data you’re already familiar with in new tools is a great way to learn them; perhaps even recreating something you did elsewhere if it’s relevant. The aim here is to increase your skillset, not derive new insights.

5: “How to” guides for others to follow: whether formal training or blog posts (showing fancy extreme edge cases others can marvel at perhaps?), maybe your emphasis is not on what the data actually contains in a subject domain sense, but rather a demonstration of how to use a certain generic analytical feature or technique. Here the data is just a placeholder to provide a practical example for others to follow.

6: You’re an artist: perhaps you’re not actually trying to use data as a tool to generate insight, but rather to create art. This is no lesser a task than classic data analysis, but it’s a very different one, with very different priorities. Think for example of Nathalie Miebach, whose website’s tagline is:

“translating science data into sculpture, installations and musical scores.”


This might be fine art, but it does not try to lead to business insight.

7: You want to focus on promoting your work and become famous :-): a controversial one perhaps; but it is not always plain old bar charts that happen to show the greatest insights that get shared around the land of Twitter, Facebook and other such attention-requesting mediums.

If your goal is generically to get “coverage” – perhaps to increase advertising revenue based on CPM or to become more well-known for your work – and you feel that you have to choose between generating a true insight and making something that looks highly attractive, then the latter might actually be a better bet.

But one should acknowledge what they’re doing; perhaps the skills you demonstrate in doing this are closer to the afore-mentioned “data artist” than “data analyst”.

I have a sneaking suspicion for instance that – not to re-raise a never-ending debate! – David McCandless’ books are probably picked up in higher volumes than Stephen Few’s when both are presented together in a bookshop.

  • McCandless’ “Information is Beautiful“, a series of pretty, sometimes fascinating, infographics, many of which have little in the way of conclusions, is currently rank #1788 in Amazon UK books.
  • Stephen Few’s “Show me the numbers“, a more hardcore text on best practice in presenting information, at #7952, with a cover consisting of very unglamourous bar and line charts.

This is not to compare one to the other in terms of worthwhileness; they are aimed at totally different audiences whose desire to have a book in the “data visualisation” category is motivated by very different reasons.

Even amongst the specialist dataviz-analyst community Tableau has, I note that around half of the visualisations that Tableau picks as their public “Viz of the day” are variations on geographical maps.

Geo-maps tend to look “fancier” and more enticing than bar charts, even though they are applicable only to analysis of a very specific type of data, and can provide only certain types of insights. For most organisations, whilst there is often relevance in geospatial analysis, I suspect that “geo-maps” analytics forms far less than 50% of total analytical output.

It’s therefore very unlikely that the winning “Viz of the day” entries actually reflect how Tableau is actually used most of the time. Hence you might conclude that, if you want to be in the running for that particular honour, you should bias your work towards visualisation with the sort of attention-grabbing graphics that maps often provide, irrespective of whether another form might generate a similar or stronger “so what?” output.

8: Regulatory / reporting requirements: in some circumstances you might be bound by regulation or other authority to produce certain analytical reports, irrespective of whether you think they add value or provide insight. Think for instance of the fields of accounting, for publicly traded companies, healthcare companies, investment products and so on.

9:Your job is explicitly, literally, to “visualise data”. It’s possible to imagine, perhaps in a large business department, employing someone whose job is to repeatedly convert data, for instance, from text tables into a best-practice chart form, without going further. It would be another person’s job to derive the “so what?”.

You could think of this as a horizontal slice of the analytics pipeline vs the “beginning-to-end” vertical pipeline. After all, analysts often rely on other people with different skills (eg. IT) to do a preparatory phase of data analysis, the data provision/manipulation itself (including extract, transform and load operations). They could also rely on people to do the conclusion-forming stage.

Many companies do seem to have a de-facto version of this setup by employing people to “create reports”. By this, they may mean something akin to blindly getting up-to-date data into a certain agreed template or dashboard format that managers are supposed to use to derive decisions from.

However, unless your managers happen to be keen analysts, or your organisation is extraordinarily predictable, then I tend to be concerned about the efficiency and reliability of this method for anything other than, for example, the regulatory purposes mentioned above. It’s hard to imagine someone else consistently gaining optimal insights from a chart they had no control over designing, without a large amount of overhead-inducing iteration between chart-creator and insight-finder. Let’s face it – most non-quant managers would prefer a bullet-point summary of findings rather than a 10 tab Excel workbook full of charts if they’re honest.

There may be many more such scenarios; do let me know in the comments!

Hang on, isn’t there a “so what” in some of the above?

Did you notice the semantic trickery in the above “no so what” viz reasons? In fact, most do either have an implicit “so what” or are simply facilitating the later creation of one.

Items 1, 2 & 3 could be considered as part of the data preparation phase of the analytics pipeline. It would be unlikely (and unwanted) for the products of them to be the end of the analysis. Almost certainly, they’re a step 1 to a further analysis. An implicit “so what” here is either that the data is safe to proceed with, or it is not.

The output of these approaches can also be useful for establishing baselines for metrics, even if this isn’t the intended use at this point. For instance, if your exploration reveals the average customer purchases £5 of products, this may be useful down the line to compare next year’s sales to. Did your later interventions improve sales or not?

Items 4 & 5 come down to being technical training for either yourself or for others. Once trained, you’re likely to be off analysing “so what?” scenarios next. If we’re looking to contrive a “so what?” here, it might be “so I am ready to put my skills to good use tackling real questions”.

Item 6 is quite unique. The data visualisation itself may never be useful as a “so what?” to anyone. It was never intended to be. It’s for a totally different audience who would no more ask “so what?” of data-inspired art than they would of Da Vinci’s ‘Mona Lisa‘.

Item 7 again might be considered data-use for the sake of something other than intrinsic “analysis”. This type of work might well have an explicit “so what?”, that could even be part of its allure. But it’s not the primary reason why it the visualisation was created, so it might not. Sometimes it could be considered a variant of #6 with a specific goal.

It may also be itself a tool that generates useful data. If viewcount is what is important to the creator, then they may be tracking that pageviews on their own “so what”-enabled dashboard in order to determine what sort of output creates the most value for them..

Item 8 and 9 are mid-parts of the analytics pipeline. Although you may not be explicitly defining a “so what”, you’re enabling someone else to come up with their own later.

For better or worse, mandatory reporting regulations are there for someone’s perceived reason. A chart of fund performance is supposed to be there to help inform potential clients whether they would like to invest or not, not simply to provide a nice curved shape.

And if your job is to create “standard” reports or charts, then almost certainly someone else is completing the later step of interpreting them to form their own “so what?”. Or at least they are supposed to be.

To conclude: (why) are we valuable?

Fiddling around with data may be somewhere between Big Bang level geekery and the sexiest job of the 21st century, and holds a personal fascination for some of us. But if we want someone to employ us to do it, or to add value in some other way to the world, we should remember why data as a vocation exists. For the average data analyst, it’s not to make a series of lines that look pleasant (although it’s always nice when that happens).

To quote the viz-lord Stephen Few, in his book “Now You See It“:

We analyse information not for its own sake but so we can make informed decisions. Good decisions are based on understanding. Analysis is performed to arrive at clear, accurate, relevant, and thorough understanding.

(my emphasis added)

Outside of that book, he uses the term “data sensemaking” frequently, which is a good description of what organisations tend to want from their data analysts, even if they don’t know to phrase it in that manner. It must be stressed again, many “busy execs” are far happier with a few bullet points or alerts on potential issues than a set of even the most beautiful, most best-practice, visualisations.

When one exists within the analyst community, it can be hard to remember that not everyone enjoys “data”. Even many of those who are intrigued may not yet have had the time, privilege or education that leads towards quick, accurate interpretation of data. It can be frustrating, or even impossible, for a non-quant specialist to try and understand the real-world implication of an abstract representation of some measure: they simply don’t want to, or can’t – and shouldn’t need to – hunt for their own takeaways in many cases.

When a crime is committed, we hope a professional detective will put together the clues and provide the real world interpretation that allows us to successfully confront the criminal in court. When non-trivial data appears, we should hope that a professional analyst is on hand to put together the data-clues and provides a real world interpretation that lets us successfully confront whatever issue is at hand.

Bonus addendum: some “so whats” are worse than no “so whats”

Before we go, there is perhaps one extra risk of “so whatting” a viz that should be considered. Producing a conclusion that could lead to action tends to necessitate taking a position; essentially you move from presenting a picture to arguing for the implications of what it shows.

Much data can provide multiple distinct answers to the same question if it is manipulated enough. There are indeed lies, damned lies and statistics, and dataviz inspired variations of all three.

If the analyst approaches the “so what?” aspect with bias, then human psychology is such that they may be inclined to provide an awesome conclusion that just coincidentally happens to match their pre-analysis viewpoint; c.f. “confirmation bias“. Of course, many organisations effectively employ people or subcontract out work to for this exact reason, but that is generally not an ethically fantastic, or professionally fulfilling, position (and whole other organisations exist to debunk such guff).

It’s pretty much impossible to provide even a basic chart without the risk of bias. Data analysis is surely part art, mixed amongst the maths and science – one can of course debate the precise split. But a data vizzer has inherently made some explicit choices: what source of data to use, how much data, which type of visualisation, which comparisons to make, the format of chart and much more – all of which can induce, consciously or not, bias to the audience.

Many best-practice “rules” of dataviz, and analytics in general, are in fact designed to reduce the risk of this. This is a very key reason as to why it’s worth learning them. Outside of those memorisable basics though, it’s often interesting to try and test the opposing view to what you’re presenting as your “so what?”.

Perhaps this year you have a higher proportion of female customers than last year. So what? “Our 10 year strategy to redesign our product to be especially attractive to women has been successful, we deserve a bonus”? Well, perhaps, but what if:

  • Last year had a weirdly low proportion of female purchasers vs normal and you’re just seeing basic regression to the mean?
  • Or, for the past 9 years the proportion of women buying your product has plummeted 10% every year; only to increase 2% in the latest year. Does that make your 10 year strategy a success?
  • Or this year was the first year you advertised in Cosmo, instead of FHM. Have other changes produced a variable that confounds your results?
  • Or men have stopped buying your product whilst women continue to buy it at the exactly same rate…does that count as success?

The right data displayed in the right way can help you eliminate or confirm these and other possibilities.

For any decision where the benefit likely outweighs the cost, it’s worth doing the exercise of disproving your first intuition in order to provide comfort that you are supporting the best quality of decision making; not to mention reducing the risk that some joker with half a spreadsheet invalidates your finely crafted interpretation of your charts.

The journey towards creating a Tableau Web Data Connector

I recently watched a video by Tableau/Alteryx/general guru Chris Love in which he emphasised the importance for people to not only retrospectively present their mindblowing successes at cajoling Tableau into running the universe – but also the journey leading up to that point, including any outright failures. After all, there is no shame in failing to do what one sets out to achieve; instead, what would actually be detrimental would be to refuse to acknowledge and learn from failures.

With that in mind, I considered what features of Tableau I was not at all familiar with, might be of future benefit to me, and are likely complex enough to require a “journey” rather than a double-click.

I settled on web data connectors. This a feature I know of, but never encountered either as an author or a user. I don’t have a desperate need for them right now in my day-to-day activities, but I can imagine many useful scenarios and have at least some intellectual curiosity about how they work. So here goes with post #1 on “how do I learn how to make one?”.

What do I think a Tableau web data connector is?

I was lucky enough to go to the Tableau Conference in 2014, where (I think) the concept was announced. From that, my current belief is that a web data connector is a feature of Tableau whereby you can have it directly access structured data from other websites that have a usable API.

Fitness is a common example usecase. The likes of record and analyse your step count for instance, and show you a few nice charts – but they also offer an API through which you can export your data out of their website and into your analytical software to do your own custom analysis, perhaps using different techniques or adding more external data to get further insight from all the juicy information they store on ones’ activity (or lack of).

A web data connector for a site then would let you access whatever data it makes available to you in a way suitable for immediate consumption in Tableau, avoiding the need to manually export and rebuild it into a local database and so on before analysis. Hopefully it will also allow your Tableau analysis to update when new info is in the datasource.

What do I think I probably need to create a Tableau web data connector?

Web data connectors were a feature of Tableau 9.1, which I already have installed.

From my memory of the Tableau conference, I believe that web data connectors are written in Javascript, so a good working knowledge of that language is probably useful. I can’t say I have this…many years ago I did dabble in a little Javascript as it happens, as I wanted to make a matched betting calculator for a website I had to enable people to make lots of free money from bookies (a topic for another day, but yes, there is a genuine opportunity to make lots of $$$ sitting in front of the internet every day, although that sentence couldn’t sound any scammier I know!).

However that was a long time ago and I can’t really remember anything about it. Except that it’s basically a case of writing code in a text file, and I already have Notepad++ installed which I know supports syntax highlighting of Javascript – so that’s probably an acceptable tool to help me write it in.

I also know Tableau has famously good help pages on their website, so am pretty confident I can look there to learn how to do it.

I also know that many people in the Tableau community, most notably probably the Information Lab, have kindly released examples of web data connectors that one can perhaps dissect to see what it’s all about.

What API shall I try and connect to?

My criteria in picking an API that may or may not work with this were that it should:

  • support XML or JSON, which I guess are the formats that Tableau web connectors most likely use.
  • be free to access; I’m feeling no urge to invest financially for a first attempt.
  • not require logging into with a username/password – as a) I suspect logging into an API needing authentication adds complexity, and b) should I develop something that vaguely works, I don’t really want to be responsible for anything “security-wise” for my first attempt.
  • be of a subject matter I find interesting, as I have no immediate need in my career or life to do this, so it’ll have to be vaguely mentally stimulating in order for me to bother having a go at it for any length of time.

So, the winner, for now anyway, is the BoardGamesGeek API.

For the non-initiated, BoardGameGeek is a website visited by, well, the clue is in the name, and I’m afraid I am rather the target audience – although I do my level best these days to keep my habit under control. Gone are the days when the box collection was so big that it had to be stored in a cardboard mountain on the living room floor as no shelving storage system could realistically hold it. I remain, however, extraordinarily interested in the topic, even having got rid of most of them (at least as long as you don’t poke around my phone too much…believe it or not you can have a passable game of Agricola et al on an iPhone).

Back to the subject at hand: one feature of BoardGameGeek is that users can record a list of which boardgames they own. This is publicly accessible info on the normal web; see my now-minuscule collection now here.

They also offer this information via an API, which is documented here. Here for instance is the API call needed to return the info in the above boardgamegeek collection in XML:

Sometimes it takes a couple of tries to return actual data, but when it does, you get something starting like this:


Quite clearly there’s some potentially interesting data available here, but as it stands, it’s not going to slot into Tableau nicely for analysis at all without some preprocessing by other tools first. But who wants to have to load TWO analytics programs when one will do?

Therefore, I would like to build a web data connector to allow Tableau to connect to it directly nice and live.

Next steps I have in mind

  1. Try using someone else’s web data connector so I can see what they look like in practice.
  2. Go on Tableau’s site and see what instructions are available about building them.
  3. ????
  4. PROFIT! (yes, just like in South Park).