You made a chart. So what?

In the latest fascinating Periscope video from Chris Love, the conversation centred around a question that can be summarised as “Do data visualisations need a ‘so what’?“.

There are many ways of rephrasing this: one could ask whether it is (always) the responsibility of the viz author to highlight the story that their visualisations show? Or can a data visualisation be truly worthy of high merit even if it doesn’t lead the viewer to a conclusion?

This topic resonates strongly with me: part of my day job involves maintaining a reference library of the results from the analytical research or investigation we do. We publish this widely within our organisation, so that any employee who has cause or interest in what we found in the past can help themselves to the results. The title we happened to give the library is “So what?“.

Although the detailed results of our work may be reported in many different formats, each library entry has a templated front page that includes the same sections for each study:

  1. The title of the work.
  2. The question that the work intended to address.
  3. A summary of the scope and dataset that went into the analysis.
  4. A list of the main findings.
  5. And finally, the all-important “So what?” section.

Note the distinction between findings (e.g. “customers who don’t buy anything for 50 days are likely to never buy anything again”) and the so what (“we recommend you call the customer when it has been 40 days since you saw them if you wish to increase your sales by 10%”).

The simple answer

With the above in mind, my position is probably quite obvious. If you are going to demand a binary yes/no answer as to whether a data visualisation should have a “so what?”, then my simplistic generalization would be that the answer is yes, it should.

Most of the time, especially in a business context, the main intention behind the whole analytics pipeline is to provide some nugget of information that will lead to a specific decision or action being taken. If the data visualisation doesn’t lead to (or preferably even spoon-feed) a conclusion then there is a high risk that the audience might feel that they wasted their time in looking at it.

In reality though, the black-and-white answer portrayed above is naturally a series of various shades of grey.

A slightly more refined answer

Two key considerations are paramount to deciding whether a particular viz has to have a “so what” to be valuable.

The audience

Please note that I write this from the perspective of visualisations aimed at communities that are not necessarily all data scientist type professionals. If your intended audience is a startup data company populated entirely by computer science PhDs who live and breath dataviz, then the answers may differ. But for most of us, hobbyists or pros, this is not the audience we have, or seek.

A rule of thumb here then might be:

  • If your audience consists entirely of other analysts, then no, it is not essential to have a “so what?” aspect to your viz. However under many circumstances it still would be extremely useful to do so.
  • If your audience includes non-analysts, particularly those people who might term themselves “busy executives” or claim that they “don’t need data to make decisions” (ugh) then it is in general absolutely essential that your viz points towards a “so what”, if a viz is indeed what you intend to deliver.

Why is it OK to lose the “so what” for analysts? Well, only because these people are probably very capable of using a well-designed viz to generate their own conclusions in an analytically safe way. It’s not that they don’t need a “so what”: they almost certainly do – it’s just that you can feel more secure that, whilst not producing it yourself, you can rely on them to do that aspect of the work properly.

They might even be better than you at interpreting the results, if for instance they have extensive subject domain knowledge that you don’t. Interpretation of data is almost always a mix of analytical proficiency and domain-specific knowledge.

Even the best technical analyst cannot have knowledge of all domains. This is why it’s generally not good to let a brand spanking new super-IQ multiple-PhD analyst join an existing company and sit on their own in a dark computer-filled room for a year before entering into discussion as to what kind of analysis you might be interested in to add maximum value to your world.

The lack of an explicit “so what?” ruins many great dashboards

I’m going to go a step further and say that in many cases – especially in non-data focussed organisations – “general” dashboards turn out to be not very useful.

This may be a controversial statement in a world where every analytical software provider sells fancy new ways to make dashboards, every consultant can build ten for you quicksmart, and every “stakeholder” falls over in amazement when they see that they can view and interact with several facets of data at once in a way that was never possible with their tedious plain .csv files.

But a pattern I have often seen is:

  1. Someone sees or suggests a fancy new tool that claims dashboarding as one of its abilities (and this is not to denigrate any tool; this happens plenty even with my favourite tool de jour, Tableau).
  2. A VIP loves the theoretical power of what they see and decides they need a dashboard “on sales” for example.
  3. A analyst happily creates a “sales dashboard” – usually based on what they think the VIP should probably want to see, given that “sales” is not a very fully fleshed out description of anything.
  4. The sponsor VIP is very happy and views it as soon as it’s accessible.
  5. They may even go and visit it every day for the first week, rejoicing in the up-to-date, integrated, comprehensive, colourful data. Joy!
  6. The administrator checks the server logs next month and realises that no-one in the entire company opened the sales dashboard since week 1.
  7. The analyst is sad.

Why? Everyone (sort of…arguably…) did their job. But, after the novelty wore off, the decision maker probably got bored or “too busy” to open the dashboard every day. At best, perhaps they ask an analytical type to monitor what’s going on with the dashboard. At worst, perhaps they go back to making up decisions based on the random decay of radioactive isotopes, or something similar.

They got “too busy” because, after they had waited for the  dashboard to load up, they’d see a few nice charts with interactive filters to go through in order to try and determine whether there was anything they should actually go and do in the real world based on what they showed.

Sales are a bit up in Canada vs yesterday, horray! Yesterday they were a bit down, boo! Do I need to do something about this? Who knows? Do I want to fiddle around with 50 variations of a chart to try work it out? No, it’s not my job and quite possibly I don’t have the time or expertise (and nor should I need it) to do that, sayeth the VIP.

So are dashboards useless? Of course not. But they have to be implemented with the reality of the audience capability, interest and use-case in mind. Most dashboards (at least those that are not solely for analysts to explore) should start with

  • At least 1 clear pre-defined question to address; and
  • 1 clear pre-defined action that might realistically take place based on the answer to the question.

But I don’t want a computer running my business!

Shouldn’t you check that it would definitely be a bad idea before saying that? 🙂

But seriously, the above is not to say one has necessarily to commit blindly to taking the pre-defined action – not every organisation is ready for, or suited to, prescriptive analytics.

However, if there is no way at all that an answer that a dashboard provides could possibly lead to influencing an action, then is it really worth one’s time working on it, at least in a business context?

  • “Sales dashboard” is not a question or an action.
  • “Am I getting fewer sales this year than last year?” is a question.
  • “If I am getting fewer sales this year then I will spend more on marketing this year” is an action.
  • “What form of marketing gave the best ROI last year?” is a question.
  • “If I need to do more marketing this year then I’ll advertise using the method that gave the greatest ROI last year” is an action.

The list of questions doesn’t need to be exhaustive, in fact it usually can’t be. If someone can use a dashboard to answer 100 questions not even imagined at the time of creation, then great. Indeed this is one of the potential strengths of a well-designed dashboard – but there should be at least 1 question in mind before it is created.

Why does checking my dashboard bore me?

Note that in that example above, the listed actions actually imply that the dashboard user is only interested in the results shown on the dashboard under one particular condition: if the sales this year are lower than last year.

For 99 days in a row they might check the dashboard and see that the sales are higher this year, and hence do nothing. On the 100th day, perhaps there was a dramatic fall, so that day is the day when the appropriate advertising action is considered.

However, consider how many people will actually persist in checking the dashboard for 100 days in a row when 99% of the time the check results in no new action.

I myself am obviously very analytically inclined, am happy to and (like to think) efficient at interpreting data, and yet even I have automated rules in my Outlook email client to immediately delete unread almost every “daily report” that gets emailed to me automatically (ssssh, don’t tell anyone, that’s just between you and I). Even the simple act of double-clicking to open the attachment is too much effort in comparison with the expected value of seeing the report contents on an average day.

In this sort of circumstance, what might enable a dashboard to be truly useful is the concept of alerting.

A possible use case is as follows:

  1. A sales dashboard aimed at answering the question of whether we are getting fewer sales this year is set up.
  2. Every day, alerting software routinely checks this data, and emails the VIP (only) if it shows that yes, sales have fallen. The email also provides a direct web link to the targeted sales dashboard.
  3. When the VIP receives this email, knowing that there is something “interesting” to see, they may well be concerned enough to open the dashboard and, to the best of their ability, use whatever context is available there to decide on their next action.
  4. If the information they need isn’t there, or they don’t have the time / expertise / inclination to interpret it, then of course they will legitimately request some more work from their analyst. But at least here we see that “data” provided a trigger that has alerted a relevant decision maker that they need to…make a decision, and made it easy for them to use the dashboard tool at their disposal specifically on the day that they are likely to gain value from doing so.
  5. Everyone is happy (well, except about the poor sales).

There is an implicit “so what” in the scenario above.

Main findings

  • Sales are lower than last year.
  • Last year, TV adverts produced tremendous ROI.

So what?

  • To make sure the sales keep growing, consider buying some advertising.
  • To be safest, use a method that was proven effective last year, TV.

But aren’t there some occasions that a “so what” isn’t needed?

Yes, rules of thumb have exceptions. There are some scenarios in which one might legitimately consider not producing an explicit “so what”.

Here are a few I could think of quickly.

1: Exploratory analysis: maybe you just got access to a dataset but you don’t really know what it contains, what the typical patterns and distributions are or its scope. Building a few visualisations on top of that is a great way for an analyst to get a quick understanding of the potential of what they have, and, later, what sort of “so what?” questions could potentially be asked.

2: Data quality testing: in a similar vein to the above, you can often use histograms, line charts and so on to get a quick idea of whether your data is complete and correct. If your viz shows that all your current customers were born in the 19th century then something is probably wrong.

3: Getting inspiration: got too much time on your hands and can’t think of some other work to do? (!!!) You could pick a dataset, or set of datasets, and spend some time graphing their attributes and looking for interesting patterns, outliers, and so on that could form the basis of interesting questions.

  • Why does x correlate with y?
  • Why is x look like a Gaussian distribution whereas y looks like a gamma distribution?
  • Why does store X sell the most of product Y?

This doesn’t have to be done on an individual basis. An interactive dataviz might be a great basis for a group brainstorming discussion, whether within a group of analysts or a far wider audience of interested parties..

4: Learning technical skills: perhaps you are trialling new analysis software or techniques, or trying to improve your existing skills. Working with data you’re already familiar with in new tools is a great way to learn them; perhaps even recreating something you did elsewhere if it’s relevant. The aim here is to increase your skillset, not derive new insights.

5: “How to” guides for others to follow: whether formal training or blog posts (showing fancy extreme edge cases others can marvel at perhaps?), maybe your emphasis is not on what the data actually contains in a subject domain sense, but rather a demonstration of how to use a certain generic analytical feature or technique. Here the data is just a placeholder to provide a practical example for others to follow.

6: You’re an artist: perhaps you’re not actually trying to use data as a tool to generate insight, but rather to create art. This is no lesser a task than classic data analysis, but it’s a very different one, with very different priorities. Think for example of Nathalie Miebach, whose website’s tagline is:

“translating science data into sculpture, installations and musical scores.”

miebachart

This might be fine art, but it does not try to lead to business insight.

7: You want to focus on promoting your work and become famous :-): a controversial one perhaps; but it is not always plain old bar charts that happen to show the greatest insights that get shared around the land of Twitter, Facebook and other such attention-requesting mediums.

If your goal is generically to get “coverage” – perhaps to increase advertising revenue based on CPM or to become more well-known for your work – and you feel that you have to choose between generating a true insight and making something that looks highly attractive, then the latter might actually be a better bet.

But one should acknowledge what they’re doing; perhaps the skills you demonstrate in doing this are closer to the afore-mentioned “data artist” than “data analyst”.

I have a sneaking suspicion for instance that – not to re-raise a never-ending debate! – David McCandless’ books are probably picked up in higher volumes than Stephen Few’s when both are presented together in a bookshop.

  • McCandless’ “Information is Beautiful“, a series of pretty, sometimes fascinating, infographics, many of which have little in the way of conclusions, is currently rank #1788 in Amazon UK books.
  • Stephen Few’s “Show me the numbers“, a more hardcore text on best practice in presenting information, at #7952, with a cover consisting of very unglamourous bar and line charts.

This is not to compare one to the other in terms of worthwhileness; they are aimed at totally different audiences whose desire to have a book in the “data visualisation” category is motivated by very different reasons.

Even amongst the specialist dataviz-analyst community Tableau has, I note that around half of the visualisations that Tableau picks as their public “Viz of the day” are variations on geographical maps.

Geo-maps tend to look “fancier” and more enticing than bar charts, even though they are applicable only to analysis of a very specific type of data, and can provide only certain types of insights. For most organisations, whilst there is often relevance in geospatial analysis, I suspect that “geo-maps” analytics forms far less than 50% of total analytical output.

It’s therefore very unlikely that the winning “Viz of the day” entries actually reflect how Tableau is actually used most of the time. Hence you might conclude that, if you want to be in the running for that particular honour, you should bias your work towards visualisation with the sort of attention-grabbing graphics that maps often provide, irrespective of whether another form might generate a similar or stronger “so what?” output.

8: Regulatory / reporting requirements: in some circumstances you might be bound by regulation or other authority to produce certain analytical reports, irrespective of whether you think they add value or provide insight. Think for instance of the fields of accounting, for publicly traded companies, healthcare companies, investment products and so on.

9:Your job is explicitly, literally, to “visualise data”. It’s possible to imagine, perhaps in a large business department, employing someone whose job is to repeatedly convert data, for instance, from text tables into a best-practice chart form, without going further. It would be another person’s job to derive the “so what?”.

You could think of this as a horizontal slice of the analytics pipeline vs the “beginning-to-end” vertical pipeline. After all, analysts often rely on other people with different skills (eg. IT) to do a preparatory phase of data analysis, the data provision/manipulation itself (including extract, transform and load operations). They could also rely on people to do the conclusion-forming stage.

Many companies do seem to have a de-facto version of this setup by employing people to “create reports”. By this, they may mean something akin to blindly getting up-to-date data into a certain agreed template or dashboard format that managers are supposed to use to derive decisions from.

However, unless your managers happen to be keen analysts, or your organisation is extraordinarily predictable, then I tend to be concerned about the efficiency and reliability of this method for anything other than, for example, the regulatory purposes mentioned above. It’s hard to imagine someone else consistently gaining optimal insights from a chart they had no control over designing, without a large amount of overhead-inducing iteration between chart-creator and insight-finder. Let’s face it – most non-quant managers would prefer a bullet-point summary of findings rather than a 10 tab Excel workbook full of charts if they’re honest.

There may be many more such scenarios; do let me know in the comments!

Hang on, isn’t there a “so what” in some of the above?

Did you notice the semantic trickery in the above “no so what” viz reasons? In fact, most do either have an implicit “so what” or are simply facilitating the later creation of one.

Items 1, 2 & 3 could be considered as part of the data preparation phase of the analytics pipeline. It would be unlikely (and unwanted) for the products of them to be the end of the analysis. Almost certainly, they’re a step 1 to a further analysis. An implicit “so what” here is either that the data is safe to proceed with, or it is not.

The output of these approaches can also be useful for establishing baselines for metrics, even if this isn’t the intended use at this point. For instance, if your exploration reveals the average customer purchases £5 of products, this may be useful down the line to compare next year’s sales to. Did your later interventions improve sales or not?

Items 4 & 5 come down to being technical training for either yourself or for others. Once trained, you’re likely to be off analysing “so what?” scenarios next. If we’re looking to contrive a “so what?” here, it might be “so I am ready to put my skills to good use tackling real questions”.

Item 6 is quite unique. The data visualisation itself may never be useful as a “so what?” to anyone. It was never intended to be. It’s for a totally different audience who would no more ask “so what?” of data-inspired art than they would of Da Vinci’s ‘Mona Lisa‘.

Item 7 again might be considered data-use for the sake of something other than intrinsic “analysis”. This type of work might well have an explicit “so what?”, that could even be part of its allure. But it’s not the primary reason why it the visualisation was created, so it might not. Sometimes it could be considered a variant of #6 with a specific goal.

It may also be itself a tool that generates useful data. If viewcount is what is important to the creator, then they may be tracking that pageviews on their own “so what”-enabled dashboard in order to determine what sort of output creates the most value for them..

Item 8 and 9 are mid-parts of the analytics pipeline. Although you may not be explicitly defining a “so what”, you’re enabling someone else to come up with their own later.

For better or worse, mandatory reporting regulations are there for someone’s perceived reason. A chart of fund performance is supposed to be there to help inform potential clients whether they would like to invest or not, not simply to provide a nice curved shape.

And if your job is to create “standard” reports or charts, then almost certainly someone else is completing the later step of interpreting them to form their own “so what?”. Or at least they are supposed to be.

To conclude: (why) are we valuable?

Fiddling around with data may be somewhere between Big Bang level geekery and the sexiest job of the 21st century, and holds a personal fascination for some of us. But if we want someone to employ us to do it, or to add value in some other way to the world, we should remember why data as a vocation exists. For the average data analyst, it’s not to make a series of lines that look pleasant (although it’s always nice when that happens).

To quote the viz-lord Stephen Few, in his book “Now You See It“:

We analyse information not for its own sake but so we can make informed decisions. Good decisions are based on understanding. Analysis is performed to arrive at clear, accurate, relevant, and thorough understanding.

(my emphasis added)

Outside of that book, he uses the term “data sensemaking” frequently, which is a good description of what organisations tend to want from their data analysts, even if they don’t know to phrase it in that manner. It must be stressed again, many “busy execs” are far happier with a few bullet points or alerts on potential issues than a set of even the most beautiful, most best-practice, visualisations.

When one exists within the analyst community, it can be hard to remember that not everyone enjoys “data”. Even many of those who are intrigued may not yet have had the time, privilege or education that leads towards quick, accurate interpretation of data. It can be frustrating, or even impossible, for a non-quant specialist to try and understand the real-world implication of an abstract representation of some measure: they simply don’t want to, or can’t – and shouldn’t need to – hunt for their own takeaways in many cases.

When a crime is committed, we hope a professional detective will put together the clues and provide the real world interpretation that allows us to successfully confront the criminal in court. When non-trivial data appears, we should hope that a professional analyst is on hand to put together the data-clues and provides a real world interpretation that lets us successfully confront whatever issue is at hand.

Bonus addendum: some “so whats” are worse than no “so whats”

Before we go, there is perhaps one extra risk of “so whatting” a viz that should be considered. Producing a conclusion that could lead to action tends to necessitate taking a position; essentially you move from presenting a picture to arguing for the implications of what it shows.

Much data can provide multiple distinct answers to the same question if it is manipulated enough. There are indeed lies, damned lies and statistics, and dataviz inspired variations of all three.

If the analyst approaches the “so what?” aspect with bias, then human psychology is such that they may be inclined to provide an awesome conclusion that just coincidentally happens to match their pre-analysis viewpoint; c.f. “confirmation bias“. Of course, many organisations effectively employ people or subcontract out work to for this exact reason, but that is generally not an ethically fantastic, or professionally fulfilling, position (and whole other organisations exist to debunk such guff).

It’s pretty much impossible to provide even a basic chart without the risk of bias. Data analysis is surely part art, mixed amongst the maths and science – one can of course debate the precise split. But a data vizzer has inherently made some explicit choices: what source of data to use, how much data, which type of visualisation, which comparisons to make, the format of chart and much more – all of which can induce, consciously or not, bias to the audience.

Many best-practice “rules” of dataviz, and analytics in general, are in fact designed to reduce the risk of this. This is a very key reason as to why it’s worth learning them. Outside of those memorisable basics though, it’s often interesting to try and test the opposing view to what you’re presenting as your “so what?”.

Perhaps this year you have a higher proportion of female customers than last year. So what? “Our 10 year strategy to redesign our product to be especially attractive to women has been successful, we deserve a bonus”? Well, perhaps, but what if:

  • Last year had a weirdly low proportion of female purchasers vs normal and you’re just seeing basic regression to the mean?
  • Or, for the past 9 years the proportion of women buying your product has plummeted 10% every year; only to increase 2% in the latest year. Does that make your 10 year strategy a success?
  • Or this year was the first year you advertised in Cosmo, instead of FHM. Have other changes produced a variable that confounds your results?
  • Or men have stopped buying your product whilst women continue to buy it at the exactly same rate…does that count as success?

The right data displayed in the right way can help you eliminate or confirm these and other possibilities.

For any decision where the benefit likely outweighs the cost, it’s worth doing the exercise of disproving your first intuition in order to provide comfort that you are supporting the best quality of decision making; not to mention reducing the risk that some joker with half a spreadsheet invalidates your finely crafted interpretation of your charts.

Advertisements