5 Power BI features that might make Tableau users a little jealous

New year, new blog post, new tool version to play with! It’s clear that the field of data-related stuff progresses extremely rapidly at present, and hence it behoves those of us of an analyst bent to, now and then, go explore tools that we don’t use day-to-day. We may already have our favourites in each category, but, unless we’ve done a recent review, it’s quite possible the lesser-loved packages have developed a whole new bunch of goodies since the last checkup.

With that in mind, I’ve taken a look at the latest version of Microsoft Power BI. It’s billed in this manner by its creators:

Power BI transforms your company’s data into rich visuals for you to collect and organize so you can focus on what matters to you.

It’s therefore an obvious competitor for software like Tableau, QlikView, chart.io, and many others, and can largely replace Microsoft’s previous Power View offering, which was accessed directly via Excel. In a similar way to the Tableau suite, there’s a Power BI desktop package that analysts install locally on their computer, primarily to manipulate data and construct visuals, and a web-based Power BI service that allows for publication and distribution of the resulting file. Actually the online service is pretty powerful in terms of allowing you to create reports and dashboards via the web, and includes a few other nifty features designed to improve the usability of this software genre – so even some analysts might get a lot out of the web-based version alone.

A lot of Power BI is actually free of charge to use, although there is an enhanced “Pro” edition at around US$10 a month, replete with plenty of more enterprisey features as you can see on their comparison chart. If you’re working somewhere with an Office 365 subscription, you might find you already have access to Power BI, even if you didn’t know about it. So, there’s not much to stop you having a play with it if you’re even remotely interested.

Anyhow, this post is not to review Power BI overall, but rather to point out 5 features that stood out to me as not being present in my current dataviz software of choice, Tableau. These therefore aren’t necessarily the general “5 best features of Power BI” – both Tableau and Power BI can create a pretty line chart, so it’s not really worth pointing that out in this context. My choices should really be considered from the perspective of someone already deeply familiar with what Tableau and its competitors offer.

Also note that software packages aren’t supposed to be feature-identical; many programs aimed at solving the same sort of problems may be completely different in their philosophy of design. Adding some features necessitates a cost in terms of whether other features can be supported. This then is not a request to Tableau and competitors to copy these features. But I do vehemently think it’s useful for day-to-day data practitioners to remain aware of what software features are out there in the wild today, just in case it gives you a better option to solve a particular problem you encounter one day.

As a spoiler: for what it’s worth, my dive into Power BI hasn’t resulted in me throwing my lovely copy of Tableau away, not a chance; you can pry that from my cold dead hands etc. There’s a certain fluidity in Tableau, especially when used for ad-hoc analysis, that seems very conducive to digging for insights, and that I’ve not yet encountered in its more obvious competitors.

But it has led me to believe that the Microsoft offering has improved substantially since the time years ago I used to battle against v1 PowerPivot (which itself was great for some specific data manipulation activities…but eventually I got tired of the out-of-memory errors!). And, especially due to the way it’s licensed – to be blunt, far cheaper than Tableau for some configurations – it’ll remain in my mind when considering tools for future projects.

So, in no particular order, here’s some bits and pieces that piqued my curiosity:

1: Focus mode

Let’s start with a simple one. Dashboards typically contain several charts or tables that are designed to provide insight upon a given topic. Ideally, the combination of content that makes up a dashboard should fit on a single screen, and an overall impression of “is it good or bad news?” should be available at a glance.

In designing dashboards, especially those that serve multiple audiences, there’s therefore often a tension between providing enough visualisations that every user has the information they need, vs making the screen so cluttered or hard to navigate that no user enjoys trying to decipher a mass of 1-inch square charts.

For cases where a particular chart on a dashboard is of interest to a user, Power BI has a “focus” mode that allows the observer to zoom in and interact with that single chart on a dashboard or report on a near-fullscreen basis, without requiring any extra development work on the part of the analyst.

It’s a simple enough concept – the user just clicks a button on whichever visualisation they’re interested in, and it zooms in to fill up most of the screen until they click out of it. It keeps its original interactivity, plus displays some extra meta-information that might be useful (last refresh time etc.). But the main point is it becomes big enough to potentially help generate deeper insights for a particularly interested end user in a way that a little 1 inch square chart shoved at the bottom of a dashboard might struggle to do, even if the 1 inch version is more appropriate for the average dashboard viewer.

If that description isn’t clear, then it’s probably better seen in video form. For example:

[Embedded video: demonstration of Power BI’s focus mode]

2: Data-driven alerts

Regular readers might have established that I’m a big fan of alerting when it comes to promoting data-driven decision making. I’m fairly convinced that many dashboards come with a form of “engagement decay”, where the stakeholder is initially obsessively excited by their ability to access data. But as time goes on they get quite bored of checking to see if everything’s OK – especially if everything usually is OK – and hence stop taking the time to consult a potentially valuable source of decision making.

So, for these types of busy execs, and anyone else wanting to optimise productivity, I like alerts. Just have the dashboard send some sort of notification whenever there’s actually something “interesting” to see.

Sure enough, Power BI has the capacity to alert the user upon certain KPI events, via its own web-based notification centre or, more usefully, email or phone app.

[Image: setting up a data-driven alert in Power BI]

The implementation is pretty simple and somewhat restrictive at the moment. Alerts can only be set up on “numeric tiles featuring cards, KPIs, and gauges”, the alert triggers are basic above-X or below-X type affairs, and you’re restricted to being alerted at most once an hour or once a day. So there’s a lot of room for development – I’d like to see statistical triggers, for instance: “alert me if something unusual happens”.

The good news for Tableau users is that Tableau has promised a similar feature will be coming to their software in the future (and to some extent an analyst can create similar functionality even now with the “don’t send email if view is empty” option recently added). But if you want a nice simple “send me an email whenever my sales drop below £10,000” feature that non-analytical folks can easily use, then Power BI can do that right now.

3: Custom visualisations

All mainstream dataviz products should be able to squeeze out the tried-and-tested basic varieties of visuals: line chart, bar chart, scatterplot et al. And 90%+ of the time this is enough – in fact, usually the best approach for clarity. But sometimes, for better or worse, that’s not sufficient for certain use-cases. You can see this tension surfacing within the Tableau community where, despite the large number of proven chart types it can handle, there’s an even larger number of blogs, reference documents et al. on what form one has to coerce one’s data into in order to simulate more esoteric visualisation types within software that wasn’t natively designed to produce them.

A couple of common examples in recent times would include Sankey charts or hexagonal binning. Yes, you can construct these types of viz in Tableau and other competing products – but it requires a bit of workaroundy pre-work, and entirely interrupts the naturalistic method of exploring data that these tools seek to provide. For example, an average user wishing to construct a Sankey chart in Tableau may need to search out and thoroughly read one or more of a profusion of useful posts, including those here, here, here, and here and several more places throughout the wilds of the web.

It’s very cool that these resources exist – but imagine if instead of having to rely on researching and recreating clever people’s ingenious workarounds, an expert could just provide a one-click solution to your problem. Or you could share your genius more directly with your peers.

Power BI presents an API where an advanced user can create their own visualisation types. These then integrate within Power BI’s toolbox as though Microsoft had provided them in the base package, so data vizzers of all skill levels can use that type of visual without the need for any programming or mathematical workarounds. It should be noted that the procedure for creating these does require learning a superset of JavaScript called TypeScript, which would certainly not be expected of most Power BI users.

But this barrier is alleviated by the existence of a public gallery of these visualisations that Microsoft maintains, which allows generous developers to share their creations world-wide. A Power BI user wouldn’t have to think about the mathematical properties underlying a Sankey plot – they could just download a Sankey chart type add-in such as this one.

[Image: a custom Sankey chart visual in Power BI]

Now, this open access does introduce some risks of course. Thanks to Spiderman, we all know what great power comes with. And even on the public custom visuals gallery, you’ll see some entries that, well, let’s say Stephen Few might object to.

[Image: a 3D pyramid chart from the custom visuals gallery]

Bonus feature: you can also display native R graphics in your Power BI dashboard, with some limitations.

4: “Pin anything to a dashboard” for non-analyst end users

To understand this one, you need to know something about the Power BI object types: simply, that a “report” is made out of a “dataset”, and a “dashboard” is usually, but not exclusively, made out of components of reports*. A dataviz expert can publish any combination of those (or even publish a mixed set of them as a content pack, which any interested user can download to use with a few clicks – another potentially nifty idea!).

(* Tableau users – you can then think of a report as a worksheet, but a worksheet that can support multiple vizzes with arbitrary placement.)

Reports are what they sound like: the electronic equivalent of a notebook, with between zero and many data visualisations on each page concerning a particular topic. Note, though, an important limitation: each report is restricted to a single datasource. In Power BI you create reports with the simple drag and drop of charting components and configurations, after selecting the appropriate datasource. Charts stick around, in interactive form, wherever you drag them to, almost as though you were making a PowerPoint slide. No “containers” needed, Tableau fans 🙂

Dashboards however have a more fixed format, always appearing as a set of tiles, each containing a different item. There’s no restriction on data sources, but there are some restrictions on functionality, such as no cross-filtering between independent tiles. A dashboard tile can be any viz from any report, a whole report itself (which can then cross-filter within the scope of the report) or some miscellaneous other stuff including “live” Excel workbooks, static images, and even answers to natural language questions you may have asked in the fancy Q&A functionality (“what were our sales last month?”).

So, what’s this about non-analysts? Well, a difference between Power BI dashboards and those from some other tools is that even people considered as being solely viz consumers can legitimately create their own dashboards. A non-analytical end-user can choose to pin any individual chart from any individual report (or the other types of items listed above) to a new dashboard, and hence create a smorgasbord showing exactly the parts of each report / pre-made dashboard they are actually interested in, all on one page. After all, the individual viz consumer is by definition best placed to know what’s most important to them.

Here’s what that looks like in reality:

[Embedded video: pinning report visuals to a personal dashboard]

This is perhaps one approach to solving the problem that, in reality, the analyst is often designing a dashboard for a multi-person audience within which each individual has slightly different needs. Each user might be interested in a different 3 of the 5 charts in your dashboard. Here, each user could then choose to pin their favourite 3 to their own start-up page, or any other dashboard they have control over, together with their favourite data table from another report and most-loved Excel workbook, if they insist.

How this actually plays out in practice with novice users would be interesting to see. I think a certain type of non-analyst power user would find this pretty useful, and it’s a more realistic concept of “even non-analysts can make dashboards with no training” than the one a lot of these types of tools foolishly promise.

5: More powerful data manipulation tools

This one is more for advanced users. Power BI lets you manipulate the data (you might even say perform business-user “ETL”) before you start employing it in your visualisations. Most dashboarding tools likely let you do this to some extent – Tableau recently improved its ability to union data, for instance, together with some cleaning features, and it’s had joining and blending for a while. You can also write VizQL formulae to produce calculations at the time of connecting to data.

Power BI’s query editor seems to be more powerful than many, with a couple of particularly nice features.

Firstly, it uses a language called ‘M’ which is specifically designed with data mashups in mind. Once you’ve obtained your data with the query editor, you can then go on to use the DAX language (designed for data analysis, and whose CALCULATE() function I have a soft spot for from previous projects) throughout Power BI to work on the data you already have access to.
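For a flavour of DAX, here’s a minimal hypothetical measure – the Sales table, its columns and the “UK” value are all invented for illustration – using that CALCULATE() function to override the filter context a visual would otherwise apply:

// Hypothetical measure: total transaction value for UK rows only,
// regardless of any country filter on the containing visual.
UK Sales =
CALCULATE(
    SUM(Sales[Transaction value]),
    Sales[Country] = "UK"
)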

The query editor is fully web-data enabled – it can even scrape data right off appropriately formatted web pages without any scripting work at all. Here’s the Microsoft team grabbing and applying a few transforms to IMDB data.

One query-editor feature I particularly like somewhat addresses a disadvantage that some of these user-friendly manipulation tools have vs scripting languages like R: reproducibility.

In Power BI, as you go through and apply countless modifications to your incoming dataset, a list of “applied steps” appears to the side of your data pane. Here’s an example from the getting started guide.

[Image: the “Applied Steps” pane in Power BI’s query editor]

It’s a chronological list of everything you’ve done to manipulate the data, and you also have the ability to go back and delete or edit the steps as you please. No more wondering “how on earth did I get the data into this format?” after an hour of fiddling around transforming data.

There are plenty of built-in options for cleaning up mucky data, including unpivoting, reordering, replacing values, and a fill-down type operation that fills data down until it next sees a value in the same column – which handles those annoying Excel sheets where each group of rows only has its name filled in on the top row. Unioning and joining are of course very possible, and you’ll have access to a relationships diagram view, for anyone who fancies having a look at, or modifying, how tables relate to each other.

Analysts are not limited to connecting to existing data either. Non-DBA types can create new tables directly in Power BI and type or paste data directly into them if they wish (although I’d be wary of over-using this feature…be sure to future-proof your work!). You can also upload your standard Excel workbooks directly to the service, so that web Power BI can access their underlying data.

If Power BI already has the data tables you want, but they’re suboptimally formatted or over-granular, then you can use DAX to create calculated tables, whereby you use the contents of other imported tables to build your own in-memory virtual table. This might allow you to, for instance, reduce your use of intermediate database temporary tables for some operations, perhaps performing a one-time aggregation before analysis.
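As a rough sketch of what that can look like – again with invented table and column names – a calculated table that collapses a transactions table down to one pre-aggregated row per customer might be defined as:

// Hypothetical calculated table: one row per customer, built in-memory
// from an already-imported Transactions table.
SalesByCustomer =
SUMMARIZE(
    Transactions,
    Transactions[Customer ID],
    "First transaction date", MIN(Transactions[Transaction date]),
    "Lifetime value", SUM(Transactions[Transaction value])
)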

Why version 9 will {FIX} Tableau for me (and workarounds in the meantime)

Excitement builds within the dataviz world as the next version of Tableau gets close to launch, supposedly within the next 60 days.

It has many new features, which data geeks and other fans can see previewed, dripped out piece by piece on the Tableau blog and summarised elsewhere, but one has really caught my attention, even though it is probably on the less glamorous side of things.

The general topic is “level of detail” calculations, a new type of calculation in the forthcoming version of Tableau, which I first saw in action via the Information Lab’s wonderful preview video. All the ones I’ve read about so far (INCLUDE, EXCLUDE, ….) seem like they will be very useful indeed for such super-common analytics tasks as comparing something with a larger group of somethings it’s part of, but potentially most thrilling to me is FIXED.

These calculations are interesting to me in that they provide a way to use data from other rows to calculate something in the current row. Tableau table calculations can do some common cases of this – e.g. TOTAL, WINDOW_SUM et al. – but they’re not yet flexible enough for what I imagine most in-depth analysts need on a regular basis: accessing data at a pre-defined granularity that isn’t included in the visual analysis design itself, and that is neither “current row” nor “all rows” but something in between.

Of course, I don’t proclaim to be a leading Tableau expert, so it could simply be that I don’t know the easy way of doing this in Tableau today – comments are most welcome if I’m missing something obvious – but even after various training and Tableau Doctor sessions, a few questions remain unsolved in my mind.

To highlight this, I’m going to steal something very similar to (but far simpler than) the example Tableau gave in their third example here, about sales by customer tenure. Imagine a very simple table like this:
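(a small hypothetical version, with values invented purely for illustration)

Customer ID | Transaction date | Transaction value
C1          | 2001-03-14       | £100
C1          | 2002-07-01       | £200
C2          | 2002-05-20       | £150
C2          | 2003-01-11       | £50
C3          | 2003-06-30       | £300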

Now imagine that you want to know the total transaction value for these customers based on their first transaction date. The business question might be something to the effect of “Are our newer customers bringing in the same level of revenue as our older ones?”. This silly sample data is so small it’s trivial to calculate in one’s head, but you can imagine many real-life equivalent tables that consist of millions of customers and hundreds of millions of transactions.

The sort of data structure that would make this wonderfully easy to calculate in Tableau today would be an extra column, e.g. “first transaction date”. This could then be used as a dimension, so that one can pop it on rows, colours, shapes or what have you, put transaction values into the measures and play away.

Note the point here is that every unique customer has the same “first transaction date” in all their records, whereas the actual transaction date varies as they buy more products over time.

That column is very easy to calculate in Excel and its ilk. But, today, it does not seem so easy to calculate in Tableau in aggregate, to answer seemingly simple questions like “how much did customers whose first transaction was in 2001 bring us over their lifetime so far, vs those starting in 2002?”.

To be fair, with small examples like this it’s usually easy enough to do this even in current Tableau, by creating a series of Tableau calculated fields and using them in Tableau sets to represent all customers with a minimum transaction date of a certain year. I’m not going to write about that here, as hopefully I won’t need to do it ever again in the near future! I’d note though that, as well as being as intuitive as most of Tableau is, in the real world this technique does not seem to scale at all well – for instance, when you have a large number of customers who joined in many different years. I have even managed to get Tableau to tell me it’s out of memory while trying to do this sort of thing with too many records, which is a very rare occurrence indeed.

So personally I have tended to force myself to switch to other tools to pre-build this extra column when I need to do something like this in Tableau.

I’m all for using the right tool for the right job, and there are many great tools available. No serious data-person should restrict themselves to a single tool. However, I haven’t experienced an ad-hoc visualisation tool I like nearly as much as Tableau, so I end up spending time taking the data out of Tableau into another tool just to make a file to re-import back into Tableau.

Unquestionably, doing so interrupts the flow of analysis, and can be a proper time-sapping pain if the source of your data is, for instance, a TDE or something else that most other packages don’t natively open. Here’s a reference list of some tools and methods I’ve used to do this successfully in the past – and probably will continue using for the next 59 days or so.

Microsoft Excel: I probably don’t need to explain what this one is, to anyone who read as far down as here anyway!

Example method: many possible options, 1 of which would be this style of array formula – it returns the earliest [Transaction date] among rows whose Customer ID matches the current row’s – assuming no transactions in the year 2099 or afterwards exist in the dataset.
{=MIN(IF(Sheet1!$A17=Sheet1!$A$13:$A$19,Sheet1!$B$13:$B$19,DATE(2099,1,1)))}

Pros:

  • Very easy to do – most analysts will be able to create a formula for this very quickly.
  • You probably already own the software.

Cons:

  • Excel cannot handle files with over 1 million rows, and it begins to struggle long, long before that.
  • Generally you’ll have to re-do or extend the formula when the rest of the data updates. Excel has some live-linking abilities, but I’ve never found them terribly reliable.
  • As ever with Excel, because it’s so easy to use and supremely flexible about most things, it’s easy to make a mistake, not notice and hence create inaccurate analysis. Imagine if you forgot the dollar signs in the above formula!

SQL Server (or other RDBMS):

Example method: again many possible options, 1 of which could be:

-- Join each transaction to its customer's earliest transaction date,
-- computed by aggregating the same table per customer.
SELECT data.[Customer ID], data.[Transaction date], data.[Transaction value], data_copy.FirstTransactionDate
FROM data
LEFT JOIN
    (SELECT [Customer ID], MIN([Transaction date]) AS FirstTransactionDate
     FROM data
     GROUP BY [Customer ID]) data_copy
ON data.[Customer ID] = data_copy.[Customer ID]

Pros:

  • Very often my data is coming from a SQL server anyway, so why not retrieve and manipulate it in one go?
  • Performance CAN be great over even large datasets
  • You can include data updates just by refreshing the query

Cons:

  • SQL is not a visually intuitive language. Personally I would recommend that most analysts who have access to relational databases take the time to learn at least the basics – but beyond the basics it can certainly get rather complex, and take a lot of thinking and debugging time that could be better spent providing insights. This is especially true when you’re running against a database designed for operational or transactional purposes rather than for analysts.
  • SQL servers in companies are often centrally maintained by a technology department and may be used for many non-analytical yet mission-critical tasks by many other people. Business analysts do not always have much control over those environments; for instance, I effectively have low-priority, read-only access to a lot of the ones I use.
  • For the above reason, you may need a lot of documentation to understand where to look in the database; badgering the database managers for data dictionaries, entity relationship diagrams and other complexities takes time on both sides. It might well be that such documentation doesn’t exist. And then “educated guessing” is about the best you can do.
  • For the above reasons, performance CAN be terrible depending on what you want to do
  • Accessing a remote SQL server isn’t possible with a dodgy internet connection, or from outside the firewall in some cases (think “travelling”, for instance).
  • Some SQL databases or licenses may be very expensive, if you don’t already have access to one.

PowerPivot

Example method:

=CALCULATE(MIN(Table1[Transaction date]),ALLEXCEPT(Table1,Table1[Customer ID]))

Pros:

  • Is a free download
  • It fits directly into Excel, so it’s relatively user-friendly and easy for any decent Excel user, especially those who pride themselves on pivot table skills.
  • It enables Excel to handle a much bigger volume of data and introduces new functions that can be used for many useful things that standard Excel pivot tables can’t do.
  • Everything seems to be precalculated so performance can be extremely good after the initial setup of your data is done.
  • The language it uses for formulae, DAX, is for the most part pretty similar to standard Excel functions where it can be.
  • The CALCULATE() function used above has a syntax I find very intuitive and flexible, and really like a lot; it’s possibly my favourite implementation of the sort of feature you need for this task.

Cons:

  • You, and anyone else who wants to manipulate your Excel file, need to have the PowerPivot add-in installed. This needs downloading, plus temporary admin rights if you’re running Excel 2010. I have no experience with the Excel 2013 variants.
  • Formulas in Powerpivot are written in “DAX” language. DAX is not a language that all that many people speak in my experience (although the basics aren’t complicated).
  • The version you can use seems dependent on other Microsoft infrastructure. For instance, I am restricted to the 2008 version due to the other software we run. 7 years is a LONG time for analysis tools!
  • Behind the scenes it is creating something similar to an SSAS cube. This means that when you connect Tableau to it as a datasource, you suffer the loss of many of Tableau’s finest features, as it’s effectively an OLAP cube. Many of Tableau’s competitors won’t connect to it at all.
  • The big showstopper for me: I believe it’s in-memory software, so when I try to use row-scanning operations like the CALCULATE() formula above – especially with a FILTER() over millions of rows – I have consistently had “out of memory” errors. This means it whirs away for half an hour, then gives the error message and totally refuses to do what you asked. I come up against this a LOT; a sketch of the offending pattern follows below. To be fair, maybe a version that isn’t 7 years old performs far better, but sadly with our current infrastructure that’s not an option for me. It can handle data far bigger than native Excel without question, but it still seems to have a limit that one comes up against frequently, even when not dealing with mega-terabyte-Hadoop type stuff. I would love to see whether new versions have helped on that front, as it’s a great tool for its (non-existent) cost.
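For illustration only – the table and column names are hypothetical – the sort of row-scanning pattern that has triggered those errors for me looks something like this:

// Hypothetical example of an expensive pattern: FILTER() scans the whole
// table row by row, which is where the memory problems tend to start.
=CALCULATE(
    SUM(Table1[Transaction value]),
    FILTER(Table1, YEAR(Table1[Transaction date]) = 2001)
)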

Alteryx:

Example method: Alteryx files look like visual flow charts. One option for the crux of the matter is a multi-row formula tool with the following formula, grouped by Customer ID, after the data has been sorted by Customer ID and then transaction date.

IF ISNULL([Row-1:Transaction date]) THEN [Transaction date] ELSE [Row-1:First transaction date] ENDIF

Pros:

  • It is blazing fast. Really, really fast. I have seen instances where the exact same task that made PowerPivot crash out of memory after 20 minutes was completed successfully in under 30 seconds in Alteryx.
  • After a little experience, the flowchart interface is easy to use and I find I make fewer mistakes here than in some other tools.
  • If you need to repeat it with updated data, just re-run the Alteryx routine and you replicate the results without needing to recreate or edit anything.

Cons:

  • If you don’t already have it, and most people don’t, Alteryx isn’t the cheapest of tools. I think the “normal” version is around £2500 per user these days. Definitely not over-priced for what the package can do – but obviously it is not cost-effective if your only task is to add a column to a table once!
  • The interface is very different from many “normal” tools, which means it might put many inexperienced users off trying it out unless they really need to.
  • Super-petty criticism, but on my computer it takes a relatively long time for Alteryx to load and get itself into a usable state; a definite “flow interrupter”.

Tableau 9 (my hope):

Example method: I didn’t get around to installing the beta version yet, so this is assumed from reading the blog posts of many people wiser than myself – but if I understand correctly, it will take just <=6 little words (depending on how you count).

{FIXED [Customer ID] : MIN([Transaction Date])}

Pros:

  • If this is true, then you won’t have to leave Tableau – and your flow of analysis – at all to do the task.
  • It will refresh as and when the data does, and work offline with extracts or online without.
  • Typing the above in will take approximately 5 seconds or less!

Cons:

  • None come to mind, although I have one potential concern: performance. Functions like this, I would assume, have to scan over a lot of rows to get the result. Tableau has (almost) always seemed a very fast tool to me, so I’m optimistic – and Alteryx has shown these functions can run super-fast – but I’m curious to know how well these types of “level of detail” functions work over medium-large datasets (100,000s – 10,000,000s of rows). I will look forward to finding this out in the not too distant future.

Either way this feature will surely save me, and I’m sure many many others, an incredible amount of time on very common, and quite dull, analytical tasks. I am grateful indeed.