The new version of Tableau, number 10, got released whilst I was busy sunning myself on holiday. This mean there was at least something to look forward to in returning to the world of work 🙂
This means that Tableau can now do “proper” joins (i.e. not blends) that span different databases, even different database technologies. So for instance, you can execute an inner join from one table from SQL Server onto data from another table in Redshift, and analyse the result as though it all came from the same place. Better yet, you can publish the resulting datasource as a Tableau Server datasource so everyone can join in the fun, without even having to care where it came from in the first place.
Below a few notes / warnings from my early experimentations. Some may sound critical, but, overall, this is truly a great feature that will enable the less ETL-loving folks to get more done quicker, and/or without the resources of tech database specialists or sofwtare being needed.
- Excel counts as a database (as does any other supported file format). This means you can simulate database joins between multiple Excel files for instance without having to modify them in any way.
- You can use a custom SQL statement as the “table” you’re joining to or from in any join: i.e. you’re not restricted to just the formal tables on the database. This means that if you’re an SQL wizard who would prefer to have the database preprocess stuff before it hits Tableau, you can do that in the context of a cross-database join.
- Warning! I believe (from experience 🙂 ) Tableau downloads all the data from the tables you choose to your local computer before performing the join. This is kind of predictable if you think it through, but not totally. If you know what an Alteryx “in-database tool” is, then this is not like one of those.
The joining is on your computer, even if you don’t need to care about that. This means that if you’re inner joining a table of 10 rows in one database to 10 out of a possible 1 billion rows in another database to create an extract, Tableau will be downloading 1 billion and 10 rows to your computer before later returning you the post-join 10 rows as your datasource. You can apply data source filters to your tables before extracting though which might help alleviate some of the potential pain here.
- You can do left, inner and right joins very easily, joining on any number of fields as the key. But I didn’t see a facility for conditional joins, in the sense of saying “download all sales from Redshift for a customer whose info is in Oracle where the sales are between the start and end date of the customer’s first 6 weeks according to Oralce”. You would have to let it download and show you all of the joined data and then use formula in Tableau afterwards to filter out what you don’t want.
- Some data connection types aren’t allowed to be involved in cross database joins at the moment. A Tableau blog post lists these as:
- Tableau Server
- Google Analytics
- Microsoft Analysis Services
- Microsoft PowerPivot
- Oracle Essbase
- SAP BW
- Teradata OLAP Connector
The big shame for me on that list is Tableau Server. Tableau Server datasources are a very cool way for non-DBAs to share analytically-useful data around the organisation (or DBA professionals to provide a lovely structured, governed, datasource ideal for Tableau users to use). I have however found that they can result in semi-silos, as exporting or manipulating them has certain limitations vs standard databases. If one could cross-database join these types of datasource one day it would be an amazing way of adding extra flexibility to that feature
- Performance may be super great, or it may be super slow, depending on your exact database setup.
But all those compliments and complaints aside, debt is very much due to the makers of Tableau 10 for providing one fewer reason for analysts to load up other software, or pester other departments, before getting started on analysing all your lovely data.