A first look at Alteryx 10’s Network Analysis tool

Alteryx version 10 was recently released, with all sorts of juicy new features in realms such as usability, data manipulation and statistical modelling. Perhaps one of the most interesting ones for me though is the new Network Analysis tool.

This provides an easy way to make network graph visualisations natively, something that many general purpose analytical tools don’t do (or require workarounds). Behind the scenes, it uses R, but, as per the other Alteryx R tools, you don’t need to worry about that.

Until now, I had used Gephi for such work; it’s a great free open-source program which is tremendously capable at this style of analysis, but not always particularly friendly or easy to use, and it requires data to be exported into it.

In a previous post I wrote about the basics of getting data into Gephi and visualising it. The very simple example I gave there is easily replicable in Alteryx. Here’s how:

First create your tables of nodes (the dots) and edges (the lines between the dots).

The documentation states that your nodes must have a unique identifier with the fieldname of “_name_” and the edges must have fields “from” and “to”. Actually in practice I found it often works fine even without using those specific field names, but it is easy to rename columns in Alteryx (use the Select tool, for instance), so one might as well follow the instructions where possible.

So for a basic example, here’s our table of nodes:

_name_ label Category
1 A Cat1
2 B Cat1
3 C Cat1
4 D Cat2
5 E Cat2
6 F Cat2
7 G Cat3
8 H Cat3
9 I Cat3
10 J Cat3

And edges:

From To
1 2
1 3
1 4
1 7
1 9
2 8
2 7
2 1
2 10
3 6
3 8
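
If the tables start life outside Alteryx, a short script can produce them with the field names the documentation asks for. Here is a minimal Python sketch using only the standard library. The helper names `to_csv` and `unknown_endpoints` are my own inventions for illustration, and the lowercase “from” and “to” headers are assumed from the documentation wording above.

```python
import csv
import io

# Node and edge data copied from the two tables above.
NODES = [
    (1, "A", "Cat1"), (2, "B", "Cat1"), (3, "C", "Cat1"),
    (4, "D", "Cat2"), (5, "E", "Cat2"), (6, "F", "Cat2"),
    (7, "G", "Cat3"), (8, "H", "Cat3"), (9, "I", "Cat3"), (10, "J", "Cat3"),
]
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]


def to_csv(header, rows):
    """Return the rows as CSV text, preceded by the given header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def unknown_endpoints(nodes, edges):
    """List edge endpoints that do not appear as a node identifier."""
    known = {node_id for node_id, _label, _category in nodes}
    return sorted({end for edge in edges for end in edge if end not in known})


# Field names follow the documentation: "_name_" for nodes, "from"/"to" for edges.
nodes_csv = to_csv(["_name_", "label", "Category"], NODES)
edges_csv = to_csv(["from", "to"], EDGES)

print(nodes_csv)
print(edges_csv)
print("Edge endpoints with no matching node:", unknown_endpoints(NODES, EDGES))
```

The endpoint check is there because an edge that points at a non-existent node is an easy mistake to make when the two tables are maintained separately.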

Pop a “Network Analysis” tool onto the canvas. It’s in the Predictive section of the Alteryx toolbar. Then hook up your nodes file to the N input and edges file to the E input.

Alteryx network viz workflow

There’s some configuration options on the Network Analysis tool I’ll mention briefly shortly, but for now, that’s it, job done! Press the run button and enjoy the results.

The D output of the tool gives you a data table, 1 row per node, showing various graph-related statistics per node: betweenness, degree, closeness, pagerank and evcent (eigenvector centrality). You can then directly use these statistics later on in your workflow.
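
To give a feel for the simplest of those statistics, here is a rough Python sketch that counts degree for the example edges by hand. It is only an illustration of what “degree” means; the tool does its own calculation in R, and I am not claiming these numbers match its output column for column.

```python
from collections import Counter

# Edges from the example table, as (from, to) pairs.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]
NODE_IDS = range(1, 11)

# Count how many edges leave and arrive at each node.
out_degree = Counter(src for src, _dst in EDGES)
in_degree = Counter(dst for _src, dst in EDGES)

# One row per node, loosely mimicking the shape of a per-node stats table.
degree_table = [
    {
        "node": node,
        "in": in_degree[node],
        "out": out_degree[node],
        "total": in_degree[node] + out_degree[node],
    }
    for node in NODE_IDS
]

for row in degree_table:
    print(row)
```

Every edge contributes one “out” and one “in”, so the totals always add up to twice the number of edges.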

The I output gives you an interactive graphical representation of your network with cool features like the ability to search for a given node, tooltips upon hover, click to drag/highlight nodes, some summary stats and a histogram of various graph statistics that describe the characteristics of your network.

Although for most tools the “auto-browse” function of Alteryx 10 negates the need for a Browse tool, you will need one connected to the I output if you want to see the graphic representation of your network.

There are some useful configuration options in the Network Analysis tool itself in 3 categories: nodes, edges and layout.

Perhaps the 3 most interesting ones are:

  • ability to size nodes either based on their network statistics or another variable,
  • ability to have directed (A connects to B, B might not connect to A) or undirected (A connects to B implies B connects to A) edges,
  • ability to group nodes by either network statistics or another variable (e.g. to differentiate between Facebook friends and Facebook groups).
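
The directed versus undirected distinction matters for the example data, because it contains both 1 → 2 and 2 → 1. Here is a small Python sketch of one way to read the same edge list as undirected. The function name `as_undirected` is hypothetical, and whether the tool itself merges such pairs internally is not something I have checked.

```python
# Edges from the example table, as (from, to) pairs.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]


def as_undirected(edges):
    """Collapse directed pairs so that (a, b) and (b, a) count as one edge."""
    return sorted({tuple(sorted(edge)) for edge in edges})


undirected = as_undirected(EDGES)

print(len(EDGES), "directed edges")
print(len(undirected), "undirected edges")
```

Read as directed, the table holds 11 edges; read as undirected, the 1 → 2 and 2 → 1 rows become a single connection.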

Here for example is the above diagram where the nodes are sized by degree (# connections), coloured by my variable “Category” and the edges are set to directed.

Options for network viz tool

Network viz with options


Sidenote 1: There seems to be a trick to getting the group-by-variable to work though, which I’m not sure is intentional(?). I found that the tool would only recognise my grouping variables if they were specifically of type “String”.

Alteryx text from an input file usually defaults to type “V_string” but the Network Viz tool would not let me select my “Category” field to group nodes by if I left it at that. However it’s very easy to convert from V_string to String by use of a Select tool.

Select tool to string

Sidenote 2: For people like me who are locked down to an old version of Internet Explorer (!) – the happy news is that the Alteryx network viz works even in that situation. In previous versions of Alteryx I found that the “interactive” visualisations tended to fail if one had an old version of IE installed.


Overall, the tool seems to work well, and is as quick and easy to use as users of Alteryx have probably come to expect. It even, dare I say it, has an element of fun to it.

It’s not going to rival – and probably never will try to – the flexibility of Gephi for those hand-crafting large complex networks with a need for in-depth customisation options and output. Stick with that if you need the more advanced features (or if you can’t afford to buy Alteryx!).

But for many people, I believe it contains enough features even in this first version to do the basics of what most analysts probably want a network viz for, and will save you hours compared with finding and learning another package.

At least for relatively small numbers of nodes anyway; on my first try I found it hard to gain much insight from the display of a larger network as the viewing area was quite small – but some of this is innate to the nature of the visualisation type itself. I have also not yet experimented very much with the different layout options available, some of which might dramatically improve things if they have similar impact to the Gephi layout options. Picking the optimum location to display each node is a distinctly non-trivial task for software to do!

Remember also that as the “D” output gives a data table of network stats per node, one could always use that output to pre-filter another incarnation of the network viz tool and show only the most “interesting” nodes if that was more useful.
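
In a workflow that pre-filtering would be done with ordinary Alteryx tools rather than code, but the logic is easy to show in a few lines of Python. In this sketch degree stands in for whichever statistic you find interesting, and the function names `top_nodes_by_degree` and `filter_network` are made up for the illustration.

```python
from collections import Counter

# Node labels and edges from the example tables.
NODES = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
         6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]


def top_nodes_by_degree(edges, keep):
    """Return the ids of the `keep` nodes touching the most edges."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Highest degree first; ties are broken by the lower node id.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[:keep])


def filter_network(nodes, edges, keep_ids):
    """Keep only the chosen nodes and the edges that run between them."""
    kept_nodes = {n: label for n, label in nodes.items() if n in keep_ids}
    kept_edges = [(s, d) for s, d in edges if s in keep_ids and d in keep_ids]
    return kept_nodes, kept_edges


keep_ids = top_nodes_by_degree(EDGES, keep=3)
small_nodes, small_edges = filter_network(NODES, EDGES, keep_ids)

print(small_nodes)
print(small_edges)
```

Note that the edges have to be filtered as well as the nodes, otherwise the second network viz would receive edges pointing at nodes it no longer knows about.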

In general this new Alteryx tool is so easy to use and gives such quick results that I hope to see it promote effective use of such diagrams in real-world situations where they can be useful. At the very least, I’m sure it’ll convince a few new “data artisans” to give network analysis a try.

Gephi basics: simple network graph analysis from spreadsheet data

Several interesting phenomena can be modelled and analysed using graph theory. Graph theory, which Wikipedia tells me first had a paper published about it in 1736 (!), can at its most basic perhaps be thought of as a set of mathematical techniques for analysing problems where one can represent the protagonists as a set of objects (nodes) and lines connecting them (edges).

Common examples would be analysis of social networks (each person is a node, each friendship connecting them an edge), referral schemes (the people involved are nodes, each act of referral an edge) or, in a more physical sense, perhaps transport (each airport a node, each flight between them an edge).

Most common business analysis tools I have seen do not really try and tackle the classic graph/network visualisations between objects very well. So far it seems it hasn’t been a traditional avenue of analysis for most businesses, but as the most obvious application, “social data”, becomes ever more interesting I’m sure interest will only rise.

Luckily there is a pretty awesome tool: Gephi. Not only is it super-fully-featured in this sphere, but it’s licensed under the GNU GPL, and hence free! Kudos to the heroes who create and maintain Gephi.

Whilst it produces research publication level output and has no shortage of advanced features, coming to it originally as a total graph-novice I did not find it overly easy to use. This is no slight on the software; my sense is that its sheer power and its target audience are not conducive to a hand-holding wizard type system, especially when it’s a labour of love!

Here then follow a few notes on how to get the most basic data from a spreadsheet on one’s computer into Gephi, and how to visualise a stupidly simple graph from it. Of course in reality the data involved will be far larger and more interesting than this fictional example, but hopefully it helps to get started.

Here’s the data to use in this example. First my set of nodes:

ID Label
1 A
2 B
3 C
4 D
5 E
6 F
7 G
8 H
9 I
10 J

Read this as saying I have 10 “objects” (e.g. people on Facebook), which have the names A, B, C etc.

and now the set of edges:

Source Target
1 2
1 3
1 4
1 7
1 9
2 8
2 7
2 1
2 10
3 6
3 8

Read this as saying I have 11 connections (e.g. friendships between people), which involve my 10 nodes.

Note that some nodes may have no connections to other nodes, and others may have very many. Here we see that node ID 1 (“A”) has a connection to node ID 2 (“B”). This doesn’t necessarily imply that B is connected to A in our example, which makes it a “directed graph” – for instance B might be a Twitter follower of A even though A is not following B. Gephi can naturally handle both directed and undirected graphs with ease.
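
To see which connections in the example really are mutual, one can check each edge for its reverse. A minimal Python sketch, independent of Gephi itself:

```python
# Edges from the example table, as (source, target) pairs.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]

edge_set = set(EDGES)

# A pair is mutual if the reverse edge is also present; each pair is listed once.
mutual = sorted(
    (a, b) for a, b in edge_set if a < b and (b, a) in edge_set
)

# One-way edges have no reverse counterpart in the table.
one_way = sorted(
    (a, b) for a, b in edge_set if (b, a) not in edge_set
)

print("Mutual pairs:", mutual)
print("One-way edges:", one_way)
```

Only A and B point at each other; the other 9 rows are one-way, which is exactly the information an undirected reading would throw away.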

Note that, for Gephi, the column headers are important.

  • Every node should have a column called “ID” and, if you want to show some sort of human-readable labels, then one called “Label”.
  • Every edge should have a column called “Source” and one called “Target” which are the ID numbers of the nodes that should be connected (for an undirected graph it doesn’t really matter which goes in source and which in target, but for a directed one, such as 1-way friendships, it does).
  • You can add any other columns you like to the file, which you can then use in Gephi itself if you wish, but you should always try and ensure you have at least the above ones.
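
A quick check of the headers before importing can save a confusing error later. Here is a minimal Python sketch; the function name `missing_columns` is hypothetical, the sample CSV text is cut down from the example data, and the exact-case comparison is an assumption of the sketch rather than a statement about how strictly Gephi matches names.

```python
import csv
import io

# Cut-down sample files, using the column names described above.
NODES_CSV = "ID,Label\n1,A\n2,B\n"
EDGES_CSV = "Source,Target\n1,2\n2,1\n"


def missing_columns(csv_text, required):
    """Return the required column names that the CSV header lacks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return [name for name in required if name not in present]


print(missing_columns(NODES_CSV, ["ID", "Label"]))
print(missing_columns(EDGES_CSV, ["Source", "Target"]))

# An edges file exported with different headers would be flagged.
print(missing_columns("From,To\n1,2\n", ["Source", "Target"]))
```

Extra columns are deliberately ignored, since the point above is that you may add as many as you like.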

For this import, Excel data must be in CSV format. So, if your data is not already a CSV, the first step is to save it as a CSV instead of an XLSX.

The CSV format supports only 1 “worksheet”, so you will have to create 2 files, one for the nodes and one for the edges. Technically you can just use the edges file and have Gephi assume that every entry in the edges table relates to a node it should create (and that there are no other nodes) but I find it safer at first to approach it explicitly with the separate files.
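
The catch with the edges-only shortcut is easy to demonstrate on the example data. This Python sketch simply collects every ID mentioned in the edges and compares it with the full node list; it illustrates the logic only and does not call Gephi.

```python
# Edges from the example table, as (source, target) pairs.
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]

# The full node table has IDs 1 to 10.
ALL_NODE_IDS = set(range(1, 11))

# Nodes that an edges-only import could know about: anything an edge mentions.
implied_nodes = sorted({end for edge in EDGES for end in edge})

# Nodes that never appear in an edge would be lost.
lost_nodes = sorted(ALL_NODE_IDS - set(implied_nodes))

print("Implied by edges:", implied_nodes)
print("Missing without a nodes file:", lost_nodes)
```

Node 5 (“E”) has no edges, so it only exists if the separate nodes file is imported. Any labels or other node columns would likewise be absent from an edges-only import.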

Now it’s time to open Gephi!

First, create a new project (or use the existing one) and press the “Data Laboratory” button at the top of the screen.


Now choose “Import spreadsheet” from the resulting screen.


Select your nodes CSV file. Make sure the dialogue box is set to recognise your file as a “Nodes table” and not an “Edges table” or you will get an error about it being in the wrong format.

Click through to “Next” and “Finished”. You’ll see there are various importing options which are useful in more complex cases – but not needed for this example.

You’ll be returned to the “Data laboratory” screen, hopefully with your nodes data showing.

Note that this data is now embedded in Gephi. If you change something in your CSV it will not automatically update. Likewise there are many features in Gephi to add, remove, filter or calculate new columns which are not passed back to the CSV. Be sure to save in Gephi often!

Now, repeat that process but this time selecting to import your edges CSV as an “Edges table”. You should see your edges data when clicking on the little “Edges” button at the top left of the screen.


Now for the fun part. Switch back to the Overview section using the button at the top left of the screen. You should see your data, visualised as a graph!


It may not look super pretty at first, but you can see that it’s accomplished its task. Perhaps in a later post I will go through a few of the formatting options (Gephi can produce very, very beautiful output if one is prepared to try!) but for a few quick tips now:

At the bottom left of the main display window there are some formatting controls.


Right click and drag to pan. Use the mouse scroll button to zoom.

The button with the letter “T” applies the labels to the nodes. You may need to zoom in or adjust the size slider to the right to see them properly. This section is also where you find the basic colours, sizes etc. (there are also functions to go into later that let you colour code each node based on a variable or characteristics).

If you don’t like where a node has been physically placed on the screen there are 2 key options.

  1. Click it and drag it to where you prefer. Gephi will keep the edge properly connected.
  2. If it’s more than the odd one, or you wish to experiment with several different positioning algorithms in order to find the most effective one for your data, then note the “layout” box to the left.

Here you can have Gephi apply algorithms to fulfill certain sorting, manipulation or positioning operations, and it’s very fast, even on a lot of data. The software comes with some built in ones and there are more possibilities to download extra – more on this in a future followup. They’re quite non-destructive, so it’s quite possible to save your file and play with them – although note a lack of an actual “undo” in this software!

Some have parameters you can set before pressing “Run” to actually apply them. I have had some success with “Force Atlas 2” in making sense of datasets somewhat larger than this example.

Here’s an example of how Force Atlas 2 and a bit of formatting made it clear that in the test dataset I had one lonely node, “E”, who has no connections. It’s also easy to see that A and B are amongst the most heavily connected nodes.
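
The same two observations can be confirmed without a picture. Here is a small Python sketch that counts distinct neighbours for each node in the example data, ignoring edge direction; it is just a cross-check, not anything Gephi runs.

```python
# Node labels and edges from the example tables.
LABELS = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 7), (1, 9),
    (2, 8), (2, 7), (2, 1), (2, 10), (3, 6), (3, 8),
]

# Build each node's set of distinct neighbours, ignoring direction.
neighbours = {node: set() for node in LABELS}
for source, target in EDGES:
    neighbours[source].add(target)
    neighbours[target].add(source)

# Nodes with no neighbours at all are the "lonely" ones.
isolated = [LABELS[n] for n in sorted(neighbours) if not neighbours[n]]

# Rank by neighbour count, highest first, ties broken by node id.
ranked = sorted(LABELS, key=lambda n: (-len(neighbours[n]), n))
most_connected = [(LABELS[n], len(neighbours[n])) for n in ranked[:2]]

print("Isolated:", isolated)
print("Most connected:", most_connected)
```

On a 10-node example this is overkill, but on a larger dataset a count like this is a handy sanity check on what the layout appears to show.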


That’s the basics of how to get data in and on the screen covered. Gephi does far more than this; there are all sorts of formatting, partitioning, ranking, calculating, filtering and many more abilities to help get insights out of graphs, but popping the data in and having the gratification of seeing a visualised network is the first step.

Happy graphing!