Intermediate Tableau guide for data science and business intelligence professionals
The greatest value of a picture is when it forces us to notice what we never expected to see. – John W. Tukey
Let’s assume that you have some data with you and you wish to garner some insights from it. Coding is not your forte and you don’t know how to get started.
Let me tell you this – you can make something as descriptive / insightful as the image below, with gestures as simple as drag and drop. And it doesn’t even require a single ounce of coding. Now that is the power of Tableau for you!
For all those reading this who have been acquainted with Tableau, can plot a few basic charts on it, and wish to learn more about its wide horizons, this article is meant for you.
As for those who have yet to be introduced to the beauty and simplicity of Tableau, quickly go through Tableau for Beginners first. Practice making a few simple visualisations and then rush back here!
In this article we are going to discuss a few core functionalities of Tableau which help in making really dynamic graphs. So let’s quickly get started!
Table of Contents
- Dealing with Different Data Sources
- Data Blending
- Conditional Combination of Data
- Calculations and Level of Detail
- Parameter Control
- End Notes
1. Dealing with different data sources
Its not practical to store all data in a single table. In order to avoid anomalies related to updates, data is almost always distributed in multiple tables that have some relation with each other. Let’s understand the same with an example.
Consider the situation where a Superstore, on the verge of expansion, perceives that the number of Returned orders has been increasing by day. To ascertain the analysis and come up with the right action plan, they plotted the following chart to understand the products that were being returned :
As can be observed, Binders have the maximum number of items being returned. But judging by the color of the bars, Machines and Tables have the highest percent of return ( returned / bought ) :
This seems pretty similar to plotting just another chart, but the trick here was that it was created by using the combined data of two tables : Orders and Returns. This is known as a Join.
Let’s try making the same to get a better understanding :
- Go ahead and download the Sample Superstore Sales data from here.
- Open Tableau and import the same :
The dataset is made up of 3 tables : Orders, People and Returns, and the ones that we are interested in at the moment are Orders and Returns.
- Drag Orders onto the empty area and use the Data Interpreter to get rid of the erroneous data.
- Drag Returns in the same area to get something like this :
For two tables to be joined, there has to be the presence of at least one common field. Here, Tableau automatically Inner Joined the two tables, based on the commonality of the column Order ID. By way of inner join, the combined data only consists of those rows that have the same Order ID in both the tables.
You can change the Join Type as well as the Joining Field in Tableau, but you need to ensure that it’s sensible.
Changing the Join type (Default Inner to Right) :
Changing the Join field :
See how I tried to join the two tables based on Row ID of Orders and Order ID of Returns? Since the two are not compatible, we don’t see any records, and plus that red mark near the circles shows an error.
Let’s get back to working on the chart now :
- Drag Sub-Category of table Orders onto Rows and Number of Records of Returns onto Columns.
- Drag Row ID onto the Marks Section and convert its measure to Count instead of Sum. Now drag this onto the chart :
- Finally go to Show Me and choose the Bullet Graph to get the desired chart.
Here we have used Inner Join, but you can always choose between Inner, Right, Left and Full Outer based on your requirements.
1.2 Data Blending
Data blending is quite similar to Joins, with the difference being that Joining requires the data to be from the same data source. In the above example, we used different tables from the same Excel file. But Data Blending comes into the picture when you are working with different Data Sources. Let’s understand the same with an example.
The Superstore has another vertical in the form of a Coffeechain which is spread across as many states as the Superstore is. But they are considering shutting down some of the branches after observing the following plot :
As can be seen there are some branches that are doing equally well as the Superstore such as California and New York, while many are not, such as Iowa and New Mexico. Just like in Joins, here the trick is that both the datasets, dealing with the two verticals, were stored in different Data Sources, an Excel file and a TDE database.
Why not we plot the same as well to get a better understanding? We will begin by blending the Superstore data and the Sample – CoffeeChain database. You can find the data for the latter here as well:
- First import the Sample-Superstore.xls
- Go to the Worksheet, click on Data -> New Data Source and choose the downloaded CoffeeChain TDE file, to see both the datasets imported :
- Click on the States under the Superstore database and drag it onto Rows and drag Sales onto Columns.
- Click on the CoffeeChain database and drag Sales onto Columns as well to get :
Here you must have observed a few things; let’s take them up one by one :
- To the top-left of the screen, the Superstore database now has a blue mark on it – this indicates the Primary Source.
- The CoffeeChain data has an orange mark on it – this indicates the Secondary Source.
- There is an orange chain link next to State under CoffeeChain – This indicates the Blended Field, meaning that this field is common in both the datasets.
- There are ‘>29 null’ values in the chart. This is because there are not as many states in the CoffeeChain dataset (Secondary Source) as there are in the Superstore dataset (Primary Source).
Let’s rectify these null values by interchanging Steps 3 and 4 :
- Click on the States under the CoffeeChain database and drag it onto Rows and drag Sales onto Columns.
- Click on the Superstore database and drag Sales onto Columns as well to get :
- Change the chart of each graph from the Marks section to Area chart :
- Right click on the second chart and choose Dual Axis to merge the two graphs into one :
- Now all that is left is changing the color scheme and you just finished blending!
2. Conditional Combination of Data
Now that your data is all ready, blended or joined, let’s start making some interesting dashboards. From here on out, we will be using only the Superstore data: Orders + Returns (Left-Joined).
Let’s start off by considering the example of a Survey analysis. In a survey of Food Consumption, under the section of Food Preferences, instead of ‘Low Fat’, you may have ‘LF’, or instead of ‘Regular’, you may have ‘reg’.
In such cases, during data visualisation, you face issues like the following:
As you can see, due to different nomenclatures, this visualisation is not ideal. So, one possible solution to this is Grouping where you can place ‘LF‘ and ‘Low Fat‘ in one group, and ‘reg‘ and ‘Regular‘ in another :
Let’s understand this a bit better with the help of the following Dashboard :
The above is the Returns Analysis across Categories and their Sub Categories. Although, it cannot be seen, in the bar graph, Copiers have the maximum Return percentage, followed by Furnishings :
As far as the Line Chart goes, it seems that the Sales Team had been right all along. The Returns were in fact increasing quite rapidly, but luckily from what we can see, the rise is slowly receding.
From the pie charts, you can clearly analyse the Returns of each Category. As can be observed, Technology suffered the maximum number of Returns.
Another analysis, which I am going to leave for you to make, could be the Return distribution across the various States. Once you finish learning how the above graphs were made, you can easily make this too. So, let’s get started :
- We are going to begin with the following graph, and do take note that the dataset is a Left Join of Orders and Returns :
- In another sheet, make this graph :
- You see the Dimension of Returned under Returns? This is what we are going to use to make our groups. Right click on this Dimension :
- Go to Create – > Group. Since we are going to make groups of products that were returned or not, click on Null -> Group, and change the name :
- Do the same for Yes, and get to the final configuration :
- Click on Apply -> OK, after which you will see a group by the name of ReturnedOrNot under Dimensions.
- Drag this new Dimension over the Color in the Marks Pane to get the following view :
This step automatically segregated the Sales of each Sub Category based on whether the Orders had the Null or Yes value under Returns or not.
The remaining steps are merely customisation. Let’s do those as well :
- Drag Sales onto Label in the Marks Pane
- Right click on the Sales pill -> Quick Table Calculation -> Percent of Total
- Right click on the Sales pill again -> Compute Using -> Table (across)
Also, in the Legends, you will most likely see ‘In/Out’ as the aliases. You can change this as per your requirements, by right clicking on the In/Out blue pill in the Marks Pane, and choosing Edit Alias.
Let’s shift to the pie chart that we had made. We are going to apply the same ReturnedOrNot group to this as well. First create two duplicates of this sheet, and work on one of them :
- Drag the group onto the chart, to get something like this :
- Right click on Returned, to get :
- Choose Keep Only, so that you just see the Returned graph. This gives you the Total loss from Returns. All that is remaining is labelling with the Sales value.
- Now go to the second duplicate of the sheet, repeat the first 2 steps, and instead of keeping the Returned chart, we are going to keep the Not Returned chart. This will give you the sales of the Remaining / Retained Orders.
Its equally easy making the Line Chart :
- Drag Order Date to Columns and Sales to Rows to get a Line Chart
- On top of this chart, drag the ReturnedOrNot Dimension, but just like you had before, exclude the Not Returned Line, right clicking on it and choosing Exclude.
All that is left now, is combining the above Worksheets into one Dashboard. Why don’t you try making the chart for State wise Returns distribution as well?
After taking note of the Returns analysis, your organisation decided that the increase in Returns was not that alarming, and that it should not be constituted as a reason for non – expansion.
But the Superstore is only going to expand in those States where the Sales and Profit both have crossed a certain margin, for example, 40,000 and 10,000 respectively :
So, Sets, as are created above, are really similar to groups. In Set, you group data that fulfils a particular set condition. Another interpretation could be: Groups help you attain a higher level hierarchy, as we had seen in the previous example, whereas Sets help you attain a lower level granularity.
Let’s understand this better by creating the above Dashboard :
- We are going to start off with a simple Map Chart. Since we are going to map our Sales and Profits onto this graph, right click on States and choose Create -> Set.
- Fill in the fields with the following demo information :
- Right click on States once again, and this time fill information for the Profit Margin :
- Now we are going to join the two sets to obtain our desired configuration. So right-click on SalesAbove40k set -> Create Combined Set :
- Fill in the fields with the following configuration :
This step joins the above two conditions for Sales and Profit, to get the requisite combined computation. To view the results :
- Drag this new Set on top of the Map Chart, and convert it into a Filled Map instead of the Symbol Map :
You can always customise the above chart by changing the colour, adding labels etc.
The line chart is as easy to make as the one we had made previously for Trend of Return. Here we have excluded the States that belong to the No Expansion group, like we had excluded the Not Returned there.
3. Calculated Field
The Superstore dataset is pretty comprehensive. It offers quite a lot of information and field sets. But like all data, there is always the possibility of extracting more features. Calculated Fields help you do exactly that while also allowing you to carry out both simple and complex calculations on the data.
So what is a Calculated Field?
To put it in simple words, it’s a formula that you apply to your data, where the various Measures act as the variables.
How to create one?
Just simply go to Analysis, click on Create Calculated Field and something like this will pop up :
Here is where you write your formulas, and as you can see, Tableau provides you with the various syntaxes too, so that you never feel lost! You can also apply ‘If-Else’ conditions, ‘Case’ conditions (as we shall see next) and of course the usual mathematical computations too, which we will explore now.
So what Calculation to start with? Let’s begin with something simple, that is, Average Sales associated with the Orders. The most apt formula for the same would be Total Sales / Total Number of Orders. To convert this formula in Tableau terms, Total Sales implies the SUM of Sales, whereas Total Number of Orders means their COUNT.
Let’s get to the application then :
- In the Calculation Box, make the following changes :
- To see the cumulative SalesRevenue (Calculated Field), drag Measure Names from Dimensions to the empty area to see the above computed result :
So what you have basically accomplished with the Calculated Field is create a Measure of your own, which you can use just like Sales and Profits.
Obviously this was just a gist of what Calculated Fields can do. They can be used for various complex calculations as well, and the glimpse of one such instance can be seen in the following section.
4. Parameter Control
Filters such as the following are a great way of interacting with the visualisations on Tableau :
Just like filters are measures through which you can look at various aspects of your data, a Parameter is another great feature. It can be used in place of Filters, and can exhibit its own dynamic property too.
So what are Parameters? These act as variables of an equation, which you can change to get different results each time.
Let’s try and understand this with the help of an example. So far we have been making separate graphs for separate Measures. Whenever we had to analyse Sales, Profits, Quantity or Discount of the various Categories of Products, we had to make different charts, everything being the same BUT the Measures.
With the power of Parameter Control, there is actually a pretty easy way of going about this, without the repetitions.
Consider this for Sales :
And the following for Profit :
Just with a simple click on the MeasureValue list, you are getting cumulative results across various Dimensions. Here, the MeasureValue is our Parameter, since we are able to change its value.
Let’s get to making one of our own now. Parameters rely heavily on Calculated Fields, so you are going to get a nice chance to practice what you learned above:
- We are going to start off with a blank screen this time.
- Click on the downward arrow near Dimensions, and choose Create Parameter :
- Since we are going to interchange the Measure Values between Sales, Profits, Quantity and Discount, fill in the fields shown below and click OK :
- Right click on the newly generated Parameter, and choose Show Parameter Control:
Do not expect the drop down list to start magically creating graphs on its own. There are still some computations to be done.
So far we have only allocated the names to the Parameter, but not the Values that they are supposed to take up. So for this purpose, we are going to create a Calculated Field.
- Go to Analysis -> Create Calculated Field, and fill it with the following self explanatory details :
- You will find the newly created field under Measures. So first drag Order Date to Columns and then NameOfMeasure to Rows :
There you go! Go ahead and try changing the MeasureValues now.
Now onto the chart we have created in the Dashboard. To see the lines for the individual Categories, simply drag that Dimension on top of the chart :
For creating other charts, do as you normally would, but instead of dragging the individual Measures onto the Rows / Columns, drag the NameOfMeasure parameter instead.
5. End Note
That brings us to the end of this article. But don’t worry, I will be back again with another article on Tableau!
Meanwhile, I think it’s customary to give you a Dashboard to make :
You may face a tad bit of difficulty in the beginning, but if you truly apply yourself, you are bound to get it. And of course if there are ever any doubts, or if you wish for me to cover any concept in the next article, do leave them as comments.
All the best to you Data Explorers!
Learn, Engage, Compete & Get Hired
Leave a Reply Your email address will not be published. Required fields are marked *