Data Visualization – Bad Representation of Data
This article was published as a part of the Data Science Blogathon.
Data visualization is a technique for transforming data (in numerical or text form) into an image that the audience can easily interpret. It is used to inform, to drive action, or even to entertain, but most of the time it is used to persuade. It is easy to convince people using data visualization.
The critical part is when persuasion comes with the intention to mislead your audience. This article presents five (5) bad representations of data that should be avoided at all costs, along with some tips and solutions for addressing them.
(1) Using the Wrong Graph
Some graphs and charts look fine at first glance but represent the data badly and only confuse your audience. A televised Fox News report on the 2012 presidential race showed a pie chart that looks off, as shown below:
A pie chart represents data as parts of a whole, which means the values in a pie chart should add up to 100%. In the chart shown by Fox News, however, the percentages add up to 193%. This would be very troublesome if the chart were shown without labels, because it would still look convincing.
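The part-of-a-whole rule can be checked before a pie chart is ever drawn. The following is a minimal sketch rather than anything taken from the original report: the function name `check_pie_shares` and its tolerance are illustrative, and the three values (70, 63, 60) are assumed from the widely circulated version of the chart, consistent with the 193% total mentioned above.

```python
def check_pie_shares(shares, tolerance=0.5):
    """Return (is_valid, total) for a mapping of label -> percentage.

    A pie chart only makes sense when the slices are parts of one whole,
    so the percentages must add up to 100 (within a small rounding tolerance).
    """
    total = sum(shares.values())
    return abs(total - 100) <= tolerance, total


# Assumed values for illustration; they add up to 193, not 100.
poll = {"Back Palin": 70, "Back Huckabee": 63, "Back Romney": 60}

is_valid, total = check_pie_shares(poll)
if not is_valid:
    print(f"Shares add up to {total}%, so a pie chart is the wrong choice.")
```

When the check fails, the categories are usually not mutually exclusive (respondents could pick more than one), which is a sign that the values should be compared side by side instead.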
For comparing values, a better way to represent them is a bar graph, as shown below:
In this new graph of the 2012 presidential race, the category Back Palin has the highest percentage. Using the proper graph for your data makes it easier for the audience to grasp the message you are trying to convey.
(2) Removing the Baseline or Adjusting Y-Axis
The graph above makes it look as if Back Palin has an overwhelming advantage over Back Huckabee and Back Romney, when in reality the difference is only around 7 to 10 percentage points. To fix this, always start your Y-axis at a zero baseline and use a proper scale.
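The effect of the baseline can be put into numbers. The sketch below is illustrative only: `visual_ratio` and `draw_bar_chart` are hypothetical helper names, the poll values (70, 63, 60) are assumed from the widely circulated chart, and the truncated baseline of 55 is an assumed example, not a value read from the graph. Drawing requires matplotlib, which is imported only inside the drawing function.

```python
def visual_ratio(taller, shorter, baseline=0):
    """How many times taller one bar looks than another for a given y-axis baseline."""
    if shorter <= baseline:
        raise ValueError("baseline must be below the shorter bar's value")
    return (taller - baseline) / (shorter - baseline)


def draw_bar_chart(values, path="poll_bar_chart.png"):
    """Save a labeled bar chart whose y-axis starts at zero."""
    import matplotlib.pyplot as plt  # needed only for drawing

    labels = list(values)
    heights = [values[label] for label in labels]
    fig, ax = plt.subplots()
    bars = ax.bar(labels, heights)
    ax.bar_label(bars, fmt="%d%%")  # print each value above its bar
    ax.set_ylim(0, 100)             # zero baseline, full percentage scale
    ax.set_ylabel("Percent")
    fig.savefig(path)
    plt.close(fig)


# Assumed values for illustration.
poll = {"Back Palin": 70, "Back Huckabee": 63, "Back Romney": 60}

# With an axis that starts at 55 (assumed), the tallest bar looks much larger
# than the shortest one; with a zero baseline the bars reflect the real gap.
print(visual_ratio(poll["Back Palin"], poll["Back Romney"], baseline=55))
print(visual_ratio(poll["Back Palin"], poll["Back Romney"]))
```

Under these assumptions, the truncated axis makes the first bar look three times as tall as the last, while the underlying values differ by a factor of only about 1.17. Calling `draw_bar_chart(poll)` would save a version with the zero baseline.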
(3) Going Against Conventions
When presenting your data, observe established conventions. In the graph above, the largest value is shown in a lighter color, but the accepted convention is to use a darker color to highlight a specific portion of the graph. Imagine a more complex graphic such as a geographic map or a heat map: if we changed the conventional way of presenting hot or dense areas, it would lead to misconceptions and confusion. To fix this, always check the conventions that apply in the area or among your audience when preparing your data visuals.
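One way to keep the "darker means more" convention consistent is to derive the shade from the value instead of picking colors by hand. This is a minimal sketch under stated assumptions: `gray_shade` is a hypothetical helper, the gray levels 230 (lightest) and 40 (darkest) are arbitrary choices, and the poll values are assumed for illustration.

```python
def gray_shade(value, vmin, vmax, lightest=230, darkest=40):
    """Map a value to a hex gray: the larger the value, the darker the shade."""
    if vmax == vmin:
        fraction = 1.0
    else:
        fraction = (value - vmin) / (vmax - vmin)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp values outside the range
    level = round(lightest - fraction * (lightest - darkest))
    return f"#{level:02x}{level:02x}{level:02x}"


# Assumed values for illustration.
poll = {"Back Palin": 70, "Back Huckabee": 63, "Back Romney": 60}
low, high = min(poll.values()), max(poll.values())

for label, value in poll.items():
    print(label, gray_shade(value, low, high))
```

The largest value gets the darkest gray and the smallest gets the lightest, so the color scale cannot accidentally contradict the data.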
(4) Unlabeled Graph and (5) Overloaded Data
The chart shown above is also a bad representation of data. First, it does not have labels. Second, no specific data is highlighted. Lastly, the tall bars in front hide the information shown by the bars behind them. Missing labels and an overload of data confuse your audience and make it harder for them to identify the point or message you are trying to communicate. Eventually, you end up explaining the chart verbally, and the purpose of presenting the data visually is defeated.
So, what should we do?
In data visualization, we should also remember the 4Ws:
- Which data is important to show?
- What do you want to emphasize in the data?
- What options do you have for displaying the data?
- Which option is most effective in communicating the data?
Data sets can be very broad. As a data visualization practitioner, you should show your target audience only the essential data. After that, highlight the information you want to convey by changing the colors of the other data to gray or faded tones.
To decide how to display your data and which option communicates it most effectively, the chart below will surely help:
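Both steps, trimming the data and muting everything except the point of interest, can be sketched in a few lines. This is only one possible approach: `keep_top_n` and `highlight_colors` are hypothetical helpers, folding the remainder into an "Other" category is a design choice of this sketch, the color codes are arbitrary, and the `sales` figures are made-up illustration data.

```python
def keep_top_n(values, n):
    """Keep the n largest categories and fold the remainder into 'Other'."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    trimmed = dict(ordered[:n])
    remainder = sum(value for _, value in ordered[n:])
    if remainder:
        trimmed["Other"] = remainder
    return trimmed


def highlight_colors(labels, focus, focus_color="#d62728", muted_color="#c0c0c0"):
    """Return one color per label: a strong color for the focus, gray for the rest."""
    return [focus_color if label == focus else muted_color for label in labels]


# Made-up data for illustration only.
sales = {"North": 120, "South": 95, "East": 40, "West": 25, "Central": 10}

essential = keep_top_n(sales, 3)
colors = highlight_colors(list(essential), focus="South")
print(essential)
print(colors)
```

The resulting color list can be passed to most plotting libraries as the per-bar color argument, so that only the emphasized category stands out.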
Emphasize the most important data. Orient graphs for legibility. Organize graph/table. Avoid overloading graphs.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.