Why Do Scientists Need to Be Better at Visualising Data?
With large volumes of data entering our systems, we need to organize that data into meaningful information. Graphics make this information far more accessible and help us identify the choices in front of us and act on them.
Data visualization is an effective way to give shape to abstract data through pictures, vectors, charts, dashboards, and other graphics. Images are effective for storytelling, especially when the reader must understand the story well enough to act on it. Scientific visuals are essential for analyzing data, communicating experimental results, and even for making discoveries.
Why Do Data Scientists Need Visualization to Understand Things?
The human brain needs a visual representation to make sense of large amounts of numbers or data; with one, it can conceptualize the data and translate it into meaningful knowledge. A widely repeated claim holds that visuals are processed 60,000 times faster than text or numbers, and the brain has been reported to identify an image seen for as little as 13 milliseconds.
Bang Wong, creative director of MIT’s Broad Institute, says, “Plotting the data allows us to see the underlying structure of the data you wouldn’t otherwise see if you are looking at a table.”
Another great example is the Sunday Times bestseller Factfulness by Hans Rosling. Using data analysis, the book gives ten reasons why we are wrong about the world and why things are better than we think. Rosling was a medical doctor, professor of international health, and renowned public educator. Through exceptional data analysis, inventive visual explanations, data stories, and simple presentation design, the book gives us a glimpse of a world that we pessimistically think is approaching doomsday, when the figures tell a different story.
Why Now?
We live in an age where digital transformation has led to the production of 2.5 quintillion bytes of data every day. Much of this data is unstructured, an untapped goldmine of information that can be used for many purposes. With so much information available, we need to make sense of it, because our decisions will be based on it.
Few scientists take as much care with visualizing data as they do with generating it or writing about it. Visualization is often treated as an afterthought, which litters science with poor figures and confounding consequences. Poor figures can mislead scientists and readers alike and can slow the progress of scientific research.
There are two annual conferences dedicated to scientific data visualization, and the journal Nature Methods ran a column for six years about creating better figures and graphs. All these efforts aim to improve scientific visualization, which requires a better understanding of the strengths, weaknesses, and inherent biases of human perception. Applying this knowledge would lead to figures that are both better designed and easier to decipher.
Chart Choice
We tend to see what we expect to see, and our perception shapes how we read a figure. In the early ’80s, researchers found that the particulars of human perception affect our ability to decipher graphic displays of data. In other words, not every type of chart is equally easy for us to read and interpret.
Some charts are harder to read than others. In a study published in the Journal of the American Statistical Association, Cleveland and McGill found that people are best at reading charts based on the lengths of bars or lines. Such charts are the best choice when the reader needs to discern small differences between values, not just large ones.
Misleading Pie-Charts
Pie-charts are useful only when the goal is to compare parts with the whole or to show that the parts add up to a whole. The pie-chart is overused and seldom serves the reader well. It is not a judicious choice when the reading demands precision.
Data visualization expert Edward Tufte writes in his influential treatise that the only design worse than a pie chart is several of them. Still, using pie-charts to display certain types of data is standard practice in several scientific disciplines, and it is difficult to fight tradition.
Behind Bars
“Bar graphs are something that you should use if you are visualizing counts or proportions,” says Tracey Weissgerber, a physiologist who studies how research is done and reported. “But they are not a very effective strategy for visualizing continuous data.”
She found that although bar graphs are widely used in research, they have several shortcomings: a bar cannot represent the distribution of continuous data, so significant information can be lost. How many samples does the bar represent? How do the samples being compared differ? Does the bar show averages or extremes? Details like these are easily overlooked, and missing them can lead to errors of judgment.
There are good alternatives for small sets of continuous data: scatterplots, box plots, and histograms. They all reveal the distribution of the data, yet they are rarely used.
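To make the point about bars concrete, here is a minimal sketch using only Python's standard library. The two groups of measurements are hypothetical numbers invented for illustration, and five_number_summary is a helper name introduced here, not something from the research discussed above. Both groups have the same mean, so a bar graph of averages would draw two identical bars, while the five-number summary (the values a box plot displays) shows how different the groups really are.

```python
import statistics

# Hypothetical measurements in arbitrary units (invented for illustration).
# Both groups have the same mean but very different spread.
group_a = [48, 49, 50, 50, 51, 52]
group_b = [10, 20, 50, 50, 80, 90]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


for name, values in (("A", group_a), ("B", group_b)):
    # A bar of the mean would show only the first number printed here.
    print(
        f"group {name}: n={len(values)}, "
        f"mean={statistics.mean(values)}, "
        f"summary={five_number_summary(values)}"
    )
```

A bar graph would reduce each group to its mean and hide both the sample size and the spread; a scatterplot, box plot, or histogram of the same numbers would not.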
Rainbows
Color can be very effective for showing different aspects of data, and it can add zest to an otherwise boring text. However, human perception of color isn’t straightforward, and most scientists need a better understanding of the visual system when choosing colors to represent their data.
The rainbow palette has several serious drawbacks. Cartographers argue that the “Roy G. Biv” scale makes maps and other figures difficult to interpret and, at times, misleading. For people who are color-blind, such figures can be completely unintelligible.
Also, the rainbow is perceived unevenly by the human brain. People see color in terms of hue, saturation, and lightness. Human brains rely most heavily on lightness to interpret shapes and depth and therefore tend to see the brightest colors as representing peaks and darker colors as valleys.
Even scientists can be misled by these illusions when interpreting their own data.
To avoid the rainbow problem, some researchers have devised mathematically based palettes in which the perceived change in color matches the change in the corresponding data.
It must also be noted that the rainbow color scale has long been the default in software that scientists use to visualize data. In 2014, MathWorks switched the default to an improved color scheme called parula. In 2015, a cognitive scientist and a data scientist developed a new default color scheme called viridis for making plots with the popular Python programming language.
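The uneven lightness of a rainbow scale can be checked numerically. The sketch below rests on stated assumptions: the rainbow function is a simple stand-in that sweeps hue from blue to red at full saturation, not the exact palette of any particular plotting package, and lightness is estimated with the standard sRGB relative-luminance and CIE L* formulas. The function names are introduced here for illustration only.

```python
import colorsys


def relative_luminance(r, g, b):
    """Relative luminance Y of an sRGB color with components in 0..1."""
    def linearize(c):
        # Undo the sRGB gamma encoding.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def cie_lightness(y):
    """CIE L* (0 = black, 100 = white) from relative luminance Y."""
    if y <= 216 / 24389:
        return y * 24389 / 27
    return 116 * y ** (1 / 3) - 16


def rainbow(t):
    """Stand-in rainbow scale: blue at t=0 through green and yellow to red at t=1."""
    hue = (2.0 / 3.0) * (1.0 - t)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


steps = [i / 8 for i in range(9)]
lightness = [cie_lightness(relative_luminance(*rainbow(t))) for t in steps]

for t, value in zip(steps, lightness):
    print(f"data value {t:.3f} -> lightness {value:5.1f}")
```

In this stand-in scale the lightness rises and falls several times, and the lightest color is the yellow near the upper middle of the range rather than the color assigned to the highest data value. A reader who interprets lightness as height would therefore see a peak where the data has none.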
Visualizing the future
Many data visualization problems persist because scientists are not aware of them or aren’t convinced that better figures are worth the extra effort. O’Donoghue, who chairs the annual Vizbi conference on visualizing biological science, has been teaching these methods to scientists and combing the literature for evidence of the best and worst practices, but he says it will take time to gain momentum.
As data becomes more complex, scientists need to improve the way they visualize it to handle that complexity. To make visualizations effective for both scientists and the general public, designers will have to apply approaches that work with the human way of seeing and processing information rather than against it.