In Bayesian statistics, results take the form of a posterior distribution: a measure of uncertainty quantified in terms of probability. Bayesian statistics is a mature field. However, visualization of posterior distributions has not been treated as a distinct problem. Visualization methods developed for other types of problems are widely used even where they are inappropriate. Methods for visualizing a posterior over a space of complex objects (e.g. graphs, phylogenetic trees, clusterings, alignments, covariance matrices, etc.) are immature. Here I try to establish a few principles which can be used to filter out improper visualization techniques and to develop correct ones.
Of course, to judge whether one figure is better than another we need to understand the context. Here I focus on visualization intended for communication. This means that we already have a posterior distribution, and we want to present it (or some of its features) in a way that is honest, easy to understand and hard to misinterpret.
Principle 1: Uncertainty should be visualized
In Bayesian statistics, uncertainty is an essential part of the result. Not visualizing it is as wrong as concealing half of the result.
Principle 2: Visualization of variability ≠ Visualization of uncertainty
The boxplot is a striking example of this principle. It is a perfect tool for showing variability in data, but it should not be used for visualizing a posterior distribution. The inner interval of the boxplot (the box) contains almost the same probability mass as the outer intervals (the whiskers), yet the two are presented in completely different ways. This deceives the reader, leaving an overconfident impression of the estimates.
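The claim about probability mass can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes a standard normal distribution as a hypothetical posterior and uses the common convention that whiskers extend up to 1.5 times the interquartile range beyond the quartiles. Neither assumption comes from the figures in this post.

```python
from statistics import NormalDist

# Hypothetical posterior, chosen only for illustration.
posterior = NormalDist(mu=0.0, sigma=1.0)

# The box spans the first to the third quartile.
q1 = posterior.inv_cdf(0.25)
q3 = posterior.inv_cdf(0.75)
iqr = q3 - q1

# Assumed whisker convention: at most 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Probability mass covered by each part of the boxplot.
box_mass = posterior.cdf(q3) - posterior.cdf(q1)
whisker_mass = (posterior.cdf(q1) - posterior.cdf(lower_fence)) + (
    posterior.cdf(upper_fence) - posterior.cdf(q3)
)
outlier_mass = 1.0 - box_mass - whisker_mass

print(f"box: {box_mass:.3f}, whiskers: {whisker_mass:.3f}, beyond: {outlier_mass:.3f}")
```

Under these assumptions the box holds half of the mass and the two thin whisker lines together hold roughly 0.49, even though the box gets far more ink.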
Principle 3: Equal probability = Equal ink
Here the same distribution is presented with four different methods united by one principle: the same probability mass is visualized with the same amount of ink. The ink is represented either by the size of the colored area or by the color concentration. These figures are intuitive and hard to misinterpret.
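One way to put the "color concentration" variant into practice is to cut the distribution into bands of equal probability mass and shade each band in proportion to its mass divided by its width. The sketch below is not the code behind the figures. It assumes the posterior is available as a list of draws, and the function name `equal_mass_bands`, the simulated normal draws and the choice of ten bands are all hypothetical.

```python
import random
import statistics


def equal_mass_bands(samples, n_bands=10):
    """Split the sample range into bands holding about 1/n_bands of the draws each.

    Returns (left, right, concentration) tuples, where concentration is the
    band's probability mass divided by its width.
    """
    # statistics.quantiles returns the n_bands - 1 interior cut points.
    cuts = statistics.quantiles(samples, n=n_bands)
    edges = [min(samples)] + cuts + [max(samples)]
    mass = 1.0 / n_bands
    return [(lo, hi, mass / (hi - lo)) for lo, hi in zip(edges, edges[1:])]


# Hypothetical posterior draws, simulated here only to have something to plot.
random.seed(0)
draws = [random.gauss(0.0, 1.0) for _ in range(10_000)]

bands = equal_mass_bands(draws, n_bands=10)

# Ink per band = width * concentration, which is the same for every band.
inks = [(hi - lo) * conc for lo, hi, conc in bands]
```

Each tuple can then be drawn as a rectangle whose opacity is proportional to its concentration: narrow central bands come out dark, wide tail bands come out faint, and every band carries the same total ink.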
Principle 4: Do not overemphasize the point estimate
If the mean (or the median or mode) of the distribution is heavily highlighted, this can give an overconfident impression of the estimates.
Here I highlight the means with thin white lines (the color of the background).
Principle 5: Certain estimates should be emphasized over uncertain ones
More certain results are more significant and interesting than vague, improbable ones. Therefore, a visualization should emphasize the certain estimates. However, uncertain estimates are naturally represented by wider distributions, which occupy more visual space.
Note: These principles (like any visualization principles) are contextual, and should be applied (or not applied) with the goals of the particular visualization in mind.