In Bayesian statistics, results take the form of a posterior distribution: a measure of uncertainty quantified in terms of probability. Bayesian statistics is a mature field. However, visualization of posterior distributions has not been recognized as a distinct problem. Visualization methods developed for other types of problems are widely used, often inappropriately. Methods for visualizing posteriors over spaces of complex objects (e.g. graphs, phylogenetic trees, clusterings, alignments, covariance matrices) are immature. Here I try to establish a few principles that can be used to filter out improper visualization techniques and to develop better ones.

Of course, to judge whether one figure is better than another, we need to understand the context. Here I focus on visualization intended for communication. This means that we already have a posterior distribution and want to present it (or some of its features) in a way that is honest, easy to understand and hard to misinterpret.

**Principle 1: Uncertainty should be visualized**

In Bayesian statistics, uncertainty is an essential part of the result. Not visualizing it is as wrong as concealing half of the result.

**Principle 2: Visualization of variability ≠ Visualization of uncertainty**

The boxplot is a striking example of this principle. It is a perfect tool for showing variability in data, but it should not be used to visualize a posterior distribution. The inner interval of the boxplot (the box) contains almost the same probability mass as the outer intervals (the whiskers), but the two are drawn in completely different ways. This deceives the reader, leaving an overconfident impression of the estimates.
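
The claim about probability mass can be checked with a few lines of Python. This is only a sketch under two assumptions that are not stated in the post: the posterior is normal, and the whiskers follow the common convention of extending 1.5 interquartile ranges beyond the box. The function name `boxplot_masses` is made up for this illustration.

```python
from statistics import NormalDist


def boxplot_masses(dist):
    """Return (mass inside the box, mass covered by both whiskers).

    Assumes whiskers reach 1.5 * IQR beyond the quartiles, which is a
    common plotting convention, not something the post specifies.
    """
    q1 = dist.inv_cdf(0.25)
    q3 = dist.inv_cdf(0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    box = dist.cdf(q3) - dist.cdf(q1)
    lower_whisker = dist.cdf(q1) - dist.cdf(lower_fence)
    upper_whisker = dist.cdf(upper_fence) - dist.cdf(q3)
    return box, lower_whisker + upper_whisker


# Assumed example posterior: a standard normal distribution.
box_mass, whisker_mass = boxplot_masses(NormalDist(0.0, 1.0))
print(f"box: {box_mass:.3f}, whiskers: {whisker_mass:.3f}")
```

Under these assumptions the box holds half of the probability and the two thin whiskers together hold about 0.49, even though the box gets almost all of the ink.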

**Principle 3: Equal probability = Equal ink**

Here the same distribution is presented with four different methods united by one principle: the same probability mass is visualized with the same amount of ink. The ink is represented either by the size of the colored area or by the color concentration. These figures are intuitive and hard to misinterpret.
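
One way to apply the principle to posterior samples is to cut the range into bands of equal probability mass and set the color concentration of each band inversely to its width. The sketch below is not the code behind the figures in this post; the function `equal_mass_bands`, the number of bands and the simulated normal samples are all assumptions made for illustration.

```python
import random
from statistics import quantiles


def equal_mass_bands(samples, n_bands=10):
    """Split samples into bands of equal probability mass.

    Returns (low, high, alpha) tuples, where alpha is a relative color
    concentration in (0, 1] chosen so that width * alpha is constant.
    """
    cuts = quantiles(samples, n=n_bands, method="inclusive")
    edges = [min(samples)] + cuts + [max(samples)]
    mass = 1.0 / n_bands

    # Concentration per unit width: narrow bands get darker color.
    raw = [(lo, hi, mass / (hi - lo)) for lo, hi in zip(edges[:-1], edges[1:])]
    peak = max(density for _, _, density in raw)
    return [(lo, hi, density / peak) for lo, hi, density in raw]


# Assumed stand-in for real posterior samples.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

bands = equal_mass_bands(samples)
# "Ink" per band for a strip of fixed height: width times concentration.
ink = [(hi - lo) * alpha for lo, hi, alpha in bands]
```

Every entry of `ink` is the same by construction, so a strip drawn from these bands spends equal ink on equal probability. The bands could then be drawn with any plotting library by using `alpha` as opacity.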

**Principle 4: Do not overemphasize the point estimate**

If the mean (or the median or mode) of the distribution is heavily highlighted, it can give an overconfident impression of the estimates.

Here I highlight the means with thin white lines (the color of the background).

**Principle 5: Certain estimates should be emphasized over uncertain ones**

More certain results are more significant and interesting than vague, improbable ones. Therefore, the visualization should emphasize certain estimates. However, uncertain estimates are naturally represented by wider distributions, which occupy more visual space, so without correction the figure draws the eye to the least informative results.
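
One possible correction is to draw each distribution with a height proportional to the square root of its maximum density, so that narrow (certain) posteriors stand taller than wide ones without making the wide ones vanish. The sketch below illustrates that rule for normal posteriors; the function `display_height` and the example standard deviations are hypothetical, and this is not the plotting code used for the figures.

```python
import math


def display_height(peak_density, reference_peak, max_height=1.0):
    """Height for drawing one distribution, scaled by the square root of
    its peak density relative to the sharpest distribution in the figure."""
    return max_height * math.sqrt(peak_density / reference_peak)


# Assumed example: four normal posteriors with increasing uncertainty.
standard_deviations = [0.5, 1.0, 2.0, 4.0]
peaks = [1.0 / (sd * math.sqrt(2.0 * math.pi)) for sd in standard_deviations]
reference = max(peaks)

heights = [display_height(p, reference) for p in peaks]
for sd, h in zip(standard_deviations, heights):
    print(f"sd = {sd}: drawn height = {h:.2f}")
```

With these inputs the heights come out as 1.0, about 0.71, 0.5 and about 0.35. Drawing true densities would shrink the widest posterior to an eighth of the tallest, while equal heights would give it the most area; the square root sits between the two.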

Note: These principles (like any visualization principles) are contextual and should be applied (or not) with the goals of the visualization in mind.

[…] My favourite design in this figure: The height of each histogram is proportional to the square root of the maximum of the distribution. This is made to emphasize the certain estimates (see principles of posterior visualization) […]

[…] How to visualize an uncertainty about a time-dependent variable according to the principles of uncertainty visualization? […]

[…] Shubin has this great post from a few years ago on Bayesian visualization. He lists the following […]

Many thanks for the suggestions – Could you share the code to produce the graphics in this post and in the post on reinventing the histogram? Thanks!

Hi!

I was using my own Python code for plotting this stuff rather than any standard library. Maybe I'll clean up the code and make it public some day, maybe…
