When drawing a dot plot using matplotlib, I would like to offset overlapping datapoints to keep them all visible. For example, if I have
CategoryA: 0,0,3,0,5
CategoryB: 5,10,5,5,10
I want each of the CategoryA "0" datapoints to be set side by side, rather than right on top of each other, while still remaining distinct from CategoryB.
In R (ggplot2) there is a "jitter" option that does this. Is there a similar option in matplotlib, or is there another approach that would lead to a similar result?
Edit: to clarify, the "beeswarm" plot in R is essentially what I have in mind, and pybeeswarm is an early but useful start at a matplotlib/Python version.
Edit: to add that Seaborn's Swarmplot, introduced in version 0.7, is an excellent implementation of what I wanted.
Seaborn provides histogram-like categorical dot-plots through sns.swarmplot() and jittered categorical dot-plots via sns.stripplot().
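The original example for this answer is not shown here, so the following is only a minimal sketch of how the two Seaborn calls might be used on the data from the question. The variable names and the helper plot_swarm_and_strip are made up for illustration; sns.swarmplot() and sns.stripplot() are the real Seaborn functions.

```python
# Data from the question; the variable names are illustrative.
category_a = [0, 0, 3, 0, 5]
category_b = [5, 10, 5, 5, 10]

# Long-form layout: one label and one value per observation.
labels = ["CategoryA"] * len(category_a) + ["CategoryB"] * len(category_b)
values = category_a + category_b


def plot_swarm_and_strip(labels, values):
    """Draw a swarm plot and a jittered strip plot side by side (hypothetical helper)."""
    # Imported here so the data setup above runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_swarm, ax_strip) = plt.subplots(1, 2, sharey=True)
    # swarmplot moves points sideways until they no longer overlap.
    sns.swarmplot(x=labels, y=values, ax=ax_swarm)
    # stripplot adds random horizontal jitter; 0.2 is an arbitrary width.
    sns.stripplot(x=labels, y=values, jitter=0.2, ax=ax_strip)
    ax_swarm.set_title("swarmplot")
    ax_strip.set_title("stripplot")
    return fig
```

Calling plot_swarm_and_strip(labels, values) followed by matplotlib's plt.show() should display both variants for comparison.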
I used numpy.random to "scatter/beeswarm" the data along the x-axis around a fixed point for each category, and then called pyplot.scatter() once per category:
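The code for this answer is missing here. A rough sketch of the idea, assuming uniform noise of a fixed half-width around each category's integer position (the answer does not say which distribution was used, and the function names are hypothetical), could look like this:

```python
import numpy as np


def jittered_x(n_points, center, width=0.1, rng=None):
    """Return n_points x-positions drawn uniformly from [center - width, center + width]."""
    rng = np.random.default_rng() if rng is None else rng
    return center + rng.uniform(-width, width, size=n_points)


def scatter_by_category(data, width=0.1, seed=0):
    """Scatter each category around its own x-position (hypothetical helper).

    data maps a category name to a list of y-values.
    """
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots()
    for position, (name, ys) in enumerate(data.items()):
        # One scatter call per category, centred on its integer position.
        ax.scatter(jittered_x(len(ys), position, width, rng), ys, label=name)
    ax.set_xticks(range(len(data)))
    ax.set_xticklabels(list(data))
    return fig
```

For the question's data this would be called as scatter_by_category({"CategoryA": [0, 0, 3, 0, 5], "CategoryB": [5, 10, 5, 5, 10]}). Random jitter does not guarantee that no two points overlap; it only makes exact overlap unlikely.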
One way to approach the problem is to think of each 'row' in your scatter/dot/beeswarm plot as a bin in a histogram:
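The code that went with this paragraph is not preserved here. As a sketch of the histogram idea, assuming NumPy and a hypothetical helper named binned_swarm, each y-bin can be treated as a row and the points in it spread symmetrically around the category's x-position:

```python
import numpy as np


def binned_swarm(values, bins, center=0.0, max_offset=0.2):
    """Return (x, y) where y is snapped to bin centres and x is spread per bin."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    centres = (edges[:-1] + edges[1:]) / 2.0
    # Every point in a bin is drawn at that bin's centre.
    y = centres.repeat(counts)
    # Within a bin of c points, offsets are symmetric, e.g. c=3 gives -1, 0, 1.
    offsets = np.concatenate([np.arange(c) - 0.5 * (c - 1) for c in counts])
    widest = np.abs(offsets).max() if offsets.size else 0.0
    if widest > 0:
        # Rescale so the fullest row spans center - max_offset .. center + max_offset.
        offsets = offsets / widest * max_offset
    return center + offsets, y
```

With unit-wide bins, binned_swarm([0, 0, 3, 0, 5], bins=np.arange(-0.5, 6.5, 1.0), center=1.0) places the three zeros at x = 0.8, 1.0 and 1.2, and leaves the 3 and the 5 at x = 1.0.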
This obviously involves binning the data, so you may lose some precision. If you have discrete data, you could replace:
with:
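The two snippets being swapped are not reproduced here, so what follows is a guess at the idea rather than the original code. For discrete data, one option is to count exact values with np.unique instead of binning them with np.histogram, which keeps the y-coordinates exact. The helper name discrete_swarm and the spacing parameter are assumptions:

```python
import numpy as np


def discrete_swarm(values, center=0.0, spacing=0.05):
    """Return (x, y) with identical values laid out side by side around center."""
    values = np.asarray(values)
    if values.size == 0:
        return np.array([]), np.array([])
    # Exact distinct values and how often each occurs; no binning involved.
    levels, counts = np.unique(values, return_counts=True)
    y = levels.repeat(counts)
    # Symmetric offsets per level, e.g. two points give -0.5 and +0.5.
    offsets = np.concatenate([np.arange(c) - 0.5 * (c - 1) for c in counts])
    return center + offsets * spacing, y
```

For CategoryB, discrete_swarm([5, 10, 5, 5, 10], center=2.0, spacing=0.1) puts the three fives at x = 1.9, 2.0, 2.1 and the two tens at x = 1.95 and 2.05.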
An alternative approach that preserves the exact y-coordinates, even for continuous data, is to use a kernel density estimate to scale the amplitude of random jitter in the x-axis:
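The original code for this method is missing here. The sketch below illustrates the idea under a few assumptions: it uses a hand-rolled Gaussian kernel density estimate with Scott's rule for the bandwidth so that it only needs NumPy (scipy.stats.gaussian_kde would be the more usual choice), and the function names and the max_amplitude parameter are hypothetical.

```python
import numpy as np


def gaussian_kde_at_points(values):
    """Evaluate a 1-D Gaussian KDE of values at the sample points themselves.

    Assumes at least two points and a non-zero standard deviation.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    # Scott's rule of thumb for the bandwidth in one dimension.
    bandwidth = values.std(ddof=1) * n ** (-1.0 / 5.0)
    diffs = (values[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)


def kde_jitter(values, center=0.0, max_amplitude=0.3, rng=None):
    """Return x-positions whose random spread is scaled by the local density."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    density = gaussian_kde_at_points(values)
    # Normalise so the densest region gets the full amplitude.
    density = density / density.max()
    jitter = rng.uniform(-1.0, 1.0, size=values.size)
    return center + jitter * density * max_amplitude
```

The y-values are plotted unchanged against the returned x-positions, so points in dense regions fan out widely while isolated points stay close to the category's centre line.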
This second method is loosely based on how violin plots work. It still cannot guarantee that none of the points are overlapping, but I find that in practice it tends to give quite nice-looking results as long as there are a decent number of points (>20), and the distribution can be reasonably well approximated by a sum-of-Gaussians.
Extending the answer by @user2467675, here's how I did it:
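The code itself is not shown here. Going by the description that follows it, a sketch might look like the one below. The 1% factor, the rng argument, and the ax-based signature of jitter are assumptions, not the original code.

```python
import numpy as np


def rand_jitter(arr, scale=0.01, rng=None):
    """Add Gaussian noise whose width is a fraction of the largest value."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr, dtype=float)
    # Tying stdev to the maximum keeps the jitter visible at any data scale,
    # but implicitly assumes the axis runs from 0 to that maximum.
    stdev = scale * arr.max()
    return arr + rng.normal(size=arr.shape) * stdev


def jitter(ax, x, y, **kwargs):
    """Drop-in replacement for ax.scatter that jitters both coordinates."""
    return ax.scatter(rand_jitter(x), rand_jitter(y), **kwargs)
```

Any keyword arguments (s, c, marker, alpha and so on) are passed straight through to scatter.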
The stdev variable makes sure that the jitter is enough to be seen on different scales, but it assumes that the limits of the axes are 0 and the max value. You can then call jitter instead of scatter.
Not knowing of a direct mpl alternative, here is a very rudimentary proposal:
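The proposal's code is not preserved here. One rudimentary way to get the side-by-side layout the question asks for, without any randomness, is to group identical values and step each group's points outward from the category's x-position. The helper name side_by_side and the spacing value are assumptions:

```python
from itertools import groupby


def side_by_side(values, center, spacing=0.04):
    """Return (xs, ys) with repeated values laid out in a row around center."""
    xs, ys = [], []
    # groupby only merges adjacent equal items, hence the sort.
    for value, group in groupby(sorted(values)):
        count = len(list(group))
        # Start far enough left that the row is centred on `center`.
        start = center - spacing * (count - 1) / 2.0
        for k in range(count):
            xs.append(start + k * spacing)
            ys.append(value)
    return xs, ys


def plot_categories(categories, spacing=0.04):
    """Plot each list in categories at x = 1, 2, ... (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for index, values in enumerate(categories, start=1):
        xs, ys = side_by_side(values, index, spacing)
        ax.plot(xs, ys, "o")
    ax.set_xlim(0, len(categories) + 1)
    return fig
```

This only separates exactly equal values, so it suits discrete data like the example in the question; nearly equal continuous values would still overlap.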
Seaborn's swarmplot seems like the most apt fit for what you have in mind, but you can also jitter with Seaborn's regplot:
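The example for this answer is missing here, so this is only a sketch of how regplot's jitter option might be applied to the question's data. It assumes the categories are encoded as the numeric positions 0 and 1, since regplot expects numeric x-values; the helper name and the jitter width are made up.

```python
# Data from the question; the variable names are illustrative.
category_a = [0, 0, 3, 0, 5]
category_b = [5, 10, 5, 5, 10]

# Encode the categories as x-positions 0 and 1.
x = [0] * len(category_a) + [1] * len(category_b)
y = category_a + category_b


def plot_regplot_jitter(x, y, x_jitter=0.1):
    """Scatter y against x with horizontal jitter and no regression line."""
    # Imported here so the data setup above runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    # fit_reg=False suppresses the fitted line; x_jitter only affects the
    # drawn positions of the points.
    sns.regplot(x=x, y=y, fit_reg=False, x_jitter=x_jitter, ax=ax)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["CategoryA", "CategoryB"])
    return fig
```

As with stripplot, the jitter is random, so overlapping points become unlikely rather than impossible.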