Histograms and scatterplots are great methods of visualizing data and the relationship between variables, but recently I have been wondering about what visualization techniques I am missing. What do you think is the most underused type of plot?
Answers should:
- Not be very commonly used in practice.
- Be understandable without a great deal of background discussion.
- Be applicable in many common situations.
- Include reproducible code to create an example (preferably in R). A linked image would be nice.
Check out Edward Tufte's work and especially this book
You can also try to catch his travelling presentation. It's quite good and includes a bundle of four of his books. (I swear I don't own his publisher's stock!)
By the way, I like his sparkline data visualization technique. Surprise! Google's already written it and put it out on Google Code.
Mosaic plots seem to me to meet all four criteria mentioned. There are examples in R, under mosaicplot.
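As a minimal sketch of what that looks like in base R (the choice of the built-in Titanic table and of the two variables shown is mine, purely for illustration):

```r
# Collapse the built-in 4-way Titanic table to a 2-way Class x Survived table.
class_by_survival <- margin.table(Titanic, c(1, 4))

# Tile areas are proportional to the cell counts. shade = TRUE colours each
# tile by its residual under an independence model.
mosaicplot(class_by_survival,
           main = "Titanic: survival by class",
           shade = TRUE)
```

Wide tiles are large classes, and the vertical split inside each tile shows the share that survived, so both margins and the association are visible at once.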
Summary plots? Like the ones mentioned on this page:
Visualizing Summary Statistics and Uncertainty
- Horizon graphs (pdf), for visualising many time series at once.
- Parallel coordinates plots (pdf), for multivariate analysis.
- Association and mosaic plots, for visualising contingency tables (see the vcd package).
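For the parallel coordinates item, here is a small sketch using parcoord from the MASS package, which ships with standard R installations. The iris data and the colour choices are my own assumptions for the example, not something taken from the linked material:

```r
library(MASS)  # recommended package, bundled with standard R distributions

# The four numeric measurements become the four parallel axes.
measurements <- iris[, 1:4]

# One colour per species, so group structure shows up as bundles of lines.
palette_by_species <- c("steelblue", "darkorange", "forestgreen")
species_colour <- palette_by_species[as.integer(iris$Species)]

# Each flower is drawn as one line crossing all four axes.
parcoord(measurements, col = species_colour, var.label = TRUE)
legend("topright", legend = levels(iris$Species),
       col = palette_by_species, lty = 1, bty = "n")
```

Each axis is rescaled to its own range, so the plot shows patterns across variables rather than absolute magnitudes.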
I really like dotplots and find when I recommend them to others for appropriate data problems they are invariably surprised and delighted. They don't seem to get much use, and I can't figure out why.
Here's an example from Quick-R:
I believe Cleveland is most responsible for the development and promulgation of these, and the example in his book (in which faulty data was easily detected with a dotplot) is a powerful argument for their use. Note that the example above only puts one dot per line, whereas their real power comes when you have multiple dots on each line, with a legend explaining which is which. For instance, you could use different symbols or colors for three different time points, and thence easily get a sense of time patterns in different categories.
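A rough sketch of the multiple-dots-per-line idea in base R follows. It uses the built-in VADeaths matrix (my choice of data, with population groups standing in for the time points mentioned above) and relies on dotchart placing the i-th ungrouped row at height i, so the extra series can be overlaid with points:

```r
# Death rates per 1000 in Virginia, 1940: rows are age groups,
# columns are four population groups.
deaths <- VADeaths
symbols <- c(1, 2, 15, 19)  # one plotting symbol per column

# Draw the first column as an ordinary dotplot, leaving room on the
# x axis for the values in the other columns.
dotchart(deaths[, 1], labels = rownames(deaths),
         xlim = range(deaths), pch = symbols[1],
         xlab = "Deaths per 1000")

# Overlay the remaining columns on the same lines; this assumes that
# row i of an ungrouped dotchart sits at y = i.
for (j in 2:ncol(deaths)) {
  points(deaths[, j], seq_len(nrow(deaths)), pch = symbols[j])
}

legend("bottomright", legend = colnames(deaths), pch = symbols, bty = "n")
```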
In the following example (done in Excel of all things!), you can clearly see which category might have suffered from a label swap.
Regarding sparklines and other Tufte ideas, the YaleToolkit package on CRAN provides the functions sparkline and sparklines.
Another package that is useful for larger datasets is hexbin, as it cleverly 'bins' data into buckets to deal with datasets that may be too large for naive scatterplots.
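To give a feel for hexagonal binning, here is a hedged sketch. The data are simulated purely for illustration, the number of bins is an arbitrary choice, and the code only draws the plot if the hexbin package happens to be installed:

```r
# Simulated data (illustrative only): enough points that an ordinary
# scatterplot would be one solid blob of ink.
set.seed(1)
n <- 50000
x <- rnorm(n)
y <- x + rnorm(n)

if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  # Count the points falling into each hexagonal cell, then plot the
  # cells shaded by count instead of plotting every point.
  hb <- hexbin(x, y, xbins = 40)
  plot(hb, main = "Hexagonal binning of 50,000 points")
} else {
  message("Install the hexbin package from CRAN to run this example.")
}
```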