Using matplotlib to label points on a scatter plot

Posted 2019-05-08 05:12

Question:

I am trying to use the DataCursor approach (https://stackoverflow.com/a/4674445/1301710) to label points using matplotlib. I have several thousand points and would like to see their labels on mouseover. However, there are two differences: first, I'm making a scatter plot, and second, I want to label each point with a name, not just its x,y coordinates.

Here is my code:

import os
import matplotlib.pyplot as plt

class DataCursor(object):
    text_template = 'x: %0.2f\ny: %0.2f'
    x, y = 0.0, 0.0
    xoffset, yoffset = -20, 20

    def __init__(self, ax, labels, x, y):
        self.ax = ax
        self.xlist = x
        self.ylist = y
        self.labels = labels
        self.annotation = ax.annotate(self.text_template,
                xy=(self.x, self.y), xytext=(self.xoffset, self.yoffset),
                textcoords='offset points', ha='right', va='bottom',
                bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0')
                )
        self.annotation.set_visible(False)

    def __call__(self, event):
        self.event = event
        xdata, ydata = event.artist.get_data()
        #self.x, self.y = xdata[event.ind], ydata[event.ind]
        self.x, self.y = event.mouseevent.xdata, event.mouseevent.ydata
        self.label = self.labels[self.xlist.index(self.x)]
        if self.x is not None:
            self.annotation.xy = self.x, self.y
            self.annotation.set_text(self.label)
            self.annotation.set_visible(True)
            event.canvas.draw()

def process():
    #code to make ht_dict here
    # ht_dict has the following format: 'ht1' = [nov14count, nov21count] where each key is a string and each value is a list of two integers

    print("Start making scatter plot..")
    hts = []
    nov14 = []
    nov21 = []
    for key in ht_dict.keys():
        nov14.append(ht_dict[key][0])
        nov21.append(ht_dict[key][1])
        hts.append(key)
    fig = plt.figure()
    scatter = plt.scatter(nov14, nov21)

    fig.canvas.mpl_connect('pick_event', DataCursor(plt.gca(), hts, nov14, nov21))
    scatter.set_picker(5)
    plt.show()

process()

I am getting the following error though:

AttributeError: 'CircleCollection' object has no attribute 'get_data'
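For context on this error: plt.plot returns Line2D objects, which have get_data(), but plt.scatter returns a Collection subclass (the traceback shows CircleCollection; recent matplotlib versions return a PathCollection), and collections have no get_data() method. Their marker positions are exposed through get_offsets() instead. A minimal check, assuming a headless Agg backend just so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this check runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot([1, 2, 3], [4, 5, 6])       # Line2D: has get_data()
scatter = ax.scatter([1, 2, 3], [4, 5, 6])  # Collection: no get_data()

print(hasattr(line, "get_data"))     # True
print(hasattr(scatter, "get_data"))  # False
print(scatter.get_offsets())         # (N, 2) array of the marker positions
```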

I want to see, on mouseover, the string stored in the list hts at the x and y coordinates stored at the same index in the nov14 and nov21 lists. I'm not sure what to make of this error and would appreciate any help. I also have a second question (from trying out changes on the existing plot in the DataCursor thread): using index() to get back the label, as I am currently doing, raises a "value is not in list" error, because the clicked-on value may not be exactly the same as the value stored in the list. Do you have any suggestions for a better way to display a label or name for a point?
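One way around the exact-match problem with list.index() is to look up the stored point nearest to the clicked coordinates rather than an identical value. A minimal sketch with NumPy; the helper name nearest_index and the sample data are hypothetical:

```python
import numpy as np


def nearest_index(xs, ys, x, y):
    """Return the index of the data point closest to (x, y).

    Unlike list.index(), this does not require the clicked coordinates
    to match a stored value exactly.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Squared Euclidean distance measured in data coordinates
    d2 = (xs - x) ** 2 + (ys - y) ** 2
    return int(np.argmin(d2))


# Hypothetical sample data in the same shape as hts / nov14 / nov21
labels = ["ht1", "ht2", "ht3"]
nov14 = [3, 10, 25]
nov21 = [7, 12, 30]

i = nearest_index(nov14, nov21, 9.8, 12.3)
print(labels[i])  # ht2
```

Distances here are measured in data units; if the two axes have very different scales, converting both the data and the click position to display coordinates first (for example with ax.transData.transform) gives a lookup that better matches what the user sees on screen.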

Any guidance or pointers to documentation where I could read up would be appreciated.

Thanks!

Answer 1:

Adapting the label-annotation approach from the examples section of the mpldatacursor documentation, you could do something along these lines (plot each point with its own scatter call so that every point can carry an individual label):

import matplotlib.pyplot as plt
from mpldatacursor import datacursor
import random

fig, ax = plt.subplots()
ax.set_title('Click on a dot to display its label')

# Plot a number of random dots
for i in range(1, 1000):
    ax.scatter([random.random()], [random.random()], label='$ID: {}$'.format(i))

# Use a DataCursor to interactively display the label for a selected dot...
datacursor(formatter='{label}'.format)

plt.show()

Unfortunately it's rather inefficient, i.e., hardly usable with more than, say, 1000 points.
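A sketch of an alternative that avoids one scatter call per point, using plain matplotlib rather than mpldatacursor and assuming all points are drawn with a single ax.scatter call. When a collection is picked, the pick event carries event.ind, the indices of the points within the pick radius, so the label can be looked up by position instead of by coordinate value. The class name ScatterLabelCursor and the demo data are hypothetical:

```python
import random

import matplotlib.pyplot as plt


class ScatterLabelCursor:
    """Show the label of a picked scatter point in an annotation box."""

    def __init__(self, ax, scatter, labels):
        self.ax = ax
        self.scatter = scatter
        self.labels = labels
        self.annotation = ax.annotate(
            "", xy=(0, 0), xytext=(-20, 20), textcoords="offset points",
            ha="right", va="bottom",
            bbox=dict(boxstyle="round,pad=0.5", fc="yellow", alpha=0.5),
            arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0"),
        )
        self.annotation.set_visible(False)

    def __call__(self, event):
        # Ignore picks on any artist other than our scatter collection
        if event.artist is not self.scatter:
            return None
        # event.ind lists every point inside the pick radius; use the first
        i = event.ind[0]
        x, y = self.scatter.get_offsets()[i]
        self.annotation.xy = (x, y)
        self.annotation.set_text(self.labels[i])
        self.annotation.set_visible(True)
        self.ax.figure.canvas.draw_idle()
        return self.labels[i]  # returned only to make the sketch easy to test


def demo(n=5000):
    """Interactive demo: click a dot to show its label (needs a GUI backend)."""
    xs = [random.random() for _ in range(n)]
    ys = [random.random() for _ in range(n)]
    labels = ["ID: {}".format(i) for i in range(n)]
    fig, ax = plt.subplots()
    ax.set_title("Click on a dot to display its label")
    scatter = ax.scatter(xs, ys, picker=5)  # 5-point pick tolerance
    cursor = ScatterLabelCursor(ax, scatter, labels)
    fig.canvas.mpl_connect("pick_event", cursor)
    plt.show()
    return cursor
```

Call demo() from a script running an interactive backend. For true mouseover instead of clicking, the same index lookup can be driven from a 'motion_notify_event' callback using scatter.contains(event), which returns a (bool, dict) pair whose 'ind' entry holds the indices of the hovered points.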
