Python: create a word plot between two lists in matplotlib

Posted 2019-06-02 10:52

Question:

Say I have two lists of words,

list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']

And I would like to create a plot that looks like this:

Cat     ->  Cat
Dog     ->  Dog
Elephant \  Eagle
Giraffe   > Elephant
Monkey   -> Monkey

Basically, a word 'ladder' plot with an arrow connecting each word the two lists have in common. If a word in one list has no counterpart in the other (like Giraffe and Eagle in the example), no arrow is needed.

I am not aware of a way to do this in matplotlib. Does anyone know how to do it in matplotlib, perhaps in conjunction with networkx? Bonus points if the plot works for an arbitrary number of lists (say, with another set of arrows connecting list2 and list3, and so on).
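Whatever library draws the picture, the bonus case reduces to repeating the pairwise matching for each pair of neighbouring lists. A plain-Python sketch of just that matching step (no plotting); `ladder_links` is a hypothetical helper name, and it assumes each word appears at most once per list:

```python
def ladder_links(lists):
    """Return (col, i, j) triples: an arrow from lists[col][i] to lists[col + 1][j].

    Assumes words are unique within each list; a word with no counterpart
    in the next list produces no triple.
    """
    links = []
    for col, (left, right) in enumerate(zip(lists, lists[1:])):
        # Row of every word in the right-hand list, for O(1) lookups.
        row_in_right = {word: j for j, word in enumerate(right)}
        for i, word in enumerate(left):
            if word in row_in_right:
                links.append((col, i, row_in_right[word]))
    return links


list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']
list3 = ['Cat', 'Mouse', 'Horse', 'Elephant', 'Monkey']
print(ladder_links([list1, list2, list3]))
# [(0, 0, 0), (0, 1, 1), (0, 2, 3), (0, 4, 4), (1, 0, 0), (1, 3, 3), (1, 4, 4)]
```

Each triple then corresponds to one arrow from column `col` to column `col + 1`, whichever plotting approach is used.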

Answer 1:

I think that putting the data into a graph-based representation is a good approach for the problem as described, but perhaps you have a use case where this is too heavyweight. For the graph-based approach, @xg.pltpy has already made a suggestion.

Here is one way to do it solely in matplotlib, using the powerful annotate functionality.

import matplotlib.pyplot as plt

# define drawing of the words and links separately.
def plot_words(wordlist, col, ax):
    bbox_props = dict(boxstyle="round4,pad=0.3", fc="none", ec="b", lw=2)
    for i, word in enumerate(wordlist):
        ax.text(col, i, word, ha="center", va="center",
                size=12, bbox=bbox_props)

def plot_links(list1, list2, cols, ax):
    connectionstyle = "arc3,rad=0"
    for i, word in enumerate(list1):
        try: # do we need an edge?
            j = list2.index(word)
        except ValueError:
            continue # move on to the next word

        # define coordinates (relabelling here for clarity only)
        y1, y2 = i, j
        x1, x2 = cols
        # draw a line from word in 1st list to word in 2nd list
        ax.annotate("", xy=(x2, y2), xycoords='data',
                    xytext=(x1, y1), textcoords='data',
                    arrowprops=dict(
                        arrowstyle="->", color="k", lw=2,
                        shrinkA=25, shrinkB=25, patchA=None, patchB=None,
                        connectionstyle=connectionstyle,))



# define several lists
list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']
list3 = ['Cat', 'Mouse', 'Horse', 'Elephant', 'Monkey']


# now plot them all -- words first then links between them
plt.figure(1); plt.clf()
fig, ax = plt.subplots(num=1)

plot_words(list1, col=1, ax=ax)
plot_words(list2, col=2, ax=ax)
plot_words(list3, col=0, ax=ax)
plot_links(list1, list2, ax=ax, cols=[1,2])
plot_links(list1, list3, ax=ax, cols=[1,0])

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, len(list1)+0.5)
ax.invert_yaxis()  # put the first word of each list at the top, as in the question
plt.show()

There are LOTS of options for the arrow type; see the matplotlib annotation demo.
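To get a feel for a few of those options, this sketch draws one arrow per style with `ax.annotate`. The particular `arrowstyle`/`connectionstyle` strings are just examples of matplotlib's built-in styles, and the `Agg` backend is used only so the snippet runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A few built-in (arrowstyle, connectionstyle) combinations, one per row.
styles = [
    ("->", "arc3,rad=0"),        # straight line with an open head
    ("-|>", "arc3,rad=0.3"),     # filled head on a curved path
    ("fancy", "arc3,rad=-0.3"),  # filled, tapered arrow curving the other way
]

fig, ax = plt.subplots()
arrows = []
for row, (arrowstyle, connectionstyle) in enumerate(styles):
    arrows.append(ax.annotate("", xy=(1, row), xytext=(0, row),
                              arrowprops=dict(arrowstyle=arrowstyle,
                                              connectionstyle=connectionstyle,
                                              lw=1.5)))
    # Label each row with the style strings used for its arrow.
    ax.text(1.1, row, f"{arrowstyle}  {connectionstyle}", va="center")

ax.set_xlim(-0.2, 3)
ax.set_ylim(-0.5, len(styles) - 0.5)
ax.axis("off")

buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Swapping either string in the ladder code's `arrowprops` changes every link at once.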

It would be cleaner to supply the patchA and patchB arguments in arrowprops, since annotate then automatically clips the arrow length to avoid the patches (here, the words). I leave that as an exercise for the reader ;)
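One possible take on that exercise, for two lists only: keep the `Text` objects returned by `ax.text` and pass their `get_bbox_patch()` as `patchA`/`patchB`, so each arrow is clipped at the word boxes instead of being shortened by a fixed `shrinkA=25` points. This is a sketch, not the answerer's code; the small `shrinkA`/`shrinkB` values are an arbitrary choice that leaves a little gap at each box:

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']

fig, ax = plt.subplots()
bbox_props = dict(boxstyle="round4,pad=0.3", fc="none", ec="b", lw=2)

# Keep the Text artists so their box patches can be handed to the arrows.
left = [ax.text(0, i, w, ha="center", va="center", size=12, bbox=bbox_props)
        for i, w in enumerate(list1)]
right = [ax.text(1, j, w, ha="center", va="center", size=12, bbox=bbox_props)
         for j, w in enumerate(list2)]

arrows = []
for i, word in enumerate(list1):
    if word not in list2:
        continue  # no counterpart, so no arrow
    j = list2.index(word)
    arrows.append(ax.annotate(
        "", xy=(1, j), xytext=(0, i),
        arrowprops=dict(arrowstyle="->", color="k", lw=2,
                        # clip at the word boxes instead of a fixed shrink
                        patchA=left[i].get_bbox_patch(),
                        patchB=right[j].get_bbox_patch(),
                        shrinkA=2, shrinkB=2)))

ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.5, max(len(list1), len(list2)) - 0.5)
ax.invert_yaxis()  # first word at the top, as in the question's sketch
ax.axis("off")

buf = io.BytesIO()
fig.savefig(buf, format="png")
```

Because each arrow is clipped against its own boxes, short and long words no longer have to share one shrink value.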



Answer 2:

Check out matplotlib.pyplot.text. You can give it an exact x, y coordinate on the graph and it will 'plot' that word at that point.
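By default those x, y values are data coordinates, so where a word lands depends on the axis limits. Passing `transform=ax.transAxes` switches to axes coordinates, where (0, 0) is the lower-left corner and (1, 1) the upper-right. A minimal sketch of the difference (the words and positions are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# Data coordinates (the default): x=2, y=8 on the axis scales.
t_data = ax.text(2, 8, 'Cat')
# Axes coordinates: (0.5, 0.5) is the centre of the axes whatever the limits.
t_axes = ax.text(0.5, 0.5, 'Dog', transform=ax.transAxes,
                 ha='center', va='center')
```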

Here is a sloppy, but working example:

import matplotlib.pyplot as plt

list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']
fig, ax = plt.subplots()
x = .5
y = 1
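for i, word in enumerate(list1):
    ax.text(x, y, word)
    # compares the lists row by row (list1[i] vs list2[i]), so it assumes
    # both lists have the same length and matching words share an index
    if word == list2[i]:
        ax.text(x + .25, y, '-> ' + word)
    else:
        ax.text(x + .25, y, '/ ' + list2[i])
    y = y - 1/len(list1)  # step down one row
plt.show()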
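The loop above compares `list1[i]` with `list2[i]`, so a shared word sitting at a different index (Elephant is row 2 in list1 but row 3 in list2) gets a '/' instead of an arrow. A variant that keeps the `ax.text` idea but looks up each word's row in list2 could look like this. It is a sketch, not the answerer's code: `row_y` is a hypothetical helper, and the arrows are drawn with `ax.annotate` rather than text characters:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']

fig, ax = plt.subplots()
n_rows = max(len(list1), len(list2))


def row_y(k):
    # Row 0 at the top; rows spaced evenly between y=0 and y=1.
    return 1 - (k + 0.5) / n_rows


for i, word in enumerate(list1):
    ax.text(0.2, row_y(i), word, ha="right", va="center")
for j, word in enumerate(list2):
    ax.text(0.8, row_y(j), word, ha="left", va="center")

# Link each shared word to wherever it sits in list2, not to the same index.
links = []
for i, word in enumerate(list1):
    if word in list2:
        j = list2.index(word)
        links.append((i, j))
        ax.annotate("", xy=(0.78, row_y(j)), xytext=(0.22, row_y(i)),
                    arrowprops=dict(arrowstyle="->"))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis("off")
```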



Answer 3:

Here is one example with networkx.

Disclaimer: much of the code inside the for loops could be simplified into one-liners (e.g. the position and label dictionaries can easily be built in one line in Python 3.5 or higher, using this answer). For clarity, I thought it better to spell out every step.
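For instance, one possible compact version of the two dictionaries uses dict comprehensions merged with `{**a, **b}` unpacking (Python 3.5+). It keeps this answer's node-naming scheme (animal name plus a 'list1'/'list2' suffix); it is a sketch, not necessarily what the linked answer shows:

```python
list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']

# Node name -> (x, y) position: list1 in column 0, list2 in column 1.
pos_dict = {
    **{'{}list1'.format(a): (0, i) for i, a in enumerate(list1)},
    **{'{}list2'.format(a): (1, i) for i, a in enumerate(list2)},
}
# Node name -> displayed label: strip the 5-character 'list1'/'list2' suffix.
label_dict = {node: node[:-len('list1')] for node in pos_dict}
```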

The first step is to create a directed graph in networkx. Then, for each element in list2, the following actions are performed:

  • The node's position and label in the plot are stored in dictionaries.
  • A node is added to the graph. Since the same animals appear in both lists, the node name is not just the animal from list2 but the name followed by 'list2', so that the nodes stay distinct. That is why we need label_dict.

For list1, the same steps are performed, adding one step more:

  • If the current animal is also in list2, add an edge to the graph.

Here is the example code, which works for lists of any length, including lists of different lengths.
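The same construction also extends to the question's bonus case. A sketch under two assumptions that go beyond this answer: nodes are keyed by `(column, animal)` tuples instead of suffixed strings, and a third list (the `list3` from Answer 1) is linked to list2 the same way list2 is linked to list1. The y positions are negated so the first word of each list sits at the top:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

lists = [
    ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey'],
    ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey'],
    ['Cat', 'Mouse', 'Horse', 'Elephant', 'Monkey'],
]

DG = nx.DiGraph()
pos_dict, label_dict = {}, {}
# (column, animal) tuples keep the same animal in different lists distinct.
for col, animals in enumerate(lists):
    for row, animal in enumerate(animals):
        node = (col, animal)
        DG.add_node(node)
        pos_dict[node] = (col, -row)  # negative y puts row 0 on top
        label_dict[node] = animal

# Link each list to the next one wherever an animal appears in both.
for col in range(len(lists) - 1):
    for animal in lists[col]:
        if animal in lists[col + 1]:
            DG.add_edge((col, animal), (col + 1, animal))

nx.draw_networkx(DG, pos=pos_dict, labels=label_dict, arrows=True,
                 with_labels=True, node_color='w', node_size=2000)
plt.axis('off')
```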

import matplotlib.pyplot as plt
import networkx as nx
list1 = ['Cat', 'Dog', 'Elephant', 'Giraffe', 'Monkey']
list2 = ['Cat', 'Dog', 'Eagle', 'Elephant', 'Monkey']
DG = nx.DiGraph()
pos_dict = {}; label_dict = {} # dictionary with the plot info
for i,animal in enumerate(list2):
    pos_dict['{}list2'.format(animal)] = (1,i)
    label_dict['{}list2'.format(animal)] = animal
    DG.add_node('{}list2'.format(animal))
for i,animal in enumerate(list1):
    pos_dict['{}list1'.format(animal)] = (0,i)
    label_dict['{}list1'.format(animal)] = animal
    DG.add_node('{}list1'.format(animal))
    if animal in list2:
        DG.add_edge('{}list1'.format(animal),'{}list2'.format(animal))

nx.draw_networkx(DG,
                 arrows=True,
                 with_labels=True,
                 node_color='w',
                 pos=pos_dict,
                 labels=label_dict,
                 node_size=2000)
plt.axis('off') # removes the axis to leave only the graph
plt.show()

[Figure: output of the networkx example, produced with networkx 2.1; in 2.0 the arrows look different.]