Pandas Seaborn Swarmplot doesn't plot

Posted 2019-06-07 07:17

I am trying to draw a seaborn swarmplot in which the second column (freq) holds the values to plot and the third column (class) holds the classes to group by. The input and the code are given below.

Input:

tweetcricscore,51,high active
tweetcricscore,46,event based
tweetcricscore,12,event based
tweetcricscore,46,event based
tweetcricscore,1,viewers 
tweetcricscore,178,viewers
tweetcricscore,46,situational
tweetcricscore,23,situational
tweetcricscore,1,situational
tweetcricscore,8,situational
tweetcricscore,56,situational

Code:

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid", color_codes=True)

df = pd.read_csv('input.csv', header = None)

df.columns = ['keyword','freq','class']

ax = sns.swarmplot(x="class", y="freq", data=df)

plt.show()

The code neither produces a plot nor raises an error. Any suggestions for fixing or optimizing it?

2 answers
Ridiculous、
Reply #2 · 2019-06-07 07:36

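I think you first need to load the data with read_csv, then create a new column class by concatenating class1 and class2 (using fillna for rows where the second word is missing), and finally strip the surrounding whitespace: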

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import io

temp=u"""tweetcricscore 51 high active
tweetcricscore 46 event based
tweetcricscore 12 event based
tweetcricscore 46 event based
tweetcricscore 1 viewers 
tweetcricscore 178 viewers
tweetcricscore 46 situational
tweetcricscore 23 situational
tweetcricscore 1 situational
tweetcricscore 8 situational
tweetcricscore 56 situational"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 sep=r"\s+",  # separator is arbitrary whitespace
                 names=['keyword','freq','class1','class2'])  # set new column names
# join the two class words; single-word rows have NaN in class2
df['class'] = df['class1'] + ' ' + df['class2'].fillna('')
df['class'] = df['class'].str.strip()
print(df)
           keyword  freq       class1  class2        class
0   tweetcricscore    51         high  active  high active
1   tweetcricscore    46        event   based  event based
2   tweetcricscore    12        event   based  event based
3   tweetcricscore    46        event   based  event based
4   tweetcricscore     1      viewers     NaN      viewers
5   tweetcricscore   178      viewers     NaN      viewers
6   tweetcricscore    46  situational     NaN  situational
7   tweetcricscore    23  situational     NaN  situational
8   tweetcricscore     1  situational     NaN  situational
9   tweetcricscore     8  situational     NaN  situational
10  tweetcricscore    56  situational     NaN  situational

sns.set(style="whitegrid", color_codes=True)
ax = sns.swarmplot(x="class", y="freq", data=df)
plt.show()

[Figure: graph]
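The class1/class2 join above assumes that a class label has at most two words. Here is a minimal sketch of one alternative, assuming the first two whitespace-separated fields are always keyword and freq: split each line at most twice, so that everything after freq stays in a single column. The row with the three-word label `very high active` is hypothetical and is not part of the original input.

```python
import pandas as pd

# A few rows in the whitespace-separated format used above.
# The last row is a made-up example with a three-word class label.
lines = pd.Series([
    "tweetcricscore 51 high active",
    "tweetcricscore 1 viewers ",
    "tweetcricscore 46 situational",
    "tweetcricscore 7 very high active",
])

# Split on runs of whitespace, but at most twice: keyword, freq, rest of line.
parts = lines.str.split(n=2, expand=True)
parts.columns = ['keyword', 'freq', 'class']

# freq arrives as text, so convert it; the remainder may keep trailing spaces.
parts['freq'] = parts['freq'].astype(int)
parts['class'] = parts['class'].str.strip()

print(parts)
```

The resulting frame has the same three columns as in the question, so it can be passed to sns.swarmplot(x="class", y="freq", data=parts) unchanged.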

Solution if the class column contains no whitespace:

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import io

temp=u"""tweetcricscore 51 highactive
tweetcricscore 46 eventbased
tweetcricscore 12 eventbased
tweetcricscore 46 eventbased
tweetcricscore 1 viewers 
tweetcricscore 178 viewers
tweetcricscore 46 situational
tweetcricscore 23 situational
tweetcricscore 1 situational
tweetcricscore 8 situational
tweetcricscore 56 situational"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 sep=r"\s+",  # separator is arbitrary whitespace
                 names=['keyword','freq','class'])  # set new column names
print(df)
           keyword  freq        class
0   tweetcricscore    51   highactive
1   tweetcricscore    46   eventbased
2   tweetcricscore    12   eventbased
3   tweetcricscore    46   eventbased
4   tweetcricscore     1      viewers
5   tweetcricscore   178      viewers
6   tweetcricscore    46  situational
7   tweetcricscore    23  situational
8   tweetcricscore     1  situational
9   tweetcricscore     8  situational
10  tweetcricscore    56  situational

sns.set(style="whitegrid", color_codes=True)
ax = sns.swarmplot(x="class", y="freq", data=df)
plt.show()

[Figure: graph1]

EDIT2:

If the separator is a comma, use:

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import io

temp=u"""tweetcricscore,51,high active
tweetcricscore,46,event based
tweetcricscore,12,event based
tweetcricscore,46,event based
tweetcricscore,1,viewers
tweetcricscore,178,viewers
tweetcricscore,46,situational
tweetcricscore,23,situational
tweetcricscore,1,situational
tweetcricscore,8,situational
tweetcricscore,56,situational"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), names=['keyword','freq','class'])
print(df)
           keyword  freq        class
0   tweetcricscore    51  high active
1   tweetcricscore    46  event based
2   tweetcricscore    12  event based
3   tweetcricscore    46  event based
4   tweetcricscore     1      viewers
5   tweetcricscore   178      viewers
6   tweetcricscore    46  situational
7   tweetcricscore    23  situational
8   tweetcricscore     1  situational
9   tweetcricscore     8  situational
10  tweetcricscore    56  situational

sns.set(style="whitegrid", color_codes=True)
ax = sns.swarmplot(x="class", y="freq", data=df)
plt.show()
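One thing to watch with the comma-separated file: the input in the question has a trailing space after `viewers` on one row, and read_csv keeps such whitespace inside a field. The sketch below, using three rows taken from that input, shows how this would split one class into two categories on the x axis, and how str.strip() merges them again. The names `n_before` and `n_after` are only for illustration.

```python
import io
import pandas as pd

# Three rows from the question's input; note the trailing space in the first one.
rows = [
    "tweetcricscore,1,viewers ",
    "tweetcricscore,178,viewers",
    "tweetcricscore,46,situational",
]
df = pd.read_csv(io.StringIO("\n".join(rows)),
                 names=['keyword', 'freq', 'class'])

# 'viewers ' and 'viewers' count as different labels at this point.
n_before = df['class'].nunique()

# Remove leading and trailing whitespace from every label.
df['class'] = df['class'].str.strip()
n_after = df['class'].nunique()

print(n_before, n_after)
print(sorted(df['class'].unique()))
```

Checking df['class'].unique() before plotting is a cheap way to confirm that the categories are the ones you expect.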
Lonely孤独者°
Reply #3 · 2019-06-07 07:54

After several trials plotting a swarmplot with a dataset of more than 8-10k rows, and with constant help and suggestions from jezreal, we came to the conclusion that seaborn's categorical swarmplot does not scale to large data the way other seaborn plots do, which is also mentioned in the tutorial documentation. Hence I changed the plotting style to a bokeh scatter plot, with the numeric values on the y axis and the grouped category names on the x axis, and this more or less solved my problem of plotting univariate data against a category.

import pandas as pd

# bokeh.charts is Bokeh's legacy high-level charts API; later Bokeh releases
# moved it out into the separate bkcharts package, so this needs an old Bokeh.
from bokeh.charts import Scatter, output_file, show

df = pd.read_csv('input.csv', header=None)

df.columns = ['user', 'freq', 'class']

# one color and one marker per class; freq goes on the y axis
scatter = Scatter(df, x='class', y='freq', color='class', marker='class',
                  title='User classification', legend=False)

output_file('output.html', title='output')

show(scatter)

This allows grouping by the class column, with colors and markers assigned according to the groups. The freq values are plotted along the y axis.

[Figure: output]

Note: This might have worked only by accident, because the data is discrete in nature.
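If you would rather stay with seaborn, one possible workaround for the scaling problem is to plot only a random subset of each class. The following is a rough sketch under stated assumptions, not something tested on the real dataset: the helper `downsample_per_class`, the cap of 300 rows per class, and the synthetic 9000-row frame are all made up for illustration.

```python
import numpy as np
import pandas as pd

def downsample_per_class(df, group_col, max_per_group, seed=0):
    """Return at most max_per_group randomly chosen rows from each group."""
    rng = np.random.RandomState(seed)
    parts = []
    for _, group in df.groupby(group_col):
        n = min(len(group), max_per_group)
        parts.append(group.sample(n=n, random_state=rng))
    # restore the original row order
    return pd.concat(parts).sort_index()

# Synthetic stand-in for a large input file (hypothetical data).
data_rng = np.random.RandomState(42)
classes = ['high active', 'event based', 'viewers', 'situational']
big = pd.DataFrame({
    'keyword': 'tweetcricscore',
    'freq': data_rng.randint(1, 200, size=9000),
    'class': data_rng.choice(classes, size=9000),
})

small = downsample_per_class(big, 'class', max_per_group=300)
print(len(big), len(small))
print(small.groupby('class').size())
```

The reduced frame can then be passed to sns.swarmplot(x="class", y="freq", data=small) as in the question. Keep in mind that the plot then shows a sample rather than every row, so rare extreme values may be missing.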
