Overlapping Histogram in For Loop with heapq.nlarg

Posted 2019-09-22 00:19

Question:

I have a for loop that cycles through the columns of my DataFrame, plots a histogram for each one, and saves the result to a PDF. I'm having trouble combining this with heapq.nlargest: on each histogram I want to overlay the top 10% of scores for that column.

Currently, the code is

x = list(df1.columns.values)
fig = plt.figure(num=None, figsize=(30, 200), dpi=80, facecolor='w', edgecolor='w')

for i in range(6):#(len(df1.ix[i])):
    val = x[i]
    y = df1.iloc[:,i]
    yy = heapq.nlargest(len(df1.iloc[:,i])//10, df1.iloc[:,i])

    ax = fig.add_subplot(len(df1.ix[0]),3,i+1)
    plt.hist(y, bins=np.logspace(-4, 3, 100))
    plt.hist(yy, bins=np.logspace(-4, 3, 100))

    plt.savefig('D:/All Documents/Frequency_Distribution_Scores.pdf')

When I introduce

yy = heapq.nlargest(len(df1.iloc[:,i])*p//100, (df1.iloc[:,i]))
plt.hist(yy, bins=np.logspace(-4, 3, 100))

It seems to plot the top 10% of the values from the first array on every graph, rather than finding the top 10% of each array.

Anyone have any pointers? Cheers
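One way to narrow this down is to check the selection separately from the plotting. The sketch below is only an illustration: `df_demo` is a made-up stand-in for `df1`, and `top_fraction` is a hypothetical helper, not something from the code above. If the printed ranges differ per column, the `heapq.nlargest` call itself is picking the right values and the problem is more likely in how the results are drawn.

```python
import heapq

import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: three columns with clearly different scales.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "a": rng.uniform(0.0, 1.0, 50),
    "b": rng.uniform(10.0, 20.0, 50),
    "c": rng.uniform(100.0, 200.0, 50),
})


def top_fraction(values, percent=10):
    """Return the largest `percent` % of `values`, in descending order."""
    values = list(values)
    k = len(values) * percent // 100
    return heapq.nlargest(k, values)


# Compute the selection for every column, independently of any plotting.
top_by_column = {name: top_fraction(df_demo[name]) for name in df_demo.columns}

for name, top in top_by_column.items():
    # Each column should report its own count and its own value range.
    print(name, len(top), min(top), max(top))
```

With 50 rows per column, each selection should hold 5 values drawn from that column's own range.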

Answer 1:

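Solved it! For each column I sort the values in descending order, drop the non-positive ones, and take the first tenth of what remains: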

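import matplotlib.pyplot as plt
import numpy as np

x = list(df1.columns.values)
n_cols = len(x)  # replaces len(df1.ix[0]); .ix was removed in pandas 1.0
bins = np.logspace(-4, 3, 100)
fig = plt.figure(num=None, figsize=(30, 200), dpi=80, facecolor='w', edgecolor='w')

for i in range(n_cols):
    val = x[i]
    y = df1.iloc[:, i]
    # Sort this column in descending order and drop non-positive scores.
    yy_s = np.sort(y)[::-1]
    yy_s_trim0 = yy_s[yy_s > 0]
    # Integer division: a float slice index raises TypeError in Python 3.
    yy_10 = yy_s_trim0[:len(yy_s_trim0) // 10]

    ax = fig.add_subplot(n_cols, 3, i + 1)
    # Draw on this subplot explicitly rather than on pyplot's current axes.
    ax.hist(y, bins=bins)
    ax.hist(yy_10, bins=bins)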
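The same idea can be sketched as a reusable function. This is not the code from the answer, just one possible variation under stated assumptions: it swaps the manual sort for pandas' `Series.nlargest`, lays the panels out with `plt.subplots`, and switches the x-axis to a log scale to match the logarithmic bins. The names `plot_top_fraction`, `out_path`, `percent`, `ncols`, and `df_demo` are hypothetical, and the demo data is randomly generated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_top_fraction(df, out_path=None, percent=10, ncols=3):
    """Overlay each column's top `percent` % of positive values on its histogram."""
    bins = np.logspace(-4, 3, 100)
    nrows = -(-df.shape[1] // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
    tops = {}
    for ax, name in zip(axes.ravel(), df.columns):
        col = df[name]
        positive = col[col > 0]  # mirrors the "> 0" trim in the answer
        top = positive.nlargest(len(positive) * percent // 100)
        tops[name] = top
        ax.hist(col, bins=bins, label="all scores")
        ax.hist(top, bins=bins, label=f"top {percent}%")
        ax.set_xscale("log")  # optional choice, not in the original code
        ax.set_title(str(name))
        ax.legend()
    # Hide any unused panels in the grid.
    for ax in axes.ravel()[df.shape[1]:]:
        ax.set_visible(False)
    if out_path is not None:
        fig.savefig(out_path)  # save once, after all panels are drawn
    plt.close(fig)
    return tops


# Made-up demo data: four columns of positive scores on different scales.
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(
    {f"score_{i}": rng.lognormal(mean=i - 2, sigma=1.0, size=200) for i in range(4)}
)
tops = plot_top_fraction(df_demo)
```

Passing something like `out_path="scores.pdf"` would write the figure to disk. Returning the selected values makes it easy to confirm that each panel got its own column's top 10% rather than the first column's.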