At the moment I have a for loop that cycles through my arrays (the columns of a pandas DataFrame) and writes the resulting histograms to a PDF, but I'm having trouble combining this with heapq.nlargest. I want to overlay each histogram with the top 10% of scores for that array.
Currently, the code is:
import heapq
import numpy as np
import matplotlib.pyplot as plt

x = list(df1.columns.values)
bins = np.logspace(-4, 3, 100)  # log-spaced bin edges shared by both histograms
fig = plt.figure(num=None, figsize=(30, 200), dpi=80, facecolor='w', edgecolor='w')
for i in range(6):  # (len(df1.columns)):
    val = x[i]
    y = df1.iloc[:, i]
    # largest 10% of the values in column i
    yy = heapq.nlargest(len(y) // 10, y)
    ax = fig.add_subplot(len(df1.columns), 3, i + 1)
    ax.hist(y, bins=bins)
    ax.hist(yy, bins=bins)
plt.savefig('D:/All Documents/Frequency_Distribution_Scores.pdf')
When I introduce the following inside the loop (with p = 10):
yy = heapq.nlargest(len(df1.iloc[:, i]) * p // 100, df1.iloc[:, i])
plt.hist(yy, bins=np.logspace(-4, 3, 100))
it seems to plot the top 10% of the values from the first array on every graph, rather than the top 10% of each array.
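For reference, here is a minimal sketch of the per-array selection on its own, separated from the plotting. The helper name top_fraction and the sample data are hypothetical stand-ins for the columns of df1. It only uses heapq, so it makes no assumptions about the DataFrame beyond each column being an iterable of numbers.

```python
import heapq


def top_fraction(values, percent=10):
    """Return the largest `percent`% of `values`, in descending order."""
    values = list(values)
    k = len(values) * percent // 100  # integer count of items to keep
    return heapq.nlargest(k, values)


# Hypothetical sample data standing in for the columns of df1.
columns = {
    "a": [0.1 * n for n in range(1, 21)],  # 20 values
    "b": [5.0 * n for n in range(1, 31)],  # 30 values
}

# The selection is recomputed for each column, so each one gets its own top 10%.
top_by_column = {name: top_fraction(vals) for name, vals in columns.items()}

for name, top in top_by_column.items():
    print(name, top)
```

Inside the plotting loop, the equivalent call would be top_fraction(df1.iloc[:, i]), evaluated once per iteration. pandas also offers Series.nlargest(k), which should give the same selection for a numeric column.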
Anyone have any pointers? Cheers