Python: Plot month-wise normalised histogram

Posted 2019-05-30 18:15

Question:

I have a CSV file with data that look like this:

Time               Pressure
1/1/2017 0:00       5.8253
...                     ...
3/1/2017 0:10       4.2785
4/1/2017 0:20       5.20041
5/1/2017 0:30       4.40774
6/1/2017 0:40       4.03228
7/1/2017 0:50       5.011924
12/1/2017 1:00      3.9309888

I want to make a month-wise histogram (NORMALIZED) of the pressure data and finally write the plots to a PDF. I understand that I need to use groupby and numpy.histogram, but I'm not sure how to use them. (I'm a newbie to Python.) Please help!

CODE 1:

n = len(df) // 5
for tmp_df in (df[i:i+n] for i in range(0, len(df), n)):
    gb_tmp = tmp_df.groupby(pd.Grouper(freq='M'))
    ax = gb_tmp.hist()
    plt.setp(ax.xaxis.get_ticklabels(),rotation=90)
    plt.show()
    plt.close()

This gives me the following error message:

ValueError: range() arg 3 must not be zero

CODE 2:

df1 = df.groupby(pd.Grouper(freq='M'))
np.histogram(df1,bins=10,range=None,normed=True)

This returns another error message:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried the code above but got these errors. I'm not sure if I'm using it right.

Answer 1:

A few simple steps. First, read your data file into a list of rows. Once you have the rows (a list of lists), collect all the observations for each month and take the average of each collection. Here I have implemented a simple buckets class to aggregate the pressures into groups by month and provide the mean for each group. Lastly, I plotted the result with matplotlib.

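def readData(fn):
    # Read the whole file; the with-block closes it even if reading fails.
    with open(fn) as fh:
        lines = fh.read().split("\n")
    # Skip the header row and any blank lines (e.g. a trailing newline),
    # then split each row on the seven-space column separator.
    return [k.split("       ") for k in lines[1:] if k.strip()]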

class buckets:
    """Collect values under a key and report the mean per key."""
    def __init__(self):
        self.data = {}
    def add(self, key, value):
        # Create the list for a new key on first use, then append.
        self.data.setdefault(key, []).append(value)
    def getMean(self, key):
        # Arithmetic mean of everything stored under this key.
        nums = self.data[key]
        return sum(nums) / float(len(nums))
    def keys(self):
        return self.data.keys()

import matplotlib
import numpy as np

data = readData("data.txt")
container = buckets()

for k in data:
    print(k)
    container.add(k[0].split("/")[0],float(k[1]))

histoBars = []
histoTicks = [int(k) for k in list(container.keys())]
histoTicks.sort()
histoTicks = [str(k) for k in histoTicks]
x = np.arange(len(histoTicks))

for k in histoTicks:
    histoBars.append(container.getMean(k))

print(len(histoBars))
print(len(histoTicks))

import matplotlib.pyplot as plt
print(histoBars)
print(histoTicks)
fig, ax = plt.subplots()
plt.bar(x, histoBars)
plt.xticks( x, histoTicks )
plt.show()
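
The bar chart above shows one mean per month rather than a distribution. If what you want is one normalised histogram per month written to a PDF, here is a minimal sketch with pandas and matplotlib. It assumes the data has already been loaded into a DataFrame with a DatetimeIndex and a numeric "Pressure" column. The function name monthly_histograms_to_pdf and its parameters are my own, not from the question. density=True is the current name for what older NumPy and matplotlib versions called normed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; enough for writing files
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def monthly_histograms_to_pdf(df, pdf_path, column="Pressure", bins=10):
    """Write one normalised histogram per calendar month to a PDF.

    Assumes df has a DatetimeIndex and a numeric column named `column`.
    Returns the list of month labels that were plotted.
    """
    months = []
    with PdfPages(pdf_path) as pdf:
        # Group rows by the year-month period of their timestamp.
        for period, group in df.groupby(df.index.to_period("M")):
            values = group[column].dropna()
            if values.empty:
                continue
            fig, ax = plt.subplots()
            # density=True scales the bars so the histogram area is 1.
            ax.hist(values, bins=bins, density=True)
            ax.set_title(str(period))
            ax.set_xlabel(column)
            ax.set_ylabel("Density")
            pdf.savefig(fig)  # one page per month
            plt.close(fig)
            months.append(str(period))
    return months
```

With the file loaded through something like pd.read_csv(..., parse_dates=["Time"], index_col="Time"), calling monthly_histograms_to_pdf(df, "histograms.pdf") (a hypothetical file name) should produce one page per month. Whether the dates are month-first or day-first is not clear from the sample, so check how they were parsed before trusting the grouping.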

One last note: I'm not sure what format your file is in. It looked like the two columns were separated by 7 spaces, but one of the samples had only 6, so you might have to change the delimiter or clean the table to make sure all the rows read without error.
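
One way around the inconsistent spacing is to split each row on its last run of whitespace instead of on a fixed number of spaces. This is only a sketch: split_row is a hypothetical helper, and it assumes every data row ends with the pressure value and that the header row has already been skipped.

```python
def split_row(line):
    """Split 'date time   pressure' on the last run of whitespace.

    Returns (timestamp_text, pressure), or None for blank or
    single-token lines.
    """
    # sep=None treats any amount of whitespace as one separator;
    # maxsplit=1 from the right keeps the date and time together.
    parts = line.rsplit(None, 1)
    if len(parts) != 2:
        return None
    timestamp_text, pressure_text = parts
    return timestamp_text, float(pressure_text)
```

This reads the 7-space rows and the 6-space row the same way, so the table does not need cleaning first.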