I have a CSV file with data that looks like this:
Time Pressure
1/1/2017 0:00 5.8253
... ...
3/1/2017 0:10 4.2785
4/1/2017 0:20 5.20041
5/1/2017 0:30 4.40774
6/1/2017 0:40 4.03228
7/1/2017 0:50 5.011924
12/1/2017 1:00 3.9309888
I want to make a normalized histogram of the pressure data for each month and then write the plots to a PDF. I understand that I need to use groupby and numpy.histogram, but I'm not sure how to use them (I'm new to Python). Please help!
CODE 1:
n = len(df) // 5
for tmp_df in (df[i:i+n] for i in range(0, len(df), n)):
    gb_tmp = tmp_df.groupby(pd.Grouper(freq='M'))
    ax = gb_tmp.hist()
    plt.setp(ax.xaxis.get_ticklabels(),rotation=90)
    plt.show()
    plt.close()
This gives me the following error message:
ValueError: range() arg 3 must not be zero
CODE 2:
df1 = df.groupby(pd.Grouper(freq='M'))
np.histogram(df1,bins=10,range=None,normed=True)
This returns another error message:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I tried the code above but got these errors, and I'm not sure whether I'm using these functions correctly.
A few simple steps. First, read your data file into a list of rows. Once you have your list of rows (or list of lists, whatever you want to call them), collect all the observations for each month and take the average of each collection. Here I have implemented a simple buckets class to aggregate the pressures into groups by month and provide the mean for each group. Lastly, I plot the monthly means as a bar chart with matplotlib.
import matplotlib.pyplot as plt
import numpy as np

def readData(fn):
    # Return the rows of the file as lists of fields, skipping the header line
    ret = []
    with open(fn) as fh:
        for line in fh.read().split("\n")[1:]:
            fields = line.split()  # split on any run of whitespace
            try:
                float(fields[-1])
            except (IndexError, ValueError):
                continue  # blank line or non-numeric row such as "... ..."
            ret.append(fields)
    return ret

class buckets:
    # Groups values under a key and reports the mean of each group
    def __init__(self):
        self.data = {}

    def add(self, key, value):
        if key not in self.data:
            self.data[key] = []
        self.data[key].append(value)

    def getMean(self, key):
        nums = self.data[key]
        return sum(nums) / float(len(nums))

    def keys(self):
        return self.data.keys()

data = readData("data.txt")
container = buckets()
for k in data:
    # Key: first number of the date (taken here as the month); value: last field (pressure)
    container.add(k[0].split("/")[0], float(k[-1]))

# Sort the month keys numerically, then turn them back into tick labels
histoTicks = sorted(int(k) for k in container.keys())
histoTicks = [str(k) for k in histoTicks]
x = np.arange(len(histoTicks))
histoBars = [container.getMean(k) for k in histoTicks]
print(histoBars)
print(histoTicks)

# Bar chart of the mean pressure per month
fig, ax = plt.subplots()
ax.bar(x, histoBars)
ax.set_xticks(x)
ax.set_xticklabels(histoTicks)
plt.show()
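The bar chart above shows one mean per month. If you want a normalized histogram per month written to a PDF, as the question asks, something along these lines might work. This is only a sketch: save_monthly_histograms and the file name histograms.pdf are names I made up for illustration, and it assumes you pass in a dict of month to list of pressures, shaped like container.data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures go to the file, not the screen
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def save_monthly_histograms(groups, pdf_path="histograms.pdf", bins=10):
    """Write one normalized histogram per month to a multi-page PDF.

    groups: dict mapping a month label such as "1" to a list of pressures
    (hypothetical input, shaped like container.data).
    Returns the number of pages written.
    """
    pages = 0
    with PdfPages(pdf_path) as pdf:
        for month in sorted(groups, key=int):
            fig, ax = plt.subplots()
            # density=True scales the bars so that the total area is 1
            ax.hist(groups[month], bins=bins, density=True)
            ax.set_title("Month " + str(month))
            ax.set_xlabel("Pressure")
            ax.set_ylabel("Density")
            pdf.savefig(fig)  # each savefig call adds one page
            plt.close(fig)
            pages += 1
    return pages
```

You would call it as save_monthly_histograms(container.data). I used density=True because the older normed argument has been removed from recent NumPy and matplotlib releases.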
A last quick note: I'm not really sure what format your file is in. It looked like the two columns were separated by 7 spaces, but one of the samples had only 6, so you might have to change the delimiter or clean the table to make sure all the rows read without error.
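If you go the "clean the table" route, one possible approach is to collapse every run of two or more spaces into a single tab before splitting. That leaves the single space inside the timestamp alone. This is a sketch under the assumption that the column gap is always at least two spaces; normalize_delimiter is a name I picked for illustration.

```python
import re


def normalize_delimiter(line):
    """Replace each run of two or more spaces with a single tab."""
    return re.sub(r" {2,}", "\t", line.strip())
```

After that, every cleaned row can be split with line.split("\t"), whether the original gap was 6 spaces or 7.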