reading a comma-delimited file with a date object

2020-04-21 02:59发布

I have a file with entries that look like

2013-12-11 23:00:27.003293,$PAMWV,291,R,005.8,M,A*36
2013-12-11 23:00:28.000295,$PAMWV,284,R,005.5,M,A*3F
2013-12-11 23:00:29.000295,$PAMWV,273,R,004.0,M,A*33
2013-12-11 23:00:30.003310,$PAMWV,007,R,004.9,M,A*3B

Considering the delimiters are actually a comma (','), this is a classic CSV file.

I've tried:

wind = loadtxt("/disk2/Wind/ws425.log.test", dtype(str,float), delimiter=',', usecols=(0,4))
ts= time.strptime(str(wind[:,0]), '%Y-%m-%d %H:%M:%S.%f')

and what I get is

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-484b71dea724> in <module>()
----> 1 ts= time.strptime(str(wind[:,0]), '%Y-%m-%d %H:%M:%S.%f')

/opt/Enthought/canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/_strptime.pyc in _strptime_time(data_string, format)
    452 
    453 def _strptime_time(data_string, format="%a %b %d %H:%M:%S %Y"):
--> 454     return _strptime(data_string, format)[0]

/opt/Enthought/canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/_strptime.pyc in _strptime(data_string, format)
    323     if not found:
    324         raise ValueError("time data %r does not match format %r" %
--> 325                          (data_string, format))
    326     if len(data_string) != found.end(): 
    327         raise ValueError("unconverted data remains: %s" %

ValueError: time data "['2013-12-12 00:00:02.251311' '2013-12-12 00:00:03.255296'\n     '2013-12-12 00:00:04.254294' ..., '2013-12-12 16:10:50.579022'\n '2013-12-12    16:10:51.607035' '2013-12-12 16:10:52.604020']" does not match format '%Y-%m-%d %H:%M:%S.%f'

I suspect I'm mis-using the data type assignment in time.strptime() but I've been unsuccessful in finding a correct type so far.

Suggestions?

5条回答
\"骚年 ilove
2楼-- · 2020-04-21 03:34

Oh the real problem here is that time.strptime does not support %f for microseconds, see here for a list of formatting characters supported by time.strptime and time.strftime.

What you do want is datetime.strptime which does support the %f formatting character for microseconds.

查看更多
做自己的国王
3楼-- · 2020-04-21 03:35

I had to do something like

>>> import numpy as np
>>> from datetime import datetime
>>> wind = np.loadtxt("ws425.log.test", delimiter=",", usecols=(0,4), dtype=object,
...                   converters={0: lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S.%f"),
...                               4: np.float})
>>> 
>>> wind
array([[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293), 5.8],
       [datetime.datetime(2013, 12, 11, 23, 0, 28, 295), 5.5],
       [datetime.datetime(2013, 12, 11, 23, 0, 29, 295), 4.0],
       [datetime.datetime(2013, 12, 11, 23, 0, 30, 3310), 4.9]], dtype=object)

For time series data, though, I've switched to using pandas, because it makes a lot of things much easier:

>>> import pandas as pd
>>> df = pd.read_csv("ws425.log.test", parse_dates=[0], header=None, usecols=[0, 4])
>>> df
                           0    4
0 2013-12-11 23:00:27.003293  5.8
1 2013-12-11 23:00:28.000295  5.5
2 2013-12-11 23:00:29.000295  4.0
3 2013-12-11 23:00:30.003310  4.9

[4 rows x 2 columns]
>>> df[0][0]
Timestamp('2013-12-11 23:00:27.003293', tz=None)
查看更多
太酷不给撩
4楼-- · 2020-04-21 03:35

time.strptime() expects a string such as '2013-12-11 23:00:30.003310' but you are giving it a string representation of an array instead:

['2013-12-12 00:00:02.251311', '2013-12-12 00:00:03.255296', ...]

The minimal fix is to parse one item at a time:

ts = [time.strptime(s, '%Y-%m-%d %H:%M:%S.%f') for s in wind[:,0]]

Or you could use converters parameter for loadtxt:

from datetime import datetime
import numpy as np

def str2timestamp(timestr, epoch=datetime.fromtimestamp(0)):
    """Convert local time string into seconds since epoch (float)."""
    # np.datetime64 API is experimental so use datetime instead
    #NOTE: local time may be ambiguous, non-monotonous
    dt = datetime.strptime(timestr, '%Y-%m-%d %H:%M:%S.%f')
    return (dt - epoch).total_seconds()

wind = np.loadtxt('input.csv', usecols=(0, 4), delimiter=',',
                  converters={0: str2timestamp})
print(wind)

Output

[[  1.38679203e+09   5.80000000e+00]
 [  1.38679203e+09   5.50000000e+00]
 [  1.38679203e+09   4.00000000e+00]
 [  1.38679203e+09   4.90000000e+00]]
查看更多
聊天终结者
5楼-- · 2020-04-21 03:36

You just have some errors in your NumPy loadtxt call where you define the dtype. It should be dtype=[('date', 'str', 26), ('wind', 'float')]; you must specify the size of the string. Now you can reference the date field using its name, EG: wind['date']. Your strptime format is fine, but you want the datetime module from Python's datetime package, not time.

import numpy as np
from datetime import datetime
wind = loadtxt("/disk2/Wind/ws425.log.test", dtype=[('date', 'str', 26), ('wind', 'float')], delimiter=',', usecols=(0,4))
ts = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S.%f') for d in wind['date']]

This returns the following:

[datetime.datetime(2013, 12, 11, 23, 0, 27, 3293),
 datetime.datetime(2013, 12, 11, 23, 0, 28, 295),
 datetime.datetime(2013, 12, 11, 23, 0, 29, 295),
 datetime.datetime(2013, 12, 11, 23, 0, 30, 3310)]

Maybe you want to feed that back into your NumPy array?

wind['date'] = np.array(ts, dtype='datetime64[s]')

this yields

array([('2013-12-11T23:00:27Z', 5.8), ('2013-12-11T23:00:28Z', 5.5),
   ('2013-12-11T23:00:29Z', 4.0), ('2013-12-11T23:00:30Z', 4.9)],
  dtype=[('date', 'S26'), ('wind', '<f8')])
查看更多
家丑人穷心不美
6楼-- · 2020-04-21 03:37

I am not sure what is wrong with numpy; but with csv it works great:

>>> import time
>>> import csv
>>> with open('t.txt') as f:
...    r = csv.reader(f)
...    w = [[time.strptime(i[0], '%Y-%m-%d %H:%M:%S.%f')]+i[1:] for i in r]
...
>>> w
[[time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=27, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '291', 'R', '005.8', 'M', 'A*36'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=28, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '284', 'R', '005.5', 'M', 'A*3F'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=29, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '273', 'R', '004.0', 'M', 'A*33'], [time.struct_time(tm_year=2013, tm_mon=12, tm_mday=11, tm_hour=23, tm_min=0, tm_sec=30, tm_wday=2, tm_yday=345, tm_isdst=-1), '$PAMWV', '007', 'R', '004.9', 'M', 'A*3B']]
查看更多
登录 后发表回答