numpy.genfromtxt with datetime.strptime converter

I have data similar to that in this gist, and I am trying to load it with NumPy. I am fairly new to Python, so I tried the following code:

import numpy as np
from datetime import datetime

convertfunc = lambda x: datetime.strptime(x, '%H:%M:%S:.%f')
col_headers = ["Mass", "Thermocouple", "T O2 Sensor",\
               "Igniter", "Lamps", "O2", "Time"]
data = np.genfromtxt(files[1], skip_header=22,\
                     names=col_headers,\
                     converters={"Time": convertfunc})

As can be seen in the gist, there are 22 rows of header material. When I run the code above in IPython, I receive an error that ends with the following:

TypeError: float() argument must be a string or a number

The full ipython error trace can be seen here.

I can extract the six columns of numeric data just fine by passing an argument such as usecols=range(0,6) to genfromtxt, but I'm stumped when I try to use a converter to handle the last column. Any and all comments would be appreciated!

Tags: python numpy ipython

2 Answers

干净又极端

Reply #2 · 2019-02-22 01:52

This happens because np.genfromtxt tries to create a float array, which fails because convertfunc returns a datetime object that cannot be cast to float. The easiest solution is to pass dtype='object' to np.genfromtxt, which creates an object array and prevents the conversion to float. However, the other columns would then be stored as strings. To store them as floats, specify the dtype of each column so that you get a structured array. Here I'm setting them all to double except the last column, which gets an object dtype:

dd = [(a, 'd') for a in col_headers[:-1]] + [(col_headers[-1], 'object')]
data = np.genfromtxt(files[1], skip_header=22, dtype=dd, 
                     names=col_headers, converters={'Time': convertfunc})

This will give you a structured array which you can access with the names you gave:

In [74]: data['Mass']
Out[74]: array([ 0.262 ,  0.2618,  0.2616,  0.2614])
In [75]: data['Time']
Out[75]: array([1900-01-01 15:49:24.546000, 1900-01-01 15:49:25.171000,
                1900-01-01 15:49:25.405000, 1900-01-01 15:49:25.624000], 
                dtype=object)
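
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same approach. The two sample rows are made up and only assume the layout implied by the question (six numeric columns followed by a time stamp matching '%H:%M:%S:.%f'); parse_time is a hypothetical helper that also accepts bytes, since some NumPy versions pass bytes rather than str to converters.

```python
import io
from datetime import datetime

import numpy as np

# Made-up stand-in for the gist data (assumption: six numeric columns
# followed by a time stamp in the question's '%H:%M:%S:.%f' layout).
sample = io.StringIO(
    "0.2620 21.5 20.1 0 1 20.9 15:49:24:.546\n"
    "0.2618 21.6 20.1 0 1 20.9 15:49:25:.171\n"
)

col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]


def parse_time(raw):
    """Hypothetical converter: accept bytes or str and return a datetime."""
    if isinstance(raw, bytes):
        raw = raw.decode("ascii")
    return datetime.strptime(raw, "%H:%M:%S:.%f")


# Doubles for the numeric columns, a Python object for the time column.
dd = [(name, "d") for name in col_headers[:-1]] + [(col_headers[-1], "object")]

data = np.genfromtxt(sample, dtype=dd, names=col_headers,
                     converters={"Time": parse_time})

print(data["Mass"])
print(data["Time"])
```

Because no date is given, strptime fills in 1900-01-01, which is why the Time values above carry that date.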


一纸荒年 Trace。

Reply #3 · 2019-02-22 02:00

You can use pandas read_table:

    import pandas as pd
    frame = pd.read_table('/tmp/gist', header=None, skiprows=22, delimiter=r'\s+')

This worked for me. You need to process the header rows separately, since they contain a variable number of space-separated fields.
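
If you also want the last column as real time stamps, something along these lines might work. It is only a sketch: the sample rows are made up, the column names are borrowed from the question, and the format string assumes the time stamps match the question's '%H:%M:%S:.%f'.

```python
import io

import pandas as pd

# Made-up stand-in for the file: one header line to skip, then data rows
# (assumption: same seven-column layout as in the question).
sample = io.StringIO(
    "some header line\n"
    "0.2620 21.5 20.1 0 1 20.9 15:49:24:.546\n"
    "0.2618 21.6 20.1 0 1 20.9 15:49:25:.171\n"
)

col_headers = ["Mass", "Thermocouple", "T O2 Sensor",
               "Igniter", "Lamps", "O2", "Time"]

# Split on runs of whitespace and supply the column names explicitly.
frame = pd.read_table(sample, header=None, skiprows=1,
                      sep=r"\s+", names=col_headers)

# Convert the string column to time stamps; with no date in the data,
# pandas fills in 1900-01-01.
frame["Time"] = pd.to_datetime(frame["Time"], format="%H:%M:%S:.%f")

print(frame.dtypes)
```

The numeric columns come back as numbers and Time as datetime64, so you can subtract frame["Time"].iloc[0] to get elapsed time.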
