I'm trying to add column names to a numpy ndarray, then select columns by their names. But it doesn't work. I can't tell whether the problem occurs when I add the names, or later when I try to select columns by them.
Here's my code.
data = np.genfromtxt(csv_file, delimiter=',', dtype=np.float, skip_header=1)
#Add headers
csv_names = [ s.strip('"') for s in file(csv_file,'r').readline().strip().split(',')]
data = data.astype(np.dtype( [(n, 'float64') for n in csv_names] ))
Dimension-based diagnostics match what I expect:
print len(csv_names)
>> 108
print data.shape
>> (1652, 108)
"print data.dtype.names" also returns the expected output.
But when I start selecting columns by their field names, screwy things happen. The "column" is still an array with 108 columns...
print data["EDUC"].shape
>> (1652, 108)
... and it appears to contain more missing values than there are rows in the data set.
print np.sum(np.isnan(data["EDUC"]))
>> 27976
Any idea what's going wrong here? Adding headers should be a trivial operation, but I've been fighting this bug for hours. Help!
The problem is that you are thinking in terms of spreadsheet-like arrays, whereas NumPy uses different concepts.
Here is what you must know about NumPy:
- NumPy arrays only contain elements of a single type.
- If you need spreadsheet-like "columns", this type must be some tuple-like type. Such arrays are called Structured Arrays, because their elements are structures (i.e. tuples).
In your case, NumPy therefore takes your 2-dimensional regular array and produces a one-dimensional array whose element type is a 108-field structure (whereas the spreadsheet-like array you are thinking of is 2-dimensional).
These choices were probably made for efficiency reasons: all the elements of an array have the same type and therefore the same size, so they can be accessed very simply and quickly at a low level.
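Here is a minimal sketch of that idea (the field names are made up for illustration): every element of the array below is a single 3-field structure, and the array itself is one-dimensional.
import numpy as np

# Each element is one structure with three float64 fields (made-up names)
arr = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
               dtype=[('A', 'float64'), ('B', 'float64'), ('C', 'float64')])

print(arr.shape)   # (2,)   -- one-dimensional: one structure per "row"
print(arr.dtype)   # [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]
print(arr[0])      # (1., 2., 3.)  -- a whole structure
print(arr['A'])    # [1. 4.]       -- a "column": one value per row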
Now, as user545424 showed, there is a simple NumPy answer to what you want to do (genfromtxt() accepts a names argument with column names).
If you want to convert your array from a regular NumPy ndarray to a structured array, you can do:
data.view(dtype=[(n, 'float64') for n in csv_names]).reshape(len(data))
(you were close: you used astype() instead of view()).
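As a rough illustration on a small toy array with made-up names (not your actual data), view() reinterprets each row as one structure, so field access then returns one value per row:
import numpy as np

names = ['A', 'B', 'C']                 # made-up column names
regular = np.arange(6.0).reshape(2, 3)  # toy stand-in for your (1652, 108) array

structured = regular.view(dtype=[(n, 'float64') for n in names]).reshape(len(regular))
print(structured.shape)  # (2,)
print(structured['B'])   # [1. 4.]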
You can also check the answers to quite a few Stack Overflow questions, including Converting a 2D numpy array to a structured array and how to convert regular numpy array to record array?.
Unfortunately, I don't know what is going on when you try to add the field names, but I do know that you can build the array you want directly from the file via
data = np.genfromtxt(csv_file, delimiter=',', names=True)
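For example, a rough sketch with a small in-memory CSV (column names made up, recent NumPy assumed) shows that field access then behaves as expected:
import numpy as np
from io import StringIO

# Stand-in for the real CSV file; 'AGE', 'EDUC', 'INCOME' are made-up names
csv_text = StringIO("AGE,EDUC,INCOME\n31,12,50000\n45,16,72000\n")

data = np.genfromtxt(csv_text, delimiter=',', names=True)
print(data.dtype.names)    # ('AGE', 'EDUC', 'INCOME')
print(data['EDUC'])        # [12. 16.]
print(data['EDUC'].shape)  # (2,) -- one value per row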
EDIT:
It seems like adding field names only works when the input is a list of tuples:
data = np.array(list(map(tuple, data)), dtype=[(n, 'float64') for n in csv_names])
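A minimal self-contained version of that conversion, on a toy array with made-up names:
import numpy as np

names = ['A', 'B', 'C']                 # made-up column names
regular = np.arange(6.0).reshape(2, 3)  # toy stand-in for the (1652, 108) array

# Each row becomes one tuple, i.e. one structure in the result
structured = np.array([tuple(row) for row in regular],
                      dtype=[(n, 'float64') for n in names])
print(structured.shape)  # (2,)
print(structured['C'])   # [2. 5.]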