Python: numpy.genfromtxt - Need column names that

I am working on importing CSV files with numpy.genfromtxt.

The data to be imported has a header of column names, and some of those column names contain characters that genfromtxt considers invalid. Specifically, some of the names contain "#" and " ". The input data cannot be changed as it is generated by other sources that I do not control.

Using names=True and comments=None, I am unable to bring in all of the column names that I need.

I've tried overriding numpy.lib.NameValidator.deletechars=None, but this does not affect the NameValidator class instance that is actually in use.

I understand that deletechars exists because a recarray can expose a field as if it were an attribute. However, I simply must be able to read in column names that include invalid characters, even if the characters are stripped off when read in.

Is there a way to force the NameValidator to not check for invalid characters, or to modify the characters it checks for? I am unable to modify numpy/lib/_iotools.py as I am not root and it would be bad to modify a shared installation.

Tags: python numpy genfromtxt

3 Answers

兄弟一词,经得起流年.

#2 · 2019-08-14 03:25

You do not explicitly state that numpy.genfromtxt is a hard requirement, so let me suggest that you try asciitable.

This module has a way to replace certain entries before parsing: http://cxc.harvard.edu/contrib/asciitable/#replace-bad-or-missing-values

And you can also define your own readers based on the existing ones: http://cxc.harvard.edu/contrib/asciitable/#advanced-table-reading

The asciitable readers return numpy arrays, so you should be able to replace the functions you currently use more or less directly with asciitable.


虎瘦雄心在

#3 · 2019-08-14 03:30

IMHO, genfromtxt is often used in cases where some simpler solutions would do.

So, unless you have some troublesome datasets (missing entries, multiple unknown column types), you're better off coding a quick and dirty parser (i.e., skip some rows, parse the header, read the rest and reorganize at the end).
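A quick and dirty parser along those lines might look like the sketch below. It assumes the file is comma-separated and that every data column is numeric; read_table and skip_rows are illustrative names, not part of numpy or any other library. Header names are kept as written, apart from surrounding whitespace.

```python
import csv
import io

import numpy as np


def read_table(fileobj, skip_rows=0, delimiter=","):
    """Return {column name: 1-D float array}, keeping header names as written."""
    reader = csv.reader(fileobj, delimiter=delimiter)
    # Skip any preamble rows that come before the header.
    for _ in range(skip_rows):
        next(reader)
    # Parse the header; only surrounding whitespace is removed.
    header = [name.strip() for name in next(reader)]
    # Read the rest (assumption: all values are numeric), ignoring blank lines.
    rows = [[float(value) for value in row] for row in reader if row]
    # Reorganize row-major data into one array per column.
    columns = np.array(rows, dtype=float).T
    return dict(zip(header, columns))


sample = io.StringIO("id#,first name,value\n1,2,3.5\n4,5,6.5\n")
table = read_table(sample)
print(list(table))
```

A plain dict sidesteps name validation entirely, at the cost of losing the structured-array interface.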

Now, if you really need genfromtxt, @ecatmur rightly pointed out that the deletechars argument of genfromtxt is passed to _iotools.NameValidator to construct the set of characters to delete. Using deletechars=None tells NameValidator to use its default set. The first thing to try is to pass an empty set or '' instead of deletechars=None.

Note that no matter what, double quotes (") and trailing spaces will be deleted, and names that end up identical will be differentiated with a numeric suffix:

>>> import numpy as np
>>> fields = ["blah", "'blah'", "\"blah\"", "#blah", "blah "]
>>> np.lib._iotools.NameValidator(deletechars='').validate(fields)
('blah', "'blah'", 'blah_1', '#blah', 'blah_2')

Without renaming, the first, third and last entries would give three columns all named blah, so the validator appends _1 and _2 to the duplicates.

If this doesn't suit you, I'm afraid you're hitting a block: there's no current way to tell genfromtxt to accept a customized NameValidator. That could be a good idea, though, so you may want to raise the point on numpy's mailing list.
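One possible workaround, offered only as a sketch and not as something genfromtxt supports directly, is to let genfromtxt validate the names as usual and then overwrite them with the raw header afterwards. The sample text and the raw_names variable below are made up for illustration, and the approach assumes a simple comma-separated header with no quoting.

```python
import io

import numpy as np

text = "id#,first name,value\n1,2,3.5\n4,5,6.5\n"

# Read the header ourselves so that no character is dropped.
raw_names = tuple(name.strip() for name in text.splitlines()[0].split(","))

# Let genfromtxt load the data with its validated (altered) names.
data = np.genfromtxt(io.StringIO(text), delimiter=",", names=True, comments=None)

# Overwrite the field names; the count must match the number of fields.
data.dtype.names = raw_names
print(data.dtype.names)
```

Fields with such names can still be reached by key, e.g. data["first name"], but not by attribute access on a recarray.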


【Aperson】

#4 · 2019-08-14 03:36

NameValidator will use its default set for deletechars if constructed with deletechars=None, but if you pass in a non-None set then it will use that. And np.genfromtxt takes a deletechars parameter which it passes to NameValidator.

So, you should be able to write

np.genfromtxt(..., deletechars=set())

for an empty set, or some subset of the default set("""~!@#$%^&*()-=+~\|]}[{';: /?.>,<"""):

deletechars = np.lib._iotools.NameValidator.defaultdeletechars - set("# ")
np.genfromtxt(..., deletechars=deletechars)
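Put together, a complete call might look like the sketch below. The sample data is made up for illustration, and replace_space=" " is an addition not mentioned above: genfromtxt otherwise replaces spaces inside names with underscores before deletechars is applied.

```python
import io

import numpy as np

text = "id#,first name,value\n1,2,3.5\n4,5,6.5\n"

data = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    names=True,
    comments=None,      # so that '#' is not treated as a comment marker
    deletechars=set(),  # delete nothing beyond the always-removed double quote
    replace_space=" ",  # keep spaces instead of turning them into '_'
)
print(data.dtype.names)
print(data["id#"])
```

Names kept this way are only reachable by key (data["id#"]), since they are not valid Python identifiers.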

