
Is there a standard way to store XY data in python

Posted 2020-08-26 04:10

Question:

Is there a standard way to store (x,y), (x,y,z), or (x,y,z,t) data in Python?

I know NumPy arrays are often used for things like this, but I suppose you could also do it with NumPy matrices.
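A plain 2-D array is probably the most common NumPy layout for this: one row per point and one column per coordinate. The sketch below builds one with np.column_stack; the sample values are made up for illustration. (NumPy's own documentation no longer recommends np.matrix, even for linear algebra, so a regular ndarray is the safer choice.)

```python
import numpy as np

# One row per point, one column per coordinate: shape (N, 2) for (x, y).
x = np.arange(10, dtype=float)   # illustrative sample data
y = x ** 2
xy = np.column_stack((x, y))     # shape (10, 2)

# Individual coordinates are recovered by column slicing (views, not copies).
xs = xy[:, 0]
ys = xy[:, 1]

# The same layout extends to (x, y, z) or (x, y, z, t) by adding columns.
z = np.zeros_like(x)
xyz = np.column_stack((x, y, z))  # shape (10, 3)
```

The downside is that columns are identified only by position, which is what the structured-array approach below addresses.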

I've seen two lists zipped together, which sidesteps NumPy altogether:

XY_data = list(zip(range(10), range(10)))
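One caveat with the zipped-lists approach in Python 3: zip() returns a lazy, single-use iterator rather than a list, so the pairs have to be materialized with list() if they will be reused. A minimal sketch (the variable names are illustrative) showing that, plus how zip(*pairs) splits the points back into separate coordinate sequences:

```python
# In Python 3, zip() returns a one-shot iterator rather than a list.
pairs_iter = zip(range(10), range(10))
pairs = list(pairs_iter)        # materialize the (x, y) tuples
leftover = list(pairs_iter)     # [] because the iterator is already exhausted

# zip(*pairs) "unzips" the points back into separate x and y tuples.
xs, ys = zip(*pairs)
```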

Is there a standard? If not, what is your favorite way, or the one which you have seen the most?

Answer 1:

One nice way is with a structured array. It keeps all the advantages of NumPy arrays while letting you access each coordinate by name.

To make a NumPy array "structured", all you need to do is pass a compound dtype argument, which gives each "field" a name and a type. Fields can even have more complex shapes and hierarchies if you wish, but here's how I keep my x-y data:

In [175]: import numpy as np

In [176]: x = np.random.random(10)

In [177]: y = np.random.random(10)

In [179]: list(zip(x, y))
Out[179]: 
[(0.27432965895978034, 0.034808254176554643),
 (0.10231729328413885, 0.3311112896885462),
 (0.87724361175443311, 0.47852682944121905),
 (0.24291769332378499, 0.50691735432715967),
 (0.47583427680221879, 0.04048957803763753),
 (0.70710641602121627, 0.27331443495117813),
 (0.85878694702522784, 0.61993945461613498),
 (0.28840423235739054, 0.11954319357707233),
 (0.22084849730366296, 0.39880927226467255),
 (0.42915612628398903, 0.19197320645915561)]

In [180]: data = np.array(list(zip(x, y)), dtype=[('x', float), ('y', float)])

In [181]: data['x']
Out[181]: 
array([ 0.27432966,  0.10231729,  0.87724361,  0.24291769,  0.47583428,
        0.70710642,  0.85878695,  0.28840423,  0.2208485 ,  0.42915613])

In [182]: data['y']
Out[182]: 
array([ 0.03480825,  0.33111129,  0.47852683,  0.50691735,  0.04048958,
        0.27331443,  0.61993945,  0.11954319,  0.39880927,  0.19197321])

In [183]: data[0]
Out[183]: (0.27432965895978034, 0.03480825417655464)
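Building the structured array from a list of Python tuples works, but the fields can also be filled column by column, which avoids creating the intermediate tuples. A minimal sketch, with random sample data standing in for real measurements; np.rec.fromarrays is shown as an alternative that returns a record array whose fields are also reachable as attributes.

```python
import numpy as np

rng = np.random.default_rng(0)   # sample data for illustration only
x = rng.random(10)
y = rng.random(10)

# Allocate the structured array, then fill each named field from a column.
data = np.empty(len(x), dtype=[('x', float), ('y', float)])
data['x'] = x
data['y'] = y

# A record array built from the same columns also allows attribute access (rec.x, rec.y).
rec = np.rec.fromarrays([x, y], names='x,y')

# Fields work with ordinary NumPy operations, e.g. sorting records by one field.
by_x = np.sort(data, order='x')
```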

Others will probably suggest pandas, but if your data is relatively simple, plain NumPy may be easier.
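If you do go the pandas route, a DataFrame can be built directly from a structured array, with the field names becoming column labels. A minimal sketch, assuming pandas is installed; the data are arbitrary.

```python
import numpy as np
import pandas as pd

# Small structured array with illustrative values.
x = np.linspace(0.0, 1.0, 5)
data = np.empty(len(x), dtype=[('x', float), ('y', float)])
data['x'] = x
data['y'] = 2 * x

# Field names become column labels.
df = pd.DataFrame(data)

# Columns are accessed by name, much like structured-array fields.
ys = df['y'].to_numpy()

# Converting back yields a record array with the same field names.
back = df.to_records(index=False)
```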

You can also nest fields to add hierarchy, though this is often more complicated than necessary.

For example:

In [200]: t = np.arange(10)

In [202]: dt = np.dtype([('t',int),('pos',[('x',float),('y',float)])])

In [203]: alldata = np.array(list(zip(t, zip(x, y))), dtype=dt)

In [204]: alldata
Out[204]: 
array([(0, (0.27432965895978034, 0.03480825417655464)),
       (1, (0.10231729328413885, 0.3311112896885462)),
       (2, (0.8772436117544331, 0.47852682944121905)),
       (3, (0.242917693323785, 0.5069173543271597)),
       (4, (0.4758342768022188, 0.04048957803763753)),
       (5, (0.7071064160212163, 0.27331443495117813)),
       (6, (0.8587869470252278, 0.619939454616135)),
       (7, (0.28840423235739054, 0.11954319357707233)),
       (8, (0.22084849730366296, 0.39880927226467255)),
       (9, (0.429156126283989, 0.1919732064591556))], 
      dtype=[('t', '<i8'), ('pos', [('x', '<f8'), ('y', '<f8')])])

In [205]: alldata['t']
Out[205]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [206]: alldata['pos']
Out[206]: 
array([(0.27432965895978034, 0.03480825417655464),
       (0.10231729328413885, 0.3311112896885462),
       (0.8772436117544331, 0.47852682944121905),
       (0.242917693323785, 0.5069173543271597),
       (0.4758342768022188, 0.04048957803763753),
       (0.7071064160212163, 0.27331443495117813),
       (0.8587869470252278, 0.619939454616135),
       (0.28840423235739054, 0.11954319357707233),
       (0.22084849730366296, 0.39880927226467255),
       (0.429156126283989, 0.1919732064591556)], 
      dtype=[('x', '<f8'), ('y', '<f8')])

In [207]: alldata['pos']['x']
Out[207]: 
array([ 0.27432966,  0.10231729,  0.87724361,  0.24291769,  0.47583428,
        0.70710642,  0.85878695,  0.28840423,  0.2208485 ,  0.42915613])
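For the (x, y, z, t) case from the question, the position can also be stored as a fixed-size subarray field instead of separately named sub-fields, so that the 'pos' field comes back as an ordinary (N, 3) array. A sketch with made-up values; the field names 't' and 'pos' simply mirror the example above, and the variable name track is hypothetical.

```python
import numpy as np

n = 5
# 't' is a scalar field; 'pos' holds three floats (x, y, z) per record.
dt = np.dtype([('t', float), ('pos', float, (3,))])
track = np.zeros(n, dtype=dt)

# Field access returns views, so these assignments write into track itself.
track['t'] = np.arange(n) * 0.5            # time stamps
track['pos'][:, 0] = np.arange(n)          # x
track['pos'][:, 2] = np.arange(n) ** 2     # z (y stays 0)

# A boolean mask on one field selects whole records, keeping all fields together.
late = track[track['t'] >= 1.0]
```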