How to retrieve the original order of key-word arg

2019-05-03 16:00发布

Retrieving the order of key-word arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (right now called dimarray), particularly useful for geophysical data handling.

For now say we have:

import numpy as np
from dimarray import Dimarray   # the handy class I am programming

def make_data(nlat, nlon):
    """ generate some example data
    """
    values = np.random.randn(nlat, nlon)
    lon = np.linspace(-180,180,nlon)
    lat = np.linspace(-90,90,nlat)
    return lon, lat, values

What works:

>>> lon, lat, values = make_data(180,360)
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0]
-180.0 -90.0

What does not:

>>> lon, lat, data = make_data(180,180) # square, no shape checking possible !
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0] # is random 
-90.0, -180.0  # could be (actually I raise an error in such ambiguous cases)

The reason is that Dimarray's __init__ method's signature is (values, **kwargs) and since kwargs is an unordered dictionary (dict) the best it can do is check against the shape of values.

Of course, I want it to work for any kind of dimensions:

a = Dimarray(values, x1=.., x2=...,x3=...)

so it has to be hard coded with **kwargs The chances of ambiguous cases occurring increases with the number of dimensions. There are ways around that, for instance with a signature (values, axes, names, **kwargs) it is possible to do:

a = Dimarray(values, [lat, lon], ["lat","lon"]) 

but this syntax is cumbersome for interactive use (ipython), since I would like this package to really be a part of my (and others !!) daily use of python, as an actual replacement of numpy arrays in geophysics.

I would be VERY interested in a way around that. The best I can think of right now is to use inspect module's stack method to parse the caller's statement:

import inspect
def f(**kwargs):
    print inspect.stack()[1][4]
    return tuple([kwargs[k] for k in kwargs])

>>> print f(lon=360, lat=180)
[u'print f(lon=360, lat=180)\n']
(180, 360)

>>> print f(lat=180, lon=360)
[u'print f(lat=180, lon=360)\n']
(180, 360)

One could work something out from that, but there are unsolvable issues since stack() catches everything on the line:

>>> print (f(lon=360, lat=180), f(lat=180, lon=360))
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
((180, 360), (180, 360))

Is there any other inspect trick I am not aware of, which could solve this problem ? (I am not familiar with this module) I would imagine getting the piece of code which is right between the brackets lon=360, lat=180 should be something feasible, no??

So I have the feeling for the first time in python to hit a hard wall in term of doing something which is theoretically feasible based on all available information (the ordering provided by the user IS valuable information !!!).

I read interesting suggestions by Nick there: https://mail.python.org/pipermail/python-ideas/2011-January/009054.html and was wondering whether this idea has moved forward somehow?

I see why it is not desirable to have an ordered **kwargs in general, but a patch for these rare cases would be neat. Anyone aware of a reliable hack?

NOTE: this is not about pandas, I am actually trying to develop a light-weight alternative for it, whose usage remains very close to numpy. Will soon post the gitHub link.

EDIT: Note I this is relevant for interactive use of dimarray. The dual syntax is needed anyway.

EDIT2: I also see counter arguments that knowing the data is not ordered could also be seen as valuable information, since it leaves Dimarray the freedom to check values shape and adjust the order automatically. It could even be that not remembering the dimension of the data occurs more often than having the same size for two dimensions. So right now, I guess it is fine to raise an error for ambiguous cases, asking the user to provide the names argument. Nevertheless, it would be neat to have the freedom to make that kind of choices (how Dimarray class should behave), instead of being constrained by a missing feature of python.

EDIT 3, SOLUTIONS: after the suggestion of kazagistar:

I did not mention that there are other optional attribute parameters such as name="" and units="", and a couple of other parameters related to slicing, so the *args construct would need to come with keyword name testing on kwargs.

In summary, there are many possibilities:

*Choice a: keep current syntax

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")

*Choice b: kazagistar's 2nd suggestion, dropping axis definition via **kwargs

a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice c: kazagistar's 2nd suggestion, with optional axis definition via **kwargs (note this involves names= to be extracted from **kwargs, see background below)

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")

*Choice d: kazagistar's 3nd suggestion, with optional axis definition via **kwargs

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")

Hmm, it comes down to aesthetics, and to some design questions (Is lazy ordering an important feature in interactive mode?). I am hesitating between b) and c). I am not sure the **kwargs really brings something. Ironically enough, what I started to criticize became a feature when thinking more about it...

Thanks very much for the answers. I will mark the question as answered, but you are most welcome to vote for a), b) c) or d) !

=====================

EDIT 4 : better solution: choice a) !!, but adding a from_tuples class method. The reason for that is to allow one more degree of freedom. If the axis names are not provided, they will be generated automatically as "x0", "x1" etc... To use really just like pandas, but with axis naming. This also avoids mixing up axes and attributes into **kwargs, and leaving it only for the axes. There will be more soon as soon as I am done with the doc.

a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
a = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")

EDIT 5 : more pythonic solution? : similar to EDIT 4 above in term of the user api, but via a wrapper dimarray, while being very strict with how Dimarray is instantiated. This is also in the spirit of what kazagistar proposed.

 from dimarray import dimarray, Dimarray 

 a = dimarray(values, lon=mylon, lat=mylat, name="myarray") # error if lon and lat have same size
 b = dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
 c = dimarray(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 d = dimarray(values, [mylat, mylon, ...], name="myarray2")

And from the class itself:

 e = Dimarray.from_dict(values, lon=mylon, lat=mylat) # error if lon and lat have same size
 e.set(name="myarray", inplace=True)
 f = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
 g = Dimarray.from_list(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
 h = Dimarray.from_list(values, [mylat, mylon, ...], name="myarray")

In the cases d) and h) axes are automatically named "x0", "x1", and so on, unless mylat, mylon actually belong to the Axis class (which I do not mention in this post, but Axes and Axis do their job, to build axes and deal with indexing).

Explanations:

class Dimarray(object):
    """ ndarray with meaningful dimensions and clean interface
    """
    def __init__(self, values, axes, **kwargs):
        assert isinstance(axes, Axes), "axes must be an instance of Axes"
        self.values = values
        self.axes = axes
        self.__dict__.update(kwargs)

    @classmethod
    def from_tuples(cls, values, *args, **kwargs):
        axes = Axes.from_tuples(*args)
        return cls(values, axes)

    @classmethod
    def from_list(cls, values, axes, names=None, **kwargs):
        if names is None:
            names = ["x{}".format(i) for i in range(len(axes))]
        return cls.from_tuples(values, *zip(axes, names), **kwargs)

    @classmethod
    def from_dict(cls, values, names=None,**kwargs):
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
        # with necessary assert statements in the above
        return cls(values, axes)

Here is the trick (schematically):

def dimarray(values, axes=None, names=None, name=..,units=..., **kwargs):
    """ my wrapper with all fancy options
    """
    if len(kwargs) > 0:
        new = Dimarray.from_dict(values, axes, **kwargs) 

    elif axes[0] is tuple:
        new = Dimarray.from_tuples(values, *axes, **kwargs) 

    else:
        new = Dimarray.from_list(values, axes, names=names, **kwargs) 

    # reserved attributes
    new.set(name=name, units=units, ..., inplace=True) 

    return new

The only thing we loose is indeed *args syntax, which could not accommodate for so many options. But that's fine.

And its make it easy for sub-classing, too. How does it sound to the python experts here?

(this whole discussion could be split in two parts really)

=====================

A bit of background (EDIT: in part outdated, for cases a), b), c), d) only), just in case you are interested:

*Choice a involves:

def __init__(self, values, axes=None, names=None, units="",name="",..., **kwargs):
    """ schematic representation of Dimarray's init method
    """
    # automatic ordering according to values' shape (unless names is also provided)
    # the user is allowed to forget about the exact shape of the array
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

    # otherwise initialize from list
    # exact ordering + more freedom in axis naming 
    else:
        axes = Axes.from_list(axes, names)

    ...  # check consistency

    self.values = values
    self.axes = axes
    self.name = name
    self.units = units         

*Choices b) and c) impose:

def __init__(self, values, *args, **kwargs):
    ...

b) all attributes are naturally passed via kwargs, with self.__dict__.update(kwargs). This is clean.

c) Need to filter key-word arguments:

def __init__(self, values, *args, **kwargs):
   """ most flexible for interactive use
   """
   # filter out known attributes
   default_attrs = {'name':'', 'units':'', ...} 
   for k in kwargs:
       if k in 'name', 'units', ...:
           setattr(self, k) = kwargs.pop(k)
       else:
           setattr(self, k) = default_attrs[k]

   # same as before
   if len(kwargs) > 0:
       axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)

   # same, just unzip
   else:
       names, numpy_axes = zip(*args)
       axes = Axes.from_list(numpy_axes, names)

This is actually quite nice and handy, the only (minor) drawback is that default parameters for name="", units="" and some other more relevant parameters are not accessible by inspection or completion.

*Choice d: clear __init__

def __init__(self, values, axes, name="", units="", ..., **kwaxes)

But is a bit verbose indeed.

==========

EDIT, FYI: I ended up using a list of tuples for the axes parameter, or alternatively the parameters dims= and labels= for axis name and axis values, respectively. The related project dimarray is on github. Thanks again at kazagistar.

2条回答
劳资没心,怎么记你
2楼-- · 2019-05-03 16:34

No, you cannot know the order in which items were added to a dictionary, since doing this increases the complexity of implementing the dicionary significantly. (For when you really really need this, collections.OrderedDict has you covered).

However, have you considered some basic alternative syntax? For example:

a = Dimarray(values, 'lat', lat, 'lon', lon)

or (probably the best option)

a = Dimarray(values, ('lat', lat), ('lon', lon))

or (most explicit)

a = Dimarray(values, [('lat', lat), ('lon', lon)])

At some level though, that need ordering are inherently positional. **kwargs is often abused for labeling, but argument name generally shouldn't be "data", since it is a pain to set programatically. Just make the two parts of the data that are associated clear with a tuple, and use a list to make the ordering preserved, and provide strong assertions + error messages to make it clear when the input is invalid and why.

查看更多
老娘就宠你
3楼-- · 2019-05-03 16:45

There is module especially made to handle this :

https://github.com/claylabs/ordered-keyword-args

without using module

def multiple_kwarguments(first , **lotsofothers):
    print first

    for i,other in lotsofothers.items():
         print other
    return True

multiple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")

output:

first
second
fifth
fourth
third

On using orderedkwargs module

from orderedkwargs import ordered kwargs  
@orderedkwargs  
def mutliple_kwarguments(first , *lotsofothers):
    print first

    for i, other in lotsofothers:
        print other
    return True


mutliple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")

Output:

first
second
third
fourth
fifth

Note: Single asterik is required while using this module with decorator above the function.

查看更多
登录 后发表回答