Pandas DataFrame.apply: create new column with dat

Posted 2020-03-26 07:26

I have a DataFrame (df) like this:

PointID  Time                 geojson
----     ----                 ----     
36F      2016-04-01T03:52:30  {'type': 'Point', 'coordinates': [3.961389, 43.123]}
36G      2016-04-01T03:52:50  {'type': 'Point', 'coordinates': [3.543234, 43.789]}

The geojson column contains data in GeoJSON format (essentially, a Python dict).

I want to create a new column in geoJSON format, which includes the time coordinate. In other words, I want to inject the time information into the geoJSON info.

For a single value, I can successfully do:

oldjson = df.iloc[0]['geojson']
newjson = [oldjson['coordinates'][0], oldjson['coordinates'][1], df.iloc[0]['Time']]

For a single parameter, I successfully used DataFrame.apply in combination with a lambda (thanks to SO: related question).

But now, I have two parameters, and I want to use it on the whole DataFrame. As I am not confident with the .apply syntax and lambda, I do not know if this is even possible. I would like to do something like this:

def inject_time(geojson, time):
    """
    Injects Time dimension into geoJSON coordinates. Expects a dict in geojson POINT format.
    """
    geojson['coordinates'] = [geojson['coordinates'][0], geojson['coordinates'][1], time]
    return geojson


df["newcolumn"] = df["geojson"].apply(lambda x: inject_time(x, df['Time']))

...but that does not work, because the function would inject the whole series.
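
A small sketch of why this happens, assuming a DataFrame rebuilt by hand from the table above: Series.apply hands the lambda one cell of the geojson column at a time, but df['Time'] inside the lambda is still the whole column, so every row sees the same Series object.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (an assumption about its exact construction).
df = pd.DataFrame({
    "PointID": ["36F", "36G"],
    "Time": ["2016-04-01T03:52:30", "2016-04-01T03:52:50"],
    "geojson": [
        {"type": "Point", "coordinates": [3.961389, 43.123]},
        {"type": "Point", "coordinates": [3.543234, 43.789]},
    ],
})

# Record the type of the second argument that inject_time would receive for each row.
second_arg_types = df["geojson"].apply(lambda g: type(df["Time"]).__name__)
print(second_arg_types.tolist())  # ['Series', 'Series']
```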

EDIT: I figured that the format of the timestamped geoJSON should be something like this:

TimestampedGeoJson({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [[-70,-25],[-70,35],[70,35]],
            },
            "properties": {
                "times": [1435708800000, 1435795200000, 1435881600000]
            }
        }
    ]
})

So the time element is in the properties element, but this does not change the problem much.
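
As a rough sketch of that layout for a single row: the helper name to_timestamped_feature is hypothetical, and it assumes that the "times" values are milliseconds since the Unix epoch (which the sample values above appear to be) and that the Time strings are in UTC.

```python
import pandas as pd

def to_timestamped_feature(geojson, time):
    """Build a new GeoJSON Feature with the time stored under properties (hypothetical helper)."""
    # Assumption: the ISO string is UTC; convert it to milliseconds since the epoch.
    millis = int(pd.Timestamp(time, tz="UTC").timestamp() * 1000)
    return {
        "type": "Feature",
        "geometry": {
            "type": geojson["type"],
            # Copy the list so the original dict is not shared with the new feature.
            "coordinates": list(geojson["coordinates"]),
        },
        "properties": {"times": [millis]},
    }

feature = to_timestamped_feature(
    {"type": "Point", "coordinates": [3.961389, 43.123]},
    "2016-04-01T03:52:30",
)
print(feature)
```

The same function could be applied per row in place of inject_time; whether a single Point with a one-element "times" list is what the plotting side expects is not confirmed here.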

1 answer
Juvenile、少年°
#2 · 2020-03-26 07:37

You need DataFrame.apply with axis=1 for processing by rows:

df['new'] = df.apply(lambda x: inject_time(x['geojson'], x['Time']), axis=1)

# temporarily display long strings in the column
with pd.option_context('display.max_colwidth', 100):
    print(df['new'])

0    {'type': 'Point', 'coordinates': [3.961389, 43.123, '2016-04-01T03:52:30']}
1    {'type': 'Point', 'coordinates': [3.543234, 43.789, '2016-04-01T03:52:50']}
Name: new, dtype: object
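
One thing to watch: inject_time as written in the question assigns into the dict it receives, so the dicts in the original geojson column are modified as well. A minimal sketch of a variant that returns a new dict instead, together with a zip-based alternative to the row-wise apply; the name inject_time_copy and the hand-built sample frame are assumptions for illustration.

```python
import pandas as pd

def inject_time_copy(geojson, time):
    """Return a new dict with time as a third coordinate; the input dict is left unchanged."""
    lon, lat = geojson["coordinates"][:2]
    return {**geojson, "coordinates": [lon, lat, time]}

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "PointID": ["36F", "36G"],
    "Time": ["2016-04-01T03:52:30", "2016-04-01T03:52:50"],
    "geojson": [
        {"type": "Point", "coordinates": [3.961389, 43.123]},
        {"type": "Point", "coordinates": [3.543234, 43.789]},
    ],
})

# Row-wise apply, as in the answer above.
df["new"] = df.apply(lambda row: inject_time_copy(row["geojson"], row["Time"]), axis=1)

# Alternative: pair the two columns directly, avoiding a Series per row.
df["new_zip"] = [inject_time_copy(g, t) for g, t in zip(df["geojson"], df["Time"])]

print(df["geojson"][0])  # still has two coordinates
print(df["new"][0])      # has the time as a third coordinate
```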