Editing then concatenating values of several colum

I'm looking for a way to use pandas and Python to combine several columns of an Excel sheet, whose column names are known, into a single new column that keeps all the important information, as in the example below:

input:

ID,tp_c,tp_b,tp_p  
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked

desired output:

ID,tp_all  
0,transportation  
1,cars  
2,boats  
3,cars+boats  
4,boats+planes  
5,cars+boats+planes

The row with an ID of 0 contains a description of the contents of each column. Ideally the code would parse the description in that row, take the text after the '-', and concatenate those values in the new "tp_all" column.

Tags: python excel pandas

3 answers

走好不送

#2 · 2019-08-13 06:47

OK a more dynamic method:

In [63]:
# get a list of the columns
col_list = list(df.columns)
# remove 'ID' column
col_list.remove('ID')
# create a dict as a lookup
col_dict = dict(zip(col_list, [df.iloc[0][col].split(' - ')[1] for col in col_list]))
col_dict
Out[63]:
{'tp_b': 'boats', 'tp_c': 'cars', 'tp_p': 'planes'}
In [64]:
# define a func that tests the value and uses the dict to create our string
def func(x):
    temp = ''
    for col in col_list:
        if x[col] == 'checked':
            if len(temp) == 0:
                temp = col_dict[col]
            else:
                temp = temp + '+' + col_dict[col]
    return temp
df['combined'] = df[1:].apply(lambda row: func(row), axis=1)
df
Out[64]:
   ID                   tp_c                    tp_b                     tp_p  \
0   0  transportation - cars  transportation - boats  transportation - planes   
1   1                checked                     NaN                      NaN   
2   2                    NaN                 checked                      NaN   
3   3                checked                 checked                      NaN   
4   4                    NaN                 checked                  checked   
5   5                checked                 checked                  checked   

            combined  
0                NaN  
1               cars  
2              boats  
3         cars+boats  
4       boats+planes  
5  cars+boats+planes  

[6 rows x 5 columns]
In [65]:

# .ix has been removed from pandas; .loc does the same label-based selection here
df = df.loc[1:, ['ID', 'combined']]
df
Out[65]:
   ID           combined
1   1               cars
2   2              boats
3   3         cars+boats
4   4       boats+planes
5   5  cars+boats+planes

[5 rows x 2 columns]


Rolldiameter

#3 · 2019-08-13 06:54

Here is one way:

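import pandas

# start with an empty string for every row
newCol = pandas.Series('', index=d.index)
# iterating over a DataFrame yields its column names; skip the first (ID) column
for col in d.iloc[:, 1:]:
    name = '+' + col.split('-')[1].strip()
    newCol[d[col] == 'checked'] += name
newCol = newCol.str.strip('+')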

Then:

>>> newCol
0                 cars
1                boats
2           cars+boats
3         boats+planes
4    cars+boats+planes
dtype: object

You can create a new DataFrame with this column or do what you like with it.

Edit: I see that you have edited your question so that the names of the modes of transportation are now in row 0 instead of in the column headers. It is easier if they're in the column headers (as my answer assumes), and your new column headers don't seem to contain any additional useful information, so you should probably start by just setting the column names to the info from row 0, and deleting row 0.
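As a rough sketch of that first step, the descriptions in row 0 could be promoted to column headers like this. The sample data is copied from the question; reading it with `io.StringIO` and the variable names `raw` and `value_cols` are assumptions made for illustration (a real Excel sheet would be loaded differently).

```python
import io
import pandas as pd

# Sample data from the question; '-' marks an unchecked cell.
raw = """ID,tp_c,tp_b,tp_p
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked
"""
d = pd.read_csv(io.StringIO(raw))

# Use the descriptions in row 0 as the column headers (the ID column keeps its name).
value_cols = [c for c in d.columns if c != 'ID']
d = d.rename(columns={c: d.loc[0, c] for c in value_cols})

# Row 0 no longer carries any information, so drop it and renumber the rows.
d = d.drop(index=0).reset_index(drop=True)

print(list(d.columns))
```

After this, the headers have the "transportation - cars" form that the loop above splits on.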


贼婆χ

#4 · 2019-08-13 06:57

This is quite interesting as it's a reverse get_dummies...

I think I would manually munge the column names so that you have a boolean DataFrame:

In [11]: df1  # df == 'checked'
Out[11]:
    cars  boats planes
0
1   True  False  False
2  False   True  False
3   True   True  False
4  False   True   True
5   True   True   True

Now you can use an apply with zip:

In [12]: df1.apply(lambda row: '+'.join([col for col, b in zip(df1.columns, row) if b]),
                   axis=1)
Out[12]:
0
1                 cars
2                boats
3           cars+boats
4         boats+planes
5    cars+boats+planes
dtype: object

Now you just have to tweak the headers to get the desired CSV.
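Putting the pieces together, one possible end-to-end sketch is shown below. It assumes the question's sample data is read as CSV text with ID as the index; the names `raw`, `names`, `tp_all` and `out` are made up for this example. It does not reproduce the `0,transportation` line of the desired output, which would need to be added separately.

```python
import io
import pandas as pd

# Sample data from the question; '-' marks an unchecked cell.
raw = """ID,tp_c,tp_b,tp_p
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked
"""
df = pd.read_csv(io.StringIO(raw), index_col='ID')

# The text after ' - ' in the description row (ID 0) becomes the column names.
names = [str(desc).split(' - ')[1] for desc in df.loc[0]]

# Boolean frame: True where a cell is 'checked', without the description row.
df1 = df.drop(index=0).eq('checked')
df1.columns = names

# Join the names of the checked columns for each row.
tp_all = df1.apply(
    lambda row: '+'.join(col for col, b in zip(df1.columns, row) if b),
    axis=1,
)

# Name the new column and turn the ID index back into a regular column.
out = tp_all.rename('tp_all').reset_index()
print(out.to_csv(index=False))
```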

It would be nice if there were a faster, less manual way to do a reverse get_dummies...
