Merge two dataframe in pandas

I am merging two csv(data frame) using below code:

import pandas as pd
a = pd.read_csv(file1,dtype={'student_id': str})
df = pd.read_csv(file2)
c=pd.merge(a,df,on='test_id',how='left')
c.to_csv('test1.csv', index=False)

I have the following CSV files

file1:

test_id, student_id
1, 01990
2, 02300
3, 05555

file2:

test_id, result
1, pass
3, fail

after merge

test_id, student_id , result
1, 1990, pass
2, 2300,
3, 5555, fail

If you notice student_id has 0 appended at the beginning and it's supposed to be considered as text but after merging and using to_csv function it converts it into numeric and removes leading 0.

How can I keep the column as "text" even after to_csv?

I think its to_csv function which saves back again as numeric Added dtype={'student_id': str} while reading csv.. but while saving it as to_csv .. it again convert it to numeric

标签： python csv pandas merge

4条回答

Juvenile、少年°

2楼-- · 2020-02-01 16:44

It's not dropping the leading zero on the merge, it's dropping it on the read_csv. You can fix this by specifying that column is a string at import time:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)

The important part is the dtype parameter. You are telling pandas to import this column as a string. The skipinitialspace parameter is set to True, because the column headers are defined with spaces, so we strip it:

test_id, student_id
        ^ The student_id starts here, at the space

The final code looks like this:

a = pd.read_csv('file1.csv', dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = a.merge(df, how='left', on='test_id')

With the results dataframe looking like this:

    test_id     student_id  result
0   1           01990       pass
1   2           02300       NaN
2   3           05555       fail

Then when you run to_csv your result should be:

test_id,student_id, result
1,01990, pass
2,02300,
3,05555, fail

0人赞添加讨论(0) 举报

家丑人穷心不美

3楼-- · 2020-02-01 16:55

Solution with join, first need read_csv with parameter dtype for convert student_id to string and remove whitespaces by skipinitialspace:

df1 = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df2 = pd.read_csv(file2, skipinitialspace=True)

df = df1.join(df2.set_index('test_id'), on='test_id')
print (df)
   test_id student_id  result
0        1      01990    pass
1        2      02300     NaN
2        3      05555    fail

0人赞添加讨论(0) 举报

太酷不给撩

4楼-- · 2020-02-01 17:01

a = pd.read_csv(file1, dtype={'test_id': object})
df = pd.read_csv(file2, dtype={'test_id': object})

==============================================================

In[28]: pd.merge(a, b, on='test_id', how='left')
Out[28]: 
  test_id   student_id  result
0      01         1990    pass
1      02         2300     NaN
2     003         5555    fail

0人赞添加讨论(0) 举报

Evening l夕情丶

5楼-- · 2020-02-01 17:03

Caveat Please use merge or join. This answer is provided to give perspective on the flexibility pandas gives you and how many different ways there are to answer the same question.

a = pd.read_csv('file1.csv', converters=dict(student_id=str), skipinitialspace=True)
df = pd.read_csv('file2.csv')
results = pd.concat(
    [d.set_index('test_id') for d in [a, df]],
    axis=1, join='outer'
).reset_index()

0人赞添加讨论(0) 举报

Merge two dataframe in pandas

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间