I am merging two CSV files (data frames) using the code below:
import pandas as pd
a = pd.read_csv(file1, dtype={'student_id': str})
df = pd.read_csv(file2)
c = pd.merge(a, df, on='test_id', how='left')
c.to_csv('test1.csv', index=False)
I have the following CSV files:
file1:
test_id, student_id
1, 01990
2, 02300
3, 05555
file2:
test_id, result
1, pass
3, fail
After the merge:
test_id, student_id, result
1, 1990, pass
2, 2300,
3, 5555, fail
Notice that student_id has a leading 0 and is supposed to be treated as text, but after merging and calling to_csv, it is converted to a number and the leading 0 is removed.
How can I keep the column as "text" even after to_csv?
I think it's the to_csv function that saves it back as numeric. I added dtype={'student_id': str} while reading the CSV, but when saving with to_csv it is converted to numeric again.
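A minimal check of the read step on its own helps narrow this down. It assumes file1 contains exactly the text shown above (passed through io.StringIO here instead of a file path) and reads it without the dtype argument, to see what pandas does by default:

```python
import io

import pandas as pd

# file1 exactly as shown above: note the space after each comma.
file1 = io.StringIO("test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n")

a = pd.read_csv(file1)
print(list(a.columns))            # header names as pandas parsed them
print(a.dtypes)                   # the student column is already int64 here
print('student_id' in a.columns)  # would a dtype key of 'student_id' match?
```

The second header comes back as ' student_id' with a leading space, so a dtype key of 'student_id' has no column to apply to, and the values are already integers before merge or to_csv ever runs.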
It's not dropping the leading zero on the merge; it's dropping it on the read_csv. You can fix this by specifying that the column is a string at import time.
The important part is the dtype parameter: you are telling pandas to import this column as a string. The skipinitialspace parameter is set to True because the column headers are defined with spaces after the commas. Without it, the second column is named " student_id" (with a leading space), so dtype={'student_id': str} never matches it; skipinitialspace strips that space so the column name and the dtype key line up.
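A minimal sketch of the whole flow with both parameters set, assuming the file contents are exactly as shown in the question. The data goes through io.StringIO so the snippet runs on its own; with real files, pass the paths instead. Calling to_csv without a path returns the CSV text rather than writing test1.csv, which makes the result easy to inspect.

```python
import io

import pandas as pd

# Sample data from the question; with real files, pass the paths instead.
file1 = io.StringIO("test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n")
file2 = io.StringIO("test_id, result\n1, pass\n3, fail\n")

# skipinitialspace=True drops the space after each comma (header and values),
# so the column is named 'student_id' and the dtype key actually matches it.
a = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv(file2, skipinitialspace=True)

results = pd.merge(a, df, on='test_id', how='left')
print(results)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = results.to_csv(index=False)
print(csv_text)
```

Because student_id stays a column of strings through the merge, the CSV text keeps 01990, 02300 and 05555 intact, and test_id 2 gets an empty result field since file2 has no row for it.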
With both parameters set, the merged results dataframe keeps student_id as strings, and when you run to_csv the leading zeros are written back to the file unchanged.

Solution with join: first, call read_csv with the dtype parameter to convert student_id to a string and with skipinitialspace to remove the whitespace, then join on test_id.
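A sketch of the join variant, using the same sample data as the question (io.StringIO stands in for the real files). DataFrame.join matches the caller's on column against the other frame's index, so test_id is moved into the index of the second frame first; how='left' is join's default, which mirrors the left merge in the question.

```python
import io

import pandas as pd

# Sample data from the question; with real files, pass the paths instead.
file1 = io.StringIO("test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n")
file2 = io.StringIO("test_id, result\n1, pass\n3, fail\n")

a = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv(file2, skipinitialspace=True)

# Look up each test_id of `a` in df's index; unmatched rows get NaN (left join).
c = a.join(df.set_index('test_id'), on='test_id')
print(c.to_csv(index=False))
```

To write the file as in the question, use c.to_csv('test1.csv', index=False) instead of printing.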
Caveat: in practice, please use merge or join. This answer is provided only to give perspective on the flexibility pandas gives you and on how many different ways there are to answer the same question.
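As one illustration of that flexibility (an assumption, not necessarily the approach this answer had in mind), a left-join-style lookup can also be written with Series.map. This only works when each test_id appears at most once in the second file, because the lookup Series needs a unique index.

```python
import io

import pandas as pd

# Sample data from the question; with real files, pass the paths instead.
file1 = io.StringIO("test_id, student_id\n1, 01990\n2, 02300\n3, 05555\n")
file2 = io.StringIO("test_id, result\n1, pass\n3, fail\n")

a = pd.read_csv(file1, dtype={'student_id': str}, skipinitialspace=True)
df = pd.read_csv(file2, skipinitialspace=True)

# Build a test_id -> result lookup Series and map each row of `a` through it;
# test_ids missing from df map to NaN, like the unmatched rows of a left merge.
c = a.assign(result=a['test_id'].map(df.set_index('test_id')['result']))
print(c.to_csv(index=False))
```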