I am new to pandas. I have loaded a CSV using pandas.read_csv. I first tried not specifying dtype, but it was way too slow. Since it is a very large file, I now specify the data types as well. However, some numeric columns occasionally contain "NA". I have used na_values=['NA']; will this affect my data frame? I still want to preserve these rows. My question is: if I specify the data types and add na_values=['NA'], will the NA rows be tossed away? If yes, how can I keep a similar processing time without losing these NAs? Thank you very much!
Answer 1:
From the pd.read_csv docs:

na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ... ‘NA’, ...

Note that ‘NA’ is already in that default list. These values are not tossed away; rather, they are converted to NaN, and the rows that contain them are kept. Pandas is smart enough to recognise those values automatically, without you explicitly stating them.
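To check this on a small scale, here is a minimal sketch. The column names (id, score) and the inline CSV text are made up for illustration and stand in for the real file; the only pandas calls used are pd.read_csv with its dtype and na_values parameters.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the large CSV file.
csv_text = "id,score\n1,3.5\n2,NA\n3,7.0\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    # Explicit dtypes, as in the question. "score" is float64 so it can hold NaN.
    dtype={"id": "int64", "score": "float64"},
    # Redundant here because 'NA' is already a default NA value, but harmless.
    na_values=["NA"],
)

print(len(df))                   # all three rows are kept
print(df["score"].isna().sum())  # one value was converted to NaN
print(df["score"].dtype)
```

The row containing "NA" stays in the frame with NaN in the score column. One caveat, as far as I know: a plain int64 column cannot hold NaN, so a column that contains missing entries should be declared as float64 (or as the nullable "Int64" dtype) rather than int64.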