I have a CSV file in which a few columns are numbers and a few are strings. When I check myDF.dtypes, it shows all the string columns as object.
Someone asked a related question before here about why this is done. Is it possible to recast the dtype from object to string?
Also, in general, is there an easy way to recast the dtype from int64 and float64 to int32 and float32, to save on the size of the data (in memory / on disk)?
All strings are represented as variable-length values, which is what the object dtype holds. You can do series.astype('S32') if you want, but it will be recast if you then store it in a DataFrame or do much with it. This is for simplicity.
Certain serialization formats, e.g. HDFStore, do store strings as fixed-length strings on disk, though.
You can call series.astype('int32') if you would like, and it will be stored as the new type.
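As a rough illustration of the downcast, here is a minimal sketch. The DataFrame and its column names (count, price) are made up for the example and are not from the question. Note that float32 keeps fewer significant digits than float64, and integers outside the int32 range will not be preserved, so check your value ranges first.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "price": np.linspace(0.0, 1.0, 1000),  # float64 by default
})

# Bytes used by the column data (index excluded).
before = df.memory_usage(index=False).sum()

# Downcast each column explicitly by passing a column -> dtype mapping.
small = df.astype({"count": "int32", "price": "float32"})

after = small.memory_usage(index=False).sum()

print(small.dtypes)
print(before, "->", after)
```

Since each value goes from 8 bytes to 4, the column data takes half the memory after the cast. Whether this also shrinks the file on disk depends on the format: a binary format that stores the dtypes may benefit, while plain CSV stores text and does not record dtypes at all.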
# df is your DataFrame object, already populated with values
print('dtype in object form :')
print(repr(df.dtypes[df.columns[0]]))  # output: dtype('O')
print('\ndtype in string')
print(str(df.dtypes[df.columns[0]]))  # output: object