I'm trying to find a better way to assert the data types of the columns in a given pandas DataFrame.
For example:
import pandas as pd
t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer']})
I would like to assert that specific columns in the data frame are numeric. Here's what I have:
numeric_cols = ['a', 'b'] # These will be given
assert all(x in ['int64', 'float'] for x in [t[y].dtype for y in numeric_cols])
This last assert line doesn't feel very pythonic. Maybe it is and I'm just cramming it all in one hard to read line. Is there a better way? I would like to write something like:
assert t[numeric_cols].dtype.isnumeric()
I can't seem to find something like that though.
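A minimal sketch of the kind of check the question is after. It applies pandas.api.types.is_numeric_dtype (a real pandas function) to each listed column. The helper non_numeric_columns is hypothetical and exists only for illustration. The select_dtypes line is an alternative approach and does not come from the original post. The sketch assumes a pandas version that exposes pandas.api.types.

```python
import pandas as pd
import pandas.api.types as ptypes

t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})
numeric_cols = ['a', 'b']

# all() is essential here: a bare list such as [False, False] is truthy,
# so `assert [ ... ]` never fails as long as the list is non-empty.
assert all(ptypes.is_numeric_dtype(t[col]) for col in numeric_cols)

# Alternative: compare against the columns pandas itself classifies as numeric.
assert set(numeric_cols) <= set(t.select_dtypes(include='number').columns)


# Hypothetical helper (not part of pandas): list the requested columns that
# are not numeric, so a failing assertion can say which ones are wrong.
def non_numeric_columns(df, cols):
    return [col for col in cols if not ptypes.is_numeric_dtype(df[col])]


print(non_numeric_columns(t, ['a', 'b', 'c']))  # ['c']
```

Returning the offending column names rather than a single boolean lets you write assert not bad, bad and see which columns failed.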
You could use ptypes.is_numeric_dtype to identify numeric columns, ptypes.is_string_dtype to identify string-like columns, and ptypes.is_datetime64_any_dtype to identify datetime64 columns.
The pandas.api.types module (which I aliased to ptypes) has both an is_datetime64_any_dtype and an is_datetime64_dtype function. The difference is in how they treat timezone-aware array-likes: is_datetime64_any_dtype returns True for them, while is_datetime64_dtype returns False.
You can do this