How to check if float pandas column contains only

Posted 2020-06-01 08:27

Question:

I have a dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)

How can I make sure that the numbers in v are whole numbers? I am very concerned about rounding, truncation, and floating-point representation errors.

Answer 1:

Comparison with astype(int)

Tentatively convert your column to int and test with np.array_equal:

np.array_equal(df.v, df.v.astype(int))
True
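
As a minimal sketch of how this comparison behaves on columns that are not whole, the check can be wrapped in a helper. The name `is_whole_via_cast` is hypothetical and not part of the original answer. The sketch assumes that `astype(int)` raises a `ValueError` (or a subclass of it) when the column holds NaN or infinity, and treats that case as "not whole". Values outside the int64 range are not handled here.

```python
import numpy as np
import pandas as pd


def is_whole_via_cast(series: pd.Series) -> bool:
    """Return True if casting to int loses nothing (hypothetical helper)."""
    try:
        as_int = series.astype(int)
    except ValueError:
        # NaN or +/-inf cannot be represented as an integer.
        return False
    return bool(np.array_equal(series, as_int))


whole = pd.Series([0.0, 1.0, 2.0])
fractional = pd.Series([0.0, 1.5, 2.0])  # 1.5 is truncated to 1 by the cast
with_nan = pd.Series([0.0, np.nan, 2.0])

print(is_whole_via_cast(whole))
print(is_whole_via_cast(fractional))
print(is_whole_via_cast(with_nan))
```

The cast truncates toward zero, so any value with a fractional part no longer equals its integer counterpart and the comparison fails.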

float.is_integer

You can use this Python method in conjunction with apply:

df.v.apply(float.is_integer).all()
True

Or, using Python's all with a generator expression, for space efficiency:

all(x.is_integer() for x in df.v)
True
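
When the check fails, it is often useful to see which rows caused it. The following is a small sketch under that assumption; the sample data and the names `mask` and `offenders` are made up for illustration. Note that `float.is_integer` tests the stored double exactly, with no tolerance, so a value such as 2.0000000001 would be reported as non-whole.

```python
import pandas as pd

df = pd.DataFrame({"v": [0.0, 1.0, 2.5, 3.0]})

# Boolean mask: True where the stored float has no fractional part.
mask = df["v"].apply(float.is_integer)

all_whole = bool(mask.all())
offenders = df.loc[~mask, "v"]  # the rows that break the check

print(all_whole)
print(offenders)
```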


Answer 2:

If you want to check multiple float columns in your dataframe, you can do the following:

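# Per float column: True if every value is a whole number.
# (DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.)
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# Assign whole columns: df.loc[:, cols] = ... writes into the existing float columns on pandas 2.x and keeps float64.
df[float_to_int_cols] = df[float_to_int_cols].astype(int)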
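
A self-contained sketch of the same idea on a small, made-up frame follows. It swaps in a column-wise `Series.apply` for the element-wise frame method, on the assumption that this sidesteps the `applymap`/`map` naming difference between pandas versions, and it assigns column by column so that each converted column takes on the integer dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],  # float, all whole
    "b": [1.5, 2.0, 3.0],  # float, contains a fraction
    "c": ["x", "y", "z"],  # not a float column, so it is ignored
})

float_cols = df.select_dtypes(include=["float"])

# One boolean per float column: are all of its values whole?
col_should_be_int = float_cols.apply(lambda s: s.apply(float.is_integer).all())
float_to_int_cols = col_should_be_int[col_should_be_int].index

# Replace each qualifying column with its integer version.
for name in float_to_int_cols:
    df[name] = df[name].astype("int64")

print(df.dtypes)
```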

Keep in mind that a float column whose non-missing values are all whole numbers will not be selected if it contains NaN, because float.is_integer returns False for NaN. To cast float columns with missing values to integer, you first need to fill or remove the missing values, for example with median imputation:

float_cols = df.select_dtypes(include=['float'])
# Median imputation; the median is rounded so the fill value itself is a whole number.
float_cols = float_cols.fillna(float_cols.median().round())
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# Assign whole columns so the int dtype is kept (df.loc[:, cols] = ... keeps float64 on pandas 2.x).
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
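
To make the effect of the imputation step concrete, here is a small sketch on made-up data. The frame, the variable names `filled` and `is_whole`, and the column-by-column assignment are illustrative assumptions rather than part of the original answer. Note that imputation changes the data: the gap in column `a` becomes the rounded median.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],  # whole apart from the gap
    "b": [1.5, 2.0, np.nan],  # contains a genuine fraction
})

float_cols = df.select_dtypes(include=["float"])

# Fill each column's gaps with that column's rounded median.
filled = float_cols.fillna(float_cols.median().round())

# After filling, only column "a" consists entirely of whole numbers.
is_whole = filled.apply(lambda s: s.apply(float.is_integer).all())
for name in is_whole[is_whole].index:
    df[name] = filled[name].astype("int64")

print(df)
print(df.dtypes)
```

Column `b` is left untouched, including its NaN, because it fails the check even after filling.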


Answer 3:

Here's a simpler, and probably faster, approach:

(df[col] % 1 == 0).all()

To ignore nulls, fill them with any whole-number sentinel first:

(df[col].fillna(-9999) % 1 == 0).all()
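
The modulo check can be wrapped in a small helper, sketched below. The name `is_whole_mod` and its `ignore_nulls` flag are hypothetical. The sketch uses `dropna()` in place of the `fillna(-9999)` sentinel, which should give the same answer because the sentinel is itself a whole number and cannot change the result.

```python
import numpy as np
import pandas as pd


def is_whole_mod(series: pd.Series, ignore_nulls: bool = False) -> bool:
    """Return True if every value leaves remainder 0 modulo 1 (hypothetical helper)."""
    if ignore_nulls:
        series = series.dropna()
    # NaN % 1 is NaN, and NaN == 0 is False, so NaN fails the check by default.
    return bool((series % 1 == 0).all())


s = pd.Series([-2.0, 0.0, 7.0, np.nan])

print(is_whole_mod(s))
print(is_whole_mod(s, ignore_nulls=True))
print(is_whole_mod(pd.Series([1.0, 2.25])))
```

Because the remainder is computed on the stored values without any conversion, this check involves no rounding or truncation step of its own.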