Need to fill the NA values with the mean of the past three values

I need to fill each NA value with the mean of the three values that precede it.

This is my dataset:

    RECEIPT_MONTH_YEAR  NET_SALES
0   2014-01-01          818817.20
1   2014-02-01          362377.20
2   2014-03-01          374644.60
3   2014-04-01          NA
4   2014-05-01          NA
5   2014-06-01          NA
6   2014-07-01          NA
7   2014-08-01          46382.50
8   2014-09-01          55933.70
9   2014-10-01          292303.40
10  2014-10-01          382928.60

Tags: python-3.x time-series na forecasting fillna

3 answers

Juvenile、少年°

#2 · 2019-08-15 19:12

You could use fillna (assuming that your NA is already np.nan) and rolling mean:

import pandas as pd
import numpy as np

df = pd.DataFrame([818817.2,362377.2,374644.6,np.nan,np.nan,np.nan,np.nan,46382.5,55933.7,292303.4,382928.6], columns=["NET_SALES"])

df["NET_SALES"] = df["NET_SALES"].fillna(df["NET_SALES"].shift(1).rolling(3, min_periods=1).mean())

Out:

NET_SALES
0   818817.2
1   362377.2
2   374644.6
3   518613.0
4   368510.9
5   374644.6
6   NaN
7   46382.5
8   55933.7
9   292303.4
10  382928.6

If you want to include the imputed values I guess you'll need to use a loop.
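
A minimal sketch of such a loop, assuming the NA values are already np.nan. The helper name fill_with_running_mean and its window parameter are made up for illustration. Each gap is filled from the values before it, including values filled earlier in the same pass, so a run of consecutive NaNs no longer ends in a NaN:

```python
import numpy as np
import pandas as pd


def fill_with_running_mean(series: pd.Series, window: int = 3) -> pd.Series:
    """Fill each NaN with the mean of the `window` values before it,
    counting values that were themselves filled earlier in the pass."""
    # Work on a writable float copy so the input series is left untouched.
    values = series.to_numpy(dtype=float, copy=True)
    for i in range(len(values)):
        if np.isnan(values[i]):
            previous = values[max(0, i - window):i]
            # Leave the gap as NaN if there is nothing usable before it.
            if previous.size and not np.isnan(previous).all():
                values[i] = np.nanmean(previous)
    return pd.Series(values, index=series.index, name=series.name)


df = pd.DataFrame(
    [818817.2, 362377.2, 374644.6, np.nan, np.nan, np.nan, np.nan,
     46382.5, 55933.7, 292303.4, 382928.6],
    columns=["NET_SALES"],
)
df["NET_SALES"] = fill_with_running_mean(df["NET_SALES"])
print(df)
```

Because later gaps are computed from earlier imputed values, the filled numbers depend on the order of the rows, so the frame should be sorted by date first.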


(This account has been banned)

#3 · 2019-08-15 19:34

Is this dataset a .csv file or a DataFrame? Is the NA an actual NaN or the string 'NA'?

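import pandas as pd
import numpy as np
# 'your dataset' is a placeholder for the path to the CSV file
df = pd.read_csv('your dataset', sep=' ')
# replace() returns a new object, so assign the result back
df = df.replace('NA', np.nan)
# forward-fill: copy the last valid observation into each following NaN
df = df.ffill()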

You mention the mean of 3 values, but the code above simply forward-fills the last observation before the NaNs begin. This is often a good approach for forecasting (better than taking means in some cases, if persistence is important). To use the mean of the three preceding values instead:

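# Run this in place of the forward fill above; assumes the default 0..n-1 index
ind = df.index[df['NET_SALES'].isna()]
# mean of the three rows just before the first NaN, skipping any NaNs
Meanof3 = df['NET_SALES'].iloc[ind[0]-3:ind[0]].mean()
df['NET_SALES'] = df['NET_SALES'].fillna(Meanof3)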

The answer could be generalised and improved with more information about the dataset, for example whether you always want the mean of the last 3 measurements before any NA. The code above finds the indices that are NaN and then takes the mean of the 3 values before the first of them, ignoring any NaNs.
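
One way the generalisation might look, as a sketch only: every NaN is filled with the mean of the last three observed (non-NaN) values before it, wherever the gap occurs. The function name fill_with_last_observed_mean and the window parameter are illustrative, and the sketch assumes a unique index and that NA is already np.nan:

```python
import numpy as np
import pandas as pd


def fill_with_last_observed_mean(series: pd.Series, window: int = 3) -> pd.Series:
    """Fill each NaN with the mean of the last `window` observed values before it."""
    # Rolling mean over the observed values only, so NaNs never enter a window.
    observed = series.dropna()
    rolling_mean = observed.rolling(window, min_periods=1).mean()
    # Put the means back on the full index and carry them forward into the gaps.
    candidates = rolling_mean.reindex(series.index).ffill()
    # Only the NaN positions take a candidate; observed values stay unchanged.
    return series.fillna(candidates)


sales = pd.Series(
    [818817.2, 362377.2, 374644.6, np.nan, np.nan, np.nan, np.nan,
     46382.5, 55933.7, 292303.4, 382928.6],
    name="NET_SALES",
)
filled = fill_with_last_observed_mean(sales)
print(filled)
```

With this variant, every NaN in one run gets the same value, because imputed values are never fed back into the mean. NaNs at the very start of the series stay NaN, since nothing precedes them.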


Deceive 欺骗

#4 · 2019-08-15 19:37

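This is simple, but it works:

# Note: 0 marks a missing value here, so genuine zero sales would also be overwritten
df_data.fillna(0, inplace=True)
# start at 3 so that three previous rows exist; assumes the default 0..n-1 index
for i in range(3, len(df_data)):
    if df_data.loc[i, 'NET_SALES'] == 0.00:
        condtn = df_data.loc[i-1, 'NET_SALES'] + df_data.loc[i-2, 'NET_SALES'] + df_data.loc[i-3, 'NET_SALES']
        # .loc avoids the chained-assignment problem of df_data['NET_SALES'][i] = ...
        df_data.loc[i, 'NET_SALES'] = condtn / 3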
