Plot pandas dataframe containing NaNs

Posted 2019-01-24 03:25

Question:

I have GPS data of ice speed from three different GPS receivers. The data are in a pandas dataframe with an index of julian day (incremental from the start of 2009).

This is a subset of the data (the main dataset is 3487235 rows...):

                    R2          R7         R8
1235.000000 116.321959  100.805197  96.519977
1235.000116 NaN         100.771133  96.234957
1235.000231 NaN         100.584559  97.249262
1235.000347 118.823610  100.169055  96.777833
1235.000463 NaN         99.753551   96.598350
1235.000579 NaN         99.338048   95.283989
1235.000694 113.995003  98.922544   95.154067

The dataframe has form:

Index: 6071320 entries, 127.67291667 to 1338.51805556
Data columns:
R2    3487235  non-null values
R7    3875864  non-null values
R8    1092430  non-null values
dtypes: float64(3)

R2 is sampled at a different rate from R7 and R8, hence the NaNs, which appear systematically at that spacing.

Trying df.plot() to plot the whole dataframe (or indexed row locations thereof) works fine in terms of plotting R7 and R8, but doesn't plot R2. Similarly, just doing df.R2.plot() also doesn't work. The only way to plot R2 is to do df.R2.dropna().plot(), but this also removes NaNs which signify periods of no data (rather than just a coarser sampling frequency than the other receivers).

Has anyone else come across this? Any ideas on the problem would be gratefully received :)

Answer 1:

The reason you're not seeing anything is that the default plot style is a line only. The line is interrupted at NaNs, so only runs of two or more consecutive valid values get drawn, and that never happens in your R2 data. You need to change the plotting style, and which style depends on what you want to see.
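To make that concrete, here is a minimal sketch using a toy series shaped like the R2 sample from the question (the values are copied from the subset shown there, and the variable names are my own). It counts how many line segments a line-only plot could actually draw:

```python
import numpy as np
import pandas as pd

# Toy series mimicking R2: every valid value is isolated between NaNs.
r2 = pd.Series(
    [116.321959, np.nan, np.nan, 118.823610, np.nan, np.nan, 113.995003],
    index=[1235.000000, 1235.000116, 1235.000231, 1235.000347,
           1235.000463, 1235.000579, 1235.000694],
)

valid = r2.notna().to_numpy()
# A segment can only be drawn between two consecutive non-NaN samples.
segments = int(np.sum(valid[1:] & valid[:-1]))
print(f"{int(valid.sum())} valid points, {segments} drawable line segments")
```

With three valid points and no drawable segments, a plain line has nothing to show, which matches what you see for R2.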

For starters, try adding:

.plot(marker='o')

That should make all data points appear as circles. It gets cluttered easily, so adjusting markersize, markeredgecolor, etc. might be useful. I'm not fully used to how pandas uses matplotlib, so I often switch to matplotlib directly when plots get more complicated, e.g.:

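import matplotlib.pyplot as plt
plt.plot(df.index, df.R2, 'o-')  # index here is float Julian days; with a DatetimeIndex, pass df.index.to_pydatetime() instead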
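If you want a connected line for R2 but still want real no-data periods to show up as breaks, one possible approach (a sketch only, not something built into pandas) is to drop the sampling NaNs and then put a single NaN back into every gap wider than some threshold. The helper name `break_at_gaps`, the `max_gap` parameter, and the toy values below are all hypothetical; the threshold would need tuning to R2's actual sampling interval.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def break_at_gaps(series, max_gap):
    """Drop NaNs, then re-insert one NaN inside each index gap wider than max_gap.

    Hypothetical helper, not part of pandas.
    """
    s = series.dropna()
    idx = s.index.to_numpy(dtype=float)
    wide = np.diff(idx) > max_gap  # True where the next valid sample is too far away
    # Put a NaN at the midpoint of each wide gap so the plotted line breaks there.
    midpoints = (idx[:-1][wide] + idx[1:][wide]) / 2.0
    breaks = pd.Series(np.nan, index=midpoints, dtype=float)
    return pd.concat([s, breaks]).sort_index()


# Toy data: coarse sampling (NaNs between valid values) plus one real outage.
nan = np.nan
r2 = pd.Series(
    [116.3, nan, 118.8, nan, 114.0, 115.2, nan, 117.1],
    index=[1235.0000, 1235.0001, 1235.0003, 1235.0005,
           1235.0007, 1236.0000, 1236.0001, 1236.0003],
)

r2_line = break_at_gaps(r2, max_gap=0.001)  # 0.001 day is an assumed threshold

fig, ax = plt.subplots()
ax.plot(r2_line.index, r2_line.to_numpy(), "o-")  # line breaks only at the outage
ax.set_xlabel("Julian day")
ax.set_ylabel("R2")
```

In an interactive session you would leave out the `matplotlib.use("Agg")` line and call `plt.show()` at the end. With over three million rows it may also be worth plotting a slice or a downsampled copy first.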