tl;dr: how can I skip over periods where there is no data while plotting timeseries?
I'm running a long calculation and I'd like to monitor its progress. Sometimes I interrupt this calculation. The logs are stored in a huge CSV file which looks like this:
2016-01-03T01:36:30.958199,0,0,0,startup
2016-01-03T01:36:32.363749,10000,0,0,regular
...
2016-01-03T11:12:21.082301,51020000,13402105,5749367,regular
2016-01-03T11:12:29.065687,51030000,13404142,5749367,regular
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
2016-01-03T19:03:00.547644,51070000,13423127,5749433,regular
2016-01-03T19:03:05.599515,51080000,13427189,5750183,regular
...
In reality, there are 41 columns. Each of the columns is a certain indicator of progress. The second column is always incremented in steps of 10000. The last column is self-explanatory.
I would like to plot each column on the same graph while skipping over periods between "shutdown" and "startup". Ideally, I would also like to draw a vertical line on each skip.
Here's what I've got so far:
import matplotlib.pyplot as plt
import pandas as pd
# < ... reading my CSV in a Pandas dataframe `df` ... >
fig, ax = plt.subplots()
for col in ['total'] + ['%02d' % i for i in range(40)]:
    ax.plot_date(df.index.values, df[col].values, '-')
fig.autofmt_xdate()
plt.show()
I want to get rid of that long flat period and just draw a vertical line instead.
I know about df.plot(), but in my experience it's broken (among other things, Pandas converts datetime objects into its own format instead of using date2num and num2date).
It looks like a possible solution is to write a custom scaler, but that seems quite complicated.
As far as I understand, writing a custom Locator will only change the positions of ticks (little vertical lines and the associated labels), but not the position of the plot itself. Is that correct?
UPD: an easy solution would be to change the timestamps (say, recalculate them to "time elapsed since start"), but I'd prefer to preserve them.
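For reference, the "time elapsed" idea can be sketched with the standard library alone. This is only an illustration under assumptions: the helper names find_breaks and active_seconds are made up here, and the rows are reduced to (timestamp, state) pairs taken from the log excerpt above. It pairs each "shutdown" with the next "startup" and subtracts the skipped time from the elapsed time.

```python
from datetime import datetime

def find_breaks(rows):
    """Return (shutdown_time, startup_time) pairs from (timestamp, state) rows."""
    breaks = []
    last_shutdown = None
    for stamp, state in rows:
        if state == "shutdown":
            last_shutdown = stamp
        elif state == "startup" and last_shutdown is not None:
            breaks.append((last_shutdown, stamp))
            last_shutdown = None
    return breaks

def active_seconds(stamp, start, breaks):
    """Seconds since `start`, not counting time spent inside breaks."""
    skipped = 0.0
    for down, up in breaks:
        if stamp >= up:
            # The whole break lies before this timestamp.
            skipped += (up - down).total_seconds()
        elif stamp > down:
            # The timestamp falls inside the break.
            skipped += (stamp - down).total_seconds()
    return (stamp - start).total_seconds() - skipped

# A few rows from the log excerpt, reduced to (timestamp, state).
rows = [
    (datetime.fromisoformat("2016-01-03T01:36:30.958199"), "startup"),
    (datetime.fromisoformat("2016-01-03T11:12:54.236950"), "shutdown"),
    (datetime.fromisoformat("2016-01-03T19:02:38.293681"), "startup"),
    (datetime.fromisoformat("2016-01-03T19:02:49.296161"), "regular"),
]
breaks = find_breaks(rows)
start = rows[0][0]
elapsed = [active_seconds(stamp, start, breaks) for stamp, _ in rows]
```

With these values the shutdown row and the following startup row end up at the same x position, which is where a vertical line would go. The drawback is the one mentioned above: the original timestamps no longer appear on the axis.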
UPD: the answer at https://stackoverflow.com/a/5657491/1214547 works for me with some modifications. I will write up my solution soon.
@Pastafarianist provides a good solution. However, I found a bug in InvertedCustomTransform when plotting with more than one break. For example, in the following code the crosshair can't follow the cursor over the second and third breaks.
If you change the transform_non_affine function in the InvertedCustomTransform class as follows, it works well.
The reason may be that the input a to the transformation method is not the whole axis; it is only a numpy.array of length 1.
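To illustrate why the input length matters, here is a minimal sketch of a forward and inverse mapping that depends only on a stored list of breaks, so it gives the same answer for a single value as for a whole array. It is not the code from either answer: the names compress and expand are hypothetical, and breaks is assumed to be a list of non-overlapping (lo, hi) pairs in axis units. Logic like this could sit inside a transform_non_affine method.

```python
import numpy as np

def compress(x, breaks):
    """Map real coordinates to compressed ones by cutting out each (lo, hi) break."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for lo, hi in breaks:
        # Points past the break shift left by its full width;
        # points inside it collapse onto its left edge.
        out -= np.clip(x - lo, 0.0, hi - lo)
    return out

def expand(y, breaks):
    """Inverse of compress() for points that lie outside the breaks."""
    y = np.asarray(y, dtype=float)
    out = y.copy()
    removed = 0.0
    for lo, hi in sorted(breaks):
        # Left edge of this break, expressed in compressed coordinates.
        edge = lo - removed
        out = np.where(y > edge, out + (hi - lo), out)
        removed += hi - lo
    return out

breaks = [(10.0, 20.0), (30.0, 40.0)]
squeezed = compress([5.0, 25.0, 45.0], breaks)  # whole array at once
restored = expand(squeezed, breaks)
single = expand([squeezed[2]], breaks)          # length-1 input, as during cursor tracking
```

Because each break's edge is computed from the accumulated width of the earlier breaks rather than from the position of a value within the input array, a length-1 input past the second or third break is still shifted by the correct total.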
Here is a solution that works for me. It does not handle closely located breaks well (the labels may get too crowded), but in my case it doesn't matter.