Speed up gnuplot column iteration on input piped t

Posted 2019-07-08 04:47

I have a script that converts raw data for interactive use with gnuplot. The script takes arguments that let me apply some filtering to the data. In principle, it could be emulated by this script:

import time, sys
print('Generating data...', file=sys.stderr)
time.sleep(1)
print('x', '100*x', 'x**3', '2**x')
for x in range(*map(int, sys.argv[1:3])):
  print(x, 100*x, x**3, 2**x)

I plot the series of data in columns by piping the shell command to gnuplot and iterating over columns:

gnuplot> plot for [i=2:4] '< python3 pipe_data.py 1 11' u 1:i w l t columnhead(i)
Generating data...
Generating data...
Generating data...
gnuplot>

Currently, when I notice that the script takes too long to run multiple times, I execute it outside of gnuplot and save the output to a file. However, having to do this every time I want to change the script's arguments is cumbersome.

I would like gnuplot to execute '< python3 pipe_data.py' only once, so that only one Generating data... is printed on the screen. Is this possible?

Ideally, gnuplot would cache contents of special filenames starting with a <. This way it would be possible to tweak the appearance of the plot, without regenerating the data, e.g.:

gnuplot> plot for [i=2:4] '< python3 pipe_data.py 1 11' u 1:i w l t columnhead(i)
Generating data...
gnuplot> plot for [i=2:4] '< python3 pipe_data.py 1 11' u 1:i w lp t columnhead(i)
gnuplot> plot for [i=2:4] '< python3 pipe_data.py 5 12' u 1:i w lp t columnhead(i)
Generating data...
gnuplot> plot for [i=2:4] '< python3 pipe_data.py 1 11' u 1:i w p t columnhead(i)
gnuplot>

This could become problematic when the raw data changes, because gnuplot would have no way of knowing about it. But I still hope there is some way to achieve this effect. If not with gnuplot alone, then maybe with some external tools?

For the record, I use gnuplot v4.6.

2 answers
爱情/是我丢掉的垃圾
#2 · 2019-07-08 05:43

I came up with a bash script that solves all the issues for me:

  • when iterating over columns, the script for crunching data is run only once
  • when tweaking plot display, there's no need to wait for the script to run anymore
  • I can pass other arguments to the script and have the results cached
  • I can come back to previous arguments without having to wait
  • it works with any script, regardless of what and how many arguments it may take
  • if any file (script or data) changes, it will be picked up and re-run
  • it is completely transparent, there's no additional set-up nor tuning needed
  • I don't have to worry about cleaning up, because the system will do it for me

I named it $ (for cash/cache), made it executable with chmod u+x, and placed it in a directory on my PATH:

#!/bin/bash

# build the cache key from all arguments
KEY="$*"

# append the last-modified time of every argument that names a file
for arg in "$@"
do
  if [ -f "$arg" ]
  then
    KEY+=" $(date -r "$arg" +%s)"
  fi
done

# use a hash of the key as the name of the temporary file
FILE="/tmp/command_cache.$(echo -n "$KEY" | md5sum | cut -c -10)"

# use the cached file, or execute the command and cache its output
if [ -f "$FILE" ]
then
  cat "$FILE"
else
  "$@" | tee "$FILE"
fi

Now I can take advantage of it using <$ instead of <:

> plot for [i=2:4] '<$ python3 pipe_data.py 1 11' u 1:i w l t columnhead(i)
Generating data...
> plot for [i=2:4] '<$ python3 pipe_data.py 1 11' u 1:i w lp t columnhead(i)
> plot for [i=2:4] '<$ python3 pipe_data.py 5 12' u 1:i w lp t columnhead(i)
Generating data...
> plot for [i=2:4] '<$ python3 pipe_data.py 1 11' u 1:i w p t columnhead(i)
>
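
For systems without bash or md5sum, the same idea can be sketched in Python. This is only a minimal sketch, not the script above: the names cache_key and cached_run, the cache_dir parameter, and the ".tmp" suffix are illustrative assumptions. Unlike the bash version, it stores the output only if the command exits successfully, so a failed run is not served from the cache later.

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def cache_key(args):
    """Hash the arguments plus the mtime of every argument that names a file."""
    parts = list(args)
    for arg in args:
        if os.path.isfile(arg):
            parts.append(str(os.path.getmtime(arg)))
    return hashlib.md5("\0".join(parts).encode()).hexdigest()[:10]


def cached_run(args, cache_dir=None):
    """Return the command's stdout as bytes, running it only on a cache miss."""
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, "command_cache." + cache_key(args))
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return f.read()
    # stderr is not captured, so messages such as "Generating data..." still show
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    # write under a temporary name first so a partial file is never read as a hit
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(result.stdout)
    os.replace(tmp, path)
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.buffer.write(cached_run(sys.argv[1:]))
```

Saved under a name of your choice (cache_run.py is a hypothetical one), it would be called from gnuplot in the same way, e.g. '< python3 cache_run.py python3 pipe_data.py 1 11'.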
姐就是有狂的资本
#3 · 2019-07-08 05:45

The following command should do what you want. I am generating a one-data-point file that takes two parameters, i and j. The file is generated automatically the first time you call plot data(i,j), and it is reused on every subsequent call. Replace my sleep 5; echo %i %i with your command. You'll also need to change the format specifiers if you're not using integers.

data(i,j) = system(sprintf('name="temp_%i_%i"; if [ ! -s $name ]; then sleep 5; echo %i %i > $name; fi; echo $name', i, j, i, j))

Example usage:

# You'll notice a 5-second pause here while the command is running:
plot data(1,1)

# Now it will run at once because the file already exists:
plot data(1,1)
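
The same "one file per argument set" pattern can also live on the Python side. The following is a rough sketch under assumptions, not part of the answer above: data_file and its directory parameter are hypothetical names, and the table it writes is the toy data from the question. Like the [ ! -s $name ] test, it regenerates the file only when it is missing or empty.

```python
import os
import sys


def data_file(start, stop, directory="."):
    """Write the table for range(start, stop) once and return the file name."""
    name = os.path.join(directory, "temp_%d_%d" % (start, stop))
    # like `[ ! -s $name ]`: regenerate only if the file is missing or empty
    if not os.path.isfile(name) or os.path.getsize(name) == 0:
        with open(name, "w") as f:
            f.write("x 100*x x**3 2**x\n")
            for x in range(start, stop):
                f.write("%d %d %d %d\n" % (x, 100 * x, x ** 3, 2 ** x))
    return name


if __name__ == "__main__" and len(sys.argv) == 3:
    # print the file name so that the caller can pick it up from stdout
    print(data_file(int(sys.argv[1]), int(sys.argv[2])))
```

gnuplot could then obtain the file name through system(), the same way data(i,j) does above. Note that this variant does not notice changes to the raw data; delete the temp_* files to force a refresh.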