multi-character separator in `set datafile separat

Posted 2019-07-31 18:42

I have an input file example.data with a triple-pipe as separator, dates in the first column, and also some more or less unpredictable text in the last column:

2019-02-01|||123|||345|||567|||Some unpredictable textual data with pipes|,
2019-02-02|||234|||345|||456|||weird symbols @ and commas, and so on.
2019-02-03|||345|||234|||123|||text text text

When I try to run the following gnuplot5 script

set terminal png size 400,300
set output 'myplot.png'

set datafile separator "|||"
set xdata time
set timefmt "%Y-%m-%d"
set format x "%y-%m-%d"
plot "example.data" using 1:2 with linespoints

I get the following error:

line 8: warning: Skipping data file with no valid points

plot "example.data" using 1:2 with linespoints
                                              ^
"time.gnuplot", line 8: x range is invalid

Even stranger, if I change the last line to

plot "example.data" using 1:4 with linespoints

then it works. It also works for 1:7 and 1:10, but not for other numbers. Why?

2 answers
欢心
Reply #2 · 2019-07-31 19:17

When using the

set datafile separator "chars"

syntax, the string is not treated as one long separator. Instead, every character listed between the quotes becomes a separator on its own. From [Janert, 2016]:

If you provide an explicit string, then each character in the string will be treated as a separator character.

Therefore,

set datafile separator "|||"

is actually equivalent to

set datafile separator "|"

and a line

2019-02-05|||123|||456|||789

is treated as if it had ten columns, of which only the columns 1,4,7,10 are non-empty.
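The column count can be checked outside gnuplot. The following is only a sketch that uses awk as a stand-in, on the assumption that splitting on a single `|` mirrors what gnuplot does with this separator; the variable names `line` and `summary` are made up for the illustration.

```shell
# Split the sample line on every single "|" (not on "|||") and report
# how many fields result and which of them are non-empty.
line='2019-02-05|||123|||456|||789'

summary=$(printf '%s\n' "$line" | awk -F'|' '{
    out = NF " fields, non-empty:"
    for (i = 1; i <= NF; i++)
        if ($i != "") out = out " " i
    print out
}')

echo "$summary"   # prints: 10 fields, non-empty: 1 4 7 10
```

This matches the observation in the question: only `using 1:4`, `1:7` and `1:10` pick up columns that actually contain numbers.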


Workaround

Find some other character that is unlikely to appear in the dataset (in the following, I'll assume \t as an example). If you can't dump the dataset with a different separator, use sed to replace ||| by \t:

sed 's/|||/\t/g' example.data > modified.data # in the command line

then proceed with

set datafile separator "\t"

and modified.data as input.
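A slightly more defensive variant of the same workaround, as a sketch: it first checks that the replacement character does not already occur in the data, and it passes a literal tab to sed instead of relying on `\t` in the replacement, which not every sed implementation interprets as a tab. The sample file is recreated here only to make the snippet self-contained.

```shell
# Sample input, copied from the question
cat > example.data <<'EOF'
2019-02-01|||123|||345|||567|||Some unpredictable textual data with pipes|,
2019-02-02|||234|||345|||456|||weird symbols @ and commas, and so on.
2019-02-03|||345|||234|||123|||text text text
EOF

tab=$(printf '\t')   # a literal tab character

# Refuse to convert if the chosen replacement already occurs in the data
if grep -q "$tab" example.data; then
    echo "example.data already contains tabs; pick another separator" >&2
else
    # In a basic regular expression "|" is an ordinary character
    sed "s/|||/${tab}/g" example.data > modified.data
fi
```

After this, every line of modified.data has five tab-separated columns, and the single pipe in the free-text column is left untouched.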

戒情不戒烟
Reply #3 · 2019-07-31 19:25

You basically gave the answer yourself.

  1. If you can influence the separator in your data, use a separator which typically does not occur in your data or text. I always thought \t was made for that.

  2. If you cannot influence the separator in your data, use an external tool (awk, Python, Perl, ...) to modify your data. In these languages it is probably a "one-liner". gnuplot has no direct replace function.

  3. If you don't want to install external tools and want to ensure platform independence, there is still a way to do it with gnuplot. Not just a "one-liner", but there is almost nothing you can't also do with gnuplot ;-).

Edit: simplified version with the input from @Ethan (https://stackoverflow.com/a/54541790/7295599).

Assume your data is in a datablock named $Data. The following code replaces every ||| with \t and puts the result into $DataOutput.

### Replace string in dataset
reset session

$Data <<EOD
# data with special string separators
2019-02-01|||123|||345|||567|||Some unpredictable textual data with pipes|,
2019-02-02|||234|||345|||456|||weird symbols @ and commas, and so on.
2019-02-03|||345|||234|||123|||text text text
EOD

# replace string function
# prefix RS_ to avoid variable name conflicts
replaceStr(s,s1,s2) = (RS_s='', RS_n=1, (sum[RS_i=1:strlen(s)] \
    ((s[RS_n:RS_n+strlen(s1)-1] eq s1 ? (RS_s=RS_s.s2, RS_n=RS_n+strlen(s1)) : \
    (RS_s=RS_s.s[RS_n:RS_n], RS_n=RS_n+1)), 0)), RS_s)

set print $DataOutput
do for [RS_j=1:|$Data|] {
    print replaceStr($Data[RS_j],"|||","\t")
}
set print

print $DataOutput
### end of code

Output:

# data with special string separators
2019-02-01  123 345 567 Some unpredictable textual data with pipes|,
2019-02-02  234 345 456 weird symbols @ and commas, and so on.
2019-02-03  345 234 123 text text text
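For comparison, the external-tool route from option 2 might look like the following in awk. This is a sketch rather than a tested recipe for every awk implementation, and the function name `convert` is invented for the example.

```shell
# Replace every literal "|||" with a tab; single pipes inside the text
# stay as they are. [|] matches one literal pipe without depending on
# how a particular awk handles backslash escapes in a regex.
convert() {
    awk '{ gsub(/[|][|][|]/, "\t"); print }' "$@"
}

# Usage: read from stdin, or pass file names as arguments
printf '%s\n' '2019-02-01|||123|||345|||567|||text with pipes|,' | convert
```

Called as `convert example.data > modified.data`, it produces a file that can be read with `set datafile separator "\t"`.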