I have an input file example.data
with a triple-pipe as separator, dates in the first column, and also some more or less unpredictable text in the last column:
2019-02-01|||123|||345|||567|||Some unpredictable textual data with pipes|,
2019-02-02|||234|||345|||456|||weird symbols @ and commas, and so on.
2019-02-03|||345|||234|||123|||text text text
When I try to run the following gnuplot5 script
set terminal png size 400,300
set output 'myplot.png'
set datafile separator "|||"
set xdata time
set timefmt "%Y-%m-%d"
set format x "%y-%m-%d"
plot "example.data" using 1:2 with linespoints
I get the following error:
line 8: warning: Skipping data file with no valid points
plot "example.data" using 1:2 with linespoints
^
"time.gnuplot", line 8: x range is invalid
Even stranger, if I change the last line to
plot "example.data" using 1:4 with linespoints
then it works. It also works for 1:7
and 1:10
, but not for other numbers. Why?
When using the
set datafile separator "chars"
syntax, the string is not treated as one long separator. Instead, every character listed between the quotes becomes a separator on its own. From [Janert, 2016]:
If you provide an explicit string, then each character in the string will be
treated as a separator character.
Therefore,
set datafile separator "|||"
is actually equivalent to
set datafile separator "|"
and a line
2019-02-05|||123|||456|||789
is treated as if it had ten columns, of which only the columns 1,4,7,10 are non-empty.
Workaround
Find some other character that is unlikely to appear in the dataset (in the following, I'll assume \t
as an example). If you can't dump the dataset with a different separator, use sed
to replace |||
by \t
:
sed 's/|||/\t/g' example.data > modified.data # in the command line
then proceed with
set datafile separator "\t"
and modified.data
as input.
You basically gave the answer yourself.
If you can influence the separator in your data, use a separator which typically does not occur in your data or text. I always thought \t
was made for that.
If you cannot influence the separator in your data, use an external tool (awk, Python, Perl, ...) to modify your data. In these languages it is probably a "one-liner". gnuplot has no direct replace function.
If you don't want to install external tools and want ensure platform independence, there is still a way to do it with gnuplot. Not just a "one-liner", but there is almost nothing you can't also do with gnuplot ;-).
Edit: simplified version with the input from @Ethan (https://stackoverflow.com/a/54541790/7295599).
Assuming you have your data in a dataset named $Data
. The following code will replace |||
with \t
and puts the result into $DataOutput
.
### Replace string in dataset
reset session
$Data <<EOD
# data with special string separators
2019-02-01|||123|||345|||567|||Some unpredictable textual data with pipes|,
2019-02-02|||234|||345|||456|||weird symbols @ and commas, and so on.
2019-02-03|||345|||234|||123|||text text text
EOD
# replace string function
# prefix RS_ to avoid variable name conflicts
replaceStr(s,s1,s2) = (RS_s='', RS_n=1, (sum[RS_i=1:strlen(s)] \
((s[RS_n:RS_n+strlen(s1)-1] eq s1 ? (RS_s=RS_s.s2, RS_n=RS_n+strlen(s1)) : \
(RS_s=RS_s.s[RS_n:RS_n], RS_n=RS_n+1)), 0)), RS_s)
set print $DataOutput
do for [RS_j=1:|$Data|] {
print replaceStr($Data[RS_j],"|||","\t")
}
set print
print $DataOutput
### end of code
Output:
# data with special string separators
2019-02-01 123 345 567 Some unpredictable textual data with pipes|,
2019-02-02 234 345 456 weird symbols @ and commas, and so on.
2019-02-03 345 234 123 text text text