I am trying to make some measurements, and I would like to know the fastest way to print something from nawk. At the moment I use `printf ARR[2] " ";`, but it seems to take longer than it should.
Info: I am printing around 500 numbers, adding the space in the printf so that the output does not run together. I am running the script in ksh on Oracle Solaris.
As written, it needs around 14 seconds to print everything. Is there a faster way I could do this?
Thanks in advance!
UPDATE
The function I care about is `awkfun`, which I call with `time` to take my measurements.
Think of `NUMBERS` as a variable that holds 1000 random numbers, and `XNUMBERS` as a variable that holds 1000 random numbers in the format `123|321`: each random number, a `|` in the middle, and then the same number reversed.
For each number in `NUMBERS` I check whether it exists in `XNUMBERS`, and if it does I print only the reversed number.
numfun() {
    NUMBERS=`nawk ' BEGIN{
        srand();
        for (i=0; i<=999; i++) {
            printf("%s\n", 100 + int(rand() * (899)));
        }
    }'`
}
numfun
sleep 1

xnumfun() {
    XNUMBERS=`nawk ' BEGIN{
        srand();
        for (i=0; i<=999; i++) {
            XNUMBERS[i]= 100 + int(rand() * (899));
        }
        for (i=0; i<=999; i++) {
            ver=XNUMBERS[i] "";
            rev = "";
            for (q=length(ver); q!=0; q--) {
                rev = rev substr(ver, q, 1);
            }
            printf("%s\n", XNUMBERS[i] "|" rev );
        }
    }'`
}
xnumfun

awkfun() {
    for n in $NUMBERS
    do
        echo "${XNUMBERS}" | nawk -v VAR=$n '
        {
            split($1,ARR,"|")
            if (VAR == ARR[1]){
                printf ARR[2] " ";
                exit;
            }
        }'
    done
}

shellfun() {
    for n in $NUMBERS
    do
        for x in $XNUMBERS
        do
            if test "$n" -eq "${x%%\|*}"
            then
                echo "${x##*\|}";
                break;
            fi
            continue;
        done
    done
}

sleep 1
time awkfun;
echo "\nAWK TIME\n\n-----------------------------";
time shellfun;
echo "\nSHELL TIME\n\n-----------------------------";
time numfun;
echo "\nNUMBERS TIME\n\n-----------------------------";
time xnumfun;
echo "\nXNUMBERS TIME\n\n-----------------------------\n\nTOTAL TIME\n";
Results
For reference, the results after refining the script: AWK average real time = 0,84; SHELL average real time = 0,48.
The reason your program is slow is not the printing. It is slow because you invoke a new copy of `nawk` for every element of `$NUMBERS`. This is very wasteful, and you should rethink your program design from the beginning. It appears you are mostly trying to see which numbers from one list exist in a second list. If you want to do this in nawk, you should read the entire first list and store its elements in an associative array before reading each number from the second list.

You could probably solve this problem more cleanly using `join` or `grep`.

Edit: Here's a working solution using `grep`. It's at least 20x faster than your original `shellfun()`.

The way it works is to take all the numbers from `$XNUMBERS` before the pipes (so `12|21 34|43` becomes `12\n34`), then pipe those to `grep` with the `-f` argument being all of `$NUMBERS`. This means we search for all the left-hand sides of `$XNUMBERS` within `$NUMBERS`, and after printing the matches we simply use `rev` to reverse them. We don't need the right-hand sides of `$XNUMBERS` at all (so maybe you can even stop generating them in the first place, saving more time).

Edit: Since you've now told us you are running on Solaris instead of Linux, you don't have `rev`, so you can replace `rev` in the above with this:

And you can replace `grep` with `/usr/xpg4/bin/grep` to get an enhanced version that supports `-f`.

You are launching nawk for every number in `$NUMBERS`, which is very expensive in terms of time. You could filter `$NUMBERS` with grep to only work on the numbers you are interested in, i.e.

will give you a list of XNUMBERS (in matched_numbers) that are also in NUMBERS.
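
The single-pass idea suggested above (load one list into an associative array, then look up the other) could be sketched roughly as follows. This is only an illustration, not code from either answer: the function name `lookup_awk` is made up, the small `NUMBERS`/`XNUMBERS` values stand in for the 1000 random numbers, and it calls `awk`, so on Solaris you would presumably substitute `nawk` or `/usr/xpg4/bin/awk`. It relies on the `XNUMBERS` lines being fed in before the `NUMBERS` lines.

```shell
# lookup_awk: hypothetical name, not part of the original script.
# Starts awk once instead of once per number.
lookup_awk() {
    {
        echo "$XNUMBERS"    # lines of the form 123|321, read first
        echo "$NUMBERS"     # one plain number per line, read second
    } | awk -F'|' '
        # Two fields: an XNUMBERS entry. Remember the first reversed value
        # seen for each number, like the "exit" in awkfun.
        NF == 2 { if (!($1 in rev)) rev[$1] = $2; next }
        # One field: a NUMBERS entry. Print its reversed value if known.
        ($1 in rev) { printf "%s ", rev[$1] }
        END { print "" }
    '
}

# Small demo data (assumption: same line-per-entry layout as in the question).
NUMBERS="123
456
789"
XNUMBERS="456|654
123|321
999|999"

lookup_awk    # prints "321 654 " followed by a newline
```

It can be timed the same way as `awkfun`, with `time lookup_awk`.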
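
The `grep -f` pipeline described above might look something like this sketch. The original command is not shown, so everything here is an assumption built from the description: the function name `lookup_grep`, the temporary file used to hand `$NUMBERS` to `-f`, and the `-x` and `-F` options (added so that only whole-line, literal matches count). Instead of `rev`, it reuses the character-by-character reversal loop from `xnumfun` in the question, which should also work where `rev` is missing.

```shell
# lookup_grep: hypothetical name. GREP defaults to plain grep; on Solaris,
# GREP=/usr/xpg4/bin/grep is the variant suggested above.
GREP=${GREP:-grep}

lookup_grep() {
    patterns=${TMPDIR:-/tmp}/numbers.$$     # temp file: $NUMBERS, one per line
    echo "$NUMBERS" > "$patterns"
    echo "$XNUMBERS" |
        cut -d'|' -f1 |                     # keep only the left-hand sides
        "$GREP" -x -F -f "$patterns" |      # -x and -F are additions in this sketch
        awk '{
            # Reverse the line, same loop as in xnumfun; stands in for rev.
            out = ""
            for (q = length($0); q > 0; q--) out = out substr($0, q, 1)
            print out
        }'
    rm -f "$patterns"
}

# Small demo data (assumption: same layout as in the question).
NUMBERS="123
456
789"
XNUMBERS="456|654
123|321
999|999"

lookup_grep    # prints 654 and 321, one per line
```

Note that the output follows the order of `XNUMBERS` rather than `NUMBERS`, and a number repeated in `NUMBERS` is printed only once per matching `XNUMBERS` entry, so it is not a drop-in replacement if order or repeats matter.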
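
For the `matched_numbers` filter mentioned just above, one possible shape is sketched below. The command from the answer is not shown, so this is a guess at the idea rather than a reconstruction: the function name `filter_grep`, the temporary pattern file, and the `sed` step that anchors each number to the left of the `|` are all assumptions. The reversed half is then extracted with the same `${x##*\|}` expansion that `shellfun` uses.

```shell
# filter_grep: hypothetical name, not from the original answer.
filter_grep() {
    patterns=${TMPDIR:-/tmp}/patterns.$$
    # One anchored pattern per number, e.g. "^123|". In a basic regular
    # expression the "|" is an ordinary character, so this only matches
    # entries whose left-hand side is exactly that number.
    echo "$NUMBERS" | sed 's/.*/^&|/' > "$patterns"

    # matched_numbers: XNUMBERS entries whose left-hand side is in NUMBERS.
    matched_numbers=$(echo "$XNUMBERS" | ${GREP:-grep} -f "$patterns")
    rm -f "$patterns"

    # Print only the reversed halves, space-separated as in awkfun.
    for x in $matched_numbers
    do
        printf '%s ' "${x##*\|}"
    done
    echo
}

# Small demo data (assumption: same layout as in the question).
NUMBERS="123
456
789"
XNUMBERS="456|654
123|321
999|999"

filter_grep    # prints "654 321 " followed by a newline
```

Only one `grep` process is started for the whole list, and the remaining loop runs over the matches alone rather than over every `NUMBERS` and `XNUMBERS` combination.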