Which is the fastest way to print in awk?

Posted 2019-06-12 04:12

I am trying to make some measurements, and I would like to know the fastest way to print something with nawk. At the moment I use printf ARR[2] " ";, but it seems to take longer than it should.

Info: I am printing around 500 numbers, and I add the space in the printf so that the output does not all run together. I am running the script under ksh on Oracle Solaris.

As it stands, it needs around 14 seconds to print everything. Is there a faster way to do this?

Thanks in Advance!

UPDATE

The function I care about is awkfun; I call it under time to take my measurements. Think of NUMBERS as a variable holding 1000 random numbers, and XNUMBERS as a variable holding 1000 random numbers in the format 123|321: each random number, followed by a | and the same number reversed. For each entry of NUMBERS I check whether it exists in XNUMBERS, and if it does I print only the reversed number.

numfun() {
    NUMBERS=`nawk ' BEGIN{ 
        srand();
        for (i=0; i<=999; i++) {
            printf("%s\n", 100 + int(rand() * (899)));
        }   
    }'`
}
numfun
sleep 1
xnumfun() {
    XNUMBERS=`nawk ' BEGIN{ 
        srand();
        for (i=0; i<=999; i++) {
            XNUMBERS[i]= 100 + int(rand() * (899));
        }
        for (i=0; i<=999; i++) {
            ver=XNUMBERS[i] "";
            rev = "";
            for (q=length(ver); q!=0; q--) {
                rev = rev substr(ver, q, 1);
            }
            printf("%s\n", XNUMBERS[i] "|" rev );
        }
    }'`
}
xnumfun
awkfun() {
    for n in $NUMBERS
    do
        echo "${XNUMBERS}" | nawk -v VAR=$n '
        {
            split($1,ARR,"|")
            if (VAR == ARR[1]){
                printf ARR[2] " ";
                exit;
            }
        }' 
    done

}
shellfun() {
    for n in $NUMBERS
    do
        for x in $XNUMBERS
        do
            if test "$n" -eq "${x%%\|*}"
                then
                echo "${x##*\|}";
                break;
            fi
            continue;
        done
    done
}
sleep 1
time awkfun;
echo "\nAWK TIME\n\n-----------------------------";
time shellfun;
echo "\nSHELL TIME\n\n-----------------------------";
time numfun;
echo "\nNUMBERS TIME\n\n-----------------------------";
time xnumfun;
echo "\nXNUMBERS TIME\n\n-----------------------------\n\nTOTAL TIME\n";

Results

For reference, the results after refining the script: AWK average real time = 0.84, SHELL average real time = 0.48.

2 Answers
smile是对你的礼貌
#2 · 2019-06-12 04:35

The reason your program is slow is not because of printing. Your program is slow because you invoke a new copy of nawk for every element of $NUMBERS. This is very wasteful and you should rethink your program design from the beginning. It appears you are mostly trying to see which numbers from one list exist in a second list. If you want to do this in nawk, you should read the entire first list first, and store the elements in an associative array before reading each number from the second file.
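
As a rough illustration of the single-process approach described above, here is a minimal sketch. The function name awkfun2 and the "#SPLIT" marker line are assumptions made for this example, not part of the original script. It loads XNUMBERS into an associative array and then streams NUMBERS through it, so the output keeps the order of NUMBERS as the original awkfun did. On Solaris you would probably call nawk (or /usr/xpg4/bin/awk) instead of awk.

```shell
# Sketch only: awkfun2 is a hypothetical name, not part of the original script.
# Assumes NUMBERS and XNUMBERS are set as in the question (one entry per line).
awkfun2() {
    # Feed both lists to a single awk process, separated by a marker line.
    printf '%s\n' "$XNUMBERS" "#SPLIT" "$NUMBERS" | awk '
        $0 == ""       { next }                    # ignore empty lines
        $0 == "#SPLIT" { in_numbers = 1; next }    # marker: NUMBERS follow
        !in_numbers {                              # still reading XNUMBERS
            split($0, ARR, "|")
            # keep only the first entry per key, as the original exit did
            if (!(ARR[1] in REV)) REV[ARR[1]] = ARR[2]
            next
        }
        ($0 in REV) { printf "%s ", REV[$0] }      # one array lookup per number
        END { print "" }                           # terminate the output line
    '
}
```

Running time awkfun2 in place of time awkfun should give a comparable measurement, since the lookup work is the same but only one awk process is started.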

You could probably solve this problem more cleanly using join or grep.


Edit: Here's a working solution using grep. It's at least 20x faster than your original shellfun().

shellfun2() {
    echo $XNUMBERS | tr ' ' '\n' | cut -d '|' -f1 \
        | grep -f <(echo $NUMBERS | tr ' ' '\n') | rev
}

The way it works is to take all the numbers from $XNUMBERS before the pipes (so 12|21 34|43 becomes 12\n34), then pipe those to grep with the -f argument being all of $NUMBERS. This means we search for all the left-hand sides of $XNUMBERS within $NUMBERS, and after printing the matches we simply use rev to reverse them. We don't need the right-hand sides of $XNUMBERS at all (so maybe you can even stop generating them in the first place, saving more time).


Edit: Since you've now told us you are running on Solaris instead of Linux, you don't have rev, so you can replace rev in the above with this:

sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

And you can replace grep with /usr/xpg4/bin/grep to get an enhanced version that supports -f.
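
If your shell does not accept the <(...) syntax either, the same pipeline can be sketched with a temporary file. Everything here is an assumption for illustration: the name shellfun3, the temporary file path, and the choice of an awk loop in place of rev. The -x and -F options make grep compare whole lines as fixed strings; on Solaris you would likely substitute /usr/xpg4/bin/grep and nawk.

```shell
# Sketch only: shellfun3 is a hypothetical name. Assumes NUMBERS and XNUMBERS
# are set as in the question and that entries contain no whitespace.
shellfun3() {
    tmp="${TMPDIR:-/tmp}/numbers.$$"
    # One number per line; the unquoted expansion splits on whitespace.
    printf '%s\n' $NUMBERS > "$tmp"
    printf '%s\n' $XNUMBERS |
        cut -d '|' -f1 |
        grep -x -F -f "$tmp" |
        awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
    rm -f "$tmp"
}
```

Note that, like shellfun2, this prints matches in the order they appear in XNUMBERS, one per line, not in the order of NUMBERS.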

唯我独甜
#3 · 2019-06-12 04:36

You are launching nawk for every number in $NUMBERS, which is very expensive in terms of time.

You could filter $NUMBERS with grep so that you only work on the numbers you are interested in, e.g.:

grep -f FileWithListOfNumbers FileWithListOfXnumbers >matched_numbers

This gives you a list of the XNUMBERS entries (in matched_numbers) that are also in NUMBERS.
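
One caveat: grep -f treats each line of the pattern file as an unanchored pattern, so 123 would also match an entry such as 456|123. A possible way around this, sketched below, is to turn each number into a pattern anchored to the left-hand side. The function name matched_reversed and the temporary file path are assumptions for this example; the two file arguments stand for the FileWithListOfNumbers and FileWithListOfXnumbers above.

```shell
# Sketch only: matched_reversed is a hypothetical helper.
# $1 = file with one number per line, $2 = file with one number|reversed per line.
matched_reversed() {
    patterns="${TMPDIR:-/tmp}/patterns.$$"
    # Turn 123 into ^123| so that only the left-hand side can match
    # (| is a literal character in basic regular expressions).
    sed 's/.*/^&|/' "$1" > "$patterns"
    # Keep matching entries and print only the reversed half.
    grep -f "$patterns" "$2" | cut -d '|' -f2
    rm -f "$patterns"
}
```

For example, matched_reversed FileWithListOfNumbers FileWithListOfXnumbers >matched_numbers would store only the reversed numbers.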
