Sort entries of lines using shell

Posted 2019-02-27 15:38

Considering the following input and output:

  infile   |   outfile
1 3 5 2 4  |  1 2 3 4 5
2 4 5      |  2 4 5
4 6 2 1    |  1 2 4 6

Is there any combination of UNIX programs, not involving any programming language other than shell scripting itself, that sorts the entries in each line of a file faster than the following approach?

while read line; do
    tr ' ' '\n' <<< ${line} | sort | tr '\n' ' '
    echo ""
done < infile > outfile

I mean, I'm able to create a small cpp/python/awk/... program to do so, but it is just not the same as using the usual one-liners to magically solve problems.

Edit:

I must have added too much text, instead of simply asking what I wanted; straightforwardly, I wanted to confirm whether there was any UNIX program/combination of programs (using pipes, fors, whiles, ...) capable of sorting entries in a line, but without as much overhead as the one solution above.

I know I may do the nasty job in a programming language, like perl, awk, python, but I was actually looking for a composition of UNIX programs that wouldn't involve these language interpreters. From the answers, I must conclude there is no such inline sort tool(s), and I'm very thankful for the solutions I've got -- mainly the very neat Perl one-liner.

Yet, I do not really understand the reason for so much overhead in the Bash approach I posted. Is it really due to a multitude of context switches, or is it simply the overhead of translating the input back and forth, and sorting it?

I can't seem to understand which of these steps is slowing down the execution so much. It takes several minutes to sort the entries in a file with ~500k lines, with ~30 values in each line.

4 answers
该账号已被封号
Reply #2 · 2019-02-27 15:59

It's not pretty (definitely not a one-liner), but you can sort a line using only shell built-in commands; for short lines this may be faster than repeatedly calling external programs.

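#!/bin/bash
# Sorts the integers on each line using only shell built-ins.
# The ${@:offset:length} slices are a bash extension, hence the bash shebang.
sortline(){
local x i items="$*"
set --                      # start every line with an empty sorted list
for x in $items;do
    i=0
    # advance i past every already-sorted element that is not greater than x
    while [ $i -lt $# ];do
        [ "$x" -lt "${@:$((i+1)):1}" ] && break || i=$((i+1))
    done
    set -- "${@:1:$i}" "$x" "${@:$((i+1))}"   # insert x at position i
done
echo "$@"
}
while read -r LINE || [ "$LINE" ];do
    sortline $LINE          # unquoted on purpose: split the line into words
done <"$1" >"$2"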

Edit: by the way, this is an insertion sort, in case anyone wondered.

Edit2: this is for numerical values only. For strings you would need a string comparison instead of -lt, something like [[ "$x" < "${@:$((i+1)):1}" ]] (unchecked). However, I use this C program for strings (I just call it qsort); it could be modified for numbers by using atoi on argv:

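#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* write */

/* Compare two argv entries as strings. */
static inline int cmp(const void *a, const void *b){
   return strcmp(*(const char **)a, *(const char **)b);
}

int main(int argc, char *argv[]){
    /* Skip the program name, then sort the remaining arguments in place. */
    qsort(++argv, --argc, sizeof(char *), cmp);
    while (argc){
      write(1,argv[0],strlen(argv[0]));
      /* Tab between entries, newline after the last one. */
      write(1,(--argc && argv++)?"\t":"\n",1);
   }
   return 0;
}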
爷的心禁止访问
Reply #3 · 2019-02-27 16:09

Perl can do this nicely as a one-line Unix/Linux command:

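perl -l -n -e 'print join " ", sort { $a <=> $b } split " "' < input.txt > output.txt

The -l switch strips the newline from each input line and adds one back on print. The sort block needs the dollars on $a and $b; without them Perl compares the bare strings "a" and "b" and nothing gets sorted. In bash, keep the single quotes on the outside (or escape the dollars with backslashes inside double quotes) so the shell does not expand $a and $b itself. In a Windows command prompt, invert the single and double quotes.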

Note that the distinctions you are trying to draw between commands, programming languages, and programs are pretty thin. Bash is a programming language. Perl can certainly be used as a shell. Both are commands.

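The reason your script runs slowly is that it spawns at least 3 processes per loop iteration (tr, sort, and a second tr). Process creation is pretty expensive, and for the ~500k-line file in the question that adds up to roughly 1.5 million process creations.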
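
To illustrate the process-creation point with shell tools only, here is a minimal sketch that starts sort once for the whole file instead of once per line. It tags every value with its line number, sorts the tagged stream by line number and then by value, and regroups the result. The function name sort_lines_once is made up for this example, and the sketch assumes whitespace-separated integers; lines with no entries are dropped from the output.

```sh
#!/bin/sh
# Hypothetical helper: sort the integers within each line of stdin,
# running the external sort program a single time for the entire input.
sort_lines_once() (
    set -f                              # no filename globbing while splitting
    n=0
    while read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        for v in $line; do              # unquoted: split the line on whitespace
            printf '%s %s\n' "$n" "$v"  # emit "line-number value"
        done
    done |
    sort -k1,1n -k2,2n |                # by line number, then by value
    {
        prev="" out=""
        while read -r n v; do
            if [ "$n" = "$prev" ]; then
                out="$out $v"           # same line: append the value
            else
                if [ -n "$prev" ]; then echo "$out"; fi
                prev=$n out=$v          # new line: start a fresh group
            fi
        done
        if [ -n "$prev" ]; then echo "$out"; fi   # flush the last group
    }
)

# Demo with the input from the question.
sort_lines_once <<'EOF'
1 3 5 2 4
2 4 5
4 6 2 1
EOF
```

The loops still run in the shell, so this is not guaranteed to beat the Perl one-liner, but the number of processes no longer grows with the number of lines.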

Luminary・发光体
Reply #4 · 2019-02-27 16:24
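#!/usr/bin/gawk -f
# Needs GNU awk 4.0 or later: PROCINFO["sorted_in"] is a gawk extension.
# Adjust the interpreter path if gawk lives elsewhere on your system.
{
  baz = 0
  PROCINFO["sorted_in"] = "@val_num_asc"  # traverse arrays by value, numerically ascending
  split($0, foo)
  for (bar in foo)
    $++baz = foo[bar]                     # write the fields back in sorted order
}
1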

Result

1 2 3 4 5
2 4 5
1 2 4 6
时光不老,我们不散
Reply #5 · 2019-02-27 16:25

The question is more subtle than it seems. You appear to be asking whether there is a quicker way to perform the sort, and you are getting a lot of (elegant!) answers with Perl and awk and so on. But your question seems to be whether you can do a quicker sort with shell built-ins, and for that, the answer is no.

Obviously, sort is not a shell built-in, and neither is tr. There isn't a built-in that does what sort does, and the built-ins that might substitute for "tr" are not likely to help you here (it would take as much work to manipulate, say, bash's IFS variable to remove the call to tr as to just live with the tr).
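
For what it's worth, here is a rough sketch of what removing the tr calls could look like, assuming whitespace-separated integers. The function name sort_fields is invented for this example. The shell's own IFS word splitting replaces both tr calls, but sort is still started once per line, and the command substitution and the pipe still fork, so the saving is limited.

```sh
#!/bin/sh
# Hypothetical helper: sort the integers within each line of stdin,
# using shell word splitting instead of tr. sort still runs once per line.
sort_fields() (
    set -f                      # no filename globbing while splitting
    while read -r line || [ -n "$line" ]; do
        set -- $line            # unquoted: split the line on IFS whitespace
        # printf (a builtin in bash and dash) writes one field per line;
        # the unquoted $(...) is split again and echo joins it with spaces.
        echo $(printf '%s\n' "$@" | sort -n)
    done
)

# Demo with the input from the question.
sort_fields <<'EOF'
1 3 5 2 4
2 4 5
4 6 2 1
EOF
```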

Personally, I would go with Perl. Note that if your data set is large or funky, you have the option of changing Perl's default sorting algorithm using the sort pragma. I don't think you will need it for sorting a file of integers, but maybe that was just an illustration on your part.
