Fastest sort of fixed length 6 int array

2019-01-01 04:22发布

Answering to another Stack Overflow question (this one) I stumbled upon an interesting sub-problem. What is the fastest way to sort an array of 6 ints?

As the question is very low level:

  • we can't assume libraries are available (and the call itself has its cost), only plain C
  • to avoid emptying instruction pipeline (that has a very high cost) we should probably minimize branches, jumps, and every other kind of control flow breaking (like those hidden behind sequence points in && or ||).
  • room is constrained and minimizing registers and memory use is an issue, ideally in place sort is probably best.

Really this question is a kind of Golf where the goal is not to minimize source length but execution time. I call it 'Zening' code as used in the title of the book Zen of Code optimization by Michael Abrash and its sequels.

As for why it is interesting, there is several layers:

  • the example is simple and easy to understand and measure, not much C skill involved
  • it shows effects of choice of a good algorithm for the problem, but also effects of the compiler and underlying hardware.

Here is my reference (naive, not optimized) implementation and my test set.

#include <stdio.h>

static __inline__ int sort6(int * d){

    char j, i, imin;
    int tmp;
    for (j = 0 ; j < 5 ; j++){
        imin = j;
        for (i = j + 1; i < 6 ; i++){
            if (d[i] < d[imin]){
                imin = i;
            }
        }
        tmp = d[j];
        d[j] = d[imin];
        d[imin] = tmp;
    }
}

static __inline__ unsigned long long rdtsc(void)
{
  unsigned long long int x;
     __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
     return x;
}

int main(int argc, char ** argv){
    int i;
    int d[6][5] = {
        {1, 2, 3, 4, 5, 6},
        {6, 5, 4, 3, 2, 1},
        {100, 2, 300, 4, 500, 6},
        {100, 2, 3, 4, 500, 6},
        {1, 200, 3, 4, 5, 600},
        {1, 1, 2, 1, 2, 1}
    };

    unsigned long long cycles = rdtsc();
    for (i = 0; i < 6 ; i++){
        sort6(d[i]);
        /*
         * printf("d%d : %d %d %d %d %d %d\n", i,
         *  d[i][0], d[i][6], d[i][7],
         *  d[i][8], d[i][9], d[i][10]);
        */
    }
    cycles = rdtsc() - cycles;
    printf("Time is %d\n", (unsigned)cycles);
}

Raw results

As number of variants is becoming large, I gathered them all in a test suite that can be found here. The actual tests used are a bit less naive than those showed above, thanks to Kevin Stock. You can compile and execute it in your own environment. I'm quite interested by behavior on different target architecture/compilers. (OK guys, put it in answers, I will +1 every contributor of a new resultset).

I gave the answer to Daniel Stutzbach (for golfing) one year ago as he was at the source of the fastest solution at that time (sorting networks).

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O2

  • Direct call to qsort library function : 689.38
  • Naive implementation (insertion sort) : 285.70
  • Insertion Sort (Daniel Stutzbach) : 142.12
  • Insertion Sort Unrolled : 125.47
  • Rank Order : 102.26
  • Rank Order with registers : 58.03
  • Sorting Networks (Daniel Stutzbach) : 111.68
  • Sorting Networks (Paul R) : 66.36
  • Sorting Networks 12 with Fast Swap : 58.86
  • Sorting Networks 12 reordered Swap : 53.74
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.54
  • Reordered Sorting Network w/ fast swap V2 : 33.63
  • Inlined Bubble Sort (Paolo Bonzini) : 48.85
  • Unrolled Insertion Sort (Paolo Bonzini) : 75.30

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O1

  • Direct call to qsort library function : 705.93
  • Naive implementation (insertion sort) : 135.60
  • Insertion Sort (Daniel Stutzbach) : 142.11
  • Insertion Sort Unrolled : 126.75
  • Rank Order : 46.42
  • Rank Order with registers : 43.58
  • Sorting Networks (Daniel Stutzbach) : 115.57
  • Sorting Networks (Paul R) : 64.44
  • Sorting Networks 12 with Fast Swap : 61.98
  • Sorting Networks 12 reordered Swap : 54.67
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.24
  • Reordered Sorting Network w/ fast swap V2 : 33.07
  • Inlined Bubble Sort (Paolo Bonzini) : 45.79
  • Unrolled Insertion Sort (Paolo Bonzini) : 80.15

I included both -O1 and -O2 results because surprisingly for several programs O2 is less efficient than O1. I wonder what specific optimization has this effect ?

Comments on proposed solutions

Insertion Sort (Daniel Stutzbach)

As expected minimizing branches is indeed a good idea.

Sorting Networks (Daniel Stutzbach)

Better than insertion sort. I wondered if the main effect was not get from avoiding the external loop. I gave it a try by unrolled insertion sort to check and indeed we get roughly the same figures (code is here).

Sorting Networks (Paul R)

The best so far. The actual code I used to test is here. Don't know yet why it is nearly two times as fast as the other sorting network implementation. Parameter passing ? Fast max ?

Sorting Networks 12 SWAP with Fast Swap

As suggested by Daniel Stutzbach, I combined his 12 swap sorting network with branchless fast swap (code is here). It is indeed faster, the best so far with a small margin (roughly 5%) as could be expected using 1 less swap.

It is also interesting to notice that the branchless swap seems to be much (4 times) less efficient than the simple one using if on PPC architecture.

Calling Library qsort

To give another reference point I also tried as suggested to just call library qsort (code is here). As expected it is much slower : 10 to 30 times slower... as it became obvious with the new test suite, the main problem seems to be the initial load of the library after the first call, and it compares not so poorly with other version. It is just between 3 and 20 times slower on my Linux. On some architecture used for tests by others it seems even to be faster (I'm really surprised by that one, as library qsort use a more complex API).

Rank order

Rex Kerr proposed another completely different method : for each item of the array compute directly its final position. This is efficient because computing rank order do not need branch. The drawback of this method is that it takes three times the amount of memory of the array (one copy of array and variables to store rank orders). The performance results are very surprising (and interesting). On my reference architecture with 32 bits OS and Intel Core2 Quad E8300, cycle count was slightly below 1000 (like sorting networks with branching swap). But when compiled and executed on my 64 bits box (Intel Core2 Duo) it performed much better : it became the fastest so far. I finally found out the true reason. My 32bits box use gcc 4.4.1 and my 64bits box gcc 4.4.3 and the last one seems much better at optimising this particular code (there was very little difference for other proposals).

update:

As published figures above shows this effect was still enhanced by later versions of gcc and Rank Order became consistently twice as fast as any other alternative.

Sorting Networks 12 with reordered Swap

The amazing efficiency of the Rex Kerr proposal with gcc 4.4.3 made me wonder : how could a program with 3 times as much memory usage be faster than branchless sorting networks? My hypothesis was that it had less dependencies of the kind read after write, allowing for better use of the superscalar instruction scheduler of the x86. That gave me an idea: reorder swaps to minimize read after write dependencies. More simply put: when you do SWAP(1, 2); SWAP(0, 2); you have to wait for the first swap to be finished before performing the second one because both access to a common memory cell. When you do SWAP(1, 2); SWAP(4, 5);the processor can execute both in parallel. I tried it and it works as expected, the sorting networks is running about 10% faster.

Sorting Networks 12 with Simple Swap

One year after the original post Steinar H. Gunderson suggested, that we should not try to outsmart the compiler and keep the swap code simple. It's indeed a good idea as the resulting code is about 40% faster! He also proposed a swap optimized by hand using x86 inline assembly code that can still spare some more cycles. The most surprising (it says volumes on programmer's psychology) is that one year ago none of used tried that version of swap. Code I used to test is here. Others suggested other ways to write a C fast swap, but it yields the same performances as the simple one with a decent compiler.

The "best" code is now as follow:

static inline void sort6_sorting_network_simple_swap(int * d){
#define min(x, y) (x<y?x:y)
#define max(x, y) (x<y?y:x) 
#define SWAP(x,y) { const int a = min(d[x], d[y]); \
                    const int b = max(d[x], d[y]); \
                    d[x] = a; d[y] = b; }
    SWAP(1, 2);
    SWAP(4, 5);
    SWAP(0, 2);
    SWAP(3, 5);
    SWAP(0, 1);
    SWAP(3, 4);
    SWAP(1, 4);
    SWAP(0, 3);
    SWAP(2, 5);
    SWAP(1, 3);
    SWAP(2, 4);
    SWAP(2, 3);
#undef SWAP
#undef min
#undef max
}

If we believe our test set (and, yes it is quite poor, it's mere benefit is being short, simple and easy to understand what we are measuring), the average number of cycles of the resulting code for one sort is below 40 cycles (6 tests are executed). That put each swap at an average of 4 cycles. I call that amazingly fast. Any other improvements possible ?

23条回答
后来的你喜欢了谁
2楼-- · 2019-01-01 05:03

I found that at least on my system, the functions sort6_iterator() and sort6_iterator_local() defined below both ran at least as fast, and frequently noticeably faster, than the above current record holder:

#define MIN(x, y) (x<y?x:y)
#define MAX(x, y) (x<y?y:x)

template<class IterType> 
inline void sort6_iterator(IterType it) 
{
#define SWAP(x,y) { const auto a = MIN(*(it + x), *(it + y)); \
  const auto b = MAX(*(it + x), *(it + y)); \
  *(it + x) = a; *(it + y) = b; }

  SWAP(1, 2) SWAP(4, 5)
  SWAP(0, 2) SWAP(3, 5)
  SWAP(0, 1) SWAP(3, 4)
  SWAP(1, 4) SWAP(0, 3)
  SWAP(2, 5) SWAP(1, 3)
  SWAP(2, 4)
  SWAP(2, 3)
#undef SWAP
}

I passed this function a std::vector's iterator in my timing code. I suspect that using iterators gives g++ certain assurances about what can and can't happen to the memory that the iterator refers to, which it otherwise wouldn't have and it is these assurances that allow g++ to better optimize the sorting code (which if I remember correctly, is also part of the reason why so many std algorithms, such as std::sort(), generally have such obscenely good performance). However, while timing I noticed that the context (i.e. surrounding code) in which the call to the sorting function was made had a significant impact on performance, which is likely due to the function being inlined and then optimized. For instance, if the program was sufficiently simple then there usually wasn't much of a difference in performance between passing the sorting function a pointer versus passing it an iterator; otherwise using iterators usually resulted in noticeably better performance and never (in my experience so far at least) any noticeably worse performance. I suspect that this may be because g++ can globally optimize sufficiently simple code.

Moreover, sort6_iterator() is sometimes (again, depending on the context in which the function is called) consistently outperformed by the following sorting function, which copies the data into local variables before sorting them:

template<class IterType> 
inline void sort6_iterator_local(IterType it) 
{
#define SWAP(x,y) { const auto a = MIN(data##x, data##y); \
  const auto b = MAX(data##x, data##y); \
  data##x = a; data##y = b; }
//DD = Define Data
#define DD1(a)   auto data##a = *(it + a);
#define DD2(a,b) auto data##a = *(it + a), data##b = *(it + b);
//CB = Copy Back
#define CB(a) *(it + a) = data##a;

  DD2(1,2)    SWAP(1, 2)
  DD2(4,5)    SWAP(4, 5)
  DD1(0)      SWAP(0, 2)
  DD1(3)      SWAP(3, 5)
  SWAP(0, 1)  SWAP(3, 4)
  SWAP(1, 4)  SWAP(0, 3)   CB(0)
  SWAP(2, 5)  CB(5)
  SWAP(1, 3)  CB(1)
  SWAP(2, 4)  CB(4)
  SWAP(2, 3)  CB(2)        CB(3)
#undef CB
#undef DD2
#undef DD1
#undef SWAP
}

Note that defining SWAP() as follows sometimes results in slightly better performance although most of the time it results in slightly worse performance or a negligible difference in performance.

#define SWAP(x,y) { const auto a = MIN(data##x, data##y); \
  data##y = MAX(data##x, data##y); \
  data##x = a; }

If you just want a sorting algorithm that gcc -O3 is consistently good at optimizing then, depending on how you pass the input, try one of the following two algorithms:

template<class T> inline void sort6(T it) {
#define SORT2(x,y) {if(data##x>data##y){auto a=std::move(data##y);data##y=std::move(data##x);data##x=std::move(a);}}
#define DD1(a)   register auto data##a=*(it+a);
#define DD2(a,b) register auto data##a=*(it+a);register auto data##b=*(it+b);
#define CB1(a)   *(it+a)=data##a;
#define CB2(a,b) *(it+a)=data##a;*(it+b)=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(4,5) SORT2(4,5)
  DD1(0)   SORT2(0,2)
  DD1(3)   SORT2(3,5)
  SORT2(0,1) SORT2(3,4) SORT2(2,5) CB1(5)
  SORT2(1,4) SORT2(0,3) CB1(0)
  SORT2(2,4) CB1(4)
  SORT2(1,3) CB1(1)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

Or (it differs from the above in its first 5 lines)

template<class T> inline void sort6(T& e0, T& e1, T& e2, T& e3, T& e4, T& e5) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
#define DD1(a)   register auto data##a=e##a;
#define DD2(a,b) register auto data##a=e##a;register auto data##b=e##b;
#define CB1(a)   e##a=data##a;
#define CB2(a,b) e##a=data##a;e##b=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(4,5) SORT2(4,5)
  DD1(0)   SORT2(0,2)
  DD1(3)   SORT2(3,5)
  SORT2(0,1) SORT2(3,4) SORT2(2,5) CB1(5)
  SORT2(1,4) SORT2(0,3) CB1(0)
  SORT2(2,4) CB1(4)
  SORT2(1,3) CB1(1)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

The reason for using the register keyword is because this is one of the few times that you know that you want these values in registers. Without register, the compiler will figure this out most of the time but sometimes it doesn't. Using the register keyword helps solve this issue. Normally, however, don't use the register keyword since it's more likely to slow your code than speed it up.

Also, note the use of templates. This is done on purpose since, even with the inline keyword, template functions are generally much more aggressively optimized by gcc than vanilla C functions (this has to do with gcc needing to deal with function pointers for vanilla C functions but not with template functions).

查看更多
其实,你不懂
3楼-- · 2019-01-01 05:03

Well, if it's only 6 elements and you can leverage parallelism, want to minimize conditional branching, etc. Why you don't generate all the combinations and test for order? I would venture that in some architectures, it can be pretty fast (as long as you have the memory preallocated)

查看更多
还给你的自由
4楼-- · 2019-01-01 05:04

Maybe I am late to the party, but at least my contribution is a new approach.

  • The code really should be inlined
  • even if inlined, there are too many branches
  • the analysing part is basically O(N(N-1)) which seems OK for N=6
  • the code could be more effective if the cost of swap would be higher (irt the cost of compare)
  • I trust on static functions being inlined.
  • The method is related to rank-sort
    • instead of ranks, the relative ranks (offsets) are used.
    • the sum of the ranks is zero for every cycle in any permutation group.
    • instead of SWAP()ing two elements, the cycles are chased, needing only one temp, and one (register->register) swap (new <- old).

Update: changed the code a bit, some people use C++ compilers to compile C code ...

#include <stdio.h>

#if WANT_CHAR
typedef signed char Dif;
#else
typedef signed int Dif;
#endif

static int walksort (int *arr, int cnt);
static void countdifs (int *arr, Dif *dif, int cnt);
static void calcranks(int *arr, Dif *dif);

int wsort6(int *arr);

void do_print_a(char *msg, int *arr, unsigned cnt)
{
fprintf(stderr,"%s:", msg);
for (; cnt--; arr++) {
        fprintf(stderr, " %3d", *arr);
        }
fprintf(stderr,"\n");
}

void do_print_d(char *msg, Dif *arr, unsigned cnt)
{
fprintf(stderr,"%s:", msg);
for (; cnt--; arr++) {
        fprintf(stderr, " %3d", (int) *arr);
        }
fprintf(stderr,"\n");
}

static void inline countdifs (int *arr, Dif *dif, int cnt)
{
int top, bot;

for (top = 0; top < cnt; top++ ) {
        for (bot = 0; bot < top; bot++ ) {
                if (arr[top] < arr[bot]) { dif[top]--; dif[bot]++; }
                }
        }
return ;
}
        /* Copied from RexKerr ... */
static void inline calcranks(int *arr, Dif *dif){

dif[0] =     (arr[0]>arr[1])+(arr[0]>arr[2])+(arr[0]>arr[3])+(arr[0]>arr[4])+(arr[0]>arr[5]);
dif[1] = -1+ (arr[1]>=arr[0])+(arr[1]>arr[2])+(arr[1]>arr[3])+(arr[1]>arr[4])+(arr[1]>arr[5]);
dif[2] = -2+ (arr[2]>=arr[0])+(arr[2]>=arr[1])+(arr[2]>arr[3])+(arr[2]>arr[4])+(arr[2]>arr[5]);
dif[3] = -3+ (arr[3]>=arr[0])+(arr[3]>=arr[1])+(arr[3]>=arr[2])+(arr[3]>arr[4])+(arr[3]>arr[5]);
dif[4] = -4+ (arr[4]>=arr[0])+(arr[4]>=arr[1])+(arr[4]>=arr[2])+(arr[4]>=arr[3])+(arr[4]>arr[5]);
dif[5] = -(dif[0]+dif[1]+dif[2]+dif[3]+dif[4]);
}

static int walksort (int *arr, int cnt)
{
int idx, src,dst, nswap;

Dif difs[cnt];

#if WANT_REXK
calcranks(arr, difs);
#else
for (idx=0; idx < cnt; idx++) difs[idx] =0;
countdifs(arr, difs, cnt);
#endif
calcranks(arr, difs);

#define DUMP_IT 0
#if DUMP_IT
do_print_d("ISteps ", difs, cnt);
#endif

nswap = 0;
for (idx=0; idx < cnt; idx++) {
        int newval;
        int step,cyc;
        if ( !difs[idx] ) continue;
        newval = arr[idx];
        cyc = 0;
        src = idx;
        do      {
                int oldval;
                step = difs[src];
                difs[src] =0;
                dst = src + step;
                cyc += step ;
                if(dst == idx+1)idx=dst;
                oldval = arr[dst];
#if (DUMP_IT&1)
                fprintf(stderr, "[Nswap=%d] Cyc=%d Step=%2d Idx=%d  Old=%2d New=%2d #### Src=%d Dst=%d[%2d]->%2d <-- %d\n##\n"
                        , nswap, cyc, step, idx, oldval, newval
                        , src, dst, difs[dst], arr[dst]
                        , newval  );
                do_print_a("Array ", arr, cnt);
                do_print_d("Steps ", difs, cnt);
#endif

                arr[dst] = newval;
                newval = oldval;
                nswap++;
                src = dst;
                } while( cyc);
        }

return nswap;
}
/*************/
int wsort6(int *arr)
{
return walksort(arr, 6);
}
查看更多
余欢
5楼-- · 2019-01-01 05:05

Looking forward to trying my hand at this and learning from these examples, but first some timings from my 1.5 GHz PPC Powerbook G4 w/ 1 GB DDR RAM. (I borrowed a similar rdtsc-like timer for PPC from http://www.mcs.anl.gov/~kazutomo/rdtsc.html for the timings.) I ran the program a few times and the absolute results varied but the consistently fastest test was "Insertion Sort (Daniel Stutzbach)", with "Insertion Sort Unrolled" a close second.

Here's the last set of times:

**Direct call to qsort library function** : 164
**Naive implementation (insertion sort)** : 138
**Insertion Sort (Daniel Stutzbach)**     : 85
**Insertion Sort Unrolled**               : 97
**Sorting Networks (Daniel Stutzbach)**   : 457
**Sorting Networks (Paul R)**             : 179
**Sorting Networks 12 with Fast Swap**    : 238
**Sorting Networks 12 reordered Swap**    : 236
**Rank Order**                            : 116
查看更多
春风洒进眼中
6楼-- · 2019-01-01 05:07

This question is becoming quite old, but I actually had to solve the same problem these days: fast agorithms to sort small arrays. I thought it would be a good idea to share my knowledge. While I first started by using sorting networks, I finally managed to find other algorithms for which the total number of comparisons performed to sort every permutation of 6 values was smaller than with sorting networks, and smaller than with insertion sort. I didn't count the number of swaps; I would expect it to be roughly equivalent (maybe a bit higher sometimes).

The algorithm sort6 uses the algorithm sort4 which uses the algorithm sort3. Here is the implementation in some light C++ form (the original is template-heavy so that it can work with any random-access iterator and any suitable comparison function).

Sorting 3 values

The following algorithm is an unrolled insertion sort. When two swaps (6 assignments) have to be performed, it uses 4 assignments instead:

void sort3(int* array)
{
    if (array[1] < array[0]) {
        if (array[2] < array[0]) {
            if (array[2] < array[1]) {
                std::swap(array[0], array[2]);
            } else {
                int tmp = array[0];
                array[0] = array[1];
                array[1] = array[2];
                array[2] = tmp;
            }
        } else {
            std::swap(array[0], array[1]);
        }
    } else {
        if (array[2] < array[1]) {
            if (array[2] < array[0]) {
                int tmp = array[2];
                array[2] = array[1];
                array[1] = array[0];
                array[0] = tmp;
            } else {
                std::swap(array[1], array[2]);
            }
        }
    }
}

It looks a bit complex because the sort has more or less one branch for every possible permutation of the array, using 2~3 comparisons and at most 4 assignments to sort the three values.

Sorting 4 values

This one calls sort3 then performs an unrolled insertion sort with the last element of the array:

void sort4(int* array)
{
    // Sort the first 3 elements
    sort3(array);

    // Insert the 4th element with insertion sort 
    if (array[3] < array[2]) {
        std::swap(array[2], array[3]);
        if (array[2] < array[1]) {
            std::swap(array[1], array[2]);
            if (array[1] < array[0]) {
                std::swap(array[0], array[1]);
            }
        }
    }
}

This algorithm performs 3 to 6 comparisons and at most 5 swaps. It is easy to unroll an insertion sort, but we will be using another algorithm for the last sort...

Sorting 6 values

This one uses an unrolled version of what I called a double insertion sort. The name isn't that great, but it's quite descriptive, here is how it works:

  • Sort everything but the first and the last elements of the array.
  • Swap the first and the elements of the array if the first is greater than the last.
  • Insert the first element into the sorted sequence from the front then the last element from the back.

After the swap, the first element is always smaller than the last, which means that, when inserting them into the sorted sequence, there won't be more than N comparisons to insert the two elements in the worst case: for example, if the first element has been insert in the 3rd position, then the last one can't be inserted lower than the 4th position.

void sort6(int* array)
{
    // Sort everything but first and last elements
    sort4(array+1);

    // Switch first and last elements if needed
    if (array[5] < array[0]) {
        std::swap(array[0], array[5]);
    }

    // Insert first element from the front
    if (array[1] < array[0]) {
        std::swap(array[0], array[1]);
        if (array[2] < array[1]) {
            std::swap(array[1], array[2]);
            if (array[3] < array[2]) {
                std::swap(array[2], array[3]);
                if (array[4] < array[3]) {
                    std::swap(array[3], array[4]);
                }
            }
        }
    }

    // Insert last element from the back
    if (array[5] < array[4]) {
        std::swap(array[4], array[5]);
        if (array[4] < array[3]) {
            std::swap(array[3], array[4]);
            if (array[3] < array[2]) {
                std::swap(array[2], array[3]);
                if (array[2] < array[1]) {
                    std::swap(array[1], array[2]);
                }
            }
        }
    }
}

My tests on every permutation of 6 values ever show that this algorithms always performs between 6 and 13 comparisons. I didn't compute the number of swaps performed, but I don't expect it to be higher than 11 in the worst case.

I hope that this helps, even if this question may not represent an actual problem anymore :)

EDIT: after putting it in the provided benchmark, it is cleary slower than most of the interesting alternatives. It tends to perform a bit better than the unrolled insertion sort, but that's pretty much it. Basically, it isn't the best sort for integers but could be interesting for types with an expensive comparison operation.

查看更多
登录 后发表回答