Difference between local allocatable and automatic

2019-02-18 15:17发布

站内文章 / 后端开发

24 0

三岁会撩人

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am interested in the difference between alloc_array and automatic_array in the following extract:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(n)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(n))
...[code]...

I am familiar enough with the basics of allocation (not so much on advanced techniques) to know that allocation allows you to change the size of the array in the middle of the code (as pointed out in this question), but I'm interested in considering the case where you don't need to change the size of the array; they might be passed onto other subroutines for operation, but the only purpose of both variables in the code and any subroutine is to hold the data of an array of dimension n (and maybe change the data, but not the size).

(1) Is there any difference in memory usage? I am not an expert in low level procedures, but I have a very slight knowledge of how they matter and how they can impact on the higher level programming (kind of experience I'm talkng about: once trying to run a big code in fortran I was getting a mistake I didn't understand, sysadmin told me "oh, yeah, you are probably saturating the stack; try adding this line in your running script"; anything that gives me insight into how to consider this things when actually coding and not having to patch them later is welcomed). I've been told by people that it might be dependent on many other things like compiler or architecture, but I interpreted from those responses that they were not completely sure of exactly how this was so. Is it so absolutely dependant on a multitude of factors or is there a default/intended behavior in the coding that may then be over-riden by optional compiling keywords or system preferences?

(2) Would the subroutines have different interface needs? Again, not an expert, but it had happened to me before that because of the way I declare variables of subroutine, I end up having to put the subroutines in a module. I've been given to understand this may vary depending on whether I use things that are special for allocatable variables. I am thinking about the case in which everything I do with the variables can be done both by allocatables and automatics, not intentionally using anything specific of allocatables (other than allocation before usage, that is).

Finally, in case this is of use: the reason I am asking is because we are developing in a group and we have recently noticed different people use those two declarations in different ways and we needed to determine if this is something that can be left to personal preference or if there might be any reasons why it might be a good idea to set a clear criteria (and how to set that criteria). I don't need extremely detailed answers, I am trying to determine if this is something I should be doing research about to be careful on how we use it and in what aspects of it should the research be directed.

Though I would be interested to know of "interesting tricks" than can be done with allocation but are not directly related to the need of having size variability, I am leaving those for a possible future follow-up question and focusing here on the strictly functional differences (meaning: what I am explicitly telling compilers to do with my code). The two items I mentioned are the thing I could come up with due to previous experiences, but any other important one that I am missing and should consider, please do mention them.

回答1:

For the sake of clarity, I'll briefly mention terminology. Of the two arrays both are local variables and arrays of rank 1.

alloc_array is an allocatable array;
automatic_array is an explicit-shape automatic object.

Again as in the linked question, after the allocation statement both arrays are of size n. I'll answer here in that these are still two very different things. Of course, the allocatable array can have its allocation status changed and its allocation moved. I'll leave both of those (mostly) out of the scope of this answer. An allocatable array, of course, needn't have these things changed once it's defined.

Memory usage

What was partly contentious about a previous revision of the question is how ill-defined the concept of memory usage is. Fortran, as a standard, tells us that both arrays come to be the same size and they'll have the same storage layout, and are both contiguous. Beyond that, much follows terms you'll hear a lot: implementation specific, processor dependent.

In a comment you expressed interest in ifort. So that I don't wander too far, I'll just stick with that one compiler and prompt of what to consider.

Often, ifort will place automatic objects and array temporaries onto stack. There is a (default) compiler option -no-heap-arrays described as having effect

The compiler puts automatic arrays and temporary arrays in the stack storage area.

Using the alternative option -heap-arrays allows one to control that slightly:

This option puts automatic arrays and arrays created for temporary computations on the heap instead of the stack.

There is a possibility to control size thresholds for which heap/stack would be chosen (when that is known at compile-time):

If the compiler cannot determine the size at compile time, it always puts the automatic array on the heap.

As n isn't a constant expression, then, one would expect your array automatic_array to be on the heap with this option, regardless of the size specified.

There's probably more to be said, but this could be far too long if I tried.

Interface needs

There is nothing special about the interface requirements of the subroutine mysub: local variables have no impact on that. Any program unit calling that would be happy with an implicit interface. What you are asking about is how the two local arrays can be used.

This largely comes down to what uses the two arrays can be put to.

If the dummy argument of a second procedure has the allocatable attribute then only the allocatable array here can be passed to that procedure. It will also need to have an explicit interface. This is true whether or not the procedure changes the allocation.

Of course, both arrays could be passed as arguments to a dummy argument without the allocatable attribute and then we don't have different interface requirements.

Anyway, why would one want to pass an argument to an allocatable dummy when there will be no change in allocation status, etc.? There are good reasons:

there may be a code path in the procedure which does have an allocation change (controlled by a switch, say);
allocatable dummy arguments also pass bounds;
etc.,

This second one is more obvious if the subroutine had specification

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(2:n+1)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(2:n+1))

Finally, an automatic object has quite strict conditions on its size. n here is clearly allowed, but things don't have to be much more complicated before allocation is the only plausible way. Depending on how much one wants to play with block constructs.

Taking also a comment from IanH: if we have a very large n the automatic object is likely to lead to crash-and-burn. With the allocatable, one could use the stat= option to come to some amicable agreement with the compiler run-time.

回答2:

Because gfortran or ifort + Linux(x86_64) are among the most popular combinations used for HPC, I made some performance comparison between local allocatable vs automatic arrays for these combinations. The CPU used is Xeon E5-2650 v2@2.60GHz, and the compilers are gfortran4.8.2 and ifort14.0. The test program is like the following.

In test.f90:

!------------------------------------------------------------------------           
subroutine use_automatic( n )
    integer :: n

    integer :: a( n )   !! local automatic array (with unknown size at compile-time)
    integer :: i

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )
end

!------------------------------------------------------------------------           
subroutine use_alloc( n )
    integer :: n

    integer, allocatable :: a( : )  !! local allocatable array                      
    integer :: i

    allocate( a( n ) )

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )

    deallocate( a )  !! not necessary for modern Fortran but for clarity                  
end

!------------------------------------------------------------------------           
program main
    implicit none
    integer :: i, nsizemax, nsize, nloop, foo
    common /dummy/ foo

    nloop = 10**7
    nsizemax = 10

    do i = 1, nloop
        nsize = mod( i, nsizemax ) + 1

        call use_automatic( nsize )
        ! call use_alloc( nsize )                                                   
    enddo

    print *, "foo = ", foo   !! to check if sub() is really called
end

In sub.f90:

!------------------------------------------------------------------------
subroutine sub( a )
    integer a( * )
    integer foo
    common /dummy/ foo

    foo = a( 1 )
ends

In the above program, I tried avoiding compiler optimization that eliminates a(:) itself (i.e., no operation) by placing sub() in a different file and making the interface implicit. First, I compiled the program using gfortran as

gfortran -O3 test.f90 sub.f90

and tested different values of nsizemax while keeping nloop = 10^7. The result is in the following table (time is in sec, measured several times by the time command).

nsizemax    use_automatic()    use_alloc()
10          0.30               0.31               # average result
50          0.48               0.47
500         1.0                0.90
5000        4.3                4.2
100000      75.6               75.7

So the overall timing seems almost the same for two calls when -O3 is used (but see Edit for different options). Next, I compiled with ifort as

[O3]  ifort -O3 test.f90 sub.f90
or
[O3h] ifort -O3 -heap-arrays test.f90 sub.f90

In the former case the automatic array is stored on the stack, while when -heap-arrays is attached the array is stored on the heap. The obtained result is

         use_automatic()    use_alloc()
         [O3]    [O3h]      [O3]    [O3h]
10       0.064   0.39       0.48    0.48
50       0.094   0.56       0.65    0.66
500      0.45    1.03       1.12    1.12
5000     3.8     4.4        4.4     4.4
100000   74.5    75.3       76.5    75.5

So for ifort, the use of automatic arrays seems beneficial when relatively small arrays are mainly used. On the other hand, gfortran -O3 shows no difference because both arrays are treated the same way (see Edit for more details).

Additional comparison:

Below is the result for Oracle Fortran compiler 12.4 for Linux (used with f90 -O3). The overall trend seems similar; automatic arrays are faster for small n, indicating the internal use of stack.

nsizemax    use_automatic()    use_alloc()
10          0.16               0.45
50          0.17               0.62
500         0.37               0.97
5000        2.04               2.67
100000      65.6               65.7

Edit

Thanks to Vladimir's comment, it has turned out that gfortran -O3 put automatic arrays (with unknown size at compile-time) on the heap. This explains why use_automatic() and use_alloc() did not make any difference above. So I made another comparison between different options below:

[O3]  gfortran -O3
[O5]  gfortran -O5
[O3s] gfortran -O3 -fstack-arrays
[Of]  gfortran -Ofast                   # this includes -fstack-arrays

Here, -fstack-arrays means that the compiler puts all local arrays with unknown size on the stack. Note that this flag is enabled by default with -Ofast. The obtained result is

nsizemax    use_automatic()               use_alloc()
            [Of]   [O3s]  [O5]  [O3]     [Of]  [O3s]  [O5]  [O3]
10          0.087  0.087  0.29  0.29     0.29  0.29   0.29  0.29
50          0.15   0.15   0.43  0.43     0.45  0.44   0.44  0.45
500         0.57   0.56   0.84  0.84     0.92  0.92   0.92  0.92
5000        3.9    3.9    4.1   4.1      4.2   4.2    4.2   4.2
100000      75.1   75.0   75.6  75.6     75.6  75.3   75.7  76.0

where the average of ten measurements are shown. This table demonstrates that if -fstack-arrays is included, the execution time for small n becomes shorter. This trend is consistent with the results obtained for ifort above.

It should be mentioned, however, that the above comparison probably corresponds to the "best-case" scenario that highlights the difference between them, so the timing difference can be much smaller in practice. For example, I have compared the timing for the above options by using some other program (involving both small and large arrays), and the results were not much affected by the stack options. Also the result should depend on machine architecture as well as compilers, of course. So your mileage may vary.