To check memory allocations, we populate single-precision arrays with unit values and interrogate them with the SUM and DOT_PRODUCT intrinsics. These intrinsics stop counting after 16777216 (= 2^24). How can we get them to count billions of elements? We prefer to avoid DO loops. This is not a problem with higher-precision arrays.
program allocator
use iso_fortran_env
implicit NONE
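! REAL32 and INT64 are already kind values; passing them to selected_*_kind
! would treat them as a digit count (INT64 = 8 would select only a 32-bit integer)
integer, parameter :: sp = REAL32
integer, parameter :: xlint = INT64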
integer ( xlint ) :: n = 100000000
real ( sp ), allocatable, dimension ( : ) :: array
integer ( xlint ) :: alloc_status = 0
character ( len = 255 ) :: alloc_msg = ""
! ALLOCATE
allocate ( array ( 1 : n ), stat = alloc_status, errmsg = alloc_msg )
if ( alloc_status /= 0 ) print *, 'allocation error on ', n, ' elements: stat = ', alloc_status, ', errmsg = ', alloc_msg
! POPULATE
array = 1.0_sp
write ( *, '( "number of elements allocated = ", g0 )' ) n
write ( *, '( "sum of elements = ", g0 )' ) sum ( array )
write ( *, '( "dot product = ", g0, / )' ) dot_product ( array, array )
! DEALLOCATE
deallocate ( array, stat = alloc_status, errmsg = alloc_msg )
if ( alloc_status /= 0 ) print *, 'deallocation error on ', n, ' elements: stat = ', alloc_status, ', errmsg = ', alloc_msg
write ( *, '( "compiler version = ", A )' ) compiler_version()
write ( *, '( "compiler options = ", A )' ) trim ( compiler_options() )
end program allocator
Output:
number of elements allocated = 100000000
sum of elements = 16777216.
dot product = 16777216.
compiler version = GCC version 4.6.2 20111019 (prerelease)
compiler options = -fPIC -mmacosx-version-min=10.6.8 -mtune=core2
That's due to the limited precision of single-precision reals.
Since you only have 24 bits for the significand, your relative resolution is 1/2**24 = 1/16777216. In other words, you cannot resolve the addition of 1/16777216 to 1 or, in your case, the addition of 1 to 16777216.
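The rounding can be observed directly with a few standard intrinsics. This is a minimal sketch, assuming an IEEE 754 platform with round-to-nearest; the program name `rounding_demo` and the variable names are made up for illustration.

```fortran
program rounding_demo
    use iso_fortran_env, only: real32
    implicit none
    real ( real32 ) :: big, next

    big  = 16777216.0_real32   ! 2**24, the value at which the sum stalls
    next = big + 1.0_real32    ! 2**24 + 1 needs 25 significand bits

    ! Number of significand bits in the real32 model
    write ( *, '( "significand bits = ", g0 )' ) digits ( big )
    ! Distance between adjacent representable values near 2**24
    write ( *, '( "spacing at 2**24 = ", g0 )' ) spacing ( big )
    ! True when adding 1 left the value unchanged
    write ( *, '( "big + 1 == big   = ", l1 )' ) next == big
end program rounding_demo
```

For IEEE single precision, `digits` reports 24 and `spacing` reports 2 at this magnitude, so an increment of 1 falls between representable values and the running sum stays at 16777216.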
To be able to resolve this operation, which is required for both sum and dot_product (even if calculated using simple loops), you would need (at least) one more bit of precision.
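One way to get the extra precision without writing a DO loop is to convert the elements to a wider kind inside the intrinsic call. The following is only a sketch under assumptions, not necessarily the code originally posted with this answer: the program name `allocator_check` is hypothetical, and `n` is reduced to 20000000 (still above 2**24) to keep the run light.

```fortran
program allocator_check
    use iso_fortran_env, only: real32, real64, int64
    implicit none
    ! Assumption: smaller than the question's n, but still larger than 2**24
    integer ( int64 ), parameter :: n = 20000000_int64
    real ( real32 ), allocatable :: array ( : )
    integer :: alloc_status

    allocate ( array ( n ), stat = alloc_status )
    if ( alloc_status /= 0 ) stop 'allocation failed'
    array = 1.0_real32

    ! Convert each element to real64 before reducing, so the accumulation
    ! has 53 significand bits. The compiler may create a real64 temporary.
    write ( *, '( "sum (real64)         = ", g0 )' ) sum ( real ( array, real64 ) )
    write ( *, '( "dot product (real64) = ", g0 )' ) &
        dot_product ( real ( array, real64 ), real ( array, real64 ) )

    ! Integer alternative: count the unit-valued elements with a 64-bit result
    write ( *, '( "elements equal to 1  = ", g0 )' ) &
        count ( array == 1.0_real32, kind = int64 )

    deallocate ( array )
end program allocator_check
```

The real64 conversion only moves the limit to 2**53 rather than removing it, and it may cost a temporary array twice the size of the original. The `count` variant stays in integer arithmetic, which fits the goal of verifying how many elements were populated.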