pgi cuda fortran compiling error

As I compile a single cuda fortran code , the compiler give me the following error, PGF90-F-0000-Internal compiler error. Device compiler exited with error status code and Attempt to call global subroutine without chevrons: increment

arch linux, pgf90 2013 the code is as follow:

module simple
contains
  attributes (global) subroutine increment(a,b)
    implicit none
    integer, intent(inout) :: a(:)
    integer , intent(in) :: b
    integer :: i , n
    n = size( a )
    do i = 1 , n
       a ( i ) = a ( i )+ b
    end do
  end subroutine increment
end module simple


program incrementTestCPU
  use simple
  implicit none
  integer  :: n = 256
  integer :: a ( n ) , b
  a = 1
  b = 3
  call increment ( a , b )
  if ( any ( a /= 4)) then
     write (* ,*) "pass"
  else
     write(*,*) "not passed"
  end if
end program incrementTestCPU

You're calling this a "cuda fortran" code, but it is syntactically incorrect whether you want to ultimately run the subroutine on the host (CPU) or device (GPU). You may wish to refer to this blog post as a quick start guide.

If you want to run the subroutine increment on the GPU, you have not called it correctly:

call increment ( a , b )

A GPU subroutine call needs kernel launch parameters, which are contained in the "triple chevron" <<<...>>> syntax which should be placed between the increment and its parameter list, like so:

call increment<<<1,1>>> ( a , b )

and this is giving rise to the error message:

Attempt to call global subroutine without chevrons

If, instead, you intend to run this subroutine on the CPU, and are just passing it through the CUDA fortran compiler, then it is incorrect to specify the global attribute on the subroutine:

attributes (global) subroutine increment(a,b)

The following is a modification of your code which would run the subroutine on the GPU, and compiles cleanly for me with PGI 14.9 tools:

$ cat test3.cuf
module simple
contains
  attributes (global) subroutine increment(a,b)
    implicit none
    integer :: a(:)
    integer, value :: b
    integer :: i , n
    n = size( a )
    do i = 1 , n
       a ( i ) = a ( i )+ b
    end do
  end subroutine increment
end module simple


program incrementTestCPU
  use simple
  use cudafor
  implicit none
  integer, parameter  :: n = 256
  integer, device :: a_d(n), b_d
  integer :: a ( n ) , b
  a = 1
  b = 3
  a_d = a
  b_d = b
  call increment<<<1,1>>> ( a_d , b_d )
  a = a_d
  if ( any ( a /= 4)) then
     write (* ,*) "pass"
  else
     write(*,*) "not passed"
  end if
end program incrementTestCPU

$ pgf90 -Mcuda -ta=nvidia,cc20,cuda6.5 -Minfo test3.cuf -o test3
incrementtestcpu:
     23, Memory set idiom, loop replaced by call to __c_mset4
     29, any reduction inlined
$ pgf90 --version

pgf90 14.9-0 64-bit target on x86-64 Linux -tp nehalem
The Portland Group - PGI Compilers and Tools
Copyright (c) 2014, NVIDIA CORPORATION.  All rights reserved.
$

If you are trying to create a CPU-only version, then remove all CUDA Fortran syntax from your program. If you still have difficulty, you can ask a Fortran-directed question, as it is not a CUDA issue at that point. As an example, the following (non-CUDA) code compiled cleanly for me:

module simple
contains
  subroutine increment(a,b)
    implicit none
    integer, intent(inout) :: a(:)
    integer , intent(in) :: b
    integer :: i , n
    n = size( a )
    do i = 1 , n
       a ( i ) = a ( i )+ b
    end do
  end subroutine increment
end module simple


program incrementTestCPU
  use simple
  implicit none
  integer, parameter  :: n = 256
  integer :: a ( n ) , b
  a = 1
  b = 3
  call increment ( a , b )
  if ( any ( a /= 4)) then
     write (* ,*) "pass"
  else
     write(*,*) "not passed"
  end if
end program incrementTestCPU