Fortran function to overload multiplication betwee

2019-08-01 05:14发布

问题:

Foreword

In order to store banded matrices whose full counterparts can have both rows and columns indexed from indices other than 1, I defined a derived data type as

TYPE CDS
  REAL, DIMENSION(:,:), ALLOCATABLE :: matrix
  INTEGER, DIMENSION(2) :: lb, ub
  INTEGER :: ld, ud
END TYPE CDS

where CDS stands for compressed diagonal storage. Given the declaration TYPE(CDS) :: A,

  1. The rank-2 component matrix is supposed to contain, as columns, the diagonals of the actual full matrix (like here, except that I store the diagonals as columns and not as rows).
  2. The components ld and ud are supposed to contain the number of lower and upper diagonals respectively, that is -lbound(A%matrix,2) and +ubound(A%matrix,2).
  3. The 2-elements components lb and ub are supposed to contain the lower bounds and upper bounds of the actual full matrix along the two dimensions. In particular lb(1) and ub(1) should be the same as lbound(A%matrix,1) and lbound(A%matrix,2).

As you can see in points 2. and 3., the derived type contains some redundant information, but I don't care, since they are just 3 couples of integers. Furthermore, in the code that I'm writing, the information about the bounds and the band of the actual full matrix is know before the matrix can be filled. So I first assign the values to the components ld, ud, lb and ub, and then I used these components to ALLOCATE the matrix component (then I can fill it properly).

The problem

I have to perform matrix multiplication between such sparse matrices, so I wrote a FUNCTION to perform such product and used it to overload the * operator.

At the moment the function is as follows,

FUNCTION CDS_mat_x_CDS_mat(A, B)
IMPLICIT NONE
TYPE(CDS), INTENT(IN) :: A, B
TYPE(CDS) :: cds_mat_x_cds_mat

! determine the lower and upper bounds and band of the result based on those of the operands
CDS_mat_x_CDS_mat%lb(1) = A%lb(1)
CDS_mat_x_CDS_mat%ub(1) = A%ub(1)
CDS_mat_x_CDS_mat%lb(2) = B%lb(2)
CDS_mat_x_CDS_mat%ub(2) = B%ub(2)
CDS_mat_x_CDS_mat%ld = A%ld + B%ld
CDS_mat_x_CDS_mat%ud = A%ud + B%ud

! allocate the matrix component
ALLOCATE(CDS_mat_x_CDS_mat%matrix(CDS_mat_x_CDS_mat%lb(1):CDS_mat_x_CDS_mat%ub(1),&
                               & -CDS_mat_x_CDS_mat%ld:+CDS_mat_x_CDS_mat%ud))

! perform the product
:
:

END FUNCTION

This means that, if I have to do the product multiple times the allocation is done multiple times inside the function. I think this is not good from a performance point of view.

I ask for suggestions on how to accomplish the task of banded sparse matrix times banded sparse matrix. I would like to use the type the I defined because I need it to be as general, in terms of bounds, as it is at the moment. But I could change the procedure to perform the product (from FUNCTION to SUBROUTINE, if necessary).

Ideas

I could rewrite the procedure as a SUBROUTINE in order to declare CDS_mat_x_CDS_mat with INTENT(INOUT) do the assignment of components other than matrix, as well as the allocation, outside the SUBROUTINE. The drawback would be that I could not overload the * operator.

I noticed the intrinsic function MATMUL can operate on any rank-2 operands, whichever the upper and lower bounds along the two dimensions. This means that the allocation is performed inside the function. I suppose it's efficient (since it is an intrinsic). The difference with respect to my function is that it accepts rank-2 arrays of any shape, wheres mine accept derived data types objects with a rank-2 array component of any shape.

回答1:

The intrinsic function MATMUL has the equivalent of an automatic (F2008 5.2.2) result - the shape of the result is expressed in such a way that it becomes a characteristic of the function (F2008 12.3.3) - the shape of the function result is determined in the specification part of the function and (in terms of implementation) the compiler therefore knows how to calculate the shape of the function result ahead of actually executing the function proper.

As a consequence there are no Fortran language ALLOCATABLE variables associated with the equivalent of the intrinsic MATMUL function result. This is not the same thing as there being "no memory allocation" - the compiler may still need to allocate memory behind the scenes for its own purposes - for things like expressions temporaries and the like.

(I say "equivalent of" above, because intrinsic procedures are inherently special - but pretend for a moment that MATMUL was just a user function.)

The equivalent of that sort of automatic result for your case can be achieved by using length type parameters. This is Fortran 2003 feature - the same base language standard that introduced allocatable components - but it is not one that has yet been implemented by all actively maintained compilers.

MODULE xyz
  IMPLICIT NONE

  TYPE CDS(lb1, ub1, ld, ud)
    INTEGER, LEN :: lb1, ub1, ld, ud
    REAL :: matrix(lb1:ub1, ld:ud)
    INTEGER :: lb2, ub2
  END TYPE CDS

  INTERFACE OPERATOR(*)
    MODULE PROCEDURE CDS_mat_x_CDS_mat
  END INTERFACE OPERATOR(*)
CONTAINS
  FUNCTION CDS_mat_x_CDS_mat(A, B) RESULT(C)
    TYPE(CDS(*,*,*,*)), INTENT(IN) :: A, B
    TYPE(CDS(A%lb1, A%ub1, A%ld+B%ld, A%ud+B%ud)) :: C

    C%lb2 = B%lb2
    C%ub2 = B%ub2

    ! perform the product.
    ! :

  END FUNCTION CDS_mat_x_CDS_mat
END MODULE xyz

Theoretically this gives the compiler more opportunity for optimisation, because it has more insight ahead of the call to the function for the storage required for the function result. Whether this actually results in better real world performance depends on the implementation of the compiler and the nature of the function references.