Unable to use cdef function in std::sort as comparator

Posted 2019-08-28 02:39

Question:

The code below is from this file. It compiles correctly on Linux, but fails to compile on OS X.

I want to sort a vector of self-defined intervals:

stdsort(intervals.begin(), intervals.end(), compare_start_end)

My comparison function is the following:

cdef uint32_t compare_start_end(interval lhs, interval rhs):
    if (lhs.start < rhs.start):
        return <uint32_t> 1
    elif (rhs.start < lhs.start):
        return <uint32_t> 0
    elif (lhs.end < rhs.end):
        return <uint32_t> 1
    else:
        return <uint32_t> 0
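The comparison above is a lexicographic "less than" on (start, end). A minimal C++ sketch of the same logic, assuming an interval struct with the two uint32_t fields from the question, can be checked against std::tie, which compares tuples lexicographically. Returning bool is the idiomatic comparator signature, but std::sort only requires a result convertible to bool, so the uint32_t return type should not itself be the problem. The name compare_start_end_tie is hypothetical.

```cpp
#include <cstdint>
#include <tuple>

// Sketch: mirrors the Cython ctypedef struct from the question.
struct interval {
    std::uint32_t start;
    std::uint32_t end;
};

// Same branches as the Cython compare_start_end, but returning bool.
bool compare_start_end(const interval& lhs, const interval& rhs) {
    if (lhs.start < rhs.start) return true;
    if (rhs.start < lhs.start) return false;
    return lhs.end < rhs.end;
}

// Hypothetical equivalent: lexicographic comparison of (start, end).
bool compare_start_end_tie(const interval& lhs, const interval& rhs) {
    return std::tie(lhs.start, lhs.end) < std::tie(rhs.start, rhs.end);
}
```

Because both functions are strict weak orderings over the same key, either can be passed to std::sort.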

The error I get is the following:

In file included from epic2/src/read_bam.cpp:651:
/Library/Developer/CommandLineTools/usr/include/c++/v1/algorithm:4117:5: error: no matching function for call to '__sort'
    __sort<_Comp_ref>(__first, __last, __comp);
    ^~~~~~~~~~~~~~~~~
epic2/src/read_bam.cpp:3305:12: note: in instantiation of function template specialization 'std::__1::sort<std::__1::__wrap_iter<__pyx_t_5epic2_3src_8read_bam_interval *>, unsigned int (__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)>'
      requested here
      std::sort<std::vector<__pyx_t_5epic2_3src_8read_bam_interval> ::iterator,uint32_t (__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)>(__pyx_v_intervals.begin(), __pyx_v_intervals.end(), __pyx_f_5epic2_3src_8read_bam_compare_start_end);
           ^
/Library/Developer/CommandLineTools/usr/include/c++/v1/algorithm:3914:1: note: candidate function template not viable: no known conversion from 'unsigned int (*)(__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)' to 'unsigned int
      (&)(__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)' for 3rd argument; dereference the argument with *
__sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp)
^
1 warning and 1 error generated.

The problem seems to be one of types.

I have

'unsigned int (*)(__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)' 

but the internal __sort call in <algorithm> expects a reference:

unsigned int (&)(__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)
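The three types involved differ only in their declarator: a function type, a pointer to function, and a reference to function. A small C++ sketch (the interval struct, the cmp_* aliases and as_reference are illustrative names, not taken from the generated code) shows how they relate, and why the compiler's hint to dereference with * would work in plain C++: dereferencing a function pointer yields a function lvalue, which a function reference can bind to.

```cpp
#include <cstdint>
#include <type_traits>

struct interval {
    std::uint32_t start;
    std::uint32_t end;
};

std::uint32_t compare_start_end(interval lhs, interval rhs) {
    if (lhs.start != rhs.start) return lhs.start < rhs.start;
    return lhs.end < rhs.end;
}

// Illustrative aliases for the three types that appear in the error.
using cmp_fn  = std::uint32_t(interval, interval);     // function type
using cmp_ptr = std::uint32_t (*)(interval, interval); // pointer to function
using cmp_ref = std::uint32_t (&)(interval, interval); // reference to function

// Taking the address of a function gives a pointer...
static_assert(std::is_same<decltype(&compare_start_end), cmp_ptr>::value,
              "address-of a function is a function pointer");
// ...and a function type decays to that pointer when passed by value...
static_assert(std::is_same<std::decay<cmp_fn>::type, cmp_ptr>::value,
              "function type decays to a pointer");
// ...while adding & to the function type gives the reference form.
static_assert(std::is_same<std::add_lvalue_reference<cmp_fn>::type, cmp_ref>::value,
              "function type plus & is a reference to function");

// *p turns the pointer back into a function lvalue a reference can bind to.
cmp_ref as_reference(cmp_ptr p) { return *p; }
```

In Cython, however, dereference() is meant for C++ objects with operator*, so the same trick cannot be expressed there, which matches the failure shown next.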

The hint is to try to dereference the third argument, but this does not work.

stdsort(intervals.begin(), intervals.end(), dereference(compare_start_end))

Instead, it errors with

Compiling epic2/src/read_bam.pyx because it changed.
[1/1] Cythonizing epic2/src/read_bam.pyx

Error compiling Cython file:
------------------------------------------------------------
...
        intervals = dereference(it).second
        five_ends = intvec()

        if drop_duplicates:

            stdsort(intervals.begin(), intervals.end(), dereference(compare_start_end))
                                                       ^

Do you have any advice? P.S. The above compiles on Linux but not on macOS, so the code is brittle.


System info

macOS Mojave, 10.14.6 (18G87)

gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Minimal reproducible example

# minimal_example.pyx
from libc.stdint cimport uint32_t
from libcpp.algorithm cimport sort as stdsort
from libcpp.vector cimport vector

ctypedef struct interval:
    uint32_t start
    uint32_t end

ctypedef vector[uint32_t] intvec
ctypedef vector[interval] interval_vector


cdef uint32_t compare_start_end(interval lhs, interval rhs):
    if (lhs.start < rhs.start):
        return <uint32_t> 1
    elif (rhs.start < lhs.start):
        return <uint32_t> 0
    elif (lhs.end < rhs.end):
        return <uint32_t> 1
    else:
        return <uint32_t> 0


cdef test(interval_vector intervals):
    stdsort(intervals.begin(), intervals.end(), compare_start_end)

Compile with:

folder_with_Python_h="/mnt/work/endrebak/software/anaconda/include/python3.7m/"
cython --cplus minimal_example.pyx
gcc -I $folder_with_Python_h  -c minimal_example.cpp -o minimal_example.o -Ofast -Wall -std=c++11

The same error appears on macOS, but not on Linux.

Other commands I have tried, giving the same results:

g++  -I /Users/endrebakkenstovner/anaconda3/include/python3.6m/ -stdlib=libc++  -c minimal_example.cpp -o minimal_example.o -Ofast -Wall
gcc  -I /Users/endrebakkenstovner/anaconda3/include/python3.6m/  -c minimal_example.cpp -o minimal_example.o -Ofast -Wall -lc++

Attempts to change the Cython code

Adding these lines before cdef test:

cdef extern from "<algorithm>" namespace "std":
    void stdsort(...)

results in (on both Linux and macOS):

Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "<algorithm>" namespace "std":
    void stdsort(...)


cdef test(interval_vector intervals):
    stdsort(intervals.begin(), intervals.end(), compare_start_end)
          ^
------------------------------------------------------------

minimal_example.pyx:30:11: ambiguous overloaded method

Adding "sort" at the end of stdsort(...), as suggested in the comments, results in (on both Linux and macOS):

Error compiling Cython file:
------------------------------------------------------------
...
  else:
    return <uint32_t> 0


cdef extern from "<algorithm>" namespace "std":
    void stdsort(...) "sort"
                     ^
------------------------------------------------------------

Answer 1:

The basic problem is that Cython insists on specifying the template arguments. Instead of generating C++ code that looks like:

std::sort(__pyx_v_intervals.begin(), __pyx_v_intervals.end(), __pyx_f_5epic2_3src_8read_bam_compare_start_end);

it generates

std::sort<std::vector<__pyx_t_5epic2_3src_8read_bam_interval> ::iterator,uint32_t (__pyx_t_5epic2_3src_8read_bam_interval, __pyx_t_5epic2_3src_8read_bam_interval)>(__pyx_v_intervals.begin(), __pyx_v_intervals.end(), __pyx_f_5epic2_3src_8read_bam_compare_start_end);

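Generally in C++ it's better to let the compiler deduce the template arguments. In this case I think Cython has probably got the function-pointer argument wrong: it passes the function type uint32_t (interval, interval) as the comparator's template argument, whereas deduction would produce the pointer type uint32_t (*)(interval, interval).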
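The error output hints at why only macOS is affected: libc++ (the standard library used there) re-declares the comparator as a reference, _Compare&, before forwarding it to an internal __sort, while libstdc++ on Linux does not appear to. With _Compare spelled as a function type, the by-value parameter decays to a pointer but _Compare& becomes a reference to function, and the two no longer match. The sketch below is a simplified model of that forwarding, not libc++'s actual source; my_sort, sort_impl and the other names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct interval {
    std::uint32_t start;
    std::uint32_t end;
};

bool less_start_end(interval lhs, interval rhs) {
    if (lhs.start != rhs.start) return lhs.start < rhs.start;
    return lhs.end < rhs.end;
}

// Hypothetical internal helper: the comparator type is given explicitly.
template <class CompRef, class It>
void sort_impl(It first, It last, CompRef comp) {
    std::sort(first, last, comp);
}

// Hypothetical entry point, modelled loosely on libc++'s forwarding.
template <class It, class Compare>
void my_sort(It first, It last, Compare comp) {
    // If Compare is a function type, comp has already decayed to a pointer,
    // but Compare& is a reference to function, so this call cannot bind.
    using CompRef = Compare&;
    sort_impl<CompRef>(first, last, comp);
}

using iter = std::vector<interval>::iterator;

void sort_deduced(std::vector<interval>& v) {
    // Compare is deduced as bool (*)(interval, interval): compiles.
    my_sort(v.begin(), v.end(), less_start_end);
}

void sort_explicit_pointer(std::vector<interval>& v) {
    // Spelling the pointer type explicitly also compiles.
    my_sort<iter, bool (*)(interval, interval)>(v.begin(), v.end(), less_start_end);
}

// The pattern Cython emits; uncommenting it should give an analogous error:
// void sort_function_type(std::vector<interval>& v) {
//     my_sort<iter, bool(interval, interval)>(v.begin(), v.end(), less_start_end);
// }
```

This is also why the compiler's "dereference the argument with *" hint appears: it would turn the pointer back into something the function reference can bind to.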

The solution is to not tell Cython that you have a template function. This involves re-wrapping the functions yourself rather than using Cython's libcpp wrappers. One option is to just specify all the types: they don't have to match exactly, but they have to be close enough that Cython thinks it can pass the right arguments:

cdef extern from "<algorithm>" namespace "std":
    # code is untested because this isn't the solution I used...
    void sort(vector[interval].iterator, vector[interval].iterator,
              uint32_t (*)(interval, interval))

I prefer just using ... (an ellipsis as the argument list). This was designed for wrapping C varargs functions like printf, where you can pass anything, but it works perfectly well here too:

cdef extern from "<algorithm>" namespace "std":
    void sort(...)
    # to rename to stdsort do
    void stdsort "sort"(...)

The end result is that Cython stops trying to tell C++ what the template arguments should be.


std::unique is a little more complicated since it has a return type, so Cython needs to know at least one template argument. Fortunately, I'm fairly sure that only the last argument causes problems, so you can safely tell Cython that the first argument is a template parameter:

cdef extern from "<algorithm>" namespace "std":
    Iter unique[Iter](Iter, Iter, ...)
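For the drop_duplicates case, the usual C++ idiom pairs std::sort with std::unique and then erases the tail, since std::unique only moves the retained elements to the front and returns the new logical end (which is the return value Cython needs the Iter type for). Note that std::unique needs an equality predicate, not the less-than comparator used for sorting. A minimal sketch, where same_interval and drop_duplicate_intervals are hypothetical names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct interval {
    std::uint32_t start;
    std::uint32_t end;
};

// Strict weak ordering on (start, end), used for sorting.
bool compare_start_end(const interval& lhs, const interval& rhs) {
    if (lhs.start != rhs.start) return lhs.start < rhs.start;
    return lhs.end < rhs.end;
}

// Hypothetical equality predicate for std::unique.
bool same_interval(const interval& lhs, const interval& rhs) {
    return lhs.start == rhs.start && lhs.end == rhs.end;
}

// Sort so duplicates are adjacent, collapse them, then erase the leftover tail.
void drop_duplicate_intervals(std::vector<interval>& intervals) {
    std::sort(intervals.begin(), intervals.end(), compare_start_end);
    std::vector<interval>::iterator new_end =
        std::unique(intervals.begin(), intervals.end(), same_interval);
    intervals.erase(new_end, intervals.end());
}
```

Both calls let the compiler deduce every template argument, which is the behaviour the ... declarations restore on the Cython side.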


Tags: c++ macos cython