I figured out that CUDA does not work in 64bit mode on my mac (or couldn't get it running so far). Therefore I decided to compile everything for 32bit.
I use cmake 2.8 and added the following options
add_definitions(-Wall -m32)
set(CUDA_64_BIT_DEVICE_CODE OFF)
set(CMAKE_MODULE_LINKER_FLAGS -m32)
However when it tries to link it it does something like this:
/usr/bin/c++ -mmacosx-version-min=10.6 -Wl,-search_paths_first -headerpad_max_install_names CMakeFiles/SimpleTestsCUDA.dir/BlockMatrix.cpp.o CMakeFiles/SimpleTestsCUDA.dir/Matrix.cpp.o ./SimpleTestsCUDA_generated_SimpleTests.cu.o ./SimpleTestsCUDA_generated_BlockMatrix.cu.o -o SimpleTestsCUDA /usr/local/cuda/lib/libcudart.dylib /usr/local/cuda/lib/libcuda.dylib
Which fails with a lot of "file is not of required architecture" warnings from ld. Now if I add manually -m32 to the command above it works. However I have no idea how to teach cmake to add -m32 to every gcc (or ld) invocation. So far it does it for nvcc and gcc, but not for linking..