I don't have a Fermi card at the moment, but the target platform is Tesla/Fermi. My question is whether Fermi supports using OpenMP like this:
#pragma omp parallel for num_threads(N)
for (int i = 0; i < 1000; ++i)
{
    int threadID = omp_get_thread_num();
    cudafunctions<<<blocks, threads, 1024, streams[threadID]>>>(input + i * colsizeofinput);
} // where there are N streams created.
Yes, something like that is possible. OpenMP doesn't provide any specific benefit when trying to launch multiple kernels to the same device (beyond what streams provide) and isn't necessary to achieve concurrent execution of kernels, if that is your intent.
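To illustrate the point about streams, here is a minimal sketch of the same loop issued from a single host thread, with no OpenMP involved. The kernel `scaleRow`, the stream count, and the row length are hypothetical stand-ins for `cudafunctions`, `N`, and `colsizeofinput` from the question. Whether the kernels actually overlap depends on the device and on how many resources each launch uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the question's cudafunctions.
__global__ void scaleRow(float *row, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) row[idx] *= 2.0f;
}

int main()
{
    const int numStreams = 4;   // assumed value, plays the role of N
    const int numRows    = 1000;
    const int rowLen     = 256; // assumed value, plays the role of colsizeofinput

    float *input = nullptr;
    cudaMalloc(&input, sizeof(float) * numRows * rowLen);
    cudaMemset(input, 0, sizeof(float) * numRows * rowLen);

    cudaStream_t streams[numStreams];
    for (int s = 0; s < numStreams; ++s)
        cudaStreamCreate(&streams[s]);

    // One host thread issues every launch. Launches are asynchronous, so
    // kernels placed in different streams are allowed to run concurrently.
    for (int i = 0; i < numRows; ++i)
        scaleRow<<<1, rowLen, 0, streams[i % numStreams]>>>(input + i * rowLen, rowLen);

    // Wait for all streams to drain and report any error.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < numStreams; ++s)
        cudaStreamDestroy(streams[s]);
    cudaFree(input);
    return 0;
}
```

Because kernel launches return immediately, the host loop itself is rarely the bottleneck, which is why adding host threads does not buy extra concurrency on a single device.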
A typical use case for OpenMP with CUDA is to manage multiple devices.
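As a rough sketch of that multi-device pattern, the following assumes one OpenMP thread per GPU, with each thread binding to its own device through `cudaSetDevice`. The kernel `fill` and the buffer size are made up for illustration. With gcc as the host compiler it would typically be built with something like `nvcc -Xcompiler -fopenmp`.

```cuda
#include <cstdio>
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical kernel: writes a constant into every element of a buffer.
__global__ void fill(int *out, int n, int value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = value;
}

int main()
{
    int numDevices = 0;
    cudaGetDeviceCount(&numDevices);
    if (numDevices < 1) {
        printf("no CUDA devices found\n");
        return 1;
    }

    // Request one host thread per device. If the runtime grants fewer
    // threads, the remaining devices are simply left unused.
    #pragma omp parallel num_threads(numDevices)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev); // later CUDA calls in this thread target this device

        const int n = 1024; // assumed buffer size
        int *buf = nullptr;
        cudaMalloc(&buf, n * sizeof(int));

        fill<<<(n + 255) / 256, 256>>>(buf, n, dev);
        cudaError_t err = cudaDeviceSynchronize();

        #pragma omp critical
        printf("device %d: %s\n", dev, cudaGetErrorString(err));

        cudaFree(buf);
    }
    return 0;
}
```

Each thread owns its device's allocations and launches, so the per-device work proceeds independently without any explicit coordination between threads.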