I've got the following issue:
I have some code that performs a very basic operation. I am passing a pointer to a concurrency::array_view because I wanted to store the values ahead of time and avoid a bottleneck in the function that uses multithreading. The problem is that the following construction won't compile:
parallel_for_each((*pixels).extent, [=](concurrency::index<2> idx) restrict(amp)
{
    int row = idx[0];
    int col = idx[1];
    (*pixels)(row, col) = (*pixels)(row, col) * (*taps)(row, col); // this is the problematic place
});
Does anybody know how to solve this? I really need to prepare the data before running the method, so this is the only way I can do it: I cannot afford to spend time copying the data between RAM and the accelerator's memory.
//EDIT:
After solving some issues with the header files, I am left with the following problem:
parallel_for_each((*pixels).extent, [=](concurrency::index<2> idx) restrict(amp)
{
    int row = idx[0];
    int col = idx[1];
});
The code above doesn't work (it throws an exception). Is there ANY way to prepare the data earlier so that, for example, the constructor of the class can handle the copying just once? I really need to have a pointer to array_view in my header file and initialize it in the constructor as follows:
in cci_subset.h:
concurrency::array_view<float, 2> *pixels, *taps;
and in the subset.cpp:
concurrency::array_view<float, 2> pixels(4, 4, pixel_array);
...
concurrency::array_view<float, 2> taps(4, 4, myTap4Kernel_array);
//EDIT 2:
I found out that the parameters for parallel_for_each can only be passed by value. That's why I am still looking for a way to copy the values from CPU to GPU when initializing the class or when passing some arguments (i.e. image data) to the class.
Your C++ AMP issue
C++ AMP supports two core data types for referencing data on the GPU: array and array_view.
An array represents data on an accelerator. You can construct it and
fill it with data in a single step or construct it and fill it with
data later. In either case, after some calculations have been
performed on it, you will almost certainly copy the results from an
array back to the CPU so that you can use them in some other part of
your application.
You can certainly write useful applications using only arrays,
but C++ AMP also offers the array_view, which supports features
that often make it more convenient than working directly with arrays.
An array_view looks like an array to the accelerator, but it saves you
the trouble of arranging to copy the data to and from the
accelerator.
The relationship between an array_view and an array is
somewhat (but not precisely) like that between a reference and the
object it refers to. Like a reference, array views must be initialized
when they are created. Also as with a reference, changing the
array_view changes (eventually) the data it was created from. However,
the reverse is not true: changing the data from which the
array_view was created might not automatically change the array_view,
so you should approach such operations with care.
From: C++ AMP: Accelerated Massive Parallelism
I don't think your use of pixels is the problem per se. You cannot use globally scoped variables within a C++ AMP lambda, period. There is no way around this. The C++ AMP code is executing on a device with a different memory space.
You can, however, initialize your array or array_view objects earlier, in a separate method or constructor, and then pass them to the function that does all the work. The following code does something along these lines. m_frame is a shared pointer to a (C++ AMP) array object that is declared as part of the class and then initialized in ConfigureFrameBuffers.
Note that it uses STL smart pointers, something I would strongly recommend over raw pointers.
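#include <amp.h>
#include <memory>

class FrameProcessorAmpBase
{
private:
    // Accelerator-side buffer, allocated once and reused by later calls.
    std::shared_ptr<concurrency::array<float, 2>> m_frame;

public:
    FrameProcessorAmpBase()
    {
    }

    void ConfigureFrameBuffers(int width, int height)
    {
        // The extent is given as (rows, columns), i.e. (height, width).
        m_frame = std::make_shared<concurrency::array<float, 2>>(height, width);
    }
};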
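To make the capture point concrete, here is a plain C++ sketch. It deliberately avoids amp.h, so it is only an analogy that builds with any compiler, not C++ AMP code. As far as I know, naming a class member inside a restrict(amp) lambda captures `this`, which C++ AMP rejects, so the usual pattern is to copy the member into a local variable and let the lambda capture that local by value. FrameProcessor, Configure, MakeScaleKernel and Data are made-up names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the frame processor: the buffer is created once
// and held through a shared_ptr member, as in the class above.
class FrameProcessor
{
public:
    // Allocate the buffer once, ahead of the processing call.
    void Configure(std::size_t count, float value)
    {
        m_data = std::make_shared<std::vector<float>>(count, value);
    }

    // Returns a kernel that scales every element of the buffer in place.
    std::function<void(float)> MakeScaleKernel() const
    {
        // Copy the member handle into a local first. The lambda captures only
        // this local, by value, and never touches `this`.
        std::shared_ptr<std::vector<float>> data = m_data;
        return [data](float factor)
        {
            for (float& x : *data)
            {
                x *= factor;
            }
        };
    }

    std::shared_ptr<std::vector<float>> Data() const { return m_data; }

private:
    std::shared_ptr<std::vector<float>> m_data;
};
```

Because the kernel holds its own copy of the handle, it stays valid even after the FrameProcessor that created it has gone out of scope. In the AMP version the local would be a copy of the array_view (or a reference to the array, which is captured by reference), and the loop would become the body of parallel_for_each.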
Your min/max header issue
This is probably because you are including windef.h, or something that depends on it. This is a known issue when mixing STL and Windows headers. The way to "fix" it is to define NOMINMAX at the top of your file, before any other includes, and then use the min/max functions declared by the STL or AMP (which also defines min/max for use in restrict(amp) lambdas).
#define NOMINMAX
If you are using GDI+ you'll have issues there too, as it requires the Windows min/max macros.
I wrap gdiplus.h in a wrapper header that contains the following:
#pragma once
#define NOMINMAX
#include <algorithm>
#ifndef max
#define max std::max
#endif
#ifndef min
#define min std::min
#endif
#include <gdiplus.h>
#undef max
#undef min
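A related workaround, for places where NOMINMAX cannot be defined: wrapping the function name in parentheses, as in (std::min)(a, b), stops the preprocessor from expanding a function-like min/max macro. The sketch below is a separate illustration of that trick, not an extension of the wrapper header. The two macros are simulated stand-ins for what windef.h defines, so that it builds without Windows headers, and clamp_to_byte is a made-up helper.

```cpp
#include <algorithm>
#include <cassert>

// Stand-ins for the function-like macros that windef.h defines when NOMINMAX
// is not set (simulated here; this is an assumption for illustration only).
#ifndef max
#define max(a, b) (((a) > (b)) ? (a) : (b))
#endif
#ifndef min
#define min(a, b) (((a) < (b)) ? (a) : (b))
#endif

// Hypothetical helper: clamp a value to the range 0..255.
// A function-like macro is only expanded when its name is followed by '(',
// so (std::min)(...) and (std::max)(...) reach the STL templates instead.
inline int clamp_to_byte(int value)
{
    return (std::max)(0, (std::min)(value, 255));
}

// Remove the simulated macros again so later code is not affected.
#undef max
#undef min
```

For example, clamp_to_byte(300) yields 255 and clamp_to_byte(-5) yields 0, even though the macros were visible where the function was defined.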