Could someone please provide some information on how to properly combine a self-organizing map (SOM) with a multilayer perceptron (MLP)?
I recently read some articles comparing this technique to regular MLPs, and it performed considerably better in prediction tasks. So I want to use the SOM as a front-end for dimensionality reduction by clustering the input data, and then pass the results to an MLP back-end.
My current idea for implementing it is to train the SOM on a couple of training sets and determine the clusters. Afterwards, I would initialize the MLP with as many input units as there are SOM clusters. The next step would be to train the MLP using the SOM's output as input to the network (but which value? The weights of the BMU? The SOM's output fed to the input unit matching the cluster, and zeros for all other input units?).
There is no single way of doing that. Let me list some possibilities (a short code sketch of the last four follows the list):
- The one you describe. But then, your MLP will need to have K*D inputs, where K is the number of clusters and D is the input dimension. There is no dimensionality reduction.
- Similar to your idea, but instead of using the weights, just send 1 for the BMU and 0 for the remaining clusters. Then your MLP will need K inputs.
- Same as above, but instead of 1 or 0, send the distance from the input vector to each cluster.
- Same as above, but instead of the raw distance, compute a Gaussian activation for each cluster, e.g. exp(-d^2 / (2*sigma^2)) where d is that distance.
- Since the SOM preserves topology, send only the 2D coordinates of the BMU (possibly normalized between 0 and 1). Then your MLP will need only 2 inputs and you achieve truly extreme dimensionality reduction.
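Here is a minimal NumPy sketch of those last four encodings, assuming a SOM has already been trained; the random `codebook` and all names below are illustrative stand-ins for a real trained map, not anything prescribed:

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols, D = 10, 10, 64                    # hypothetical 10x10 SOM over 64-dim inputs
codebook = rng.normal(size=(rows, cols, D))   # stand-in for the trained SOM weight grid
flat = codebook.reshape(-1, D)                # K = rows*cols cluster prototypes

def distances(x):
    """Euclidean distance from input x to every SOM unit (the distance encoding)."""
    return np.linalg.norm(flat - x, axis=1)

def one_hot_bmu(x):
    """1 for the best-matching unit, 0 for all other units (the 1/0 encoding)."""
    out = np.zeros(flat.shape[0])
    out[np.argmin(distances(x))] = 1.0
    return out

def gaussian_activations(x, sigma=1.0):
    """Gaussian activation per unit (the Gaussian encoding)."""
    d = distances(x)
    return np.exp(-d**2 / (2 * sigma**2))

def bmu_coordinates(x):
    """Normalized 2D grid coordinates of the BMU (the 2D-coordinate encoding)."""
    i, j = np.unravel_index(np.argmin(distances(x)), (rows, cols))
    return np.array([i / (rows - 1), j / (cols - 1)])

x = rng.normal(size=D)                        # one input vector
mlp_input = bmu_coordinates(x)                # 2 values instead of D = 64
```

Whichever encoding you pick then simply becomes the MLP's input vector, e.g. the feature matrix you hand to scikit-learn's MLPClassifier or your own MLP implementation.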
You can read about these ideas and more here: Principal temporal extensions of SOM: Overview. It is not about feeding the output of a SOM to an MLP, but about feeding a SOM back to itself; still, it will help you understand the various ways of producing output from a SOM.
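And if it helps to see the whole front-end/back-end pipeline in one place, here is a hedged end-to-end sketch; MiniSom and scikit-learn are my choices for illustration, not anything the discussion above requires, and the data and labels are toys:

```python
import numpy as np
from minisom import MiniSom
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))               # toy 64-dim inputs
y = (X[:, 0] > 0).astype(int)                # toy binary labels

# SOM front-end: 10x10 grid trained on the raw inputs.
som = MiniSom(10, 10, 64, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)

# Encode each sample as the normalized 2D BMU coordinates (the last option above).
coords = np.array([som.winner(x) for x in X]) / 9.0

# MLP back-end trained on the 2-dimensional encoding.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(coords, y)
print(mlp.score(coords, y))
```

Swapping in the one-hot, distance, or Gaussian encoding only changes the `coords` line; the MLP stays the same apart from its input width.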