Using Spark ML transformers, I arrived at a DataFrame where each row looks like this:
Row(object_id, text_features_vector, color_features, type_features)
where text_features_vector is a sparse vector of term weights, color_features is a small 20-element one-hot-encoded dense vector of colors, and type_features is also a one-hot-encoded dense vector of types.
What would be a good approach, using Spark's facilities, to merge these features into one single, large array, so that I can measure things like the cosine distance between any two objects?
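To make the goal concrete, here is a minimal plain-Python sketch (deliberately not Spark code) of the computation on two toy rows: concatenate the three feature vectors of each object, then compare the merged vectors. The helper names `to_dense`, `merge_features`, and `cosine_distance`, the dict-based sparse representation, and all values are hypothetical and chosen only for illustration.

```python
import math


def to_dense(sparse, size):
    """Expand a {index: weight} mapping into a dense list of length `size`."""
    dense = [0.0] * size
    for index, weight in sparse.items():
        dense[index] = weight
    return dense


def merge_features(text_sparse, text_size, color_features, type_features):
    """Concatenate the text, color, and type features into one flat list."""
    return to_dense(text_sparse, text_size) + list(color_features) + list(type_features)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy data (assumed): 5 text terms, 3 colors, 2 types instead of the real sizes.
obj_a = merge_features({0: 1.0, 3: 2.0}, 5, [1.0, 0.0, 0.0], [0.0, 1.0])
obj_b = merge_features({0: 1.0, 4: 1.0}, 5, [1.0, 0.0, 0.0], [1.0, 0.0])

distance = cosine_distance(obj_a, obj_b)
print(distance)
```

The sketch densifies the sparse text vector only to keep the example short; with a large vocabulary I would want the merged result to stay sparse, which is part of why I am looking for a Spark-native way to do this.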