I have a dataframe of the following format.
df
A B Target
5 4 3
1 3 4
I am finding the correlation of each column (except Target) with the Target column using pd.DataFrame(df.corr().iloc[:-1,-1]).
But the issue is that my actual dataframe has shape (216, 72391), which takes at least 30 minutes to process on my system. Is there any way to parallelize it using a GPU? I need to compute values like this many times, so I can't wait the usual 30 minutes each time.
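A likely cause of the slowdown is that `df.corr()` builds the full pairwise correlation matrix (about 72391 × 72391 entries) even though only its last column is used. The sketch below computes only the correlations with `Target`, once with pandas' `DataFrame.corrwith` and once directly in NumPy. It assumes all columns are numeric with no missing values. `corr_with_target` is a hypothetical helper, and the random data only mimics the question's 216 rows.

```python
import numpy as np
import pandas as pd


def corr_with_target(df: pd.DataFrame, target: str = "Target") -> pd.Series:
    # Hypothetical helper: assumes numeric columns and no missing values.
    X = df.drop(columns=target).to_numpy(dtype=np.float64)
    y = df[target].to_numpy(dtype=np.float64)

    # Center every feature column and the target.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Pearson r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2)), for all columns at once.
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return pd.Series(num / den, index=df.columns.drop(target), name=target)


# Random data with 216 rows but far fewer columns, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((216, 50)),
                  columns=[f"c{i}" for i in range(49)] + ["Target"])

fast = corr_with_target(df)                                 # vectorized NumPy
builtin = df.drop(columns="Target").corrwith(df["Target"])  # pandas, no full matrix
slow = df.corr().iloc[:-1, -1]                              # the question's approach
```

Because the NumPy version is just a few array operations, porting it to a NumPy-like GPU library such as CuPy is a plausible next step. That port is not shown or tested here.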
Here, I have tried to implement your operation using Numba. Link to Colab notebook.
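The notebook's code is not reproduced here. The following is an independent rough sketch of what a Numba version might look like. It computes each column's Pearson correlation with the target, distributing columns across threads with `njit(parallel=True)` and `prange`. If Numba is not installed, the `try`/`except` fallback lets it run as plain (slow) Python. `corr_columns_with_target` is a hypothetical name.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # Fallback so the sketch still runs without Numba (no speedup).
    prange = range

    def njit(*args, **kwargs):
        # Support both @njit and @njit(parallel=True) forms as a no-op.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit(parallel=True)
def corr_columns_with_target(X, y):
    """Pearson correlation of each column of X (n_rows x n_cols) with y."""
    n_rows, n_cols = X.shape
    y_centered = y - y.mean()
    y_ss = (y_centered * y_centered).sum()
    out = np.empty(n_cols)
    for j in prange(n_cols):  # columns are independent, so they can run in parallel
        col = X[:, j]
        col_mean = col.mean()
        num = 0.0
        col_ss = 0.0
        for i in range(n_rows):
            d = col[i] - col_mean
            num += d * y_centered[i]
            col_ss += d * d
        out[j] = num / np.sqrt(col_ss * y_ss)
    return out


rng = np.random.default_rng(0)
# Fortran order keeps each column contiguous in memory, which suits column-wise access.
X = np.asfortranarray(rng.standard_normal((216, 1000)))
y = rng.standard_normal(216)
r = corr_columns_with_target(X, y)
```

The first call includes JIT compilation time, so timings are only meaningful from the second call onward.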
You should take a look at Dask. It should be able to do what you want and a lot more: Dask DataFrame mirrors much of the pandas API and parallelizes most of its functions.
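Calling `.corr()` on a Dask DataFrame would still return the full column-by-column matrix. A minimal sketch of another way to use Dask here is `dask.array`, with the columns split into chunks that are processed in parallel. It assumes Dask is installed and that the data fits in memory as a float64 array (216 × 72391 is roughly 125 MB). `corr_with_target_dask` and `chunk_cols` are hypothetical names.

```python
import numpy as np
import pandas as pd
import dask.array as da


def corr_with_target_dask(df, target="Target", chunk_cols=10_000):
    # Hypothetical helper: numeric columns, no missing values.
    X = da.from_array(df.drop(columns=target).to_numpy(dtype=np.float64),
                      chunks=(df.shape[0], chunk_cols))  # all rows, a slice of columns
    y = df[target].to_numpy(dtype=np.float64)
    yc = y - y.mean()

    # Center each column, then form Pearson r per column; chunks are computed in parallel.
    Xc = X - X.mean(axis=0)
    num = (Xc * yc[:, None]).sum(axis=0)
    den = da.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return pd.Series((num / den).compute(), index=df.columns.drop(target))


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((216, 31)),
                  columns=[f"c{i}" for i in range(30)] + ["Target"])
r = corr_with_target_dask(df, chunk_cols=8)  # small chunks just to exercise chunking
```

At this data size, avoiding the full correlation matrix probably matters more than Dask's parallelism. Dask becomes more useful when the data no longer fits in memory.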