I have been struggling with this problem all day. I have two dataframes as follows:
Dataframe 1 - Billboards
Dataframe 2
I would like to merge Dataframe 2 with Dataframe 1 on the Song column, ending up with a dataframe that has SongId, Song, Rank and Year. The problem is that the song titles are not stored identically in both dataframes. For example, a Song in Billboards can be "macarena bayside boys mix" while the same Song in Dataframe 2 is just "macarena". So I want to match songs by similarity rather than by exact title.
I think you need to calculate a similarity measure between the song titles in df1 and df2. I gave it a try by calculating the cosine distance between the songs in df1 and df2 on a randomly generated song list.
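A minimal sketch of that idea in plain Python, assuming each title is treated as a bag of lowercase words (the exact vectorization used in the attempt above isn't shown; a TF-IDF or character n-gram vectorizer would be a common alternative). Cosine distance is 1 minus cosine similarity, so the best match is the candidate with the highest similarity. The helpers `cosine_similarity` and `best_match` and the sample titles are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two titles using bag-of-words counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only words present in both titles contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(song: str, candidates: list) -> tuple:
    """Return (candidate, similarity) with the highest similarity, i.e. lowest cosine distance."""
    scores = [(c, cosine_similarity(song, c)) for c in candidates]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical titles standing in for the Song column of df2.
df2_songs = ["macarena", "yesterday", "hey jude"]
match, score = best_match("macarena bayside boys mix", df2_songs)
print(match, score)  # macarena 0.5
```

Word counts keep the sketch dependency-free; with real data, weighting rare words more heavily (TF-IDF) usually separates titles better than raw counts.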
Once you have the best match for each song, you can look up its SongId in df2.
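A sketch of the lookup step with pandas, assuming the similarity step has already produced a mapping from each df1 title to its best df2 title (the `best_match` dict here is a hypothetical stand-in for that output, and the sample rows are made up). Renaming df2's Song column before the merge avoids a column-name clash and leaves the original Billboard title in the result.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df1 = pd.DataFrame({
    "Song": ["macarena bayside boys mix", "yesterday"],
    "Rank": [1, 2],
    "Year": [2000, 2001],
})
df2 = pd.DataFrame({
    "SongId": [101, 102],
    "Song": ["macarena", "yesterday"],
})

# Assumed output of the similarity step: df1 title -> best-matching df2 title.
best_match = {"macarena bayside boys mix": "macarena", "yesterday": "yesterday"}
df1["MatchedSong"] = df1["Song"].map(best_match)

# Look up SongId by merging on the matched title; a left merge keeps every df1 row.
result = df1.merge(
    df2.rename(columns={"Song": "MatchedSong"}),
    on="MatchedSong",
    how="left",
)[["SongId", "Song", "Rank", "Year"]]
print(result)
```

Rows whose best match falls below whatever similarity threshold you choose could be left unmapped; with `how="left"` they would then keep a missing SongId instead of disappearing.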
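The easiest way to do it is to make "Song" the index in both dataframes and then join them:
df1 = df1.set_index('Song')
df2 = df2.set_index('Song')
joined = df1.join(df2, how='inner')
Note that this only matches titles that are exactly equal, so a variant such as "macarena bayside boys mix" will not join to "macarena".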
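A small self-contained run of the index join, using hypothetical sample rows, shows what an inner join on the raw titles keeps: only exact matches survive, which is why it works best after titles have been normalized or matched by similarity.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df1 = pd.DataFrame({"Song": ["macarena bayside boys mix", "yesterday"],
                    "Rank": [1, 2], "Year": [2000, 2001]})
df2 = pd.DataFrame({"Song": ["macarena", "yesterday"], "SongId": [101, 102]})

# Use the title as the key on both sides, then inner-join on the index.
joined = df1.set_index("Song").join(df2.set_index("Song"), how="inner")
print(joined)
# Only "yesterday" remains, because "macarena bayside boys mix" != "macarena".
```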