I'm having trouble using the FuzzyWuzzy library to store all of my match results in a data frame column (I'm guessing it might require a loop?). I've been scratching my head over this all day, so I'd like to see if any of you can help me with a solution. It would be super helpful!
As an example of what I'm trying to do, here are two data frame tables:
Master Table
+----+-----------------+
| ID | ITEM            |
+----+-----------------+
| 1  | Pepperoni Pizza |
| 2  | Cheese Pizza    |
| 3  | Chicken Salad   |
| 4  | Plain Salad     |
+----+-----------------+
Lookup Table
+--------------+---+
| LOOKUP VALUE | - |
+--------------+---+
| Cheese       | - |
| Salad        | - |
+--------------+---+
Essentially, I'm trying to match each of the lookup table's values against the entire list of values in the Master table, and store the results in a third table.
Here's how I want the final output to look:
+--------------+----------------------------+-------------------+
| LOOKUP VALUE | MATCHED VALUES             | MATCHED VALUE IDS |
+--------------+----------------------------+-------------------+
| Cheese       | Cheese Pizza               | 2                 |
| Salad        | Chicken Salad, Plain Salad | 3,4               |
+--------------+----------------------------+-------------------+
I know the very basics of FuzzyWuzzy. Here's how I started:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
choices = ["Pepperoni Pizza", "Cheese Pizza", "Chicken Salad", "Plain Salad"]
process.extract("salad", choices, limit=2)
# Output: [('Chicken Salad', 90), ('Plain Salad', 90)]
Great, but how do you do that in a systematic way, running all my lookup values against all the values in the master table?
Thanks a ton for hearing me out!
It's not a good idea to store lists in a DataFrame; I suggest storing every match as its own row instead. Here is the code:
output:
Basically, I create a choices dict from master to match against, then loop over lookups and store the results in a list. Finally, I convert the list to a DataFrame.