Google Refine recipe for reconciling messy entities

Posted 2019-04-01 00:42

Question:

I have two databases of messy names such as these:

  • Jindal, Bobby
  • Fla. Gov. Bobby Jindal
  • Bobby Jindal
  • 3M Corp.
  • 3M Menomonie

I need to find the matches. Can anyone point me to or suggest a good recipe for how to do this in Google Refine?

This link gives me a starting point but I could use further advice: http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/

Answer 1:

You could try our Refine extension; see especially the reconciliation part of the documentation.



Answer 2:

The cell.cross function is similar to VLOOKUP in Excel: it matches only when the two cells are identical. To use this method, you will first need to cluster and clean your data extensively.
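
To illustrate why cleaning has to come before an exact-match lookup, here is a minimal Python sketch (not GREL, and not Refine's own code). The fingerprint function is loosely modeled on the key-collision idea behind Refine's clustering. The function names, and the split of the sample names into db_a and db_b, are assumptions made for the example.

```python
import string
import unicodedata


def fingerprint(name: str) -> str:
    """Build a normalized key: lowercase, no punctuation, sorted unique tokens."""
    # Fold accented characters to their ASCII base form where possible.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace punctuation with spaces so "Jindal, Bobby" splits cleanly.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = folded.lower().translate(table).split()
    return " ".join(sorted(set(tokens)))


def cross_by_key(left, right):
    """Exact-match lookup on the fingerprint key (hypothetical helper)."""
    index = {}
    for name in right:
        index.setdefault(fingerprint(name), []).append(name)
    # Each left-hand name maps to the right-hand names sharing its key.
    return {name: index.get(fingerprint(name), []) for name in left}


if __name__ == "__main__":
    # Hypothetical split of the sample names into two databases.
    db_a = ["Jindal, Bobby", "Fla. Gov. Bobby Jindal", "3M Corp."]
    db_b = ["Bobby Jindal", "3M Menomonie"]
    for name, matches in cross_by_key(db_a, db_b).items():
        print(f"{name!r} -> {matches}")
```

With this key, "Jindal, Bobby" and "Bobby Jindal" collide and match, but "Fla. Gov. Bobby Jindal" still does not, because the extra title tokens change the key. Cases like that are where an exact lookup stops being enough and fuzzier matching is needed.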

I agree with Michael's answer. Try a reconciliation service: either the RDF one or Open Reconcile.