Google Refine recipe for reconciling messy entities

Posted 2019-04-01 00:07

I have two databases of messy names such as these:

  • Jindal, Bobby
  • Fla. Gov. Bobby Jindal
  • Bobby Jindal
  • 3M Corp.
  • 3M Menomonie

I need to find the matches. Can anyone point me to or suggest a good recipe for how to do this in Google Refine?

This link gives me a starting point but I could use further advice: http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/

2 answers
霸刀☆藐视天下
#2 · 2019-04-01 00:54

You could try our Refine extension; see especially the reconciliation part of the documentation.

放我归山
#3 · 2019-04-01 01:09

The cell.cross function is similar to VLOOKUP in Excel: it matches only when the two cells are identical. If you want to use this method, you will need to cluster and clean your data thoroughly beforehand.
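
To illustrate why the cleaning step matters before an exact-match lookup, here is a minimal Python sketch. It is not Refine's own code. It assumes a fingerprint-style key (ASCII-folded, lowercased, punctuation stripped, unique tokens sorted), loosely modeled on key-collision clustering, and `fingerprint` and `match_by_key` are hypothetical helper names chosen for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a key: ASCII, lowercase, no punctuation, sorted unique tokens."""
    # Fold accented characters to their closest ASCII form.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase and drop everything that is not a word character or whitespace.
    cleaned = re.sub(r"[^\w\s]", "", folded.lower())
    # Sort the unique tokens so that word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


def match_by_key(left, right):
    """Return (left_name, right_name) pairs whose fingerprints are identical."""
    index = defaultdict(list)
    for name in right:
        index[fingerprint(name)].append(name)
    return [(l, r) for l in left for r in index.get(fingerprint(l), [])]


# Sample names taken from the question, split into two illustrative lists.
left = ["Jindal, Bobby", "3M Corp."]
right = ["Bobby Jindal", "Fla. Gov. Bobby Jindal", "3M Menomonie"]
pairs = match_by_key(left, right)
print(pairs)
```

With these sample lists, "Jindal, Bobby" pairs with "Bobby Jindal" once both are reduced to the same key, but "Fla. Gov. Bobby Jindal" still does not match, because the title tokens change the key. Exact matching after simple normalization only gets you part of the way.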

I support Michael's answer. Try a reconciliation service: either the RDF one or Open Reconcile.
