I'm trying to normalize data and link records according to legal business entity name.
Where can I look up the legal business name of a company, along with general information about it? I will have at least one of the following: stock symbol, DBA (short name), DNS name, or full legal name.
So far I've discovered that:
- Relying on WHOIS gives me private or out-of-date information
- Wolfram Alpha API gives me most of what I need for public companies but nothing helpful for private companies like LEGO
- Parsing webpages for the (c) symbol may help in the resolution process, but doesn't match a name to an authoritative list.
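A minimal sketch of the copyright-notice idea, assuming the page text has already been downloaded. The function name `extract_copyright_holder` and the pattern are illustrative only, and real notices vary a lot more than this handles (for example, a holder such as "Acme, Inc." would be cut at the period):

```python
import re
from typing import Optional

# Hypothetical pattern: an optional "Copyright" word, a required (c)-style
# marker, an optional year or year range, then the holder up to the next
# period, pipe, newline, or HTML tag.
COPYRIGHT_RE = re.compile(
    r"(?:copyright\s*)?(?:©|\(c\)|&copy;)\s*"
    r"(?:\d{4}(?:\s*[-\u2013]\s*\d{4})?\s*)?"
    r"(?P<holder>[^.|\n<]+)",
    re.IGNORECASE,
)

def extract_copyright_holder(page_text: str) -> Optional[str]:
    """Return the first name that follows a copyright marker, or None."""
    match = COPYRIGHT_RE.search(page_text)
    if match is None:
        return None
    holder = match.group("holder").strip()
    return holder or None
```

The result is only a candidate name; it still has to be checked against an authoritative list.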
Since all stock symbols are catalogued, that one is easy.
How can I convert, normalize, and verify a DBA (short name), DNS name, or full legal name for non-public or non-profit businesses that may even be located overseas?
(e.g. "MET Museum" as the DBA, metmuseum.org as the site, or "Metropolitan Museum of Art" as the legal name)
I'm not sure this is the best place to ask your question. Maybe your local librarian could help. Anyway, I'm answering because I've done a lot of work along these lines in the past, and because I've found that programmers and database designers often know where to find data--especially authoritative and standard data.
At the local level (in the USA), we accepted whatever the local Chamber of Commerce gave us. At the national level, we bought lists from InfoUSA. Chamber of Commerce data can be pretty flaky; InfoUSA data is very clean.
Dun & Bradstreet is the closest I know of to a one-stop global business registry. They're not cheap.
RBA, a company in the UK, seems to have a really useful introduction with a global perspective. See Official Company Registers. Much of the data there is free.
I have been doing some research in this area and found a recent paper that discusses an approach to extracting organization names, discovering them (via clustering), and normalizing them (by an enhanced edit-distance calculation). See NEMO.
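To give a feel for edit-distance normalization, here is a rough sketch. It is not the algorithm from the paper: it uses plain Levenshtein distance and a naive greedy clustering, and the suffix list, the function names, and the 0.85 threshold are all assumptions made for illustration.

```python
import re

# Hypothetical, incomplete list of legal-form suffixes ignored when comparing.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                  "company", "llc", "ltd", "limited", "plc", "gmbh"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized forms of two names."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

def cluster(names, threshold=0.85):
    """Greedy clustering: join the first cluster whose first member is close enough."""
    clusters = []
    for name in names:
        for group in clusters:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

This groups spelling variants and suffix differences (for example "Metropolitan Museum of Art, Inc." and "Metropolitan Musuem of Art"), but an abbreviation such as "MET Museum" is too far from the full name in edit distance to land in the same cluster, so a DBA still needs a lookup against a registry or alias list.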