-->

distant supervision: how to connect named entities

Posted 2019-08-07 12:11

Question:

I'm trying to create a distant supervision corpus. So far I've assembled the data and passed it through an NER system; an example is below.

Original data:

<p>
Myles Brand, the president of the National Collegiate Athletic Association, said in a telephone interview that he had not been approached about whether the N.C.A.A. might oversee a panel for the major bowl games similar to the one that chooses teams for the men's and women's basketball tournaments.
</p>

Processed with Stanford NER:

<p>
<PERSON>Myles Brand</PERSON>, the president of the <ORGANIZATION>National Collegiate Athletic Association</ORGANIZATION>, said in a telephone interview that he had not been approached about whether the <ORGANIZATION>N.C.A.A.</ORGANIZATION> might oversee a panel for the major bowl games similar to the one that chooses teams for the men's and women's basketball tournaments.
</p>

This sentence contains the person Myles Brand and the organization National Collegiate Athletic Association.

In Freebase, these two entities are linked by the relationship President, as you can see here:

Freebase Relationship:

One would think the following query would do the trick, based on this question, but it doesn't, even though, as the picture above shows, Freebase does seem to maintain the relationship between these two entities. Am I doing something wrong?

I've been playing around with it in here.

[{ 
 "type" : "/type/link", 
 "source" : { "id" : "/en/myles_brand" }, 
 "master_property" : null, 
 "target" : { "id" : "/en/national_collegiate_athletic_association" }, 
 "target_value" : null 
}]

Moreover, I have many thousands of entity pairs. I guess I can write a short Java program using the Freebase Java API to look up the relationships for each pair in turn. Does anyone have an example of such a program that I could take a peek at?

What I really want to know, though, is this: once I have the relationships, what is the best way to associate them with a distant supervision corpus? I'm confused about how it all looks once it has been fit together.

Answer 1:

You've got a couple of problems with the Freebase side of things. First, the relationship between Myles Brand and the NCAA isn't a direct one, but is mediated by a node representing his employment. This node has links to the employer, the employee, their title, the start date, and the end date. Second, the reflection queries have stronger directionality than the standard MQL queries and in this case Myles Brand is the target, not the source.
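To make the first point concrete, here is a minimal sketch in Python over a hand-written toy graph, not a call to Freebase. The mediator id `tenure_1` is made up for illustration, the title value is taken from the question, and the helper names `direct_links` and `mediated_links` are hypothetical. It shows why a one-hop lookup between the two entities comes back empty while a two-hop lookup through the employment node succeeds.

```python
# Toy triples in (source, property, target) form. The mediator id
# "tenure_1" is invented for this example; it is not a real Freebase id.
PERSON = "/en/myles_brand"
ORG = "/en/national_collegiate_athletic_association"

TRIPLES = [
    ("tenure_1", "/business/employment_tenure/person", PERSON),
    ("tenure_1", "/business/employment_tenure/company", ORG),
    ("tenure_1", "/business/employment_tenure/title", "President"),
]


def direct_links(source, target):
    """Properties that link source to target in a single hop."""
    return [p for s, p, t in TRIPLES if s == source and t == target]


def mediated_links(a, b):
    """Find nodes that link to both a and b (two hops via a mediator)."""
    by_source = {}
    for s, p, t in TRIPLES:
        by_source.setdefault(s, []).append((p, t))
    found = []
    for mediator, edges in by_source.items():
        to_a = [p for p, t in edges if t == a]
        to_b = [p for p, t in edges if t == b]
        for pa in to_a:
            for pb in to_b:
                found.append((mediator, pa, pb))
    return found


print(direct_links(PERSON, ORG))    # no single-hop link in either direction
print(direct_links(ORG, PERSON))
print(mediated_links(PERSON, ORG))  # the employment node connects both
```

Note that in this toy graph both entities are targets of links from the mediator, which mirrors the second point about directionality.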

This query will show you the links to the /business/employment_tenure nodes:

[{
  "type": "/type/link",
  "source": {
    "id": null
  },
  "master_property": null,
  "target": {
    "id": "/en/myles_brand"
  }
}]

but it would need to be extended to deal with the multi-hop relationship that you're trying to find (and also extract the title).

Rather than doing this using reflection, you could test for the relationships directly if you've got a small enough set of them that you're interested in.

For example, you could test for an employment relationship (and fetch the title, if any) using:

[{  
 "/business/employment_tenure/person" : { "id" : "/en/myles_brand" }, 
 "/business/employment_tenure/company" : { "id" : "/en/national_collegiate_athletic_association" }, 
 "/business/employment_tenure/title": null
}]
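
For many thousands of pairs, the same query can be generated per pair. The sketch below uses Python rather than Java, and only builds the query JSON; it deliberately makes no network call, since the client and endpoint are not specified in this thread. The function name `employment_query` and the `pairs` list are hypothetical.

```python
import json


def employment_query(person_id, company_id):
    """Build the employment-tenure query for one (person, company) pair."""
    return [{
        "/business/employment_tenure/person": {"id": person_id},
        "/business/employment_tenure/company": {"id": company_id},
        # None serializes to JSON null, which asks for the title to be filled in.
        "/business/employment_tenure/title": None,
    }]


# Hypothetical input: (person id, organization id) pairs from the NER output,
# already resolved to Freebase ids.
pairs = [
    ("/en/myles_brand", "/en/national_collegiate_athletic_association"),
]

# One serialized query per pair, ready to hand to whatever client you use.
queries = {pair: json.dumps(employment_query(*pair)) for pair in pairs}

for pair, query in queries.items():
    print(pair, query)
```

A pair whose query returns a non-empty result can then be treated as having an employment relation, with the returned title as its label.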