Should I use NetBeans or SPARQL in Protégé?

Posted 2019-06-09 08:49

Question:

I have a question about my project: I do not know whether I need to work with NetBeans or not. My work is on a recommendation system for library books, which takes a book classification ontology as input. The ontology classifies library books into 14 categories, alongside the sibling classes Author, Book, and Isbn. The individuals in the Book class are the books' subjects (about 600 subjects), the individuals in the Author class are the authors' names, and similarly for the Isbn class.

I have also manually collected, for part of the books, which categories they belong to. An object property named "hasSubject" relates individuals of the Book class to categories. For example, book "A" hasSubject categories "S" and "F", and so on. As the final result, I want to apply this formula:

sim(x, y) = C1,1 / (C1,0 + C0,1 + C1,1)

where C1,1 is the number of categories that both book "X" and book "Y" belong to, C1,0 is the number of categories that book "X" belongs to but book "Y" does not, and C0,1 is the number of categories that book "Y" belongs to but book "X" does not. This gives the similarity between two books ("A" and "B"). The formula is then applied again to book "A" and book "C", and so on, until the similarity between all pairs of books has been obtained. In your opinion, should this work be done in NetBeans or with SPARQL in Protégé?
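
To make the three counts concrete, here is a minimal Python sketch of the formula, independent of both NetBeans and SPARQL. The function names and the example categories are hypothetical, and returning 0.0 for two books without any categories is an assumption the question does not cover.

```python
def category_counts(x_cats, y_cats):
    """Return (c11, c10, c01) for two collections of category names."""
    x, y = set(x_cats), set(y_cats)
    c11 = len(x & y)  # categories both books belong to
    c10 = len(x - y)  # categories only book X belongs to
    c01 = len(y - x)  # categories only book Y belongs to
    return c11, c10, c01


def sim(x_cats, y_cats):
    """sim(x, y) = C11 / (C10 + C01 + C11)."""
    c11, c10, c01 = category_counts(x_cats, y_cats)
    total = c10 + c01 + c11
    # Assumption: two books with no categories at all get similarity 0.0.
    return c11 / total if total else 0.0


# Hypothetical books and categories, for illustration only.
book_a = {"S", "F", "T"}
book_b = {"F", "T", "Q"}
print(category_counts(book_a, book_b))  # (2, 1, 1)
print(sim(book_a, book_b))              # 0.5
```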

One idea I had is to create a hasSibling property so that, for every book, the categories it shares with other books could be computed. What do you think?

Answer 1:

You can compute this kind of metric using SPARQL, though it's a bit ugly. Let's assume some data like this:

prefix dcterms: <http://purl.org/dc/terms/>
prefix : <http://example.org/books/>

:book1 a :Book ; dcterms:subject :subject1 , :subject2, :subject3 .
:book2 a :Book ; dcterms:subject :subject2 , :subject3, :subject4 .
:book3 a :Book ; dcterms:subject :subject4 , :subject5 .

There are three books. Books 1 and 2 have two subjects in common, and each has one that the other does not have. Books 2 and 3 have one subject in common, but Book 2 has two that Book 3 does not have, while Book 3 has only one that Book 2 does not have. Books 1 and 3 have no subjects in common.

The trick here is to use some nested subqueries, and to grab the different values (C10, C01, and C11) at different levels in the nesting. The innermost query is

select ?book1 ?book2 (count(?left) as ?c10) where {
  :Book ^a ?book1, ?book2 .
  FILTER( !sameTerm(?book1,?book2) )
  OPTIONAL { 
    ?book1 dcterms:subject ?left .
    FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
  }
}
group by ?book1 ?book2

which grabs each pair of distinct books and computes the number of subjects that the left book has that the right doesn't. By wrapping this in another query, we can then grab the number of subjects that the right book has that the left doesn't. This makes the query:

select ?book1 ?book2 (count(?right) as ?c01x) (sample(?c10) as ?c10x) where {
  {
    select ?book1 ?book2 (count(?left) as ?c10) where {
      :Book ^a ?book1, ?book2 .
      FILTER( !sameTerm(?book1,?book2) )
      OPTIONAL { 
        ?book1 dcterms:subject ?left .
        FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
      }
    }
    group by ?book1 ?book2
  }

  OPTIONAL { 
    ?book2 dcterms:subject ?right .
    FILTER NOT EXISTS { ?book1 dcterms:subject ?right }
  }
}
group by ?book1 ?book2 

Note that we still have to select ?book1 and ?book2, and sample(?c10) as ?c10x, in order to pass the values outward. (We have to use ?c10x because the name ?c10 has already been used at this scope.) Finally, we wrap this in one more query to get the common subjects and to do the computation, which gives us:

prefix dcterms: <http://purl.org/dc/terms/> 
prefix : <http://example.org/books/> 

select ?book1 ?book2 
       (count(?both) as ?c11)
       (sample(?c10x) as ?c10)
       (sample(?c01x) as ?c01)
       (count(?both) / (count(?both) + sample(?c10x) + sample(?c01x)) as ?sim)
where {
  {
    select ?book1 ?book2 (count(?right) as ?c01x) (sample(?c10) as ?c10x) where {
      {
        select ?book1 ?book2 (count(?left) as ?c10) where {
          :Book ^a ?book1, ?book2 .
          FILTER( !sameTerm(?book1,?book2) )
          OPTIONAL { 
            ?book1 dcterms:subject ?left .
            FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
          }
        }
        group by ?book1 ?book2
      }

      OPTIONAL { 
        ?book2 dcterms:subject ?right .
        FILTER NOT EXISTS { ?book1 dcterms:subject ?right }
      }
    }
    group by ?book1 ?book2 
  }

  OPTIONAL { 
    ?both ^dcterms:subject ?book1, ?book2 .
  }
}
group by ?book1 ?book2
order by ?book1 ?book2

This rather monstrous query, applied to our data, computes these results:

$ arq --data data.n3 --query similarity.sparql
--------------------------------------------
| book1  | book2  | c11 | c10 | c01 | sim  |
============================================
| :book1 | :book2 | 2   | 1   | 1   | 0.5  |
| :book1 | :book3 | 0   | 3   | 2   | 0.0  |
| :book2 | :book1 | 2   | 1   | 1   | 0.5  |
| :book2 | :book3 | 1   | 2   | 1   | 0.25 |
| :book3 | :book1 | 0   | 2   | 3   | 0.0  |
| :book3 | :book2 | 1   | 1   | 2   | 0.25 |
--------------------------------------------
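
As a sanity check outside SPARQL, the same rows can be reproduced with plain set operations. This is only a sketch: the dictionary below copies the subjects from the sample data by hand, and the variable names are made up for illustration.

```python
from itertools import permutations

# Subjects copied by hand from the sample data.
subjects = {
    "book1": {"subject1", "subject2", "subject3"},
    "book2": {"subject2", "subject3", "subject4"},
    "book3": {"subject4", "subject5"},
}

rows = []
# Ordered pairs of distinct books, like the sameTerm filter in the query.
for left, right in permutations(sorted(subjects), 2):
    a, b = subjects[left], subjects[right]
    c11, c10, c01 = len(a & b), len(a - b), len(b - a)
    rows.append((left, right, c11, c10, c01, c11 / (c11 + c10 + c01)))

for row in rows:
    print(row)
```

The printed tuples follow the column order of the table (book1, book2, c11, c10, c01, sim).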

If the FILTER( !sameTerm(?book1,?book2) ) line is removed, so that the similarity of each book to itself is also computed, we see the correct value (1.0) for those pairs:

$ arq --data data.n3 --query similarity.sparql
--------------------------------------------
| book1  | book2  | c11 | c10 | c01 | sim  |
============================================
| :book1 | :book1 | 3   | 0   | 0   | 1.0  |
| :book1 | :book2 | 2   | 1   | 1   | 0.5  |
| :book1 | :book3 | 0   | 3   | 2   | 0.0  |
| :book2 | :book1 | 2   | 1   | 1   | 0.5  |
| :book2 | :book2 | 3   | 0   | 0   | 1.0  |
| :book2 | :book3 | 1   | 2   | 1   | 0.25 |
| :book3 | :book1 | 0   | 2   | 3   | 0.0  |
| :book3 | :book2 | 1   | 1   | 2   | 0.25 |
| :book3 | :book3 | 2   | 0   | 0   | 1.0  |
--------------------------------------------

If you don't need to preserve the individual Cmn values, then you might be able to optimize this, e.g., by computing C10 in the innermost query and C01 in the next (middle) query, but then, instead of projecting both up individually, projecting just their sum (C10 + C01), so that in the outermost query, where you compute C11, you can just do (C11 / (C11 + (C10 + C01))).
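
The arithmetic behind that shortcut can be checked quickly in Python. This is a sketch, not a rewrite of the query: the function names are hypothetical, and it only shows that keeping the sum C10 + C01 gives the same similarity as keeping the two counts separately.

```python
def sim_separate(a, b):
    """Similarity from the three separate counts C11, C10, C01."""
    c11, c10, c01 = len(a & b), len(a - b), len(b - a)
    return c11 / (c11 + c10 + c01)


def sim_summed(a, b):
    """Similarity when only C11 and the sum C10 + C01 are kept."""
    c11 = len(a & b)
    diff = len(a ^ b)  # symmetric difference: its size equals C10 + C01
    return c11 / (c11 + diff)


# Subjects from the sample data; each book is assumed to have at least
# one subject, otherwise the denominator would be zero.
book1 = {"subject1", "subject2", "subject3"}
book2 = {"subject2", "subject3", "subject4"}
book3 = {"subject4", "subject5"}

for a, b in [(book1, book2), (book2, book3), (book1, book3)]:
    assert sim_separate(a, b) == sim_summed(a, b)
    print(sim_summed(a, b))  # 0.5, then 0.25, then 0.0
```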