
Converting Freebase MQL to SPARQL

Posted 2019-04-11 00:06

Question:

The following Freebase MQL query finds 5 artists and up to 50 albums for each artist.

[{
  "type" : "/music/artist",
  "name":null,
  "album" : [{
    "name" : null,
    "count":null,
    "limit":50
  }],
  "limit":5
}]
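
To make the target shape concrete, here is a minimal Python sketch of what the nested limits ask for: at most 5 artists, each with at most 50 of its own albums. The `catalog` data and the `nested_limit` helper are hypothetical, made up for illustration; this is not how Freebase evaluates MQL.

```python
from itertools import islice

def nested_limit(catalog, max_artists, max_albums):
    """Keep the first max_artists artists and, for each, its first max_albums albums."""
    result = {}
    for artist, albums in islice(catalog.items(), max_artists):
        result[artist] = list(islice(albums, max_albums))
    return result

# Hypothetical data: 8 artists with 60 albums each.
catalog = {
    f"artist{i}": [f"artist{i}-album{j:02d}" for j in range(60)]
    for i in range(8)
}

picked = nested_limit(catalog, max_artists=5, max_albums=50)
print(len(picked), [len(albums) for albums in picked.values()])
```

The point is that the inner limit is applied once per artist, which is the part a single flat LIMIT cannot express.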

First try: without a subquery

I can write SPARQL like this:

SELECT ?artist ?album
WHERE
{
    ?artist :type :/music/artist .
    ?artist :album ?album
}
LIMIT n

But I don't know what value of n to specify, because SPARQL has no hierarchy as far as I know.

Second try: with a subquery (not sure this works correctly)

The following subquery looks like it works.

SELECT ?artist ?album
WHERE
{
    ?artist :album ?album .
    {
        SELECT ?artist
        WHERE
        {
            ?artist :type :/music/artist
        }
        LIMIT k
    }
}
LIMIT n

But I don't know how to choose k and n to get 50 albums for each of 5 artists.

Sample data and endpoint

  • SPARQL Endpoint : http://dbpedia.org/sparql

Could anyone write a SPARQL query that prints 5 artists and 5 paintings for each artist?

The query below prints artists and their paintings without limiting the results.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>

SELECT ?painting ?artist
WHERE
{
    ?painting prop:artist ?artist .
    {
        SELECT ?artist
        {
            ?artist rdf:type dbpedia-owl:Artist.
        }
    }
}

Thanks.

Answer 1:

Max and I had a bit of discussion in a chat, and this might end up being the same approach that Max took. I think it's a bit more readable, though. It gets 15 artists with albums, and up to 5 albums for each one. If you want to be able to include artists without any albums, you'd need to make some parts optional.

select ?artist ?album {
  #-- select 15 bands that have albums (i.e., 
  #-- such that they are the artist *of* something).
  {
    select distinct ?artist { 
      ?artist a dbpedia-owl:Band ;
              ^dbpedia-owl:artist []
    }
    limit 15
  }

  #-- grab ordered pairs (x,y) (where x <= y) of their
  #-- albums.  By asking how many x's for each y, we
  #-- get just the first n y's.
  ?artist ^dbpedia-owl:artist ?album, ?album_
  filter ( ?album_ <= ?album ) 
}
group by ?artist ?album
having (count(?album_) <= 5) #-- take up to 5 albums for each artist
order by ?artist ?album
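
The ranking trick in the query above can be modeled in plain Python, as a rough sketch of the idea rather than of how any SPARQL engine evaluates it. The helper `top_n_per_artist` and the sample `pairs` are hypothetical. For each album, count how many of the same artist's albums sort at or before it, and keep the album only if that count is at most n.

```python
from collections import defaultdict

def top_n_per_artist(pairs, n):
    """Keep (artist, album) when at most n of the artist's albums sort <= album."""
    # An RDF graph is a set of triples, so duplicate pairs collapse.
    albums_by_artist = defaultdict(set)
    for artist, album in pairs:
        albums_by_artist[artist].add(album)

    kept = []
    for artist, albums in albums_by_artist.items():
        for album in albums:
            # Mirrors: filter(?album_ <= ?album) ... having (count(?album_) <= n)
            rank = sum(1 for other in albums if other <= album)
            if rank <= n:
                kept.append((artist, album))
    return sorted(kept)  # mirrors: order by ?artist ?album

# Hypothetical data: artist A has 8 albums, artist B has 2.
pairs = [("A", f"a{i}") for i in range(8)] + [("B", "b0"), ("B", "b1")]
print(top_n_per_artist(pairs, 5))
```

Artist A is cut to its first five albums in sort order, while artist B keeps both of its albums. The self-join costs quadratic work per artist, which is the price of expressing a per-group limit this way.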


Answer 2:

Based on the result you want, this involves some kind of nested correlated subquery processing, which is not directly feasible in a single SPARQL query (at least to my understanding; if it is possible, I'm all for it ;) ):

Due to the bottom-up nature of SPARQL query evaluation, the subqueries are evaluated logically first, and the results are projected up to the outer query.

The second LIMIT clause is applied after the join with the subquery has been evaluated, so it only limits the number of results of the outer query.

Using LIMIT k with k = 5 on the subquery in the second try does return the 5 artists you need. Setting n to 50, however, only caps the outer query at 50 album rows in total across those 5 artists, not 50 per artist as you want. Turning the queries inside out would have a similar effect.
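
The effect of the two limits can be illustrated with a small Python model. This is only a sketch: the helper `subquery_then_global_limit` and the sample data are hypothetical, and the row order is an assumption, since SPARQL leaves it unspecified without ORDER BY.

```python
from itertools import islice

def subquery_then_global_limit(albums_by_artist, k, n):
    """Model: inner LIMIT k picks artists, the join adds albums, outer LIMIT n cuts rows."""
    artists = list(islice(albums_by_artist, k))  # inner LIMIT k
    joined = (
        (artist, album)
        for artist in artists
        for album in albums_by_artist[artist]
    )  # join of artists with their albums
    return list(islice(joined, n))  # outer LIMIT n applies to all rows at once

# Hypothetical data: 10 artists with 80 albums each.
albums_by_artist = {
    f"artist{i}": [f"artist{i}-album{j}" for j in range(80)]
    for i in range(10)
}

rows = subquery_then_global_limit(albums_by_artist, k=5, n=50)
artists_seen = {artist for artist, _ in rows}
print(len(rows), sorted(artists_seen))  # 50 ['artist0']
```

With this row order, all 50 rows belong to the first artist, so the other four selected artists never appear in the output.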

EDIT: A possible solution is to build a subquery over all artist/album pairs and restrict it to the albums whose position in some ordering is below 50 (here, ordering by the string form of the album IRI):

PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
SELECT ?artist ?outputAlbum
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum)
        WHERE {
            ?album1 prop:artist ?artist .
            ?album2 prop:artist ?artist .
            FILTER (str(?album2) < str(?album1))
        } 
        GROUP BY ?artist 
        HAVING (count(?album2) <= 50)
        LIMIT 5
    } 
    ?outputAlbum prop:artist ?artist .
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}

EDIT 2: The last query is the naive approach, but there seems to be some inference (under an unknown regime) on the DBpedia endpoint, as shown below. A more exact query would need more filters and DISTINCT clauses. I added a distinct count and a global count to the output to show that there is still some inference somewhere:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX prop:<http://dbpedia.org/property/>
SELECT ?artist ?outputAlbum ?maxedCount ?inferredCrossJoinCount
WHERE 
{
    {
        SELECT ?artist (MAX(str(?album1)) as ?maxedAlbum) (count(distinct ?album2) as ?maxedCount) (count(?album2) as ?inferredCrossJoinCount)
        WHERE {
            ?artist rdf:type dbpedia-owl:Artist .
            ?album1 ?p ?artist .
            ?album2 ?p ?artist .
            FILTER (sameTerm(?p, prop:artist))
            FILTER (str(?album1) < str(?album2))
        } 
        GROUP BY ?artist 
        #HAVING count(?album2)<= 50
        LIMIT 5
    } 
    ?outputAlbum ?p ?artist .
    FILTER (sameTerm(?p, prop:artist))
    FILTER (str(?outputAlbum) < str(?maxedAlbum))
}