I'd like to select data property values using SPARQL, with some restrictions on their languages:
- I have an ordered set of preferred languages ("ru", "en", etc.).
- If an item has values in more than one language, I'd like to get only one value, chosen by my order of preference: if a "ru" value is available I want that; otherwise, if an "en" value is available, I want that; and so on. If none of the preferred languages is available, I want the value with no language tag.
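The preference rule above can be written down as a small plain-Python sketch (not SPARQL), which may help pin down the intended behaviour before turning it into a query. The `pick_preferred` function and the `(text, lang)` pair encoding are hypothetical, chosen only for illustration; an empty `lang` string stands for a literal with no language tag, and tag comparison is simplified to exact, case-insensitive matching.

```python
def pick_preferred(values, preferred_langs):
    """Return the (text, lang) pair in the first preferred language that is
    available, else an untagged value, else None."""
    # Walk the preference list in order; the first hit wins.
    for wanted in preferred_langs:
        for text, lang in values:
            # Simplification: exact, case-insensitive tag comparison.
            # SPARQL's langMatches would also accept subtags like "en-GB".
            if lang != "" and lang.lower() == wanted.lower():
                return (text, lang)
    # No preferred language found: fall back to a plain (untagged) literal.
    for text, lang in values:
        if lang == "":
            return (text, lang)
    return None


# Hypothetical values for one property of one resource.
abstracts = [("abstract in english", "en"), ("abstract in russian", "ru")]
print(pick_preferred(abstracts, ["ru", "en"]))  # ('abstract in russian', 'ru')
```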
My current query is:
prefix owl: <http://www.w3.org/2002/07/owl#>
select distinct ?dataProperty ?dpropertyValue where {
  <http://dbpedia.org/resource/Blackmore's_Night> ?dataProperty ?dpropertyValue .
  ?dataProperty a owl:DatatypeProperty .
  FILTER ( langmatches(lang(?dpropertyValue),"ru") || langmatches(lang(?dpropertyValue),"en") || lang(?dpropertyValue)="" )
}
The problem is that the results contain two rows for the abstract (one "ru", one "en"). I want only one row, containing the "ru" value; when "ru" is not available I'd like to get "en", and so on.
How can I do this?
Suppose you have data like this:
@prefix : <http://stackoverflow.com/q/21531063/1281433/> .
:a a :resource ;
   :p "a in english"@en, "a in russian"@ru .
:b a :resource ;
   :p "b in english"@en .
Then you're hoping to get results like this:
--------------------------------
| resource | label |
================================
| :b | "b in english"@en |
| :a | "a in russian"@ru |
--------------------------------
Here are two ways of doing this.
Associate language tags with ranks, find the rank of the best label, then find the label with that rank
This approach uses SPARQL 1.1 subqueries, aggregates, and data provided with values. The idea is to use values to associate each language tag with a rank. A subquery then pulls out, for each resource, the optimal (lowest) rank over all of that resource's labels. In the outer query you have access to that optimal rank, and you just retrieve the label whose language corresponds to it.
prefix : <http://stackoverflow.com/q/21531063/1281433/>
select ?resource ?label where {
  # for each resource, find the rank of the
  # language of the most preferred label.
  {
    select ?resource (min(?rank) as ?langRank) where {
      values (?lang ?rank) { ("ru" 1) ("en" 2) }
      ?resource :p ?label .
      filter(langMatches(lang(?label),?lang))
    }
    group by ?resource
  }
  # ?langRank from the subquery is, for each
  # resource, the best preference. With the
  # values clause, we get just the language
  # that we want.
  values (?lang ?langRank) { ("ru" 1) ("en" 2) }
  ?resource a :resource ; :p ?label .
  filter(langMatches(lang(?label),?lang))
}
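To make the two-step shape of this query concrete, here is a minimal plain-Python model of it, assuming the sample data above is encoded as a hypothetical `DATA` dictionary of `(text, lang)` pairs. `lang_matches` approximates SPARQL's `langMatches` (basic filtering: case-insensitive equality, or the range followed by `-` as a prefix), `RANKS` plays the role of the values clause, and `best_labels` first computes the minimum rank per resource (the subquery) and then keeps only the labels with that rank (the outer query). All of these names are illustrative, not part of any library.

```python
from collections import defaultdict

# Hypothetical encoding of the sample data: resource -> [(text, lang), ...]
DATA = {
    ":a": [("a in english", "en"), ("a in russian", "ru")],
    ":b": [("b in english", "en")],
}

# Mirrors the values clause: language range -> rank (lower is better).
RANKS = {"ru": 1, "en": 2}


def lang_matches(tag, lang_range):
    """Approximation of SPARQL's langMatches (basic filtering)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def best_labels(data, ranks):
    # Step 1, like the subquery: min(?rank) per resource over matching labels.
    best_rank = {}
    for resource, labels in data.items():
        matched = [rank for _, tag in labels
                   for lang, rank in ranks.items() if lang_matches(tag, lang)]
        if matched:
            best_rank[resource] = min(matched)
    # Step 2, like the outer query: keep labels whose language has that rank.
    result = defaultdict(list)
    for resource, rank in best_rank.items():
        for text, tag in data[resource]:
            for lang, r in ranks.items():
                if r == rank and lang_matches(tag, lang):
                    result[resource].append((text, tag))
    return dict(result)


print(best_labels(DATA, RANKS))
# {':a': [('a in russian', 'ru')], ':b': [('b in english', 'en')]}
```

Computing the minimum first and then joining back on it is what lets a query express "the best label per resource" without a per-group limit, which SPARQL does not offer directly.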
Select the labels separately and coalesce in the order that you want
You can select an optional label for each of the languages you're considering, and then coalesce them in your order of preference, so that you get the first one that's bound. This is somewhat verbose, but if you need to do anything else with the labels in languages other than the most preferred one, you'll still have access to them.
prefix : <http://stackoverflow.com/q/21531063/1281433/>
select ?resource ?label where {
  # find resources
  ?resource a :resource .
  # grab a russian label, if available
  optional {
    ?resource :p ?rulabel .
    filter( langMatches(lang(?rulabel),"ru") )
  }
  # grab an english label, if available
  optional {
    ?resource :p ?enlabel .
    filter( langMatches(lang(?enlabel),"en") )
  }
  # take either as the label, but russian over english
  bind( coalesce( ?rulabel, ?enlabel ) as ?label )
}
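The same idea as a hedged plain-Python sketch, again using a hypothetical `DATA` encoding of the sample data. `optional_label` stands in for one `optional { ... filter(langMatches(...)) }` block, with `None` playing the role of an unbound variable, and `coalesce` returns its first non-`None` argument, much as SPARQL's `coalesce` returns its first bound argument. As a simplification, `optional_label` returns only the first matching label, whereas the SPARQL `optional` would produce one row per matching label.

```python
# Hypothetical encoding of the sample data: resource -> [(text, lang), ...]
DATA = {
    ":a": [("a in english", "en"), ("a in russian", "ru")],
    ":b": [("b in english", "en")],
}


def lang_matches(tag, lang_range):
    """Approximation of SPARQL's langMatches (basic filtering)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


def optional_label(labels, lang_range):
    """Like one optional block: a matching label, or None if nothing matches."""
    for text, tag in labels:
        if lang_matches(tag, lang_range):
            return (text, tag)
    return None


def coalesce(*values):
    """Return the first argument that is not None (None means 'unbound')."""
    return next((v for v in values if v is not None), None)


def preferred_labels(data):
    result = {}
    for resource, labels in data.items():
        ru_label = optional_label(labels, "ru")
        en_label = optional_label(labels, "en")
        # ru_label and en_label stay available here for any other use.
        result[resource] = coalesce(ru_label, en_label)
    return result


print(preferred_labels(DATA))
# {':a': ('a in russian', 'ru'), ':b': ('b in english', 'en')}
```

Because each language is looked up separately, adding a further fallback is just one more lookup and one more argument to `coalesce`.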