After applying some FILTERs I end up with two sets, let's say A1 and A2, and I want to SELECT only those elements in A1 that do not appear in A2. I was trying to use MINUS, but without success.
When we have something like:
MINUS { ?s foaf:givenName "Bob" }
we need to know in advance what we want to subtract. In my case, I don't know any property like `foaf:givenName`, except that the elements belong to set A2 (which is a property, though).
I am confused. Any ideas?
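For reference, SPARQL 1.1 `MINUS` does not need the subtracted values to be known in advance. A solution from the left side is removed when some solution from the right side agrees with it on every shared variable and shares at least one variable with it, so `A1 MINUS A2` works as long as both sides bind a common variable. The Python sketch below models only that rule over plain dictionaries; the resource names (`:alice`, `:bob`, `:carol`) are made up, and this is not how a SPARQL engine is implemented.

```python
def compatible(mu, nu):
    """Mappings are compatible if they agree on every variable they share."""
    return all(mu[v] == nu[v] for v in mu.keys() & nu.keys())


def minus(left, right):
    """Model of SPARQL 1.1 MINUS: drop a left solution if some right solution
    is compatible with it AND shares at least one variable with it."""
    return [
        mu
        for mu in left
        if not any(compatible(mu, nu) and bool(mu.keys() & nu.keys()) for nu in right)
    ]


# Hypothetical solutions standing in for the sets A1 and A2 (both bind ?s).
a1 = [{"s": ":alice"}, {"s": ":bob"}, {"s": ":carol"}]
a2 = [{"s": ":bob"}]

print(minus(a1, a2))             # [{'s': ':alice'}, {'s': ':carol'}]
# With no shared variable, nothing is removed.
print(minus(a1, [{"o": ":x"}]))  # [{'s': ':alice'}, {'s': ':bob'}, {'s': ':carol'}]
```

The second call shows the usual pitfall: if the right-hand pattern binds different variables than the left, `MINUS` silently removes nothing.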
EDIT WITH EXAMPLE FOR CLARITY:
SELECT DISTINCT ?x ?y WHERE {
  ?x swrc:listAuthor ?y.
  ?x swrc:author ?w.
  FILTER (!regex(?y, " and ")).
  ?a swrc:listAuthor ?b.
  ?a swrc:author ?c.
  FILTER regex(?b, " and ").
  FILTER(?c != ?w).
}
What I am trying to do with this is the following. With listAuthor I get the authors in a string like "John Doe and John Nipper". Taking advantage of this format, I want to find the authors who wrote a paper alone (no "and" in their listAuthor). The first three lines are enough for that. But some authors wrote two papers, one alone and one with coauthors, and I am trying to somehow subtract them from the first set. Any ideas?
Finding authors with no coäuthors
If I understand your question correctly, you're trying to find authors (and their papers) who have never coauthored a paper with someone else. You don't actually need to match the author list string to do this if the papers are related to the authors by the :author property. These problems are always much easier if we have some data to work with, so consider this data:
@prefix : <http://stackoverflow.com/q/21391444/1281433/> .
:p1 :author :a, :b .
:p2 :author :a .
:p3 :author :b, :c .
:p4 :author :d .
A has written a paper with B, and also alone. B has written a paper with A, and also with C. C has written a paper with B. D has written a paper alone.
We can use a query like this to find all the authors who have never coauthored a paper (in this case, D):
prefix : <http://stackoverflow.com/q/21391444/1281433/>
select ?author ?paper where {
  ?paper :author ?author .
  filter not exists {
    ?paper2 :author ?author, ?otherAuthor .
    filter ( ?author != ?otherAuthor )
  }
}
This corresponds to the English:
Find papers with authors such that there is no paper by that author with another author.
We get the expected results:
------------------
| author | paper |
==================
| :d | :p4 |
------------------
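The same logic can be checked outside a triple store. This Python sketch encodes the sample data as a set of triples and models `filter not exists` as "keep a (paper, author) row only if no paper pairs this author with a different author". It illustrates the semantics only; it is not how a SPARQL engine evaluates the query.

```python
# Sample data from above, as (subject, predicate, object) triples.
TRIPLES = {
    (":p1", ":author", ":a"), (":p1", ":author", ":b"),
    (":p2", ":author", ":a"),
    (":p3", ":author", ":b"), (":p3", ":author", ":c"),
    (":p4", ":author", ":d"),
}


def authorships(triples):
    """All (paper, author) pairs, i.e. the matches of ?paper :author ?author."""
    return {(s, o) for (s, p, o) in triples if p == ":author"}


def never_coauthored(triples):
    pairs = authorships(triples)
    results = []
    for paper, author in sorted(pairs):
        # filter not exists { ?paper2 :author ?author, ?otherAuthor .
        #                     filter (?author != ?otherAuthor) }
        coauthored = any(
            x == author and q == r and y != author
            for (q, x) in pairs
            for (r, y) in pairs
        )
        if not coauthored:
            results.append((author, paper))
    return results


print(never_coauthored(TRIPLES))  # [(':d', ':p4')]
```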
If you still want to select and exclude papers based on regular expressions over the author list string (assuming your data has :listAuthor values, which the sample data above does not), you can do that with:
prefix : <http://stackoverflow.com/q/21391444/1281433/>
select ?author ?paper where {
  # find authors of papers with no coauthors
  ?paper :author ?author ; :listAuthor ?list .
  filter(!regex(?list," and "))
  # and remove those that coauthored some paper
  filter not exists {
    ?paper2 :author ?author ; :listAuthor ?list2 .
    filter(regex(?list2," and "))
  }
}
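Here is a Python sketch of this regex-based variant, using `re.search` in place of SPARQL's `regex`. The `:listAuthor` strings are hypothetical (the sample data above has none); the paper-to-author lists mirror the sample `:author` triples.

```python
import re

# Hypothetical :listAuthor strings; not part of the sample data above.
LIST_AUTHOR = {":p1": "A and B", ":p2": "A", ":p3": "B and C", ":p4": "D"}
# Paper -> authors, mirroring the sample :author triples.
AUTHORS = {":p1": [":a", ":b"], ":p2": [":a"], ":p3": [":b", ":c"], ":p4": [":d"]}


def solo_authors(list_author, authors):
    # Authors of any paper whose list matches " and " (the filter not exists part).
    coauthors = {
        a
        for paper, text in list_author.items()
        if re.search(" and ", text)
        for a in authors[paper]
    }
    # Authors of papers whose list does not match " and ", minus the coauthors.
    return sorted(
        (a, paper)
        for paper, text in list_author.items()
        if not re.search(" and ", text)
        for a in authors[paper]
        if a not in coauthors
    )


print(solo_authors(LIST_AUTHOR, AUTHORS))  # [(':d', ':p4')]
```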
Debugging the original query
The original query can be written more compactly as follows; it is exactly the same query, differing only in syntactic sugar:
SELECT DISTINCT ?x ?y WHERE {
  ?x swrc:listAuthor ?y ; swrc:author ?w.
  FILTER (!regex(?y, " and ")).
  ?a swrc:listAuthor ?b ; swrc:author ?c.
  FILTER regex(?b, " and ").
  FILTER(?c != ?w).
}
Aside from the filter at the end, the pattern on ?x, ?y, and ?w is completely separate from the pattern on ?a, ?b, and ?c. From the first pattern, you'll get one binding for each author of each paper with just one author (which means one binding for each paper with just one author). From the second pattern, you'll get one binding for each author of each paper with multiple authors. Then you're essentially taking the cartesian product of these two sets of (paper, author) pairs to get bindings of the form (paper1, author1, paper2, author2), and the final filter says "remove any bindings where author1 is the same as author2".
Consider what this means for the data I gave above, but let's look just at papers :p1 and :p2. Since :a authored :p2 alone, we'll have :a as ?w and :p2 as ?x:
?x ?w
-------
:p2 :a
However, since :a also authored paper :p1 with :b, we'll have some rows for ?a and ?c:
?a ?c
-------
:p1 :a
:p1 :b
Now the cartesian product is:
?x ?w ?a ?c
--------------
:p2 :a :p1 :a
:p2 :a :p1 :b
The filter removes the first of these rows, leaving us with
?x ?w ?a ?c
--------------
:p2 :a :p1 :b
and this has :a as ?w, even though :a coäuthored a paper with someone. In general:
- Each ?x is a paper with a single author (?w).
- Each ?w is an author who has written a paper (?x) alone.
- Each ?a is a paper with multiple authors, one of which is ?c.
- Each ?c is an author who has coauthored a paper (?a) and who is not ?w.
Because ?w and ?c are only compared row by row, ?w can still be an author who coauthored some paper, as long as some row pairs ?w with a different ?c.
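To see the effect on all four sample papers, the following Python sketch replays the original query's logic with plain lists. It assumes (hypothetically) that a paper's `swrc:listAuthor` string contains " and " exactly when the paper has more than one author, so the two regex filters are replaced by author counts.

```python
from itertools import product

# Paper -> authors, mirroring the sample :author triples.
AUTHORS = {":p1": [":a", ":b"], ":p2": [":a"], ":p3": [":b", ":c"], ":p4": [":d"]}

# First pattern: (?x, ?w) for single-author papers (no " and " in the list).
first = [(x, w) for x, ws in AUTHORS.items() if len(ws) == 1 for w in ws]
# Second pattern: (?a, ?c) for multi-author papers (" and " in the list).
second = [(a, c) for a, cs in AUTHORS.items() if len(cs) > 1 for c in cs]

# The patterns share no variables, so joining them is a cartesian product;
# FILTER(?c != ?w) only drops rows where the two authors coincide.
rows = [(x, w, a, c) for (x, w), (a, c) in product(first, second) if c != w]

solo = sorted({w for (_, w, _, _) in rows})
print(solo)  # [':a', ':d']
```

:a survives as a ?w value even though it coauthored :p1 with :b, because rows such as (:p2, :a, :p1, :b) and (:p2, :a, :p3, :c) pass the filter.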