I would like to know if there is a standard or generally accepted way of representing an equivalent of NULL used in databases for RDF data.
More specifically, I'm interested in a way to distinguish the following cases for a value o of a property p (p is the predicate, o the object of an RDF triple):
- The value is not applicable, i.e. property p does not exist or does not make sense in the context.
- The value is unknown, i.e. it should be there but we don't know it.
- The value doesn't exist, i.e. the property doesn't have a value (e.g. year of death for a person alive).
- The value is withheld, e.g. when the data consumer is not allowed to access it.
I do a bit of modelling in RDF. I know of no widely used vocabulary for representing the kind of information you are looking for. There is however a widely accepted pattern which is applicable.
In work I did about a year ago I had a similar requirement to represent properties with "nullable values". A property with a nullable value either had a value or a reason why the value wasn't present.
I represented this by introducing a b-node as the value of the property. That b-node would have either an rdf:value property linking to a value, or a reason property linking to a reason the value is not available, e.g.
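To illustrate the pattern, here is a minimal sketch in plain Python, with triples modelled as (subject, predicate, object) tuples rather than through any RDF library. Only rdf:value is a real RDF term; the ex: names, including the ex:reason property and the reason resources, are made up for the example and are not taken from my original data.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
# Everything in the "ex:" namespace is hypothetical.
RDF_VALUE = "rdf:value"
EX_REASON = "ex:reason"  # hypothetical property: why the value is missing

graph = {
    ("ex:alice", "ex:birthYear", "_:b1"),
    ("_:b1", RDF_VALUE, "1980"),
    ("ex:alice", "ex:deathYear", "_:b2"),
    ("_:b2", EX_REASON, "ex:DoesNotExist"),
    ("ex:bob", "ex:birthYear", "_:b3"),
    ("_:b3", EX_REASON, "ex:Withheld"),
}


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def nullable_value(graph, subject, predicate):
    """Resolve a nullable property to ("value", v), ("reason", r) or None.

    None means no triple was asserted at all, which under the open world
    assumption says nothing either way.
    """
    for node in objects(graph, subject, predicate):
        values = objects(graph, node, RDF_VALUE)
        if values:
            return ("value", values[0])
        reasons = objects(graph, node, EX_REASON)
        if reasons:
            return ("reason", reasons[0])
    return None


print(nullable_value(graph, "ex:alice", "ex:birthYear"))
print(nullable_value(graph, "ex:alice", "ex:deathYear"))
```

The point of the intermediate node is that a consumer always finds something at the end of the property and can then tell a real value from an explained absence.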
As others on the W3C mailing list have pointed out: don't create triples with the value 'NULL'. Simply leave that data out when creating triples.
Just wanted to link to the discussion of this problem on the public-lod mailing list. It also mentions some alternatives not mentioned here, such as the use of rdf:nil.
I don't know of a standard way of doing this, but one of the advantages of working in RDF is that you have a lot of flexibility in how you decide to do this. RDF, per se, cannot express negation (i.e., there is no incredibly convenient way to say that a triple s p o does not hold), but OWL can. As to the four cases you described, here are some approaches that you might take:
If it does not make much sense for a property p to have a value for a subject s, then it's probably acceptable to just not write any triples of the form s p o. Since RDF makes an open world assumption, it is often the case that, in data retrieval, one only queries for the data that one is interested in, and does not make too much of an effort to check whether there are unexpected things. If you do want to do some sanity checking, then you can declare RDFS domains and ranges for properties. For instance, you might have:
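As a rough sketch of how a domain-based sanity check could work, the following plain Python models triples as tuples and applies only the RDFS domain rule by hand. The pairing of hasConstructionDate with InanimateObject, the property ex:hasBirthDate, the resource ex:bridge7 and all the literal values are assumptions made up for the illustration.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

graph = {
    # Assumed domain declarations (hypothetical pairing of property and class).
    ("ex:hasConstructionDate", RDFS_DOMAIN, "ex:InanimateObject"),
    ("ex:hasBirthDate", RDFS_DOMAIN, "ex:AnimateObject"),
    # Data: object82 suspiciously uses both properties.
    ("ex:object82", "ex:hasBirthDate", "1970-01-01"),
    ("ex:object82", "ex:hasConstructionDate", "2013-05-01"),
    ("ex:bridge7", "ex:hasConstructionDate", "1932-03-19"),
}


def infer_domain_types(graph):
    """RDFS domain rule: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN}
    return {
        (s, RDF_TYPE, cls)
        for (s, p, o) in graph
        for (prop, cls) in domains
        if p == prop
    }


def members_of_both(graph, class_a, class_b):
    """Return subjects typed (explicitly or by inference) as both classes."""
    typed = graph | infer_domain_types(graph)
    in_a = {s for (s, p, o) in typed if p == RDF_TYPE and o == class_a}
    in_b = {s for (s, p, o) in typed if p == RDF_TYPE and o == class_b}
    return in_a & in_b


suspects = members_of_both(graph, "ex:AnimateObject", "ex:InanimateObject")
print(suspects)
```

In a real setup you would let an RDFS or OWL reasoner do the inference and express the check as a query; the sketch only shows the logic.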
According to the semantics, if a resource then has a value for such a property, you'll also infer that the resource is an instance of the property's declared domain,
and you might run a sanity check that looks for things that are both AnimateObjects and InanimateObjects. If anything is both, you probably have a problem that you should look into. If you use OWL, then you can actually declare that AnimateObject and InanimateObject are disjoint and check for logical consistency. Alternatively, in OWL, you can add an assertion which says that object82 should have no values for the property hasConstructionDate.
In any case, add rdfs:comments to your properties explaining what the property should be used for and what it should not be used for. When appropriate, add rdfs:comments to individuals to explain why they should not have a value for a given property, if they should not have such a value.
For a value that is unknown, it is important to pin down what exactly “should” means. In OWL, for instance, you can add an axiom to assert that every person is related to at least one String by the property hasName; that is, every person has at least one name. That is one way of saying that there is some value, but we might not know what it is in a particular case. If you cannot work with OWL, but only with RDF, then you should probably add an rdfs:comment to the property hasName along the lines of “each NamedEntity should have at least one value for this property.”
A value that doesn't exist is an interesting case, because RDF has no built-in notion of time (in the sense that some triple holds until a given time, after which some other triple holds). If you are simply using an RDF graph as a database-like store that you can update (both by removing and inserting triples), you could probably use some special reserved value for “I'm not dead yet!”. Having an open-ended data model, as we do in RDF, makes it particularly easy to do something like this, because you really can just use some new value for it:
Of course, you can also be a bit more refined and use a boolean-valued property to indicate whether or not a value for the first property makes sense:
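Both options, the reserved value and the boolean-valued property, can be sketched together in plain Python with triples modelled as tuples. All names here (ex:yearOfDeath, ex:isAlive, ex:notDeadYet and the people) are hypothetical and only serve the illustration.

```python
# Triples are plain (subject, predicate, object) tuples of strings.
NOT_DEAD_YET = "ex:notDeadYet"  # hypothetical reserved value

graph = {
    # Option 1: a reserved value stands in for "there is no value".
    ("ex:alice", "ex:yearOfDeath", NOT_DEAD_YET),
    ("ex:bob", "ex:yearOfDeath", "1999"),
    # Option 2: a boolean-valued property says whether a value makes sense.
    ("ex:carol", "ex:isAlive", "true"),
    ("ex:dave", "ex:isAlive", "false"),
    ("ex:dave", "ex:yearOfDeath", "2005"),
}


def year_of_death(graph, person):
    """Return ("year", y), ("no-value", None) or ("unstated", None)."""
    years = [o for (s, p, o) in graph if s == person and p == "ex:yearOfDeath"]
    alive = [o for (s, p, o) in graph if s == person and p == "ex:isAlive"]
    if NOT_DEAD_YET in years or "true" in alive:
        return ("no-value", None)   # explicitly stated: the person is alive
    if years:
        return ("year", years[0])
    return ("unstated", None)       # open world: we simply don't know


for person in ["ex:alice", "ex:bob", "ex:carol", "ex:dave", "ex:erin"]:
    print(person, year_of_death(graph, person))
```

Whichever option you pick, every consumer has to know about the convention, so it is worth documenting it on the property itself.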
A withheld value is, in my opinion, the most interesting case, because it potentially involves the most interesting data transformation. If you have a nice dataset that people can query, and you want to indicate something about the results that they would have obtained if they had permission, you have lots of options for representing this. For instance, you could replace nodes in the graph with blank nodes that carry something like an HTTP status code and act as redactions. Suppose you have the data:
When someone asks for the data, you might respond (supposing that the first value is valid, and the second one invalid):
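A redaction step along these lines might look like the following sketch in plain Python, again with triples as tuples. The property ex:statusCode, the sample data and the permission rule are all invented for the example; 403 is just the usual HTTP code for a forbidden resource.

```python
import itertools


def redacted_view(graph, is_permitted, status="403"):
    """Return a copy of the graph in which the object of every withheld triple
    is replaced by a fresh blank node carrying a status code."""
    counter = itertools.count(1)
    view = set()
    for triple in sorted(graph):  # sorted so blank node labels are deterministic
        s, p, o = triple
        if is_permitted(triple):
            view.add(triple)
        else:
            node = f"_:redacted{next(counter)}"
            view.add((s, p, node))
            view.add((node, "ex:statusCode", status))  # hypothetical property
    return view


# Hypothetical data: the name is public, the salary is not.
graph = {
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:salary", "50000"),
}


def public_only(triple):
    """Hypothetical permission rule: everything except ex:salary is visible."""
    return triple[1] != "ex:salary"


view = redacted_view(graph, public_only)
for triple in sorted(view):
    print(triple)
```

The requester still learns that a salary is recorded for ex:alice, but not what it is; whether revealing even that much is acceptable depends on your access policy.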
In general, you could present a different view of the data to consumers than what you actually possess. I do not know of any standards for doing this sort of thing. You might be interested in the somewhat related, recent W3C Recommendation PROV-O: The PROV Ontology, a vocabulary for describing the provenance of information (e.g., what it was generated from, to what it is attributed); it could be useful in describing the sorts of resources that might not, in their full form, be available to requesters.