How many of the concepts conveyed in natural language can RDF/OWL represent? I'm still learning RDF and other semantic technologies, but as I currently understand it, information is typically represented as triples of the form (subject, predicate, object). So I can imagine how the sentence "Bob has a hat" might be represented. However, how would you represent a more complicated sentence like "Bob, over on 42nd street, will have a job at the Mall after the owner approves"? Are there conventions for tags representing nouns, verbs, ownership, causality, tense, etc.?
Note, I'm not asking how to automatically convert arbitrary natural language text to RDF (as this currently appears impossible). I'm just trying to understand how RDF might be used to represent the same information that natural language represents.
Have a look at the Attempto project, whose goal is to define a fragment of English that can be automatically mapped to first-order logic. Part of this effort is a mapping to OWL 2 DL; see, e.g., "Writing OWL ontologies in ACE".
Your example sentence
Bob, over on 42nd street, will have a job at the Mall after the owner approves
could be rewritten in Attempto Controlled English (ACE) as
If an owner of Mall approves John whose address is "42nd street"
then he is employed by Mall.
(or something similar, depending on exactly what you intend to say).
This sentence can be automatically mapped to an OWL 2 SubClassOf axiom:
SubClassOf(
    ObjectIntersectionOf(
        ObjectOneOf(
            :Mall
        )
        ObjectSomeValuesFrom(
            :owner
            ObjectSomeValuesFrom(
                :approve
                ObjectIntersectionOf(
                    ObjectOneOf(
                        :John
                    )
                    DataHasValue(
                        :address
                        "42nd street"^^<http://www.w3.org/2001/XMLSchema#string>
                    )
                )
            )
        )
    )
    ObjectSomeValuesFrom(
        :employ
        ObjectOneOf(
            :John
        )
    )
)
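To make the nesting easier to follow, here is roughly how the axiom reads as a first-order formula under the usual OWL 2 semantics. This is a hand translation, not output of the Attempto tools, and the variable names x, y, z, w are chosen here only for illustration.

```latex
\forall x\, \Bigl[
  \bigl( x = \mathit{Mall}
    \land \exists y\, \bigl( \mathit{owner}(x, y)
      \land \exists z\, ( \mathit{approve}(y, z)
        \land z = \mathit{John}
        \land \mathit{address}(z, \text{``42nd street''}) ) \bigr) \bigr)
  \rightarrow
  \exists w\, ( \mathit{employ}(x, w) \land w = \mathit{John} )
\Bigr]
```

In words: if Mall has an owner who approves John (whose address is "42nd street"), then Mall employs John.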
This mapping implements certain conventions about basic word classes:
- common nouns map to OWL class names
- proper names map to OWL individual names
- transitive verbs, transitive adjectives, and of-constructions map to OWL property names: data property names if their argument is a number or string, object property names otherwise
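The conventions above can be sketched as a small lookup. This is only an illustration: the function name, the word-class labels, and the `argument_is_literal` flag are made up here and are not part of the ACE tooling.

```python
# Hypothetical word-class labels; they are not identifiers from the ACE tools.
PROPERTY_LIKE = {"transitive_verb", "transitive_adjective", "of_construction"}


def owl_entity_kind(word_class, argument_is_literal=False):
    """Return the OWL entity kind for a word class, or None if it is unmapped."""
    if word_class == "common_noun":
        return "Class"
    if word_class == "proper_name":
        return "NamedIndividual"
    if word_class in PROPERTY_LIKE:
        # A number or string argument makes it a data property,
        # anything else an object property.
        return "DataProperty" if argument_is_literal else "ObjectProperty"
    # Intransitive verbs, adverbs, etc. are not covered by the mapping.
    return None


print(owl_entity_kind("proper_name"))       # "John"
print(owl_entity_kind("transitive_verb"))   # "approves John"
print(owl_entity_kind("of_construction"))   # "owner of Mall"
# A property whose argument is a string, like :address with "42nd street"
print(owl_entity_kind("of_construction", argument_is_literal=True))
```

Returning `None` for everything else mirrors the fact that the mapping simply does not cover those word classes.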
Many word classes that ACE supports are not covered by this mapping, for example intransitive and ditransitive verbs, intransitive adjectives, and adverbs. The coverage could be extended: intransitive verbs could map to OWL classes (e.g. "John sleeps." could be taken to mean that the individual John belongs to the class of sleepers). It is less clear how to handle ditransitive verbs and adverbs.
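As a sketch of the intransitive-verb idea, "John sleeps." would become a class assertion in OWL 2 functional syntax. The helper below is hypothetical, and the class name `sleeper` is supplied by hand because deriving it from the verb is a separate problem.

```python
def class_assertion(individual, class_name):
    """Render an OWL 2 functional-syntax ClassAssertion (class first, then individual)."""
    return f"ClassAssertion(:{class_name} :{individual})"


# "John sleeps." read as: John belongs to the class of sleepers.
print(class_assertion("John", "sleeper"))  # ClassAssertion(:sleeper :John)
```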
In general, English is much richer in its building blocks (nouns, different types of adjectives, different types of verbs, ...) than OWL, which has only classes, individuals, object and data properties, and (typed) data items such as strings and numbers. And this is just the "word vs. entity" level. Things like tense are more complicated: they have many surface representations in English, and OWL has no built-ins for them.