I would like to know a simple way to write this SPARQL query in Java code:
select ?input ?string (strlen(?match)/strlen(?string) as ?percent)
where {
  values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                   "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }
  values (?input ?pattern ?replacement) {
    ("cat" "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$" "$1$2$3")
    ("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
  }
  bind( replace( concat('x',?string), ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)
This query comes from the discussion "To use iSPARQL to compare values using similarity measures". Its purpose is to find resources on DBpedia that are similar to a given word. The approach assumes that I know the strings and their lengths in advance. I would like to know how to write this query as a parameterized method that returns the similarity measures regardless of the word and its length.
Update: ARQ - Writing Property Functions is now part of the standard Jena documentation.
It looks like you'd enjoy having a syntactic extension to SPARQL that performs the more complex portions of your query. For example:
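A minimal sketch of how such an extended query could be assembled from Java. It assumes the property function takes the input word as its subject and a (?string ?percent) list as its object, which is the calling convention the closing notes of this answer describe. The names MatchesQueryBuilder, build, and quote are hypothetical helpers for illustration and are not part of Jena.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds the extended query as plain text. */
final class MatchesQueryBuilder {

    /** Quotes a Java string as a SPARQL string literal. */
    static String quote(String s) {
        return '"' + s.replace("\\", "\\\\")
                      .replace("\"", "\\\"")
                      .replace("\n", "\\n")
                      .replace("\r", "\\r") + '"';
    }

    /** Joins the quoted values with spaces for use inside a VALUES block. */
    private static String values(List<String> items) {
        return items.stream()
                    .map(MatchesQueryBuilder::quote)
                    .collect(Collectors.joining(" "));
    }

    /**
     * Assumed calling convention: ?input (subject) and ?string are bound,
     * and ?percent is left unbound for the property function to fill in.
     */
    static String build(List<String> inputs, List<String> candidates) {
        return "SELECT ?input ?string ?percent WHERE {\n"
             + "  VALUES ?string { " + values(candidates) + " }\n"
             + "  VALUES ?input { " + values(inputs) + " }\n"
             + "  ?input <urn:ex:fn#matches> (?string ?percent) .\n"
             + "}\n"
             + "ORDER BY ?input DESC(?percent)";
    }
}
```

The resulting string can be handed to Jena's QueryFactory.create(String) once the property function has been registered. No pattern or replacement columns are needed, because the function derives them from the input word.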
In this example, it's assumed that <urn:ex:fn#matches> is a property function that will automatically perform the matching operation and calculate the similarity.
The Jena documentation does a great job explaining how to write a custom filter function, but (as of 07/08/2014) does little to explain how to implement a custom property function.
I will assume that you can convert your answer into Java code that calculates string similarity, and will focus on implementing a property function that can house your code.
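For reference, here is one way the regex trick from the question might be generalized to any input word in plain Java. This is a sketch under assumptions, not a standard similarity metric: it generates the same pattern shape as the hand-written ones in the question and divides the number of matched characters by the candidate's length. The class and method names (SubsequenceSimilarity, buildPattern, similarity) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that mirrors the question's regex for any input word. */
final class SubsequenceSimilarity {

    /** Escapes ASCII punctuation so the character is literal inside a character class. */
    private static String escape(int codePoint) {
        String ch = new String(Character.toChars(codePoint));
        boolean asciiSpecial = codePoint < 128 && !Character.isLetterOrDigit(codePoint);
        return asciiSpecial ? "\\" + ch : ch;
    }

    /** For "cat" this yields ^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$ */
    static String buildPattern(String input) {
        int[] cps = input.codePoints().toArray();
        StringBuilder pattern = new StringBuilder("^x");
        for (int i = 0; i < cps.length; i++) {
            // Skip anything that is not one of the remaining input characters...
            StringBuilder rest = new StringBuilder();
            for (int j = i; j < cps.length; j++) {
                rest.append(escape(cps[j]));
            }
            pattern.append("[^").append(rest).append("]*");
            // ...then optionally capture the current input character.
            pattern.append("([").append(escape(cps[i])).append("]?)");
        }
        return pattern.append(".*$").toString();
    }

    /** Fraction of the candidate's characters captured by the pattern. */
    static double similarity(String input, String candidate) {
        if (candidate.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        Matcher m = Pattern.compile(buildPattern(input), Pattern.DOTALL)
                           .matcher("x" + candidate);
        if (!m.matches()) {
            return 0.0;
        }
        int matched = 0;
        for (int g = 1; g <= m.groupCount(); g++) {
            String group = m.group(g);
            matched += group.codePointCount(0, group.length());
        }
        return (double) matched / candidate.codePointCount(0, candidate.length());
    }
}
```

With this sketch, similarity("cat", "chat") evaluates to 0.75, since "c", "a", and "t" are captured out of four characters. Inside a property function, that value is what would be bound to the ?percent variable.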
Implementing a Property Function
Every property function is associated with a particular Context. This allows you to make the function available either globally or only for a particular dataset.
Assuming you have an implementation of PropertyFunctionFactory (shown later), you register the function by adding the factory to a PropertyFunctionRegistry under the function's URI (here, urn:ex:fn#matches) and setting that registry on the Context.
Registration
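The only difference between global and dataset-specific registration is where the Context object comes from: ARQ.getContext() supplies the global context, while a dataset supplies its own through getContext().
MatchesPropertyFunctionFactory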
Because the property function that we create takes a list as an argument, we use PFuncSimpleAndList as an abstract implementation. Aside from that, most of the magic that happens inside these property functions is creating Bindings and QueryIterators and validating the input arguments.
Validation/Closing Notes
This should be more than enough to get you going on writing your own property function, if that is where you'd like to house your string-matching logic.
What hasn't been shown is input validation. In this answer, I assume that subject and the first list argument (object.getArg(0)) are bound (Node.isConcrete()), and that the second list argument (object.getArg(1)) is not (Node.isVariable()). If your method isn't called in this manner, things will explode. Hardening the method (adding many if-else blocks with condition checks) or supporting alternative use cases (e.g., looking up values for object.getArg(0) if it is a variable) is left to the reader (because it's tedious to demonstrate, easily testable, and readily apparent during implementation).