How can I write a SPARQL query that uses similarity?

Posted 2019-05-20 08:06

Question:

I would like to know a simple method to write this SPARQL query in Java Code:

select ?input
       ?string
       (strlen(?match)/strlen(?string) as ?percent)
where {
  values ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                   "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

  values (?input ?pattern ?replacement) {
    ("cat"   "^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$"                              "$1$2$3")
    ("Londn" "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$" "$1$2$3$4$5")
  }

  bind( replace( concat('x',?string), ?pattern, ?replacement) as ?match )
}
order by ?pattern desc(?percent)

This code comes from the discussion "To use iSPARQL to compare values using similarity measures". Its purpose is to find resources on DBpedia that are similar to a given word. The approach assumes that I know the strings and their lengths in advance. I would like to know how to write this query as a parameterized method that returns the similarity measures regardless of the word and its length.

Answer 1:

Update: ARQ - Writing Property Functions is now part of the standard Jena documentation.

It looks like you'd enjoy having a syntactic extension to SPARQL that performs the more complex portions of your query. For example:

SELECT ?input ?string ?percent WHERE
{
   VALUES ?string { "London" "Londn" "London Fog" "Lando" "Land Ho!"
                    "concatenate" "catnap" "hat" "cat" "chat" "chart" "port" "part" }

   VALUES ?input  { "cat" "londn" }

   ?input <urn:ex:fn#matches> (?string ?percent) .
}
ORDER BY DESC(?percent)

In this example, it's assumed that <urn:ex:fn#matches> is a property function that will automatically perform the matching operation and calculate the similarity.

The Jena documentation does a great job explaining how to write a custom filter function, but (as of 07/08/2014) does little to explain how to implement a custom property function.

I will assume that you can convert your answer into Java code that calculates string similarity, and focus on implementing a property function that can house that code.
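For reference, here is a minimal sketch of what that Java code might look like, using only java.util.regex. It generalises the two hand-written pattern/replacement rows from the question to an arbitrary input word. The class and method names (StringSimilarity, buildPattern, buildReplacement, similarity) are hypothetical, and the escaping assumes plain BMP characters. Java's regex dialect is not identical to the XPath dialect used by SPARQL's replace(), so treat this as an approximation rather than an exact port.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jena): builds the question's regex for any word.
final class StringSimilarity {

    // Escape a character for use inside a regex character class.
    // Assumption: BMP characters only; letters and digits need no escaping.
    private static String inClass(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    // For "cat" this yields ^x[^cat]*([c]?)[^at]*([a]?)[^t]*([t]?).*$
    static String buildPattern(String input) {
        StringBuilder sb = new StringBuilder("^x");
        for (int i = 0; i < input.length(); i++) {
            StringBuilder rest = new StringBuilder();
            for (int j = i; j < input.length(); j++) {
                rest.append(inClass(input.charAt(j)));
            }
            sb.append("[^").append(rest).append("]*([")
              .append(inClass(input.charAt(i))).append("]?)");
        }
        return sb.append(".*$").toString();
    }

    // For a three-letter word this yields $1$2$3
    static String buildReplacement(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= input.length(); i++) {
            sb.append('$').append(i);
        }
        return sb.toString();
    }

    // Mirrors strlen(?match) / strlen(?string) from the query.
    static double similarity(String input, String candidate) {
        if (input.isEmpty() || candidate.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        Matcher m = Pattern.compile(buildPattern(input), Pattern.DOTALL)
                           .matcher("x" + candidate);
        String match = m.replaceFirst(buildReplacement(input));
        return (double) match.length() / candidate.length();
    }

    public static void main(String[] args) {
        System.out.println(similarity("cat", "chat")); // prints 0.75
    }
}
```

Like the original query, this counts how many characters of the input can be found, in order, in the candidate string, and divides by the candidate's length.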

Implementing a Property Function

Every property function is associated with a particular Context. This lets you make the function available either globally or only for a particular dataset.

Assuming you have an implementation of PropertyFunctionFactory (shown later), you can register the function as follows:

Registration

final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ARQ.getContext());
reg.put("urn:ex:fn#matches", new MatchesPropertyFunctionFactory());
PropertyFunctionRegistry.set(ARQ.getContext(), reg);

The only difference between global and dataset-specific registration is where the Context object comes from:

final Dataset ds = DatasetFactory.createMem();
final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ds.getContext());
reg.put("urn:ex:fn#matches", new MatchesPropertyFunctionFactory());
PropertyFunctionRegistry.set(ds.getContext(), reg);

MatchesPropertyFunctionFactory

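// Package names below are those of Apache Jena 3.x and later;
// older Jena 2.x releases use com.hp.hpl.jena.* instead of org.apache.jena.*
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.ExecutionContext;
import org.apache.jena.sparql.engine.QueryIterator;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.engine.binding.BindingFactory;
import org.apache.jena.sparql.engine.iterator.QueryIterNullIterator;
import org.apache.jena.sparql.engine.iterator.QueryIterSingleton;
import org.apache.jena.sparql.pfunction.PFuncSimpleAndList;
import org.apache.jena.sparql.pfunction.PropFuncArg;
import org.apache.jena.sparql.pfunction.PropertyFunction;
import org.apache.jena.sparql.pfunction.PropertyFunctionFactory;

public class MatchesPropertyFunctionFactory implements PropertyFunctionFactory {
    @Override
    public PropertyFunction create(final String uri)
    {
        return new PFuncSimpleAndList()
        {
            @Override
            public QueryIterator execEvaluated(final Binding parent, final Node subject, final Node predicate, final PropFuncArg object, final ExecutionContext execCxt)
            {
                /* TODO insert your own matching logic here. Note that you'll need
                 * to validate that things like subject/predicate/etc are bound
                 */
                final boolean nonzeroPercentMatch = true; // XXX example-specific kludge
                final Double percent = 0.75; // XXX example-specific kludge
                if( nonzeroPercentMatch ) {
                    // Extend the parent binding with ?percent (the second list argument)
                    final Binding binding =
                                BindingFactory.binding(parent,
                                                       Var.alloc(object.getArg(1)),
                                                       NodeFactory.createLiteral(percent.toString(), XSDDatatype.XSDdecimal));
                    return QueryIterSingleton.create(binding, execCxt);
                }
                else {
                    // No match: produce no solutions for this subject/object pair
                    return QueryIterNullIterator.create(execCxt);
                }
            }
        };
    }

}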

Because the property function that we create takes a list as an argument, we use PFuncSimpleAndList as an abstract implementation. Aside from that, most of the magic that happens inside these property functions is the creation of Bindings, QueryIterators, and performing validation of the input arguments.
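One detail to watch when building the literal for the binding: Double.toString switches to scientific notation for small values, and the xsd:decimal lexical form does not allow an exponent, so a very small percentage could end up as an ill-formed literal. A minimal sketch of one way around this, using only the JDK (the class and method names are hypothetical, and the value is assumed to be finite):

```java
import java.math.BigDecimal;

// Hypothetical helper: renders a finite double without an exponent,
// so the result is a valid xsd:decimal lexical form.
final class DecimalLexical {

    static String toDecimalLexical(double value) {
        // BigDecimal.valueOf throws NumberFormatException for NaN or infinity.
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(Double.toString(0.0005));  // prints 5.0E-4
        System.out.println(toDecimalLexical(0.0005)); // prints 0.0005
    }
}
```

The resulting string could then be passed to NodeFactory.createLiteral in place of percent.toString().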

Validation/Closing Notes

This should be more than enough to get you going on writing your own property function, if that is where you'd like to house your string-matching logic.

What hasn't been shown is input validation. In this answer, I assume that the subject and the first list argument (object.getArg(0)) are bound (Node.isConcrete()), and that the second list argument (object.getArg(1)) is not (Node.isVariable()). If your method isn't called in this manner, things will explode. Hardening the method (adding if-else blocks with condition checks) or supporting alternative use cases (e.g., looking up values for object.getArg(0) if it is a variable) is left to the reader, because it's tedious to demonstrate, easily testable, and readily apparent during implementation.