Check if given string matches one of set of prefix

2019-07-22 17:23发布

What algorithm to use to check if a given string matches one of set of prefixes, and which prefix from that set?

Other variation: given path and a set of directories, how to check if path is in one of set of directories (assuming that there are no symbolic links, or they do not matter)?

I'm interested in description or name of algorithm, or Perl module which solves this (or can be used to solve this).

Edit
Bonus points for solution which allow to effectively find 'is prefix of' relation between set of strings (set of directories)

For example, given set of directories: foo, foo/bar, foo/baz, quux, baz/quux, baz/quux/plugh the algorithm is to find that foo is prefix of foo/bar and foo/baz, and that baz/quux is prefix of baz/quux/plugh... hopefully without O(n^2) time.

3条回答
趁早两清
2楼-- · 2019-07-22 18:05

The first function in the List::Util Core module can find if a prefix matches a string. It searches through the list of prefixes, and returns as soon as it finds a match. It does not search through the whole list if it is not necessary:

first returns the first element where the result from BLOCK is a true value. If BLOCK never returns true or LIST was empty then undef is returned.

查看更多
太酷不给撩
3楼-- · 2019-07-22 18:06

The efficient way to do this would be using a Trie:

http://en.wikipedia.org/wiki/Trie

There is a package for it on CPAN:

https://metacpan.org/pod/Tree::Trie

(never used that package myself though)

You need to consider your what operations need to be the most efficient. The lookup is very cheap in a Trie, but if you only build the trie for one lookup, it might not be the fastest way...

查看更多
做自己的国王
4楼-- · 2019-07-22 18:07

You pose an interesting question, but as I went out to look for such a thing (in List::MoreUtils for example), I kept coming back to, how is this any different than a grep. So here it is, my basic implementation based on grep. If you don't mind searching the whole list, or want all the matches here is an example:

#!/usr/bin/perl

use strict;
use warnings;

my @prefixes = qw/ pre1 pre2 pre3 /;

my $test = 'pre1fixed';
my @found = grep { $test =~ /^$_/ } @prefixes;

print "$_ is a prefix of $test\n" for @found;

I also I imagine that there must be some way to use the smart-match operator ~~ to do this in a short-circuiting way. Also, as toolic points out the List::Util function could be used for this too. This stops the search after a match is found.

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw/first/;

my @prefixes = qw/ pre1 pre2 pre3 /;

my $test = 'pre1fixed';
my $found = first { $test =~ /^$_/ } @prefixes;

print "$found is the prefix of $test\n";

The only algorithm I am aware of is the Aho-Corasick though I will leave it as an exercise to the reader (i.e. I don't know) to see if this will help you. I see that there is a module (Algorithm::AhoCorasick). I also believe I have read somewhere that this and trie structures are implemented in Perl's matching under certain circumstances. Perhaps someone knows where I read that? Edit: found it in SO question on matching alternatives

查看更多
登录 后发表回答