What algorithm to use to check if a given string matches one of set of prefixes, and which prefix from that set?
Other variation: given path and a set of directories, how to check if path is in one of set of directories (assuming that there are no symbolic links, or they do not matter)?
I'm interested in description or name of algorithm, or Perl module which solves this (or can be used to solve this).
Edit
Bonus points for solution which allow to effectively find 'is prefix of' relation between set of strings (set of directories)
For example, given set of directories: foo, foo/bar, foo/baz, quux, baz/quux, baz/quux/plugh
the algorithm is to find that foo
is prefix of foo/bar
and foo/baz
, and that baz/quux
is prefix of baz/quux/plugh
... hopefully without O(n^2) time.
The efficient way to do this would be using a Trie:
http://en.wikipedia.org/wiki/Trie
There is a package for it on CPAN:
https://metacpan.org/pod/Tree::Trie
(never used that package myself though)
You need to consider your what operations need to be the most efficient. The lookup is very cheap in a Trie, but if you only build the trie for one lookup, it might not be the fastest way...
The first
function in the List::Util Core module can find if a prefix matches a string. It searches through the list of prefixes, and returns as soon as it finds a match. It does not search through the whole list if it is not necessary:
first returns the first element where the
result from BLOCK is a true value. If
BLOCK never returns true or LIST was
empty then undef is returned.
You pose an interesting question, but as I went out to look for such a thing (in List::MoreUtils
for example), I kept coming back to, how is this any different than a grep
. So here it is, my basic implementation based on grep
. If you don't mind searching the whole list, or want all the matches here is an example:
#!/usr/bin/perl
use strict;
use warnings;
my @prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
my @found = grep { $test =~ /^$_/ } @prefixes;
print "$_ is a prefix of $test\n" for @found;
I also I imagine that there must be some way to use the smart-match operator ~~
to do this in a short-circuiting way. Also, as toolic points out the List::Util
function could be used for this too. This stops the search after a match is found.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/first/;
my @prefixes = qw/ pre1 pre2 pre3 /;
my $test = 'pre1fixed';
my $found = first { $test =~ /^$_/ } @prefixes;
print "$found is the prefix of $test\n";
The only algorithm I am aware of is the Aho-Corasick though I will leave it as an exercise to the reader (i.e. I don't know) to see if this will help you. I see that there is a module (Algorithm::AhoCorasick
). I also believe I have read somewhere that this and trie structures are implemented in Perl's matching under certain circumstances. Perhaps someone knows where I read that? Edit: found it in SO question on matching alternatives