The Most Efficient Algorithm to Find First Prefix-

2020-05-15 05:10发布

问题:

Input:

1) A huge sorted array of string SA;

2) A prefix string P;

Output:

The index of the first string matching the input prefix if any. If there is no such match, then output will be -1.

Example:

SA = {"ab", "abd", "abdf", "abz"}
P = "abd"

The output should be 1 (index starting from 0).

What's the most algorithm way to do this kind of job?

回答1:

If you only want to do this once, use binary search, if on the other hand you need to do it for many different prefixes but on the same string array, building a radix tree can be a good idea, after you've built the tree each look up will be very fast.



回答2:

It can be done in linear time using a Suffix Tree. Building the suffix tree takes linear time.



回答3:

This is just a modified bisection search:

  • Only check as many characters in each element as are in the search string; and
  • If you find a match, keep searching backwards (either linearly or by further bisection searches) until you find a non-matching result and then return the index of the last matching result.


回答4:

The FreeBSD kernel use a Radix tree for its routing table, you should check that.



回答5:

My current solution in mind is, instead of to find the "prefix", try to find a "virtual prefix".

For example, prefix is “abd", try to find a virtual-prefix “abc(255)". (255) just represents the max char number. After locating the "abc(255)". The next word should be the first word matching "abd" if any.



回答6:

Are you in the position to precalculate all possible prefixes?

If so, you can do that, then use a binary search to find the prefix in the precalculated table. Store the subscript to the desired value with the prefix.



回答7:

My solution: Used binary search.

private static int search(String[] words, String searchPrefix) {

        if (words == null || words.length == 0) {
            return -1;
        }
        int low = 0;
        int high = words.length - 1;
        int searchPrefixLength = searchPrefix.length();

        while (low <= high) {
            int mid = low + (high - low) / 2;

            String word = words[mid];
            int compare = -1;

            if (searchPrefixLength <= word.length()) {
                compare = word.substring(0, searchPrefixLength).compareTo(searchPrefix);
            }

            if (compare == 0) {
                return mid;
            } else if (compare > 0) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }

        }
        return -1;
    }


回答8:

Here is a possible solution (in Python), which has O(k.log(n)) time complexity and O(1) additional space complexity (considering n strings and k prefix length).

The rationale behind it to perform a binary search which only considers a given character index of the strings. If these are present, continue to the next character index. If any of the prefix characters cannot be found in any string, it returns immediately.

from typing import List

def first(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1

    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            right = mid - 1

    return result

def last(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1

    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            left = mid + 1

    return result

def is_prefix(items: List[str], prefix: str):
    left = 0
    right = len(items) - 1
    for i in range(len(prefix)):
        c = prefix[i]
        left = first(items, prefix, i, c, left, right)
        right = last(items, prefix, i, c, left, right)

        if (left == -1 or right == -1):
            return False

    return True

# Test cases
a = ['ab', 'abjsiohjd', 'abikshdiu', 'ashdi','abcde Aasioudhf', 'abcdefgOAJ', 'aa', 'aaap', 'aas', 'asd', 'bbbbb', 'bsadiojh', 'iod', '0asdn', 'asdjd', 'bqw', 'ba']
a.sort()
print(a)
print(is_prefix(a, 'abcdf'))
print(is_prefix(a, 'abcde'))
print(is_prefix(a, 'abcdef'))
print(is_prefix(a, 'abcdefg'))
print(is_prefix(a, 'abcdefgh'))
print(is_prefix(a, 'abcde Aa'))
print(is_prefix(a, 'iod'))
print(is_prefix(a, 'ZZZZZZiod'))

This gist is available at https://gist.github.com/lopespm/9790d60492aff25ea0960fe9ed389c0f