Implementing a best match search in Java

I am trying to get a best-match string search working using existing Java data structures. It is quite slow, though, so any suggestions to improve its performance are welcome.

The sample data looks like this:

Key           | V
--------------|---------
0060175559138 | VIP
006017555     | National
006017        | Local
0060          | X

So a best-match search on the key 0060175552020 should return 006017555 (National), the longest stored key that is a prefix of the search key.

One approach I can think of is to keep several TreeMaps and use hashing to distribute the data among them, so that each search covers a smaller area.

private final TreeMap<String, V> index;

public Set<V> syncBestMatch(String key) {              
    Entry<String,V> entry = index.headMap(key, true)
                .descendingMap().entrySet().stream()
                .filter(e -> isPartiallyOrFullyMatching(key, e.getKey()))
                .findFirst()
                .orElseThrow(() -> new NoMatchException("No match found"));

    Set<V> results = new HashSet<>();
    results.add(entry.getValue());
    return results;
}
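
For readers who want to run the snippet above, here is a minimal self-contained sketch of it. It rests on assumptions that the question does not confirm: the class name HeadMapBestMatch is hypothetical, V is fixed to String, isPartiallyOrFullyMatching is assumed to mean "the stored key is a prefix of (or equal to) the search key", and an Optional replaces the NoMatchException. It also shows where the time can go: the stream walks backwards through every key below the search key until one matches.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class HeadMapBestMatch {
    private final TreeMap<String, String> index = new TreeMap<>();

    public void put(String key, String value) {
        index.put(key, value);
    }

    // Assumption: a stored key matches when it is a prefix of, or equal to, the search key.
    static boolean isPartiallyOrFullyMatching(String searchKey, String storedKey) {
        return searchKey.startsWith(storedKey);
    }

    // Scans the keys <= searchKey in descending order and returns the first match.
    // Worst case this visits every smaller key, i.e. it is linear in the map size.
    public Optional<String> bestMatch(String searchKey) {
        return index.headMap(searchKey, true)
                .descendingMap().entrySet().stream()
                .filter(e -> isPartiallyOrFullyMatching(searchKey, e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    public static void main(String[] args) {
        HeadMapBestMatch idx = new HeadMapBestMatch();
        idx.put("0060175559138", "VIP");
        idx.put("006017555", "National");
        idx.put("006017", "Local");
        idx.put("0060", "X");
        System.out.println(idx.bestMatch("0060175552020").orElse("no match")); // prints National
    }
}
```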

Tags: java collections

2 answers

够拽才男人

Reply 2 · 2020-06-03 02:49

I prefer the TreeMap answer, but for completeness, here is the same algorithm implemented with a binary search over a sorted array.

String[][] data = {
        { "0060175559138", "VIP" },           // <-- found insert position
        { "00601755511", "International" },   // <-- skipped
        { "00601755510", "International" },   // <-- skipped
        { "006017555", "National" },          // <-- final find
        { "006017", "Local" },
        { "0060", "X" },
};
Comparator<String[]> comparator = (lhs, rhs) -> lhs[0].compareTo(rhs[0]);
Arrays.sort(data, comparator);

String searchKey = "0060175552020";
int ix = Arrays.binarySearch(data, new String[] { searchKey }, comparator);
if (ix < 0) {
    ix = ~ix; // Not found, insert position
    --ix;
    while (ix >= 0) {
        if (searchKey.startsWith(data[ix][0])) {
            break;
        }
        if (searchKey.compareTo(data[ix][0]) < 0) {
            ix = -1; // Not found
            break;
        }
        --ix;
    }
}
if (ix == -1) {
    System.out.println("Not found");
} else {
    System.out.printf("Found: %s - %s%n", data[ix][0], data[ix][1]);
}

This algorithm starts with a logarithmic binary search and then loops backwards linearly. If no entries are skipped, the total time is logarithmic, which is fine. So the question is how many entries need to be skipped.

If each element stored a reference to its prefix entry, for example from { "00601755511", "International" } to { "006017555", "National" }, then you would only need to follow the prefix back links instead of scanning every skipped entry.
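
A minimal sketch of the back-link idea follows. It is an illustration rather than a tested production design: the class PrefixLinkIndex and its parent array are hypothetical names, and it assumes distinct keys and a data set that does not change after construction. The links are built in one pass over the sorted keys with a stack, and a lookup does one binary search and then hops from prefix to prefix.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class PrefixLinkIndex {
    private final String[] keys;   // sorted ascending
    private final String[] values; // values[i] belongs to keys[i]
    private final int[] parent;    // index of the longest stored proper prefix of keys[i], or -1

    public PrefixLinkIndex(String[][] data) {
        String[][] sorted = data.clone();
        Arrays.sort(sorted, (lhs, rhs) -> lhs[0].compareTo(rhs[0]));
        int n = sorted.length;
        keys = new String[n];
        values = new String[n];
        parent = new int[n];
        // The stack holds the chain of stored prefixes of the key currently visited.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            keys[i] = sorted[i][0];
            values[i] = sorted[i][1];
            while (!stack.isEmpty() && !keys[i].startsWith(keys[stack.peek()])) {
                stack.pop();
            }
            parent[i] = stack.isEmpty() ? -1 : stack.peek();
            stack.push(i);
        }
    }

    // Returns the value of the longest stored key that is a prefix of searchKey, or null.
    public String bestMatch(String searchKey) {
        int ix = Arrays.binarySearch(keys, searchKey);
        if (ix < 0) {
            ix = ~ix - 1; // greatest key below searchKey, or -1 if there is none
        }
        while (ix >= 0 && !searchKey.startsWith(keys[ix])) {
            ix = parent[ix]; // follow the prefix back link
        }
        return ix < 0 ? null : values[ix];
    }

    public static void main(String[] args) {
        PrefixLinkIndex idx = new PrefixLinkIndex(new String[][] {
                { "0060175559138", "VIP" },
                { "00601755511", "International" },
                { "00601755510", "International" },
                { "006017555", "National" },
                { "006017", "Local" },
                { "0060", "X" },
        });
        System.out.println(idx.bestMatch("0060175552020")); // prints National
    }
}
```

With the sample data, the lookup lands on 00601755511 and reaches 006017555 in a single hop, so the number of steps after the binary search is bounded by the depth of the prefix chain rather than by the number of sibling keys.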


淡お忘

Reply 3 · 2020-06-03 02:58

Use a TreeMap and the floorEntry(K key) method:

Returns a key-value mapping associated with the greatest key less than or equal to the given key, or null if there is no such key.

The following is simplified. Real code would need to keep searching if the entry found is not a prefix of the search key, e.g. if the map had a key 0060175551000. In that case you'd need to find the common prefix between the search key and the found key, then do the lookup again with that prefix. Rinse and repeat.

TreeMap<String, String> map = new TreeMap<>();
map.put("0060175559138", "VIP");
map.put("006017555"    , "National");
map.put("006017"       , "Local");
map.put("0060"         , "X");

String key = "0060175552020";
Entry<String, String> entry = map.floorEntry(key);
if (entry == null)
    System.out.println("Not found: " + key);
else {
    System.out.println(key);
    System.out.println(entry);
}

Output

0060175552020
006017555=National

UPDATE: Here is the full code, with a loop for the extended search.

private static Entry<String, String> lookup(NavigableMap<String, String> map, String key) {
    String keyToFind = key;
    for (;;) {
        Entry<String, String> entry = map.floorEntry(keyToFind);
        if (entry == null)
            return null;
        String foundKey = entry.getKey();
        int prefixLen = 0;
        while (prefixLen < keyToFind.length() && prefixLen < foundKey.length() &&
               keyToFind.charAt(prefixLen) == foundKey.charAt(prefixLen))
            prefixLen++;
        if (prefixLen == 0)
            return null;
        if (prefixLen == foundKey.length())
            return entry;
        keyToFind = key.substring(0, prefixLen);
    }
}

Test

TreeMap<String, String> map = new TreeMap<>();
map.put("0060175559138", "VIP");
map.put("0060175551000", "Other");
map.put("006017555"    , "National");
map.put("006017"       , "Local");
map.put("0060"         , "X");

System.out.println(lookup(map, "0060175559138"));
System.out.println(lookup(map, "0060175552020"));
System.out.println(lookup(map, "0055708570068"));
System.out.println(lookup(map, "8684064893870"));

Output

0060175559138=VIP
006017555=National
null
null
