Easiest way of checking if a string consists of un

2020-03-01 18:21发布

I need to check in Java if a word consists of unique letters (case insensitive). As straight solution is boring, I came up with:

  1. For every char in a string check if indexOf(char) == lastIndexOf(char).
  2. Add all chars to HashSet and check if set size == string length.
  3. Convert a string to a char array, sort it alphabetically, loop through array elements and check if c[i] == c[i+1].

Currently I like #2 the most, seems like the easiest way. Any other interesting solutions?

12条回答
做个烂人
2楼-- · 2020-03-01 18:37

Here's the code I wrote for Kache's answer(referred from cracking the code and modified):

public boolean check() {
    int[] checker = new int[8];
    String inp = "!a~AbBC#~";
    boolean flag = true;
    if (inp.length() > 256)
        flag = false;
    else {
        for(int i=0;i<inp.length();i++) {
            int x = inp.charAt(i);
            int index = x/32;
            x = x%32;
            if((checker[index] & (1<<x)) > 0) { 
                flag = false;
                break;
            }
            else
                checker[index] = checker[index] | 1<<x;
        }
    }
    return flag;
}
查看更多
Fickle 薄情
3楼-- · 2020-03-01 18:39
public boolean hasUniqChars(String s){
  Hashset h = new Hashset();
  HashSet<Character> h = new HashSet<Character>();
  for (char c : s.toCharArray()) {
  if (!h.add(Character.toUpperCase(c))) // break if already present
    return false;
  }
  return true;
}

You should use hashset technique if you are performing char sets like utf-8 and for internationalization's sake.

Javadoc on Character.toUpperCase for cases of utf: This method (toUpperCase(char) ) cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the toUpperCase(int) method.

查看更多
乱世女痞
4楼-- · 2020-03-01 18:42

You can optimize the first solution(indexof == lastindexof) just by checking the condition for all the 26 alphabets i.e. for a, b, c, d,..,z. So this way you don't have to traverse all the string.

查看更多
在下西门庆
5楼-- · 2020-03-01 18:49

Option 2 is the best of the three - Hashing is faster than searching.

However, there's an even faster method, if you have enough memory for it.

Take advantage of the fact that a character set is limited and already enumerated, and keep track of what's appeared and what hasn't as you check each character.

For example, if you're using one-byte chars, there are only 256 possibilities. You would only need 256 bits to keep track as you read through the string. If the character 0x00 occurs, flip the first bit. If the character 0x05 occurs, flip the sixth bit, and so on. When an already-flipped bit is encountered, the string isn't unique.

It's worst case O(min(n, m)) where n is the length of the string, and m is the size of the character set.

And of course, as I saw in another person's comment, if n > m (i.e. length of string > size of character set), then by pigeon-hole principle, there is a repeated character, determinable in O(1) time.

查看更多
一纸荒年 Trace。
6楼-- · 2020-03-01 18:50

I would suggest a variant of (2) - use an array of "character already seen" flags instead of a hashset. As you loop through the string, exit immediately if the current character was already seen.

If you have a bitvector class available (I forget whether Java provides one), you could use that, though the memory saving will not necessarily result in any speed improvement and could easily slow things down.

It's O(n) worst case, though, and could have far better average performance depending on your strings - you may well find that most have a repeat near the start. In fact, strictly speaking, it's O(1) worst case anyway since a string longer than the size of the character set must have repeated characters so you have a constant bound to the number of characters you need to check in each string.

查看更多
淡お忘
7楼-- · 2020-03-01 18:52

I like the HashSet idea. It's conceptually simple, and only does one pass through the string. For a simple performance improvement, check the add return value. One thing you should be aware of is that this works by case-folding. in one direction. You could create a wrapper class around Character with different equals semantics to really be case-insensitive.

Interestingly, Apache Commons has a CaseInsensitiveMap (src) which works by upper-casing then lower-casing the key. As you probably know, Java's HashSet is backed by a HashMap.

public static boolean allUnique(String s)
{
  // This initial capacity can be tuned.
  HashSet<Character> hs = new HashSet<Character>(s.length());
  for(int i = 0; i < s.length(); i++)
  {
    if(!hs.add(s.charAt(i).toUpperCase())
      return false;
  }
  return true;
}
查看更多
登录 后发表回答