I need to check in Java if a word consists of unique letters (case insensitive). As straight solution is boring, I came up with:
- For every char in a string check if
indexOf(char) == lastIndexOf(char)
. - Add all chars to
HashSet
and check if set size == string length. - Convert a string to a char array, sort it alphabetically, loop through array elements and check if
c[i] == c[i+1]
.
Currently I like #2 the most, seems like the easiest way. Any other interesting solutions?
Here's the code I wrote for Kache's answer(referred from cracking the code and modified):
You should use hashset technique if you are performing char sets like utf-8 and for internationalization's sake.
Javadoc on Character.toUpperCase for cases of utf: This method (toUpperCase(char) ) cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the toUpperCase(int) method.
You can optimize the first solution(indexof == lastindexof) just by checking the condition for all the 26 alphabets i.e. for a, b, c, d,..,z. So this way you don't have to traverse all the string.
Option 2 is the best of the three - Hashing is faster than searching.
However, there's an even faster method, if you have enough memory for it.
Take advantage of the fact that a character set is limited and already enumerated, and keep track of what's appeared and what hasn't as you check each character.
For example, if you're using one-byte chars, there are only 256 possibilities. You would only need 256 bits to keep track as you read through the string. If the character 0x00 occurs, flip the first bit. If the character 0x05 occurs, flip the sixth bit, and so on. When an already-flipped bit is encountered, the string isn't unique.
It's worst case O(min(n, m)) where n is the length of the string, and m is the size of the character set.
And of course, as I saw in another person's comment, if n > m (i.e. length of string > size of character set), then by pigeon-hole principle, there is a repeated character, determinable in O(1) time.
I would suggest a variant of (2) - use an array of "character already seen" flags instead of a hashset. As you loop through the string, exit immediately if the current character was already seen.
If you have a bitvector class available (I forget whether Java provides one), you could use that, though the memory saving will not necessarily result in any speed improvement and could easily slow things down.
It's O(n) worst case, though, and could have far better average performance depending on your strings - you may well find that most have a repeat near the start. In fact, strictly speaking, it's O(1) worst case anyway since a string longer than the size of the character set must have repeated characters so you have a constant bound to the number of characters you need to check in each string.
I like the HashSet idea. It's conceptually simple, and only does one pass through the string. For a simple performance improvement, check the add return value. One thing you should be aware of is that this works by case-folding. in one direction. You could create a wrapper class around Character with different equals semantics to really be case-insensitive.
Interestingly, Apache Commons has a CaseInsensitiveMap (src) which works by upper-casing then lower-casing the key. As you probably know, Java's HashSet is backed by a HashMap.