I am confused about how a bit vector would work to do this (not too familiar with bit vectors). Here is the code given. Could someone please walk me through this?
public static boolean isUniqueChars(String str) {
int checker = 0;
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
}
return true;
}
Particularly, what is the checker
doing?
int checker
is used here as a storage for bits. Every bit in integer value can be treated as a flag, so eventuallyint
is an array of bits (flag). Each bit in your code states whether the character with bit's index was found in string or not. You could use bit vector for the same reason instead ofint
. There are two differences between them:Size.
int
has fixed size, usually 4 bytes which means 8*4=32 bits (flags). Bit vector usually can be of different size or you should specify the size in constructor.API. With bit vectors you will have easier to read code, probably something like this:
vector.SetFlag(4, true); // set flag at index 4 as true
for
int
you will have lower-level bit logic code:checker |= (1 << 5); // set flag at index 5 to true
Also probably
int
may be a little bit faster, because operations with bits are very low level and can be executed as-is by CPU. BitVector allows writing a little bit less cryptic code instead plus it can store more flags.For future reference: bit vector is also known as bitSet or bitArray. Here are some links to this data structure for different languages/platforms:
Lets break down the code line by line.
int checker = 0; We are initiating a checker which will help us find duplicate values.
int val = str.charAt(i) - 'a'; We are getting the ASCII value of the character at the 'i'th position of the string and subtracting it with the ASCII value of 'a'. Since the assumption is that the string is lower characters only, the number of characters in limited to 26. Hece, the value of 'val' will always be >= 0.
if ((checker & (1 << val)) > 0) return false;
checker |= (1 << val);
Now this is the tricky part. Lets us consider an example with string "abcda". This should ideally return false.
For loop iteration 1:
Checker: 00000000000000000000000000000000
val: 97-97 = 0
1 << 0: 00000000000000000000000000000001
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000001
For loop iteration 2:
Checker: 00000000000000000000000000000001
val: 98-97 = 1
1 << 0: 00000000000000000000000000000010
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000011
For loop iteration 3:
Checker: 00000000000000000000000000000011
val: 99-97 = 0
1 << 0: 00000000000000000000000000000100
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000000111
For loop iteration 4:
Checker: 00000000000000000000000000000111
val: 100-97 = 0
1 << 0: 00000000000000000000000000001000
checker & (1 << val): 00000000000000000000000000000000 is not > 0
Hence checker: 00000000000000000000000000001111
For loop iteration 5:
Checker: 00000000000000000000000000001111
val: 97-97 = 0
1 << 0: 00000000000000000000000000000001
checker & (1 << val): 00000000000000000000000000000001 is > 0
Hence return false.
Previous Posts explain well what the code block does and i want to add my simple Solution using the BitSet java Data structure :
Reading Ivan's answer above really helped me, although I would phrase it somewhat differently.
The
<<
in(1 << val)
is a bit shifting operator. It takes1
(which in binary is represented as000000001
, with as many preceding zeroes as you like / are allocated by memory) and shifts it to the left byval
spaces. Since we're assuming a-z only and subtractinga
each time, each letter will have a value of 0-25, which will be that letter's index from the right in thechecker
integer's boolean representation, since we will move the1
to the left inchecker
val
times.At the end of each check, we see the
|=
operator. This merges two binary numbers, replacing all0
's with1
's if a1
exists in either operand at that index. Here, that means that wherever a1
exists in(1 << val)
, that1
will be copied over intochecker
, while all ofchecker
's existing 1's will be preserved.As you can probably guess, a
1
functions here as a boolean flag for true. When we check to see if a character is already represented in the string, we comparechecker
, which at this point is essentially an array of boolean flags (1
values) at the indexes of characters that have already been represented, with what is essentially an array of boolean values with a1
flag at the index of the current character.The
&
operator accomplishes this check. Similar to the|=
, the&
operator will copy over a1
only if both operands have a1
at that index. So, essentially, only flags already present inchecker
that are also represented in(1 << val)
will be copied over. In this case, that means only if the current character has already been represented will there be a1
present anywhere in the result ofchecker & (1 << val)
. And if a1
is present anywhere in the result of that operation, then the value of the returned boolean is> 0
, and the method returns false.This is, I'm guessing, why bit vectors are also called bit arrays. Because, even though they aren't of the array data type, they can be used similar to the way arrays are used in order to store boolean flags.
I have a sneaking suspicion you got this code from the same book I'm reading...The code itself here isn't nearly as cryptic as the the operators- |=, &, and << which aren't normally used by us layman- the author didn't bother taking the extra time out in explaining the process nor what the actual mechanics involved here are. I was content with the previous answer on this thread in the beginning but only on an abstract level. I came back to it because I felt there needed to be a more concrete explanation- the lack of one always leaves me with an uneasy feeling.
This operator << is a left bitwise shifter it takes the binary representation of that number or operand and shifts it over however many places specified by the operand or number on the right like in decimal numbers only in binaries. We are multiplying by base 2-when we move up however many places not base 10- so the number on the right is the exponent and the number on the left is a base multiple of 2.
This operator |= take the operand on the left and or's it with the operand on the right- and this one -'&'and's the bits of both operands to left and right of it.
So what we have here is a hash table which is being stored in a 32 bit binary number every time the checker gets or'd (
checker |= (1 << val)
) with the designated binary value of a letter its corresponding bit it is being set to true. The character's value is and'd with the checker (checker & (1 << val)) > 0
)- if it is greater than 0 we know we have a dupe- because two identical bits set to true and'd together will return true or '1''.There are 26 binary places each of which corresponds to a lowercase letter-the author did say to assume the string only contains lowercase letters- and this is because we only have 6 more (in 32 bit integer) places left to consume- and than we get a collision
So, for an input string 'azya', as we move step by step
string 'a'
string 'az'
string 'azy'
string 'azya'
Now, it declares a duplicate