I found an interesting algorithm to calculate the Hamming distance on this site:
def hamming2(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,x^y
    while z:
        count += 1
        z &= z-1 # magic!
    return count
The problem is that this algorithm only works on bit strings stored as integers, while I'm trying to compare two binary values that are stored as text strings, like
'100010'
'101000'
How can I make them work with this algorithm?
Implement it:
def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))
And test it:
assert hamming2("1010", "1111") == 2
assert hamming2("1111", "0000") == 4
assert hamming2("1111", "1111") == 0
If we are to stick with the original algorithm, we need to convert the strings to integers to be able to use the bitwise operators.
def hamming2(x_str, y_str):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x_str) == len(y_str)
    x, y = int(x_str, 2), int(y_str, 2)  # '2' specifies we are reading a binary number
    count, z = 0, x ^ y
    while z:
        count += 1
        z &= z - 1  # magic!
    return count
Then we can call it as follows:
print(hamming2('100010', '101000'))
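To see why the `z &= z - 1` step works, here is a small trace (a sketch only; the helper name `clear_lowest_set_bits` is made up for illustration). Subtracting 1 flips the lowest set bit and every zero below it, so ANDing the result with the original value clears exactly that one bit. The loop therefore runs once per differing bit.

```python
def clear_lowest_set_bits(z):
    """Return the successive values of z while z &= z - 1 is applied (hypothetical helper)."""
    steps = []
    while z:
        steps.append(format(z, "06b"))  # record z as a 6-digit binary string
        z &= z - 1                      # clear the lowest set bit
    return steps

x, y = int("100010", 2), int("101000", 2)
steps = clear_lowest_set_bits(x ^ y)
print(steps)       # ['001010', '001000']
print(len(steps))  # 2, the Hamming distance
```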
While this algorithm is cool as a novelty, having to convert from a string likely negates any speed advantage it might have. The answer @dlask posted is much more succinct.
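One rough way to check the speed question yourself is `timeit`. The sketch below compares the character-by-character version with a convert-and-count version that uses `bin(...).count("1")` in place of the loop. The function names and the 60-character test inputs are assumptions for illustration, and the timings will vary by machine and Python version.

```python
import timeit

def hamming_zip(s1, s2):
    """Compare the strings character by character."""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def hamming_int(s1, s2):
    """Convert to integers, XOR, and count the 1 bits."""
    assert len(s1) == len(s2)
    return bin(int(s1, 2) ^ int(s2, 2)).count("1")

a = "100010" * 10
b = "101000" * 10
assert hamming_zip(a, b) == hamming_int(a, b)  # both approaches must agree

for fn in (hamming_zip, hamming_int):
    elapsed = timeit.timeit(lambda: fn(a, b), number=10_000)
    print(f"{fn.__name__}: {elapsed:.4f} s")
```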
This is what I use to calculate the Hamming distance.
It counts the number of differences between equal-length strings.
def hamdist(str1, str2):
    diffs = 0
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
    return diffs
I think this explains the Hamming distance between two strings well:
def hammingDist(s1, s2):
    bytesS1 = bytes(s1, encoding="ascii")
    bytesS2 = bytes(s2, encoding="ascii")
    diff = 0
    for i in range(min(len(bytesS1), len(bytesS2))):
        if bytesS1[i] ^ bytesS2[i] != 0:
            diff += 1
    return diff
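Note that the function above only compares up to the length of the shorter input, so any trailing characters of the longer string are silently ignored. If unequal lengths should be an error instead, a variant along these lines could be used (a sketch; the name `hamming_strict` and the choice of `ValueError` are assumptions, not part of the answer above):

```python
def hamming_strict(s1, s2):
    """Count differing positions; refuse inputs of unequal length."""
    if len(s1) != len(s2):
        raise ValueError("inputs must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

print(hamming_strict("100010", "101000"))  # 2
```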