Is there a method to calculate a general "similarity score" for a string? I do not want to compare two strings directly; instead, I want a number (a score or hash) for each string that can later tell me whether two strings are similar. Two similar strings should have similar (close) scores/hashes.
Let's consider these strings and scores as an example:
Hello world     1000
Hello world!    1010
Hello earth     1125
Foo bar         3250
FooBarbar       3750
Foo Bar!        3300
Foo world!      2350
You can see that Hello world! and Hello world are similar and their scores are close to each other.
This way, finding the strings most similar to a given string would be done by subtracting the given string's score from the other scores and then sorting by the absolute value of the differences.
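To make the intended lookup concrete, here is a minimal sketch in Python. It assumes the scores already exist and simply reuses the made-up numbers from the example above; how to compute such scores is the actual question. The function name most_similar and the parameter k are illustrative only.

```python
# Illustrative scores copied from the example above (not computed by any algorithm).
scores = {
    "Hello world": 1000,
    "Hello world!": 1010,
    "Hello earth": 1125,
    "Foo bar": 3250,
    "FooBarbar": 3750,
    "Foo Bar!": 3300,
    "Foo world!": 2350,
}


def most_similar(query, scores, k=3):
    """Return the k strings whose score is closest to the score of `query`."""
    target = scores[query]
    others = [s for s in scores if s != query]
    # Sort by the absolute difference between each score and the query's score.
    return sorted(others, key=lambda s: abs(scores[s] - target))[:k]


print(most_similar("Hello world", scores))
# ['Hello world!', 'Hello earth', 'Foo world!']
```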
My end goal: log messages arrive as a stream (message text only), and I want to find the patterns in those messages (something like a regular expression for each kind of message). But that can only start once I can bucket similar strings. To stress it again: I need a number (score/hash) for each string that can later tell me whether two strings are similar.
You might want to look at using a BK-Tree. Here is a discussion and Python implementation.
A BK-Tree stores strings in a tree in which each child is filed under its Levenshtein distance to its parent node. This is normally used to prune the search space when looking for similar strings, but it seems that this tree would form a natural ordering that could be used to create clusters.
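As a rough illustration of the idea (not the linked implementation), a BK-Tree can be sketched in a few lines of Python. The class name BKTree, its methods, and the tuple-based node layout are my own choices for this sketch; the only fixed ingredients are the Levenshtein distance and the triangle-inequality pruning during search.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children), where children maps distance -> child node."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, stored_word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            stored, children = stack.pop()
            d = self.distance(word, stored)
            if d <= max_dist:
                results.append((d, stored))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for s in ["Hello world", "Hello world!", "Hello earth",
          "Foo bar", "FooBarbar", "Foo Bar!", "Foo world!"]:
    tree.add(s)

print(tree.search("Hello world", 1))
# [(0, 'Hello world'), (1, 'Hello world!')]
```

Note that this answers "which stored strings are within distance N of this one", which is a lookup rather than a per-string score; clustering on top of it would still need a choice of threshold.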
Have a look at locality-sensitive hashing.
There's a very good explanation available here together with some sample code.
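For a feel of what this looks like in practice, below is a minimal sketch of one common form of locality-sensitive hashing: MinHash over character trigrams, with banding to produce bucket keys. This is not the sample code from the linked explanation. The function names, the choice of 64 hash functions, 16 bands, lowercasing, and SHA-1 as the underlying hash are all assumptions made for illustration.

```python
import hashlib


def shingles(text, n=3):
    """Set of character n-grams; lowercasing is an assumption of this sketch."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def _seeded_hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle, varied by seed."""
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text, num_hashes=64, n=3):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text, n)
    return tuple(min(_seeded_hash(seed, g) for g in grams)
                 for seed in range(num_hashes))


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions; estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def bucket_keys(signature, bands=16):
    """Split the signature into bands; strings sharing any key land in the same bucket."""
    rows = len(signature) // bands
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]


a = minhash_signature("Hello world")
b = minhash_signature("Hello world!")
c = minhash_signature("Foo bar")
print(estimated_similarity(a, b), estimated_similarity(a, c))
```

The signature is the per-string "hash" asked for in the question: it is computed from one string alone, and two signatures can be compared later. Unlike a single number, closeness is measured by how many positions agree, and the band keys let similar strings be bucketed without comparing every pair.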