I would like to retrieve a column of how many differences in letters in each row. For instance
If you have a a value "test" and another row has a value "testing ", then the differences is 4 letter between "test" and "testing ". The data of the column would be value 4
I have reflected about it and I don't know where to begin
id || value || category || differences
--------------------------------------------------
1 || test || 1 || 4
2 || testing || 1 || null
11 || candy || 2 || -3
12 || ca || 2 || null
In this scenario and context it is no difference between "Test" and "rest".
I think what you are looking for is a measure of edit difference, rather than just counting prefix similarity, for which there are a few common algorithms. Levenshtein's method is one that I've used before and I've seen it implemented as TSQL functions. The answers to this SO question suggest a couple of implementations in TSQL that you might just be able to take and use as-is.
(though take time to test the code and understand the method rather than just copying the code and using it, so that you can understand the output if something seems to go wrong - otherwise you could be creating some technical debt you'll have to pay back later)
Exactly which distance calculation method you want will depend on how you want to count certain things, for instance do you count a substitution as one change or a delete and an insert, and if your strings are long enough for it to matter do you want to consider substring moves, and so forth.
I think you just want
len()
andlead()
:You read a next record with
LEAD
. Then compare the strings with LIKE or other string functions:If you only want to compare values within the same category use a partition clause:
UPDATE: Please see DhruvJoshi's answer on SQL Server's
LEN
. This function doesn't count trailing blanks, as I assumed, so you need his trick in case you want to have them counted. Here is the doc onLEN
confirming this behaviour: https://technet.microsoft.com/en-us/library/ms190329(v=sql.105).aspxyou can also use self joining query like below:
Update: I noticed that you have oddly positioned the space at the end. SQL server most times does not count the trailing spaces when calculating length. So here's the hack on above query
As pointed out in comments, that Id's may not be consecutive, in such cases try this :