SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');
I want to replace any markup between two parts of the number with above regex, but it does not seem to work. I'm not sure if it is regex syntax that's wrong because I tried simpler one such as '%[^0-9]%'
just to test but it didn't work either. Does anyone know how can I achieve this?
Wrapping the solution inside a SQL function could be useful if you want to reuse it. I'm even doing it at the cell level, that's why I'm putting this as a different answer:
I stumbled across this post looking for something else but thought I'd mention a solution I use which is far more efficient - and really should be the default implementation of any function when used with a set based query - which is to use a cross applied table function. Seems the topic is still active so hopefully this is useful to someone.
Example runtime on a few of the answers so far based on running recursive set based queries or scalar function, based on 1m rows test set removing the chars from a random newid, ranges from 34s to 2m05s for the WHILE loop examples and from 1m3s to {forever} for the function examples.
Using a table function with cross apply achieves the same goal in 10s. You may need to adjust it to suit your needs such as the max length it handles.
Function:
Usage:
You can use PATINDEX to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.
Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.
Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.
Credit goes to this StackOverFlow answer.
EDIT Credit also goes to @srutzky
Edit (by @Tmdean) Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.
You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)
If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.
If you are doing this just for a parameter coming into a Stored Procedure, you can use the following:
I think a simpler and faster approach is iterate by each character of the alphabet:
Instead of stripping out the found character by its sole position, using
Replace(Column, BadFoundCharacter, '')
could be substantially faster. Additionally, instead of just replacing the one bad character found next in each column, this replaces all those found.I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.