How to remove any trailing numbers from a string?

2019-04-26 21:09发布

问题:

Sample inputs:

"Hi there how are you"

"What is the #1 pizza place in NYC?"

"Dominoes is number 1"

"Blah blah 123123"

"More blah 12321 123123 123132"

Expected output:

"Hi there how are you"

"What is the #1 pizza place in NYC?"

"Dominoes is number"

"Blah blah"

"More blah"

I'm thinking it's a 2 step process:

  1. Split the entire string into characters, one row per character (including spaces), in reverse order
  2. Loop through, and for each one if it's a space or a number, skip, otherwise add to the start of another array.

And i should end up with the desired result.

I can think of a few quick and dirty ways, but this needs to perform fairly well, as it's a trigger that runs on a busy table, so thought i'd throw it out to the T-SQL pros.

Any suggestions?

回答1:

This solution should be a bit more efficient because it first checks to see if the string contains a number, then it checks to see if the string ends in a number.

 CREATE FUNCTION dbo.trim_ending_numbers(@columnvalue AS VARCHAR(100)) RETURNS VARCHAR(100)
    BEGIN
    --This will make the query more efficient by first checking to see if it contains any numbers at all
    IF @columnvalue NOT LIKE '%[0-9]%'
        RETURN @columnvalue

    DECLARE @counter INT
    SET @counter = LEN(@columnvalue)

    IF ISNUMERIC(SUBSTRING(@columnvalue,@counter,1)) = 0
        RETURN @columnvalue 

    WHILE ISNUMERIC(SUBSTRING(@columnvalue,@counter,1)) = 1 OR SUBSTRING(@columnvalue,@counter,1) = ' '
    BEGIN
        SET @counter = @counter -1
        IF @counter < 0
            BREAK
    END
    SET @columnvalue = SUBSTRING(@columnvalue,0,@counter+1)

    RETURN @columnvalue
    END

If you run

SELECT dbo.trim_ending_numbers('More blah 12321 123123 123132')

It will return

'More blah'


回答2:

A loop on a busy table will be very unlikely to perform adequately. Use REVERSE and PATINDEX to find the first non digit, begin a SUBSTRING there, then REVERSE the result. This will be plenty slow with no loops.

Your examples imply that you also don't want to match spaces.

DECLARE @t TABLE (s NVARCHAR(500))
INSERT INTO @t (s)
VALUES 
('Hi there how are you'),('What is the #1 pizza place in NYC?'),('Dominoes is number 1'),('Blah blah 123123'),('More blah 12321 123123 123132')

select s 
, reverse(s) as beginning
, patindex('%[^0-9 ]%',reverse(s)) as progress
, substring(reverse(s),patindex('%[^0-9 ]%',reverse(s)), 1+len(s)-patindex('%[^0-9 ]%',reverse(s))) as [more progress]
, reverse(substring(reverse(s),patindex('%[^0-9 ]%',reverse(s)), 1+len(s)-patindex('%[^0-9 ]%',reverse(s)))) as SOLUTION
from @t

Final answer: reverse( substring( reverse( @s ), patindex( '%[^0-9 ]%', reverse( @s ) ), 1 + len( @s ) - patindex( '%[^0-9 ]%', reverse( @s ) ) ) )



回答3:

I believe that the below query is fast and useful

select reverse(substring(reverse(colA),PATINDEX('%[0-9][a-z]%',reverse(colA))+1,
len(colA)-PATINDEX('%[0-9][a-z]%',reverse(colA))))
from TBLA


回答4:

--DECLARE @String VARCHAR(100) = 'the fat cat sat on the mat'
--DECLARE @String VARCHAR(100) = 'the fat cat 2 sat33 on4 the mat'
--DECLARE @String VARCHAR(100) = 'the fat cat sat on the mat1'
--DECLARE @String VARCHAR(100) = '2121'
DECLARE @String VARCHAR(100) = 'the fat cat 2 2 2 2 sat on the mat2121'



DECLARE @Answer NVARCHAR(MAX),
    @Index INTEGER = LEN(@String),
    @Character CHAR,
    @IncorrectCharacterIndex SMALLINT


-- Start from the end, going to the front.
WHILE @Index > 0 BEGIN

    -- Get each character, starting from the end
    SET @Character = SUBSTRING(@String, @Index, 1)

    -- Regex check.
    SET @IncorrectCharacterIndex = PATINDEX('%[A-Za-z-]%', @Character)

    -- Is there a match? We're lucky here because it will either match on index 1 or not (index 0)
    IF (@IncorrectCharacterIndex != 0)
    BEGIN
        -- We have a legit character.
        SET @Answer = SUBSTRING(@String, 0, @Index + 1)
        SET @Index = 0
    END
    ELSE
        SET @Index = @Index - 1 -- No match, lets go back one index slot.


END

PRINT LTRIM(RTRIM(@Answer))

NOTE: I've included a dash in the valid regex match.



回答5:

Thanks for all the contributions which were very helpful. To go further and extract off JUST the trailing number:

, substring(s, 2 + len(s) - patindex('%[^0-9 ]%',reverse(s)), 99) as numeric_suffix

I needed to sort on the number suffix so had to restrict the pattern to numerics and to get around numbers of different lengths sorting as text (ie I wanted 2 to sort before 19) cast the result:

,cast(substring(s, 2 + len(s) - patindex('%[^0-9]%',reverse(s)),99) as integer) as numeric_suffix