-->

Finding strings with duplicate letters inside

2019-02-26 20:25发布

问题:

Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.

Something like this:

Table A

ID Name   
1  Matt
2  Daave
3  Toom
4  Mike
5  Eddie

And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:

Table B

ID     
2
3
5

Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.

I hope this is understandable and would be very grateful for any help.

回答1:

In any version of SQL, you can do this with a brute force approach:

select *
from t
where t.name like '%aa%' or
      t.name like '%bb%' or
      . . .
      t.name like '%zz%'

If you have a case sensitive collation, then use:

where lower(t.name) like '%aa%' or
      . . .


回答2:

In your answer with stored procedure, you have 2 mistakes, one is missing space between column name and LIKE clause, second is missing single quotes around search parameter.

I first create user-defined scalar function which return 1 if string contains duplicate letters:

EDITED

CREATE FUNCTION FindDuplicateLetters
(
    @String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN

    DECLARE @Result BIT = 0 
    DECLARE @Counter INT = 1

    WHILE (@Counter <= LEN(@String) - 1) 
    BEGIN


    IF(ASCII((SELECT SUBSTRING(@String, @Counter, 1))) = ASCII((SELECT SUBSTRING(@String, @Counter + 1, 1))))
        BEGIN
             SET @Result = 1
             BREAK
        END


        SET @Counter = @Counter + 1 
    END

    RETURN @Result

END
GO

After function was created, just call it from simple SELECT query like following:

SELECT 
    * 
FROM
    (SELECT 
        *, 
        dbo.FindDuplicateLetters(ColumnName) AS Duplicates
    FROM TableName) AS a
WHERE a.Duplicates = 1

With this combination, you will get just rows that has duplicate letters.



回答3:

Though this is an old post it's worth posting a solution that will be faster than a brute force approach or one that uses a scalar udf (which generally drag down performance). Using NGrams8K this is rather simple.

--sample data
declare @table table (id int identity primary key, [name] varchar(20));
insert @table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');

-- solution #1
select id
from @table
cross apply dbo.NGrams8k([name],1)
where charindex(replicate(token,2), [name]) > 0
group by id;

-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
  select id, token, prevToken = lag(token,1) over (partition by id order by position)
  from @table
  cross apply dbo.NGrams8k([name],1)
) prep
where token = prevToken
group by id; -- optional id you want to remove possible duplicates. 


回答4:

Here's one way.

First create a table of numbers

CREATE TABLE dbo.Numbers
  (
     number INT PRIMARY KEY
  );

INSERT INTO dbo.Numbers
SELECT number
FROM   master..spt_values
WHERE  type = 'P'
       AND number > 0;

Then with that in place you can use

SELECT *
FROM   TableA
WHERE  EXISTS (SELECT *
               FROM   dbo.Numbers
               WHERE  number < LEN(Name)
                      AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))