Proper/Title Case a Column with Exceptions table i

2019-09-21 20:55发布

问题:

I am trying to convert a column which is in upper case to proper case but with exceptions like certain acronyms, abbreviations. I am following the below code to implement that. But looks like this will be an ongoing process and so, I want to create a table with the exceptions in order to make it easy to clean the data and I want to be able to call the exceptions table from the function. It would be great if anyone can help me with any codes they have which is similar to this or any ideas on how to implement it.

ALTER FUNCTION [dbo].[Business_ProperCase]
     (@Text AS VARCHAR(8000)) 
RETURNS VARCHAR(8000) 
AS 
BEGIN 
    -- declare some variables 
    DECLARE @Reset BIT; DECLARE @Ret VARCHAR(8000); DECLARE @i INT; 
    DECLARE @c0 CHAR(1); DECLARE @c1 CHAR(1); DECLARE @c2 CHAR(1); 
    DECLARE @CaseLen INT; 
    DECLARE @CaseExceptions VARCHAR(8000); 
    DECLARE @CaseValue VARCHAR(8000); 

    -- Set some default values 
    SELECT @Reset = 1, @i=1, @Ret = ''; 

    -- only apply if all characters are already in uppercase 
    IF (UPPER(@Text)=@Text COLLATE Latin1_General_CS_AI) 
         BEGIN 

                 -- add a leading and trailing space to indicate word delimiters (bol & eol) 
                 SET @Text = ' ' + @Text + ' '; 

                 -- cycle through each character, 
                 -- if non-alpha, uppercase next alpha character. 
                 -- if alpha then lowercase subsequent alphas. 
                 WHILE (@i <= LEN(@Text)) 
                         SELECT 
                                 @c0=SUBSTRING(@Text,@i-2,1), @c1=SUBSTRING(@Text,@i-1,1), @c2=SUBSTRING(@Text,@i,1), 
                                 @Ret = @Ret + CASE WHEN @Reset=1 THEN UPPER(@c2) ELSE LOWER(@c2) END, 
                                 @Reset = CASE 
                                                                                             WHEN @c0 = ' ' AND @c1 = 'M' AND @c2 = 'c' THEN 1 

                                                 WHEN @c0 = ' ' AND @c1 IN ('D', 'I', 'O') AND @c2 = '''' THEN 1 


                                                 WHEN @c2 LIKE '[a-zA-Z'']' THEN 0               -- Apply LOWER to any character after alphas or apostrophes 
                                                ELSE 1                                                                  -- Apply UPPER to any character after symbols/punctuation 
                                         END, 
                                 @i = @i +1 

                 -- add a trailing space in case the previous rule changed this. 
                 SET @Ret = @Ret + ' '; 

                 -- custom exceptions: this search is case-insensitive and will 
                 -- replace the word to the case as it is written in the list. 
                 -- NOTE: this list has to end with a comma! 
                 SELECT @i=0, @CaseLen=0, 
                         @CaseExceptions = 'ABS,LLC,MD,MBA,MA,

--Want to create a table for these exceptions and call them from this function 
                 -- Loop through exception cases 
                 WHILE CHARINDEX(',', @CaseExceptions, @i+1)>0 
                         BEGIN 
                                 -- get the delimited word 
                                 SET @CaseLen = CHARINDEX(',', @CaseExceptions, @i+1) - @i 
                                 SET @CaseValue = SUBSTRING(@CaseExceptions, @i, @CaseLen) 

                                 -- replace it in the original text 
                                 SET @Ret = REPLACE(@Ret, ' '+@CaseValue+' ', ' '+@CaseValue+' ') 

                                 -- get position of next word 
                                 SET @i = CHARINDEX(',', @CaseExceptions, @i+@CaseLen) +1 
                         END 

                 -- remove any leading and trailing spaces 
                 SET @Ret = LTRIM(RTRIM(@Ret)); 

                 -- capitalize first character of data irrespective of previous rules 
                 SET @Ret = UPPER(SUBSTRING(@Ret,1,1)) + SUBSTRING(@Ret,2,LEN(@Ret)); 

         END 
    ELSE 
     BEGIN 
                 -- return the string unaffected if it is not in uppercase 
                 SET @Ret=@Text 
         END 

    RETURN @Ret 
 END 

回答1:

Here's an example for you to reference:

declare @s varchar(256) = 'This is a SQL test';
declare @t table (ignore varchar(256) not null);

insert into @t (ignore) values ('SQL');

declare @pos int = 1;
declare @nextpos int;
declare @w varchar(256);

while @pos <= len(@s)
begin
    set @nextpos = charindex(' ', @s + ' ', @pos);
    set @w = substring(@s, @pos, @nextpos - @pos);
    if not exists (select 1 from @t where ignore = @w)
        set @s = stuff(
            @s, @pos, @nextpos - @pos,
            stuff(lower(@w), 1, 1, upper(left(@w, 1)))
        );
    set @pos = @nextpos + 1;
    select @s;
end


回答2:

To answer the original request.. set up a table "Exceptions" with a single column ConcatList of type nvarchar (100) and add the exceptions to this table... then create a view with to concatenate them together...

    create table exceptions (ConcatList nvarchar(100))

    create view [dbo].vExceptions
    as
     Select distinct
        substring(
        (
            Select ','+ up.ConcatList  AS [text()]
            From exceptions up 
            ORDER BY up.ConcatList
            For XML PATH ('')
        ), 2, 4000) [exceptions]

     From exceptions p

Here is a slightly enhanced version of the stored procedure from the question. (although an admittedly inelegant solution) to account for:

  • Lower case words (of, the, an, etc)
  • Hhyphenated acronyms
  • Exceptions that are immediately preceeded or followed with a dash or comma.

    alter FUNCTION [dbo].[Business_ProperCase]
         (@Text AS VARCHAR(8000)) 
    RETURNS VARCHAR(8000) 
    AS 
    BEGIN 
        -- declare some variables 
        DECLARE @Reset BIT; DECLARE @Ret VARCHAR(8000); DECLARE @i INT; 
        DECLARE @c0 CHAR(1); DECLARE @c1 CHAR(1); DECLARE @c2 CHAR(1); 
        DECLARE @CaseLen INT; 
        DECLARE @CaseExceptions VARCHAR(8000); 
        DECLARE @CaseValue VARCHAR(8000); 
    
        -- Set some default values 
        SELECT @Reset = 1, @i=1, @Ret = ''; 
    
        -- only apply if all characters are already in uppercase 
        IF (UPPER(@Text)=@Text COLLATE Latin1_General_CS_AI) 
             BEGIN 
    
                     -- add a leading and trailing space to indicate word delimiters (bol & eol) 
                     SET @Text = ' ' + @Text + ' '; 
    
                     -- cycle through each character, 
                     -- if non-alpha, uppercase next alpha character. 
                     -- if alpha then lowercase subsequent alphas. 
                     WHILE (@i <= LEN(@Text)) 
                             SELECT 
                                     @c0=SUBSTRING(@Text,@i-2,1), @c1=SUBSTRING(@Text,@i-1,1), @c2=SUBSTRING(@Text,@i,1), 
                                     @Ret = @Ret + CASE WHEN @Reset=1 THEN UPPER(@c2) ELSE LOWER(@c2) END, 
                                     @Reset = CASE WHEN @c0 = ' ' AND @c1 = 'M' AND @c2 = 'c' THEN 1 
                                                   WHEN @c0 = ' ' AND @c1 IN ('D', 'I', 'O') AND @c2 = '''' THEN 1 
                                                   WHEN @c2 LIKE '[a-zA-Z'']' THEN 0               -- Apply LOWER to any character after alphas or apostrophes 
                                                    ELSE 1                                         -- Apply UPPER to any character after symbols/punctuation 
                                              END, 
                                     @i = @i +1 
    
                     -- add a trailing space in case the previous rule changed this. 
                     SET @Ret = @Ret + ' '; 
    
                     -- custom exceptions: this search is case-insensitive and will 
                     -- replace the word to the case as it is written in the list. 
                     -- NOTE: this list has to end with a comma! 
                     SELECT @i=0, @CaseLen=0, 
                             @CaseExceptions = exceptions from vExceptions
    
    --Want to create a table for these exceptions and call them from this function 
                     -- Loop through exception cases 
                     WHILE CHARINDEX(',', @CaseExceptions, @i+1)>0 
                             BEGIN 
                                     -- get the delimited word 
                                     SET @CaseLen = CHARINDEX(',', @CaseExceptions, @i+1) - @i 
                                     SET @CaseValue = SUBSTRING(@CaseExceptions, @i, @CaseLen) 
                                     if (@CaseValue = 'OF' or @CaseValue = 'AND' or @CaseValue ='THE' or @CaseValue='FOR')
                                     begin
                                         --replace with lower case 'of', 'and', 'the', 'for'
                                         SET @Ret = REPLACE(@Ret, ' '+@CaseValue+' ', ' '+lower(@CaseValue)+' ') 
                                     end
                                     else
                                     begin  
                                        if (CHARINDEX(' '+ @CaseValue +' ', @Ret)>0 )
                                        begin
                                             -- replace it in the original text 
                                             SET @Ret = REPLACE(@Ret, ' '+@CaseValue+' ', ' '+@CaseValue+' ') 
                                        end
                                        else if (CHARINDEX(' '+@CaseValue+',', @Ret)>0 )
                                        begin
                                            --replace text (with no spaces around it)
                                            SET @Ret = REPLACE(@Ret, ' '+@CaseValue+',', ' '+@CaseValue+',') 
                                        end
                                        else if (CHARINDEX(' '+@CaseValue+'-', @Ret)>0 )
                                        begin
                                            --replace text (with no spaces around it)
                                            SET @Ret = REPLACE(@Ret, ' '+@CaseValue+'-', ' '+@CaseValue+'-') 
                                        end
                                        else if (CHARINDEX('-'+@CaseValue+' ', @Ret)>0 )
                                        begin
                                            --replace text (with no spaces around it)
                                            SET @Ret = REPLACE(@Ret, '-'+@CaseValue+' ', '-'+@CaseValue+' ') 
                                        end
                                        else if (CHARINDEX(','+@CaseValue+' ', @Ret)>0 )
                                        begin
                                            --replace text (with no spaces around it)
                                            SET @Ret = REPLACE(@Ret, ','+@CaseValue+' ', '-'+@CaseValue+' ') 
                                        end
                                     end
    
                                     -- get position of next word 
                                     SET @i = CHARINDEX(',', @CaseExceptions, @i+@CaseLen) +1 
                             END 
    
                     -- remove any leading and trailing spaces 
                     SET @Ret = LTRIM(RTRIM(@Ret)); 
    
                     -- capitalize first character of data irrespective of previous rules 
                     SET @Ret = UPPER(SUBSTRING(@Ret,1,1)) + SUBSTRING(@Ret,2,LEN(@Ret)); 
    
             END 
        ELSE 
         BEGIN 
                     -- return the string unaffected if it is not in uppercase 
                     SET @Ret=@Text 
             END 
    
        RETURN @Ret 
     END 
    


回答3:

Create a table (I use ExceptionsTable as my example) with a column WordExcepts. Then add the following after your last DECLARE at the top of the page:

DECLARE @sql nvarchar(2000);

SET @sql = 'N select WordExcepts from ExceptionsTable' 

Then down below adjust your exceptions to be:

@CaseExceptions = @sql

Just add to your table as needed and they get filtered out of the function.