SQL Server's isNumeric() equivalent in amazon

2020-03-01 07:35发布

问题:

  • I'm using amazon redshift as my data warehouse
  • I have a field (field1)of type string. Some of the strings start with four numbers and others with letters:

'test alpha'
'1382 test beta'

  • I want to filter out rows where the string does not start with four numbers
  • Looking at the redshift documentation, I don't believe isnumber or isnumeric are functions. It seems that the 'like' function is the best possibility.
  • I tried

    where left(field1, 4) like '[0-9][0-9][0-9][0-9]'

this did not work and from the link below seems like redshift may not support that:

https://forums.aws.amazon.com/message.jspa?messageID=439850

is there an error in the 'where' clause? if not and that clause isn't supported in redshift, is there a way to filter? I was thinking of using cast

cast(left(field1,4) as integer) 

and then passing over the row if it generated an error, but not sure how to do this in amazon redshift. or is there some other proxy for the isnumeric filter.

thanks

回答1:

Try something like:

where field1 ~ '^[0-9]{4}'

It will match any string, that starts with 4 digits.



回答2:

Although long time has passed since this question was asked I have not found an adequate response. So I feel obliged to share my solution which works fine on my Redshift cluster today (March 2016).

The UDF function is:

create or replace function isnumeric (aval VARCHAR(20000))
  returns bool
IMMUTABLE 
as $$
    try:
       x = int(aval);
    except:
       return (1==2);
    else:
       return (1==1);
$$ language plpythonu;

Usage would be:

select isnumeric(mycolumn), * from mytable
    where isnumeric(mycolumn)=false


回答3:

It seems that redshift doesn't support any of the following:

where left(field1,4) like '[0-9][0-9][0-9][0-9]' 
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'

what does seem to work is:

where left(field1,4) between 0 and 9999

this returns all rows that start with four numeric characters.

it seems that even though field1 is type string, the 'between' function interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not numeric). I'll follow up if I find a problem. For instance I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.



回答4:

looks like what you are looking for the is the similar to function (Redshift doc)

where left(field,4) similar to [0-9]{4}


回答5:

Per Amazon, the posix style ~regex style expressions are slow... https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html

Using their own REGEXP_* functions seems to be faster. https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html

For checking just a true/false for integers I've been using the following with success. REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0

this returns 1 if only numeric, 0 if anything else



回答6:

where regexp_instr(field1,'^[0-9]{4}') = 0

will remove rows starting with 4 digits (the above regexp_instr will return 1 for the rows with field1 starting with 4 digits)



回答7:

We have tried the following and worked for most of our scenarios:

columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'

This will positive, negative, integer and float numbers.



回答8:

redshift should support similar to.

WHERE field1 SIMILAR TO '[0-9]{4}%'

This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.