可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
- I'm using amazon redshift as my data warehouse
- I have a field (field1)of type string. Some of the strings start with four numbers and others with letters:
'test alpha'
'1382 test beta'
this did not work and from the link below seems like redshift may not support that:
https://forums.aws.amazon.com/message.jspa?messageID=439850
is there an error in the 'where' clause? if not and that clause isn't supported in redshift, is there a way to filter? I was thinking of using cast
cast(left(field1,4) as integer)
and then passing over the row if it generated an error, but not sure how to do this in amazon redshift. or is there some other proxy for the isnumeric filter.
thanks
回答1:
Try something like:
where field1 ~ '^[0-9]{4}'
It will match any string, that starts with 4 digits.
回答2:
Although long time has passed since this question was asked I have not found an adequate response. So I feel obliged to share my solution which works fine on my Redshift cluster today (March 2016).
The UDF function is:
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
try:
x = int(aval);
except:
return (1==2);
else:
return (1==1);
$$ language plpythonu;
Usage would be:
select isnumeric(mycolumn), * from mytable
where isnumeric(mycolumn)=false
回答3:
It seems that redshift doesn't support any of the following:
where left(field1,4) like '[0-9][0-9][0-9][0-9]'
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'
what does seem to work is:
where left(field1,4) between 0 and 9999
this returns all rows that start with four numeric characters.
it seems that even though field1 is type string, the 'between' function interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not numeric). I'll follow up if I find a problem. For instance I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.
回答4:
looks like what you are looking for the is the similar to
function (Redshift doc)
where left(field,4) similar to [0-9]{4}
回答5:
Per Amazon, the posix style ~regex style expressions are slow...
https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html
Using their own REGEXP_*
functions seems to be faster.
https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html
For checking just a true/false for integers I've been using the following with success.
REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0
this returns 1 if only numeric, 0 if anything else
回答6:
where regexp_instr(field1,'^[0-9]{4}') = 0
will remove rows starting with 4 digits (the above regexp_instr will return 1 for the rows with field1 starting with 4 digits)
回答7:
We have tried the following and worked for most of our scenarios:
columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'
This will positive, negative, integer and float numbers.
回答8:
redshift should support similar to.
WHERE field1 SIMILAR TO '[0-9]{4}%'
This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.