I am trying to filter all the ip adresses in a username. But this doesnt really work properly in my query:
select distinct regexp_extract(username, '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$', 0) from ips.
The problem is that he even recognizes numbers as 1000000 as ip adress.
Any idea how to fix it?
You need extra backslashes to escape special characters like . or \s. There's some more info on the wiki at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
Try something like:
select
distinct regexp_extract(ip, '^([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})$', 0) as match
from
ips
having
match <> "";