I'm trying to use regexp_extract on hive.
I have data which is varying in nature, such as:
a2=new something
a1=asdasdsad;a2=old something;a3=asadasdsadsa
a2=Some place;alksndklsand;a1=asdklsad
Now, I need to extract the a2 data only. The semi colon denotes the end of a2 data but it might not present in every case.
What I've been trying is to concat a ';' to the column and then running regexp_extract to extract the data between the "a2=" and the first ";" (addding the ";" in order to make the logic compatible with all the cases):
regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)
But this isn't working at all.
Could someone suggest a better regexp for this?
Thanks.
I think RegEx101 is pretty helpful for you to understand the logic. I tried the above expression and it seem to work to extract the content for
a2
with or without;
.Try this.See demo.
https://www.regex101.com/r/rC2mH4/7
This simple regex will do the work:
It's your same regex but with only one capturing group (you don't need to capture what it's before the a2 key).