I'm trying to use regexp_extract in Hive.
The data in my column varies in format, for example:
a2=new something
a1=asdasdsad;a2=old something;a3=asadasdsadsa
a2=Some place;alksndklsand;a1=asdklsad
I need to extract only the a2 value. A semicolon marks the end of the a2 value, but it may not be present in every row (for example, when a2 is the last field).
My approach so far is to concat a ';' to the column and then run regexp_extract to pull out the text between "a2=" and the first ";" that follows it (I append the ";" so that the same logic works for every case):
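To make the expected result concrete, here is a small sketch of the behaviour I'm after, written in Python's re module rather than Hive. The names A2_PATTERN and extract_a2 are just for illustration, and I'm assuming the a2 key always sits at the start of the string or right after a ';'. Returning None when a2 is missing is my own choice for the sketch, not a statement about what Hive does.

```python
import re

# Illustration only: Python's re, not Hive. Names are hypothetical.
# Assumption: "a2=" appears at the start of the string or right after a ';'.
A2_PATTERN = re.compile(r"(?:^|;)a2=([^;]*)")


def extract_a2(other_data):
    """Return the text after 'a2=' up to the next ';' or the end of the string."""
    match = A2_PATTERN.search(other_data)
    # None when the row has no a2 field at all (a choice made for this sketch).
    return match.group(1) if match else None


samples = [
    "a2=new something",
    "a1=asdasdsad;a2=old something;a3=asadasdsadsa",
    "a2=Some place;alksndklsand;a1=asdklsad",
]

for row in samples:
    print(repr(extract_a2(row)))
```

For the three sample rows this gives new something, old something and Some place, which is what I want back from Hive.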
regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)
But this doesn't give me the a2 value.
Could someone suggest a better regular expression for this?
Thanks.