Hive regexp_extract data

2020-05-09 02:14发布

问题:

I'm trying to use regexp_extract on hive.

I have data which is varying in nature, such as:

a2=new something a1=asdasdsad;a2=old something;a3=asadasdsadsa a2=Some place;alksndklsand;a1=asdklsad

Now, I need to extract the a2 data only. The semi colon denotes the end of a2 data but it might not present in every case.

What I've been trying is to concat a ';' to the column and then running regexp_extract to extract the data between the "a2=" and the first ";" (addding the ";" in order to make the logic compatible with all the cases):

regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)

But this isn't working at all.

Could someone suggest a better regexp for this?

Thanks.

回答1:

(?<=a2=)[^;]*(?:;|$)

Try this.See demo.

https://www.regex101.com/r/rC2mH4/7



回答2:

This simple regex will do the work:

.*a2=?(.*?);

It's your same regex but with only one capturing group (you don't need to capture what it's before the a2 key).



回答3:

(?<=a2=)=?([^;\n]*)

I think RegEx101 is pretty helpful for you to understand the logic. I tried the above expression and it seem to work to extract the content for a2 with or without ;.



标签: regex hive