Hive regexp_extract data

2020-05-09 01:31发布

I'm trying to use regexp_extract on hive.

I have data which is varying in nature, such as:

a2=new something a1=asdasdsad;a2=old something;a3=asadasdsadsa a2=Some place;alksndklsand;a1=asdklsad

Now, I need to extract the a2 data only. The semi colon denotes the end of a2 data but it might not present in every case.

What I've been trying is to concat a ';' to the column and then running regexp_extract to extract the data between the "a2=" and the first ";" (addding the ";" in order to make the logic compatible with all the cases):

regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)

But this isn't working at all.

Could someone suggest a better regexp for this?

Thanks.

标签: regex hive
3条回答
淡お忘
2楼-- · 2020-05-09 02:03
(?<=a2=)=?([^;\n]*)

I think RegEx101 is pretty helpful for you to understand the logic. I tried the above expression and it seem to work to extract the content for a2 with or without ;.

查看更多
Emotional °昔
3楼-- · 2020-05-09 02:12
(?<=a2=)[^;]*(?:;|$)

Try this.See demo.

https://www.regex101.com/r/rC2mH4/7

查看更多
放我归山
4楼-- · 2020-05-09 02:16

This simple regex will do the work:

.*a2=?(.*?);

It's your same regex but with only one capturing group (you don't need to capture what it's before the a2 key).

查看更多
登录 后发表回答