I have a string of words that are ::
delimited. How can I use the Hive
UDF
regexp_extract()
to extract words from the string?
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
regexp_extract('2foa1fa::12hjk','^(\\w.*)\\:{2}(\\w.*)$',1) as word1
OUTPUT: 2foa1fa
regexp_extract('2foa1fa::12hjk','^(\\w.*)\\:{2}(\\w.*)$',2) as word2
OUTPUT: 12hjk
- ^ anchors to the beginning of the string
- The \\w looks for a word character and .* means any number of times
- The \\:{2} looks for two : in a row (this is your :: delimiter)
- $ anchors the string to the end of the string
- specifying the third parameter in regexp_extract extracts the indexed (pattern)
Now just stick your column name in the place of the string literal and you should be good to go.
You can also use a split function creating an array and then query by the array location as well. Which will look something like this:
select my_array[2] from
(select split('2foa1fa::12hjk','\\::') as my_array from my_table) b;
OUTPUT: 12hjk