I was wondering if someone could help me understand how to use Hive's regexp_replace function to capture groups in the regex and use those groups in the replacement string.
I have an example problem I'm working through below that involves date-munging. In this example, my goal is to take a string date that is not compatible with SimpleDateFormat parsing and make a small adjustment to get it to be compatible. The date string (shown below) needs "GMT" prepended to the offset sign (+/-) in the string.
So, given the input:
'2015-01-01 02:03:04 +0:00'
-or-
'2015-01-01 02:03:04 -1:00'
I want output:
'2015-01-01 02:03:04 GMT+0:00'
-or-
'2015-01-01 02:03:04 GMT-1:00'
Here is a simple statement that I thought would work, but it produces unexpected output.
Hive query:
select regexp_replace('2015-01-01 02:03:04 +0:00', ' ([+-])', ' GMT\1');
Actual result:
2015-01-01 02:03:04 GMT10:00
I expected "\1" in the replacement string to output the matched group (the sign character), but instead the sign is replaced with the literal character "1".
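For comparison, here is a minimal Java sketch of the same replacement. It assumes Hive's regexp_replace is backed by java.util.regex, which is my assumption and not something I have confirmed for my Hive version. The class and method names are made up for illustration. In Java's replacement syntax a backslash only escapes the next character, and a group is referenced with "$1", so the sketch reproduces the output I'm seeing and shows the form that works in plain Java:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Hive.
class GroupRefDemo {
    // Same pattern as in the Hive query: a space followed by a captured sign.
    private static final Pattern OFFSET_SIGN = Pattern.compile(" ([+-])");

    // Replacement is backslash + '1'. In a Java replacement string the
    // backslash escapes the next character, so this inserts a literal "1".
    static String withBackslashRef(String input) {
        return OFFSET_SIGN.matcher(input).replaceAll(" GMT\\1");
    }

    // "$1" is Java's replacement syntax for "insert capture group 1".
    static String withDollarRef(String input) {
        return OFFSET_SIGN.matcher(input).replaceAll(" GMT$1");
    }

    public static void main(String[] args) {
        String input = "2015-01-01 02:03:04 +0:00";
        System.out.println(withBackslashRef(input)); // sign is lost
        System.out.println(withDollarRef(input));    // sign is kept
    }
}
```

If the assumption holds, the Hive-side equivalent would presumably be a replacement string of ' GMT$1', but I have not verified that in Hive.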
Can someone help me understand the right way to reference/output matched groups in the replacement string?
Thanks!