I need to clean up a string column with both whitespaces and tabs included within, at the beginning or at the end of strings (it's a mess !). I want to keep just one whitespace between each word. Say we have the following string that includes every possible situation :
mystring = ' one two three four '
- 2 whitespaces before 'one'
- 1 whitespace between 'one' and 'two'
- 4 whitespaces between 'two' and 'three'
- 2 tabs after 'three'
- 1 tab after 'four'
Here is the way I do it :
- I delete leading and trailing whitespaces
- I delete leading and trailing tabs
- I replace both 'whitespaces repeated at least two' and tabs by a sole whitespace
WITH
t1 AS (SELECT' one two three four '::TEXT AS mystring),
t2 AS (SELECT TRIM(both ' ' from mystring) AS mystring FROM t1),
t3 AS (SELECT TRIM(both '\t' from mystring) AS mystring FROM t2)
SELECT regexp_replace(mystring, '(( ){2,}|\t+)', ' ', 'g') FROM t3 ;
I eventually get the following string, which looks nice but I still have a trailing whitespace...
'one two three four '
Any idea on doing it in a more simple way and solving this last issue ?
Many thanks !