PostgreSQL regexp_replace() to keep just one white

2019-03-24 05:47发布

问题:

I need to clean up a string column with both whitespaces and tabs included within, at the beginning or at the end of strings (it's a mess !). I want to keep just one whitespace between each word. Say we have the following string that includes every possible situation :

mystring = '  one two    three      four    '
  • 2 whitespaces before 'one'
  • 1 whitespace between 'one' and 'two'
  • 4 whitespaces between 'two' and 'three'
  • 2 tabs after 'three'
  • 1 tab after 'four'

Here is the way I do it :

  1. I delete leading and trailing whitespaces
  2. I delete leading and trailing tabs
  3. I replace both 'whitespaces repeated at least two' and tabs by a sole whitespace

WITH

  t1 AS (SELECT'  one two    three      four    '::TEXT AS mystring),

  t2 AS (SELECT TRIM(both ' ' from mystring) AS mystring FROM t1),

  t3 AS (SELECT TRIM(both '\t' from mystring) AS mystring FROM t2)

  SELECT regexp_replace(mystring, '(( ){2,}|\t+)', ' ', 'g') FROM t3 ;

I eventually get the following string, which looks nice but I still have a trailing whitespace...

'one two three four '

Any idea on doing it in a more simple way and solving this last issue ?

Many thanks !

回答1:

SELECT trim(regexp_replace(col_name, '\s+', ' ', 'g')) as col_name FROM table_name;

Or In case of update :

UPDATE table_name SET col_name = trim(regexp_replace(col_name, '\s+', ' ', 'g'));


回答2:

SELECT trim(regexp_replace(mystring, '\s+', ' ', 'g')) as mystring FROM t1;

Posting an answer in case folks don't look at comments.

Use '\s+'

Not '\\s+'

Worked for me.



回答3:

It didn't work for me with trim and regexp_replace. So I came with another solution:

SELECT trim(
    array_to_string(
        regexp_split_to_array('  test    with many  spaces  for        this   test  ', E'\\s+')
    , ' ')
) as mystring;

First regexp_split_to_array eliminates all spaces leaving "blanks" at the beginning and the end.

-- regexp_split_to_array output:
-- {"",test,with,many,spaces,for,this,test,""}

When using array_to_string all the ',' become spaces

-- regexp_split_to_array output ( '_' instead of spaces for viewing ):
-- _test_with_many_spaces_for_this_test_

The trim is to remove the head and tail

-- trim output ( '_' instead of spaces for viewing ):
-- test_with_many_spaces_for_this_test