Multiple split and assign order_id

2020-04-10 02:52发布

问题:

I have some address string that needs to be analyzed for each component.

SQL Fiddle Demo

CREATE TABLE Table1 ("ID" int, "address" varchar(41));

INSERT INTO Table1 ("ID", "address")
VALUES
    (1, 'calle 15, lomas del avila'),
    (2, 'calle av. solano edif:Apalache apt-15');

So I need to split by different characters like ( ., ,, :, ;, <space>, -)

For one character I know how to do it.

SELECT ID, s.token
FROM   Table1 t, unnest(string_to_array(t.address, ' ')) s(token);

How chain multiple unnest and how I assign an OrderID?

| ID |         token | orderID
|----|---------------|--------
|  1 |         calle |    1
|  1 |           15, |    2
|  1 |         lomas |    3
|  1 |           del |    4
|  1 |         avila |    5
|  2 |         calle |    1
|  2 |           av. |    2
|  2 |        solano |    3
|  2 | edif:Apalache |    4
|  2 |        apt-15 |    5

For this example second row 15, will be split into 15 and null so second result can be discard and order won't change.

But in the last 2 rows edif:Apalache and apt-15 the second split will produce edif, Apalache, apt and 15 so orderID will go from 4 to 7:

|  2 | edif     |    4
|  2 | Apalache |    5
|  2 | apt      |    6
|  2 | 15       |    7

回答1:

Use translate():

SELECT "ID", token, row_number() over (partition by "ID")
FROM (
    SELECT "ID", s.token
    FROM   
        Table1 t, 
        unnest(string_to_array(translate(t.address, '.,:;-', '     '), ' ')) 
            s(token)
    WHERE token <> ''
    ) sub

SqlFiddle.