Replace an entire field value in a file using awk

I have an export from postgresql table with multiple fields, including boolean (exported by postgresql as t and f character), and I need to import it in another database (monetdb) that won't understand t/f as bool values.

(EDIT removed spaces to reflect true file aspect and avoid angry comments - previously there was spaces displayed)

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
...

As I cannot replace all occurence of t/f I need to integrate the field separator in my pattern. I tried to use awk to replace fields t with TRUE and f with FALSE:

awk -F'|' '{gsub(/\|t\|/, "|TRUE|"); gsub(/\|f\|/, "|FALSE|"); print;}'

This is working partially, as consecutive fields with a same value (|t|t|) will have only the first occurrence replaced (|TRUE|t| - as 2nd occurence is in fact t| and not |t|).

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|t|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|f|
...

Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

I could run the gsub() twice, but I was looking for more elegant way to match the entire field content for all fields.

gsub(/^t$/, ...) is not helping either as we are in the middle of a line most of the time.

标签： bash postgresql awk monetdb

4条回答

放我归山

2楼-- · 2019-08-23 04:55

if perl is okay, you can use lookarounds:

$ cat ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |   t    |   t    |   f    |   t    |
2  |    test   |   f    |   t    |   f    |   f    | 

$ perl -pe 's/\|\K\h*t\h*(?=\|)/  TRUE  /g; s/\|\K\h*f\h*(?=\|)/  FALSE /g' ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |  TRUE  |  TRUE  |  FALSE |  TRUE  |
2  |    test   |  FALSE |  TRUE  |  FALSE |  FALSE |

\|\K positive lookbehind to match |
\h* optional horizontal space, remove if not actually present in input
(?=\|) positive lookahead to match |

Can also use looping with sed. Tested on GNU sed 4.2.2, syntax may vary with other implementations

$ sed ':a s/| *t *|/|  TRUE  |/;ta; :b s/| *f *|/|  FALSE |/;tb' ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |  TRUE  |  TRUE  |  FALSE |  TRUE  |
2  |    test   |  FALSE |  TRUE  |  FALSE |  FALSE |

:a label
s/| *t *|/| TRUE |/ substitute command
ta branch to label a as long as substitute command succeeds
similarly for :b

With no spaces in input

perl -pe 's/\|\Kt(?=\|)/TRUE/g; s/\|\Kf(?=\|)/FALSE/g' ip.txt 
sed ':a s/|t|/|TRUE|/;ta; :b s/|f|/|FALSE|/;tb' ip.txt 
awk 'BEGIN{FS=OFS="|"} {for(i=1;i<=NF;i++){if($i=="t"){$i="TRUE"} if($i=="f"){$i="FALSE"}} print}' ip.txt

0人赞添加讨论(0) 举报

Bombasti

3楼-- · 2019-08-23 04:58

Use sed, it`s standard.

sed 's/| *t */| TRUE /g;s/| *f */| FALSE /g'

This tells sed to substitute every substring that begins with a pipe character, an unknown number of spaces (maybe zero), t and a space followed by an unknown number of spaces with | TRUE; same with f.

Pipe the output through column -t if line lengths get messed up.

0人赞添加讨论(0) 举报

Rolldiameter

4楼-- · 2019-08-23 05:07

Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

You can let Postgres do the work for you. Basic query to produce the SELECT list:

SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
                       THEN quote_ident(attname) || '::text'
                       ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
FROM   pg_attribute
WHERE  attrelid = 'mytable'::regclass  -- provide table name here
AND    attnum > 0
AND    NOT attisdropped;

Produces a string of the form:

col1, "CoL 2", bool1::text, "Bool 2"::text

All identifiers are escaped properly. Columns are in default order. Copy and execute it. Use COPY to export to file. (Or \copy in psql.) Performance is about the same as exporting a plain table. If you don't need upper case omit upper().

Why is a simple cast to text enough?

Return Boolean Value as TRUE or FALSE in Select (PostgreSQL/pgAdmin)

About regclass and escaping identifiers properly:

Table name as a PostgreSQL function parameter

If you need a complete statement with TRUE / FALSE / NULL in upper case, standard SQL cast notation (without colons ::), still original column names and maybe a schema-qualified tablename:

SELECT 'SELECT '
     || string_agg(CASE WHEN atttypid = 'bool'::regtype
                        THEN format('upper(cast(%1$I AS text)) AS %1$I', attname)
                        ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
     || ' FROM myschema.mytable;'           -- provide table name twice now
FROM   pg_attribute
WHERE  attrelid = 'myschema.mytable'::regclass
AND    attnum > 0
AND    NOT attisdropped;

Produces a complete statement of the form:

SELECT col1, "CoL 2", upper(cast(bool1 AS text) AS bool1, upper(cast("Bool 2" AS text)) AS "Bool 2" FROM myschema.mytable;

0人赞添加讨论(0) 举报

成全新的幸福

5楼-- · 2019-08-23 05:10

Assuming (based on your comments) that your input file actually doesn't look like the sample you posted but instead looks like this:

$ cat file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|

then all you need is:

$ awk '{while(gsub(/\|t\|/,"|TRUE|")); while(gsub(/\|f\|/,"|FALSE|"));}1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|

with the general solution for N replacement strings being:

$ awk 'BEGIN{m["f"]="FALSE"; m["t"]="TRUE"} {for (k in m) while(gsub("\\|"k"\\|","|"m[k]"|"));} 1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|

0人赞添加讨论(0) 举报

Replace an entire field value in a file using awk

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间