I have an export from postgresql table with multiple fields, including boolean (exported by postgresql as t
and f
character), and I need to import it in another database (monetdb) that won't understand t/f as bool values.
(EDIT removed spaces to reflect true file aspect and avoid angry comments - previously there was spaces displayed)
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
...
As I cannot replace all occurence of t
/f
I need to integrate the field separator in my pattern.
I tried to use awk
to replace fields t
with TRUE
and f
with FALSE
:
awk -F'|' '{gsub(/\|t\|/, "|TRUE|"); gsub(/\|f\|/, "|FALSE|"); print;}'
This is working partially, as consecutive fields with a same value (|t|t|
) will have only the first occurrence replaced (|TRUE|t|
- as 2nd occurence is in fact t|
and not |t|
).
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|t|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|f|
...
Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).
I could run the gsub()
twice, but I was looking for more elegant way to match the entire field content for all fields.
gsub(/^t$/, ...)
is not helping either as we are in the middle of a line most of the time.
Table has ~450 columns so I can't really specify the list of columns
to be replaced, nor work in postgres to 'transform' boolean columns (I
could but ...).
You can let Postgres do the work for you. Basic query to produce the SELECT
list:
SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
THEN quote_ident(attname) || '::text'
ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass -- provide table name here
AND attnum > 0
AND NOT attisdropped;
Produces a string of the form:
col1, "CoL 2", bool1::text, "Bool 2"::text
All identifiers are escaped properly. Columns are in default order. Copy and execute it. Use COPY
to export to file. (Or \copy
in psql.) Performance is about the same as exporting a plain table. If you don't need upper case omit upper()
.
Why is a simple cast to text
enough?
- Return Boolean Value as TRUE or FALSE in Select (PostgreSQL/pgAdmin)
About regclass
and escaping identifiers properly:
- Table name as a PostgreSQL function parameter
If you need a complete statement with TRUE
/ FALSE
/ NULL
in upper case, standard SQL cast notation (without colons ::
), still original column names and maybe a schema-qualified tablename:
SELECT 'SELECT '
|| string_agg(CASE WHEN atttypid = 'bool'::regtype
THEN format('upper(cast(%1$I AS text)) AS %1$I', attname)
ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
|| ' FROM myschema.mytable;' -- provide table name twice now
FROM pg_attribute
WHERE attrelid = 'myschema.mytable'::regclass
AND attnum > 0
AND NOT attisdropped;
Produces a complete statement of the form:
SELECT col1, "CoL 2", upper(cast(bool1 AS text) AS bool1, upper(cast("Bool 2" AS text)) AS "Bool 2" FROM myschema.mytable;
Assuming (based on your comments) that your input file actually doesn't look like the sample you posted but instead looks like this:
$ cat file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
then all you need is:
$ awk '{while(gsub(/\|t\|/,"|TRUE|")); while(gsub(/\|f\|/,"|FALSE|"));}1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|
with the general solution for N replacement strings being:
$ awk 'BEGIN{m["f"]="FALSE"; m["t"]="TRUE"} {for (k in m) while(gsub("\\|"k"\\|","|"m[k]"|"));} 1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|
if perl
is okay, you can use lookarounds:
$ cat ip.txt
id | val_str | bool_1 | bool2 | bool_3 | bool4 |
1 | help | t | t | f | t |
2 | test | f | t | f | f |
$ perl -pe 's/\|\K\h*t\h*(?=\|)/ TRUE /g; s/\|\K\h*f\h*(?=\|)/ FALSE /g' ip.txt
id | val_str | bool_1 | bool2 | bool_3 | bool4 |
1 | help | TRUE | TRUE | FALSE | TRUE |
2 | test | FALSE | TRUE | FALSE | FALSE |
\|\K
positive lookbehind to match |
\h*
optional horizontal space, remove if not actually present in input
(?=\|)
positive lookahead to match |
Can also use looping with sed
. Tested on GNU sed 4.2.2
, syntax may vary with other implementations
$ sed ':a s/| *t *|/| TRUE |/;ta; :b s/| *f *|/| FALSE |/;tb' ip.txt
id | val_str | bool_1 | bool2 | bool_3 | bool4 |
1 | help | TRUE | TRUE | FALSE | TRUE |
2 | test | FALSE | TRUE | FALSE | FALSE |
:a
label
s/| *t *|/| TRUE |/
substitute command
ta
branch to label a
as long as substitute command succeeds
- similarly for
:b
With no spaces in input
perl -pe 's/\|\Kt(?=\|)/TRUE/g; s/\|\Kf(?=\|)/FALSE/g' ip.txt
sed ':a s/|t|/|TRUE|/;ta; :b s/|f|/|FALSE|/;tb' ip.txt
awk 'BEGIN{FS=OFS="|"} {for(i=1;i<=NF;i++){if($i=="t"){$i="TRUE"} if($i=="f"){$i="FALSE"}} print}' ip.txt
Use sed
, it`s standard.
sed 's/| *t */| TRUE /g;s/| *f */| FALSE /g'
This tells sed
to substitute every substring that begins with a pipe character, an unknown number of spaces (maybe zero), t
and a space followed by an unknown number of spaces with | TRUE
; same with f
.
Pipe the output through column -t
if line lengths get messed up.