updating table rows based on txt file

Posted 2019-04-16 15:09

Question:

I have been searching, but so far I have only found how to insert data into tables based on CSV files.

I have the following scenario:

Directory name = ticketID

Inside this directory I have a couple of files, like:

  • Description.txt
  • Summary.txt - Contains the ticket header; it has been imported successfully.
  • Progress_#.txt - created every time a ticket gets updated; I get a new file each time.
  • Solution.txt

Importing the Issue.txt was easy, since it was actually a CSV.

Now my problem is with Description and Progress files.

I need to update the existing rows with the data from these files, something along the lines of:

update table_ticket set table_ticket.description = Description.txt where ticket_number = directoryname

I'm using PostgreSQL, and the COPY command only works for new data; it would also fail because of the ,;/ special characters.

I wanted to do this using a bash script, but it seems that won't be possible:

for i in `find . -type d`
do
  # pseudocode: SQL mixed into the shell loop to show the intent
  update table_ticket
  set table_ticket.description = $i/Description.txt
  where ticket_number = $i
done

Of course the above code would also need to handle the connection to the database.

Does anyone have an idea how I could achieve this using a shell script? Or would it be better to just write something in Java that reads the files and updates the records? I would like to avoid that approach, though.

Thanks Alex

Answer 1:

Thanks for the answer, but I came across this:

psql -U dbuser -h dbhost db
\set content `cat PATH/Description.txt`
update table_ticket set description = :'content' where ticketnr = TICKETNR;

Putting this into a simple script I created the following:

#!/bin/bash
for i in $(find . -type d | grep '^\./CS')
do
    p=$(echo "$i" | cut -b3-12)
    echo "$p"
    sed "s/PATH/${p}/g" cmd.sql > cmd.tmp.sql
    ticketnr=$(echo "$p" | cut -b5-10)
    sed -i "s/TICKETNR/${ticketnr}/g" cmd.tmp.sql
    cat cmd.tmp.sql
    psql -U supportAdmin -h localhost supportdb -f cmd.tmp.sql
done

The downside is that it always creates a new connection; later I'll change it to build a single SQL file and run it in one session.

But it does exactly what I was looking for, putting the contents inside a single column.
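To address the single-connection point, one hypothetical variant of the script above is to generate one SQL file covering every ticket and run it through psql once. The directory name, ticket number slicing, and table/column names follow the script above; the sample ticket data here is made up for illustration:

```shell
#!/bin/bash
# Demo setup: one fake ticket directory (hypothetical data, for illustration).
mkdir -p CS00123456
printf 'Customer reports odd chars: $ , ; /\n' > CS00123456/Description.txt

# Build a single SQL file covering every CS* directory, then run it once.
out=all_updates.sql
: > "$out"
for dir in CS*; do
    [ -d "$dir" ] || continue
    ticketnr=$(echo "$dir" | cut -b5-10)   # same slicing as the script above
    {
        printf '\\set content `cat %s/Description.txt`\n' "$dir"
        printf "update table_ticket set description = :'content' where ticketnr = %s;\n" "$ticketnr"
    } >> "$out"
done
cat "$out"
# One connection for everything:
# psql -U supportAdmin -h localhost supportdb -f "$out"
```

Because \set and the :'content' reference live inside the generated file, psql does all the file slurping and quoting itself in a single session.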



Answer 2:

psql can't read the file in for you directly, unless you intend to store it as a large object, in which case you can use lo_import. See the psql command \lo_import.


Update: @AlexandreAlves points out that you can actually slurp file content in using

  \set myvar `cat somefile`

then reference it as a psql variable with :'myvar'. Handy.


While it's possible to read the file in using the shell and feed it to psql, it's going to be awkward at best: the shell offers neither a native PostgreSQL driver with parameterised query support nor any text-escaping functions, so you'd have to roll your own string escaping.

Even then, you need to know that the text encoding of the input file is valid for your client_encoding; otherwise you'll insert garbage and/or get errors. It quickly ends up being easier to do it in a language with proper PostgreSQL integration, like Python, Perl, Ruby or Java.
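To make the "roll your own escaping" point concrete, here is a deliberately minimal sketch. The escape_sql helper is hypothetical, and doubling single quotes like this only covers standard-conforming single-quoted literals; it does nothing about encoding problems, which is exactly why it's fragile:

```shell
#!/bin/bash
# Hypothetical hand-rolled escaping: double every single quote. This is the
# fragile approach the answer warns against, shown only for illustration.
escape_sql() {
    printf '%s' "$1" | sed "s/'/''/g"
}

val="O'Brien said it's \"broken\""
printf "update table_ticket set description = '%s' where ticketnr = 1;\n" \
    "$(escape_sql "$val")"
```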

There is a way to do what you want in bash if you really must, though: use Pg's dollar quoting with a randomized delimiter to help prevent SQL injection attacks. It's not perfect, but it's pretty darn close. An example follows.


Given problematic file:

$ cat > difficult.txt <<__END__
Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()
__END__

and sample table:

psql -c 'CREATE TABLE testfile(filecontent text not null);'

You can:

#!/bin/bash
filetoread=$1
sep=$(printf '%04x%04x\n' $RANDOM $RANDOM)
psql <<__END__
INSERT INTO testfile(filecontent) VALUES (
\$x${sep}\$$(cat ${filetoread})\$x${sep}\$
);
__END__

This can be a little hard to read, and the random-string generation is bash-specific, though there are no doubt portable approaches.

A random tag string consisting of alphanumeric characters (I used hex for convenience) is generated and stored in sep.

psql is then invoked with a here-document tag that isn't quoted. The lack of quoting is important: <<'__END__' would tell bash not to interpret shell metacharacters within the string, whereas plain <<__END__ lets the shell interpret them. We need the shell to interpret metacharacters because we have to substitute sep into the here-document and also use $(...) (equivalent to backticks) to insert the file text.

The x before each substitution of sep is there because dollar-quote tags must be valid PostgreSQL identifiers, so they must start with a letter, not a number. There's an escaped dollar sign at the start and end of each tag because PostgreSQL dollar quotes take the form $taghere$quoted text$taghere$.

So when the script is invoked as bash testscript.sh difficult.txt, the here-document ends up expanding into something like:

INSERT INTO testfile(filecontent) VALUES (
$x0a305c82$Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()$x0a305c82$
);

where the tags vary each time, making SQL injection exploits that rely on prematurely ending the quoting difficult.
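The same expansion can be inspected without a database by writing the generated SQL to a file instead of piping it to psql. This uses the same technique as the script above; the file names here are made up:

```shell
#!/bin/bash
# Write the dollar-quoted INSERT to a file so the quoting can be inspected.
printf 'tricky $ and '\'' content\n' > difficult.txt
sep=$(printf '%04x%04x' $RANDOM $RANDOM)
cat > generated.sql <<__END__
INSERT INTO testfile(filecontent) VALUES (
\$x${sep}\$$(cat difficult.txt)\$x${sep}\$
);
__END__
cat generated.sql
```

Running it shows the opening and closing $x…$ tags wrapped around the raw file content, with the inner $ and ' left untouched.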

I still advise you to use a real scripting language, but this shows that it is indeed possible.



Answer 3:

The best thing to do is to create a temporary table, COPY the data from the files in question into it, and then run your updates.

Your secondary option would be to create a function in a language like PL/PerlU and do this in a stored procedure, but you would lose a lot of the performance optimizations available when you update from a temp table.
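A minimal sketch of the staging-table approach, written out as a SQL script. The ticket_staging table and the descriptions.csv layout are assumptions for illustration; only table_ticket, description and ticketnr come from the question:

```shell
#!/bin/bash
# Generate the staging-table SQL script (ticket_staging and descriptions.csv
# are hypothetical names; adjust to the real schema before running).
cat > staging_update.sql <<'__END__'
CREATE TEMP TABLE ticket_staging (ticketnr integer, description text);
\copy ticket_staging FROM 'descriptions.csv' WITH (FORMAT csv)
UPDATE table_ticket t
   SET description = s.description
  FROM ticket_staging s
 WHERE t.ticketnr = s.ticketnr;
__END__
cat staging_update.sql
# Run it all in one session:
# psql -U supportAdmin -h localhost supportdb -f staging_update.sql
```

Because the temp table and the UPDATE run in the same session, the planner can use a single set-based join instead of one statement per ticket.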