Avoiding duplicated data in PostgreSQL database in

2019-07-21 09:56发布

问题:

I am working on PostgreSQL and psycopg2. Trying to get feed data which is updated every after 10 mins and keep this feeds contents in PostgreSQL database.My target is to retrieve and print those data from that table. But facing problem as duplicate data is also stored in the database every time I run that script due to insertion operation on table.

To get out off this problem ,I made primary key constraint of column location_title in table Locations-musiq1 where I intend to store my feed data.But facing error.

Here is my code:

import psycopg2
import sys
import feedparser
import codecs
import psycopg2.extensions


# Parsing data from Geofeed location feeds

data = feedparser.parse("some URL")
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)



try:

    conn=psycopg2.connect("dbname='name' user='postgres' host='localhost'     password='abcds'")
    conn.set_client_encoding('UNICODE')


except:
    print "I am unable to connect to the database, exiting."
    sys.exit()
cur=conn.cursor()


for i in range(len(data['entries'])):
    cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)",    (data.entries[i].title,data.entries[i].summary))
    conn.commit()
cur.execute("SELECT * FROM locations_musiq1;")
cur.fetchone()
for row in cur:
    print '   '.join(row[1:])


cur.close()
conn.close()

My error after changing "locations_musiq1" tables column "location_title" as primary key is:

    Traceback (most recent call last):
      File "F:\JavaWorkspace\Test\src\postgr_example.py", line 28, in 
        cur.execute("INSERT INTO locations_musiq1(location, location_title) VALUES (%s, %s)",    (data.entries[i].title,data.entries[i].summary))
    psycopg2.IntegrityError: duplicate key value violates unique constraint "locations_musiq1_pkey"

Can anybody have any idea to get out of this problem ?..Thanks in advance..

回答1:

You could try something like this:

cur.execute("""
  INSERT INTO locations_musiq1(location, location_title) 
  SELECT %s, %s WHERE NOT EXISTS 
      (SELECT location_title FROM locations_musiq1 WHERE location_title=%s);
  """, (data.entries[i].title, data.entries[i].summary, data.entries[i].summary))


回答2:

Your code only has INSERT, so what do you think is going to happen when you fetch the same data for a second time?

Your update is failing because you're trying to insert a row which has an identical field value to one that already exists in a column with a unique constraint.

You either need to match entries in from the feed to your table and INSERT, UPDATE, DELETE as appropriate. Where appropriate is defined by the feed data and the reasons for you synchronising. Or you empty your table and populate it from the feed each time.

What are you trying to achieve?



回答3:

Rahman. You are asking a second question in your comment that should probably be made into its own question instead.

Anyhow to return the results in a specified order, you need an order by clause. I don't see a timestamp column here but I assume that your feed data is formatted in XML. You could order by some xpath expression. But if you just want them in the order they were inserted, you can sort by the hidden system column xmin, which is the transaction id of the insert operation.

See documentation on system columns.