Create/Insert Json in Postgres with requests and p

2020-05-24 18:36发布

Just started a project with PostgreSQL. I would like to make the leap from Excel to a database and I am stuck on create and insert. Once I run this I will have to switch it to Update I believe so I don't continue to write over the current data. I know my connection is working but i get the following error.

My Error is: TypeError: not all arguments converted during string formatting

#!/usr/bin/env python
import requests
import psycopg2

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 
data = req.json()['data']

my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']   
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])

cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))

Sample of data:

[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]

1条回答
对你真心纯属浪费
2楼-- · 2020-05-24 19:42

It seems like you want to create a table with one column named "data". The type of this column is JSON. (I would recommend creating one column per field, but it's up to you.)

In this case the variable data (that is read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and do the inserts one at a time as executemany() is not faster than multiple calls to execute().

What I did was the following:

  1. Create a list of fields that you care about.
  2. Loop over the elements of data
  3. For each item in data, extract the fields into my_data
  4. Call execute() and pass in json.dumps(my_data) (Converts my_data from a dict into a JSON-string)

Try this:

#!/usr/bin/env python
import requests
import psycopg2
import json

conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')

req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018') 

# data here is a list of dicts
data = req.json()['data']

cur = conn.cursor()
# create a table with one column of type JSON
cur.execute("CREATE TABLE t_skaters (data json);")

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]

for item in data:
    my_data = {field: item[field] for field in fields}
    cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))


# commit changes
conn.commit()
# Close the connection
conn.close()

I am not 100% sure if all of the postgres syntax is correct here (I don't have access to a PG database to test), but I believe that this logic should work for what you are trying to do.

Update For Separate Columns

You can modify your create statement to handle multiple columns, but it would require knowing the data type of each column. Here's some psuedocode you can follow:

# same boilerplate code from above
cur = conn.cursor()
# create a table with one column per field
cur.execute(
"""CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
)

fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]

for item in data:
    my_data = [item[field] for field in fields]
    # need a placeholder (%s) for each variable 
    # refer to postgres docs on INSERT statement on how to specify order
    cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))


# commit changes
conn.commit()
# Close the connection
conn.close()

Replace the ... with the appropriate values for your data.

查看更多
登录 后发表回答