-->

Populating a SQLite3 database from a .txt file wit

2019-02-06 12:40发布

问题:

I am trying to setup a website in django which allows the user to send queries to a database containing information about their representatives in the European Parliament. I have the data in a comma seperated .txt file with the following format:

Parliament, Name, Country, Party_Group, National_Party, Position

7, Marta Andreasen, United Kingdom, Europe of freedom and democracy Group, United Kingdom Independence Party, Member

etc....

I want to populate a SQLite3 database with this data, but so far all the tutorials I have found only show how to do this by hand. Since I have 736 observations in the file I dont really want to do this.

I suspect this is a simple matter, but I would be very grateful if someone could show me how to do this.

Thomas

回答1:

So assuming your models.py looks something like this:

class Representative(models.Model):
    parliament = models.CharField(max_length=128)
    name = models.CharField(max_length=128)
    country = models.CharField(max_length=128)
    party_group = models.CharField(max_length=128)
    national_party = models.CharField(max_length=128)
    position = models.CharField(max_length=128)

You can then run python manage.py shell and execute the following:

import csv
from your_app.models import Representative
# If you're using different field names, change this list accordingly.
# The order must also match the column order in the CSV file.
fields = ['parliament', 'name', 'country', 'party_group', 'national_party', 'position']
for row in csv.reader(open('your_file.csv')):
    Representative.objects.create(**dict(zip(fields, row)))

And you're done.

Addendum (edit)

Per Thomas's request, here's an explanation of what **dict(zip(fields,row)) does:

So initially, fields contains a list of field names that we defined, and row contains a list of values that represents the current row in the CSV file.

fields = ['parliament', 'name', 'country', ...]
row = ['7', 'Marta Andreasen', 'United Kingdom', ...]

What zip() does is it combines two lists into one list of pairs of items from both lists (like a zipper); i.e. zip(['a','b,'c'], ['A','B','C']) will return [('a','A'), ('b','B'), ('c','C')]. So in our case:

>>> zip(fields, row)
[('parliament', '7'), ('name', 'Marta Andreasen'), ('country', 'United Kingdom'), ...]

The dict() function simply converts the list of pairs into a dictionary.

>>> dict(zip(fields, row))
{'parliament': '7', 'name': 'Marta Andreasen', 'country': 'United Kingdom', ...}

The ** is a way of converting a dictionary into a keyword argument list for a function. So function(**{'key': 'value'}) is the equivalent of function(key='value'). So in out example, calling create(**dict(zip(field, row))) is the equivalent of:

create(parliament='7', name='Marta Andreasen', country='United Kingdom', ...)

Hope this clears things up.



回答2:

As SiggyF says and only slightly differently than Joschua:

Create a text file with your schema, e.g.:

CREATE TABLE politicians (
    Parliament text, 
    Name text, 
    Country text, 
    Party_Group text, 
    National_Party text, 
    Position text
);

Create table:

>>> import csv, sqlite3
>>> conn = sqlite3.connect('my.db')
>>> c = conn.cursor()
>>> with open('myschema.sql') as f:            # read in schema file 
...   schema = f.read()
... 
>>> c.execute(schema)                          # create table per schema 
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()                              # commit table creation

Use csv module to read file with data to be inserted:

>>> csv_reader = csv.reader(open('myfile.txt'), skipinitialspace=True)
>>> csv_reader.next()                          # skip the first line in the file
['Parliament', 'Name', 'Country', ...

# put all data in a tuple
# edit: decoding from utf-8 file to unicode
>>> to_db = tuple([i.decode('utf-8') for i in line] for line in csv_reader)
>>> to_db                                      # this will be inserted into table
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Insert data:

>>> c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()

Verify that all went as expected:

>>> c.execute('SELECT * FROM politicians').fetchall()
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Edit:
And since you've decoded (to unicode) on input, you need to be sure to encode on output.
For example:

with open('encoded_output.txt', 'w') as f:
  for row in c.execute('SELECT * FROM politicians').fetchall():
    for col in row:
      f.write(col.encode('utf-8'))
      f.write('\n')


回答3:

You could read the data using the csv module. Then you can create an insert sql statement and use the method executemany:

  cursor.executemany(sql, rows)

or use add_all if you use sqlalchemy.



回答4:

You asked what the create(**dict(zip(fields, row))) line did.

I don't know how to reply directly to your comment, so I'll try to answer it here.

zip takes multiple lists as args and returns a list of their correspond elements as tuples.

zip(list1, list2) => [(list1[0], list2[0]), (list1[1], list2[1]), .... ]

dict takes a list of 2-element tuples and returns a dictionary mapping each tuple's first element (key) to its second element (value).

create is a function that takes keyword arguments. You can use **some_dictionary to pass that dictionary into a function as keyword arguments.

create(**{'name':'john', 'age':5}) => create(name='john', age=5)



回答5:

Something like the following should work: (not tested)

# Open database (will be created if not exists)
conn = sqlite3.connect('/path/to/your_file.db')

c = conn.cursor()

# Create table
c.execute('''create table representatives
(parliament text, name text, country text, party_group text, national_party text, position text)''')

f = open("thefile.txt")
for i in f.readlines():
    # Insert a row of data
    c.execute("""insert into representatives
                 values (?,?,?,?,?,?)""", *i.split(", ")) # *i.split(", ") does unpack the list as arguments

# Save (commit) the changes
conn.commit()

# We can also close the cursor if we are done with it
c.close()


回答6:

If you want to do it with a simple method using sqlite3, you can do it using these 3 steps:

$ sqlite3 db.sqlite3
sqlite> .separator ","
sqlite> .import myfile.txt table_name

However do keep the following points in mind:

  1. The .txt file should be in the same directory as your db.sqlite3,
    otherwise use an absolute path "/path/myfile.txt" when importing
  2. Your schema for tables (number of columns) should match with the number of values seperated by commas in each row in the txt file

You can use the .tables command to verify your table name

SQLite version 3.23.1 2018-04-10 17:39:29
Enter ".help" for usage hints.
sqlite> .tables
auth_group                  table_name               
auth_group_permissions      django_admin_log          
auth_permission             django_content_type       
auth_user                   django_migrations         
auth_user_groups            django_session            
auth_user_user_permissions