I have an unnormalized events-diary CSV from a client that I'm trying to load into a MySQL table so that I can refactor it into a sane format. I created a table called 'CSVImport' that has one field for every column of the CSV file. The CSV contains 99 columns, so this was a hard enough task in itself:
CREATE TABLE `CSVImport` (id INT);
ALTER TABLE CSVImport ADD COLUMN Title VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN Company VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN NumTickets VARCHAR(256);
...
ALTER TABLE CSVImport ADD COLUMN Date49 VARCHAR(256);
ALTER TABLE CSVImport ADD COLUMN Date50 VARCHAR(256);
The table has no constraints, and all the fields hold VARCHAR(256) values, except the columns that contain counts (INT), yes/no flags (BIT), prices (DECIMAL), and text blurbs (TEXT).
I tried to load the data into the table:
LOAD DATA INFILE '/home/paul/clientdata.csv' INTO TABLE CSVImport;
Query OK, 2023 rows affected, 65535 warnings (0.08 sec)
Records: 2023 Deleted: 0 Skipped: 0 Warnings: 198256
SELECT * FROM CSVImport;
| NULL | NULL | NULL | NULL | NULL |
...
The whole table is filled with NULL.
I think the problem is that the text blurbs contain more than one line, and MySQL parses the file as if each new line corresponded to one database row. I can load the file into OpenOffice without a problem.
The clientdata.csv file contains 2593 lines and 570 records. The first line contains column names. I think it is comma-delimited, and text is apparently enclosed in double quotes.
UPDATE:
When in doubt, read the manual: http://dev.mysql.com/doc/refman/5.0/en/load-data.html
I added some information to the LOAD DATA statement that OpenOffice was smart enough to infer, and now it loads the correct number of records:
LOAD DATA INFILE "/home/paul/clientdata.csv"
INTO TABLE CSVImport
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
But there are still lots of completely NULL records, and none of the data that got loaded seems to be in the right place.
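A quick way to see what MySQL objected to, row by row, is to inspect the warnings right after the load:
SHOW WARNINGS LIMIT 10; -- shows the first few warnings from the last statement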
Yet another solution is to use the csvsql tool from the amazing csvkit suite.
Usage example:
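(In this sketch, the connection string, table name, and file name are placeholders; adjust them to your environment.)
csvsql --db mysql://user:password@localhost/mydb --tables CSVImport --insert --overwrite clientdata.csv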
This tool can automatically infer the data types (the default behavior), create the table, and insert the data into the created table.
The --overwrite option can be used to drop the table if it already exists, and the --insert option populates the table from the file.
To install the suite:
Prerequisites:
python-dev, libmysqlclient-dev, MySQL-python
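On a Debian/Ubuntu system (an assumption; package names may differ elsewhere), installing the prerequisites and the suite might look like:
sudo apt-get install python-dev libmysqlclient-dev
pip install MySQL-python
pip install csvkit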
I used this method to import more than 100K records (~5 MB) in 0.046 sec.
Here's how you do it:
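(The file name abc.csv, the table abc, and the column names below are placeholders for illustration.)
LOAD DATA LOCAL INFILE 'abc.csv'
INTO TABLE abc
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(col1, col2, col3);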
It is very important to include the LINES TERMINATED BY '\r\n' line if you have more than one field; without it, MySQL normally skips the last field (MySQL 5.6.17).
Then, assuming your first row contains the titles of your fields, you might want to include this line as well:
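IGNORE 1 ROWS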
This is what the full statement looks like if your file has a header row (same placeholder names as above):
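LOAD DATA LOCAL INFILE 'abc.csv'
INTO TABLE abc
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(col1, col2, col3);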
Try this; it worked for me:
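(Here /path/to/filename.csv and table_name are placeholders.)
LOAD DATA LOCAL INFILE '/path/to/filename.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;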
IGNORE 1 ROWS here ignores the first row, which contains the field names. Note that for the filename you must type the absolute path of the file.
You could also try this online tool for generating SQL create/insert statements based on your CSV: http://www.convertcsvtomysql.com/
What I like about this tool:
Disadvantages: