For MySQL, I found the following query to import a CSV file:
LOAD DATA LOCAL INFILE '/example.csv'
INTO TABLE exampletable
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(example_field_name,example_field_city,example_field_state);
I'm just starting to learn, so please forgive the trivial question, but I'm wondering if someone can explain the following parts of this in a simpler way than what's in the MySQL help docs:
What does this mean?
OPTIONALLY ENCLOSED BY '"'
What does this mean?
ESCAPED BY '\\'
What do the \r and \n mean in the following?
LINES TERMINATED BY '\r\n'
Thank you for your help.
Everyone thinks CSV file format is simple, it's not.... You can read it's history on wikipedia
OPTIONALLY ENCLOSED BY '"'
Some columns include the separation character. You want the following row to be loaded as 5 columns,
one, two, "three, tres, trois", four, five
ESCAPED BY '\\'
And what if your data contained a quote character?
one, two, "In France, one would say \"trois\"", four, five
LINES TERMINATED BY '\r\n'
Windows file format. Lines end with two white-space characters "carriage return" (\r) and "End of line" (\n).
Ever notice that windows notepad can't open unix files properly? That's because such files terminate lines with only the EOL character. All this dates back to the type-writer days of computing and incompatible text standards between operating systems.
OPTIONALLY ENCLOSED BY '"'
This is stating that values can, but don't have to contain double quotes at the beginning and end of the value "john doe"
ESCAPED BY '\\'
There are certain characters which aren't safe in CSV files, and those characters are escaped with the double backslashes Denver\\, CO
LINES TERMINATED BY '\r\n'
This identifies how the line, or record, is terminated to tell mysql to start the next record "last data for this line or record" \r\n
. the \r\n = line break in a text file
CSV represents strings which are separated by characters. In your case, a line could look like this
foo,bar,baz
This represents three fields with the values of foo
, bar
, and baz
. Now the question is, what happens if one of the fields contains a comma? If you don't mark is specially, it would be considered a field separator. This mark is done using escape-characters. In your case a single backslash. So we could have something like
foo\, blarb,bar,baz
which would be read as foo\, blarb
, bar
, and baz
. Now you could optionally surround your fields with quotation marks, but you would still need to escape quotation marks which would be part of your field values. In the end, it is always a question how to create a representation of your data which is unambigious.
For your line termination question: \r
is a common representation the ASCII character for Carriage return (the Byte 13). \n
is a newline character (ASCII 10). These two characters are used to represent the end of a line on Windows and certain protocols like HTTP. On Unix/Linux, only \n
is typically used. Incidentally, the \n
and \r
notation also employs escapes.
As a general note, if you see backslashes just before certain characters in serialized text, it is often some kind of escape mechanism. The backslash is a very popular character for this as it is rather rare in normal text.