I am trying to use data from the USDA found here: http://www.ars.usda.gov/Services/docs.htm?docid=23634
ASCII (8.6Mb) - This file contains the SR26 data in ASCII (ISO/IEC 8859-1), delimited files. These tables are organized in a relational format, and are best used with a relational database management system (RDBMS), which will allow you to form your own queries of the database and generate custom reports.
I am new to manipulating data like this. I think I'd like to get it into CSV, maybe? But then I might lose the relationships, so maybe I should go straight to PostgreSQL. Not sure how to approach this.
Looking for guidance, thanks.
Using the tips from @craig-ringer I've created a script to migrate the original USDA Nutrition Database files to PostgreSQL.
It's very rough at the moment, but it should help you get started.
Check it out here: https://github.com/rlucha/usda-nutrition-database-migration
They have a version of that database for Microsoft Access. If you have Access, you could download that version and then convert it to MySQL using Access To MySQL.
The zip contains a number of delimited text files, `NUTR_DEF.txt`, `FOOD_DES.txt` and so on, each of which appears to be in a bizarre almost-CSV-like format, plus `sr26_doc.pdf`, the documentation.

Creating table definitions
So what you need to do here is create SQL table definitions for the database, with one table for each input file. You need the `CREATE TABLE` command for this; see the PostgreSQL documentation.

Page 35 of the PDF should help you: "Figure 1. Relationships among files in the USDA National Nutrient Database for Standard Reference". The pages that follow describe each file's format and tell you what each column means. You can write `CREATE TABLE` statements based on those descriptions.

As an example, take `FOOD_DES.txt` (food description), the first entry: a table that is a pretty literal copy of the PDF's description will do, though it's not how I'd design the table from scratch.
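Here's a sketch of such a literal table definition. The column names, widths, and `NOT NULL` markings are transcribed from my reading of the SR26 format description, so verify them against the PDF before relying on them:

```sql
CREATE TABLE food_des (
    "NDB_No"      character(5) NOT NULL PRIMARY KEY, -- food item ID
    "FdGrp_Cd"    character(4) NOT NULL,             -- food group code
    "Long_Desc"   varchar(200) NOT NULL,
    "Shrt_Desc"   varchar(60)  NOT NULL,
    "ComName"     varchar(100),
    "ManufacName" varchar(65),
    "Survey"      character(1),
    "Ref_desc"    varchar(135),                      -- description of refuse
    "Refuse"      smallint,                          -- percent refuse
    "SciName"     varchar(65),
    "N_Factor"    NUMERIC(4,2),                      -- nitrogen-to-protein factor
    "Pro_Factor"  NUMERIC(4,2),
    "Fat_Factor"  NUMERIC(4,2),
    "CHO_Factor"  NUMERIC(4,2)
);
```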
I've used `NUMERIC`, an arbitrary-precision decimal type, for accuracy in the non-integer numeric columns. If performance is more important than accuracy, you can use `float4` instead.

For relationships, you use `FOREIGN KEY` constraints; just `colname coltype REFERENCES othertable(othercol)` is sufficient to create one.

Important: I double quoted the column names to preserve the same names as in the definitions. That means you must always double quote them when you refer to them, e.g. `SELECT "NDB_No" FROM food_des;`. If you don't want that, just leave off the double quotes, or pick different names. You don't have to stick to the clumsy abbreviated column names the USDA used; it's quite reasonable to rename them to something more readable. Similarly, if you're working with Rails, you can convert the table definitions to follow Rails's conventions, especially if you then intend to do the data loading via Rails.
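To illustrate the renaming idea, here's a sketch with friendlier identifiers and an explicit foreign key back to a food-group table; these table and column names are my own inventions for illustration, not the USDA's:

```sql
-- Food groups (FD_GROUP.txt in the USDA's naming)
CREATE TABLE food_group (
    food_group_code character(4) PRIMARY KEY,
    description     varchar(60)  NOT NULL
);

-- Food descriptions (FOOD_DES.txt)
CREATE TABLE food (
    ndb_number        character(5) PRIMARY KEY,
    food_group_code   character(4) NOT NULL
        REFERENCES food_group (food_group_code),
    long_description  varchar(200) NOT NULL,
    short_description varchar(60)  NOT NULL
);
```

Unquoted lowercase identifiers like these can be written without double quotes in queries, which is usually less painful day to day.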
Loading data
If these were sane, sensible delimited files you could then just load each table using the `psql` command `\copy`, or PgAdmin-III's "import" option.

In fact the files are CSV; they've just decided to use totally bizarre delimiter and quote characters. You can still import them via `psql`'s `\copy` by specifying those characters explicitly, or do the equivalent in whatever tool you use to talk to PostgreSQL.
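The SR ASCII releases use `^` as the field delimiter and `~` as the quote character, and the text is ISO 8859-1, so the import for the food description file might look like this in `psql` (assuming you've created a matching `food_des` table):

```sql
\copy food_des FROM 'FOOD_DES.txt' WITH (FORMAT csv, DELIMITER '^', QUOTE '~', ENCODING 'latin1')
```

Repeat for each of the other files, one `\copy` per table.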
The result is a sensible-looking table.
Similarly, if using Rails you can use whatever Rails CSV library you want and bulk-load into models.