LOAD DATA INFILE with parent/child tables and foreign keys

Published 2019-08-21 10:04

So I have two tables. The path table has a foreign key `media_id` referencing the media table's `id`, and each media row can have several paths. This all works well.

The problem comes when I try to export and import these tables as CSV. I can export them fine, but when I import them (the media table first), the media table's primary key `id` is set to AUTO_INCREMENT, so LOAD DATA INFILE generates new `id` values on its own and I lose the link to the path table.

$sql = "CREATE TABLE $media_table (
            `id` int(11) NOT NULL AUTO_INCREMENT,
            `title` varchar(255) DEFAULT NULL,
            `description` varchar(2000) DEFAULT NULL,
            PRIMARY KEY (`id`)
         ) $charset_collate;";

        $sql = "CREATE TABLE $path_table (
            `id` int(11) NOT NULL AUTO_INCREMENT,
            `path` varchar(500) DEFAULT NULL,
            `def` varchar(50) DEFAULT NULL,
            `media_id` int(11) NOT NULL,
            PRIMARY KEY (`id`),
            INDEX `media_id` (`media_id`),
            CONSTRAINT `mvp_path_ibfk_1` FOREIGN KEY (`media_id`) REFERENCES {$media_table} (`id`) ON DELETE CASCADE ON UPDATE CASCADE
        ) $charset_collate;";

Here is the query:

$query = "LOAD DATA INFILE '$csv' INTO TABLE {$table}
      FIELDS TERMINATED BY '|'
      OPTIONALLY ENCLOSED BY '^'
      ESCAPED BY ''
      LINES TERMINATED BY '\n'
      IGNORE 1 LINES";

Is there something wrong with my database design and how could I improve this, or get around this problem?

2 Answers
Fickle 薄情
Answered 2019-08-21 10:39

I don't normally run this type of operation, so this is more or less pseudo-code and may need tweaking, but I would think you could do something like this:

https://stackoverflow.com/a/2710714

CREATE TEMPORARY TABLE
IF NOT EXISTS temp_{$table_name_with_datetime}
AS (SELECT * FROM {$table_name});

# Not sure if this will work; it would need some way
# to capture the CREATE statement.
SELECT (SHOW CREATE TABLE temp_{$table_name_with_datetime})
  INTO OUTFILE '{$table_name_with_datetime}_create.sql'
  FIELDS TERMINATED BY ''
  OPTIONALLY ENCLOSED BY ''
  LINES TERMINATED BY '\n'
  FROM temp_{$table_name_with_datetime};
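For what it's worth, MySQL won't accept SHOW CREATE TABLE nested inside a SELECT ... INTO OUTFILE like that; a sketch of a more workable variant is to run it as its own statement and have the application write the "Create Table" column of the result set out to the `.sql` file (the temp table name below just reuses the example above):

```sql
-- SHOW CREATE TABLE is a standalone statement; it can't feed INTO OUTFILE.
-- Run it separately, then write the second column of the single result row
-- to {$table_name_with_datetime}_create.sql from application code.
SHOW CREATE TABLE temp_media_201809041045;
```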

SELECT * INTO OUTFILE '{$table_name_with_datetime}.csv'
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM temp_{$table_name_with_datetime};

Then with PHP, say, you pull those files and load them:

// This creates the table.
$dbo->query(file_get_contents("{$table_from_filename}_create.sql"));
$dbo->query("
LOAD DATA INFILE '{$table_from_filename}.csv'
  INTO TABLE temp_{$table_from_filename}
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '\"'
  ESCAPED BY ''
  LINES TERMINATED BY '\n'
");

If $table_from_filename were media_201809041045, now you've imported it as temp_media_201809041045, so

$tablename = "temp_$table_from_filename";

Now, with five tables things can get complex, for instance you have to maintain the order of precedence (parent before children), unless there's a way to disable it until you're through importing each table.
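There is such a way: MySQL lets you switch off foreign key enforcement for the current session, so import order stops mattering (a sketch; the middle comment stands in for the actual LOAD DATA statements):

```sql
-- Suspend FK enforcement for this session only, load in any order,
-- then switch it back on. Note that rows are NOT re-checked afterwards,
-- so the files themselves still have to be mutually consistent.
SET FOREIGN_KEY_CHECKS = 0;
-- ... run the LOAD DATA INFILE statements for each table here ...
SET FOREIGN_KEY_CHECKS = 1;
```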

If you can get them into the database as (temp) tables, you can then work on them: query for each parent row, insert it into the original table, then use last_insert_id to build a swap index from old ids to new ones for the later child rows. If the foreign keys always follow the same pattern, e.g. media and media_id, you could fairly easily write a function to handle that from just the table names of the five you're copying.
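The id-swap bookkeeping described above could look roughly like this (a sketch only; the dated temp table names and the literal id 42 are placeholders, and the per-parent loop would live in application code):

```sql
-- Map each old parent id to the id it receives on re-insert.
CREATE TEMPORARY TABLE id_map (old_id INT PRIMARY KEY, new_id INT);

-- Re-insert one parent row (repeat for each parent in application code):
INSERT INTO media (title, description)
  SELECT title, description FROM temp_media_201809041045 WHERE id = 42;
INSERT INTO id_map VALUES (42, LAST_INSERT_ID());

-- Child rows then go through the map instead of their stale media_id:
INSERT INTO path (path, def, media_id)
  SELECT t.path, t.def, m.new_id
  FROM temp_path_201809041045 t
  JOIN id_map m ON m.old_id = t.media_id;
```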

One more thing: on the CREATE call, you may want to remove TEMPORARY and keep the table around if you need it beyond the current connection; a TEMPORARY table disappears automatically when the session ends. Then, at the end, run a DROP TABLE on the now-"temporary" regular table, just making sure it's the right one.

我只想做你的唯一
Answered 2019-08-21 10:46

As I attempted to explain (probably poorly) in the comments, I think using an identifier rather than a tracking integer would be better in this case, since it provides a crucial insight that a simple number, generated only to disambiguate a row from its neighbors, doesn't:

It's a thing, so it lives.

What this powerful design pattern means is that, when you go to import, if you're operating on the (hopefully known) assumption that an old import file is being used (as opposed to a magical one from the future), and you're doing an integral import over data that may already be represented in the dataset (seriously, two other overlapping snapshots could have been loaded just while I was typing), you simply ignore the update: whatever you have represented at present should be newer than that old data file, and the things in it have identity:

A UUID is a uuid is a UUID is a uuid...

This assumes the domain's data model is consistent, while the loaded representation is simply that: a representation. Serializing a prior external representation with an identifier that's unique across systems makes it far easier to handle when it's removed from the system and then reintroduced.

Especially in smaller systems, UUIDs are easy to work with (generating many of them in a short time, and integrating them across a domain, should be thought out), but they're not necessary unless you need this type of export/import functionality and you want data to keep its identity on import.
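In schema terms that might look like this (a sketch; the column sizes are guesses based on the question's media table):

```sql
-- The row is identified by a UUID that survives export and re-import,
-- so loading an old snapshot can't collide with a freshly generated id.
CREATE TABLE media (
  `id` char(36) NOT NULL,            -- e.g. filled with MySQL's UUID()
  `title` varchar(255) DEFAULT NULL,
  `description` varchar(2000) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO media (id, title) VALUES (UUID(), 'some title');
```

With a key like that, a LOAD DATA ... IGNORE simply skips any row whose UUID is already present.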

# Note there's no `temp_` here; $table_from_filename would be "media".
LOAD DATA INFILE '{$table_from_filename}.csv'
  IGNORE INTO TABLE {$table_from_filename}
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"'
  ESCAPED BY ''
  LINES TERMINATED BY '\n'

You don't have to do it this way, and please don't interpret this as me promoting a rewrite. But if you think about it at design time, ask yourself: does my data need to live outside the box?
