CSV file upload to handle status update & insertin

2019-02-21 04:20发布

问题:

While working on a project, hosted locally, I'm stuck at managing CSV uploads. One of tasks require me to upload data on daily basis that has either new entries or updated status for existing entries. There is also an probability that some of the entries (that exists in database) has no updated status.

Problem statement;

I've created a CSV upload feature that uploads the CSV file to a particular location and imports the information in assigned TABLE. I want to know on what is the best way to verify the database records when I do the CSV upload.

It should ideally work as following;

  1. if entry doesn't exists (INSERT new entry basis data from CSV file)
  2. if the entry exists and has status SAME as the new uploaded CSV file (IGNORE & do nothing)
  3. if the entry exists and has DIFFERENT status than the one in new uploaded CSV file (UPDATE status to what is mentioned in CSV file)

Database / CSV file structure

  • tracking_id (auto increment)
  • odanumber (uploaded through CSV & can have duplicate entries)
  • airwaybill (uploaded through CSV & UNIQUE)
  • courierful (uploaded through CSV & can have duplicate entries)
  • delstatus (uploaded through CSV & is what gets updated mostly)
  • deliverydate (uploaded through CSV & gets updated with each delivery)

From the above, delstatus gets updated almost each time (for existing entries) the new CSV is uploaded and hence needs to be checked.

I assume that we can pick 'airwaybill' to check if it exists, and if it does, check if the delstatus is same as of CSV file or update. If 'airwaybill' doesn't exist then a new records must be added to the database. As that would save me from entering all records in database unnecessarily. Or can be done may be in a better way (that I'm yet to explore).


What's happening right now;

I'm able to upload the complete set of CSV file, creating new entries in database through following code.

<?php 

if(isset($_POST['csv']))
{
$sqlname= 'localhost';
$username= 'root';
$table= 'tracking';
$password= '';
$db='aatrack';
$file=$_POST['csv'];
$cons= mysqli_connect("$sqlname", "$username","$password","$db") or die(mysql_error());

$result1=mysqli_query($cons,"select count(*) count from $table");
$r1=mysqli_fetch_array($result1);
$count1=(int)$r1['count'];


mysqli_query($cons, '
    LOAD DATA LOCAL INFILE "'.$file.'"
        INTO TABLE '.$table.'
        FIELDS TERMINATED by \',\'
        LINES TERMINATED BY \'\n\'
        IGNORE 1 LINES
')or die(mysql_error());

$result2=mysqli_query($cons,"select count(*) count from $table");
$r2=mysqli_fetch_array($result2);
$count2=(int)$r2['count'];

$count=$count2-$count1;
if($count>0)
{
    header("location:success.php?id=$count");
}

}

?>

Can you please help in guiding the best way possible to achieve the same. I understand that it can be done by first uploading the information to a temp_table and comparing the same before entries are updated in the LIVE table.

Please suggest an optimum way to achieve the results.

Thank you for reading this far.

Best regards,

Amit Agnihotri

回答1:

How LOAD DATA INFILE works

Based on an UNIQUE index, LOAD DATA INFILE inserts a new record or updates an existing one (only if the REPLACE option is active).

(1) Regarding insert:

If the csv input value for the UNIQUE index column is NOT found in the db table, then a new record is added, with the (defined) input values from csv file.

(2) Regarding update:

If the csv input value for the UNIQUE index column is found in the db table, then the LOAD DATA INIFILE query performs the following operations (in this order!):

  • It inserts the new csv values as a new record with a new PRIMARY KEY id;
  • It deletes the old record from the db.

NB: In the rest of my answer I will speak only about the update part (2).

BEFORE INSERT-TRIGGER as solution for conditional updates

Since LOAD DATA INFILE runs an insert operation before a delete one, you can make use of the fact that the old db record still exists when the new record with the csv values is inserted. So, you can customize your new input values based on the values contained in the old record. The really cool part of this is: you can even maintain the old value of the PRIMARY KEY field.

The key is to define a BEFORE INSERT-TRIGGER in which all the needed customizations, validations and assignments reside:

  • Fetch the old record's values by running a SELECT sql statement;
  • Store the fetched values into prior defined user variables;
  • Use the user variables to compare the old values with the csv input values;
  • Based on this comparisons: assign the old value of the PRIMARY KEY field as the new one and change the new csv values to the old ones or to others, if needed, too.

Then perform the LOAD DATA INFILE query from PHP.

The codes

Create table syntax:

CREATE TABLE `tracking` (
  `tracking_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `odanumber` int(11) DEFAULT NULL,
  `airwaybill` int(11) DEFAULT NULL,
  `courierful` varchar(100) DEFAULT NULL,
  `delstatus` tinyint(1) DEFAULT NULL,
  `deliverydate` varchar(19) DEFAULT NULL,
  PRIMARY KEY (`tracking_id`),
  UNIQUE KEY `uni_airwaybill` (`airwaybill`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;

BEFORE INSERT-TRIGGER:

USE `tests`;

DELIMITER $$

DROP TRIGGER IF EXISTS tests.tracking_BEFORE_INSERT$$
USE `tests`$$
CREATE DEFINER = CURRENT_USER TRIGGER `tests`.`tracking_BEFORE_INSERT` BEFORE INSERT ON `tracking` FOR EACH ROW
BEGIN

    /* Define vars to store old record values. */
    SET @old_tracking_id = NULL;
    SET @old_odanumber = NULL;
    SET @old_courierful = NULL;
    SET @old_delstatus = NULL;
    SET @old_deliverydate = NULL;

    /* 
        Fetch the existing record if exists and pass 
        its values into the correspnding vars.
    */
    SELECT 
        tracking_id,
        odanumber,
        courierful,
        delstatus,
        deliverydate 
    INTO 
        @old_tracking_id,
        @old_odanumber,
        @old_courierful,
        @old_delstatus,
        @old_deliverydate 
    FROM tracking 
    WHERE airwaybill = NEW.airwaybill
    LIMIT 1;

    /* If an old record was found... */
    IF @old_tracking_id IS NOT NULL THEN

        /* ...set the new record's tracking_id to it. */
        SET NEW.tracking_id = @old_tracking_id;

        /* ...and if delstatus are the same... */
        IF NEW.delstatus = @old_delstatus THEN

            /* ...maintain the old record values. */
            SET NEW.odanumber = @old_odanumber;
            SET NEW.courierful = @old_courierful;
            SET NEW.deliverydate = @old_deliverydate;

        END IF;

    END IF;

END$$
DELIMITER ;

CSV file (tracking.csv)

odanumber,airwaybill,"courierful",delstatus,"deliverydate"
19,1,abc,0,2017-04-31
25,2,def,1,2017-05-31
103,3,ghi,1,2017-06-31
324,4,jkl,1,2017-07-31
564,5,mno,0,2017-08-31

LOAD DATA INFILE function (called from PHP)

LOAD DATA INFILE "<PATH-TO>/tracking.csv"
REPLACE
INTO TABLE tests.tracking
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(odanumber, airwaybill, courierful, delstatus, deliverydate);

Notes:

*) In regards of LOAD DATA INFILE, it can be that you run into the error:

ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

It means: The LOAD DATA INFILE has no permission to read the csv file. So you must set secure-file-priv in the configuration file of your database (my.cnf, or my.ini) yourself. Like this:

[mysqld]
secure-file-priv = "<PATH-TO-FOLDER-CONTAINING-THE-CSV-FILES>/"

*) You can NOT define a stored procedure from which to run the LOAD DATA INFILE.

In the end, there are also other solutions involving temporary tables, which, no doubt, can work perfectly. One of them is presented in this great article. So, the trigger solution is just another approach.

Good luck!



回答2:

There are two scenarios here:

  1. the table's columns exactly match the csv columns. in that case REPLACE is the answer - it's a keyword to the LOAD DATA INFILE see doc entry

  2. the table's columns don't match the csv columns: REPLACE would cause conflicting records to be removed and reinserted, effectively removing the additional data. In which case LOAD DATA INFILE is not effective by itself, you need another approach with either filtering your file before, doing updates via php or some other method.

In any case, if you want to add more "logic" to the import process, maybe LOAD DATA INFIlE isn't really the right approach, but using temp tables may very well be to benefit from all the goodness databases provide.