Is there a better way to process a 300,000 line text file data and insert it into MySQL?

Posted 2019-08-07 18:48

What I am doing right now is reading the contents of the text file and storing them in a variable. After reading the whole thing, I run chunks of that data through a function that reads each line of the chunk and passes it to another function that processes each column and inserts the data into the database in batches, one batch per chunk.

The code takes far too long, and each file is more than 500 KB in size. My problem is that the text file has no unique identifier I could use, which is why I process it in chunks instead of applying "LOAD DATA INFILE". Is there anything I can use so that LOAD DATA INFILE becomes an option in this case?

A 700 KB file took almost an entire day to process, though that still depends on the machine's specs. The code runs on CentOS. After the first text file was processed, the next text files of 800 KB and up took almost a week each. This varies with the other text files over 800 KB; they took nearly a week or more to process, especially the 1 MB files.

Can anyone tell me what I am doing wrong, and what options I have to make my code run efficiently?


/*
====================================================================================
                RECORDS FETCH 
====================================================================================

Needs the path and filename with extension.
The page iterates over the records in the file line by line.
Each line is then split on the "," delimiter.
The record parts are concatenated for the bulk-insert process.
PID addresses are incremental; every three PIDs correspond to one Chamber,
and the reading in each Chamber is CO2 for the first PID address, RH for the
second PID address, and TEMP for the third PID address.


====================================================================================
*/
$path = "";
$filename = "";
error_reporting(0);
include_once ("connect.php");
$p_results = mysql_query("SELECT PATH, FILENAME FROM tbl_path"); 
if(mysql_num_rows($p_results) > 0 ){
while ( $rows = mysql_fetch_assoc($p_results) )
{
    $path = $rows['PATH'];
    $filename = $rows['FILENAME'];
}
}
else
{
mysql_query("INSERT INTO tbl_log (LOG_DATE, DETAILS) VALUES ( NOW(), 'There is no path and filename to be accessed. Please provide.' )");
}
$path = str_replace('\\','/',$path);
// holds the path. NOTE: change backslashes (\) to forward slashes (/)
//$path = "E:/";
// holds the filename. NOTE: include the file extension
//$filename = "sample2.txt"; //"chamber_1_con.txt";
if ($path <> "" && $filename <> "")
     is_connected($path, $filename);

echo ('<script language="javascript">window.location = "chambers_monitoring.php"  </script>');

// function that writes the data into the table
function InsertData($rec, &$errorDataCnt, &$sql, $y, $i, $x, &$dCnt)
{

$dDate = (!isset($rec[0]) ? 0 : (trim($rec[0]) == "" ? 0 : trim($rec[0]))); 
$dTime = (!isset($rec[1]) ? 0 : (trim($rec[1]) == "" ? 0 : trim($rec[1]))); 
$address = (!isset($rec[2]) ? 0 : (trim($rec[2]) == "" ? 0 : trim($rec[2]))); 
$co2SV = (!isset($rec[3]) ? 0 : (trim($rec[3]) == "" ? 0 : trim($rec[3]))); 
$co2PV = (!isset($rec[4]) ? 0 : (trim($rec[4]) == "" ? 0 : trim($rec[4]))); 
$tempSV = (!isset($rec[5]) ? 0 : (trim($rec[5]) == "" ? 0 : trim($rec[5]))); 
$tempPV = (!isset($rec[6]) ? 0 : (trim($rec[6]) == "" ? 0 : trim($rec[6]))); 
$rhSV = (!isset($rec[7]) ? 0 : (trim($rec[7]) == "" ? 0 : trim($rec[7]))); 
$rhPV = (!isset($rec[8]) ? 0 : (trim($rec[8]) == "" ? 0 : trim($rec[8]))); 


    /* include('connect.php'); */
    set_time_limit(36000);
    ini_set('max_execution_time','43200');
    $e_results = mysql_query("SELECT ID FROM tbl_reading WHERE (READING_DATE = '".date("Y-m-d",strtotime($dDate))."' AND READING_TIME = '".date("H:i:s",strtotime($dTime))."') AND READING_ADDRESS = $address LIMIT 1"); 
    if(mysql_num_rows($e_results) <= 0 ){
      if (!($dDate == 0 || $dTime == 0 || $address == 0) ) {
        if ($y == 0){
            $sql = "INSERT INTO tbl_reading (READING_DATE,   READING_TIME, READING_ADDRESS, CO2_SET_VALUE, CO2_PROCESS_VALUE, TEMP_SET_VALUE, TEMP_PROCESS_VALUE, RH_SET_VALUE, RH_PROCESS_VALUE) VALUES ('".date("Y/m/d",strtotime($dDate))."','".date("H:i:s",strtotime($dTime))."', ".  mysql_real_escape_string($address).",". mysql_real_escape_string($co2SV).",". mysql_real_escape_string($co2PV).",". mysql_real_escape_string($tempSV).",". mysql_real_escape_string($tempPV).",". mysql_real_escape_string($rhSV).",". mysql_real_escape_string($rhPV).")";
        }
        else {
            $sql .= ",   ('".date("Y/m/d",strtotime($dDate))."','".date("H:i:s",strtotime($dTime))."', ". mysql_real_escape_string($address).",". mysql_real_escape_string($co2SV).",". mysql_real_escape_string($co2PV).",". mysql_real_escape_string($tempSV).",". mysql_real_escape_string($tempPV).",". mysql_real_escape_string($rhSV).",". mysql_real_escape_string($rhPV).")";

        }
       }
      }

        if(($x + 1) == $i){
            //echo ($x + 1)." = ".$i."<br>";
            if (substr($sql, 0, 1) == ",")
                $sql = "INSERT INTO tbl_reading (READING_DATE, READING_TIME, READING_ADDRESS, CO2_SET_VALUE, CO2_PROCESS_VALUE, TEMP_SET_VALUE, TEMP_PROCESS_VALUE, RH_SET_VALUE, RH_PROCESS_VALUE) VALUES".substr($sql, 1);
            //echo $sql."<br>";
            set_time_limit(36000);
            try {

                $result = mysql_query($sql) ;
                $dCnt = mysql_affected_rows();
                if( $dCnt  == 0)
                {
                    $errorDataCnt = $errorDataCnt + 1;
                }
            }
            catch (Exception $e)
            {
                mysql_query("INSERT INTO tbl_log (LOG_DATE,  DETAILS) VALUES ( NOW(), '".$e->getMessage()."' )");
            }
            //mysql_free_result($result);
        }

unset($dDate); 
unset($dTime); 
unset($address); 
unset($co2SV); 
unset($co2PV); 
unset($tempSV); 
unset($tempPV); 
unset($rhSV); 
unset($rhPV);  

 }

// function that loops over the records line by line
function loop($data)
{
$errorDataCnt = 0; $sql = ""; $exist = 0;
$i = count( $data); $x = 0; $y = 0; $tmpAdd = ""; $cnt = 0; $t = 0; $dCnt = 0; 

ini_set('max_execution_time','43200');
while($x < $i) 
{
    $rec = explode(",", $data[$x]); 
    InsertData($rec, $errorDataCnt, $sql, $y, $i, $x, $dCnt);
    $x++; 
    $y++;
    unset($rec);
}

    $errFetch = ($i - $dCnt);
if($errorDataCnt > 0)
    mysql_query("INSERT INTO tbl_log (LOG_DATE, DETAILS) VALUES ( NOW(), 'Error inserting $errFetch records. Check if there is a NULL or empty value or if it is the correct data type.' )");
if($dCnt > 0)
    mysql_query("INSERT INTO tbl_log (LOG_DATE, DETAILS) VALUES ( NOW(), 'Saved   $dCnt of $i records into the database. Total $exist records already existing in the database.' )");


}

// function that reads the file and accumulates its lines into $contents
function DataLoop($file)
{
ini_set("auto_detect_line_endings", true);
set_time_limit(36000);
ini_set('max_execution_time','43200');
$contents = ''; $j = 0;
if ($handle = fopen($file,"rb")){
    while (!feof($handle)) {
        $rdata = fgets($handle, 3359232);//filesize($file));
        //$rdata = fread($handle, filesize($file));
        if(trim($rdata) != "" || $rdata === FALSE){
            if (feof($handle)) break;
            else {
            $contents .= $rdata;
            $j = $j + 1; }}
    }   
    fclose($handle);
    $data = explode("\n", $contents);
    unset($contents);
    unset($rdata);
}
/* echo count($contents)." ".count($data); */
/* $query = "SELECT MAX(`ID`) AS `max` FROM `tbl_reading`";
$result = mysql_query($query) or die(mysql_error());
$row = mysql_fetch_assoc($result);
$max = $row['max']; */
/* $res =   mysql_fetch_assoc(mysql_query("SELECT COUNT(*) as total FROM  tbl_reading")) or die(mysql_error());
echo "<script>alert('".$res['total']."')</script>"; */
$p = 0;
ini_set('memory_limit','512M');
if($j != 0)
{
    foreach(array_chunk($data, ceil(count($data)/200)) as $rec_data){
        loop($rec_data);
        $p++;
    }
} 

}
//function to test if filename exists
function IsExist($file)
{
if ($con = fopen($file, "r"))// file_exists($file))
{
    fclose($con);
    DataLoop($file);
}
else
    mysql_query("INSERT INTO tbl_log (LOG_DATE, DETAILS) VALUES ( NOW(), '$file does not exist. Check if the filename or the path is correct.' )"); // note: $filename and $path are out of scope in this function, so log $file instead

}

//function to test connection to where the file is.
function is_connected($path, $filename)
{
//check to see if the local machine is connected to the network 
$errno = ""; $errstr = ""; 
if (substr(trim($path), -1) == '/')
  $file = $path.$filename;
else
    $file = $path."/".$filename; 

IsExist($file);

}

Answer 1:

From the code, it seems that your "unique identifier" (for the purposes of this insert, at least) is the composite (READING_DATE, READING_TIME, READING_ADDRESS).

If you define such a UNIQUE key in the database, then LOAD DATA's IGNORE keyword should do exactly what you need:

ALTER TABLE tbl_reading
  ADD UNIQUE KEY (READING_DATE, READING_TIME, READING_ADDRESS)
;

LOAD DATA INFILE '/path/to/csv'
    IGNORE
    INTO TABLE tbl_reading
    FIELDS
        TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '"'
        ESCAPED BY ''
    LINES
        TERMINATED BY '\r\n'
    (@rec_0, @rec_1, @rec_2, @rec_3, @rec_4, @rec_5, @rec_6, @rec_7, @rec_8)
    SET
        READING_DATE = DATE_FORMAT(STR_TO_DATE(TRIM(@rec_0), '???'), '%Y/%m/%d'),
        READING_TIME = DATE_FORMAT(STR_TO_DATE(TRIM(@rec_1), '???'), '%H:%i:%s'),
        READING_ADDRESS    = TRIM(@rec_2),
        CO2_SET_VALUE      = TRIM(@rec_3),
        CO2_PROCESS_VALUE  = TRIM(@rec_4),
        TEMP_SET_VALUE     = TRIM(@rec_5),
        TEMP_PROCESS_VALUE = TRIM(@rec_6),
        RH_SET_VALUE       = TRIM(@rec_7),
        RH_PROCESS_VALUE   = TRIM(@rec_8)
;

(where '???' is replaced with format strings representing your CSV's date and time formats).
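For illustration only (the actual formats depend on your file, which the answer cannot know): if the CSV happened to store dates like 2019/08/07 and times like 18:48:00, the two expressions would read:

```sql
-- Hypothetical format strings; replace them with the ones matching your CSV.
READING_DATE = DATE_FORMAT(STR_TO_DATE(TRIM(@rec_0), '%Y/%m/%d'), '%Y/%m/%d'),
READING_TIME = DATE_FORMAT(STR_TO_DATE(TRIM(@rec_1), '%H:%i:%s'), '%H:%i:%s')
```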

Note that you should really be storing READING_DATE and READING_TIME together in a single DATETIME or TIMESTAMP column:

ALTER TABLE tbl_reading
  ADD COLUMN READING_DATETIME DATETIME AFTER READING_TIME,
  ADD UNIQUE KEY (READING_DATETIME, READING_ADDRESS)
;

UPDATE tbl_reading SET READING_DATETIME = STR_TO_DATE(
  CONCAT(READING_DATE, ' ', READING_TIME),
  '%Y/%m/%d %H:%i:%s'
);

ALTER TABLE tbl_reading
  DROP COLUMN READING_DATE,
  DROP COLUMN READING_TIME
;

In that case, the SET clause of the LOAD DATA command would instead include:

READING_DATETIME = STR_TO_DATE(CONCAT(TRIM(@rec_0), ' ', TRIM(@rec_1)), '???')


Answer 2:

Reading a 1 MB file line by line takes less than a second. Even concatenating and then re-splitting all the lines takes no time at all.

With a simple test, inserting 100,000 rows took about 90 seconds.

But doing a SELECT query before each insert increases the time needed by more than an order of magnitude.

The lesson to learn from this: if you need to insert large amounts of data, do it in big chunks (see LOAD DATA INFILE). If you cannot do that for whatever reason, do inserts and nothing but inserts.

Update

As @eggyal has already suggested, add a unique key to your table definition. In my small one-off test, I added a unique key and changed insert to insert ignore. Wall-clock time increased by 15%-30% (~100-110 seconds), which is much better than the increase to 38 minutes (25 times worse!) of the separate select + insert.

So, as a conclusion (stolen from eggyal), add

ALTER TABLE tbl_reading
  ADD UNIQUE KEY (READING_DATE, READING_TIME, READING_ADDRESS)

to your table, remove the select from InsertData(), and change the insert to insert ignore.
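After those changes, each batch the script builds would be a single multi-row statement of this form (the values shown are purely illustrative):

```sql
-- Duplicate (date, time, address) combinations are silently skipped
-- thanks to the UNIQUE key, so no prior SELECT is needed.
INSERT IGNORE INTO tbl_reading
    (READING_DATE, READING_TIME, READING_ADDRESS,
     CO2_SET_VALUE, CO2_PROCESS_VALUE,
     TEMP_SET_VALUE, TEMP_PROCESS_VALUE,
     RH_SET_VALUE, RH_PROCESS_VALUE)
VALUES
    ('2019/08/07', '18:48:00', 1, 400, 398.5, 37, 36.8, 95, 94.2),
    ('2019/08/07', '18:48:00', 2, 400, 399.1, 37, 37.0, 95, 94.8);
```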



Answer 3:

You need to do some preparation before starting your inserts, because the InnoDB engine makes inserts slow when using the default settings.

Either set this option before inserting:

innodb_flush_log_at_trx_commit=0

or put all the inserts into a single transaction.
It will then be blindingly fast, no matter what syntax or driver you choose.
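A minimal sketch of the transaction approach, assuming the same tbl_reading table as in the other answers:

```sql
-- Group many inserts into one transaction so InnoDB flushes the log once
-- at COMMIT instead of once per statement.
START TRANSACTION;
INSERT INTO tbl_reading (READING_DATE, READING_TIME, READING_ADDRESS,
    CO2_SET_VALUE, CO2_PROCESS_VALUE, TEMP_SET_VALUE, TEMP_PROCESS_VALUE,
    RH_SET_VALUE, RH_PROCESS_VALUE)
VALUES ('2019/08/07', '18:48:00', 1, 400, 398.5, 37, 36.8, 95, 94.2);
-- ... repeat for the remaining rows/batches ...
COMMIT;
```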



Source: Is there a better way to process a 300,000 line text file data and insert it into MySQL?