Handling a large result set from MySQL with limited memory

Posted 2020-02-15 03:18

Question:

I have a large database that contains the results of an experiment on 1500 individuals. Each individual has 96 data points. I wrote the following script to summarize and then format the data so it can be used by the analysis software. All was good until I had more than 500 individuals; now I am running out of memory.

I was wondering if anyone has a suggestion on how to overcome the memory limit problem without sacrificing speed.

This is how the table looks in the database:

fishId    assayId    allele1    allele2
14_1_1    1          A          T
14_1_1    2          A          A

$mysql = new PDO('mysql:host=localhost; dbname=aquatech_DB', $db_user, $db_pass);
$query = $mysql->prepare("SELECT genotyped.fishid, genotyped.assayid, genotyped.allele1, genotyped.allele2, fishId.sex, " .
"fishId.role FROM `fishId` INNER JOIN genotyped ON genotyped.fishid=fishId.catId WHERE fishId.projectid=:project");
$query->bindParam(':project', $project, PDO::PARAM_INT);
$query->execute();  

So this is the call to the database. It is joining information from two tables to build the file I need.

if(!$query){
    $error = $query->errorInfo();
    print_r($error);
} else {
    $data = array();
    $rows = array();
    $role = array();
    $body = "";
    if($results = $query->fetchAll()){
        foreach($results as $row)
        {
            $rows[] = $row[0];
            $role[$row[0]] = $row[5];
            $data[$row[0]][$row[1]]['alelleY'] = $row[2];
            $data[$row[0]][$row[1]]['alelleX'] = $row[3];
        }
        $rows = array_unique($rows);
        foreach($rows as $ids)
        {
            $col2 = $role[$ids];
            $alelleX = $alelleY = $content = "";
            foreach($snp as $loci)
            {
                $alelleY = convertAllele($data[$ids][$loci]['alelleY']);
                $alelleX = convertAllele($data[$ids][$loci]['alelleX']);
                $content .= "$alelleY\t$alelleX\t";
            }
            $body .= "$ids\t$col2\t" . substr($content, 0, -1) . "\n";
        }
    }
}

This parses the data. In the file I need, I have to have one row per individual rather than 96 rows per individual; that is why the data has to be formatted. At the end of the script I just write $body to a file.

I need the output file to look like this:

FishId    Assay 1    Assay 2
14_1_1    A  T       A  A

$location = "results/" . $filename . "_result.txt";
$fh = fopen($location, 'w') or die("Could not create destination file");
if(fwrite($fh, $body)) {
    // ...
}
fclose($fh);

Answer 1:

Instead of reading the whole result from your database query into a variable with fetchAll(), fetch it row by row:

while($row = $query->fetch()) { ... }
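Applied to the code in the question, that could look roughly like this (a sketch only; it keeps the $data and $role bookkeeping from the question but drops fetchAll() and the $rows array, since array_keys($data) already gives the unique fish IDs):

// Sketch: stream rows one at a time instead of materialising the whole
// result set in $results; $data and $role are built exactly as before.
$data = array();
$role = array();
while ($row = $query->fetch(PDO::FETCH_NUM)) {
    $role[$row[0]] = $row[5];
    $data[$row[0]][$row[1]]['alelleY'] = $row[2];
    $data[$row[0]][$row[1]]['alelleX'] = $row[3];
}
// The second, formatting loop can then iterate over array_keys($data).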


Answer 2:

  1. fetchAll() fetches the entire result in one go, which has its uses but is greedy with memory. Why not just use fetch() which handles one row at a time?

  2. You seem to be indexing the rows by the first column, creating another large array, and then removing duplicate items. Why not use SELECT DISTINCT in the query to remove duplicates before they get to PHP (see the sketch just below)?
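As a sketch of the SELECT DISTINCT idea, reusing the join from the question ($idQuery is just an illustrative name; this only helps if you want the list of unique fish IDs as its own result):

// Sketch only: fetch each fish ID once, so no $rows / array_unique() in PHP.
$idQuery = $mysql->prepare(
    "SELECT DISTINCT genotyped.fishid FROM `fishId` " .
    "INNER JOIN genotyped ON genotyped.fishid = fishId.catId " .
    "WHERE fishId.projectid = :project"
);
$idQuery->bindParam(':project', $project, PDO::PARAM_INT);
$idQuery->execute();
$uniqueIds = $idQuery->fetchAll(PDO::FETCH_COLUMN);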

I'm not sure what the impact would be on speed (fetch() may be slower than fetchAll()), but you don't have to remove duplicates from the array, which saves some processing.

I'm also not sure what your second foreach is doing, but you should be able to do it all in a single pass, i.e. a foreach loop within a fetch loop, as in the sketch below.
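A rough sketch of that single pass, assuming the query is given an ORDER BY genotyped.fishid (not in the original query) so that all 96 rows for one individual arrive together, and reusing $snp, $location and convertAllele() from the question:

// Sketch: write each individual's line as soon as all of its rows have
// been read, so only one individual is held in memory at a time.
$fh = fopen($location, 'w') or die("Could not create destination file");

$writeFish = function ($id, $role, $assays) use ($fh, $snp) {
    $content = "";
    foreach ($snp as $loci) {
        $content .= convertAllele($assays[$loci]['alelleY']) . "\t"
                  . convertAllele($assays[$loci]['alelleX']) . "\t";
    }
    fwrite($fh, "$id\t$role\t" . substr($content, 0, -1) . "\n");
};

$currentId = null;
$col2 = null;
$assays = array();

while ($row = $query->fetch(PDO::FETCH_NUM)) {
    if ($row[0] !== $currentId) {
        if ($currentId !== null) {
            $writeFish($currentId, $col2, $assays);  // previous fish is complete
        }
        $currentId = $row[0];
        $col2 = $row[5];
        $assays = array();
    }
    $assays[$row[1]]['alelleY'] = $row[2];
    $assays[$row[1]]['alelleX'] = $row[3];
}
if ($currentId !== null) {
    $writeFish($currentId, $col2, $assays);          // last fish
}
fclose($fh);

Because the rows arrive grouped by fish, neither $data nor $body ever has to hold more than one individual, which is where the memory goes in the original script.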

Other observations on your code above:

  • the $role array seems to do the same indexing job as $rows: using $row[0] as the key effectively removes the duplicates in a single pass. Removing the duplicates with SELECT DISTINCT is probably better, but if not, do you need the $rows array and the array_unique function at all?
  • if the same value of $row[0] can have different values of $row[5], then your indexing method will be discarding data, but you know what's in your data, so I guess you've already thought of that (the same could be true of the $data array).