How would I compare two text files for matches wit

Posted 2020-08-01 05:29

$domains = file('../../domains.txt');
$keywords = file('../../keywords.txt');

$domains will be in the following format:

3kool4u.com,9/29/2013 12:00:00 AM,AUC
3liftdr.com,9/29/2013 12:00:00 AM,AUC
3lionmedia.com,9/29/2013 12:00:00 AM,AUC
3mdprod.com,9/29/2013 12:00:00 AM,AUC
3mdproductions.com,9/29/2013 12:00:00 AM,AUC
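Each domains.txt line holds three comma-separated fields (domain, date, status). Searching the whole line therefore also searches the date and the AUC status, so a case-insensitive keyword such as "auc" would match every row shown above. Below is a minimal sketch that keeps only the domain field, assuming that field is the only one that should be searched. `domainFromLine` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns only the domain name (the first comma-separated field)
// of a domains.txt line, dropping the date and status fields.
function domainFromLine(string $line): string
{
    // explode() with a limit of 2 splits at the first comma only
    $parts = explode(',', trim($line), 2);
    return $parts[0];
}

// Sample line copied from the question
echo domainFromLine("3kool4u.com,9/29/2013 12:00:00 AM,AUC\n"), "\n"; // prints 3kool4u.com
```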

$keywords will be in the following format:

keyword1
keyword2
keyword3
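By default, `file()` keeps the newline at the end of each element. Every keyword read with `file('../../keywords.txt')` therefore ends in "\n" (except possibly the last one), and a pattern built from it will not match a keyword in the middle of a domain line. Below is a minimal sketch of reading the keywords without line endings. The temporary file is an assumption that lets the example run on its own; the real script would pass the keywords path directly.

```php
<?php
// Write a small sample keywords file (with a blank line) to a temporary location
// so the example runs anywhere; the real path would be '../../keywords.txt'.
$path = tempnam(sys_get_temp_dir(), 'kw');
file_put_contents($path, "keyword1\nkeyword2\n\nkeyword3\n");

// FILE_IGNORE_NEW_LINES drops the trailing newline from each element;
// FILE_SKIP_EMPTY_LINES drops blank lines (it relies on FILE_IGNORE_NEW_LINES being set too)
$keywords = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

print_r($keywords);
unlink($path);
```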

I'd like to build an array of keywords from the file and search each line of domains.txt for matches. I'm not sure where to start, because I'm confused about the difference between preg_match, preg_match_all, and strpos, and when to use one over the other.

Thanks in advance for the help.
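The snippet below compares the functions mentioned in the question against one sample line from domains.txt. The variable names are illustrative. As a rule of thumb, strpos/stripos suit a fixed keyword, preg_match suits a pattern, and preg_match_all is for cases where you need every occurrence rather than a yes/no answer.

```php
<?php
// One sample line from domains.txt
$line = '3mdproductions.com,9/29/2013 12:00:00 AM,AUC';

// strpos: plain, case-sensitive substring search. It returns the offset or false,
// so compare with !== false: a match at offset 0 returns 0, which is falsy.
$bySubstring = strpos($line, '3md') !== false;

// stripos: the same search, but case-insensitive
$caseInsensitive = stripos($line, 'PROD') !== false;

// preg_match: regular-expression search that stops at the first match;
// returns 1 if found, 0 if not (false on a pattern error)
$byRegex = preg_match('/prod(uctions)?/i', $line);

// preg_match_all: finds every non-overlapping match and returns how many there were;
// the matched strings are stored in $found[0]
$digitRuns = preg_match_all('/\d+/', $line, $found);

var_dump($bySubstring, $caseInsensitive, $byRegex, $digitRuns); // bool(true) bool(true) int(1) int(7)
print_r($found[0]);
```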

1 answer
来,给爷笑一个
Post #2 · 2020-08-01 06:04
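//Empty array to hold each line of the domains file that has a match
$matches = array();

//For each line of the domains file
foreach($domains as $domain){

    //For each keyword
    foreach($keywords as $keyword){

        //file() keeps the trailing newline on each keyword, so strip it and skip blank lines
        $keyword = trim($keyword);
        if($keyword === '') {
            continue;
        }

        //If the domain line contains the keyword at any position, ignoring case
        //(preg_quote escapes regex characters such as "." or "/" in the keyword)
        if(preg_match('/' . preg_quote($keyword, '/') . '/i', $domain)) {
            //Add the domain line to the matches array once and go on to the next domain
            $matches[] = $domain;
            break;
        }
    }
}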

Now $matches holds every line of the domains file that contains at least one of the keywords.
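Instead of calling preg_match once per keyword for every line, you can also combine the keywords into a single alternation pattern (keyword1|keyword2|...), so each domain line is tested only once. This is a swapped-in technique, not the method used above. The sketch uses made-up sample keywords and assumes the list is short enough for the combined pattern to stay within PCRE's pattern-size limits. A very long list may need to be split into chunks.

```php
<?php
// Made-up sample data in the shape file() returns (each element keeps its "\n")
$domains = [
    "3kool4u.com,9/29/2013 12:00:00 AM,AUC\n",
    "3liftdr.com,9/29/2013 12:00:00 AM,AUC\n",
    "3lionmedia.com,9/29/2013 12:00:00 AM,AUC\n",
];
$keywords = ["lion\n", "KOOL\n", "\n"];

// Trim each keyword, drop blank ones, and escape regex metacharacters
$clean   = array_filter(array_map('trim', $keywords), fn($k) => $k !== '');
$escaped = array_map(fn($k) => preg_quote($k, '/'), $clean);

$matches = [];
// Guard: with no keywords the pattern would be //i, which matches every line
if ($escaped !== []) {
    // A single case-insensitive alternation, here /lion|KOOL/i, tests all keywords at once
    $pattern = '/' . implode('|', $escaped) . '/i';

    foreach ($domains as $domain) {
        if (preg_match($pattern, $domain)) {
            $matches[] = trim($domain); // each line is added at most once
        }
    }
}

print_r($matches);
```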

Note that the previous approach loads both files into memory in full. Depending on the file sizes, you can run out of memory, or the OS may start using swap, which is much slower than RAM.

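Here is another, more memory-efficient approach that reads only one line of each file at a time. The trade-off is extra reads, because the keywords file is read again for every line of the domains file.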

<?php

//Lines ending in "\n" or "\r\n" are handled by fgets() and trim() below. The old
//auto_detect_line_endings setting (needed only for classic Mac "\r" files) is deprecated as of PHP 8.1.

//Array that will hold the lines that match
$matches = array();

//Open the two files in read mode
$domains_handle = fopen('../../domains.txt', 'r');
$keywords_handle = fopen('../../keywords.txt', 'r');
if ($domains_handle === false || $keywords_handle === false) {
    die('Could not open the input files');
}

//Iterate over the domains one line at a time
while (($domains_line = fgets($domains_handle)) !== false) {

    //Remove the newline (and any surrounding whitespace) from the domain line
    $trimmed_domain = trim($domains_line);

    //For each line of the domains file, iterate over the keywords file one line at a time
    while (($keywords_line = fgets($keywords_handle)) !== false) {

        //Remove any whitespace or newline from the beginning or end of the string
        $trimmed_keyword = trim($keywords_line);

        //Skip blank lines: an empty pattern would match every domain
        if ($trimmed_keyword === '') {
            continue;
        }

        //Check if the domain line contains the keyword at any position,
        //using a case-insensitive comparison (preg_quote escapes regex characters)
        if (preg_match('/' . preg_quote($trimmed_keyword, '/') . '/i', $trimmed_domain)) {
            //Add the domain line to the matches array once and stop checking keywords
            $matches[] = $trimmed_domain;
            break;
        }
    }

    //Set the pointer back to the beginning of the keywords file
    rewind($keywords_handle);
}

//Release the resources
fclose($domains_handle);
fclose($keywords_handle);

var_dump($matches);
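The streaming version keeps memory low, but it re-reads keywords.txt from the start for every line of domains.txt. Keyword lists are often far smaller than domain lists, so a middle ground is to load the keywords once and stream only the domains. The sketch below works under that assumption. It also swaps preg_match for stripos (a plain substring search, so no escaping is needed) and checks only the domain field. The temporary files are hypothetical stand-ins for the real paths.

```php
<?php
// Hypothetical sample files in the temp directory so the sketch runs anywhere;
// replace them with '../../domains.txt' and '../../keywords.txt'.
$domainsPath  = tempnam(sys_get_temp_dir(), 'dom');
$keywordsPath = tempnam(sys_get_temp_dir(), 'kw');
file_put_contents($domainsPath, "3kool4u.com,9/29/2013 12:00:00 AM,AUC\n3mdprod.com,9/29/2013 12:00:00 AM,AUC\n3liftdr.com,9/29/2013 12:00:00 AM,AUC\n");
file_put_contents($keywordsPath, "prod\nKOOL\n");

// Assumption: the keyword list is small, so it is read into memory once
$keywords = file($keywordsPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$matches = [];
$handle = fopen($domainsPath, 'r');
if ($handle === false) {
    die("Could not open $domainsPath\n");
}

// The (possibly huge) domains file is streamed one line at a time
while (($line = fgets($handle)) !== false) {
    // Only the first comma-separated field is the domain name
    $domain = explode(',', trim($line), 2)[0];
    foreach ($keywords as $keyword) {
        // stripos: case-insensitive substring search
        if (stripos($domain, $keyword) !== false) {
            $matches[] = $domain;
            break; // record each domain once
        }
    }
}
fclose($handle);
unlink($domainsPath);
unlink($keywordsPath);

print_r($matches);
```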