Download external file with PHP fopen and/or cURL

Posted 2019-07-21 20:14

Question:

I can't get my download script to work with external files. The file downloads, but it is corrupted and does not open. I think this is because I can't get the size of the external file with the filesize() function.

This is my script:

function getMimeType($filename){
    $ext = pathinfo($filename, PATHINFO_EXTENSION);
    $ext = strtolower($ext);

    $mime_types=array(
        "pdf" => "application/pdf",
        "txt" => "text/plain",
        "html" => "text/html",
        "htm" => "text/html",
        "exe" => "application/octet-stream",
        "zip" => "application/zip",
        "doc" => "application/msword",
        "xls" => "application/vnd.ms-excel",
        "ppt" => "application/vnd.ms-powerpoint",
        "gif" => "image/gif",
        "png" => "image/png",
        "jpeg"=> "image/jpeg",
        "jpg" =>  "image/jpeg",
        "php" => "text/plain",
        "csv" => "text/csv",
        "xlsx" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        "pptx" => "application/vnd.openxmlformats-officedocument.presentationml.presentation",
        "docx" => "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    );

    if(isset($mime_types[$ext])){
        return $mime_types[$ext];
    } else {
        return 'application/octet-stream';
    }
}

$path = "http://www.example.com/file.zip";

/* Does not work on external files
// check file is readable or not exists
if (!is_readable($path))
    die('File is not readable or does not exists!');
*/

$file_headers = @get_headers($path);
if($file_headers[0] == 'HTTP/1.1 404 Not Found') {
    echo "Files does not exist.";
} else {

$filename = pathinfo($path, PATHINFO_BASENAME);

// get mime type of file by extension
$mime_type = getMimeType($filename);

// set headers
header('Pragma: public');
header('Expires: -1');
header('Cache-Control: public, must-revalidate, post-check=0, pre-check=0');
header('Content-Transfer-Encoding: binary');
header("Content-Disposition: attachment; filename=\"$filename\"");
header("Content-Length: " . filesize($path));
header("Content-Type: $mime_type");
header("Content-Description: File Transfer");

// read file as chunk
if ( $fp = fopen($path, 'rb') ) {
    ob_end_clean();

    while( !feof($fp) and (connection_status()==0) ) {
        print(fread($fp, 8192));
        flush();
    }

    @fclose($fp);
    exit;
}

}

I believe it can be done with cURL - but my knowledge is lacking. What I would like to know:

  • How do I check if the file exists and how do I get the filesize with cURL?

  • Would it be better just to use cURL and forget about fopen?

  • Are the headers set correctly?

Any advice is much appreciated!

Answer 1:

You can also try this approach. I am assuming that your source URL is in $sourceUrl and that $destinationPath is the path where the file should be saved.

$sourceUrl = 'http://www.example.com/file.zip'; // example URL; replace with your own
$destFilename = 'my_file_name.ext';
$destinationPath = 'your/destination/path/'.$destFilename;

if(ini_get('allow_url_fopen')) {
    // file_get_contents() returns false on failure, so check it before writing anything
    $contents = @file_get_contents($sourceUrl);
    if($contents === false || @file_put_contents($destinationPath, $contents) === false){
        $http_status = isset($http_response_header[0]) ? $http_response_header[0] : 'No HTTP response';
        echo sprintf('%s encountered while attempting to download %s', $http_status, $sourceUrl);
    }
} elseif(function_exists('curl_init')) {
    $ch = curl_init($sourceUrl);
    $fp = fopen($destinationPath, "wb");

    $options = array(
        CURLOPT_FILE => $fp,
        CURLOPT_HEADER => 0,
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_TIMEOUT => 120); // in seconds

    curl_setopt_array($ch, $options);
    curl_exec($ch);
    $http_status = intval(curl_getinfo($ch, CURLINFO_HTTP_CODE));
    curl_close($ch);
    fclose($fp);

    //delete the file if the download was unsuccessful
    if($http_status != 200) {
        unlink($destinationPath);
        echo sprintf('HTTP status %s encountered while attempting to download %s', $http_status, $sourceUrl);
    }
} else {
    echo sprintf('Looks like %s is off and %s is not enabled. No images were imported.', '<code>allow_url_fopen</code>', '<code>cURL</code>');
}

With cURL, you can call curl_getinfo($ch, CURLINFO_CONTENT_TYPE) before closing the handle to get the file's content type, and use it as your requirements dictate.
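
As a rough sketch of that idea, the array returned by curl_getinfo($ch) after a HEAD request can answer both "does the file exist" and "how big is it". The helper names summarize_curl_info() and probe_remote_file() are made up for illustration, and the 30 second timeout is an arbitrary assumption.

```php
<?php
/**
 * Interprets the array returned by curl_getinfo($ch) after a request.
 * Returns null unless the final status was 200, otherwise the type and size.
 * summarize_curl_info() is a hypothetical helper, not part of the cURL extension.
 */
function summarize_curl_info(array $info): ?array
{
    if (($info['http_code'] ?? 0) !== 200) {
        return null;
    }
    $length = $info['download_content_length'] ?? -1;
    return [
        'type' => $info['content_type'] ?? null,
        // cURL reports -1 when the server sent no Content-Length.
        'size' => $length >= 0 ? (int) $length : null,
    ];
}

/**
 * Sends a HEAD request and returns the summary, or null on failure.
 * probe_remote_file() is also a hypothetical name.
 */
function probe_remote_file(string $url): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY => true,          // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // report on the final URL after redirects
        CURLOPT_TIMEOUT => 30,           // assumed value, in seconds
    ]);
    $ok = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);
    return $ok === false ? null : summarize_curl_info($info);
}
```

Calling probe_remote_file($sourceUrl) should then give either null or an array such as ['type' => ..., 'size' => ...]. Some servers do not answer HEAD requests the same way as GET, so treat a missing size as "unknown" rather than as an error.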



Answer 2:

The problem is your Content-Length header: filesize() does not work on HTTP URLs, so the header ends up empty or 0. Since the get_headers() call already gives you the remote Content-Length, simply change the following line:

header("Content-Length: " . filesize($path));

to:

header($file_headers[8]);

Note that the contents of $file_headers vary from server to server (index 8 worked for me). Check the manual for details, or run print_r($file_headers) to see what you get.

If you don't care about the Content-Length header, simply comment it out. Most browsers handle a missing Content-Length without any problem.
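
Because the index differs between servers, one option is to look the header up by name instead. The following is a minimal sketch; find_header() is a hypothetical helper, not a built-in PHP function.

```php
<?php
/**
 * Looks up a header by name in the list returned by get_headers($url).
 * Returns the value of the last match, or null if the header is absent.
 * find_header() is a hypothetical helper, not a built-in PHP function.
 */
function find_header(array $headers, string $name): ?string
{
    $value = null;
    foreach ($headers as $line) {
        // Split "Name: value" at the first colon; status lines contain no colon.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), $name) === 0) {
            // Keep overwriting so that, after redirects, the final response wins.
            $value = trim($parts[1]);
        }
    }
    return $value;
}

// Usage, with $file_headers as in the question:
// $length = find_header($file_headers, 'Content-Length');
// if ($length !== null && preg_match('/^\d+\z/', $length) === 1) {
//     header('Content-Length: ' . $length);
// }
```

The name comparison is case-insensitive because servers are free to send content-length in lower case.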



Answer 3:

This code works fine for downloading a file from a URL:

set_time_limit(0);

//File to save the contents to
$fp = fopen('r.jpg', 'wb'); // binary mode, so the image is written unchanged on Windows too

$url = "http://cgr.ir/test.jpg";

//Here is the file we are downloading, replace spaces with %20
$ch = curl_init(str_replace(" ","%20",$url));

curl_setopt($ch, CURLOPT_TIMEOUT, 50);

//give curl the file pointer so that it can write to it
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

// with CURLOPT_FILE set, curl_exec() writes the body to $fp and returns true or false
$success = curl_exec($ch);

//done
curl_close($ch);
fclose($fp);


Answer 4:

Function:

<?php
/**
 * Returns the size of a file without downloading it, or -1 if the file
 * size could not be determined.
 *
 * @param $url - The location of the remote file to download. Cannot
 * be null or empty.
 *
 * @return The size of the file referenced by $url, or -1 if the size
 * could not be determined.
 */
function curl_get_file_size( $url ) {
  // Assume failure.
  $result = -1;

  $curl = curl_init( $url );

  // Issue a HEAD request and follow any redirects.
  curl_setopt( $curl, CURLOPT_NOBODY, true );
  curl_setopt( $curl, CURLOPT_HEADER, true );
  curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
  curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
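  // Placeholder user agent; the original get_user_agent_string() helper is not shown here.
  curl_setopt( $curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; file-size-check)' );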

  $data = curl_exec( $curl );
  curl_close( $curl );

  if( $data ) {
    $content_length = "unknown";
    $status = "unknown";

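    // Accept any HTTP version in the status line (for example HTTP/1.1 or HTTP/2).
    if( preg_match( "/^HTTP\/[\d.]+ (\d\d\d)/", $data, $matches ) ) {
      $status = (int)$matches[1];
    }

    // Header names are case-insensitive; HTTP/2 servers send them in lower case.
    if( preg_match( "/Content-Length: (\d+)/i", $data, $matches ) ) {
      $content_length = (int)$matches[1];
    }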

    // http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
    if( $status == 200 || ($status > 300 && $status <= 308) ) {
      $result = $content_length;
    }
  }

  return $result;
}
?>

Function call:

$file_size = curl_get_file_size( "http://stackoverflow.com/questions/2602612/php-remote-file-size-without-downloading-file" );



Answer 5:

Try using something like this:

function get_data($url) 
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

Unfortunately, without more detail about your specific request or files, I can't offer code that matches your situation more exactly. The curl_get_file_size function from the other answer will give you the size if you ever need it.



Answer 6:

IMHO it is a good idea not to rely on the PHP cURL module being available. Your snippet works with a small modification:

First change

$file_headers = @get_headers($path);

to

$file_headers = @get_headers($path,1);

to get named array keys (see the PHP reference for get_headers).

With this modification the HTTP status line is still in $file_headers[0], but you also get more useful data that can be passed through (validation recommended): Content-Length and even Content-Type, which lets you drop your approach of detecting the MIME type from the file extension.

Change

header("Content-Length: " . filesize($path));

to

header("Content-Length: " . $file_headers['Content-Length']);

and

header("Content-Type: $mime_type");

to

header("Content-Type: " . $file_headers['Content-Type']);

Even if your "path" is a trusted source, you might want to add some validation, as you should not trust external data to be of the kind you expect.
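
A minimal sketch of such validation for Content-Length might look like the following. It assumes the array comes from get_headers($path, 1), where a header that appears in several responses (for example after a redirect) is returned as an array of values. safe_content_length() is a hypothetical helper name.

```php
<?php
/**
 * Picks a usable Content-Length from the array returned by get_headers($url, 1).
 * After redirects a header value may be an array with one entry per response,
 * so the last entry is used. Returns null unless the value is all digits.
 * safe_content_length() is a hypothetical helper name.
 */
function safe_content_length(array $headers): ?int
{
    foreach ($headers as $name => $value) {
        // Status lines sit under integer keys; header names keep the server's
        // capitalisation, so compare them case-insensitively.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            if (is_array($value)) {
                $value = end($value);
            }
            $value = trim((string) $value);
            return preg_match('/^\d+\z/', $value) === 1 ? (int) $value : null;
        }
    }
    return null;
}

// Usage, with $file_headers = @get_headers($path, 1):
// $length = safe_content_length($file_headers);
// if ($length !== null) {
//     header('Content-Length: ' . $length);
// }
```

When the function returns null, leaving the Content-Length header out entirely is usually safer than sending a guessed value.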