PHP x86: How to get filesize of > 2 GB file without external program

Posted 2019-01-04 01:34

I need to get the size of a file that is over 2 GB (testing on a 4.6 GB file). Is there any way to do this without an external program?

Current status:

  • filesize(), stat() and fseek() fail
  • fread() and feof() work

It is possible to get the file size by reading the whole file content (extremely slow!).

$size = (float) 0;
$chunksize = 1024 * 1024;
while (!feof($fp)) {
    // count the bytes actually read; the last chunk is usually shorter than $chunksize
    $size += (float) strlen(fread($fp, $chunksize));
}
return $size;

I know how to get it on 64-bit platforms (using fseek($fp, 0, SEEK_END) and ftell()), but I need a solution for a 32-bit platform.
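For reference, the 64-bit approach mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming a 64-bit PHP build where ftell() can represent offsets beyond 2 GB; sizeBySeek() is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: only meaningful on 64-bit PHP builds (PHP_INT_SIZE === 8).
function sizeBySeek(string $path)
{
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false; // file missing or not readable
    }
    $size = false;
    // Move the file pointer to the end of the file, then ask where it is.
    if (fseek($fp, 0, SEEK_END) === 0) {
        $size = ftell($fp); // int offset of the end of the file, or false on failure
    }
    fclose($fp);
    return $size;
}
```

On a 32-bit build the returned offset cannot be trusted for files over 2 GB, which is the problem the question is about.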


Solution: I've started an open-source project for this.

Big File Tools

Big File Tools is a collection of hacks that are needed to manipulate files over 2 GB in PHP (even on 32-bit systems).

14 answers
Evening l夕情丶
#2 · 2019-01-04 01:50

I found a nice, slim, Linux/Unix-only solution to get the size of large files with 32-bit PHP.

$file = "/path/to/my/file.tar.gz";
$filesize = exec("stat -c %s " . escapeshellarg($file)); // escape the path before passing it to the shell

You should handle $filesize as a string. Casting it to int yields PHP_INT_MAX if the file size is larger than PHP_INT_MAX.

Even though the size is handled as a string, the following human-readable formatting works:

formatBytes($filesize);

function formatBytes($size, $precision = 2) {
    // (float) accepts the numeric string returned by exec()
    $base = log((float) $size) / log(1024);
    $suffixes = array('', 'k', 'M', 'G', 'T');
    return round(pow(1024, $base - floor($base)), $precision) . $suffixes[(int) floor($base)];
}

So my output for a file larger than 4 GB is:

4.46G
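Because the size stays a string, comparisons should avoid int casts as well. A minimal sketch, assuming both values are non-negative decimal strings such as the output of stat; compareSizes() is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: compares two non-negative decimal strings
// without ever converting them to int. Returns -1, 0 or 1.
function compareSizes(string $a, string $b): int
{
    // Leading zeros do not change the value but would break the length check.
    $a = ltrim($a, '0');
    $b = ltrim($b, '0');
    if (strlen($a) !== strlen($b)) {
        // More digits means a larger number.
        return strlen($a) < strlen($b) ? -1 : 1;
    }
    // Same length: byte-wise order equals numeric order for digit strings.
    return strcmp($a, $b) <=> 0;
}

// Is the file larger than the 32-bit integer limit?
var_dump(compareSizes('4794000000', '2147483647') > 0);
```

If BCMath is installed, bccomp() does the same job.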
你好瞎i
#3 · 2019-01-04 01:50

You can't reliably get the size of a file on a 32-bit system by checking whether filesize() returns a negative number, as some answers suggest. If a file is between 4 and 6 GB on a 32-bit system, filesize() reports a positive number; from 6 to 8 GB it reports a negative one, from 8 to 10 GB a positive one again, and so on. The value wraps around, in a manner of speaking.
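The wrap-around can be illustrated with a small simulation. This is only a sketch, assuming the true size is truncated to a signed 32-bit value as described above; wrapToInt32() is a hypothetical helper and does not call filesize() at all.

```php
<?php
// Hypothetical helper: mimics truncating a true size (passed as a float)
// to a signed 32-bit integer.
function wrapToInt32(float $bytes): float
{
    $wrapped = fmod($bytes, 4294967296.0); // keep only the low 32 bits
    if ($wrapped >= 2147483648.0) {
        $wrapped -= 4294967296.0;          // reinterpret the top bit as the sign
    }
    return $wrapped;
}

$gib = 1024 * 1024 * 1024;
foreach (array(1, 3, 5, 7, 9) as $n) {
    // The sign alternates every 2 GiB, so it says nothing about the real size.
    printf("%d GiB -> %.0f\n", $n, wrapToInt32($n * $gib));
}
```

A 5 GiB file and a 1 GiB file produce the same positive value here, which is why a sign check alone is not enough.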

So you're stuck using an external command that works reliably on your 32 bit system.

However, one very useful tool is the ability to check if the file size is bigger than a certain size and you can do this reliably on even very big files.

The following seeks to 50 megs and tries to read one byte. It is very fast on my low spec test machine and works reliably even when the size is much greater than 2 gigs.

With the offset changed accordingly, you can use this to check whether a file is larger than 2147483647 bytes (the maximum int on 32-bit systems) and then handle the file differently or have your app issue a warning.

function isTooBig($file){
        $fh = @fopen($file, 'r');
        if(! $fh){ return false; }
        $offset = 50 * 1024 * 1024; //50 megs
        $tooBig = false;
        if(fseek($fh, $offset, SEEK_SET) === 0){
                if(strlen(fread($fh, 1)) === 1){
                        $tooBig = true;
                }
        } //Otherwise we couldn't seek there so it must be smaller

        fclose($fh);
        return $tooBig;
}
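The same idea with the offset as a parameter might look like this. It is a sketch rather than a tested 32-bit solution: hasByteAt() is a hypothetical name, and it assumes, as the answer reports, that seeking and reading one byte stays reliable on large files.

```php
<?php
// Hypothetical helper: true if the file contains a byte at $offset,
// i.e. if the file is larger than $offset bytes.
function hasByteAt(string $file, int $offset): bool
{
    $fh = @fopen($file, 'rb');
    if (!$fh) {
        return false;
    }
    // Seeking past the end succeeds, so the one-byte read is the real test.
    $found = fseek($fh, $offset, SEEK_SET) === 0
        && strlen((string) fread($fh, 1)) === 1;
    fclose($fh);
    return $found;
}

// Larger than 2147483647 bytes? Then a byte exists at offset 2147483647.
// $largerThan2G = hasByteAt('/path/to/my/file.tar.gz', 2147483647);
```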
Bombasti
#4 · 2019-01-04 01:51

When IEEE 754 doubles are used (the vast majority of systems), file sizes up to 2^53 bytes (about 9 PB, i.e. 9 × 10^15 bytes) fit into a double as exact numbers, and there should be no loss of precision when using standard arithmetic operations within that range.
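Where exact integer representation in a double ends can be checked directly. A small sketch, assuming PHP floats are IEEE 754 doubles (53-bit significand):

```php
<?php
// 2^53: the last point up to which every whole number is exactly representable.
$limit = 9007199254740992.0;

// Just below the limit, neighbouring whole numbers are still distinct.
var_dump($limit - 1.0 !== $limit);

// Just above it, adding 1 is lost to rounding.
var_dump($limit + 1.0 === $limit);
```

Any realistic file size is far below this limit, so a float can hold it exactly.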

爷的心禁止访问
#5 · 2019-01-04 01:51

I iterated on the BigFileTools class/answer:

  • an option to disable the curl method, because some platforms (Synology NAS, for example) don't support the FTP protocol for curl
  • an extra non-POSIX but more accurate implementation of sizeExec: it returns the actual file size by using stat, instead of the size on disk reported by du
  • correct size results for big files (> 4 GB) from sizeNativeSeek, at almost the same speed
  • a debug messages option

<?php

/**
 * Class for manipulating files bigger than 2GB
 * (currently supports only getting filesize)
 *
 * @author Honza Kuchař
 * @license New BSD
 * @encoding UTF-8
 * @copyright Copyright (c) 2013, Jan Kuchař
 */
class BigFileTools {

    /**
     * Absolute file path
     * @var string
     */
    protected $path;

    /**
     * Use in BigFileTools::$mathLib if you want to use BCMath for mathematical operations
     */
    const MATH_BCMATH = "BCMath";

    /**
     * Use in BigFileTools::$mathLib if you want to use GMP for mathematical operations
     */
    const MATH_GMP = "GMP";

    /**
     * Which mathematical library use for mathematical operations
     * @var string (one of constants BigFileTools::MATH_*)
     */
    public static $mathLib;

    /**
     * If none of the fast methods is available to compute the file size, BigFileTools can fall back
     * to a very slow method: reading the file from byte 0 to the end. If you want to enable this behavior,
     * switch fastMode to false (default is true)
     * @var bool
     */
    public static $fastMode = true;

    // On some platforms, like Synology NAS DS214+ DSM 5.1, the FTP protocol for curl is not working or disabled;
    // you will get an error like "Protocol file not supported or disabled in libcurl"
    public static $FTPProtocolCurlEnabled = false;

    /** Shows some debug/error messages */
    public static $debug = false;

    /** More portable, but reports size on disk rather than actual file size, so it can be off by up to one cluster */
    public static $posix = true;

    /**
     * Initialization of class
     * Do not call directly.
     */
    static function init() {
        if (function_exists("bcadd")) {
            self::$mathLib = self::MATH_BCMATH;
        } elseif (function_exists("gmp_add")) {
            self::$mathLib = self::MATH_GMP;
        } else {
            throw new BigFileToolsException("You have to install BCMath or GMP. These mathematical libraries are used for size computation.");
        }
    }

    /**
     * Create BigFileTools from $path
     * @param string $path
     * @return BigFileTools
     */
    static function fromPath($path) {
        return new self($path);
    }

    static function debug($msg) {
        if (self::$debug) echo $msg;
    }

    /**
     * Gets basename of file (example: for file.txt will return "file.txt")
     * @return string
     */
    public function getBaseName() {
        return pathinfo($this->path, PATHINFO_BASENAME);
    }

    /**
     * Gets extension of file (example: for file.txt will return "txt")
     * @return string
     */
    public function getExtension() {
        return pathinfo($this->path, PATHINFO_EXTENSION);
    }


    /**
     * Gets filename without extension (example: for file.txt will return "file")
     * @return string
     */
    public function getFilename() {
        return pathinfo($this->path, PATHINFO_FILENAME);
    }

    /**
     * Gets path to the directory containing the file (example: for file.txt will return the path to file.txt, e.g. /home/test/)
     * ! This will return the absolute path!
     * @return string
     */
    public function getDirname() {
        return pathinfo($this->path, PATHINFO_DIRNAME);
    }

    /**
     * Gets md5 checksum of file content
     * @return string
     */
    public function getMd5() {
        return md5_file($this->path);
    }

    /**
     * Gets sha1 checksum of file content
     * @return string
     */
    public function getSha1() {
        return sha1_file($this->path);
    }

    /**
     * Constructor - do not call directly
     * @param string $path
     */
    function __construct($path, $absolutizePath = true) {
        if (!static::isReadableFile($path)) {
            throw new BigFileToolsException("File not found at $path");
        }

        if($absolutizePath) {
            $this->setPath($path);
        }else{
            $this->setAbsolutePath($path);
        }
    }

    /**
     * Tries to absolutize path and then updates instance state
     * @param string $path
     */
    function setPath($path) {

        $this->setAbsolutePath(static::absolutizePath($path));
    }

    /**
     * Sets absolute path
     * @param string $path
     */
    function setAbsolutePath($path) {
        $this->path = $path;
    }

    /**
     * Gets current filepath
     * @return string
     */
    function getPath($a = "") {
        if($a != "") {
            trigger_error("getPath with absolutizing argument is deprecated!", E_USER_DEPRECATED);
        }

        return $this->path;
    }

    /**
     * Converts relative path to absolute
     */
    static function absolutizePath($path) {

        $path = realpath($path);
        if(!$path) {
            // TODO: use hack like http://stackoverflow.com/questions/4049856/replace-phps-realpath or http://www.php.net/manual/en/function.realpath.php#84012
            //       probably as an optional feature that can be turned on when you know what you are doing

            throw new BigFileToolsException("Not possible to resolve absolute path.");
        }
        return $path;
    }

    static function isReadableFile($file) {
        // Do not use is_file
        // @link https://bugs.php.net/bug.php?id=27792
        // $readable = is_readable($file); // does not always return correct value for directories

        $fp = @fopen($file, "r"); // must be file and must be readable
        if($fp) {
            fclose($fp);
            return true;
        }
        return false;
    }

    /**
     * Moves file to new location / rename
     * @param string $dest
     */
    function move($dest) {
        if (move_uploaded_file($this->path, $dest)) {
            $this->setPath($dest);
            return TRUE;
        } else {
            @unlink($dest); // needed in PHP < 5.3 & Windows; intentionally @
            if (rename($this->path, $dest)) {
                $this->setPath($dest);
                return TRUE;
            } else {
                if (copy($this->path, $dest)) {
                    unlink($this->path); // delete file
                    $this->setPath($dest);
                    return TRUE;
                }
                return FALSE;
            }
        }
    }

    /**
     * Changes path of this file object
     * @param string $dest
     */
    function relocate($dest) {
        trigger_error("Relocate is deprecated!", E_USER_DEPRECATED);
        $this->setPath($dest);
    }

    /**
     * Size of file
     *
     * Profiling results:
     *  sizeCurl        0.00045299530029297
     *  sizeNativeSeek  0.00052094459533691
     *  sizeCom         0.0031449794769287
     *  sizeExec        0.042937040328979
     *  sizeNativeRead  2.7670161724091
     *
     * @return string | float
     * @throws BigFileToolsException
     */
    public function getSize($float = false) {
        if ($float == true) {
            return (float) $this->getSize(false);
        }

        $return = $this->sizeCurl();
        if ($return) {
            $this->debug("sizeCurl succeeded");
            return $return;
        }
        $this->debug("sizeCurl failed");

        $return = $this->sizeNativeSeek();
        if ($return) {
            $this->debug("sizeNativeSeek succeeded");
            return $return;
        }
        $this->debug("sizeNativeSeek failed");

        $return = $this->sizeCom();
        if ($return) {
            $this->debug("sizeCom succeeded");
            return $return;
        }
        $this->debug("sizeCom failed");

        $return = $this->sizeExec();
        if ($return) {
            $this->debug("sizeExec succeeded");
            return $return;
        }
        $this->debug("sizeExec failed");

        if (!self::$fastMode) {
            $return = $this->sizeNativeRead();
            if ($return) {
                $this->debug("sizeNativeRead succeeded");
                return $return;
            }
            $this->debug("sizeNativeRead failed");
        }

        throw new BigFileToolsException("Can not get size of file $this->path !");
    }

    // <editor-fold defaultstate="collapsed" desc="size* implementations">
    /**
     * Returns file size by using native fseek function
     * @see http://www.php.net/manual/en/function.filesize.php#79023
     * @see http://www.php.net/manual/en/function.filesize.php#102135
     * @return string | bool (false when fail)
     */
    protected function sizeNativeSeek() {
        $fp = fopen($this->path, "rb");
        if (!$fp) {
            return false;
        }

        flock($fp, LOCK_SH);
        $result = fseek($fp, 0, SEEK_END);

        if ($result === 0) {
            if (PHP_INT_SIZE < 8) {
                // 32bit
                $return = 0.0;
                $step = 0x7FFFFFFF;
                while ($step > 0) {
                    if (0 === fseek($fp, - $step, SEEK_CUR)) {
                        $return += floatval($step);
                    } else {
                        $step >>= 1;
                    }
                }
            } else { // 64bit
                $return = ftell($fp);
            }
        } else {
            $return = false;
        }

        flock($fp, LOCK_UN);
        fclose($fp);
        return $return;
    }

    /**
     * Returns file size by using native fread function
     * @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5504829#5504829
     * @return string | bool (false when fail)
     */
    protected function sizeNativeRead() {
        $fp = fopen($this->path, "rb");
        if (!$fp) {
            return false;
        }
        flock($fp, LOCK_SH);

        rewind($fp);
        $offset = PHP_INT_MAX - 1;

        $size = (string) $offset;
        if (fseek($fp, $offset) !== 0) {
            flock($fp, LOCK_UN);
            fclose($fp);
            return false;
        }
        $chunksize = 1024 * 1024;
        while (!feof($fp)) {
            $read = strlen(fread($fp, $chunksize));
            if (self::$mathLib == self::MATH_BCMATH) {
                $size = bcadd($size, $read);
            } elseif (self::$mathLib == self::MATH_GMP) {
                $size = gmp_add($size, $read);
            } else {
                throw new BigFileToolsException("No mathematical library available");
            }
        }
        if (self::$mathLib == self::MATH_GMP) {
            $size = gmp_strval($size);
        }
        flock($fp, LOCK_UN);
        fclose($fp);
        return $size;
    }

    /**
     * Returns file size using curl module
     * @see http://www.php.net/manual/en/function.filesize.php#100434
     * @return string | bool (false when fail or cUrl module not available)
     */
    protected function sizeCurl() {
        // curl solution - cross platform and really cool :)
        if (self::$FTPProtocolCurlEnabled && function_exists("curl_init")) {
            $ch = curl_init("file://" . $this->path);
            curl_setopt($ch, CURLOPT_NOBODY, true);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_HEADER, true);
            $data = curl_exec($ch);
            if (empty($data)) {
                $this->debug(stripslashes(curl_error($ch)));
            }
            curl_close($ch);
            if ($data !== false && preg_match('/Content-Length: (\d+)/', $data, $matches)) {
                return (string) $matches[1];
            }
        }
        // curl disabled, request failed, or no Content-Length header found
        return false;
    }

    /**
     * Returns file size by using external program (exec needed)
     * @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5502328#5502328
     * @return string | bool (false when fail or exec is disabled)
     */
    protected function sizeExec() {
        // filesize using exec
        if (function_exists("exec")) {
            $escapedPath = escapeshellarg($this->path);

            if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') { // Windows
                // Try using the NT substitution modifier %~z
                $size = trim(exec("for %F in ($escapedPath) do @echo %~zF"));
            } else { // other OS (should work for *nix and MacOS)
                if (self::$posix) {
                    // du reports size on disk in blocks; -k forces 1 KiB blocks
                    $tmpsize = trim(exec("du -k $escapedPath | cut -f1"));
                    // multiply as float: (int) $tmpsize * 1024 would overflow on 32-bit PHP
                    $size = (float) $tmpsize * 1024;
                } else {
                    // stat reports the actual file size in bytes
                    $size = trim(exec("stat $escapedPath | grep -i -o -E \"Size: ([0-9]+)\" | cut -d\" \" -f2"));
                }
            }

            if (self::$debug) var_dump($size);
            return $size; // returned for both the Windows and the *nix branch
        }
        return false;
    }

    /**
     * Returns file size by using Windows COM interface
     * @see http://stackoverflow.com/questions/5501451/php-x86-how-to-get-filesize-of-2gb-file-without-external-program/5502328#5502328
     * @return string | bool (false when fail or COM not available)
     */
    protected function sizeCom() {
        if (class_exists("COM")) {
            // Use the Windows COM interface
            $fsobj = new COM('Scripting.FileSystemObject');
            if (dirname($this->path) == '.')
                $this->path = ((substr(getcwd(), -1) == DIRECTORY_SEPARATOR) ? getcwd() . basename($this->path) : getcwd() . DIRECTORY_SEPARATOR . basename($this->path));
            $f = $fsobj->GetFile($this->path);
            return (string) $f->Size;
        }
    }

    // </editor-fold>
}

BigFileTools::init();

class BigFileToolsException extends Exception{}
冷血范
#6 · 2019-01-04 01:54

Here's one possible method:

It first attempts to use a platform-appropriate shell command (Windows shell substitution modifiers or *nix/Mac stat command). If that fails, it tries COM (if on Windows), and finally falls back to filesize().

/*
 * This software may be modified and distributed under the terms
 * of the MIT license.
 */

function filesize64($file)
{
    static $iswin;
    if (!isset($iswin)) {
        $iswin = (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN');
    }

    static $exec_works;
    if (!isset($exec_works)) {
        $exec_works = (function_exists('exec') && !ini_get('safe_mode') && @exec('echo EXEC') == 'EXEC');
    }

    // try a shell command
    if ($exec_works) {
        $arg = escapeshellarg($file); // escape the path before passing it to the shell
        $cmd = ($iswin) ? "for %F in ($arg) do @echo %~zF" : "stat -c%s $arg";
        $output = array();
        @exec($cmd, $output);
        if (is_array($output) && ctype_digit($size = trim(implode("\n", $output)))) {
            return $size;
        }
    }

    // try the Windows COM interface
    if ($iswin && class_exists("COM")) {
        try {
            $fsobj = new COM('Scripting.FileSystemObject');
            $f = $fsobj->GetFile( realpath($file) );
            $size = $f->Size;
        } catch (Exception $e) {
            $size = null;
        }
        if (ctype_digit($size)) {
            return $size;
        }
    }

    // if all else fails
    return filesize($file);
}
做个烂人
#7 · 2019-01-04 01:56

I wrote a function which returns the exact file size and is quite fast (it requires the BCMath extension):

function file_get_size($file) {
    //open file
    $fh = fopen($file, "r"); 
    //declare some variables
    $size = "0";
    $char = "";
    //set file pointer to 0; I'm a little bit paranoid, you can remove this
    fseek($fh, 0, SEEK_SET);
    //set multiplicator to zero
    $count = 0;
    while (true) {
        //jump 1 MB forward in file
        fseek($fh, 1048576, SEEK_CUR);
        //check if we actually left the file
        if (($char = fgetc($fh)) !== false) {
            //if not, go on
            $count ++;
        } else {
            //else jump back where we were before leaving and exit loop
            fseek($fh, -1048576, SEEK_CUR);
            break;
        }
    }
    //we could make $count jumps, so the file is at least $count * 1.000001 MB large
    //1048577 because we jump 1 MB and fgetc goes 1 B forward too
    $size = bcmul("1048577", $count);
    //now count the last few bytes; they're always less than 1048576 so it's quite fast
    $fine = 0;
    while(false !== ($char = fgetc($fh))) {
        $fine ++;
    }
    //and add them
    $size = bcadd($size, $fine);
    fclose($fh);
    return $size;
}