Let's say I want to load XML files from a remote server only if they are up to 10MB in size.
Something like:
$xml_file = "http://example.com/largeXML.xml";// size= 500MB
//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB
/*GOAL: Do anything that can be done to prevent this large file from being loaded by the DOMDocument, without having to load the file first and then check its size*/
$dom = new DOMDocument();
$dom->load($xml_file /*LOAD only IF the file_size is <= 10MB....else...echo 'File is too large'*/);
How can this be achieved? Any idea, alternative, or best approach would be highly appreciated.
I checked "PHP: Remote file size without downloading file", but when I try something like
var_dump(
curl_get_file_size(
"http://www.dailymotion.com/rss/user/dialhainaut/"
)
);
I get string 'unknown' (length=7).
When I try get_headers, as suggested below, the Content-Length is missing from the headers, so this will not work reliably either.
Please kindly advise how to determine the length and avoid sending the file to the DOMDocument if it exceeds 10MB.
Ok, finally working. The headers solution was obviously not going to work broadly. In this solution, we open a file handle and read the XML line by line until it hits the threshold of $max_B. If the file is too big, we still have the overhead of reading it up until the 10MB mark, but it's working as expected. If the file is less than $max_B, it proceeds...
$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";
$fh = fopen($xml_file, "r");
if($fh){
$file_string = '';
$total_B = 0;
$max_B = 10485760;
//run through lines of the file, concatenating them into a string
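//read at most 8191 bytes per call, so a file without line breaks cannot be pulled in by a single fgets
while (($line = fgets($fh, 8192)) !== false){
$total_B += strlen($line);
if($total_B < $max_B){
$file_string .= $line;
} else {
break;
}
}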
if($total_B < $max_B){
echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
//proceed
$dom = new DOMDocument();
$dom->loadXML($file_string); //NOTE the method change because we're loading from a string
} else {
//reject
echo 'File too big! Max size = '.$max_B.' bytes.';
}
fclose($fh);
} else {
echo 'Could not open the URL!'; //fopen can fail for reasons other than a 404
}
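The loop above can be wrapped in a reusable function. What follows is only a minimal sketch, assuming a hypothetical helper named read_capped (not part of the answer above) that reads fixed-size chunks instead of lines, so a file without line breaks cannot exceed the cap in a single read:

```php
<?php
// Hypothetical helper: read at most $maxBytes from an already opened stream.
// Returns the data, or null if the stream holds more than $maxBytes.
function read_capped($stream, int $maxBytes): ?string
{
    $data = '';
    while (!feof($stream)) {
        // Request one byte more than the remaining budget,
        // so that exceeding the cap is detectable.
        $chunk = fread($stream, min(8192, $maxBytes - strlen($data) + 1));
        if ($chunk === false) {
            return null; // read error; treated like a rejection in this sketch
        }
        $data .= $chunk;
        if (strlen($data) > $maxBytes) {
            return null; // over the limit: stop reading
        }
    }
    return $data;
}

// Possible usage (assumes allow_url_fopen is enabled):
//   $fh = fopen($xml_file, 'r');
//   $xml = $fh ? read_capped($fh, 10485760) : null;
//   if ($xml !== null) { $dom = new DOMDocument(); $dom->loadXML($xml); }
```

At most one chunk beyond the limit is ever held in memory before the function gives up.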
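10MB is equal to 10485760 bytes. If the Content-Length header is not present, this falls back to cURL, which has been available since PHP 5. Note that the cURL fallback only knows the size after it has downloaded the body. I got this code from somewhere on SO but cannot remember where: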
function get_filesize($url) {
    $headers = get_headers($url, 1);
    if (isset($headers['Content-Length'])) return $headers['Content-Length'];
    if (isset($headers['Content-length'])) return $headers['Content-length'];
    // No Content-Length header: fall back to cURL, which downloads the body to measure it.
    $c = curl_init();
    curl_setopt_array($c, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        // The User-Agent must be a single line; a header value cannot contain line breaks.
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3',
    ));
    curl_exec($c);
    $size = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);
    curl_close($c);
    return $size;
}
$filesize = get_filesize("http://www.dailymotion.com/rss/user/dialhainaut/");
if($filesize<=10485760){
echo 'Fine';
}else{
echo $filesize.' File is too big';
}
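The two isset checks above cover only two spellings of the header name, and with get_headers($url, 1) a redirected request yields an array of values rather than a single string. A hedged sketch of a more tolerant lookup follows; the helper name content_length_from_headers is hypothetical:

```php
<?php
// Hypothetical helper: extract Content-Length from the array returned by
// get_headers($url, 1). Returns null when the header is absent or malformed.
function content_length_from_headers(array $headers): ?int
{
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive; status lines have integer keys.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            // After redirects the value is an array with one entry per
            // response; the last entry belongs to the final response.
            if (is_array($value)) {
                $value = end($value);
            }
            return ctype_digit((string) $value) ? (int) $value : null;
        }
    }
    return null;
}

// Possible usage:
//   $headers = get_headers($url, 1);
//   $size = $headers ? content_length_from_headers($headers) : null;
```

A null result means the server did not report a size, so the caller still needs a fallback that limits how much is read.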
Edit: a new answer, a bit of a workaround:
You can't check the length of the DOM elements, but you can make a HEAD request and read the file size from the response headers for the URL:
<?php
function i_hope_this_works( $XmlUrl ) {
    // Assume failure: -1 means the size could not be determined.
    $size = -1;
    $request = curl_init( $XmlUrl );
    // Send a HEAD request, so a 1 GB file costs no more than a 1 KB one.
    curl_setopt( $request, CURLOPT_NOBODY, true );
    curl_setopt( $request, CURLOPT_HEADER, true );
    curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
    // Any User-Agent string works here; this one is a placeholder.
    curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; size-check)' );
    $requesteddata = curl_exec( $request );
    // Status code of the final response, after any redirects.
    $status = (int) curl_getinfo( $request, CURLINFO_HTTP_CODE );
    curl_close( $request );
    if( $requesteddata && $status == 200 ) {
        // With redirects there is one header block per response; inspect only the last.
        $blocks = preg_split( "/\r?\n\r?\n/", trim( $requesteddata ) );
        $last = end( $blocks );
        // Header names are case-insensitive.
        if( preg_match( "/^Content-Length:\s*(\d+)/mi", $last, $matches ) ) {
            $size = (int)$matches[1];
        }
    }
    return $size;
}
?>
You should now be able to get the size of any file whose server reports a Content-Length, just by its URL:
$file_size = i_hope_this_works('yourURLasString');
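When the server omits Content-Length, a HEAD request cannot report a size. One possible fallback, sketched here under assumptions, is to let cURL abort the download once a byte limit is passed: a CURLOPT_WRITEFUNCTION callback that returns anything other than the number of bytes it was handed makes cURL stop the transfer. The names make_capped_writer and fetch_capped are hypothetical.

```php
<?php
// Hypothetical helper: build a cURL write callback that appends to $buffer
// and aborts the transfer once more than $maxBytes would have arrived.
function make_capped_writer(int $maxBytes, string &$buffer): callable
{
    return function ($handle, string $data) use ($maxBytes, &$buffer): int {
        if (strlen($buffer) + strlen($data) > $maxBytes) {
            return 0; // a value other than strlen($data) makes cURL abort
        }
        $buffer .= $data;
        return strlen($data);
    };
}

// Hypothetical wrapper: returns the body, or null if the body was too
// large or the request failed for any other reason.
function fetch_capped(string $url, int $maxBytes): ?string
{
    $body = '';
    $request = curl_init($url);
    curl_setopt($request, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($request, CURLOPT_WRITEFUNCTION, make_capped_writer($maxBytes, $body));
    $ok = curl_exec($request);
    curl_close($request);
    return $ok === false ? null : $body;
}

// Possible usage:
//   $xml = fetch_capped($xml_file, 10485760);
//   if ($xml !== null) { $dom = new DOMDocument(); $dom->loadXML($xml); }
```

This sketch does not distinguish "too large" from a network error; a caller that needs the difference could inspect curl_errno before closing the handle.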