I have a directory of files that need to be processed in a batch with PHP. The files are copied to the server via FTP. Some of the files are very big and take a long time to copy. How can I determine in PHP whether a file is still being transferred (so I can skip that file and process it in the next run of the batch process)?
One possibility is to get the file size, wait a few moments, and check whether the file size has changed. This is not foolproof, because there is a slight chance that the transfer was simply stalled for a few moments...
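A minimal sketch of that size-polling idea, assuming a hypothetical helper name `sizeLooksStable` and an arbitrary wait time. It inherits the weakness described above: a stalled transfer also looks stable.

```php
<?php
// Hypothetical helper: true if the file size did not change while we waited.
// A stalled (but unfinished) transfer will also pass this check.
function sizeLooksStable(string $path, int $waitSeconds = 5): bool
{
    if (!is_file($path)) {
        return false;
    }
    clearstatcache(true, $path);   // PHP caches stat() results within a request
    $before = filesize($path);
    sleep($waitSeconds);
    clearstatcache(true, $path);   // force a fresh size lookup
    $after = filesize($path);

    return $before !== false && $before === $after;
}
```

The `clearstatcache()` calls matter: without them, the second `filesize()` may return the cached value from the first call.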
One of the safest ways of doing this is to upload the files with a temporary name and rename them once the transfer is finished. Your program should skip files with the temporary name (a simple extension works just fine). Obviously this requires the client (uploader) to cooperate, so it's not ideal.
[This also allows you to delete failed (partial) transfers after a given time period if you need that.]
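On the PHP side, the temporary-name scheme could look roughly like this. The `.part` suffix, the one-day cleanup age, and the function name `collectReadyFiles` are assumptions for illustration; use whatever convention your uploader actually follows.

```php
<?php
// Returns the files that are ready for processing and removes stale partials.
// Assumes the uploader writes "name.ext.part" and renames it when done.
function collectReadyFiles(string $dir, string $tempSuffix = '.part', int $maxPartialAge = 86400): array
{
    clearstatcache();
    $ready = [];
    foreach (scandir($dir) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        if (!is_file($path)) {
            continue;                       // skips ".", ".." and subdirectories
        }
        if (substr($name, -strlen($tempSuffix)) === $tempSuffix) {
            // Still uploading, or an abandoned partial transfer.
            if (time() - filemtime($path) > $maxPartialAge) {
                unlink($path);              // clean up failed transfers
            }
            continue;
        }
        $ready[] = $path;
    }
    return $ready;
}
```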
Anything based on polling the file size is racy and unsafe.
Another scheme (that also requires cooperation from the uploader) can involve uploading the file's hash and size first, then the actual file. That allows you to know both when the transfer is done, and if it is consistent. (There are lots of variants around this idea.)
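As one possible variant of the hash-and-size idea, assume the uploader first sends a small sidecar file named `<file>.meta` containing JSON with `size` and `sha256` keys. That file name and format are made up for this sketch, as is the function name.

```php
<?php
// Hypothetical convention: "data.bin.meta" holds {"size": 123, "sha256": "..."}
// and is uploaded before "data.bin" itself.
function matchesManifest(string $path): bool
{
    $metaPath = $path . '.meta';
    if (!is_file($path) || !is_file($metaPath)) {
        return false;
    }
    $meta = json_decode((string) file_get_contents($metaPath), true);
    if (!is_array($meta) || !isset($meta['size'], $meta['sha256'])) {
        return false;
    }
    clearstatcache(true, $path);
    if (filesize($path) !== (int) $meta['size']) {
        return false;                       // cheap check first: not complete yet
    }
    // Same size: confirm the content is consistent as well.
    return hash_equals(strtolower((string) $meta['sha256']), hash_file('sha256', $path));
}
```

Checking the size before hashing avoids reading a multi-gigabyte file on every batch run while it is still arriving.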
Something that doesn't require cooperation from the client is checking whether the file is open by another process or not. (How you do that is OS dependent; I don't know of a PHP builtin that does this. lsof and/or fuser can be used on a variety of Unix-type platforms, and Windows has APIs for this.) If another process has the file open, chances are it's not complete yet. Note that this last approach might not be fool-proof if you allow restarting/resuming uploads, or if your FTP server software doesn't keep the file open for the entire duration of the transfer, so YMMV.
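A rough Unix-only sketch of the lsof approach from PHP. It assumes `lsof` is installed, that `exec()` is not disabled, and that the PHP process has enough privileges to see the FTP daemon's open files; the function name is hypothetical.

```php
<?php
// Returns true if some process has the file open, false if none was found,
// and null if the check could not be performed.
function isOpenElsewhere(string $path): ?bool
{
    $real = realpath($path);
    if ($real === false) {
        return null;                        // file does not exist
    }
    $output = [];
    $status = 0;
    // -t prints only the PIDs of processes that have the file open.
    exec('lsof -t ' . escapeshellarg($real) . ' 2>/dev/null', $output, $status);
    if ($status === 127) {
        return null;                        // the shell could not find lsof
    }
    return count($output) > 0;
}
```

Without sufficient privileges lsof may simply not list processes owned by other users, so a `false` result here is weaker evidence than a `true` one.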
Some FTP servers allow running commands when certain events occur. If your FTP server allows this, you can build a simple signalling scheme to let your application know that the file has been uploaded more or less successfully ("more or less" because you don't know whether the user intended to upload the file completely or in parts). The signalling scheme can be as simple as creating an "uploaded_file_name.ext.complete" file; your application then watches for files with the ".complete" extension.
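The consuming side of such a marker scheme might look like this. It assumes the FTP server's hook creates `<name>.complete` next to the uploaded file; the function name is hypothetical.

```php
<?php
// Lists data files that have a matching ".complete" marker beside them.
function filesMarkedComplete(string $dir, string $markerSuffix = '.complete'): array
{
    $markers = glob($dir . DIRECTORY_SEPARATOR . '*' . $markerSuffix);
    if ($markers === false) {
        return [];
    }
    $ready = [];
    foreach ($markers as $marker) {
        // Strip the marker suffix to get the name of the uploaded file.
        $dataFile = substr($marker, 0, -strlen($markerSuffix));
        if (is_file($dataFile)) {
            $ready[] = $dataFile;
        }
    }
    return $ready;
}
```

After processing a file, the batch job would typically delete or move both the file and its marker so that it is not picked up again.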
You can also check whether you can open the file for writing. Most FTP servers won't let you do this while the file is being uploaded.
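A sketch of the open-for-writing probe, with a hypothetical function name. Whether a failure really means "still uploading" depends on the server and OS: on many Unix systems a second process can open the file for writing even during an upload, so test this against your own FTP server first.

```php
<?php
// Tries to open the file for reading and writing without truncating or creating it.
function canOpenForWriting(string $path): bool
{
    $handle = @fopen($path, 'r+');          // 'r+' never truncates or creates
    if ($handle === false) {
        return false;                       // locked, missing, or no permission
    }
    fclose($handle);
    return true;
}
```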
One more approach, mentioned by Mat, is to use system-specific techniques to check whether the file is open in another process.
Our server admin suggested ftpwho, which lists the files that are currently being transferred.
http://www.castaglia.org/proftpd/doc/ftpwho.html
So the solution is to parse the output of ftpwho to see if a file in the directory is being transferred.
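A deliberately crude sketch of that check. The exact layout of ftpwho's output depends on its version and options, so this does not parse fields at all; it only looks for the file name as a substring of any output line. The function name and the `-v` invocation in the comment are assumptions to verify against your installation.

```php
<?php
// True if the file's base name appears anywhere in the given ftpwho output.
// Substring matching can give false positives (e.g. "a.zip" inside "data.zip").
function mentionedInFtpwho(string $ftpwhoOutput, string $path): bool
{
    $name = basename($path);
    foreach (preg_split('/\R/', $ftpwhoOutput) as $line) {
        if (strpos($line, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Possible usage (assumes ftpwho is on the PATH and shell_exec is enabled):
// $output = (string) shell_exec('ftpwho -v');
// $busy   = mentionedInFtpwho($output, '/incoming/bigfile.zip');
```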
It's not a really nice trick, but it's simple :-). You can do the same with filemtime.
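A filemtime-based check could be sketched like this: treat a file as finished only if it has not been modified for a while. The 60-second quiet period and the function name are arbitrary assumptions, and a stalled upload will still fool it.

```php
<?php
// True if the file exists and was last modified at least $quietSeconds ago.
function untouchedFor(string $path, int $quietSeconds = 60): bool
{
    clearstatcache(true, $path);            // avoid a stale cached mtime
    $mtime = @filemtime($path);
    return $mtime !== false && (time() - $mtime) >= $quietSeconds;
}
```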
The best way to check would be to try to get an exclusive lock on the file using flock. The sftp/ftp process will be using the fopen libraries.
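A non-blocking flock attempt might be sketched as follows (the function name is hypothetical). One caveat worth stating: on Unix, flock locks are advisory, so this only detects an upload in progress if the FTP/SFTP server itself takes a lock on the file. Treat that as an assumption to verify for your server.

```php
<?php
// Tries to take an exclusive lock without waiting; releases it again at once.
function tryExclusiveLock(string $path): bool
{
    $handle = @fopen($path, 'r+');          // 'r+' neither truncates nor creates
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return immediately instead of blocking.
    $locked = flock($handle, LOCK_EX | LOCK_NB);
    if ($locked) {
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $locked;
}
```

If this returns false, skip the file and try again on the next batch run.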